<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Agile Data N’ Info: AgileData Engineering Patterns]]></title><description><![CDATA[An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.]]></description><link>https://agiledata.info/s/agiledata-engineering-patterns</link><image><url>https://substackcdn.com/image/fetch/$s_!ErtR!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8892c64-a0c7-4c7b-9f49-a73be5280f22_1280x1280.png</url><title>Agile Data N’ Info: AgileData Engineering Patterns</title><link>https://agiledata.info/s/agiledata-engineering-patterns</link></image><generator>Substack</generator><lastBuildDate>Wed, 08 Apr 2026 16:59:49 GMT</lastBuildDate><atom:link href="https://agiledata.info/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Agile Data Limited]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[DataNInfo@agiledataguides.com]]></webMaster><itunes:owner><itunes:email><![CDATA[DataNInfo@agiledataguides.com]]></itunes:email><itunes:name><![CDATA[Shagility]]></itunes:name></itunes:owner><itunes:author><![CDATA[Shagility]]></itunes:author><googleplay:owner><![CDATA[DataNInfo@agiledataguides.com]]></googleplay:owner><googleplay:email><![CDATA[DataNInfo@agiledataguides.com]]></googleplay:email><googleplay:author><![CDATA[Shagility]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[AgileData Data Match, AgileData Engineering Pattern 
#7]]></title><description><![CDATA[The Data Match pattern provides an automated, granular comparison capability to efficiently identify and report discrepancies between two datasets, moving from row counts to specific data values.]]></description><link>https://agiledata.info/p/agiledata-data-match-agiledata-engineering</link><guid isPermaLink="false">https://agiledata.info/p/agiledata-data-match-agiledata-engineering</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Thu, 04 Sep 2025 21:24:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_NEi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Data Match</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/172820886/description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/agiledata-podcast-episode-mindmap">AgileData Podcast Mind Map</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/172820886/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An 
AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Data Match</strong> pattern provides an <strong>automated, granular comparison</strong> capability to efficiently identify and report discrepancies between two datasets, moving from row counts to specific data values. </p><p>This 'data diff' solution transforms <strong>hours of manual data reconciliation into minutes</strong> by optimising comparisons for cloud analytics databases like BigQuery, serving as a <strong>support feature for on-demand exception handling</strong> rather than a continuous trust rule.</p><h2><strong>Pattern Context Diagram</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_NEi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_NEi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 424w, https://substackcdn.com/image/fetch/$s_!_NEi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 848w, https://substackcdn.com/image/fetch/$s_!_NEi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 1272w, 
https://substackcdn.com/image/fetch/$s_!_NEi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_NEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png" width="1056" height="509" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:509,&quot;width&quot;:1056,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/172820886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_NEi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 424w, https://substackcdn.com/image/fetch/$s_!_NEi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 848w, 
https://substackcdn.com/image/fetch/$s_!_NEi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 1272w, https://substackcdn.com/image/fetch/$s_!_NEi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31b50478-2ff6-437e-abba-c0c94f9fe50b_1056x509.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>Data Match</strong></p><h3>The Problem It 
Solves</h3><p>You know that moment when you're trying to figure out <strong>why your data numbers don't add up</strong> between two systems or tables? Or when you're trying to check whether <strong>everything from your source has made it to your target</strong>?</p><p>Often you're faced with hours, or even days, of painstaking manual reconciliation: writing complex SQL queries, or running inefficient brute-force comparisons that cost a fortune in compute.</p><p>This pattern solves the problem of <strong>quickly and efficiently identifying discrepancies</strong> between two datasets, saving a great deal of time and frustration.</p><h3><strong>When to Use It</strong></h3><p>Use Data Match primarily as an <strong>exception-handling tool</strong> or a <strong>support feature</strong>. It's most useful:</p><ul><li><p>When <strong>something goes wrong</strong> and you suspect data misalignment between a source and a target.</p></li><li><p>When <strong>reconciling data</strong> after a migration or a complex transformation, especially when trying to pinpoint missing records.</p></li><li><p>When you need to <strong>quickly compare two tables or datasets</strong> to find differences without writing custom SQL.</p></li><li><p>When <strong>manual reconciliation is proving horrendous</strong> due to large volumes or complex logic.</p></li></ul><p>It's not designed as a core trust rule for every data movement, but rather for <strong>on-demand verification</strong>.</p><h3>How It Works</h3><p>This pattern turns a complex data reconciliation task into a few simple clicks.</p><p><strong>Trigger:</strong> A user needs to verify data consistency between two datasets because a discrepancy is suspected or an audit is required.</p><p><strong>Inputs:</strong></p><ul><li><p>A <strong>"table on the left"</strong> (source data) and a <strong>"table on the right"</strong> (target data). 
This could include data uploaded from an Excel spreadsheet as a new "tile".</p></li><li><p>The specific <strong>"things in each table they want to double check"</strong>, such as primary keys or particular columns.</p></li><li><p>Access to a <strong>data catalog</strong> where all relevant "tiles" (data assets) are loaded.</p></li></ul><p><strong>Steps:</strong></p><ol><li><p>The user <strong>selects the first dataset</strong> (e.g., "tile A") and the <strong>second dataset</strong> (e.g., "tile B") from an interface.</p></li><li><p>The user specifies the <strong>columns or keys</strong> within each dataset that need to be compared.</p></li><li><p>The user starts the comparison by simply hitting "go", making the whole thing a <strong>"1, 2, 3, 4, 5 click exercise"</strong>.</p></li><li><p>Under the covers, the system performs an <strong>increasingly granular match</strong>:</p><ul><li><p>It starts by comparing <strong>row counts</strong>.</p></li><li><p>Then it compares the <strong>keys</strong> in the two tables.</p></li><li><p>Finally, it compares <strong>specific data values</strong> (e.g., dates of birth).</p></li></ul><p>This layering of rules <strong>optimises the comparison</strong> and avoids costly brute-force operations.</p></li><li><p>The system <strong>optimises the underlying queries</strong> for the specific database environment (e.g., BigQuery), taking advantage of features such as columnar storage and partition pruning.</p></li></ol><p><strong>Outputs:</strong></p><ul><li><p>A <strong>report</strong> detailing "all the things in the left that aren't in the right, or vice versa".</p></li><li><p><strong>Identification of the specific discrepant records</strong>, such as a list of "customer IDs that haven't flowed".</p></li></ul><h3>Why It Works</h3><p>Data Match works because it <strong>automates and optimises a typically complex and manual process</strong>. 
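</p><p>The increasingly granular match in the steps above can be made concrete with a short sketch. This is a hypothetical Python illustration, not AgileData's implementation. It assumes each dataset fits in memory as a list of dictionaries sharing a single key column. The names <code>data_match</code>, <code>MatchReport</code> and <code>key_diff_sql</code>, and the table and column names in the demo, are invented for this example. The in-memory function runs the three layers in turn: row counts, then keys in both directions, then values. It compares values only for keys present on both sides and only for the columns the user picked. The SQL helper assumes BigQuery and shows one way the key layer might be pushed down to the database. It reads only the key column and can optionally filter on a partition column.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, List, Mapping, Sequence, Tuple

Row = Mapping[str, Any]  # one record, e.g. {"customer_id": 1, "dob": "1980-01-01"}


@dataclass
class MatchReport:
    """Hypothetical result of a layered left/right comparison."""
    left_count: int
    right_count: int
    left_only_keys: List[Any] = field(default_factory=list)   # in the left, not the right
    right_only_keys: List[Any] = field(default_factory=list)  # in the right, not the left
    # (key, column, left value, right value) for keys present on both sides
    value_mismatches: List[Tuple[Any, str, Any, Any]] = field(default_factory=list)

    @property
    def matched(self) -> bool:
        return (self.left_count == self.right_count
                and not self.left_only_keys
                and not self.right_only_keys
                and not self.value_mismatches)


def data_match(left: Iterable[Row], right: Iterable[Row], key: str,
               compare_cols: Sequence[str] = ()) -> MatchReport:
    """Compare two datasets in increasingly granular layers (in-memory sketch)."""
    left_rows, right_rows = list(left), list(right)

    # Layer 1: row counts. Cheap, and it catches duplicated keys, which the
    # key-level comparison below collapses (the last row per key wins there).
    report = MatchReport(len(left_rows), len(right_rows))

    # Layer 2: keys in each direction (an anti-join both ways).
    # Assumes key values are mutually comparable so the output can be sorted.
    left_by_key = {row[key]: row for row in left_rows}
    right_by_key = {row[key]: row for row in right_rows}
    report.left_only_keys = sorted(left_by_key.keys() - right_by_key.keys())
    report.right_only_keys = sorted(right_by_key.keys() - left_by_key.keys())

    # Layer 3: values, only for keys on both sides and only for the columns
    # the user picked, rather than every column of every row.
    for k in sorted(left_by_key.keys() & right_by_key.keys()):
        for col in compare_cols:
            left_val, right_val = left_by_key[k].get(col), right_by_key[k].get(col)
            if left_val != right_val:
                report.value_mismatches.append((k, col, left_val, right_val))
    return report


def key_diff_sql(left_table: str, right_table: str, key: str,
                 partition_filter: str = "") -> str:
    """Hypothetical BigQuery query listing keys in left_table missing from right_table.

    Selecting only the key column keeps the columnar scan small, and an optional
    filter on the partitioning column lets BigQuery prune partitions. Arguments are
    interpolated as-is, so only pass trusted identifiers. Swap the two tables to get
    the right-not-in-left direction.
    """
    where = f"\nWHERE {partition_filter}" if partition_filter else ""
    return (
        f"SELECT {key} FROM `{left_table}`{where}\n"
        "EXCEPT DISTINCT\n"
        f"SELECT {key} FROM `{right_table}`{where}"
    )


if __name__ == "__main__":
    left = [{"customer_id": 1, "dob": "1980-01-01"},
            {"customer_id": 2, "dob": "1975-06-30"},
            {"customer_id": 3, "dob": "1990-12-12"}]
    right = [{"customer_id": 1, "dob": "1980-01-01"},
             {"customer_id": 2, "dob": "1975-07-01"}]
    print(data_match(left, right, key="customer_id", compare_cols=["dob"]))
    print(key_diff_sql("project.dataset.source_customers",
                       "project.dataset.target_customers",
                       "customer_id", partition_filter="load_date = '2025-09-04'"))
```

<p>In practice the comparisons would most likely run as SQL inside the cloud analytics database rather than in Python. The in-memory version only shows the shape of the layered logic and of the left-not-in-right and right-not-in-left report.</p><p>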
Data Match replaces hours of writing and running custom SQL with an intuitive, guided workflow, essentially providing a "data diff" capability as a service. </p><p>The pattern's effectiveness comes from its <strong>layered approach to comparison</strong>, moving from high-level checks (like row counts) to granular value comparisons. Because the expensive value comparisons only run on the columns the user selected, after the cheaper checks have narrowed things down, the approach stays efficient and cost-effective, particularly for large datasets in cloud analytics databases. </p><p>It's like having an <strong>automated detective</strong> that quickly sifts through vast amounts of data to highlight the exact discrepancies, allowing analysts to focus on <em>why</em> the data is different, rather than <em>how</em> to find the differences.</p><h3>Real-World Example</h3><p>Consider a data engineering team trying to <strong>reconcile customer data</strong> produced by newly modelled business rules against the numbers in an existing Excel spreadsheet used by the business. Despite their efforts, they keep finding themselves "one customer out" (for example, 100,000 customers against the spreadsheet's 100,001), and each discrepancy has a different, often obscure, cause. Manually finding that single missing customer is a "horrendous" and time-consuming task.</p><p>With <strong>Data Match</strong>, the team can quickly upload the Excel data as a new "tile" and compare it directly with their processed customer data. The tool rapidly <strong>highlights the single record that is out</strong>, turning "many hours of frustration" into "minutes" of investigation. The team can then spend their time working through the root cause of the discrepancy with the business, rather than painstakingly searching for it.</p><p>Another example involves a <strong>data migration project</strong> where 100,000 customer records were sent via an API to a new vendor system, but only 80,000 appeared in the new system. Manually debugging this took "hours if not days". 
If Data Match had been available, the team could have "back flushed" the data the vendor had loaded as a "tile" and compared it with the data they sent. This would have <strong>immediately identified the 20,000 records that didn't make it</strong>, saving significant time and effort in proving where the discrepancy occurred (e.g., showing the vendor that changes had been made on their side, despite an agreement that no transformations would happen between the API and their system).</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Brute-Force Comparisons on Large Datasets:</strong> Trying to match everything between two very large tables without any optimisation or "layering of rules" will be <strong>extremely costly</strong> in compute, credits, or tokens, depending on how your database vendor charges.</p></li><li><p><strong>Using Non-Optimised Tools:</strong> Relying on generic open-source libraries that are not optimised for your cloud analytics database (e.g., a tool skewed towards row-oriented databases like Postgres when you're using a column-oriented database like BigQuery) leads to <strong>inefficient queries and high costs</strong>, because the queries don't take advantage of the database's performance features.</p></li><li><p><strong>Overuse as a Primary Trust Mechanism:</strong> Data Match is an <strong>"exception thing,"</strong> not a core "trust rule" to be run for every data movement. 
Over-relying on it for continuous validation can be inefficient and may indicate a gap in proactive data quality monitoring.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Implement for On-Demand Use:</strong> Position Data Match as an on-demand <strong>"support feature"</strong> for when anomalies occur or specific reconciliations are needed, rather than an always-on data quality check.</p></li><li><p><strong>Optimise for Your Platform:</strong> If you build your own version, make sure it's <strong>specifically tailored and optimised for your primary data platform</strong> (e.g., BigQuery) to maximise efficiency and minimise costs.</p></li><li><p><strong>Integrate with Data Catalogues:</strong> Make it easy for users to pick and compare any "tile" (data asset) loaded in your data catalogue, reducing the overhead of manual configuration.</p></li><li><p><strong>Focus on Post-Detection Analysis:</strong> Emphasise that Data Match quickly identifies <em>what</em> is different, so data professionals can spend their time on <em>why</em> the data differs and <em>how</em> to fix it.</p></li></ul><h3>Related Patterns</h3><ul><li><p><strong>Data Diff:</strong> The general term for the capability that Data Match provides.</p></li><li><p><strong>Tracing Values:</strong> A related feature that lets users follow an individual value, such as a single customer ID, through the data once Data Match has identified a discrepancy.</p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Data Match</p><h3>Headline</h3><p>AgileData Launches <strong>Data Match</strong> to Slash Data Reconciliation Time from Hours to Minutes for Data Teams</p><h3>Introduction</h3><p>AgileData is thrilled to announce the availability of <strong>Data Match</strong>, a powerful new capability designed to simplify and accelerate the process of identifying discrepancies between two datasets. 
This feature empowers data analysts, engineers, and business users to quickly verify data consistency and pinpoint missing or mismatched records with unprecedented ease, ensuring greater confidence in their data.</p><h3>Problem</h3><p>"As a data professional, I've spent countless hours, sometimes even days, painstakingly trying to figure out <strong>why my numbers don't match</strong> between two systems or after a data migration. It's a horrendous, manual process of writing complex SQL or sifting through spreadsheets, often just to find that one elusive missing record. I just want to know what's different, quickly, so I can fix it."</p><h3>Solution</h3><p><strong>Data Match</strong> transforms this laborious task into a quick, intuitive process. Users simply select two datasets (or "tiles"), specify the keys or columns they wish to compare, and with a few clicks the system performs an <strong>optimised, granular comparison</strong>. It efficiently checks everything from row counts to specific data values, then generates a clear report highlighting all discrepancies. This eliminates the need for manual SQL queries and immediately pinpoints the exact records that are out of sync, saving <strong>hours of frustration and compute costs</strong>.</p><h3>Data Platform Product Manager</h3><p>"With <strong>Data Match</strong>, we're not just offering a new feature; we're fundamentally improving <strong>trust and auditability</strong> within our data ecosystem. It gives our users an on-demand, highly efficient way to validate data alignment, so discrepancies are identified swiftly and confidence in our data pipelines and overall data quality is reinforced."</p><h3>Data Platform User</h3><p>"Honestly, <strong>Data Match is a game-changer</strong>. What used to take me 'hours, if not days,' to manually reconcile data or prove a discrepancy now literally takes 'minutes' with just a few clicks. 
I don't have to remember complex queries; I just hit 'go' and get my answers, letting me focus on solving the <em>why</em>, not just finding the <em>what</em>."</p><h3>Get Started </h3><p>Ready to transform your data reconciliation process from hours to minutes? <strong>Data Match</strong> is available now within the AgileData platform. Connect with your AgileData team today to learn more about how to leverage this powerful capability, or visit agiledata.io for further details on adopting new patterns to craft your Agile Data way of working.</p><h2>AgileData App / Platform Example</h2><p></p><p></p><h2>AgileData Podcast Episode</h2><p><a href="https://podcast.agiledata.io/e/data-match-agiledata-engineering-pattern-7-episode-75/">https://podcast.agiledata.io/e/data-match-agiledata-engineering-pattern-7-episode-75/</a><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/data-match-agiledata-engineering-pattern-7-episode-75/&quot;,&quot;text&quot;:&quot;Listen to Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/data-match-agiledata-engineering-pattern-7-episode-75/"><span>Listen to Podcast Episode</span></a></p><p></p><blockquote><p><strong>Subscribe:</strong> <a href="https://podcasts.apple.com/nz/podcast/agiledata/id1456820781">Apple Podcast</a> | <a href="https://open.spotify.com/show/4wiQWj055HchKMxmYSKRIj">Spotify</a> | <a href="https://www.google.com/podcasts?feed=aHR0cHM6Ly9wb2RjYXN0LmFnaWxlZGF0YS5pby9mZWVkLnhtbA%3D%3D">Google Podcast </a>| <a href="https://music.amazon.com/podcasts/add0fc3f-ee5c-4227-bd28-35144d1bd9a6">Amazon Audible</a> | <a href="https://tunein.com/podcasts/Technology-Podcasts/AgileBI-p1214546/">TuneIn</a> | <a href="https://iheart.com/podcast/96630976">iHeartRadio</a> | <a href="https://player.fm/series/3347067">PlayerFM</a> | <a 
href="https://www.listennotes.com/podcasts/agiledata-agiledata-8ADKjli_fGx/">Listen Notes</a> | <a href="https://www.podchaser.com/podcasts/agiledata-822089">Podchaser</a> | <a href="https://www.deezer.com/en/show/5294327">Deezer</a> | <a href="https://podcastaddict.com/podcast/agiledata/4554760">Podcast Addict</a> |</p></blockquote><div id="youtube2-G7L5JDMIP7E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;G7L5JDMIP7E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/G7L5JDMIP7E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Episode MindMap</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!glVt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!glVt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 424w, https://substackcdn.com/image/fetch/$s_!glVt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 848w, https://substackcdn.com/image/fetch/$s_!glVt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 
1272w, https://substackcdn.com/image/fetch/$s_!glVt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!glVt!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png" width="1200" height="2651.3736263736264" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:3217,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:2581633,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/172820886?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!glVt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 424w, https://substackcdn.com/image/fetch/$s_!glVt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 848w, 
https://substackcdn.com/image/fetch/$s_!glVt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 1272w, https://substackcdn.com/image/fetch/$s_!glVt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c75717c-6c0e-44b3-bbd8-2b0530b11857_4949x10936.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane</strong>: Welcome to the Agile Data Podcast. I'm Shane Gibson. 
</p><p><strong>Nigel</strong>: And I'm Nigel Vining.</p><p><strong>Shane</strong>: Hey, Nigel. Another Data Engineering Bytes. Today we're gonna talk about a feature that we call Data Match. So tell me what it is and why I should care.</p><p><strong>Nigel</strong>: Data Match came out of the age-old question: how do I know something in my source is in my target?</p><p>Or, as we like to say, what's in the left that isn't in the right, or vice versa. This is generally called a data diff in a lot of places. It's a pattern of doing an increasingly granular match of something on the left, which is generally a table of data, and something on the right. So we start off and ask, is the number of records the same?</p><p>Yes, it is. Great. Cool. Are all the keys the same between these two tables? Yep, they are. Cool. Now, are the dates of birth in the table on the left the same as the dates of birth in the table on the right? So effectively we go from the very wide row count down to something very specific: do the actual values match from left to right?</p><p>Now, that all sounds pretty straightforward, and technically it is, but under the covers there's a whole lot of SQL and engineering patterns happening to run all those queries. That's not something we'd generally expect an analyst to do, 'cause it's a bit of a faff. So we came up with this feature called Data Match, where we effectively let a user pick a table on the left, pick a table on the right, pick the things in each table they want to double check, and hit go. Under the covers, we optimize those comparisons and then produce a report straight back to the user saying these are all the things in the left that aren't in the right, or vice versa.</p><p>So we've made it a 1, 2, 3, 4, 5 click exercise, and you can reconcile anything in your environment and it's... </p><p><strong>Shane</strong>: I think this one, from memory, was an interesting problem. We had a customer that we were doing the data work for. 
They had a series of business logic or rules that lived in a fairly horrendous Excel spreadsheet.</p><p>So we used our way of working: we extracted the concept of those rules, modeled the data properly and applied those rules. And whenever we tried to reconcile the numbers we got with the numbers in the spreadsheet, we were always one thing out. Let's just say we were reconciling customers: they would have a hundred thousand and one customers, and we would have a hundred thousand.</p><p>So we'd manually go through, find that one customer, and work out that it was a timing problem, or make sure we ran it at the same time. Then there would be one customer out again and we'd go and check it, and there was a bit of logic they had that they hadn't told us about.</p><p>So we had to add that rule, and somehow we just got into this loop where we were always one customer out, and it was always for a different reason, but the cost of doing that manual reconciliation was horrendous. Data Match allowed me to run that really quickly. I'd grab the Excel data, upload it, just dump it in like we do, get a new tile, compare it to the numbers we were consistently producing, and it would then highlight the one record that was out, you know, in a very short amount of time.</p><p>Then I could spend all my time working with them on why they had this record and we didn't, or vice versa. So yeah, again, it took something that was many hours of frustration and made it minutes, which was great. The idea of layering those rules is important though, because otherwise you're just gonna brute force two very large tables and match everything, and that is gonna cost you a shit ton of compute, a shit ton of credits, a shit ton of tokens, depending on how your cloud analytics database vendor is charging you for that compute. 
</p><p><strong>Nigel</strong>: Yeah, so we poked at a couple of reasonably well-known open source libraries when we first started, 'cause we're like, we're not gonna reinvent the wheel.</p><p>This seems to be a fairly solved thing. Surely there's just gonna be a package we can pull down, point at two tables, hit go, and it will run. Technically there are, and we did start with one, and it did work. Where we ran into trouble was that it needed quite a lot of configuration, so we effectively had to build a whole wrapper to pass it enough configuration to make it work.</p><p>And that was fine. That was more just a bit of app development to give it what it needs. But then part of the problem was that, as is usually the case, it had been developed to run on a particular database, Postgres from memory, which is quite common, or maybe MySQL. So it was heavily skewed towards row-store databases and how they work.</p><p>So when we came to run the queries on BigQuery, they ran, and that was fine, but we didn't really get the benefits of BigQuery being a columnar database, partition pruning and the like. So we played with it and played with it, and it got closer and closer.</p><p>In the end, we thought it'd actually be quicker to write a template that would run on BigQuery: effectively do the same thing, but make the template specific to BigQuery. And we did, and that's effectively where we got to, so we can optimize what we give to BigQuery. So it's very efficient, it runs very quickly and it doesn't really cost us anything, 'cause we know where the performance and cost savings are with BigQuery, and that's how we got to our Pattern. We effectively just took an open source one, found its strengths and weaknesses, and rolled a variant of it for us, for BigQuery. 
</p><p><strong>Shane</strong>: And I think the other thing is we only run this when we need to. So it's not baked in as a core trust rule for every movement of data through every layer for every tile, is it? </p><p><strong>Nigel</strong>: No. This is effectively an exception thing, for when something goes wrong. It's somewhere we can quickly go click and say, ah, there's 10 customers that aren't in this table where we'd expect them to be. So it's a really quick check, without having to regress and go and customize something, because it already has all the tiles in the catalog loaded.</p><p>You can just go pick tile A, pick tile B, compare them, show me the differences, and go away. So it takes away the first layer of overhead of thinking about it and gives you your answer in a report. Then, as you said, you can go and do the analysis, 'cause now I've got a list of customer IDs that haven't flowed.</p><p>I can grab one of those customer IDs and go specifically look for it. And that's a really quite simple proposition, because it flows nicely onto some of the other features we've built around looking up and tracing values. </p><p><strong>Shane</strong>: I remember when we did that data migration use case, where we grabbed data from a legacy source system, pushed it through us, and then made it available as an API so that the new vendor could migrate the old data into the new platform.</p><p>And we had that gentleman's agreement: we'd do all the logic to match the new business rules for the new system, so effectively they'd hit the API, grab the data, and load it straight into their system, with no transformations between those steps. That way we always knew that if we needed to change the way the data looked, it sat with us.</p><p>And when we did that test run, let's say customer again, all of a sudden we passed
a hundred thousand customers out and only 80,000 turned up in their system. And we spent all that time manually trying to figure out why. The answer was that they had made some changes on their side between getting the data and loading it through their APIs, even though they said they wouldn't.</p><p>If I had just been able to take the final result that they'd loaded from their system, backflush it in as a tile and then say compare, that would've told me exactly which records didn't make it. Yes, I would still have to talk to 'em about how come they didn't make it, but again, that would've saved hours, if not days, of proving we sent a hundred thousand and you loaded 80,000.</p><p>We know, therefore, it's somewhere between those steps, and it's nothing to do with everything to the left of us. It would've saved us time if we'd had it built back then. </p><p><strong>Nigel</strong>: Yeah, that's why, I guess, it's in the app. It's what I would call a support feature. It's something we don't use very often, but if we need to, it's there to quickly do something, and we don't have to remember how to do a data diff: what queries do I need to run?</p><p>Grab some queries, change the table names and the key names in them, run them. Instead, it's click, click, here's my report. You know, it's a small overhead, but when you're trying to do a whole lot of things, you're grateful for it. </p><p><strong>Shane</strong>: Yep. Hours to minutes. That's what I care about. Yep. </p><p><strong>Nigel</strong>: Excellent.</p><p><strong>Shane</strong>: Alright. 
I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[AgileData Activity Event Tile, AgileData Engineering Pattern #6]]></title><description><![CDATA[The Activity Event Tile pattern tracks a sequence of events (e.g., subscription signups, subscription payments) for a core business concept by storing minimal event data.]]></description><link>https://agiledata.info/p/agiledata-activity-event-tile-agiledata</link><guid isPermaLink="false">https://agiledata.info/p/agiledata-activity-event-tile-agiledata</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Thu, 07 Aug 2025 20:07:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!liVp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Activity Event Tile</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/170387919/description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/170387919/agiledata-podcast-episode-mindmap">AgileData Podcast Mind Map</a></strong></p><p><strong><a 
href="https://agiledata.substack.com/i/170387919/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Activity Event Tile</strong> pattern tracks a <strong>sequence of events</strong> (e.g., subscription signups, subscription payments) for a <strong>core business concept</strong> by storing minimal event data and using a <strong>hydration layer to automatically generate pre-calculated metrics as views</strong>.</p><p>This allows business users and analysts to easily understand <strong>business performance</strong> and <strong>event sequencing</strong> without complex SQL, providing immediate, consistent insights into activity movements and financial impacts.</p><h2><strong>Pattern Context Diagram</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!liVp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!liVp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 424w, https://substackcdn.com/image/fetch/$s_!liVp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 848w, 
https://substackcdn.com/image/fetch/$s_!liVp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 1272w, https://substackcdn.com/image/fetch/$s_!liVp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!liVp!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png" width="1200" height="323.0769230769231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:392,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:227344,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/170387919?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!liVp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 424w, 
https://substackcdn.com/image/fetch/$s_!liVp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 848w, https://substackcdn.com/image/fetch/$s_!liVp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 1272w, https://substackcdn.com/image/fetch/$s_!liVp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2039ae86-2efb-4dab-bd0e-a4da5f83d652_2924x787.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>AgileData Activity Event Tile</strong></p><h3>The Problem It Solves</h3><p>You know that moment when you're trying to analyse a sequence of related business events, like customer subscriptions changing over time, and you need to answer questions about trends, movements, or financial impacts? </p><p>Typically, this involves writing "really horrible SQL" with complex "windowing functions" to connect these events and calculate metrics like 'time between' or 'dollar changes'. </p><p>The problem is, these manual calculations are often "not repeatable or reusable in other reports," leading to inefficiency and inconsistent insights. Furthermore, without this pattern, activities that happen out of the expected sequence can be a significant challenge to handle.</p><h3>When to Use It</h3><p>Use this pattern when you need to:</p><ul><li><p>Track a <strong>sequence of events</strong> related to a single, core business concept, such as subscriptions.</p></li><li><p>Derive <strong>time-based metrics and comparisons</strong> (e.g., changes between months, opening/closing balances, first/last occurrences).</p></li><li><p>Easily understand the <strong>financial impact</strong> or count of specific activities over time.</p></li><li><p>Simplify complex reporting that traditionally requires extensive SQL "windowing functions" or "slamming events together".</p></li><li><p>Handle and identify <strong>activities that occur out of expected sequence</strong> (e.g., a subscription going from "cancelled" to "paid") without hardcoding sequencing logic.</p></li><li><p>Provide <strong>repeatable and reusable calculations</strong> for your "last mile tool" or reporting layer.</p></li></ul><h3>How It Works</h3><p>This pattern provides a structured and automated way to manage and report on a sequence of activities.</p><h4>Trigger:
</h4><ul><li><p>A relevant activity occurs for a core business concept (e.g., a subscription is created, paid, renewed, or cancelled).</p></li></ul><h4>Inputs:</h4><ul><li><p>A <strong>business key</strong> that identifies the core concept (e.g., the subscription ID).</p></li><li><p>A <strong>clear description</strong> of the activity itself (e.g., "subscription created," "subscription paid").</p></li><li><p>A <strong>financial value</strong> associated with the activity (e.g., the payment amount) or a simple '1' if it's a non-financial occurrence.</p></li><li><p>The <strong>business date</strong> when the activity occurred.</p></li></ul><h4>Steps:</h4><ol><li><p>For each activity, a <strong>minimal record</strong> is created containing only the business key, activity description, value, and date.</p></li><li><p>These records are <strong>appended</strong> to a dedicated "activity event table". This table is intentionally "tiny".</p></li><li><p>The "magic happens in the <strong>hydration layer</strong>".</p></li><li><p>A "consume tile" is automatically generated. This tile includes a list of <strong>standard metrics that are calculated "on the fly" as views</strong>. 
These metrics automatically provide insights such as:</p><ol><li><p>Is this the first or last event seen for that concept?</p></li><li><p>How many occurrences have been seen?</p></li><li><p>What is the revenue impact or value change compared to previous activities?</p></li><li><p>What are the opening and closing balances over time?</p></li><li><p>What was the previous activity for every record?</p></li></ol></li></ol><h4>Outputs:</h4><ul><li><p>A highly efficient, append-only <strong>activity event table</strong> that can hold "squillions of records" with minimal storage.</p></li><li><p>A <strong>"consume tile" that acts as a ready-to-use Data Asset</strong>, providing pre-calculated, real-time metrics and views.</p></li><li><p>The ability to directly plug this data into "last mile tools" for immediate and intuitive reporting on business health metrics, such as subscription performance.</p></li></ul><h3>Why It Works</h3><p>It's like having a dedicated, intelligent accountant for every ongoing business item, automatically calculating its dynamic financial and state changes. 
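</p><p>To make the hydration layer described under How It Works more concrete, here is a minimal Python sketch of the kind of per-record metrics a consume view might expose. It is an illustration under stated assumptions, not the AgileData implementation (which generates these metrics as views): the record fields <code>business_key</code>, <code>activity</code>, <code>value</code> and <code>activity_date</code>, the function <code>hydrate</code>, and the metric names are all hypothetical. It orders each business key's activities by date and derives the occurrence number, first/last flags, the previous activity, the value change against the previous activity, and a running total of the value column (most meaningful when the activities carry financial amounts).</p>

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby
from operator import attrgetter
from typing import Optional


@dataclass(frozen=True)
class ActivityEvent:
    """One minimal activity record (hypothetical field names)."""
    business_key: str      # e.g. the subscription ID
    activity: str          # e.g. "subscription paid"
    value: float           # financial amount, or 1 for a non-financial occurrence
    activity_date: date    # business date of the activity


@dataclass(frozen=True)
class HydratedEvent:
    """An activity record plus derived metrics, as a consume view might expose them."""
    event: ActivityEvent
    occurrence_number: int             # 1-based position within the key's sequence
    is_first: bool
    is_last: bool
    previous_activity: Optional[str]   # None for the first activity of a key
    value_change: float                # value minus the previous activity's value
    running_value: float               # cumulative value for the key so far


def hydrate(events: list[ActivityEvent]) -> list[HydratedEvent]:
    """Derive sequence metrics per business key, ordered by business date."""
    # Sorting is stable, so same-day activities keep their input (append) order.
    ordered = sorted(events, key=attrgetter("business_key", "activity_date"))
    hydrated: list[HydratedEvent] = []
    for _, group in groupby(ordered, key=attrgetter("business_key")):
        sequence = list(group)
        running = 0.0
        for i, ev in enumerate(sequence):
            prev = sequence[i - 1] if i > 0 else None
            running += ev.value
            hydrated.append(HydratedEvent(
                event=ev,
                occurrence_number=i + 1,
                is_first=i == 0,
                is_last=i == len(sequence) - 1,
                previous_activity=prev.activity if prev else None,
                value_change=ev.value - prev.value if prev else 0.0,
                running_value=running,
            ))
    return hydrated


# Usage: an append-only list of activities, including one out-of-sequence payment.
events = [
    ActivityEvent("sub-1", "subscription created", 1.0, date(2025, 1, 1)),
    ActivityEvent("sub-1", "subscription paid", 49.0, date(2025, 1, 2)),
    ActivityEvent("sub-1", "subscription cancelled", 1.0, date(2025, 2, 1)),
    ActivityEvent("sub-1", "subscription paid", 49.0, date(2025, 2, 5)),
    ActivityEvent("sub-2", "subscription created", 1.0, date(2025, 1, 10)),
]
rows = hydrate(events)

# Anomaly check: a payment whose previous activity was a cancellation.
out_of_sequence = [
    r for r in rows
    if r.event.activity == "subscription paid"
    and r.previous_activity == "subscription cancelled"
]
for r in out_of_sequence:
    print(r.event.business_key, r.event.activity_date)
```

<p>The <code>previous_activity</code> column is what keeps out-of-sequence checks simple: the usage filters for a "subscription paid" whose predecessor was "subscription cancelled", without hardcoding any expected sequence. In a warehouse, equivalent logic would typically use window functions such as <code>LAG</code> and <code>ROW_NUMBER</code>, partitioned by the business key and ordered by the business date.</p><p>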
This pattern works because it:</p><ul><li><p><strong>Automates complex calculations:</strong> It handles the "horrible SQL" and "windowing functions" that are typically required to analyse event sequences and calculate movements or changes, making these complex tasks simple for end-users.</p></li><li><p><strong>Provides real-time, non-stale metrics:</strong> Because calculations are generated as views "on the fly," the metrics always reflect the latest reality from the activity feed, preventing stale data issues.</p></li><li><p><strong>Enables easy trend analysis:</strong> It allows businesses to effortlessly answer questions about "movement between months" or "total value of cancellations" by simply dragging pre-calculated columns onto a report.</p></li><li><p><strong>Efficiently stores data:</strong> The core activity event table is "tiny" and contains only essential keys, values, and dates, making it performant and cheap to store vast amounts of event data.</p></li><li><p><strong>Optimises query performance:</strong> Clustering the table on activity types and dates keeps the queries behind the calculations "fast and cheap".</p></li><li><p><strong>Simplifies sequencing and anomaly detection:</strong> By tracking every activity and its predecessor, it inherently handles activities happening out of normal order, making it "really easy to pull out" anomalies like a subscription going from "cancelled to paid". 
It also simplifies creating "funnel reports" by providing current and last seen activities.</p></li></ul><h3>Real-World Example</h3><p>Consider a <strong>Software as a Service (SaaS) company</strong> that needs to monitor key business metrics, often referred to as "pirate metrics," which track user acquisition, activation, retention, revenue, and referral.</p><p>Before adopting the Activity Event Tile pattern, an analyst might manually track events like "subscription created," "subscription paid," "subscription renewed," and "subscription cancelled" in a standard event table. To determine things like the net change in subscriptions month-over-month, the value of cancelled subscriptions in a given period, or the time elapsed between different subscription states, they would have to write highly intricate and time-consuming SQL queries involving multiple joins and complex window functions. These queries would often be specific to one report and difficult to reuse.</p><p>With the Activity Event Tile pattern, the company's data platform automatically records each subscription activity with its business key, type, value, and date. The pattern then generates a consumable data tile that exposes columns such as "subscriptions created this month versus last month," "total value of cancellations this month," or even the "previous activity" for each subscription. </p><p>Now, the analyst can simply <strong>drag and drop these pre-calculated columns</strong> into their reporting tool (their "last mile tool"), and the backend automatically performs the complex calculations, providing immediate and reliable insights into the health of their subscription business.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Bundling Unrelated Concepts:</strong> A major pitfall is trying to include multiple, distinct core concepts (e.g., users, customers, and subscriptions) within the <em>same</em> activity event tile. 
This makes reporting "skewed" because these are separate activities, even if related, and keying them together becomes "problematic." This pattern works best when focused on a single, binding core concept. </p></li><li><p>The original "activity schema" patterns, which attempted to manage additional related Core Business Concepts with extra columns or JSON strings, don&#8217;t solve this problem at scale.</p></li><li><p><strong>Hardcoding Activity Sequencing:</strong> Don't attempt to hardcode the expected sequence of activities. This pattern naturally reveals when activities happen out of normal order, providing valuable insights rather than breaking the system.</p></li><li><p><strong>Pre-storing All Metric Combinations:</strong> Avoid the temptation to pre-store every possible combination of every metric. The pattern relies on calculations as views, which ensures metrics are always real-time and avoids persisting "stale value[s]".</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Understand its Specialised Use:</strong> Recognise that this is "a slightly different Pattern" compared to standard event tiles, designed specifically for tracking streams of activity for a core concept.</p></li><li><p><strong>Keep the Activity Table Minimal:</strong> Ensure the underlying activity event table stores only the bare essentials: the business key, activity type, associated value, and the event date. This keeps it "tiny" and highly performant.</p></li><li><p><strong>Leverage Clustering:</strong> Cluster your activity event table on fields like activity types and dates. This can significantly improve query speed and cost-effectiveness for the automated calculations.</p></li><li><p><strong>Focus on a Single Core Concept:</strong> When designing an activity event tile, ensure it revolves around one central business concept (e.g., subscriptions). 
Resist the urge to combine unrelated concepts into the same tile.</p></li><li><p><strong>Trust the Hydration Layer:</strong> Rely on the pattern's "hydration layer" to automatically generate the complex, real-time metrics and views, rather than trying to pre-calculate everything manually.</p></li></ul><h3><strong>Related Patterns</strong></h3><ul><li><p><strong>Standard Event Tiles:</strong> This pattern is a variation of the AgileData Event Tile. Event Tiles typically record "a row of keys into the event table like customer, orders, product" at a specific "point in time" for reporting. The Activity Event Tile extends this to handle sequences of events and time-series analysis more effectively.</p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Activity Event Tile Data Modeling</p><h3>Headline</h3><p>New Activity Event Tile Delivers Easy-to-Understand Business Insights from Complex Sequences of Events for Analysts and Business Users</p><h3>Introduction</h3><p>Today, we're excited to announce the launch of our new <strong>Activity Event Tile</strong> capability. This powerful feature is designed to transform how businesses track and analyse sequences of events related to a core concept, such as customer subscriptions or product usage. It provides automated metrics and views, making it incredibly simple for business analysts and users to gain deep, consistent insights into dynamic business performance without needing to write complex code.</p><h3>Problem</h3><p>"As a business analyst, I used to dread trying to track changes in our subscriptions and derive meaningful insights. I had to write <strong>really horrible SQL with complex windowing functions</strong> just to connect events and calculate things like 'time between' activities or 'dollar changes'. 
The worst part was, those calculations were often <strong>not repeatable or reusable in other reports</strong>, which meant I was constantly reinventing the wheel and couldn't rely on consistent data."</p><h3>Solution</h3><p>The Activity Event Tile solves this by creating a highly efficient, append-only table that records minimal data for each activity &#8211; just the business key, a clear description of the activity, an associated financial value (or a simple '1' if non-financial), and the business date. The true power lies in its <strong>"hydration layer"</strong>, which automatically generates a "consume tile" with a comprehensive list of <strong>standard metrics calculated "on the fly" as views</strong>.</p><p>This means users can now easily answer questions like "how many new subscriptions did I get this month compared to last year?" or "what's the total value of cancellations this month?" by simply <strong>dragging these pre-calculated columns onto their reports</strong>. It also inherently handles activities that happen out of expected sequence, providing insights into anomalies without hardcoding sequencing logic. This capability eliminates the need for complex manual SQL, providing real-time, non-stale insights directly in your reporting tools.</p><h3>Data Platform Product Manager</h3><p>"With the Activity Event Tile, we've transformed complex, bespoke reporting into a <strong>repeatable, auditable, and highly performant capability</strong>. Our data platform now consistently delivers quick business insights from multiple events, dramatically improving reusability and building greater trust in our data assets across the organisation."</p><h3>Data Platform User</h3><p>"It's incredible! Now, I can just <strong>drag and drop pre-calculated metrics</strong> like 'total value of cancellations this month' or 'movement between months' directly onto my reports. 
I get <strong>instant, reliable answers</strong> to my key business questions, without wrestling with complicated SQL or worrying about stale data &#8211; it's completely changed how I track our business health!"</p><h3>Get Started</h3><p>The Activity Event Tile capability is <strong>available today</strong>. Simply look for the new <strong>"consume tile"</strong> in your last mile tool, which exposes all the pre-calculated metrics. For more detailed information or to discuss specific use cases, please contact your data platform product manager.</p><h2>AgileData App / Platform Example</h2><p></p><p></p><h2>AgileData Podcast Episode</h2><p><a href="https://podcast.agiledata.io/e/activity-event-tile-agiledata-engineering-pattern-6-episode-72/">https://podcast.agiledata.io/e/activity-event-tile-agiledata-engineering-pattern-6-episode-72/</a><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/activity-event-tile-agiledata-engineering-pattern-6-episode-72/&quot;,&quot;text&quot;:&quot;Listen to the Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/activity-event-tile-agiledata-engineering-pattern-6-episode-72/"><span>Listen to the Podcast Episode</span></a></p><div id="youtube2-B9WMwqfImHM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;B9WMwqfImHM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/B9WMwqfImHM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Episode MindMap</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" 
target="_blank" href="https://substackcdn.com/image/fetch/$s_!Muq6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Muq6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 424w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 848w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 1272w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Muq6!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png" width="1200" height="1504.1208791208792" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1906781,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/170387919?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Muq6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 424w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 848w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 1272w, https://substackcdn.com/image/fetch/$s_!Muq6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F712f3452-fee2-459d-8177-24ff799f9d01_5693x7136.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane</strong>: Welcome to the Agile Data Podcast. I'm Shane Gibson. </p><p><strong>Nigel</strong>: And I'm Nigel Vining. </p><p><strong>Shane</strong>: Hey, Nigel. Onto the next Agile Data Engineering Patterns episode. This time we're gonna talk about activity events. For anybody listening, if you haven't listened to the previous episode of Data Engineering Patterns, where we talked about our event tiles, I suggest you go and listen to that first, because we're gonna assume you listened and understood that before we move on to this one.</p><p>So activity events: it's a slightly different Pattern that we use for our event tiles compared to our standard Pattern. So, Nigel, why don't you explain to me what it is and why I care. 
</p><p><strong>Nigel</strong>: Sure. So in the previous one we talked about events, where we basically insert a row of keys into our event table, like customer orders product.</p><p>And that's cool because those three things are lined up at a point in time and we report on those. The activity events are a little bit different, because effectively it's like a stream of things happening that we care about. Subscription is a really good example, because we've basically got a whole lot of events coming in where subscriptions are created, subscriptions are paid, subscriptions are renewed, subscriptions are canceled, and generally the type of reporting that businesses do around subscriptions is: how many new subscriptions did I get this month compared to last year? Is my number of subscriptions going up or going down? Is the amount of revenue from my subscriptions static, increasing or decreasing? So there are all these really useful questions that the activity schema can answer.</p><p>So the difference with our rule template for activities is that they're really simple. All we capture is the business key, subscription in this case, and what the activity is, with a nice clear description: like I said, subscription created, subscription paid, subscription canceled. We effectively describe what it is.</p><p>We also assign a financial value to each activity. Some activities don't lend themselves to a value; it may literally be a one to say that something happened, compared to a dollar amount when a subscription payment is received, and that's where we would have that amount, so we can track those over time. And of course, effectively, the business date of when this activity happened.</p><p>It's really quite simple. We just create those and we just keep appending them into that table. The magic happens in the hydration layer, where we take that list of keys attached to activities and hydrate them into a tile, and at the same time we include a list of standard metrics in that view that we calculate on the fly, 'cause we're looking across time periods of data. So we automatically have things like: is this the first subscription created event we've seen? Is this the last subscription created event we've seen? How many occurrences of subscription created have we seen? Are they repeated? What is the revenue impact of this subscription paid activity? Is the value going up or is the value going down when we effectively put it against the previous payment for that subscription? And what is that value change? If we add up all the financial impact values, what's the opening balance of those over time? What's the closing balance of those over time? Although very simplistic, each of these is a relatively straightforward windowing function, but we apply all of them automatically on our activity data, so you don't have to care about it. You just throw your activities in and a consume tile will turn up with all those metrics there.</p><p>You can plug it straight into your last mile tool, and you are immediately reporting on the health of your business based on its subscriptions.</p><p><strong>Shane</strong>: The key for this one is we were building out some reporting for a software-as-a-service company that needed to do all the pirate metrics and all that kind of stuff.</p><p>And what I was forced to do is create a bunch of events. So I'd say that a subscription was created, a subscription was paid, a subscription was renewed, a subscription was canceled, using our standard event table. But then, to be able to work out the between time or the between dollars for those events, I had to write really horrible SQL to slam those events together and then do all those windowing functions. So what I ended up doing was slamming them all together and then doing all the window functions and calculations in the last mile.
Which means that none of those calculations were repeatable or reusable in other reports. So by introducing this new Pattern, I can effectively now go pick that data up and really easily answer: what's the movement between months?</p><p>How many subscriptions were created this month versus last month? How many subscriptions were created three months ago and then canceled this month? What's the total value of cancellations this month? They're all really simple, because I just drag those calculations, or those columns, onto my report and the back end does all the heavy work.</p><p>And so that's one of the things that is quite important. Those calculations are effectively views. So we're not pre-storing every combination of every metric. Each column is effectively being generated or calculated on the fly when I grab it into that last mile tool, correct?</p><p><strong>Nigel</strong>: That is correct. Yep. 'Cause we always want those metrics to reflect reality in real time, based on our last feed of subscriptions, because it's moving all the time. We're not moving historically, but we might be comparing to history. So doing it as a view really doesn't cost us a lot extra, and it makes sure we never persist a stale value.</p><p><strong>Shane</strong>: The other thing is we are effectively storing the idea of an event happening twice, 'cause we will still have the event in a standard event tile, and then we are storing another version of it, effectively, in the activity event tile. But the data is minimal, because we're holding a very small number of columns that are effectively just holding keys or time or dollar values, correct?</p><p><strong>Nigel</strong>: Yeah, the activity event table is tiny. It is literally the business key, what the activity was, the value associated with it, and when it happened: literally four or five columns, minimal. It's nothing. 
It's when we hydrate it that we go wide with the attributes and the metrics, but the actual activity table, we can put squillions of records in there and that's all good.</p><p>We cluster that one based on the activity types and the dates, because generally all of our calculations and metrics are based around when something happened and what it was. So clustering on those fields means the queries are actually very cheap, 'cause we know where all the subscription created records are in the table, and effectively all the dates are ordered within that.</p><p>So all of the calculations automatically run on clustered data, which makes them fast and cheap.</p><p><strong>Shane</strong>: The other thing that allows us to do is deal with those horrible sequencing problems: subscription created, subscription paid, subscription canceled, subscription renewed, right? It's reopened. The person's come back after six months and turned that subscription back on, and so we don't actually have to hard-code the sequencing of the activities, because those activities come from the data.</p><p>But what it means is we can see when activities happen out of the normal order we would've expected, because they're turning up as something that happened that we didn't expect. And that's really important. But we get that for free, because effectively an activity happens and an activity is inserted, and now I can see a state change for that subscription, when it happened, and what the dollar value impact of it was.</p><p>And that saves a lot of the complex SQL querying you'd otherwise have to do in the last mile to answer those types of questions.</p><p><strong>Nigel</strong>: Yeah, because we keep track of what the previous activity is for every record, so we know if something's happened. 
We also give you the previous one for free, because then you can say that they went from, as you said, canceled to paid.</p><p>That's an anomaly, and that's really easy to pull out, because you shouldn't expect to see that.</p><p><strong>Shane</strong>: But we show you that it's happened, which means if I want to create a funnel report, it's really simple, because I have the last seen activity and the current activity as data. And therefore most of the reporting tools will build a pretty funnel report with very little need for me to do a whole lot of SQL gymnastics.</p><p><strong>Nigel</strong>: Absolutely. That's exactly what's happening.</p><p><strong>Shane</strong>: So what we've done, though, is we've iterated on a version of activity schema modeling.</p><p><strong>Nigel</strong>: Yes.</p><p><strong>Shane</strong>: So we've taken the bits that work for us: the idea of having a core concept that's defined as the core concept for these activity event tiles, in this case subscription; the idea of having activity types and having a row every time we see an activity type happen for that subscription.</p><p>The idea of holding a date and a financial metric or value, which we need to report off. The idea of precalculating or holding all those complex calculations of betweens and increases and decreases and last seen. But what we haven't done is gone to the next step, which is to have the ability to hold multiple other concepts.</p><p>And that's because when we looked at the activity schema Pattern, it didn't deal with that particularly well. It was effectively an anti-pattern that they pushed. Originally what it said was you had two extra columns, and that kind of didn't work. Then they started JSON-stringing all those other concepts in there.</p><p>So you'd put them in and then parse them back out. And again, that didn't work for us. So what we took is the things that worked for us to solve that one particular problem. 
It's given us a whole lot of value for a bunch of other use cases, but we haven't picked up that last part of the Pattern, because in our view it doesn't actually solve the problem that we need to solve for that use case.</p><p><strong>Nigel</strong>: Yeah, bundling users, customers and subscriptions into the same activity doesn't make a lot of sense, because effectively the reporting becomes a little bit skewed from it, 'cause they don't really go together. They're three activities that are happening side by side.</p><p>They're related, but to key them together is problematic.</p><p><strong>Shane</strong>: Yeah. So we'll probably solve that another day, but not today. Alright, and on that note, I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Patterns for the AgileData Event Tile, AgileData Engineering Pattern #5]]></title><description><![CDATA[The Event Tile Data Modeling pattern captures business processes and transactions within AgileData's three-layered data architecture, modeling "who does what" at a specific point in time.]]></description><link>https://agiledata.info/p/data-engineering-patterns-for-the</link><guid isPermaLink="false">https://agiledata.info/p/data-engineering-patterns-for-the</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Wed, 30 Jul 2025 23:08:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oExY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Event Tile</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/169697974/description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/pattern-context-diagram">Context 
Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/agiledata-podcast-episode-mindmap">AgileData Podcast Mind Map</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/169697974/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Event Tile Data Modeling</strong> pattern captures business processes and transactions within AgileData's three-layered data architecture, modeling "who does what" at a specific point in time. </p><p>It achieves this by storing only a small number of core business keys in <strong>insert-only</strong> event tables. </p><p>These event records are then <strong>automatically hydrated</strong> into a "consume tile" by joining with all relevant detail attributes that were current at the exact moment the event occurred, ensuring an accurate, time-staged narrative. 
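</p><p>A minimal sketch of that point-in-time join may help. It is written in Python and assumes, purely for illustration, that each concept's details are kept as a list of dated versions (an effective-from date plus attributes). It also reads "current at the moment the event occurred" as the latest version starting on or before the event's effective date. The names <code>Event</code>, <code>as_of</code> and <code>hydrate</code> and the sample data are hypothetical. This is not AgileData's implementation, which the source does not show.</p><pre><code>
```python
from bisect import bisect_right
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """A lean event-tile row: business keys plus an effective date only."""
    customer_id: str
    order_id: str
    product_id: str
    effective_date: date


def as_of(history, key, when):
    """Return the attributes for `key` that were current on `when`.

    `history` maps a business key to a list of (effective_from, attrs)
    tuples sorted by effective_from. Returns None if no version existed yet.
    """
    versions = history.get(key, [])
    starts = [effective_from for effective_from, _ in versions]
    # Number of versions that started on or before `when`.
    i = bisect_right(starts, when)
    return versions[i - 1][1] if i else None


def hydrate(events, detail_sources):
    """Widen each event with the details current at its effective date.

    `detail_sources` maps an Event key field (e.g. "customer_id") to the
    dated history of that concept's details.
    """
    rows = []
    for event in events:
        row = asdict(event)
        for key_field, history in detail_sources.items():
            attrs = as_of(history, getattr(event, key_field), event.effective_date)
            row.update(attrs or {})
        rows.append(row)
    return rows


# Hypothetical sample data: customer C1 moved house on 2025-06-01.
customers = {
    "C1": [(date(2025, 1, 1), {"address": "Old St"}),
           (date(2025, 6, 1), {"address": "New Rd"})],
}
products = {"P1": [(date(2024, 1, 1), {"product_name": "Widget"})]}
events = [
    Event("C1", "O1", "P1", date(2025, 3, 15)),
    Event("C1", "O2", "P1", date(2025, 7, 2)),
]

rows = hydrate(events, {"customer_id": customers, "product_id": products})
for row in rows:
    print(row["order_id"], row["address"], row["product_name"])
# prints:
# O1 Old St Widget
# O2 New Rd Widget
```
</code></pre><p>In this example, order O1 (15 March) picks up the old address and order O2 (2 July) picks up the new one. Each event is joined to the customer version that was valid on its own effective date, while the event rows themselves hold only keys and a date.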
</p><p>This provides data consumers with a rich, wide, and historically accurate view of events, simplifying the querying of complex historical changes and state transitions.</p><h2><strong>Pattern Context Diagram</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oExY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oExY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 424w, https://substackcdn.com/image/fetch/$s_!oExY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 848w, https://substackcdn.com/image/fetch/$s_!oExY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 1272w, https://substackcdn.com/image/fetch/$s_!oExY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oExY!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png" width="1200" height="426.9230769230769" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:518,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:194491,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oExY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 424w, https://substackcdn.com/image/fetch/$s_!oExY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 848w, https://substackcdn.com/image/fetch/$s_!oExY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 1272w, https://substackcdn.com/image/fetch/$s_!oExY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F139e3053-b1f3-425c-b118-dbc46dde56dd_2677x952.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>AgileData Event Tile</strong></p><h3>The Problem It Solves</h3><p>You know that moment when you're dealing with complex business processes, like an order progressing through various statuses (entered, updated, approved, paid), and you need to understand exactly what happened at each point in time? </p><p>Or when a customer's details change, and you need to see the state of an event <em>at the exact moment it occurred</em>? </p><p>If you only use simple Concepts and Details (C&amp;D) for modelling, tracking these complex, temporal changes becomes a real headache, making it hard to query and maintain a consistent view of historical events. 
</p><p>The Event Tile pattern solves this by providing a clear, point-in-time view of business processes, ensuring accurate historical context for every transaction.</p><h3>When to Use It</h3><p>Use this pattern when:</p><ul><li><p>You need to model a <strong>business process or transaction</strong>, such as a customer ordering a product or an invoice status changing.</p></li><li><p>There are <strong>complex changes within the event</strong> itself, or the attributes related to the event's participants (like customer addresses) change over time, and you need to capture the state at the moment of the event.</p></li><li><p>You require <strong>auditability and accountability of changes</strong>: a clear history of what happened, when, and why.</p></li><li><p><strong>Transparency of progress and impediments</strong> for data changes is valuable, similar to how transparency aids team coordination.</p></li><li><p>You're working within a <strong>three-layered data architecture</strong> consisting of history, designed, and consume layers, and the event represents a core object type in the designed layer.</p></li><li><p>You need a <strong>single, consistent pattern</strong> for loading various types of event data, regardless of their native source format.</p></li></ul><h3>How It Works</h3><p>This pattern works by creating lean, insert-only records that represent a specific business occurrence, then richly hydrating them for consumption.</p><h4>Trigger:</h4><ul><li><p>A business process or transaction occurs, or a significant change happens to a designated "driving concept" within that process. 
For example, a new order is placed, or an existing order's quantity or status changes.</p></li></ul><h4>Inputs:</h4><ul><li><p>A set of <strong>business keys</strong> that uniquely identify the event and its associated participants at a specific point in time (e.g., customer ID, order ID, product ID).</p></li><li><p><strong>A driving concept</strong> identified for each event tile to dictate when new event records are inserted.</p></li><li><p><strong>Concepts:</strong> These are the "things" involved in the event, such as Customers, Orders, or Products, which hold unique identifiers.</p></li><li><p><strong>Details:</strong> These are the attributes related to the Concepts (e.g., customer name, product description, order quantity).</p></li></ul><h4>Steps:</h4><ol><li><p>A business process or transaction is identified for modelling.</p></li><li><p>An <strong>Event Tile</strong> is created, designed as a simple table that is a <strong>collection of only business keys</strong>. These tables typically contain no more than seven or eight keys, often closer to three to five, representing a "who does what" structure (e.g., customer orders product).</p></li><li><p>Event tables are <strong>insert-only</strong>; new records are only added if the unique situation described by the keys and effective date has not previously occurred.</p></li><li><p>Each event record includes an <strong>effective date</strong> (or business effective date), capturing the precise moment the event happened.</p></li><li><p>A <strong>driving concept</strong> (e.g., the 'order' in a 'customer orders product' event) is chosen. Any change to this driving concept (like an order update or quantity change) triggers the insertion of a new event record.</p></li><li><p>Records in the event tile are <strong>never end-dated</strong>. 
Instead, <strong>windowing functions</strong> are used at query time to determine which event was active at a specific point in time and what details relate to it.</p></li><li><p>The event tile is then <strong>automatically hydrated</strong> into a <strong>Consume Tile</strong>. This involves joining the event's keys to the relevant Concepts and their Details, ensuring that the attributes picked up are those that were current at the event's effective date.</p></li><li><p>Native source events (like Google Analytics data that arrives as an event) can be <strong>unbundled into their component parts</strong> (concepts and details) and then re-hydrated back into an Event Tile. This standardises the loading pattern. Alternatively, if unique elements within the native event (e.g., page titles from GA4) need to be reported on, it can be more efficient to treat that native event as a Concept directly and split out the relevant data.</p></li></ol><h4>Outputs:</h4><ul><li><p>A lean, history-preserving <strong>Event Tile table</strong> consisting solely of business keys and an effective date, representing a specific business occurrence.</p></li><li><p>A <strong>Consume Tile</strong> that provides a "rich, wide narrative" of the event, encompassing all relevant attributes (from associated concepts and details) as they stood at the event's time. This provides a fully contextualised record for analysis.</p></li></ul><h3>Why It Works</h3><p>It's like having a perfect, unalterable ledger of every significant business action. </p><p>Instead of storing all attributes directly with the event (which would lead to data duplication and difficulty in managing change), the Event Tile stores only the <em>relationship</em> of keys at a point in time. This simplicity allows for <strong>insert-only operations</strong>, which inherently preserve history and make auditing straightforward. 
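</p><p>The insert-only and windowing steps above can be sketched the same way. This is a hypothetical in-memory illustration, not AgileData's code. A Python list stands in for the event table. A set of (key values, effective date) signatures enforces "only insert a situation that has not occurred before". The event active at a given date is found at query time by taking the latest row for the driving concept on or before that date, much as a SQL window function would. <code>EventTile</code>, <code>insert</code> and <code>active_as_of</code> are invented names.</p><pre><code>
```python
from datetime import date


class EventTile:
    """Hypothetical in-memory, insert-only event tile (keys + effective date)."""

    def __init__(self, key_fields, driving_key):
        self.key_fields = tuple(key_fields)
        self.driving_key = driving_key
        self._rows = []     # append-only: rows are never updated, deleted or end-dated
        self._seen = set()  # (key values..., effective_date) already captured

    def insert(self, keys, effective_date):
        """Append a row unless this exact situation has already been recorded."""
        signature = tuple(keys[f] for f in self.key_fields) + (effective_date,)
        if signature in self._seen:
            return False
        self._seen.add(signature)
        self._rows.append({**keys, "effective_date": effective_date})
        return True

    def active_as_of(self, driving_value, when):
        """Return the latest row for one driving-concept value on or before `when`.

        This stands in for a query-time window such as
        ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY effective_date DESC),
        so no end dates ever need to be stored.
        """
        candidates = [
            row for row in self._rows
            if row[self.driving_key] == driving_value
            and when >= row["effective_date"]
        ]
        return max(candidates, key=lambda row: row["effective_date"], default=None)


# Usage: the order is the driving concept. A change to the order on 5 March
# (say its quantity, held in the Order details) is recorded as a new row
# with the same keys and a new effective date.
tile = EventTile(["customer_id", "order_id", "product_id"], driving_key="order_id")
keys = {"customer_id": "C1", "order_id": "O1", "product_id": "P1"}
first = tile.insert(keys, date(2025, 3, 1))    # True: new situation
replay = tile.insert(keys, date(2025, 3, 1))   # False: already captured
change = tile.insert(keys, date(2025, 3, 5))   # True: order changed, new dated row
active = tile.active_as_of("O1", date(2025, 3, 3))
print(first, replay, change, active["effective_date"])
# prints: True False True 2025-03-01
```
</code></pre><p>Because no row is ever end-dated, the answer to "which version of order O1 was active on 3 March?" is derived at query time. Here it is the 1 March row, and for any date from 5 March onwards it is the 5 March row.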
</p><p>When a view of the event is needed, the details are rehydrated from their respective Concepts, always ensuring the attributes are accurate <em>for the moment the event occurred</em>.</p><p>This approach allows for a <strong>single, hardened pattern for loading events</strong> regardless of their original source, making data engineering workflows far more efficient, easier to troubleshoot, and simpler to understand. </p><p>It provides a consistent, reliable mechanism for handling complex data changes and ensuring that historical analysis always reflects the true state of affairs at the time of an event.</p><h3>Real-World Example</h3><p>Consider a retail scenario where a <strong>"Customer Orders Product"</strong> event occurs. The Event Tile for this would simply contain keys like <code>CustomerID</code>, <code>OrderID</code>, <code>ProductID</code>, and the <code>OrderDate</code> (as the effective date). The <code>OrderID</code> might be chosen as the <strong>driving concept</strong>.</p><p>Later, the customer updates their address, or the order quantity changes. Because the <code>OrderID</code> is the driving concept, if the order quantity changes, a <em>new</em> event record is inserted into the Event Tile, capturing the new state. The address change on its own does not insert an event record, because the customer is not the driving concept; it is recorded in the Customer details instead. No existing records are updated.</p><p>When a user queries for the <strong>"Customer Orders Product"</strong> event data, the system automatically hydrates the Consume Tile.</p><p>This involves joining the event record with the Customer Concept's details (e.g., customer name, address <em>at the time of the order</em>), the Product Concept's details (e.g., product name, price <em>at the time of the order</em>), and the Order Concept's details (e.g., order quantity, value <em>at the time of the order</em>). 
This way, even if the customer's address changed last week, a query about an order placed two months ago will show the address the customer had <em>two months ago</em>.</p><p>This pattern is also crucial for modelling complex state changes, such as an <strong>invoice</strong> moving through <code>entered</code>, <code>updated</code>, <code>approved</code>, and <code>paid</code> statuses. Each status change would trigger a new event record, providing a complete, granular history of the invoice's lifecycle.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Storing detail or numeric attributes directly on the event tile:</strong> This is a major pitfall. The Event Tile should <em>only</em> contain keys and an effective date. Storing details or numeric values on the event tile negates its core benefits, leading to data duplication and complexity when attributes change.</p></li><li><p><strong>Applying it to overly simple cases:</strong> For straightforward data, where attributes are self-contained and don't require complex historical context, a simple Concept and Detail (C&amp;D) model might suffice and be more efficient. Don't force Event Tiles where they aren't needed.</p></li><li><p><strong>Ignoring the overhead of joins during consumption:</strong> While the Event Tile itself is lean, the hydration into Consume Tiles involves joins. 
If not managed well (e.g., with efficient partitioning and clustering in BigQuery), this can incur significant computational cost.</p></li><li><p><strong>Failing to define a clear "driving concept":</strong> Without a consistent rule for when a new event record should be inserted (i.e., what constitutes a "change" to the event), the data can become inconsistent or incomplete.</p></li><li><p><strong>Treating native source events as-is when their internal elements need unique reporting:</strong> If unique concepts embedded within a native event (like unique page titles in Google Analytics data) need to be reported on, it's often more efficient to unbundle and model them as distinct concepts rather than querying across all hydrated events.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Establish a clear three-layered architecture:</strong> This pattern thrives within a "history, designed, consume" data architecture, with Event Tiles residing in the designed layer.</p></li><li><p><strong>Strictly define object types:</strong> Adhere to the principle that your designed layer only contains three types of objects: Concepts, Details, and Events.</p></li><li><p><strong>Enforce insert-only behaviour:</strong> Ensure that Event Tiles are append-only. 
New records are inserted, never updated or deleted, to maintain a full history.</p></li><li><p><strong>Utilise effective dating and windowing functions:</strong> Implement effective dates on Event Tiles and use windowing functions for querying to correctly pull associated details that were current at the time of the event.</p></li><li><p><strong>Carefully select the driving concept:</strong> Clearly identify which concept's changes will trigger a new event record insertion to ensure the granularity of your event history is appropriate.</p></li><li><p><strong>Optimise for joins:</strong> When hydrating, ensure your underlying data platform is configured for efficient joining (e.g., correct partitioning and clustering in BigQuery) to mitigate computational costs.</p></li><li><p><strong>Standardise native event ingestion:</strong> Develop a consistent process for unbundling and re-modelling native source events (like Google Analytics) to fit the Event Tile pattern, or identify when it's more beneficial to model parts of them as standalone concepts.</p></li></ul><h3><strong>Related Patterns</strong></h3><ul><li><p><strong>Concepts:</strong> The fundamental "things" or entities in your data model that Event Tiles link together (e.g., Customer, Order, Product).</p></li><li><p><strong>Details:</strong> The attributes associated with Concepts that are joined to Event Tiles during hydration to form a Consume Tile (e.g., Customer Name, Product Price).</p></li><li><p><strong>Consume Tile:</strong> The wide, denormalised table that results from hydrating an Event Tile with its associated Concepts and Details at a specific point in time.</p></li><li><p><strong>Activity Event:</strong> A variation of the Event Tile pattern that provides another way of doing event modelling, intended for a future deep dive.</p></li><li><p><strong>Data Vault Modelling (Link Tables):</strong> While Event Tiles share similarities with Data Vault's Link tables (representing relationships 
between business keys), they differ in that they do not connect directly to "details" in the same way, making them a variation of that pattern.</p></li><li><p><strong>Dimensional Modelling (Fact Tables):</strong> Event Tiles behave somewhat like fact tables, as they capture business events, but they do not store numeric attributes directly. Instead, these attributes are stored in related Concepts and hydrated at the Consume layer.</p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Event Tile Data Modeling</p><h3>Headline</h3><p>New Event Tile Pattern Delivers Accurate, Time-Staged Event Narratives for Data Consumers</p><h3>Introduction</h3><p>Today, the AgileData team is excited to announce the adoption and refined application of our Event Tile data modelling pattern. As a core component of our three-layered data architecture, Event Tiles provide a revolutionary way to capture business processes and transactions. By modelling "who does what" at a specific point in time, this capability ensures that data consumers receive a rich, wide, and historically accurate view of events, making complex data easy to understand and use.</p><h3>Problem</h3><p>"As a data consumer, I always struggled with understanding complex changes in our data over time. It was a real problem to query historical events accurately, especially when attributes like a customer's address changed. I also faced massive problems trying to decide what the grain of an event was, or what triggered a new event or a change to an existing one."</p><h3>Solution</h3><p>The Event Tile pattern addresses these challenges by modelling business processes and transactions as simple collections of business keys: typically three to five, and never more than eight. These event tables are <strong>insert-only</strong>: records are never updated or deleted, and a new record is added only when a unique situation occurs.
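</p><p>A minimal sketch of the insert-only behaviour, assuming a hypothetical "customer orders product" Event Tile with the order as its driving concept. It uses one plausible reading of "a unique situation": a new row is appended only when the combination of keys linked to an order differs from that order's latest row. The names (<code>EventTile</code>, <code>EventRow</code>, <code>order_key</code> and so on) are illustrative only, not AgileData's implementation.</p>

```python
from datetime import datetime
from typing import NamedTuple


class EventRow(NamedTuple):
    """One row of a hypothetical 'customer orders product' Event Tile.

    Only business keys and the event timestamp are stored; detail
    attributes live elsewhere and are joined in at hydration time.
    """
    order_key: str       # driving concept (hypothetical name)
    customer_key: str
    product_key: str
    event_ts: datetime


class EventTile:
    """Append-only event store: rows are inserted, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[EventRow] = []
        # Latest (customer_key, product_key) seen for each driving-concept key.
        self._latest: dict[str, tuple[str, str]] = {}

    def observe(self, order_key: str, customer_key: str, product_key: str,
                event_ts: datetime) -> bool:
        """Append a row if this order's key combination has changed.

        Returns True if a row was appended, False if the observation
        repeats the order's current state and is therefore skipped.
        """
        keys = (customer_key, product_key)
        if self._latest.get(order_key) == keys:
            return False
        self._rows.append(EventRow(order_key, customer_key, product_key, event_ts))
        self._latest[order_key] = keys
        return True

    @property
    def rows(self) -> tuple[EventRow, ...]:
        # Return an immutable copy so callers cannot rewrite history.
        return tuple(self._rows)


tile = EventTile()
tile.observe("O1", "C1", "P1", datetime(2025, 1, 1))  # appended
tile.observe("O1", "C1", "P1", datetime(2025, 1, 2))  # skipped: unchanged
tile.observe("O1", "C2", "P1", datetime(2025, 1, 3))  # appended: customer changed
print(len(tile.rows))  # 2
```

<p>Because rows are only ever appended, the full history of each order is kept; the driving-concept check only decides whether a repeated observation is worth storing.</p><p>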
A "driving concept", such as an order, dictates when a new event record is inserted, capturing every change of state and ensuring a complete historical record.</p><p>When an event is captured, it is automatically <strong>hydrated</strong> into a Consume Tile. This process joins the event's keys to all relevant detail attributes that were current <em>at the exact moment the event occurred</em>. The result is a comprehensive, time-staged narrative that gives an accurate snapshot of the business process at any given point in time, removing the complexity of querying historical changes. This approach also provides a consistent and efficient pattern for loading events, regardless of their source.</p><h3>Data Platform Product Manager</h3><p>"With our refined Event Tile pattern, we've transformed how we manage historical data, directly solving the long-standing issue of accurately tracking complex event changes. This capability significantly enhances the auditability and integrity of our data assets, leading to a far more efficient and robust data engineering pipeline for our team."</p><h3>Data Platform User</h3><p>"I absolutely love that when I query an event, I now get a complete and accurate picture of precisely what happened at that exact moment in time, including all the relevant details. It makes understanding complex business processes so much clearer, and I no longer have to worry about data inconsistencies or attributes changing unexpectedly."</p><h3>Get Started</h3><p>The Event Tile data modelling pattern is a fundamental part of AgileData's architectural design and is automatically applied to new data product development, ensuring robust and accurate data delivery.
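</p><p>The point-in-time hydration described in the Solution above can be sketched as follows. This is a minimal illustration under stated assumptions, not AgileData's implementation: Details are assumed to be held as effective-dated versions per business key, the helper names <code>detail_as_of</code> and <code>hydrate</code> are hypothetical, and each event picks the latest version whose effective-from timestamp is not after the event. In a SQL warehouse such as BigQuery the same effect is commonly achieved with a window function (for example <code>LEAD</code> over the effective-from date) to derive an effective-to date, followed by a range join.</p>

```python
from bisect import bisect_right
from datetime import datetime
from typing import Any

# Hypothetical effective-dated Details for the Customer concept:
# customer_key -> list of (effective_from, attributes) versions.
CustomerDetails = dict[str, list[tuple[datetime, dict[str, Any]]]]


def detail_as_of(details: CustomerDetails, key: str,
                 as_of: datetime) -> dict[str, Any]:
    """Return the attribute version that was current at `as_of`.

    effective_from is treated as inclusive; an empty dict is returned
    if the key had no version yet at that moment.
    """
    versions = sorted(details.get(key, []), key=lambda v: v[0])
    starts = [effective_from for effective_from, _ in versions]
    # Index of the last version starting at or before `as_of`.
    idx = bisect_right(starts, as_of) - 1
    return dict(versions[idx][1]) if idx >= 0 else {}


def hydrate(event: dict[str, Any], details: CustomerDetails) -> dict[str, Any]:
    """Widen one Event Tile row into a Consume Tile row (point-in-time join)."""
    row = dict(event)
    current = detail_as_of(details, event["customer_key"], event["event_ts"])
    for name, value in current.items():
        row[f"customer_{name}"] = value
    return row


details: CustomerDetails = {
    "C1": [
        (datetime(2024, 1, 1), {"address": "1 Old Road"}),
        (datetime(2025, 3, 1), {"address": "2 New Street"}),
    ],
}
before_move = {"order_key": "O1", "customer_key": "C1", "event_ts": datetime(2025, 2, 1)}
after_move = {"order_key": "O2", "customer_key": "C1", "event_ts": datetime(2025, 4, 1)}
print(hydrate(before_move, details)["customer_address"])  # 1 Old Road
print(hydrate(after_move, details)["customer_address"])   # 2 New Street
```

<p>Because the lookup is keyed on the event's own timestamp, a later change to the customer's address does not alter what earlier events show, which is the historical accuracy the Problem statement asks for.</p><p>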
For more details on how this capability underpins your data solutions, please speak with your AgileData Platform Product Manager.</p><h2>AgileData App / Platform Example</h2><h3>Event Tile</h3><p><strong>Data Preview</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3X7u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3X7u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 424w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 848w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3X7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png" width="1456" height="864" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:864,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:266460,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3X7u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 424w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 848w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!3X7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffea40759-170f-42b7-9aa6-b8a0f71abea6_1869x1109.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p><strong>Fields (Schema)</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eXzz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eXzz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 424w,
https://substackcdn.com/image/fetch/$s_!eXzz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 848w, https://substackcdn.com/image/fetch/$s_!eXzz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 1272w, https://substackcdn.com/image/fetch/$s_!eXzz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eXzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png" width="1456" height="619" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:111639,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!eXzz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 424w, https://substackcdn.com/image/fetch/$s_!eXzz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 848w, https://substackcdn.com/image/fetch/$s_!eXzz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 1272w, https://substackcdn.com/image/fetch/$s_!eXzz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9153cc6b-44a6-4a30-8e53-7084d8a8ceb6_1869x794.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p><strong>Related Tiles</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P9OR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P9OR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 424w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 848w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 1272w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P9OR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png" width="1456" height="251"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:251,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P9OR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 424w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 848w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 1272w, https://substackcdn.com/image/fetch/$s_!P9OR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d27abde-0824-42f8-95da-92bfd6fb2b03_1870x323.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!efqV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!efqV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 424w, https://substackcdn.com/image/fetch/$s_!efqV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 848w, https://substackcdn.com/image/fetch/$s_!efqV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 1272w, https://substackcdn.com/image/fetch/$s_!efqV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!efqV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png" width="1456" height="249" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:249,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71075,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!efqV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 424w, https://substackcdn.com/image/fetch/$s_!efqV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 848w, https://substackcdn.com/image/fetch/$s_!efqV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 1272w, https://substackcdn.com/image/fetch/$s_!efqV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0c12d4f-536c-4066-bc95-988141d372eb_1869x320.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h3>Consume Tile</h3><p><strong>Data Preview</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!CDls!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CDls!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 424w, https://substackcdn.com/image/fetch/$s_!CDls!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 848w, https://substackcdn.com/image/fetch/$s_!CDls!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 1272w, https://substackcdn.com/image/fetch/$s_!CDls!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CDls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png" width="1456" height="931" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:931,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:286430,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CDls!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 424w, https://substackcdn.com/image/fetch/$s_!CDls!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 848w, https://substackcdn.com/image/fetch/$s_!CDls!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 1272w, https://substackcdn.com/image/fetch/$s_!CDls!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F013581ff-af6e-4db0-8a79-0c084f1402c0_1870x1196.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Fields (Schema)</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YHjC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YHjC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 424w,
https://substackcdn.com/image/fetch/$s_!YHjC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 848w, https://substackcdn.com/image/fetch/$s_!YHjC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!YHjC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YHjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png" width="1456" height="969" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:242353,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!YHjC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 424w, https://substackcdn.com/image/fetch/$s_!YHjC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 848w, https://substackcdn.com/image/fetch/$s_!YHjC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!YHjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F22a45d9b-0416-4bd8-ac13-582ced8b27e9_1870x1244.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wbf4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wbf4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 424w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 848w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wbf4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png" width="1456" height="969"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:969,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:226634,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wbf4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 424w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 848w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!wbf4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ecfed39-ae43-4c5d-8bcf-994a64196dd3_1870x1244.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JVdz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JVdz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 424w,
https://substackcdn.com/image/fetch/$s_!JVdz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 848w, https://substackcdn.com/image/fetch/$s_!JVdz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 1272w, https://substackcdn.com/image/fetch/$s_!JVdz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JVdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png" width="1456" height="271" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:271,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:70776,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!JVdz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 424w, https://substackcdn.com/image/fetch/$s_!JVdz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 848w, https://substackcdn.com/image/fetch/$s_!JVdz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 1272w, https://substackcdn.com/image/fetch/$s_!JVdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9a9b121-24da-4355-badb-2fc7d699bfd5_1870x348.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h2>AgileData Podcast Episode</h2><p><a href="https://podcast.agiledata.io/e/data-engineering-patterns-for-the-agiledata-event-tile-agiledata-engineering-pattern-5-episode-71/">https://podcast.agiledata.io/e/data-engineering-patterns-for-the-agiledata-event-tile-agiledata-engineering-pattern-5-episode-71/</a><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/data-engineering-patterns-for-the-agiledata-event-tile-agiledata-engineering-pattern-5-episode-71/&quot;,&quot;text&quot;:&quot;Listen to the Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/data-engineering-patterns-for-the-agiledata-event-tile-agiledata-engineering-pattern-5-episode-71/"><span>Listen to the Podcast Episode</span></a></p><div id="youtube2-gVxBciERQ_U" class="youtube-wrap" 
data-attrs="{&quot;videoId&quot;:&quot;gVxBciERQ_U&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/gVxBciERQ_U?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Episode MindMap</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mny6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mny6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 424w, https://substackcdn.com/image/fetch/$s_!Mny6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 848w, https://substackcdn.com/image/fetch/$s_!Mny6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 1272w, https://substackcdn.com/image/fetch/$s_!Mny6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Mny6!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png" width="1200" height="1626.923076923077" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:1974,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:2367599,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/169697974?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mny6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 424w, https://substackcdn.com/image/fetch/$s_!Mny6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 848w, https://substackcdn.com/image/fetch/$s_!Mny6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Mny6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630e4f9b-181f-49bd-9cc1-bb5cc214a498_6592x8938.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane</strong>: Welcome to the Agile Data Podcast. I'm Shane Gibson.</p><p><strong>Nigel</strong>: And I'm Nigel Vining.</p><p><strong>Shane</strong>: Hey, Nigel, another episode in our data engineering pattern series, and today we're going to talk about event modeling. 
I think what we'll do is I'll start off talking a little bit about the concept of event modeling, and then from there we can deep dive into the engineering practices and patterns that you use.</p><p>So we've talked before about how we have a three-layered architecture. Data comes into history. We then move it into a designed layer where we measure it all up and make it fit for purpose, and then we have a consume layer where everybody comes in and uses that data. Within that designed layer, we have three major object types.</p><p>We have a concept, which is really a thing we see. It effectively holds a key for a thing that we know exists: customers, employees, suppliers, orders, payments. We have another tile type, which is detail, which holds the attributes about those things: customer name, customer date of birth, supplier type, employee address. Those details are bound next to the concepts. And then the third type we have is an event, which basically says we saw a business process or a transaction happen. Within those event tiles, we have two types that we deploy. The first one is our standard event. This is where we see a business process happen: a customer orders a product, a customer pays for an order, somebody ships the product or the order to the customer.</p><p>The customer returns a product to an order. A little while ago, we introduced a second pattern, conceptually called an activity event, and we can talk about that next time. But for now, let's deep dive into this idea that we've actually got a business process or a business event and we model it as an event.</p><p><strong>Nigel</strong>: Yeah. So effectively, what we do to make this magic happen is this: an event tile is basically just a collection of keys recording that something happened at a point in time, and these are all the business keys that are associated with it. So, as you said, at this date, 
this customer ID, this order ID, this product ID: we effectively create that row in our event table of customer orders product, and then the next step is we automatically hydrate that event into a consume tile.</p><p>So we effectively take that event and join it to the details that are associated with it at that point in time. So the consume tile is quite wide, because effectively it's got all the customer attributes and all the order attributes at that point in time in a nice wide table, and that's all done at each point in time.</p><p>So then when something else happens with that order, say it's changed or its status changes, we effectively insert another event record, which generates another consume record out the other side to reflect the change to some aspect of it. And it also picks up all the same product attributes and customer attributes at that point in time.</p><p>So it's quite a simplistic thing, because the tile is really quite boring: it's just a whole lot of business keys. But when we hydrate it into a consume, it becomes a really rich, wide narrative for these events that are happening.</p><p><strong>Shane</strong>: And so effectively that tile just holds a small number of columns, unique IDs or keys sitting in there. We've found that when we model it, it's typically no more than seven or eight keys in that table.</p><p><strong>Nigel</strong>: Yep. Seven or eight is getting up there. I haven't seen many past four, but yep, I think seven is generally acknowledged as about as wide as you go for those keys.</p><p>I haven't seen any business processes that tend to need more than that.</p><p><strong>Shane</strong>: And because we use that format from Lawrence Corr's BEAM, of who does what, we end up with event tables that always have more than two keys. So we don't relationally model our event tables as two-way links. 
We're effectively modeling that core business process using that language.</p><p>So we always have, like you said, three, four, or five concepts as part of an event.</p><p><strong>Nigel</strong>: Yeah. And it is generally some variant of who does what. Who being a customer, for example, or a user; does being what they've done: they've ordered something, they've created something. And then the third part is usually the what: the thing they've acted upon, a product.</p><p><strong>Shane</strong>: And one of the key things for me is that the order, the thing that typically holds a value like quantity or dollars, is still a concept, and it's part of that event. It's like a fact table from dimensional modeling, except we don't store the numeric attributes in that event.</p><p>We get those attributes when we build the consume, because those attributes are held by the order concept. If the event is customer orders product, then we know that the order quantity and order value are sitting there and we rehydrate them. But it's different to a fact from dimensional modeling because we don't hold any attributes at all on that event tile, do we?</p><p>There's no detail on the event tile if I ever go and query it. It's just a bunch of keys.</p><p><strong>Nigel</strong>: Yes. The detail comes from the concept keys; concepts are associated with attributes. So yes, we don't store any attributes on that event table, because we don't need to: they're sitting in our detail tables.</p><p>When we hydrate it, we get the applicable detail attributes at that point in time.</p><p><strong>Shane</strong>: The other thing is that it's a form of link table from Data Vault modeling, but we don't follow the Data Vault 2.0 standard, where detail is connected directly to the link table. 
In our terms, it's not a link table, because we don't follow the rules.</p><p>So we took the part of the pattern we liked and created a variation, our own pattern, for it. The other thing is that those event tables are insert-only. We only ever insert new records in there; we never do any updates.</p><p><strong>Nigel</strong>: That is correct. We check whether the unique combination of keys we're about to put in there has already happened.</p><p>If it hasn't, we then insert it, effectively as a new row of keys.</p><p><strong>Shane</strong>: And then how do we deal with effective dating, or dating those events?</p><p><strong>Nigel</strong>: What we do is give the event table an effective date. So we say that at this point in time, this business effective date, this customer had a relationship with this order and this product.</p><p>So we've effectively created our slice, and then when we join the attributes to that row to hydrate it, we pick the attribute that was current at the point in time of that event. We ask which customer record was the applicable one, because it may have changed since: the customer may have gone on and done something else.</p><p>They've updated their address, or something else, or their status. So when we hydrate that row in the consume, we hydrate it at the point in time the event was created, so it is always accurate for when it occurred.</p><p><strong>Shane</strong>: And then there's also this concept of what we call a driving concept.</p><p>Do you wanna talk me through that and how that works from an engineering point of view?</p><p><strong>Nigel</strong>: We pick the first concept in the event as our driving concept, and that's the one we watch: when changes occur to it, we insert a new event row. 
For customer orders product, the driving concept is likely to be the order, because the order is created, updated, fulfilled and dispatched, and we want all those changes of state of the order.</p><p>So that's the one that drives a new record. When the order is updated, we insert a new record and pick up the applicable customer and product at that time. If the order isn't updated, we don't create a new event row, because we are not driving off the customer or the product. So if the customer is updated, it doesn't matter, because this is an order event.</p><p><strong>Shane</strong>: Just to clarify: when we talk about the order being updated, we're talking about a new order ID turning up, or the quantity of an order changing. We will then go and drive an insert of an event related to that order change.</p><p><strong>Nigel</strong>: In that instance we would, because the order is the driver and that order has changed. We want a new record to capture that the order has changed, so when we hydrate that record, you can see that the order quantity has changed on it.</p><p><strong>Shane</strong>: Okay. So we're effectively doing an insert of every change to that event, driving off the driving concept.</p><p><strong>Nigel</strong>: Yep.</p><p><strong>Shane</strong>: And then we don't end-date any records, so we're using a windowing function to see which event was active at a certain point in time?</p><p><strong>Nigel</strong>: Yes.</p><p><strong>Shane</strong>: And therefore what details relate to it?</p><p><strong>Nigel</strong>: That is exactly right.</p><p><strong>Shane</strong>: So what happens when we get data from somewhere like Google Analytics, which is another type of event? It's not really customer orders product; it might be an individual viewing a webpage. Do we treat that differently at all?</p><p><strong>Nigel</strong>: We've done both versions of this. 
We went through a phase of treating native source events as events, and then we went through a phase of saying that an event in that situation can be treated as a concept, because every row from GA4 is effectively a concept: it's a marker that someone did something and it was captured.</p><p>The second part of it, the detail, is what that something is. So GA4 is a little bit unique in that it arrives as an event. We can treat it as a concept, or we can model it as an event by splitting the keys out. Both get the same end result; one's just a little bit more technical, 'cause you're unbundling the event just to turn it back into an event.</p><p><strong>Shane</strong>: If we think about that, we take what is an event record, unbundle it into its component parts (a bunch of concepts and details, and the idea that there was a relationship or an event happened), and then rehydrate it back to itself, because then we have a single pattern for how we load events regardless of what type of data comes in.</p><p>And that makes us far more efficient in terms of hardening our code, troubleshooting, and understanding the stuff.</p><p><strong>Nigel</strong>: There are use cases where we would split an incoming event: when we want to report on the individual concepts that are bundled in it. For example, if there's a requirement to report on all the unique page titles coming through, that's where splitting it out into a page title or page concept makes a lot of sense, because then you effectively have this master tile, which has all your titles in it. So for reporting, you can go straight to that. You don't have to go through a convoluted pattern just to get a unique title. 
</p><p><strong>Shane</strong>: As one of the examples there, we have customers that hold customer attributes in their Google Tag Manager GA4 data.</p><p>When that comes through, we know we're gonna have to report on how many of those unique things there are. It's much cheaper for us to break those out into concepts so we can report on them, because they are things we wanna count and manage, versus having to query across all the hydrated events to find that attribute.</p><p>And on that, if you think about it, we're creating event tiles, and then we're having to join to the concepts and then join to the details. There are lots of joins, and we run on BigQuery. So isn't it an anti-pattern to join all that data? Doesn't that cost us a fortune?</p><p><strong>Nigel</strong>: That is a really good point.</p><p>Joins do come with an overhead, both a cost and a computational cost. We mitigate some of that by limiting the number of partitions we present to each join, which effectively brings down that overhead. Joins will always carry a bit of overhead, but we do the things that keep it efficient.</p><p><strong>Shane</strong>: What's the anti-pattern for event tiles, do you reckon? When wouldn't we use this pattern? We did try using C and Ds (concepts and details) only, because effectively I could create a concept of an order and then put the customer, the product, and the detail for it. But we found that as soon as we got complex changes within the event, like the customer's address changing, it became a real problem to query, and events solved it, because they gave us a view of everything about that event at the time the event happened. I can't think of an anti-pattern where we wouldn't use it. 
I think if we weren't modeling exactly the way we are, if we didn't have the simplicity of concepts, details, and events as the only three types of objects that we model in our design layer, we probably wouldn't use event tiles.</p><p>We may have done something different.</p><p><strong>Nigel</strong>: Yeah, C and Ds definitely have their place, because it's really easy to take data, wrap it in a concept and a detail, and then rehydrate it as a consume. For some data sets, that's absolutely fine, 'cause we pass it straight through, and that's all it is, because the concept is really simple.</p><p>It might be simple attributes about a user. So we've got a user ID as the concept, and then the user detail is the user's name, login and so on. It's a really simple, self-contained thing. We get a file of updates for users with when they last logged in, and that's a good example of a C and D: we just rehydrate that as a user tile.</p><p>We've got a unique list of users in the concept, and that's fine; we might just run that as it is. And then, as you said, we may have another consume, which is something else, and we would create a rule to put them together at the consume layer, because sometimes that is just straightforward.</p><p><strong>Shane</strong>: And I think that's the simple case.</p><p>And then we get the complex case, which is things like status changes: invoice was entered, invoice was updated, invoice was approved, invoice was paid. That always used to cause me massive problems, trying to decide what the grain of the event was, and what the trigger was to say that a new event turned up or a change to that event happened. Which probably leads into the idea of activity events, which we added as another way of doing event modeling for a whole bunch of reasons.</p><p>But I think we'll keep that one for another day. 
So I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[Automated Load Patterns based on Source Data Profiles, AgileData Engineering Pattern #4]]></title><description><![CDATA[The Automated Load Patterns based on Source Data Profiles pattern automatically profiles incoming data to determine its optimal loading pattern.]]></description><link>https://agiledata.info/p/automated-load-patterns-based-on</link><guid isPermaLink="false">https://agiledata.info/p/automated-load-patterns-based-on</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Tue, 22 Jul 2025 02:13:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WDNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Automated Load Patterns based on Source Data Profiles</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/168912285/description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/168912285/agiledata-podcast-episode-mindmap">AgileData Podcast Mind Map</a></strong></p><p><strong><a 
href="https://agiledata.substack.com/i/168912285/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Automated Load Patterns based on Source Data Profiles</strong> pattern automatically profiles incoming data to determine its optimal loading pattern. </p><p>It classifies data as either <strong>event data</strong> (immutable, append-only) or <strong>change data</strong> (evolving records) by analysing characteristics like unique key volume and column names over time. </p><p>This automated classification then dictates the load strategy: an efficient <strong>partition replace</strong> for event data or a cost-effective <strong>upsert</strong> (end-dating historical records) for change data. 
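</p><p>To make the classification and the two load strategies concrete, here is a minimal Python sketch. It is a hypothetical illustration rather than AgileData's implementation: the function names, the event column-name hints, the rule that a business key reappearing in a later load signals change data, and the in-memory tables standing in for warehouse partitions and history tables are all assumptions.</p>

```python
from datetime import date

# Hypothetical column-name hints; the real profiling rules are not published here.
EVENT_COLUMN_HINTS = ("event_timestamp", "event_name")


def classify(batches: list[list[dict]], key: str) -> str:
    """Classify successive loads of one source as 'event' or 'change' data.

    Heuristic sketch: event-style column names, or business keys that never
    reappear across loads, suggest immutable event data; keys that come back
    in later loads suggest records that evolve over time (change data).
    """
    columns = {col for batch in batches for row in batch for col in row}
    if any(hint in columns for hint in EVENT_COLUMN_HINTS):
        return "event"
    seen: set = set()
    for batch in batches:
        batch_keys = {row[key] for row in batch}
        if batch_keys.intersection(seen):
            return "change"  # a key turned up again, so the record has changed
        seen.update(batch_keys)
    return "event"


def choose_load_strategy(kind: str) -> str:
    """Map the classification to a load method."""
    return {"event": "partition_replace", "change": "upsert_end_date"}[kind]


def partition_replace(table: dict, partition: date, rows: list[dict]) -> None:
    """Event data: overwrite the whole partition, so reloading it is idempotent."""
    table[partition] = list(rows)


def upsert_end_date(history: list[dict], incoming: list[dict], key: str,
                    load_date: date) -> list[dict]:
    """Change data: end-date the current version of a changed record, append the new one."""
    current = {r[key]: r for r in history if r["end_date"] is None}
    for row in incoming:
        old = current.get(row[key])
        if old is not None:
            old_values = {k: v for k, v in old.items() if k not in ("start_date", "end_date")}
            if old_values == dict(row):
                continue  # unchanged, so nothing to write
            old["end_date"] = load_date  # close off the previous version
        new = dict(row, start_date=load_date, end_date=None)
        history.append(new)
        current[row[key]] = new
    return history
```

<p>The split mirrors the pattern's reasoning: event data is never revised, so overwriting a whole partition is cheap and safe to re-run, while change data is compared against the current version of each key, and only genuine changes close off the old row and append a new one, which keeps an auditable history. 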
</p><p>This capability <strong>removes the manual burden and accelerates data onboarding</strong>, providing a trustworthy and efficient process even when the data type is initially unknown.</p><h2><strong>Pattern Context Diagram</strong></h2><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WDNf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WDNf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 424w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 848w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 1272w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WDNf!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png" width="1200" height="443.4065934065934" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:538,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:65059,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/168912285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WDNf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 424w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 848w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 1272w, https://substackcdn.com/image/fetch/$s_!WDNf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc57b21e4-96f2-476f-a273-f6eb2c7b7821_1598x591.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p></p><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>Automated Load Patterns based on Source Data Profiles</strong></p><h3>The Problem It Solves</h3><p>You know that moment when you receive new data from a customer or system, and you're not sure if it contains historical changes, is a stream of unique events, or a mix? </p><p>Manually figuring out how to load this data efficiently and correctly can be a real headache. 
It slows down data onboarding, requires a lot of manual thought, and if you pick the wrong approach, you could end up with duplicated data or incorrect historical views.</p><p>This pattern solves that by <strong>automating the decision of which data load method to use</strong>, significantly speeding up onboarding and reducing the cognitive burden on data engineers.</p><h3>When to Use It</h3><p>Use this pattern when:</p><ul><li><p>You're receiving <strong>new data with an unknown or uncertain structure</strong> &#8211; you're not sure if it's event data (immutable, append-only) or change data (records that might be updated over time).</p></li><li><p>You need to <strong>onboard data quickly</strong>, especially when prototyping new loads for customers.</p></li><li><p><strong>Cost-efficiency and speed of data loading are crucial</strong>, as different data types benefit from different, optimised load methods.</p></li><li><p>You require <strong>auditing and accountability of changes</strong>, as the pattern helps manage historical data accurately by end-dating previous records.</p></li><li><p>You want an <strong>automated initial classification with a safety net for errors.</strong></p></li></ul><h3>How It Works</h3><p>This pattern functions like a smart data intake system, analysing incoming data to determine the best loading strategy.</p><h4>Trigger:</h4><ul><li><p>New data arrives in your system, often a file dropped into a secure bucket.</p></li></ul><h4>Inputs:</h4><ul><li><p>The incoming data itself.</p></li><li><p>Optionally, user-defined change rules that help classify the data.</p></li></ul><h4>Steps:</h4><ol><li><p><strong>Data Classification:</strong> The system first classifies the incoming data into one of two primary types: <strong>event data</strong> (immutable, e.g., GA4 user views, sensor readings, which typically have event timestamps and event names) or <strong>change data</strong> (records that can change, e.g., customer status, order details, 
often linked by a business key).</p></li><li><p><strong>Source Data Profiling:</strong> The system profiles the incoming data, typically looking back over a recent period (e.g., 90 days). This involves:</p><ol><li><p>Analysing column names for identifiers common to event data.</p></li><li><p>Determining the "shape" of the data, such as how many unique keys appear per day. For example, if 200,000 unique customer IDs appear daily, the source is likely event data; if only a few hundred change each day, it is more likely change data.</p></li><li><p>Applying thresholds based on known characteristics of event and change data.</p></li></ol></li><li><p>Based on the profiling, the data is "lightly tagged" as either event or change data.</p></li><li><p><strong>Load Pattern Determination:</strong> The system then uses this tag to select the appropriate load pattern:</p><ol><li><p>For <strong>'Concepts' data</strong>, which involves inserting new unique keys, an <strong>Insert pattern</strong> is always used, regardless of event or change classification.</p></li><li><p>For <strong>'Details' data classified as Change Data</strong>, an <strong>Upsert pattern</strong> is employed. If a record is new, it's inserted. If an existing record's "row hash" has changed, the existing row is marked as 'ended' (end-dated), and a new row with the updated information is inserted. It is chosen because partitioning and clustering on the keys keep it cost-effective.</p></li><li><p>For <strong>'Details' data classified as Event Data</strong>, a simple <strong>Partition Replace pattern</strong> is used. Because event data typically arrives only once, as a stream, the system deletes the most recent couple of days of relevant partitions and then re-inserts those events. This is very cheap and fast because it never has to search the entire table for existing records.</p></li></ol></li><li><p><strong>Automated Trust Rules:</strong> Post-load, "automated trust rules" monitor the loaded data. 
If the system's initial guess about the data type is wrong (in roughly 1% of cases), these rules trigger alerts, such as "unique key warnings" or "effective date" inconsistencies, that flag a potential misclassification for review.</p></li></ol><h4>Outputs:</h4><ul><li><p>Data that is <strong>correctly and efficiently loaded</strong> into the data platform using the most suitable pattern.</p></li><li><p><strong>Early identification of potential data quality issues</strong> or misclassifications through automated alerts.</p></li></ul><h3>Why It Works</h3><p>The pattern acts like an intelligent gatekeeper for your data pipeline. Instead of a human manually inspecting and configuring every new data source, the system takes on the burden of "cognition", using characteristics inherent in the data (timestamps, keys, volume patterns) to make an informed, automated decision. </p><p>This leads to <strong>faster data onboarding</strong>, as teams can prototype new loads without knowing the data's exact nature upfront. Selecting a specific load pattern for each data type (cost-effective partition replace for events, end-dating for changes) delivers <strong>better performance and lower processing costs</strong>. </p><p>Finally, the integrated "automated trust rules" act as a <strong>safety net</strong>, alerting you if the initial automated guess was off, turning potential errors into actionable insights and maintaining data trustworthiness. </p><p>The result is a system that <strong>hints at the right approach</strong> and then <strong>tells you if it's going wrong</strong>, so you can correct course quickly.</p><h3>Real-World Example</h3><p>Imagine that a new client starts sending you daily data dumps into a secure cloud storage bucket. One day, they send a file named <code>web_analytics_clicks.csv</code>. 
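</p><p>Before following these files through the system, here is a minimal Python sketch of the profiling, tagging and load-pattern selection steps described in How It Works. It is not the AgileData implementation: the names <code>Row</code>, <code>classify</code>, <code>choose_load_pattern</code> and <code>unique_key_warnings</code>, the column-name hints and the 10,000-keys-per-day threshold are hypothetical assumptions chosen for illustration. The 90-day lookback and the Concept to Insert, Detail change to Upsert, Detail event to Partition Replace mapping follow the pattern description.</p>

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median

# Hypothetical values for illustration only; the platform's real hints and
# thresholds are not published in this pattern.
EVENT_COLUMN_HINTS = ("event_timestamp", "event_name", "click_timestamp")
EVENT_KEYS_PER_DAY_THRESHOLD = 10_000  # median distinct keys/day that suggests events
LOOKBACK_DAYS = 90                     # the "recent period" the profile looks back over


@dataclass(frozen=True)
class Row:
    key: str         # business key, e.g. customer_id or an event id
    load_date: date  # day the row arrived


def classify(rows, column_names, as_of):
    """Lightly tag a source as 'event' or 'change' from a simple profile."""
    # Signal 1: column names that commonly appear in event data.
    has_event_columns = any(
        hint in name.lower() for name in column_names for hint in EVENT_COLUMN_HINTS
    )
    # Signal 2: the "shape" of the data, i.e. distinct keys seen per day
    # within the lookback window.
    window_start = as_of - timedelta(days=LOOKBACK_DAYS)
    keys_per_day = defaultdict(set)
    for row in rows:
        if window_start <= row.load_date <= as_of:
            keys_per_day[row.load_date].add(row.key)
    daily_counts = [len(keys) for keys in keys_per_day.values()]
    typical = median(daily_counts) if daily_counts else 0
    # Decision: in this sketch either signal is enough to lean "event".
    if has_event_columns or typical >= EVENT_KEYS_PER_DAY_THRESHOLD:
        return "event"
    return "change"


def choose_load_pattern(table_kind, data_tag):
    """Map a Concept/Detail table and its tag to a load pattern."""
    if table_kind == "concept":
        return "insert"             # always Insert, whatever the tag
    if data_tag == "change":
        return "upsert"             # end-date the changed row, insert the new version
    return "partition_replace"      # delete recent partitions, re-insert the events


def unique_key_warnings(current_keys):
    """Hypothetical trust rule: keys that appear more than once among current rows."""
    return sorted(key for key, n in Counter(current_keys).items() if n > 1)


if __name__ == "__main__":
    today = date(2025, 9, 1)
    clicks = [Row(f"evt-{i}", today) for i in range(12_000)]
    customers = [Row(f"cust-{i}", today) for i in range(300)]

    click_tag = classify(clicks, ["click_timestamp", "page_url"], as_of=today)
    cust_tag = classify(customers, ["customer_id", "customer_address"], as_of=today)
    print(click_tag, choose_load_pattern("detail", click_tag))  # event partition_replace
    print(cust_tag, choose_load_pattern("detail", cust_tag))    # change upsert
    print(choose_load_pattern("concept", cust_tag))             # insert
    print(unique_key_warnings(["c1", "c2", "c1"]))              # ['c1']
```

<p>Treating either signal as sufficient is a deliberate simplification; a production profile would likely weigh column names, key volumes and change rates together, and any tag it gets wrong is what the trust rules are there to catch.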
</p><p>With the Automated Load Patterns based on Source Data Profiles pattern, when <code>web_analytics_clicks.csv</code> arrives, the system automatically profiles it. </p><p>It observes that each record is a distinct event carrying a <code>click_timestamp</code> and that <strong>hundreds of thousands of unique records</strong> (like "GA4 user viewed page" events) appear daily. </p><p>Based on this profile and volume, it correctly tags the file as <strong>event data</strong>. The system then automatically selects and executes a <strong>Partition Replace load pattern</strong>, efficiently deleting and re-inserting the latest days of click data.</p><p>A few weeks later, the same client sends <code>customer_master_updates.csv</code>. </p><p>The system profiles this file. </p><p>It identifies <code>customer_id</code> as the business key and notes that while the volume of new customer IDs is low (e.g., 100 to 1,000 per day), existing customer records occasionally show changes in fields like <code>customer_address</code>. </p><p>This profile is recognised as <strong>change data</strong>, prompting the system to automatically select an <strong>Upsert load pattern</strong>. If a customer's address changes, the system end-dates the old record and inserts a new record with the updated address, maintaining a full history of changes cost-effectively. </p><p>If, by chance, the system misclassified a very low-volume event stream as change data, the "automated trust rules" would raise "unique key warnings" because of the unexpected duplicates, prompting a human review.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Drastically Different Data Profiles:</strong> The pattern can be tripped up if the incoming data's characteristics dramatically change or are unusual for its type. 
For example, if an event stream suddenly has a very small volume (e.g., 5-10 events a day instead of thousands), the system might misclassify it as change data, leading to key errors and duplicate records on subsequent loads.</p></li><li><p><strong>Ignoring Automated Trust Rule Alerts:</strong> If the system guesses incorrectly (which happens in about 1% of cases), it will generate alerts (e.g., "unique key warnings," duplicates, or issues with effective dates). Ignoring these alerts will lead to untrustworthy data and broken processes, defeating the purpose of the pattern.</p></li><li><p><strong>Over-reliance without Understanding:</strong> Simply letting the system do its thing without understanding the underlying data types or the logic of the chosen patterns can lead to confusion when errors do occur, requiring manual intervention to correct configurations.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Trust the Automation (Initially):</strong> For new data, "let the system do that quick profile for us". This allows for much quicker prototyping and onboarding.</p></li><li><p><strong>Heed the Trust Rules:</strong> Pay close attention to the "automated trust rules". These are your early warning system for misclassifications or data quality issues.</p></li><li><p><strong>Understand Profiling Thresholds:</strong> Be aware that the system uses "thresholds based on what we know event data tends to look like". 
For very unusual data volumes, a manual check might be necessary, or the system might flag it for review.</p></li><li><p><strong>Consider AI Enhancements:</strong> As the data landscape evolves, explore using AI (e.g., LLMs) to provide better "hints" or "judge type patterns" that make the call when profiling is unclear or volume is insufficient.</p></li><li><p><strong>Combine Patterns:</strong> The strength of this approach comes from "racking and stacking those patterns together to automate that work" &#8211; integrating profiling with specific load patterns and automated validation.</p></li></ul><h3><strong>Related Patterns</strong></h3><ul><li><p><strong>Automated Trust Rules:</strong> Highly complementary to this pattern: it validates the accuracy of the automated load pattern selection and alerts on issues.</p></li><li><p><strong>Schema Detection:</strong> Can work alongside this pattern, especially when initial profiling is inconclusive, inferring the data structure to further inform load pattern decisions.</p></li><li><p><strong>Judge Type Pattern:</strong> A potential advanced pattern to resolve ambiguities when the automated profiling struggles to classify data definitively (e.g., very low-volume events).</p></li><li><p><strong>Specific Load Patterns Enabled:</strong> This pattern selects from established data loading patterns such as:</p><ul><li><p><strong>Insert Pattern:</strong> For append-only data like 'Concepts'.</p></li><li><p><strong>Upsert Pattern:</strong> For managing changes in 'Details' data by end-dating previous records and inserting new ones.</p></li><li><p><strong>Partition Replace:</strong> A highly efficient method for loading 'Details' event data by replacing specific data partitions.</p></li><li><p><strong>End-Dating:</strong> A key component of the Upsert pattern for managing historical versions of records.</p></li></ul></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Automated Data Loading with Smart 
Profiling</p><h3>Headline</h3><p>New Automated Data Loading Feature Accelerates Data Onboarding and Boosts Trustworthiness for Data Teams and Users</p><h3>Introduction</h3><p>Today, the Data Platform team is excited to announce the launch of our new <strong>Automated Data Loading with Smart Profiling</strong> capability. This revolutionary feature automatically determines the optimal way to load incoming data, removing manual guesswork and significantly speeding up the process of getting new data into your analytics systems. It's designed for anyone who needs to bring data into the platform, ensuring efficiency and accuracy from the very first load.</p><h3>Problem</h3><p>"As a data engineer, when new data arrives &#8211; especially from external sources &#8211; I always have to spend time figuring out if it's event data, change data, or something else. This manual classification slows down the onboarding of new data sources and carries the risk of picking the wrong load method, leading to data quality issues like duplicates or incorrect historical views. I wish the system could just figure it out for me!"</p><h3>Solution</h3><p>Our new Automated Data Loading capability intelligently profiles incoming data to automatically determine its 'shape' and optimal load pattern. When new data arrives, the system first classifies it as either <strong>event data</strong> (immutable streams like user views or sensor readings) or <strong>change data</strong> (records that update over time, like customer details or order statuses).</p><p>The system achieves this by:</p><ul><li><p><strong>Profiling the source data</strong> over a recent period (e.g., 90 days), analysing factors like the number of unique keys appearing per day and common column names associated with event data. 
For instance, if millions of unique records appear daily, it's likely event data; if only a few hundred changes occur, it's probably change data.</p></li><li><p><strong>Lightly tagging</strong> the data based on this profile.</p></li><li><p><strong>Automatically selecting the most efficient load pattern</strong>:</p><ul><li><p>For 'Concept' data (new unique keys), a simple <strong>Insert pattern</strong> is always used.</p></li><li><p>For 'Detail' data classified as <strong>Change Data</strong>, an <strong>Upsert pattern</strong> is applied. This means if a record is new, it's inserted; if an existing record has changed, the previous row is automatically 'end-dated', and a new row with the updated information is inserted, maintaining full history cost-effectively.</p></li><li><p>For 'Detail' data classified as <strong>Event Data</strong>, a <strong>Partition Replace pattern</strong> is used. Since event data is typically append-only, the system efficiently deletes and re-inserts only the most recent partitions of data, avoiding expensive full-table lookups.</p></li></ul></li></ul><p>This automated approach removes the manual burden and reduces the potential for errors. Furthermore, <strong>automated trust rules</strong> monitor the loaded data, alerting users to potential misclassifications (e.g., unexpected duplicates or effective date inconsistencies) if the system's initial guess was incorrect, ensuring data trustworthiness and providing a safety net.</p><h3>Data Platform Product Manager</h3><p>"With Automated Data Loading with Smart Profiling, we're removing the cognitive burden from our data engineers, allowing them to onboard new customer data significantly faster and with greater confidence. 
This capability enhances the trustworthiness and auditability of our data assets by ensuring the correct load patterns are applied, and crucially, it actively flags any data quality issues that arise from misclassification, turning potential problems into actionable insights."</p><h3>Data Platform User</h3><p>"I absolutely love that I can just drop new data files into our secure bucket, and the system automatically figures out the best way to load them. It&#8217;s incredibly fast, and I don't have to worry about complex configurations or whether I'm managing changes correctly. It just works, making my job so much easier and giving me trusted data right away!"</p><h3>Get Started</h3><p>You can start benefiting from Automated Data Loading with Smart Profiling immediately. Simply provide your data files as usual, and the system's intelligent profiling will do the rest, automatically applying the most efficient and correct load pattern. For more information on how this capability works or to understand the trust rules, please contact your data platform product manager.</p><h2>AgileData App / Platform Example</h2><h3>Concept / Detail / Event Rules Logic</h3><p><strong>Concept = Insert</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dmxr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dmxr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 424w, 
https://substackcdn.com/image/fetch/$s_!dmxr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 848w, https://substackcdn.com/image/fetch/$s_!dmxr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 1272w, https://substackcdn.com/image/fetch/$s_!dmxr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dmxr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png" width="1456" height="161" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:161,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/168912285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!dmxr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 424w, https://substackcdn.com/image/fetch/$s_!dmxr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 848w, https://substackcdn.com/image/fetch/$s_!dmxr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 1272w, https://substackcdn.com/image/fetch/$s_!dmxr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ac84881-4029-4dcd-9a03-80971746c289_1834x203.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Detail = Upsert</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GZ9Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 424w, https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 848w, 
https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 1272w, https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png" width="1456" height="391" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:391,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98339,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/168912285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 424w, 
https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 848w, https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 1272w, https://substackcdn.com/image/fetch/$s_!GZ9Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F963d3d34-2540-4367-8aa7-728e18b35663_1834x492.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><strong>Event = Insert</strong></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xy6B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xy6B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 424w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 848w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 1272w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xy6B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png" width="1456" height="258" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:258,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65898,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/168912285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xy6B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 424w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 848w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 1272w, https://substackcdn.com/image/fetch/$s_!Xy6B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbcabeadd-a2b2-4306-b98b-4b0451e3547d_1834x325.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><h2>AgileData Podcast Episode</h2><p><a 
href="https://podcast.agiledata.io/e/automated-load-patterns-based-on-source-data-profiles-agiledata-engineering-pattern-4-episode-70/">https://podcast.agiledata.io/e/automated-load-patterns-based-on-source-data-profiles-agiledata-engineering-pattern-4-episode-70/</a><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/automated-load-patterns-based-on-source-data-profiles-agiledata-engineering-pattern-4-episode-70/&quot;,&quot;text&quot;:&quot;Listen to the Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/automated-load-patterns-based-on-source-data-profiles-agiledata-engineering-pattern-4-episode-70/"><span>Listen to the Podcast Episode</span></a></p><div id="youtube2-Oav6rH2jauY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Oav6rH2jauY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Oav6rH2jauY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Episode MindMap</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9ceJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!9ceJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 424w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 848w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 1272w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9ceJ!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png" width="1200" height="1722.5274725274726" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:2090,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:1048057,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/168912285?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9ceJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 424w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 848w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 1272w, https://substackcdn.com/image/fetch/$s_!9ceJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7509c1d8-3c0f-4429-90fb-f7833345e011_3858x5538.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>
<h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane</strong>: Welcome to the Agile Data Podcast. I'm Shane Gibson.</p><p><strong>Nigel</strong>: And I'm Nigel Vining.</p><p><strong>Shane</strong>: Hey, Nigel. Today, another data engineering pattern. This time we're gonna talk about automated load patterns based on source data profiling. So take me away with what it is and why I care.</p><p><strong>Nigel</strong>: This one's quite an important one, and it underpins our product in lots of ways. So essentially we classify the data that turns up in our projects into two types. The first is event data. 
Think GA4 "user viewed page", or sensor data that captured a reading of something. The other type is change data: information about a customer.</p><p>A customer changing status, a customer changing address; orders, where an order is created and then an order is processed. We do a little bit of magic when your data turns up: we look at the data to determine its shape, whether we think it's event data or change data. Event data is generally quite clear-cut because it will have event timestamps.</p><p>It'll have an event name. It'll have attributes that basically give us a pretty good clue that it's event data. But what we also do is look back over the last 90 days of that data and profile it to see what shape it is: how many records are turning up each day, based on the timestamps or the business keys that we know.</p><p>So if we have business keys and we see changes to the data for a business key, we're fairly confident that it's change data. So once we know whether the data is event or change data, we lightly tag it, and then when we go to load it, this is where we determine which load pattern we use.</p><p>So, you know, we use concepts, details and events. For concepts we always use an insert pattern, and that's fine. Whether it's event data or change data, it doesn't matter. All we're doing is inserting a new key when we see it. When we get to details about the data, this is where it changes. So for change data we run an upsert pattern.</p><p>So if it's change data we haven't seen before, we insert it. If it's change data we already have, but it's changed (the row hash has changed), then we update the existing row to mark it as effectively ended and we insert a new row. Whereas if it's event data, event data is generally only given to us once.</p><p>'Cause it's just a long stream of something happened, something happened, something happened. So for that, we don't need to go looking back through the whole table to see if we've got it. 
We use a simple partition replace: we basically just delete the last couple of days of partitions and then we insert those events again.</p><p>So we're always basically putting the new events at the end of the table, and we don't go looking for them, 'cause this is a really cheap and fast pattern. So that's effectively what we're doing with this pattern.</p><p><strong>Shane</strong>: So we're looking to say: is the data we're being given immutable? It's an event that happened. We'll never be given that exact same event again. That event never changes. GA4 is typically where we use it a lot: somebody viewed a webpage. And if it is, then we do an insert pattern.</p><p>We just basically insert those records because we know we don't have to worry about whether they have changed or not, 'cause they're all immutable and new. Otherwise it's a different pattern, where we know the data might change. It probably comes from some form of relational schema in the source system.</p><p>So we know that the customer name might change, or the customer address might change, or the order quantity may change. And we therefore have to manage those changes and rack and stack them. So therefore we do an upsert pattern, where we effectively end-date the previous record and insert the new record.</p><p><strong>Nigel</strong>: Correct, yes.</p><p><strong>Shane</strong>: That end-dating is quite an old pattern; it's a bit of a ghost of data past. Why don't we do that fancy one where we don't end-date, just leave the insert dates, and then run a whole lot of windowing functions whenever we need to determine what the latest record is?</p><p><strong>Nigel</strong>: A lot of that's around cost. We partition by those dates so we can quickly look at the current data. If we didn't do that, it means we would be looking across
potentially all the rows in the table every time, for each business key, to find out where that record started and ended.</p><p>So there's a lot more overhead to do it dynamically, whereas because it's a partition and a clustered key, we can do a quick lookup and end-date the record very cheaply.</p><p><strong>Shane</strong>: Cool. So it's a cost decision that we've used to pick that pattern.</p><p><strong>Nigel</strong>: That's exactly right. We could definitely do it with a lot less complexity: just load all the records and then, at the end, do a window, as you said, to look for the latest record and mark it as current. But that's more processing. We have to read more data to do that pattern.</p><p><strong>Shane</strong>: Talk me through the way the profiling works. How does it actually determine whether it thinks it's event or change data?</p><p><strong>Nigel</strong>: So the profiling is based on the change rule that the user creates. The user will create rules, say in this case a concept rule to load a concept tile. Once we have that rule, we can effectively run that rule and bucket the results that come back to work out how many unique keys are appearing per day.</p><p>We also check the column names, looking for identifiers that we know event data typically has, but the profiling generally tells us, because if we are getting lots of unique keys turning up every day, it's generally indicative of event data. You don't create, say, 200,000 customer IDs every day.</p><p>You may create 100 or 1,000, but not large volumes. So if we see 200,000 keys turning up every day, those are probably all unique event keys, and we're pretty safe with that. So we use some thresholds in there based on what we know event data tends to look like.</p><p><strong>Shane</strong>: And why do we do it? 
Like, why don't you just force me to tell the system what kind of data it is when I start to collect it?</p><p><strong>Nigel</strong>: That's a good question. We're effectively removing a little bit of cognition from your load. It also means we can onboard data a lot quicker, because typically when we prototype a new load for a customer, we just get given a file. We may not know if it's event or change data. We just load that file.</p><p>And effectively we let the system do that quick profile for us, so you can happily go and create rules on it, and 99 percent of the time it's going to go, "Cool, it's event data, load it like this," or, "Great, it's change data, load it like this." The 1 percent of the time that we get our guess wrong, it'll still load, but we would generally find that the second load throws a few errors and warnings because we're not loading the data quite right.</p><p><strong>Shane</strong>: And that's where the trust rules come in. Those automated trust rules will give us an alert to say this has been flagged as event data, but we are starting to see that the customer key is constantly getting change records coming through. That's an anti-pattern for the pattern that's been selected.</p><p>Therefore, you probably need to go and change the config in context and rebuild some stuff so that it's safe and it&#8217;s trustworthy.</p><p><strong>Nigel</strong>: That's exactly what happens. Effectively, the first thing we get is usually unique key warnings, because we're always checking for unique records. If we're not loading the data quite right, it'll start to create duplicates very quickly. The other thing we check for is the effectivity date
of those records, because if we're getting lots of records turning up with the same effective date and we're trying to update based on that, it's also an indication that we're possibly using the wrong pattern, because we don't have an effective date for each business key.</p><p>So it's more likely to be event data.</p><p><strong>Shane</strong>: And then the anti-patterns: where wouldn't you use this? What are the gotchas you've got to watch out for?</p><p><strong>Nigel</strong>: The anti-pattern is where we get data turning up that's very different to what we've seen and what we've based our profile on. An example would be event data, but a very small number of events, say five or 10 events a day, whereas typically events are in the magnitude of thousands, tens of thousands, hundreds of thousands.</p><p>You wouldn't expect to see four or five events being created each day, so that would trip us up. We would go, "Oh, that's change data," because the volume is so low, and it would load right for a day, and then on day two we would start to get key errors because it would start to duplicate the data.</p><p><strong>Shane</strong>: I like the way it effectively gives us a hint and picks it for us. And then if something goes wrong, it tells us and we can fix it. I like that because often we have customers that just send us their data; they're not comfortable with us reaching in and collecting it, connecting internally to their systems or their cloud systems.</p><p>So we give them a secure bucket and they effectively just dump some data in there, and we've never seen it before. So this helps us do that first bit of design around that data. I think with the new AI wave, there'll be some interesting ways we can enhance it. 
In theory, as we see more and more systems, we could actually use that as a hint to one of the LLMs, to say, here's all the patterns of data we've seen before.</p><p>And then we could actually tag certain systems, like Google Analytics, to say it is event data, so that when we get a customer that's only got three events happening through Google Analytics on a daily basis, it's gonna drop back and say the volume's too small. Let's go look at some of the other patterns and use a judge-type pattern to decide which one wins, and also flag it to say there's not enough volume to actually determine the profile properly.</p><p>Fall back to schema detection. You probably want to go and look at this one and just check it, or wait to see if you get any alerts to say the data's not so trustworthy. It's all about racking and stacking those patterns together to automate that work. Excellent. I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[Trust Rules - Automated Data Validation, AgileData Engineering Pattern #3]]></title><description><![CDATA[The Trust Rules pattern provides automated data validation to ensure all incoming data is fit for purpose and trustworthy]]></description><link>https://agiledata.info/p/trust-rules-automated-data-validation</link><guid isPermaLink="false">https://agiledata.info/p/trust-rules-automated-data-validation</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Mon, 14 Jul 2025 00:08:46 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Y74r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Trust Rules - Automated Data Validation</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a 
href="https://agiledata.substack.com/i/167964917/pattern-description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/agiledata-podcast-episode-mindmap">AgileData Podcast Mind Map</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167964917/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Trust Rules</strong> pattern provides <strong>automated data validation</strong> to ensure all incoming data is <strong>fit for purpose</strong> and trustworthy. </p><p>It <strong>bakes in essential checks</strong> such as unique business keys and business effective dates, which run automatically upon data load or table refresh. </p><p>Users can also define <strong>custom validation rules</strong> for specific columns. 
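</p><p>As an illustration only, the Python sketch below places the two kinds of rule side by side. Two baked-in checks (unique business key, business effective date present) always run, and user-specified column rules are built from any predicate, such as an email regex or an email-domain check. The function names, the <code>RuleResult</code> record and the sample rows are hypothetical and are not the AgileData API. A production version would more likely run these checks as SQL inside the warehouse than over Python lists.</p>

```python
import re
from dataclasses import dataclass


@dataclass
class RuleResult:
    """Hypothetical result record: one row per rule, per column, per run."""
    rule: str
    column: str
    passed: bool
    failures: int


def unique_business_key(rows, key):
    """Baked-in check: every business key appears exactly once."""
    keys = [row[key] for row in rows]
    duplicates = len(keys) - len(set(keys))
    return RuleResult("unique_business_key", key, duplicates == 0, duplicates)


def effective_date_present(rows, column):
    """Baked-in check: every row carries a business effective date."""
    missing = sum(1 for row in rows if row.get(column) is None)
    return RuleResult("effective_date_present", column, missing == 0, missing)


def user_rule(name, column, predicate):
    """Build a user-specified column rule from any predicate function."""
    def check(rows):
        bad = sum(1 for row in rows if not predicate(row.get(column)))
        return RuleResult(name, column, bad == 0, bad)
    return check


# Deliberately simple email shape check (illustrative, not RFC 5322).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def run_trust_rules(rows, key, effective_col, user_rules):
    """Baked-in checks always run; user-specified rules run alongside them."""
    results = [unique_business_key(rows, key), effective_date_present(rows, effective_col)]
    results += [rule(rows) for rule in user_rules]
    return results


# Hypothetical sample rows: one duplicate key, one missing date, one bad email.
rows = [
    {"customer_id": 1, "effective_date": "2025-07-01", "email": "a@example.com"},
    {"customer_id": 2, "effective_date": None, "email": "b@example.com"},
    {"customer_id": 2, "effective_date": "2025-07-02", "email": "not-an-email"},
]
rules = [
    user_rule("is_email", "email", lambda v: isinstance(v, str) and bool(EMAIL.match(v))),
    user_rule("email_domain", "email", lambda v: isinstance(v, str) and v.endswith("@example.com")),
]

for result in run_trust_rules(rows, "customer_id", "effective_date", rules):
    print(result)
```

<p>Each rule returns a result record instead of raising an error. This mirrors the pattern's approach of persisting every outcome, pass or fail, so it can be reported on or alerted from later.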
</p><p>Results are collected, persisted, and surfaced via applications or alerts, with the system <strong>optimising validation for cost and speed</strong> through smart partitioning and clustered columns.</p><h2><strong>Pattern Context Diagram</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Y74r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Y74r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 424w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 848w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Y74r!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png" width="1200" height="703.021978021978" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:853,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:113953,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Y74r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 424w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 848w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 1272w, https://substackcdn.com/image/fetch/$s_!Y74r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c94a8db-3cee-45c6-b694-e5ea95840ce8_1767x1035.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>
<h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>Trust Rules - Automated Data Validation</strong></p><h3>The Problem It Solves</h3><p>You know that moment when you're working with data, and you're just not sure if you can actually rely on it? Is it accurate? Is it complete? Are there duplicate entries mucking things up downstream? Trust Rules effectively solve the problem of ensuring data is "fit for purpose" and address the "bane of engineers' lives" when it comes to data validation. When teams don't automatically validate data, crucial issues like duplicate business keys or missing effective dates can go unnoticed, leading to untrustworthy data and downstream problems. 
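</p><p>To make that failure mode concrete, here is a small Python sketch using hypothetical customer and order rows in which one business key was loaded twice. A naive downstream join silently double-counts revenue, while a simple duplicate-key check of the kind Trust Rules bakes in flags the problem at load time. The data and function names are illustrative assumptions, not taken from the AgileData platform.</p>

```python
from collections import Counter

# Hypothetical sample rows: customer 42 has been loaded twice by mistake.
customers = [
    {"customer_id": 42, "region": "North"},
    {"customer_id": 42, "region": "North"},
    {"customer_id": 7, "region": "South"},
]
orders = [
    {"order_id": 1, "customer_id": 42, "amount": 100.0},
    {"order_id": 2, "customer_id": 7, "amount": 50.0},
]


def revenue_by_region(customers, orders):
    """Naive downstream join: each order is counted once per matching customer row."""
    totals = Counter()
    for order in orders:
        for customer in customers:
            if customer["customer_id"] == order["customer_id"]:
                totals[customer["region"]] += order["amount"]
    return dict(totals)


def duplicate_business_keys(rows, key):
    """Baked-in style check: return the business keys that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


print(revenue_by_region(customers, orders))  # North shows 200.0, double the real 100.0
print(duplicate_business_keys(customers, "customer_id"))  # [42]
```

<p>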
This pattern ensures that a core set of necessary data quality checks is performed automatically, removing the burden from individual engineers to constantly remember and implement them.</p><h3>When to Use It</h3><p>Use Trust Rules as a default, not an exception, and have them <strong>"baked into the product from day one"</strong>.</p><p>Use this pattern:</p><ul><li><p><strong>When new data arrives or a table is refreshed.</strong></p></li><li><p>When you need to <strong>"check it's fit for purpose"</strong> and ensure data is trustworthy.</p></li><li><p>To ensure fundamental data integrity, such as checking for <strong>unique business keys</strong> or the presence of a <strong>business effective date for every row.</strong></p></li><li><p>When specific, <strong>user-defined validations</strong> are needed for particular columns (e.g., email validity, phone number format).</p></li><li><p>To <strong>reduce the effort and cognitive load</strong> on data engineers by automating common, mandatory checks.</p></li></ul><p>It's especially helpful for establishing a baseline of data quality that is <strong>always applied, regardless of context</strong>.</p><h3>How It Works</h3><p>This pattern establishes a robust, automated data validation system.</p><h4>Trigger: </h4><p>The pattern is initiated when:</p><ul><li><p>A task is identified that requires data validation, typically when <strong>new data is loaded, turns up, or a table is refreshed.</strong></p></li><li><p>This process is often initiated by a PubSub message indicating that a table refresh is complete.</p></li></ul><h4>Inputs:</h4><ul><li><p>Pre-defined <strong>automated trust rules</strong> (e.g., unique keys, effective dates).</p></li><li><p>Pre-defined <strong>user-applied trust rules</strong> (e.g., not null, is number, is email, is date).</p></li><li><p><strong>User-specified trust rules</strong> (e.g., email domain validation, value between, regex, masking, 
formulas).</p></li></ul><h4>Steps:</h4><ol><li><p>A table is loaded or refreshed with new data.</p></li><li><p>Upon completion (triggered by PubSub), a <strong>series of data validation checks is run</strong> against the new data.</p></li><li><p><strong>Automated (baked-in) checks</strong> are executed: These are mandatory, cannot be turned off, and include fundamental validations like checking for unique business keys, a business effective date for every row, and non-nulls in strategic places.</p></li><li><p><strong>User-specified checks</strong> are executed: Users can define additional, custom validation rules for specific columns they care about (e.g., validating email formats or phone numbers using regex). The logic for these rules can often be leveraged from open-source libraries.</p></li><li><p>The <strong>results of all checks are collected and persisted</strong> into a data layer, treated like any other data.</p></li><li><p>These results are then <strong>surfaced to users</strong> through various channels such as an application interface, reports, or alerts (e.g., via Slack), with the severity of the failure dictating the notification method.</p></li><li><p>A <strong>history of every validation rule</strong> run against each table (down to column level), including successes and failures, is maintained.</p></li></ol><h4>Outputs:</h4><ul><li><p><strong>Validated data</strong> that is confirmed as "fit for purpose".</p></li><li><p>A <strong>clear, traceable history of data quality</strong> for every table and column.</p></li><li><p><strong>Early identification of data quality issues</strong>, allowing for immediate action and reducing downstream problems.</p></li><li><p><strong>Increased trust in the data</strong> for all users and stakeholders.</p></li></ul><h3>Why It Works</h3><p>This pattern works by <strong>automating the foundational data quality checks</strong> that data engineers would otherwise have to manually implement for every new dataset. 
It removes the burden of remembering and applying these core rules, allowing the "machine to do it for me". By baking these rules in "by default and not as an exception after the fact," it ensures consistent quality from day one. The system tells the user if they've done "something dumb," providing immediate feedback and fostering correct behaviour without constant manual oversight.</p><p>Furthermore, the pattern works by <strong>optimising the validation process for cost and speed</strong>. By moving from external, expensive, or less controllable tools to an internally managed system, and by leveraging database features like partition pruning and clustered columns, checks can be performed more efficiently and cheaply. For instance, with a column-oriented storage engine like BigQuery, only the relevant column, within the partitions covering a specific time window, is scanned, significantly reducing compute and query costs. In the same way that version control turns code into a shared asset rather than a fragile mess of files, this pattern turns data validation into a shared, trustworthy asset.</p><h3>Real-World Example</h3><p>Imagine a data engineer onboarding a new "history tile" of data. With Trust Rules, they don't have to manually write checks for every fundamental requirement. 
The system automatically performs checks to ensure there's a <strong>unique business key for every row</strong> and that the <strong>effective dates are correct</strong>, without the engineer having to worry about these basic validations.</p><p>Separately, if a user specifically cares about an "email" column within that data, they can define a <strong>user-specified trust rule</strong> to ensure that "all the values loaded into here are from a specific email domain". This check runs automatically when new data is loaded, and if any emails are invalid, the system persists the result and can send an alert, perhaps to Slack, notifying the team of the data quality issue.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>"Data quality on everything everywhere, every time"</strong>: Applying every possible data quality test to the entire table, regardless of necessity, leads to excessive cost and performance issues. Once loaded, much of the data is effectively immutable, so checks that have already run don't need re-running over the full historical dataset. Smart partitioning and focused checks on the latest data window are crucial.</p></li><li><p><strong>The "Noise Problem"</strong>: Creating too many trust rules, especially for columns or conditions that don't genuinely impact data trustworthiness or stakeholder value. This can lead to a deluge of alerts that are ignored, causing truly important issues to be missed. If "nobody cared about that rule," then it shouldn't generate noise.</p></li><li><p><strong>Using external, expensive, or overly technical data quality products</strong>: While good for initial exploration, relying on external services that are difficult to maintain or run, or that lack granular control (e.g., over partition pruning), can become a "technical" and "expensive" anti-pattern. 
Bringing the pattern "back into the core of our context plane and our execution patterns" allowed for greater control and cost-efficiency.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Bake it in by default</strong>: Ensure that fundamental, non-negotiable data quality checks (like unique keys, non-nulls in critical fields) are <strong>automated and unchangeable</strong> from the outset.</p></li><li><p><strong>Provide user-defined flexibility</strong>: Allow users to easily specify additional validation rules for data columns they specifically care about, empowering them to ensure quality for their particular use cases.</p></li><li><p><strong>Persist and surface results</strong>: Store the results of every rule execution as data and make them easily accessible through reports, applications, or alerts, adjusting the notification method based on the severity of the failure.</p></li><li><p><strong>Optimise for cost and speed</strong>: Design the system to leverage underlying database capabilities (like partitioning and clustering) to scan only necessary data windows, reducing query costs and execution time.</p></li><li><p><strong>Prioritise value over quantity</strong>: Only apply trust rules where they truly add value and address a genuine concern about data trustworthiness, rather than implementing "data quality on everything everywhere, every time". 
This helps reduce noise and ensures that teams focus on critical issues.</p></li></ul><h3><strong>Related Patterns</strong></h3><ul><li><p><strong>PubSub</strong>: Used as the mechanism to trigger trust rule execution after a table refresh is complete.</p></li><li><p><strong>Load Patterns</strong>: Trust rule results can be used to assess whether data is arriving at the expected time, highlighting where the organisation's expectations for data arrival are not being met.</p></li><li><p><strong>Context Plane and Execution Patterns</strong>: The trust rules themselves are defined within a "context plane" and are part of the broader "execution patterns".</p></li><li><p><strong>Data / Reporting Layer</strong>: The results of trust rule executions are persisted in a data layer and surfaced through a reporting layer.</p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Trust Rules</p><h3>Headline</h3><p>New Trust Rules Feature Ensures Data is Always Reliable and Ready for Use for Data Engineers and Information Consumers</p><h3>Introduction</h3><p>We're thrilled to announce the launch of <strong>Trust Rules</strong>, a powerful new capability designed to automatically validate your data. This feature provides immediate confidence in data quality for data engineers and business users alike, ensuring all data is <strong>"fit for purpose"</strong> and trustworthy from the moment it lands. It's about ensuring data is reliable by default, not by exception.</p><h3>Problem</h3><p>"As a data engineer, I used to have to constantly remember to build standard data validation checks for every new dataset, like making sure keys were unique or that effective dates were always present. 
It was a real pain, and if I missed something, it could cause big problems all the way downstream. The 'noise problem' from too many irrelevant alerts also made it hard to spot what truly mattered."</p><h3>Solution</h3><p>Trust Rules makes foundational data validation easy and largely automated. </p><p>Whenever new data is loaded or a table is refreshed, the system automatically performs essential, <strong>baked-in checks</strong>, such as verifying <strong>unique business keys</strong> and the presence of a <strong>business effective date for every row</strong>. </p><p>Furthermore, users can easily define their own <strong>custom rules</strong> for specific columns, like ensuring all emails are valid or checking phone number formats, without needing to write complex code. </p><p>The results of these checks are collected, persisted, and then clearly surfaced through an application, reports, or alerts (e.g., via Slack), with the severity of the failure determining the notification method. </p><p>This not only ensures data integrity but also optimises the validation process for cost and speed by leveraging smart partitioning and clustered columns.</p><h3>Data Platform Product Manager</h3><p>"With Trust Rules, we're taking away the 'bane of engineers' lives' by automating core data validation, making our data platform significantly more <strong>auditable</strong> and boosting <strong>trust</strong> in the data. This foundational capability means our teams can focus on delivering high-value work, knowing that data quality is inherently managed and consistently applied."</p><h3>Data Platform User</h3><p>"What I love about Trust Rules is that I don't have to constantly worry whether the data is accurate; the system automatically tells me if I've done something 'dumb' before it becomes a problem. 
Now, when I load data, I just know the essential checks are handled, and I can easily add my own custom validations for what I care about; it's truly a game-changer for my confidence in the data."</p><h3>Get Started</h3><p>Trust Rules are <strong>baked into the platform by default</strong> for all core data, running automatically when new data arrives or a table is refreshed. To leverage user-defined trust rules for your specific data quality needs or to learn more about the automated validations, please consult the AgileData platform documentation or speak with your Data Platform Product Manager.</p><h2>AgileData App / Platform Example</h2><h3>Notifications Dashboard</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tWSy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tWSy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 424w, https://substackcdn.com/image/fetch/$s_!tWSy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 848w, https://substackcdn.com/image/fetch/$s_!tWSy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tWSy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tWSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png" width="1456" height="972" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:200291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tWSy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 424w, https://substackcdn.com/image/fetch/$s_!tWSy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 848w, 
https://substackcdn.com/image/fetch/$s_!tWSy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 1272w, https://substackcdn.com/image/fetch/$s_!tWSy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b33b0ff-71c2-4b35-9bc1-a1c06baddea7_1866x1246.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Trust Rule results for a Tile</h3><div class="captioned-image-container"><figure><a class="image-link image2
is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Gi-X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Gi-X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 424w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 848w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 1272w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Gi-X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png" width="978" height="984" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:984,&quot;width&quot;:978,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Gi-X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 424w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 848w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 1272w, https://substackcdn.com/image/fetch/$s_!Gi-X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc340976f-d540-4b92-9003-f6a3514f7b9c_978x984.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h3>Apply Predefined Trust Rule</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YTf7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YTf7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 424w, 
https://substackcdn.com/image/fetch/$s_!YTf7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 848w, https://substackcdn.com/image/fetch/$s_!YTf7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 1272w, https://substackcdn.com/image/fetch/$s_!YTf7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YTf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png" width="1456" height="975" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:975,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:240600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!YTf7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 424w, https://substackcdn.com/image/fetch/$s_!YTf7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 848w, https://substackcdn.com/image/fetch/$s_!YTf7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 1272w, https://substackcdn.com/image/fetch/$s_!YTf7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67188e37-7a9d-417b-acae-d022e2341bcf_1863x1247.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h3>Create Custom Trust Rule</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L-b5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L-b5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 424w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 848w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 1272w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L-b5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png" width="1456" height="975" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:975,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:275256,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L-b5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 424w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 848w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 1272w, https://substackcdn.com/image/fetch/$s_!L-b5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c0e7524-8f6d-44fe-8e54-59e422946d1f_1863x1247.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>AgileData Podcast Episode</h2><p><a href="https://podcast.agiledata.io/e/trust-rules-to-validate-your-data-agiledata-engineering-pattern-3-episode-69/">https://podcast.agiledata.io/e/trust-rules-to-validate-your-data-agiledata-engineering-pattern-3-episode-69/</a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/trust-rules-to-validate-your-data-agiledata-engineering-pattern-3-episode-69/&quot;,&quot;text&quot;:&quot;Listen to the Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/trust-rules-to-validate-your-data-agiledata-engineering-pattern-3-episode-69/"><span>Listen to the Podcast Episode</span></a></p><div id="youtube2-gic-1iC55VI"
class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;gic-1iC55VI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/gic-1iC55VI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Mind Map</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EyTm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EyTm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 424w, https://substackcdn.com/image/fetch/$s_!EyTm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 848w, https://substackcdn.com/image/fetch/$s_!EyTm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 1272w, https://substackcdn.com/image/fetch/$s_!EyTm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!EyTm!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png" width="1200" height="1793.4065934065934" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:2176,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:2131289,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167964917?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EyTm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 424w, https://substackcdn.com/image/fetch/$s_!EyTm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 848w, https://substackcdn.com/image/fetch/$s_!EyTm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 1272w, 
https://substackcdn.com/image/fetch/$s_!EyTm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1cc7bcb7-8f0e-4a11-9ec6-535a17eb48c4_5712x8536.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane</strong>: Welcome to the AgileData Podcast. I'm Shane Gibson.</p><p><strong>Nigel</strong>: And I'm Nigel Vining.</p><p><strong>Shane</strong>: Hey, Nigel, another episode of AgileData Engineering Patterns. Today we are going to talk about trust rules at a high level this time, because this one has patterns within patterns within patterns. 
So let's start from the beginning.</p><p><strong>Nigel</strong>: Cool. So trust rules are one of the banes of engineers' lives, but we passionately believed they should be easy, largely automated and baked into the product from day one. Trust rules were something we did by default, not as an exception after the fact.</p><p>Trust rules are effectively data validation rules, as I talked about in our previous bytes. When new data turns up, we refresh a table. Then, as I said, we use PubSub to say, "Hey, I'm done. What do you want me to do now?" One of the things we do off the end of a table refresh, because new data's turned up, is basically to turn around and say, cool, new data's gone into the table.</p><p>We need to run data validation on that data to check it's fit for purpose. So we then run a series of checks. Largely they're automated things like: are the business keys unique? Do we have a business effective date for every row? Those run every time, regardless.</p><p>You can't turn them off. You can't change them. They are baked in, 'cause if we have duplicate rows, then we've got a problem all the way downstream. The second level of checks are user-specified ones. A user may say, "Hey, I really care about this email column. I need to make sure all the values loaded into here are valid emails."</p><p>So the user will effectively say, "Hey, run an email validation check when this table is loaded." Once we've run those checks, we grab the results and persist them. We report to notify users if any of them fail and how many fail, and we keep track of them so we can see the history for every table, down to a column level: what validation rules have been run against it, and all the successes and failures.</p><p>That's it in a nutshell. 
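</p><p>The flow Nigel describes (baked-in checks that always run, opt-in user rules, every result persisted as data, and failures surfaced according to severity) can be sketched in a few lines of Python. This is a minimal, hypothetical sketch, not AgileData's implementation: it assumes rows arrive as plain dictionaries, writes results to an in-memory SQLite table, and invents the rule names, the <code>error</code>/<code>warning</code> severities and the <code>slack</code>/<code>report</code> channels purely for illustration. The email check is a deliberately loose pattern, not a full address validator.</p>

```python
import re
import sqlite3
from dataclasses import astuple, dataclass
from datetime import date, datetime, timezone

# Hypothetical sketch: rule names, severities and channels are illustrative only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # loose check, not RFC 5322


@dataclass
class RuleResult:
    table_name: str
    column_name: str
    rule_name: str
    severity: str     # assumption: "error" for baked-in rules, "warning" for user rules
    failed_rows: int
    passed: bool
    run_at: str       # ISO-8601 UTC timestamp of the run


def count_duplicate_keys(rows, column):
    """Baked-in rule: count rows whose business key has already been seen."""
    seen, duplicates = set(), 0
    for row in rows:
        key = row.get(column)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return duplicates


def count_nulls(rows, column):
    """Baked-in rule: count rows with no value in the column."""
    return sum(1 for row in rows if row.get(column) is None)


def count_invalid_emails(rows, column):
    """User rule: count non-null values that do not look like an email address."""
    values = (row.get(column) for row in rows)
    return sum(1 for v in values if v is not None and not EMAIL_PATTERN.match(str(v)))


# Library of opt-in rules a user can attach to a column (hypothetical names).
USER_RULES = {"valid_email": count_invalid_emails}


def run_trust_rules(table_name, rows, business_key, effective_date, user_rules):
    """Run the baked-in rules (always) plus any user rules (opt-in) after a refresh."""
    run_at = datetime.now(timezone.utc).isoformat()
    checks = [
        (business_key, "unique_business_key", "error", count_duplicate_keys),
        (effective_date, "effective_date_not_null", "error", count_nulls),
    ]
    checks += [(col, name, "warning", USER_RULES[name]) for col, name in user_rules]
    results = []
    for column, rule_name, severity, check in checks:
        failed = check(rows, column)
        results.append(
            RuleResult(table_name, column, rule_name, severity, failed, failed == 0, run_at)
        )
    return results


def persist(conn, results):
    """Store every result as a row so the history can be reported on later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trust_rule_results ("
        "table_name TEXT, column_name TEXT, rule_name TEXT, severity TEXT, "
        "failed_rows INTEGER, passed INTEGER, run_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO trust_rule_results VALUES (?, ?, ?, ?, ?, ?, ?)",
        [astuple(result) for result in results],
    )
    conn.commit()


def notification_channel(result):
    """Route failures by severity; passing results need no notification."""
    if result.passed:
        return None
    return "slack" if result.severity == "error" else "report"


if __name__ == "__main__":
    rows = [
        {"customer_id": 1, "effective_date": date(2024, 1, 1), "email": "a@example.com"},
        {"customer_id": 1, "effective_date": None, "email": "not-an-email"},
    ]
    results = run_trust_rules(
        "history_customer", rows, "customer_id", "effective_date", [("email", "valid_email")]
    )
    persist(sqlite3.connect(":memory:"), results)
    for result in results:
        print(result.rule_name, result.failed_rows, notification_channel(result))
```

<p>The design choice worth noting is that the two baked-in rules are hard-coded inside <code>run_trust_rules</code>, so a caller cannot switch them off, while user rules are looked up by name from a small library, mirroring the split between non-negotiable and opt-in checks in the conversation above.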
</p><p><strong>Shane</strong>: So effectively we define the trust rules, which sit in our context plane. We then execute them when they need to be executed. We store the results of every execution as if it's data, and then we surface that data, those results, in any way we want to. Maybe in the app, maybe a report, maybe an alert coming out via Slack, depending on the severity of that failure.</p><p>It's interesting, the name trust rules. You often hear them called data quality or data validation. Why did we call it trust? Because these are the things that help people trust the data they're seeing. That's what we wanna do: we wanna make sure it's trustworthy. And that's why we picked trust rules as a name.</p><p>And then the other thing is, if you're a data engineer, there's a bunch of automated tests you always have to build. Like you said, are the keys unique? Do we have duplicate values? All those kinds of things. As part of the trust rules pattern, we just automated them, so as the user of the system, I don't have to care.</p><p>If I do something dumb, the system tells me I've done something dumb and it stops me. I don't have to keep remembering that there's a core set of things I need to do, because the machine does it for me. I think that's part of it: I don't need to put in the effort, and I don't need to remember to do it.</p><p>It just gives me a lovely message when I'm doing the wrong thing, and I change what I do. So I think that was actually one of the biggest wins for us when we implemented this pattern.</p><p><strong>Nigel</strong>: Yeah, exactly. The built-in rules by default are fantastic, because you can onboard new data, chuck it in, tick a few boxes and get that pipeline basically productionised within an hour or two.</p><p>And then you don't have to go, oh, I need to make sure that's unique. Oh, I need to make sure that's not null. 
Or I need to make sure that's a proper sequence. Because effectively we say: Shane has just onboarded a history tile, and these are the things a history tile has to have checked to be safe.</p><p>And we do those checks, and it doesn't actually matter what data's going into that tile, because we still need to make sure we've got a unique business key for those rows. We need to make sure the effective dates of those rows are correct, and we're checking for nulls in some strategic places. So for all of those things, there's no context you have to worry about.</p><p>It's not an afterthought; it happens automatically. You just load your data and use it, and the checks are taken care of for you. </p><p><strong>Shane</strong>: And then there are some things I do care about. So then I use the user-defined trust rules, and I can create checks like: is it an email address? Is it a phone number? All those regex checks on a field to make sure it's got one.</p><p>I need all those kinds of, not custom, but commonly applied ones. But I don't want them applied every time. And one of the keys we found was that the logic for those user-defined trust rules (effectively the regex, or the mask, or the formula to say whether this thing is that thing) is readily available out there.</p><p>There's a large number of open source products where you can basically grab that library of data quality rules and apply it yourself. You don't actually have to reinvent the wheel for that one. But from memory, in the beginning, we did actually use a Google Cloud service to define, build and execute all the data quality checks.</p><p>And then a while ago, I think it was a year or two ago, we actually brought it back into the core of our context plane and our execution patterns. Why was that? </p><p><strong>Nigel</strong>: Yeah, that's right. We started out using an open source product that Google wrapped. It was basically a data quality engine defined by YAML files, which described all the tests (basically the SQL that would be run). That was pointed at tables, and it executed those tests and produced the results.</p><p>And that sort of got us up and running and gave us a flavor of how these things work. But it became apparent that it was quite technical to maintain and run. And we wanted to pull trust rules back into the app and make it as simple as ticking a box, rather than having to configure a separate product outside.</p><p>The second thing was that it was relatively a little bit more expensive. With the way the validations were run, we had a little bit less control over partition pruning and using clustered columns. So effectively we took the pattern, the open source pattern, and basically wrote our own version of it.</p><p>We followed some of it and wrote our own, and effectively we sped up the tests. We can do a lot more tests in less time, 'cause there was less building on the fly happening. And they tended to be cheaper, because we were doing a lot smarter partition pruning: we knew the watermark of the data we'd just loaded, so we could effectively pass in, hey, we've just loaded 14 days' worth of data. You don't need to check prior to those 14 days, 'cause we've already done that. So we did a whole lot of smart stuff to speed up those tests and make them cheaper. And that's effectively what we've had now for, oh, it's probably coming up two years. It's quite a while.</p><p><strong>Shane</strong>: I think that's probably the core anti-pattern that we see, isn't it: data quality on everything, everywhere, every time. The idea that you have to scan the full table for every data quality test is an anti-pattern, because there are certain tests where the data becomes immutable. That test result will never change.</p><p>So you can run it, store it. 
And then there are other tests where, for example, you want to check how many distinct values there are in a particular column. Then you're going to have to do a table scan, or do something smart to reduce the cost of running that. </p><p><strong>Nigel</strong>: Yeah, and that's what I mean: we know those because we have architected the tables themselves.</p><p>We know which tables are partitioned, and those are the ones we will partition-scan for the data. We know which ones are clustered, so they're typically the ones where we're going to look at every key, like a concept table of customer IDs. Because we automatically cluster on the customer ID, an is-unique check on that clustered column is basically a freebie, because under the covers the database already has the data organised by that column.</p><p>We don't have to do a lot of work to do that. So on a concept, we only care that the key is unique.</p><p>We are scanning a window of data in multiple partitions because we're looking at actual attributes, like you said, email addresses. But we don't need to look at the email addresses from last month, because we've already checked those. We only need to look at the email addresses that have turned up today or yesterday, in the current window, and we can do that very cheaply.</p><p>'Cause basically we are saying: just grab the email address column for two days, three days, and check it. It's a tiny shard of data, and it works really well.</p><p><strong>Shane</strong>: I think that's one of the benefits of using a columnar storage engine under the covers, like BigQuery: we just grab one column out of the entire table, for records from a small period of time.</p><p>And therefore the compute cost, the query cost, is tiny. And the other problem with everything, everywhere, every time is noise. 
And when I first started off, because it was so easy for me to create trust rules, I would add them everywhere and never look at them. Then I'd start getting a massive number of alerts, because a column I really didn't care about that much, or that we didn't need to validate for any trust reason, would just keep giving me constant problems.</p><p>It wasn't that the data was untrustworthy; it was that the data didn't quite match that rule, but nobody cared about that rule. Nobody was gonna go fix it in the source system. And so that signal-to-noise ratio is always a problem. We could go through lots of examples in more detail around how we do trust rules.</p><p>One of the trust rules we'll talk about in the future is the idea of load patterns: using the load patterns for tables to determine when data hasn't turned up when the organization expects it to. So again, you can use this trust rule results data to solve many problems, but the key is I only apply them where they have value.</p><p>Otherwise, you spend money where you don't need to. And you've gotta go manage thousands of messages, and you're just gonna ignore them and miss the ones you really care about. </p><p><strong>Nigel</strong>: Exactly. Yep. Keep it simple. </p><p><strong>Shane</strong>: Yeah. Reduce that complexity. All right. That's another AgileData Engineering Pattern in the can. 
I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[Union two or more tables together automatically, AgileData Engineering Pattern #2]]></title><description><![CDATA[The Automated Table Unioning pattern automatically combines two or more tables by intelligently looking up column names and data types to generate safe SQL under the covers]]></description><link>https://agiledata.info/p/union-two-or-more-tables-together</link><guid isPermaLink="false">https://agiledata.info/p/union-two-or-more-tables-together</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Thu, 03 Jul 2025 23:27:36 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1ee59d0a-164c-483c-85d8-454a808517aa_1456x259.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1><strong>Automated Table Unioning</strong></h1><h2><strong>Quicklinks</strong></h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/167481120/pattern-description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/agiledata-podcast-mind-map">AgileData Podcast Mind Map</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167481120/agiledata-podcast-episode-transcript">AgileData Podcast 
Transcript</a></strong></p></blockquote><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The <strong>Automated Table Unioning pattern</strong> automatically combines two or more tables by <strong>intelligently looking up column names and data types</strong> to <strong>generate safe SQL under the covers</strong>. It supports <strong>disparate data sources</strong>, such as those from multiple publishers in a data clean room, and creates a <strong>view</strong> or <strong>incrementally loads the unified data into a physical table</strong> while tracking load watermarks to prevent duplicates.</p><h2><strong>Pattern Context Diagram</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zHce!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zHce!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 424w, https://substackcdn.com/image/fetch/$s_!zHce!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 848w, 
https://substackcdn.com/image/fetch/$s_!zHce!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 1272w, https://substackcdn.com/image/fetch/$s_!zHce!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zHce!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png" width="1200" height="204.3956043956044" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:248,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:96335,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167481120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zHce!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 424w, 
https://substackcdn.com/image/fetch/$s_!zHce!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 848w, https://substackcdn.com/image/fetch/$s_!zHce!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 1272w, https://substackcdn.com/image/fetch/$s_!zHce!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37e8d90f-e9b6-4bf1-b464-68f1436f60d0_2123x361.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>Automated Table Unioning</strong> (or Union Tables Pattern)</p><h3>The Problem It Solves</h3><p>You know that moment when you need to combine data from multiple sources into one cohesive view, but the thought of writing complex SQL for unioning gives you a headache? Especially when dealing with tables that have hundreds of columns and potentially differing column names or data types, manually handling the complexity is a huge mission. This pattern solves the problem of needing to "stack together" multiple tables automatically, without requiring users to write "horrible SQL". It streamlines the process of consolidating data from various origins into a single, unified view that end customers can easily consume.</p><h3>When to Use It</h3><p>Use this pattern when:</p><ul><li><p>You need to <strong>combine two or more tables</strong> to create a single, unified dataset.</p></li><li><p>The <strong>complexity of managing column names and data types</strong> during the unioning process is too high for manual SQL.</p></li><li><p>You are receiving <strong>data from multiple disparate 
sources</strong> (e.g., different publishers in a data clean room) that needs to be consolidated for a single customer view.</p></li><li><p>The data coming from these different sources is <strong>"roughly similar"</strong> in its underlying structure or intent.</p></li><li><p>You want to <strong>avoid writing extensive, repetitive, and potentially error-prone SQL</strong> for union operations.</p></li><li><p>Input tables have a <strong>large number of columns</strong> (e.g., over a hundred), making manual column mapping or selection impractical.</p></li></ul><h3>How It Works</h3><h4>Trigger:</h4><p>The pattern is initiated when:</p><ul><li><p>There is a need to combine multiple tables from various sources into a single, comprehensive table or view.</p></li></ul><h4>Inputs:</h4><ul><li><p>Two or more tables that contain related or similar data.</p></li><li><p>These tables might originate from different "tenancies" or "projects".</p></li></ul><h4>Steps:</h4><ol><li><p><strong>User Selection</strong>: The user selects a "driving table" and then specifies other tables to union with it.</p></li><li><p><strong>Metadata Lookup</strong>: Under the covers, the system performs a comprehensive dictionary lookup of the data types and column names across all selected tables. This lookup can extend to tables in other projects if shared tables are used, ensuring the most up-to-date metadata is acquired.</p></li><li><p><strong>Safe SQL Generation</strong>: The system then intelligently works out the "safe SQL" required to union the tables, ensuring that column names and data types are correctly matched and aligned.</p></li><li><p><strong>Robust Column Matching</strong>: The logic for matching columns is robust, handling inconsistencies like mixed case or reversed column names, making it "a whole lot smarter" than a simplistic string match.</p></li><li><p><strong>SQL Execution &amp; Output</strong>: The generated SQL is executed, loading data into downstream 
"consumption tables".</p></li><li><p><strong>Physicalisation &amp; Incremental Loading</strong>: While the first version created a view, the pattern was later iterated to "physicalisation", where it creates a new physical table. This table is then incrementally loaded daily, keeping track of "load watermarks" for each source table and safely appending new data without inserting duplicates.</p></li><li><p><strong>Column Exclusion Choice</strong>: The user retains the choice to exclude specific columns that they do not wish to bring through to the final unioned table.</p></li></ol><h4>Outputs:</h4><ul><li><p>A single, combined, and incrementally loaded table that stacks data from multiple tables, ready for consumption.</p></li></ul><h3>Why It Works</h3><p>This pattern works like a smart, automated data assembler. Instead of you manually figuring out how to fit different pieces (tables) together, it automatically assesses their shapes (column names) and materials (data types), then connects them precisely. It's akin to having a super-efficient robot manage all the complex plumbing behind the scenes, so the data lines up without anyone having to hand-write the intricate SQL.</p><p>It "just works," providing a "magical" experience by automating a traditionally time-consuming and error-prone task. By automatically handling schema alignment and ensuring safe, duplicate-free incremental loads, it significantly reduces manual effort and improves data consistency, ultimately saving "hours" of development time. The iteration to physicalisation also makes it highly cost-effective for large volumes of data, because downstream queries read a stored table instead of re-running the union every time a view is queried.</p><h3>Real-World Example</h3><p>Imagine running a <strong>data clean room</strong> where data arrives from five different publishers, each in their own private data tenancy. 
Each publisher's data might be in a separate table, and after cleaning and anonymising sensitive information, you need to combine all five (or even ten) tables into a <strong>single, unified table</strong> for your end customer to report off. Instead of writing a massive, complex SQL query to union these tables, especially if each table has 100 or more columns, the Automated Table Unioning pattern allows you to simply say, "pick this table and union it," and the system handles all the underlying complexity automatically. The result is one easily consumable table for the customer, without the need for manual SQL coding or tedious column mapping.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Unioning Unrelated Data</strong>: The primary anti-pattern is using this feature to combine tables that have <strong>no inherent relationship or similar data intent</strong>. While the pattern will technically run and produce valid SQL, if you "grab three random tables and union the hell out of it," the resulting table will likely be "90% null cells" because the columns won't match, and it won't meet any meaningful reporting or analysis purpose. It will do "the job, but probably not the job that you wanted to do".</p></li><li><p><strong>Lack of Human Oversight</strong>: Relying purely on automation without a human in the loop to understand the underlying data relationships and intent can lead to meaningless outputs.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Build When Needed</strong>: This pattern was developed out of a direct need, when unioning by hand was too difficult and the task needed automating. Adopt it strategically when a clear problem of complex unioning arises.</p></li><li><p><strong>Iterate and Evolve</strong>: Be prepared to iterate on the pattern based on performance, cost, and changes in other related patterns. 
For example, the pattern evolved from creating views to physicalising tables for cost efficiency, and its column matching logic was enhanced to accommodate changes in automated field naming conventions and shared data practices.</p></li><li><p><strong>Understand Data Relationships</strong>: Ensure the data you are about to union is genuinely "similar" and has a logical relationship that meets your reporting intent.</p></li><li><p><strong>Leverage Exclusion Choices</strong>: Utilise the ability to selectively exclude columns you don't need, which helps manage the width of the final table, especially with <strong>inputs having many columns.</strong></p></li></ul><h3><strong>Related Patterns</strong></h3><ul><li><p><strong>Data Ops Platform</strong>: Automated Table Unioning is one feature within the broader data operations platform.</p></li><li><p><strong>Shared Tiles</strong>: The pattern had to be iterated to work with "shared tiles," where data is safely shared across different tenancies or projects.</p></li><li><p><strong>Automated Field Naming Convention</strong>: Changes to this convention necessitated updates to the unioning pattern's column matching logic.</p></li><li><p><strong>Append Pattern</strong>: The incremental loading aspect of the union pattern leverages an "append pattern" to safely add new data without inserting duplicates.</p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Automated Table Unioning</p><h3>Headline</h3><p>New Automated Table Unioning Feature Simplifies Data Consolidation for Users, Eliminating Complex SQL</p><h3>Introduction</h3><p>We're thrilled to announce the launch of our new Automated Table Unioning capability, a powerful feature that allows you to automatically stack data from two or more tables into a single, unified table. 
This is designed for anyone who needs to effortlessly stack data from multiple similar tables, such as those from multiple publishers in a data clean room, without having to write intricate or "horrible SQL". It delivers a "magical" experience by making data integration both easy and efficient.</p><h3>Problem</h3><p>"As a data analyst, I constantly needed to bring together data from different sources that could have over a hundred columns and inconsistent naming conventions. I hated writing pages of really horrible SQL just to union tables, especially when dealing with data from multiple publishers in our data clean room that needed to be stacked together into one single table. It was a huge mission that wasted hours of my time trying to manually map columns."</p><h3>Solution</h3><p>The Automated Table Unioning feature streamlines this process by allowing you to simply select a "driving table" and then specify other tables to union with it. </p><p>Under the covers, the system performs a comprehensive dictionary lookup of data types and column names across all selected tables, automatically generating the "safe SQL" required for the union. </p><p>It robustly handles inconsistencies like mixed case or reversed column names, making the matching "a whole lot smarter". </p><p>You also have the choice to exclude specific columns you don't wish to bring through to the final unioned table. 
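</p><p>The matching and exclusion behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions, not AgileData's implementation: the <code>schemas</code> dictionary stands in for the metadata lookup, "reversed column names" is read as the same words in a different order, tables that disagree on a column's type are widened to <code>STRING</code>, the generated SQL targets BigQuery syntax, and the helpers <code>tokens</code>, <code>normalise</code> and <code>build_union_sql</code> are hypothetical.</p>

```python
import re


def tokens(name):
    """Split a column name into lower-case alphanumeric tokens."""
    return [t for t in re.split(r"[^0-9a-z]+", name.lower()) if t]


def normalise(name):
    """Hypothetical matching key: case-insensitive and token-order-insensitive,
    so 'Product_Name', 'product name' and 'name_product' all match."""
    return "_".join(sorted(tokens(name)))


def build_union_sql(schemas, driving_table, exclude=()):
    """Generate BigQuery-style UNION ALL SQL (a sketch, not AgileData's code).

    schemas maps table name -> list of (column_name, data_type) pairs, e.g. as
    a dictionary lookup such as INFORMATION_SCHEMA.COLUMNS might return them."""
    excluded = {normalise(c) for c in exclude}
    order = [driving_table] + [t for t in schemas if t != driving_table]

    # Output columns: the driving table's first, then any only other tables have.
    out = {}  # match key -> {"name": output column name, "types": set of types}
    for table in order:
        for col, dtype in schemas[table]:
            key = normalise(col)
            if key in excluded:
                continue
            entry = out.setdefault(key, {"name": "_".join(tokens(col)), "types": set()})
            entry["types"].add(dtype.upper())

    # Assumption: when the tables disagree on a type, fall back to STRING.
    target = {k: (next(iter(v["types"])) if len(v["types"]) == 1 else "STRING")
              for k, v in out.items()}

    selects = []
    for table in order:
        present = {normalise(c): (c, d.upper()) for c, d in schemas[table]}
        cols = []
        for key, entry in out.items():
            name, want = entry["name"], target[key]
            if key not in present:
                # Column missing from this table: pad with a typed NULL.
                cols.append(f"CAST(NULL AS {want}) AS {name}")
                continue
            src, have = present[key]
            expr = f"`{src}`" if have == want else f"CAST(`{src}` AS {want})"
            cols.append(f"{expr} AS {name}")
        selects.append("SELECT " + ", ".join(cols) + f" FROM `{table}`")
    return "\nUNION ALL\n".join(selects)


if __name__ == "__main__":
    example = {
        "consume.adt_products": [("product_id", "INT64"), ("Product_Name", "STRING"),
                                 ("price", "FLOAT64")],
        "consume.dbt_products": [("PRODUCT_ID", "STRING"), ("name_product", "STRING"),
                                 ("internal_note", "STRING")],
    }
    print(build_union_sql(example, "consume.adt_products", exclude=["internal_note"]))
```

<p>In the demo, <code>consume.dbt_products</code> spells its columns <code>PRODUCT_ID</code> and <code>name_product</code>; both are matched to the driving table's <code>product_id</code> and <code>product_name</code>, the ID is cast to <code>STRING</code> because the two tables disagree on its type, the missing <code>price</code> column is padded with a typed <code>NULL</code>, and the excluded <code>internal_note</code> never appears. Incremental loading against per-table watermarks is a separate step and is not shown.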
</p><p>This capability also incrementally loads the final output into a physical table, keeping track of "load watermarks" for each source table and safely appending new data without inserting duplicates, ensuring both cost-effectiveness and up-to-date information without manual intervention.</p><h3>Data Platform Product Manager</h3><p>"This capability is a game-changer for our data platform, significantly enhancing maintainability and trust by automating complex data integration tasks that previously required substantial manual effort. It has proven to be incredibly cost-effective at scale and highly adaptable to evolving data patterns and shared data practices across tenancies."</p><h3>Data Platform User</h3><p>"This feature is truly magical! I can just pick my tables and say 'union this,' and it just works, saving me hours of tedious SQL writing and column mapping. I get one clean, combined table exactly how I need it, without the headache, allowing me to focus on analysis rather than data preparation."</p><h3>Get Started</h3><p>The Automated Table Unioning capability is available today within the AgileData platform. Simply select your desired tables and apply the union rule step to effortlessly stack your data. 
Talk to your data platform product manager for more details and to access this powerful feature for your data.</p><h2>AgileData App / Platform Example</h2><h3>Context Set via Change Rule UI</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Eho!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Eho!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 424w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 848w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 1272w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Eho!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png" width="1456" height="880" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:155360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167481120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Eho!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 424w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 848w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 1272w, https://substackcdn.com/image/fetch/$s_!4Eho!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4023a9e2-bfb5-4df0-966f-d42342241b76_1873x1132.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>AgileData Pseudo Code Generated</h3><pre><code>GIVEN new or updated records in the consume.adt_products input tile

UNION the consume.dbt_products tile

UNION the consume.xyz_products tile

AND a business effective date of TIMESTAMP(effective_date)

THEN populate the all_products consume tile

USING the replace pattern</code></pre><h2>AgileData Podcast Episode</h2><p><a href="https://agiledata.podbean.com/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67-1751583778/">https://agiledata.podbean.com/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67-1751583778/</a><br></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://agiledata.podbean.com/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67-1751583778/&quot;,&quot;text&quot;:&quot;Listen to the Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://agiledata.podbean.com/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67-1751583778/"><span>Listen to the Podcast Episode</span></a></p><div id="youtube2-vqFNUIdc73c" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;vqFNUIdc73c&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/vqFNUIdc73c?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Mind Map</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZFl4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ZFl4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 424w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 848w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 1272w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZFl4!,w_2400,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png" width="1200" height="2190.6593406593406" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;large&quot;,&quot;height&quot;:2658,&quot;width&quot;:1456,&quot;resizeWidth&quot;:1200,&quot;bytes&quot;:2260249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://agiledata.substack.com/i/167481120?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:&quot;center&quot;,&quot;offset&quot;:false}" class="sizing-large" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZFl4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 424w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 848w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 1272w, https://substackcdn.com/image/fetch/$s_!ZFl4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad10537-412f-4d51-91f1-009e4a5ee0f8_4949x9036.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>AgileData Podcast Episode Transcript</h2><p><strong>Shane:</strong> Welcome to the Agile Data Podcast. I'm Shane Gibson. </p><p><strong>Nigel:</strong> And I'm Nigel Vining. </p><p><strong>Shane:</strong> Hey, Nigel. We're onto our second agile data engineering Pattern, so today we're gonna talk about unioning two or more tables together automatically. Take me away with what it is and why I'd care. </p><p><strong>Nigel:</strong> Cool, thanks Shane. 
This one was topical this week because Snowflake made an announcement around basically this very feature: unioning tables together and taking care of the complexity under the covers. I thought that was funny, because it's something that crossed our path going back about four years now. We needed the ability to union two or more tables together and have the complexity taken care of under the covers. Now, the complexity is that when you union two tables together, you need to take into account the column names and the data types to line all the data up correctly. Usually this is handled in the SQL, but for an end user like Shane, working out that SQL is a step too far. It's quite a mission sometimes, considering that some of the tables may have, you know, 100 columns in them and some different data types. </p><p>So we came up with this pattern called, basically, union tables, and it's a rule. It lets Shane pick a driving table, then pick another table and say union this, and pick another table if he wants and say union this. Under the covers we do basically a big dictionary lookup of the data types and the columns in those tables. </p><p>Then we work out what the safe SQL is to union them together: basically, make sure the columns match up and make sure the data types match up. Then we produce the SQL, it runs, and it does all the normal stuff that loads our downstream consumption tiles. So that's it in a nutshell. </p><p><strong>Shane:</strong> And I was just looking at this: we actually built and released it on the 14th of June, 2022. So the interesting thing for me is we built that feature way back then because we needed it. It comes back to one of the overarching patterns of the data ops platform: build it when you need it. </p><p>When something is too hard or needs to be automated, then we build it and we use it. If we don't need it, we don't build it until we need it. 
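</p><p>As an editorial illustration of the lookup Nigel describes above (read the column names and data types of each tile, then generate SQL that lines them up), here is a minimal Python sketch. It is not AgileData's implementation: the function names, the case-insensitive name matching, the cast-to-STRING rule for conflicting types, and the NULL padding for missing columns are all assumptions made for this example.</p>

```python
# Hypothetical sketch of column-aligned unioning; not AgileData's implementation.
# Assumptions: each schema maps column name -> BigQuery type, names match
# case-insensitively, conflicting types are cast to STRING, and columns a
# table lacks are padded with typed NULLs.
from typing import Dict, Optional, Set


def normalise(name: str) -> str:
    """Simplified name matching: trim whitespace and ignore case."""
    return name.strip().lower()


def build_union_sql(schemas: Dict[str, Dict[str, str]], exclude: Optional[Set[str]] = None) -> str:
    skip = {normalise(c) for c in (exclude or set())}

    # "Dictionary lookup": collect every column (first-seen order) and the types it has.
    seen_types: Dict[str, Set[str]] = {}
    for schema in schemas.values():
        for column, column_type in schema.items():
            key = normalise(column)
            if key not in skip:
                seen_types.setdefault(key, set()).add(column_type.upper())

    # One target type per column: keep it if every table agrees, otherwise use STRING.
    target_types = {
        key: (next(iter(types)) if len(types) == 1 else "STRING")
        for key, types in seen_types.items()
    }

    selects = []
    for table, schema in schemas.items():
        by_key = {normalise(c): c for c in schema}
        parts = []
        for key, target in target_types.items():
            source = by_key.get(key)
            if source is None:
                parts.append(f"CAST(NULL AS {target}) AS {key}")  # column missing here
            elif schema[source].upper() == target:
                parts.append(f"{source} AS {key}")  # same name and type: pass through
            else:
                parts.append(f"CAST({source} AS {target}) AS {key}")  # reconcile type
        selects.append(f"SELECT {', '.join(parts)} FROM {table}")
    return "\nUNION ALL\n".join(selects)


print(build_union_sql({
    "consume.dbt_products": {"Product_ID": "INT64", "name": "STRING"},
    "consume.xyz_products": {"product_id": "STRING", "price": "NUMERIC"},
}))
```

<p>Called with two tiles from the rule above, it emits one SELECT per tile joined by UNION ALL, casting the conflicting <code>Product_ID</code> / <code>product_id</code> column to STRING and padding each tile's missing columns with typed NULLs. The optional <code>exclude</code> set mirrors Shane's ability to drop columns he doesn't want to bring through.</p><p>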
So for me, what this allowed me to do was grab multiple tables (tiles, in our language) and slam them together. And then I could come up with one view, magically, off that and just use it. </p><p>And the reason we needed to do that was we were running data clean rooms. So we'd have data coming in from multiple publishers, each one going into their own private tenancy. And then once we'd cleaned all the data and hidden the stuff that we needed to hide from our customer, we needed to slam all those tables together. </p><p>So we'd bring in 2, 5, 10 tables, one from each publisher, and I needed to combine them into a single table. And I didn't want to write a whole lot of really horrible SQL to do that. So this Pattern effectively allowed me to say, pick this table, and union it. </p><p>And then all the complexity is done under the covers. </p><p><strong>Nigel:</strong> Yeah, exactly right. In the end it turned out to be quite a versatile pattern for that, because, yes, we have data arriving from different rooms, as you said. And then effectively you just want to put it all together, because the end customer just wants one view across all that data. </p><p>So we've already got a consistent format. Basically, as long as all the data comes up from the clean rooms roughly similar, the union pattern takes care of it, safely puts the tables together, and incrementally loads that table every day. That's the second part of the pattern under the covers: it keeps track of, effectively, the load watermarks for each of those tables in the consume layer, and when new data turns up it's basically a big append pattern, but done in a safe way where we check we're never inserting duplicates. </p><p>Yeah, it just works. It's actually quite a magical one. </p><p><strong>Shane:</strong> I was just reading our release notes. We actually iterated it to do that. So the first version of that pattern that we did, we actually created a view. 
Data was stored in the original tables, and we created a view across the top. As we got to a large volume of data, BigQuery still ran it really fast, but the cost went up. </p><p>Then we moved to physicalization, right? We changed it so that it actually creates a table at the end and you incrementally load that. So again, we iterated the Pattern as we found something in version one that didn't quite work. </p><p><strong>Nigel:</strong> Yeah, that's exactly right. The cost went up, so we changed the engineering a little bit and the cost came way down. Now that's quite a cheap pattern to run, because we only incrementally add a few partitions into that table each day, which is really effective. </p><p><strong>Shane:</strong> The second thing we did is, as we changed some of our other patterns, we had to iterate that union Pattern. For example, when we started using shared tiles, instead of data moving across tenancies, we shared data between tenancies in a safe way, and you had to iterate that Pattern. </p><p>And then the other one was we had an automated field naming convention for our consume tiles, and we decided that actually the readability of those columns, those field names, was not the best. </p><p>So we flipped it and changed the way we dynamically generated those field names. So again, when we did that, you had to add another set of logic to the union rule so that it could pick up both field formats. </p><p><strong>Nigel:</strong> Yes, so there are two things that went on, as you say. The first one was we were unioning in tables from other projects, so we needed to extend the metadata lookup wider, into the project that table was coming from, so that we had the most up-to-date list of column names and data types. We would pull them straight off the source project. And then there's the second one. 
</p><p>We changed some of our naming for the tiles going in. The first version was a very simplistic match: basically, if the end of the string matched, then it was the same column name. So we made that a lot more robust. We reworked it and made sure that the names could be a little bit inconsistent, with mixed case or with the column names reversed, and it would still find the correct match, because effectively we made it a whole lot smarter. </p><p>And that's the version we've been running now for probably about 18 months, I believe. That's still got legs, so we'll just keep running that as is. </p><p><strong>Shane:</strong> And the other thing you mentioned is some of the input tables we union have over a hundred columns, because in our consume layer we effectively use one big table as much as we can. So I'm literally grabbing five tables, each of a hundred columns, and saying union. It works out where those columns are the same and racks and stacks that data. </p><p>And then when the columns are different, it treats each one as a separate column. Then I can choose which columns I don't wanna bring through. So I may say don't do these ones, because effectively I don't want a table with 300 columns when I'm racking and stacking it. But again, I get that choice when I use that rule. </p><p>So that's really powerful for me. I don't have to actually go through and do a whole lot of column mapping. If the columns are the same, it picks them up, but I can then exclude the ones where I'm like, yeah, I don't need that this time. This one probably saves me hours, actually, because I just don't need to care. </p><p>It just does what I want. </p><p>So that's the Pattern that we use. When wouldn't you use this automated unioning Pattern?</p><p>I think for me, the key one is that I know the data I'm about to union is similar, because I know where it comes from and I know what we've done to it. 
I think the problem is if you grab two random tables whose data has no relationship and try to union them without any human being involved to look at it. </p><p>So just grab three random tables and union the hell out of them. It'll run, and it'll create a consume table that's actually the union of all of it. But I'm not sure it'll meet the intent of what I would actually need to report off. So I think for me, that's probably an anti-pattern: it just does it without a human being involved in that loop. </p><p><strong>Nigel:</strong> Yeah, it definitely allows you to do exactly that scenario: put three things that aren't related together, and it'll definitely union them. It'll run, it'll produce valid SQL, but 90 percent of that produced table will be null, because they don't match. </p><p><strong>Shane:</strong> I think that's the only one I can think of. The table versus view change was about performance. It's not really an anti-pattern, it's just something we needed to do. And the naming of the columns to help with the automated matching, they're just sub-patterns of the pattern itself. </p><p>So the main one is the ability to use it to slam and rack and stack tables that actually aren't related at all and have no shared data. It'll run, it'll do the job, but probably not the job that you wanted it to do. Alright, I think we got that one done. 
I hope everybody has a simply magical day.</p>]]></content:encoded></item><item><title><![CDATA[Orchestrating Dynamic Data Flows, AgileData Engineering Pattern #1]]></title><description><![CDATA[The Dynamic Data Flow Orchestration pattern dynamically generates and self-heals data flow manifests (DAGs) at runtime from a central context repository]]></description><link>https://agiledata.info/p/orchestrating-dynamic-data-flows</link><guid isPermaLink="false">https://agiledata.info/p/orchestrating-dynamic-data-flows</guid><dc:creator><![CDATA[Shagility]]></dc:creator><pubDate>Mon, 30 Jun 2025 04:13:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/29861a8b-9a2a-40c9-9077-6498d1258949_1618x1329.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Orchestrating Dynamic Data Flows</h1><h2>Quicklinks</h2><blockquote><p><strong><a href="https://agiledata.substack.com/i/167144707/pattern-description">Description</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/pattern-context-diagram">Context Diagram</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/agiledata-pattern-template">Pattern Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/press-release-template">Press Release Template</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/agiledata-app-platform-example">AgileData App / Platform Example</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/agiledata-podcast-episode">AgileData Podcast Episode</a></strong></p><p><strong><a href="https://agiledata.substack.com/i/167144707/agiledata-podcast-episode-transcript">AgileData Podcast Transcript</a></strong></p></blockquote><p></p><h2><strong>Agile Data Engineering Pattern</strong></h2><p>An AgileData Engineering Pattern is a repeatable, proven approach for solving a common data engineering challenge in a simple, consistent, and 
scalable way, designed to reduce rework, speed up delivery, and embed quality by default.</p><h2><strong>Pattern Description</strong></h2><p>The Dynamic Data Flow Orchestration pattern <strong>dynamically generates and self-heals data flow manifests (DAGs) at runtime from a central context repository</strong>, enabling adaptive, cost-effective, and low-latency data processing, primarily for micro-batching workloads.</p><h2><strong>Pattern Context Diagram</strong></h2><p>&#171;TBD&#187;<br></p><h2><strong>Pattern Template</strong></h2><h3>Pattern Name</h3><p><strong>Dynamic Data Flow Orchestration</strong></p><h3>The Problem It Solves</h3><p>You know that moment when you've updated a data rule or schema, only to find your data pipelines have broken or are out of sync, forcing hours of manual fixes? Or perhaps you're struggling with inconsistent DAG deployments, especially when multiple engineers are making changes, leading to "drift" in what should be running? This pattern addresses the recurring challenge of keeping data flows always up-to-date, reliable, and cost-effective, without constant manual intervention or the "syncing problem" between rules and physical DAGs.</p><h3>When to Use It</h3><p>Use this pattern when:</p><ul><li><p>You need data pipelines to <strong>self-heal</strong> and dynamically adapt to changes in rules, schemas, or dependencies.</p></li><li><p>You require your data lineage and manifests to be <strong>always up-to-date</strong> at runtime.</p></li><li><p>You are primarily working with <strong>microbatching</strong> workloads, where refresh latency is typically 15 minutes or more.</p></li><li><p>Your pipelines are often <strong>event-driven</strong>, starting when new data or files arrive.</p></li><li><p>You want to achieve <strong>low latency and low operational costs</strong> by avoiding persistent orchestration servers.</p></li><li><p>Maintaining <strong>auditability and preventing invalid changes</strong> in your data logic 
is critical.</p></li></ul><h3>How It Works</h3><h4>Trigger: </h4><p>The pattern is initiated when:</p><ul><li><p>The first piece of data arrives, triggering a refresh of a data table.</p></li><li><p>A scheduled execution occurs (e.g., a daily run at 7:00 AM).</p></li><li><p>A new row of data or file appears in a dependent upstream table, if the process is configured for "autosync".</p></li></ul><h4>Inputs:</h4><ul><li><p>Data transformation and loading rules, stored within a <strong>context database</strong>.</p></li><li><p>Configuration details, including attributes like "autosync" or "manual sync".</p></li></ul><h4>Steps:</h4><ol><li><p><strong>Define Rules in Context:</strong> Data rules and logic for transformations are created and stored in a central "context database".</p></li><li><p><strong>Initial Trigger and Lookup:</strong> When data arrives or a schedule triggers, a refresh of the relevant data table is initiated. Once loaded, the system pauses and performs a lookup in the context database to identify all downstream objects dependent on this table.</p></li><li><p><strong>Dynamic Manifest Generation:</strong> A <strong>manifest</strong> (or Directed Acyclic Graph - DAG) is built <strong>on-the-fly</strong>, outlining all necessary steps and their dependencies. This ensures the manifest always reflects the freshest logic.</p></li><li><p><strong>Parallel Job Execution:</strong> All identified jobs in the manifest are "seeded off" simultaneously. Each job is tagged, and they run in parallel.</p></li><li><p><strong>Pub/Sub Communication:</strong> As each job completes, it publishes an "I'm done" message via Pub/Sub. The system monitors these messages to track progress.</p></li><li><p><strong>Iterative Orchestration:</strong> Once a "driving table" (a key dependency for the next stage) is loaded, another config lookup occurs, generating a new manifest to continue the flow. 
This repeats until all steps in the manifest are completed.</p></li><li><p><strong>Change Validation and Self-Healing:</strong> Any changes to the context layer undergo validation before being deployed to production. If an upstream change (like adding a column) is detected, the system can automatically recreate tables, roll back watermarks, reload data, and revalidate rules downstream, handling 99% of common issues without manual intervention. Alerts are raised for changes that cannot be automatically resolved.</p></li></ol><h4>Outputs:</h4><ul><li><p><strong>Always up-to-date and reliable data flows</strong> that adapt dynamically to changes.</p></li><li><p>Significantly <strong>reduced manual intervention</strong> for pipeline failures.</p></li><li><p><strong>Low latency</strong> and <strong>cost-effective</strong> data processing due to serverless execution and no persistent orchestration server.</p></li><li><p>Data catalog and lineage graphs that are inherently <strong>accurate and current</strong> because they are driven directly from the context layer.</p></li></ul><h3>Why It Works</h3><p>This pattern works because it embraces a <strong>context-driven</strong> approach. Instead of static, manually managed DAGs that can get out of sync with changing rules, all data logic is stored in a central repository. Data flows are then <strong>dynamically generated at runtime</strong> from this context, much like a recipe that's always updated with the freshest ingredients, ensuring what's running is always the latest version.</p><p>The use of <strong>Pub/Sub messaging</strong> and <strong>serverless infrastructure (like BigQuery and Cloud Run)</strong> enables a "fire and forget" execution model. 
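</p><p>To make the lookup, seed, and "am I the last?" loop from How It Works concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the platform's code: a dict stands in for the context database, <code>queue.Queue</code> stands in for Pub/Sub "I'm done" messages, a thread pool stands in for serverless jobs, and every table and function name is hypothetical.</p>

```python
# Hypothetical sketch of runtime manifest generation; not AgileData's code.
# A dict stands in for the context database, queue.Queue stands in for
# Pub/Sub "I'm done" messages, and a thread pool stands in for serverless jobs.
import queue
from concurrent.futures import ThreadPoolExecutor

# Context: each target table maps to the upstream tables its rule reads from.
CONTEXT = {
    "design.customer": ["history.crm_customer"],
    "design.order": ["history.shop_order"],
    "consume.customer_orders": ["design.customer", "design.order"],
}


def build_manifest(loaded: set, done: set) -> list:
    """Fresh context lookup: every not-yet-run target whose inputs are all loaded."""
    return sorted(
        target
        for target, upstream in CONTEXT.items()
        if target not in done and all(table in loaded for table in upstream)
    )


def run_rule(target: str, bus: queue.Queue) -> None:
    # A real job would execute the transformation rule for `target` here.
    bus.put(target)  # publish an "I'm done" message


def orchestrate(already_loaded: set) -> list:
    loaded, done, waves = set(already_loaded), set(), []
    bus = queue.Queue()
    with ThreadPoolExecutor() as pool:
        manifest = build_manifest(loaded, done)
        while manifest:
            waves.append(manifest)
            for target in manifest:  # seed every job at once; they run in parallel
                pool.submit(run_rule, target, bus)
            pending = set(manifest)
            while pending:  # "Am I the last?" is only true once pending is empty
                finished = bus.get()
                pending.discard(finished)
                loaded.add(finished)
                done.add(finished)
            # Re-read the context after each wave so new or renamed rules are picked up.
            manifest = build_manifest(loaded, done)
    return waves


print(orchestrate({"history.crm_customer", "history.shop_order"}))
# [['design.customer', 'design.order'], ['consume.customer_orders']]
```

<p>With both history tables already loaded, the first manifest seeds the two design tiles in parallel, and a fresh lookup then finds the consume tile. Because the manifest is rebuilt from the context after every wave, a rule added to <code>CONTEXT</code> between waves would be picked up without redeploying anything, which is the self-healing behaviour this section describes.</p><p>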
This means there&#8217;s no constant, costly orchestration server running; jobs are spun up only when needed, then destroyed, leading to "very little latency" and "low costs".</p><p>Built-in <strong>validation and "stage gates"</strong> ensure that only valid changes are pushed to the production context, acting as a quality control filter that prevents broken pipelines before they even run. Furthermore, its <strong>self-healing</strong> nature means that common changes, like adding a column, are automatically handled, dramatically reducing manual firefighting and boosting trust in the data. This pattern effectively shifts the burden of dependency management from the data engineer to the automated system.</p><h3>Real-World Example</h3><p>Imagine a data engineer adds a new column to a table or renames an existing one. In older systems, this might require manual updates to numerous DAGs, leading to errors and delays. With Dynamic Data Flow Orchestration, the engineer simply updates the rule in the <strong>context database</strong>. The next time data arrives, or a scheduled execution occurs, the system detects this change via its validation checks. It then <strong>automatically recreates the affected table</strong>, rolls back its watermark, reloads the data, and <strong>revalidates all downstream rules</strong>. The data flows seamlessly adapt to the new structure, often handling 99% of such changes without any manual intervention, ensuring the data remains fresh and pipelines keep running smoothly.</p><h3>Anti-Patterns or Gotchas</h3><ul><li><p><strong>Streaming Data:</strong> This pattern is primarily designed for <strong>microbatching</strong> (data refresh latency of 15 minutes or more). For true real-time, row-by-row streaming, this pattern's natural breakdown into nodes and links could create bottlenecks. 
A different approach, like submitting the entire end-to-end transformation as a single code stream, or a "two-speed pipeline" (similar to a <strong>Lambda Architecture</strong>), might be required for lower-latency streaming needs.</p></li><li><p><strong>Skipping Context Definition:</strong> The system relies entirely on the context layer. Trying to create transformation code, tables, or schedules without first defining them in the context will not work, because the pattern forces all work to be based on the latest, validated version of the context.</p></li></ul><h3>Tips for Adoption</h3><ul><li><p><strong>Prioritise Context Definition:</strong> Ensure all data rules, logic, and dependencies are meticulously captured in the context database, as this is the single source of truth for dynamic flow generation and lineage.</p></li><li><p><strong>Leverage Validation:</strong> Implement robust validation mechanisms for any changes to the context layer. This ensures that invalid configurations don't reach production, preventing pipeline breaks. 
This includes practices such as peer review, Git reviews, pull requests (PRs), and data quality tests before context updates become active.</p></li><li><p><strong>Embrace Serverless:</strong> Utilise cloud services like Google Cloud's BigQuery, Pub/Sub, Cloud Functions, and Cloud Run to fully benefit from the low-cost, fire-and-forget execution model.</p></li><li><p><strong>Focus on Supporting Patterns:</strong> While the core orchestration pattern is robust, ensure that supporting patterns (like automated rebuilding, deployment, and destroy models for safe self-healing) are also "bulletproof" to avoid issues.</p></li><li><p><strong>Start with Microbatches:</strong> Begin by applying this pattern to workloads that fit the microbatching use case (latency of 15 minutes or more) to build familiarity and confidence.</p></li><li><p><strong>Scalability Mindset:</strong> The pattern offers headroom for scaling (horizontal/vertical scaling of BigQuery instances, chunking data flows) if performance or cost issues arise, so be prepared to iterate.</p></li><li><p><strong>Trust the Automation:</strong> Allow the system to self-heal. 
It can handle 99% of common changes automatically, reducing the need for manual intervention.</p></li></ul><h3>Related Patterns</h3><ul><li><p><strong>Context Driven Development:</strong> This pattern is a direct application of the broader "context driven" principle, where all logic is dynamically generated from a central repository.</p></li><li><p><strong>Pub/Sub Messaging:</strong> Utilised as the core communication mechanism between processing steps, enabling the fire-and-forget execution and decoupled architecture.</p></li><li><p><strong>Layered Data Architecture:</strong> Operates within a typical layered data architecture (e.g., History, Design, Consume layers) common in modern data platforms.</p></li><li><p><strong>Lambda Architecture:</strong> A hybrid approach (streaming + batching) that might be adopted when the anti-pattern of real-time streaming is encountered, augmenting this pattern.</p></li><li><p><strong>Data Validation and Quality Gates:</strong> Integral to ensuring that changes to the context don't break downstream processes, encompassing peer review, Git PRs, and data quality tests.</p></li><li><p><strong>Deploy and Destroy Infrastructure:</strong> The underlying cloud infrastructure model (e.g., Cloud Functions, Cloud Run) that allows resources to be provisioned only for the duration of a job, contributing to cost efficiency.<br> </p></li></ul><h2><strong>Press Release Template</strong></h2><h3>Capability Name</h3><p>Dynamic Data Flow Orchestration</p><h3>Headline</h3><p>New Dynamic Data Flow Orchestration Ensures Self-Healing, Cost-Effective, and Always Up-to-Date Data Pipelines for Data Teams</p><h3>Introduction</h3><p>The Data Platform team is thrilled to announce the launch of our new <strong>Dynamic Data Flow Orchestration</strong> capability. This revolutionary approach automates the management of data pipelines, dynamically adapting to changes in data rules and dependencies. 
It&#8217;s designed for data engineers and platform users, ensuring data flows are always current, reliable, and efficiently processed.</p><h3>Problem</h3><p>&#8220;As a data engineer, I used to dread making changes to data pipelines. Things would constantly get out of sync, and I&#8217;d spend hours manually fixing failures just because a dependency changed or a new column was added. It was hard to trust that what was running was the freshest version of our data flows.&#8221;</p><h3>Solution</h3><p>Our new Dynamic Data Flow Orchestration leverages a <strong>context database</strong> to store all data rules and logic. When new data arrives, or a scheduled trigger occurs, the system dynamically generates a <strong>manifest on-the-fly</strong>, identifying all downstream dependencies and launching jobs in parallel. This means data flows <strong>self-heal</strong> and adapt automatically to changes like new rules or renamed tables, eliminating manual syncing and reducing failures.</p><p>By using <strong>Pub/Sub messaging</strong> and <strong>serverless infrastructure like BigQuery and Cloud Run</strong>, pipelines run with <strong>very little latency and significantly lower costs</strong>, as there&#8217;s no constant orchestration server running. Validation mechanisms ensure only valid changes update the production context, preventing breaks. The system will even automatically rebuild and refresh tables downstream if upstream changes are detected, generally taking care of 99% of common issues without manual intervention.</p><h3>Data Platform Product Manager</h3><p>&#8220;With Dynamic Data Flow Orchestration, we&#8217;ve fundamentally improved the maintainability and reliability of our data pipelines. 
The self-healing nature and always-up-to-date lineage significantly boost trust in our data, allowing our engineers to focus on value creation rather than constant firefighting.&#8221;</p><h3>Data Platform User</h3><p>&#8220;I love that I don&#8217;t have to worry if the data I&#8217;m using is fresh or if a pipeline has broken behind the scenes. This system just works! I can add a new rule or even rename a table, and the data flows seamlessly adapt without me having to lift a finger.&#8221;</p><h3>Get Started</h3><p>This capability is active across our data platform today, ensuring reliable and fresh data for all users. To learn more about how Dynamic Data Flow Orchestration ensures always-fresh and robust data, please contact your data platform product manager or visit agiledata.io for documentation.</p><h2>AgileData Podcast Episode</h2><p><a href="https://podcast.agiledata.io/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67/">https://podcast.agiledata.io/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67/</a></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://podcast.agiledata.io/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67/&quot;,&quot;text&quot;:&quot;Listen to Podcast Episode&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://podcast.agiledata.io/e/orchestrating-dynamic-data-flows-agiledata-engineering-pattern-1-episode-67/"><span>Listen to Podcast Episode</span></a></p><div id="youtube2-Y3YFdCax1NU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Y3YFdCax1NU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Y3YFdCax1NU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" 
allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>AgileData Podcast Episode Transcript</h2><p>Shane: Welcome to the Agile Data Podcast. I'm Shane Gibson. </p><p>Nigel: And I'm Nigel Vining. </p><p>Shane: Hey, Nigel. Today we're trying a new thing. We are going to start a series where each episode describes one Pattern that we use, and I'm thinking of them as pages in our Pattern library. The first one we wanna talk about is the way that we orchestrate the data flows in our way of working.</p><p>So why don't you start off by giving me an overview of the patterns that we use to engineer that particular task. </p><p>Nigel: So we do it a little bit differently. After many years' background using the likes of Composer and Airflow to deploy DAGs and create run manifests, we thought we'd be a little bit different and use a context-driven, generate-the-manifest-on-the-fly approach.</p><p>So effectively what we do is we create our data rules for transforming and loading data. We put them into our context database, and then the first piece of data that turns up effectively triggers a refresh of that data table, and this is where it changes from the usual.</p><p>So as soon as that table's loaded, we pause and do another lookup of the config, and we basically say, now that's loaded, what are the downstream objects that hang off this table? From that, we build a manifest on the fly. So we effectively say, great, we need to load this table, it depends on that one, so we'll load that one first.</p><p>So we build that manifest and then we seed off all those jobs. We basically set 'em off all at the same time. We attach a tag to each of them and then we wait for them to finish. They all run in parallel, and as each one finishes, it checks that manifest and goes, am I the last?</p><p>Nope. Am I the last? Nope. 
This continues until we get to the driving table, which starts the next flow. Once that table comes back and says, yep, loaded, we do another config lookup and start again: we create another manifest, and we keep repeating this until there's nothing left in the manifest.</p><p>The beauty of this approach showed up in our early days, when we did a lot of rebuilds and changed context constantly. Shane would introduce new context while jobs were already running. He'd delete tables or recreate them under different names. So we were getting lots of failures until we moved to this pattern of context and manifest on demand, where the system is always checking and saying, oh, a new piece of config has turned up.</p><p>There's another table to load, and it would effectively slot it into the mix. It was all quite seamless and painless for me. So that's our pattern in a nutshell.</p><p>Shane: And the reason I like it is that we think of each of our blobs of transformation code as a rule; that's the term we use. I can go and add a new rule.</p><p>So say there's a table sitting in history (think raw, or bronze), and I need to do something nasty to the data to bring it through into our design layer. I can go and add a rule: a small piece of logic that takes that data and moves it into our design layer. As soon as I do that, the next time there's a scheduled execution, or an execution triggered by a load, the manifest (the directed graph, or DAG, for that flow) dynamically rebuilds itself.</p><p>So it just self-heals. And as you said, when I break my own naming conventions, I can go and rename things, and again the data flows heal themselves. So effectively I can do the work I want to do, and I don't need to care about what the directed graph for those data flows looks like. 
I don't need to care about the dependencies, because the context engine, the pattern we use for orchestrating these data flows, is doing it for me, right? I just don't have to care. And importantly, it's doing it for you, because you don't have to care either, right? Effectively, we can make any change we want and the machine just takes care of it. For me, that was the biggest value.</p><p>But think about the alternatives. One of the core patterns we use, which we can talk about in another podcast, is this idea of being context-driven. Every piece of logic is stored in a repository, and everything that needs to execute is hydrated, or generated dynamically, from that repository.</p><p>What that means is that when we make a change, we can dynamically regenerate those data flows. The old way of doing it was that we'd hold that context in one place and have a physical instantiation of it as a set of DAGs, and then we'd have to keep them in sync. I want to change the rules? Okay, now I need to change the DAG as well.</p><p>And those things always used to get out of sync. We'd be like, that's not the way it's meant to work. Oh shit, there's a piece of code stored as a stored procedure somewhere as part of that DAG. The DAGs were effectively stored procedures, and that syncing problem was a real problem in the old patterns.</p><p>Nigel: Yeah. And even today, I occasionally struggle with consistent DAG deployment. If there are multiple engineers and developers making changes in that space, you can get a little bit of drift between what should be running and what's associated with other things, because there's change going on as commits happen. Whereas this way, we leave it to runtime.</p><p>As an object loads, we look at what's next. So it's always the freshest piece of lineage or manifest that you can have for any object. 
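</p><p>A minimal sketch of this runtime lookup, assuming the context layer can be reduced to an in-memory mapping from each object to the objects it reads from. The names <code>CONTEXT</code> and <code>build_manifest</code> are hypothetical and this is not AgileData's implementation; it only illustrates that the next manifest is derived from whatever config exists at the moment a table finishes loading, so a rule added mid-flight is picked up on the next lookup without redeploying a DAG.</p>

```python
# Hypothetical context layer: each object maps to the objects it reads from.
# Editing this mapping is the only "deployment" step; no DAG is generated up front.
CONTEXT = {
    "history.salesforce_accounts": [],
    "design.customer": ["history.salesforce_accounts"],
    "design.customer_address": ["history.salesforce_accounts"],
    "consume.customer_summary": ["design.customer", "design.customer_address"],
}


def build_manifest(loaded_table: str, context: dict[str, list[str]]) -> list[str]:
    """Called when `loaded_table` finishes: query the current context for the
    objects that read directly from it. Nothing is cached between calls."""
    return sorted(obj for obj, upstream in context.items() if loaded_table in upstream)


print(build_manifest("history.salesforce_accounts", CONTEXT))
# ['design.customer', 'design.customer_address']

# A rule added while flows are running is seen by the very next lookup.
CONTEXT["design.customer_contact"] = ["history.salesforce_accounts"]
print(build_manifest("history.salesforce_accounts", CONTEXT))
# ['design.customer', 'design.customer_address', 'design.customer_contact']
```

<p>Because nothing is compiled ahead of time, there is no deployed artefact that can drift away from the context.</p><p>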
It's always up to date because it's always querying our context layer to ask, hey, what's next? Then it starts the next one, and that's it until that one's finished.</p><p>So I like it for that, because it takes care of itself.</p><p>Shane: Also, all of the maps we generate (our data catalog, our lineage graphs, everything else) are driven off that context. In the previous generation, we would've treated all the code, the schemas, the structures, and the schedules as exhaust, and sucked them into the catalog to try to render them.</p><p>We flipped that model: you define the things in context, and everything else is generated from it. It's always up to date because I cannot do a piece of work without the context being entered. I cannot create transformation code. I cannot create the idea of a table. I cannot create the idea of a schedule.</p><p>I cannot orchestrate dependencies across tables without first creating that piece of context. So again, that forces us to make sure that anything I'm looking at is the latest version, because there is no choice.</p><p>Nigel: Yeah. Yep. And I guess the natural question others in the profession would ask is: what about when a change is made that breaks something?</p><p>This is an interesting one, because whenever changes are made, we validate them, and we don't allow updates to the context layer unless they're valid. A change won't update the latest production context until it's valid, so we effectively stop anything that's going to cause a problem from getting into the context layer.</p><p>It stays as a draft until it's validated. So when the manifest is generated, it always looks for the active record. It's fine for a draft to be floating around; the manifest won't touch it until it's valid, and then the draft flips over to replace the current production record. 
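</p><p>The draft-then-active behaviour can be sketched as two collections of context records, assuming a hypothetical <code>ContextRecord</code> type and a deliberately simple stand-in for validation (every upstream dependency must already be in production). The real checks discussed in this episode (peer review, data quality tests) are richer; the sketch only shows that manifest generation reads production records and never sees drafts.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ContextRecord:
    """Hypothetical context entry for one object (a table or a rule)."""
    name: str
    upstream: list[str] = field(default_factory=list)


def validate(draft: ContextRecord, production: dict[str, ContextRecord]) -> bool:
    """Stand-in check: every upstream dependency must already be in production."""
    return all(dep in production for dep in draft.upstream)


def promote(name: str, drafts: dict[str, ContextRecord],
            production: dict[str, ContextRecord]) -> bool:
    """Replace the production record with the draft, but only if the draft is valid."""
    if not validate(drafts[name], production):
        return False  # stays a draft; running flows never see it
    production[name] = drafts.pop(name)
    return True


def manifest_for(table: str, production: dict[str, ContextRecord]) -> list[str]:
    """Manifest generation reads production (active) records only."""
    return sorted(r.name for r in production.values() if table in r.upstream)


production = {"history.orders": ContextRecord("history.orders")}
drafts = {
    "design.order": ContextRecord("design.order", ["history.orders"]),
    "design.order_line": ContextRecord("design.order_line", ["history.order_lines"]),
}

print(manifest_for("history.orders", production))  # [] while design.order is a draft
print(promote("design.order", drafts, production))  # True
print(promote("design.order_line", drafts, production))  # False: upstream missing
print(manifest_for("history.orders", production))  # ['design.order']
```

<p>A draft that fails validation simply stays where it is, so a half-finished change cannot break a flow that is already running.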
</p><p>Shane: And then there's a bunch of attributes for that context.</p><p>When I define a table or a piece of code, I can set it to what we call auto sync, which means it will execute that manifest, that data flow, whenever a row turns up in one of its dependencies. Say Salesforce sends us some records. They land in the history table, so raw, or bronze. If the downstream object is set to auto sync, that arrival triggers the hydration of the manifest and runs it. But if I flip that piece of code or that table to manual sync, or custom sync, and say it's only going to run at 7:00 AM on a Monday, then that's when it triggers the hydration of the manifest and runs the code. So there's a bunch of attributes that are really important when you start moving all your data flow configuration back into a context layer.</p><p>Nigel: Yeah, absolutely, and that's a good point. You touch on allowing the orchestration to start based on a record turning up, a file turning up, or a schedule attached to it that says run it daily at 7:00 AM, which is quite a common pattern for pipelines. But 99% of our pipelines tend to start when something arrives.</p><p>A file or some data arrives, which triggers the pipeline to start and pull that data all the way through the layers.</p><p>Shane: And then the other pattern we use a lot in there is the idea of pub/sub. As you said, it's fire and forget. It figures out all the steps, fires them off into effectively a queue, and waits for the queue to run. So, as you always do with patterns, we're using other patterns to make this pattern really efficient and effective.</p><p>Nigel: Yeah. The driver, as I said at the start, is "I'm done, are you done, are we ready to run something?" Those are all Pub/Sub messages. We have a standard Pub/Sub message which basically says:</p><p>I'm done. And that's attached to everything in the execution layer. 
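</p><p>A rough sketch of the "I'm done" message loop, including the "am I the last?" check Nigel described earlier. It assumes a plain <code>collections.deque</code> standing in for a Google Cloud Pub/Sub topic, and jobs that report completion immediately instead of real BigQuery jobs; <code>on_done</code>, <code>start_manifest</code> and the manifest tags are hypothetical names, not the Pub/Sub client API.</p>

```python
from collections import deque
from itertools import count

# Hypothetical context: object -> objects it depends on.
CONTEXT = {
    "design.customer": ["history.crm"],
    "design.order": ["history.crm"],
    "consume.customer_orders": ["design.customer", "design.order"],
}

queue = deque()   # in-process stand-in for a Pub/Sub topic
manifests = {}    # manifest tag -> {"jobs": set, "remaining": set}
tag_ids = count(1)
run_log = []      # tables in the order they reported "I'm done"


def publish(message: dict) -> None:
    queue.append(message)


def downstream_of(tables: set[str]) -> list[str]:
    """Config lookup: every object that reads from any of the given tables."""
    return sorted(
        obj for obj, upstream in CONTEXT.items() if any(t in tables for t in upstream)
    )


def start_manifest(finished: set[str]) -> None:
    """Build the next manifest on the fly and fire every job in it at once."""
    jobs = downstream_of(finished)
    if not jobs:
        return  # nothing hangs off these tables: the flow is complete
    tag = f"manifest-{next(tag_ids)}"
    manifests[tag] = {"jobs": set(jobs), "remaining": set(jobs)}
    for job in jobs:
        # Fire and forget. A real job would publish this when BigQuery finishes;
        # in this sketch it is published immediately.
        publish({"table": job, "tag": tag})


def on_done(message: dict) -> None:
    """Subscriber for "I'm done" messages."""
    run_log.append(message["table"])
    tag = message.get("tag")
    if tag is None:
        # Data landed in history: this is the driving event for a new flow.
        start_manifest({message["table"]})
        return
    manifest = manifests[tag]
    manifest["remaining"].discard(message["table"])
    if not manifest["remaining"]:
        # "Am I the last?" Yes: look up the config again for the next manifest.
        start_manifest(manifest["jobs"])


publish({"table": "history.crm"})  # new records arrive in the history layer
while queue:
    on_done(queue.popleft())

print(run_log)
# ['history.crm', 'design.customer', 'design.order', 'consume.customer_orders']
```

<p>The only state kept between messages is the set of jobs still outstanding per manifest tag; there is no scheduler process waiting in between.</p><p>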
So any time a table's refreshed, we get an "I'm done" message off the end of it. Then we basically look for those messages as we walk through our layers. So that's basically our wrapper pattern, which holds it all together.</p><p>Has something happened? Yep. Cool, trigger something else. And we just use messages.</p><p>Shane: And then what's our average runtime? It always depends; we use BigQuery under the covers and Google Pub/Sub, and it depends on how much data there is. But from what you're seeing across all our different customers and their data volumes, what's a typical execution time for a standard data flow, going from data arriving in history all the way to data being consumable in that final layer?</p><p>Nigel: It's generally quite fast. We have very little latency because there's no overhead. Unlike an Airflow example, there's no overhead of starting up execution pods, Kubernetes pods, to run a task, shutting them down, and starting up again to run another task.</p><p>Because we're just using a message, we can tell BigQuery to run something, in parallel as much as possible. So our execution times are generally very fast, because there's effectively zero orchestration overhead. We're basically just telling BigQuery, here's a list of things I want you to do, and letting it do them.</p><p>Shane: And our costs are low because we're not permanently running a container, Kubernetes, EC2, or anything like that to host an orchestration or scheduling engine. We're using Pub/Sub and Cloud Functions, and moving to Cloud Run, which does all that magic for us. It turns itself on, the code's deployed, it runs, and then it's destroyed, and we only get charged for the period it's actually running, rather than having a 24/7 service sitting there just to schedule and orchestrate our code.</p><p>Nigel: Yeah, that's exactly right. We have no concept of an orchestration server running in the background. 
We're effectively creating the BigQuery jobs on the fly, letting it run them, and waiting for that "I've finished" message at the end. So there's never a service physically waiting for a job to run and costing us money.</p><p>BigQuery runs the jobs, and when one finishes, we get notified and automatically start the next job. That makes it very economical to run. We can run hundreds of jobs and thousands of messages, and the orchestration itself basically costs us nothing, because there's nothing running to orchestrate them. They're just messages bouncing around, starting jobs and waiting for jobs to finish.</p><p>Very effective.</p><p>Shane: But there is an anti-pattern here, because we only use this for what we call micro-batching. Think about anything at or above 15 minutes of refresh latency: this works. For anything below that, we start hitting a problem, and it's not a problem with the pattern for orchestrating data flows.</p><p>It's a problem with the dependencies of everything else we've built. We have a layered data architecture: history, design, consume. We use tables under the covers in BigQuery, so there's a dependency on who's writing to them. We know we can orchestrate this stuff very quickly, but if we moved to a streaming model, where we wanted to pick up one row and push it all the way through, we'd end up slightly re-architecting the way we deploy the code to Pub/Sub. We wouldn't fire off 10 tasks and wait for them to daisy-chain, orchestrate themselves, and resolve their dependencies. What we'd probably do is take the entire piece of code for the end-to-end flow and submit it to run as a single code stream in BigQuery.</p><p>End to end would be one task, wouldn't it? Because the pattern we've got naturally breaks the work down into nodes and links. It says, run these, and when they're done, run these. 
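</p><p>To make "run these, and when they're done, run these" concrete: a chain of manifests behaves like a topological ordering of the dependency graph, grouped into waves whose members can run in parallel. The sketch below uses Kahn-style layering on hypothetical table names (it is not AgileData's code) and shows that each batch has to pass through one sequential step per wave, which is the per-row overhead that becomes a bottleneck under streaming volumes.</p>

```python
def waves(deps: dict[str, list[str]]) -> list[list[str]]:
    """Group objects into waves; each wave depends only on earlier waves."""
    nodes = set(deps) | {d for ups in deps.values() for d in ups}
    pending = {n: set(deps.get(n, [])) for n in nodes}
    result = []
    while pending:
        # Everything whose dependencies are all satisfied can run in parallel.
        ready = sorted(n for n, ups in pending.items() if not ups)
        if not ready:
            raise ValueError("dependency cycle in the context")
        result.append(ready)
        for n in ready:
            del pending[n]
        for ups in pending.values():
            ups.difference_update(ready)
    return result


# Hypothetical dependencies: object -> objects it reads from.
DEPS = {
    "design.customer": ["history.crm"],
    "design.order": ["history.crm", "history.erp"],
    "consume.customer_orders": ["design.customer", "design.order"],
}

plan = waves(DEPS)
for step, wave in enumerate(plan, start=1):
    print(step, wave)
# 1 ['history.crm', 'history.erp']
# 2 ['design.customer', 'design.order']
# 3 ['consume.customer_orders']
print("sequential steps per batch:", len(plan))  # 3
```

<p>Submitting the whole flow as one end-to-end task, as Shane suggests for streaming, would collapse those waves into a single submission.</p><p>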
And if we keep firing it with volumes of streaming data, we'll end up with a bottleneck, won't we? So that's an anti-pattern.</p><p>We'd use the same context approach. We'd still hydrate and fire, but we'd slightly change the way the pattern is deployed at runtime to handle that streaming volume.</p><p>Nigel: In that case, we'd let data stream constantly into the landing area, which is what we do in some projects. Our current compromise is to micro-batch that streamed data at 15-minute intervals and move it through the layers in micro-batches.</p><p>Otherwise, yes, we'd architect to stream through the layers, and we'd probably run a two-speed pipeline. We'd micro-batch at, say, 15-minute intervals, but we'd also stream the data around the outside, directly to the consume layer, for the attributes that need to be consumed in real time.</p><p>Then we'd catch up the rest of the layers with the slower data through the normal pipeline, and get the best of both.</p><p>Shane: Effectively adopting a lambda architecture, right? Yes. But the majority of our patterns for orchestrating data flows would just survive. We know we'd have to change, tailor, or augment some of them when we hit that anti-pattern, given the way we're currently running and designing it.</p><p>So, apart from a bit of tweaking for streaming, is there any reason you wouldn't use this pattern? Any use cases where you wouldn't use this dynamic hydration of the data flows, creating a manifest, and pushing it off to fire and forget?</p><p>Nigel: No. We've seen it run day in, day out for coming up on five years now. The original architecture and code deployed back in the first year is largely exactly the same. 
We haven't changed the core pattern, and it has run happily across multiple customer use cases for five years. I think that's a fairly good testament that it's a pattern that works.</p><p>Shane: I think the other thing is that, if we needed to, we can scale horizontally or vertically. If we ever need to reduce the latency of the end-to-end process, we can scale up our BigQuery capacity so the jobs execute faster. We could chunk the data flows into more domain-oriented buckets and then figure out how to parallelise their runs.</p><p>So we've still got lots of headroom to iterate this pattern if we ever strike a cost, performance, or complexity issue. We know we're always going to be iterating it, but like you said, it's been fairly bulletproof. The other thing is that you've spent quite a lot of time on what we call the blast radius.</p><p>As you started building out that core pattern as code, I started doing the things that I, as a non-data engineer, expected to be able to do with data, and you'd suck your teeth and go, oh my God. So you built safety nets around that pattern. Take the fact that I can create a piece of context, but it isn't treated as part of the production data flow, and isn't executable when it runs, until it meets all these checks.</p><p>We're effectively bringing in the peer review process, the Git review, the pull requests, the data quality tests; all of that happens before the data flow is allowed to execute as a new version.</p><p>Nigel: Yeah. The other thing wrapping this is in the context layer. We've got a couple of flags that apply to the context. If something has changed in the context layer and been successfully deployed into production state, we put a flag on it that says this config has changed. 
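</p><p>The change-flag behaviour can be sketched as a walk along an ordered chain of objects. Everything here is hypothetical: the <code>changed</code> flag, the integer <code>watermark</code>, and the stand-in compatibility check (a downstream object can be fixed automatically only if every column it reads still exists upstream). A flagged object is rebuilt with its watermark rolled back for a full reload, everything downstream is revalidated and rebuilt, and the run stops with an alert as soon as one object can no longer be fixed automatically.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical context entry for one table in a manifest."""
    name: str
    columns: set[str]
    reads: set[str] = field(default_factory=set)  # columns selected from upstream
    changed: bool = False  # flag set when a config change is deployed
    watermark: int = 100   # hypothetical load position; 0 means reload everything


def run_manifest(chain: list[Obj]) -> list[str]:
    """Walk an ordered chain in which each object reads from the one before it."""
    log = []
    upstream_changed = False
    for i, obj in enumerate(chain):
        upstream = chain[i - 1] if i else None
        if not (obj.changed or upstream_changed):
            log.append(f"load {obj.name}")
            continue
        # Revalidate against the (possibly changed) upstream object.
        if upstream is not None and not obj.reads.issubset(upstream.columns):
            missing = sorted(obj.reads - upstream.columns)
            log.append(f"ALERT {obj.name}: cannot auto-fix, missing {missing}")
            break  # stop the manifest; a person has to decide what to do
        obj.watermark = 0        # roll the watermark back for a full reload
        obj.changed = False      # clear the flag once the rebuild is done
        upstream_changed = True  # everything downstream must be revalidated
        log.append(f"rebuild {obj.name}")
    return log


def chain_with(history_columns: set[str]) -> list[Obj]:
    """Three layers where the history table has just been flagged as changed."""
    return [
        Obj("history.customer", history_columns, changed=True),
        Obj("design.customer", {"id", "name"}, reads={"id", "name"}),
        Obj("consume.customer", {"id", "name"}, reads={"id", "name"}),
    ]


# A column is added upstream: every downstream object is rebuilt automatically.
print(run_manifest(chain_with({"id", "name", "email"})))
# ['rebuild history.customer', 'rebuild design.customer', 'rebuild consume.customer']

# A column the next layer reads is removed: the run stops and raises an alert.
print(run_manifest(chain_with({"id"})))
# ['rebuild history.customer', "ALERT design.customer: cannot auto-fix, missing ['name']"]
```

<p>A real check would presumably look at much more than column names, but the control flow (rebuild, revalidate, or stop and alert) is the part this sketch is meant to show.</p><p>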
This means downstream objects will need to change so the change can flow through.</p><p>For example, say you've added a new column to a table. You expect that column to flow through and hydrate all the layers downstream of it. So we tag that object and say, hey, this has changed; you're going to need to recreate it, and you're going to need to check all the things downstream of it to make sure they'll be okay.</p><p>When we come to run that manifest, we run it as usual, but when we hit an object with a flag saying it's changed, we identify the change. Usually we recreate the table, roll the watermark back, and reload that table. That's cool. Then we get to the next object, which says, great, something upstream of me has changed; I need to make sure that's going to work.</p><p>So we do another validation of that context to make sure it's 100% going to work with that change. If at that point we determine, oops, this is going to have to be rebuilt as well, we do that, and we keep doing this all the way to the end of the manifest. But we also stop the manifest if, for some reason, we've caused something that's not going to keep working: we stop and raise an alert that says, hey, this context is now invalidated, because something two or three layers upstream has changed enough that I can't fix it for you automatically. You're going to have to look at the change and decide what you need to do.</p><p>That was quite a breakthrough, because we used to spend a lot of time manually trying to fix pipelines that stopped for small things like a column being added or removed. Whereas 99% of the time we can now handle that automatically, just by doing a full rebuild, refreshing the table, and then revalidating the rules.</p><p>And generally everything takes care of itself, because we wanted it to be self-healing. 
There's nothing worse than pipelines that fall over just because you've added a column. Something like that is easy: a column can just be added, populated, and away you go. So that was quite a breakthrough, basically putting those extra flags in the context.</p><p>Shane: Yeah. And effectively they're like stage gates. Every time, it's validating: am I still okay to run the next step? Yep. In the beginning, like you said, we had lots of times where it stopped. I would do something, and it would say, oh, actually you're not meant to be able to do that. But each time that happened, we'd go in and either fix the way I created that context, so I could no longer create it in an invalid way, or we'd automate the rebuild, the deploy-and-destroy model it was trying to run, so it could rebuild itself safely. So again, we didn't change the core orchestration pattern; we changed all the other things around it to make sure that pattern was bulletproof and ran. We didn't get called out at two o'clock in the morning because our data flow had stopped, or at seven o'clock to look at it and do a whole lot of catch-up work to rebuild it.</p><p>So I think those supporting patterns were as important as the core pattern of orchestrating the data flows dynamically using manifests and pub/sub.</p><p>Nigel: Yeah, that's exactly right. It makes it all quite robust, and as I said, generally the pipelines run very accurately without a lot of attention.</p><p>Shane: Excellent.</p><p>Alright, well I think that one's done: orchestrating data flows in a dynamic way using manifests and pub/sub. Another one next week. But for now, I hope everybody has a simply magical day.</p>]]></content:encoded></item></channel></rss>