<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data on Benny Simmonds</title>
    <link>https://www.bencode.io/categories/data/</link>
    <description>Recent content in Data on Benny Simmonds</description>
    <generator>Hugo -- 0.149.1</generator>
    <language>en-us</language>
    <lastBuildDate>Tue, 02 Dec 2025 09:02:00 +1100</lastBuildDate>
    <atom:link href="https://www.bencode.io/categories/data/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Entity Resolution and the Instability Problem</title>
      <link>https://www.bencode.io/posts/entity/</link>
      <pubDate>Tue, 02 Dec 2025 09:02:00 +1100</pubDate>
      <guid>https://www.bencode.io/posts/entity/</guid>
      <description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#the-problem&#34;&gt;The Problem&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#solution-1-make-the-api-recordcentric-not-entitycentric&#34;&gt;Solution 1: Make the API record‑centric, not entity‑centric&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#solution-2-introduce-your-own-stable-external-entity-id-and-map-it-to-senzing&#34;&gt;Solution 2: Introduce your own &lt;em&gt;stable&lt;/em&gt; external Entity ID and map it to Senzing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#21-public-vs-internal-ids&#34;&gt;2.1. Public vs internal IDs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#22-handling-merges&#34;&gt;2.2. Handling merges&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#23-handling-splits&#34;&gt;2.3. Handling splits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#24-pros--cons&#34;&gt;2.4. Pros / Cons&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#solution-3-provide-an-entity-change-feed-events-for-downstream-sync&#34;&gt;Solution 3: Provide an entity change feed (events) for downstream sync&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#31-why&#34;&gt;3.1. Why?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#32-event-model&#34;&gt;3.2. Event model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#solution-4-treat-entity-ids-as-ephemeral-handles-with-ttl-semantics&#34;&gt;Solution 4: Treat entity IDs as &lt;em&gt;ephemeral handles&lt;/em&gt; with TTL semantics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#solution-5-eventsourcing--versioned-entities-for-heavy-complianceaudit-usecases&#34;&gt;Solution 5: Event‑sourcing / versioned entities (for heavy compliance/audit use‑cases)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#frankenres&#34;&gt;FrankenRes&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#internals&#34;&gt;Internals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#api-surface&#34;&gt;API surface&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#detecting-splits-and-merges-with-senzing&#34;&gt;Detecting Splits and Merges with Senzing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#what-senzing-actually-provides-the-barebones&#34;&gt;What Senzing actually provides (the barebones)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#minimum-state-you-need-to-track&#34;&gt;Minimum state you need to track&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#robust-per-event-processing-pattern&#34;&gt;Robust per-event processing pattern&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#concurrency-safeguard&#34;&gt;Concurrency safeguard&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#split-vs-merge-detection&#34;&gt;Split vs Merge Detection&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#detecting-splits&#34;&gt;Detecting splits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#detecting-merges&#34;&gt;Detecting merges&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#a-simplier-way&#34;&gt;A simpler way&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-lifecycle-detector-c-implementation&#34;&gt;Senzing Lifecycle Detector C# Implementation&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#single-file-example&#34;&gt;Single-file example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#usage&#34;&gt;Usage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#tldr&#34;&gt;TL;DR&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;the-problem&#34;&gt;The Problem&lt;/h2&gt;
&lt;p&gt;The classic entity resolution gotcha: the thing that looks like a primary key (e.g. Senzing&amp;rsquo;s entity ID) is actually a volatile cluster ID that can legitimately change as the engine learns. Senzing explicitly says their resolved entity ID is &lt;strong&gt;not&lt;/strong&gt; a globally unique persistent identifier and that it&amp;rsquo;s just an identifier for a grouping that may be transient. (&lt;a href=&#34;https://senzing.zendesk.com/hc/en-us/articles/4415858978067-How-does-an-Entity-ID-behave&#34; title=&#34;How does an Entity ID behave&#34;&gt;senzing.zendesk.com&lt;/a&gt;)&lt;/p&gt;</description>
    </item>
    <item>
      <title>Entity Resolution with Senzing and the .NET SDK</title>
      <link>https://www.bencode.io/posts/senzing/</link>
      <pubDate>Fri, 19 Sep 2025 09:35:00 +1000</pubDate>
      <guid>https://www.bencode.io/posts/senzing/</guid>
      <description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#context&#34;&gt;Context&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#record-vs-entity-vs-relationship&#34;&gt;Record vs Entity vs Relationship&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#data-quality-issues-er-addresses&#34;&gt;Data Quality Issues ER Addresses&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing&#34;&gt;Senzing&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#repository&#34;&gt;Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#key-senzing-attributes&#34;&gt;Key Senzing Attributes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#resolution-concepts&#34;&gt;Resolution Concepts&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#features&#34;&gt;Features&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#feature-scores&#34;&gt;Feature Scores&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#match-levels&#34;&gt;Match Levels&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-v4-sdk-setup-on-metal&#34;&gt;Senzing V4 SDK Setup on Metal&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#native-senzing-sdk-setup&#34;&gt;Native Senzing SDK Setup&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#net-sdk-setup&#34;&gt;.NET SDK Setup&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#setup-local-nuget-source&#34;&gt;Setup Local NuGet Source&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-v4-c-snippets&#34;&gt;Senzing V4 C# Snippets&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-v4-cli-tools&#34;&gt;Senzing V4 CLI Tools&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#sz_configtool&#34;&gt;sz_configtool&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#listfeatures&#34;&gt;listFeatures&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#listattributes&#34;&gt;listAttributes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#listrules&#34;&gt;listRules&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#listfragments&#34;&gt;listFragments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#principles&#34;&gt;principles&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#sz_explorer&#34;&gt;sz_explorer&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#get&#34;&gt;get&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#how-tree&#34;&gt;how (tree)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-weirdness&#34;&gt;Senzing Weirdness&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#typed-models-vs-loose-json-strings&#34;&gt;Typed models vs loose JSON strings&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#todo&#34;&gt;TODO&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#info-messages-aka-sz_with_info&#34;&gt;Info Messages aka SZ_WITH_INFO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#senzing-best-practices&#34;&gt;Senzing Best practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#resources&#34;&gt;Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;context&#34;&gt;Context&lt;/h2&gt;
&lt;p&gt;Entity resolution (ER) is the process of identifying and linking records that refer to the same real-world entity across different data sources, even when the records contain variations, errors, or incomplete information.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Time Dimension Populate Script</title>
      <link>https://www.bencode.io/posts/2011-07-21-time-dimension/</link>
      <pubDate>Thu, 21 Jul 2011 07:00:00 +0000</pubDate>
      <guid>https://www.bencode.io/posts/2011-07-21-time-dimension/</guid>
      <description>&lt;p&gt;Here is a very simple T-SQL script that will flesh out a time dimension for use with a SQL Server Analysis Services (SSAS) cube, and that can easily be molded to work with other vendor implementations.&lt;/p&gt;
&lt;p&gt;The &lt;a href=&#34;http://msftdbprodsamples.codeplex.com/&#34;&gt;AdventureWorks DW&lt;/a&gt; provides a nice reference implementation for a time dimension. Unfortunately, it provides no guidance around the actual population of the dimension. This script will provide a repeatable, configurable way of building out a similar implementation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Managing Database Evolution</title>
      <link>https://www.bencode.io/posts/2010-06-05-back-to-basics-managing-databases/</link>
      <pubDate>Sat, 05 Jun 2010 22:47:03 +0000</pubDate>
      <guid>https://www.bencode.io/posts/2010-06-05-back-to-basics-managing-databases/</guid>
      <description>&lt;p&gt;On a new client’s site the other day, I observed that the more companies I work for, the deeper my knowledge of effective work practices becomes. In other words, over time you see things that work well, and things that don’t. I’m talking about simple practices that, when applied to teams, result in higher-quality and/or more efficient software.&lt;/p&gt;
&lt;p&gt;Databases and their associated artefacts (functions, triggers, message broker queues and so on) should be managed and versioned. Again, a simple problem with a simple solution, but one that in the real world tends to be practiced poorly.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
