Context

Entity resolution (ER) is the process of identifying and linking records that refer to the same real-world entity across different data sources, even when the records contain variations, errors, or incomplete information.

  • Entity-Centric Learning: A process where resolved entities are treated holistically, enabling the system to learn from every name and address variation
  • Sequence Neutrality: The ability to reevaluate and improve prior ER decisions regardless of the order in which data is loaded
  • Auto-Generics Detection: The ability to detect and handle widely used erroneous data during real-time ER
  • Disclosed Relationships: Known connections between entities based on factual information
  • Derived Relationships: Undisclosed or hidden connections between entities detected during ER
  • Find Path: The ability to explore and reveal connections between distant entities in the entity graph
  • Privacy by Design (PbD): An engineering approach that considers privacy throughout the entire development process

Record vs Entity vs Relationship

  • Record: A single data entry from a source system (customer record, transaction, etc.)
  • Entity: A resolved real-world object that may be represented by multiple records
  • Relationship: Connections between entities (family members, business associates, etc.)
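
As a rough illustration of how the three terms relate, the following Python sketch models them as plain data classes. The class and field names are hypothetical and are not part of the Senzing SDK; the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A single data entry from a source system."""
    data_source: str
    record_id: str
    attributes: dict


@dataclass
class Entity:
    """A resolved real-world object backed by one or more records."""
    entity_id: int
    records: list = field(default_factory=list)


@dataclass
class Relationship:
    """A connection between two entities, e.g. a family member."""
    from_entity_id: int
    to_entity_id: int
    kind: str


# Two source records that describe the same person belong to one entity.
bob = Record("CUSTOMER_DB", "CUST001", {"NAME_FULL": "Bob Smith"})
robert = Record("CRM_SYSTEM", "42", {"NAME_FULL": "Robert Smith"})
person = Entity(entity_id=1, records=[bob, robert])
spouse = Entity(entity_id=2)
link = Relationship(person.entity_id, spouse.entity_id, "family member")
print(len(person.records), link.kind)
```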

Data Quality Issues ER Addresses

  • Duplicates: Multiple records for the same entity
  • Variants: Name variations (Bob/Robert, Inc/Incorporated)
  • Typos: Misspellings and data entry errors
  • Incomplete Data: Missing fields across records
  • Inconsistent Formats: Different date/phone/address formats
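
To make two of these issues concrete (name variants and inconsistent phone formats), here is a toy Python sketch that normalises values before comparing them. The nickname table and helper names are invented for illustration; a real ER engine does considerably more than this.

```python
import re

# Hypothetical nickname table, kept tiny for the example.
NICKNAMES = {"bob": "robert", "bill": "william"}


def normalize_name(name: str) -> str:
    """Lower-case the name and expand known nicknames token by token."""
    tokens = name.lower().split()
    return " ".join(NICKNAMES.get(token, token) for token in tokens)


def normalize_phone(phone: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone)


same_name = normalize_name("Bob Smith") == normalize_name("Robert Smith")
same_phone = normalize_phone("555-123-4567") == normalize_phone("(555) 123 4567")
print(same_name, same_phone)
```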

Senzing

Specifically, I’m interested in Senzing, aka G2. G2 is the original code name for what Senzing is today, which began as an IBM skunkworks project in 2009.

Senzing configuration is concerned with the following concepts:

  • Data sources
  • Entity types
  • Features and attributes
  • Resolution rules
  • Thresholds

Examples of entity types:

  • PERSON: Individuals
  • ORGANIZATION: Companies, agencies
  • VESSEL: Ships, aircraft
  • Custom Types: Domain-specific entities

Repository

Storage layer containing:

  • Resolved Entities: Final entity representations
  • Records: Original input records
  • Relationships: Entity connections
  • Features: Extracted and standardised data

These logical layers map to data stores:

  • Master List: A curated collection of entities (e.g., customer list, employee database) that may contain duplicates and is used for entity resolution
  • Transaction Data: Event-based records containing identifying information about entities, used to connect with master records and watchlists
  • Watchlist: A collection of entities you want to avoid due to potential risks (e.g., fraudsters, sanctioned individuals)
  • Reference List: Supplemental data used to enrich entity understanding (e.g., demographics, historical addresses)

Key Senzing Attributes

  • DATA_SOURCE: A required string attribute identifying the source of the record (max 25 characters)
  • RECORD_ID: A unique identifier within a data source, strongly recommended for updates and maintenance
  • RECORD_TYPE: Specifies the type of entity (e.g. PERSON, ORGANIZATION) to prevent inappropriate resolutions
  • NAME_TYPE: Classifies names as PRIMARY or ALIAS, used for entity resolution
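
A minimal Python sketch of assembling a record that carries these attributes. The helper `build_record` is hypothetical; only the attribute names come from the list above, and the values are made up.

```python
import json


def build_record(data_source: str, record_id: str, **features: str) -> str:
    """Return a JSON record carrying the key attributes plus any features."""
    if not data_source:
        raise ValueError("DATA_SOURCE is required")
    record = {"DATA_SOURCE": data_source, "RECORD_ID": record_id}
    record.update(features)
    return json.dumps(record)


record_json = build_record(
    "CUSTOMER_DB",
    "CUST001",
    RECORD_TYPE="PERSON",
    NAME_TYPE="PRIMARY",
    NAME_FULL="John Smith",
)
print(record_json)
```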

Resolution Concepts

Candidates

Records that potentially match during the resolution process.

Features

Standardised, comparable attributes extracted from raw data:

  • NAME_FULL: Standardised full names
  • ADDR_FULL: Standardised addresses
  • PHONE: Standardised phone numbers
  • EMAIL: Email addresses
  • DOB: Date of birth
  • SSN: Social Security Numbers

Feature Scores

Numerical similarity scores between features (0-100):

  • Exact Match: 100
  • Close Match: 80-99
  • Likely Match: 60-79
  • Possible Match: 40-59
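
The bands above can be expressed as a small lookup. This is only a sketch of the table in this section: the function name is hypothetical, and returning `None` below 40 is an assumption, since no band is listed for that range.

```python
from typing import Optional


def score_band(score: int) -> Optional[str]:
    """Map a 0-100 feature score to the band names listed above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 100:
        return "Exact Match"
    if score >= 80:
        return "Close Match"
    if score >= 60:
        return "Likely Match"
    if score >= 40:
        return "Possible Match"
    return None  # assumption: no band is listed below 40


for value in (100, 95, 60, 39):
    print(value, score_band(value))
```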

Match Levels

Senzing’s confidence levels for entity relationships:

  1. Match (Level 1): Very high confidence same entity
  2. Possible Match (Level 2): Likely same entity, review recommended
  3. Possible Relation (Level 3): Likely related entities
  4. Name Only Match (Level 4): Shared name, different entities

Resolution Rules

Logic determining when records should resolve:

  • Principle: Core matching logic (e.g. exact SSN match)
  • Rule: Specific matching combinations
  • Threshold: Minimum scores required

.NET SDK

Primary Interfaces

// Core engine interface
public interface G2Engine
{
    // Add records
    int AddRecord(string dataSourceCode, string recordId, string jsonData);

    // Get entity details
    string GetEntity(long entityId);

    // Search for entities
    string SearchByAttributes(string jsonData);

    // Get relationships
    string FindNetworkByEntityId(long entityId, int maxDegrees);
}

// Configuration management
public interface G2Config
{
    string Save();
    int Load(string configStr);
    int AddDataSource(string dataSourceCode);
}

// Diagnostic interface
public interface G2Diagnostic
{
    string GetDataSourceCounts();
    string GetEntitySizeBreakdown(int minimumEntitySize);
}

Common Data Structures

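// Input record format (inside a C# verbatim string, "" yields one double quote,
// which keeps the payload valid JSON; single-quoted keys are not valid JSON)
var recordJson = @"{
    ""NAME_FULL"": ""John Smith"",
    ""ADDR_FULL"": ""123 Main St, Anytown, NY 12345"",
    ""PHONE"": ""555-123-4567"",
    ""EMAIL"": ""john.smith@email.com"",
    ""DOB"": ""1980-05-15""
}";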

// Entity response structure
{
    "ENTITY_ID": 12345,
    "RECORDS": [...],
    "FEATURES": {...},
    "RELATED_ENTITIES": [...]
}
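
A short Python sketch of reading such a response. The payload is made up, and the assumption that each item in RECORDS carries DATA_SOURCE and RECORD_ID keys is mine; check it against a real response before relying on it.

```python
import json

# Made-up payload shaped like the structure above.
response = """{
    "ENTITY_ID": 12345,
    "RECORDS": [
        {"DATA_SOURCE": "CUSTOMER_DB", "RECORD_ID": "CUST001"},
        {"DATA_SOURCE": "CRM_SYSTEM", "RECORD_ID": "42"}
    ],
    "FEATURES": {},
    "RELATED_ENTITIES": []
}"""


def record_keys(entity_json: str) -> list:
    """Return (DATA_SOURCE, RECORD_ID) pairs for every record in an entity."""
    entity = json.loads(entity_json)
    return [(r["DATA_SOURCE"], r["RECORD_ID"]) for r in entity.get("RECORDS", [])]


print(record_keys(response))
```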

Typical Implementation Flow

  1. Initialize Engine

    var engine = new G2EngineImpl();
    engine.Init("MyApp", configJson, verboseLogging);
    
  2. Load Configuration

    var config = new G2ConfigImpl();
    var configId = config.Create();
    engine.ReplaceConfig(configId);
    
  3. Add Data Sources

    config.AddDataSource("CUSTOMER_DB");
    config.AddDataSource("CRM_SYSTEM");
    
  4. Process Records

    engine.AddRecord("CUSTOMER_DB", "CUST001", recordJson);
    
  5. Query Results

    var entity = engine.GetEntity(entityId);
    var matches = engine.SearchByAttributes(searchJson);
    

Best Practices

Performance Optimization

  • Use bulk loading for large datasets
  • Implement proper threading for concurrent access
  • Monitor memory usage during processing
  • Use streaming for large result sets

Error Handling

try
{
    var result = engine.AddRecord(dataSource, recordId, jsonData);
    if (result != 0)
    {
        var error = engine.GetLastException();
        // Handle error appropriately
    }
}
catch (G2Exception ex)
{
    // Handle Senzing-specific exceptions
}

Configuration Management

  • Version control configuration changes
  • Test configuration in development environment
  • Use staged deployments for configuration updates
  • Monitor resolution quality after changes

Troubleshooting Quick Reference

Common Issues

  • Poor Match Quality: Review feature configuration and thresholds
  • Performance Issues: Check threading, memory allocation, database tuning
  • Configuration Errors: Validate JSON syntax and required fields
  • Memory Leaks: Ensure proper disposal of engine instances

Diagnostic Tools

// Get processing statistics
var stats = diagnostic.GetDataSourceCounts();

// Analyze entity size distribution
var breakdown = diagnostic.GetEntitySizeBreakdown(2);

// Review feature utilization
var features = engine.GetFeatureStatistics();

This primer covers the essential concepts and terminology you’ll encounter when building your entity resolution system with Senzing’s .NET SDK. Focus on understanding the resolution workflow and key configuration concepts as you begin implementation.

Senzing V4 Setup on Metal

Senzing V4 has just been released. For someone new to the ecosystem, it’s super confusing to know which existing code, tools, SDKs and containers support V3 vs V4. Here’s what is V4; the rest is V3:

  • senzingsdk-runtime: the native SDK
  • senzingsdk-poc: meta package that includes senzingsdk-runtime, senzingsdk-setup, and senzingsdk-tools - contains files required for a Senzing v4 Linux Quickstart PoC, including er/bin utilities, sz_create_project, sz_setup_config, sz_update_project, sz_fileloader, and a SQLite database /opt/senzing/er/resources/templates/G2C.db.
  • senzingsdk-setup: Contains er/bin upgrade utilities sz_dbupgrades, sz_configupgrade, and er/resources files and templates.
  • senzingsdk-tools: Contains er/bin utilities, such as sz_command, sz_configtool, and the EDA tools (sz_explorer, sz_audit, and sz_snapshot).

Native Senzing SDK Setup

The native SDK is a prerequisite for the other programming-language SDKs.

  1. Read the docs
  2. Set up the apt repo or download the raw .deb packages
  3. Run sudo apt install <package.deb> in this order: senzingsdk-runtime, senzingsdk-tools, senzingsdk-setup, then senzingsdk-poc
  4. The distribution is unpacked to /opt/senzing/

.NET SDK Setup

  1. Install .NET SDK v8
  2. Install the native SDK (as above)
  3. Create local NuGet source and push Senzing.Sdk package (see below)
  4. Export the environment variables: export SENZING_PATH=/opt/senzing/ and export LD_LIBRARY_PATH=$SENZING_PATH/er/lib:$LD_LIBRARY_PATH
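
Before running anything, it can help to confirm that the two variables from step 4 are in place. The following Python sketch is one way to check; the function name is hypothetical and the only paths it assumes are the ones given above.

```python
import os
import posixpath
from typing import Mapping


def missing_senzing_env(env: Mapping[str, str]) -> list:
    """Return a list of problems with the Senzing-related environment."""
    senzing_path = env.get("SENZING_PATH", "")
    if not senzing_path:
        return ["SENZING_PATH is not set"]
    # normpath collapses the double slash produced by a trailing "/" in SENZING_PATH
    lib_dir = posixpath.normpath(posixpath.join(senzing_path, "er", "lib"))
    entries = env.get("LD_LIBRARY_PATH", "").split(":")
    search_path = [posixpath.normpath(entry) for entry in entries if entry]
    if lib_dir not in search_path:
        return [lib_dir + " is not on LD_LIBRARY_PATH"]
    return []


for problem in missing_senzing_env(os.environ):
    print(problem)
```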

Set up a local NuGet source and push Senzing.Sdk

dotnet nuget list source
mkdir -p ~/nuget/packages
dotnet nuget add source ~/nuget/packages/ -n dev
dotnet nuget push /opt/senzing/er/sdk/dotnet/Senzing.Sdk.4.0.0.nupkg --source dev

Senzing V4 C# Snippets

Build the SnippetRunner:

git clone https://github.com/Senzing/code-snippets-v4.git ~/repos/code-snippets-v4
cd ~/repos/code-snippets-v4/csharp/runner/SnippetRunner
dotnet build

Run/debug the various snippets:

export SENZING_ENGINE_CONFIGURATION_JSON='{
 "PIPELINE" : {
  "CONFIGPATH" : "/etc/opt/senzing",
  "RESOURCEPATH" : "/opt/senzing/er/resources",
  "SUPPORTPATH" : "/opt/senzing/data"
 },
 "SQL" : {
  "CONNECTION" : "postgresql://sz:changeme@10.1.1.123:5432:G2?sslmode=disable"
 }
}'

dotnet run --project information/GetVersion
{"PRODUCT_NAME":"Senzing SDK","VERSION":"4.0.0","BUILD_VERSION":"4.0.0.25230","BUILD_DATE":"2025-08-18","BUILD_NUMBER":"2025_08_18__12_32","COMPATIBILITY_VERSION":{"CONFIG_VERSION":"11"},"SCHEMA_VERSION":{"ENGINE_SCHEMA_VERSION":"4.0","MINIMUM_REQUIRED_SCHEMA_VERSION":"4.0","MAXIMUM_REQUIRED_SCHEMA_VERSION":"4.99"}}

dotnet run --project information/GetRepositoryInfo
{"dataStores":[{"id":"CORE","type":"postgresql","location":"10.1.1.123","stats":{"maxXid":["pg_statistic",141115]}}]}
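
Since the V3 vs V4 split is easy to get wrong, one option is to check the reported version programmatically. This Python sketch parses a trimmed copy of the GetVersion output shown above; the helper name is hypothetical.

```python
import json

# Output copied from the GetVersion run above, trimmed to the fields used here.
version_json = '{"PRODUCT_NAME":"Senzing SDK","VERSION":"4.0.0","BUILD_VERSION":"4.0.0.25230"}'


def major_version(raw: str) -> int:
    """Extract the major version number from GetVersion-style JSON."""
    return int(json.loads(raw)["VERSION"].split(".")[0])


assert major_version(version_json) == 4, "expected a V4 runtime"
print("Senzing major version:", major_version(version_json))
```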

If debugging in an IDE, create sz.env with the following and bind it to the debugger environment:

SENZING_ENGINE_CONFIGURATION_JSON='{ "PIPELINE" : { "CONFIGPATH" : "/etc/opt/senzing", "RESOURCEPATH" : "/opt/senzing/er/resources", "SUPPORTPATH" : "/opt/senzing/data" }, "SQL" : { "CONNECTION" : "postgresql://sz:changeme@10.1.1.123:5432:G2?sslmode=disable" } }'
SENZING_PATH=/opt/senzing/
LD_LIBRARY_PATH=$SENZING_PATH/er/lib:$LD_LIBRARY_PATH
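
Hand-editing the single-line JSON is error-prone, so a small generator may help. The Python sketch below rebuilds the same configuration from its parts; the function name is hypothetical and the paths and connection string are simply the ones used above.

```python
import json


def engine_config(connection: str) -> str:
    """Return a single-line engine configuration JSON string."""
    config = {
        "PIPELINE": {
            "CONFIGPATH": "/etc/opt/senzing",
            "RESOURCEPATH": "/opt/senzing/er/resources",
            "SUPPORTPATH": "/opt/senzing/data",
        },
        "SQL": {"CONNECTION": connection},
    }
    return json.dumps(config)


line = "SENZING_ENGINE_CONFIGURATION_JSON='" + engine_config(
    "postgresql://sz:changeme@10.1.1.123:5432:G2?sslmode=disable"
) + "'"
print(line)
```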

Senzing V4 SDK CLI Tools

SDK Tools available:

  • sz_audit: Generates the audit JSON file that sz_explorer loads for its audit_summary report (precision, recall and F1 scores)
  • sz_json_analyzer: Analyzes and validates JSON data structures within the Senzing environment
  • sz_snapshot: Generates the snapshot JSON file that sz_explorer loads for its snapshot reports (e.g. data_source_summary, entity_size_breakdown)
  • sz_command: Processes and executes various Senzing-related commands and operations
  • sz_explorer: Interactive exploratory data analysis tool for examining resolved entities and their relationships
  • sz_configtool: Utility for configuring a Senzing instance: data sources, features, attributes and principles (rules)

sz_configtool

This utility allows you to configure a Senzing instance.

Senzing compares records within and across data sources. Records consist of features and features have attributes. For instance, the NAME feature has attributes such as NAME_FIRST and NAME_LAST for a person and NAME_ORG for an organization.

Features are standardised and expressed in various ways to create candidate keys, and when candidates are found all of their features are compared to the features of the incoming record to see how close they actually are.

Data source commands:

  • addDataSource: to register a new data source
  • deleteDataSource: to remove a data source created by error
  • listDataSources: to see all the registered data sources

When you see a how or a why screen output in Senzing, you see the actual entity counts and scores of a match. The list functions below show you what those thresholds and scores are currently configured to.

Features and attribute settings:

  • listFeatures: to see all features, whether they are used for candidates, and how they are scored
  • listAttributes: to see all the attributes you can map to

Principles (rules, scores, and thresholds):

  • listFunctions: to see all the standardize, expression and comparison functions available
  • listGenericThresholds: to see all the thresholds for when feature values go generic for candidates or scoring
  • listRules: to see all the principles in the order they are evaluated
  • listFragments: to see all the rule fragments that are configured, such as what is considered close_name

Finally, a set of rules or “principles” are applied to the feature scores of each candidate to see if the incoming record should resolve to an existing entity or become a new one. In either case, the rules are also used to create relationships between entities.

To understand more about configuring Senzing:

listFeatures

Note the standardize functions, such as PARSE_PHONE, for normalizing data:

┌────┬────────────────────────────┬────────────────┬──────────┬───────────┬────────────┬────────────────────┬────────────────┬──────────────────────────┬──────────┬─────────┐
│ id │ feature                    │ class          │ behavior │ anonymize │ candidates │ standardize        │ expression     │ comparison               │ matchKey │ version │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 1  │ NAME                       │ NAME           │ NAME     │ No        │ No         │ PARSE_NAME         │ NAME_HASHER    │ GNR_COMP                 │ Yes      │ 2       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 2  │ DOB                        │ BIO_DATE       │ FMES     │ No        │ Yes        │ PARSE_DOB          │                │ DOB_COMP                 │ Yes      │ 2       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 3  │ DOD                        │ BIO_DATE       │ FMES     │ No        │ Yes        │ PARSE_DOB          │                │ DOB_COMP                 │ Yes      │ 2       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 4  │ GENDER                     │ BIO_FEATURE    │ FVME     │ No        │ No         │                    │                │ EXACT_DOMAIN_COMP        │ Denial   │ 1       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 5  │ ADDRESS                    │ POSTAL_ADDRESS │ FF       │ No        │ No         │ PARSE_ADDR         │ ADDR_HASHER    │ ADDR_COMP                │ Yes      │ 4       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤
│ 6  │ PHONE                      │ PHONE          │ FF       │ No        │ No         │ PARSE_PHONE        │ PHONE_HASHER   │ PHONE_COMP               │ Yes      │ 1       │
├────┼────────────────────────────┼────────────────┼──────────┼───────────┼────────────┼────────────────────┼────────────────┼──────────────────────────┼──────────┼─────────┤

listAttributes

┌──────┬──────────────────────────────┬──────────────┬────────────────────────┬─────────────────┬──────────┬──────────┬──────────┐
│ id   │ attribute                    │ class        │ feature                │ element         │ required │ default  │ internal │
├──────┼──────────────────────────────┼──────────────┼────────────────────────┼─────────────────┼──────────┼──────────┼──────────┤
│ 1001 │ DATA_SOURCE                  │ OBSERVATION  │ None                   │ None            │ Yes      │ None     │ No       │
├──────┼──────────────────────────────┼──────────────┼────────────────────────┼─────────────────┼──────────┼──────────┼──────────┤
│ 1003 │ RECORD_ID                    │ OBSERVATION  │ None                   │ None            │ No       │ None     │ No       │
├──────┼──────────────────────────────┼──────────────┼────────────────────────┼─────────────────┼──────────┼──────────┼──────────┤
│ 1007 │ DSRC_ACTION                  │ OBSERVATION  │ None                   │ None            │ Yes      │ None     │ Yes      │
├──────┼──────────────────────────────┼──────────────┼────────────────────────┼─────────────────┼──────────┼──────────┼──────────┤
│ 1101 │ NAME_TYPE                    │ NAME         │ NAME                   │ USAGE_TYPE      │ No       │ None     │ No       │
├──────┼──────────────────────────────┼──────────────┼────────────────────────┼─────────────────┼──────────┼──────────┼──────────┤

listRules

┌─────┬───────────────────────────┬─────────┬────────┬──────────┬─────────────────────┬───────────────┬──────┐
│ id  │ rule                      │ resolve │ relate │ rtype_id │ fragment            │ disqualifier  │ tier │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 100 │ SAME_A1                   │ Yes     │ No     │ 1        │ SAME_A1             │ None          │ 10   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 108 │ SF1_SNAME_CFF_CSTAB       │ Yes     │ No     │ 1        │ SF1_SNAME_CFF_CSTAB │ DIFF_EXCL     │ 18   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 110 │ SF1_PNAME_CFF_CSTAB       │ Yes     │ No     │ 1        │ SF1_PNAME_CFF_CSTAB │ DIFF_EXCL     │ 20   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 111 │ SF1_SNAME_CFF_CSTAB_DEXCL │ Yes     │ No     │ 1        │ SF1_SNAME_CFF_CSTAB │ DIFF_GEN_A1ES │ 20   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 112 │ SF1_PNAME_CFF_OLD         │ Yes     │ No     │ 1        │ DISABLE             │               │ 20   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤
│ 120 │ SF1_PNAME_CSTAB           │ Yes     │ No     │ 1        │ SF1_PNAME_CSTAB     │ DIFF_EXCL     │ 20   │
├─────┼───────────────────────────┼─────────┼────────┼──────────┼─────────────────────┼───────────────┼──────┤

listFragments

Rules are combinations of fragments like close_name or same_name

┌─────┬─────────────────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────┐
│ id  │ fragment            │ source                                                                                                  │ depends      │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 11  │ SAME_NAME           │ ./FRAGMENT[./GNR_SAME_NAME>0]                                                                           │ 301          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 12  │ CLOSE_NAME          │ ./FRAGMENT[./GNR_CLOSE_NAME>0]                                                                          │ 302          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 13  │ LIKELY_NAME         │ ./FRAGMENT[./GNR_LIKELY_NAME>0]                                                                         │ 303          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 14  │ PART_NAME           │ ./FRAGMENT[./GNR_PART_NAME>0]                                                                           │ 304          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 15  │ SUR_NAME            │ ./FRAGMENT[./GNR_SUR_NAME>0]                                                                            │ 305          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 16  │ DIFF_NAME           │ ./SUMMARY/BEHAVIOR/NAME[sum(./UNLIKELY | ./NO_CHANCE) > 0]                                              │ None         │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 17  │ NO_NAME             │ ./SUMMARY/BEHAVIOR/NAME[sum(./SAME | ./CLOSE | ./LIKELY | ./PLAUSIBLE | ./UNLIKELY | ./NO_CHANCE) = 0]  │ None         │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 18  │ ORG_NAME            │ ./FRAGMENT[./GNR_ORG_NAME>0]                                                                            │ 306          │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤
│ 19  │ PART_NO_NAME        │ ./FRAGMENT[./PART_NAME>0 or ./NO_NAME>0]                                                                │ 14,17        │
├─────┼─────────────────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────┤

Principles

Before the principles are applied, the features and expressions created for an incoming record are used to find candidates. An example of an expression is name plus DOB: an expression call on the “name” feature creates it automatically whenever both a name and a DOB are present on the incoming record. Features and expressions used for candidates are also referred to as candidate builders or candidate keys.

  • listFeatures: to see what features are used for candidates
  • setFeature: to toggle whether or not a feature is used for candidates
  • listExpressionCalls: to see what expressions are currently being created
  • addToNamehash: to add an element from another feature to the list of composite name keys
  • addExpressionCall: to add a new expression call, aka candidate key
  • listGenericThresholds: to see when candidate keys will become generic and are no longer used to find candidates
  • setGenericThreshold: to change when features with certain behaviors become generic
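
The candidate-key idea can be sketched in a few lines of Python. This is a toy model, not Senzing's algorithm: the key format, the `GENERIC_THRESHOLD` value, the function names and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict

GENERIC_THRESHOLD = 3  # hypothetical: keys held by more records than this are ignored


def candidate_keys(record: dict) -> set:
    """Build simple candidate keys, including a composite NAME+DOB expression."""
    keys = set()
    if "PHONE" in record:
        keys.add(("PHONE", record["PHONE"]))
    if "NAME_FULL" in record and "DOB" in record:
        keys.add(("NAME+DOB", record["NAME_FULL"].lower() + "|" + record["DOB"]))
    return keys


def find_candidates(incoming: dict, index: dict) -> set:
    """Return ids of records sharing a non-generic key with the incoming record."""
    found = set()
    for key in candidate_keys(incoming):
        holders = index.get(key, set())
        if len(holders) <= GENERIC_THRESHOLD:  # skip keys that have gone generic
            found |= holders
    return found


records = {
    "A": {"NAME_FULL": "John Smith", "DOB": "1980-05-15", "PHONE": "555-0000"},
    "B": {"NAME_FULL": "Jon Smyth", "DOB": "1975-01-01", "PHONE": "555-0000"},
    "C": {"NAME_FULL": "Ann Lee", "PHONE": "555-0000"},
    "D": {"NAME_FULL": "Raj Patel", "PHONE": "555-0000"},
}
index = defaultdict(set)
for record_id, record in records.items():
    for key in candidate_keys(record):
        index[key].add(record_id)

incoming = {"NAME_FULL": "john smith", "DOB": "1980-05-15", "PHONE": "555-0000"}
print(find_candidates(incoming, index))
```

Here the shared phone number is held by four records, so it exceeds the threshold and is skipped, while the NAME+DOB key still finds record A.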

Once the candidate matches have been found, scoring and rule evaluation take place. Scores are rolled up by behavior. For instance, both addresses and phones have the behavior FF (Frequency Few). If they both score above their scoring function’s close threshold, there would be two CLOSE_FFs (a fragment), which can be used in a rule such as NAME+CLOSE_FF.
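
A minimal Python sketch of the roll-up described here. The FF behavior for addresses and phones comes from the text; the `CLOSE_THRESHOLD` value and the function name are hypothetical, and the real thresholds are defined per comparison function.

```python
from collections import Counter

# Behavior codes for the two features mentioned above.
BEHAVIOR = {"ADDRESS": "FF", "PHONE": "FF"}
CLOSE_THRESHOLD = 80  # hypothetical value for illustration


def roll_up(feature_scores: dict) -> Counter:
    """Count CLOSE_<behavior> fragments from per-feature scores."""
    fragments = Counter()
    for feature, score in feature_scores.items():
        behavior = BEHAVIOR.get(feature)
        if behavior and score >= CLOSE_THRESHOLD:
            fragments["CLOSE_" + behavior] += 1
    return fragments


print(roll_up({"ADDRESS": 85, "PHONE": 100, "NAME": 95}))
```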

Commands that help with configuring principles (rules) and scoring:

  • listRules: these are the principles that are applied top down
  • listFragments: rules are combinations of fragments like close_name or same_name
  • listFunctions: the comparison functions show you what is considered same, close, likely, etc.
  • setRule: to change whether an existing rule resolves or relates
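
Top-down evaluation can be pictured as walking an ordered list and stopping at the first rule whose fragments have all fired. The Python sketch below uses three invented rules; the names, the fragment sets and the resolve/relate outcomes are hypothetical and do not reflect the shipped configuration.

```python
# Hypothetical rules in evaluation order: (name, required fragments, outcome).
RULES = [
    ("SAME_NAME_CLOSE_FF", {"SAME_NAME", "CLOSE_FF"}, "resolve"),
    ("CLOSE_NAME_CLOSE_FF", {"CLOSE_NAME", "CLOSE_FF"}, "resolve"),
    ("CLOSE_FF_ONLY", {"CLOSE_FF"}, "relate"),
]


def first_matching_rule(fragments: set):
    """Walk the rules top down and return the first one whose fragments all fired."""
    for name, required, outcome in RULES:
        if required <= fragments:  # subset test: every required fragment is present
            return name, outcome
    return None


print(first_matching_rule({"CLOSE_NAME", "CLOSE_FF"}))
print(first_matching_rule({"CLOSE_FF"}))
print(first_matching_rule({"DIFF_NAME"}))
```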

sz_explorer

Ad hoc entity commands:

  • search: search for entities by name and/or other attributes.
  • get: get an entity by entity ID or record_id.
  • compare: place two or more entities side by side for easier comparison.
  • how: get a step by step replay of how an entity came together.
  • why: see why entities or records either did or did not resolve.
  • tree: see a tree view of an entity’s relationships through 1 or 2 degrees.
  • export: export the json records for an entity for debugging or correcting and reloading.

Snapshot reports (requires a json file generated by sz_snapshot):

  • data_source_summary: shows how many duplicates were detected within each data source, as well as the possible matches and relationships that were derived. For example, how many duplicate customers there are, and are any of them related to each other.
  • cross_source_summary: shows how many matches were made across data sources. For example, how many employees are related to customers.
  • entity_source_summary: shows the number of entities by the set of data sources they can be found in. For example, how many entities are only in one data source, how many are only in these two data sources, etc.
  • entity_size_breakdown: shows how many entities of what size were created. For instance, some entities are singletons, some might have connected 2 records, some 3, etc. This report is primarily used to ensure there are no instances of over matching. For instance, it’s ok for an entity to have hundreds of records as long as there are not too many different names, addresses, identifiers, etc.
  • principles_used: shows what principles and match_keys are firing across all data sources. For example, how many name and address matches, how many address only, etc.
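
To show what an entity size breakdown measures, here is a small Python sketch that derives one from a record-to-entity mapping. The mapping is made up and the function name is hypothetical; the actual report is produced from the sz_snapshot file.

```python
from collections import Counter

# Made-up mapping of (data source, record id) to the entity it resolved into.
record_to_entity = {
    ("CUSTOMERS", "1069"): 51,
    ("CUSTOMERS", "1070"): 51,
    ("REFERENCE", "2013"): 51,
    ("CUSTOMERS", "1001"): 7,
    ("WATCHLIST", "9"): 8,
}


def entity_size_breakdown(mapping: dict) -> dict:
    """Return {entity size: number of entities of that size}."""
    records_per_entity = Counter(mapping.values())
    entities_per_size = Counter(records_per_entity.values())
    return dict(sorted(entities_per_size.items()))


print(entity_size_breakdown(record_to_entity))
```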

Audit report (requires a json file generated by sz_audit):

  • audit_summary: shows the precision, recall and F1 scores with the ability to browse the entities that were split or merged.
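
For reference, the three scores follow the standard definitions, sketched below in Python. How sz_audit counts the underlying pairs is not covered here, so the counts are made up and the function name is hypothetical.

```python
def pairwise_scores(true_positive: int, false_positive: int, false_negative: int):
    """Return (precision, recall, F1) from counts, using 0.0 for empty denominators."""
    predicted = true_positive + false_positive
    actual = true_positive + false_negative
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up counts: 90 correct merges, 10 wrong merges, 30 missed merges.
precision, recall, f1 = pairwise_scores(90, 10, 30)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```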

Other commands:

  • quick_look: show the number of records in the repository by data source without a snapshot.
  • load: load a snapshot or audit report json file.
  • score: show the scores of any two names, addresses, identifiers, or combination thereof.
  • set: various settings affecting how entities are displayed.
  • show_last_call: shows the last request and response message sent to the Senzing SDK

get

(szeda) get CUSTOMERS 1070

Entity summary for entity 51: Jie Wang
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ Sources   │ Features                               │ Additional Data │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ CUSTOMERS │ NAME: Jie Wang (PRIMARY)               │ AMOUNT: 100     │
│ 1069      │ NAME: 王杰 (NATIVE)                    │ AMOUNT: 200     │
│ 1070      │ DOB: 9/14/93                           │ DATE: 1/26/18   │
│           │ GENDER: Male                           │ DATE: 1/27/18   │
│           │ GENDER: M                              │ STATUS: Active  │
│           │ ADDRESS: 12 Constitution Street (HOME) │                 │
│           │ NATIONAL_ID: 832721 Hong Kong          │                 │
│           │ NATIONAL_ID: 832721                    │                 │
│           │ RECORD_TYPE: PERSON                    │                 │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ REFERENCE │ NAME: Wang Jie (PRIMARY)               │ CATEGORY: Owner │
│ 2013      │ DOB: 1993-09-14                        │ STATUS: Current │
│           │ RECORD_TYPE: PERSON                    │                 │
│           │ REL_POINTER: 2011 (OWNS 60%)           │                 │
┼───────────┼────────────────────────────────────────┼─────────────────┼
└── Disclosed relation (1)
    └── --> OWNS 60% (1)
        └── 97: Hajah Mamunah Jln Pisang CUSTOMERS (1) | REFERENCE (1) +REL_POINTER(OWNS 60%:)

how (tree)

(szeda) how 51

How decision tree for 51: Jie Wang

V51-S2 final entity
├── Step 2: Add record to virtual entity on NAME+DOB  Principle 180: SNAME_SSTAB to create V51-S2
│   ┌──────────────┬─────────────────┬────────┬──────────────────────┐
│   │ VIRTUAL_ID   │ V100002         │ SCORES │ V51-S1               │
│   ├──────────────┼─────────────────┼────────┼──────────────────────┤
│   │ DATA_SOURCES │ REFERENCE: 2013 │        │ CUSTOMERS: 2 records │
│   │ NAME         │ 王杰            │  100   │ Wang Jie             │
│   │ DOB          │ 9/14/93         │  100   │ 1993-09-14           │
│   │ RECORD_TYPE  │ PERSON          │  100   │ PERSON               │
│   └──────────────┴─────────────────┴────────┴──────────────────────┘
│
└── Step 1: Create virtual entity on NAME+DOB+ADDRESS+NATIONAL_ID  Principle 131: CF1_PNAME_CFF_CSTAB to create V51-S1
    ┌──────────────┬────────────────────────┬────────┬────────────────────────┐
    │ VIRTUAL_ID   │ V51                    │ SCORES │ V54                    │
    ├──────────────┼────────────────────────┼────────┼────────────────────────┤
    │ DATA_SOURCES │ CUSTOMERS: 1070        │        │ CUSTOMERS: 1069        │
    │ NAME         │ Jie Wang               │   95   │ 王杰                   │
    │ DOB          │ 9/14/93                │  100   │ 9/14/93                │
    │ GENDER       │ Male                   │  100   │ M                      │
    │ ADDRESS      │ 12 Constitution Street │   85   │ 12 Constitution Street │
    │ NATIONAL_ID  │ 832721 Hong Kong       │   95   │ 832721                 │
    │ RECORD_TYPE  │ PERSON                 │  100   │ PERSON                 │
    └──────────────┴────────────────────────┴────────┴────────────────────────┘

Load Sample Data with sz-file-loader

First register data sources with sz_configtool:

sz_configtool
addDataSource CUSTOMERS
addDataSource REFERENCE
addDataSource WATCHLIST
save

Then hydrate them with senzing/sz-file-loader:

  1. docker run -it --rm -u $UID -v ${PWD}/data/senzing/data/:/data --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz-file-loader -f /data/customers.jsonl
  2. docker run -it --rm -u $UID -v ${PWD}/data/senzing/data/:/data --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz-file-loader -f /data/reference.jsonl
  3. docker run -it --rm -u $UID -v ${PWD}/data/senzing/data/:/data --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz-file-loader -f /data/watchlist.jsonl
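
If you need to produce your own input file, the Python sketch below writes records in JSON Lines form (one JSON object per line), which is what the .jsonl extension implies. That each line should carry DATA_SOURCE and RECORD_ID is my assumption, and the helper name and sample values are illustrative only.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records: list, path: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(records)


# Illustrative records only.
customers = [
    {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1069", "NAME_FULL": "王杰"},
    {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1070", "NAME_FULL": "Jie Wang"},
]
target = Path(tempfile.mkdtemp()) / "customers.jsonl"
print(write_jsonl(customers, target), "records written to", target)
```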

Explore with sz_explorer.

Resources