Releases

Release 2024.01.00

Notable changes

RACI claim change

An additional default claim has been added to the DeduplicationReviewer and DeduplicationAdministrator roles. The management.duplicates claim is set to Informed and Accountable, respectively. Make sure you apply this claim to these roles, and to any other relevant roles, yourself.

Platform upgrades

.NET upgrade

Upgraded the CluedIn platform to .NET 6.

Any custom-built extensions for CluedIn should be upgraded to match CluedIn’s .NET 6 platform reference versions. In most cases, this involves updating the CluedIn NuGet package dependencies to version 4.0.0 and changing the target framework of the extension to .NET 6.

Neo4j upgrade

Upgraded Neo4j from 3.5.35 to 5.12.

This brings several improvements from Neo4j. For details, see the Neo4j 5 changelog: https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog#5120

MS SQL upgrade

Upgraded Microsoft SQL Server from 2017 to 2022.

RabbitMQ upgrade

Upgraded RabbitMQ from 3.10 to 3.12.

See the documentation for details on how to perform the upgrade.

New features

Profiling

Added profiling of vocabulary keys based on their strongly typed data types. Profiling charts show the usage of vocabulary keys and aggregated data, based on the actual strongly typed data type. Unmapped data types are profiled based on their text values. Profiling also shows system-wide information such as entity type usage and distribution.

Copilot

Copilot is an intelligent AI assistant integrated into the CluedIn product. It is designed to help users perform various tasks and streamline their workflow. With Copilot, users can create and manage rules, stream data, create and update datasets, and perform data quality checks.

Copilot provides a user-friendly interface that allows users to interact and communicate with the AI assistant through natural language commands. It can activate, clone, and deactivate rules, as well as provide suggestions for data set mapping and vocabulary key rules. Copilot can also generate data quality metrics and detect anomalies within the values of a vocabulary key.

In addition to rule management and data processing tasks, Copilot can perform entity search and provide information about entities and data sets. It can help users create clean, deduplication, and survivorship projects, and even provide explanations for deduplication groups within projects.

With its wide range of features and capabilities, Copilot aims to enhance the user experience and make working with CluedIn more efficient and productive. For information on how to enable Copilot in CluedIn, refer to Copilot Integration.

Entity type configurations from connectors

CluedIn has enhanced several enrichers, including PermId, KnowledgeGraph, DuckDuckGo, CompanyHouse, ClearBit, GoogleMaps, libpostal, VatLayer, and CVR. The key improvement is the ability to configure entity types and vocabulary keys directly within the CluedIn user interface. This update provides users with greater flexibility and customization options, allowing them to tailor these enrichers to their specific needs without restrictions.

Data removal from data source

Added the ability to remove (undo) the processing of data from a data source. This makes it easy to experiment with data and mappings in the data source.

Any side effects from ingesting new data into the system will be undone as well (for example, merging based on entity codes added from the data source and enrichments based on data that will be removed).

Example

Initial state:

Data set 1 ingested:

Data set 2 ingested:

Scenario:

When data set 3 is ingested:

Undoing the processing of data set 3 splits A and B apart into two separate entities again.
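The scenario above can be sketched in a few lines of Python. This is only an illustration of the idea, not how CluedIn implements it: the record names, the entity codes, and the functions resolve_entities and entities_after are all hypothetical, and the sketch simply recomputes the entities from whichever data sets are still processed.

```python
from collections import defaultdict


def resolve_entities(records):
    """Group record names so that records sharing any entity code form one entity."""
    parent = {}

    def find(node):
        # Walk up to the root, compressing the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for name, codes in records.items():
        parent.setdefault(("record", name), ("record", name))
        for code in codes:
            parent.setdefault(("code", code), ("code", code))
            # Link the record to each of its entity codes.
            parent[find(("record", name))] = find(("code", code))

    groups = defaultdict(list)
    for name in records:
        groups[find(("record", name))].append(name)
    return sorted(sorted(group) for group in groups.values())


def entities_after(data_sets, processed):
    """Resolve entities using only the data sets that are currently processed."""
    records = {}
    for data_set_id in processed:
        records.update(data_sets[data_set_id])
    return resolve_entities(records)


# Hypothetical data: each data set maps a record name to its entity codes.
data_sets = {
    1: {"A": {"code-1"}},
    2: {"B": {"code-2"}},
    3: {"C": {"code-1", "code-2"}},  # carries both codes, so it links A and B
}

print(entities_after(data_sets, [1, 2]))     # A and B are separate entities
print(entities_after(data_sets, [1, 2, 3]))  # data set 3 merges them
print(entities_after(data_sets, [1, 2]))     # undoing data set 3 splits them again
```

In this sketch the merge of A and B is a side effect of data set 3 only, so removing data set 3 from the processed list removes the merge as well.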

Delta clues

Clues normally represent a full record in a source system.

Added a new type of clue, the delta clue, which describes incremental modifications to an entity. This allows removals to be applied to an entity.

The improvements to edge modification on the Relations tab of the entity page are built on delta clues, which track the modifications made to an entity.
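As a rough illustration of the difference between a full record and an incremental modification, the sketch below applies a delta to an in-memory entity. The DeltaClue class, its field names, and the dictionary layout of the entity are assumptions made for this example and do not reflect the actual CluedIn clue format.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaClue:
    """Hypothetical shape: an incremental change rather than a full record."""
    set_properties: dict = field(default_factory=dict)
    remove_properties: list = field(default_factory=list)
    add_edges: list = field(default_factory=list)
    remove_edges: list = field(default_factory=list)


def apply_delta_clue(entity, delta):
    """Return a new entity with the delta's additions and removals applied."""
    properties = {**entity["properties"], **delta.set_properties}
    for key in delta.remove_properties:
        properties.pop(key, None)  # removing a missing key is a no-op
    edges = (set(entity["edges"]) | set(delta.add_edges)) - set(delta.remove_edges)
    return {"properties": properties, "edges": edges}


entity = {
    "properties": {"name": "Acme", "fax": "555-0100"},
    "edges": {("/WorksFor", "B")},
}
delta = DeltaClue(
    remove_properties=["fax"],
    add_edges=[("/PartOf", "C")],
    remove_edges=[("/WorksFor", "B")],
)
updated = apply_delta_clue(entity, delta)
print(updated)
```

The point of the sketch is that the delta states explicitly what is removed, which a clue describing the full record has no direct way to express.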

Strong typing

Strong typing brings the ability to understand what types of data (date, string, integer, etc.) are used within the system.

Before strong typing, data was stored as strings. Strong typing gives the ability to store your data in a structured way, allowing CluedIn to query based on specific data types. This gives an improved workflow in areas such as the rules engine and search.

For example, before strong typing, if you ingested a date (such as a birthday), it was not easy to retrieve all records with a date between x and y.
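The following Python sketch shows why a date range query is unreliable on text values and straightforward on typed values. The records, the day/month/year text format, and the function names are made up for the illustration; CluedIn's own storage and query mechanics are not shown.

```python
from datetime import date, datetime

# Hypothetical records whose birthdays were ingested as plain text (day/month/year).
raw = [
    {"name": "Ann", "birthday": "15/03/1990"},
    {"name": "Bob", "birthday": "02/11/1985"},
    {"name": "Cy", "birthday": "28/01/1992"},
]


def between_as_text(records, start, end):
    """Range filter on strings: compares character by character, not by calendar."""
    return [r["name"] for r in records if start <= r["birthday"] <= end]


def to_typed(records):
    """Convert the text birthdays into real date values."""
    return [
        {**r, "birthday": datetime.strptime(r["birthday"], "%d/%m/%Y").date()}
        for r in records
    ]


def between_as_date(records, start, end):
    """Range filter on typed dates: compares by calendar order."""
    return [r["name"] for r in records if start <= r["birthday"] <= end]


# Everyone born from 1986 through 1991.
print(between_as_text(raw, "01/01/1986", "31/12/1991"))
print(between_as_date(to_typed(raw), date(1986, 1, 1), date(1991, 12, 31)))
```

The text comparison returns all three names because it orders the values by their leading day digits, while the typed comparison returns only Ann.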

Strong typing is enabled by default, and you will have to explicitly opt out to disable this feature.

Important! Strong typing does not necessarily impact any of your existing data.

For existing vocabulary keys to be strongly typed, you will need to change the data type and storage on the vocabulary key itself. However, there is an exception to this rule. CluedIn’s core vocabularies are strongly typed by default. As such, any data using a core vocabulary will be strongly typed after you perform the upgrade.

Core Vocabulary Name
Activity
Audio
Bank Account
Commit
Computer
Database
Date
Department
Discussion
Document
Event
File
Folder
Geocode
Group
HR Work Schedule
Image
Location
Lookup Data
Mail
Message
Network Address
Organization
Payment Card
Person
Phone Number
Phone Number Composite
Presentation
Product
Project
Repository
Sale
Skill
Social Links
Spreadsheet
Task
User Codes
User
Video

Global Data Model

The Global Data Model feature offers a comprehensive visual representation of the relationships between entity types within your organization’s data ecosystem. This feature enables you to explore the connections and associations between different entity types, giving you a deeper understanding of your data structure and its interrelationships.

Cluster-wide locking

Added a new mechanism for synchronizing the execution of work inside the CluedIn application in multi-machine deployments. Processing context locks have been updated to use this new synchronization mechanism, which now works across multiple processing instances.

Note: This feature is only relevant to people who write providers or extensions for CluedIn.
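For readers who write extensions, the general idea of a lock that is visible to every processing instance can be sketched as follows. This is a conceptual stand-in only: SharedLockStore and cluster_lock are hypothetical names, the shared store is simulated with an in-process dictionary, and none of this is the CluedIn API.

```python
import threading
from contextlib import contextmanager


class SharedLockStore:
    """Stand-in for a store that every processing instance can reach."""

    def __init__(self):
        self._owners = {}
        self._guard = threading.Lock()

    def try_acquire(self, key, owner):
        """Claim the key for this owner; fail if any instance already holds it."""
        with self._guard:
            if key in self._owners:
                return False
            self._owners[key] = owner
            return True

    def release(self, key, owner):
        """Release the key, but only if this owner is the one holding it."""
        with self._guard:
            if self._owners.get(key) == owner:
                del self._owners[key]


@contextmanager
def cluster_lock(store, key, owner):
    """Hold the lock for the duration of a block of work."""
    if not store.try_acquire(key, owner):
        raise RuntimeError(f"{key!r} is locked by another instance")
    try:
        yield
    finally:
        store.release(key, owner)


store = SharedLockStore()
with cluster_lock(store, "processing-context", "instance-1"):
    # A second instance cannot enter the same section while the lock is held.
    print(store.try_acquire("processing-context", "instance-2"))
# After the block exits, the lock is free again.
print(store.try_acquire("processing-context", "instance-2"))
```

A real cluster-wide lock would also need to handle an instance that crashes while holding the lock, for example with expiring leases, which this sketch leaves out.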

Enhancements

Microservice improvements

Annotation, DataSource, and Submitter have been completely rewritten to improve performance, stability, and scaling.

The Annotation, DataSource, and Submitter services have been replaced by new services named DataSource, DataSourceProcessing, and DataSourceSubmitter, each of which is responsible for a specific part of the ingestion process.

Stream improvements

Streams have a new workflow that provides more flexibility. The activate and deactivate actions have been replaced by start, pause, and stop.

You will see new prompts, warnings, and statuses that provide more information as you perform actions on your streams.
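The start, pause, and stop actions can be thought of as a small state machine. The sketch below is one plausible reading of that workflow: the state names and the set of allowed transitions are assumptions made for this example, not a statement of what CluedIn permits.

```python
from enum import Enum


class StreamState(Enum):
    STOPPED = "stopped"
    STARTED = "started"
    PAUSED = "paused"


# Assumed transitions: (current state, action) -> next state.
TRANSITIONS = {
    (StreamState.STOPPED, "start"): StreamState.STARTED,
    (StreamState.PAUSED, "start"): StreamState.STARTED,
    (StreamState.STARTED, "pause"): StreamState.PAUSED,
    (StreamState.STARTED, "stop"): StreamState.STOPPED,
    (StreamState.PAUSED, "stop"): StreamState.STOPPED,
}


def apply_action(state, action):
    """Return the next state, or raise if the action is not allowed from this state."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a stream that is {state.value}") from None


state = StreamState.STOPPED
for action in ["start", "pause", "start", "stop"]:
    state = apply_action(state, action)
    print(action, "->", state.value)
```

Keeping the transitions in a single table makes it easy to see which actions should trigger a prompt or warning instead of a state change.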

Performance of the stream ingestion tables has been significantly improved, allowing CluedIn to be much smarter about the messages that get sent to the connectors.

Note: The improved performance of the stream ingestion tables applies only to streams that are created after this release or that are stopped and started again.

Streams can now export edge properties as part of the export target configuration.

Streams now link to export target health checks and will make better decisions when exporting data. A new badge indicates when the stream is Exporting or Not Exporting data (given the export target has been configured and the stream is started). For more information about streams, refer to our documentation portal.

Export target health checks

Health checks have been added to export targets, allowing each processing pod to know the status of any active export target. This allows streams that utilize these export targets to act accordingly when an export target health check reports an unhealthy state.
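As a minimal sketch of how a stream might act on health check results, the code below derives the Exporting or Not Exporting badge and holds back records while the target is unhealthy. The ExportTarget class, the function names, and the decision to hold records rather than drop them are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ExportTarget:
    """Hypothetical view of an export target as seen by a processing pod."""
    name: str
    configured: bool
    healthy: bool


def export_badge(stream_started, target):
    """Return the badge text, or None when no badge applies.

    Assumed rule: a badge is shown only when the target is configured and the
    stream is started; the health check result then selects the badge.
    """
    if not (target.configured and stream_started):
        return None
    return "Exporting" if target.healthy else "Not Exporting"


def dispatch(records, stream_started, target):
    """Split records into those sent now and those held until the target recovers."""
    if export_badge(stream_started, target) == "Exporting":
        return list(records), []
    return [], list(records)


target = ExportTarget(name="sql-target", configured=True, healthy=False)
print(export_badge(True, target))       # Not Exporting
print(dispatch(["r1", "r2"], True, target))

target.healthy = True
print(export_badge(True, target))       # Exporting
print(dispatch(["r1", "r2"], True, target))
```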

Entity relations improvements

Refactored the graph using HTML components, allowing better interaction with the graph. A context menu allows edges to be added and removed. Edges can be edited, and properties can be added to them. Shadow nodes and temporal nodes are filtered out by default. Large graphs group collections of similar edges together, and these groups can be easily expanded. The edges of an entity can be expanded out to the nth level.

Entity history improvements

Improved history to provide filters based on origin (branch), allowing users to view all changes from a single source.

Added a collapsible menu listing all metadata and properties on the entity, allowing filtering of the history by one or more properties.

When filtering by properties, the top panel displays the actual value for the golden record. Changes show the author and source where possible, with links through to the author or source. For more details, see History.

Packages

For this release, use the exact versions listed below for the following packages.

Connectors

Name Version
CluedIn.Connector.AzureDataLake 4.0.0
CluedIn.Connector.AzureDedicatedSqlPool 4.0.0
CluedIn.Connector.AzureEventHub 4.0.0
CluedIn.Connector.AzureServiceBus 4.0.0
CluedIn.Connector.Http 4.0.0
CluedIn.Connector.SqlServer 4.0.0
CluedIn.PowerApps 4.0.1
CluedIn.Connector.Dataverse 4.0.1

Enrichers

Name Version
CluedIn.ExternalSearch.Providers.DuckDuckGo.Provider 4.0.0
CluedIn.ExternalSearch.Providers.PermId.Provider 4.0.0
CluedIn.ExternalSearch.Providers.Web 4.0.0
CluedIn.Provider.ExternalSearch.Bregg 4.0.0
CluedIn.Provider.ExternalSearch.ClearBit 4.0.0
CluedIn.Provider.ExternalSearch.CompanyHouse 4.0.0
CluedIn.Provider.ExternalSearch.CVR 4.0.0
CluedIn.Provider.ExternalSearch.Gleif 4.0.0
CluedIn.Provider.ExternalSearch.GoogleMaps 4.0.0
CluedIn.Provider.ExternalSearch.KnowledgeGraph 4.0.0
CluedIn.Provider.ExternalSearch.Libpostal 4.0.0
CluedIn.Provider.ExternalSearch.OpenCorporates 4.0.0
CluedIn.Provider.ExternalSearch.Providers.VatLayer 4.0.0
CluedIn.Provider.MasterDataServices 4.0.0

Crawlers

Name Version
CluedIn.Crawling.MasterDataServices 4.0.0
CluedIn.Purview 4.0.0

Other

Name Version
CluedIn.Vocabularies.CommonDataModel 4.0.1
CluedIn.EventHub 4.0.0

Tags

Clean

Docker Image Tags
cluedin/cluedin-micro-clean 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77575

Controller

Docker Image Tags
cluedin/controller 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77579

Docs

Docker Image Tags
cluedin/cluedin-micro-documentation 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77578

Gql

Docker Image Tags
cluedin/cluedin-ui-gql 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77576

Microservices

Docker Image Tags
cluedin/data-source 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574
cluedin/data-source-processing 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574
cluedin/data-source-submitter 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574

Runtime

Docker Image Tags
cluedin/neo4j 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580
cluedin/openrefine 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580
cluedin/sqlserver-home 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580

Server

Docker Image Tags
cluedin/cluedin-server 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77585, 4.0.0_77585-alpine, 4.0.0-alpine, 4.0-alpine
cluedin/cluedin-server 2024.01.00, 2024.01, 4.0.0_77585-ubuntu, 4.0.0-ubuntu, 4.0-ubuntu
cluedin/nuget-installer 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77585, 4.0.0_77585-alpine, 4.0.0-alpine, 4.0-alpine
cluedin/nuget-installer 2024.01.00, 2024.01, 4.0.0_77585-ubuntu, 4.0.0-ubuntu, 4.0-ubuntu

UI

Docker Image Tags
cluedin/ui 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77577