Releases

Release 2024.01.00

Notable changes

RACI claim change

An additional default claim has been added to the DeduplicationReviewer and DeduplicationAdministrator roles: the management.duplicates claim is set to Informed and Accountable, respectively. Make sure you apply this claim to these roles, and to any other relevant roles, yourself.

Platform upgrades

.NET upgrade

Upgraded the CluedIn platform to .NET 6.

Any custom-built extensions for CluedIn should be upgraded to match CluedIn’s .NET 6 platform reference versions. In most cases, this would involve updating CluedIn NuGet package dependencies to version 4.0.0 and updating the target framework for the extension to .NET 6.

Neo4j upgrade

Upgraded Neo4j from 3.5.35 to 5.12.

This brings several general improvements from Neo4j. See the Neo4j 5 changelog: https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog#5120

MS SQL upgrade

Upgraded Microsoft SQL Server from 2017 to 2022.

RabbitMQ upgrade

Upgraded RabbitMQ from 3.10 to 3.12.

See the documentation for details on how to perform the upgrade.

New features

Profiling

Added profiling of vocabulary keys based on their strongly typed data types. Profiling charts show the usage of vocabulary keys and aggregated data, based on the actual strongly typed data type. Unmapped data types are profiled based on their text values. Profiling also shows system-wide information such as entity type usage and distribution.

Copilot

Copilot is an intelligent AI assistant integrated into the CluedIn product. It is designed to help users perform various tasks and streamline their workflow. With Copilot, users can create and manage rules, stream data, create and update datasets, and perform data quality checks.

Copilot provides a user-friendly interface that allows users to interact and communicate with the AI assistant through natural language commands. It can activate, clone, and deactivate rules, as well as provide suggestions for data set mapping and vocabulary key rules. Copilot can also generate data quality metrics and detect anomalies within the values of a vocabulary key.

In addition to rule management and data processing tasks, Copilot can perform entity search and provide information about entities and data sets. It can help users create clean, deduplication, and survivorship projects, and even provide explanations for deduplication groups within projects.

With its wide range of features and capabilities, Copilot aims to enhance the user experience and make working with CluedIn more efficient and productive. For information on how to enable Copilot in CluedIn, refer to Copilot Integration.

Entity type configurations from connectors

CluedIn has enhanced several enrichers, including PermId, KnowledgeGraph, DuckDuckGo, CompanyHouse, ClearBit, GoogleMaps, libpostal, VatLayer, and CVR. The key improvement is the ability to configure entity types and vocabulary keys directly within the CluedIn user interface. This update provides users with greater flexibility and customization options, allowing them to tailor these enrichers to their specific needs without restrictions.

Data removal from data source

Added the ability to remove (undo) the processing of data from the data source. This makes it easy to experiment with data and mappings in the data source.

Any side effects from ingesting new data into the system will be undone as well (for example, merging based on entity codes added from the data source and enrichments based on data that will be removed).

Example

Initial state:

Data set 1 ingested:

Entity A

Data set 2 ingested:

Entity B

Scenario:

When data set 3 is ingested:

Data set 3 contains codes that overlap with both A and B, so entities A and B are merged.

Undoing the processing of data set 3 splits A and B apart into two separate entities again.
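
The scenario above can be modelled with a small sketch. This is a toy illustration and not CluedIn's implementation: the `group_entities` function, the record name `C`, and the code values such as `code-a` are hypothetical, and it assumes that records sharing an entity code end up in the same entity. Undoing is modelled simply as regrouping without the removed data set.

```python
def group_entities(datasets):
    """Group records into entities; records sharing any code are merged.

    `datasets` maps a data set name to a list of (record_name, codes) pairs.
    Returns a sorted list of sorted record-name lists, one list per entity.
    """
    parent = {}

    def find(node):
        # Follow parent links to the representative of the group.
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    records = [rec for rows in datasets.values() for rec in rows]
    for name, codes in records:
        parent.setdefault(("record", name), ("record", name))
        for code in codes:
            parent.setdefault(("code", code), ("code", code))
            # Link the record's group to the code's group.
            parent[find(("record", name))] = find(("code", code))

    groups = {}
    for name, _ in records:
        groups.setdefault(find(("record", name)), []).append(name)
    return sorted(sorted(group) for group in groups.values())


datasets = {
    "data set 1": [("A", ["code-a"])],
    "data set 2": [("B", ["code-b"])],
}
print(group_entities(datasets))  # [['A'], ['B']]

# Hypothetical record C carries codes that overlap with both A and B.
datasets["data set 3"] = [("C", ["code-a", "code-b"])]
print(group_entities(datasets))  # [['A', 'B', 'C']]

# Undoing data set 3 removes the overlap, so A and B split apart again.
del datasets["data set 3"]
print(group_entities(datasets))  # [['A'], ['B']]
```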

Delta clues

Clues normally represent a full record in a source system.

Added a new type of clue that is used to describe incremental modifications to an entity. This enables removal modifications to be applied to an entity.

The improvements to edge relationship modifications on the Relations tab of the entity page are built on delta clues, which track the modifications made to an entity.

Strong typing

Strong typing brings the ability to understand what types of data (date, string, integer, etc.) are used within the system.

Before strong typing, data was stored as strings. Strong typing gives the ability to store your data in a structured way, allowing CluedIn to query based on specific data types. This gives an improved workflow in areas such as the rules engine and search.

For example, if you ingested a date (such as a birthday) before strong typing, it was not easy to retrieve all records with a value between date x and date y.
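
To illustrate why string storage makes range queries awkward, here is a minimal Python sketch. The sample birthdays and the day/month/year text format are assumptions chosen for illustration; they are not taken from CluedIn.

```python
from datetime import date, datetime

# Hypothetical birthdays stored as plain text in day/month/year format.
birthdays_as_text = ["15/03/1990", "02/11/1985", "28/07/1992"]

# Range filter on text: strings compare character by character, so the
# filter lets through values that lie outside the intended date range.
start_text, end_text = "01/01/1988", "31/12/1991"
text_matches = [b for b in birthdays_as_text if start_text <= b <= end_text]
print(len(text_matches))  # 3, although only one birthday is in range

# Range filter on typed values: dates compare chronologically.
birthdays = [datetime.strptime(b, "%d/%m/%Y").date() for b in birthdays_as_text]
start, end = date(1988, 1, 1), date(1991, 12, 31)
typed_matches = [b for b in birthdays if start <= b <= end]
print(typed_matches)  # [datetime.date(1990, 3, 15)]
```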

Strong typing is enabled by default, and you will have to explicitly opt out to disable this feature.

Important! Strong typing does not necessarily impact any of your existing data.

For existing vocabulary keys to be strongly typed, you will need to change the data type and storage on the vocabulary key itself. However, there is an exception to this rule. CluedIn’s core vocabularies are strongly typed by default. As such, any data using a core vocabulary will be strongly typed after you perform the upgrade.

Core Vocabulary Name
Activity
Audio
Bank Account
Commit
Computer
Database
Date
Department
Discussion
Document
Event
File
Folder
Geocode
Group
HR Work Schedule
Image
Location
Lookup Data
Mail
Message
Network Address
Organization
Payment Card
Person
Phone Number
Phone Number Composite
Presentation
Product
Project
Repository
Sale
Skill
Social Links
Spreadsheet
Task
User Codes
User
Video

Global Data Model

The Global Data Model feature offers a comprehensive visual representation of the relationships between entity types within your organization’s data ecosystem. This feature enables you to explore the connections and associations between different entity types, giving you a deeper understanding of your data structure and its interrelationships.

Cluster-wide locking

Added a new mechanism for synchronizing the execution of work inside the CluedIn application in multi-machine deployments. Processing context locks have been updated to use this new synchronization mechanism, which now works across multiple processing instances.

Note: This feature is only relevant to people who write providers or extensions for CluedIn.

Enhancements

Microservice improvements

Annotation, DataSource, and Submitter have undergone a complete rewrite to allow for improved performance, stability, and scaling.

Annotation, DataSource, and Submitter have been replaced by new services named DataSource, DataSourceProcessing, and DataSourceSubmitter, each of which is responsible for a specific part of the ingestion process.

Stream improvements

Streams have a new workflow that provides more flexibility. Activating and deactivating a stream has been replaced by three actions: start, pause, and stop.

Starting: Exports all relevant data to the export target.
Pausing: Allows messages to continue to accumulate in the stream queues but no data will be exported.
Stopping: Stops messages from accumulating in the stream queues and no data will be exported.
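
The three actions can be summarised as a small lookup table. This is only a sketch of the behaviour described above: the names `StreamState`, `StreamBehaviour`, and `BEHAVIOUR` are hypothetical and not part of any CluedIn API, and the assumption that a started stream accumulates messages is inferred from the wording that pausing lets messages "continue" to accumulate.

```python
from dataclasses import dataclass
from enum import Enum


class StreamState(Enum):
    STARTED = "started"
    PAUSED = "paused"
    STOPPED = "stopped"


@dataclass(frozen=True)
class StreamBehaviour:
    accumulates_messages: bool  # do messages build up in the stream queues?
    exports_data: bool          # is data sent to the export target?


# Behaviour per state, as described in the list above.
BEHAVIOUR = {
    StreamState.STARTED: StreamBehaviour(accumulates_messages=True, exports_data=True),
    StreamState.PAUSED: StreamBehaviour(accumulates_messages=True, exports_data=False),
    StreamState.STOPPED: StreamBehaviour(accumulates_messages=False, exports_data=False),
}

for state, behaviour in BEHAVIOUR.items():
    print(state.value, behaviour)
```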

You will see new prompts, warnings, and statuses that provide more information as you perform actions on your streams.

Performance of the stream ingestion tables has been significantly improved, allowing CluedIn to be much smarter about the messages that get sent to the connectors.

Note: The improved performance of the stream ingestion tables applies only to streams created after this release or to existing streams that are stopped and started again.

Streams can now export edge properties as part of the export target configuration.

Streams now link to export target health checks and will make better decisions when exporting data. A new badge indicates when the stream is Exporting or Not Exporting data (given the export target has been configured and the stream is started). For more information about streams, refer to our documentation portal.

Export target health checks

Health checks have been added to export targets, allowing each processing pod to know the status of any active export target. This allows streams that utilize these export targets to act accordingly when an export target health check reports an unhealthy state.

Entity relations improvements

Refactored the graph using HTML components, allowing better interactions with the graph. A context menu allows the addition and removal of edges. Edges can be edited, and properties can be added to them. Shadow nodes and temporal nodes are filtered out by default. Large graphs group collections of similar edges together, which can be easily expanded. Edges of an entity can be expanded out to the nth level.

Entity history improvements

Improved history to provide filters based on origin (branch), allowing users to view all changes from a single source.

Added a collapsible menu listing all metadata and properties on the entity, allowing filtering of the history by one or more properties.

When filtering by properties, the top panel displays the actual value for the golden record. Changes show the author and source where possible with links through to the author or source. For more details, see History.

CluedIn.Controller

Features

Support for automated upgrade tasks

CluedIn

Features

Allow OpenAI to work with a single metadata property or vocabulary key
Update platform to .NET 6
Raise key operations as remote events for the following events

Event Name
AddCleaningProjectEvent
AddDatasourceEvent
AddEntityType
AddRoleEvent
AddStreamEvent
AddTaskEvent
AddUserEvent
AddVocabularyEvent
DeleteVocabularyEvent
UpdateVocabularyEvent
AddVocabularyKeyEvent
DeleteVocabularyKeyEvent
RenameVocabularyKeyEvent
UpdateVocabularyKeyEvent
ApiTokenCreatedEvent
ApiTokenRevokedEvent
CommitCleaningProjectEvent
CreateRoleRequestTaskEvent
DisableEnricherEvent
ChangeExportTargetStateEvent
ChangeStreamStateEvent
EnableEnricherEvent
FailApprovalTaskEvent
GlossaryCategoryAddedEvent
GlossaryCategoryDeletedEvent
GlossaryCategoryUpdatedEvent
GlossaryTermAddedEvent
GlossaryTermDeletedEvent
GlossaryTermUpdatedEvent
ProcessGlobalMetrics
ProcessingJobStatusUpdatedEvent
RegisterExportTargetEvent
RejectApprovalTaskEvent
RejectRoleRequestTaskEvent
RemoveCleaningProjectEvent
RemoveStreamEvent
ReprocessStreamTargetEvent
TerminateStreamReingestionEvent
RuleCreatedEvent
RuleDeletedEvent
RuleStateEvent
RuleTriggeredEvent
RuleUpdatedEvent
UpdateExportTargetEvent
UserAddedToRoleEvent
UserRemovedFromRoleEvent

Removed legacy deduplication functionality
Removed legacy training functionality
Removed legacy GDPR functionality
Removed retention functionality
Removed support for breach functionality
Support for improved entity history
Filter entity history by origin (branch), metadata property or vocabulary key
Cancel a deduplication project whilst it is merging
SSO users that are disabled in CluedIn are now blocked from logging in
Improved the RoleRequest flow to include the RACI access level
Support for starting, pausing, and stopping a stream
Support for global data model
Vocabularies can now be filtered depending on if they have been used or not
Find a user from their entity code
Support for displaying additional settings from connectors on the Entity Type page
Introduced system/cluster wide locking mechanism
Delta clues/actions
Metrics are now turned off by default
Component health checks have been added for connectors
Fetch all vocabulary keys used by a stream based on their filters
Grafana charts have improved security
Stream ingestion log performance has been greatly improved
Approval item API, service and table to handle the approval workflow using PowerApps/PowerAutomate
Improve hierarchy to support the loading of 100k entities
Copilot support
Allow editing of a page template name
Improve ElasticPotentialMatchesResolver query to use filter queries for required fields
Made part of metric processing asynchronous

New configuration options for processing context locks

Key	Default Value
`Processing.ContextLocks.ClusterWideLocks.Enabled`	`false`
`Processing.ContextLocks.AcquireTimeoutSeconds`

Add option to acquire context lock on command ProcessingKey

Key	Default Value
`Processing.ContextLocks.LockOnProcessingKeyEnabled`	`False`

Flag invalid rules/filters for UI attention when a vocabulary key's storage or data type changes
Additional RabbitMQ specific exception handling
Minimize the size of structured logging in workflows
Implemented rule and filter validation
Ability to see large value changes in entity history in the UI that were previously omitted
Invalid API routes or verbs now return a 404 status instead of redirecting to cluedin.com
Removed ElasticSearchUpdateEntityEvent

Added config flags to disable stream log checks on publishing data to connectors

Key	Default Value
`Streams.CheckStreamLogOnPublish`	`true`
`Streams.CheckStreamLogOnPublishDelete`	`true`

Additional keys controlling stream log have been renamed

Old Key	New Key	New Default Value
`Streams.PreventIngestionLogging`	`Streams.Logging.LegacyLog.Enabled`	`true`
`Streams.UseLegacyIngestionLogging`	`Streams.Logging.UseLegacyLog`	`false`
`Streams.UseUpsertForIngestionLogging`	`Streams.Logging.UseUpsert`	`false`
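
As a sketch of how the renames above could be checked against an existing configuration, the following Python helper reports which old keys are still present. The `find_renamed_keys` function and the sample settings dictionary are hypothetical. The helper deliberately reports key names only and does not copy values across, because the release notes do not state whether a value keeps the same meaning under its new key.

```python
# Old key -> new key, taken from the table above.
RENAMED_KEYS = {
    "Streams.PreventIngestionLogging": "Streams.Logging.LegacyLog.Enabled",
    "Streams.UseLegacyIngestionLogging": "Streams.Logging.UseLegacyLog",
    "Streams.UseUpsertForIngestionLogging": "Streams.Logging.UseUpsert",
}


def find_renamed_keys(settings):
    """Return {old_key: new_key} for every renamed key still present in `settings`."""
    return {old: new for old, new in RENAMED_KEYS.items() if old in settings}


# Hypothetical settings from an older deployment.
settings = {
    "Streams.UseLegacyIngestionLogging": "false",
    "Streams.CheckStreamLogOnPublish": "true",
}
for old, new in find_renamed_keys(settings).items():
    print(f"{old} has been renamed to {new}; review its value before upgrading")
```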

Fixes

Merge fails due to overflow in Entity Part ID
Dataparts with large ids are truncated when retrieving entity history
Improved Redis reconnect / disconnect impact on overall system
Missing null check in StreamPublishMappingCommandV2 resulting in invalid data for deletes in streams
Creation of hierarchy with entities that have mix-cased entity codes
Unhandled transient errors when connecting to RabbitMQ
My tasks shows generic error
Capture potential OutOfMemoryException in EasyNetQ deserialization
Workflow not sending failed workflows to the deadletter queue
Create a deduplication project with duplicate entity types
Users are not removed when an invitation is revoked
Use correct key mappings in rule operators
Topology action could be ignored if no split is performed
Export target authentication details not updating
Processing context locks did not work correctly with asynchronous code
Merge dataparts were not processed during a merge under certain circumstances
Merge would fail if one or more of the entities did not have any entity codes besides the origin entity code
Some entities could not be included in a clean project under certain circumstances
Rules and streams are not reporting vocabulary key usage correctly
Metric save operations take place even when no changes were made
Slow SQL queries on Linux
EntityReference will not deserialize with a name containing §
Refresh tokens are invalidated incorrectly
Error thrown when accessing a deleted hierarchy project
Unknown entity types throw an error when retrieving page templates
IsExternalData not being set correctly on golden record when a data part codes collection is empty
Processing could get stuck on a shadow entity with multiple data parts
Could not deduplicate an entity that has an empty name within a deduplication project

CluedIn.MicroService.Clean

CluedIn.MicroServices

Feature

XLS files can now be recovered in case of a service disruption during parsing or loading
Empty values used for Origin Entity Code and Entity codes will no longer be included in the clue
Ability to reset a mapping
Ability to batch ignore fields from a mapping
Support for specifying types when mapping a field
Support for UTF-16 encoded files and UTF-16LE (note: UTF-32 is not supported)
Small files trigger a single notification, while larger files trigger two notifications: one when parsing is completed and another when loading is completed
API to clear the quarantine entirely
Support to store an image for the Data source
Improved performance when parsing files and handling large endpoints
Each record stored inside a data set will now have a field called “cluedincreated” to indicate the date of creation

Fix

Data sets can now have a maximum of 499 properties, reduced from 500 due to the introduction of the “cluedincreated” property to enable profiling for Elastic Search.
Added an environment variable to control the JSON parser’s size limit, allowing it to process larger payloads if required. The default limit is set to 10MB.
Removed the limitations that keys must be in a certain format, which previously created a barrier to ingesting data using files. The only requirement now is that they have no more than 499 properties.
By default, the new origin entity code now uses a hash of the object instead of a generated GUID. This change helps merge identical data that is sent to an endpoint more than once.
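
The effect of a hash-based origin entity code can be sketched as follows. The release notes do not name the hash algorithm or how the object is serialised, so SHA-256 over canonical JSON is an assumption made for illustration, and `hashed_code` is a hypothetical helper rather than a CluedIn function.

```python
import hashlib
import json
import uuid


def hashed_code(record):
    """Derive a deterministic code from a record's content (illustrative only)."""
    # Sorting keys makes the serialisation independent of key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"name": "Acme", "country": "DK"}
second = {"country": "DK", "name": "Acme"}  # same data sent again

# Identical payloads produce the same code, so they can be merged.
print(hashed_code(first) == hashed_code(second))  # True

# Generated GUIDs differ on every call, so identical payloads never match.
print(uuid.uuid4() == uuid.uuid4())  # False
```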

CluedIn.UI.Documentation

CluedIn.UI.Gql

Features

Removed legacy consent functionality
Removed legacy data retention functionality
Removed legacy data breach functionality
Removed legacy deduplication project functionality
Removed deprecated search feature flag
Feature flag for mesh center
Feature flag for potential duplicates
Feature flag for legacy clean
Support for expansion of records in the entity relations graph
Support for displaying vocabulary key usage for annotation, saved search, clean, and deduplication
Support for adding an edge between entities
Support for the global data model endpoints
Support filtering vocabularies and vocabulary keys based on if they are used within the system or not
Support for adding/removing properties on edges
Support for adding a role as an owner for an entity type
Support for retrieving additional configurations from connectors for the entity type page
Support for stream improvements
Support for export target health checks
Support for setting the stale data strategy when committing a clean project
Support for additional authentication for Grafana
Support for fetching all vocabulary keys used by a stream based on their filters
Support for profiling
Support for deleting edges on the entity relations graph and tree view
Support for improvements in hierarchy
Support for Copilot

Fixes

Entity types RACI setting is under the wrong area in the role management page
Vocabulary keys were not being retrieved when isVisble property was set to false
Page template details retrieved too often
Notification call could continue to poll when unauthorized

CluedIn.UI

Features

Removed legacy deduplication project functionality
Removed legacy data retention functionality
Removed legacy consent functionality
Removed legacy compliance functionality
Removed legacy data breach functionality
Removed legacy training functionality
Removed legacy potential users functionality
Feature flag for potential duplicates
Feature flag for mesh center
Feature flag for legacy clean
Expansion of records in the entity relations graph to the nth level
New Data tab on a data set page to explore the processed records
Improved duplicates check on the edit mapping page
Display images instead of “base64” or URL on the data set preview tab
Annotations, Saved Search, Clean, and Deduplication have been added to the vocabulary key usage tab
Progress bar for publishing and accompanying notification added to the hierarchy builder
Add/remove/edit edges to an entity on the entity relations tab
Add/remove/edit properties on an edge on the entity relations tab
Improved the way we display data on the stream data tab
Test connection for existing export targets
Start/pause/stop streaming functionality
Global data model graph
Vocabularies and vocabulary keys can be filtered based on if they are used within the system or not on the vocabulary list
Can add a role as an owner for an entity type
Improved the settings page layout
Additional configurations from connectors can be displayed on the entity type page
Entity types can be filtered based on if they are used within the system or not on the entity types list
Improved export target activation and deactivation flow
Entity types can be filtered by source on the entity types list
Can now choose the stale data strategy when committing a clean project
Automatically select vocabulary keys in the export target configuration for streams based on their filters
Entity history can be filtered by data source or a manual data entry project name
Highlight clean projects, rules, glossary terms, streams, or saved searches that have invalid filters/conditions/actions after a vocabulary key's data type has been changed
Copilot feature
Moved sort by dropdown for search to main bar
Improved the handling of invalid authentication/refresh tokens
Replace the tooltip in the golden record pages and properties tab
Stream preview tab now supports paging and more control over what is displayed
Stream export target configuration allows for exporting edge properties
Allow the data source icon to be edited
Exporting/Not Exporting statuses have been added to streams
Export target health status displayed on the export target page
Undo the processed data from the data source page

Fixes

Correctly handle blank organization settings
Edge relations are not created if source code setting is disabled during mapping
Incorrect behavior when trying to create a vocabulary key on the add mapping page
When activating a rule, the active state does not update until you refresh the page
Duplicate text is displayed on the permissions tab when the global security filter is disabled
Display name field on the add mapping panel allows just white space
Long vocabulary keys on the quarantine table overflow into the next column
Long vocabulary keys on the edge properties panel overflow
RACI claims have incorrect grouping
Hierarchy builder expand/collapse button does not work
Hierarchy is not using the correct theme
Load Entities panel would display an error on the hierarchy builder when the hierarchy was empty
Vocabulary keys were not being retrieved when isVisble property was set to false
Hierarchy editor should update after loading entities from the load entities panel
When multiple claims have been requested for access only the first claim is displayed to the approver
Removing vocabulary key that is used in Rule/Stream/Glossary fails
Hierarchy errors when saving a node with a deleted subtree
The golden record overview tab displays an empty page whilst retrieving the data
Page template details retrieved too often
Tooltips render behind side panels
Hierarchy name is not validated on the client side
When adding a vocabulary key through the data catalog you can enter a group name that is too long
Load more button doesn’t work on the notification center side panel

Runtime-Environment

Features

Moved the EntityTypes claim under the Management section
Dropped legacy deduplication tables
New tables and columns to support the new data source micro services
RequestedClaimLevel is stored on TasksRoleRequest table
New EntityType.Source column with Core value for core entity types, NULL value for non-core entity types
New JobId column in Hierarchy and HierarchyPreviousVersion tables
New EntityType AdditionalProperties column
New EntityType OwnedByRoleId column
Upgrade Neo4j to version 5 and introduce upgrade and home containers
Removed support for GDPR consent
Drop training database
Remove support for retention
Support for stream export target health checks
Removed DataBreach tables
Tables for Copilot
Increase ManualDataEntryFormFields.Description to nvarchar(max)
Added an additional default claim to the DeduplicationReviewer and DeduplicationAdministrator roles. The Duplicates claim is set to Informed and Accountable respectively.
Increase StreamMappings.SourceDataType column size to 350

Packages

For this release, use the exact versions listed below for the following packages.

Connectors

Name	Version
CluedIn.Connector.AzureDataLake	4.0.0
CluedIn.Connector.AzureDedicatedSqlPool	4.0.0
CluedIn.Connector.AzureEventHub	4.0.0
CluedIn.Connector.AzureServiceBus	4.0.0
CluedIn.Connector.Http	4.0.0
CluedIn.Connector.SqlServer	4.0.0
CluedIn.PowerApps	4.0.1
CluedIn.Connector.Dataverse	4.0.1

Enrichers

Name	Version
CluedIn.ExternalSearch.Providers.DuckDuckGo.Provider	4.0.0
CluedIn.ExternalSearch.Providers.PermId.Provider	4.0.0
CluedIn.ExternalSearch.Providers.Web	4.0.0
CluedIn.Provider.ExternalSearch.Bregg	4.0.0
CluedIn.Provider.ExternalSearch.ClearBit	4.0.0
CluedIn.Provider.ExternalSearch.CompanyHouse	4.0.0
CluedIn.Provider.ExternalSearch.CVR	4.0.0
CluedIn.Provider.ExternalSearch.Gleif	4.0.0
CluedIn.Provider.ExternalSearch.GoogleMaps	4.0.0
CluedIn.Provider.ExternalSearch.KnowledgeGraph	4.0.0
CluedIn.Provider.ExternalSearch.Libpostal	4.0.0
CluedIn.Provider.ExternalSearch.OpenCorporates	4.0.0
CluedIn.Provider.ExternalSearch.Providers.VatLayer	4.0.0
CluedIn.Provider.MasterDataServices	4.0.0

Crawlers

Name	Version
CluedIn.Crawling.MasterDataServices	4.0.0
CluedIn.Purview	4.0.0

Other

Name	Version
CluedIn.Vocabularies.CommonDataModel	4.0.1
CluedIn.EventHub	4.0.0

Docker Image	Tags
cluedin/data-source	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77574`
cluedin/data-source-processing	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77574`
cluedin/data-source-submitter	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77574`

Docker Image	Tags
cluedin/neo4j	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77580`
cluedin/openrefine	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77580`
cluedin/sqlserver-home	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77580`

Docker Image	Tags
cluedin/cluedin-server	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77585`, `4.0.0_77585-alpine`, `4.0.0-alpine`, `4.0-alpine`
cluedin/cluedin-server	`2024.01.00`, `2024.01`, `4.0.0_77585-ubuntu`, `4.0.0-ubuntu`, `4.0-ubuntu`
cluedin/nuget-installer	`2024.01.00`, `2024.01`, `4.0`, `4.0.0`, `4.0.0_77585`, `4.0.0_77585-alpine`, `4.0.0-alpine`, `4.0-alpine`
cluedin/nuget-installer	`2024.01.00`, `2024.01`, `4.0.0_77585-ubuntu`, `4.0.0-ubuntu`, `4.0-ubuntu`
