Release 2024.01.00
Notable changes
RACI claim change
An additional default claim has been added to the DeduplicationReviewer and DeduplicationAdministrator roles. The management.duplicates claim is set to Informed and Accountable, respectively. Make sure you apply this claim to these roles and to any other relevant roles yourself.
.NET upgrade
Upgraded the CluedIn platform to .NET 6.
Any custom-built extensions for CluedIn should be upgraded to match CluedIn’s .NET 6 platform reference versions. In most cases, this would involve updating CluedIn NuGet package dependencies to version 4.0.0 and updating the target framework for the extension to .NET 6.
Neo4j upgrade
Upgraded Neo4j from 3.5.35 to 5.12.
This brings several overall improvements by Neo4j. See https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog#5120
MS SQL upgrade
Upgraded Microsoft SQL Server from 2017 to 2022.
RabbitMQ upgrade
Upgraded RabbitMQ from 3.10 to 3.12.
See the documentation for details on how to perform the upgrade.
New features
Profiling
Added profiling of vocabulary keys based on their strongly typed data types. Profiling charts show the usage of vocabulary keys and aggregated data, based on the actual ‘strongly typed’ data type. Unmapped data types are profiled based on the text values. Profiling also shows system-wide information such as entity type usage and distribution.
Copilot
Copilot is an intelligent AI assistant integrated into the CluedIn product. It is designed to help users perform various tasks and streamline their workflow. With Copilot, users can create and manage rules, stream data, create and update datasets, and perform data quality checks.
Copilot provides a user-friendly interface that allows users to interact and communicate with the AI assistant through natural language commands. It can activate, clone, and deactivate rules, as well as provide suggestions for data set mapping and vocabulary key rules. Copilot can also generate data quality metrics and detect anomalies within the values of a vocabulary key.
In addition to rule management and data processing tasks, Copilot can perform entity search and provide information about entities and data sets. It can help users create clean, deduplication, and survivorship projects, and even provide explanations for deduplication groups within projects.
With its wide range of features and capabilities, Copilot aims to enhance the user experience and make working with CluedIn more efficient and productive. For information on how to enable Copilot in CluedIn, refer to Copilot Integration.
Entity type configurations from connectors
CluedIn has enhanced several enrichers, including PermId, KnowledgeGraph, DuckDuckGo, CompanyHouse, ClearBit, GoogleMaps, libpostal, VatLayer, and CVR. The key improvement is the ability to configure entity types and vocabulary keys directly within the CluedIn user interface. This update provides users with greater flexibility and customization options, allowing them to tailor these enrichers to their specific needs without restrictions.
Data removal from data source
Added the ability to remove/undo processing of data from the data source. This makes it easy to experiment with data and mappings in the data source.
Any side effects from ingesting new data into the system will be undone as well (for example, merging based on entity codes added from the data source and enrichments based on data that will be removed).
Example
Initial state:
Data set 1 ingested:
Data set 2 ingested:
Scenario:
When data set 3 is ingested:
- A code overlap between A and B causes entities A and B to merge
Undoing the processing of data set 3 will result in A and B being split apart into two separate entities again.
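The merge-and-undo scenario above can be illustrated with a small sketch. It is not CluedIn's implementation: it assumes that entities are simply groups of records connected by shared entity codes, and that undoing a data set is equivalent to regrouping the remaining records. The names `build_entities` and `undo_processing` and the record layout are hypothetical.

```python
def build_entities(records):
    """Group entity codes into entities: codes that appear together in a record are merged."""
    parent = {}

    def find(code):
        # Union-find lookup with path compression.
        parent.setdefault(code, code)
        while parent[code] != code:
            parent[code] = parent[parent[code]]
            code = parent[code]
        return code

    def union(a, b):
        parent[find(a)] = find(b)

    for record in records:
        first, *rest = record["codes"]
        find(first)
        for code in rest:
            union(first, code)

    groups = {}
    for code in list(parent):
        groups.setdefault(find(code), []).append(code)
    return sorted(sorted(group) for group in groups.values())


def undo_processing(records, source):
    """Hypothetical undo: drop every record that came from the given data set."""
    return [record for record in records if record["source"] != source]


ingested = [
    {"source": "data set 1", "codes": ["A"]},
    {"source": "data set 2", "codes": ["B"]},
    {"source": "data set 3", "codes": ["A", "B"]},  # overlaps both A and B
]

merged = build_entities(ingested)
after_undo = build_entities(undo_processing(ingested, "data set 3"))
```

With all three data sets ingested, A and B form one entity; once data set 3 is removed, regrouping yields two separate entities again.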
Delta clues
Clues normally represent a full record in a source system.
Added a new type of clue that is used to describe incremental modifications to an entity. This enables removal modifications to be applied to an entity.
The edge relationship improvements on the Relations tab of the entity page are built on delta clues, which track the modifications made to an entity.
Strong typing
Strong typing brings the ability to understand what types of data (date, string, integer, etc.) we are using within the system.
Before strong typing, data was stored as strings. Strong typing gives the ability to store your data in a structured way, allowing CluedIn to query based on specific data types. This gives an improved workflow in areas such as the rules engine and search.
For example, before strong typing, if you ingested a date (such as a birthday), it was not easy to retrieve all records with a value between date x and date y.
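The date-range problem can be shown with a short, self-contained sketch. It does not use any CluedIn API; the record layout, the day/month/year text format, and the function names are assumptions made for illustration only.

```python
from datetime import date, datetime

# Birthdays stored as plain text in day/month/year format (assumed format).
raw_records = [
    {"name": "Ann", "birthday": "15/06/1990"},
    {"name": "Bob", "birthday": "02/03/2001"},
    {"name": "Cy", "birthday": "20/11/1985"},
]


def born_between_text(records, start, end):
    """Naive range filter on strings: compares characters, not dates."""
    return [r["name"] for r in records if start <= r["birthday"] <= end]


def born_between_typed(records, start, end):
    """Range filter after converting the text to real date values."""
    result = []
    for r in records:
        birthday = datetime.strptime(r["birthday"], "%d/%m/%Y").date()
        if start <= birthday <= end:
            result.append(r["name"])
    return result


text_result = born_between_text(raw_records, "01/01/1986", "31/12/1995")
typed_result = born_between_typed(raw_records, date(1986, 1, 1), date(1995, 12, 31))
```

The string comparison matches all three people because it orders values character by character, while the typed comparison returns only Ann, the one person actually born between 1986 and 1995.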
Strong typing is enabled by default, and you will have to explicitly opt out to disable this feature.
Important! Strong typing does not necessarily impact any of your existing data.
For existing vocabulary keys to be strongly typed, you will need to change the data type and storage on the vocabulary key itself. However, there is an exception to this rule. CluedIn’s core vocabularies are strongly typed by default. As such, any data using a core vocabulary will be strongly typed after you perform the upgrade.
| Core Vocabulary Name |
|-|
| Activity |
| Audio |
| Bank Account |
| Commit |
| Computer |
| Database |
| Date |
| Department |
| Discussion |
| Document |
| Event |
| File |
| Folder |
| Geocode |
| Group |
| HR Work Schedule |
| Image |
| Location |
| Lookup Data |
| Mail |
| Message |
| Network Address |
| Organization |
| Payment Card |
| Person |
| Phone Number |
| Phone Number Composite |
| Presentation |
| Product |
| Project |
| Repository |
| Sale |
| Skill |
| Social Links |
| Spreadsheet |
| Task |
| User Codes |
| User |
| Video |
Global Data Model
The Global Data Model feature offers a comprehensive visual representation of the relationships between entity types within your organization’s data ecosystem. This feature enables you to explore the connections and associations between different entity types, giving you a deeper understanding of your data structure and its interrelationships.
Cluster-wide locking
Added a new mechanism for synchronizing the execution of work inside the CluedIn application in multi-machine deployments. Processing context locks have been updated to use this new synchronization mechanism, which now works across multiple processing instances.
Note: This feature is only relevant to people who write providers or extensions for CluedIn.
Enhancements
Microservice improvements
Annotation, DataSource, and Submitter have undergone a complete rewrite to allow for improved performance, stability, and scaling.
Annotation, DataSource, and Submitter have been replaced by new services named DataSource, DataSourceProcessing, and DataSourceSubmitter, each of which is responsible for a specific part of the ingestion process.
Stream improvements
Streams have a new workflow providing more flexibility. Activating/deactivating has changed to start, pause, or stop.
- Starting: Exports all relevant data to the export target.
- Pausing: Allows messages to continue to accumulate in the stream queues but no data will be exported.
- Stopping: Stops messages from accumulating in the stream queues and no data will be exported.
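The three states above can be modeled as a small state machine. This is a simplified sketch rather than CluedIn's stream implementation: the `Stream` class, its method names, and the in-memory queue are hypothetical, and starting here only flushes queued messages rather than exporting all relevant data. What happens to messages that are already queued when a stream is stopped is not described in these notes, so the sketch leaves them untouched.

```python
from enum import Enum


class StreamState(Enum):
    STARTED = "started"
    PAUSED = "paused"
    STOPPED = "stopped"


class Stream:
    """Hypothetical model of a stream with a queue and an export target."""

    def __init__(self):
        self.state = StreamState.STOPPED
        self.queue = []      # messages waiting to be exported
        self.exported = []   # messages delivered to the export target

    def start(self):
        self.state = StreamState.STARTED
        self._export_queued()

    def pause(self):
        self.state = StreamState.PAUSED

    def stop(self):
        self.state = StreamState.STOPPED

    def receive(self, message):
        if self.state is StreamState.STOPPED:
            return  # stopped: messages do not accumulate
        self.queue.append(message)
        if self.state is StreamState.STARTED:
            self._export_queued()
        # paused: the message stays in the queue and nothing is exported

    def _export_queued(self):
        self.exported.extend(self.queue)
        self.queue.clear()


stream = Stream()
stream.start()
stream.receive("m1")   # exported immediately
stream.pause()
stream.receive("m2")   # accumulates in the queue
paused_snapshot = (list(stream.queue), list(stream.exported))
stream.start()         # queued message is exported
stream.stop()
stream.receive("m3")   # ignored while stopped
```

The snapshot taken while paused shows one queued message and one exported message; after restarting, both messages have been exported, and the message received while stopped is never queued.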
You will see new prompts, warnings, and statuses that provide more information as you perform actions on your streams.
Performance of the stream ingestion tables has been significantly improved, allowing CluedIn to be much smarter about the messages that get sent to the connectors.
Note: The improved performance of the stream ingestion tables applies only to streams created after this release or to existing streams that are stopped and started again.
Streams can now export edge properties as part of the export target configuration.
Streams now link to export target health checks and will make better decisions when exporting data. A new badge indicates when the stream is Exporting or Not Exporting data (given the export target has been configured and the stream is started). For more information about streams, refer to our documentation portal.
Export target health checks
Health checks have been added to export targets, allowing each processing pod to know the status of any active export target. This allows streams that utilize these export targets to act accordingly when an export target health check reports an unhealthy state.
Entity relations improvements
Refactored the graph using HTML components allowing better interactions with the graph. A context menu allows the addition and removal of edges. Edges can be edited and we can also add properties to the edges. Shadow nodes and temporal nodes are filtered out by default. Large graphs will group collections of similar edges together, which can be easily expanded. Edges of an entity can be expanded out to the ‘n’th level.
Entity history improvements
Improved history to provide filters based on origin (branch), allowing users to view all changes from a single source.
Added a collapsible menu listing all metadata and properties on the entity, allowing filtering of the history by one or more properties.
When filtering by properties, the top panel displays the actual value for the golden record. Changes show the author and source where possible with links through to the author or source. For more details, see History.
CluedIn.Controller
Features
- Support for automated upgrade tasks
CluedIn
Features
- Allow OpenAI to work with a single metadata property or vocabulary key
- Update platform to .NET 6
- Raise key operations as remote events for the following events
|Event Name|
|-|
| AddCleaningProjectEvent |
| AddDatasourceEvent |
| AddEntityType |
| AddRoleEvent |
| AddStreamEvent |
| AddTaskEvent |
| AddUserEvent |
| AddVocabularyEvent |
| DeleteVocabularyEvent |
| UpdateVocabularyEvent |
| AddVocabularyKeyEvent |
| DeleteVocabularyKeyEvent |
| RenameVocabularyKeyEvent |
| UpdateVocabularyKeyEvent |
| ApiTokenCreatedEvent |
| ApiTokenRevokedEvent |
| CommitCleaningProjectEvent |
| CreateRoleRequestTaskEvent |
| DisableEnricherEvent |
| ChangeExportTargetStateEvent |
| ChangeStreamStateEvent |
| EnableEnricherEvent |
| FailApprovalTaskEvent |
| GlossaryCategoryAddedEvent |
| GlossaryCategoryDeletedEvent |
| GlossaryCategoryUpdatedEvent |
| GlossaryTermAddedEvent |
| GlossaryTermDeletedEvent |
| GlossaryTermUpdatedEvent |
| ProcessGlobalMetrics |
| ProcessingJobStatusUpdatedEvent |
| RegisterExportTargetEvent |
| RejectApprovalTaskEvent |
| RejectRoleRequestTaskEvent |
| RemoveCleaningProjectEvent |
| RemoveStreamEvent |
| ReprocessStreamTargetEvent |
| TerminateStreamReingestionEvent |
| RuleCreatedEvent |
| RuleDeletedEvent |
| RuleStateEvent |
| RuleTriggeredEvent |
| RuleUpdatedEvent |
| UpdateExportTargetEvent |
| UserAddedToRoleEvent |
| UserRemovedFromRoleEvent |
- Removed legacy deduplication functionality
- Removed legacy training functionality
- Removed legacy GDPR functionality
- Removed Retention functionality
- Removed support for breach functionality
- Support for improved entity history
- Filter entity history by origin (branch), metadata property or vocabulary key
- Cancel a deduplication project whilst it is merging
- SSO users that are disabled in CluedIn are now blocked from logging in
- Improved the RoleRequest flow to include the RACI access level
- Support for starting, pausing, and stopping a stream
- Support for global data model
- Vocabularies can now be filtered depending on if they have been used or not
- Find a user from their entity code
- Support for displaying additional settings from connectors on the Entity Type page
- Introduced system/cluster wide locking mechanism
- Delta clues/actions
- Metrics are now turned off by default
- Component health checks have been added for connectors
- Fetch all vocabulary keys used by a stream based on their filters
- Grafana charts have improved security
- Stream ingestion log performance has been greatly improved
- Approval item API, service and table to handle the approval workflow using PowerApps/PowerAutomate
- Improve hierarchy to support the loading of 100k entities
- Copilot support
- Allow editing of a page template name
- Improve ElasticPotentialMatchesResolver query to use filter queries for required fields
- Made part of metric processing asynchronous
- New configuration options for processing context locks
| Key | Default Value |
|-|-|
| Processing.ContextLocks.ClusterWideLocks.Enabled | false |
| Processing.ContextLocks.AcquireTimeoutSeconds | |
- Add option to acquire context lock on command ProcessingKey
| Key | Default Value |
|-|-|
| Processing.ContextLocks.LockOnProcessingKeyEnabled | False |
- Flag invalid rules/filters for UI attention when a vocabulary key's storage or data type changes
- Additional RabbitMQ specific exception handling
- Minimize the size of structured logging in workflows
- Implemented rule and filter validation
- Ability to see large value changes in entity history in the UI that were previously omitted
- Invalid API routes or verbs now return a 404 status instead of redirecting to cluedin.com
- Removed ElasticSearchUpdateEntityEvent
- Added config flags to disable stream log checks on publishing data to connectors
| Key | Default Value |
|-|-|
| Streams.CheckStreamLogOnPublish | true |
| Streams.CheckStreamLogOnPublishDelete | true |
Additional keys controlling stream log have been renamed
| Old Key | New Key | New Default Value |
|-|-|-|
| Streams.PreventIngestionLogging | Streams.Logging.LegacyLog.Enabled | true |
| Streams.UseLegacyIngestionLogging | Streams.Logging.UseLegacyLog | false |
| Streams.UseUpsertForIngestionLogging | Streams.Logging.UseUpsert | false |
Fixes
- Merge fails due to overflow in Entity Part ID
- Dataparts with large ids are truncated when retrieving entity history
- Improved Redis reconnect / disconnect impact on overall system
- Missing null check in StreamPublishMappingCommandV2 resulting in invalid data for deletes in streams
- Creation of hierarchy with entities that have mixed-case entity codes
- Unhandled transient errors when connecting to RabbitMQ
- My tasks shows a generic error
- Capture potential OutOfMemoryException in EasyNetQ deserialization
- Workflow not sending failed workflows to the deadletter queue
- Create a deduplication project with duplicate entity types
- Users are not removed when an invitation is revoked
- Use correct key mappings in rule operators
- Topology action could be ignored if no split is performed
- Export target authentication details not updating
- Processing context locks did not work correctly with asynchronous code
- Merge dataparts were not processed during a merge under certain circumstances
- Merge would fail if one or more of the entities did not have any entity codes besides the origin entity code
- Some entities could not be included in a clean project under certain circumstances
- Rules and streams are not reporting vocabulary key usage correctly
- Metric save operations take place even when no changes were made
- Slow SQL queries on Linux
- EntityReference will not deserialize with a name containing §
- Refresh tokens are invalidated incorrectly
- Error thrown when accessing a deleted hierarchy project
- Unknown entity types throw an error when retrieving page templates
- IsExternalData not being set correctly on golden record when a data part codes collection is empty
- Processing could get stuck on a shadow entity with multiple data parts
- Could not deduplicate an entity that has an empty name within a deduplication project
CluedIn.MicroService.Clean
CluedIn.MicroServices
Feature
- XLS files can now be recovered in case of a service disruption during parsing or loading
- Empty values used for Origin Entity Code and Entity codes will no longer be included in the clue
- Ability to reset a mapping
- Ability to batch ignore fields from a mapping
- Support for specifying types when mapping a field
- Support for UTF-16 and UTF-16LE encoded files (note: UTF-32 is not supported)
- Small files trigger a single notification, while larger files trigger two notifications: one when parsing is completed and another when loading is completed
- API to clear the quarantine entirely
- Support to store an image for the Data source
- Improved performance when parsing files and handling large endpoints
- Each record stored inside a data set will now have a field called “cluedincreated” to indicate the date of creation
Fix
- Data sets can now have a maximum of 499 properties, reduced from 500 due to the introduction of the “cluedincreated” property to enable profiling for Elastic Search.
- Added an environment variable to control the JSON parser’s size limit, allowing it to process larger payloads if required. The default limit is set to 10MB.
- Removed the limitation that keys must be in a certain format, which previously created a barrier to ingesting data using files. The only requirement now is that a data set has no more than 499 properties.
- By default, the new origin entity code will now use a HASH of the object instead of a generated GUID. This change will assist in merging identical data being sent to an endpoint.
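The difference between a hashed and a generated origin entity code can be sketched as follows. The hashing scheme CluedIn actually uses is not described in these notes; SHA-256 over a key-sorted JSON serialization is an assumption made for illustration, and both function names are hypothetical.

```python
import hashlib
import json
import uuid


def hashed_origin_code(record):
    """Derive a code from the record's content (assumed scheme: SHA-256 of canonical JSON)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def guid_origin_code(record):
    """Previous behavior in spirit: a fresh GUID on every call, regardless of content."""
    return str(uuid.uuid4())


first_post = {"name": "Acme", "country": "DK"}
second_post = {"country": "DK", "name": "Acme"}  # same data, different key order
other_post = {"name": "Acme", "country": "SE"}

same_hash = hashed_origin_code(first_post) == hashed_origin_code(second_post)
different_hash = hashed_origin_code(first_post) != hashed_origin_code(other_post)
same_guid = guid_origin_code(first_post) == guid_origin_code(second_post)
```

Identical payloads produce the same hashed code and can therefore be matched and merged, whereas a generated GUID differs on every submission even when the data is the same.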
CluedIn.UI.Documentation
CluedIn.UI.Gql
Features
- Removed legacy consent functionality
- Removed legacy data retention functionality
- Removed legacy data breach functionality
- Removed legacy deduplication project functionality
- Removed deprecated search feature flag
- Feature flag for mesh center
- Feature flag for potential duplicates
- Feature flag for legacy clean
- Support for expansion of records in the entity relations graph
- Support for displaying vocabulary key usage for annotation, saved search, clean, and deduplication
- Support for adding an edge between entities
- Support for the global data model endpoints
- Support filtering vocabularies and vocabulary keys based on if they are used within the system or not
- Support adding/removing properties to edges
- Support for adding a role as an owner for an entity type
- Support for retrieving additional configurations from connectors for the entity type page
- Support for stream improvements
- Support for export target health checks
- Support for setting the stale data strategy when committing a clean project
- Support for additional authentication for Grafana
- Support for fetching all vocabulary keys used by a stream based on their filters
- Support for profiling
- Support for deleting edges on the entity relations graph and tree view
- Support for improvements in hierarchy
- Support for Copilot
Fixes
- Entity types RACI setting is under the wrong area in the role management page
- Vocabulary keys were not being retrieved when isVisble property was set to false
- Page template details retrieved too often
- Notification call could continue to poll when unauthorized
CluedIn.UI
Features
- Removed legacy deduplication project functionality
- Removed legacy data retention functionality
- Removed legacy consent functionality
- Removed legacy compliance functionality
- Removed legacy data breach functionality
- Removed legacy training functionality
- Removed legacy potential users functionality
- Feature flag for potential duplicates
- Feature flag for mesh center
- Feature flag for legacy clean
- Expansion of records in the entity relations graph to the nth level
- New Data tab on a data set page to explore the processed records
- Improved duplicates check on the edit mapping page
- Display images instead of “base64” or URL on the data set preview tab
- Annotations, Saved Search, Clean, and Deduplication have been added to the vocabulary key usage tab
- Progress bar for publishing and accompanying notification added to the hierarchy builder
- Add/remove/edit edges to an entity on the entity relations tab
- Add/remove/edit properties on an edge on the entity relations tab
- Improved the way we display data on the stream data tab
- Test connection for existing export targets
- Start/pause/stop streaming functionality
- Global data model graph
- Vocabularies and vocabulary keys can be filtered based on if they are used within the system or not on the vocabulary list
- Can add a role as an owner for an entity type
- Improved the settings page layout
- Additional configurations from connectors can be displayed on the entity type page
- Entity types can be filtered based on if they are used within the system or not on the entity types list
- Improved export target activation and deactivation flow
- Entity types can be filtered by source on the entity types list
- Can now choose the stale data strategy when committing a clean project
- Automatically select vocabulary keys in the export target configuration for streams based on their filters
- Entity history can be filtered by data source or a manual data entry project name
- Highlight Clean Projects, Rules, Glossary Terms, Streams, or Saved Searches that have invalid filters/conditions/actions after a vocabulary key's data type has been changed
- Copilot feature
- Moved sort by dropdown for search to main bar
- Improved the handling of invalid authentication/refresh tokens
- Replace the tooltip in the golden record pages and properties tab
- Stream preview tab now supports paging and more control over what is displayed
- Stream export target configuration allows for exporting edge properties
- Allow the data source icon to be edited
- Exporting/Not Exporting statuses have been added to streams
- Export target health status displayed on the export target page
- Undo the processed data from the data source page
Fixes
- Correctly handle blank organization settings
- Edge relations are not created if source code setting is disabled during mapping
- Incorrect behavior when trying to create a vocabulary key on the add mapping page
- When activating a rule, the active state does not update until you refresh the page
- Duplicate text is displayed on the permissions tab when the global security filter is disabled
- Display name field on the add mapping panel allows just white space
- Long vocabulary keys on the quarantine table overflow into the next column
- Long vocabulary keys on the edge properties panel overflow
- RACI claims have incorrect grouping
- Hierarchy builder expand/collapse button does not work
- Hierarchy is not using the correct theme
- Load Entities panel would display an error on the hierarchy builder when the hierarchy was empty
- Vocabulary keys were not being retrieved when isVisble property was set to false
- Hierarchy editor should update after loading entities from the load entities panel
- When multiple claims have been requested for access only the first claim is displayed to the approver
- Removing vocabulary key that is used in Rule/Stream/Glossary fails
- Hierarchy errors when saving a node with a deleted subtree
- The golden record overview tab displays an empty page whilst retrieving the data
- Page template details retrieved too often
- Tooltips render behind side panels
- Hierarchy name is not validated on the client side
- When adding a vocabulary key through the data catalog you can enter a group name that is too long
- Load more button doesn’t work on the notification center side panel
Runtime-Environment
Features
- Moved the EntityTypes claim under the Management section
- Dropped legacy deduplication tables
- New tables and columns to support the new data source micro services
- RequestedClaimLevel is stored on TasksRoleRequest table
- New EntityType.Source column with Core value for core entity types, NULL value for non-core entity types
- New JobId column in Hierarchy and HierarchyPreviousVersion tables
- New EntityType AdditionalProperties column
- New EntityType OwnedByRoleId column
- Upgrade Neo4j to version 5 and introduce upgrade and home containers
- Removed support for GDPR consent
- Drop training database
- Remove support for retention
- Support for stream export target health checks
- Removed DataBreach tables
- Tables for Copilot
- Increase ManualDataEntryFormFields.Description to nvarchar(max)
- Added an additional default claim to the DeduplicationReviewer and DeduplicationAdministrator roles. The Duplicates claim is set to Informed and Accountable respectively.
- Increase StreamMappings.SourceDataType column size to 350
Packages
For this release, use the exact versions listed below for the following packages.
Connectors
| Name | Version |
|-|-|
| CluedIn.Connector.AzureDataLake | 4.0.0 |
| CluedIn.Connector.AzureDedicatedSqlPool | 4.0.0 |
| CluedIn.Connector.AzureEventHub | 4.0.0 |
| CluedIn.Connector.AzureServiceBus | 4.0.0 |
| CluedIn.Connector.Http | 4.0.0 |
| CluedIn.Connector.SqlServer | 4.0.0 |
| CluedIn.PowerApps | 4.0.1 |
| CluedIn.Connector.Dataverse | 4.0.1 |
Enrichers
| Name | Version |
|-|-|
| CluedIn.ExternalSearch.Providers.DuckDuckGo.Provider | 4.0.0 |
| CluedIn.ExternalSearch.Providers.PermId.Provider | 4.0.0 |
| CluedIn.ExternalSearch.Providers.Web | 4.0.0 |
| CluedIn.Provider.ExternalSearch.Bregg | 4.0.0 |
| CluedIn.Provider.ExternalSearch.ClearBit | 4.0.0 |
| CluedIn.Provider.ExternalSearch.CompanyHouse | 4.0.0 |
| CluedIn.Provider.ExternalSearch.CVR | 4.0.0 |
| CluedIn.Provider.ExternalSearch.Gleif | 4.0.0 |
| CluedIn.Provider.ExternalSearch.GoogleMaps | 4.0.0 |
| CluedIn.Provider.ExternalSearch.KnowledgeGraph | 4.0.0 |
| CluedIn.Provider.ExternalSearch.Libpostal | 4.0.0 |
| CluedIn.Provider.ExternalSearch.OpenCorporates | 4.0.0 |
| CluedIn.Provider.ExternalSearch.Providers.VatLayer | 4.0.0 |
| CluedIn.Provider.MasterDataServices | 4.0.0 |
Crawlers
| Name | Version |
|-|-|
| CluedIn.Crawling.MasterDataServices | 4.0.0 |
| CluedIn.Purview | 4.0.0 |
Other
| Name | Version |
|-|-|
| CluedIn.Vocabularies.CommonDataModel | 4.0.1 |
| CluedIn.EventHub | 4.0.0 |
Clean
| Docker Image | Tags |
|-|-|
| cluedin/cluedin-micro-clean | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77575 |
Controller
| Docker Image | Tags |
|-|-|
| cluedin/controller | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77579 |
Docs
| Docker Image | Tags |
|-|-|
| cluedin/cluedin-micro-documentation | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77578 |
Gql
| Docker Image | Tags |
|-|-|
| cluedin/cluedin-ui-gql | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77576 |
Microservices
| Docker Image | Tags |
|-|-|
| cluedin/data-source | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574 |
| cluedin/data-source-processing | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574 |
| cluedin/data-source-submitter | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77574 |
Runtime
| Docker Image | Tags |
|-|-|
| cluedin/neo4j | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580 |
| cluedin/openrefine | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580 |
| cluedin/sqlserver-home | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77580 |
Server
| Docker Image | Tags |
|-|-|
| cluedin/cluedin-server | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77585, 4.0.0_77585-alpine, 4.0.0-alpine, 4.0-alpine |
| cluedin/cluedin-server | 2024.01.00, 2024.01, 4.0.0_77585-ubuntu, 4.0.0-ubuntu, 4.0-ubuntu |
| cluedin/nuget-installer | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77585, 4.0.0_77585-alpine, 4.0.0-alpine, 4.0-alpine |
| cluedin/nuget-installer | 2024.01.00, 2024.01, 4.0.0_77585-ubuntu, 4.0.0-ubuntu, 4.0-ubuntu |
UI
| Docker Image | Tags |
|-|-|
| cluedin/ui | 2024.01.00, 2024.01, 4.0, 4.0.0, 4.0.0_77577 |