An additional default claim has been added to the DeduplicationReviewer and DeduplicationAdministrator roles. The management.duplicates claim is set to Informed and Accountable, respectively. Make sure you apply this claim to these roles, and to any other relevant roles, yourself.
Upgraded the CluedIn platform to .NET 6.
Any custom-built extensions for CluedIn should be upgraded to match CluedIn’s .NET 6 platform reference versions. In most cases, this would involve updating CluedIn NuGet package dependencies to version 4.0.0 and updating the target framework for the extension to .NET 6.
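As a rough aid for the extension upgrade described above, the following sketch scans an SDK-style project file and reports anything that does not yet match the .NET 6 target. The sample project, its old framework and package versions, and the helper name `check_project` are made up for illustration. The expected CluedIn package version is a parameter because some packages in this release use a version other than 4.0.0.

```python
import xml.etree.ElementTree as ET

# Hypothetical project file for a custom extension. The framework and
# version numbers here are invented sample values, not real history.
SAMPLE_CSPROJ = """
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="CluedIn.Connector.SqlServer" Version="3.7.0" />
    <PackageReference Include="Newtonsoft.Json" Version="13.0.1" />
  </ItemGroup>
</Project>
"""


def check_project(csproj_text, framework="net6.0", cluedin_version="4.0.0"):
    """Return a list of findings; an empty list means nothing to change."""
    root = ET.fromstring(csproj_text.strip())
    findings = []
    for tf in root.iter("TargetFramework"):
        current = (tf.text or "").strip()
        if current != framework:
            findings.append(f"TargetFramework is {current}, expected {framework}")
    for ref in root.iter("PackageReference"):
        name = ref.get("Include", "")
        version = ref.get("Version", "")
        # Only CluedIn packages are compared against the expected version.
        if name.startswith("CluedIn.") and version != cluedin_version:
            findings.append(f"{name} is {version}, expected {cluedin_version}")
    return findings


for finding in check_project(SAMPLE_CSPROJ):
    print(finding)
```

The sketch assumes SDK-style project files without an XML namespace. Older project formats, or versions managed centrally outside the project file, would need a different lookup.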
Upgraded Neo4j from 3.5.35 to 5.12.
This brings several improvements from Neo4j itself. For details, see https://github.com/neo4j/neo4j/wiki/Neo4j-5-changelog#5120
Upgraded Microsoft SQL Server from 2017 to 2022.
Upgraded RabbitMQ from 3.10 to 3.12.
See the documentation for details on how to perform the upgrade.
Added profiling of vocabulary keys based on their strongly typed data types. Profiling charts show the usage of vocabulary keys and aggregated data, based on the actual strongly typed data type. Unmapped data types are profiled based on their text values. Profiling also shows system-wide information such as entity type usage and distribution.
Copilot is an intelligent AI assistant integrated into the CluedIn product. It is designed to help users perform various tasks and streamline their workflow. With Copilot, users can create and manage rules, stream data, create and update datasets, and perform data quality checks.
Copilot provides a user-friendly interface that allows users to interact and communicate with the AI assistant through natural language commands. It can activate, clone, and deactivate rules, as well as provide suggestions for data set mapping and vocabulary key rules. Copilot can also generate data quality metrics and detect anomalies within the values of a vocabulary key.
In addition to rule management and data processing tasks, Copilot can perform entity search and provide information about entities and data sets. It can help users create clean, deduplication, and survivorship projects, and even provide explanations for deduplication groups within projects.
With its wide range of features and capabilities, Copilot aims to enhance the user experience and make working with CluedIn more efficient and productive. For information on how to enable Copilot in CluedIn, refer to Copilot Integration.
CluedIn has enhanced several enrichers, including PermId, KnowledgeGraph, DuckDuckGo, CompanyHouse, ClearBit, GoogleMaps, libpostal, VatLayer, and CVR. The key improvement is the ability to configure entity types and vocabulary keys directly within the CluedIn user interface. This update provides users with greater flexibility and customization options, allowing them to tailor these enrichers to their specific needs without restrictions.
Added ability to remove/undo processing of data from the data source. This enables easy experiments with data and mappings in the data source.
Any side effects from ingesting new data into the system will be undone as well (for example, merging based on entity codes added from the data source and enrichments based on data that will be removed).
Example
Initial state:
Data set 1 ingested:
Data set 2 ingested:
Scenario:
When data set 3 is ingested:
Undoing the processing of data set 3 will split A and B back into two separate entities.
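The merge-and-split behaviour in this example can be illustrated with a toy model. This is a sketch only, not how CluedIn implements undo: it assumes that records merge into one entity whenever they share an entity code, and that undoing a data set amounts to rebuilding entities from the remaining records. The record names, data set names, and entity codes are hypothetical.

```python
from collections import defaultdict


def build_entities(records):
    """Group records into entities; records sharing any entity code merge."""
    parent = {name: name for name, _dataset, _codes in records}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_owner = {}
    for name, _dataset, codes in records:
        for code in codes:
            if code in first_owner:
                parent[find(name)] = find(first_owner[code])
            else:
                first_owner[code] = name

    groups = defaultdict(set)
    for name in parent:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


def undo_dataset(records, dataset):
    """Drop every record that came from the given data set."""
    return [record for record in records if record[1] != dataset]


# (record name, data set, entity codes) with made-up values
records = [
    ("A", "ds1", {"crm:1"}),
    ("B", "ds2", {"erp:7"}),
    ("C", "ds3", {"crm:1", "erp:7"}),  # shares a code with both A and B
]

print(build_entities(records))                       # [['A', 'B', 'C']]
print(build_entities(undo_dataset(records, "ds3")))  # [['A'], ['B']]
```

Once the record from data set 3 is gone, nothing links A and B any more, so they fall back into separate entities.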
Clues normally represent a full record in a source system.
Added a new type of clue that is used to describe incremental modifications to an entity. This enables removal modifications to be applied to an entity.
Improvements on the entity page Relations tab around edge relationship modifications are built upon delta clues for tracking modifications made to an entity.
Strong typing brings the ability to understand what types of data (date, string, integer, and so on) are used within the system.
Before strong typing, data was stored as strings. Strong typing gives the ability to store your data in a structured way, allowing CluedIn to query based on specific data types. This gives an improved workflow in areas such as the rules engine and search.
For example, if you ingested some data that was a date (a birthday), then it would not have been easy to retrieve all data that is between date x and y.
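The date-range problem can be seen in a few lines. The sketch below uses made-up names and birthdays and an assumed `DD/MM/YYYY` text format. It illustrates the general issue with comparing dates as text and does not show CluedIn's storage or query engine.

```python
from datetime import datetime

# Hypothetical birthdays stored as plain text, as in a string-only store.
raw = {"Ann": "03/11/1990", "Bob": "25/01/1985", "Cy": "14/07/1992"}
start, end = "01/01/1990", "31/12/1991"

# Text comparison is lexicographic, so it does not follow the calendar.
as_text = sorted(name for name, value in raw.items() if start <= value <= end)


def parse(value):
    """Convert DD/MM/YYYY text into a real date value."""
    return datetime.strptime(value, "%d/%m/%Y").date()


# Typed comparison uses actual dates and gives the intended answer.
as_dates = sorted(
    name for name, value in raw.items()
    if parse(start) <= parse(value) <= parse(end)
)

print(as_text)   # ['Ann', 'Bob', 'Cy']
print(as_dates)  # ['Ann']
```

Only Ann was born between the two dates, but the text comparison lets every record through because it compares characters from left to right.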
Strong typing is enabled by default, and you will have to explicitly opt out to disable this feature.
Important! Strong typing does not necessarily impact any of your existing data.
For existing vocabulary keys to be strongly typed, you will need to change the data type and storage on the vocabulary key itself. However, there is an exception to this rule. CluedIn’s core vocabularies are strongly typed by default. As such, any data using a core vocabulary will be strongly typed after you perform the upgrade.
Core Vocabulary Name |
---|
Activity |
Audio |
Bank Account |
Commit |
Computer |
Database |
Date |
Department |
Discussion |
Document |
Event |
File |
Folder |
Geocode |
Group |
HR Work Schedule |
Image |
Location |
Lookup Data |
Message |
Network Address |
Organization |
Payment Card |
Person |
Phone Number |
Phone Number Composite |
Presentation |
Product |
Project |
Repository |
Sale |
Skill |
Social Links |
Spreadsheet |
Task |
User Codes |
User |
Video |
The Global Data Model feature offers a comprehensive visual representation of the relationships between entity types within your organization’s data ecosystem. This feature enables you to explore the connections and associations between different entity types, giving you a deeper understanding of your data structure and its interrelationships.
Added new mechanism for synchronization of execution of work inside the CluedIn application in multiple machine deployments. Processing context locks have been updated to utilize this new synchronization mechanism, which now works across multiple processing instances.
Note: This feature is only relevant to people who write providers or extensions for CluedIn.
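For extension authors, the idea of a context lock with an acquire timeout can be sketched as below. This is an in-process stand-in built on Python's `threading` module, not the CluedIn API: the class `ContextLocks` and its method names are hypothetical. A cluster-wide version would replace the local lock objects with a store shared by all processing instances. The timeout plays the role that the `Processing.ContextLocks.AcquireTimeoutSeconds` setting suggests.

```python
import threading


class ContextLocks:
    """In-process stand-in for locks keyed by processing context."""

    def __init__(self, acquire_timeout_seconds):
        self._timeout = acquire_timeout_seconds
        self._guard = threading.Lock()   # protects the dictionary below
        self._locks = {}

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def try_run(self, key, work):
        """Run work() while holding the lock for key; False on timeout."""
        lock = self._lock_for(key)
        if not lock.acquire(timeout=self._timeout):
            return False
        try:
            work()
            return True
        finally:
            lock.release()


locks = ContextLocks(acquire_timeout_seconds=0.05)
seen = []


def outer():
    # Same key is already held, so this attempt times out.
    seen.append(locks.try_run("entity-1", lambda: seen.append("never")))
    # A different key is free, so this work runs.
    seen.append(locks.try_run("entity-2", lambda: seen.append("ran")))


print(locks.try_run("entity-1", outer), seen)
```

Work on different keys proceeds independently, while a second attempt on a key that is already held gives up after the timeout instead of blocking indefinitely.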
Annotation, DataSource, and Submitter have undergone a complete rewrite to allow for improved performance, stability, and scaling.
Annotation, DataSource, and Submitter have turned into new services named DataSource, DataSourceProcessing, and DataSourceSubmitter, each of which is responsible for a specific part of the ingestion process.
Streams have a new workflow providing more flexibility. Activating/deactivating has changed to start, pause, or stop.
You will see new prompts, warnings, and status providing more information as you perform actions on your streams.
Performance of the stream ingestion tables has been significantly improved, allowing CluedIn to be much smarter about the messages that get sent to the connectors.
Note: The improved performance of the stream ingestion tables applies only to streams created after this release, or to existing streams that are stopped and started again.
Streams can now export edge properties as part of the export target configuration.
Streams now link to export target health checks and will make better decisions when exporting data. A new badge indicates when the stream is Exporting or Not Exporting data (given the export target has been configured and the stream is started). For more information about streams, refer to our documentation portal.
Health checks have been added to export targets, allowing each processing pod to know the status of any active export target. This allows streams that utilize these export targets to act accordingly when an export target health check reports an unhealthy state.
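One way to read the badge and health check behaviour is sketched below. It assumes that the badge is shown only when the export target is configured and the stream is started, and that the health check result then decides between Exporting and Not Exporting. The enum, the function name, and the boolean flags are hypothetical, and the real decision logic may consider more than this.

```python
from enum import Enum


class StreamState(Enum):
    STARTED = "started"
    PAUSED = "paused"
    STOPPED = "stopped"


def export_badge(state, target_configured, target_healthy):
    """Return the badge text, or None when no badge applies."""
    if state is not StreamState.STARTED or not target_configured:
        return None
    return "Exporting" if target_healthy else "Not Exporting"


print(export_badge(StreamState.STARTED, True, True))    # Exporting
print(export_badge(StreamState.STARTED, True, False))   # Not Exporting
print(export_badge(StreamState.PAUSED, True, True))     # None
```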
Refactored the graph using HTML components allowing better interactions with the graph. A context menu allows the addition and removal of edges. Edges can be edited and we can also add properties to the edges. Shadow nodes and temporal nodes are filtered out by default. Large graphs will group collections of similar edges together, which can be easily expanded. Edges of an entity can be expanded out to the ‘n’th level.
Improved history to provide filters based on origin (branch), allowing users to view all changes from a single source.
Added a collapsible menu listing all metadata and properties on the entity, allowing filtering of the history by one or more properties.
When filtering by properties, the top panel displays the actual value for the golden record. Changes show the author and source where possible with links through to the author or source. For more details, see History.
ElasticPotentialMatchesResolver query to use filter queries for required fields
New configuration options for processing context locks
Key | Default Value |
---|---|
Processing.ContextLocks.ClusterWideLocks.Enabled | false |
Processing.ContextLocks.AcquireTimeoutSeconds | |
Add option to acquire context lock on command ProcessingKey
Key | Default Value |
---|---|
Processing.ContextLocks.LockOnProcessingKeyEnabled | False |
404 status instead of redirecting to cluedin.com
ElasticSearchUpdateEntityEvent
Added config flags to disable stream log checks on publishing data to connectors
Key | Default Value |
---|---|
Streams.CheckStreamLogOnPublish | true |
Streams.CheckStreamLogOnPublishDelete | true |
Additional keys controlling stream log have been renamed
Old Key | New Key | New Default Value |
---|---|---|
Streams.PreventIngestionLogging | Streams.Logging.LegacyLog.Enabled | true |
Streams.UseLegacyIngestionLogging | Streams.Logging.UseLegacyLog | false |
Streams.UseUpsertForIngestionLogging | Streams.Logging.UseUpsert | false |
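When reviewing an existing configuration, a small helper can flag any of the renamed stream logging keys that are still in use. The sketch below assumes the settings are available as a flat dictionary, which is an assumption about how you load them, and the function name is hypothetical. It reports old keys without copying values across, because these notes do not say whether the meaning of a value changed along with the name.

```python
# Old key -> new key, taken from the rename table in these notes.
RENAMED_KEYS = {
    "Streams.PreventIngestionLogging": "Streams.Logging.LegacyLog.Enabled",
    "Streams.UseLegacyIngestionLogging": "Streams.Logging.UseLegacyLog",
    "Streams.UseUpsertForIngestionLogging": "Streams.Logging.UseUpsert",
}


def find_renamed_keys(settings):
    """Return (old_key, new_key, current_value) for each old key in use."""
    return [
        (old, new, settings[old])
        for old, new in RENAMED_KEYS.items()
        if old in settings
    ]


# Hypothetical settings loaded from an existing deployment.
settings = {
    "Streams.UseUpsertForIngestionLogging": "true",
    "Streams.CheckStreamLogOnPublish": "true",
}

for old, new, value in find_renamed_keys(settings):
    print(f"{old} (currently {value}) is now named {new}")
```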
OutOfMemoryException in EasyNetQ deserialization
EntityReference will not deserialize with a name containing §
IsExternalData not being set correctly on golden record when a data part codes collection is empty
isVisble property was set to false
Data tab on a data set page to explore the processed records
isVisble property was set to false
EntityTypes claim under the Management section
Core value for core entity types, NULL value for non-core entity types
JobId column in Hierarchy and HierarchyPreviousVersion tables
DeduplicationReviewer and DeduplicationAdministrator roles. The Duplicates claim is set to Informed and Accountable, respectively.
For this release, use the exact versions listed below for the following packages.
Name | Version |
---|---|
CluedIn.Connector.AzureDataLake | 4.0.0 |
CluedIn.Connector.AzureDedicatedSqlPool | 4.0.0 |
CluedIn.Connector.AzureEventHub | 4.0.0 |
CluedIn.Connector.AzureServiceBus | 4.0.0 |
CluedIn.Connector.Http | 4.0.0 |
CluedIn.Connector.SqlServer | 4.0.0 |
CluedIn.PowerApps | 4.0.1 |
CluedIn.Connector.Dataverse | 4.0.1 |
Name | Version |
---|---|
CluedIn.ExternalSearch.Providers.DuckDuckGo.Provider | 4.0.0 |
CluedIn.ExternalSearch.Providers.PermId.Provider | 4.0.0 |
CluedIn.ExternalSearch.Providers.Web | 4.0.0 |
CluedIn.Provider.ExternalSearch.Bregg | 4.0.0 |
CluedIn.Provider.ExternalSearch.ClearBit | 4.0.0 |
CluedIn.Provider.ExternalSearch.CompanyHouse | 4.0.0 |
CluedIn.Provider.ExternalSearch.CVR | 4.0.0 |
CluedIn.Provider.ExternalSearch.Gleif | 4.0.0 |
CluedIn.Provider.ExternalSearch.GoogleMaps | 4.0.0 |
CluedIn.Provider.ExternalSearch.KnowledgeGraph | 4.0.0 |
CluedIn.Provider.ExternalSearch.Libpostal | 4.0.0 |
CluedIn.Provider.ExternalSearch.OpenCorporates | 4.0.0 |
CluedIn.Provider.ExternalSearch.Providers.VatLayer | 4.0.0 |
CluedIn.Provider.MasterDataServices | 4.0.0 |
Name | Version |
---|---|
CluedIn.Crawling.MasterDataServices | 4.0.0 |
CluedIn.Purview | 4.0.0 |
Name | Version |
---|---|
CluedIn.Vocabularies.CommonDataModel | 4.0.1 |
CluedIn.EventHub | 4.0.0 |
Docker Image | Tags |
---|---|
cluedin/cluedin-micro-clean | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77575 |
Docker Image | Tags |
---|---|
cluedin/controller | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77579 |
Docker Image | Tags |
---|---|
cluedin/cluedin-micro-documentation | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77578 |
Docker Image | Tags |
---|---|
cluedin/cluedin-ui-gql | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77576 |
Docker Image | Tags |
---|---|
cluedin/data-source | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77574 |
cluedin/data-source-processing | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77574 |
cluedin/data-source-submitter | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77574 |
Docker Image | Tags |
---|---|
cluedin/neo4j | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77580 |
cluedin/openrefine | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77580 |
cluedin/sqlserver-home | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77580 |
Docker Image | Tags |
---|---|
cluedin/cluedin-server | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77585 , 4.0.0_77585-alpine , 4.0.0-alpine , 4.0-alpine |
cluedin/cluedin-server | 2024.01.00 , 2024.01 , 4.0.0_77585-ubuntu , 4.0.0-ubuntu , 4.0-ubuntu |
cluedin/nuget-installer | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77585 , 4.0.0_77585-alpine , 4.0.0-alpine , 4.0-alpine |
cluedin/nuget-installer | 2024.01.00 , 2024.01 , 4.0.0_77585-ubuntu , 4.0.0-ubuntu , 4.0-ubuntu |
Docker Image | Tags |
---|---|
cluedin/ui | 2024.01.00 , 2024.01 , 4.0 , 4.0.0 , 4.0.0_77577 |