A completely new execution planner and rule executor have been introduced to optimize different kinds of rule configurations. The execution planner dynamically reorders rule execution into the optimal order for the given dataset. As a result, the generation of deduplication projects has been greatly improved in many common scenarios.
The following rule conditions have been improved:
Note: If the number of unique values for fuzzy comparison exceeds 10K, the rule planner will fall back to the old execution engine.
Rules with multiple conditions: Evaluation of multiple Equals conditions with no normalization options set has been greatly improved.
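To illustrate why rules made up of several Equals conditions can be evaluated quickly, here is a minimal sketch that groups records by their exact values instead of comparing every pair. It is an illustration only, not CluedIn's actual implementation; the function name `candidate_pairs` and the sample fields are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, keys):
    """Return pairs of record ids whose values are equal on every key in `keys`.

    Hypothetical helper: records are bucketed by their exact key values,
    so only records in the same bucket are ever paired.
    """
    buckets = defaultdict(list)
    for record_id, record in records.items():
        # A record missing any key cannot satisfy an Equals condition on it.
        if any(record.get(key) is None for key in keys):
            continue
        buckets[tuple(record[key] for key in keys)].append(record_id)

    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)


# Sample data (made up for the example).
records = {
    1: {"email": "a@x.com", "country": "DK"},
    2: {"email": "a@x.com", "country": "DK"},
    3: {"email": "a@x.com", "country": "SE"},
    4: {"email": "b@x.com", "country": "DK"},
}
print(candidate_pairs(records, ["email", "country"]))  # [(1, 2)]
```

Only records 1 and 2 agree on both fields, so they form the single candidate pair. The work grows with the number of records plus the size of each bucket rather than with the number of all possible pairs.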
Benchmark with a dataset of 1 million entities:
A thorough review of the data flow has been conducted to enhance performance and resilience when pushing data to CluedIn. The system now scales primarily with the number of data source processing pods it has. With the new changes, an instance running with only one data source processing pod can still receive millions of records without exceeding the limit set by the Kubernetes cluster. The pod will only take what it can, when it can. As the benchmark results show, a single data source processing pod already performs well, ingesting a million records in about 10 minutes. To further improve performance, you can scale the data source processing to two or three replicas. There is no need to increase the number of submitters, as two pods are already sufficient to process millions of records.
Benchmark Scenario:
The scenario was executed for each payload size.
# | Test name | Duration | Submission Duration |
---|---|---|---|
1. | Simple data - 10 | 55 ms | 256 ms |
2. | Simple data - 100 | 109 ms | 538 ms |
3. | Simple data - 1000 | 533 ms | 2 719 ms |
4. | Simple data - 10000 | 521 ms | 5 155 ms |
5. | Simple data - 1000000 | 373 403 ms | 652 504 ms |
Duration: The amount of time it took for the system to receive all the required payloads, in batches of 1000 records.
Submission Duration: The amount of time between the start of the test and the time when all records were sent to processing.
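As a rough reading of the largest run in the table, the two durations can be converted into records per second. The sketch below only does arithmetic on the figures reported above; the helper name `throughput` is introduced here for illustration.

```python
def throughput(records: int, duration_ms: int) -> float:
    """Records per second for a run that handled `records` in `duration_ms`."""
    return records / (duration_ms / 1000)


RECORDS = 1_000_000
receive_ms = 373_403     # "Duration" for the 1000000-record run
submission_ms = 652_504  # "Submission Duration" for the same run

print(f"receive:    {throughput(RECORDS, receive_ms):.0f} records/s")
print(f"submission: {throughput(RECORDS, submission_ms):.0f} records/s")
print(f"submission time: {submission_ms / 60_000:.1f} min")
```

This works out to roughly 2,680 records per second received and roughly 1,530 records per second sent to processing, with the full submission taking about 10.9 minutes.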
Note: For large datasets, we advise you to use a 10K payload size per request.
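One way to follow this advice on the client side is to split the records into payloads of 10,000 before sending them. The sketch below shows only the batching step; the `chunked` helper is hypothetical, and how each payload is actually posted to CluedIn is left to the caller.

```python
from itertools import islice
from typing import Iterable, Iterator, List

PAYLOAD_SIZE = 10_000  # records per request, as advised for large datasets


def chunked(records: Iterable[dict], size: int = PAYLOAD_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` records."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example with generated records (made up for the illustration).
records = ({"id": i} for i in range(25_000))
sizes = [len(batch) for batch in chunked(records)]
print(sizes)  # [10000, 10000, 5000]
```

Because the helper reads from an iterator, it never needs to hold more than one payload in memory at a time, which suits large exports.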
The benchmark was executed against an Essential CluedIn instance with:
Throughout the system, we have overhauled and unified the way you interact with mapped vocabulary keys. You now see more information wherever a mapped vocabulary key is used in the system, with easy links back to the vocabulary keys that are impacted. This provides you with a better understanding of where data ends up in your system and which keys you might want to prioritize.
Alongside this, we have improved the remapping process, and more checks are performed in the system to ensure that nothing will be broken when a vocabulary key is remapped. We display the impact, such as how many streams, saved searches, glossaries, and clean projects will be affected by remapping a vocabulary key. If there are any compatibility issues, such as incompatible data types, we automatically disable any affected areas and inform you of the action, so that you can select the correct settings and re-enable the affected areas.
Clean now has a more cohesive flow with CluedIn. Instead of taking you to a separate tab to work on your clean projects, everything is now inline, giving a much more streamlined experience when working with clean projects.
Users who have an SSO account can now change their local password. It is a small but important change that will help in areas such as token use or whilst using the Excel plugin.
- `EqualsOperator` to be used with boolean properties and vocabulary keys in the query builder
- `result` term to `matches` for the deduplication tab
- `merged entities` term to `merges` for the deduplication tab
- `Sign in to your team` page to reference `organization` instead of `team`
- `More` button menu on an entity
- `hashCheck` to support the detection of duplicate items submitted to a data set
- `dataSets` table to identify if a dedicated table should be used for logging

For this release, use the exact versions listed below for the following packages:
Name | Version |
---|---|
CluedIn.Connector.AzureDataLake | 4.0.0 |
CluedIn.Connector.AzureDedicatedSqlPool | 4.0.0 |
CluedIn.Connector.AzureEventHub | 4.0.0 |
CluedIn.Connector.AzureServiceBus | 4.0.0 |
CluedIn.Connector.Http | 4.0.0 |
CluedIn.Connector.SqlServer | 4.0.0 |
CluedIn.PowerApps | 4.0.1 |
CluedIn.Connector.Dataverse | 4.0.1 |
Name | Version |
---|---|
CluedIn.ExternalSearch.Providers.DuckDuckGo.Provider | 4.0.0 |
CluedIn.ExternalSearch.Providers.PermId.Provider | 4.0.0 |
CluedIn.ExternalSearch.Providers.Web | 4.0.0 |
CluedIn.Provider.ExternalSearch.Bregg | 4.0.0 |
CluedIn.Provider.ExternalSearch.ClearBit | 4.0.0 |
CluedIn.Provider.ExternalSearch.CompanyHouse | 4.0.0 |
CluedIn.Provider.ExternalSearch.CVR | 4.0.0 |
CluedIn.Provider.ExternalSearch.Gleif | 4.0.0 |
CluedIn.Provider.ExternalSearch.GoogleMaps | 4.0.0 |
CluedIn.Provider.ExternalSearch.KnowledgeGraph | 4.0.0 |
CluedIn.Provider.ExternalSearch.Libpostal | 4.0.0 |
CluedIn.Provider.ExternalSearch.OpenCorporates | 4.0.0 |
CluedIn.Provider.ExternalSearch.Providers.VatLayer | 4.0.0 |
CluedIn.Provider.MasterDataServices | 4.0.0 |
Name | Version |
---|---|
CluedIn.Crawling.MasterDataServices | 4.0.0 |
CluedIn.Purview | 4.2.0 |
Name | Version |
---|---|
CluedIn.Vocabularies.CommonDataModel | 4.0.1 |
CluedIn.EventHub | 4.2.0 |
Docker Image | Tags |
---|---|
cluedin/cluedin-micro-clean | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83761 |
Docker Image | Tags |
---|---|
cluedin/controller | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83740 |
Docker Image | Tags |
---|---|
cluedin/cluedin-micro-documentation | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83739 |
Docker Image | Tags |
---|---|
cluedin/cluedin-ui-gql | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83738 |
Docker Image | Tags |
---|---|
cluedin/data-source | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83773 |
cluedin/data-source-processing | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83773 |
cluedin/data-source-submitter | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83773 |
Docker Image | Tags |
---|---|
cluedin/neo4j | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83741 |
cluedin/openrefine | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83741 |
Docker Image | Tags |
---|---|
cluedin/cluedin-server | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83744 , 4.2.0_83744-alpine , 4.2.0-alpine , 4.2-alpine |
cluedin/cluedin-server | 2024.04.00 , 2024.04 , 4.2.0_83744-ubuntu , 4.2.0-ubuntu , 4.2-ubuntu |
cluedin/nuget-installer | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83744 , 4.2.0_83744-alpine , 4.2.0-alpine , 4.2-alpine |
cluedin/nuget-installer | 2024.04.00 , 2024.04 , 4.2.0_83744-ubuntu , 4.2.0-ubuntu , 4.2-ubuntu |
Docker Image | Tags |
---|---|
cluedin/ui | 2024.04.00 , 2024.04 , 4.2 , 4.2.0 , 4.2.0_83737 |