Before upgrading to version 3.3.0, you must apply the upgrade steps for every CluedIn version between your currently installed version and 3.3.0 that has an upgrade process.
Previous upgrade scripts can be found at:
NOTE: As this is a major version upgrade (3.3), it requires downtime of the cluster while the upgrade is performed.
The goal of this guide is to show you how to migrate CluedIn data from an old installation to a new one.
There are significant improvements, fixes and new features in the new cluedin-platform Helm chart, and it is simpler to re-mount the old disks (PVCs) into a new installation.
cluedin-sql-admin (or <releasename>-sql-admin)
values.yaml
All upgrades are different. This example shows how to migrate a default installation. Each installation has customisations that might need to be moved across, so it is up to the upgrader to ensure that any values.yaml customisations are still valid with the new chart. You will probably find that many things can be removed due to consistent defaults.
Things to consider:
Application Version - Version is now configured globally so where you might previously have had:
submitter:
  image: cluedin/cluedin-micro-submitter-node:3.2.5
gql:
  image: cluedin/cluedin-ui-gql:3.2.5
ui:
  image: cluedin/ui:3.2.5
… these entries can be removed as the version is controlled centrally via:
global:
  image:
    tag: "3.3.0"
Previously, customisations in values.yaml were located in the root of the values.yaml patch file, for example:
cluedin:
  roles:
    main:
      resources:
        limits:
          memory: "4Gi"
        requests:
          memory: "2Gi"
    processing:
      count: 0
    crawling:
      count: 0
… with the new chart you need to address them to the correct part of the chart (platform/application/infrastructure) and nest them accordingly. So the previous example would become:
application:
  cluedin:
    roles:
      main:
        resources:
          limits:
            memory: "4Gi"
          requests:
            memory: "2Gi"
      processing:
        count: 0
      crawling:
        count: 0
Infrastructure modification such as resource sizing - Most components now use official charts, so please check their individual projects for how to make changes. See cluedin-infrastructure/values.yaml for hints.
Table of current versions:
Chart Name | Chart Version | App. Version | Source |
---|---|---|---|
elasticsearch | 7.14.0 | 7.8.0 | https://helm.elastic.co |
kube-prometheus-stack | 20.0.1 | 20 | https://prometheus-community.github.io/helm-charts |
neo4j | 4.2.8-1 | 3.5.30 | https://neo4j-contrib.github.io/neo4j-helm |
rabbitmq | 8.24.13 | 3.9.11 | https://charts.bitnami.com/bitnami |
redis | 15.6.10 | 6.2.6 | https://charts.bitnami.com/bitnami |
mssql-linux | 0.12 | 14.0.3401.7 (2017-CU25) | [Internal Fork] |
haproxy-ingress | 0.13.6 | 0.13.6 | https://haproxy-ingress.github.io/charts |
cert-manager | 1.7.1 | 1.7.1 | https://charts.jetstack.io |
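The values.yaml restructuring described above (a central version tag, and application settings nested under application) can be sketched as a small transformation over already-parsed values. This is only an illustrative sketch, not a CluedIn tool: the function name migrate_values, the APPLICATION_KEYS set, and the rule that every per-component image pin can be dropped are assumptions, and the values are plain Python dictionaries rather than a loaded values.yaml.

```python
import copy

# Assumption: top-level keys from the old chart that now belong under
# `application`. Extend this after checking the new chart's values.
APPLICATION_KEYS = {"cluedin"}


def migrate_values(old_values, version):
    """Return (new_values, unmapped_keys) for a legacy values mapping."""
    new_values = {"global": {"image": {"tag": version}}, "application": {}}
    unmapped = []
    for key, value in old_values.items():
        section = copy.deepcopy(value)
        if isinstance(section, dict):
            # Per-component image pins are replaced by the global tag.
            section.pop("image", None)
            if not section:
                continue
        if key in APPLICATION_KEYS:
            new_values["application"][key] = section
        else:
            # Needs a manual decision: platform, application or infrastructure.
            unmapped.append(key)
    return new_values, unmapped


old = {
    "submitter": {"image": "cluedin/cluedin-micro-submitter-node:3.2.5"},
    "cluedin": {"roles": {"processing": {"count": 0}, "crawling": {"count": 0}}},
}
new, todo = migrate_values(old, "3.3.0")
print(new)
print(todo)
```

Keys that the sketch cannot place are returned separately so they can be reviewed by hand against the new chart.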
You will need the latest versions of:
You will also need a kubeconfig file that has access to the cluster you are upgrading.
You may also want to use a GUI tool such as Lens or k9s to view progress.
Be sure to take a full backup of all your data before beginning the upgrade process. Also, make sure to test and familiarise yourself with the restore process before continuing.
A good tool (but out of scope for this guide) is Velero. Velero is an open source tool to safely back up and restore, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes.
Velero consists of:
It can back up and restore the whole cluster and is simple to use. Taking Azure Disk snapshots is another option (Velero can help do this for you as well).
helm repo add cluedin https://cluedin-io.github.io/Charts/
helm repo update
The latest version contains fixes to support preserving PVC disks when the chart is uninstalled.
helm upgrade -n cluedin cluedin cluedin/cluedin --version 3.2.5-update.4 --reuse-values
Note: Here we use --reuse-values to carry over the previous release's values. Be sure this won't reset any manual changes that may have occurred.
Validation Step: Check that all PVC resources that belong to deployments now have a helm.sh/resource-policy=keep annotation. This ensures they will not be removed in the following steps. PVCs belonging to StatefulSet resources are always ignored and will not be removed regardless.
kubectl get pvc -n cluedin -o=jsonpath='{.items[?(@.metadata.annotations.helm\.sh/resource-policy=="keep")].metadata.name}'
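The command above lists the PVCs that carry the annotation; the inverse view (PVCs that are not yet protected) can be easier to act on. The following is a minimal sketch that assumes you have saved the output of kubectl get pvc -n cluedin -o json and parsed it into a dictionary. The function name pvcs_without_keep and the sample data are hypothetical.

```python
KEEP_ANNOTATION = "helm.sh/resource-policy"


def pvcs_without_keep(pvc_list):
    """Return names of PVCs that lack the `keep` resource policy annotation."""
    names = []
    for item in pvc_list.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(KEEP_ANNOTATION) != "keep":
            names.append(metadata.get("name"))
    return names


# Made-up sample in the shape of `kubectl get pvc -n cluedin -o json`.
sample = {
    "items": [
        {
            "metadata": {
                "name": "cluedin-redis-data",
                "annotations": {"helm.sh/resource-policy": "keep"},
            }
        },
        {"metadata": {"name": "data-cluedin-rabbitmq-0"}},
    ]
}
print(pvcs_without_keep(sample))
```

PVCs that belong to StatefulSet resources are expected to show up in this list, since they are not annotated and are ignored by the uninstall regardless.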
You can also protect PVs against accidental PVC deletion by patching the reclaim policy of the PV (not PVC) with a command such as:
kubectl get pv -n cluedin
kubectl patch pv -n cluedin -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' <PV name> <Another PV name>
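To decide which PVs still need patching, it can help to filter the PV list by the namespace of the bound claim, because PVs themselves are cluster-scoped. A minimal sketch, assuming the parsed output of kubectl get pv -o json as input; the function name pvs_not_retained and the sample PV names are hypothetical.

```python
def pvs_not_retained(pv_list, namespace="cluedin"):
    """Return PV names bound to claims in `namespace` that are not `Retain`."""
    names = []
    for item in pv_list.get("items", []):
        spec = item.get("spec", {})
        claim = spec.get("claimRef") or {}
        if claim.get("namespace") != namespace:
            continue  # PV is bound to a claim elsewhere, or is unbound
        if spec.get("persistentVolumeReclaimPolicy") != "Retain":
            names.append(item["metadata"]["name"])
    return names


# Made-up sample in the shape of `kubectl get pv -o json`.
sample = {
    "items": [
        {
            "metadata": {"name": "pv-example-1"},
            "spec": {
                "persistentVolumeReclaimPolicy": "Delete",
                "claimRef": {"namespace": "cluedin", "name": "cluedin-neo4j-data"},
            },
        },
        {
            "metadata": {"name": "pv-example-2"},
            "spec": {
                "persistentVolumeReclaimPolicy": "Retain",
                "claimRef": {"namespace": "cluedin", "name": "cluedin-redis-data"},
            },
        },
        {
            "metadata": {"name": "pv-example-3"},
            "spec": {
                "persistentVolumeReclaimPolicy": "Delete",
                "claimRef": {"namespace": "other", "name": "unrelated"},
            },
        },
    ]
}
print(pvs_not_retained(sample))
```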
This will shut down all processes gracefully and ensure any locks on files, particularly for the database pods, are released.
kubectl scale deployments -n cluedin --replicas=0 --all
kubectl scale statefulsets -n cluedin --replicas=0 --all
Validation Step: Check no cluedin pods are running.
kubectl get pods -n cluedin
Remove the previous cluedin application using:
helm uninstall -n cluedin cluedin
This will remove all Helm-managed resources that are part of the CluedIn Helm installation. Any resources not managed by Helm (for example, the docker registry secret which was likely created manually) will remain. Also, the PVC disks that contain the essential CluedIn data will remain.
Validation Step: All CluedIn resources should have been removed but all PVCs should have remained behind.
kubectl get pvc -n cluedin
cluedin-neo4j-data ...
cluedin-sqlserver-backup ...
cluedin-sqlserver-transact ...
cluedin-sqlserver-master ...
cluedin-sqlserver-data ...
data-cluedin-rabbitmq-0 ...
elasticsearch-master-elasticsearch-master-0 ...
cluedin-redis-data ...
cluedin-openrefine-data ...
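The check above can also be done mechanically by comparing the remaining PVCs against the names listed. This is a minimal sketch that assumes the input is the text printed by kubectl get pvc -n cluedin -o name (one persistentvolumeclaim/<name> entry per line) and that your installation uses the default PVC names shown in this guide; the function name missing_pvcs is hypothetical.

```python
# PVC names of a default installation, as listed in this guide.
EXPECTED_PVCS = {
    "cluedin-neo4j-data",
    "cluedin-sqlserver-backup",
    "cluedin-sqlserver-transact",
    "cluedin-sqlserver-master",
    "cluedin-sqlserver-data",
    "data-cluedin-rabbitmq-0",
    "elasticsearch-master-elasticsearch-master-0",
    "cluedin-redis-data",
    "cluedin-openrefine-data",
}


def missing_pvcs(name_output, expected=EXPECTED_PVCS):
    """Return expected PVC names that do not appear in `-o name` output."""
    present = {
        line.strip().split("/", 1)[-1]  # drop the "persistentvolumeclaim/" prefix
        for line in name_output.splitlines()
        if line.strip()
    }
    return sorted(set(expected) - present)


# Made-up output in which one PVC has gone missing.
sample_output = "\n".join(
    "persistentvolumeclaim/" + name
    for name in sorted(EXPECTED_PVCS)
    if name != "cluedin-redis-data"
)
print(missing_pvcs(sample_output))
```

An empty list means every expected PVC survived the uninstall.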
Cert-Manager is now installed as part of the new cluedin-platform chart. Any previous versions should be removed. Make a note of any configuration changes you have made to cert-manager that may need to be re-applied to the new version.
helm uninstall cert-manager -n cluedin
Hint: If this doesn't match your installation, you can search for the name of the release using: helm ls --all-namespaces
cluedin-platform chart
helm repo update
helm search repo cluedin-platform
NAME CHART VERSION APP VERSION DESCRIPTION
cluedin/cluedin-platform 1.0.0 3.3.0 Deploys all parts of the CluedIn platform
values-upgrade.yaml
The installation will take the form of two stages:
We will be using the new cluedin-platform Helm chart, which contains both of these stages and can toggle them via the values.yaml passed to the Helm chart.
The first part of this is to configure the infrastructure services and skip the application install (for now).
The need for a NuGet PAT token has been removed in this version, as all packages are now available on CluedIn’s public NuGet feed. If you have a secret called cluedin-server-nuget-secret, please remove it before proceeding.
Note: If you still need to access development feeds or have private feeds already configured, then just remove the extraSecrets section of the configuration below and it will re-use the existing secret.
Example upgrade configuration:
This patch will re-mount the old PVCs into the new infrastructure objects.
(values-upgrade.yaml)
global:
  image:
    tag: "3.3.0"
infrastructure:
  enabled: true
  haproxy-ingress:
    enabled: false
  elasticsearch:
    enabled: true
    persistence:
      enabled: false
    extraVolumes:
      - name: data
        persistentVolumeClaim:
          claimName: "elasticsearch-master-elasticsearch-master-0"
    extraVolumeMounts:
      - name: data
        mountPath: /usr/share/elasticsearch/data
  monitoring:
    enabled: true
  mssql:
    enabled: true
    sapassword: "yourStrong(!)Password" # <- Be sure to update this to your SA password before running!
    persistence:
      enabled: true
      existingDataClaim: cluedin-sqlserver-data
      existingTransactionLogClaim: cluedin-sqlserver-transact
      existingBackupClaim: cluedin-sqlserver-backup
      existingMasterClaim: cluedin-sqlserver-master
  neo4j:
    enabled: true
    core:
      persistentVolume:
        enabled: false
        mountPath: /olddata
      additionalVolumes:
        - name: upgradedata
          persistentVolumeClaim:
            claimName: cluedin-neo4j-data
      additionalVolumeMounts:
        - name: upgradedata
          mountPath: "/data"
  rabbitmq:
    enabled: true
    persistence:
      existingClaim: "data-cluedin-rabbitmq-0"
  redis:
    enabled: true
    master:
      persistence:
        existingClaim: "cluedin-redis-data"
  seq:
    enabled: false
application:
  enabled: false
This is used as part of the new installation:
helm upgrade -i cluedin-platform -n cluedin cluedin/cluedin-platform --values values-upgrade.yaml
What this stage will do:
Once the services are all green / ready, proceed to the next stage, being careful that SQL Server has finished its upgrade.
Validation Step:
In the cluedin-sqlserver logs you will see various upgrade notifications.
In tools like Lens you can see PVCs remounted to the new pods (except OpenRefine).
Create a file called pre-install.sql:
USE [DataStore.Db.ExternalSearch]
GO
DELETE FROM [dbo].[ExternalSearchQuery]
GO
Create a file called post-install.sql:
USE [DataStore.Db.MicroServices]
GO
INSERT INTO [dbo].[datasetendpointreceipts](id,dataSetId,failed,total,retry,successful,updatedat,createdat)
SELECT
newid() as id
,id as dataSetId
,JSON_VALUE(stats, '$.failed') as failed
,JSON_VALUE(stats, '$.total') as total
,JSON_VALUE(stats, '$.retry') as retry
,JSON_VALUE(stats, '$.successful') as successful
,getdate() as updatedat
,getdate() as createdat
FROM [dbo].[datasets]
GO
UPDATE [dbo].[DataSets] SET expectedTotal=JSON_VALUE(stats, '$.total')
GO
DECLARE @id uniqueidentifier
DECLARE @model NVARCHAR(max)
DECLARE @legacy CURSOR
DECLARE @updatedModel NVARCHAR(MAX);
SET @legacy = CURSOR FOR
SELECT [id], [model]
FROM [DataStore.Db.OpenCommunication].[dbo].[Rules]
OPEN @legacy
FETCH NEXT
FROM @legacy INTO @id, @model
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT 'ID: ' + CONVERT(NVARCHAR(50), @id)
PRINT 'Current: ' + @model
SET @updatedModel = REPLACE( @model, 'CluedIn.Rules, Version=3.2.0.0', 'CluedIn.Rules, Version=3.3.0.0')
PRINT 'Updated: ' + @updatedModel
Update [DataStore.Db.OpenCommunication].[dbo].[Rules]
set Model = @updatedModel
where Id = @id
FETCH NEXT
FROM @legacy INTO @id, @model
END
CLOSE @legacy
DEALLOCATE @legacy
Create ConfigMaps from the files:
kubectl create configmap -n cluedin cluedin-init-sqlserver-upgrade-pre-install --from-file=pre-install.sql
kubectl create configmap -n cluedin cluedin-init-sqlserver-upgrade-post-install --from-file=post-install.sql
Install the latest CluedIn CRDs into the cluster.
kubectl apply -n cluedin -f https://cluedin-io.github.io/Charts/cluedin-crd/cluedin-crd.1.0.0.yaml
values-upgrade.yaml
Adding the links to the upgrade scripts and enabling the application installation.
global:
  image:
    tag: "3.3.0"
  containerImages:
    initSql:
      scripts:
        preInstall: "cluedin-init-sqlserver-upgrade-pre-install"
        preInstallKey: "pre-install.sql"
        postInstall: "cluedin-init-sqlserver-upgrade-post-install"
        postInstallKey: "post-install.sql"
infrastructure:
  enabled: true
  elasticsearch:
    enabled: true
    persistence:
      enabled: false
    extraVolumes:
      - name: data
        persistentVolumeClaim:
          claimName: "elasticsearch-master-elasticsearch-master-0"
    extraVolumeMounts:
      - name: data
        mountPath: /usr/share/elasticsearch/data
  monitoring:
    enabled: true
  mssql:
    enabled: true
    sapassword: "yourStrong(!)Password"
    persistence:
      enabled: true
      existingDataClaim: cluedin-sqlserver-data
      existingTransactionLogClaim: cluedin-sqlserver-transact
      existingBackupClaim: cluedin-sqlserver-backup
      existingMasterClaim: cluedin-sqlserver-master
  neo4j:
    enabled: true
    core:
      persistentVolume:
        enabled: false
        mountPath: /olddata
      additionalVolumes:
        - name: upgradedata
          persistentVolumeClaim:
            claimName: cluedin-neo4j-data
      additionalVolumeMounts:
        - name: upgradedata
          mountPath: "/data"
  rabbitmq:
    enabled: true
    persistence:
      existingClaim: "data-cluedin-rabbitmq-0"
  redis:
    enabled: true
    master:
      persistence:
        existingClaim: "cluedin-redis-data"
  seq:
    enabled: false
application:
  enabled: true
  system:
    runDatabaseJobsOnUpgrade: true
  openrefine:
    persistence:
      existingClaim: "cluedin-openrefine-data"
Run the same command again:
helm upgrade -i cluedin-platform -n cluedin cluedin/cluedin-platform --values values-upgrade.yaml
This will:
Note: This time we run with runDatabaseJobsOnUpgrade: true. Normally with an upgrade we don't run the database upgrade scripts (as this adds extra overhead to the upgrade time, especially if only small changes are being made). This flag forces the database upgrade scripts to run even on an upgrade. If you want to run further upgrades, it is worth setting this flag back to false once the databases have been installed.
Post upgrade, if there are any routing-related issues, upgrade the haproxy Helm chart version in-place using:
helm upgrade haproxy-ingress -n <HAProxy Namespace> haproxy-ingress/haproxy-ingress
At this stage, your environment should be up and running again. You must now perform some additional steps that act upon the data in your installation.
As some of these steps may edit data, you may also wish to take another backup so that you can restore to this point in the process if required.
Enrichers have been updated to enable configuration at runtime. This requires changes to entries in the database and elastic indexes so that existing data can be mapped to new configurations.
To register new enricher configurations you must trigger an authenticated POST request to api/enrichers/checkforupgrades.
To make this easier, the following PowerShell script may be used:
param(
    [Parameter(Mandatory)]
    [string]$CluedIn,
    [Parameter(Mandatory)]
    [string]$Org,
    [Parameter(Mandatory)]
    [string]$Username,
    [Parameter(Mandatory)]
    [string]$Password,
    [switch]$NoCluedInProxy
)
$ErrorActionPreference = 'Stop'
$accessToken = $null
$authEndpoint = if($NoCluedInProxy) { "${CluedIn}:9001" } else { "${CluedIn}/auth" }
Write-Host "Logging in" -ForegroundColor Green
$login = Invoke-WebRequest -Uri "${authEndpoint}/connect/token" -Method 'POST' -Body "client_id=${Org}&grant_type=password&password=${Password}&username=${Username}"
if($login.StatusCode -eq 200) {
    $accessToken = $login.Content | ConvertFrom-Json | Select-Object -ExpandProperty access_token
}
if(!$accessToken) {
    Write-Error "Could not login - please check parameters and confirm user credentials"
}
$apiEndpoint = if($NoCluedInProxy) { "${CluedIn}:9000" } else { "${CluedIn}/api" }
Write-Host "Upgrading enrichers" -ForegroundColor Green
$enrichers = Invoke-WebRequest -Uri "${apiEndpoint}/api/enrichers/checkforupgrades" -Method 'POST' -Headers @{ Authorization = "Bearer ${accessToken}" }
if($enrichers.StatusCode -eq 200) {
    $foundContent = $enrichers.Content | ConvertFrom-Json
    $foundContent | Format-Table
    $nonZero = $foundContent | Where-Object { $_.RecordsProcessed -gt 0 }
    if($nonZero) {
        Write-Warning "One or more providers may still have data to update - run this script again"
    } else {
        Write-Host "All enrichers are updated" -ForegroundColor Green
    }
} else {
    Write-Error "Could not upgrade enrichers - please check parameters"
}
Save the script locally as enricher-upgrade.ps1 and invoke with:
pwsh .\enricher-upgrade.ps1 -CluedIn http://app.<my domain> -Org <org name> -Username <username> -Password <password>
# ALTERNATIVE If you are running a local instance using CluedIn Home or localhost
pwsh .\enricher-upgrade.ps1 -CluedIn http://localhost -Org <org name> -Username <username> -Password <password> -NoCluedInProxy
The script will provide details of the results and inform you if you need to run the script again.
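The login step in the script builds the token request body by string interpolation. If a username or password contains characters such as & or @, a form body generally needs URL-encoding. The following minimal sketch mirrors the script's endpoint selection (ports 9001 and 9000 when the proxy is bypassed) and shows one way to build an encoded body. It does not send any request, and the function names endpoints and token_request_body are hypothetical.

```python
from urllib.parse import urlencode


def endpoints(base_url, no_proxy=False):
    """Return (auth_endpoint, api_endpoint) the same way the script does."""
    base = base_url.rstrip("/")
    if no_proxy:
        return f"{base}:9001", f"{base}:9000"
    return f"{base}/auth", f"{base}/api"


def token_request_body(org, username, password):
    """Build the form body for the connect/token request, URL-encoded."""
    return urlencode(
        {
            "client_id": org,
            "grant_type": "password",
            "password": password,
            "username": username,
        }
    )


auth, api = endpoints("http://localhost", no_proxy=True)
print(auth + "/connect/token")
print(api + "/api/enrichers/checkforupgrades")
print(token_request_body("foobar", "admin@foobar.com", "Foobar23!"))
```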
The Rule Builder has been updated to allow more nested actions to have their own unique filter. To enable this feature each rule in the system must be re-configured.
To upgrade existing rules you must trigger an authenticated POST request to api/rules/checkforprocessingruleupgrades.
To make this easier, the following PowerShell script may be used:
param(
    [Parameter(Mandatory)]
    [string]$CluedIn,
    [Parameter(Mandatory)]
    [string]$Org,
    [Parameter(Mandatory)]
    [string]$Username,
    [Parameter(Mandatory)]
    [string]$Password,
    [switch]$NoCluedInProxy
)
$ErrorActionPreference = 'Stop'
$accessToken = $null
$authEndpoint = if($NoCluedInProxy) { "${CluedIn}:9001" } else { "${CluedIn}/auth" }
Write-Host "Logging in" -ForegroundColor Green
$login = Invoke-WebRequest -Uri "${authEndpoint}/connect/token" -Method 'POST' -Body "client_id=${Org}&grant_type=password&password=${Password}&username=${Username}"
if($login.StatusCode -eq 200) {
    $accessToken = $login.Content | ConvertFrom-Json | Select-Object -ExpandProperty access_token
}
if(!$accessToken) {
    Write-Error "Could not login - please check parameters and confirm user credentials"
}
$apiEndpoint = if($NoCluedInProxy) { "${CluedIn}:9000" } else { "${CluedIn}/api" }
Write-Host "Upgrading rules" -ForegroundColor Green
$rules = Invoke-WebRequest -Uri "${apiEndpoint}/api/rules/checkforprocessingruleupgrades" -Method 'POST' -Headers @{ Authorization = "Bearer ${accessToken}" }
if($rules.StatusCode -eq 200) {
    $processedCount = [int]$rules.Content
    if($processedCount -gt 0) {
        Write-Warning "One or more rules may still need updating - run this script again"
    } else {
        Write-Host "All rules are updated" -ForegroundColor Green
    }
} else {
    Write-Error "Could not upgrade rules - please check parameters"
}
Save the script locally as rules-upgrade.ps1 and invoke with:
pwsh .\rules-upgrade.ps1 -CluedIn http://app.<my domain> -Org <org name> -Username <username> -Password <password>
# ALTERNATIVE If you are running a local instance using CluedIn Home or localhost
pwsh .\rules-upgrade.ps1 -CluedIn http://localhost -Org <org name> -Username <username> -Password <password> -NoCluedInProxy
The script will provide details of the results and inform you if you need to run the script again.
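Both scripts decide whether another run is needed from the response body: the enricher endpoint is read as a list of results with a RecordsProcessed value, and the rules endpoint as a single processed count. The sketch below restates that decision logic, assuming exactly those response shapes as used by the scripts. The function names and the Name field in the sample are hypothetical.

```python
import json


def enrichers_need_rerun(response_text):
    """True if any result reports RecordsProcessed greater than zero."""
    results = json.loads(response_text)
    if isinstance(results, dict):
        results = [results]  # tolerate a single object instead of a list
    return any(int(entry.get("RecordsProcessed", 0)) > 0 for entry in results)


def rules_need_rerun(response_text):
    """True if the returned processed count is greater than zero."""
    return int(response_text) > 0


# Made-up response bodies for illustration.
enricher_body = '[{"Name": "example", "RecordsProcessed": 12}, {"Name": "other", "RecordsProcessed": 0}]'
print(enrichers_need_rerun(enricher_body))
print(rules_need_rerun("0"))
```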
You will now be able to log in and re-configure your enrichers.
Previously, configuration was handled through the CluedIn environment configuration. Now you can log in and configure enrichers under the Prepare area of CluedIn.
If the organization(s) within CluedIn were created using an old bootstrap or manual method, we need to create a reference Organization CRD in order to support features that use the CluedIn Controller. We do not need to create a new Organization, just a CRD with a reference to the Organization's ID.
To do this we need to get the Organization's identifier.
Search for the organization name (for example, foobar) from within CluedIn, and click the View Codes button in the top right of the panel.
The GUID that appears here is the Organization's identifier. We can use this to create the Organization CRD.
First we need a secret so that the controller can log into the organization:
apiVersion: v1
kind: Secret
metadata:
  name: foobar-org
data:
  password: Rm9vYmFyMjMh
  username: YWRtaW5AZm9vYmFyLmNvbQ==
type: Opaque
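The values under data in a Kubernetes Secret are base64-encoded, not plain text. As a small illustration, the following sketch produces the encoded values used in the example above from their plain-text form (the example credentials decode to admin@foobar.com and Foobar23!). The helper name secret_data is hypothetical.

```python
import base64


def secret_data(username, password):
    """Return the base64-encoded `data` mapping for a Secret manifest."""

    def encode(text):
        return base64.b64encode(text.encode("utf-8")).decode("ascii")

    return {"username": encode(username), "password": encode(password)}


print(secret_data("admin@foobar.com", "Foobar23!"))
```

Replace the example credentials with those of an administrator in your own organization before creating the secret.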
Then we create the Organization CRD:
apiVersion: api.cluedin.com/v1
kind: Organization
metadata:
  name: foobar-organization
spec:
  id: '9d270e17-bf2f-426a-8e03-0c94661c0438'
  name: foobar
kubectl apply -n cluedin -f foobar-organization.yaml
You can verify everything is correct by running the get orgs command again:
kubectl get orgs -n cluedin
NAME ORGANIZATION NAME ADMIN USER SECRET ORGANIZATION ID PHASE STATUS
foobar-organization foobar foobar-org 9d270e17-bf2f-426a-8e03-0c94661c0438 Active Organization [foobar] activated.