Post Deployment Checklist
The following guide will walk you through steps you can take to make sure your Galileo cluster is properly deployed and running well.
This guide applies to all cloud providers.
1. Confirm that all DNS records have been created.
Galileo will not set DNS records for your cluster, so you need to create them yourself for your company's domain. Each record should have a TTL of 60 seconds or less.
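As a quick check that the records resolve, you can query each hostname directly. The domain below is a placeholder (matching the example later in this guide); substitute your cluster's subdomain and domain.
for host in api console data grafana; do dig +short $host.my-cluster.my-domain.com; done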
If you are letting Galileo provision Let's Encrypt certificates for you automatically with cert-manager, it's important to make sure that all of cert-manager's HTTP solvers have told Let's Encrypt to provision a certificate with all of the domains specified for the cluster (i.e. api|console|data|grafana.my-cluster.my-domain.com).
kubectl get ingress -n galileo | grep -i http-solver
If the command above returns no output, the solvers have finished. You can confirm this by visiting any of the cluster's domains and checking that the certificate is valid.
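If cert-manager is managing the certificates, you can also confirm that each Certificate resource reports Ready. This assumes the Certificate resources were created in the galileo namespace alongside the ingresses.
kubectl get certificates -n galileo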
2. Check the API’s health-check.
curl -I -X GET https://api.<CLUSTER_SUBDOMAIN>.<CLUSTER_DOMAIN>/healthcheck
If the response is a 200, then this is a good sign that almost everything is up and running as expected.
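For reference, a healthy response begins with a 200 status line, roughly like the line below; the protocol version and any additional headers depend on your ingress controller.
HTTP/2 200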
3. Check for unready pods.
kubectl get pods --all-namespaces -o go-template='{{ range $item := .items }}{{ range .status.conditions }}{{ if (or (and (eq .type "PodScheduled") (eq .status "False")) (and (eq .type "Ready") (eq .status "False"))) }}{{ $item.metadata.name}} {{ end }}{{ end }}{{ end }}'
If any pods are in an unready state, especially in the namespace where the Galileo platform was deployed, please notify the appropriate representative from Galileo and they will help to solve the issue.
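If the command does list pods, describing one of them usually shows why it is unready in the Events section at the bottom of the output. The pod and namespace names below are placeholders.
kubectl describe pod <POD_NAME> -n <NAMESPACE>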
4. Check for pending persistent volume claims.
kubectl get pvc --all-namespaces | grep -i pending
If any persistent volume claims are in a pending state, especially in the namespace where the Galileo platform was deployed, please notify the appropriate representative from Galileo and they will help to solve the issue.
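To see why a claim is stuck in pending, describe it and check the Events section of the output. The claim and namespace names below are placeholders.
kubectl describe pvc <PVC_NAME> -n <NAMESPACE>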
5. Check whether the ClickHouse Keeper failed to start.
kubectl get sts --all-namespaces | grep -i clickhouse-keeper
If there is a clickhouse-keeper statefulset with zero ready replicas, the Kubernetes version is incompatible. Take the following steps (a consolidated sketch of the commands follows this list):
- Upgrade the Kubernetes version (control plane and node groups) to at least 1.30.
- Delete the broken CRD with kubectl delete crd clickhousekeeperinstallations.clickhouse-keeper.altinity.com
- Delete the ClickHouse operator with kubectl delete deploy clickhouse-operator
- Re-apply the manifest.
- Wait for 2 minutes, then confirm the 3 ClickHouse Keeper statefulsets (chk-clickhouse-keeper-cluster) are up with kubectl get sts --all-namespaces | grep -i clickhouse-keeper
- If you still see an unhealthy clickhouse-keeper statefulset along with those 3, clean up the statefulset and its PVC with kubectl delete sts clickhouse-keeper && kubectl delete pvc data-volume-claim-clickhouse-keeper-0
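Tied together, and assuming you have already upgraded Kubernetes to at least 1.30, the recovery sequence above looks roughly like the sketch below. <MANIFEST_FILE> is a placeholder for the Galileo manifest you originally applied; adjust names and namespaces to match your deployment.
kubectl delete crd clickhousekeeperinstallations.clickhouse-keeper.altinity.com
kubectl delete deploy clickhouse-operator
kubectl apply -f <MANIFEST_FILE>
sleep 120
kubectl get sts --all-namespaces | grep -i clickhouse-keeper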