Troubleshooting Guide

This guide covers common issues and their solutions for the Mnemonica Kubernetes deployment.

Table of Contents

  1. Pod Issues
  2. Storage Issues
  3. Networking Issues
  4. Service-Specific Issues
  5. Performance Issues
  6. Debugging Commands
  7. Emergency Procedures
  8. Getting Help

Pod Issues

Pods Not Starting (CrashLoopBackOff)

Symptoms: Pods repeatedly restart and show CrashLoopBackOff status

Diagnosis:

kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous
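
When `kubectl get pods` returns many pods, a small filter helps pick out the restarting ones. A minimal sketch — the `crashlooping` helper is hypothetical, and the `printf` feeds stand-in sample rows where real use would pipe the actual `kubectl get pods` output:

```shell
# Hypothetical triage helper: print the names of pods whose STATUS column
# reads CrashLoopBackOff. Pipe real `kubectl get pods` output into it.
crashlooping() { awk '$3 == "CrashLoopBackOff" { print $1 }'; }

# Stand-in sample data; on a real cluster: kubectl get pods | crashlooping
printf '%s\n' \
  'NAME        READY  STATUS            RESTARTS  AGE' \
  'backend-1   0/1    CrashLoopBackOff  7         10m' \
  'frontend-1  1/1    Running           0         10m' | crashlooping
```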

Common Causes:

  1. Missing Secrets

    • Check if all required secrets exist:
      kubectl get secrets
      
    • Required secrets: uwsgi, centrifugo, fsa, regcred, juicefs-secret, juicefs-rabbitmq
    • Solution: Ensure secrets are properly linked from the secrets directory
  2. ConfigMap Issues

    • Verify ConfigMaps exist:
      kubectl get configmaps
      
    • Ensure uwsgi-env, frontend-env, and other ConfigMaps are applied
  3. Image Pull Errors

    • Check if regcred is valid:
      kubectl get secret regcred -o yaml
      
    • Verify image exists: infomne/uwsgi-apps:v2.14.6-1-gab90065
    • Check pull status: kubectl describe pod <pod-name> | grep -A 5 Events
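
The required-secrets check in cause 1 can be scripted. A sketch assuming the secret list above; `missing_secrets` is a hypothetical helper, and the `printf` feeds stand-in data where real use would pipe `kubectl get secrets -o name`:

```shell
# Hypothetical helper: report which of the secrets this guide calls
# required are absent from `kubectl get secrets -o name` output (stdin).
missing_secrets() {
  required="uwsgi centrifugo fsa regcred juicefs-secret juicefs-rabbitmq"
  input=$(cat)
  for s in $required; do
    printf '%s\n' "$input" | grep -qx "secret/$s" || echo "MISSING: $s"
  done
}

# Stand-in data; on a real cluster: kubectl get secrets -o name | missing_secrets
printf 'secret/uwsgi\nsecret/regcred\n' | missing_secrets
```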

Pods Pending

Symptoms: Pods stuck in Pending state

Diagnosis:

kubectl describe pod <pod-name>

Common Causes:

  1. Insufficient Resources

    • Check node resources:
      kubectl top nodes
      kubectl describe nodes
      
    • Solution: Cluster autoscaler should add nodes, or adjust resource requests
  2. PVC Not Bound

    • Check PVC status:
      kubectl get pvc
      
    • Solution: Ensure JuiceFS CSI driver is installed and secrets are configured
  3. Anti-Affinity Rules

    • If you have only one node, anti-affinity rules will prevent pod scheduling
    • Temporarily remove anti-affinity for dev/testing, or add more nodes

ImagePullBackOff

Symptoms: Cannot pull container images

Solutions:

# Recreate registry credentials
kubectl delete secret regcred
kubectl create secret docker-registry regcred \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>

Storage Issues

PVC Not Binding

Symptoms: PersistentVolumeClaims stuck in Pending state

Diagnosis:

kubectl get pvc
kubectl describe pvc <pvc-name>
kubectl get pv

Common Causes:

  1. JuiceFS CSI Driver Not Installed

    helm list -n kube-system | grep juicefs
    kubectl get pods -n kube-system | grep juicefs
    
    • Solution: Install JuiceFS CSI driver
      helm upgrade -i juicefs-csi-driver -n kube-system \
        -f custom-values/juicefs-custom-values-dev.yaml
      
  2. JuiceFS Secrets Missing

    kubectl get secret juicefs-secret
    kubectl get secret juicefs-rabbitmq
    
    • Solution: Create JuiceFS secrets with proper credentials
  3. PV/PVC Label Mismatch

    • Check PV labels match PVC selector:
      kubectl get pv dev-media -o yaml | grep -A 3 labels
      kubectl get pvc dev-media -o yaml | grep -A 5 selector
      

Storage Access Issues

Symptoms: Pods can't read/write to mounted volumes

Diagnosis:

# Exec into pod
kubectl exec -it <pod-name> -- ls -la /var/www/media/

# Check mount status
kubectl exec -it <pod-name> -- mount | grep juicefs

# Check JuiceFS mount pod logs
kubectl logs -n kube-system <juicefs-mount-pod>

Solutions:

  1. Restart the affected pod so the volume remounts: kubectl delete pod <pod-name>
  2. If the JuiceFS mount pod is unhealthy, restart it and recheck its logs
  3. Verify file ownership and permissions match the user the container runs as

Networking Issues

Service Not Accessible

Symptoms: Cannot reach services from within cluster

Diagnosis:

kubectl get svc
kubectl describe svc <service-name>
kubectl get endpoints <service-name>

Solutions:

  1. No Endpoints

    • Service selector doesn't match pod labels
    • Verify labels:
      kubectl get pods --show-labels
      kubectl get svc <service-name> -o yaml | grep selector
      
  2. Port Mismatch

    • Verify service port matches container port:
      kubectl get svc <service-name> -o yaml
      kubectl get pod <pod-name> -o yaml | grep containerPort
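
The "no endpoints" case comes down to the service selector being a subset of the pod's labels. A stand-in sketch of that comparison — the label values are examples; on a real cluster substitute the outputs of the kubectl commands above:

```shell
# Stand-in values; replace with real selector and label output.
svc_selector="app=backend,tier=api"                 # from the service spec
pod_labels="app=backend,pod-template-hash=abc123"   # from --show-labels

match=yes
for kv in $(printf '%s' "$svc_selector" | tr ',' ' '); do
  case ",$pod_labels," in
    *",$kv,"*) ;;                          # this selector pair is on the pod
    *) match=no; echo "selector $kv not on pod" ;;
  esac
done
echo "selector matches pod: $match"
```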
      

External Access Issues

Symptoms: Cannot access application from devel.mnemonica.com

Diagnosis:

# Check NodePort service
kubectl get svc frontend
curl http://<node-ip>:30001

# Check AWS Load Balancer Controller
kubectl get pods -n kube-system | grep aws-load-balancer
kubectl logs -n kube-system <alb-controller-pod>

# Check Target Group Bindings
kubectl get targetgroupbindings
kubectl describe targetgroupbinding <tgb-name>

Solutions:

  1. Verify AWS ALB is created and healthy
  2. Check security groups allow traffic
  3. Verify DNS points to correct load balancer
  4. Check Target Group health checks

DNS Resolution Issues

Symptoms: Services can't resolve DNS names

Diagnosis:

# Test DNS from pod
kubectl exec -it <pod-name> -- nslookup redis-master
kubectl exec -it <pod-name> -- nslookup backend.default.svc.cluster.local

# Check CoreDNS
kubectl get pods -n kube-system | grep coredns
kubectl logs -n kube-system <coredns-pod>

Solutions:

  1. Restart CoreDNS: kubectl rollout restart deployment/coredns -n kube-system
  2. Verify kube-dns has endpoints: kubectl get endpoints kube-dns -n kube-system
  3. Check the pod's /etc/resolv.conf points at the cluster DNS service IP

Service-Specific Issues

Backend Issues

Symptoms: Backend pod crashes or returns errors

Diagnosis:

kubectl logs -f deployment/backend
kubectl exec -it <backend-pod> -- python manage.py check

Common Issues:

  1. Database Connection Failed

    • Check PostgreSQL credentials in uwsgi secret
    • Verify PGSSLCERT exists at /tmp/postgresql.crt
    • Test connection from pod:
      kubectl exec -it <backend-pod> -- psql <connection-string>
      
  2. Redis Connection Failed

    • Check Redis is running: kubectl get pods | grep redis
    • Test connection:
      kubectl exec -it <backend-pod> -- redis-cli -h redis-master ping
      
  3. RabbitMQ Connection Failed

    • Check RabbitMQ: kubectl get pods | grep rabbitmq
    • Verify credentials in CELERY_BROKER_URL (uwsgi-env ConfigMap)
    • Test connection:
      kubectl exec -it <backend-pod> -- curl http://guest:guest@rabbitmq:15672/api/whoami
      

Celery Issues

Symptoms: Tasks not processing, workers offline

Diagnosis:

kubectl logs -f deployment/celery
kubectl logs -f deployment/celery-encoding

# Check Flower monitoring
kubectl port-forward deployment/flower 5555:5555
# Open http://localhost:5555

Common Issues:

  1. Workers Not Connecting to Broker

    • Verify CELERY_BROKER_URL in uwsgi-env ConfigMap
    • Check RabbitMQ logs: kubectl logs <rabbitmq-pod>
  2. Tasks Failing

    • Check worker logs for exceptions
    • Verify shared storage is accessible
    • Check encoding parameters in uwsgi-env ConfigMap

Encoding Pool Issues

Symptoms: Video encoding not working

Diagnosis:

kubectl logs -f deployment/encoding-pool-master
kubectl get jobs | grep encoding-worker
kubectl logs job/<encoding-worker-job>

Common Issues:

  1. Workers Not Starting

    • Check encoding-pool-master has kubectl access:
      kubectl exec -it <master-pod> -- kubectl get pods
      
    • Verify epmaster service account: kubectl get sa epmaster
    • Check role binding: kubectl get rolebinding | grep epmaster
  2. Encoding Failures

    • Check GPU availability (if using h264_nvenc)
    • Verify ffmpeg is installed: kubectl exec -it <worker-pod> -- ffmpeg -version
    • Check encoding parameters in uwsgi-env ConfigMap
    • Review worker job logs

TUSD Upload Issues

Symptoms: File uploads failing or hanging

Diagnosis:

# Check TUSD pod
kubectl logs -f <tusd-pod>

# Check tus-hook-listener
kubectl logs -f deployment/tus-hook-listener

# Check lock files
kubectl exec -it <tusd-pod> -- find /var/www/media/mnemonica/storage/tus -name "*.lock"

Solutions:

  1. Stale Lock Files

    • TUSD init container should remove them, but you can manually clean:
      kubectl exec -it <tusd-pod> -- find /var/www/media/mnemonica/storage/tus -name "*.lock" -delete
      
  2. Storage Full

    • Check JuiceFS backend (S3) storage
    • Review upload directory size
  3. Hook Listener Not Responding

    • Verify tus-hook-listener service: kubectl get svc tus-hook-listener
    • Check pod is running: kubectl get pods -l app=tus-hook-listener
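
The -delete form in solution 1 can be sanity-checked locally first: find only removes files matching the *.lock pattern. A throwaway demo using a temporary directory, not the real storage path:

```shell
# Local demo of the cleanup pattern: only *.lock files are deleted,
# the upload itself survives.
demo=$(mktemp -d)
touch "$demo/upload.bin" "$demo/upload.bin.lock"
find "$demo" -name "*.lock" -delete
ls "$demo"    # only upload.bin remains
```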

Centrifugo WebSocket Issues

Symptoms: Real-time features not working

Diagnosis:

kubectl logs -f <centrifugo-pod>

# Check service
kubectl get svc centrifugo

# Test connection from backend
kubectl exec -it <backend-pod> -- curl http://centrifugo:9000/api

Solutions:

  1. Token Mismatch

    • Verify centrifugo secret matches backend configuration
    • Check tokenHmacSecretKey and apiKey are set correctly
  2. Connection Refused

    • Ensure Centrifugo pod is running
    • Verify service endpoints: kubectl get endpoints centrifugo

Redis Issues

Symptoms: Cache not working, sessions lost

Diagnosis:

kubectl logs -f <redis-pod>
kubectl exec -it <redis-pod> -- redis-cli ping
kubectl exec -it <redis-pod> -- redis-cli INFO

Solutions:

  1. Redis Out of Memory

    • Check memory: kubectl exec -it <redis-pod> -- redis-cli INFO memory
    • Increase resource limits or add eviction policy
  2. Connection Refused

    • Verify redis-master service: kubectl get svc redis-master
    • Check MNEMONICA_CACHE_URL in uwsgi-env ConfigMap

Performance Issues

Slow Response Times

Diagnosis:

# Check resource usage
kubectl top pods
kubectl top nodes

# Check pod resource limits
kubectl describe pod <pod-name> | grep -A 5 Limits

# Check for throttling
kubectl describe pod <pod-name> | grep -i throttling

Solutions:

  1. CPU/Memory Limits Too Low

    • Uncomment and adjust resource requests/limits in deployment files
    • Example:
      resources:
        requests:
          cpu: 100m
          memory: 512Mi
        limits:
          cpu: 1000m
          memory: 2Gi
      
  2. Storage I/O Issues

    • Increase JuiceFS cache-size mount option
    • Check S3 backend performance
    • Monitor JuiceFS mount pod metrics
  3. Too Few Replicas

    • Scale critical services:
      kubectl scale deployment backend --replicas=2
      kubectl scale deployment celery --replicas=3
      

High Memory Usage

Diagnosis:

kubectl top pods
kubectl exec -it <pod-name> -- free -h

Solutions:

  1. Increase memory requests/limits on the affected deployment
  2. Restart the pod to reclaim memory: kubectl delete pod <pod-name>
  3. For Celery workers, recycle workers periodically (e.g. a max-tasks-per-child setting) if usage grows over time

Debugging Commands

Get Resource Status

# All resources overview
kubectl get all

# Specific resource types
kubectl get pods
kubectl get svc
kubectl get deployments
kubectl get pvc
kubectl get configmaps
kubectl get secrets
kubectl get jobs

# Across all namespaces
kubectl get pods -A
helm list -A

Describe Resources

kubectl describe pod <pod-name>
kubectl describe deployment <deployment-name>
kubectl describe pvc <pvc-name>
kubectl describe node <node-name>

Logs

# Current logs
kubectl logs <pod-name>
kubectl logs -f <pod-name>  # Follow
kubectl logs <pod-name> -c <container-name>  # Multi-container pod

# Previous crashed pod logs
kubectl logs <pod-name> --previous

# Deployment logs (any pod)
kubectl logs -f deployment/<deployment-name>

# Last N lines
kubectl logs <pod-name> --tail=100

# Logs with timestamps
kubectl logs <pod-name> --timestamps

Execute Commands in Pods

# Interactive shell
kubectl exec -it <pod-name> -- /bin/bash
kubectl exec -it <pod-name> -- /bin/sh  # If bash not available

# Single command
kubectl exec <pod-name> -- ls -la /var/www/media
kubectl exec <pod-name> -- env
kubectl exec <pod-name> -- ps aux

# Python management commands (backend)
kubectl exec -it <backend-pod> -- python manage.py shell
kubectl exec -it <backend-pod> -- python manage.py dbshell
kubectl exec -it <backend-pod> -- python manage.py check

Port Forwarding

# Forward local port to pod
kubectl port-forward pod/<pod-name> 8080:80
kubectl port-forward deployment/<deployment-name> 8080:80
kubectl port-forward svc/<service-name> 8080:80

# Access at http://localhost:8080

Copy Files

# From pod to local
kubectl cp <pod-name>:/path/to/file ./local-file

# From local to pod
kubectl cp ./local-file <pod-name>:/path/to/file

Resource Usage

# Real-time resource monitoring
kubectl top pods
kubectl top nodes

# Specific pod
kubectl top pod <pod-name>

# Sort by CPU/Memory
kubectl top pods --sort-by=cpu
kubectl top pods --sort-by=memory

Events

# Cluster events
kubectl get events --sort-by='.lastTimestamp'

# Specific resource events
kubectl describe pod <pod-name> | grep -A 20 Events

Restart Deployments

# Graceful restart
kubectl rollout restart deployment/<deployment-name>

# Force delete and recreate pod
kubectl delete pod <pod-name>

# Restart all pods with specific label
kubectl delete pods -l app=backend

Config and Secrets

# View ConfigMap
kubectl get configmap uwsgi-env -o yaml

# Edit ConfigMap (will auto-reload with Reloader)
kubectl edit configmap uwsgi-env

# Decode secret (base64)
kubectl get secret uwsgi -o jsonpath='{.data.DATABASE_URL}' | base64 -d
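
Secret values under .data are always base64-encoded, so any copied value decodes offline the same way. A quick round trip to illustrate (the URL is a made-up sample):

```shell
# Encode a sample value the way Kubernetes stores it, then decode it
# the way the jsonpath pipeline above does.
encoded=$(printf 'postgres://user:pass@host/db' | base64)
printf '%s' "$encoded" | base64 -d
```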

Network Testing

# Test service connectivity from pod
kubectl exec -it <pod-name> -- curl http://<service-name>:<port>
kubectl exec -it <pod-name> -- wget -O- http://<service-name>:<port>
kubectl exec -it <pod-name> -- nc -zv <service-name> <port>

# DNS testing
kubectl exec -it <pod-name> -- nslookup <service-name>
kubectl exec -it <pod-name> -- dig <service-name>.default.svc.cluster.local

Helm Debugging

# List releases
helm list -A

# Check release status
helm status <release-name>

# View release values
helm get values <release-name>

# View all release details
helm get all <release-name>

# Rollback
helm rollback <release-name> <revision>

Node Debugging

# Node details
kubectl describe node <node-name>

# Drain node (evict pods)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Uncordon node (allow scheduling)
kubectl uncordon <node-name>

# Cordon node (prevent new pods)
kubectl cordon <node-name>

Emergency Procedures

Complete Application Restart

# Restart all application services
kubectl rollout restart deployment/backend
kubectl rollout restart deployment/frontend
kubectl rollout restart deployment/celery
kubectl rollout restart deployment/celerybeat
kubectl rollout restart deployment/celery-encoding
kubectl rollout restart deployment/encoding-pool-master
kubectl rollout restart deployment/tus-hook-listener
kubectl rollout restart deployment/traffic-log-listener
kubectl rollout restart deployment/flower

Clear All Pods

# Delete all pods (will be recreated by deployments)
kubectl delete pods -l pdb=minAvail1

Reset Redis

kubectl delete pod -l app=redis-master
# Or flush all data
kubectl exec -it <redis-pod> -- redis-cli FLUSHALL

Reset RabbitMQ

# Restart RabbitMQ (tasks in queue will be lost)
kubectl delete pod -l app.kubernetes.io/name=rabbitmq

Getting Help

If issues persist:

  1. Collect diagnostics:

    kubectl get all -o wide > cluster-state.txt
    kubectl describe pods >> cluster-state.txt
    kubectl get events --sort-by='.lastTimestamp' >> cluster-state.txt
    
  2. Check CloudWatch logs in AWS Console

  3. Review Groundcover dashboards

  4. Contact support:
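
The collection commands in step 1 can be wrapped into a single timestamped bundle to hand over. A sketch — the file layout is an assumption, and || true keeps the script going if a command fails:

```shell
# Hypothetical wrapper around the step-1 collection commands: gather
# everything into one timestamped archive.
outdir="diagnostics-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$outdir"
kubectl get all -o wide > "$outdir/cluster-state.txt" 2>&1 || true
kubectl describe pods > "$outdir/pod-details.txt" 2>&1 || true
kubectl get events --sort-by='.lastTimestamp' > "$outdir/events.txt" 2>&1 || true
tar czf "$outdir.tar.gz" "$outdir"
echo "wrote $outdir.tar.gz"
```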
