Error Code Reference
Comprehensive reference guide for understanding and resolving platform error codes in edge Kubernetes deployments. This guide provides detailed explanations, root cause analysis, and step-by-step solutions for common error scenarios.
How to Use This Reference
Each error code entry includes:
- Meaning: Clear explanation of what the error indicates
- Root Causes: Common underlying issues that trigger this error
- Immediate Actions: Quick steps to take when encountering the error
- Detailed Solutions: Step-by-step troubleshooting procedures
- Prevention: Best practices to avoid the error in the future
HTTP Error Codes
400 - Bad Request
Meaning: The server cannot process the request due to client-side errors in the request syntax, format, or content.
Root Causes:
- Malformed YAML or JSON in Kubernetes manifests
- Invalid resource specifications or field values
- Missing required fields in API requests
- Incompatible API versions
- Invalid resource names or labels
Immediate Actions:
# Validate manifest syntax before applying
kubectl apply --dry-run=client -f your-manifest.yaml
# Check for YAML syntax errors
yamllint your-manifest.yaml
# Validate against Kubernetes schema
kubectl apply --validate=true --dry-run=client -f your-manifest.yaml
Detailed Solutions:
- Validate Manifest Structure:
# Use kubeval for schema validation
kubeval your-manifest.yaml
# Check API version compatibility
kubectl api-versions | grep apps
- Common Field Validation Issues:
# Verify resource names follow DNS conventions
# Names must be lowercase, contain only alphanumeric characters and hyphens
# Check label and annotation syntax
kubectl explain pod.metadata.labels
- Debug Specific Field Errors:
# Get detailed field explanations
kubectl explain deployment.spec.template.spec.containers
# Validate resource requirements format
kubectl explain pod.spec.containers.resources
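As a concrete illustration, the following sketch (pod name and image are hypothetical) shows how a single misspelled field produces a 400-class validation error during a server-side dry run, before anything is created; with strict field validation (the default on current clusters) the request is rejected outright:
# Pipe a deliberately broken manifest through a server-side dry run
cat <<'EOF' | kubectl apply --dry-run=server -f -
apiVersion: v1
kind: Pod
metadata:
  name: validation-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    imagePullPolice: Always    # typo: should be imagePullPolicy
EOF
# Expected outcome: the API server rejects the request with an "unknown field" error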
Prevention:
- Use IDE plugins with Kubernetes schema validation
- Implement CI/CD pipeline validation steps (a minimal sketch follows this list)
- Use kubectl apply --dry-run=client before applying changes
- Maintain consistent naming conventions
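To automate the CI/CD validation step mentioned above, a minimal pre-merge check might look like the following sketch (the manifests/ path is an assumption; adjust it to your repository layout):
#!/usr/bin/env bash
set -euo pipefail
# Validate every manifest before it can be merged or applied
for manifest in manifests/*.yaml; do
  yamllint "$manifest"                                            # YAML syntax errors
  kubectl apply --dry-run=client --validate=true -f "$manifest"   # schema/field errors
done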
401 - Unauthorized
Meaning: The request lacks valid authentication credentials or the provided credentials are invalid.
Root Causes:
- Expired authentication tokens or certificates
- Missing or incorrect kubeconfig configuration
- Invalid service account tokens
- Clock skew between client and server
- Revoked or disabled user credentials
Immediate Actions:
# Check current authentication status
kubectl auth whoami
# View current context and credentials
kubectl config current-context
kubectl config view --minify
Detailed Solutions:
- Token Refresh and Validation:
# Re-authenticate with your identity provider; the exact command depends on the configured auth plugin
# For OIDC providers, re-authenticate via the kubelogin (oidc-login) plugin
kubectl oidc-login
# Check token expiration
kubectl config view --raw -o jsonpath='{.users[0].user.auth-provider.config.id-token}' | cut -d. -f2 | base64 -d
- Service Account Authentication:
# Check service account token
kubectl get serviceaccount <sa-name> -o yaml
# Verify token mounting in pods
kubectl describe pod <pod-name> | grep -A 5 "Mounts:"
# Create new service account token if needed
kubectl create token <sa-name>
- Certificate-Based Authentication:
# Check client certificate validity
openssl x509 -in ~/.kube/client.crt -text -noout
# Verify certificate against CA
openssl verify -CAfile ~/.kube/ca.crt ~/.kube/client.crt
Prevention:
- Set up automatic token renewal
- Monitor certificate expiration dates (see the sketch after this list)
- Use short-lived tokens with refresh mechanisms
- Implement proper RBAC policies
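For the certificate-expiration monitoring suggested above, a small sketch (assuming the client certificate is stored at ~/.kube/client.crt; adjust the path to your kubeconfig setup):
# Warn if the kubeconfig client certificate expires within the next 30 days
if ! openssl x509 -checkend $((30*24*3600)) -noout -in ~/.kube/client.crt; then
  echo "WARNING: client certificate expires within 30 days - rotate it now" >&2
fi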
403 - Forbidden
Meaning: The server understood the request but refuses to authorize it due to insufficient permissions.
Root Causes:
- Inadequate RBAC permissions for the requested operation
- Missing role bindings or cluster role bindings
- Incorrect service account assignments
- Namespace-level permission restrictions
- Resource quota or limit range violations
Immediate Actions:
# Check specific permission
kubectl auth can-i <verb> <resource> --namespace <namespace>
# List all permissions for current user
kubectl auth can-i --list --namespace <namespace>
Detailed Solutions:
- RBAC Permission Analysis:
# Check current user's role bindings
kubectl get rolebinding,clusterrolebinding -A | grep $(kubectl config view --minify -o jsonpath='{.contexts[0].context.user}')
# Examine specific role permissions
kubectl describe role <role-name> -n <namespace>
kubectl describe clusterrole <clusterrole-name>
- Service Account Permission Debugging:
# Check service account's role bindings
kubectl get rolebinding,clusterrolebinding -A | grep <service-account-name>
# Test permissions as service account
kubectl auth can-i create pods --as=system:serviceaccount:<namespace>:<sa-name>
- Resource Quota and Limits:
# Check namespace resource quotas
kubectl describe quota -n <namespace>
# Check limit ranges
kubectl describe limitrange -n <namespace>
Prevention:
- Follow the principle of least privilege (see the sketch after this list)
- Regularly audit RBAC permissions
- Use namespace-specific service accounts
- Document required permissions for applications
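As a sketch of a least-privilege setup, the manifest below grants a service account read-only access to pods in a single namespace; all names (edge-apps, app-reader, app-sa) are hypothetical:
cat <<'EOF' | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-reader
  namespace: edge-apps
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-binding
  namespace: edge-apps
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: edge-apps
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
EOF
# Confirm the grant resolves to "yes" and nothing broader
kubectl auth can-i list pods --as=system:serviceaccount:edge-apps:app-sa -n edge-apps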
404 - Not Found
Meaning: The requested resource does not exist on the server.
Root Causes:
- Resource was deleted or never created
- Incorrect resource name or namespace
- API version mismatch
- Custom Resource Definition (CRD) not installed
- Typos in resource references
Immediate Actions:
# List all resources in namespace
kubectl get all -n <namespace>
# Search across all namespaces
kubectl get <resource-type> -A | grep <resource-name>
Detailed Solutions:
- Resource Discovery:
# List all available resource types
kubectl api-resources
# Check specific resource type availability
kubectl api-resources | grep <resource-type>
# Verify CRD installation
kubectl get crd | grep <custom-resource>
- Namespace and Name Verification:
# Check current namespace context
kubectl config view --minify | grep namespace
# List resources with labels
kubectl get <resource-type> --show-labels -A
- API Version Compatibility:
# Check deprecated API versions
kubectl api-versions | sort
# Convert deprecated resources (requires the kubectl-convert plugin)
kubectl convert -f old-manifest.yaml --output-version <new-api-version>
Prevention:
- Use consistent naming conventions
- Implement resource tagging strategies
- Version control for resource manifests
- Regular cluster resource auditing
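A common 404 trigger is applying a custom resource before its CRD exists. A small guard sketch (the CRD and manifest names are hypothetical):
# Only apply the custom resource once its CRD is installed
if kubectl get crd widgets.example.com >/dev/null 2>&1; then
  kubectl apply -f widget-instance.yaml
else
  echo "CRD widgets.example.com not found - install the operator first" >&2
  exit 1
fi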
Kubernetes Pod Error States
ImagePullBackOff / ErrImagePull
Meaning: Kubernetes cannot pull the specified container image from the registry.
Root Causes:
- Image does not exist in the specified registry
- Incorrect image name, tag, or registry URL
- Authentication issues with private registries
- Network connectivity problems to registry
- Registry service unavailable
- Image architecture mismatch (e.g., ARM vs x86)
Immediate Actions:
# Check pod events for specific error details
kubectl describe pod <pod-name>
# Verify image exists manually
docker pull <image-name>:<tag>
Detailed Solutions:
- Image and Registry Verification:
# Test direct image pull
docker pull <registry>/<image>:<tag>
# Check image manifest
docker manifest inspect <image>:<tag>
# Verify image architecture
docker image inspect <image>:<tag> | grep Architecture
- Registry Authentication:
# Check existing image pull secrets
kubectl get secrets | grep docker
# Describe image pull secret
kubectl get secret <secret-name> -o yaml
# Create new registry secret
kubectl create secret docker-registry <secret-name> \
--docker-server=<registry-url> \
--docker-username=<username> \
--docker-password=<password>
# Link secret to service account
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "<secret-name>"}]}'
- Network and Connectivity:
# Test registry connectivity from node
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- curl -v https://<registry-url>
# Check DNS resolution
kubectl exec -it <debug-pod> -- nslookup <registry-domain>
# Verify proxy settings if applicable
kubectl get nodes -o yaml | grep -i proxy
Prevention:
- Use specific image tags instead of 'latest'
- Implement image scanning and validation
- Use private registries with proper authentication
- Test image pulls in CI/CD pipeline
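The sketch below ties these prevention points together: a pod that pins an explicit tag and references a pull secret for a private registry (the registry URL, image, and secret name are hypothetical):
cat <<'EOF' | kubectl apply --dry-run=server -f -
apiVersion: v1
kind: Pod
metadata:
  name: registry-demo
spec:
  imagePullSecrets:
  - name: regcred                               # pull secret created earlier
  containers:
  - name: app
    image: registry.example.com/team/app:1.4.2  # pinned tag, never 'latest'
EOF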
CrashLoopBackOff
Meaning: The container repeatedly crashes and Kubernetes keeps restarting it with increasing delays.
Root Causes:
- Application startup failures due to misconfigurations
- Missing dependencies or environment variables
- Resource constraints (memory/CPU limits too low)
- Port binding conflicts
- Database or external service connectivity issues
- Invalid command or entrypoint specifications
Immediate Actions:
# Check current and previous container logs
kubectl logs <pod-name> --previous
kubectl logs <pod-name> --tail=50
# Get detailed pod information
kubectl describe pod <pod-name>
Detailed Solutions:
- Application Log Analysis:
# Follow logs in real-time
kubectl logs <pod-name> -f
# Check logs from all containers in pod
kubectl logs <pod-name> --all-containers=true
# Get logs with timestamps
kubectl logs <pod-name> --timestamps=true --since=10m
- Resource and Configuration Review:
# Check resource limits and requests
kubectl get pod <pod-name> -o yaml | grep -A 10 resources
# Verify environment variables
kubectl get pod <pod-name> -o yaml | grep -A 20 env
# Check mounted volumes and secrets
kubectl describe pod <pod-name> | grep -A 10 "Mounts:"
- Interactive Debugging:
# Run debug container with same image
kubectl run debug-container --image=<same-image> -it --rm -- /bin/sh
# Execute commands in running container (if available)
kubectl exec -it <pod-name> -- /bin/sh
# Debug with different entrypoint
kubectl run debug-pod --image=<image> -it --rm --command -- /bin/sh
Prevention:
- Implement proper health checks and readiness probes (see the sketch after this list)
- Use init containers for dependency checks
- Set appropriate resource requests and limits
- Test containerized applications locally first
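For the health-check prevention point, a sketch of a pod with readiness and liveness probes plus modest resource settings (the image, port, and /healthz path are assumptions about the application):
cat <<'EOF' | kubectl apply --dry-run=server -f -
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
spec:
  containers:
  - name: app
    image: registry.example.com/team/app:1.4.2
    ports:
    - containerPort: 8080
    readinessProbe:              # keep traffic away until the app answers
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:               # restart only after sustained failure
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 256Mi
EOF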
Pending
Meaning: The pod has been accepted by Kubernetes but cannot be scheduled to run on any node.
Root Causes:
- Insufficient CPU or memory resources on available nodes
- Node selector constraints cannot be satisfied
- Pod anti-affinity rules preventing scheduling
- Taints on nodes without corresponding tolerations
- Persistent volume availability issues
- Resource quotas preventing pod creation
Immediate Actions:
# Check pod scheduling events
kubectl describe pod <pod-name>
# Review node resource availability
kubectl top nodes
kubectl describe nodes
Detailed Solutions:
- Resource Availability Analysis:
# Check detailed node capacity
kubectl describe node <node-name> | grep -A 10 "Allocated resources"
# View all node allocatable resources
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU-ALLOCATABLE:.status.allocatable.cpu,MEMORY-ALLOCATABLE:.status.allocatable.memory
# Check resource quotas
kubectl describe quota -n <namespace>
- Scheduling Constraints Review:
# Check node labels for selector matching
kubectl get nodes --show-labels
# Review pod node selector and affinity rules
kubectl get pod <pod-name> -o yaml | grep -A 10 nodeSelector
kubectl get pod <pod-name> -o yaml | grep -A 20 affinity
# Check node taints
kubectl describe node <node-name> | grep Taints
- Storage and Volume Issues:
# Check persistent volume claims
kubectl get pvc -n <namespace>
# Verify storage class availability
kubectl get storageclass
# Check persistent volume status
kubectl get pv | grep Available
Prevention:
- Monitor cluster resource utilization
- Implement cluster autoscaling
- Use resource requests appropriately
- Plan for node maintenance and capacity
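The sketch below shows how tolerations, a node selector, and modest resource requests interact during scheduling on constrained edge nodes (the taint key, architecture label value, and image are assumptions):
cat <<'EOF' | kubectl apply --dry-run=server -f -
apiVersion: v1
kind: Pod
metadata:
  name: scheduling-demo
spec:
  tolerations:                          # allow placement on tainted edge nodes
  - key: edge.example.com/remote
    operator: Exists
    effect: NoSchedule
  nodeSelector:
    kubernetes.io/arch: arm64           # only target ARM edge nodes
  containers:
  - name: app
    image: registry.example.com/team/app:1.4.2
    resources:
      requests:                         # small requests so the pod fits on small nodes
        cpu: 50m
        memory: 64Mi
EOF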
Edge-Specific Error Codes
EDGE-001: Node Connectivity Lost
Meaning: An edge node has lost network connectivity to the Kubernetes control plane.
Root Causes:
- Network infrastructure failures or instability
- Firewall or security group configuration changes
- DNS resolution issues
- Control plane endpoint changes
- Certificate expiration or rotation issues
- Power or hardware failures at edge location
Immediate Actions:
# Check node status from control plane
kubectl get nodes
# Verify node-specific details
kubectl describe node <edge-node-name>
Detailed Solutions:
- Network Connectivity Testing:
# Test connectivity from edge node to control plane
# (Run these commands on the edge node if accessible)
ping <control-plane-ip>
telnet <control-plane-ip> 6443
# Check DNS resolution
nslookup <control-plane-hostname>
# Verify routing
traceroute <control-plane-ip>
- Certificate and Authentication Verification:
# Check kubelet certificate status
openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -text -noout
# Verify kubelet configuration
systemctl status kubelet
journalctl -u kubelet --since "1 hour ago"
- Recovery Procedures:
# Restart kubelet service
systemctl restart kubelet
# Reset node if necessary (from node)
kubeadm reset
kubeadm join <control-plane-endpoint> --token <token> --discovery-token-ca-cert-hash <hash>
# Cordon, then drain the node for maintenance (from control plane)
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets
Prevention:
- Implement redundant network paths
- Monitor network connectivity continuously (see the watchdog sketch after this list)
- Set up automated certificate renewal
- Use node health monitoring and alerting
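A minimal connectivity watchdog sketch that could run on the edge node (for example via cron); the control plane address is a placeholder, and restarting the kubelet on failure is one possible policy, not a required one:
#!/usr/bin/env bash
API_SERVER=10.0.0.10        # replace with your control plane IP or hostname
# Open a raw TCP connection to the API server port; restart kubelet if it fails
if ! timeout 5 bash -c "cat < /dev/null > /dev/tcp/${API_SERVER}/6443" 2>/dev/null; then
  logger -t edge-watchdog "API server ${API_SERVER}:6443 unreachable - restarting kubelet"
  systemctl restart kubelet
fi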
EDGE-002: Resource Quota Exceeded
Meaning: The edge node has exceeded its allocated resource limits, preventing new workload scheduling.
Root Causes:
- Insufficient resource planning for edge node capacity
- Memory leaks in running applications
- Unexpected workload spikes or scaling events
- Background processes consuming resources
- Storage space exhaustion
Immediate Actions:
# Check node resource usage
kubectl top node <edge-node-name>
# List resource-intensive pods
kubectl top pods --sort-by=memory -n <namespace>
kubectl top pods --sort-by=cpu -n <namespace>
Detailed Solutions:
- Resource Usage Analysis:
# Get detailed node resource allocation
kubectl describe node <edge-node-name> | grep -A 20 "Allocated resources"
# Check for resource-intensive pods
kubectl get pods -o custom-columns=NAME:.metadata.name,CPU-REQUEST:.spec.containers[0].resources.requests.cpu,MEMORY-REQUEST:.spec.containers[0].resources.requests.memory --sort-by=.spec.containers[0].resources.requests.memory
# Monitor resource usage over time
watch kubectl top pods -n <namespace>
- Workload Optimization:
# Scale down non-critical workloads
kubectl scale deployment <deployment-name> --replicas=1
# Update resource limits for problematic pods
kubectl patch deployment <deployment-name> -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container-name>","resources":{"limits":{"memory":"512Mi","cpu":"500m"}}}]}}}}'
# Remove completed or failed pods
kubectl delete pods --field-selector=status.phase=Succeeded
kubectl delete pods --field-selector=status.phase=Failed
- Storage Management:
# Check disk usage on node
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- df -h
# Clean up unused volumes
kubectl delete pvc <unused-pvc-name>
# Remove unused images (on node)
docker system prune -a
Prevention:
- Implement resource monitoring and alerting
- Set appropriate resource requests and limits
- Use horizontal pod autoscaling (see the sketch after this list)
- Regular cleanup of unused resources
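For the autoscaling prevention point, a sketch using the built-in horizontal pod autoscaler (deployment and namespace names are hypothetical; requires metrics-server on the cluster):
# Scale between 1 and 4 replicas, targeting 70% average CPU utilization
kubectl autoscale deployment app --min=1 --max=4 --cpu-percent=70 -n edge-apps
# Inspect the autoscaler's current view of load
kubectl get hpa -n edge-apps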
EDGE-003: Storage Limit Reached
Meaning: The edge node has reached its storage capacity limit, affecting application operation and new deployments.
Root Causes:
- Log files growing without rotation or cleanup
- Application data accumulation without management
- Container image buildup without cleanup
- Temporary file accumulation
- Persistent volume space exhaustion
Immediate Actions:
# Check storage usage on node
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- df -h
# Check persistent volume status
kubectl get pv,pvc -A
Detailed Solutions:
- Storage Analysis and Cleanup:
# Analyze disk usage by directory (the node's root filesystem is mounted at /host in the debug pod)
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- du -sh /host/var/lib/docker/*
# Clean up Docker resources (requires the docker CLI on the node)
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- chroot /host docker system prune -a
# Remove unused container images
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- chroot /host docker image prune -a
- Log Management:
# Check log sizes on the node (host filesystem is mounted at /host)
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- du -sh /host/var/log/*
# Rotate logs manually if needed
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- chroot /host logrotate -f /etc/logrotate.conf
# Clear old journal logs
kubectl debug node/<node-name> -it --image=nicolaka/netshoot -- chroot /host journalctl --vacuum-time=7d
- Persistent Volume Management:
# Check PV usage
kubectl exec -it <pod-using-pv> -- df -h /mount/path
# Resize persistent volumes if supported
kubectl patch pvc <pvc-name> -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
# Backup and clean old data
kubectl exec -it <pod-name> -- tar -czf /backup/data.tar.gz /data/old/
kubectl exec -it <pod-name> -- rm -rf /data/old/*
Prevention:
- Implement log rotation and retention policies (see the kubelet sketch after this list)
- Set up storage monitoring and alerting
- Use ephemeral storage for temporary data
- Regular maintenance and cleanup schedules
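For the log-rotation and cleanup prevention points, the kubelet itself can cap container log growth and garbage-collect unused images. The fragment below (values are examples, not recommendations) would be merged into the node's kubelet configuration, typically /var/lib/kubelet/config.yaml, followed by a kubelet restart:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
containerLogMaxSize: 10Mi         # rotate each container log at 10 MiB
containerLogMaxFiles: 3           # keep at most 3 rotated files per container
imageGCHighThresholdPercent: 80   # start deleting unused images at 80% disk usage
imageGCLowThresholdPercent: 60    # stop once usage drops below 60%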
Debugging Commands Quick Reference
Essential Diagnostic Commands
# Get cluster overview
kubectl cluster-info
kubectl get nodes -o wide
kubectl get events --sort-by='.lastTimestamp' -A
# Pod debugging
kubectl get pods -A -o wide
kubectl describe pod <pod-name>
kubectl logs <pod-name> --previous --timestamps
# Resource monitoring
kubectl top nodes
kubectl top pods -A --sort-by=memory
kubectl get events --field-selector type=Warning
# Network debugging
kubectl get svc,ep -A
kubectl exec -it <pod-name> -- netstat -tulpn
kubectl run netshoot --image=nicolaka/netshoot -it --rm
# Storage debugging
kubectl get pv,pvc,sc -A
kubectl describe pvc <pvc-name>
# RBAC debugging
kubectl auth can-i --list
kubectl get rolebinding,clusterrolebinding -A
Advanced Debugging Techniques
# Enable verbose output
kubectl apply -f manifest.yaml --v=8
# Use debug containers
kubectl debug <pod-name> -it --image=nicolaka/netshoot
kubectl debug node/<node-name> -it --image=nicolaka/netshoot
# Simulate pod scheduling
kubectl apply --dry-run=server -f pod.yaml
# Force delete stuck resources
kubectl delete pod <pod-name> --force --grace-period=0
# Export resources for analysis
kubectl get pod <pod-name> -o yaml > pod-debug.yaml
kubectl describe pod <pod-name> > pod-description.txt
Next Steps
For additional troubleshooting resources:
- Common Issues & Solutions - Quick fixes for frequent problems
- Debugging Guides - Step-by-step troubleshooting workflows