Cloud infrastructure is defined by state — the live state of hundreds of resources across compute, networking, storage, and services. AI assistants that can't read that state can only give you generic advice. An AI that knows your actual VPC CIDR ranges, your current EKS node utilization, and the exact error in your Terraform plan is a different class of tool.
MCP servers give your AI direct access to your infrastructure state, turning it from a documentation reader into an active infrastructure co-pilot.
Cloud Provider Access
AWS MCP Server — Infrastructure State and Resource Management
The AWS MCP server connects your AI to AWS APIs, enabling read access to your actual resource state across EC2, S3, RDS, ECS, Lambda, IAM, and any other AWS service your credentials are permitted to query. Instead of describing your setup, you show it.
AWS engineering workflows:
- Cost analysis: "Query our EC2 instances and identify any running instances with near-zero CPU and network activity in CloudWatch over the last 7 days, since these are potential idle resource waste"
- Security audit: "List all S3 buckets and flag any with public read ACLs or missing server-side encryption"
- IAM review: "Find all IAM roles with administrator access policies attached and list the last-used date for each"
- Capacity planning: "Check current RDS instance sizes across all production databases and compare against CloudWatch average CPU utilization"
- Incident diagnosis: "Read the last 100 CloudTrail events for this EC2 instance — I need to understand what changed before it became unreachable"
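The security audit prompt above comes down to a filter over bucket metadata. Here is a minimal sketch in Python, assuming the MCP server has already returned each bucket's ACL grants and encryption status. The `buckets` structure and the `audit_buckets` helper are hypothetical; only the grant entries follow the shape of the S3 GetBucketAcl response.

```python
# URI that S3 uses in ACL grants to mean "anyone on the internet".
ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"


def audit_buckets(buckets):
    """Return {bucket_name: [findings]} for buckets with at least one finding.

    `buckets` is a hypothetical list of dicts with these keys:
      name      -> bucket name
      grants    -> list shaped like the "Grants" field of GetBucketAcl
      encrypted -> True if default server-side encryption is configured
    """
    findings = {}
    for bucket in buckets:
        issues = []
        for grant in bucket.get("grants", []):
            grantee = grant.get("Grantee", {})
            public = grantee.get("URI") == ALL_USERS_URI
            if public and grant.get("Permission") in ("READ", "FULL_CONTROL"):
                issues.append("public read ACL")
                break  # one public grant is enough to flag the bucket
        if not bucket.get("encrypted", False):
            issues.append("missing server-side encryption")
        if issues:
            findings[bucket["name"]] = issues
    return findings


# Made-up sample data for illustration only.
sample = [
    {
        "name": "public-assets",
        "grants": [{"Grantee": {"Type": "Group", "URI": ALL_USERS_URI}, "Permission": "READ"}],
        "encrypted": True,
    },
    {"name": "legacy-logs", "grants": [], "encrypted": False},
    {"name": "app-data", "grants": [], "encrypted": True},
]
report = audit_buckets(sample)
print(report)
```

The point of the sketch is that once the AI can read real ACLs, the audit logic itself is small. The hard part was always collecting the state.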
Architecture documentation: The AWS MCP server is also commonly used to generate infrastructure documentation. A prompt such as "Enumerate our VPC topology, subnets, and route tables, then generate a Mermaid diagram of the network architecture" produces accurate documentation from live state instead of from someone's memory.
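To make the documentation workflow concrete, here is a small sketch of the last step: turning enumerated VPC data into Mermaid flowchart text. The input dictionary is a hypothetical, simplified summary (not the raw EC2 API response), and `vpc_to_mermaid` is an illustrative helper name.

```python
def vpc_to_mermaid(vpc):
    """Render a simplified VPC summary dict as Mermaid flowchart text."""
    lines = ["graph TD"]
    # Use underscores in node IDs so hyphens are never misread as edge syntax.
    vpc_node = vpc["vpc_id"].replace("-", "_")
    vpc_label = f"{vpc['vpc_id']} ({vpc['cidr']})"
    lines.append(f'    {vpc_node}["{vpc_label}"]')
    for subnet in vpc["subnets"]:
        node = subnet["subnet_id"].replace("-", "_")
        label = f"{subnet['subnet_id']} ({subnet['cidr']})"
        lines.append(f'    {node}["{label}"]')
        lines.append(f"    {vpc_node} --> {node}")
    return "\n".join(lines)


# Made-up identifiers and CIDR ranges for illustration only.
example_vpc = {
    "vpc_id": "vpc-0abc",
    "cidr": "10.0.0.0/16",
    "subnets": [
        {"subnet_id": "subnet-1", "cidr": "10.0.1.0/24"},
        {"subnet_id": "subnet-2", "cidr": "10.0.2.0/24"},
    ],
}
diagram = vpc_to_mermaid(example_vpc)
print(diagram)
```

Route tables and gateways could be added as further node types in the same way. They are left out here to keep the sketch short.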
AWS CLI MCP Server — Command Generation and Execution
The AWS CLI MCP server wraps the AWS CLI for AI-assisted command generation with your actual account context. Instead of constructing complex CLI commands manually, your AI builds them from your resource inventory.
CLI-assisted operations:
- Generate the exact AWS CLI command to rotate a specific IAM access key, using the real key ID from your account
- Build a CloudFormation deploy command with the correct parameter overrides for a specific environment
- Construct an S3 sync command with the correct bucket path and exclusion patterns from your actual setup
- Generate a Lambda update-function-code command pointing to the correct ECR image URI and tag
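As an illustration of the last item, the sketch below assembles an `aws lambda update-function-code` invocation from values that would come from your account inventory. The function name, account ID, and image URI are made-up placeholders, and `lambda_update_command` is a hypothetical helper. Only the CLI subcommand and its `--function-name`, `--image-uri`, and `--region` options are real.

```python
import shlex


def lambda_update_command(function_name, image_uri, region):
    """Build the argv list for updating a container-image Lambda function."""
    return [
        "aws", "lambda", "update-function-code",
        "--function-name", function_name,
        "--image-uri", image_uri,
        "--region", region,
    ]


argv = lambda_update_command(
    "orders-api",  # hypothetical function name
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders-api:v42",  # hypothetical image
    "us-east-1",
)
# shlex.join quotes arguments where needed, so the string is safe to paste into a shell.
command_line = shlex.join(argv)
print(command_line)
```

Building the command as a list and quoting it at the end avoids the escaping mistakes that creep in when long CLI strings are edited by hand.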
GCloud MCP Server — Google Cloud Platform Operations
For GCP-based infrastructure, the gcloud MCP server provides the same pattern: AI access to your actual GCP resource state for assisted operations, diagnostics, and documentation.
GCP engineering workflows:
- List all GKE cluster configurations and identify nodes running deprecated Kubernetes versions
- Review Cloud IAM bindings for a specific project to find over-privileged service accounts
- Check BigQuery dataset access controls before migrating data
- Generate gcloud commands for Cloud Run service deployments using actual project and region values
- Audit Cloud Storage bucket lifecycle policies for cost optimization opportunities
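The IAM review item can be sketched as a scan over a project policy. The example assumes the policy has the `bindings` / `role` / `members` shape that `gcloud projects get-iam-policy --format=json` produces. Treating `roles/owner` and `roles/editor` as "over-privileged" is an illustrative choice, not a rule from GCP, and the member names are made up.

```python
# Roles treated as too broad for service accounts in this sketch (an assumption).
BROAD_ROLES = {"roles/owner", "roles/editor"}


def overprivileged_service_accounts(policy):
    """Map each service account member to the broad roles bound to it."""
    result = {}
    for binding in policy.get("bindings", []):
        if binding["role"] not in BROAD_ROLES:
            continue
        for member in binding.get("members", []):
            if member.startswith("serviceAccount:"):
                result.setdefault(member, []).append(binding["role"])
    return result


# Made-up policy for illustration only.
sample_policy = {
    "bindings": [
        {
            "role": "roles/editor",
            "members": [
                "serviceAccount:ci-deployer@example-project.iam.gserviceaccount.com",
                "user:alice@example.com",
            ],
        },
        {
            "role": "roles/storage.objectViewer",
            "members": ["serviceAccount:reader@example-project.iam.gserviceaccount.com"],
        },
    ]
}
flagged = overprivileged_service_accounts(sample_policy)
print(flagged)
```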
Azure CLI MCP Server — Azure Resource Operations
Azure infrastructure teams can use the Azure CLI MCP server to bring the same live-state awareness to Azure resource groups, virtual machines, AKS clusters, and Azure services.
Azure workflows:
- List all Azure VMs in a resource group and check their current running state and SKU
- Review Network Security Group rules for a specific subnet before making changes
- Check AKS node pool configurations and current utilization metrics
- Generate az CLI commands for Azure Key Vault secret rotation workflows
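For the Network Security Group review, a useful first pass is to list inbound allow rules that are open to any source. The sketch below assumes rule dictionaries with the field names that `az network nsg rule list` emits as JSON (`direction`, `access`, `sourceAddressPrefix`, `priority`). Verify those names against your own output before relying on it. The rule data is made up.

```python
# Source prefixes treated as "open to the world" in this sketch.
OPEN_SOURCES = {"*", "Internet", "0.0.0.0/0"}


def open_inbound_rules(rules):
    """Return inbound allow rules with an unrestricted source, lowest priority number first."""
    flagged = [
        rule for rule in rules
        if rule.get("direction") == "Inbound"
        and rule.get("access") == "Allow"
        and rule.get("sourceAddressPrefix") in OPEN_SOURCES
    ]
    # In NSGs a lower priority number is evaluated first, so sort ascending.
    return sorted(flagged, key=lambda rule: rule["priority"])


# Made-up rules for illustration only.
sample_rules = [
    {"name": "allow-ssh-any", "priority": 300, "direction": "Inbound",
     "access": "Allow", "sourceAddressPrefix": "*", "destinationPortRange": "22"},
    {"name": "allow-https", "priority": 100, "direction": "Inbound",
     "access": "Allow", "sourceAddressPrefix": "Internet", "destinationPortRange": "443"},
    {"name": "allow-vnet", "priority": 200, "direction": "Inbound",
     "access": "Allow", "sourceAddressPrefix": "10.0.0.0/16", "destinationPortRange": "*"},
    {"name": "deny-all-out", "priority": 400, "direction": "Outbound",
     "access": "Deny", "sourceAddressPrefix": "*", "destinationPortRange": "*"},
]
open_rule_names = [rule["name"] for rule in open_inbound_rules(sample_rules)]
print(open_rule_names)
```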
Container Orchestration
Kubernetes MCP Server — Cluster State and Workload Management
The Kubernetes MCP server gives your AI direct kubectl-equivalent access to your cluster state — pods, deployments, services, configmaps, events, and logs. This is one of the highest-value MCP servers for cloud engineers: Kubernetes troubleshooting is heavily dependent on current state, and describing it manually is error-prone and slow.
Kubernetes operations workflows:
- Incident diagnosis: "Check the events for pods in the payment namespace over the last hour — I'm getting CrashLoopBackOff alerts"
- Resource optimization: "List all deployments and compare their requested CPU/memory vs actual usage from metrics-server — identify over-provisioned workloads"
- Deployment verification: "Verify that the rollout completed successfully and all pods are running the new image tag after the deployment"
- Config review: "Read the configmap for the auth service and identify any settings that differ between staging and production"
- Network policy audit: "List all NetworkPolicy objects and identify any pods that have no egress restrictions"
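For the incident diagnosis case, events are one signal and container status is another. As a complementary check, the sketch below scans a pod list for containers waiting in CrashLoopBackOff. It assumes input shaped like `kubectl get pods -o json` (an `items` list with `status.containerStatuses`). The pod names are made up, and `crashlooping_containers` is a hypothetical helper.

```python
def crashlooping_containers(pod_list):
    """Return (pod, container, restart_count) for containers in CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # "waiting" is only present while the container is not running.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append(
                    (pod["metadata"]["name"], status["name"], status.get("restartCount", 0))
                )
    return hits


# Made-up pod list for illustration only.
sample_pods = {
    "items": [
        {
            "metadata": {"name": "payment-api-7d9f", "namespace": "payment"},
            "status": {"containerStatuses": [
                {"name": "api", "restartCount": 12,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            ]},
        },
        {
            "metadata": {"name": "payment-worker-5c2a", "namespace": "payment"},
            "status": {"containerStatuses": [
                {"name": "worker", "restartCount": 0,
                 "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}}},
            ]},
        },
    ]
}
crashing = crashlooping_containers(sample_pods)
print(crashing)
```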
Runbook automation: Complex operational procedures that involve multiple kubectl commands can be described once and executed with AI-assisted command generation. "Walk me through checking cluster health before a maintenance window" becomes a structured checklist executed against real cluster data.
Docker MCP Server — Container Diagnostics
For containerized development and operations, the Docker MCP server provides access to local or remote Docker daemon state — running containers, images, volumes, and network configurations.
Docker workflows:
- List all running containers and identify any that have been restarting frequently
- Read container logs for a specific service to diagnose a startup failure
- Inspect Docker network configuration for multi-container application debugging
- Identify large images that are consuming excess disk space on build hosts
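The first workflow can be approximated from `docker inspect` output, which reports a `RestartCount` and a `Name` (with a leading slash) for each container. The sketch below is a minimal filter over that data. The threshold of 3 is an arbitrary illustrative default, and the container names are made up.

```python
def frequent_restarters(containers, threshold=3):
    """Return (name, restart_count) pairs at or above the threshold, worst first."""
    hits = [
        (container["Name"].lstrip("/"), container["RestartCount"])
        for container in containers
        if container.get("RestartCount", 0) >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Made-up inspect results for illustration only (real output has many more fields).
sample_containers = [
    {"Name": "/web", "RestartCount": 7},
    {"Name": "/db", "RestartCount": 0},
    {"Name": "/worker", "RestartCount": 3},
]
restarters = frequent_restarters(sample_containers)
print(restarters)
```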
Infrastructure as Code
Terraform MCP Server — IaC State and Plan Analysis
The Terraform MCP server brings your AI into the infrastructure-as-code workflow — reading Terraform state files, analyzing plan outputs, and understanding resource dependency graphs.
Terraform-assisted IaC workflows:
- Plan review: "Read the terraform plan output and identify any destructive changes that affect production resources — flag anything that requires manual review"
- State drift detection: "Compare the Terraform state for the database module against current AWS RDS resource state — are there drift indicators?"
- Module documentation: "Read the Terraform module files for our VPC module and generate documentation for each variable"
- Dependency analysis: "Map the resource dependencies in this Terraform configuration — which resources will be affected if I modify the security group?"
- Refactoring assistance: "Read our current Terraform state and suggest which resources should be moved into modules for better organization"
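The plan review workflow has a mechanical core that is easy to sketch. In the JSON that `terraform show -json` produces for a plan file, every entry under `resource_changes` carries a `change.actions` list, and any list containing `"delete"` means the resource is destroyed or replaced. The example below assumes that structure. The resource addresses are made up, and `destructive_changes` is a hypothetical helper.

```python
def destructive_changes(plan):
    """Return addresses of resources the plan would delete or replace."""
    flagged = []
    for resource in plan.get("resource_changes", []):
        actions = resource["change"]["actions"]
        # ["delete"] is a destroy; ["delete", "create"] or ["create", "delete"] is a replace.
        if "delete" in actions:
            flagged.append(resource["address"])
    return flagged


# Made-up plan excerpt for illustration only.
sample_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.old_logs", "change": {"actions": ["delete"]}},
        {"address": "aws_instance.app", "change": {"actions": ["no-op"]}},
    ]
}
needs_review = destructive_changes(sample_plan)
print(needs_review)
```

Deciding which of these affect production still needs context, such as workspace names or tags, which is where reading live state helps.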
Observability
Grafana MCP Server — Dashboard and Alert State
The Grafana MCP server gives your AI access to your Grafana dashboards, data sources, and alert configurations — enabling AI-assisted dashboard building and alert review using your actual metric schema.
Grafana operations:
- Read an existing dashboard JSON and generate a new panel using the same data source configuration and query patterns
- List all firing alerts across all dashboards during an incident investigation
- Review alert thresholds across all services and identify alerts that have never fired (potential dead alerts)
- Generate dashboard documentation from actual panel configurations
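The first operation, creating a new panel that reuses an existing panel's data source, might look like the sketch below. It assumes the common dashboard JSON layout (`panels` entries with `id`, `title`, `datasource`, `targets`, and `gridPos`) and a Prometheus-style `expr` field on each target. Other data sources use different query fields. The dashboard content is made up, and `clone_panel` is a hypothetical helper.

```python
import copy


def clone_panel(dashboard, source_title, new_title, new_expr):
    """Return a new panel dict modeled on an existing one; the dashboard is not modified."""
    panels = dashboard["panels"]
    source = next(panel for panel in panels if panel["title"] == source_title)
    panel = copy.deepcopy(source)
    panel["id"] = max(p["id"] for p in panels) + 1
    panel["title"] = new_title
    for target in panel["targets"]:
        target["expr"] = new_expr  # Prometheus-style query field (assumption)
    # Place the new panel below everything currently on the dashboard.
    panel["gridPos"]["y"] = max(p["gridPos"]["y"] + p["gridPos"]["h"] for p in panels)
    return panel


# Made-up dashboard for illustration only.
sample_dashboard = {
    "title": "Checkout service",
    "panels": [
        {"id": 1, "title": "Request rate", "type": "timeseries",
         "datasource": {"type": "prometheus", "uid": "prom-main"},
         "targets": [{"refId": "A", "expr": "sum(rate(http_requests_total[5m]))"}],
         "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
        {"id": 4, "title": "Error rate", "type": "timeseries",
         "datasource": {"type": "prometheus", "uid": "prom-main"},
         "targets": [{"refId": "A", "expr": "sum(rate(http_errors_total[5m]))"}],
         "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}},
    ],
}
new_panel = clone_panel(
    sample_dashboard, "Request rate", "Request rate by route",
    "sum by (route) (rate(http_requests_total[5m]))",
)
print(new_panel["id"], new_panel["gridPos"])
```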
OpenTelemetry MCP Server — Trace and Metric Context
The OpenTelemetry MCP server connects your AI to your observability data — traces, metrics, and logs — enabling AI-assisted performance analysis and incident diagnosis using real telemetry.
Observability-driven diagnosis:
- Fetch recent traces for a specific service and identify the slowest spans in the critical path
- Query P95 latency metrics for a service over the last 24 hours to contextualize a performance regression
- Find distributed traces that contain spans with an error status and identify their common characteristics
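Two of these steps, finding the slowest spans and computing a P95 latency, are simple once span data is available. The sketch below uses a hypothetical, simplified span record (`name` and `duration_ms` only) rather than the full OpenTelemetry data model. It uses the nearest-rank percentile definition, which is one of several in common use. The sample durations are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of values at or below it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slowest_spans(spans, count=3):
    """Return the `count` spans with the longest duration, slowest first."""
    return sorted(spans, key=lambda span: span["duration_ms"], reverse=True)[:count]


# Made-up span data for illustration only.
sample_spans = [
    {"name": "GET /checkout", "duration_ms": 120},
    {"name": "SELECT orders", "duration_ms": 80},
    {"name": "redis GET cart", "duration_ms": 95},
    {"name": "POST /payments", "duration_ms": 400},
    {"name": "render template", "duration_ms": 110},
    {"name": "auth check", "duration_ms": 105},
    {"name": "load config", "duration_ms": 90},
    {"name": "serialize body", "duration_ms": 100},
    {"name": "queue publish", "duration_ms": 85},
    {"name": "inventory lookup", "duration_ms": 130},
]
durations = [span["duration_ms"] for span in sample_spans]
p95 = percentile(durations, 95)
p50 = percentile(durations, 50)
top = [span["name"] for span in slowest_spans(sample_spans, count=2)]
print(p95, p50, top)
```

With only ten samples the P95 is simply the slowest span, a reminder that percentile figures need a reasonable sample size before they say much about a regression.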
Documentation and Research
Brave Search MCP Server — Current Cloud Documentation
Cloud provider documentation changes constantly — new service features, deprecated APIs, updated pricing models, and security advisories. The Brave Search MCP server ensures your AI generates advice based on current documentation rather than potentially outdated training data.
Cloud documentation research:
- "Look up the current AWS Lambda limits for provisioned concurrency — I'm designing an auto-scaling policy"
- "Find the latest GKE Autopilot documentation for custom node pools — the behavior changed in a recent release"
- "Search for recent AWS RDS changes to automated backup retention policies"
- "Find the current Kubernetes documentation for the PodDisruptionBudget API — I need the exact field names"
GitHub MCP Server — Infrastructure Code Examples
Infrastructure patterns, Terraform modules, Helm charts, and Kubernetes manifests are extensively documented in open-source repositories. The GitHub MCP server gives your AI access to production-quality examples to reference when generating infrastructure code.
IaC research workflows:
- Find production-quality Terraform modules for EKS cluster setup in the terraform-aws-modules GitHub organization
- Search for Helm chart examples that match your application type and deployment pattern
- Browse GitHub Actions workflow examples for your specific deployment target (ECS, Lambda, GKE)
- Find Kubernetes operator implementations that match the pattern you're building
Recommended Stack by Cloud Engineering Role
AWS cloud engineer: AWS + Kubernetes + Terraform + Grafana + Brave Search
GCP cloud engineer: GCloud + Kubernetes + Terraform + OpenTelemetry + Brave Search
Multi-cloud / platform engineer: AWS + GCloud + Azure CLI + Kubernetes + Terraform + Grafana
SRE / on-call engineer: Kubernetes + Grafana + OpenTelemetry + AWS + Brave Search
DevOps / IaC engineer: Terraform + AWS + Docker + Kubernetes + GitHub + Brave Search
Start with your primary cloud provider's MCP server and Kubernetes. Together they cover the live state access that makes AI assistance useful for infrastructure work. Add Terraform for IaC workflows and Grafana or OpenTelemetry for observability. Brave Search is a worthwhile addition to any of these stacks because it keeps your AI's recommendations tied to current documentation.
Browse the full cloud MCP servers catalog or see Best MCP Servers for DevOps Engineers for CI/CD pipeline integrations that complement this cloud infrastructure stack.