
2026-02-22 · 6 min read

Azure Costs Are Spiking and Nobody Knows Why. This Skill Finds the Waste.

Azure · Cost Optimization · GitHub Copilot · azure-copilot-skills

Azure bill went up 40%. Nobody knows why. Nobody wants to own it.

Orphaned disks from VMs someone deleted three months ago. NAT gateways that survived a migration nobody documented. NICs attached to literally nothing. Classic cloud rot. Every team points at every other team.

Here's the thing. Microsoft actually ships a tool that finds this stuff automatically. Most people have no idea it exists.

There's a Copilot Skill for This

Microsoft has 30 official skills for GitHub Copilot for Azure. One of them is azure-cost-optimization. Thousands of weekly installs. Not some random community thing.

It's literally a SKILL.md file that Copilot reads as context. A 9-step workflow:

  1. Check prereqs -- az CLI, costmanagement + resource-graph extensions, azqr
  2. Load Azure cost optimization best practices via MCP
  3. Run Azure Quick Review to find orphaned resources
  4. Query Azure Resource Graph (KQL) to map what you have
  5. Pull 30-day actual costs from the Cost Management REST API
  6. Validate pricing against official Azure pricing pages
  7. Grab 14-day utilization metrics from Azure Monitor
  8. Generate a prioritized markdown report -- P1, P2, P3
  9. Save everything as JSON for audit

The report labels everything: ACTUAL DATA, ACTUAL METRICS, VALIDATED PRICING, ESTIMATED SAVINGS. No made-up numbers. Every figure is sourced.
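If you rebuild this yourself, the labeling rule is easy to enforce in code: make every number carry its source. Here's a minimal sketch. Only the four label strings come from the skill. The `Source`, `Figure`, and `origin` names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Provenance labels, mirroring the ones the skill's report uses."""
    ACTUAL_DATA = "ACTUAL DATA"
    ACTUAL_METRICS = "ACTUAL METRICS"
    VALIDATED_PRICING = "VALIDATED PRICING"
    ESTIMATED_SAVINGS = "ESTIMATED SAVINGS"


@dataclass(frozen=True)
class Figure:
    """A number that can't be rendered without saying where it came from."""
    value: float
    source: Source
    origin: str  # hypothetical field: the API, query, or page the value came from

    def render(self) -> str:
        # Dollar amount with thousands separators, followed by its provenance tag.
        return f"${self.value:,.2f} [{self.source.value}: {self.origin}]"
```

A report built only from `Figure.render()` calls can't contain an unlabeled number.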

Using It with Copilot

In VS Code:

@azure optimize costs for my subscription

Done. It validates prereqs first:

az --version
az account show
az extension show --name costmanagement
az extension show --name resource-graph
azqr version
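You can script the same checks so a missing piece shows up as an error instead of a half-empty report. Here's a minimal Python sketch. It assumes `az extension show` exits non-zero when an extension isn't installed. `check_prereqs` and `run_quietly` are hypothetical names, and the runner is injectable so you can test it without Azure:

```python
import shutil
import subprocess
from typing import Callable, List, Optional

REQUIRED_EXTENSIONS = ["costmanagement", "resource-graph"]


def run_quietly(args: List[str]) -> int:
    """Run a command, discard its output, and return the exit code."""
    return subprocess.run(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL).returncode


def check_prereqs(
    runner: Callable[[List[str]], int] = run_quietly,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means the prereqs look fine."""
    az = which("az")
    if az is None:
        return ["az CLI not found on PATH"]
    problems = []
    if runner([az, "account", "show"]) != 0:
        problems.append("not logged in: run az login")
    for ext in REQUIRED_EXTENSIONS:
        # Assumed: az extension show exits non-zero when the extension is missing.
        if runner([az, "extension", "show", "--name", ext]) != 0:
            problems.append(f"missing extension: run az extension add --name {ext}")
    if which("azqr") is None:
        problems.append("azqr not found on PATH")
    return problems
```

Run `print(check_prereqs())` before you start. An empty list means you're good.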

The failure modes are annoying though:

  • Missing extensions -- sometimes fails silently. Just run az extension add --name costmanagement and az extension add --name resource-graph beforehand.
  • Wrong RBAC -- you need Cost Management Reader, Monitoring Reader, and Reader. Missing the first one? 403 on cost queries. Your report will have no dollar amounts and won't tell you why.
  • Classic (ASM) resources -- invisible. Resource Graph only sees ARM. If you have pre-2014 stuff, it's not showing up.

Basically: if permissions are wrong, it degrades quietly instead of telling you something's broken. Not great.
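The quiet-failure problem is easier to catch before you run anything. Here's a minimal sketch that checks your role assignments against the three roles listed above, using `az role assignment list` with `--include-inherited` and `--include-groups`. Two simplifications are mine: Owner and Contributor count as sufficient, and custom roles are ignored. The helper names are hypothetical:

```python
import json
import subprocess
from typing import Callable, Iterable, List, Set

# The three roles the post says the workflow needs.
REQUIRED_ROLES = {"Cost Management Reader", "Monitoring Reader", "Reader"}
# Broader built-in roles that include read access; this sketch treats them as sufficient.
BROAD_ROLES = {"Owner", "Contributor"}


def assigned_roles(
    assignee: str,
    scope: str,
    run: Callable[[List[str]], bytes] = subprocess.check_output,
) -> List[str]:
    """List role names assigned to `assignee` at `scope`, including inherited and group assignments."""
    out = run([
        "az", "role", "assignment", "list",
        "--assignee", assignee,
        "--scope", scope,
        "--include-inherited",
        "--include-groups",
        "--query", "[].roleDefinitionName",
        "-o", "json",
    ])
    return json.loads(out)


def missing_roles(assigned: Iterable[str]) -> Set[str]:
    """Return the required roles not covered by the assigned ones."""
    names = set(assigned)
    if names & BROAD_ROLES:
        return set()
    return REQUIRED_ROLES - names
```

Usage: `missing_roles(assigned_roles("<your UPN or object ID>", "/subscriptions/<SUB_ID>"))`. If "Cost Management Reader" comes back, expect the 403s.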

Do It Without Copilot

This is the part that matters. Same logic, raw CLI. No Copilot license needed.

Find Orphaned Resources

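KQL queries through Azure Resource Graph. These find things sitting around doing absolutely nothing. Unattached disks and Standard public IPs bill you the whole time. Orphaned NICs cost nothing on their own, but they're clutter that hides what's real: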

# Unattached managed disks
az graph query -q "Resources | where type =~ 'microsoft.compute/disks' | where isempty(managedBy) | project name, resourceGroup, location, diskSizeGb=properties.diskSizeGB, sku=sku.name"

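# Orphaned NICs (skips NICs owned by private endpoints or Private Link services)
az graph query -q "Resources | where type =~ 'microsoft.network/networkinterfaces' | where isempty(properties.virtualMachine) and isempty(properties.privateEndpoint) and isempty(properties.privateLinkService) | project name, resourceGroup, location"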

# Unattached public IPs (skips IPs attached to a NAT gateway)
az graph query -q "Resources | where type =~ 'microsoft.network/publicipaddresses' | where isempty(properties.ipConfiguration) and isempty(properties.natGateway) | project name, resourceGroup, location, sku=sku.name"

# Idle load balancers (no backend pools)
az graph query -q "Resources | where type =~ 'microsoft.network/loadbalancers' | where array_length(properties.backendAddressPools) == 0 | project name, resourceGroup, location"

# Tag coverage audit
az graph query -q "Resources | extend hasCostCenter = isnotnull(tags['CostCenter']) | summarize total=count(), tagged=countif(hasCostCenter) by type | extend coverage=round(100.0 * tagged / total, 1) | order by total desc"

That last one -- tag coverage -- run it first. If your CostCenter tags are below 80% coverage, cost allocation is basically guesswork. Fix tagging before you fix anything else.
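That 80% rule is easy to apply to the query's output. Here's a minimal sketch. It assumes rows shaped like the query's projection (`type`, `total`, `tagged`, `coverage`). `tag_coverage_report` is a hypothetical helper:

```python
from typing import Dict, List, Tuple


def tag_coverage_report(rows: List[Dict], threshold: float = 80.0) -> Tuple[float, List[str]]:
    """Return overall CostCenter coverage (%) and the resource types below the threshold."""
    total = sum(row["total"] for row in rows)
    tagged = sum(row["tagged"] for row in rows)
    overall = round(100.0 * tagged / total, 1) if total else 0.0
    below = [row["type"] for row in rows if row["coverage"] < threshold]
    return overall, below
```

If `overall` comes back under 80, the list of types tells you where to start tagging.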

Query 30-Day Costs

The skill uses az rest hitting the Cost Management REST API directly. Not az costmanagement query. The REST API is more reliable. The skill's own docs say this.

Make a cost-query.json:

{
  "type": "ActualCost",
  "timeframe": "Custom",
  "timePeriod": {
    "from": "2026-01-22T00:00:00Z",
    "to": "2026-02-22T00:00:00Z"
  },
  "dataset": {
    "granularity": "None",
    "aggregation": {
      "totalCost": {
        "name": "Cost",
        "function": "Sum"
      }
    },
    "grouping": [
      {
        "type": "Dimension",
        "name": "ResourceId"
      }
    ]
  }
}
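Hardcoded dates go stale the day after you write them. Here's a minimal sketch that writes the same request body for a rolling window ending at midnight UTC today. The body mirrors the JSON above. `build_cost_query` and `write_cost_query` are hypothetical helpers:

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def build_cost_query(days: int = 30, end: Optional[datetime] = None) -> Dict:
    """ActualCost query body, grouped by ResourceId, for the `days` days before `end`."""
    if end is None:
        # Default: midnight UTC at the start of today.
        end = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "type": "ActualCost",
        "timeframe": "Custom",
        "timePeriod": {"from": start.strftime(fmt), "to": end.strftime(fmt)},
        "dataset": {
            "granularity": "None",
            "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
            "grouping": [{"type": "Dimension", "name": "ResourceId"}],
        },
    }


def write_cost_query(path: str = "cost-query.json", days: int = 30) -> None:
    """Write the body to disk so az rest can post it with --body @path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_cost_query(days), f, indent=2)
```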

Run it:

az rest --method post \
  --url "https://management.azure.com/subscriptions/<SUB_ID>/providers/Microsoft.CostManagement/query?api-version=2023-11-01" \
  --body @cost-query.json

Swap <SUB_ID> with your subscription ID. az account show --query id -o tsv prints just the ID.
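The response doesn't use named fields. It comes back as a `columns` array plus positional `rows`. Here's a minimal sketch that zips them into dicts and sorts by cost. It assumes the cost column is named after the aggregation's `name` (`Cost` in the body above), so check your own response's `columns` if yours differs. It also refuses to return an empty result, which is the quiet failure the RBAC section warns about. `cost_rows` is a hypothetical helper:

```python
from typing import Dict, List


def cost_rows(response: Dict, cost_column: str = "Cost") -> List[Dict]:
    """Turn the query response's columns/rows arrays into dicts, most expensive first."""
    props = response["properties"]
    names = [col["name"] for col in props["columns"]]
    rows = [dict(zip(names, row)) for row in props["rows"]]
    if not rows:
        # Fail loudly rather than quietly producing a report with no dollar amounts.
        raise ValueError("no cost rows returned: check Cost Management Reader and the time window")
    # Only this page is read; a non-empty properties.nextLink means more rows exist.
    return sorted(rows, key=lambda row: row.get(cost_column) or 0, reverse=True)
```

Redirect the az rest output to a file and pass the loaded JSON to `cost_rows`.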

Check Utilization

14 days of CPU data. Find the VMs that aren't doing anything:

az monitor metrics list \
  --resource "<RESOURCE_ID>" \
  --metric "Percentage CPU" \
  --interval PT1H \
  --aggregation Average \
  --start-time 2026-02-08T00:00:00Z \
  --end-time 2026-02-22T00:00:00Z

Under 10% average CPU over two weeks? Rightsizing candidate. Under 2%? Probably doesn't need to exist.
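Applying those thresholds to the raw JSON takes a few lines. `az monitor metrics list` returns a `value` array. Each metric holds `timeseries` entries, and each `data` point carries an `average` per interval. Here's a minimal sketch over that shape. The function names are hypothetical. Averaging the hourly averages treats every hour equally, which is fine for a two-week screen:

```python
from statistics import mean
from typing import Dict, Optional


def average_cpu(metrics: Dict) -> Optional[float]:
    """Average of the per-interval averages in an az monitor metrics list response."""
    points = [
        point["average"]
        for metric in metrics.get("value", [])
        for series in metric.get("timeseries", [])
        for point in series.get("data", [])
        if point.get("average") is not None  # hours with no data are skipped
    ]
    return mean(points) if points else None


def classify_cpu(avg: Optional[float]) -> str:
    """Apply the post's thresholds: under 2% likely unneeded, under 10% oversized."""
    if avg is None:
        return "no data"
    if avg < 2.0:
        return "probably doesn't need to exist"
    if avg < 10.0:
        return "rightsizing candidate"
    return "keep"
```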

Advisor Recommendations

Azure Advisor already knows about cost issues. Most people only check it in the portal. You can just query it:

az graph query -q "AdvisorResources | where properties.category == 'Cost' | project name, impact=properties.impact, description=properties.shortDescription.solution"
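Advisor's impact levels (High, Medium, Low) can feed the same P1/P2/P3 buckets the report uses. Here's a minimal sketch over the query's projected columns. The High-to-P1, Medium-to-P2, Low-to-P3 mapping is this sketch's assumption, not something Advisor or the skill defines:

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from Advisor impact to report priority.
IMPACT_TO_PRIORITY = {"High": "P1", "Medium": "P2", "Low": "P3"}


def prioritize(recommendations: List[Dict]) -> Dict[str, List[str]]:
    """Group Advisor rows (name, impact, description) into P1/P2/P3 buckets."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for rec in recommendations:
        # An unrecognized or missing impact falls through to P3.
        priority = IMPACT_TO_PRIORITY.get(rec.get("impact"), "P3")
        buckets[priority].append(rec["description"])
    return dict(buckets)
```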

Or Just Install the Skill Directly

You don't need GitHub Copilot for Azure to use this skill. Install it into any agent that supports skills:

npx skills add https://github.com/microsoft/github-copilot-for-azure --skill azure-cost-optimization

Works with Claude Code, Cursor, Codex, and 40+ other agents.

Glue It Together

Run the orphaned resource queries, the cost query, utilization, and the Advisor pull. Save each output as JSON, then feed it all to an LLM with this prompt:

I have Azure cost and resource data from four sources:
1. Orphaned resources (Resource Graph KQL)
2. 30-day actual costs (Cost Management API)
3. 14-day CPU utilization (Azure Monitor)
4. Advisor cost recommendations

Analyze all of it. Give me a prioritized optimization report
(P1/P2/P3) with estimated monthly savings per item.
Flag anything that looks like waste with no active workload.

Here's what the output typically looks like (anonymized):

## P1 -- Immediate Action

| Resource | Type | Monthly Cost | Issue |
|----------|------|-------------|-------|
| disk-backup-old-017 | Premium SSD 512GB | $73.22 | Unattached since Jan 4 |
| disk-staging-temp-003 | Premium SSD 256GB | $36.61 | Unattached since Dec 19 |
| pip-legacy-gateway | Static Public IP | $3.65 | No associated NIC |

**30-day cost for orphaned resources: $187.40/mo**

## P2 -- Rightsizing

vm-api-prod-02 (Standard_D4s_v5): avg CPU 3.1% over 14 days.
Downsize to Standard_B2ms. Est. savings: $112/mo.

## Advisor Flag

"Right-size or shutdown underutilized virtual machines"
Impact: High | Affected: 3 resources
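The P2 number is plain arithmetic: (current hourly rate - target hourly rate) × hours per month. Here's a minimal sketch. It assumes you look up both pay-as-you-go rates yourself on the official pricing pages and use the 730-hour month that Azure pricing uses. Reservations, savings plans, and Windows licensing all move the real figure:

```python
HOURS_PER_MONTH = 730  # the monthly-hours convention Azure pricing uses


def monthly_savings(current_hourly: float, target_hourly: float, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated monthly savings from moving a VM from one hourly rate to another."""
    return round((current_hourly - target_hourly) * hours, 2)
```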

That gets you maybe 80% of what the Copilot skill produces. For free.
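You can script the glue step too. Here's a minimal sketch that reads the four saved outputs and appends them to the prompt. The file names and `build_prompt` are hypothetical. A missing or malformed file raises an error instead of being silently skipped:

```python
import json
from pathlib import Path
from typing import Dict

PROMPT = """I have Azure cost and resource data from four sources:
1. Orphaned resources (Resource Graph KQL)
2. 30-day actual costs (Cost Management API)
3. 14-day CPU utilization (Azure Monitor)
4. Advisor cost recommendations

Analyze all of it. Give me a prioritized optimization report
(P1/P2/P3) with estimated monthly savings per item.
Flag anything that looks like waste with no active workload."""

# Hypothetical file names: use whatever you saved each output as.
SOURCES: Dict[str, str] = {
    "1. Orphaned resources": "orphans.json",
    "2. 30-day actual costs": "costs.json",
    "3. 14-day CPU utilization": "cpu.json",
    "4. Advisor cost recommendations": "advisor.json",
}


def build_prompt(folder: Path, sources: Dict[str, str] = SOURCES) -> str:
    """Append each saved JSON file, under its label, to the prompt text."""
    parts = [PROMPT]
    for label, filename in sources.items():
        # read_text raises on a missing file; json.loads raises on a malformed one.
        data = json.loads((Path(folder) / filename).read_text(encoding="utf-8"))
        parts.append(f"### {label}\n{json.dumps(data, indent=2)}")
    return "\n\n".join(parts)
```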

Stuff to Watch Out For

  • Use az rest, not az costmanagement query. The skill itself does this. More reliable, more control over the request body.
  • Classic (ASM) = invisible. Resource Graph is ARM only. Pre-2014 deployments won't show up anywhere in these queries.
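  • Free tier trap. Container Apps (Consumption plan) includes 180K vCPU-seconds per subscription per month for free. App Service has the F1 tier. A resource that looks idle but shows $0 in Cost Management may just be inside a free allowance, so deleting it saves nothing. Check free tier allowances before you flag something.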
  • Cost Management API is free to query, but it's throttled: send too many requests and you'll get 429s. Azure Monitor has rate limits too. Neither should bite on a one-time audit.
  • azqr doesn't catch everything. No custom tagging policies, no spot instance candidates, no reserved instance recommendations. For RI stuff, use Advisor.
  • Don't delete anything without approval. Save audit data as JSON. Test in non-prod first. A bad cleanup script takes down production way faster than any cost spike ever will.
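One way to follow that rule: turn the orphan query results into delete commands as text, save them next to the raw data, and have someone review them before anything runs. Here's a minimal dry-run sketch. It assumes rows with the `name` and `resourceGroup` columns the orphan queries project. The kind labels and helper names are hypothetical, and the sketch adds no confirmation-skipping flags:

```python
import json
import shlex
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List

# Delete commands per orphan kind; the kind labels are hypothetical.
DELETE_COMMANDS: Dict[str, List[str]] = {
    "disk": ["az", "disk", "delete"],
    "nic": ["az", "network", "nic", "delete"],
    "public-ip": ["az", "network", "public-ip", "delete"],
}


def deletion_plan(kind: str, rows: List[Dict]) -> List[str]:
    """Build delete commands as text for review; nothing is executed."""
    base = DELETE_COMMANDS[kind]
    return [
        shlex.join(base + ["--name", row["name"], "--resource-group", row["resourceGroup"]])
        for row in rows
    ]


def save_audit(kind: str, rows: List[Dict], folder: str) -> Path:
    """Write the rows and the planned commands to a timestamped JSON file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(folder) / f"audit-{kind}-{stamp}.json"
    record = {
        "kind": kind,
        "captured_utc": stamp,
        "resources": rows,
        "planned_commands": deletion_plan(kind, rows),
    }
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```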

More Coming

I'm writing more of these. Each one breaks down an official Microsoft Copilot skill -- what it does, how it works, how to do the same thing yourself. Look for the azure-copilot-skills tag.

