Cloud Native carbon footprint - attributing emissions to meaningful business metrics

Attributing the carbon footprint of IT infrastructure end-to-end, from server hardware up through the services and transactions it powers, produces a single set of numbers that work both ways. Rolled up, they read as carbon per transaction, per customer, or per product, so the business can track whether things are improving over time. Drilled into, the same numbers point back at specific services, hosts and components, giving engineers the chain to follow and the levers that actually move the dial.

Our attribution approach, which we call Green Native, gives that breakdown a clear structure while staying flexible enough to fit different setups. It reads from metrics that cloud-native tools and provider APIs already collect, rather than standing up a new measurement layer. Integration stays cheap, and the footprint of the measurement itself stays negligible.

Physical truth

Three things determine the carbon footprint of any piece of data-centre hardware: the energy it draws, the carbon intensity of the grid powering it, and the embodied carbon baked into the hardware during manufacture. Servers, storage and network switches all draw electricity continuously, so the first two combine to give the operational footprint.

Grid carbon intensity is the amount of CO₂ emitted per kilowatt-hour of electricity consumed (gCO₂/kWh). It varies by region depending on the local energy mix: fossil-heavy grids sit far above renewable-heavy ones. The chart below compares Austria, Germany and Switzerland to make that range visible.

Intensity also moves through the day. When demand outpaces renewable supply, grids lean harder on fossil plants and intensity climbs; when renewables cover the load, it drops. Workload timing therefore matters: the same compute run at noon and at 3 a.m. can have meaningfully different footprints.
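To make the timing effect concrete, here is a minimal sketch that finds the cleanest contiguous window in a 24-hour intensity series. The hourly values are made up for illustration and are not real grid data; `cleanest_window` is a hypothetical helper name, not part of any tool mentioned here.

```python
from typing import Sequence


def cleanest_window(intensity: Sequence[float], hours: int) -> tuple[int, float]:
    """Return (start_hour, mean gCO2/kWh) of the cleanest contiguous window."""
    if not 0 < hours <= len(intensity):
        raise ValueError("window must fit inside the series")
    best_start, best_mean = 0, float("inf")
    for start in range(len(intensity) - hours + 1):
        mean = sum(intensity[start:start + hours]) / hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


# Illustrative values only (gCO2/kWh for each hour of the day), not measured data.
hourly = [320, 310, 300, 295, 300, 330, 380, 420, 400, 350, 290, 240,
          210, 200, 220, 270, 340, 410, 450, 440, 410, 380, 350, 330]

start, mean = cleanest_window(hourly, hours=3)  # start == 12, mean == 210.0
```

With this made-up series, a three-hour batch job started at 12:00 would run at an average of 210 gCO₂/kWh, against more than 400 in the evening peak.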

[Interactive Box 1 · Grid carbon intensity over 24 hours, in gCO₂ per kWh (pure grid signal). Select a grid region; the chart shows the selected region, the other regions as ghost lines, and the cleanest hour. Click to probe a 1-hour window, or drag to select a range. Chain position: L1 · grid.]

Pick a region and an hour in the chart above, then a hardware profile below. The region and hour fix the grid intensity; the profile fixes how much power the workload draws; PUE (Power Usage Effectiveness) covers the data centre's overhead such as cooling, lighting, and everything else that runs alongside the IT load. Multiplying the three gives emissions for the selected slice of time. Clicking a single hour shows the per-hour result; dragging a range sums it.
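The multiplication described above can be sketched as follows. The function names and the 500 W draw are assumptions chosen for illustration; only the structure (power × PUE × grid intensity, summed per hour over a range) comes from the text.

```python
from typing import Iterable


def operational_g(power_w: float, intensity_g_per_kwh: float,
                  pue: float, hours: float = 1.0) -> float:
    """Operational emissions in gCO2 for a constant draw over `hours`."""
    it_energy_kwh = power_w / 1000.0 * hours
    facility_energy_kwh = it_energy_kwh * pue  # adds cooling, lighting, etc.
    return facility_energy_kwh * intensity_g_per_kwh


def range_total_g(power_w: float, hourly_intensity: Iterable[float],
                  pue: float) -> float:
    """Sum per-hour emissions over a selected range of hourly intensities."""
    return sum(operational_g(power_w, i, pue) for i in hourly_intensity)


# Hypothetical inputs: a 500 W profile in an enterprise data centre (PUE 1.40).
one_hour = operational_g(power_w=500, intensity_g_per_kwh=300, pue=1.40)
three_hours = range_total_g(500, [300, 250, 200], pue=1.40)
```

Here `one_hour` comes out at about 210 g and `three_hours` at about 525 g, which mirrors the difference between clicking a single hour and dragging a range.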

[Interactive Box 2 · Hardware draw: hardware × grid × PUE for the window selected in Box 1. Controls: hardware profile; include embodied carbon, with amortisation from 1 to 6 years (4 years shown); data-centre PUE (1.05 hyperscale, 1.40 enterprise, 2.20 legacy; 1.40 shown). Output: g CO₂eq per hour for a clicked 1-hour window, or total g CO₂eq for a dragged range, showing what this hardware profile would emit over that slice. Chain position: L2 · hardware.]

Embodied carbon is the manufacturing footprint of the hardware itself, spread over its usable lifetime. It is the one part of the footprint that grows in relative terms as grids get cleaner, so how long hardware lives and how well it is utilised become as important as the energy it draws.
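A small sketch shows why the embodied share grows as grids get cleaner. It assumes straight-line amortisation over the hardware lifetime; the 1,000 kg embodied figure and the operational values are hypothetical placeholders, not vendor data.

```python
HOURS_PER_YEAR = 8760


def embodied_g_per_hour(embodied_kg: float, lifetime_years: float) -> float:
    """Spread the manufacturing footprint evenly over the usable lifetime."""
    return embodied_kg * 1000.0 / (lifetime_years * HOURS_PER_YEAR)


def embodied_share(embodied_g_h: float, operational_g_h: float) -> float:
    """Fraction of the hourly footprint that is embodied rather than operational."""
    return embodied_g_h / (embodied_g_h + operational_g_h)


# Hypothetical server: 1,000 kg CO2eq embodied, amortised over 4 years.
per_hour = embodied_g_per_hour(1000, 4)          # about 28.5 g per hour
share_dirty_grid = embodied_share(per_hour, 210)  # operational 210 g/h
share_clean_grid = embodied_share(per_hour, 30)   # operational 30 g/h
```

On the fossil-heavy example the embodied part is roughly 12% of the hourly total; on the cleaner one it is close to half. Stretching the lifetime from 4 to 6 years cuts the hourly embodied figure by a third.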

PUE is the ratio of the data centre's total energy consumption to the energy used by IT equipment alone. A PUE of 1.0 means every watt drawn from the grid reaches a server; anything higher captures the share spent on cooling, lighting and other overhead.

Components of the server

A server is not a single energy sink. Almost all of its draw comes from four components: CPU, RAM, storage, and GPU. Each one consumes power differently as load changes, so attributing carbon back to workloads requires modelling them separately.

Networking is a special case. It usually runs on dedicated hardware (switches, NICs, load balancers) that isn't bundled with the server's power envelope. It still emits carbon, especially for workloads with heavy data movement, but it sits outside this article's scope.

[Interactive Box 3 · Server: segment split of the selected window into the attribution pool, for the chosen hardware profile. Sample state (AI Inference in DE, cleanest 1-hour slot): CPU 1 g (7%), RAM 4 g (43%), Storage 0 g (0%), GPU 4 g (49%); probe total 9 g CO₂eq; attribution pool 8 g CO₂eq after 5% host overhead, which splits across L4 ▸ L5 ▸ L6 ▸ L7; yearly footprint of one server 1253 kg CO₂eq, roughly 9641 km driven. Chain position: L3 · server.]

Change hardware, grid region, or PUE in the boxes above; the segment widget, the 24 h chart, and every box below recompute instantly.

The relationship between load and power differs by component. CPU draw scales roughly linearly with utilisation. RAM draws a near-constant load per gigabyte, whether it is in use or not. Storage depends on the media type: SSDs sit at a flat operational level, HDDs cycle between active and idle. GPUs have the widest dynamic range: an idle accelerator costs little, while a fully loaded one can dominate the entire host's footprint.
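These per-component behaviours can be sketched as a simple power model. Every coefficient below is a hypothetical placeholder rather than a measured value, and storage is modelled as SSD only; the point is the shape of each curve, not the numbers.

```python
from dataclasses import dataclass


@dataclass
class HostLoad:
    cpu_util: float  # 0.0 to 1.0
    ram_gb: float    # installed memory
    ssd_count: int   # number of SSDs
    gpu_util: float  # 0.0 to 1.0


# Hypothetical coefficients, for illustration only.
CPU_IDLE_W, CPU_MAX_W = 50.0, 250.0
RAM_W_PER_GB = 0.4
SSD_W = 5.0
GPU_IDLE_W, GPU_MAX_W = 30.0, 400.0


def component_power_w(load: HostLoad) -> dict[str, float]:
    """Estimate the draw of each component in watts."""
    return {
        # Roughly linear between idle and full load.
        "cpu": CPU_IDLE_W + (CPU_MAX_W - CPU_IDLE_W) * load.cpu_util,
        # Near-constant per gigabyte, independent of utilisation.
        "ram": RAM_W_PER_GB * load.ram_gb,
        # Flat operational level per SSD.
        "storage": SSD_W * load.ssd_count,
        # Widest dynamic range of the four.
        "gpu": GPU_IDLE_W + (GPU_MAX_W - GPU_IDLE_W) * load.gpu_util,
    }


idle = component_power_w(HostLoad(cpu_util=0.0, ram_gb=256, ssd_count=2, gpu_util=0.0))
busy = component_power_w(HostLoad(cpu_util=1.0, ram_gb=256, ssd_count=2, gpu_util=1.0))
```

Comparing `idle` and `busy` shows RAM and storage unchanged, while the GPU moves from a minor contributor to the largest one.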

[Interactive Box 4 · 24-hour timeline: full day of CPU, RAM, storage and GPU, with selectable CPU, storage I/O and GPU load shapes; click a chip to isolate one component. Sample state, all components over 24 h: 3.43 kg CO₂eq, split CPU 9%, RAM 57%, Storage 1%, GPU 34%. Chain position: L4 · component.]

RAM dominates on a high-memory node; GPU absorbs almost everything on AI Training. The mini-timeline shows where in the 24 h window each component is doing work.

Component-level energy is measurable today, though not always consistently. CPU draw is exposed by RAPL (Running Average Power Limit); RAM draw may be reported through the same interface. GPU draw is read via vendor-specific tooling (NVIDIA's NVML, AMD's ROCm SMI, etc.). Storage is the hardest: modern drives report health and performance over SMART, but rarely real-time power, so it is often estimated from access patterns or NVMe vendor logs.
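RAPL exposes cumulative energy counters rather than instantaneous power, so a reading is always the difference between two samples. The sketch below shows that arithmetic, including the counter wrap-around. On Linux the two inputs would typically come from the powercap sysfs files `energy_uj` and `max_energy_range_uj`; the function itself is a hypothetical helper, takes plain numbers, and assumes the counter wrapped at most once between samples.

```python
def average_power_w(e_start_uj: int, e_end_uj: int,
                    seconds: float, max_range_uj: int) -> float:
    """Average power in watts from two cumulative energy readings (microjoules)."""
    if seconds <= 0:
        raise ValueError("sampling interval must be positive")
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        # The counter wrapped around between the two samples.
        delta_uj += max_range_uj
    return delta_uj / 1_000_000 / seconds


# Hypothetical readings taken one second apart.
normal = average_power_w(1_000_000, 61_000_000, 1.0, 2**32)
wrapped = average_power_w(2**32 - 10_000_000, 20_000_000, 1.0, 2**32)
```

The first pair yields 60 W; the second pair straddles a wrap and yields 30 W instead of a large negative number.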

Slicing the server with virtualisation

Start with a single server. Its footprint breaks cleanly into CPU, RAM, storage and GPU contributions. The next step is to attribute that footprint back to the workloads sharing the box.

In the cloud, the default unit of sharing is the VM. Dedicated physical hosts exist, but cost more. Virtualisation places multiple workloads on the same physical host, so the server's footprint has to be distributed across VMs according to how much of the host each one consumes. Providers' VM-tier billing already follows the same pattern.

For each VM, the orchestrator already exposes how much CPU, RAM, storage and GPU it has been allocated, and how much it is actually using. Those numbers drive the split, which surfaces the VMs that contribute the most and where optimisation effort pays back.

[Interactive Box 5 · Virtualisation split: one server → N VMs, weighted by resource × utilisation. Controls: number of VMs sharing this host (2 shown); hypervisor / platform overhead (5% shown). Sample state: total 8.53 g CO₂eq; vm-a 5.22 g (61%), vm-b 3.31 g (39%), each broken down into storage, RAM, CPU, GPU and overhead. Chain position: L5 · VM.]

As more VMs crowd onto the host, each VM's slice of the server carbon shrinks proportionally. Same hardware, different tenancy.

The overhead is skimmed off the non-GPU carbon (CPU + RAM + storage) before VMs get their slice. It models the hypervisor and platform tax that runs alongside guest workloads.

Splitting a VM's footprint by component shows where the carbon actually sits, and where it doesn't. If GPU dominates, CPU-side optimisation barely moves the number. The same view also makes the impact of an allocation change visible: bumping RAM bumps the RAM share, which may or may not matter depending on the workload's overall mix.

Virtualisation has its own overhead: the hypervisor, the management plane, the platform glue. It should be small; if it isn't, that's already a useful signal. The model attributes it in equal shares to each VM on the host, which is a pragmatic choice given the overhead doesn't track any particular tenant.

The weighting is deliberately light. Each VM is scored by what it has been allocated (vCPU·s, RAM, storage, GPU), with CPU and GPU additionally adjusted for actual utilisation, the two components where usage really moves the number. The inputs come straight from metrics that orchestrators and cloud providers already expose, so the measurement itself is essentially free. In side-by-side checks against heavier tools that try to read real-time per-VM energy, the results land in the same ballpark, which is as good as it gets today, given that fine-grained per-VM energy readings are still not widely available.

Weighting by usage also resolves the noisy-neighbour case for free. Toggling a VM to noisy in Box 5 grows its share of the host's carbon without touching its allocation. The formula simply reads a higher utilisation, and the VM that is actually doing the work ends up with the bigger slice.
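The weighting described in this section can be sketched as follows. This is one possible reading of the text, not the Green Native implementation: the dictionary layout, the helper names and all sample numbers are assumptions. Overhead is taken as a fixed fraction of the non-GPU carbon and shared equally; RAM and storage are weighted by allocation; CPU and GPU are weighted by allocation times utilisation.

```python
NON_GPU = ("cpu", "ram", "storage")


def weight(vm: dict, component: str) -> float:
    """Allocation-based weight; CPU and GPU are scaled by utilisation."""
    if component == "cpu":
        return vm["vcpu"] * vm["cpu_util"]
    if component == "gpu":
        return vm["gpus"] * vm["gpu_util"]
    return vm[f"{component}_gb"]  # ram_gb / storage_gb: allocation only


def split_host(host_g: dict, vms: dict, overhead: float = 0.05) -> dict:
    """Split per-component host carbon (grams) across the VMs on the host."""
    overhead_g = overhead * sum(host_g[c] for c in NON_GPU)
    result = {name: overhead_g / len(vms) for name in vms}  # equal shares
    for comp, grams in host_g.items():
        pool = grams * (1 - overhead) if comp in NON_GPU else grams
        weights = {name: weight(vm, comp) for name, vm in vms.items()}
        total = sum(weights.values())
        if total == 0:  # nobody uses the component: fall back to an equal split
            weights, total = {name: 1.0 for name in vms}, float(len(vms))
        for name, w in weights.items():
            result[name] += pool * w / total
    return result


# Hypothetical host carbon per component (grams) and two hypothetical VMs.
host = {"cpu": 1.0, "ram": 4.0, "storage": 0.0, "gpu": 4.0}
vms = {
    "vm-a": {"vcpu": 8, "cpu_util": 0.7, "ram_gb": 64, "storage_gb": 500,
             "gpus": 1, "gpu_util": 0.8},
    "vm-b": {"vcpu": 8, "cpu_util": 0.3, "ram_gb": 64, "storage_gb": 500,
             "gpus": 0, "gpu_util": 0.0},
}
calm = split_host(host, vms)

# Noisy neighbour: same allocation for vm-b, higher CPU utilisation.
noisy_vms = {**vms, "vm-b": {**vms["vm-b"], "cpu_util": 0.9}}
noisy = split_host(host, noisy_vms)
```

In both runs the shares add up to the host's 9 g. In the noisy run vm-b's share grows although its allocation is unchanged, which is the behaviour described above.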

Slicing the VM even further with containers

Most modern deployments don't stop at the VM. Kubernetes (or another orchestrator) layers on top: a workload is whatever runs as a set of containers (an application, a database, a monitoring daemon). A single node hosts many workloads at once, and a single workload often spreads across several nodes. The same split has to happen one level deeper, attributing each node's carbon to the workloads running on it in proportion to their share.

[Interactive Box 6 · Workload attribution. Per-node pods, pool to split 5.22 g: k8s-overhead #1 0.15 g (util 30%), app-1 #1 3.12 g (util 70%), db #1 1.95 g (util 85%). Workloads across all nodes sum to 8.53 g CO₂eq, each broken down into storage, RAM, CPU and GPU. Sample workload app-1: 2 pods on node-a and node-b; CPU request 1 / 2.00 vCPU; memory request 4 / 54 GB; persistent volume 50 GB / 4 TB; GPU request 1 / 1; replicas 2 / 2. Chain position: L6 · workload.]

The inputs are the same shape as one layer up: CPU time (reserved and used) per pod, RAM allocation, persistent-volume usage, GPU time. All standard Kubernetes metrics. The node's carbon splits across its pods by weighted contribution, then those slices sum back up per workload across every node it runs on.
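The two steps, splitting each node across its pods and then summing per workload, can be sketched like this. The node totals reuse the 5.22 g and 3.31 g sample values from Box 5; the pod weights are hypothetical stand-ins for the combined CPU, RAM, volume and GPU score, and the data layout is an assumption.

```python
from collections import defaultdict


def split_node(node_g: float, pod_weights: dict) -> dict:
    """Split a node's carbon across its pods in proportion to their weights."""
    total = sum(pod_weights.values())
    return {pod: node_g * w / total for pod, w in pod_weights.items()}


def per_workload(nodes: dict) -> dict:
    """Sum pod-level slices back up per workload across all nodes."""
    totals = defaultdict(float)
    for node in nodes.values():
        slices = split_node(node["carbon_g"], node["pods"])
        for (workload, _replica), grams in slices.items():
            totals[workload] += grams
    return dict(totals)


# Pods are keyed by (workload, replica index); weights are hypothetical.
nodes = {
    "node-a": {"carbon_g": 5.22,
               "pods": {("k8s-overhead", 1): 1.0, ("app-1", 1): 6.0, ("db", 1): 3.0}},
    "node-b": {"carbon_g": 3.31,
               "pods": {("k8s-overhead", 2): 1.0, ("app-1", 2): 4.0, ("app-2", 1): 5.0}},
}
workloads = per_workload(nodes)
```

Because every pod slice is a fraction of its node's total, the workload figures add up to the 8.53 g that went in.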

A design rule worth highlighting: every gram of carbon always gets attributed. Right-sizing a pod, removing a node, dropping an unused GPU all show up as the number going down, never quietly leaking into a side bucket. Anything that can't legitimately be charged to a workload (an empty node, a GPU nobody requests) is gathered into an explicit unused category. That bucket represents spare capacity, often kept on purpose to absorb demand spikes. Box 8 later distributes it across the transactions that benefit from the headroom, so the total stays attributable end-to-end.

One caveat: the unused figure is on the pessimistic side. In reality, idle nodes and unallocated GPUs draw closer to their idle power than their loaded power, so the actual emissions of spare capacity are lower than what is shown here. The model used for this explanation doesn't loop component utilisation back into the host power curve, so idle capacity inherits the host's overall draw rather than dropping to true idle. For day-to-day decisions the directional signal "this node isn't doing useful work" is still the part that matters.
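The rule that every gram lands somewhere can be sketched with a single resource. The version below is a deliberate simplification: it uses only CPU requests against node capacity, and the function name and numbers are hypothetical. It also reproduces the pessimistic behaviour noted above, since spare capacity is charged at the node's overall rate.

```python
def attribute_with_unused(node_g: float, capacity: float, requests: dict) -> dict:
    """Charge each pod by requested share of capacity; the rest goes to 'unused'."""
    requested = sum(requests.values())
    if requested > capacity:
        raise ValueError("requests exceed node capacity")
    out = {name: node_g * r / capacity for name, r in requests.items()}
    # Whatever is not requested is kept visible instead of being dropped.
    out["unused"] = node_g - sum(out.values())
    return out


# Hypothetical node: 5 g over the window, 8 vCPU capacity.
busy_node = attribute_with_unused(5.0, 8, {"app-1": 4, "db": 2})
empty_node = attribute_with_unused(5.0, 8, {})
```

On the busy node, 1.25 g of the 5 g ends up in the unused bucket; on the empty node all 5 g does. Right-sizing a pod shrinks its own line and grows the bucket until the node itself is removed, so the total never changes silently.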

Services and transactions

Workloads don't stand alone. A web service calls auth; auth checks against a database; monitoring observes them all. Each call lands a fraction of the consumed service's emissions on the caller, so the workload-level numbers from Box 6 are only the starting point. The next step is propagating them along the service graph back to whichever transaction triggered the chain.

Shared services like monitoring, authentication and databases are touched by almost every workload. Their carbon has to be split across consumers by a signal that tracks the actual cost each caller causes: call count, telemetry volume, query load, session count. The split basis is itself a design decision: different teams attribute the same service in different ways, and Box 7 surfaces a few alternatives per source so the choice is visible rather than hidden in a formula.

[Interactive Box 7 · Service attribution: workloads → consumers, upstream and downstream share. Split basis: one method selector per source (k8s-overhead, monitoring, db, auth, app-1, app-2), with "series" shown as the method. Chain position: L7 · service.]

Monitoring splits in proportion to the telemetry each observed service produces: metric series, log streams and traces combined. A chatty app pays more than a quiet one.

Click any service in the graph to focus it. The arrow labels show how many grams flow along each edge; incoming arrows are upstream sources, outgoing arrows are downstream consumers.

Be practical about cycles

Notice that monitoring also points back at k8s-overhead: monitoring observes the cluster control plane the same way it observes everything else, so part of monitoring's carbon belongs to k8s. That makes the graph cyclic: k8s sends grams to monitoring, monitoring sends some back. A single attribution pass can't resolve this cleanly, because the back-flow arrives after k8s has already distributed its balance to db, auth and so on. The pragmatic fix is to iterate: each pass flows only the delta a service received in the previous round, so the carbon settles geometrically over a handful of passes (six in this simulation). In practice, cycles are common, but the error from cutting them is often negligible. Try setting monitoring's self-edge to zero in the panel above (meaning monitoring doesn't track its own data), or zero out the k8s↔monitoring loop, and watch the totals barely move. For production, running a separate meta-monitoring setup is the cleaner answer in several respects.
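The iteration can be sketched as a delta propagation over the service graph. This is an illustrative model under stated assumptions, not the simulation's code: services with outgoing edges forward everything they receive, services without outgoing edges keep it, every edge target appears in `own_g`, and all grams and edge fractions are made up.

```python
def propagate(own_g: dict, edges: dict, passes: int = 6):
    """Flow carbon along edges[src][dst] fractions for a fixed number of passes.

    Returns (settled grams per service, grams still circulating in cycles).
    """
    settled = {s: 0.0 for s in own_g}
    delta = {}
    for service, grams in own_g.items():
        if edges.get(service):
            delta[service] = grams      # forwarding service: carbon flows on
        else:
            settled[service] += grams   # end consumer: keeps its own carbon
    for _ in range(passes):
        nxt = {}
        for src, grams in delta.items():
            for dst, frac in edges[src].items():
                if edges.get(dst):
                    # Only the newly received delta flows in the next pass.
                    nxt[dst] = nxt.get(dst, 0.0) + grams * frac
                else:
                    settled[dst] += grams * frac
        delta = nxt
    return settled, sum(delta.values())


# Hypothetical graph with a k8s <-> monitoring cycle.
own = {"k8s": 1.0, "monitoring": 1.0, "app-1": 4.0, "app-2": 2.0}
edges = {
    "k8s": {"monitoring": 0.2, "app-1": 0.5, "app-2": 0.3},
    "monitoring": {"k8s": 0.1, "app-1": 0.6, "app-2": 0.3},
}
settled, residual = propagate(own, edges, passes=6)
```

With these fractions only 2% of the carbon survives each round trip through the loop, so after six passes the residual is in the order of a hundredth of a milligram out of 8 g. The sketch returns the residual rather than hiding it, leaving it to the caller to spread it proportionally or ignore it.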

Transactions are the next granularity down. They represent a specific business operation, such as a user request, an inference call or a batch job, that triggers a set of service calls. Each transaction picks up a fraction of every service it touches, summing to a per-transaction carbon footprint the business can reason about directly.

The split between transactions inside a single service comes from application sampling: APM tools, request traces or simple timing instrumentation tell you how much of a service's work each transaction accounts for. From there it is a straightforward weighted attribution.
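A minimal sketch of that weighted attribution, followed by the conversion into a business rate. The 7.18 g input and the daily volumes are taken from the sample values shown in Box 8; the 60/40 sampling share and the function names are assumptions made for illustration.

```python
def split_by_sampling(service_g: float, sampled_share: dict) -> dict:
    """Split a service's carbon across transactions by their sampled share of work."""
    total = sum(sampled_share.values())
    return {tx: service_g * s / total for tx, s in sampled_share.items()}


def per_unit(grams: float, count: float, unit: float = 1) -> float:
    """Carbon per `unit` business events, e.g. per 1,000 tokens or per order."""
    return grams / count * unit


# app-1 carries 7.18 g in the sample state; the 60/40 split is assumed.
tx = split_by_sampling(7.18, {"AI product summary": 0.6,
                              "Personalised recommendations": 0.4})

g_per_1k_tokens = per_unit(tx["AI product summary"], 17_400, unit=1000)
g_per_100_recs = per_unit(tx["Personalised recommendations"], 694, unit=100)
```

Rounded to two decimals, these come out at 0.25 g per 1k tokens and 0.41 g per 100 recommendation sets.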

[Interactive Box 8 · Transactions: AI inference, recommendations, checkout; carbon per business unit. Sample state:

AI product summary (app-1) · Tokens per day (cleanest 5-min slot): 17.4k → 0.25 g per 1k tokens

Personalised recommendations (app-1) · Recommendation sets per day (cleanest 5-min slot): 694 → 0.41 g per 100 recs

Order checkout (app-2) · Orders per day (cleanest 5-min slot): 28, average basket value €50 → 0.048 g per order · 0.97 mg per € of GMV

Share across transactions, total 8.53 g CO₂eq: AI product summary 4.31 g (51%), Personalised recommendations 2.87 g (34%), Order checkout 1.34 g (16%). The app-1 split between AI product summary and Personalised recommendations is adjustable. Chain position: L8 · tx.]

The final framing depends on the business. A retailer cares about carbon per order or per € of GMV; a SaaS vendor about carbon per seat or per session; an AI service about carbon per 1k tokens. The point of going all the way to transactions is to land on a metric the business already uses, so engineering and product can share the same number when deciding what to fix.

What you do in the cloud

In the cloud, the VM layer is the most reliable entry point for this model. Providers rarely surface direct energy data from the underlying hardware, so VM-level emissions are typically estimated from hardware architecture and utilisation. Projects like Cloud Carbon Footprint (cloudcarbonfootprint.org) formalise that approach. Where providers do publish sustainability data for specific managed services (databases, queues, AI APIs), that data plugs into the same attribution chain with little modification.

Model fidelity tracks input quality: the more directly a chosen metric correlates with energy use, the better the attribution. Once the inputs are in place, the optimisation order writes itself. Start with the services and components that carry the largest share, then work down to smaller ones once those have been addressed. Understanding the model matters here: knowing how it routes emissions across components and services is what lets you turn an attribution number into a confident change.

Improvements

The output is a single number: the carbon footprint, decomposed all the way from physical hardware down to per-transaction rates. The goal is to bring it down. Having that number makes optimisation a measurable activity rather than a directional one.

Start with the biggest contributors. A service that carries a large share of the total has the largest absolute reduction available, whether that comes from code-level work (CPU efficiency), capacity work (right-sizing, deletion) or architectural change. Improvements happen at every layer of the stack; the value of the model is telling you which layer to open first.

At Posedio we run carbon-footprint assessments for IT systems and help teams act on the results, from cluster sizing and workload placement to architectural change. The interactive self-assessment above is a glimpse of how we think about the problem. See greennative.posedio.com for the full picture, or get in touch to talk about your stack.