KEY FINOPS KPIS FOR OPTIMAL SAVINGS
How do you measure the success of a FinOps program?
Cloud is variable and ever-changing, and while some FinOps principles never really change, we like to evolve our philosophy with both best practices and the state of the market in mind. The FinOps Foundation recently released its 2024 State of FinOps Report and found that priorities are shifting for many organizations. Reducing cloud waste and managing commitment discounts are now top of mind, reflecting the economic uncertainty of the last year.
At the same time, organizations are preparing for the costs of running AI and machine learning in the cloud. Since there is little to no baseline for such costs, organizations have been investing heavily in tools that can more accurately forecast cloud usage and spend. There is also a push to empower engineers to take action and optimize workloads to help control costs.
But without access to the right data, it’s impossible to know if costs are actually under control. Read on to learn which metrics AHEAD prioritizes in its reporting to help clients derive critical insights into the effectiveness of their FinOps program.
Top 10 KPIs of a Healthy FinOps Program
As organizations continue to adopt cloud, the need to measure the success of FinOps programs becomes more and more crucial. There are dozens of KPIs through which an organization can measure the success of its FinOps program, but the following are the ten KPIs that we see as the best overall indicators for our clients’ needs and goals in 2024.
Percentage Variance of Budgeted vs. Forecasted Cloud Spend
What it is: The difference between budgeted and forecasted costs for using public cloud services.
Why we measure it: This gives us insight into how well your budget aligns with your forecasts of actual cloud expenses. An initial budget usually won’t match the real-time data that reveals your actual spending on a rolling basis. Taking your past cloud consumption and anticipated changes in utilization into consideration helps your forecasts become more accurate and your future cloud budgets become more realistic.
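As a rough illustration (a sketch, not AHEAD’s tooling), the variance can be expressed as the gap between forecast and budget as a percentage of the budget. The function name, the sign convention, and the dollar figures below are assumptions made for the example.

```python
def budget_variance_pct(budgeted: float, forecasted: float) -> float:
    """Percent by which the forecast differs from the budget.

    Positive means the forecast is over budget (assumed sign convention).
    """
    if budgeted <= 0:
        raise ValueError("budgeted must be positive")
    return (forecasted - budgeted) / budgeted * 100.0


# Hypothetical figures for one budget period.
variance = budget_variance_pct(budgeted=100_000.0, forecasted=112_500.0)
print(f"Forecast vs. budget variance: {variance:.1f}%")
```

Tracking this number each period shows whether forecasts and budgets are converging over time.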
Anomaly Detection Cost Avoidance and Unpredicted Variance of Spend
What it is: The money you were able to avoid spending by identifying and remedying an anomaly in cloud usage, and the unpredicted variance of cloud usage over a period of time, respectively.
Why we measure it: Both calculations are unique to your organization, as both depend on how frequently you monitor your cloud usage. As your FinOps program matures, both calculations are key indicators of how well you’re avoiding unpredicted – and unbudgeted – costs.
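One possible way to put numbers on both calculations is sketched below. The formulas are assumptions chosen for illustration: avoided cost is taken as the excess daily spend multiplied by the days the anomaly would otherwise have kept running, and unpredicted variance as the absolute gap between actual and predicted spend relative to the prediction. Your organization may define either one differently.

```python
def anomaly_cost_avoidance(baseline_daily: float,
                           anomalous_daily: float,
                           days_avoided: int) -> float:
    """Spend avoided by stopping an anomaly early (assumed definition)."""
    excess_per_day = max(anomalous_daily - baseline_daily, 0.0)
    return excess_per_day * days_avoided


def unpredicted_variance_pct(actual: float, predicted: float) -> float:
    """Absolute deviation of actual spend from predicted spend, in percent."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return abs(actual - predicted) / predicted * 100.0


# Hypothetical example: an anomaly caught 20 days before month end.
avoided = anomaly_cost_avoidance(baseline_daily=400.0,
                                 anomalous_daily=1_000.0,
                                 days_avoided=20)
unpredicted = unpredicted_variance_pct(actual=53_000.0, predicted=50_000.0)
print(f"Cost avoided: ${avoided:,.2f}")
print(f"Unpredicted variance: {unpredicted:.1f}%")
```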
Percentage of Commitment Discount Waste
What it is: The percentage of your commitment discounts that goes unused, relative to what is actually deployed.
Why we measure it: This is like asking how much money you’re leaving on the table, or how many coupons you’ve let expire. In short, this KPI measures how well you’re using the credits you’ve earned for the spend you’ve already committed to your CSP.
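A minimal sketch of the calculation, assuming waste is defined as the unused portion of a commitment divided by the total commitment for the period. The function name and figures are hypothetical.

```python
def commitment_waste_pct(committed: float, used: float) -> float:
    """Share of a commitment that went unused, in percent (assumed definition)."""
    if committed <= 0:
        raise ValueError("committed must be positive")
    unused = max(committed - used, 0.0)
    return unused / committed * 100.0


# Hypothetical month: $10,000 committed, $8,500 of it actually consumed.
waste = commitment_waste_pct(committed=10_000.0, used=8_500.0)
print(f"Commitment discount waste: {waste:.1f}%")
```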
Percentage of Compute Spend Covered by Commitment Discounts
What it is: Pretty much what it sounds like – the percentage of your compute spend covered by your commitment discounts for the previous month.
Why we measure it: Getting the best price on cloud usage involves actually using your commitment discounts. The higher this ratio becomes, the more you save. We work with customers to determine whether Reserved Instances, Reserved Capacity, or Savings Plans are the best course of action. For more dynamic environments, such as lower-level or test environments, a Savings Plan will likely be the best opportunity for a commitment discount. Savings Plans span the entire compute estate for any given customer (VMs, Kubernetes clusters, Web Apps, etc.). For production environments, we work with customers to configure a Reserved Instance (for a VM) or Reserved Capacity (for things like storage). Once configured, we track usage via tooling so the customer does not have to worry about calendar dates or reminders to renew the commitment discount at the end of its term.
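The ratio itself is simple to compute once the billing data is split into covered and uncovered compute spend. The sketch below assumes that split is already available; the function name and the figures are illustrative only.

```python
def commitment_coverage_pct(covered_spend: float, total_compute_spend: float) -> float:
    """Percent of compute spend covered by commitment discounts."""
    if total_compute_spend <= 0:
        raise ValueError("total_compute_spend must be positive")
    return covered_spend / total_compute_spend * 100.0


# Hypothetical previous month: $60,000 of $80,000 in compute spend was covered.
coverage = commitment_coverage_pct(covered_spend=60_000.0,
                                   total_compute_spend=80_000.0)
print(f"Compute spend covered by commitments: {coverage:.1f}%")
```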
Untagged or Mis-Tagged Resources
What it is: Measuring the costs associated with untagged or mis-tagged resources.
Why we measure it: While not all cloud resources can be tagged, many can, and the more metadata associated with your resources, the easier it becomes to build the right visibility and forecasting for cloud workloads. Untagged or mis-tagged resources leave a lot of cloud spend hard to detect or forecast, which makes it challenging to build a showback or chargeback model. A big charter we work on with customers is creating the best tagging baseline possible for what they’re trying to achieve. From there, we help clients plug in the right dashboards, reporting, and structure for ongoing maintenance.
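To make this concrete, here is a small sketch that sums the cost of resources missing any tag from a required baseline. The tag keys, resource records, and the rule that an empty value counts as mis-tagged are all assumptions for the example; a real baseline depends on what the organization is trying to achieve.

```python
# Hypothetical tagging baseline; real baselines vary by organization.
REQUIRED_TAGS = {"owner", "cost-center", "environment"}


def is_properly_tagged(tags: dict, required: set = REQUIRED_TAGS) -> bool:
    """True when every required tag is present with a non-empty value."""
    return all(tags.get(key, "").strip() for key in required)


def untagged_cost_share(resources: list) -> tuple:
    """Return (cost of untagged or mis-tagged resources, percent of total cost)."""
    total = sum(r["monthly_cost"] for r in resources)
    untagged = sum(r["monthly_cost"] for r in resources
                   if not is_properly_tagged(r["tags"]))
    pct = untagged / total * 100.0 if total else 0.0
    return untagged, pct


# Hypothetical resource export.
resources = [
    {"id": "vm-1", "monthly_cost": 300.0,
     "tags": {"owner": "team-a", "cost-center": "1001", "environment": "prod"}},
    {"id": "disk-7", "monthly_cost": 50.0, "tags": {}},
    {"id": "vm-2", "monthly_cost": 150.0,
     "tags": {"owner": "team-b", "cost-center": "", "environment": "dev"}},
]

untagged_cost, untagged_pct = untagged_cost_share(resources)
print(f"Untagged or mis-tagged spend: ${untagged_cost:,.2f} ({untagged_pct:.1f}%)")
```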
Percentage Resource Utilization
What it is: The measure of cloud resource utilization such as compute, block storage, or object storage.
Why we measure it: How much of each type of resource is being utilized as a percentage of the total capacity allocated? We compare these utilization rates over time to the expected utilization rate based on your running workloads. We also want to ensure every resource deployed is actually used; otherwise, we need to build a plan to decommission underutilized or idle workloads. Using these measurements helps us keep environments in a good state of hygiene and keep costs optimized and contained.
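The comparison described above can be sketched as follows. The resource names, capacities, and expected utilization rates are hypothetical, and flagging anything below its expected rate is just one possible rule.

```python
def utilization_pct(used: float, allocated: float) -> float:
    """Used capacity as a percentage of allocated capacity."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return used / allocated * 100.0


# Hypothetical estate: name -> (used, allocated, expected utilization %).
estate = {
    "compute_vcpus": (48.0, 128.0, 60.0),
    "block_storage_gb": (7_200.0, 8_000.0, 80.0),
}

below_expected = []
for name, (used, allocated, expected_pct) in estate.items():
    actual_pct = utilization_pct(used, allocated)
    if actual_pct < expected_pct:
        below_expected.append(name)
    print(f"{name}: {actual_pct:.1f}% used (expected {expected_pct:.1f}%)")

print("Below expected utilization:", below_expected)
```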
Percentage of Idle or Unused Resources
What it is: The measure of your unused cloud resources.
Why we measure it: To reduce cloud waste. Monitoring tools help us catch unattached or orphaned storage volumes, underutilized VMs, underutilized PaaS instances, unattached elastic IPs, unattached load balancers, idle snapshots and more. This metric helps us keep workloads optimized and make the most of your available cloud resources.
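A simple way to approximate this metric is to classify each resource as idle or not and divide by the total count. In the sketch below, the 5% CPU threshold, the record fields, and the inventory are all assumptions; monitoring tools typically apply richer rules per resource type.

```python
# Hypothetical threshold: VMs averaging under 5% CPU are treated as idle.
IDLE_CPU_THRESHOLD_PCT = 5.0


def is_idle(resource: dict) -> bool:
    """Idle if explicitly unattached, or if average CPU is under the threshold."""
    if not resource.get("attached", True):
        return True
    cpu = resource.get("avg_cpu_pct")
    return cpu is not None and cpu < IDLE_CPU_THRESHOLD_PCT


# Hypothetical inventory mixing VMs, volumes, and IPs.
inventory = [
    {"id": "vm-web", "avg_cpu_pct": 42.0},
    {"id": "vm-old", "avg_cpu_pct": 1.2},
    {"id": "vol-orphan", "attached": False},
    {"id": "ip-spare", "attached": False},
    {"id": "vol-data", "attached": True},
]

idle_ids = [r["id"] for r in inventory if is_idle(r)]
idle_pct = len(idle_ids) / len(inventory) * 100.0
print(f"Idle or unused: {idle_ids} ({idle_pct:.1f}% of resources)")
```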
Hourly Cost per CPU Core
What it is: Well, you know… Measuring the hourly average cost of each of your CPU cores.
Why we measure it: The average cost per CPU core gives you a unit cost for compute. Are you paying for more compute than you need? Can reservations or migrating to instance types with better performance save you more?
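One way to compute this unit cost is to divide total compute spend by total core-hours consumed. The fleet records and prices below are made up for illustration and do not reflect any provider’s actual rates.

```python
def hourly_cost_per_core(total_cost: float, core_hours: float) -> float:
    """Average cost of one CPU core for one hour."""
    if core_hours <= 0:
        raise ValueError("core_hours must be positive")
    return total_cost / core_hours


# Hypothetical fleet for one month: (cores, hours run, total cost in dollars).
fleet = [
    (8, 720, 280.80),
    (16, 360, 230.40),
]

total_core_hours = sum(cores * hours for cores, hours, _ in fleet)
total_cost = sum(cost for _, _, cost in fleet)
rate = hourly_cost_per_core(total_cost, total_core_hours)
print(f"Hourly cost per CPU core: ${rate:.4f}")
```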
Cost per Gigabyte Stored
What it is: The average cost per gigabyte stored in the cloud.
Why we measure it: Like Hourly Cost per CPU Core, Cost per Gigabyte Stored is a metric related to unit economics, which should inform the decision-making of your engineering teams to help meet cloud cost and usage goals. Storage tiers and proper data lifecycle management can help optimize this number.
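As a sketch, a blended figure across storage tiers can be computed by dividing total storage cost by total gigabytes stored. The tier names, volumes, and costs below are hypothetical and are not real provider pricing.

```python
def cost_per_gb(total_cost: float, gigabytes: float) -> float:
    """Average cost of storing one gigabyte for the period."""
    if gigabytes <= 0:
        raise ValueError("gigabytes must be positive")
    return total_cost / gigabytes


# Hypothetical monthly storage bill: tier -> (GB stored, cost in dollars).
tiers = {
    "hot": (2_000.0, 46.00),
    "cool": (6_000.0, 60.00),
    "archive": (12_000.0, 12.00),
}

total_gb = sum(gb for gb, _ in tiers.values())
total_storage_cost = sum(cost for _, cost in tiers.values())
blended = cost_per_gb(total_storage_cost, total_gb)
print(f"Blended cost per GB stored: ${blended:.4f}")
```

In this example, moving data from the hot tier to cooler tiers lowers the blended figure, which is the effect lifecycle management aims for.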
Total Amortized Monthly Spend
What it is: The average monthly cost of cloud resources when accounting for long-term investments. Instead of recognizing expenses all at once, amortization spreads the cost of cloud resources over a term (e.g. 1 year or 3 years).
Why we measure it: Tracking total amortized spend on a monthly basis helps customers understand the true cost of cloud resources by spreading upfront expenses over their useful life. This provides a more accurate view of ongoing costs, aids in budgeting and forecasting, and helps identify cost optimization opportunities.
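For example, a straight-line version of this calculation spreads an upfront commitment payment evenly across its term and adds the month’s recurring charges. The straight-line method, the function name, and the figures are assumptions made for the sketch.

```python
def amortized_monthly_spend(upfront_payment: float,
                            term_months: int,
                            recurring_monthly: float) -> float:
    """Recurring charges plus the monthly share of an upfront payment."""
    if term_months <= 0:
        raise ValueError("term_months must be positive")
    return upfront_payment / term_months + recurring_monthly


# Hypothetical: $36,000 paid upfront for a 3-year term, plus $4,000 per month
# in recurring charges.
monthly = amortized_monthly_spend(upfront_payment=36_000.0,
                                  term_months=36,
                                  recurring_monthly=4_000.0)
print(f"Total amortized monthly spend: ${monthly:,.2f}")
```

Without amortization, the same account would show $40,000 in the first month and $4,000 in every month after, which hides the true ongoing cost.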
How AHEAD Tracks and Manages These KPIs
Identifying the right KPIs for your FinOps program is challenging enough, and the picture is often muddled by disparate data sources that don’t give you the visibility you need for accurate measurement. And while it’s completely possible to build some of this reporting yourself in Power BI – namely spend and cost trends, individual subscription or project costs, application costs, and even tracking Azure usage (which in itself is a much more complicated dashboard build) – the DIY approach can still leave you missing some critical elements of measurement.
AHEAD FinOps Services can deliver tooling, data analysis, insight reporting, and backlog development equivalent to the workload of two full-time employees. Our custom tools offer more detailed insights than your standard reporting. We can create budgets and budget alerts, tagging activities, and shutdown recommendations, and our automated Cloud Reporting Insight dashboards constantly provide recommendations and courses of action for optimal cloud usage and savings.
Our work across many different verticals allows us to leverage what we learn and apply best practices to your organization immediately. AHEAD’s certified and skilled resources are experienced in cloud optimization tools and all major public clouds, and apply insights from our reporting to drive average first-year cloud cost reductions of 20-30% for our clients.
Learn more about AHEAD FinOps engagements here.
About the author
Shannon Kuehn
Managed Solution Architect, FinOps
As a Cloud and Platform Engineering leader at AHEAD, Shannon draws upon almost 20 years of rich IT experience to craft pioneering, efficiency-driven technical solutions tailored for enterprise clients. Her expertise spans a robust background in cloud computing, virtualization, and automation, complemented by recognized certifications in both Microsoft and VMware technologies. Her professional mission revolves around empowering clients to harness the transformative power of a software-defined universe while optimizing cloud operations.