Should Associations Trust AI With Their Numbers?

Picture a budget retreat for an association with national reach, dozens of chapters, and an annual conference that keeps the lights on. A Gen AI tool has produced next year’s membership forecast, estimated sponsorship revenue, and suggested changes to dues tiers. The narrative reads smoothly, the graphs look convincing, and one understated percentage change in the cash-flow section quietly flips the board’s view of risk. Staff, volunteer leaders, and section chairs walk out aligned around a story that never should have passed the first calculator check.
The Jagged Frontier Of Gen AI
This is exactly the class of problem the team at Omni Calculator set out to test with the ORCA Benchmark. Across 500 real-world quantitative tasks, including finance, health, and everyday math, no leading model scored above 63 percent. Users face a material chance of getting the wrong answer even for straightforward questions. A companion research paper on ORCA shows that most errors come from basic calculation and rounding mistakes rather than exotic logic failures. For associations that manage reserves, fund advocacy, and support chapters through shared services, that level of unreliability in Gen AI-generated numbers demands a deliberate response.
Harvard researchers describe this pattern as a jagged technological frontier. In their field experiment with 758 consultants, GPT-4 dramatically improved speed and quality on some tasks while degrading performance on other tasks that looked similarly difficult to human eyes. Their jagged frontier study shows that Gen AI capability does not rise smoothly with complexity. Instead, it spikes in some activities and drops in others, and even experienced professionals struggle to predict where the tool helps and where it harms. Workers did best when they had explicit guidance about when to lean on the system and when to rely on their own judgment. That lesson translates directly to association executives, staff, chapter officers, and section leaders. Access to Gen AI without training is a recipe for confident mistakes.
The ORCA findings reveal why this matters so much for numerically anchored work inside associations. Large language models behind Gen AI are designed to predict the next token in a sequence, not to execute deterministic arithmetic. The ORCA team reports that even when a model describes the correct formula for interest, amortization, or present value, it often misapplies that formula in multi-step calculations. Users have roughly a 40 percent chance of receiving a wrong answer on everyday math problems, with compounding errors common in financial reasoning. For associations, that touches dues modeling, conference pricing, reserve drawdown scenarios, scholarship funding, and chapter support formulas.
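To see what a deterministic check looks like in practice, here is a minimal Python sketch in the spirit of the ORCA findings. The function names, the reserve scenario, and the one-cent tolerance are illustrative assumptions of mine, not anything from the benchmark itself; the point is that a few lines of exact arithmetic can expose a fluent but wrong compound-interest figure.

```python
# Minimal sketch: deterministic compound-interest and present-value checks
# a finance team could run against any Gen AI-quoted figure.
# All names, the reserve scenario, and the 1-cent tolerance are
# illustrative assumptions, not part of the ORCA benchmark.

def future_value(principal: float, annual_rate: float,
                 periods_per_year: int, years: float) -> float:
    """Exact compound-interest future value: P * (1 + r/n)^(n*t)."""
    return principal * (1 + annual_rate / periods_per_year) ** (periods_per_year * years)

def present_value(amount: float, annual_rate: float,
                  periods_per_year: int, years: float) -> float:
    """Discount a future amount back to today with the same compounding."""
    return amount / (1 + annual_rate / periods_per_year) ** (periods_per_year * years)

def verify_quote(ai_quoted: float, computed: float, tolerance: float = 0.01) -> bool:
    """Treat a Gen AI number as a draft until it matches the exact result."""
    return abs(ai_quoted - computed) <= tolerance

# Example: a model claims $50,000 in reserves at a 4 percent annual rate,
# compounded monthly, grows to $61,000 in five years. Exact math disagrees.
exact = future_value(50_000, 0.04, 12, 5)  # ~61,049.97
print(f"Exact future value: ${exact:,.2f}")
print("AI quote verified:", verify_quote(61_000.00, exact))  # False
```

A model that rounds a monthly rate too early, or compounds annually instead of monthly, produces a number that reads plausibly in a memo; the deterministic version catches the discrepancy before it reaches a board.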
Where Associations Are Most Exposed
Regulators and standards bodies are starting to codify this risk. The U.S. National Institute of Standards and Technology released a dedicated Generative AI Profile as a companion to its AI Risk Management Framework, warning that Gen AI systems confabulate, producing fluent but false content, including incorrect logic and fabricated references. A practical summary of the same document explains how the profile catalogs risks such as over-reliance, automation bias, and opaque failures, and offers controls tailored to generative tools. An accessible overview of the guidance notes that the Gen AI profile exists precisely because organizations struggle to see where language fluency masks underlying unreliability. Associations sit squarely in those crosshairs: they use narrative to persuade boards and members, and they rely on arithmetic to budget, price, and manage risk.
Governments are already seeing how fast Gen AI adoption grows once the door opens. A recent GAO report on federal Gen AI use found that across 11 major agencies, reported generative use cases rose from 32 in 2023 to 282 in 2024, a nearly ninefold jump. A related summary from the Defense Management Institute underscores that total AI-related use cases nearly doubled in the same period, with generative tools filling a growing share of mission-support work. Analysts reviewing the GAO findings stress that agencies now need structured governance and training simply to keep up with their own pilots. Associations face a similar pattern at smaller scale: once staff, committees, and chapters start using Gen AI for member communication, education content, and basic analysis, usage accelerates and informal habits harden into de facto practice.
For associations, the jagged frontier has three direct implications. Gen AI excels at language-heavy tasks with low numerical stakes, such as drafting policy briefs, segment-specific newsletters, or conference session descriptions. It performs inconsistently on tasks that blend narrative with arithmetic, such as sponsorship forecasts or multi-tier membership projections. It also underperforms as a primary calculator for high-stakes, multi-step financial reasoning. The right response is not to ban Gen AI, but to teach staff and volunteers exactly where it belongs in the process and to pair it with deterministic tools whenever money, risk, or member fairness depends on the numbers. The ORCA team’s public benchmark summary explicitly recommends this pattern: let Gen AI structure the problem and explain it, while trusted calculators handle the math, as the sketch below illustrates.
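A minimal Python sketch, using hypothetical dues tiers and renewal rates of my own invention, shows what that division of labor can look like: deterministic code produces the number, and the model only ever sees the result it is asked to explain.

```python
# Minimal sketch of the division of labor the ORCA summary recommends:
# the language model structures and explains, deterministic code computes.
# The tiers, renewal rates, and function names are hypothetical.

from dataclasses import dataclass

@dataclass
class DuesTier:
    name: str
    members: int          # current members in the tier
    annual_dues: float    # dues per member, in dollars
    renewal_rate: float   # expected share who renew (0 to 1)

def project_dues_revenue(tiers: list[DuesTier]) -> float:
    """Deterministic projection: members * renewal rate * dues, summed."""
    return sum(t.members * t.renewal_rate * t.annual_dues for t in tiers)

tiers = [
    DuesTier("Early career", 12_000, 150.0, 0.82),
    DuesTier("Professional", 22_000, 325.0, 0.88),
    DuesTier("Senior leader", 6_000, 450.0, 0.91),
]

projected = project_dues_revenue(tiers)  # exact arithmetic, no model involved
print(f"Projected dues revenue: ${projected:,.2f}")

# The Gen AI step stays on the language side: hand the model the computed
# figure and ask for a board-ready explanation, never the other way around.
# prompt = f"Explain to the board why projected dues is ${projected:,.2f} ..."
```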
A Case Study In Association Gen AI Governance
Consider a concrete case drawn from my own work. Envision yourself as a consultant engaged by a national association with roughly 40,000 members, strong chapter activity, and sections for early-career professionals, senior leaders, and international members. The CEO and board want to use Gen AI everywhere, both for operational efficiency and to set an example for the field. You propose a structured approach rooted in the jagged frontier. In the first phase, you inventory how staff, chapter officers, and section volunteers already use Gen AI for budgeting, member outreach, and program design. You then run a short awareness session where you explain the Harvard jagged frontier research and walk through examples from the participants’ own work where Gen AI shines and where it fails.
Next, you help the finance and membership teams redesign their workflows. Gen AI drafts narrative explanations for dues recommendations, scenario descriptions, and board memos. Deterministic tools perform every calculation that touches cash: dues projections by segment, conference break-even models, grant allocation formulas, and reserve stress tests. Staff cross-check any Gen AI-generated number against a vetted calculator, including public tools such as Omni’s APY calculator for interest-based scenarios or internal spreadsheets for margin analysis. Together, you define clear rules that treat Gen AI outputs as drafts that gain authority only after verification.
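Here is a minimal sketch of that cross-check rule in code, with hypothetical conference numbers rather than figures from any real engagement. It encodes the discipline described above: a Gen AI figure that touches cash is recomputed, and a mismatch blocks it from the board memo.

```python
# Minimal sketch of the cross-check rule in practice: every Gen AI number
# that touches cash is recomputed before it reaches a board memo.
# The conference figures below are hypothetical.

def breakeven_attendees(fixed_costs: float, ticket_price: float,
                        variable_cost_per_attendee: float) -> float:
    """Classic break-even: fixed costs / contribution margin per attendee."""
    margin = ticket_price - variable_cost_per_attendee
    if margin <= 0:
        raise ValueError("Ticket price must exceed per-attendee cost")
    return fixed_costs / margin

ai_quoted = 1_150                                      # figure from a Gen AI draft
computed = breakeven_attendees(420_000, 595.0, 240.0)  # ~1,183.1 attendees
print(f"Recomputed break-even: {computed:,.1f} attendees")
if abs(ai_quoted - computed) > 1:
    print("Flag for review: draft number disagrees with the calculator.")
```

A 33-attendee gap like this one is small enough to read as noise in a narrative deck and large enough to change a pricing decision, which is exactly why the verification step is non-negotiable.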
Then you extend that discipline to the network. Chapter treasurers receive a short playbook that explains where Gen AI supports them, for example in drafting member updates or translating guidance into local context, and where they must avoid using it as a calculator. Section leaders responsible for specialized content learn to treat Gen AI as a research and drafting partner, not as a source of final statistics. Across each audience, training emphasizes the same three messages: Gen AI is uneven, numbers that matter require independent checks, and the association expects that discipline as part of its culture of professionalism.
The payoff in this case is twofold. Internally, the association reduces the risk of Gen AI-driven miscalculations in budgets and pricing while still unlocking time savings in drafting and analysis. Externally, it models a mature standard for members who are wrestling with the same technology in their own organizations. When board members see that their association uses Gen AI to accelerate work, yet insists on calculator-backed numbers and staff trained in the jagged frontier, they see a credible blueprint rather than hype.
For association leaders, the message from ORCA and Harvard’s jagged frontier research converges on a simple principle: do not trust Gen AI with critical numbers unless a human and a deterministic tool stand between the model and the decision. Gen AI already changes how associations write, plan, and serve members. The question now is whether those changes come with quiet arithmetic errors or with clear guardrails, training, and verification. Associations that choose training over blind trust, and calculators over wishful thinking, will protect their finances, support their volunteers, and set a standard their members can confidently follow.
Key Take-Away
Gen AI excels in drafting but falters in calculations, so pairing it with verified tools, training, and human judgment is essential to avoid costly, confident mistakes.

Image credit: Yan Krukau/pexels
Dr. Gleb Tsipursky, called the “Office Whisperer” by The New York Times, helps tech-forward leaders replace overpriced vendors with staff-built AI solutions. He serves as the CEO of the future-of-work consultancy Disaster Avoidance Experts. Dr. Gleb wrote seven best-selling books, and his forthcoming book with Georgetown University Press is The Psychology of Generative AI Adoption (2026). His most recent best-seller is ChatGPT for Leaders and Content Creators: Unlocking the Potential of Generative AI (Intentional Insights, 2023). His cutting-edge thought leadership was featured in over 650 articles and 550 interviews in Harvard Business Review, Inc. Magazine, USA Today, CBS News, Fox News, Time, Business Insider, Fortune, The New York Times, and elsewhere. His writing was translated into Chinese, Spanish, Russian, Polish, Korean, French, Vietnamese, German, and other languages. His expertise comes from over 20 years of consulting, coaching, and speaking and training for Fortune 500 companies from Aflac to Xerox. It also comes from over 15 years in academia as a behavioral scientist, with 8 years as a lecturer at UNC-Chapel Hill and 7 years as a professor at Ohio State. A proud Ukrainian American, Dr. Gleb lives in Columbus, Ohio.