Evaluation futures: AI’s shaping of and contribution to emerging landscapes.

NOTE: This is a rough first draft of a possible article. When I shared these reflections in meetings and workshops, they resonated with the audience, so I consolidated them here. I hope they can be helpful to readers in their current form, but there is room for improvement! Ideas, suggestions, comments, references, etc. are VERY welcome, not just about details but also about the overall vision. I believe in mutual support and collective intelligence, especially when things are changing and ideas are evolving so fast!

This draft explores how AI could transform evaluation as we know it. With AI already capable of handling most evaluation tasks, the field is on the brink of a major shift: routine, bureaucratic assessments are set to become largely AI-driven. So, what’s next?

This paper identifies two key areas that call for fresh approaches and new roles for evaluators:

  • Dynamic models for real-time evaluation: As AI increasingly enables models to support real-time decision-making (think “digital twin” cities), how will evaluative skills blend into this process?
  • People-centred participatory practices: Keeping diverse human perspectives central - and grounded in reality! - will remain essential. Can evaluation foster grounded, inclusive processes that harvest collective and diverse voices? How can evaluative skills be adapted to leverage AI in this context?

Awareness of these complementary—yet very different—spheres can help us consider what’s next for evaluators in the age of AI, and how our skills can keep evaluation (and AI) inclusive and responsible.

 

Increased automation of standardised evaluations.

Artificial Intelligence (AI) has rapidly evolved to perform a wide range of evaluation tasks, from setting up methodologies to data collection and analysis, reporting, and even administrative chores.

Major changes, driven by the potential for cost-cutting and streamlined workflows, are likely to occur in currently standardised evaluations of a bureaucratic nature: those following conventional Terms of Reference (ToRs), embedded in project management processes, built on predefined criteria (e.g. OECD/DAC) and routine approaches, and reporting through preset formats.

They are especially suited for substitution by AI and likely to evolve into what could be termed "human-supervised, AI-driven evaluations." The substitution of human evaluators with AI for most tasks will be gradual yet swift, acknowledging the need for thorough testing and validation while delegating to AI as much as possible. The field will become increasingly adept at distinguishing which tasks can be conducted with minimal supervision and where the risks of misinterpretation, hallucination, or biases are higher. In response, the evaluation community is likely to create new templates and workflows designed to fully leverage AI's capabilities. Some processes will require only limited oversight, while others will necessitate more significant human involvement due to their complexity or the potential for error. Consequently, a variety of approaches will emerge—tailored to meet different evaluation needs and levels of supervision or expertise. The bottom line, however, is that AI will take on much of the work.

The implications are clear: AI could significantly reduce the demand for human evaluators, raising critical questions about the profession's future. If AI’s impact is confined to standardisation and substitution, it would be highly problematic, as the social disruption and environmental costs associated with maintaining AI systems could outweigh its benefits.

 

Drivers of novel possibilities

Yet AI also opens novel possibilities for evaluation, stemming from the combination of four key drivers:

  • Big data: AI can efficiently analyse large amounts of data using various analytical methods to uncover trends and patterns. This enables evaluative activities to detect insights that might otherwise be missed in traditional analysis.
  • Timeliness: AI can process and analyse data much faster than traditional methods, enabling real-time sensemaking and adaptive approaches. This allows evaluative activities to be more responsive to ongoing changes, providing timely feedback for decision-making.
  • Inclusiveness: AI can facilitate the inclusion of diverse perspectives and stakeholders in the evaluation process by accessing more varied evidence, potentially making evaluative activities more representative. It also has the potential to democratize evaluation by supporting participatory processes where citizens and marginalized groups can actively engage.
  • Ethics: AI systems can be designed to be sensitive to ethical considerations, such as data privacy, bias, and the inclusion of marginalised groups. Ensuring that AI is ethically deployed and appropriately monitored is essential to safeguard its role in evaluation – but also in the evaluands.
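To make the timeliness and inclusiveness drivers a little more concrete, here is a minimal, purely illustrative Python sketch (all group names and data are hypothetical, not drawn from any real evaluation) of how a real-time feedback stream might be tallied while flagging stakeholder groups whose voices have not yet been heard:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Feedback:
    group: str       # stakeholder group the respondent belongs to
    sentiment: str   # e.g. "positive", "negative", "neutral"

def aggregate(stream, expected_groups):
    """Tally sentiment per group and flag groups with no voice yet."""
    tallies = {g: Counter() for g in expected_groups}
    for item in stream:
        # Groups outside the expected list are still recorded, not dropped.
        tallies.setdefault(item.group, Counter())[item.sentiment] += 1
    missing = [g for g, c in tallies.items() if sum(c.values()) == 0]
    return tallies, missing

# Toy stream: one group ("rural women") has not been heard yet.
stream = [Feedback("urban youth", "positive"),
          Feedback("urban youth", "negative"),
          Feedback("local traders", "neutral")]
tallies, missing = aggregate(stream, ["urban youth", "local traders", "rural women"])
print(missing)  # → ['rural women']
```

The point of the sketch is the design choice, not the code: an AI-assisted pipeline can surface absences (who has not been heard) in real time, rather than only summarising the evidence that happens to arrive.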


 

Two complementary landscapes for evaluation

This paper explores two distinct yet complementary landscapes for evaluation. These are not simply “two types of evaluation” but represent emerging areas for developing novel and innovative practices. As AI begins to influence the field more deeply, we may experience a shift from traditional evaluations—where roles and methods have remained fairly consistent—toward a wider range of evaluative activities. These activities may differ significantly from conventional practices yet align more closely with the core definition of evaluation as "any systematic process to judge merit, worth, or significance by combining evidence and values”[https://www.betterevaluation.org/getting-started/what-evaluation] and can better leverage the unique skills of evaluation professionals and practitioners.

  • AI-powered dynamic model evaluation: evaluators will increasingly work with aggregated data and models created across various fields, not just within evaluation. These models are built from large datasets, evaluations, and research from multiple sectors, and are increasingly refined to mirror reality and identify trends, patterns, and insights.

  • AI-assisted creative people-driven evaluation: humans are in the lead, using AI for innovative, people-centred approaches rooted in real-life engagement and relationships. The emphasis is on uncovering unique insights or perspectives that mainstream analysis might miss.


AI-powered dynamic model evaluation

The trends in using AI to create large-scale models and their implications for evaluation practices are becoming increasingly clear [https://dl.acm.org/doi/pdf/10.1145/3696009]. Urban interventions and policies provide a strong example of this shift in action. Take "digital twin cities," for instance, where the push for real-time adaptiveness is likely to seamlessly integrate assessment, monitoring, and evaluation processes. Digital twins are virtual replicas of physical entities, enabling real-time simulation, monitoring, and analysis. AI now makes it possible to scale these models to urban and even regional levels.

While digital twins have traditionally been applied in fields like engineering, manufacturing, and urban planning, it's not difficult to envision how this concept could expand to include social structures. For instance, digital city twins are already incorporating climate and socio-economic factors. These dynamic, multilayered models—continuously updated in real-time—open up new possibilities for evaluating policy impacts with unprecedented immediacy, potentially transforming the role of evaluators.

As data analysis becomes an everyday activity through such models, this evolution calls for a fundamental redefinition of the evaluator's role. Rather than primarily producing evidence, evaluators will focus on navigating and interpreting the continuous flow of insights from real-time sources—whether through sensors (the "internet of things"), AI-driven polls, sentiment and behaviour analysis, or data generated by diverse actors. When new evidence is produced, it must be highly targeted and purpose-driven, aligned with the most critical areas for insight.

Evaluators can, of course, play a critical role in the design of such models, leveraging their core expertise to identify which types of evidence will best capture shifts in reality and to frame the questions that models can effectively answer. Their role goes well beyond that of a typical “data analyst”: it’s not just about ensuring accuracy, but about ensuring the data reflects complex and meaningful realities. Evaluators are uniquely positioned to identify gaps in data collection and model architecture, addressing biases or misrepresentations, and to ensure the evidence and findings resonate with real-world needs.

While data analysts may focus on processing and interpreting data at a granular level, evaluators synthesise evidence to provide a broader, strategic understanding. Automated systems can handle low-level, real-time queries (a practice already being tested by linking bots to such models). Still, evaluators extend beyond routine inquiries by helping model designers and users grasp the capabilities and limitations of these systems. They play a pivotal role in distinguishing between questions that automated processes can handle and those that demand deeper, strategic analysis. The ability to interrogate models at a higher level is crucial, requiring a comprehensive understanding of stakeholder needs and the importance of context. Evaluators must, therefore, rediscover the value of intricate relationships and rigorous question design.

More importantly, evaluators can stay vigilant to the gaps between reality and digital models – given the risk of technocracy. They not only reveal the questions these models can answer but also recognize their limitations—methodologically (regarding the approaches and techniques the models use), epistemologically (in terms of the knowledge they produce), and axiologically (in terms of the values they reflect or overlook). By identifying the issues that still need attention in real-world contexts, evaluators ensure that the human element remains central to complex decision-making processes.


 

AI-Assisted Creative People-Driven Evaluation

The other landscape is AI-assisted, people-driven evaluation, which recognises that AI—and the models based on it—are not a substitute for reality and cannot fully capture the rich diversity of experiences, perceptions, and ideas surrounding change.

AI has tremendous potential to democratise evaluation, offering tools and knowledge to groups historically excluded from evaluative processes. This is, for example, particularly significant in the growing movement toward the localization of evaluation, which emphasizes context-specific, community-driven insights, ensuring that local voices shape evaluative processes. It also resonates strongly with long-standing approaches like participatory and feminist evaluation, which have historically championed empowerment and agency. While these approaches have been recognized and practiced for years, they often remained at the margins. Now, however, they offer a crucial counterbalance to the increasing reliance on AI-driven models, ensuring a more inclusive and human-centered evaluation process.

Yet it is crucial to acknowledge that AI systems carry inherent biases. This makes it even more urgent to ensure that the voices of underrepresented and marginalised groups are not only included throughout the evaluation process but also not distorted by these biases, which could taint the analysis and the sharing of findings if not adequately kept in check. Facilitators of these evaluative processes must, therefore, remain vigilant to this risk.

A critical caveat is that people-driven evaluation goes far beyond simply analysing sentiment on social media, which can be skewed by bots, lacks reach, and often fails to capture the depth and diversity of real-world experiences. The focus here is on AI as a tool within authentic participatory processes, where it enhances engagement without preemptively shaping or guiding interactions. AI can be used to maximise and harvest meaningful participation, not as a replacement for genuine involvement.

For example, in citizen monitoring initiatives, community members collect and analyse data themselves, deriving findings for advocacy and action. Participatory evaluative activities—where evaluators adopt a more facilitative rather than “expert” role, while still safeguarding and strengthening methodology to ensure relevance and validity—have long been established and are gaining traction globally. AI has the potential to further augment these processes on many fronts, and has already proven capable of supporting existing capacities, sharing methodological learning, streamlining interactions, efficiently summarising and pre-processing data, and tailoring communication to meet the diverse needs of stakeholders—ultimately enabling richer, more inclusive evaluations. Whether in urban planning, environmental monitoring, health programs, crisis response, or agriculture, AI can amplify citizen voices while assisting in the complex task of processing—but not distorting—evidence.

In this approach, humans and their relations remain central and in control, using AI to capture a broader range of experiences and perspectives while ensuring that diverse voices are heard. These processes are inherently empowering. Importantly, these approaches are not simply about “feeding more data into AI systems.” This people-centred method fosters more inclusive and impactful evaluative processes, whose results can be better shared and heard in decision-making processes, truly democratising not only evaluation but broader society.

It is then crucial to maintain strict oversight of AI, ensuring its use aligns with ethical standards and safeguards the integrity of local knowledge. AI must be adapted to local contexts, avoiding one-size-fits-all applications that often fail to capture the unique complexities of different communities. The most effective evaluative activities will not just use AI but critically engage with it, fostering a virtuous cycle that strengthens both the responsible use of AI and the localisation of its processes. This ensures that AI enhances evaluation by respecting, amplifying, and remaining accountable to the voices of the communities it serves—and that evaluation itself can act as a safeguard to ensure AI’s responsible use.


 

Complementarities

As diverse as they seem, these approaches are deeply complementary. AI-assisted, people-driven evaluation will serve as a critical reality check on AI models, illuminating issues that the technology alone cannot capture. It will help reveal unrecognized dynamics, the experiences of underrepresented groups and communities, and areas impacted by systemic biases. On the other hand, AI can make model outputs more accessible to citizens, enabling them to engage more deeply with the scenarios these models produce (as seen in projects like UrbanistAI).

Through participatory activities, citizens not only validate—or challenge—model outputs but also highlight what is missing or what doesn’t align with their reality, ambitions, and expectations. This process enriches our understanding of current realities as well as possible and desired futures. Additionally, these participatory processes can raise public awareness of the models influencing their lives, fostering greater accountability and transparency. By making citizens more conscious of the AI-driven governance shaping society, participatory evaluation strengthens civic engagement and ensures that governance remains inclusive, responsible, and human-centered.

In this way, evaluative activities will transcend mere technocratic checks, positioning themselves as safeguards to ensure that technology serves the public good and that decision-making stays grounded in diverse, human perspectives—while consistently amplifying and exploring marginalised voices.

 

The challenges ahead

Evaluation, as it exists today, is set to be profoundly transformed by AI, and this shift will happen rapidly, with potentially enormous consequences for the field. The temptation toward automation comes with a great risk, which is far greater than just “making evaluators redundant.” The real challenge lies in the governance of AI-driven systems. In an era where data is often compared to “the new oil,” controlling the flow of information and establishing governance structures around these systems must be carefully considered.

When examining the four drivers likely to shape the future of evaluation—big data, timeliness, inclusiveness, and ethics—it becomes clear that big data and speed are set to dominate, potentially leaving inclusiveness and ethical considerations lagging behind, despite strong advocacy for their importance.

Evaluators are uniquely positioned to rethink not only how we make sense of reality but also how we assess change with a deep understanding of the values embedded within these systems. As traditional evaluation approaches become increasingly automated, evaluators must advocate for balance—ensuring that the human element, ethical reflection, and inclusivity are not overshadowed by the rapid and still largely unchecked advances in AI and big data.

Evaluation, as it stands, is indeed becoming a thing of the past, with more processes shifting toward automation. But the pressing question remains: can we make evaluation future-proof and relevant for the challenges ahead? The answer lies in transforming evaluative activities to not only harness the power of AI but to ensure that it remains anchored in human values—integrating responsibility, inclusiveness, and transparency into the evaluation process itself. Evaluators must play a key role in shaping the systems that will define the future of evaluation, and of society as a whole, ensuring that technological progress does not outpace ethical responsibility and that governance remains inclusive, equitable, and human-centred.

Silva F.

Independent Consultant

And, as usual, let's remember that what is now shaping the world, more than AI, is our attitude toward humanity. I stand in solidarity with the people oppressed in the face of genocide, occupation, and apartheid.
