Evaluation futures: AI's shaping of and contribution to emerging landscapes.
NOTE: This is a rough first draft of a possible article. When I shared these reflections in meetings and workshops, they resonated with the audience, so I consolidated them. I hope they can also be helpful to readers as they are now. But there is room for improvement! Ideas, suggestions, comments, references, etc. are VERY welcome, not just about details but also about the overall vision. I believe in mutual support and collective intelligence, especially when things are changing and ideas are evolving so fast!
This draft explores how AI could transform evaluation as we know it. With AI already capable of handling most evaluation tasks, the field is on the brink of a major shift: routine, bureaucratic assessments are set to become largely AI-driven. So, what's next?
This paper identifies two key areas that call for fresh approaches and new roles for evaluators:
- AI-powered dynamic model evaluation, where evaluators work with large-scale, continuously updated models built across many fields.
- AI-assisted, creative, people-driven evaluation, where humans lead participatory processes and use AI as a supporting tool.
Awareness of these complementary, yet very different, spheres can help us consider what's next for evaluators in the age of AI, and how our skills can keep evaluation (and AI) inclusive and responsible.
Increased automation of standardised evaluations
Artificial Intelligence (AI) has rapidly evolved to perform a wide range of evaluation tasks, from setting up methodologies to data collection and analysis, reporting, and even administrative chores.
Major changes, driven by the potential for cost-cutting and streamlined workflows, are likely to occur in the currently standardised evaluations of a bureaucratic nature: those following conventional Terms of Reference (ToRs), embedded in project management processes, built on predefined criteria (e.g. OECD/DAC) and routine approaches, and reporting through preset formats.
These evaluations are especially suited for substitution by AI and are likely to evolve into what could be termed "human-supervised, AI-driven evaluations." The substitution of human evaluators with AI for most tasks will be gradual yet swift: thorough testing and validation will still be needed, but as much as possible will be delegated to AI. The field will become increasingly adept at distinguishing which tasks can be conducted with minimal supervision and where the risks of misinterpretation, hallucination, or bias are higher. In response, the evaluation community is likely to create new templates and workflows designed to fully leverage AI's capabilities. Some processes will require only limited oversight, while others will demand more significant human involvement because of their complexity or potential for error. Consequently, a variety of approaches will emerge, tailored to different evaluation needs and levels of supervision or expertise. The bottom line, however, is that AI will take on much of the work.
The implications are clear: AI could significantly reduce the demand for human evaluators, raising critical questions about the profession's future. If AI's impact were confined to standardisation and substitution, it would be highly problematic: the social disruption and environmental costs of maintaining AI systems could outweigh the benefits.
Drivers of novel possibilities
Yet AI also opens novel possibilities for evaluation, stemming from the combination of four key drivers:
- Big data: unprecedented volumes of evidence that AI can aggregate and analyse.
- Timeliness: real-time data flows that make continuous, adaptive assessment possible.
- Inclusiveness: tools and knowledge reaching groups historically excluded from evaluative processes.
- Responsible use: growing demand for ethical oversight of how AI-driven systems are built and governed.
Two complementary landscapes for evaluation
This paper explores two distinct yet complementary landscapes for evaluation. These are not simply "two types of evaluation" but emerging areas for developing novel and innovative practices. As AI begins to influence the field more deeply, we may experience a shift from traditional evaluations, where roles and methods have remained fairly consistent, toward a wider range of evaluative activities. These activities may differ significantly from conventional practices yet align more closely with the core definition of evaluation as "any systematic process to judge merit, worth, or significance by combining evidence and values" [https://www.betterevaluation.org/getting-started/what-evaluation], and they can better leverage the unique skills of evaluation professionals and practitioners.
- AI-powered dynamic model evaluation: evaluators will increasingly work with aggregated data and models created across various fields, not just within evaluation. These models, built from large datasets, evaluations, and research from multiple sectors, are continuously improved to mirror reality and to identify trends, patterns, and insights.
- AI-assisted creative people-driven evaluation: humans are in the lead, using AI for innovative, people-centred approaches rooted in real-life engagement and relationships. The emphasis is on uncovering unique insights or perspectives that mainstream analysis might miss.
AI-powered dynamic model evaluation
The trends in using AI to create large-scale models, and their implications for evaluation practices, are becoming increasingly clear [https://dl.acm.org/doi/pdf/10.1145/3696009]. Urban interventions and policies provide a strong example of this shift in action. Take "digital twin cities," for instance, where the push for real-time adaptiveness is likely to seamlessly integrate assessment, monitoring, and evaluation processes. Digital twins are virtual replicas of physical entities, enabling real-time simulation, monitoring, and analysis. AI now makes it possible to scale these models to urban and even regional levels.
While digital twins have traditionally been applied in fields like engineering, manufacturing, and urban planning, it's not difficult to envision how this concept could expand to include social structures. For instance, digital city twins are already incorporating climate and socio-economic factors. These dynamic, multilayered models, continuously updated in real time, open up new possibilities for evaluating policy impacts with unprecedented immediacy, potentially transforming the role of evaluators.
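To make the idea concrete, here is a deliberately minimal Python sketch of the digital-twin logic described above: a model holds an estimate of one urban indicator, folds in real-world sensor readings, and flags when observed reality drifts away from the model. The class, the update rule, and the numbers are illustrative assumptions, not any real digital-twin platform.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class DigitalTwin:
    """Toy 'digital twin': keeps a running estimate of one urban indicator
    (say, an air-quality index) and compares it with incoming sensor readings."""
    expected: float                  # the model's current estimate
    tolerance: float = 0.15          # relative divergence we tolerate
    readings: list = field(default_factory=list)

    def ingest(self, value: float) -> None:
        """Fold a new real-world reading into the twin's state."""
        self.readings.append(value)
        # Naive update rule (an assumption for illustration):
        # nudge the estimate toward the observed mean.
        self.expected = 0.8 * self.expected + 0.2 * mean(self.readings)

    def diverges(self) -> bool:
        """Flag when observed reality drifts outside the model's tolerance,
        i.e. the model/reality gap an evaluator would be asked to interpret."""
        if not self.readings:
            return False
        observed = mean(self.readings)
        return abs(observed - self.expected) / max(abs(self.expected), 1e-9) > self.tolerance

twin = DigitalTwin(expected=50.0)
for reading in (52.0, 49.0, 51.0):
    twin.ingest(reading)
print(twin.diverges())  # False: readings stay close to the model
```

The interesting evaluative work is hidden in `diverges()`: deciding what counts as a meaningful gap between the model and reality is a judgement about values and context, not just a threshold.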
As data analysis becomes an everyday activity through such models, this evolution calls for a fundamental redefinition of the evaluator's role. Rather than primarily producing evidence, evaluators will focus on navigating and interpreting the continuous flow of insights from real-time sources, whether through sensors (the "internet of things"), AI-driven polls, sentiment and behaviour analysis, or data generated by diverse actors. When new evidence is produced, it must be highly targeted and purpose-driven, aligned with the most critical areas for insight.
Evaluators can, of course, play a critical role in the design of such models, leveraging their core expertise to identify which types of evidence will best capture shifts in reality and to frame the questions that models can effectively answer. Their role goes well beyond that of a typical "data analyst": it is not just about ensuring accuracy, but about ensuring the data reflects complex and meaningful realities. Evaluators are uniquely positioned to identify gaps in data collection and in the model architecture, addressing biases or misrepresentations, and to ensure the evidence and findings resonate with real-world needs.
While data analysts may focus on processing and interpreting data at a granular level, evaluators synthesise evidence to provide a broader, strategic understanding. Automated systems can handle low-level, real-time queries (a practice already being tested by linking bots to such models). Evaluators, however, extend beyond routine inquiries by helping model designers and users grasp the capabilities and limitations of these systems. They play a pivotal role in distinguishing between questions that automated processes can handle and those that demand deeper, strategic analysis. The ability to interrogate models at a higher level is crucial, requiring a comprehensive understanding of stakeholder needs and the importance of context. Evaluators must, therefore, rediscover the value of intricate relationships and rigorous question design.
More importantly, evaluators can stay vigilant to the gaps between reality and digital models, given the risk of technocracy. They not only reveal the questions these models can answer but also recognise their limitations: methodological (the approaches and techniques the models use), epistemological (the knowledge they produce), and axiological (the values they reflect or overlook). By identifying the issues that still need attention in real-world contexts, evaluators ensure that the human element remains central to complex decision-making processes.
AI-Assisted Creative People-Driven Evaluation
The other landscape is AI-assisted, people-driven evaluation, which recognises that AI, and the models based on it, are not a substitute for reality and cannot fully capture the rich diversity of experiences, perceptions, and ideas surrounding change.
AI has tremendous potential to democratise evaluation, offering tools and knowledge to groups historically excluded from evaluative processes. This is particularly significant, for example, in the growing movement toward the localisation of evaluation, which emphasises context-specific, community-driven insights and ensures that local voices shape evaluative processes. It also resonates strongly with long-standing approaches such as participatory and feminist evaluation, which have championed empowerment and agency. While these approaches have been recognised and practised for years, they often remained at the margins. Now, however, they offer a crucial counterbalance to the increasing reliance on AI-driven models, ensuring a more inclusive and human-centred evaluation process.
Yet it is crucial to acknowledge that AI systems carry inherent biases. This makes it even more urgent to ensure that the voices of underrepresented and marginalised groups are not only included throughout the evaluation process but also not distorted by these biases, which could taint the analysis and sharing of findings if not adequately kept in check. Facilitators of these evaluative processes must, therefore, remain vigilant to this risk.
A critical caveat: people-driven evaluation goes far beyond simply analysing sentiment on social media, which can be skewed by bots, lacks reach, and often fails to capture the depth and diversity of real-world experiences. The focus here is on AI as a tool within authentic participatory processes, where it enhances engagement without preemptively shaping or guiding interactions. AI can be used to maximise and harvest meaningful participation, not to replace genuine involvement.
For example, in citizen monitoring initiatives, community members collect and analyse data themselves, deriving findings for advocacy and action. Participatory evaluative activities, where evaluators adopt a facilitative rather than "expert" role while still safeguarding and strengthening methodology to ensure relevance and validity, have long been established and are gaining traction globally. AI has the potential to further augment these processes on many fronts. It has already proven capable of supporting existing capacities, sharing methodological learning, streamlining interactions, efficiently summarising and pre-processing data, and tailoring communication to the diverse needs of stakeholders, ultimately enabling richer, more inclusive evaluations. Whether in urban planning, environmental monitoring, health programmes, crisis response, or agriculture, AI can amplify citizen voices while assisting in the complex task of processing, but not distorting, evidence.
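As a small illustration of the "summarising and pre-processing" role just mentioned, the Python sketch below tags free-text citizen reports against keyword rules and counts them. The reports, tag names, and keywords are all invented for the example; a real deployment would use richer language tools, but the point is that the tagging rules remain transparent and can be agreed with the community rather than imposed by an opaque model.

```python
from collections import Counter

# Hypothetical citizen-monitoring reports (free text collected by community members).
reports = [
    "Streetlight broken on Elm Road, unsafe at night",
    "Water pump works again, queue much shorter",
    "Clinic out of malaria medicine for two weeks",
    "Another streetlight out near the market",
]

# Minimal, transparent tagging rules, a stand-in for the heavier
# natural-language processing an AI assistant might provide.
TAGS = {
    "lighting": ("streetlight", "light"),
    "water": ("water", "pump"),
    "health": ("clinic", "medicine"),
}

def tag_report(text: str) -> list[str]:
    """Return every agreed tag whose keywords appear in the report."""
    lower = text.lower()
    return [tag for tag, words in TAGS.items() if any(w in lower for w in words)]

# Pre-process: count how often each concern appears across all reports.
summary = Counter(tag for r in reports for tag in tag_report(r))
print(summary.most_common())  # [('lighting', 2), ('water', 1), ('health', 1)]
```

Because the rules are visible, community members can challenge or refine them, which is exactly the kind of human control over AI-assisted pre-processing the text argues for.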
In this approach, humans and their relationships remain central and in control, using AI to capture a broader range of experiences and perspectives while ensuring that diverse voices are heard. These processes are inherently empowering. Importantly, they are not simply about "feeding more data into AI systems." This people-centred method fosters more inclusive and impactful evaluative processes, whose results can be better shared and heard in decision-making, truly democratising not only evaluation but the broader society.
It is then crucial to maintain strict oversight of AI, ensuring its use aligns with ethical standards and safeguards the integrity of local knowledge. AI must be adapted to local contexts, avoiding one-size-fits-all applications that often fail to capture the unique complexities of different communities. The most effective evaluative activities will not just use AI but critically engage with it, fostering a virtuous cycle that strengthens both the responsible use of AI and the localisation of its processes. This ensures that AI enhances evaluation by respecting, amplifying, and remaining accountable to the voices of the communities it serves, and that evaluation itself can act as a safeguard for AI's responsible use.
Complementarities
As diverse as they seem, these approaches are deeply complementary. AI-assisted, people-driven evaluation will serve as a critical reality check on AI models, illuminating issues that the technology alone cannot capture. It will help reveal unrecognised dynamics, the experiences of underrepresented groups and communities, and areas impacted by systemic biases. In turn, AI can make model outputs more accessible to citizens, enabling them to engage more deeply with the scenarios these models produce (as seen in projects like UrbanistAI).
Through participatory activities, citizens not only validate or challenge model outputs but also highlight what is missing or what does not align with their reality, ambitions, and expectations. This process enriches our understanding of current realities as well as possible and desired futures. Additionally, these participatory processes can raise public awareness of the models influencing people's lives, fostering greater accountability and transparency. By making citizens more conscious of the AI-driven governance shaping society, participatory evaluation strengthens civic engagement and helps ensure that governance remains inclusive, responsible, and human-centred.
In this way, evaluative activities will transcend mere technocratic checks, positioning themselves as safeguards to ensure that technology serves the public good and that decision-making stays grounded in diverse, human perspectives, consistently surfacing marginalised voices.
The challenges ahead
Evaluation, as it exists today, is set to be profoundly transformed by AI, and this shift will happen rapidly, with potentially enormous consequences for the field. The temptation toward automation carries a risk far greater than just "making evaluators redundant." The real challenge lies in the governance of AI-driven systems. In an era where data is often compared to "the new oil," controlling the flow of information and establishing governance structures around these systems must be carefully considered.
When examining the four factors likely to shape the future of evaluation (big data, timeliness, inclusiveness, and responsible use), it becomes clear that big data and speed are set to dominate, potentially leaving inclusiveness and ethical considerations lagging behind, despite strong advocacy for their importance.
Evaluators are uniquely positioned to rethink not only how we make sense of reality but also how we assess change, with a deep understanding of the values embedded within these systems. As traditional evaluation approaches become increasingly automated, evaluators must advocate for balance, ensuring that the human element, ethical reflection, and inclusivity are not overshadowed by the rapid and still largely unchecked advances in AI and big data.
Evaluation, as it stands, is indeed becoming a thing of the past, with more processes shifting toward automation. But the pressing question remains: can we make evaluation future-proof and relevant for the challenges ahead? The answer lies in transforming evaluative activities not only to harness the power of AI but to keep it anchored in human values, integrating responsibility, inclusiveness, and transparency into the evaluation process itself. Evaluators must play a key role in shaping the systems that will define the future of evaluation, and of society as a whole, ensuring that technological progress does not outpace ethical responsibility and that governance remains inclusive, equitable, and human-centred.
And, as usual, let's remember that what is now shaping the world, more than AI, is our attitude toward humanity. I stand in solidarity with the people oppressed in the face of genocide, occupation, and apartheid.