Philanthropic and investment entities often use a portfolio approach to address economic, social, and environmental challenges.
For example, they may fund multiple initiatives to enhance financial inclusion or expand access to digital skills training for marginalized communities. This approach allows them to test various strategies in diverse contexts and identify what works best and for whom. However, it creates a challenge for portfolio managers: how can they make confident decisions when presented with fundamentally different types of evidence?
Take two digital upskilling programs for entrepreneurs. One has gathered insights showing high participant satisfaction scores and a series of uplifting stories. The other has modest results from a well-implemented randomized evaluation. Which one should get continued investment? This question reveals an often frustrating challenge in portfolio management: while demonstrating impact is key to scaling successful interventions, an end-to-end approach to measurement isn’t always feasible. Portfolio managers therefore need a framework for combining disparate kinds of evidence.
This challenge emerges from the nature of portfolio work itself. Programs may be at different implementation stages, involve distinct initiatives, or offer different opportunities for measuring change based on their design and context. These differences naturally lead to varied impact measurement approaches. A startup testing a new financial inclusion solution requires different evaluation metrics than an established program preparing for national expansion.

Descriptive studies give an indication of immediate outcomes but may overstate long-term impact
Caribou’s analysis of 143 studies in our recently updated Small Business Evidence Map shows that different evaluation approaches tell different parts of an impact story. Initial assessments, like ex-post surveys or interviews, often present a positive view of a program. By capturing immediate reactions and experiences, they help programs adapt to local contexts and highlight key components that drive success.
This aligns with Bertrand and Mullainathan’s (2001) insights into the reliability of subjective survey data. Such data often capture immediate reactions and intentions rather than long-term behaviors. Participants, especially young entrepreneurs, may express optimism about their future, influenced more by personality and outlook than by any tangible support they received. McKenzie and Woodruff (2014) document similar patterns in business training evaluations, where participants’ stated intentions often diverged from actual implementation when faced with day-to-day time and resource constraints.
One of Caribou’s partners recently conducted a survey on the impact of a digital upskilling platform for entrepreneurs in a middle-income country. They asked about entrepreneurs’ perception of their business’ growth over the previous three months:
The data alone might be encouraging — over half of surveyed entrepreneurs reported business growth. This aligns with what we often see in entrepreneurship programs: participants tend to express optimism about their business trajectory. However, in this case their optimism has nothing to do with any support they received. Additional data reveals a fuller picture:
Comparing the control group to entrepreneurs who were actually invited to participate in the program shows almost identical levels of perceived growth. What initially looked like evidence of program success — high rates of perceived business growth among participants — was simply capturing the natural optimism of entrepreneurs and perhaps broader economic conditions. Without this comparison group, this positive outlook might have been mistakenly attributed to the program.
This example illustrates a broader point: the validity of findings depends heavily on study design and the potential for biases. Selection bias, recall bias, and attribution bias are common challenges. Without proper comparison, positive outcomes may be incorrectly attributed to a program’s influence. Yet understanding underwhelming or null results often requires looking beyond the numbers. Qualitative insights and implementation analysis can reveal more about the mechanisms behind underperformance.
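The comparison between invited entrepreneurs and the control group described above can be sketched as a simple difference in proportions. The example below is a minimal illustration, not the analysis our partner ran: the function name `compare_reported_growth` and all counts are hypothetical and were chosen only to mimic "almost identical" levels of perceived growth in the two groups. It uses a standard pooled two-proportion z-test and the Python standard library only.

```python
from math import erf, sqrt


def compare_reported_growth(growth_invited, n_invited, growth_control, n_control):
    """Compare the share reporting business growth in two groups.

    Uses a pooled two-proportion z-test (normal approximation).
    """
    p_invited = growth_invited / n_invited
    p_control = growth_control / n_control
    diff = p_invited - p_control

    # Pooled share under the null hypothesis of no difference between groups
    p_pool = (growth_invited + growth_control) / (n_invited + n_control)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_invited + 1 / n_control))
    z = diff / se if se > 0 else 0.0

    # Two-sided p-value from the standard normal CDF
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)

    return {
        "p_invited": p_invited,
        "p_control": p_control,
        "diff": diff,
        "z": z,
        "p_value": p_value,
    }


# Hypothetical counts for illustration only (NOT the survey data discussed above)
result = compare_reported_growth(
    growth_invited=275, n_invited=500,
    growth_control=270, n_control=500,
)
print(result)
```

With these made-up counts, more than half of each group reports growth, yet the gap between groups is a single percentage point and the test gives no reason to think it differs from zero. Looking only at the invited group would have suggested success; the comparison shows the program adds little to what entrepreneurs report anyway.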
Causal studies provide more reliable, and (usually) more modest, estimates of effectiveness
Causal studies, including Randomized Controlled Trials (RCTs), attempt to address these challenges by measuring actual behaviors rather than intentions. Their strict design and focus on specific pathways make it easier to attribute change to a program, though often they can only test limited pathways at a time. Between purely observational studies and RCTs lie other methods — like qualitative comparative analysis and process tracing — that can help establish causal relationships while accommodating complexity.
While causal studies excel at attributing outcomes to a program, they often do so at the cost of generalizability. They are costly and may miss subtle mechanisms or long-term effects shaped by cultural, institutional, or local contexts. The trade-off between internal validity (confidence that a measured effect was caused by the program) and external validity (the ability to generalize findings to other settings) can be significant in portfolio evaluation, yet often lies beyond the scope of experimental designs (see Ravallion 2001 for an entertaining and insightful discussion of such methodological tensions).
Why not both? Strategically matching impact evidence types
Effective portfolio facilitation requires matching evidence approaches to specific needs. A mixed methods approach can strengthen causal claims by combining qualitative insights with quantitative rigor. Qualitative methods can reveal underlying mechanisms, while descriptive quantitative analysis helps identify patterns that need further testing.
This matching process requires careful judgment. For instance, an upskilling intervention might seem ready for scaling based on participant support, but portfolio managers need to understand if it actually improves business outcomes and under what conditions. Meanwhile, a program showing modest impacts in an RCT may offer key implementation lessons that could boost impact in different iterations through a careful analysis of its components.
Four practical ways to achieve a nuanced understanding of portfolio impact
Understanding impact requires both rigor and context. As development practitioners, we need to appreciate the value and limitations of both observational and causal studies to develop a nuanced understanding of intervention outcomes. At Caribou, we employ four specific strategies to effectively navigate the challenges of diverse impact approaches within a portfolio:
- We strategically mix impact approaches within projects and portfolios. Mixed methods can be applied within a single project or across a portfolio of projects to get a fuller picture of impact. Within a single project, combining descriptive and causal studies provides both qualitative insights and rigorous validation, but doing so for every project is rarely feasible given resource and time constraints, and resource-intensive causal studies may be unnecessary where strong evidence is already available. At the portfolio level, a mix of study types across different projects can balance depth and rigor: observational methods where exploratory insights are needed, and causal studies where robust validation is possible. This strategic use of mixed methods gives us a more comprehensive understanding of impact across varied contexts.
- We exercise caution in interpretation. In practical terms, this means recognizing the limitations of each study type and not over-relying on any one method. For instance, when descriptive studies indicate a positive outcome, we consider whether biases might influence these findings. If causal evidence is lacking, we use the insights cautiously and look for additional validation before making major decisions or scaling interventions.
- We adapt evaluation methods by program phase. In early stages, we use descriptive methods to gather insights and inform design. As programs mature, we incorporate causal methods to validate impact and ensure robustness.
- We communicate findings transparently. We communicate clearly about the type of evidence used and its limitations, helping decision-makers make well-considered choices.
By strategically combining observational and causal studies, and understanding the strengths and limitations of each, portfolio managers can overcome the challenges of using diverse impact approaches and instead embrace their contributions. This helps achieve a balanced understanding of what works, for whom, and in what context.
If you are interested in discussing our approach, contact us.
Authors
Marius Karabaczek
Senior Manager, Measurement & Impact