Mixed-Methods Research at the Pace of Agile: 22 Product Teams, One Researcher

Context

The premise of "single-researcher coverage across 22+ product teams" sounds impossible until you decompose what coverage actually means. The teams in question were building components of AWS's financial workflow tooling — payment processing, billing, financial reporting, internal compliance instrumentation. Each had its own product manager, its own engineering team, its own cadence. They weren't all running at the same intensity at the same time, and they weren't all asking the same kinds of research questions.

The challenge was that each of those 22+ teams had real research needs that, if unaddressed, would lead to product decisions made on weak evidence. The traditional model — one researcher embedded with one team — was infeasible at the budget level the function operated within. The alternative model — researchers as a shared service, ticket-driven and reactive — fails for a different reason: it loses the strategic context that makes research valuable.

What emerged, over three and a half years, was a hybrid coverage model that treated different research types as different services with different cadences, embedded into team rituals rather than parallel to them.

Method

Coverage was decomposed across four research types, each with its own embedment pattern.

Foundational research (questions about user populations, mental models, and workflow patterns that change slowly) ran on a quarterly cadence and informed multiple teams simultaneously. A foundational study on, for example, how AWS Fintech users actually structure financial reconciliation workflows would inform 6–8 product teams whose work touched that workflow. The study was a single research investment producing widely distributed insight. These studies were usually mixed-methods: qualitative depth from 12–20 interviews, quantitative validation from a survey or behavioral telemetry pull, integrated into a single report with finding atoms tagged for repository retrieval.

Evaluative research (feedback on specific design proposals, prototypes, or shipped features) was structured as a "research sprint" pattern. Each sprint ran 5–7 business days in a fixed shape: 5 user sessions, a structured protocol, rapid synthesis, and a debrief with the requesting team within 48 hours of the last session. Teams could request a research sprint via the standardized intake. The constraint was capacity: roughly 6–8 sprints per quarter could be supported. The benefit was predictability: teams that booked a sprint knew exactly what they'd get and when. Most evaluative research used a small number of standardized templates from the methodology library.

Longitudinal research — diary studies, repeat-engagement quantitative surveys, behavioral pattern analysis over weeks or months — ran on a different timescale. These were the highest-leverage research investments because they revealed patterns invisible in point-in-time studies. The function ran 2–3 longitudinal programs concurrently. Findings from longitudinal work often landed unexpectedly — a quarterly diary study would surface a workflow pattern that turned out to be relevant to a team's current sprint planning. The match between insight and need wasn't always predictable, which is why longitudinal work has to be funded against organizational capacity, not specific team requests.

Large-N quantitative work — surveys with sample sizes large enough to support segmentation, behavioral data analysis, instrumentation studies — was structured around two predictable annual moments (a state-of-users survey, a workflow telemetry deep-dive) plus opportunistic deployments when high-priority questions could be answered with quantitative methods. These produced the data that gave the function its quantitative literacy and made qualitative findings more credible to engineering and PM stakeholders.

The coverage model worked because it matched research type to organizational rhythm, not the other way around. Teams running on agile cadences got evaluative research at agile speed. Strategic conversations got foundational and longitudinal investment that fed multi-quarter planning. The repository made all of this discoverable: a team's PM looking for "what do we know about how users handle reconciliation discrepancies" could pull finding atoms from a 2024 longitudinal study, a 2025 foundational survey, and three evaluative sprints, then synthesize their own answer without needing me to broker it.
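To make the self-service retrieval idea concrete, here is a minimal sketch of tag-based lookup over finding atoms. The actual repository schema is not described in this case study, so the `FindingAtom` fields, the tag vocabulary, and the `find_atoms` helper are all assumptions for illustration, and the statements are placeholders rather than real findings.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FindingAtom:
    """One tagged finding. All field names are hypothetical, not the real schema."""
    statement: str
    study_type: str  # "foundational", "evaluative", "longitudinal", or "quantitative"
    year: int
    tags: frozenset = field(default_factory=frozenset)


def find_atoms(repository, required_tags):
    """Return atoms carrying every requested tag, ordered by year then study type."""
    wanted = set(required_tags)
    matches = [atom for atom in repository if wanted <= atom.tags]
    return sorted(matches, key=lambda atom: (atom.year, atom.study_type))


# Placeholder records only; no real study content is reproduced here.
repository = [
    FindingAtom("Placeholder finding A", "longitudinal", 2024,
                frozenset({"reconciliation", "discrepancies"})),
    FindingAtom("Placeholder finding B", "foundational", 2025,
                frozenset({"reconciliation", "discrepancies"})),
    FindingAtom("Placeholder finding C", "evaluative", 2025,
                frozenset({"reconciliation", "discrepancies", "prototype"})),
    FindingAtom("Placeholder finding D", "evaluative", 2025,
                frozenset({"billing"})),
]

hits = find_atoms(repository, ["reconciliation", "discrepancies"])
for atom in hits:
    print(atom.year, atom.study_type, atom.statement)
```

The point of the sketch is the shape of the query, not the storage: a PM asks by topic tags and gets findings from several study types back in one list, which is what lets them synthesize without a researcher in the loop.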

Insight

The deepest lesson, and the one most often missed, is that research embedded into team rituals scales differently from research delivered through requests. The traditional model assumes researchers receive requests, scope them, execute, and deliver. That model collapses past about 6–8 product teams per researcher. What scales beyond that is research surfaces — quarterly readouts where multiple teams gather to absorb foundational findings, sprint patterns that teams can rely on without re-negotiating each time, repository structures that allow self-service synthesis. The researcher's role shifts from delivery agent to surface architect.

The second lesson is that methodology templates compound. The 19 templates we built — interview guides, survey instruments, usability protocols, JTBD interview guides, journey-mapping artifacts, requirements documents — weren't novel inventions. Each was a refinement of methodologies that already existed. Their value was their existence as a usable, retrievable artifact at the moment of need. Templates that don't exist when needed get reinvented poorly. Templates that exist and are easy to find get used and improved over time.

The third lesson, which I learned the hardest way, is that scaling research coverage is bounded by stakeholder absorption capacity, not researcher production capacity. The function could produce more research than the organization could metabolize. The bottleneck moved from "we don't have enough research" to "we don't read the research we have." This is what the ResearchOps build was actually fixing — making research absorbable, not just producible.

Impact

Across the period, the function ran roughly 110 evaluative research sprints, 14 foundational studies, 8 longitudinal programs, and 6 large-N quantitative deployments. Research outputs informed an estimated $2B+ in roadmap and feature decisions across the AWS Fintech organization. Directly attributable wins included contributions to 13 specific product features and an estimated 50% reduction in workflow friction in pilot studies, achieved through fit-gap analysis between as-is workflows and proposed system designs.
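As a rough consistency check, the totals can be set against the cadences described in the Method section. This sketch assumes "the period" is the same three and a half years mentioned in Context, which the text implies but does not state outright; the variable names are illustrative.

```python
# Assumption: the reporting period equals the three and a half years from Context.
QUARTERS = int(3.5 * 4)  # 14 quarters

SPRINTS_PER_QUARTER = (6, 8)   # stated evaluative capacity range
REPORTED_SPRINTS = 110         # "roughly 110" evaluative sprints
REPORTED_FOUNDATIONAL = 14     # foundational studies, run on a quarterly cadence

# Total sprints the stated capacity range would allow over the period.
sprint_low, sprint_high = (rate * QUARTERS for rate in SPRINTS_PER_QUARTER)

# Per-quarter rates implied by the reported totals.
implied_sprint_rate = REPORTED_SPRINTS / QUARTERS
foundational_per_quarter = REPORTED_FOUNDATIONAL / QUARTERS

print(f"Capacity range over {QUARTERS} quarters: {sprint_low} to {sprint_high} sprints")
print(f"Implied sprint rate: {implied_sprint_rate:.2f} per quarter")
print(f"Foundational studies per quarter: {foundational_per_quarter:.2f}")
```

Under that assumption the reported sprint count sits near the top of the stated capacity range, at a little under 8 sprints per quarter, and the foundational count matches one study per quarter.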

The methodology library was used across all 22+ product teams. Templates were updated quarterly based on what was learned from their use. Several templates were adopted by adjacent research functions in the broader AWS organization.

The hybrid coverage model held up across personnel changes, leadership transitions, and re-orgs. The structural pattern — foundational, evaluative, longitudinal, quantitative as separate services with separate cadences — proved more durable than I expected. Teams that joined the function later inherited it as the way research worked, not as an active design choice that required defending.

Reflection — what I'd do differently, what generalizes

The thing I'd do differently is invest more aggressively in longitudinal research from the start. Longitudinal work has the worst time-to-insight (weeks to months) of any research type, and is therefore the easiest to defer when teams are under sprint pressure. But longitudinal findings, once they land, change strategic conversations more than any other research type. The function spent its first year underweight on longitudinal, and the cumulative cost was significant. By year two, I'd corrected this; by year three, longitudinal work was funded as a baseline capacity rather than as an opportunistic investment.

What generalizes: the four-research-type decomposition (foundational, evaluative, longitudinal, quantitative), each with distinct embedment patterns, transfers well to other enterprise B2B research contexts. I've used variants in subsequent advisory work in healthcare technology and clinical software. The cadence numbers shift — healthcare moves slower; consumer tech moves faster — but the structural decomposition holds.

What does not generalize: the 22+ teams figure depended on AWS-specific organizational maturity. AWS had product managers who could read research without translation, engineering teams who'd worked with researchers before, design teams accustomed to evidence-driven iteration. In organizations without that maturity, a single researcher covering 22+ teams collapses regardless of the structural model. The coverage ratio scales with the organization's research literacy, not with the researcher's heroism.

For research leaders considering coverage models: the first diagnostic question is whether stakeholders ask for research outcomes or research outputs. Stakeholders asking for outputs ("we need 5 user interviews on this prototype") indicate an organization that hasn't yet learned to specify the question it wants answered. Stakeholders asking for outcomes ("we need to understand whether reconciliation workflows are sensitive enough to changes in posting cadence to justify the engineering work") are ready for the multi-service coverage model. Without that level of question literacy, the model is premature.
