What Is the Oral Defense?

The oral defense is the final component of Level 3 certification (EAS/CAEE) at eval.qa. Unlike the written exam—which tests breadth of knowledge across eval domains—the oral defense assesses your ability to think on your feet, defend technical decisions, and communicate eval concepts to critical audiences.

It is a 30-45 minute panel examination conducted by 2-3 master-level eval practitioners. They probe your portfolio of eval work, your methodology, and your reasoning, and they test how you handle edge cases and criticism. The goal is not to stump you but to confirm that your expertise is genuine rather than a set of memorized answers.

Why does it exist? Because eval expertise is as much about judgment and reasoning as it is about knowledge. Written exams can be passed by smart people who haven't done real work. Oral defenses cannot.

Format Overview: 30-45 Minutes, 2-3 Evaluators

Timeline

Panel Composition

Typically 2-3 experienced evaluators from different backgrounds: a practitioner with deep production eval experience, an academic or researcher with a background in methodological rigor, and possibly a domain specialist (healthcare, finance, AI safety, etc.). This diversity ensures your defense is tested from multiple angles.

Remote or In-Person?

Typically conducted via video conference (Zoom, Google Meet) to accommodate geography. Some in-person defenses are available in major cities. Technical requirements: stable internet, a quiet room, clear audio/video, and the ability to share your screen.

The Purpose

The oral defense is not designed to be adversarial or to trick you. Evaluators want to understand your thinking, test your ability to handle critique, and ensure your expertise is genuine. If you've done real eval work and can articulate your reasoning, you are well placed to pass.

The Five Examination Domains

1. Technical Eval Knowledge

What's being tested: Do you understand eval fundamentals deeply? Can you explain metrics, benchmark design, statistical validity, and technical tradeoffs?

Example questions:

2. Methodology Justification

What's being tested: Can you defend your choices? Do you understand the why behind your eval design, not just the what?

Example questions:

3. Stakeholder Communication

What's being tested: Can you communicate eval findings to non-technical audiences? Do you understand how to present results to build trust and drive decisions?

Example questions:

4. Ethical Reasoning

What's being tested: Do you think about the ethical implications of your eval? Do you understand potential harms (false positives, bias, unfair comparison)?

Example questions:

5. Practical Application

What's being tested: Can you apply eval knowledge to real problems? Can you iterate and improve based on feedback?

Example questions:

At a glance: 30-45 minutes total duration, 2-3 evaluators on the panel, a 5-10 minute opening presentation, and 5 domains being assessed.

Preparing Your Eval Portfolio for the Defense

What to Bring

Constructing Your Primary Case Study

Choose a project where:

Structure:

Tone: Honest, not polished marketing. Evaluators respect clarity, humility, and willingness to acknowledge limitations more than perfection.

Common Portfolio Mistakes

  • Presenting a project where you followed a template without making real decisions.
  • Overselling results or hiding limitations.
  • Choosing a project you can't defend in depth.
  • Submitting something from five years ago that you've forgotten the details of.

Keep your portfolio recent and genuine.

Opening Presentation Format: 5-10 Minutes, Tight Structure

Your opening presentation is your chance to frame the conversation. Make it punchy and strategic.

Recommended Structure

  1. "The Question" (1 min): Start with the business/research question that motivated your eval. Not background, not context—the core question. Example: "We built a new medical coding AI. Before deploying to hospitals, we needed to answer: does it reduce clinician burden without increasing error rates?"
  2. "The Approach" (2 min): High-level description of how you designed the evaluation. Key decisions. Example: "We ran a prospective study with 50 clinicians, comparing the AI against current workflow on three dimensions: speed, accuracy, and human satisfaction."
  3. "Key Finding" (1 min): Your most important result. Be specific. Example: "The AI reduced documentation time by 35% (p<0.01) with no significant difference in error rates, but clinician satisfaction dropped 20%."
  4. "So What?" (1 min): What decision did your eval enable? Example: "We decided to deploy with mandatory user training and weekly feedback loops, based on the satisfaction finding."
  5. "The Tradeoff" (1-2 min): Articulate one key limitation or tension in your eval. Show self-awareness. Example: "Our study was small and geographically limited. We're treating this as a pilot, knowing we'll need follow-up evaluation with larger and more diverse cohorts."
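A quick way to sanity-check the recommended structure against the 5-10 minute window is to add up the per-segment timings. The sketch below does only that arithmetic; the segment names and minute values come from the list above, while the names SEGMENTS, WINDOW, and total_range are illustrative and not part of any eval.qa tooling.

```python
# (label, minimum minutes, maximum minutes) for each recommended segment
SEGMENTS = [
    ("The Question", 1, 1),
    ("The Approach", 2, 2),
    ("Key Finding", 1, 1),
    ("So What?", 1, 1),
    ("The Tradeoff", 1, 2),
]
WINDOW = (5, 10)  # allowed opening length in minutes


def total_range(segments):
    """Return the shortest and longest total running time in minutes."""
    low = sum(seg[1] for seg in segments)
    high = sum(seg[2] for seg in segments)
    return low, high


low, high = total_range(SEGMENTS)
fits = WINDOW[0] <= low and high <= WINDOW[1]
print(f"Planned opening: {low}-{high} min, fits window: {fits}")
# prints: Planned opening: 6-7 min, fits window: True
```

The plan lands at 6-7 minutes, which leaves a few minutes of slack inside the window for pauses or a slower delivery on the day.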

What NOT to Do

Anticipated Question Categories: 20+ Sample Questions

Methodological Questions

Scenario-Based Questions

Reflection & Growth Questions

Ethical & Professional Questions

Communication Questions

Common Mistakes Candidates Make

Over-Scripting

The mistake: Memorizing answers word for word. When a question is phrased slightly differently, you get thrown off.

The fix: Prepare themes and key points, not scripts. Practice articulating the same idea in different ways. Evaluators can tell when you're reciting versus thinking.

Defensive Posture

The mistake: Taking critical questions as attacks. Answering with justifications rather than exploration. "That's not a fair question because..."

The fix: Treat tough questions as invitations to demonstrate your thinking. "That's a great point. Here's how I thought about that, and here are the tradeoffs..." Evaluators are impressed by intellectual humility, not defensiveness.

Inability to Admit Uncertainty

The mistake: Overconfident answers. Pretending you knew the answer when you didn't or claiming certainty where it doesn't exist.

The fix: "I don't know" is a strong answer if followed by "Here's how I'd approach finding out..." or "Here's what I should have done..." Show your reasoning process, not just your knowledge.

Losing the Thread

The mistake: Getting lost in technical details, in the opening presentation or in your answers, and forgetting to anchor back to the original question or stakeholder need.

The fix: Before each answer, pause and ask yourself: "How does this connect to the core eval question?" Explicitly articulate that connection.

Ignoring Non-Technical Dimensions

The mistake: Focusing entirely on methodology and metrics, ignoring stakeholder communication, ethical dimensions, or practical constraints.

The fix: In your opening and throughout, weave in evidence that you understand eval as a human/organizational endeavor, not just a technical one. "Our stakeholders needed to understand this result in 2 weeks, which is why we chose..."

Strong Defense Hallmarks

You can articulate the business/research question clearly. You can defend every methodological choice. You acknowledge limitations without apologizing for them. You can pivot between technical and non-technical language. You show curiosity about evaluator feedback and questions. You're comfortable saying "I don't know, but here's how I'd find out."

How Evaluators Score the Defense: The Rubric

Each dimension is rated Exceptional (A), Proficient (B), Developing (C), or Below Threshold (F).

Technical Knowledge
  • Exceptional (A): Demonstrates deep understanding of eval principles, can explain complex tradeoffs, handles technical questions with sophistication
  • Proficient (B): Solid understanding of methodology, can explain key decisions, mostly accurate on technical details
  • Developing (C): Basic understanding present, some gaps or inaccuracies, struggles with complex follow-up
  • Below Threshold (F): Significant gaps in understanding, inaccurate technical explanations

Methodology Justification
  • Exceptional (A): Every choice explicitly justified, aware of alternatives and why they chose differently, understands tradeoffs deeply
  • Proficient (B): Choices justified with reasonable rationale, aware of constraints and tradeoffs
  • Developing (C): Some justification present, but reasoning incomplete, may not have considered alternatives
  • Below Threshold (F): Cannot articulate why specific choices were made

Communication
  • Exceptional (A): Articulates complex ideas clearly to technical and non-technical audiences, tailors language to audience
  • Proficient (B): Communicates effectively to most audiences, minor clarity issues
  • Developing (C): Communication is clear to technical audiences, struggles with non-technical explanation
  • Below Threshold (F): Communication unclear, difficult to follow reasoning

Ethical Reasoning
  • Exceptional (A): Proactively identifies ethical concerns, explains mitigation thoughtfully, shows awareness of potential harms
  • Proficient (B): Identifies and addresses ethical concerns when raised, reasonable mitigation
  • Developing (C): Acknowledges ethics when prompted, limited depth
  • Below Threshold (F): Dismisses or ignores ethical concerns

Practical Application
  • Exceptional (A): Shows how eval informs decisions, can adapt methodology to new contexts, learns from experience
  • Proficient (B): Eval led to clear decisions, can adapt approach to some new constraints
  • Developing (C): Eval completed but impact unclear, limited flexibility in approach
  • Below Threshold (F): No clear connection between eval and decisions, cannot adapt

Passing threshold: Evaluators must rate you at least "Proficient" on four of the five dimensions, with no more than one "Developing" rating. Any "Below Threshold" rating is an automatic fail.
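The passing rule can be expressed as a short check, which is handy when scoring your own mock defenses. This is a minimal sketch, not eval.qa's scoring tool: it assumes one consensus rating per dimension (the guide does not say how individual panelists' ratings are combined), and the letter codes and the function name passes_defense are illustrative.

```python
from collections import Counter

# Letter codes follow the rubric: A = Exceptional, B = Proficient,
# C = Developing, F = Below Threshold.
PROFICIENT_OR_BETTER = {"A", "B"}
DEVELOPING = "C"
BELOW_THRESHOLD = "F"


def passes_defense(ratings: dict[str, str]) -> bool:
    """Apply the stated threshold to one rating per dimension (assumed consensus)."""
    counts = Counter(ratings.values())
    if counts[BELOW_THRESHOLD] > 0:
        return False  # any Below Threshold rating fails outright
    strong = sum(counts[r] for r in PROFICIENT_OR_BETTER)
    return strong >= 4 and counts[DEVELOPING] <= 1


# Hypothetical ratings from a mock defense, for illustration only.
example = {
    "Technical Knowledge": "A",
    "Methodology Justification": "B",
    "Communication": "B",
    "Ethical Reasoning": "C",
    "Practical Application": "B",
}
print(passes_defense(example))  # True: four Proficient-or-better, one Developing
```

With five dimensions and no failing rating, "at least four Proficient" and "at most one Developing" describe the same outcome; the sketch keeps both checks so it mirrors the wording of the rule.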

Practice Session Guide: How to Prepare Effectively

Phase 1: Solo Preparation (2-3 weeks before defense)

Week 1: Finalize your case study. Write it out. Read it aloud. Time yourself on the opening (aim for 7-8 minutes).

Week 2: Generate anticipated questions (use the list above). For each, write a 2-3 sentence core answer. Practice pivoting to different angles.

Week 3: Record yourself presenting the opening. Watch it back. Refine clarity, pacing, and eye contact (even though you're only recording).

Phase 2: Peer Mock Defenses (1-2 weeks before)

Find 2-3 peer evaluators: Ideally other eval professionals or people who've passed the Level 3 defense. If not available, anyone with critical thinking skills works.

Structure each mock (about 45 min total):

  • 5-7 min: Your opening presentation
  • 25-30 min: Evaluators ask questions (use the anticipated list)
  • 10 min: Feedback. What worked? Where were you unclear? Where could you be stronger?

Do 2-3 mocks minimum. First one will be rough. Second and third will reveal patterns in your weaker areas.

Phase 3: Refinement (1 week before)

Focus on the weaknesses revealed in mocks. If you struggled with ethical questions, prepare deeper answers on ethics. If your communication was unclear, practice simplifying your explanations. If you were defensive, practice responding to tough questions with curiosity.

Do one final mock with critical feedback.

What Feedback to Prioritize

  • Clarity: "I didn't follow your explanation." Address immediately.
  • Coherence: "I don't see how X connects to Y." Explicitly articulate connections.
  • Depth: "You said that but didn't justify it." Add reasoning.
  • Confidence: "You seemed unsure of that answer." Practice the answer until it's solid.

Day-of Protocol: Logistics and Nerves

Technical Setup (30 min before)

  • Test your internet connection, camera, microphone, speaker.
  • Open your slides or materials in a separate window.
  • Have your case study document easily accessible (but don't read from it).
  • Quiet room, no distractions, door closed.
  • Wear business casual (you're being recorded/observed).

Mental Preparation (15 min before)

  • Review your opening one time. Don't memorize—just familiarize.
  • Remember: evaluators want you to pass. They're rooting for you.
  • Nerves are normal and expected. Evaluators know you'll be nervous.
  • Focus on authenticity over perfection. A genuine answer with some stumbling is better than a polished non-answer.

During the Defense

  • Opening: Make eye contact with the camera (imagine evaluators are there). Speak slowly and clearly.
  • Questions: Pause before answering. Take your time. If you need a moment to think, say so.
  • If you don't know: "That's a great question. I didn't encounter that in my case study, but here's how I'd approach it..." or "I'm not sure, but here's what I would need to research."
  • If you misspeak: "Let me rephrase that..." and continue. Don't apologize profusely.
  • Take notes: Jotting down evaluator questions shows you're engaged and helps you remember to follow up if time permits.

Managing Nerves

  • Deep breathing before the call and between segments.
  • Remember: you've prepared. You know this material. Evaluators know you're nervous.
  • Nerves show you care. That's a good sign.
  • If your voice shakes, keep going. It's normal.

Preparation Timeline: 6 Weeks to Defense

  • Weeks 1-2: Finalize case study. Write it up. Get feedback from mentor or peer.
  • Weeks 3-4: Develop anticipated questions list. Write core answers. Practice opening presentation 5+ times.
  • Weeks 5-6: Conduct 2-3 mock defenses with peers. Get critical feedback. Refine weak areas. Do final preparation.
  • Day before: Light review. Get good sleep. Avoid cramming.
  • Day of: Tech check 30 min before. Breathe. You're ready.
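To turn the six-week plan into calendar dates, you can count back from your scheduled defense. The sketch below is one way to do that with Python's standard library; the function name prep_schedule and the example date are hypothetical, and the task labels are condensed from the timeline above.

```python
from datetime import date, timedelta


def prep_schedule(defense_day: date) -> list[tuple[date, date, str]]:
    """Map the six-week plan onto dates, counting back from the defense."""
    start = defense_day - timedelta(weeks=6)
    # (first week offset, end week offset, task) for each two-week block
    blocks = [
        (0, 2, "Finalize case study; get mentor or peer feedback"),
        (2, 4, "Draft anticipated questions and core answers; rehearse opening"),
        (4, 6, "Run 2-3 mock defenses; refine weak areas"),
    ]
    schedule = []
    for first_week, end_week, task in blocks:
        begin = start + timedelta(weeks=first_week)
        end = start + timedelta(weeks=end_week) - timedelta(days=1)
        schedule.append((begin, end, task))
    return schedule


# Hypothetical defense date, chosen only for illustration.
for begin, end, task in prep_schedule(date(2025, 6, 16)):
    print(f"{begin} to {end}: {task}")
```

The last block ends the day before the defense, which the timeline reserves for light review and sleep rather than new work.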