System Architecture

AI Quality is Workflow Design

Why model accuracy is not enough for high-stakes healthcare data.

Tabitha Khadse

Most AI conversations start with accuracy.

What was the score? How well did the model perform? Did the output look right?

But in healthcare, that is not enough. A model can look accurate in a demo and still fail in the workflow.

The Focus: Before the Model

I built the Healthcare Data Annotation Workflows project to study the work that happens before the model output.

A model can score well on easy cases but fail on rare cases. It can look strong in testing, but create confusion when the workflow has no review path.

AI quality has to be designed as a workflow. Not just measured as a number.

Project Architecture

  • Operating Models
  • Workflow Design
  • Schema Contracts
  • Adjudication
  • Consensus Logic
  • Drift Simulation

Workflow Problem 1: Unclear Labels

A label sounds simple until reviewers disagree. One sees it one way. Another sees it differently. A third is unsure.

That does not always mean the reviewers are wrong. It often means the label definition is weak.

The Root Causes:

  • The guideline is too broad.
  • The examples are too clean.
  • The categories overlap.

If that uncertainty isn't handled, the model learns from inconsistent data.
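
One way to find weak label definitions is to check which labels reviewers split on most often. The sketch below is a minimal illustration, not the project's actual code: the function name `disagreement_by_label` and the example labels are assumptions made for this example.

```python
from collections import defaultdict


def disagreement_by_label(cases):
    """For each label, return the share of cases involving that label
    where the reviewers did not all choose the same label.

    cases: list of lists, one inner list of reviewer labels per case.
    """
    seen = defaultdict(int)   # cases where the label was chosen by anyone
    split = defaultdict(int)  # of those, cases where reviewers disagreed
    for labels in cases:
        distinct = set(labels)
        for label in distinct:
            seen[label] += 1
            if len(distinct) > 1:
                split[label] += 1
    return {label: split[label] / seen[label] for label in seen}


# Hypothetical labels from three reviewers per case.
example_cases = [
    ["diagnosis", "diagnosis", "diagnosis"],
    ["diagnosis", "symptom", "diagnosis"],
    ["history", "history", "history"],
    ["symptom", "diagnosis", "symptom"],
    ["history", "history", "history"],
]
rates = disagreement_by_label(example_cases)
```

In this made-up data, "history" is never contested, while every case involving "symptom" is split. That points to a guideline that does not separate "symptom" from "diagnosis" clearly enough, rather than to careless reviewers.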

Workflow Problem 2: No Path for Uncertainty

In high-stakes healthcare data, uncertainty needs a path. Reviewers should not be forced to guess. They must be able to flag unclear cases.

Find

Reviewers identify cases that do not fit the current rules. They do not guess. They flag uncertainty.

Resolve

Those cases are reviewed and clarified. This turns uncertainty into process improvement.

Label

The updated rule is applied consistently. Without that loop, teams move fast but create weak labels.
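
The Find, Resolve, Label loop can be sketched as a small set of allowed state changes. This is a minimal sketch under assumptions: the `Case` fields, the `Status` values, and the `submit` and `resolve` helpers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    LABELED = "labeled"    # reviewer was confident enough to label
    FLAGGED = "flagged"    # Find: reviewer raised uncertainty
    RESOLVED = "resolved"  # Resolve + Label: clarified rule applied


@dataclass
class Case:
    case_id: str
    label: Optional[str] = None
    status: Optional[Status] = None
    flag_reason: Optional[str] = None
    guideline_version: str = "v1"


def submit(case, label=None, flag_reason=None):
    """A reviewer either labels the case or flags it, never both or neither."""
    if (label is None) == (flag_reason is None):
        raise ValueError("provide exactly one of label or flag_reason")
    if label is not None:
        case.label, case.status = label, Status.LABELED
    else:
        case.flag_reason, case.status = flag_reason, Status.FLAGGED
    return case


def resolve(case, label, guideline_version):
    """Apply the clarified rule to a flagged case and record its version."""
    if case.status is not Status.FLAGGED:
        raise ValueError("only flagged cases can be resolved")
    case.label = label
    case.guideline_version = guideline_version
    case.status = Status.RESOLVED
    return case


# Hypothetical walk through the loop.
unclear = submit(Case("c-101"), flag_reason="fits two categories")
resolve(unclear, label="diagnosis", guideline_version="v2")
```

The design choice is that guessing is not a valid move: a reviewer has to either commit to a label or give a reason, and the resolved case keeps both the reason and the guideline version that settled it.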

Workflow Problem 3: Agreement Scores Can Hide Risk

Phantom Agreement

This happens when the overall agreement score looks strong because most cases are easy. Reviewers agree on the simple records. The score looks good. But the hard cases are still failing.

Healthcare AI teams need more than one overall score. They need to measure agreement on hard cases, outliers, ambiguous cases, and new data patterns. If quality checks only measure the average, the risky cases can disappear inside the metric.
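
A small worked example shows how the average can hide the hard cases. This is a sketch with made-up data, and it uses raw percent agreement between two reviewers for simplicity; a chance-corrected measure such as Cohen's kappa would be the stricter choice. The function names and the "easy" and "hard" strata are assumptions for illustration.

```python
def agreement_rate(pairs):
    """Share of (label_a, label_b) pairs where both reviewers agree."""
    pairs = list(pairs)
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


def agreement_by_stratum(records):
    """records: iterable of (stratum, label_a, label_b) tuples.

    Returns (overall_rate, {stratum: rate}).
    """
    records = list(records)
    groups = {}
    for stratum, a, b in records:
        groups.setdefault(stratum, []).append((a, b))
    overall = agreement_rate((a, b) for _, a, b in records)
    return overall, {s: agreement_rate(p) for s, p in groups.items()}


# Hypothetical data: 16 easy cases (all agree), 4 hard cases (1 agrees).
records = (
    [("easy", "A", "A")] * 16
    + [("hard", "A", "A")]
    + [("hard", "A", "B")] * 3
)
overall, by_stratum = agreement_by_stratum(records)
```

Here the overall agreement is 85%, which looks healthy. Split by difficulty, the easy cases sit at 100% and the hard cases at 25%. The single number never shows that.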

Workflow Problem 4: The Schema is Too Thin

AI quality also depends on what the system records. A label alone is not enough. A stronger annotation schema should capture:

  • Label
  • Confidence
  • Reviewer role
  • Guideline version
  • Time spent
  • Flags
  • Consensus round
  • Reason for uncertainty

This matters because quality needs traceability. If a label changes, the team should know why. If reviewers disagree, the team should know where. The schema is part of the quality system.
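
The fields above could be expressed as a typed record with a few validation rules. This is a minimal sketch, not the project's actual schema: the class name, the `case_id` field, the 0 to 1 confidence scale, the seconds unit, and the "uncertain" flag value are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotationRecord:
    case_id: str                 # assumption: an identifier for the record
    label: str
    confidence: float            # assumption: scale from 0.0 to 1.0
    reviewer_role: str
    guideline_version: str
    time_spent_seconds: float    # assumption: time is stored in seconds
    flags: Tuple[str, ...] = ()
    consensus_round: int = 1
    uncertainty_reason: Optional[str] = None

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
        if self.consensus_round < 1:
            raise ValueError("consensus_round starts at 1")
        # A flagged-uncertain record without a reason is not traceable.
        if "uncertain" in self.flags and not self.uncertainty_reason:
            raise ValueError("uncertain records need an uncertainty_reason")


# Hypothetical record.
record = AnnotationRecord(
    case_id="c-101",
    label="diagnosis",
    confidence=0.6,
    reviewer_role="clinical reviewer",
    guideline_version="v2",
    time_spent_seconds=95.0,
    flags=("uncertain",),
    consensus_round=2,
    uncertainty_reason="fits two categories",
)
```

Making the record immutable means a changed label is written as a new record, not an overwrite, so the history of why a label changed stays intact.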

Workflow Problem 5: Drift is Missed

Quality Drifts Quietly

The labels start consistent. Then new data appears. Reviewers interpret rules differently. Guidelines change. The team keeps moving. The output still looks complete. But quality starts to shift.

Quality Needs Monitoring

That is why the project includes drift simulation. Without gold-standard reviews, reviewer calibration, and hard-case checks, the team may not notice that quality is changing.
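
A gold-standard check can be sketched as a comparison of each batch against a small set of cases with known answers. This is an illustrative sketch, not the project's drift simulation: the function names, the 0.10 tolerance, and the choice of the first batch as the baseline are assumptions.

```python
def gold_agreement(labels, gold):
    """Share of gold-standard cases in this batch labeled correctly.

    labels, gold: dicts mapping case_id to label.
    """
    shared = [case_id for case_id in gold if case_id in labels]
    if not shared:
        return None
    return sum(labels[c] == gold[c] for c in shared) / len(shared)


def drifting_batches(batches, gold, tolerance=0.10):
    """Return (batch_name, rate) for batches that fall more than
    `tolerance` below the first batch that contains gold cases."""
    baseline = None
    flagged = []
    for name, labels in batches:
        rate = gold_agreement(labels, gold)
        if rate is None:
            continue  # no gold cases seeded in this batch
        if baseline is None:
            baseline = rate
            continue
        if baseline - rate > tolerance:
            flagged.append((name, rate))
    return flagged


# Hypothetical gold set and weekly batches.
gold = {"g1": "A", "g2": "B", "g3": "A", "g4": "B"}
batches = [
    ("week1", {"g1": "A", "g2": "B", "g3": "A", "g4": "B"}),
    ("week2", {"g1": "A", "g2": "B", "g3": "A", "g4": "B", "n9": "A"}),
    ("week3", {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}),
]
drifted = drifting_batches(batches, gold)
```

In this made-up data, every batch is complete and looks normal, but week3 matches the gold set on only half of the seeded cases and is the only batch flagged.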

Why Model Accuracy is Not Enough

Model accuracy tells you something. But it does not answer every workflow question.

  • Was the training data labeled consistently?
  • Were edge cases reviewed?
  • Were uncertain cases flagged?
  • Were disagreements resolved?
  • Was the schema detailed enough?
  • Was the guideline version tracked?
  • Was there an audit trail?
  • Did quality drift over time?

In healthcare, these workflow questions matter.

The Engineering Decision: Treat AI Quality like a System.

  • Inputs
  • Rules
  • States
  • Exceptions
  • Review Paths
  • Escalations
  • Outputs
  • Audit Trails

This is the same thinking used in software engineering. Forms need validation. Dashboards need useful signals. Automation needs exception handling. APIs need contracts. Healthcare AI needs the same level of workflow discipline.

Why My RCM Background Helped

My revenue cycle background helped me see this clearly. RCM taught me that rules, documentation, status, exceptions, and audit trails matter.

Authorizations matter. Payer rules matter. Denial reasons matter. A small workflow gap can create rework later.

The RCM to AI Quality Parallel:

If the input workflow is messy, the output will be hard to trust. If the rules are unclear, the result will be inconsistent. If exceptions are not tracked, the same problems repeat.

Good AI systems need good workflow design.

AI quality is not only about the model. It is about the workflow around the model: the labels, the rules, the review process, the disagreement path, the schema, the edge cases, the drift checks, and the audit trail.

Recruiter Signal

Target Roles

  • Software Engineer
  • Full Stack Engineer
  • Frontend Engineer

Domain Focus

  • Healthcare Technology
  • AI Quality & Workflows
  • Internal Platforms

Core Strengths

  • Data Quality Tools
  • Workflow Automation
  • Validation Logic

"My strongest fit is a role where software needs to support real workflows, not just display information. I want to build tools where accuracy, review, documentation, and trust matter."

Thank You! Let's Connect.

If you are a recruiter, hiring manager, engineer, or healthcare technology team looking for someone who can connect healthcare workflow thinking with software engineering, I would be happy to connect.

Tabitha Khadse

Sources Reviewed For Project:

  • FDA Good Machine Learning Practice
  • HIMSS Responsible AI Governance
  • STAPLE medical imaging consensus research
  • Medical inter-annotator agreement research