On-device Medical Chatbot for Nurse-Midwives

Researcher & Engineer · D-tree International, Zanzibar, Tanzania

Feb 2026 – Present

Built an on-device medical chatbot running fully offline on edge devices to provide real-time, evidence-based guidance to nurse-midwives in low-connectivity settings in Zanzibar.

nmrenyi/mamai

On-device medical search for nurses and midwives in Zanzibar — offline RAG with Gemma 4 on Android

Motivation

In Zanzibar, maternal and newborn mortality remain a significant challenge, disproportionately affecting rural households.
Nurse-midwives frequently face complex cases, but accessible guidance at the point of care is severely limited.
High data costs and poor network connectivity make online searches unreliable.

That's why we developed MAM-AI, an on-device AI assistant built for nurse-midwives in Zanzibar that provides real-time, evidence-based, locally relevant guidance.

No internet needed. Always available.

Demo

A short demo of the on-device chatbot running offline on an edge device.

System Design

RAG pipeline running fully offline on an Android edge device.

The app uses Retrieval-Augmented Generation (RAG): the nurse-midwife's query is embedded and used to retrieve the most relevant passages from indexed training materials, and those passages are injected into the prompt alongside the original question. Gemma 4 E4B (int4 quantization) then generates the answer entirely on-device, with no internet connection required.
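The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not the app's implementation: the bag-of-words `embed` function stands in for a real embedding model, the passages are made-up placeholders rather than the indexed training materials, and `retrieve` and `build_prompt` are hypothetical names.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: term counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, retrieved):
    """Inject the retrieved passages into the prompt next to the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved))
    return (
        "Use the reference passages to answer the question.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Placeholder passages for illustration only (not clinical guidance).
PASSAGES = [
    "Postpartum hemorrhage is heavy bleeding after birth and needs urgent management.",
    "Newborns should be kept warm with skin-to-skin contact after delivery.",
    "Eclampsia involves seizures in a woman with high blood pressure in pregnancy.",
]

query = "How do I manage heavy bleeding after birth?"
top = retrieve(query, PASSAGES, k=2)
prompt = build_prompt(query, top)  # this string would be passed to the on-device model
print(prompt)
```

In a real deployment the passage embeddings would typically be computed once at indexing time and stored on the device, so only the query is embedded per request.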

Dataset

20,534 total QA pairs (OBGYN questions)

5 source benchmarks (public medical datasets)

To evaluate the on-device model, we curated an open OBGYN QA dataset by aggregating five public medical benchmarks with pan-African, Indian, Kenyan, and US origins. The dataset is released as open source at obgyn-qa-collection.

Dataset	Items	Format	Geographic Focus
AfriMed-QA	697	MCQ + Short Answer	Pan-African (Ghana, Nigeria, Kenya, Malawi, South Africa)
MedMCQA	18,508	Multiple Choice	India (entrance exams)
Kenya Clinical Vignettes	284	Clinical scenarios	Kenya
MedQA-USMLE	1,025	Board-style MCQ	USA
Women's Health Benchmark	20	Expert prompts	-
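
As a quick check on the composition, the item counts in the table can be summed and converted to per-source shares. The counts below are copied from the table; the variable names are illustrative only.

```python
# Item counts copied from the dataset table.
SOURCE_COUNTS = {
    "AfriMed-QA": 697,
    "MedMCQA": 18_508,
    "Kenya Clinical Vignettes": 284,
    "MedQA-USMLE": 1_025,
    "Women's Health Benchmark": 20,
}

total = sum(SOURCE_COUNTS.values())
shares = {name: count / total for name, count in SOURCE_COUNTS.items()}

print(f"total items: {total:,}")
for name, share in shares.items():
    print(f"{name:28s} {share:6.1%}")
```

The five sources add up to the 20,534 total, and MedMCQA alone contributes roughly 90% of the items, so a single pooled score would mostly reflect MedMCQA. Reporting results per source is one way to keep the smaller benchmarks visible.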

Model

We deploy Gemma 4 E4B (released on April 2, 2025), Google's latest on-device model optimized for edge hardware, running with int4 quantization.

Evaluation

Full evaluation results coming soon.

Answering Accuracy

We evaluate model accuracy across multiple medical QA benchmarks, including MCQ datasets (AfriMed-QA, MedQA-USMLE, MedMCQA) and open-ended clinical vignettes (Kenya Clinical Vignettes, AfriMed-QA short-answer questions, Women's Health Benchmark (WHB) Stumps). Open-ended responses are scored by an LLM judge on five criteria: accuracy, safety, completeness, helpfulness, and clarity.
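
The two scoring modes can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the 1 to 5 judge scale is an assumption (the scale is not stated here), and the example scores are made up to show the arithmetic, not actual results.

```python
from statistics import mean

# The five judge criteria named in the text.
CRITERIA = ("accuracy", "safety", "completeness", "helpfulness", "clarity")

def mcq_accuracy(predictions, gold):
    """Share of MCQ items where the predicted option letter matches the key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)

def mean_judge_scores(judgements):
    """Average each criterion over all judged open-ended responses."""
    return {c: mean(j[c] for j in judgements) for c in CRITERIA}

# Made-up inputs for illustration only.
acc = mcq_accuracy(["A", "c", "B", "D"], ["A", "C", "D", "D"])
scores = mean_judge_scores([
    {"accuracy": 4, "safety": 5, "completeness": 3, "helpfulness": 4, "clarity": 5},
    {"accuracy": 2, "safety": 3, "completeness": 3, "helpfulness": 2, "clarity": 4},
])
print(acc, scores)
```

Keeping the criteria separate rather than collapsing them into one number makes it possible to see, for example, a response that is clear but unsafe.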

Results coming soon.

Latency

We benchmark on-device latency on real Android hardware, measuring time-to-first-token (TTFT), decode throughput (tokens/sec), and end-to-end query time across short, medium, and long clinical queries.
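
The three latency metrics can be derived from per-token timestamps. The sketch below is written in Python for illustration and is not the on-device benchmarking code; `measure_stream` is a hypothetical helper, and it assumes decode throughput is computed over the tokens that follow the first one, which is one common convention.

```python
import time

def measure_stream(token_stream, clock=time.perf_counter):
    """Consume a token stream and report TTFT, decode throughput, and total time."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in token_stream:
        now = clock()
        if first is None:
            first = now  # arrival of the first token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    decode_time = last - first
    return {
        "ttft_s": first - start,
        # Tokens after the first, divided by the time spent producing them.
        "decode_tok_per_s": (count - 1) / decode_time if decode_time > 0 else 0.0,
        "end_to_end_s": last - start,
        "tokens": count,
    }

# Deterministic demo with a fake clock: start at 0.0 s, tokens at 0.5, 0.75, 1.0, 1.25 s.
fake_ticks = iter([0.0, 0.5, 0.75, 1.0, 1.25])
stats = measure_stream(iter(["tok"] * 4), clock=lambda: next(fake_ticks))
print(stats)
```

Separating TTFT from decode throughput matters because prompt length (including retrieved passages) mainly affects the former, while answer length mainly affects the latter.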

Results coming soon.

Stability

We evaluate response consistency under repeated identical queries and across varying conversation history lengths, assessing whether the model produces reliable outputs under the constraints of on-device inference.
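
One simple way to quantify consistency over repeated runs of the same query is sketched below. The two metrics (share of runs matching the most common response, and mean pairwise Jaccard overlap of word sets) are illustrative choices, not necessarily the ones used in this evaluation, and the sample responses are placeholders.

```python
import re
from collections import Counter
from itertools import combinations

def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())

def jaccard(a, b):
    """Word-set overlap between two responses (1.0 means identical vocabularies)."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def consistency(responses):
    """Summarize agreement across repeated responses to one query."""
    if len(responses) < 2:
        raise ValueError("need at least two responses")
    modal_count = Counter(normalize(r) for r in responses).most_common(1)[0][1]
    pairs = list(combinations(responses, 2))
    return {
        "modal_agreement": modal_count / len(responses),
        "mean_pairwise_jaccard": sum(jaccard(a, b) for a, b in pairs) / len(pairs),
    }

# Placeholder responses from three hypothetical runs of the same query.
runs = [
    "Refer the patient immediately.",
    "refer the patient  immediately.",
    "Monitor and reassess in one hour.",
]
result = consistency(runs)
print(result)
```

Lexical overlap is only a rough proxy: two answers can share few words and still agree clinically, so a semantic or judge-based comparison may be needed alongside it.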

Results coming soon.

Dangerous Scenario Recognition

This dedicated evaluation examines how the app handles high-stakes clinical emergencies, including postpartum hemorrhage, eclampsia, neonatal respiratory distress, and sepsis. We assess whether the model correctly identifies emergency escalation triggers, avoids underreacting to critical presentations, and produces safe, actionable guidance aligned with official protocols.
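
Escalation recognition can be summarized with a few rates once each test case has been labeled. The sketch below assumes every case carries a gold `is_emergency` flag and an `escalated` flag recording whether the model's response escalated; both field names and the example cases are hypothetical, and how `escalated` is determined (human review or an LLM judge) is left open.

```python
def escalation_metrics(cases):
    """Compute sensitivity, underreaction rate, and false-alarm rate."""
    emergencies = [c for c in cases if c["is_emergency"]]
    routine = [c for c in cases if not c["is_emergency"]]
    missed = sum(not c["escalated"] for c in emergencies)
    false_alarms = sum(c["escalated"] for c in routine)
    return {
        # Share of true emergencies the model escalated.
        "sensitivity": (len(emergencies) - missed) / len(emergencies) if emergencies else None,
        # Share of true emergencies the model failed to escalate.
        "underreaction_rate": missed / len(emergencies) if emergencies else None,
        # Share of routine cases escalated unnecessarily.
        "false_alarm_rate": false_alarms / len(routine) if routine else None,
    }

# Made-up labels for illustration only.
cases = [
    {"is_emergency": True, "escalated": True},
    {"is_emergency": True, "escalated": True},
    {"is_emergency": True, "escalated": True},
    {"is_emergency": True, "escalated": False},
    {"is_emergency": False, "escalated": True},
    {"is_emergency": False, "escalated": False},
]
metrics = escalation_metrics(cases)
print(metrics)
```

In this setting the underreaction rate is the safety-critical number, since a missed emergency is far more costly than an unnecessary escalation.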

Results coming soon.