Google Research Introduces MedGemma 1.5: Enhancing AI for Medical Imaging and Speech Recognition

Introduction

Google Research has announced MedGemma 1.5, an updated version of its open multimodal large language model for medical applications. The update improves the model's handling of complex medical imaging and text-based tasks, making it more useful to developers building AI-driven healthcare solutions. It arrives alongside MedASR, a new speech-to-text model specialized for medical dictation; together, the two aim to streamline clinical workflows and improve accessibility in medical settings. The release is part of Google's Health AI Developer Foundations (HAI-DEF) program and reflects the company's emphasis on open models that can run efficiently, including offline.

Key Features and Architecture of MedGemma 1.5

MedGemma 1.5 is built as a multimodal model capable of processing text, 2D images, and high-dimensional medical data. The primary variant, MedGemma 1.5 4B, features 4 billion parameters, allowing it to run on resource-constrained devices without internet connectivity. This makes it suitable for offline analysis in clinical environments where data privacy and low latency are critical. The model integrates with MedSigLIP, an image encoder, to support inputs like multiple slices from CT or MRI scans and patches from histopathology slides.
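To give a rough sense of why a 4-billion-parameter model can fit on modest hardware, the sketch below estimates weight storage at several numeric precisions. It is a back-of-the-envelope calculation, not a published figure: it assumes exactly 4e9 parameters (the "4B" label is nominal) and ignores activations, caches, and runtime overhead. The function name and the precision table are illustrative.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Weights only: activations, caches, and framework overhead are excluded.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    nominal_params = 4e9  # assumption: "4B" taken literally
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gb(nominal_params, dtype)
        print(f"{dtype:>8}: ~{size:.0f} GB of weights")
```

Actual memory use depends on the runtime, the quantization scheme, and the input size, so these numbers are a lower bound at best.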

The architecture builds on the original MedGemma 1 with new training datasets and techniques, enabling native interpretation of full 3D scans such as CTs and MRIs, a capability Google describes as a first for an open medical generalist model. The 27B parameter model from MedGemma 1 remains available for more demanding text-heavy tasks. Full DICOM support is available when the model is deployed on Google Cloud's Vertex AI, which eases integration into professional healthcare systems.

Performance Improvements and Benchmarks

MedGemma 1.5 shows substantial gains over its predecessor, MedGemma 1 4B, across a range of benchmarks. In high-dimensional imaging, accuracy on MRI disease-related findings classification rises 14 percentage points (65% vs. 51%), and accuracy on CT findings rises 3 points (61% vs. 58%). For histopathology slides, it matches the performance of specialized models with a ROUGE-L score of 0.49 on single-slide cases.
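For readers unfamiliar with ROUGE-L, the metric scores a generated text by the longest common subsequence (LCS) of words it shares with a reference. The sketch below uses the common F1 form with whitespace tokenization and lowercasing; these are simplifying assumptions, and the exact variant and tokenizer behind the 0.49 figure are not stated in the announcement. The example strings are made up.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the harmonic mean of LCS precision and LCS recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "invasive ductal carcinoma grade two"  # illustrative text
    candidate = "invasive ductal carcinoma"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.2f}")
```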

Anatomical localization on the Chest ImaGenome benchmark improves most sharply, with intersection over union rising 35 percentage points (38% vs. 3%), while longitudinal medical imaging on the MS-CXR-T benchmark gains 5 points in macro accuracy (66% vs. 61%). In general medical image interpretation across chest X-rays, dermatology, histopathology, and ophthalmology, the gain is 3 points (62% vs. 59%).
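Intersection over union (IoU) measures how well a predicted region overlaps the ground-truth region: the area of overlap divided by the area of the combined region. The sketch below shows the standard definition for axis-aligned bounding boxes. The box format and the example coordinates are assumptions for illustration; the benchmark's exact evaluation protocol is not described here.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlapping rectangle, clamped at zero
    # so that disjoint boxes contribute no intersection.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)     # illustrative coordinates
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU: {box_iou(predicted, ground_truth):.3f}")
```

An IoU of 1.0 means the boxes coincide exactly, and 0.0 means they do not overlap at all.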

Text-based capabilities also improve: medical reasoning on MedQA rises 5 percentage points (69% vs. 64%), and electronic health record (EHR) question answering on EHRQA jumps 22 points (90% vs. 68%). Lab report extraction gains 18 points in retrieval macro F1 (78% vs. 60%). These benchmarks draw on internal and public datasets, and Google emphasizes that outputs require clinical validation and are not intended for direct medical use without further adaptation.
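The gains quoted above are differences between two percentage scores, which is not the same as relative improvement. The short sketch below recomputes both from the score pairs given in this article; the benchmark labels are shortened for display, and the relative figures are derived here rather than reported by Google.

```python
from typing import List, Tuple

# (benchmark, MedGemma 1 4B score in %, MedGemma 1.5 score in %), as quoted in the text.
RESULTS: List[Tuple[str, float, float]] = [
    ("MRI findings classification", 51, 65),
    ("CT findings classification", 58, 61),
    ("Chest ImaGenome localization (IoU)", 3, 38),
    ("MS-CXR-T longitudinal (macro accuracy)", 61, 66),
    ("General image interpretation", 59, 62),
    ("MedQA", 64, 69),
    ("EHRQA", 68, 90),
    ("Lab report extraction (macro F1)", 60, 78),
]


def gains(previous: float, current: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = current - previous
    relative = points / previous * 100
    return points, relative


if __name__ == "__main__":
    for name, previous, current in RESULTS:
        points, relative = gains(previous, current)
        print(f"{name:<40} +{points:.0f} points ({relative:.0f}% relative)")
```

The distinction matters most for low baselines: the localization score moves by 35 points, but relative to a starting value of 3% that is more than a tenfold increase.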

Integration with MedASR for Enhanced Workflows

Complementing MedGemma 1.5 is MedASR, an open automatic speech recognition model fine-tuned for medical dictation. Compared with the general-purpose model Whisper large-v3, it makes 58% fewer transcription errors on chest X-ray dictations (5.2% vs. 12.5% word error rate) and 82% fewer on diverse medical dictations (5.2% vs. 28.2%).
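Word error rate (WER) is the word-level edit distance between a transcript and its reference, divided by the number of reference words. The sketch below implements that definition with simple whitespace tokenization (an assumption; real evaluations usually normalize text first) and shows how the "fewer errors" percentages follow from the quoted WER pairs. The dictation strings are invented examples.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Percent reduction in errors relative to the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer * 100


if __name__ == "__main__":
    wer = word_error_rate("no pleural effusion", "no plural effusion")
    print(f"Example WER: {wer:.2f}")
    print(f"Chest X-ray dictation: {relative_error_reduction(12.5, 5.2):.0f}% fewer errors")
    print(f"Diverse dictation: {relative_error_reduction(28.2, 5.2):.0f}% fewer errors")
```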

This integration enables voice-driven workflows, such as generating text prompts from spoken input for MedGemma to interpret images or extract information from reports. Tutorials on GitHub demonstrate how developers can combine these models to create end-to-end solutions, from speech transcription to multimodal analysis, potentially improving efficiency in clinical documentation and decision-making.
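The shape of such a voice-driven workflow can be sketched as two stages wired together: speech is transcribed into a text prompt, and the prompt is passed with an image to a multimodal model. The sketch below is a structural illustration only. The class, type aliases, and stub functions are hypothetical and are not taken from Google's tutorials; a real application would replace the stubs with calls to MedASR and MedGemma through whatever runtime it uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical interfaces: audio bytes -> transcript, (image bytes, prompt) -> answer.
Transcriber = Callable[[bytes], str]
ImageInterpreter = Callable[[bytes, str], str]


@dataclass
class VoiceDrivenPipeline:
    """Chains a speech-to-text step into a multimodal question-answering step."""

    transcribe: Transcriber
    interpret: ImageInterpreter

    def run(self, audio: bytes, image: bytes) -> Dict[str, str]:
        prompt = self.transcribe(audio).strip()
        if not prompt:
            # Avoid sending an empty prompt to the second model.
            raise ValueError("transcription produced no text")
        answer = self.interpret(image, prompt)
        return {"prompt": prompt, "answer": answer}


# Stand-in models for demonstration; they perform no real inference.
def fake_transcribe(audio: bytes) -> str:
    return "  Is there a pleural effusion?  "


def fake_interpret(image: bytes, prompt: str) -> str:
    return f"[stub answer to: {prompt}]"


if __name__ == "__main__":
    pipeline = VoiceDrivenPipeline(fake_transcribe, fake_interpret)
    print(pipeline.run(audio=b"<audio bytes>", image=b"<image bytes>"))
```

Passing the two models in as plain callables keeps the workflow independent of any particular serving stack, so the same pipeline could run against local models or a hosted endpoint.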

Availability, Licensing, and Developer Resources

Both MedGemma 1.5 and MedASR are available for free download on Hugging Face under the HAI-DEF collection, with variants accessible via Vertex AI for scalable deployment. Licensing permits research and commercial use, subject to Google's terms and prohibited use policy. The models are trained on de-identified datasets to protect privacy, which complements the on-device deployment benefits highlighted in community feedback.

Google provides extensive resources, including GitHub notebooks for tasks like high-dimensional CT analysis, histopathology interpretation, anatomical localization, longitudinal imaging, fine-tuning with LoRA, and reinforcement learning. Model cards detail capabilities, limitations, and ethical considerations.

The MedGemma Impact Challenge and Community Response

To spur innovation, Google Research has launched the MedGemma Impact Challenge, a Kaggle-hosted hackathon with a $100,000 prize pool. Running from January 13, 2026, to February 24, 2026, it encourages participants to build AI applications using HAI-DEF models, focusing on human-centered solutions in healthcare and life sciences. Prizes are distributed across a main track ($75,000) and special categories like agentic workflows ($10,000), novel tasks ($10,000), and edge AI ($5,000). Judging criteria emphasize effective model use, problem importance, impact potential, feasibility, and execution.

Early community feedback on platforms like X has been positive, with developers praising the models' offline capabilities, privacy advantages, and potential for real-world applications. Comments highlight how MedGemma 1.5's improvements make it more viable for edge deployments in hospitals, while MedASR addresses key bottlenecks in clinical dictation. Since the original MedGemma release, millions of downloads and hundreds of community variants on Hugging Face indicate strong adoption.

Examples of real-world use include askCPG, a tool from Qmed Asia for navigating clinical guidelines in Malaysia, and an application from Taiwan's National Health Insurance Administration for preoperative lung cancer assessments.

Implications for Healthcare AI

The release of MedGemma 1.5 and MedASR represents a step toward more accessible, privacy-focused AI in medicine. By releasing these tools openly, Google enables developers worldwide to create adaptable applications that could enhance diagnostics, streamline workflows, and support patient care in underserved areas. However, Google stresses the need for fine-tuning, validation, and human oversight to ensure safety and reliability in clinical settings. As the field evolves, ongoing community contributions and attention to ethical considerations will be crucial to realizing the full potential of these technologies.
