Flagship project delivers step change in text analytics capability

November 15, 2022

The HDR UK National Text Analytics Project team recently came together to share the impacts of their work and opportunities for clinical natural language processing (NLP). Rene Ndoyi, one of the attendees, describes his experience of the HDR UK National Text Analytics project symposium.


Author: Rene Ndoyi, Intern at Institute of Health Informatics


Maximizing text analytics capability for health data research: key learnings from the HDR UK National Text Analytics project symposium

On 28 September 2022, the HDR UK National Text Analytics Project team, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London), came together to share the impacts of their work and opportunities for the clinical natural language processing (NLP) community to deliver and use new NLP tools at this HDR UK symposium.


This flagship project has delivered a step-change in text analytics capability, enabling a major shift in the UK’s ability to use research-ready, actionable, real-time electronic health records by delivering data-driven systems with potential to transform patient care. Sixty people from across HDR UK and the text analytics community attended the symposium to hear about the wide-reaching impacts of the project, learn about methods, tools and challenges for NLP and text analytics research, and discuss what the community needs to be able to access and use NLP resources for research. One of the attendees, Rene Ndoyi, describes his thoughts and learning from the symposium below.


My name is Rene Ndoyi, a recent graduate of the HDR UK Black Internship Programme and intern at the UCL Institute of Health Informatics. The internship programme has been a great help in my quest to develop a career in health data science. Among the many interesting projects that I was introduced to is the National Text Analytics Resource – led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London).


The project has built a community and brought together specialised resources that provide researchers with the tools and support to explore unstructured free-text clinical data, using natural language processing (NLP) and text analytics.


My internship mentor, Natalie Fitzpatrick, recommended that I attend the symposium, one of the many ways in which the project both brings together a community and raises awareness of the NLP research being carried out across HDR UK.


It was insightful and interesting to learn about the work that has been done and the success the project has achieved over the past five years.


As an early career researcher who is building my skills in data science, I was keen to learn about the various tools and methods that have been developed to address the challenges of using unstructured free-text data. A key piece of work is CogStack, a clinical information retrieval and extraction platform designed to create richer, more useful clinical information to improve healthcare. The tool lets users query real-time data without having to write thousands of SQL queries.

Another tool I learnt about was MedCAT, which extracts information from electronic health records and links it to biomedical vocabulary systems such as SNOMED CT and UMLS. Both tools are available to the research community via the Health Data Research Innovation Gateway, with the code released as open source on GitHub.


Efforts to develop and apply these kinds of tools are important in tackling challenges around bias, transferability and model sharing.


The team described various ways that they are approaching this – from improving access to unstructured data for research, to developing trusted models of governance and standards. They have developed a template model sharing agreement that is being used across 10 different NHS Trusts to date, so that NLP models can be shared easily.


I also learnt that free-text data can be analysed in R, a programming language I am currently learning. The idea of coding reproducible, step-by-step workflows and frameworks relates closely to my internship experience. Under Dr Johan Thygesen’s supervision, we are exploring the development of reproducible and extensible frameworks, building on a previous study that developed a framework for COVID-19 trajectories among 57 million adults in England.


Speakers also highlighted the importance of data governance and employing user-centred approaches. Natalie Fitzpatrick gave an interesting talk on creating a free text donated databank to develop and train NLP tools. I was fascinated to hear people’s feedback about this databank. Stakeholders, including patients and the public, researchers, clinicians and information governance and ethics experts, shared their thoughts through focus groups. There was a lot of support for the databank, but important issues were highlighted, such as the need to overcome different forms of bias, lack of generalisability, poor quality of data and patients’ ability to access their data to correct errors.


From my experience at the symposium, I have no doubt that these efforts will create more opportunities for improved patient care. I look forward to future meetings and opportunities to learn more about the National Text Analytics Resource project.

