Flagship project delivers step change in text analytics capability

November 15, 2022

The HDR UK National Text Analytics Project team recently came together to share the impacts of their work and opportunities for clinical natural language processing (NLP). Rene Ndoyi, one of the attendees, describes his experience of the HDR UK National Text Analytics project symposium.


Author: Rene Ndoyi, Intern at Institute of Health Informatics


Maximizing text analytics capability for health data research: key learnings from the HDR UK National Text Analytics project symposium

On 28 September 2022, the HDR UK National Text Analytics Project team, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London), came together to share the impacts of their work and opportunities for the clinical natural language processing (NLP) community to deliver and use new NLP tools at this HDR UK symposium.


This flagship project has delivered a step-change in text analytics capability, enabling a major shift in the UK’s ability to use research-ready, actionable, real-time electronic health records by delivering data-driven systems with potential to transform patient care. Sixty people from across HDR UK and the text analytics community attended the symposium to hear about the wide-reaching impacts of the project, learn about methods, tools and challenges for NLP and text analytics research, and discuss what the community needs to be able to access and use NLP resources for research. One of the attendees, Rene Ndoyi, describes his thoughts and learning from the symposium below.


My name is Rene Ndoyi, a recent graduate of the HDR UK Black Internship Programme and intern at the UCL Institute of Health Informatics. The internship programme has been a great help in my quest to develop a career in health data science. Among the many interesting projects that I was introduced to is the National Text Analytics Resource, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London).


The project has built a community and brought together specialised resources that provide researchers with the tools and support to explore unstructured free-text clinical data, using natural language processing (NLP) and text analytics.


My internship mentor, Natalie Fitzpatrick, recommended that I attend the symposium, as it is one of the many ways that the project not only brings together a community but also raises awareness of opportunities for NLP research being carried out across HDR UK.


It was fascinating to learn about the work that has been done and the success the project has achieved over the past five years.


As an early-career researcher who is building my skills in data science, I was keen to learn about the various tools and methods that have been developed to address the challenges of using unstructured free-text data. A key piece of work is CogStack, a clinical information retrieval and extraction platform that creates richer, more useful clinical information to improve healthcare. The tool lets users query real-time data without having to write thousands of SQL queries.

Another tool I learnt about was MedCAT, which extracts information from electronic health records and links it to biomedical vocabulary systems like SNOMED CT and UMLS. Both of these tools are available for the research community to use via the Health Data Research Innovation Gateway, with the code made open source on GitHub.
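To make the idea of "extracting and linking" more concrete, here is a minimal sketch of dictionary-based concept linking in Python. It is not MedCAT's API or method: the vocabulary, the placeholder concept IDs (X001, X002) and the function name link_concepts are all invented for illustration, and real tools also handle misspellings, abbreviations, ambiguity and negation, which this sketch does not.

```python
import re

# Hypothetical mini-vocabulary: surface form -> (placeholder concept ID, preferred name).
# The IDs are made up for illustration and are NOT real SNOMED CT or UMLS codes.
VOCAB = {
    "heart attack": ("X001", "Myocardial infarction"),
    "myocardial infarction": ("X001", "Myocardial infarction"),
    "high blood pressure": ("X002", "Hypertension"),
    "hypertension": ("X002", "Hypertension"),
}


def link_concepts(text):
    """Return (start, end, matched_text, concept_id, preferred_name) tuples in text order."""
    found = []
    for term, (concept_id, name) in VOCAB.items():
        # Whole-phrase, case-insensitive match of each known surface form.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.end(), match.group(0), concept_id, name))
    return sorted(found)


note = "Pt admitted with heart attack. History of hypertension."
mentions = link_concepts(note)
for start, end, matched, concept_id, name in mentions:
    print(f"{matched!r} at {start}-{end} -> {concept_id} ({name})")
```

Different wordings ("heart attack", "myocardial infarction") map to the same concept ID, which is what makes free text searchable and countable alongside structured data.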


Efforts to develop and apply these kinds of tools are important in tackling challenges around bias, transferability and model sharing.


The team described various ways that they are approaching this – from improving access to unstructured data for research, to developing trusted models of governance and standards. They have developed a template model sharing agreement that is being used across 10 different NHS Trusts to date, so that NLP models can be shared easily.


I also learnt that analysis of free-text data can be achieved through R programming, a language I am currently learning. The idea of coding reproducible, step-by-step workflows and frameworks relates closely to my internship learning experiences. Under Dr Johan Thygesen’s supervision, we are exploring the development of reproducible and extensible frameworks, based on a previous study that developed a framework for COVID-19 trajectories among 57 million adults in England.


Speakers also highlighted the importance of data governance and employing user-centred approaches. Natalie Fitzpatrick gave an interesting talk on creating a free text donated databank to develop and train NLP tools. I was fascinated to hear people’s feedback about this databank. Stakeholders, including patients and the public, researchers, clinicians and information governance and ethics experts, shared their thoughts through focus groups. There was a lot of support for the databank, but important issues were highlighted, such as the need to overcome different forms of bias, lack of generalisability, poor quality of data and patients’ ability to access their data to correct errors.


From my experiences at the symposium, I have no doubt that these efforts will create more opportunities for improved patient care. I look forward to future meetings and opportunities to learn more about the National Text Analytics Resource project.



September 2, 2025
We're pleased to announce that DRIVE-Health PhD student Dr Hugh Logan-Ellis, a Diabetes and Endocrinology Registrar at King's and former Research Fellow in the Department of Medicine at Dalhousie University, will deliver our September Seminar Series talk, "Extracting Clinical Value from EHR Data: Challenges, Pitfalls, and Practical Lessons".

In his talk, Hugh will share what clinicians have taught him about the reality of working with Electronic Health Record data and what they genuinely need from AI tools, rather than what researchers might think they should want. Hugh has learned that making the most clinically useful tool could matter more than theoretical perfection. He'll discuss some principles he's gathered to help create AI solutions that fit seamlessly into clinical workflows, which he hopes might help others bridge the gap between academic research and genuine patient benefit.

Using his PhD research on creating a single unit of health from EHR data as a central example, Hugh will explore broader challenges: the messiness of real-world clinical data, the proliferation of unused risk scores, and why so many promising algorithms never make it past publication. These insights aim to help researchers develop tools that won't just die in papers, but have a real chance of improving clinical care.

Seminar Series Event: "Extracting Clinical Value from EHR Data: Challenges, Pitfalls, and Practical Lessons"
Date and Time: Thursday 25 September 2025, 12:00 – 13:00 hrs (BST)
Location: The Judy Dunn Room, SGDP Building, Denmark Hill Campus, London, SE5 8AF
Attendance: Mandatory for all DRIVE-Health students; please accept the calendar invitation.
Registration: Alumni and the wider King's College London research community are all welcome. Please email drive-health-cdt@kcl.ac.uk to let us know if you would like to attend.

Abstract: Picture the scene: It's Saturday morning, you're the senior resident doctor on call in a busy hospital, and you have a 40-page list of patients due for review. Half of your junior colleagues have called in sick, and you know you can't possibly see everyone. How do you decide who needs to be seen most urgently? The information to make these decisions is in the electronic health records, but accessing it quickly means opening each patient's chart individually. My PhD tries to tackle this problem: could we use an algorithm to compress scattered clinical data into a single, practical number?

This question has led me on an interesting journey. I've spoken with clinicians from around the world about how they decide who is "sickest," discovering a surprising variety of terms for essentially the same idea and realising we might need more than one measure. My research has taken me to Canada to collaborate with Professor Kenneth Rockwood OC, whose groundbreaking work on frailty measurement has significantly shaped clinical practice worldwide. Working alongside him has given me valuable insights into why some academic ideas successfully transform patient care, while others remain confined to journals.

As I explored increasingly sophisticated approaches to measure sickness, from simple laboratory-based indices to complex machine learning models, I stumbled across a key insight. Supervised machine learning can be hindered by retrospective health data because when sick patients are successfully treated, they don't have poor outcomes. This isn't just a quirky finding relevant to my PhD; it has broader implications for using a supervised paradigm on retrospective data whenever effective treatments are already in place.

Bio: Hugh is a resident medical doctor specialising in Internal Medicine and Diabetes and Endocrinology, working on his PhD at King's College London. His research focuses on measuring patient health status using electronic health records, drawing on his experience working across various healthcare settings in the UK and internationally.
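The abstract's point about supervised learning on retrospective data can be illustrated with a small simulation. This is only a sketch under invented assumptions, not Hugh's method or data: the untreated risk of a poor outcome is assumed to equal a hidden "severity" score, and effective treatment of the sickest half is assumed to cut their risk to 0.1. The function name and all numbers are hypothetical.

```python
import random


def poor_outcome_rate_by_severity(treat_sickest, n=20000, seed=0):
    """Return (low_severity_rate, high_severity_rate) of poor outcomes."""
    rng = random.Random(seed)
    poor = {"low": 0, "high": 0}
    counts = {"low": 0, "high": 0}
    for _ in range(n):
        severity = rng.random()  # hidden true "sickness" in [0, 1)
        group = "high" if severity > 0.5 else "low"
        risk = severity  # assumption: untreated risk equals severity
        if treat_sickest and group == "high":
            risk = 0.1  # assumption: effective treatment lowers risk to 0.1
        counts[group] += 1
        poor[group] += rng.random() < risk  # True counts as 1
    return poor["low"] / counts["low"], poor["high"] / counts["high"]


low_u, high_u = poor_outcome_rate_by_severity(treat_sickest=False)
low_t, high_t = poor_outcome_rate_by_severity(treat_sickest=True)
print(f"No treatment:    low={low_u:.2f}, high={high_u:.2f}")
print(f"Sickest treated: low={low_t:.2f}, high={high_t:.2f}")
```

Without treatment, the sicker group has the worse outcomes, so the outcome label tracks sickness. Once the sickest are treated effectively, they end up with fewer poor outcomes than the less sick group, so a model trained to predict the recorded outcome would learn the wrong ordering of who is "sickest".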
April 9, 2025
We’re pleased to announce that Dr Petroula Laiou from King's College London will deliver our May Seminar Series talk, "Bridging the Gap: Turning Academic Research into Clinical Innovation".

Petroula will share her journey of translating cutting-edge academic research into a mission-driven MedTech company. The spinout is pioneering a novel approach to forecasting and preventing seizures in people with drug-resistant epilepsy - an innovation rooted in years of interdisciplinary work at the intersection of clinical neuroscience, signal processing, and artificial intelligence.

Dr Laiou will take the audience through the full translational pathway: from identifying an unmet clinical need, designing and analysing first-in-human studies, and developing a seizure prediction algorithm, to securing translational funding, navigating the intellectual property landscape, and filing an international patent (PCT/GB2024/052456). She will reflect on key lessons learned during her time in the King’s MedTech Accelerator Programme, where the team won the Best Innovation award, and share insights on building bridges between academia and industry, shaping a commercialization strategy, and transitioning from researcher to entrepreneur. The talk will also highlight the challenges and rewards of launching a spinout in the healthcare sector and offer practical advice for PhD students and early-career researchers considering the entrepreneurial route.

Seminar Series Event: "Bridging the Gap: Turning Academic Research into Clinical Innovation"
Date and Time: Wednesday 7 May 2025, 15:00 – 16:00 hrs (BST)
Location: The Lorna Wing Room, SGDP Building, Denmark Hill Campus, London, SE5 8AF
Attendance: Mandatory for all DRIVE-Health students; please accept the calendar invitation.
Registration: Alumni and the wider King's College London research community are all welcome. Please email drive-health-cdt@kcl.ac.uk to let us know if you would like to attend.

Dr Petroula Laiou is a Research Fellow in Predictive Modelling and Clinical Neuroscience at King’s College London. With a background in mathematics, computational physics, and a PhD in signal analysis, her research bridges computer science, neuroscience, and machine learning. Her work focuses on developing predictive models and digital biomarkers for neurological and psychiatric disorders, including epilepsy and depression. Dr Laiou led the development of a novel seizure forecasting algorithm using intracranial EEG and cortical responses to electrical stimulation, research that led to the filing of an international patent (PCT/GB2024/052456). She is the recipient of multiple research grants, including an MRC award as Principal Investigator, and her translational work was recognised by the King’s MedTech Accelerator Programme, where her team won the Best Innovation award. She has authored over 40 peer-reviewed publications, presented at major international conferences, and actively contributes to interdisciplinary collaborations across academia, hospitals, and industry.