Flagship project delivers step change in text analytics capability

November 15, 2022

The HDR UK National Text Analytics Project team recently came together to share the impacts of their work and opportunities for clinical natural language processing (NLP). Rene Ndoyi, one of the attendees, describes his experience of the HDR UK National Text Analytics project symposium.


Author: Rene Ndoyi, Intern at Institute of Health Informatics


Maximizing text analytics capability for health data research: key learnings from the HDR UK National Text Analytics project symposium

On 28 September 2022, the HDR UK National Text Analytics Project team, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London), came together to share the impacts of their work and opportunities for the clinical natural language processing (NLP) community to deliver and use new NLP tools at this HDR UK symposium.


This flagship project has delivered a step-change in text analytics capability, enabling a major shift in the UK’s ability to use research-ready, actionable, real-time electronic health records by delivering data-driven systems with potential to transform patient care. Sixty people from across HDR UK and the text analytics community attended the symposium to hear about the wide-reaching impacts of the project, learn about methods, tools and challenges for NLP and text analytics research, and discuss what the community needs to be able to access and use NLP resources for research. One of the attendees, Rene Ndoyi, describes his thoughts and learning from the symposium below.


My name is Rene Ndoyi, and I am a recent graduate of the HDR UK Black Internship Programme and an intern at the UCL Institute of Health Informatics. The internship programme has been a great help in my quest to develop a career in health data science. Among the many interesting projects that I was introduced to was the National Text Analytics Resource, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London).


The project has built a community and brought together specialised resources that provide researchers with the tools and support to explore unstructured free text clinical data, using natural language processing (NLP) and text analytics.


My internship mentor, Natalie Fitzpatrick, recommended that I attend the symposium, which is one of the many ways that the project not only brings together a community but also raises awareness of opportunities for NLP research being carried out across HDR UK.


It was insightful and interesting to learn about the work that has been done and the success the project has achieved over the past five years.


As an early career researcher who is building my skills in data science, I was keen to learn about the various tools and methods that have been developed to address the challenges of using unstructured free text data. A key piece of work is CogStack, a clinical information retrieval and extraction platform that creates richer, more useful clinical information to improve healthcare. The tool enables users to query real-time data without having to write thousands of SQL queries.

Another tool I learnt about was MedCAT, which extracts information from electronic health records and links it to biomedical vocabulary systems such as SNOMED CT and UMLS. Both of these tools are available for the research community to use via the Health Data Research Innovation Gateway, with the code made open source on GitHub.


Efforts to develop and apply these kinds of tools are important in tackling challenges such as bias, transferability and model sharing.


The team described various ways that they are approaching this – from improving access to unstructured data for research, to developing trusted models of governance and standards. They have developed a template model sharing agreement that is being used across 10 different NHS Trusts to date, so that NLP models can be shared easily.


I also learnt that analysis of free text data can be achieved through R programming, a language I am currently learning. The idea of coding reproducible, step-by-step workflows and frameworks is related to my internship learning experiences. Under Dr Johan Thygesen’s supervision, we are exploring the development of reproducible and extensible frameworks, based on a previous study that developed a framework for COVID-19 trajectories among 57 million adults in England.


Speakers also highlighted the importance of data governance and employing user-centred approaches. Natalie Fitzpatrick gave an interesting talk on creating a free text donated databank to develop and train NLP tools. I was fascinated to hear people’s feedback about this databank. Stakeholders, including patients and the public, researchers, clinicians and information governance and ethics experts, shared their thoughts through focus groups. There was a lot of support for the databank, but important issues were highlighted, such as the need to overcome different forms of bias, lack of generalisability, poor quality of data and patients’ ability to access their data to correct errors.


From my experience at the symposium, I have no doubt that these efforts will create more opportunities for improved patient care. I look forward to future meetings and opportunities to learn more about the National Text Analytics Resource project.

