Flagship project delivers step change in text analytics capability

November 15, 2022

The HDR UK National Text Analytics Project team recently came together to share the impacts of their work and opportunities for clinical natural language processing (NLP). Rene Ndoyi, one of the attendees, describes his experience of the HDR UK National Text Analytics project symposium.


Author: Rene Ndoyi, Intern at Institute of Health Informatics


Maximizing text analytics capability for health data research: key learnings from the HDR UK National Text Analytics project symposium

On 28 September 2022, the HDR UK National Text Analytics Project team, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London), came together to share the impacts of their work and opportunities for the clinical natural language processing (NLP) community to deliver and use new NLP tools at this HDR UK symposium.


This flagship project has delivered a step-change in text analytics capability, enabling a major shift in the UK’s ability to use research-ready, actionable, real-time electronic health records by delivering data-driven systems with potential to transform patient care. Sixty people from across HDR UK and the text analytics community attended the symposium to hear about the wide-reaching impacts of the project, learn about methods, tools and challenges for NLP and text analytics research, and discuss what the community needs to be able to access and use NLP resources for research. One of the attendees, Rene Ndoyi, describes his thoughts and learning from the symposium below.


My name is Rene Ndoyi, a recent graduate of the HDR UK Black Internship Programme and intern at the UCL Institute of Health Informatics. The internship programme has been a great help in my quest to develop a career in health data science. Among the many interesting projects that I was introduced to is the National Text Analytics Resource, led by Professor Richard Dobson (UCL Institute of Health Informatics; King’s College London) and Dr Angus Roberts (King’s College London).


The project has built a community and brought together specialised resources that provide researchers with the tools and support to explore unstructured free text clinical data, using natural language processing (NLP) and text analytics.


My internship mentor, Natalie Fitzpatrick, recommended that I attend the symposium, which is one of the many ways in which the project not only brings together a community but also raises awareness of opportunities for NLP research being carried out across HDR UK.


It was very insightful and interesting to learn about the work that has been done and the success the project has achieved over the past five years.


As an early career researcher who is building my skills in data science, I was keen to learn about the various tools and methods that have been developed to address the challenges of using unstructured free text data. A key piece of work is CogStack, a clinical information retrieval and extraction platform that creates richer, more useful clinical information to improve healthcare. The tool enables users to query real-time data without having to write thousands of SQL queries.

Another tool I learnt about was MedCAT, which extracts information from Electronic Health Records and links it to biomedical vocabulary systems like SNOMED-CT and UMLS. Both of these tools are available for the research community to use via the Health Data Research Innovation Gateway, with the code made open source on GitHub.


Efforts to develop and apply these kinds of tools are important in tackling challenges around bias, transferability and model sharing.


The team described various ways that they are approaching this – from improving access to unstructured data for research, to developing trusted models of governance and standards. They have developed a template model sharing agreement that is being used across 10 different NHS Trusts to date, so that NLP models can be shared easily.


I also learnt that analysis of free text data can be carried out in R, a programming language I am currently learning. The idea of coding reproducible, step-by-step workflows and frameworks relates closely to my internship learning experiences. Under Dr Johan Thygesen’s supervision, we are exploring the development of reproducible and extensible frameworks, based on a previous study that developed a framework for COVID-19 trajectories among 57 million adults in England.


Speakers also highlighted the importance of data governance and employing user-centred approaches. Natalie Fitzpatrick gave an interesting talk on creating a free text donated databank to develop and train NLP tools. I was fascinated to hear people’s feedback about this databank. Stakeholders, including patients and the public, researchers, clinicians and information governance and ethics experts, shared their thoughts through focus groups. There was a lot of support for the databank, but important issues were highlighted, such as the need to overcome different forms of bias, lack of generalisability, poor quality of data and patients’ ability to access their data to correct errors.


From my experiences at the symposium, I have no doubt that these efforts will open up more opportunities for improved patient care. I look forward to future meetings and opportunities to learn more about the National Text Analytics Resource project.


