Available
Project number:
2025_70
Start date:
October 2025
Project themes:
Main supervisor:
Research Fellow
Co-supervisor:
Dr Thomas Searle
Additional Information:
A Unified Transformer-Based Natural Language Processing Framework for Electronic Health Records
Background
Electronic Health Records (EHRs) contain a vast amount of valuable clinical information in both structured and unstructured formats. Extracting that information from the records can improve patient care, support clinical decision-making, and advance medical research. The CogStack platform [Noor et al., 2022] is an integrated information retrieval and extraction ecosystem which has been deployed in multiple large National Health Service (NHS) Foundation Trust hospitals in the UK. It enables various natural language processing (NLP) tasks to be built on top of it. Typical NLP tasks include Named-Entity Recognition and Linking (NER+L), Entity Relationship Extraction (ERE), Information Extraction (IE), Summarisation and De-identification.
In recent years, transformers [Lin et al., 2022] have revolutionised NLP by effectively capturing contextual relationships within text. For example, Foresight [Kraljevic et al., 2024], a generative pretrained transformer, can forecast a patient's trajectory using EHR data. Given the diversity of clinical NLP tasks, there is a significant opportunity to develop a unified framework that leverages transformer-based models to enhance the processing of EHR data.
Retrieval Augmented Generation (RAG) [Lewis et al., 2020] provides a practical approach to implementing this unified framework. RAG combines the strengths of retrieval-based methods and generative pretrained transformer models. This combination supplies the transformer model with specific context, improving the accuracy and relevance of the generated outputs.
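As a rough illustration of the retrieve-then-generate pattern described above, the following minimal Python sketch ranks a few made-up clinical notes against a question and assembles the best match into a prompt. Everything in it is an assumption for illustration only: the notes are synthetic, the function names (tokenize, cosine, retrieve, build_prompt) are hypothetical, and a simple bag-of-words cosine similarity stands in for the learned dense retriever used in RAG proper. The generation step is left out; in practice the prompt would be passed to a generative transformer model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query (score > 0)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, context):
    """Combine the retrieved context and the question into one prompt."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"


# Synthetic notes, invented for this sketch (not real patient data).
NOTES = [
    "Patient reports chest pain on exertion; started on aspirin.",
    "Follow-up for type 2 diabetes; metformin dose increased.",
    "Knee X-ray shows mild osteoarthritis.",
]

query = "What medication was given for chest pain?"
context = retrieve(query, NOTES, k=1)
prompt = build_prompt(query, context)
print(prompt)
```

The point of the sketch is the separation of concerns: the retriever decides which record fragments are relevant, and the generator only sees that narrowed context alongside the question.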
Novelty & Importance
A unified transformer-based NLP framework, through RAG, would facilitate seamless integration of the NLP tasks within a single system. This would enhance efficiency and consistency across clinical applications. Such a framework could also improve the relevance of extracted clinical information. Ultimately, this integration would support better patient outcomes, inform clinical decision-making, and accelerate medical research by providing healthcare professionals with valuable insights derived from EHR data.
Aims & Objectives
The project aims to:
• develop a unified transformer-based NLP framework for EHRs by integrating multiple clinical NLP tasks
• enhance the usability and performance of NLP tasks using retrieval augmented generation (RAG)
• evaluate and validate the framework in real-world clinical settings within the CogStack platform
References
Noor, K., Williams, R. J., O’Brien, N., et al. (2022). CogStack—Open source information retrieval and extraction platform for healthcare data. Journal of Biomedical Informatics, 123, 103934.
Lin, T., Wang, Y., Liu, X., & Qiu, X. (2022). A survey of transformers. AI Open, 3, 111-132.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., ... & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Kraljevic, Z., Bean, D., Shek, A., Bendayan, R., Hemingway, H., Yeung, J. A., ... & Dobson, R. J. (2024). Foresight—a generative pretrained transformer for modelling of patient timelines using electronic health records: a retrospective modelling study. The Lancet Digital Health, 6(4), e281-e290.
We are now accepting applications for 1 October 2025
How to apply
Candidates should possess or be expected to achieve a 1st or upper 2nd class degree in a relevant subject, such as the biosciences, computer science, mathematics, statistics, data science, chemistry or physics, and should be enthusiastic about combining their expertise with other disciplines in the field of healthcare.
Important information for International Students:
It is the responsibility of the student to apply for their Student Visa. Please note that the EPSRC DRIVE-Health studentship does not cover the visa application fees or the Immigration Health Surcharge (IHS) required for access to the National Health Service. The IHS is mandatory for anyone entering the UK on a Student Visa and is currently £776 per year for each year of study. Further detail can be found under the International Students tab below.
Next Steps
- Applications submitted by the closing date of Thursday 6 February 2025 will be considered by the CDT. We will contact shortlisted applicants with information about this part of the recruitment process.
- Candidates will be invited to attend an interview. Interviews are projected to take place in April 2025.
- Project selection will be through a panel interview chaired by either Professor Richard Dobson or Professor Vasa Curcin (CDT Directors), followed by an informal discussion with prospective supervisors.
- If you have any questions related to the specific project you are applying for, please contact the main supervisor of the project directly.
For any other questions about the recruitment process, please email us at