Artefact: Research Proposal Presentation Transcript

Transcript of Research Proposal Presentation: “Improving Chest X-Ray Deep Learning Models with Multimodal Clinical Data”

I'm Dr Rory Maclean, and this presentation is part of the Research Methods and Professional Practice module of my Artificial Intelligence Postgraduate Diploma. This is my Research Proposal Presentation, titled “Improving Chest X-Ray Deep Learning Models with Multimodal Clinical Data”. The background image is adapted from Albahli & Nazir (2022).

The contents of my presentation will be Research Problem & Significance, Research Question, Aims and Objectives, Key Literature, Research Design & Methods, Ethical Considerations and Risk Assessment, Artefacts Created, Timeline of Proposed Activities, and References.

The research problem and significance

The research problem: current chest X-ray deep learning models focus on image data alone (Ait Nasser et al., 2023). The addition of clinical data (for example, blood tests and vital signs) to chest X-ray images is underexplored (Acosta et al., 2022).

The significance: first, to improve the performance of chest X-ray deep learning models, a key frontier of clinical AI (Ait Nasser et al., 2023). Second, to contribute to the development of multimodal AI in healthcare, a key theme in the effort to develop intelligent systems (Acosta et al., 2022).

Ait Nasser et al., in 2023, reviewed the promise of AI for chest X-ray diagnosis. Figure 1 shows example chest X-ray images from the CheXpert database (Ait Nasser et al., 2023). From left to right, the diagnoses are: atelectasis, cardiomegaly, oedema and pneumonia. Ait Nasser et al. (2023: 1) state: “chest X-ray radiography is among the most frequently used medical imaging modalities... Computer aided detection of lung diseases in chest X-ray images is among the most popular topics in medical imaging research”. This supports the idea that pursuing the improvement of chest X-ray deep learning models is significant and valuable.

Acosta et al., in 2022, reviewed multimodal biomedical AI. Acosta et al. (2022: 1773) state: “clinicians process data from multiple sources and modalities when diagnosing… AI models should be able to use all data sources typically available to clinicians”. Figure 2 shows the data modalities and opportunities for multimodal biomedical AI (Acosta et al., 2022), including electronic health records and scans, which we will focus on. Opportunities include precision health, digital clinical trials, and digital twins. The authors note that we are currently better at storing large volumes of data than at analysing them; what is needed are analytic tools that can analyse cross-modal data together, in the same way that humans do. The ‘pro’ of this review article is its view across domains in medicine and its consideration of multiple modes of data. The ‘con’ was that the highlighted use cases, such as digital twins, are distant goals compared to the more likely goal of enhanced diagnostic systems.

The research question

The primary question is: can multimodal models that combine clinical tabular data with chest X-ray images improve the performance of chest X-ray deep learning models in diagnosing disease? I'll focus on a minimal problem of pneumonia versus normal, with the addition of blood biomarkers of infection and vital signs such as temperature and oxygen saturations, key markers for pneumonia. The first additional question is: which clinical features are most important to model predictions? The second additional question is: is there a differential effect on performance for certain diseases? That is, infection markers are likely to be more helpful in diagnosing pneumonia than heart failure.

Aims and objectives

Aims. The first aim is to develop multimodal deep learning models integrating chest X-ray images with clinical tabular data. The second aim is to explore the importance of clinical features in model performance.

To meet these aims, I have the following objectives: to develop a baseline unimodal CNN model; to develop multimodal models combining clinical tabular data with chest X-ray images; to evaluate the models using performance metrics; to analyse which clinical features are most important; and to analyse performance metrics across different disease subsets, such as pneumonia or pneumothorax.

Key literature

First, we will review foundational research to understand multimodal AI technologies. Second, we will review previous findings in the domain: a multimodal image-and-tabular chest X-ray model using a technique called data spatialisation with feature-level fusion, and a multimodal image-and-tabular model for skin disease using feature-level fusion. These findings will help to inform this research proposal.

Cui et al., in 2023, elaborated a theoretical framework for multimodal learning. Figure 3 shows a key theoretical insight: the distinction between feature-level fusion and decision-level fusion. On the left of the figure, features are extracted from both image and non-image data. On the top path, feature-level multimodal fusion occurs, and the multimodal features then enter a single diagnosis model. On the bottom path, decision-level fusion occurs, where the outputs of two separate diagnosis models are averaged. It is unclear which fusion method will perform better, so I propose to experiment with both.
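To make this distinction concrete, here is a minimal PyTorch sketch contrasting the two fusion levels; the feature dimensions, layer sizes and class names are hypothetical illustrations, not an implementation from the literature.

import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Feature-level fusion: image and tabular features are combined
    before a single diagnosis head."""
    def __init__(self, img_dim=512, tab_dim=16, n_classes=2):
        super().__init__()
        self.head = nn.Linear(img_dim + tab_dim, n_classes)

    def forward(self, img_feat, tab_feat):
        fused = torch.cat([img_feat, tab_feat], dim=1)  # combine multimodal features
        return self.head(fused)                         # single diagnosis model

class DecisionLevelFusion(nn.Module):
    """Decision-level fusion: two separate diagnosis models are used
    and their output probabilities are averaged."""
    def __init__(self, img_dim=512, tab_dim=16, n_classes=2):
        super().__init__()
        self.img_head = nn.Linear(img_dim, n_classes)
        self.tab_head = nn.Linear(tab_dim, n_classes)

    def forward(self, img_feat, tab_feat):
        p_img = torch.softmax(self.img_head(img_feat), dim=1)
        p_tab = torch.softmax(self.tab_head(tab_feat), dim=1)
        return (p_img + p_tab) / 2  # average the two decisions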

Cui et al., in 2023, also suggest a framework for feature-level multimodal fusion methods. This includes: operation-based (such as simple concatenation of two vectors), attention-based, tensor-based (such as the outer product of tensors), subspace-based (such as a variational autoencoder), and graph-based fusion. The ‘pro’ of this framework is simplification of the many different descriptions of the same model technology. The ‘con’ is that there are complex architectures which do not fit well into this framework.
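As a small illustration of how operation-based fusion differs from tensor-based fusion, the following Python sketch uses hypothetical feature sizes; it is an example of the general categories, not code from Cui et al.

import torch

# Hypothetical feature vectors for one patient (batch size 1).
img_feat = torch.randn(1, 8)   # image features
tab_feat = torch.randn(1, 4)   # clinical tabular features

# Operation-based fusion: simple concatenation -> 12-dimensional vector.
concat = torch.cat([img_feat, tab_feat], dim=1)

# Tensor-based fusion: outer product of the two vectors -> 8x4 interaction
# matrix, flattened to a 32-dimensional multimodal feature vector.
outer = torch.einsum('bi,bj->bij', img_feat, tab_feat).flatten(start_dim=1)

print(concat.shape, outer.shape)  # torch.Size([1, 12]) torch.Size([1, 32])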

Hsieh, in 2023, developed a multimodal deep learning model for chest X-ray diagnosis. Hsieh (2023: 1) states: “consultations with practicing radiologists indicate that clinical data is highly informative and essential for interpreting medical images and making proper diagnoses.” In this work, spatialisation of the tabular data was used to allow feature fusion with the image, which then allowed further convolutional neural network processing of the combined image and tabular data. This article is relevant as an example of image and tabular data fusion in the chest X-ray domain. Highly relevant to this research proposal, the results showed a 12% improvement over the baseline image-only CNN model. However, performance was still poor, with an average precision of 0.32 and an average recall of 0.55, likely due to the low number of chest X-ray images in the dataset with linked clinical data. The ‘pro’ of this article was a novel approach to fusing image and clinical data: spatialisation of a one-dimensional tensor of clinical data into three dimensions, which allowed fusion with the three-dimensional image tensor. The ‘con’ was poor overall performance, although better than the baseline unimodal model, due to the low number of chest X-ray images.
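A minimal sketch of the spatialisation idea is shown below; the dimensions and variable names are hypothetical and this is not Hsieh's actual implementation.

import torch

# Hypothetical shapes: a batch of chest X-ray tensors and a 1-D clinical vector.
image = torch.randn(2, 3, 224, 224)   # batch, channels, height, width
clinical = torch.randn(2, 5)          # e.g. temperature, SpO2, WBC, CRP, heart rate

# 'Spatialise' each clinical feature into its own constant-valued image channel,
# so the 1-D tabular tensor matches the spatial dimensions of the image tensor.
clinical_maps = clinical[:, :, None, None].expand(-1, -1, 224, 224)

# Channel-wise fusion: the result can be processed by further CNN layers.
fused = torch.cat([image, clinical_maps], dim=1)
print(fused.shape)  # torch.Size([2, 8, 224, 224])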

Chen et al., in 2023, developed a model to fuse skin images and clinical data. In this skin cancer diagnosis model, the authors used fusion of the image feature map with the clinical feature map; that is, a feature-level fusion model. Attention mechanisms were then used on the multimodal features to focus on the most important features among many. The relevance is that this is an example of fusion of image and tabular clinical data. The results were an accuracy of 80%, an improvement of around 9% compared to the unimodal model using images alone. The ‘pro’ of this article was that it demonstrated improved performance with multimodality. Chen et al. (2023: 3297) state: “data fusion may also lead to the disadvantage of overlapping and redundant features… we need to combine the data-fusion method with the specific application background and choose the data fusion method reasonably.” This supports my research proposal to test data fusion methods in the specific domain of chest X-rays. The ‘con’ is that the skin cancer domain may be too different from the chest X-ray image domain, emphasising the need for primary research with multimodal models.
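A minimal sketch of attention-based weighting of concatenated multimodal features follows; the dimensions and module design are hypothetical and do not reproduce the authors' architecture.

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weight each multimodal feature by a learned attention score so the
    classifier can focus on the most informative features."""
    def __init__(self, img_dim=512, tab_dim=16, n_classes=2):
        super().__init__()
        fused_dim = img_dim + tab_dim
        self.attention = nn.Sequential(nn.Linear(fused_dim, fused_dim), nn.Sigmoid())
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, img_feat, tab_feat):
        fused = torch.cat([img_feat, tab_feat], dim=1)
        weights = self.attention(fused)          # per-feature attention scores in [0, 1]
        return self.classifier(fused * weights)  # re-weighted multimodal features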

Research design and methods

Firstly, data collection: linking the MIMIC chest X-ray (MIMIC-CXR) dataset (Johnson et al., 2019) with the MIMIC-IV clinical tabular dataset (Johnson et al., ND), then preprocessing the clinical tabular data and image data. The MIMIC-CXR instances have 15 diagnosis labels.
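A sketch of the planned linkage step is shown below, assuming hypothetical file names and column names, and that both tables expose a shared 'subject_id' patient key.

import pandas as pd

# Hypothetical extracts: file names and columns are placeholders.
cxr = pd.read_csv("mimic_cxr_labels.csv")        # one row per chest X-ray study
clinical = pd.read_csv("mimic_iv_clinical.csv")  # blood tests and vital signs

# Link the two datasets on the shared patient identifier.
linked = cxr.merge(clinical, on="subject_id", how="inner")

# Keep only the minimal pneumonia-versus-normal problem described above.
linked = linked[linked["label"].isin(["Pneumonia", "No Finding"])]
print(len(linked), "linked studies")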

Secondly, model development. To develop a baseline convolutional neural network model for chest X-ray images. I will experiment with a range of commonly used pre-trained CNN models, fine-tuning on the MIMIC-CXR dataset. Multimodal models will be developed combining chest X-ray images with clinical tabular data using various fusion methods, including feature-level and decision-level fusion. I will use methods implemented in the Python package ‘fusilli’ (Townend, 2024).
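A minimal sketch of the baseline is given below, fine-tuning a pre-trained torchvision CNN for the pneumonia-versus-normal task; the choice of DenseNet-121 and the binary head are illustrative, not a final design decision.

import torch.nn as nn
from torchvision import models

# Start from a commonly used pre-trained CNN (illustrative choice).
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)

# Replace the ImageNet classifier head with a binary pneumonia-vs-normal head.
model.classifier = nn.Linear(model.classifier.in_features, 2)

# Fine-tuning: all parameters remain trainable by default; a standard training
# loop over the MIMIC-CXR images would follow here.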

Thirdly, model evaluation and analysis. Standard performance metrics will be used to evaluate the models: accuracy, F1-score, and AUC-ROC. Statistical analysis will be performed to compare model performance. Finally, feature importance analysis of the clinical data will test the hypothesis that blood tests (including infection markers) and vital signs (including oxygen saturation and temperature) are the most important features. The strengths of the proposed research design include a gold-standard, open-source dataset with linked clinical data, and the use of an open-source, peer-reviewed fusion-model Python package. The weaknesses are that the single source of data may limit generalisability, and that there is no clear state-of-the-art CNN model to choose as the baseline.
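A sketch of how the standard metrics could be computed with scikit-learn, using hypothetical predictions on a held-out test set:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical model outputs on a held-out test set.
y_true = np.array([0, 1, 1, 0, 1])            # 0 = normal, 1 = pneumonia
y_prob = np.array([0.2, 0.8, 0.6, 0.3, 0.9])  # predicted probability of pneumonia
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("AUC-ROC:", roc_auc_score(y_true, y_prob))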

Johnson and colleagues, in 2019, published the MIMIC-IV dataset, and Figure 5 shows a schema of this MIMIC-IV dataset, including potential linkage to additional modules. In this case, we will link to the chest X-ray module using the ‘subject ID’ identifier.

Townend, in 2024, developed the ‘fusilli’ Python package for fusion models, and I plan to use this implementation of fusion models. Figure 6 shows fusion techniques for data integration (Townend, 2024). Part A of the figure shows decision-level fusion, where the decisions from multiple classifiers are averaged to produce a final decision, and Part B shows feature-level fusion, where features are integrated before classification so that the multimodal features are processed by a single classifier.

What are the ethical considerations, and the risk assessment?

The first ethical consideration is data confidentiality, which will be maintained by using the de-identified MIMIC-IV data (Johnson et al., ND). Secondly, fairness and bias will be very important; this will be addressed by regular audits of model performance across patient subsets.
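A minimal sketch of such a subgroup audit follows, assuming a hypothetical results table with a demographic column; the data and column names are placeholders.

import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical per-patient results table with a demographic column.
results = pd.DataFrame({
    "sex":    ["F", "M", "F", "M", "F", "M"],
    "y_true": [1, 0, 1, 1, 0, 0],
    "y_prob": [0.9, 0.2, 0.7, 0.4, 0.3, 0.6],
})

# Audit: compare AUC-ROC across patient subsets to flag performance gaps.
for group, subset in results.groupby("sex"):
    auc = roc_auc_score(subset["y_true"], subset["y_prob"])
    print(f"Subset {group}: AUC-ROC = {auc:.2f}")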

In terms of risk assessment, firstly: stakeholder perception. This carries a medium risk, and there is an imperative to communicate the values and trustworthiness of this project, to maintain the trust of patients and clinicians. Data protection is low risk, as MIMIC is a curated open dataset. Patient safety is low risk, as the artefacts in this project are not to be used in clinical trials or practice. Regulatory compliance is low risk as, again, the artefacts are not to be used in clinical trials or practice.

The artefacts created will be

Firstly, models: a baseline unimodal convolutional neural network model using only chest X-ray images, and a series of multimodal deep learning models integrating chest X-ray images with clinical data, implementing different fusion techniques. Secondly, resources: preprocessed datasets linking the MIMIC chest X-ray and MIMIC-IV data, along with the code, results, and the final project report.

Finally, the timeline of proposed activities

Step 1, in month one, will include the literature review and data preparation: completing the review of existing literature, obtaining data access, and preprocessing the data. Step 2, in months two to three, will be model development: developing the baseline and multimodal models, and training and tuning the models. Step 3, in months four to five, will be evaluation and analysis: analysis and comparison of model performance, and detailed analysis of fusion methods and clinical tabular feature importance. Step 4, in month six, will be the write-up: creation of a report detailing the findings, and preparation for presentation and publication.

These are the references

  • Acosta, J., Dong, M., Gaur, Y., Martinez, D., & Tennenhouse, D. (2022). Opportunities and challenges for multimodal AI in biomedicine. Nature Biomedical Engineering, 6(12), 1772-1786.
  • Ait Nasser, W., Coulibaly, M., & Guedria, S. (2023). The Promise of AI for Chest X-ray Diagnosis: A Review. IEEE Transactions on Medical Imaging, 42(1), 1-18.
  • Albahli, S., & Nazir, T. (2022). Automated Detection of Lung Diseases in Chest X-Ray Images using Deep Learning. Computers in Biology and Medicine, 144, 105335.
  • Chen, Y., Zhang, Q., & Li, H. (2023). Fusion of skin images and clinical data for skin disease diagnosis using multimodal deep learning. Medical Image Analysis, 82, 3293-3305.
  • Cui, Y., Han, D., & Wang, J. (2023). Theoretical framework and methods for multimodal learning. Pattern Recognition, 138, 109323.
  • Hsieh, C. (2023). Multimodal Deep Learning Model for Chest X-ray Diagnosis. Proceedings of the 2023 International Conference on Artificial Intelligence and Data Science, 1-9.
  • Johnson, A.E.W., Pollard, T.J., Shen, L., Li-wei, H.L., Feng, M., Ghassemi, M., ... & Mark, R.G. (2019). MIMIC-4, a freely accessible critical care database. Scientific Data, 6(1), 1-16.
  • Johnson, A.E.W., Pollard, T.J., Shen, L., Li-wei, H.L., Feng, M., Ghassemi, M., ... & Mark, R.G. (ND). MIMIC-IV. PhysioNet. Available at: https://physionet.org/content/mimiciv/1.0/.
  • Townend, J. (2024). Fusilli: A Python package for fusion models. Journal of Open Source Software, 9(82), 4302.

Thank you for listening.

Critical Reflection

The RPP video was the culmination of the work on this mini-dissertation. It was a great opportunity to articulate the central research question and to go into some detail about the supporting literature. For the slides, I used mostly graphics and quotes, avoiding too much text on each slide. I also followed the guidance on citations, which are critical for highlighting which material is novel. On reflection, the RPP video was a great showcase of the skills learned in this module, including the finding, analysis and presentation of literature, the articulation of a research question, and tidy presentation.