How will EUCAIM manage the legal requirements of the General Data Protection Regulation (GDPR) when collecting the data? In practical terms, will data leave the hospital? How will you ensure data privacy?

This is a key concern for any European project that aims to unlock access to health data. To ensure full compliance with data privacy principles, EUCAIM will be built following a privacy by design and by default approach.

There are currently two scenarios for the implementation of the infrastructure. In the first, the data is already anonymised (e.g., data held in fragmented repositories from previous EU-funded projects and other research repositories). In this scenario, privacy is no longer an issue. Instead, the challenge will be making these repositories interoperable so that they can populate the Atlas of Cancer Images. Data from established research (such as clinical trials) will be easier to include, as this data is already pseudonymised and needs only a final de-identification step to become fully anonymised.

In the second scenario, which concerns data held at hospitals and generated in real-world clinical practice, we are pursuing a federated learning approach: the data remains distributed and stored at the local sites, with all protective privacy measures in place. The data will not leave the hospital premises, which safeguards its privacy. As part of the project implementation, we will investigate how to do this securely and evaluate the associated risks. To make this possible, the infrastructure software will access the data warehouse at each hospital, so hospitals will need a structured data warehouse that follows the requirements to be defined by the EUCAIM project. Once data warehouses follow these standards and the data is safe and quality-controlled, extracting data from the Electronic Health Records (EHRs) into the data warehouse will be easier to harmonise and normalise. The project will provide a checklist to the hospitals for this purpose. If a hospital has enough computing resources, it may be possible to deploy AI solutions directly on the data in the hospital, again without the data having to leave. The discussion on the best data warehouse architectures has already started. The project will publish these guidelines and the checklist within the first year to provide technical guidance to individual hospitals.
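To illustrate the general idea of training while the data stays at each site, here is a minimal Python sketch of federated averaging, one common federated learning technique. It is not EUCAIM's implementation, which is still to be defined. The toy linear model, the function names (local_update, federated_average, federated_round) and the two hypothetical hospital datasets are all assumptions made for illustration. Only model weights travel between the sites and the coordinator; the records themselves never do.

```python
from typing import List, Sequence, Tuple

Weights = List[float]
Record = Tuple[float, float]  # hypothetical (feature, target) pair


def local_update(weights: Sequence[float], data: Sequence[Record],
                 lr: float = 0.05) -> Weights:
    """Run one gradient step of y = w0 + w1 * x on a single site's own data."""
    w0, w1 = weights
    n = len(data)
    grad0 = 2.0 / n * sum((w0 + w1 * x - y) for x, y in data)
    grad1 = 2.0 / n * sum((w0 + w1 * x - y) * x for x, y in data)
    return [w0 - lr * grad0, w1 - lr * grad1]


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> Weights:
    """Average the sites' weights, weighting each site by its record count."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


def federated_round(global_weights: Sequence[float],
                    sites: Sequence[Sequence[Record]],
                    lr: float = 0.05) -> Weights:
    """One round: each site trains locally, then only weights are combined."""
    updates = [local_update(global_weights, data, lr) for data in sites]
    sizes = [len(data) for data in sites]
    return federated_average(updates, sizes)


# Hypothetical data held separately at two sites, both following y = 2x.
hospital_a = [(0.0, 0.0), (1.0, 2.0)]
hospital_b = [(2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

weights: Weights = [0.0, 0.0]
for _ in range(500):
    weights = federated_round(weights, [hospital_a, hospital_b])

print(weights)  # intercept and slope learned without pooling the records
```

A real deployment would add secure communication, access control and a risk evaluation of what the shared weights can reveal, which is the kind of question the project says it will investigate.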

What is the role of proprietary data in the infrastructure created by the EUCAIM project?

Some of the data used may be protected under a license because it is proprietary to an organization or a project. All proprietary data will be identified in the data availability statement, which will also describe any restrictions on the reuse and federated use of this data. In the long term, EUCAIM will favour the storage of non-proprietary data within the Atlas of anonymised Cancer Images.

How will the quality of data be controlled and ensured? We know that annotations vary from one institution to another. Will you have mechanisms to foster data quality?

Curating the data is an important challenge for developing robust AI models. EUCAIM has tools to monitor image quality, in particular tools that annotate images through machine learning models and tools that evaluate the quality of newly acquired images through a quality index score. The project will further elaborate the data quality criteria and tools, and will further improve the data curation tools. The infrastructure is also expected to help find new self-learning methods to cluster the images and to provide such services to data providers.

Will the federated infrastructure include histological images?

The infrastructure will mainly address medical images from modalities such as MRI, CT, ultrasound, single photon emission computed tomography (SPECT), positron emission tomography (PET), X-ray, and mammography. In addition, the infrastructure is to be linked to other repositories, including biobanks for histological images, for example via BBMRI-ERIC or the infrastructure created by the BIGPICTURE project.

Are there any clinical questions of particular importance that medical researchers would like to see addressed in EUCAIM?

Oncology is the main clinical area that will be addressed. There is no focus on specific pre-defined tumours or clinical questions. The project will deal with practical unmet clinical questions coming from researchers, practitioners and innovators, such as easy recognition of small lesions, phenotyping of tumours, prognosis and treatment allocation, and, in the near future, evaluation of staging. The platform will be open to all types of tumours, including, for example, breast, lung, colorectal or skin cancer. Even though the platform will not initially focus on specific clinical questions, a set of clinical use cases is planned to be chosen by the end of the project for platform testing and validation. This will allow the different platform implementations to be fine-tuned in terms of requirements and capabilities.

For a more comprehensive overview, visit the FAQ section on our webpage: FAQ - Cancer Image Europe.

Please note that we are in the process of collecting further FAQs. This section will be updated continuously.