Even knowing how sub-populations with a similar profile responded to specific treatments would be a big help. However, characterizing these sub-populations requires pooling data from many different hospitals, with all the standardization and privacy issues that go with it. Supported by Health-RI, researchers have now come up with a novel solution – take the research question to the data rather than the data to the research question. That way, patient data never needs to leave the hospital.
Every day, clinicians have to make tough decisions. For example: Does my cancer patient need high dose radiation therapy or will a low dose do? Does radiation therapy need to be combined with chemotherapy, with the adverse side effects it may bring?
Most of today’s clinical guidelines are still based on whole-population averages for treatment efficacy and side effects, averages that span patients ranging in age from early adulthood to old age and patients with widely varying disease characteristics. But if you are a 20-year-old male, what you really want to know are the results and side effects of a specific treatment in other males aged around twenty with the same disease characteristics as you. Being able to tap into the clinical records of patients with a similar profile in terms of age, lifestyle, comorbidity etc., would therefore help clinicians improve their treatment recommendations. André Dekker, Professor of Clinical Data Science at Maastricht University and clinical researcher at MAASTRO Clinic, wants to make that possible without violating patient privacy.
A typical doctor might have seen the effect of a specific treatment on a few hundred patients but is unlikely to know what happened to patients in other hospitals, because that information is locked away behind another hospital’s security wall.
It’s not only clinical information about appropriate sub-populations that could be relevant for informed decision-making. For example, knowing whether similar patients were able to return to work after a particular treatment, and how quickly they were able to do so, could be very useful in helping patients to plan ahead. In the Netherlands, this type of information is recorded by Statistics Netherlands (CBS, Centraal Bureau voor de Statistiek), but to protect people’s privacy, hospitals and CBS can’t easily share it with each other.
The Personal Health Train
To make effective use of the wealth of information that’s hidden in hospital data repositories and government databases, Dekker and many collaborators have developed a concept called the ‘Personal Health Train’. “Using our Personal Health Train concept, we can give physicians access to data on the effect of a treatment in say 35,000 people, without sharing data in a centralized database. Instead, we bring research algorithms to the data wherever it happens to be, which means patient data doesn’t have to leave the hospital,” says Dekker.
Working with Data from China
The first thing participating hospitals need to do is set up datasets that can be used for research, with each data element linked to a license that describes what researchers are allowed to do with it, ranging from ‘nobody is allowed to do anything with this element’ to ‘anyone can use this element as they wish’. According to Dekker, these secure licensed datasets represent the ‘stations’ in the Personal Health Train network. To utilize the data, researchers send their research questions off on a so-called ‘train’ that calls at each of these ‘stations’, only taking on board the results of applying the research question to the data, not the data itself. In this way, hospitals, participants in clinical studies and patients remain in full control. By posing the same question to each data source and accumulating the answers they get back, researchers can answer questions such as: ‘Which data elements are most predictive of 5-year survival after lung cancer?’ With sufficient stations spread across the Netherlands, the results will be representative of sub-populations in the Netherlands as a whole.
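The station-and-train idea can be illustrated with a minimal Python sketch. It is not the Personal Health Train software itself: the `Patient`, `Station`, `survival_counts` and `send_train` names, the age band and the example records are all assumptions made up for illustration. The point it shows is that the question travels to each station and only aggregate counts travel back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout, invented for this sketch.
    age: int
    treated: bool
    survived_5y: bool


class Station:
    """A hospital's research dataset. Records stay inside this object."""

    def __init__(self, name, patients):
        self.name = name
        self._patients = list(patients)

    def run(self, question):
        # Apply the visiting question locally and hand back only its answer.
        return question(self._patients)


def survival_counts(patients):
    """Example question: 5-year survival among treated patients aged 18-25."""
    cohort = [p for p in patients if p.treated and 18 <= p.age <= 25]
    return {"n": len(cohort), "survived": sum(p.survived_5y for p in cohort)}


def send_train(stations, question):
    """Visit every station and accumulate the aggregate answers."""
    total = {"n": 0, "survived": 0}
    for station in stations:
        answer = station.run(question)
        for key in total:
            total[key] += answer[key]
    return total


# Fictional data for two stations.
stations = [
    Station("Hospital A", [Patient(20, True, True),
                           Patient(22, True, False),
                           Patient(60, True, True)]),
    Station("Hospital B", [Patient(19, True, True),
                           Patient(24, False, True)]),
]
result = send_train(stations, survival_counts)
print(result)  # {'n': 3, 'survived': 2}
```

In a real network the question would presumably also be checked against each data element’s license before it is allowed to run; that step is left out of this sketch.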
However, Dekker’s plans already reach far beyond the Netherlands.
Some 25 clinical centers worldwide, including hospitals in China, are already cooperating with one another using the Personal Health Train concept. According to Dekker, one of the biggest challenges was making the data from different sources interoperable.
“We don’t speak Chinese. So without any preparations, it’s very difficult for us to use data from Chinese hospitals. Furthermore, different hospitals use different software for storing health data. For our concept to work, each hospital needs to make their data interoperable, which can be as simple as ensuring that everyone uses a standard code to designate the column in an Excel spreadsheet that contains the patient’s age,” says Dekker. “Once the data is made FAIR – Findable, Accessible, Interoperable and Reusable – researchers can ask questions by sending their research algorithm to all participating medical centers. We started ten years ago with lung cancer. Later on, we added prostate cancer and breast cancer, and we keep on extending our cooperation by adding more types of cancer over time.”
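Dekker’s example of a standard code for the age column can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tooling the participating hospitals use: the `COLUMN_CODES` mapping, the codes `age_years` and `sex`, and the `harmonize` function are hypothetical, and CSV text stands in for an Excel spreadsheet.

```python
import csv
import io

# Hypothetical mapping from local column headers to shared codes.
COLUMN_CODES = {
    "leeftijd": "age_years",  # Dutch header
    "年龄": "age_years",       # Chinese header
    "Age": "age_years",
    "geslacht": "sex",
    "性别": "sex",
}


def harmonize(csv_text):
    """Return the rows with known local headers renamed to the shared codes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {COLUMN_CODES.get(header, header): value for header, value in row.items()}
        for row in reader
    ]


dutch_rows = harmonize("leeftijd,geslacht\n20,M\n")
chinese_rows = harmonize("年龄,性别\n63,F\n")
```

After this step both hospitals expose the same column names, so one research algorithm can read either dataset without knowing the local language or software.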
So far, the results have been impressive. “Physicians find it hard to predict whether a lung cancer patient will survive, and in most cases their predictions aren’t any better than tossing a coin. Yet our models made the right prediction for 75% of patients,” says Dekker. He is, however, keen to point out that models in general need to get better, because one interesting finding was that people are more accepting of mistakes made by humans than they are of mistakes made by computers!
Supported by Health-RI
Every medical center that joins the network needs to nominate someone who is responsible for getting permission to use their organization’s patient data. However, according to Dekker this is rarely difficult, because privacy officers are already enthusiastic about the Personal Health Train approach. To help them curate their data and make it FAIR, Health-RI already hosts a number of Wiki pages and software tools. Anonymizing the data is supported with tools developed by the Dutch TraIT (Translational Research IT) consortium, now an integral part of Health-RI, which can automatically remove patient names and ages from medical images such as CT and MRI scans. TraIT tools can also help with data standardization – for example, automatically recognizing key words such as ‘man’ or ‘woman’ in medical records to insert the correct gender code in spreadsheets.
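The keyword recognition described above can be sketched roughly as follows. This is not the TraIT code: the keyword table, the codes “M” and “F”, and the `gender_code` function are assumptions for illustration, and a production tool would need to handle far more phrasing than this.

```python
import re

# Hypothetical keyword table; the codes "M" and "F" are assumed, not TraIT's.
GENDER_KEYWORDS = {
    "man": "M",
    "male": "M",
    "woman": "F",
    "female": "F",
    "vrouw": "F",  # Dutch for "woman"
}

# Whole-word match, so "man" is not found inside "woman".
_PATTERN = re.compile(
    r"\b(" + "|".join(GENDER_KEYWORDS) + r")\b", re.IGNORECASE
)


def gender_code(text):
    """Return a gender code, or None if the text is silent or ambiguous."""
    codes = {GENDER_KEYWORDS[word.lower()] for word in _PATTERN.findall(text)}
    return codes.pop() if len(codes) == 1 else None
```

Returning `None` when a record mentions both genders, or neither, leaves the cell for a human curator to fill in rather than guessing.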
As the Personal Health Train starts to be used for more projects and more diseases, the idea is that each University Medical Center in the Netherlands will have its own part-time Health-RI person on-site to maintain the UMC’s existing data ‘stations’ and add new ones. To offset costs, revenues such as royalty fees that result from research based on the data would be split between the researchers and the data provider.
Taking a holistic approach
Dekker and his team are also working hard to extend the range of data providers beyond clinical organizations. As a first step in the process, they have initiated a pilot project with Statistics Netherlands (CBS) that will look into the effects of a person’s socio-economic environment on their risk of getting diabetes – a holistic approach to diabetes prevention.
“Using CBS data for health research is a promising way to further improve the quality of Dutch health care,” says Magchiel van Meeteren, managing director of CBS’s Center for Big Data Statistics.
CBS has a wealth of privacy-sensitive information about Dutch patients, such as family compositions, educational attainment, where people live, what they do for a living, how much money they earn, and what the school results are like in their area. It also holds environmental data such as whether sports facilities are readily available in someone’s neighborhood, and the local air quality.
In theory, therefore, CBS could answer questions that would help clinicians and patients to make more informed choices. For example, what percentage of patients return to their jobs after a specific type of treatment for their disease? Provided it has the patient’s consent, CBS could also provide valuable background information about patients taking part in clinical trials. The Personal Health Train offers a potential solution to maintaining the privacy of CBS data, but due to the sensitivity of the data, the system’s privacy protection needs to be rigorously tested – something that’s currently being done in a diabetes research project funded under the NWO National Science Agenda (NWA, Nationale Wetenschapsagenda) program.
“This new project poses a lot of new ethical and security questions and we’re happy that we can use the expertise of Health-RI and don’t have to start from scratch. Specialists in legal and ethical issues are involved from the start,” says van Meeteren. “However, CBS can only provide real data if the IT infrastructure used is proven to be safe. So our priority is to build a safe IT infrastructure, which is why at the moment we are only providing fictional data in the project. We provide different types of data and these data need to ‘meet’ the data of the hospitals in order to analyze it.”
To ensure the required privacy, the project team have come up with a novel solution. Both datasets will be encrypted and brought together in a trusted secure environment where no human being can access them. Within this secure environment, the data will be analyzed automatically against the research question, and once that analysis is complete and the research question answered, the data will be instantly and automatically destroyed.
Van Meeteren believes that the efforts of Health-RI to establish a national research infrastructure in the Netherlands will be of significant help in the endeavor: “When all hospitals in the Netherlands provide data in the same way, the results of our analyses will be better, because there will be fewer distortions caused by differences in the data structure.”
Dekker agrees, also stating that another advantage of bundling capabilities into Health-RI is that a single helpdesk, managed by Health-RI, will help to ensure that everybody uses the same interpretation of the laws concerning data protection. Furthermore, this helpdesk will provide legal advice, such as templates for standard contracts between researchers and hospitals.
“These services save researchers a lot of time. Through Health-RI we have easy access to the knowledge acquired in BBMRI projects about the legal aspects of working with health data, about how to formulate contracts with hospitals, how to make data anonymous, and when to ask for additional consent from the patient,” says Dekker. “In that way researchers don’t have to reinvent the wheel and make the same mistakes other people have made before, which means that government money can be spent more efficiently.”
It is important to achieve continuity
It’s by achieving efficiency improvements that Dekker believes the Dutch government can be persuaded to provide on-going support for Health-RI.
“We sincerely hope the Dutch government will support Health-RI and enable us to continue our work. We want to contribute to the usage of the wealth of information that’s available in Dutch hospitals in a well-organized, sustainable national infrastructure,” he says. “For our first big projects we’re using a patchwork of grants, partly from the EU, partly from NWO-TTW* and partly from the Dutch and Australian governments, all of which will eventually expire. For new projects, you can get new grants, but we also want to maintain what we’ve already built to the benefit of patients. If there’s no money to maintain what we have, the system will start to fail and need to be rebuilt at higher cost for the next project.”
* NWO-TTW (NWO-domein Toegepaste en Technische Wetenschappen) is the NWO domain for Applied and Engineering Sciences.