06.09.2018

AI is data hungry: The challenges of data prep for medtech

Artificial intelligence has tremendous potential for the healthcare system. But its success requires large quantities of accurate data, among other things.

Artificial intelligence can provide insights that would not otherwise be evident to healthcare providers. It can find patterns in both structured and unstructured data, and its capabilities extend well beyond what was previously thought possible.

Artificial intelligence in healthcare requires a continual input of fresh data. Without new data, a model's behavior may change in unpredictable ways. New data are also needed because trends change over time, and models must account for those changes. However, there are concerns and challenges around the data needed to use AI in the healthcare industry.

Because we increasingly rely on AI models and their predictions, and those predictions may worsen over time, we should be concerned about the integrity of the data that is used.
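One way to notice predictions worsening over time is to track accuracy per time period against a baseline. The following is a minimal sketch under stated assumptions, not a method described in the article: the function names, the quarterly labels, the 0.05 tolerance, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_period(rows):
    """rows: iterable of (period, predicted, actual). Returns {period: accuracy}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for period, predicted, actual in rows:
        counts[period] += 1
        hits[period] += int(predicted == actual)
    return {p: hits[p] / counts[p] for p in sorted(counts)}


def degraded_periods(accuracy, baseline, tolerance=0.05):
    """List periods whose accuracy fell more than `tolerance` below the baseline."""
    return [p for p, acc in accuracy.items() if acc < baseline - tolerance]


# Made-up predictions and outcomes, for illustration only.
rows = [
    ("2018-Q1", 1, 1), ("2018-Q1", 0, 0), ("2018-Q1", 1, 1), ("2018-Q1", 0, 1),
    ("2018-Q2", 1, 0), ("2018-Q2", 0, 1), ("2018-Q2", 1, 1), ("2018-Q2", 0, 0),
]
acc = accuracy_by_period(rows)
flagged = degraded_periods(acc, baseline=acc["2018-Q1"])
print(acc, flagged)
```

A flagged period would be a prompt to investigate and possibly retrain on fresh data. A real deployment would likely need a clinically meaningful metric and a far larger sample per period.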


Another problem is that electronic health record (EHR) systems often differ from hospital to hospital. The format and semantics (meaning) of data may not be the same across different EHRs, or even across different deployments of the same EHR. As a consequence, data from different hospitals, clinics, and trials are not interoperable, and they have to be integrated and normalized into a standard form before further analysis can take place. This adds time and complexity to the process but is absolutely necessary. This was evident at Massachusetts General Hospital during its move from an old system to a new Epic EHR system: data had to be extracted from the old system and converted before being incorporated into the new one. The transition was also tricky because staff had to be trained on the new system.
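To make the normalization step concrete, here is a minimal sketch that maps records from two imaginary EHR exports onto one common schema. Everything in it is an assumption for illustration: the source names, field names, date formats, and the choice of kilograms are invented and do not reflect Epic or any other real system.

```python
from datetime import datetime

# Hypothetical field mappings for two imaginary EHR exports (not real schemas).
FIELD_MAPS = {
    "hospital_a": {"pt_id": "patient_id", "dob": "birth_date", "wt_lb": "weight"},
    "hospital_b": {"PatientID": "patient_id", "BirthDate": "birth_date", "WeightKg": "weight"},
}
DATE_FORMATS = {"hospital_a": "%m/%d/%Y", "hospital_b": "%Y-%m-%d"}
LB_PER_KG = 2.20462


def normalize(record, source):
    """Map one source record onto a common schema: ISO dates, weight in kg."""
    mapping = FIELD_MAPS[source]
    out = {target: record[src] for src, target in mapping.items()}
    parsed = datetime.strptime(out["birth_date"], DATE_FORMATS[source])
    out["birth_date"] = parsed.date().isoformat()
    weight = float(out.pop("weight"))
    if source == "hospital_a":  # assumption: this source reports pounds
        weight = weight / LB_PER_KG
    out["weight_kg"] = round(weight, 1)
    return out


a = normalize({"pt_id": "A-17", "dob": "03/09/1961", "wt_lb": "176"}, "hospital_a")
b = normalize({"PatientID": "B-42", "BirthDate": "1961-03-09", "WeightKg": "79.8"}, "hospital_b")
print(a)
print(b)
```

After normalization both records share the same keys, date format, and units, so they can be pooled for analysis. Real integrations typically also have to reconcile coded vocabularies and differences in meaning, which this sketch does not attempt.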

Data in EHRs can be inconsistent and incomplete. This was evident in a study of the survival of pancreatic cancer patients that used data extracted from an EHR. Data mining revealed several problems. For instance, 52% of patients had no information on the stage of their disease, such as tumor size, so a great deal of data had to be excluded from the analysis. Other fields were also incomplete: the duration of medical treatments such as chemotherapy was often not recorded.

This study showed the problems that arise when insufficient data are recorded. Incompleteness could stem from the healthcare provider, from the patient, or from both. It is crucial that data be accurate and complete, not only for use in decision-making and healthcare management but also for secondary use in clinical research studies.
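A simple completeness report can surface gaps like these before analysis begins. The sketch below computes the share of records missing each field; the function name, field names, and sample records are hypothetical and are not taken from the study.

```python
def missing_rate(records, fields):
    """Return the fraction of records in which each field is absent or empty."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = missing / total if total else 0.0
    return rates


# Made-up records, for illustration only.
records = [
    {"patient_id": "1", "stage": "II", "chemo_days": 90},
    {"patient_id": "2", "stage": None, "chemo_days": None},
    {"patient_id": "3", "stage": "", "chemo_days": 120},
    {"patient_id": "4", "stage": "III"},
]
rates = missing_rate(records, ["stage", "chemo_days"])
print(rates)
```

Fields with a high missing rate can then be reviewed with the people who enter the data, rather than being discovered only when records drop out of a study.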

Privacy and regulatory constraints

Another challenge for AI in healthcare is that significant privacy issues must be addressed when collecting and sharing health data. These issues can make access to data difficult and time-consuming. Even data that is routinely used in clinical trials or research studies, including longitudinal studies, can be difficult and expensive to obtain.

Healthcare data is extremely sensitive, and it is understandable that people do not want their personal records to be widely shared. People may be embarrassed, fired, or denied jobs or insurance because their health information was shared. Sharing data requires strict rules and regulations to ensure that the privacy of individual patients is not violated. Countries have varying laws regarding the use of healthcare data, and it is very important that these regulations be followed. For example, British Columbia has passed specific laws regarding the use of data with diverse levels of support.

Data integrity

Healthcare providers may be overworked and not have the time to focus on data entry. A lack of training may also mean that such providers do not recognize the value of the data. The reality is that big data is critical for an effective and useful AI system. Big data and AI in healthcare are projected to grow and be worth more than $10 billion by the year 2024. Employees should therefore be trained not only on what data to enter and how to enter it correctly, but also on why the data is relevant and important. Models that are based on erroneous and inaccurate data will generate erroneous and flawed results. This can have costly ramifications since these AI models and results are used in decision-making and healthcare management.

Data need to be complete, clean and accurate in order to be used in AI systems. That has to happen in each clinic, hospital, lab or research center, since models often have to be tuned for local variations in population or clinical guidelines. AI models can be useful and can improve the decision-making capabilities of doctors, but that usefulness and value depend on the integrity of the local data. Better education on the tremendous value of this data may encourage healthcare providers to take data entry seriously. Data are useful and crucial in providing good care and even in evaluating disease trends.

It is also important to realize that unstructured data in free-text form may be as important as structured data. A good natural language processing tool can deal with the problems of unstructured data. The data that are entered into systems have to be reliable and accurate. It is therefore important that data improvement initiatives be introduced so that healthcare providers can rely on data that has been entered into a system.


Incorporating AI into healthcare presents unique challenges that other industries don’t face. Data has to be accurate, reliable and complete, but it also has to carry the same meaning across locations and remain high quality over time. Ensuring that AI models keep learning over time and from local data is important to getting high-quality results. Privacy is another major concern, which can be addressed by following regulations regarding data use. When building AI systems, it’s important to work with teams and products that have strong healthcare AI domain expertise, are aware of these challenges and have successfully addressed them before.

Source: Luke Potgieter, John Snow Labs
Medical Design & Outsourcing