How Can We Use AI to Preserve Privacy in Biomedicine?

Artificial Intelligence (AI) is an emerging and innovative field set to change every industry at its core. AI learns from available data how to solve tasks, and for many well-defined tasks it surpasses humans in both efficiency and accuracy. It has shown great promise through successful integration in fields such as autonomous driving and voice assistants.

The advantages of AI make its integration into biomedicine and healthcare essential and inevitable. AI allows the big data generated in healthcare to be broken down and analyzed, yielding a deeper understanding and revealing patterns and risks that the human mind might overlook.

AI has shown great promise in healthcare by analyzing genomic and biomedical data, representing drug-like molecules, and modeling cells and their functions. These success stories are not limited to biomedical research; they extend to diagnosing conditions and supporting inpatient care, where AI systems have surpassed human accuracy in detecting breast cancer and predicting sepsis. These advantages make AI a field that demands attention, as it can revolutionize the accuracy of healthcare and our understanding of biomedicine. Alongside these benefits, however, come serious privacy concerns about the data that AI uses.

AI depends on learning from data collected from individuals, so keeping sensitive data confidential is crucial to its continued advancement. Studies have revealed that AI techniques do not always maintain data privacy. In one study, for example, an individual's inclusion in a dataset could easily be deduced by querying for the presence of a specific allele, and the result could then be used to identify family members.

Membership inference is another attack, in which an adversary infers an individual's membership in a study by querying the available data or the summary statistics published by genome-wide association studies (GWAS). These findings have led to restricted access to pseudonymized data and to the introduction of privacy laws governing such data in the U.S. and EU. Consequently, to allow collaborative research, industry and academic institutions must adopt privacy-preserving techniques that ensure confidentiality and compliance with the law.
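
To make the attack concrete, below is a toy sketch in the spirit of the classic Homer-style test statistic: it compares an individual's genotype against published study allele frequencies and a public reference population. The SNP count, cohort size, and encoding are illustrative assumptions, not data from any real study.

```python
import numpy as np

def membership_score(genotype, study_freqs, reference_freqs):
    """Homer-style test statistic: positive values suggest the individual's
    genotype pulled the published study frequencies toward itself, i.e.
    that the individual contributed to the study."""
    return np.sum(np.abs(genotype - reference_freqs)
                  - np.abs(genotype - study_freqs))

rng = np.random.default_rng(42)
n_snps, cohort = 1000, 100
reference_freqs = rng.uniform(0.1, 0.9, n_snps)          # public reference panel
target = (rng.random(n_snps) < reference_freqs) * 1.0    # target's alleles (0/1)
others = (rng.random((cohort - 1, n_snps)) < reference_freqs) * 1.0

freqs_with_target = (target + others.sum(axis=0)) / cohort  # target in the study
freqs_without = others.mean(axis=0)                          # target not in it

print(membership_score(target, freqs_with_target, reference_freqs))  # tends to be positive
print(membership_score(target, freqs_without, reference_freqs))      # tends toward zero
```

The point is that published aggregate frequencies alone, with no raw genotypes, can leak membership; this is exactly why GWAS summary statistics came under access restrictions.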

Privacy-preserving AI techniques
Various AI techniques have recently been proposed to preserve privacy in biomedicine. They can be grouped into four broad categories: cryptographic techniques, differential privacy, federated learning, and hybrid approaches.

Cryptographic techniques rely on homomorphic encryption (HE), which allows statistics to be computed directly on encrypted data while preserving its privacy. HE comes in two main flavors, partially homomorphic encryption (PHE) and fully homomorphic encryption (FHE), which differ in the operations they support: PHE permits only one type of operation on the encrypted data (either addition or multiplication), whereas FHE supports arbitrary combinations of both. Another cryptographic technique is secure multiparty computation (SMPC). Here, each participating organization splits its private input into secret shares and distributes a different share to each of several computing parties; each party computes on the shares it holds and exchanges its partial result with the other computing parties, which combine the partial results into a final output. Only this final result, identical for all, is returned to the participants; no single party ever sees the raw data.
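
As a concrete illustration of the SMPC idea, here is a minimal sketch of additive secret sharing in Python. The hospital counts, the number of computing parties, and the field modulus are illustrative assumptions, not part of any specific protocol.

```python
import secrets

PRIME = 2**61 - 1  # field modulus; an arbitrary choice for illustration

def share(value, n_parties):
    """Split an integer into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; any subset smaller than all of them reveals nothing."""
    return sum(shares) % PRIME

# Two hospitals each secret-share a private patient count across three
# computing parties; no single party ever sees 120 or 85.
hospital_a = share(120, 3)
hospital_b = share(85, 3)

# Each computing party adds the two shares it holds locally...
partial_sums = [(a + b) % PRIME for a, b in zip(hospital_a, hospital_b)]

# ...and only the combined final result is revealed to the participants.
print(reconstruct(partial_sums))  # 205
```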

Differential privacy is one of the most widely adopted methods for limiting data privacy breaches. The idea is to add calibrated random noise to results computed over large datasets, camouflaging any individual's sensitive data. It has become a standard method used by Google, Apple, and the United States Census Bureau, and, more recently, in healthcare and biomedicine. It is ideal for large centralized datasets, because a result computed on a dataset containing a specific individual's data is statistically almost indistinguishable from the same result computed without that individual's data.
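
The classic way to achieve this guarantee is the Laplace mechanism, which scales the noise to a query's sensitivity and a privacy budget epsilon. A minimal sketch follows; the count, epsilon, and sensitivity values are illustrative assumptions.

```python
import numpy as np

def private_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon.
    A counting query has sensitivity 1: adding or removing one person
    changes the true answer by at most 1."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# e.g. "how many patients carry a given allele?"
# A smaller epsilon means stronger privacy but a noisier answer.
print(private_count(true_count=42, epsilon=0.5))
```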

Federated learning relies on participants, such as hospitals, not sharing the data they hold but instead extracting knowledge from it locally. Each institution trains an AI model on its own data and shares only the resulting model parameters, without the sensitive information, with a coordinator that aggregates them into a global model. Demand for this approach has driven the growth of AI platforms that aim to apply federated learning to data in the healthcare sector.
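
The coordinator step typically follows the federated averaging pattern: each client trains locally and the coordinator averages the returned parameters, weighted by local dataset size. Below is a toy sketch for a linear model; the two "hospitals", their random data, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few gradient steps on a linear model."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Coordinator step: average client models, weighted by dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Two hypothetical hospitals with private local datasets.
rng = np.random.default_rng(0)
X1, y1 = rng.normal(size=(100, 3)), rng.normal(size=100)
X2, y2 = rng.normal(size=(60, 3)), rng.normal(size=60)

global_w = np.zeros(3)
for _ in range(10):  # communication rounds
    w1 = local_update(global_w, X1, y1)  # raw data never leaves hospital 1
    w2 = local_update(global_w, X2, y2)  # raw data never leaves hospital 2
    global_w = federated_average([w1, w2], [len(y1), len(y2)])
print(global_w)
```

Note that only the weight vectors cross institutional boundaries; the patient-level arrays X1, y1, X2, y2 stay where they were collected.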

An emerging technique to preserve privacy is a hybrid approach that combines federated learning with other methods, such as cryptographic techniques or differential privacy. The advantage of federated learning is that patients' data is never shared with third parties; however, the parameters shared with the coordinator could be abused if the coordinator is compromised. Therefore, encrypting those shared parameters with HE, or perturbing them with random noise before sharing, can help close this remaining avenue for a data privacy breach.
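
One common hybrid pattern is to perturb each client's update with differentially private noise and additionally mask it so the coordinator only ever sees the sum. A toy sketch combining both ideas follows; the updates, epsilon, sensitivity, and the two-client pairwise mask are illustrative assumptions, not a production protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_noise(update, epsilon, sensitivity):
    """Client-side: add Laplace noise to a parameter update before sharing."""
    return update + rng.laplace(0.0, sensitivity / epsilon, size=update.shape)

# Each client perturbs its update, then the pair applies opposite random
# masks (a minimal secure-aggregation trick): the masks cancel in the sum,
# so the coordinator sees the aggregate but never an individual update.
update_a = np.array([0.20, -0.10, 0.40])   # hospital A's model update
update_b = np.array([0.10, 0.30, -0.20])   # hospital B's model update
mask = rng.normal(size=3)                  # shared pairwise secret

masked_a = dp_noise(update_a, epsilon=1.0, sensitivity=0.1) + mask
masked_b = dp_noise(update_b, epsilon=1.0, sensitivity=0.1) - mask

aggregate = masked_a + masked_b            # masks cancel exactly
print(aggregate)                           # noisy sum of the two updates
```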

The emergence of AI has put a spotlight on how sensitive patient data is handled and used. This is particularly important in biomedicine and healthcare, where trust is crucial both for ethical reasons and for the progression of medicine. As the techniques above show, AI can be deployed in ways that prevent data privacy breaches and ensure that patient confidentiality is preserved.
