Study: Machine Learning, Artificial Intelligence Reidentify Wearable Device User Data

Wearable activity trackers such as Fitbits, step counters, and bike ride trackers may not be doing enough to deidentify the data they collect from users, according to researchers who analyzed large national physical activity data sets.

Researchers from the University of California at Berkeley note in a study published in JAMA Network Open that although device manufacturers aggregate physical activity data and remove protected health information (PHI) before sharing it, no prior research had investigated whether third-party users of that deidentified data could reidentify the individuals in it. To test this, the investigators performed a cross-sectional study of national physical activity data from 14,451 individuals in the National Health and Nutrition Examination Survey (NHANES), applying two machine learning methods: support vector machines (SVMs) and random forests. Among them, they successfully reidentified 4,720 adults and 2,427 children.
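The article does not describe the models in detail, so the following is only a rough illustration of the general idea, not the study's method. It is a minimal Python sketch, assuming synthetic data: each hypothetical person has a stable hourly activity profile, one noisy copy of which is released with an ID and a second noisy copy with the ID stripped. To stay dependency-free, it swaps the study's SVM and random forest models for a simple nearest-neighbor matcher; the function names (`make_records`, `nearest_neighbor_match`, `reidentification_rate`) and all parameters are hypothetical.

```python
import math
import random


def make_records(n_people, n_features=24, noise_sd=50.0, seed=0):
    """Build synthetic data: one labeled and one deidentified record per person.

    Hypothetical setup: each person has a stable profile of n_features values
    (think average steps per hour). Each record is that profile plus
    independent Gaussian noise. The deidentified list is shuffled and carries
    no ID; `truth` holds the hidden IDs only so the match rate can be scored.
    """
    rng = random.Random(seed)
    labeled, hidden = [], []
    for person_id in range(n_people):
        profile = [rng.uniform(0, 1000) for _ in range(n_features)]
        labeled.append((person_id, [x + rng.gauss(0, noise_sd) for x in profile]))
        hidden.append((person_id, [x + rng.gauss(0, noise_sd) for x in profile]))
    rng.shuffle(hidden)  # released order reveals nothing about identity
    deidentified = [features for _, features in hidden]
    truth = [person_id for person_id, _ in hidden]
    return labeled, deidentified, truth


def nearest_neighbor_match(labeled, record):
    """Return the ID of the labeled record with the smallest Euclidean distance."""
    best_id, best_dist = None, math.inf
    for person_id, features in labeled:
        dist = math.dist(features, record)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id


def reidentification_rate(labeled, deidentified, truth):
    """Fraction of deidentified records matched back to the correct person."""
    hits = sum(
        nearest_neighbor_match(labeled, record) == person_id
        for record, person_id in zip(deidentified, truth)
    )
    return hits / len(truth)


if __name__ == "__main__":
    labeled, deidentified, truth = make_records(n_people=200)
    rate = reidentification_rate(labeled, deidentified, truth)
    print(f"Matched {rate:.1%} of deidentified records")
```

In this toy setup, small noise leaves nearly every record matchable, while raising `noise_sd` well above the spread between people pushes the match rate toward chance.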

The ability to reidentify so many individuals has implications for policymakers, healthcare providers, health IT vendors, device manufacturers, and, of course, consumers.

“Reidentification of data is not just theoretical but has been demonstrated in several contexts,” investigators wrote. “For instance, demographics in an anonymized data set can function as a quasi-identifier that is capable of being used to reidentify individuals. Reidentification is also possible using online search data, movie rating data, social network data, and genetic data.”
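The quasi-identifier point can be made concrete with a short Python sketch using four hypothetical records (not study data). It counts how many records are unique on a combination of demographic columns and reports the smallest group size, a measure commonly called k-anonymity; that term, the column names, and the helper functions are illustrative assumptions rather than anything described in the study.

```python
from collections import Counter

# Hypothetical released records: no names, but demographics remain.
sample_records = [
    {"age": 34, "sex": "F", "zip3": "947", "daily_steps": 8200},
    {"age": 34, "sex": "F", "zip3": "947", "daily_steps": 4100},
    {"age": 51, "sex": "M", "zip3": "947", "daily_steps": 6300},
    {"age": 29, "sex": "M", "zip3": "100", "daily_steps": 12000},
]


def group_keys(records, quasi_identifiers):
    """Project each record onto the chosen quasi-identifier columns."""
    return [tuple(r[q] for q in quasi_identifiers) for r in records]


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = group_keys(records, quasi_identifiers)
    counts = Counter(keys)
    return sum(counts[k] == 1 for k in keys) / len(keys)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing identical quasi-identifier values."""
    return min(Counter(group_keys(records, quasi_identifiers)).values())


if __name__ == "__main__":
    qi = ["age", "sex", "zip3"]
    print(unique_fraction(sample_records, qi))  # 0.5: two records are one of a kind
    print(k_anonymity(sample_records, qi))  # 1: someone is alone in their group
```

A record that is alone in its group (k = 1) can be tied to a person by anyone who already knows that person's age, sex, and ZIP code prefix; coarsening or dropping columns, as with `sex` alone here, raises k.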

Researchers cited cases in which location data from activity trackers could be used to reveal the locations of military sites, prompting the military to begin restricting the disclosure of location information.

“However, device manufacturers continue to share deidentified physical activity data with individuals’ employers, advertisers, and health care organizations. Thus, it is vital to be able to quantify the privacy risks from sharing such data,” they wrote.

Mary Butler is the associate editor at Journal of AHIMA.