Annie Palmer | The Daily Mail
Many facial recognition systems are being trained using millions of online photos uploaded by everyday people and, more often than not, the photos are being taken without users' consent, an NBC News investigation has found.
In one worrying case, IBM scraped almost a million photos from unsuspecting users on Flickr to build its facial recognition database.
The practice not only raises privacy concerns, but also fuels fears that the systems could one day be used to disproportionately target minorities.
IBM's database, called 'Diversity in Faces,' was released in January as part of the company's efforts to 'advance the study of fairness and accuracy in facial recognition technology.'
The database was released following a study from MIT Media Lab researcher Joy Buolamwini, which found that popular facial recognition services from Microsoft, IBM and Face++ vary in accuracy based on gender and race.
The Diversity in Faces dataset is based on 100 million images published with Creative Commons licenses, which allow anyone to reuse the photos without paying a licensing fee.
However, only academic or corporate research groups can request access to the Diversity in Faces database, according to NBC News.
NBC News was able to view the contents of IBM's database only after obtaining it from a source.
Once the photos are collected, they're then tagged by age, measurements of facial attributes, skin tone, gender and other characteristics.
Many photographers were surprised to find their photos had been used to train IBM's algorithms.
'None of the people I photographed had any idea their images were being used in this way,' Greg Peverill-Conti, who had 700 of his photos used in the dataset, told NBC News.
'It seems a little sketchy that IBM can use these pictures without saying anything to anybody.'
IBM defended the database, saying that it helps ensure fairness in facial recognition technology and promising to protect 'the privacy of individuals.'
'IBM has been committed to building responsible, fair and trusted technologies for more than a century and believes it is critical to strive for fairness and accuracy in facial recognition,' an IBM spokesperson told Mail Online in a statement.
'We take the privacy of individuals very seriously and have taken great care to comply with privacy principles, including limiting the Diversity in Faces dataset to publicly available image annotations and limiting the access of the dataset to verified researchers.
'Individuals can opt-out of this dataset,' the spokesperson added.
The firm has also argued that the dataset would not be used for its commercial products and would be used only for research purposes.
IBM told NBC News it would assist anyone who wanted their photos removed from the training dataset.
Despite this, NBC News found that it was almost impossible for users to prevent their photos from being used.
To request removal, photographers have to email IBM with links to each photo they want taken down.
But the contents of the database aren't publicly available, so it's extremely difficult for photographers to know which of their photos have been swept up in the database.
Flickr users whose photos were scraped voiced concerns about the database.
'Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy,' Georg Holzer, whose photos were used, told NBC News.
'I can never approve or accept the widespread use of such a technology.
'Since I assume that IBM is not a charitable organization and at the end of the day wants to make money with this technology, this is clearly a commercial use,' he added.
Additionally, experts pointed out that IBM isn't the only organization potentially using users' photos without their permission.
The proliferation of content supplied to social networking sites like Facebook, Google, YouTube and others has made it that much easier for researchers to find data for their studies.
'This is the dirty little secret of AI training sets,' Jason Schultz, a professor at the NYU School of Law, told NBC News.
'Researchers often just grab whatever images are available in the wild.'
The findings come as tech giants ranging from Amazon to Microsoft have faced growing scrutiny from human rights and privacy advocates over their facial recognition software.
Amazon, in particular, has dealt with pushback over its decision to sell its 'Rekognition' software to government agencies.