Can artificial intelligence recognize hate speech?

This data representation shows common terms used in hateful comments on Reddit, Twitter, and other social media sites compared to non-hateful comments. (Courtesy of the Anti-Defamation League)

BERKELEY, Calif. — A group of researchers is fighting online hate speech by teaching computers to recognize it on social media platforms.

The Online Hate Index project out of the D-Lab at the University of California, Berkeley, in partnership with the Anti-Defamation League, aims to identify hate speech, study its impact, and eventually design a plan to counteract hateful content.

Using artificial intelligence, teams of social scientists and data analysts are working to code programs that can search through thousands of posts looking for malicious content, said Claudia Von Vacano, executive director of digital humanities at Berkeley. Right now, the program correctly identifies about 85 percent of hate speech even though the project is in its early stages.

The software is used in connection with a problem-solving lab of experts, helping companies to navigate the line between protected free speech and content dangerously targeting marginalized groups, Von Vacano said.

The Online Hate Index was started in 2012 by Brittan Heller, director of technology and society at the Anti-Defamation League, and Von Vacano.

It began by targeting hate speech on Reddit, the popular web forum. The project then attracted interest from companies such as Google, Twitter, and Facebook, which formed partnerships with the ADL and the D-Lab and plan to use the Online Hate Index on their platforms, Von Vacano said.

Daniel Kelly, assistant director of policy and programs for the Anti-Defamation League, explained that the ADL began working to fight online hate in 2014, when it released guidelines for companies hoping to limit damage done by extremists online. The Online Hate Index is an innovative project that is designed to target aspects of online hate that have been overlooked by similar studies, he said.

“What we are doing is using machine learning and social science to understand hate speech in a new way,” Kelly said. “We are taking it from the perspective of targets of hate online.”

Kelly said that the project aims to be transparent by lifting the “black veil” that covers data and analytics from social media companies. Many companies keep their data and statistics private when it comes to terms of service and user policies. One of his main concerns about data coming from these companies is that the ADL and D-Lab don’t know whether these policies incorporate the perspectives of the marginalized groups who are affected by them.

Both the D-Lab and the ADL recruited research team members with diverse perspectives and backgrounds, including varying ethnicities, genders, and academic fields, said Von Vacano, who is also in charge of recruitment for the Online Hate Index project.

“Our linguist, for example, is delving deeper into issues of threat,” Von Vacano said.

One of the largest challenges faced by the teams was defining the intensity of statements made by Reddit users, Von Vacano said, as hate speech is not clearly defined. To solve this problem, the ADL and D-Lab use a scale to characterize posts. At the first degree of biased posts, someone might hint at hateful opinions. Next, hateful content may become dehumanizing to a whole class of people. The most extreme examples of online hate are direct threats to individuals. Examples of online threats include doxing, where people with malicious intent publish information, like a home address or phone number, that puts someone in harm’s way and leaves them vulnerable to unwanted attention or visitors.

“Going into the project, we kind of naïvely thought that we could ingest large amounts of text and, at the other end, say on a binary level ‘this is hate… this is not hate,’” Von Vacano said. “At this point, we have a much more sophisticated understanding of hate speech as a linguistic phenomenon, and we are really dissecting hate speech as a construct with multiple components.”

In February 2018, the first stage of the project was completed, and more information can be found on the ADL’s website. Phase two is scheduled to be released in July.