GB2572320A - Hate speech detection system for online media content - Google Patents

Hate speech detection system for online media content

Info

Publication number
GB2572320A
GB2572320A
Authority
GB
United Kingdom
Prior art keywords
content
contentious
reviewing
determining
classifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1803954.5A
Other versions
GB201803954D0 (en)
Inventor
Ghulati Dhruv
Waseem Zeerak
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Factmata Ltd
Original Assignee
Factmata Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Factmata Ltd filed Critical Factmata Ltd
Priority to GB1803954.5A (GB2572320A)
Publication of GB201803954D0
Priority to PCT/GB2019/050693 (WO2019175571A1)
Priority to US15/733,603 (US20210019339A1)
Publication of GB2572320A
Legal status: Withdrawn (current)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 — Handling natural language data
    • G06F 40/30 — Semantic analysis
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 — Information retrieval of unstructured textual data
    • G06F 16/35 — Clustering; Classification
    • G06F 16/353 — Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the detection of contentious content, such as hate speech, in online media. According to a first aspect, there is provided a method for training a machine learning classifier to detect contentious content, the method comprising the steps of: receiving content data 102 as input data; receiving annotation data 104 for said content; receiving metadata 106 in relation to said annotation data; and determining a learned approach to classifying whether the content is contentious based on said annotation data for said content and said metadata in relation to said annotation data. The method may receive further content as input data, use the classifier to determine whether the further content is contentious and, where that determination is made with a high degree of certainty, further refine the learned approach based on the further content. The content may be generated online. The method may comprise reviewing the source of the content, reviewing the relationship between the source of the content and the user, reviewing the domain from which the content was generated, reviewing the profile and user history of the author of the content, and/or performing natural language processing.

Description

HATE SPEECH DETECTION SYSTEM FOR ONLINE MEDIA CONTENT
Field
The present invention relates to detection of contentious content for online media. Specifically, the present invention relates to the detection of contentious content such as hate speech.
Background
Contentious content detection systems, and hate speech detection systems in particular, focus on detecting language in online media content that can be hurtful or abusive, or that incites hate towards a particular civil group or section of society. This may include: sexist, racist or ethnic slurs; content targeting a minority; content which seeks to negatively distort views on a marginalised group; negatively stereotyping content; and content which defends xenophobia. Detection of such content has been mandated across countries and blocs such as the European Union, since the content is recognised as damaging to the functioning of democracy and to healthy discourse on the internet.
Online social platforms are beset with contentious content. Such content may frighten, intimidate, or silence users within these communities. In some cases, such content may wrongly inspire users to share it, generate similar content, or even commit violence. The widespread problems brought about by online contentious content are widely recognised in society, yet despite the knowledge of the impact such content has, reliable solutions are lacking and effective methods and systems have not been achieved.
Methods of detecting the presence of hateful or abusive language within text from online media sources rely on annotation processes. However, current annotation processes for hate speech can have a negative psychological impact on annotators and moderators, including post-traumatic stress disorder. Current systems are expensive to implement and often require the training of keyword systems which do not perform to a high standard across several established platforms. It is acknowledged that keyword-based methods and systems are insufficient for the detection of contentious content such as hate speech, and substantial improvements over current approaches must be made.
Currently, models that are being and have been built can typically deal with content within only a single category or domain, e.g. hate speech, cyberbullying, or toxicity. Such approaches have not yet managed to find and leverage the commonalities between these distinct domains. Further, models trained on specific underlying forms within each domain, for example racism and sexism within hate speech, also perform poorly when applied to other domains of contentious content. An aspect of this invention addresses the issue of leveraging commonalities between distinct domains of contentious content within online communities and platforms.
Summary of Invention
Aspects and/or embodiments seek to provide a method for training and detecting contentious content online.
According to a first aspect, there is provided a method for training a machine learning classifier to detect contentious content, the method comprising the steps of: receiving content as input data; receiving annotation data for said content; receiving metadata in relation to said annotation data; and determining a learned approach to classifying whether the content is contentious based on said annotation data for said content and said metadata in relation to said annotation data.
A learning classifier detecting contentious content may allow policing of content in order to create a safer online environment and increase user engagement in the process.
Optionally, there is provided the method further comprising the steps of: receiving further content as input data; determining a classification whether the further content is contentious using the machine learning classifier; and further determining the learned approach to classifying whether the content is contentious based on the further content, wherein the step of determining a classification whether the further content is contentious using the machine learning classifier determines that said further content is contentious content with a high degree of certainty.
Optionally, there is provided the method further comprising the steps of: receiving additional content as input data; determining a classification whether the additional content is contentious using the machine learning classifier; and transmitting the additional content to a reviewing module for classification, wherein the step of determining a classification whether the additional content is contentious using the machine learning classifier determines that said additional content is contentious content with a low degree of certainty.
Optionally, the content comprises content generated online.
Optionally, there is provided the method further comprising any or all of the steps of: reviewing the source of the content; reviewing the relationship between the source of the content and a user; reviewing the domain from which the content is generated; reviewing the profile and user history of the author of the content; reviewing the profiles and user histories of the users within the community the content was generated; reviewing the relationship between the content and other communities; reviewing dictionaries of slurs; reviewing word embeddings; reviewing for contentious words; reviewing sentiments in relation to the unlabelled content; querying one or more questions in relation to the content; and/or examining linguistic cues within the content as part of a natural language processing (NLP) computational stage.
Optionally, a score is determined for said content: optionally wherein determining a score comprises determining a similarity score and/or a probability score and/or threshold score, and/or optionally wherein the similarity score determines an output of the predicted abusive qualities of the content.
The score may serve as an indicator of the level of contentiousness within content.
Optionally, the contentious content comprises any one or more of: hate speech; cyberbullying; cyber-threats; online harassment; online abuse; sexism; racism; ethnic slur; attack on a minority; negative stereotyping of a minority; negatively distorting the views on a marginalised group/minority; defending xenophobia; and/or defending sexism. Optionally, the contentious content is categorised as explicitly/implicitly targeting a generalised other and/or a named entity.
The classifier may be trained to output the type of contentious contents detected in order to provide a clear indication of the contentious content.
Optionally, one or more classifications and/or scores is assigned a weighting. Optionally, there is provided the method wherein the steps are carried out in a hierarchical order.
The weighting of the one or more classifications and/or scores may account for the various factors taken into account as well as the performance of the classifier.
Optionally, the reviewing module allows one or more users to provide annotation data and metadata in relation to said annotation data for the additional content: optionally through a web platform or browser extension.
The reviewing module can simplify the process of the one or more users providing the annotation data.
Optionally, the user creates a reviewer profile comprising one or more of: social faction classification; political stance; geographic location; qualification details; and/or experiences.
The reviewer profile can add to user metadata in order to mitigate bias in content or annotation and also serves as training data for unlabelled content.
Optionally, the annotation data comprises one or more of: tags; scores; descriptions; and/or labels.
According to a second aspect, there is provided a method of detecting contentious content, the method comprising the steps of: inputting one or more pieces of content; using the classifier of any preceding claim; and determining a classification of whether the one or more pieces of content is contentious content.
Detecting contentious content may allow policing of content in order to create a safer online environment and increase user engagement in the process.
Optionally, the one or more pieces of content comprises one or more of: unlabelled content; manually labelled content; scored content; and/or URLs. Optionally, the classifier comprises one or more of: a multi-task learning model; a logistic regression model; a joint-learning model; support vector machines; neural networks; decision trees; and/or an ensemble of classifiers. Optionally, the classifier determines commonalities between one or more of: domains; underlying forms of the domains; dimensions; linguistic characteristics; geographic location; and/or political stance.
The classifier may leverage commonalities between distinctive categories in order to detect contentious content within the distinctive categories accurately.
According to a third aspect, there is provided an apparatus operable to perform the method of any preceding feature.
According to a fourth aspect, there is provided a system operable to perform the method of any preceding feature.
According to a fifth aspect, there is provided a computer program operable to perform the method and/or apparatus and/or system of any preceding feature.
Brief Description of Drawings
Embodiments will now be described, by way of example only and with reference to the accompanying drawings having like-reference numerals, in which:
Figure 1 shows a general overview of the combined training and implementation of a machine learning classifier;
Figure 2 shows the process including a triage system in training the machine learning classifier; and
Figure 3 shows a conceptual representation of comparing similarities in content between two communities.
Specific Description
Referring to Figures 1, 2 and 3, example embodiments of a method of detecting contentious content will now be described.
Referring to Figures 1 and 2, example embodiments of training a machine learning classifier in order to detect contentious online content will now be described.
In an embodiment, the method of training a machine learning classifier, 116 as shown in Figure 1, initially starts with content data, 102, which may or may not contain contentious content such as hate speech, cyber-bullying, cyber-threats, online harassment, online abuse, etc. Each item of content data is generated within an online source, may be unlabelled data as shown in Figure 2, 202, and is input into a classifier 108. An example of such a classifier may be a triage system, as shown in Figure 2 as 204. Each item of content is assigned a probability representing the confidence that contentious content is present within the unlabelled content. The probabilities are assigned depending on how many questions were asked before rejection. These probabilities are then used to determine whether the content is considered to have a high confidence of being contentious content, as shown as 110 and 206, or a low confidence of being contentious content, as shown as 112 and 208.
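By way of illustration only, a minimal sketch of this confidence-based routing is given below; the threshold value and the score_content() callable are assumptions introduced for the example and are not prescribed by the method.

```python
# Illustrative sketch of routing content by classifier confidence.
# The threshold and score_content() are assumed, not prescribed.

HIGH_CONFIDENCE_THRESHOLD = 0.8

def route_content(content, score_content):
    """Route content by the confidence that it contains contentious material."""
    p = score_content(content)               # probability of being contentious
    if p >= HIGH_CONFIDENCE_THRESHOLD:
        return ("high_confidence", p)         # 110 / 206: added to labelled data
    return ("low_confidence", p)              # 112 / 208: sent to annotators for review
```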
In this embodiment, the detection method is implemented at a sentence level. When a larger text is presented as input, e.g. a whole article, the article may be split into sentences and each sentence is scored for contentious content independently. The article may then be scored for contentious content as a whole.
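The sentence-level scoring may be sketched as follows; the naive sentence splitter and the maximum-score aggregation rule are illustrative assumptions, since the embodiment does not prescribe how the article-level score is derived.

```python
# Illustrative sketch: split an article into sentences, score each sentence
# independently, then aggregate to an article-level score (here, the maximum).

import re

def split_sentences(article):
    # naive splitter on sentence-ending punctuation; a production system might
    # use a dedicated tokeniser instead
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]

def score_article(article, score_sentence):
    scores = [score_sentence(s) for s in split_sentences(article)]
    return max(scores, default=0.0)
```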
In understanding contentious content, content may comprise one or more domains such as hate speech, cyber-bullying, online abuse, etc. In the example of hate speech, this may be understood as sexist, racist or ethnic statements that, for example: use sexist, racist or ethnic slurs; attack a minority; seek to negatively distort views on a marginalised group or minority; negatively stereotype a minority; and/or defend xenophobia or sexism. However, the view on the domain of hate speech may vary over time, and the method as described here may be altered such that the classifier detects contentious content accordingly.
So, for example, the following utterances should be marked as hate speech:
Except that there was no such sexual torture and she is a lying bitch
I told you all Muslims steal things
Niggers look dirty
I hate Asian cleaners
And more complicated cases are as follows:
I am not a sexist, but girls do not know mathematics and physics
There is no comparing the vileness of Mohammed to Jesus or Buddha, or Lao Tse
Women are delicate flowers and need to be protected
In an embodiment, the triage system, 204, may be designed to generate and ask a number of questions of each item of input content. Examples of questions that may be asked are as follows. Does the document contain a human or demographic entity? Is the document negatively sentimented? What is the stance towards the entity? Does the document bear high similarity to documents in highly toxic communities? These questions are answered using natural language processing tools such as stance detection and sentiment analysis.
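A minimal sketch of these triage questions is given below, assuming placeholder helper functions (detect_entities, sentiment, stance, community_similarity) that stand in for the NLP components; their names, signatures and the 0.7 similarity threshold are illustrative only.

```python
# Illustrative sketch of the triage questions as boolean checks backed by
# NLP components; all helper functions and thresholds are assumed.

def triage_questions(document, detect_entities, sentiment, stance, community_similarity):
    return {
        "contains_human_or_demographic_entity": bool(detect_entities(document)),
        "negatively_sentimented": sentiment(document) < 0.0,
        "negative_stance_towards_entity": stance(document) == "against",
        "similar_to_highly_toxic_communities": community_similarity(document) > 0.7,
    }
```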
In an embodiment, there may be a weighting of the questions which are asked by the system. Optionally, the weighting may depend on the level of certainty of the system in answering the questions, i.e. if it is known that for a particular question a response is correct only 70% of the time, a weight might be applied to that question such that its level of certainty is taken into consideration.
In an embodiment, the questions may be set up such that all questions are asked at once and their results are then passed into a classifier. Optionally, there may be a hierarchical order to the questions which are asked by the system. Should the system ask each question one at a time in turn, the initial questions will focus on those that have a high recall, such that as many relevant documents as possible are retrieved, i.e. general broad questions first, narrowing down to more specific questions. Alternatively, higher-weighted questions may be asked first, followed by an appropriate set of underlying questions determined by the system. In training the machine learning classifier, the generation of the questions may target the precision level in determining contentious content, such that negative examples are mitigated and only items of contentious content are retained as labelled data.
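A minimal sketch of such a hierarchical, weighted cascade is shown below; the placeholder question functions, their ordering and the weights are assumptions chosen only to illustrate asking broad, high-recall questions first and weighting each answer by its known reliability.

```python
# Illustrative cascade: broad, high-recall questions first, each weighted by
# how reliable its answer is known to be. Question functions are placeholders.

def has_entity(doc):             # placeholder for an entity-recognition check
    return True

def negative_sentiment(doc):     # placeholder for a sentiment-analysis check
    return True

def hostile_stance(doc):         # placeholder for a stance-detection check
    return True

QUESTIONS = [                     # (question, weight), broad questions first
    (has_entity, 1.0),
    (negative_sentiment, 0.9),
    (hostile_stance, 0.7),        # e.g. known to answer correctly ~70% of the time
]

def triage_cascade(document):
    asked, score = 0, 0.0
    for question, weight in QUESTIONS:
        asked += 1
        if not question(document):
            # on rejection, the probability later assigned can depend on how
            # many questions were asked before the rejection occurred
            return {"contentious": False, "questions_asked": asked, "score": score}
        score += weight
    return {"contentious": True, "questions_asked": asked, "score": score}
```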
In an embodiment, various approaches may be implemented. Implemented here is an approach of building methods that leverage unlabelled data to automate the annotation process. Large annotated datasets may be created by leveraging the fact that there are known communities, for example on Twitter, Facebook, Voat, and Reddit, where a majority of the content is contentious. This information may be leveraged along with NLP techniques such as stance detection to determine user profiles, user histories, sentiment, word embeddings, and dictionaries of slurs and contentious words, in order to estimate the likelihood of a document being abusive. Such an approach may be implemented by computing how close a newly generated item of content is to a known abusive community.
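Such closeness may, for example, be computed by comparing averaged word embeddings against a centroid of a known abusive community, as in the sketch below; the embedding lookup, vector dimension and centroid are assumptions introduced for illustration.

```python
# Illustrative sketch: cosine similarity between a document's averaged word
# embeddings and the centroid of a known abusive community.

import numpy as np

def document_vector(tokens, embeddings, dim=300):
    """Average the word embeddings of a document's tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(dim)

def similarity_to_community(doc_vec, community_centroid):
    """Cosine similarity between a document vector and a community centroid."""
    denom = np.linalg.norm(doc_vec) * np.linalg.norm(community_centroid)
    return float(doc_vec @ community_centroid / denom) if denom else 0.0
```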
Also implemented is a 'bag of communities' approach, as shown as 300. Figure 3 shows a conceptual representation in which, in this case, two source communities are employed, shown as 302 and 306. Once new and unlabelled content, such as posts or blogs, is generated in a community, similarity scores may be assigned to each item of content by comparison against pre-existing content generated within other communities, as shown as 304. A downstream classifier makes use of the similarity scores in order to make predictions regarding contentious content.
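A minimal sketch of this arrangement follows; the use of logistic regression as the downstream classifier is an assumption, chosen only because it appears later among the candidate models.

```python
# Illustrative 'bag of communities' features: one similarity score per known
# source community, fed to a downstream classifier.

from sklearn.linear_model import LogisticRegression

def bag_of_communities_features(doc_vec, community_centroids, similarity_fn):
    """One similarity score per source community (e.g. 302 and 306 in Figure 3)."""
    return [similarity_fn(doc_vec, c) for c in community_centroids]

# Downstream classifier trained on those similarity features (illustrative):
# X = [bag_of_communities_features(v, centroids, similarity_to_community) for v in doc_vecs]
# clf = LogisticRegression().fit(X, labels)
```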
In this embodiment, the 'bag of communities' approach, 300, is used to filter content which is unlikely to be seen as contentious, in combination with methods such as sentiment analysis, target detection and stance detection. The aim of this system is specifically to minimise the load on annotators and to prepare any given annotator for the likelihood that they will be facing abusive comments for annotation.
In an embodiment, the content which results in a low confidence, 112 and 208, such that the content is not seen to contain contentious content, or the confidence is not high enough, is assigned probabilities and assigned to annotators for review, as shown as 114 and 210. The probability is assigned depending on how many questions were asked before rejection.
In an embodiment, an intersectional feminist approach may be taken to content annotation. This specifically means attacking the problem of how the vast body of social science literature on hate speech, bullying, etc. may be applied and incorporated into computational methods. In practice, this may be done via author profiling and dataset annotation: for example, by getting annotators who are female and feminist to help label articles which they find hateful towards female feminists, and making it clear in the profile of the annotator building the dataset that they are in fact female and feminist. Building an annotation platform may comprise setting up a full annotation pipeline and product/tool which enables users to self-classify the social faction of which they are part, e.g. black, white, feminist, so that their annotations are considered in this light. The annotator profiles may also comprise qualifications, experiences, political stance, etc. Using a web platform and/or browser extension, the annotation platform allows users to tag, score and/or label articles and share their descriptions and tags.
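The reviewer profile and the associated annotation record may be sketched as simple data structures; the field names below are illustrative assumptions rather than a prescribed schema.

```python
# Illustrative data structures for reviewer profiles and annotations.

from dataclasses import dataclass, field

@dataclass
class ReviewerProfile:
    social_faction: list = field(default_factory=list)   # e.g. ["female", "feminist"]
    political_stance: str = ""
    geographic_location: str = ""
    qualifications: list = field(default_factory=list)
    experiences: list = field(default_factory=list)

@dataclass
class Annotation:
    content_id: str
    tags: list
    score: float
    description: str
    reviewer: ReviewerProfile   # metadata in relation to the annotation data
```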
The content which has been determined, at the end of the classifier (or, in the specific case, the triage system), to have a high confidence of being contentious content, 110 and 206, is then added to the labelled data already held and is used by the machine learning classifier, as shown as 212, which looks at new unseen documents to predict whether or not they contain hate speech.
In an embodiment, the high-confidence contentious content is added to a labelled dataset from which the model of the classifier may be trained further, as shown as 212. The trained classifier may also operate on URLs which it is tasked to check. When building a machine learning classifier model, labelled content will be checked against an evaluation/test set, which will be sampled from the datasets available at the time of training.
Within the machine learning classifier, 116 and 214, many machine learning models may be implemented, including but not limited to a multi-task learning model, logistic regression, joint learning, and support vector machines. Depending on the model and the data used to train it, the features used in modelling will change appropriately. Other embodiments may require various different types of learning models in order to detect contentious content across multiple different types of data, such as news or comments. The classifier may also include an ensemble of classifiers, where a model is trained on the predictions of n models. Each model may, but need not, individually predict hate speech.
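An ensemble in which a meta-model is trained on the predictions of n base models may be sketched as follows; the particular base models and the logistic-regression meta-model are assumptions.

```python
# Illustrative stacking ensemble: the final estimator is trained on the
# predictions of the base models.

from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

ensemble = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("svm", LinearSVC()),
        ("tree", DecisionTreeClassifier(max_depth=5)),
    ],
    final_estimator=LogisticRegression(),   # trained on the base models' outputs
)
# ensemble.fit(X_train, y_train); ensemble.predict(X_new)
```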
Referring to Figures 1 and 2, a method of using a trained classifier to detect contentious content will now be described.
In an embodiment, various classifiers may be implemented as 116 and 214. One classifier may take into consideration domain adaptation, and may be any model which classifies for contentious content. This may be a single-domain model and/or a multi-domain model. For example, one classifier may detect sexist comments, another may independently detect racist comments, and a further model may detect both.
In this embodiment, various forms of abuse may share commonalities across two pairs of overlapping dimensions: explicit/implicit abuse and generalised/directed abuse. Various forms of contentious content may be represented as different domains, such as hate speech and online abuse, which are thus expressed within the said two pairs of dimensions. In addition to commonalities across distinct forms of hate speech such as racism, anti-semitism, and sexism, commonalities within the written form are also leveraged. Such commonalities include whether an utterance has a specific target or is aimed at a generalised other. Further, the model may also leverage commonalities that arise along the axis of explicit and implicit language for hate speech. In an embodiment, linguistic, geographic and political commonalities, as well as commonalities in sentiment which may occur across different instances of hate speech, may also be utilised.
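One way to make these shared dimensions available to a multi-task model is to label each item along both axes in addition to its domain, as in the sketch below; the enumeration names are illustrative assumptions.

```python
# Illustrative label schema along the two overlapping dimension pairs.

from dataclasses import dataclass
from enum import Enum

class Explicitness(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"

class Target(Enum):
    GENERALISED = "generalised_other"
    DIRECTED = "named_entity"

@dataclass
class AbuseLabel:
    domain: str                 # e.g. "hate_speech", "cyberbullying"
    explicitness: Explicitness
    target: Target
```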
In an embodiment, various models may take into consideration one or a combination of features and/or feature selection methods. These may comprise, for example, transfer learning, clustering, dimensionality reduction, the chi-squared test, joint learning, multi-task learning, and generalising beyond informal text found on social media towards arbitrary websites, comments on articles, articles, blog posts, etc. Other methods for training a machine learning model on one dataset and predicting on a different one, which may have different distributions, topics, etc., may also be embedded into the classifier. Clustering documents may allow checking whether a document exists in a cluster; if so, a feature may be activated in the models mentioned above.
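Two of the listed techniques, chi-squared feature selection and the use of cluster membership as a feature, may be sketched as follows; the parameter values are illustrative assumptions.

```python
# Illustrative sketch: chi-squared feature selection over TF-IDF features,
# plus document clustering whose membership can be activated as a feature.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

def build_features(train_texts, train_labels, n_features=1000, n_clusters=20):
    vectoriser = TfidfVectorizer()
    X = vectoriser.fit_transform(train_texts)

    # keep the n_features terms most associated with the labels (chi-squared test)
    selector = SelectKBest(chi2, k=min(n_features, X.shape[1])).fit(X, train_labels)
    X_selected = selector.transform(X)

    # cluster the documents; cluster membership can then be activated as a feature
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_selected)
    return X_selected, clusters
```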
Any system feature as described herein may also be provided as a method feature, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure.
Any feature in one aspect may be applied to other aspects, in any appropriate combination. In particular, method aspects may be applied to system aspects, and vice versa. Furthermore, any, some and/or all features in one aspect can be applied to any, some and/or all features in any other aspect, in any appropriate combination.
It should also be appreciated that particular combinations of the various features described and defined in any aspects can be implemented and/or supplied and/or used independently.

Claims (17)

1. A method for training a machine learning classifier to detect contentious content, the method comprising the steps of:
receiving content as input data;
receiving annotation data for said content;
receiving metadata in relation to said annotation data; and determining a learned approach to classifying whether the content is contentious based on said annotation data for said content and said metadata in relation to said annotation data.
2. The method of Claim 1 further comprising the steps of:
receiving further content as input data;
determining a classification whether the further content is contentious using the machine learning classifier; and further determining the learned approach to classifying whether the content is contentious based on the further content, wherein the step of determining a classification whether the further content is contentious using the machine learning classifier determines that said further content is contentious content with a high degree of certainty.
3. The method of any preceding claim further comprising the steps of:
receiving additional content as input data;
determining a classification whether the additional content is contentious using the machine learning classifier; and transmitting the additional content to a reviewing module for classification, wherein the step of determining a classification whether the additional content is contentious using the machine learning classifier determines that said additional content is contentious content with a low degree of certainty.
4. The method of any preceding claim wherein the content comprises content generated online.
5. The method of any preceding claim further comprising any or all of the steps of: reviewing the source of the content; reviewing the relationship between the source of the content and a user; reviewing the domain from which the content is generated; reviewing the profile and user history of the author of the content; reviewing the profiles and user histories of the users within the community the content was generated; reviewing the relationship between the content and other communities; reviewing dictionaries of slurs; reviewing word embeddings; reviewing for contentious words; reviewing sentiments in relation to the unlabelled content; querying one or more questions in relation to the content; and/or examining linguistic cues within the content as part of a natural language processing (NLP) computational stage.
6. The method of any preceding claim wherein a score is determined for said content: optionally wherein determining a score comprises determining a similarity score and/or a probability score and/or threshold score, and/or optionally wherein the similarity score determines an output of the predicted abusive qualities of the content.
7. The method as claimed in any preceding claim wherein the contentious content comprises any one or more of: hate speech; cyber-bullying; cyber-threats; online harassment; online abuse; sexism; racism; ethnic slur; attack on a marginalised group/minority; negative stereotyping of a marginalised group/minority; negatively distorting the views on a marginalised group/minority; defending xenophobia; and/or defending sexism.
8. The method as claimed in any preceding claim wherein the contentious content is categorised as explicitly/implicitly targeting a generalised other and/or a named entity.
9. The method of any preceding claim wherein one or more classifications and/or scores is assigned a weighting.
10. The method of any preceding claim wherein the steps are carried out in a hierarchical order.
11. The method of Claim 3 wherein the reviewing module allows one or more users to provide annotation data and metadata in relation to said annotation data for the additional content: optionally through a web platform or browser extension.
12. The method of Claim 11 wherein the user creates a reviewer profile comprising one or more of: social faction classification; political stance; geographic location; qualification details; and/or experiences.
13. The method of any preceding claim wherein the annotation data comprises one or more of: tags; scores; descriptions; and/or labels.
14. A method of detecting contentious content, the method comprising the steps of:
inputting one or more pieces of content;
using the classifier of any preceding claim; and determining a classification of whether the one or more pieces of content is contentious content.
15. The method of Claim 14 wherein the one or more pieces of content comprises one or more of: unlabelled content; manually labelled content; scored content; and/or URLs.
16. The method of any preceding claim wherein the classifier comprises one or more of:
a multi-task learning model; a logistic regression model; a joint-learning model;
support vector machines; neural networks; decision trees; and/or an ensemble of classifiers.
17. The method of any preceding claim wherein the classifier determines commonalities between one or more of: domains; underlying forms of the domains; dimensions;
linguistic characteristics; geographic location; and/or political stance.
GB1803954.5A 2018-03-12 2018-03-12 Hate speech detection system for online media content Withdrawn GB2572320A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
GB1803954.5A GB2572320A (en) 2018-03-12 2018-03-12 Hate speech detection system for online media content
PCT/GB2019/050693 WO2019175571A1 (en) 2018-03-12 2019-03-12 Combined methods and systems for online media content
US15/733,603 US20210019339A1 (en) 2018-03-12 2019-03-12 Machine learning classifier for content analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1803954.5A GB2572320A (en) 2018-03-12 2018-03-12 Hate speech detection system for online media content

Publications (2)

Publication Number Publication Date
GB201803954D0 GB201803954D0 (en) 2018-04-25
GB2572320A (en) 2019-10-02

Family

ID=61972668

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1803954.5A Withdrawn GB2572320A (en) 2018-03-12 2018-03-12 Hate speech detection system for online media content

Country Status (1)

Country Link
GB (1) GB2572320A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761930A (en) * 2020-06-29 2021-12-07 北京沃东天骏信息技术有限公司 Advertisement text detection method and device
CN116821339A (en) * 2023-06-20 2023-09-29 中国科学院自动化研究所 Misuse language detection method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111274403A (en) * 2020-02-09 2020-06-12 重庆大学 Network spoofing detection method
CN111274403B (en) * 2020-02-09 2023-04-25 重庆大学 Network spoofing detection method
WO2022204435A3 (en) * 2021-03-24 2022-11-24 Trust & Safety Laboratory Inc. Multi-platform detection and mitigation of contentious online content

Also Published As

Publication number Publication date
GB201803954D0 (en) 2018-04-25

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)