WO2022147528A1 - Natural language processing system and method for detecting social diversity and inclusion - Google Patents
- Publication number: WO2022147528A1 (PCT/US2022/011112)
- Authority: WIPO (PCT)
- Prior art keywords: content, natural, training, language processing, analyst
- Prior art date
Classifications
- G06F16/3334: Selection or weighting of terms from queries, including natural language queries
- G06F16/3344: Query execution using natural language analysis
- G06F16/338: Presentation of query results
- G06F16/353: Clustering; classification into predefined classes
- G06F16/9535: Search customisation based on user profiles and personalisation
- G06F40/20: Natural language analysis
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06Q30/0241: Advertisements
- G06Q50/01: Social networking
Definitions
- Web pages on the internet provide vast quantities of information that are read by large numbers of people with diverse demographics. Natural language processing may be used to process such information to efficiently understand and extract meaning, including context and nuance.
- When a person reads content (e.g., a news article, a blog post, a social media post, etc.) on a web page, the person can pick up a social perspective present in the content.
- an article may describe racial diversity in corporations as having a positive effect on society in America. The person may consciously or subconsciously associate that social perspective with other additional content in close proximity to the original content (e.g., on the same web page).
- a provider of this additional content is often cautious as to how the original content may affect perception of their additional content, wanting to avoid being associated with certain social perspectives.
- some social perspectives, such as racial diversity, may be advantageous to the provider.
- an advertiser may use a brand safety floor that defines content near which a brand should not appear.
- the brand safety floor may indicate that an advertisement should not appear on a web page that includes content related to negative attributes, such as death, injury, crime, profanity, and so on.
- Prior-art systems use algorithms (e.g., sentiment analysis and named entity recognition) to automatically detect this negative content.
- the content is flagged as including negative attributes, and the advertiser will frequently choose to avoid advertising on any webpage containing the content (e.g., no bid is placed on an online auction system). Since the content was mislabeled, it would have been acceptable, if not beneficial, for the advertiser to place an advertisement proximate to the content. As a result, this mislabeling results in missed impressions for the advertiser and missed revenue for the content publisher.
- Appendix A provides examples of how prior-art systems misclassify online news content, and therefore how opportunities for placing advertisements on a web page can be missed when using these prior-art systems.
- One aspect of the present embodiments includes the realization that content with diversity, equality, and inclusion may also include negative sentiment and topics. For instance, many brand safety floors view the topic of violence as negative, but many news articles may talk about diversity and violence together. Similarly, other systems block articles that use negatively valenced words, and as such would label any critical assessment as being negative (e.g., an article that highlights the problems associated with racism). In these ways, prior-art systems label content erroneously. Positive social perspectives include diversity, racial equality, inclusion, and others.
- Prior-art artificial intelligence (AI) algorithms use named entity recognition and sentiment analysis to identify only negative sentiment and topics in the content, and these prior-art AI algorithms thereby indicate any such content (e.g., including content with diversity, equality, and inclusion) as being sensitive and therefore to be avoided.
- the present embodiments solve this problem by classifying content based on topics and social perspectives.
- some of the present embodiments detect inclusion of both negative topics (e.g., crime, injury, military conflict, etc.) and positive social perspectives (e.g., diversity, equality, inclusion, etc.) in content to help a provider better decide whether additional content should be associated with the original content on the web page, or not.
- Another aspect of the present embodiments includes the realization that a person is only able to reliably make a finite number of decisions when labeling training content.
- the person may be looking for any one of three categories (e.g., crime, injury, and military conflict) within the training content.
- Increasing the number of decisions required by the human to label training content results in lower quality training content.
- a training set for a classifier may be limited to three categories, and therefore the classifier is only able to label content with up to three labels.
- each one-class classifier outputs a probability that inputted text belongs to its class, yielding a collection of probabilities that may be used to evaluate the content's suitability for association by a third party.
- the classes are not limited to negative attributes (referred to herein as “topics”) and thus the third party may target content that includes certain categories and excludes certain other categories.
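The collection of one-class probabilities described above can be sketched as follows. This is a hedged illustration, not the patented implementation: the class names are examples from the description, and the keyword-overlap scorer is a stand-in for the trained neural-network classifiers.

```python
# Sketch: a collection of one-class classifiers, each scoring text for
# membership in a single topic or social-perspective class.
from dataclasses import dataclass


@dataclass
class OneClassClassifier:
    label: str
    keywords: frozenset  # toy stand-in for a trained neural network

    def probability(self, text: str) -> float:
        # Toy scorer: fraction of this class's keywords present in the text.
        words = set(text.lower().split())
        return len(self.keywords & words) / len(self.keywords)


classifiers = [
    OneClassClassifier("crime", frozenset({"robbery", "theft", "arrest"})),
    OneClassClassifier("racial_diversity",
                       frozenset({"diversity", "inclusion", "equity"})),
]


def attribute_set(text: str) -> dict:
    """Run every one-class classifier and collect the probabilities."""
    return {c.label: c.probability(text) for c in classifiers}
```

A third party could then inspect the returned dictionary to target or exclude individual classes, rather than receiving a single negative/positive verdict.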
- a method classifies textual content.
- the method includes receiving the textual content from a requestor and determining, using a first one-class classifier trained to determine membership within a first class of a plurality of social-perspective classes, a first probability of the textual content belonging to the first class.
- the method also includes generating an attribute set to include the first probability and sending the attribute set to the requestor.
- a natural-language processing system includes a processor, a memory communicatively coupled with the processor, a first one-class classifier stored in the memory and trained to determine membership within a first class of a plurality of social-perspective classes, and machine-readable instructions stored in the memory.
- the machine-readable instructions, when executed by the processor, control the natural-language processing system to: receive textual content from a requestor; determine, using the first one-class classifier, a first probability of the textual content belonging to the first class; generate an attribute set that includes the first probability; and send the attribute set to the requestor.
- FIG. 1 is a schematic illustrating one example advanced classifier system for detecting social diversity and inclusion in textual content, in embodiments.
- FIG. 2 is a schematic illustrating example training and verification of classifiers of the system of FIG. 1, in embodiments.
- FIG. 3 is a flowchart illustrating one example method for training and validating one classifier of FIG. 1, in embodiments.
- FIG. 4 is a flowchart illustrating one example method for automatically classifying textual content for both topics and social perspectives, in embodiments.
- the embodiments herein classify text, such as web content (e.g., news articles, blogs, social media, and so on) on a web page, to determine whether the text contains undesirable social perspectives. For example, where a content provider does not wish to be associated with gun rights, any article classified to include content relating to gun rights would be undesirable.
- Prior-art artificial intelligence (AI) algorithms are trained to label content that includes negative attributes as being negative. However, when the content presents negative attributes in a positive way, such as when discussing how gun violence can perpetuate racism, the prior-art AI algorithms are insufficient. For example, such prior-art AI algorithms identify content that mentions the topics of race and gun violence, but that does not mean those articles have a pro-diversity social perspective.
- the present embodiments describe an improved AI classification algorithm that uses a neural network trained to recognize the social perspective that the article portrays, including racial diversity and inclusion. That is, the improved AI classification algorithm is not necessarily trained to recognize specific sentiment or topics, but rather perspectives on society and culture.
- FIG. 1 is a schematic illustrating one example of a natural-language processing (NLP) system 100 for detecting social diversity and inclusion in textual content 162.
- Textual content 162 is received (e.g., directly or indirectly via a URL) from an advertising platform, either a demand-side platform or supply-side platform 150, that is in communication with an advertiser or publisher 160 requesting additional information (e.g., from a web page 164) about the textual content therein.
- Textual content 162 may represent one or more of news articles, blogs, social media posts, and so on.
- NLP system 100 is implemented using one or more computers that include at least one processor and a memory storing machine-readable instructions that, when executed by the at least one processor, implement the functionality described herein.
- NLP system 100 is implemented in the cloud using one or more online services.
- NLP system 100 is a distributed web services architecture that is designed for adaptability and scalability of services.
- NLP system 100 may interface with one or more client applications that make social media content and annotation requests through a web service tier of NLP system 100 that serves as the public application programming interface (API) to the service.
- NLP system 100 may efficiently classify a thousand pieces of content 162 in less than a second, a feat that requires efficiencies in all system areas.
- NLP system 100 may utilize available supervised deep learning options and may include a set of pre-trained transformers that are generalizable for social media platforms and news platforms. These embeddings are trained based on hundreds of millions of messages on different platforms, including Twitter, Facebook, and online news sites.
- NLP system 100 may include thousands of classifiers 102, each having various differences from others, including the type of transformer (pre-trained vs. in-sample), learning rates, and neural-network structures (e.g., number of layers, connectivity between layers, use of max pooling layers, etc.).
- Each classifier 102 is tested by performing robust in-sample and out-of-sample validation via commonly accepted performance metrics (e.g., log-loss, F1, precision, and recall).
- Each algorithm is externally validated. Before it’s considered ready for use, it must perform well on textual content (e.g., news) it has never seen. To address drift of precision and recall of models across time, each classifier is regularly updated with new training data.
- NLP system 100 includes a plurality of one-class classifiers 102 (e.g., illustratively shown with classifiers 102(1)-102(N), where N is a positive integer), each trained to classify textual content 162.
- Each of the classifiers 102 outputs a probability, thereby generating an attribute set 104 (also known as annotations) indicative of topics and social perspectives within textual content 162.
- both topics 106 and social perspectives 108 may be identified in textual content 162.
- NLP system 100 automatically identifies trending topics within social media feeds, news feeds, and so on.
- Topics 106 may include death, injury, crime, military, anti-vaccine, sex, profanity, vices, politics, explicit sexual content, harmful acts, hate speech, acts of aggression, obscenity, drugs, smoking, alcohol, spam, and terrorism. However, topics 106 may include additional or alternative subjects without departing from the scope hereof.
- Social perspectives 108 may include racial diversity, gender diversity, religious diversity, economic diversity, and so on. However, social perspectives 108 may include additional or alternative subjects without departing from the scope hereof.
- each classifier 102 is implemented as at least one AI algorithm (e.g., a neural network, or other such technology) that is trained as described in more detail below with reference to FIG. 2.
- topics 106 are also referred to herein as topic classes.
- social perspectives 108 are also referred to herein as social-perspective classes.
- each classifier 102 may be trained based on a finite number of human decisions (e.g., three), where these decisions may classify the training content as having positive or negative content and may be based upon topics 106 and social perspectives 108.
- NLP system 100 may include an application programming interface 101 that interfaces with one or more other computer systems through a computer network (e.g., the internet, WANs, LANs, etc.). Accordingly, NLP system 100 may provide a classification service to one or more entities, including a demand-side platform 130 or supply-side platform 150.
- supply-side platform 150 provides an advertising service to at least publisher 160 and has an inventory of textual content 162 being published by publisher 160.
- Publisher 160 may include space on web page 164 for displaying additional content (e.g., advertisements) whereby supply-side platform 150 operates to attract and provide the additional content to publisher 160.
- supply-side platform 150 may monitor web page 164 (e.g., and other web pages) to discover content 162 (e.g., new news articles, reports, etc.) when newly added to web page 164.
- supply-side platform 150 sends the content 162 (e.g., the actual content 162 or a URL identifying content 162 on web page 164) to NLP system 100.
- NLP system 100 receives (or retrieves) content 162 and uses at least two classifiers 102 to process content 162 to generate a corresponding attribute set 104 according to topics 106 and social perspectives 108 identified within content 162.
- each classifier 102 may be trained to classify based on a few (e.g., up to three) different attributes (i.e., topics 106 and social perspectives 108) and generate probability 103 indicative of content 162 being a match for each of these attributes.
- classifier 102(1) is trained using topics 106 such that probability 103(1) indicates a likelihood that content 162 includes the topic
- classifier 102(2) is trained using social perspectives 108 such that probability 103(2) indicates a likelihood that content 162 includes the social perspectives.
- the training set (see training sets 208 of FIG. 2) used to train each classifier 102 is labeled based on a human making a finite number of simple decisions.
- NLP system 100 sends the generated attribute set 104 to supply-side platform 150.
- NLP system 100 may also send a set of cut-offs 105 (e.g., threshold values corresponding to attribute set 104) that facilitate binary classification of attribute set 104 and that allow supply-side platform 150 to adjust the threshold values as best suited to current needs.
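A minimal sketch of how a platform might apply such adjustable cut-offs to an attribute set of probabilities. The function name, cut-off values, and default threshold below are illustrative assumptions, not values taken from the disclosure.

```python
# Sketch: turn an attribute set of probabilities into binary labels using
# per-attribute cut-offs, with a fallback default for unlisted attributes.
def binarize(attribute_set: dict, cutoffs: dict, default: float = 0.5) -> dict:
    """Flag each attribute whose probability meets or exceeds its cut-off."""
    return {
        label: prob >= cutoffs.get(label, default)
        for label, prob in attribute_set.items()
    }


# Hypothetical probabilities and cut-offs for illustration.
attrs = {"crime": 0.82, "racial_diversity": 0.64, "profanity": 0.10}
cutoffs = {"crime": 0.75, "racial_diversity": 0.60}

flags = binarize(attrs, cutoffs)
# crime and racial_diversity meet their cut-offs; profanity falls below
# the default cut-off of 0.5.
```

Because the platform receives the raw probabilities alongside the cut-offs, it can re-run this step with tighter or looser thresholds without re-querying the NLP system.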
- Supply-side platform 150 may append attribute set 104 to a header bid request that it sends to exchange 140.
- Exchange 140 may for example represent an advertisement exchange.
- Exchange 140 shares the header bid request (including attribute set 104) with a demand-side platform 130.
- Demand-side platform 130 provides a service to a content provider 120 that wishes to place additional content 122 on web page 164.
- Demand-side platform 130 may interact with exchange 140 to place a bid in an auction implemented by exchange 140 to display additional content 122 on web page 164 based on attribute set 104. Particularly, demand-side platform 130 may decide whether or not to make the bid based on whether attribute set 104 aligns with suitability requirements (e.g., brand safety when additional content 122 is an advertisement for a particular brand) of content provider 120, and may determine a bid amount based on whether attribute set 104 aligns with the suitability requirements. For example, where content provider 120 instructs demand-side platform 130 to place additional content 122 alongside pro-racial diversity news content, demand-side platform 130 uses attribute set 104 to determine that content 162 suitably includes pro-racial diversity news, and places a bid with exchange 140 accordingly.
- attribute set 104 is not limited to identifying only topics 106 within content 162, but may also identify social perspectives 108 (e.g., such as pro-racial diversity), thereby providing demand-side platform 130 and content provider 120 with additional opportunity for placing additional content 122 with web page 164 as compared with traditional classifications that only identify whether topics are present or not.
- demand-side platform 130 evaluates attribute set 104 to select web page 164 only when attribute set 104 does not indicate that content 162 includes political news.
- attribute set 104 allows this content to be identified, even though it may also include certain undesirable topics.
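The demand-side platform's decision described above can be sketched as a small function: abstain from the auction when an excluded attribute is likely present, bid when the required attributes are present. The requirement structure, cut-off, and bid amount are assumptions for illustration only.

```python
# Sketch of a demand-side platform's bid decision based on the attribute set.
def decide_bid(attribute_set, require, exclude, cutoff=0.5, base_bid=1.00):
    """Return a bid amount, or None to abstain from the auction."""
    # Abstain if any excluded attribute is likely present in the content.
    if any(attribute_set.get(label, 0.0) >= cutoff for label in exclude):
        return None
    # Abstain unless every required attribute is likely present.
    if not all(attribute_set.get(label, 0.0) >= cutoff for label in require):
        return None
    return base_bid


# Hypothetical attribute set: strongly pro-diversity, not political.
attrs = {"racial_diversity": 0.88, "politics": 0.12}
bid = decide_bid(attrs, require={"racial_diversity"}, exclude={"politics"})
# A bid is placed because the diversity requirement is met and the
# political exclusion is not triggered.
```

A real platform would likely scale the bid amount with the probabilities rather than use a flat value, but the accept/abstain structure is the same.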
- FIG. 2 is a schematic illustrating example training and verification of classifiers 102 of NLP system 100 of FIG. 1.
- NLP system 100 includes a training content labeling interface 202 that interacts with a plurality of analysts 220 (also called coders) to generate a label set 204 (also called annotations) of three labels, shown as A, B, and C, for each of a plurality of training content 206 (e.g., training textual content).
- The label sets 204 and corresponding training content 206 form a training set 208 suitable for training classifier 102(1).
- training set 208 includes upward of one-hundred thousand sets of training content 206 and corresponding label sets 204.
- Training content 206 may include previously published content, such as news articles, blogs, social media posts, etc.
- training content labeling interface 202 engages one or more analysts 220 (e.g., humans) to read training content 206 and respond to a set of simple questions. The answers to these questions form a corresponding label set 204 for the training content.
- Training content labeling interface 202 may generate analyst instructions 210 that guide analysts 220 on how to read training content 206 and how to respond to the questions to generate label set 204.
- analyst instructions 210 may detail questions relating to the specific topics 106 and social perspectives 108 to be evaluated by analyst 220.
- Training content labeling interface 202 collects, for a particular group of attributes, training content 206 and its corresponding label set 204 to form training set 208.
- Each training set 208 includes thousands of different training content 206 and label sets 204.
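The pairing of training content 206 with its label sets 204 into a training set 208 might look like the sketch below. The data shapes and the three yes/no labels (A, B, C) mirror the description, but the function and field names are assumptions.

```python
# Sketch: assemble a training set by pairing each piece of training
# content with the analyst-produced label set (labels A, B, C).
def build_training_set(contents, label_sets):
    """Pair each piece of training content with its analyst labels."""
    assert len(contents) == len(label_sets)
    return [
        {"text": text, "labels": labels}
        for text, labels in zip(contents, label_sets)
    ]


training_set = build_training_set(
    ["article one ...", "article two ..."],
    [{"A": True, "B": False, "C": False},
     {"A": False, "B": True, "C": True}],
)
```

Keeping each label set to a handful of simple decisions reflects the earlier observation that a human can only reliably make a finite number of decisions per item.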
- NLP system 100 uses training set 208 to train classifier 102(1) to recognize the finite number of attributes (e.g., one, two, three, etc.) selected for the training set 208. Once trained, classifier 102(1) may be used to process content 162 and generate corresponding probability 103(1).
- a main consideration of any service is the quality of output. Accordingly, before a newly trained classifier 102 is used to process content 162, the classifier is first externally validated.
- a content analyst 260 may use a verification interface 250 to select a completely new, random, set of test content 252 (e.g., news stories that just appeared online in the past week) and generate a corresponding true label set 254 for the attributes being processed by classifier 102(1).
- Verification interface 250 invokes classifier 102(1) to process test content 252 and generate probability 103(1). Verification interface 250 then compares true label set 254 (generated by analysts 260) with probability 103(1) to determine performance 256 that defines precision and recall of classifier 102(1), where true label set 254 is considered the “true” observations.
- Performance 256 of classifier 102(1) is compared to performance criteria 258 to determine whether classifier 102(1) is sufficiently trained and suitable for use. Classifier 102(1) is only considered good enough for deployment when the precision and recall are scientifically rigorous, typically above 0.80 for both precision and recall. Accordingly, NLP system 100 ensures that any newly trained classifier 102 is able to classify new, unseen data at an acceptable level.
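The external validation step can be sketched as computing precision and recall from the classifier's probabilities against the analyst-generated true labels, then comparing both to the 0.80 criterion. The 0.5 cut-off used to binarize the probabilities is an assumption for illustration.

```python
# Sketch of verification interface 250: score classifier probabilities
# against "true" analyst labels and gate deployment on precision/recall.
def precision_recall(probs, true_labels, cutoff=0.5):
    preds = [p >= cutoff for p in probs]
    tp = sum(p and t for p, t in zip(preds, true_labels))
    fp = sum(p and not t for p, t in zip(preds, true_labels))
    fn = sum((not p) and t for p, t in zip(preds, true_labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def ready_for_deployment(probs, true_labels, criterion=0.80):
    """Deploy only when both precision and recall meet the criterion."""
    precision, recall = precision_recall(probs, true_labels)
    return precision >= criterion and recall >= criterion
```

Running this gate on fresh, never-seen test content (rather than held-out training data) is what makes the validation external.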
- the following example illustrates how two independent analysts 220 (also called “coders”) generate label set 204 for training content 206 that is “stacked” for positive racial diversity.
- analysts 220 agree upon a set of analyst instructions 210, also known in social science as a “codebook.”
- An example of analyst instructions 210 is provided below in the section titled “Codebook Example.”
- the label sets 204 are used to train classifier 102 (e.g., a neural network).
- the annotated training content is then sampled in a stratified way.
- the sample is comprised of:
- FIG. 3 is a flowchart illustrating one example method 300 for training and validating classifier 102 of FIG. 1.
- the method 300 may be performed with NLP system 100 of FIG. 1.
- method 300 defines categories and instructions for labeling training content.
- training content labeling interface 202 of NLP system 100 interacts with at least one analyst 220 to determine topics 106, social perspectives 108, and analyst instructions 210.
- method 300 captures training content and labels from humans.
- training content labeling interface 202 interacts with at least one analyst 220 to capture training content 206 and corresponding label sets 204.
- method 300 builds training set from training content and labels.
- training content labeling interface 202 generates training set 208 to include training content 206 and corresponding label sets 204.
- method 300 trains one classifier using the training set.
- training content labeling interface 202 trains classifier 102(1) using training set 208.
- method 300 captures test content and true labels from analyst.
- verification interface 250 of NLP system 100 interacts with at least one analyst 260 to capture test content 252 and corresponding true label set 254.
- method 300 uses the classifier to process the test content.
- verification interface 250 invokes classifier 102(1) to process test content 252 and generate probability 103(1) corresponding to test content 252.
- method 300 compares the attribute probabilities from the classifier against the true labels to determine performance of the classifier.
- verification interface 250 compares probability 103(1) against true label set 254 to determine performance 256 of classifier 102(1).
- Block 316 is a decision. If, in block 316, method 300 determines that performance of the classifier meets performance criteria, method 300 continues with block 318; otherwise, method 300 continues with block 304, where blocks 304 through 316 repeat to improve training of the classifier.
- method 300 makes the classifier available for use.
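Blocks 304 through 318 amount to a retrain-until-acceptable loop, which could be sketched as below. The callable parameters are hypothetical stand-ins for the labeling, training, and verification interfaces, and the round limit is an assumption.

```python
# Sketch of method 300's loop: retrain with fresh labels until the
# classifier meets the performance criteria, then release it for use.
def train_until_acceptable(build_training_set, train, validate,
                           criterion=0.80, max_rounds=10):
    """Repeat blocks 304-316 until precision and recall meet the criterion."""
    for _ in range(max_rounds):
        classifier = train(build_training_set())   # blocks 304-308
        precision, recall = validate(classifier)   # blocks 310-314
        if precision >= criterion and recall >= criterion:  # block 316
            return classifier                      # block 318
    raise RuntimeError("classifier failed to meet performance criteria")
```

Each failed round loops back through label capture, so the training set grows (or improves) between attempts rather than simply retrying on the same data.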
- FIG. 4 is a flowchart illustrating one example method 400 for automatically classifying textual content for topic and social perspective attributes.
- Method 400 is implemented by application programming interface 101 of NLP system 100 of FIG. 1, for example.
- method 400 receives textual content from a requestor.
- application programming interface 101 receives content 162 from supply-side platform 150.
- method 400 determines, using a first classifier, a first probability of at least one attribute being present in the textual content.
- application programming interface 101 invokes classifier 102(1) to process content 162 and generate probability 103(1).
- method 400 determines, using a second classifier, a second probability of at least one attribute being present in the textual content.
- application programming interface 101 invokes classifier 102(2) to process content 162 and generate probability 103(2).
- method 400 generates an attribute set to include the first probability and the second probability.
- application programming interface 101 generates attribute set 104 to include probability 103(1) and probability 103(2).
- method 400 sends the attribute set to the requestor.
- application programming interface 101 sends attribute set 104 to supply-side platform 150.
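Method 400 can be summarized as a small handler that runs two classifiers over the received text and packages their probabilities as the attribute set. The classifier callables and their fixed return values below are illustrative assumptions, not the trained models of the disclosure.

```python
# Sketch of method 400: score the text with a topic classifier and a
# social-perspective classifier, then package both probabilities.
def classify_content(text, topic_classifier, perspective_classifier):
    """Blocks 404-408: run two classifiers and build the attribute set."""
    return {
        "topic": topic_classifier(text),                      # block 404
        "social_perspective": perspective_classifier(text),   # block 406
    }  # block 408; the caller returns this to the requestor (block 410)


# Hypothetical stand-in classifiers returning fixed probabilities.
attrs = classify_content(
    "Article text here",
    topic_classifier=lambda t: 0.2,
    perspective_classifier=lambda t: 0.7,
)
```

In the full system, each entry would instead come from one of the trained one-class classifiers 102, and the attribute set could carry many more than two probabilities.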
- a natural-language processing method for detecting social inclusion and diversity includes receiving textual content from a requestor and determining, using a first one-class classifier trained to determine membership within a first class of a plurality of social-perspective classes, a first probability of the textual content belonging to the first class.
- the natural-language processing method also includes generating an attribute set that includes the first probability and sending the attribute set to the requestor.
- (B) The natural-language processing method denoted as (A), one or more of the plurality of social-perspective classes being selected from the group consisting of: racial diversity, gender diversity, religious diversity, and economic diversity.
- (C) Either of the natural-language processing methods denoted as (A) and (B), further including determining, using a second one-class classifier trained to determine membership of a second class representing a topic, a second probability of the textual content belonging to the second class; wherein the attribute set includes the second probability.
- (D) The natural-language processing method denoted as (C), one or more of the plurality of topic classes being selected from the group consisting of: death and injury, crime, military, anti-vaccine, sex, profanity, vice, and politics.
- (E) Either of the natural-language processing methods denoted as (C) and (D), further including determining a set of threshold values corresponding to the first probability and the second probability, wherein said generating further comprises including the set of threshold values in the attribute set.
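Claim (E) pairs each probability with a threshold value inside the attribute set. One way to picture this is the following sketch; the function name, the default threshold of 0.5, and the nested output shape are all illustrative assumptions, not specified by the patent.

```python
# Hypothetical sketch of claim (E): extend the attribute set so each class
# carries its probability, its threshold, and a derived membership flag.
# Names, the 0.5 default, and the output layout are assumptions.

from typing import Dict


def add_thresholds(attribute_set: Dict[str, float],
                   thresholds: Dict[str, float]) -> Dict[str, dict]:
    """Return a new attribute set pairing every probability with its threshold."""
    return {
        name: {
            "probability": p,
            "threshold": thresholds.get(name, 0.5),  # fallback is an assumption
            "member": p >= thresholds.get(name, 0.5),
        }
        for name, p in attribute_set.items()
    }


attrs = add_thresholds({"gender_diversity": 0.9, "crime": 0.05},
                       {"gender_diversity": 0.7, "crime": 0.3})
# attrs["gender_diversity"]["member"] -> True; attrs["crime"]["member"] -> False
```

Shipping the thresholds alongside the probabilities lets the requestor make a binary keep/reject decision without knowing how the classifiers were calibrated.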
- The first supervisory label set indicates how said each analyst classified the first training content into one or more of the plurality of social-perspective classes.
- The second supervisory label set indicates how said each analyst classified the second training content into one or more of the plurality of topic classes.
- The natural-language processing method also includes generating a first training set by combining the first training content with the first supervisory label set, generating a second training set by combining the second training content with the second supervisory label set, training the first one-class classifier with the first training set, and training the second one-class classifier with the second training set.
- (G) The natural-language processing method denoted as (F), further comprising sending analyst instructions to each analyst.
- Each analyst generates the first and second supervisory label sets by classifying the first and second training contents based on the analyst instructions.
- (H) The natural-language processing method denoted as (G), further comprising generating the analyst instructions based on the plurality of social-perspective classes and the plurality of topic classes.
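The training steps described above, combining analyst-labeled content into a training set and fitting a one-class classifier, can be sketched as follows. The bag-of-words centroid model, the `OneClassCentroid` name, and the sample documents are purely illustrative assumptions; the patent does not mandate any particular model.

```python
# Illustrative sketch of the training flow in claims (F)-(H): analysts label
# training content for one social-perspective class, and a one-class
# classifier is fit to that labeled set. The centroid model is an assumption.

from collections import Counter
import math


def tokenize(text: str) -> list:
    return text.lower().split()


class OneClassCentroid:
    """Toy one-class classifier: membership score is the cosine similarity
    between a document's bag-of-words vector and the training-class centroid."""

    def fit(self, documents):
        self.centroid = Counter()
        for doc in documents:
            self.centroid.update(tokenize(doc))
        return self

    def score(self, text: str) -> float:
        counts = Counter(tokenize(text))
        dot = sum(counts[w] * self.centroid[w] for w in counts)
        norm = (math.sqrt(sum(c * c for c in counts.values()))
                * math.sqrt(sum(c * c for c in self.centroid.values())))
        return dot / norm if norm else 0.0


# First training set: first training content combined with a first supervisory
# label set in which every analyst marked these documents as in-class
# (a one-class classifier needs only in-class examples). Documents are made up.
first_training_set = [
    "women founders in renewable energy",
    "women leaders on inclusive hiring",
    "mentorship programs for women in engineering",
]

clf = OneClassCentroid().fit(first_training_set)
in_class = clf.score("a story about women in tech")
out_of_class = clf.score("quarterly futures trading report")
# in_class scores higher than out_of_class
```

A second classifier for a topic class would be trained the same way from the second training set.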
- (J) The natural-language processing method denoted as (I), the requestor comprising a server that generates additional content for displaying on the website.
- (K) Either of the natural-language processing methods denoted as (I) and (J), the requestor providing the attribute set to an exchange server for use by a demand-side platform to generate a bid to place the additional content on the website.
- (L) A natural-language processing system for detecting social inclusion and diversity includes a processor, a memory communicatively coupled with the processor, and a first one-class classifier stored in the memory and trained to determine membership within a first class of a plurality of social-perspective classes.
- The natural-language processing system also includes machine-readable instructions stored in the memory that, when executed by the processor, control the natural-language processing system to: receive textual content from a requestor; determine, using the first one-class classifier, a first probability of the textual content belonging to the first class; generate an attribute set that includes the first probability; and send the attribute set to the requestor.
- (M) The natural-language processing system denoted as (L), one or more of the plurality of social-perspective classes being selected from the group consisting of: racial diversity, gender diversity, religious diversity, and economic diversity.
- (N) Either of the natural-language processing systems denoted as (L) and (M), further including a second one-class classifier stored in the memory and trained to determine membership within a second class of a plurality of topic classes.
- The natural-language processing system also includes additional machine-readable instructions stored in the memory that, when executed by the processor, control the natural-language processing system to: determine, using the second one-class classifier, a second probability of the textual content belonging to the second class; and include the second probability in the attribute set.
- (P) Either of the natural-language processing systems denoted as (N) and (O), the memory storing additional machine-readable instructions that, when executed by the processor, control the natural-language processing system to: determine a set of threshold values corresponding to the first probability and the second probability; and include the set of threshold values in the attribute set.
- The first supervisory label set indicates how said each analyst classified the first training content into one or more of the plurality of social-perspective classes.
- The second supervisory label set indicates how said each analyst classified the second training content into one or more of the plurality of topic classes.
- (R) The natural-language processing system denoted as (Q), the memory storing additional machine-readable instructions that, when executed by the processor, control the natural-language processing system to send analyst instructions to each analyst. Said each analyst generates the first and second supervisory label sets by classifying the first and second training contents based on the analyst instructions.
- (T) The natural-language processing system denoted as (S), the requestor providing the attribute set to an exchange server for use by a demand-side platform to generate a bid to place the additional content on the website.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Computing Systems (AREA)
- Accounting & Taxation (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- General Business, Economics & Management (AREA)
- Development Economics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Finance (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- Economics (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Entrepreneurship & Innovation (AREA)
- Human Resources & Organizations (AREA)
- Primary Health Care (AREA)
- Tourism & Hospitality (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A natural-language processing system and method classify textual content received from a requestor. A first one-class classifier, trained to determine membership within a first class of a plurality of social-perspective classes, is used to determine a first probability that the textual content belongs to the first class. A second one-class classifier, trained to determine membership within a second class of a plurality of topic classes, is used to determine a second probability that the textual content belongs to the second class. An attribute set that includes the first and second probabilities is then sent to the requestor. By detecting both negative topics and positive social perspectives in the textual content, the requestor obtains a better measure of whether the textual content is likely to affect the perception of any additional content displayed with it, thereby allowing a provider of the additional content to avoid association with certain undesirable social perspectives.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163133741P | 2021-01-04 | 2021-01-04 | |
US63/133,741 | 2021-01-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022147528A1 true WO2022147528A1 (fr) | 2022-07-07 |
Family
ID=82260987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2022/011112 WO2022147528A1 (fr) | 2021-01-04 | 2022-01-04 | Natural-language processing system and method for detecting social diversity and inclusion
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022147528A1 (fr) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070067322A1 (en) * | 2005-08-29 | 2007-03-22 | Harrison Shelton E Jr | Political system, method and device |
US20100198834A1 (en) * | 2000-02-10 | 2010-08-05 | Quick Comments Inc | System for Creating and Maintaining a Database of Information Utilizing User Options |
US20100332321A1 (en) * | 2002-07-16 | 2010-12-30 | Google Inc. | Method and System for Providing Advertising Through Content Specific Nodes Over the Internet |
US20130054559A1 (en) * | 2011-08-30 | 2013-02-28 | E-Rewards, Inc. | System and Method for Generating a Knowledge Metric Using Qualitative Internet Data |
- 2022
- 2022-01-04 WO PCT/US2022/011112 patent/WO2022147528A1/fr active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kumar et al. | Systematic literature review of sentiment analysis on Twitter using soft computing techniques | |
Han et al. | Fake news detection in social networks using machine learning and deep learning: Performance evaluation | |
Koltsova et al. | Mapping the public agenda with topic modeling: The case of the Russian livejournal | |
Burnap et al. | Detecting tension in online communities with computational Twitter analysis | |
Pennacchiotti et al. | A machine learning approach to twitter user classification | |
Gupta et al. | Emotion detection in email customer care | |
Bhuvaneshwari et al. | Spam review detection using self attention based CNN and bi-directional LSTM | |
Du et al. | Understanding visual memes: An empirical analysis of text superimposed on memes shared on twitter | |
US20100138402A1 (en) | Method and system for improving utilization of human searchers | |
Umar et al. | Detection and analysis of self-disclosure in online news commentaries | |
Okazaki et al. | How to mine brand Tweets: Procedural guidelines and pretest | |
US20110219299A1 (en) | Method and system of providing completion suggestion to a partial linguistic element | |
US11100252B1 (en) | Machine learning systems and methods for predicting personal information using file metadata | |
US20220058464A1 (en) | Information processing apparatus and non-transitory computer readable medium | |
CA3237882A1 (fr) | Modeles bases sur l'apprentissage automatique pour le marquage de donnees de texte | |
Cabral et al. | FakeWhastApp. BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages. | |
Rahman et al. | Using natural language processing to improve suicide classification requires consideration of race | |
Mounika et al. | Design of book recommendation system using sentiment analysis | |
Zhang et al. | “Less is more”: Mining useful features from Twitter user profiles for Twitter user classification in the public health domain | |
Bashir et al. | Human aggressiveness and reactions towards uncertain decisions | |
GB2572320A (en) | Hate speech detection system for online media content | |
Tarasova | Classification of hate tweets and their reasons using svm | |
WO2022147528A1 (fr) | Natural-language processing system and method for detecting social diversity and inclusion | |
Janchevski et al. | Andrejjan at semeval-2019 task 7: A fusion approach for exploring the key factors pertaining to rumour analysis | |
Lee et al. | Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22734833; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22734833; Country of ref document: EP; Kind code of ref document: A1 |