WO2022147528A1 - Natural language processing system and method for detecting social diversity and inclusion - Google Patents

Natural language processing system and method for detecting social diversity and inclusion

Info

Publication number
WO2022147528A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
natural
training
language processing
analyst
Prior art date
Application number
PCT/US2022/011112
Other languages
English (en)
Inventor
Tobias HOPP
Chris VARGO
Original Assignee
The Regents Of The University Of Colorado, A Body Corporate
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of Colorado, A Body Corporate
Publication of WO2022147528A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • Web pages on the internet provide vast quantities of information that are read by large numbers of people with diverse demographics. Natural language processing may be used to process such information to efficiently understand and extract meaning, including context and nuance.
  • When a person reads content (e.g., a news article, a blog post, a social media post, etc.) on a web page, the person can pick up a social perspective present in the content.
  • an article may describe racial diversity in corporations as having a positive effect on society in America. The person may consciously or subconsciously associate that social perspective with other additional content in close proximity to the original content (e.g., on the same web page).
  • a provider of this additional content is often cautious as to how the original content may affect perception of their additional content, wanting to avoid being associated with certain social perspectives.
  • some social perspectives, such as racial diversity, may be advantageous to the provider.
  • an advertiser may use a brand safety floor that defines content near which a brand should not appear.
  • the brand safety floor may indicate that an advertisement should not appear on a web page that includes content related to negative attributes, such as death, injury, crime, profanity, and so on.
  • Prior-art systems use algorithms (e.g., sentiment analysis and named entity recognition) to automatically detect this negative content.
  • the content is flagged as including negative attributes, and the advertiser will frequently choose to avoid advertising on any webpage containing the content (e.g., no bid is placed on an online auction system). Since the content was mislabeled, it would have been acceptable, if not beneficial, for the advertiser to place an advertisement proximate to the content. As a result, this mislabeling results in missed impressions for the advertiser and missed revenue for the content publisher.
  • Appendix A provides examples of how prior-art systems misclassify online news content, and therefore how opportunities for placing advertisements on a web page can be missed when using these prior-art systems.
  • One aspect of the present embodiments includes the realization that content with diversity, equality, and inclusion may also include negative sentiment and topics. For instance, many brand safety floors view the topic of violence as negative, but many news articles may discuss diversity and violence together. Similarly, other systems block articles that use negatively valenced words, and as such would label any critical assessment as negative (e.g., an article that highlights the problems associated with racism). In these ways, prior-art systems label content erroneously. Positive social perspectives include diversity, racial equality, inclusion, and others.
  • Prior-art artificial intelligence (AI) algorithms use named entity recognition and sentiment analysis to identify only negative sentiment and topics in the content, and these prior-art AI algorithms thereby indicate any such content (e.g., including content with diversity, equality, and inclusion) as being sensitive and therefore to be avoided.
  • the present embodiments solve this problem by classifying content based on topics and social perspectives.
  • some of the present embodiments detect inclusion of both negative topics (e.g., crime, injury, military conflict, etc.) and positive social perspectives (e.g., diversity, equality, inclusion, etc.) in content to help a provider better assess whether additional content should be associated with the original content on the web page, or not.
  • Another aspect of the present embodiments includes the realization that a person is only able to reliably make a finite number of decisions when labeling training content.
  • the person may be looking for any one of three categories (e.g., crime, injury, and military conflict) within the training content.
  • Increasing the number of decisions required of the human to label training content results in lower-quality training content.
  • a training set for a classifier may be limited to three categories, and therefore the classifier is only able to label content with up to three labels.
  • each one-class classifier outputs a probability that inputted text belongs to each of the classes, yielding a collection of probabilities that may be used to evaluate the content's suitability for association by a third party.
  • the classes are not limited to negative attributes (referred to herein as “topics”) and thus the third party may target content that includes certain categories and excludes certain other categories.
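The collection of per-class probabilities described above can be pictured with a small sketch. This is illustrative only: the class names, the `build_attribute_set` helper, and the stand-in probability functions are hypothetical, not taken from the patent.

```python
def build_attribute_set(text, classifiers):
    """Run every one-class classifier on `text` and collect one probability per class."""
    return {name: clf(text) for name, clf in classifiers.items()}

# Hypothetical stand-in classifiers; a real system would use trained
# neural networks over transformer embeddings.
classifiers = {
    "crime": lambda t: 0.9 if "robbery" in t else 0.1,
    "racial_diversity": lambda t: 0.8 if "inclusion" in t else 0.2,
}

attrs = build_attribute_set("a story about workplace inclusion", classifiers)
```

Because the classes cover both topics and social perspectives, a third party can target content on positive attributes rather than only excluding negative ones.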
  • a method classifies textual content.
  • the method includes receiving the textual content from a requestor and determining, using a first one-class classifier trained to determine membership within a first class of a plurality of social-perspective classes, a first probability of the textual content belonging to the first class.
  • the method also includes generating an attribute set to include the first probability and sending the attribute set to the requestor.
  • a natural-language processing system includes a processor, a memory communicatively coupled with the processor, a first one-class classifier stored in the memory and trained to determine membership within a first class of a plurality of social-perspective classes, and machine-readable instructions stored in the memory.
  • the machine-readable instructions, when executed by the processor, control the natural-language processing system to: receive textual content from a requestor; determine, using the first one-class classifier, a first probability of the textual content belonging to the first class; generate an attribute set that includes the first probability; and send the attribute set to the requestor.
  • FIG. 1 is a schematic illustrating one example advanced classifier system for detecting social diversity and inclusion in textual content, in embodiments.
  • FIG. 2 is a schematic illustrating example training and verification of classifiers of the system of FIG. 1, in embodiments.
  • FIG. 3 is a flowchart illustrating one example method for training and validating one classifier of FIG. 1, in embodiments.
  • FIG. 4 is a flowchart illustrating one example method for automatically classifying textual content for both topics and social perspectives, in embodiments.
  • the embodiments herein classify text, such as web content (e.g., news articles, blogs, social media, and so on) on a web page, to determine whether the text contains undesirable social perspectives. For example, where a content provider does not wish to be associated with gun rights, any article classified to include content relating to gun rights would be undesirable.
  • Prior-art artificial intelligence (AI) algorithms are trained to label content that includes negative attributes as being negative. However, when the content presents negative attributes in a positive way, such as when discussing how gun violence can perpetuate racism, the prior-art AI algorithms are insufficient. For example, such prior-art AI algorithms identify content that mentions the topics of race and gun violence, but that does not mean those articles have a pro-diversity social perspective.
  • the present embodiments describe an improved AI classification algorithm that uses a neural network that is trained to recognize the social perspective that the article portrays, including racial diversity and inclusion. That is, the improved AI classification algorithm is not necessarily trained to recognize specific sentiment or topics, but rather perspectives on society and culture.
  • FIG. 1 is a schematic illustrating one example of a natural-language processing (NLP) system 100 for detecting social diversity and inclusion in textual content 162.
  • Textual content 162 is received (e.g., directly or indirectly via a URL) from an advertising platform, either a demand-side platform 130 or a supply-side platform 150, that is in communication with an advertiser or publisher 160 requesting additional information (e.g., from a web page 164) about the textual content therein.
  • Textual content 162 may represent one or more of news articles, blog posts, social media posts, and so on.
  • NLP system 100 is implemented using one or more computers that include at least one processor and a memory storing machine-readable instructions that, when executed by the at least one processor, implement the functionality described herein.
  • NLP system 100 is implemented in the cloud using one or more online services.
  • NLP system 100 is a distributed web services architecture that is designed for adaptability and scalability of services.
  • NLP system 100 may interface with one or more client applications that make social media content and annotation requests through a web service tier of NLP system 100 that serves as the public application programming interface (API) to the service.
  • NLP system 100 may efficiently classify a thousand pieces of content 162 in less than a second, a feat that requires efficiencies in all system areas.
  • NLP system 100 may utilize available supervised deep learning options and may include a set of pre-trained transformers that are generalizable for social media platforms and news platforms. These embeddings are trained based on hundreds of millions of messages on different platforms, including Twitter, Facebook, and online news sites.
  • NLP system 100 may include thousands of classifiers 102, each having various differences from others, including the type of transformer (pre-trained vs. in-sample), learning rates, and neural-network structures (e.g., number of layers, connectivity between layers, use of max pooling layers, etc.).
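As a rough illustration of how such a family of classifier variants might be enumerated, the sketch below builds a grid over the dimensions named above. The field names and hyperparameter values are hypothetical, not the patent's.

```python
from itertools import product

# Hypothetical hyperparameter axes; values are illustrative only.
transformers = ["pretrained", "in_sample"]
learning_rates = [1e-3, 1e-4, 1e-5]
layer_counts = [2, 4, 8]
max_pooling = [True, False]

configs = [
    {"transformer": t, "lr": lr, "layers": n, "max_pool": mp}
    for t, lr, n, mp in product(transformers, learning_rates, layer_counts, max_pooling)
]
# 2 transformers x 3 learning rates x 3 depths x 2 pooling options = 36 variants
```

A production system would scale such a grid to the thousands of variants mentioned above and validate each candidate before use.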
  • Each classifier 102 is tested by performing robust in-sample and out-of-sample validation via commonly accepted performance metrics (e.g., log loss, F1, precision, and recall).
  • Each algorithm is externally validated. Before it’s considered ready for use, it must perform well on textual content (e.g., news) it has never seen. To address drift of precision and recall of models across time, each classifier is regularly updated with new training data.
  • NLP system 100 includes a plurality of one-class classifiers 102 (e.g., illustratively shown with classifiers 102(1)-102(N), where N is a positive integer), each trained to classify textual content 162.
  • Each of the classifiers 102 outputs a probability, thereby generating an attribute set 104 (also known as annotations) indicative of topics and social perspectives within textual content 162.
  • both topics 106 and social perspectives 108 may be identified in textual content 162.
  • NLP system 100 automatically identifies trending topics within social media feeds, news feeds, and so on.
  • Topics 106 may include death, injury, crime, military, anti-vaccine, sex, profanity, vices, politics, explicit sexual content, harmful acts, hate speech, acts of aggression, obscenity, drugs, smoking, alcohol, spam, and terrorism. However, topics 106 may include additional or alternative subjects without departing from the scope hereof.
  • Social perspectives 108 may include racial diversity, gender diversity, religious diversity, economic diversity, and so on. However, social perspectives 108 may include additional or alternative subjects without departing from the scope hereof.
  • each classifier 102 is implemented as at least one AI algorithm (e.g., a neural network, or other such technology) that is trained (described in more detail below with reference to FIG. 2).
  • topics 106 are also referred to herein as topic classes.
  • social perspectives 108 are also referred to herein as social-perspective classes.
  • each classifier 102 may be trained based on a finite number of human decisions (e.g., three), where these decisions may classify the training content as having positive or negative content and may be based upon topics 106 and social perspectives 108.
  • NLP system 100 may include an application programming interface 101 that interfaces with one or more other computer systems through a computer network (e.g., the internet, WANs, LANs, etc.). Accordingly, NLP system 100 may provide a classification service to one or more entities, including a demand-side platform 130 or supply-side platform 150.
  • supply-side platform 150 provides an advertising service to at least publisher 160 and has an inventory of textual content 162 being published by publisher 160.
  • Publisher 160 may include space on web page 164 for displaying additional content (e.g., advertisements) whereby supply-side platform 150 operates to attract and provide the additional content to publisher 160.
  • supply-side platform 150 may monitor web page 164 (e.g., and other web pages) to discover content 162 (e.g., new news articles, reports, etc.) when newly added to web page 164.
  • supply-side platform 150 sends the content 162 (e.g., the actual content 162 or a URL identifying content 162 on web page 164) to NLP system 100.
  • NLP system 100 receives (or retrieves) content 162 and uses at least two classifiers 102 to process content 162 to generate a corresponding attribute set 104 according to topics 106 and social perspectives 108 identified within content 162.
  • each classifier 102 may be trained to classify based on a few (e.g., up to three) different attributes (i.e., topics 106 and social perspectives 108) and generate probability 103 indicative of content 162 being a match for each of these attributes.
  • classifier 102(1) is trained using topics 106 such that probability 103(1) indicates a likelihood that content 162 includes the topics, and classifier 102(2) is trained using social perspectives 108 such that probability 103(2) indicates a likelihood that content 162 includes the social perspectives.
  • the training set (see training sets 208 of FIG. 2) used to train each classifier 102 is labeled based on a human making up to a finite number of simple decisions.
  • NLP system 100 sends the generated attribute set 104 to supply-side platform 150.
  • NLP system 100 may also send a set of cut-offs 105 (e.g., threshold values corresponding to attribute set 104) that facilitate binary classification of attribute set 104, and that allow supply-side platform 150 to adjust the threshold values as best suited to current needs.
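A minimal sketch of how a requestor might apply the adjustable cut-offs 105 to attribute set 104. The `binarize` helper and the example probabilities and thresholds are hypothetical, not the patent's values.

```python
def binarize(attribute_set, cutoffs):
    """Turn per-attribute probabilities into binary labels using
    per-attribute threshold values (default 0.5 when none is given)."""
    return {
        name: prob >= cutoffs.get(name, 0.5)
        for name, prob in attribute_set.items()
    }

attribute_set = {"crime": 0.31, "racial_diversity": 0.86}

labels = binarize(attribute_set, {"crime": 0.40, "racial_diversity": 0.70})
stricter = binarize(attribute_set, {"crime": 0.25})  # requestor-adjusted cutoff
```

Because the thresholds travel alongside the probabilities, the requestor can tighten or loosen a cutoff without the NLP system re-classifying the content.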
  • Supply-side platform 150 may append attribute set 104 to a header bid request that it sends to exchange 140.
  • Exchange 140 may for example represent an advertisement exchange.
  • Exchange 140 shares the header bid request (including attribute set 104) with a demand-side platform 130.
  • Demand-side platform 130 provides a service to a content provider 120 that wishes to place additional content 122 on web page 164.
  • Demand-side platform 130 may interact with exchange 140 to place a bid in an auction implemented by exchange 140 to display additional content 122 on web page 164 based on attribute set 104. Particularly, demand-side platform 130 may decide whether or not to make the bid based on whether attribute set 104 aligns with suitability requirements (e.g., brand safety when additional content 122 is an advertisement for a particular brand) of content provider 120, and may determine a bid amount based on whether attribute set 104 aligns with the suitability requirements. For example, where content provider 120 instructs demand-side platform 130 to place additional content 122 alongside pro-racial diversity news content, demand-side platform 130 uses attribute set 104 to determine that content 162 suitably includes pro-racial diversity news, and places a bid with exchange 140 accordingly.
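The bid decision described above might be sketched as follows, assuming simple lists of required and excluded attributes and a single cutoff. All names and values here are illustrative stand-ins, not the demand-side platform's actual logic.

```python
def decide_bid(attribute_set, require, exclude, cutoff=0.5, base_bid=1.00):
    """Return a bid amount, or None for no bid. Bid only when no excluded
    attribute clears the cutoff and every required attribute does."""
    if any(attribute_set.get(a, 0.0) >= cutoff for a in exclude):
        return None  # brand-unsafe: place no bid
    if all(attribute_set.get(a, 0.0) >= cutoff for a in require):
        return base_bid
    return None  # desired perspective absent: place no bid

bid = decide_bid(
    {"racial_diversity": 0.91, "crime": 0.12},
    require=["racial_diversity"],
    exclude=["crime"],
)
```

A real platform would also scale the bid amount by how well the attribute set matches the content provider's suitability requirements.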
  • attribute set 104 is not limited to identifying only topics 106 within content 162, but may also identify social perspectives 108 (e.g., pro-racial diversity), thereby providing demand-side platform 130 and content provider 120 with additional opportunity for placing additional content 122 on web page 164 as compared with traditional classifications that only identify whether topics are present or not.
  • demand-side platform 130 evaluates attribute set 104 to select web page 164 only when attribute set 104 does not indicate that content 162 includes political news.
  • attribute set 104 allows this content to be identified, even though it may also include certain undesirable topics.
  • FIG. 2 is a schematic illustrating example training and verification of classifiers 102 of NLP system 100 of FIG. 1.
  • NLP system 100 includes a training content labeling interface 202 that interacts with a plurality of analysts 220 (also called coders) to generate a label set 204 (also called annotations) of three labels, shown as A, B, and C, for each of a plurality of training content 206 (e.g., training textual content).
  • Training content 206 and corresponding label sets 204 form a training set 208 suitable for training classifier 102(1).
  • training set 208 includes upward of one-hundred thousand sets of training content 206 and corresponding label sets 204.
  • Training content 206 may include previously published content, such as news articles, blogs, social media posts, etc.
  • training content labeling interface 202 engages one or more analysts 220 (e.g., humans) to read training content 206 and respond to a set of simple questions. The answers to these questions form a corresponding label set 204 for the training content.
  • Training content labeling interface 202 may generate analyst instructions 210 that guide analysts 220 on how to read training content 206 and how to respond to the questions to generate label set 204.
  • analyst instructions 210 may detail questions relating to the specific topics 106 and social perspectives 108 to be evaluated by analyst 220.
  • Training content labeling interface 202 collects, for a particular group of attributes, training content 206 and its corresponding label set 204 to form training set 208.
  • Each training set 208 includes thousands of different training content 206 and label sets 204.
  • NLP system 100 uses training set 208 to train classifier 102(1) to recognize the finite number of attributes (e.g., one, two, three, etc.) selected for the training set 208. Once trained, classifier 102(1) may be used to process content 162 and generate corresponding probability 103(1).
  • a main consideration of any service is the quality of output. Accordingly, before a newly trained classifier 102 is used to process content 162, the classifier is first externally validated.
  • a content analyst 260 may use a verification interface 250 to select a completely new, random, set of test content 252 (e.g., news stories that just appeared online in the past week) and generate a corresponding true label set 254 for the attributes being processed by classifier 102(1).
  • Verification interface 250 invokes classifier 102(1) to process test content 252 and generate probability 103(1). Verification interface 250 then compares true label set 254 (generated by analyst 260) with probability 103(1) to determine performance 256 that defines precision and recall of classifier 102(1), where true label set 254 is considered the "true" observations.
  • Performance 256 of classifier 102(1) is compared to performance criteria 258 to determine whether classifier 102(1) is sufficiently trained and suitable for use. Classifier 102(1) is only considered good enough for deployment when the precision and recall are scientifically rigorous, typically above 0.80 for both. Accordingly, NLP system 100 ensures that any newly trained classifier 102 is able to classify new, unseen data at an acceptable level.
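The deployment gate described above (precision and recall both meeting the 0.80 criterion against analyst-generated "true" labels) can be sketched as below. The helper and the example label vectors are illustrative, not the patent's implementation.

```python
def ready_for_deployment(true_labels, predicted_labels, minimum=0.80):
    """Compute precision and recall against the analyst 'true' labels;
    both must meet `minimum` before the classifier is deployed."""
    tp = sum(1 for t, p in zip(true_labels, predicted_labels) if t and p)
    fp = sum(1 for t, p in zip(true_labels, predicted_labels) if not t and p)
    fn = sum(1 for t, p in zip(true_labels, predicted_labels) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision >= minimum and recall >= minimum, precision, recall

ok, precision, recall = ready_for_deployment(
    [True, True, True, True, False],   # analyst labels
    [True, True, True, True, True],    # classifier predictions
)
```

Here the single false positive leaves precision at exactly 0.80, so the classifier just clears the gate.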
  • the following example illustrates how two independent analysts 220 (also called “coders”) generate label set 204 for training content 206 that is “stacked” for positive racial diversity.
  • analysts 220 agree upon a set of analyst instructions 210, also known in social science as a “codebook.”
  • An example of analyst instructions 210 is provided below in the section titled “Codebook Example.”
  • the annotated training content is then sampled in a stratified way. The sample comprises:
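The composition of the strata is not listed above; as a generic illustration, stratified sampling of annotated training content might look like the sketch below. The labels, document names, and stratum size are hypothetical.

```python
import random

def stratified_sample(annotated, per_stratum, seed=0):
    """Draw an equal number of items from each label stratum of the
    annotated content (illustrative; strata here are hypothetical)."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in annotated:
        by_label.setdefault(label, []).append(text)
    return {
        label: rng.sample(texts, min(per_stratum, len(texts)))
        for label, texts in by_label.items()
    }

# Hypothetical annotated content: alternating positive/negative labels.
annotated = [(f"doc{i}", "positive" if i % 2 else "negative") for i in range(20)]
sample = stratified_sample(annotated, per_stratum=3)
```

Balancing strata this way keeps a one-class classifier from simply learning the base rate of the majority label.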
  • FIG. 3 is a flowchart illustrating one example method 300 for training and validating classifier 102 of FIG. 1.
  • the method 300 may be performed with NLP system 100 of FIG. 1.
  • method 300 defines categories and instructions for labeling training content.
  • training content labeling interface 202 of NLP system 100 interacts with at least one analyst 220 to determine topics 106, social perspectives 108, and analyst instructions 210.
  • method 300 captures training content and labels from humans.
  • training content labeling interface 202 interacts with at least one analyst 220 to capture training content 206 and corresponding label sets 204.
  • method 300 builds training set from training content and labels.
  • training content labeling interface 202 generates training set 208 to include training content 206 and corresponding label sets 204.
  • method 300 trains one classifier using the training set.
  • training content labeling interface 202 trains classifier 102(1) using training set 208.
  • method 300 captures test content and true labels from analyst.
  • verification interface 250 of NLP system 100 interacts with at least one analyst 260 to capture test content 252 and corresponding true label set 254.
  • method 300 uses the classifier to process the test content.
  • verification interface 250 invokes classifier 102(1) to process test content 252 and generate probability 103(1) corresponding to test content 252.
  • method 300 compares the attribute probabilities from the classifier against the true labels to determine performance of the classifier.
  • verification interface 250 compares probability 103(1) against true label set 254 to determine performance 256 of classifier 102(1).
  • Block 316 is a decision. If, in block 316, method 300 determines that performance of the classifier meets performance criteria, method 300 continues with block 318; otherwise, method 300 continues with block 304, where blocks 304 through 316 repeat to improve training of the classifier.
  • method 300 makes the classifier available for use.
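The loop through blocks 304-316, repeating until block 318 makes the classifier available, can be sketched as below. The training and evaluation steps are toy stand-ins, not the patent's method.

```python
def train_until_valid(train_step, evaluate, criteria=0.80, max_rounds=5):
    """Repeat train -> validate until precision and recall both meet criteria."""
    for round_no in range(1, max_rounds + 1):
        classifier = train_step(round_no)
        precision, recall = evaluate(classifier)
        if precision >= criteria and recall >= criteria:
            return classifier, round_no  # classifier made available for use
    raise RuntimeError("performance criteria not met within max_rounds")

# Toy stand-ins: validation scores improve on the second round.
scores = {1: (0.70, 0.75), 2: (0.85, 0.90)}
classifier, rounds = train_until_valid(
    train_step=lambda r: {"round": r},
    evaluate=lambda c: scores[c["round"]],
)
```

In the real system, each retraining round would capture additional labeled training content before validating against fresh test content.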
  • FIG. 4 is a flowchart illustrating one example method 400 for automatically classifying textual content for topic and social perspective attributes.
  • Method 400 is implemented by application programming interface 101 of NLP system 100 of FIG. 1, for example.
  • method 400 receives textual content from a requestor.
  • application programming interface 101 receives content 162 from supply-side platform 150.
  • method 400 determines, using a first classifier, a first probability of at least one attribute being present in the textual content.
  • application programming interface 101 invokes classifier 102(1) to process content 162 and generate probability 103(1).
  • method 400 determines, using a second classifier, a second probability of at least one attribute being present in the textual content.
  • application programming interface 101 invokes classifier 102(2) to process content 162 and generate probability 103(2).
  • method 400 generates an attribute set to include the first probability and the second probability.
  • application programming interface 101 generates attribute set 104 to include probability 103(1) and probability 103(2).
  • method 400 sends the attribute set to the requestor.
  • application programming interface 101 sends attribute set 104 to supply-side platform 150.
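The method-400 flow above can be illustrated end to end with stand-in classifiers. All function names and probability values here are hypothetical, not the patent's API.

```python
def method_400(content, topic_clf, perspective_clf):
    """Receive content, run a topic classifier and a social-perspective
    classifier, bundle both probabilities, and return them to the requestor."""
    first = topic_clf(content)
    second = perspective_clf(content)
    return {"topic": first, "social_perspective": second}

result = method_400(
    "report on community policing and inclusion",
    topic_clf=lambda c: 0.7 if "policing" in c else 0.1,
    perspective_clf=lambda c: 0.85 if "inclusion" in c else 0.15,
)
```

The requestor receives both probabilities together, so it can weigh a flagged topic against a desirable social perspective instead of discarding the content outright.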
  • a natural-language processing method for detecting social inclusion and diversity includes receiving textual content from a requestor and determining, using a first one-class classifier trained to determine membership within a first class of a plurality of social-perspective classes, a first probability of the textual content belonging to the first class.
  • the natural-language processing method also includes generating an attribute set that includes the first probability and sending the attribute set to the requestor.
  • (B) The natural-language processing method denoted as (A), one or more of the plurality of social-perspective classes being selected from the group consisting of: racial diversity, gender diversity, religious diversity, and economic diversity.
  • (C) Either of the natural-language processing methods denoted as (A) and (B), further including determining, using a second one-class classifier trained to determine membership of a second class representing a topic, a second probability of the textual content belonging to the second class; wherein the attribute set includes the second probability.
  • (D) The natural-language processing method denoted as (C), one or more of the plurality of topic classes being selected from the group consisting of: death and injury, crime, military, anti-vaccine, sex, profanity, vice, and politics.
  • (E) Either of the natural-language processing methods denoted as (C) and (D), further including determining a set of threshold values corresponding to the first probability and the second probability, wherein said generating further comprises including the set of threshold values in the attribute set.
  • the first supervisory label set indicates how said each analyst classified the first training content into one or more of the plurality of social-perspective classes.
  • the second supervisory label set indicates how said each analyst classified the second training content into one or more of the plurality of topic classes.
  • the natural-language processing method also includes generating a first training set by combining the first training content with the first supervisory label set, generating a second training set by combining the second training content with the second supervisory label set, training the first one-class classifier with the first training set, and training the second one-class classifier with the second training set.
  • (G) The natural-language processing method denoted as (F), further comprising sending analyst instructions to each analyst.
  • Each analyst generates the first and second supervisory label sets by classifying the first and second training contents based on the analyst instructions.
  • (H) The natural-language processing method denoted as (G), further comprising generating the analyst instructions based on the plurality of social-perspective classes and the plurality of topic classes.
  • (J) The natural-language processing method denoted as (I), the requestor comprising a server that generates additional content for displaying on the website.
  • (K) Either of the natural-language processing methods denoted as (I) and (J), the requestor providing the attribute set to an exchange server for use by a demand-side platform to generate a bid to place the additional content on the website.
  • (L) A natural-language processing system for detecting social inclusion and diversity includes a processor, a memory communicatively coupled with the processor, and a first one-class classifier stored in the memory and trained to determine membership within a first class of a plurality of social-perspective classes.
  • the natural-language processing system also includes machine-readable instructions stored in the memory that, when executed by the processor, control the natural-language processing system to: receive textual content from a requestor; determine, using the first one-class classifier, a first probability of the textual content belonging to the first class; generate an attribute set that includes the first probability; and send the attribute set to the requestor.
  • (M) The natural-language processing system denoted as (L), one or more of the plurality of social-perspective classes being selected from the group consisting of: racial diversity, gender diversity, religious diversity, and economic diversity.
  • (N) Either of the natural-language processing systems denoted as (L) and (M), further including a second one-class classifier stored in the memory and trained to determine membership within a second class of a plurality of topic classes.
  • the natural-language processing system also includes additional machine-readable instructions stored in the memory that, when executed by the processor, control the natural -language processing system to: determine, using the second one-class classifier, a second probability of the textual content belonging to the second class; and include the second probability in the attribute set.
  • (P) Either of the natural-language processing systems denoted as (N) and (O), the memory storing additional machine-readable instructions that, when executed by the processor, control the natural-language processing system to: determine a set of threshold values corresponding to the first probability and the second probability; and include the set of threshold values in the attribute set.
  • the first supervisory label set indicates how said each analyst classified the first training content into one or more of the plurality of social-perspective classes.
  • the second supervisory label set indicates how said each analyst classified the second training content into one or more of the plurality of topic classes.
  • (R) The natural-language processing system denoted as (Q), the memory storing additional machine-readable instructions that, when executed by the processor, control the natural-language processing system to send analyst instructions to each analyst. Said each analyst generates the first and second supervisory label sets by classifying the first and second training contents based on the analyst instructions.
  • (T) The natural-language processing system denoted as (S), the requestor providing the attribute set to an exchange server for use by a demand-side platform to generate a bid to place the additional content on the website.
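The classification pipeline of claims (A), (C), and (E) can be sketched as follows. The application does not name a model family, so this illustration substitutes a toy one-class classifier (centroid cosine similarity over bag-of-words counts) for whatever trained classifier an implementation would actually use; all names here (`OneClassTextClassifier`, `build_attribute_set`, the attribute-set keys) are hypothetical, not taken from the application.

```python
import math
from collections import Counter

class OneClassTextClassifier:
    """Toy one-class classifier: models a single class as the centroid of its
    training documents' bag-of-words vectors, and reports cosine similarity to
    that centroid as a pseudo-probability of class membership in [0, 1]."""

    def __init__(self):
        self.centroid = Counter()

    def train(self, training_content):
        # Accumulate word counts over all in-class training documents.
        for doc in training_content:
            self.centroid.update(doc.lower().split())

    def probability(self, text):
        # Cosine similarity between the document vector and the class centroid.
        vec = Counter(text.lower().split())
        dot = sum(vec[w] * self.centroid[w] for w in vec)
        norm = (math.sqrt(sum(v * v for v in vec.values()))
                * math.sqrt(sum(v * v for v in self.centroid.values())))
        return dot / norm if norm else 0.0

def build_attribute_set(text, perspective_clf, topic_clf, thresholds):
    """Assemble the attribute set of claims (C) and (E): one probability per
    classifier plus the corresponding threshold values."""
    return {
        "social_perspective_probability": perspective_clf.probability(text),
        "topic_probability": topic_clf.probability(text),
        "thresholds": thresholds,
    }
```

A requestor receiving this attribute set can compare each probability against its threshold; a real deployment would train one such classifier per social-perspective class and per topic class.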
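Claims (F) through (H) describe building training sets by combining training content with supervisory label sets collected from multiple analysts. A minimal sketch of that aggregation step, assuming a simple agreement rule (the application does not prescribe how analyst labels are reconciled, and the function and parameter names below are hypothetical):

```python
from collections import Counter

def build_training_set(training_content, analyst_label_sets, min_agreement=2):
    """Combine per-analyst supervisory labels into one training set.

    `analyst_label_sets` holds one entry per analyst: a list of label sets
    aligned index-by-index with `training_content`. A document keeps a class
    label only when at least `min_agreement` analysts applied it (an
    illustrative majority-style rule, not mandated by the application)."""
    training_set = []
    for i, doc in enumerate(training_content):
        votes = Counter()
        for labels in analyst_label_sets:
            votes.update(labels[i])
        agreed = {label for label, n in votes.items() if n >= min_agreement}
        training_set.append((doc, agreed))
    return training_set
```

The resulting (content, labels) pairs would then feed the classifier training described in claim (F); per claims (G) and (H), the analysts would have produced their label sets by following instructions generated from the class lists themselves.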

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Finance (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A natural-language processing system and method classify textual content received from a requestor. A first one-class classifier, trained to determine membership within a first class of a plurality of social-perspective classes, is used to determine a first probability that the textual content belongs to the first class. A second one-class classifier, trained to determine membership within a second class of a plurality of topic classes, is used to determine a second probability that the textual content belongs to the second class. An attribute set that includes the first and second probabilities is then sent to the requestor. By detecting both negative topics and positive social perspectives in the textual content, the requestor obtains a better measure of whether the textual content is likely to affect the perception of any additional content displayed with it, thereby allowing a provider of the additional content to avoid association with certain undesirable social perspectives.
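As the abstract notes, the attribute set lets a demand-side platform weigh positive social perspectives against negative topics before bidding on a placement. A hypothetical decision rule (the threshold values, defaults, and key names are illustrative and do not appear in the application):

```python
def should_bid(attribute_set, min_perspective=0.6, max_topic=0.3):
    """Bid only when the page scores high enough on a desired
    social-perspective class and low enough on an undesired topic class.
    An illustrative rule; real platforms would likely adjust bid price
    rather than make a binary decision."""
    return (attribute_set["social_perspective_probability"] >= min_perspective
            and attribute_set["topic_probability"] <= max_topic)
```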
PCT/US2022/011112 2021-01-04 2022-01-04 Natural language processing system and method for detecting social diversity and inclusion WO2022147528A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163133741P 2021-01-04 2021-01-04
US63/133,741 2021-01-04

Publications (1)

Publication Number Publication Date
WO2022147528A1 true WO2022147528A1 (fr) 2022-07-07

Family

ID=82260987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2022/011112 WO2022147528A1 (fr) 2021-01-04 2022-01-04 Natural language processing system and method for detecting social diversity and inclusion

Country Status (1)

Country Link
WO (1) WO2022147528A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070067322A1 (en) * 2005-08-29 2007-03-22 Harrison Shelton E Jr Political system, method and device
US20100198834A1 (en) * 2000-02-10 2010-08-05 Quick Comments Inc System for Creating and Maintaining a Database of Information Utilizing User Options
US20100332321A1 (en) * 2002-07-16 2010-12-30 Google Inc. Method and System for Providing Advertising Through Content Specific Nodes Over the Internet
US20130054559A1 (en) * 2011-08-30 2013-02-28 E-Rewards, Inc. System and Method for Generating a Knowledge Metric Using Qualitative Internet Data

Similar Documents

Publication Publication Date Title
Kumar et al. Systematic literature review of sentiment analysis on Twitter using soft computing techniques
Han et al. Fake news detection in social networks using machine learning and deep learning: Performance evaluation
Koltsova et al. Mapping the public agenda with topic modeling: The case of the Russian livejournal
Burnap et al. Detecting tension in online communities with computational Twitter analysis
Pennacchiotti et al. A machine learning approach to twitter user classification
Gupta et al. Emotion detection in email customer care
Bhuvaneshwari et al. Spam review detection using self attention based CNN and bi-directional LSTM
Du et al. Understanding visual memes: An empirical analysis of text superimposed on memes shared on twitter
US20100138402A1 (en) Method and system for improving utilization of human searchers
Umar et al. Detection and analysis of self-disclosure in online news commentaries
Okazaki et al. How to mine brand Tweets: Procedural guidelines and pretest
US20110219299A1 (en) Method and system of providing completion suggestion to a partial linguistic element
US11100252B1 (en) Machine learning systems and methods for predicting personal information using file metadata
US20220058464A1 (en) Information processing apparatus and non-transitory computer readable medium
CA3237882A1 (fr) Modeles bases sur l'apprentissage automatique pour le marquage de donnees de texte
Cabral et al. FakeWhastApp. BR: NLP and Machine Learning Techniques for Misinformation Detection in Brazilian Portuguese WhatsApp Messages.
Rahman et al. Using natural language processing to improve suicide classification requires consideration of race
Mounika et al. Design of book recommendation system using sentiment analysis
Zhang et al. “Less is more”: Mining useful features from Twitter user profiles for Twitter user classification in the public health domain
Bashir et al. Human aggressiveness and reactions towards uncertain decisions
GB2572320A (en) Hate speech detection system for online media content
Tarasova Classification of hate tweets and their reasons using svm
WO2022147528A1 (fr) Natural language processing system and method for detecting social diversity and inclusion
Janchevski et al. Andrejjan at semeval-2019 task 7: A fusion approach for exploring the key factors pertaining to rumour analysis
Lee et al. Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22734833

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22734833

Country of ref document: EP

Kind code of ref document: A1