CN112836486A - Group implicit stance analysis method based on word vectors and Bert - Google Patents

Group implicit stance analysis method based on word vectors and Bert

Info

Publication number
CN112836486A
CN112836486A (application CN202011451101.2A)
Authority
CN
China
Prior art keywords
implicit
words
sentence
bert
attitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011451101.2A
Other languages
Chinese (zh)
Other versions
CN112836486B (en)
Inventor
韩旭
王博
蒋沁学
陈根华
黄博帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202011451101.2A priority Critical patent/CN112836486B/en
Publication of CN112836486A publication Critical patent/CN112836486A/en
Application granted granted Critical
Publication of CN112836486B publication Critical patent/CN112836486B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G06F16/355 - Class or cluster creation or modification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/12 - Use of codes for handling textual entities
    • G06F40/126 - Character encoding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a group implicit stance analysis method based on word vectors and Bert, in which a Bert model is trained on a text corpus and group implicit stance analysis is performed using sentence vectors. The system comprises a data analysis module, a model training module and an implicit stance analysis module. The data analysis module parses and extracts the posts published by users in a social group and classifies the users' posts according to the target words and attribute words used in the implicit association test; the extracted text is split into sentences to obtain a sentence set A, from which a sentence set B containing both target words and attribute words and a set C of sentences that do not contain both are extracted. The model training module builds a model that learns the textual biases in the social group's language big data. The implicit stance analysis module measures the relationship between the corresponding target words and attribute words according to the distance between the sentence embedding vectors, thereby quantifying the implicit stance of the social group's users.

Description

Group implicit stance analysis method based on word vectors and Bert
Technical Field
The invention belongs to the field of group language data analysis in social computing and social psychology, relates to a method for analyzing the implicit stance and attitude within a group, and in particular to a group implicit stance analysis method based on word vectors and Bert.
Background
In social media language big data (such as microblogs, Twitter, news and Wikipedia), the language users publish can implicitly reflect their attitudes toward things. A stance is the evaluative disposition of an individual or group toward a concept or object. Research on the attitudes and stances of individuals or groups is currently concentrated in social psychology. Attitudes are divided into explicit attitudes and implicit attitudes: explicit attitudes are conscious, controllable and easy to report, whereas implicit attitudes are uncontrollable and cannot be consciously accessed [1]. Meanwhile, language has also been used to mine attitudes by means of natural language processing techniques [2-4]; the attitudes of individuals or groups toward certain events, objects, people or concepts can be mined by analyzing the sentiment and semantics of individual or group utterances [5].
In psychological studies, the implicit association test requires the active cooperation of the test subjects [7] and can therefore only be administered to small populations. Meanwhile, although models that learn purely from textual representations have made considerable progress, human readers find it difficult to judge, without rich background knowledge, the sincerity of sentences whose explicit expression does not match the author's true attitude; a model that has learned human biases can do so to some extent.
An implicit attitude is an internal attitude that affects an individual's behavior in an unconscious manner. The implicit association test [9] is one of the major psychological instruments for measuring implicit attitudes, designed to reliably assess individual attitude differences in a manner that produces large effect sizes [10]. Greenwald and Banaji argued that research on implicit and explicit memory can be applied to the study of individual or group attitudes [11]. The implicit association test, proposed by Greenwald et al. in 1998, measures the implicit stance of a subject by measuring the association between concept words and attribute words. If memories unavailable to consciousness can influence an individual's actions, then such associations can also influence the individual's attitudes and behavior. Exploiting differences in individuals' conceptual associations helps psychological researchers understand attitudes that cannot be measured by self-report due to lack of awareness and social desirability bias [12].
Currently, attitude measurement based on the analysis of text [13-19] relies mainly on the explicit expression of attitudes in text, and implicit attitudes have not been studied in depth. Text sentiment analysis is the main method for measuring attitudes. Sentiment analysis uses natural language processing, text analysis and computational linguistics to identify, extract, quantify and study affective states and subjective information [6]; it aims to determine the author's attitude toward a certain topic, or the attitude polarity toward a document, object or event. The general or aspect-based attitudes [20] expressed in online comments can be understood through sentiment analysis, where the attitude may be an affective state. Sentiment analysis generally classifies the opinions in text into "positive", "neutral" and "negative" categories [21]. Generally speaking, attitude research based on sentiment analysis involves several key factors, such as objects, attributes, attitude polarity and attitude holders. Meanwhile, resources and techniques from natural language processing are also applied to attitude measurement, such as external dictionaries [18] and syntactic analysis [19].
In 2017, Caliskan et al. proposed the word embedding association test [22], which measures attitudes by linking the association strengths of the implicit association test to the semantic distances between words. Word embedding is a semantic representation of words that depends on their contexts in a corpus; words that are closer in the vector space should be semantically closer.
In 2019, May et al. extended the word embedding association test to measure attitudes at the sentence level, from the standpoint of sentence encoders, and found that sentence-level tests are more likely to reveal significant associations than word-level tests, although word-level tests are more effective. However, while the word embedding association test introduced classical psychometric measures into automated language analysis, it does not distinguish between explicit and implicit attitudes, so its results reflect the combined effect of the explicit and the implicit stance, which is confusing in cases where the two differ.
However, first, the measurement of the explicit and implicit attitudes of individuals or groups is mainly concentrated in social psychology, and psychology-based measurement methods require the active cooperation of subjects; only a small number of subjects can take part in the experiments, so these methods cannot be applied to large-scale studies of individual or group attitudes, nor can they analyze subjects' historical attitudes. Second, although text-based attitude measurement methods are not limited by the number or availability of subjects and can be applied to large populations, they rely mainly on explicit expression in the text and do not study implicit attitudes in depth. Meanwhile, the latest word-embedding-based attitude measurement methods do not distinguish between explicit and implicit attitudes. Explicit and implicit attitudes play different roles in social life: explicit attitudes can be communicated in public and form mainstream values, while implicit attitudes can determine an individual's behavior without awareness. It is therefore necessary to clearly distinguish explicit from implicit attitudes when measuring attitudes. Third, word-based methods such as WEAT or verb extraction are simpler: they consider only single words, regardless of grammar and context. Judgments at the sentence level carry deeper meaning [23] and allow cosine similarities to be computed for different sentences, such as the similarity between a question and a corresponding answer; the more appropriate a particular answer is for a given question, the higher its cosine similarity. It is therefore necessary to extend WEAT to SEAT for stance analysis on big data.
[References]
[1] Timothy D. Wilson, Samuel Lindsey, Tonya Y. Schooler. 2000. A model of dual attitudes. Psychological Review, 107(1):101-126.
[2] Sap, M., Prasetio, M.C., Holtzman, A., Rashkin, H., Choi, Y. 2017. Connotation Frames of Power and Agency in Modern Films. In Proc. EMNLP 2017.
[3] McKenzie, R.M. and E. Carrie. 2018. Implicit-explicit attitudinal discrepancy and the investigation of language attitude change in progress. Journal of Multilingual and Multicultural Development, 0(0):1-15.
[4] Carpenter, Jordan, Daniel Preoţiuc-Pietro, Lucie Flekova, Salvatore Giorgi, Courtney Hagan, Margaret Kern, Anneke Buffone, Lyle Ungar, Martin Seligman. 2016. Real Men don't say 'cute': Using Automatic Language Analysis to Isolate Inaccurate Aspects of Stereotypes. Social Psychological and Personality Science.
[5] Liu, B. 2010. Sentiment Analysis and Subjectivity. Handbook of Natural Language Processing, 2, 627-666.
[6] Northrup, D.A. 1996. The Problem of the Self-Report in Survey Research. Institute for Social Research, 11(3).
[7] Stone, A.A., Turkkan, J.S., Bachrach, C.A., Jobe, J.B., Kurtzman, H.S., Cain, V.S. 2003. The science of self-report: Implications for research and practice. Experimental Psychology, 50(3):231-232.
[8] Paulhus, D.L., Vazire, S. 2007. The self-report method. Handbook of Research Methods in Personality Psychology, 1, 224-239.
[9] Greenwald, A.G., McGhee, D.E., Schwartz, J.L. 1998. Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology, 74(6), 1464-1480.
[10] Lane, K.A., Banaji, M.R., Nosek, B.A., Greenwald, A.G. 2007. Understanding and Using the Implicit Association Test: IV: What We Know (So Far) about the Method. In B. Wittenbrink & N. Schwarz (Eds.), Implicit Measures of Attitudes, 59-102. New York, NY, US: Guilford Press.
[11] Greenwald, A.G., Banaji, M.R. 1995. Implicit social cognition: attitudes, self-esteem, and stereotypes. Psychological Review, 102(1), 4.
[12] Nosek, B.A., Greenwald, A.G., Banaji, M.R. 2005. Understanding and using the Implicit Association Test: II. Method variables and construct validity. Personality and Social Psychology Bulletin, 31(2), 166-180.
[13] Cambria, E., Poria, S., Gelbukh, A., & Thelwall, M. 2017. Sentiment analysis is a big suitcase. IEEE Intelligent Systems, 32(6), 74-80.
[14] Mohammad Tubishat, Norisma Idris, Mohammad A.M. Abushariah. 2018. Implicit aspect extraction in sentiment analysis. Information Processing and Management: an International Journal, 54(4), 545-563.
[15] Khanna, B., Moses, S., & Nirmala, M. 2018. SoftMax based User Attitude Detection Algorithm for Sentimental Analysis. Procedia Computer Science, 125, 313-320.
[16] Chaturvedi, I., Cambria, E., Welsch, R.E., & Herrera, F. 2018. Distinguishing between facts and opinions for sentiment analysis: Survey and challenges. Information Fusion, 44, 65-77.
[17] Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. 2015. It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia. In ICWSM (pp. 454-463).
[18] Hube, C. 2017. Bias in Wikipedia. In Proceedings of the 26th International Conference on World Wide Web Companion (pp. 717-721).
[19] Christoph Hube and Besnik Fetahu. 2018. Detecting Biased Statements in Wikipedia. In Companion Proceedings of The Web Conference 2018 (WWW '18), 1779-1786.
[20] Poria, S., Cambria, E., Gelbukh, A. 2016. Aspect extraction for opinion mining with a deep convolutional neural network. Knowledge-Based Systems, 108, 42-49.
[21] Mozetič, I., Grčar, M., Smailović, J. 2016. Multilingual Twitter sentiment classification: The role of human annotators. PLoS ONE, 11(5), e0155036.
[22] Caliskan, A., Bryson, J.J., Narayanan, A. 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.
[23] Jentzsch, S., Schramowski, P., Rothkopf, C.A., and Kersting, K. 2019. Semantics derived automatically from language corpora contain human-like moral choices. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a group implicit stance analysis method based on word vectors and Bert.
The purpose of the invention is realized by the following technical scheme:
a hidden-in-group-based method for analyzing the hidden-in-group ground based on word vectors and Bert trains a Bert model through a text corpus and performs hidden-in-group ground analysis by combining sentence vectors; the system comprises a data analysis module, a model training module and an implicit vertical analysis module;
the data analysis module is used for analyzing and extracting the speech data published by the users in the social group and classifying the speech of the users according to the target words and the attribute words mentioned in the implicit association test; sentence segmentation is carried out on the extracted text to obtain a statement set A, and then a sentence set B containing both target words and attribute words and a set C not containing the target words and the attribute words are extracted;
the model training module is used for constructing a model for learning the language big data text prejudice of the social group; acquiring an embedded vector of each sentence based on a set obtained by the data analysis module according to the target words and the attribute words;
the implicit standpoint analysis module measures the relationship between the corresponding target words and the attribute words according to the distance between the embedded vectors of the sentences, so that the implicit standpoint attitude of the social group users is quantified.
Compared with the prior art, the technical solution of the invention has the following beneficial effects: the method can quantify how an AI model that has inherited human biases influences the encoding of real sentences, and, because the corpus-level training makes the method model-independent, it can accurately quantify attitudes from both the explicit and the implicit perspective.
The invention distinguishes macroscopic and microscopic analysis on a sentence-level basis. By adopting a corpus-level, model-independent method, the influence of an AI model that has inherited human biases on the encoding of real sentences can be studied more fully at both the macroscopic and the microscopic level. The invention also performs static and dynamic analysis of the group's implicit attitudes. The implicit stance analysis method further provides a way to analyze the influence of implicit bias on the sincerity of explicit expression.
Drawings
Fig. 1 and 2 are schematic diagrams of the overall framework of the method of the invention.
Figs. 3a to 3d show the dynamic evolution of implicit attitudes within the population on Wikipedia and Twitter.
Fig. 4 shows the dynamic evolution of the population's implicit attitude bias regarding occupational gender.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The group implicit stance analysis method based on word vectors and Bert comprises three modules: a data analysis module, a model training module and an implicit stance analysis module.
1. Data analysis module
The data analysis module parses and extracts the posts published by users in the social group; in this embodiment, the posts of Wikipedia and Twitter users are classified according to the target words and attribute words used in the implicit association test. The extracted text is split into sentences to obtain a sentence set A, from which a sentence set B containing both target words and attribute words and a set C of the remaining sentences are extracted. Here WikiExtractor is used to extract text, the Python toolkit spaCy to perform tokenization and sentence segmentation, and the Stanford parser to analyze sentence structure.
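As a concrete illustration of this step, the sketch below splits raw text into the sentence sets A, B and C. It assumes spaCy with the small English model for sentence segmentation; the target and attribute word lists shown are illustrative placeholders, not the actual implicit association test vocabularies, and C is taken here as the sentences of A that do not contain both a target and an attribute word.

    # Minimal sketch of the data analysis step (assumed word lists, plain-text input).
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    TARGET_WORDS = {"programmer", "engineer", "scientist", "nurse", "teacher", "librarian"}
    ATTRIBUTE_WORDS = {"man", "woman", "male", "female", "he", "she"}

    def build_sentence_sets(raw_text):
        """Split raw text into sentence set A, then extract
        B = sentences containing both a target and an attribute word,
        C = the remaining sentences."""
        doc = nlp(raw_text)
        set_a = [sent.text.strip() for sent in doc.sents]
        set_b, set_c = [], []
        for sentence in set_a:
            tokens = {tok.lower().strip(".,!?") for tok in sentence.split()}
            if tokens & TARGET_WORDS and tokens & ATTRIBUTE_WORDS:
                set_b.append(sentence)
            else:
                set_c.append(sentence)
        return set_a, set_b, set_c

    if __name__ == "__main__":
        a, b, c = build_sentence_sets("The engineer said she loved her job. The rose is red.")
        print(len(a), len(b), len(c))  # 2 1 1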
In word embedding association tests, GloVe and word2vec embeddings reveal the implicit stances of social groups. In the invention, the word-level implicit association test is extended to a sentence-level implicit association test. To measure the implicit stance of a sentence encoder toward a social group, the method creates a test whose target concepts are African American and European American names and whose attribute words are terms used to describe African Americans and European Americans; sentence versions of the attribute and target concept words are generated by inserting them into sentence templates. Specifically, each word in the target word set is placed into several semantically bleached sentence templates, such as { "This is a rose", "A rose is here", "This will be a rose", "The roses are here", ... }. Each word in the attribute word set is likewise placed into several semantically bleached sentence templates, such as { "There is love", "That is happy", "This is a friend", "They are evil", ... }.
2. Model training module
The model training module adopts the multi-layer bidirectional Transformer encoder Bert, where L denotes the number of layers, H the hidden size, and A the number of self-attention heads. In all cases the feed-forward size (the output dimension of the fully connected layer) is set to 4H, i.e., 3072 when H = 768. The model architecture used is:
BERT-base: L = 12, H = 768, A = 12, total parameters = 110M
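A minimal sketch of how sentences can be encoded with a BERT-base model of this configuration is shown below. It assumes the HuggingFace transformers library and the public bert-base-uncased checkpoint; the method itself would instead load the models Bert_A and Bert_A-B trained on the corpus sets described above. Mean pooling over the last hidden states is one common choice of sentence embedding.

    # Sketch: encoding sentences with a BERT-base encoder (assumed checkpoint name).
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")
    model.eval()

    def embed_sentences(sentences):
        """Return one 768-dimensional vector per sentence (mean of last hidden states)."""
        with torch.no_grad():
            batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
            hidden = model(**batch).last_hidden_state            # (batch, seq_len, 768)
            mask = batch["attention_mask"].unsqueeze(-1)         # ignore padding tokens
            return (hidden * mask).sum(1) / mask.sum(1)          # mean pooling

    vectors = embed_sentences(["This is a rose.", "There is love."])
    print(vectors.shape)  # torch.Size([2, 768])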
3. Implicit stance analysis module
Using the sentence sets obtained from the data analysis and the trained models, the method measures the group's explicit attitude and implicit attitude toward the target words separately.
As shown in Fig. 1 and Fig. 2, for the language big data from Wikipedia and Twitter, a sentence set A is obtained through data analysis, and a sentence set B containing both target words and attribute words is extracted. The invention first classifies sentences according to the concept words and attribute words they contain. Given a pair of concepts Ci and Cj and a pair of attributes Dp and Dq, let Cwi and Cwj be the concept word sets of Ci and Cj, and Dwp and Dwq the attribute word sets of Dp and Dq; the sentences containing a concept word of Ci or Cj together with an attribute word of Dp or Dq form the set B. For example, consider two sets of target words (e.g., programmer, engineer, scientist vs. nurse, teacher, librarian) and two sets of attribute words (e.g., male vs. female terms). The null hypothesis is that there is no difference in the relative similarity between the two sets of target words and the two sets of attribute words. Formally, let X and Y be two equal-size sets of target words, let A and B be the two sets of attribute words, and let cos(a, b) denote the cosine of the angle between vectors a and b.
Specifically, following the word embedding association test [22], the attitude bias is calculated by the following formulas:
s(w, A, B) = mean_{a in A} cos(w, a) - mean_{b in B} cos(w, b)    (1)
where s(w, A, B) represents the degree of association between the target word w and the attribute word sets A and B.
effect.size = [ mean_{x in X} s(x, A, B) - mean_{y in Y} s(y, A, B) ] / std-dev_{w in X∪Y} s(w, A, B)    (2)
where effect.size represents the strength of the population's association between the concepts Ci or Cj and the attributes Dp or Dq; this effect size is the attitude bias.
To ensure statistical rigor, a significance test is also set up for the attitude bias calculation. With the test statistic s(X, Y, A, B) = sum_{x in X} s(x, A, B) - sum_{y in Y} s(y, A, B), the one-sided p-value of the permutation test over equal-size partitions (Xi, Yi) of X ∪ Y is:
p = Pr_i[ s(Xi, Yi, A, B) > s(X, Y, A, B) ]    (3)
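The sketch below implements the association score of Eq. (1), the effect size of Eq. (2) and the permutation test of Eq. (3), following the standard word embedding association test procedure that the method builds on. Inputs are assumed to be numpy vectors (word or sentence embeddings); for large sets, the exhaustive enumeration of partitions would in practice be replaced by random sampling.

    # Sketch: WEAT/SEAT-style association score, effect size and permutation test.
    import numpy as np
    from itertools import combinations

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def s_wAB(w, A, B):
        # Eq. (1): mean cosine with attribute set A minus mean cosine with set B
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

    def s_XYAB(X, Y, A, B):
        # test statistic s(X, Y, A, B)
        return sum(s_wAB(x, A, B) for x in X) - sum(s_wAB(y, A, B) for y in Y)

    def effect_size(X, Y, A, B):
        # Eq. (2): standardized difference of mean associations
        assoc_all = [s_wAB(w, A, B) for w in X + Y]
        return (np.mean([s_wAB(x, A, B) for x in X]) -
                np.mean([s_wAB(y, A, B) for y in Y])) / np.std(assoc_all, ddof=1)

    def p_value(X, Y, A, B):
        # Eq. (3): one-sided permutation test over equal-size partitions of X ∪ Y
        observed = s_XYAB(X, Y, A, B)
        pooled = X + Y
        count = total = 0
        for idx in combinations(range(len(pooled)), len(X)):
            chosen = set(idx)
            Xi = [pooled[i] for i in chosen]
            Yi = [pooled[i] for i in range(len(pooled)) if i not in chosen]
            count += s_XYAB(Xi, Yi, A, B) > observed
            total += 1
        return count / total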
For model training, the Bert model trained on set A is denoted Bert_A, and the Bert model trained on set A-B is denoted Bert_A-B. Bert_A and Bert_A-B are then used respectively to encode the sentence templates from step 1.
For the implicit attitude calculation, sentences of the form "target word + function words + attribute word", such as "The Black man is good/bad" and "The man is good/bad at math", are constructed manually and are called positive/negative artificial combination sentences. Sentences in the real corpus that contain both a target word and a positive/negative attribute word are called positive/negative real combination sentences, and manually constructed sentences of the form "attribute word + function words" are called positive/negative artificial attribute sentences. The encodings are denoted as follows: Bert1 and Bert2 are the Bert_A-B encodings of the positive and negative artificial combination sentences, and Bert1.2 their union; Bert3 and Bert4 are the Bert_A-B encodings of the positive and negative real combination sentences, and Bert3.4 their union; Bert5 and Bert6 are the Bert_A encodings of the positive and negative artificial combination sentences, and Bert5.6 their union; Bert7 and Bert8 are the Bert_A encodings of the positive and negative real combination sentences, and Bert7.8 their union; Bert9 and Bert10 are the Bert_A-B encodings of the positive and negative artificial attribute sentences; Bert11 and Bert12 are the Bert_A encodings of the positive and negative artificial attribute sentences.
To show that the model with implicit bias only (Bert_A-B) and the model with mixed explicit and implicit bias (Bert_A) differ significantly in the attitude toward the attribute words when encoding the manually constructed attitude-expressing sentences, this embodiment designs the following experiment:
ImplicitBias.Size1 = S(Bert1.2, Bert9, Bert10) - S(Bert5.6, Bert11, Bert12)    (4)
To show that the two models also differ significantly in the attitude toward the attribute words when encoding the real attitude-expressing sentences, this embodiment designs the following experiment:
ImplicitBias.Size2 = S(Bert3.4, Bert9, Bert10) - S(Bert7.8, Bert11, Bert12)    (5)
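The patent does not spell out the operator S in equations (4) and (5); in the sketch below, S(T, A, B) is assumed to be the mean association of the sentence encodings in T with the positive attribute-sentence encodings (Bert9 or Bert11) versus the negative ones (Bert10 or Bert12). This is one plausible reading, not a definitive implementation.

    # Sketch of Eqs. (4)-(5) under an assumed definition of S(T, A, B).
    import numpy as np

    def _cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def _s(t, A, B):
        # association of one sentence encoding t with attribute encodings A vs. B
        return np.mean([_cos(t, a) for a in A]) - np.mean([_cos(t, b) for b in B])

    def S(T, A, B):
        # assumed: mean association of the sentence encodings in T with A vs. B
        return float(np.mean([_s(t, A, B) for t in T]))

    def implicit_bias_size1(bert1_2, bert9, bert10, bert5_6, bert11, bert12):
        # Eq. (4): artificial combination sentences, Bert_A-B minus Bert_A
        return S(bert1_2, bert9, bert10) - S(bert5_6, bert11, bert12)

    def implicit_bias_size2(bert3_4, bert9, bert10, bert7_8, bert11, bert12):
        # Eq. (5): real combination sentences, Bert_A-B minus Bert_A
        return S(bert3_4, bert9, bert10) - S(bert7_8, bert11, bert12)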
Specifically, the experimental material of this embodiment comprises 2 related data sets that are widely used in related research: a Twitter data set and a Wikipedia data set. The invention analyzes the group's implicit stance and its evolution.
1. Macroscopic computation of the implicit attitude within the group
In the invention, the word level implicit association test is extended to the sentence level implicit association test.
Specifically, each word in the target word set is placed into several semantically bleached sentence templates, such as { "This is a rose", "A rose is here", "This will be a rose", "The roses are here", ... }, and each word in the attribute word set is placed into templates such as { "There is love", "That is happy", "This is a friend", "They are evil", ... }. The sentences are encoded by the trained Bert models, and the relations between the sentence vectors are computed.
TABLE 1
Table 1 shows the differences between the sentence-level implicit association test based on the Wikipedia and Twitter corpora and the word-embedding-level implicit association test. d denotes the effect size and p the hypothesis test value. The first column lists the target word pairs: flowers vs. insects, musical instruments vs. weapons, European American vs. African American names, male vs. female names, mathematics vs. arts, science vs. arts, mental vs. physical illness, and young vs. old people's names. The second column lists the attribute word pairs: pleasant vs. unpleasant, career vs. family, male vs. female terms, temporary vs. permanent, and pleasant vs. unpleasant.
2. Microscopic computation of the implicit attitude within the group
Although the macroscopic calculation of the implicit attitude within the group shows that people's biases toward the target words are reflected in sentence-level encodings, the influence of an AI model that has inherited people's explicit/implicit biases on the encoding of real sentences has not been studied, especially the encoding of sentences that express an attitude toward a specific object (i.e., "target word + attribute word" sentences). Whether the latter encodings are biased indicates how well the AI model understands attitude-expressing sentences, much as a human reader would. For sentences whose explicit expression does not reflect the author's true attitude, it is difficult for human readers to judge their sincerity without rich background knowledge; an AI model that has learned human biases can do so to some extent. To investigate this idea, Experiment 2 was conducted.
TABLE 2
Table 2 shows the differences between the Bert models' encodings of the artificial combination sentences and of the real corpus sentences. DA denotes the model's encoding bias between the artificial combination sentences and the artificial attribute sentences, and PA its hypothesis test value; DB denotes the encoding bias between the real combination sentences and the artificial attribute sentences, and PB its hypothesis test value. The first and second columns list the same target word pairs and attribute word pairs as in Table 1.
3. Dynamic analysis of the implicit stance within the group
In this embodiment, in order to study the evolution of implicit attitudes within the social group, the corpus data is divided by month, covering 36 months in total, and the implicit attitude is computed separately for each month, as shown in Figs. 3a to 3d.
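A small sketch of this per-month computation follows; monthly_corpora and measure_bias are illustrative names, with measure_bias standing for the full per-corpus pipeline above (sentence extraction, encoding, effect size).

    # Sketch: recomputing the stance measure for each monthly slice of the corpus.
    def implicit_attitude_over_time(monthly_corpora, measure_bias):
        """monthly_corpora: {"YYYY-MM": raw_text}; returns {month: effect size}
        for plotting the dynamic evolution (cf. Figs. 3a-3d)."""
        return {month: measure_bias(text)
                for month, text in sorted(monthly_corpora.items())}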
The above analysis shows that the difference between the implicit and the explicit attitude bias is more significant for social concepts (African American vs. European American) than for non-social concepts (flowers vs. insects, instruments vs. weapons). The implicit and explicit attitude biases differ significantly, and these differences are consistent with classical psychological experiments: flowers are viewed more positively than insects, musical instruments more positively than weapons, and European American names are more closely associated with positive attributes than African American names. The implicit stance analysis also shows that some people explicitly express one attitude toward an attribute of the target while their real attitude is the opposite. In the dynamic changes of the implicit attitude over the last 3 years, the bias is always present and fluctuates, but is stable overall. Fig. 4 further shows that society's acceptance of women is increasing and that women's acceptance in the professions is rising. The invention distinguishes macroscopic and microscopic analysis on a sentence-level basis; by adopting a corpus-level, model-independent method, the influence of an AI model that has inherited human biases on the encoding of real sentences can be studied more fully at both levels. The invention also performs static and dynamic analysis of the group's implicit attitudes, and the implicit stance analysis method provides a way to analyze the influence of implicit bias on the sincerity of explicit expression.
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solution of the invention; the specific embodiments described above are merely illustrative and not restrictive. Those skilled in the art, having the benefit of this disclosure, may make numerous modifications and changes without departing from the scope of the invention as defined by the claims.

Claims (1)

1. A group implicit stance analysis method based on word vectors and Bert, characterized in that a Bert model is trained on a text corpus and group implicit stance analysis is performed using sentence vectors; the system comprises a data analysis module, a model training module and an implicit stance analysis module;
the data analysis module parses and extracts the posts published by users in the social group and classifies the users' posts according to the target words and attribute words used in the implicit association test; the extracted text is split into sentences to obtain a sentence set A, from which a sentence set B containing both target words and attribute words and a set C of sentences that do not contain both are extracted;
the model training module builds a model that learns the textual biases in the social group's language big data, and obtains an embedding vector for each sentence based on the sets produced by the data analysis module according to the target words and attribute words;
the implicit stance analysis module measures the relationship between the corresponding target words and attribute words according to the distance between the sentence embedding vectors, thereby quantifying the implicit stance of the social group's users.
CN202011451101.2A 2020-12-09 2020-12-09 Group implicit stance analysis method based on word vectors and Bert Active CN112836486B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011451101.2A CN112836486B (en) 2020-12-09 Group implicit stance analysis method based on word vectors and Bert

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011451101.2A CN112836486B (en) 2020-12-09 Group implicit stance analysis method based on word vectors and Bert

Publications (2)

Publication Number Publication Date
CN112836486A true CN112836486A (en) 2021-05-25
CN112836486B CN112836486B (en) 2022-06-03

Family

ID=75923517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011451101.2A Active CN112836486B (en) 2020-12-09 Group implicit stance analysis method based on word vectors and Bert

Country Status (1)

Country Link
CN (1) CN112836486B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110781391A (en) * 2019-10-22 2020-02-11 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and storage medium
CN110852062A (en) * 2019-10-17 2020-02-28 天津大学 Method for automatically measuring group external attitude and internal attitude by using speech information
US20200175119A1 (en) * 2018-12-04 2020-06-04 Electronics And Telecommunications Research Institute Sentence embedding method and apparatus based on subword embedding and skip-thoughts
CN111753044A (en) * 2020-06-29 2020-10-09 浙江工业大学 Regularization-based language model for removing social bias and application
CN111966917A (en) * 2020-07-10 2020-11-20 电子科技大学 Event detection and summarization method based on pre-training language model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175119A1 (en) * 2018-12-04 2020-06-04 Electronics And Telecommunications Research Institute Sentence embedding method and apparatus based on subword embedding and skip-thoughts
CN110852062A (en) * 2019-10-17 2020-02-28 天津大学 Method for automatically measuring group external attitude and internal attitude by using speech information
CN110781391A (en) * 2019-10-22 2020-02-11 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and storage medium
CN111753044A (en) * 2020-06-29 2020-10-09 浙江工业大学 Regularization-based language model for removing social bias and application
CN111966917A (en) * 2020-07-10 2020-11-20 电子科技大学 Event detection and summarization method based on pre-training language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴胜涛 (Wu Shengtao) et al.: "The other-salience effect of the justice motive: Evidence from a word-embedding association test", Chinese Science Bulletin (《科学通报》) *

Also Published As

Publication number Publication date
CN112836486B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
Ramat Grammaticalization processes in the area of temporal and modal relations
Adger et al. Variation in English syntax: theoretical implications
CN110717018A (en) Industrial equipment fault maintenance question-answering system based on knowledge graph
El Gohary et al. A computational approach for analyzing and detecting emotions in Arabic text
Zad et al. A survey of deep learning methods on semantic similarity and sentence modeling
Madabushi et al. CxGBERT: BERT meets construction grammar
CN108874896A (en) A kind of humorous recognition methods based on neural network and humorous feature
CN112527968A (en) Composition review method and system based on neural network
CN112417161A (en) Method and storage device for recognizing upper and lower relationships of knowledge graph based on mode expansion and BERT classification
Gurevych et al. Semantic coherence scoring using an ontology
Alès et al. A methodology to design human-like embodied conversational agents
Leão et al. Extending WordNet with UFO foundational ontology
CN112836486B (en) Group implicit stance analysis method based on word vectors and Bert
Bayat The Impact of Ellipses on Reading Comprehension.
Behzadi Natural language processing and machine learning: A review
Deshors A multifactorial approach to linguistic structure in L2 spoken and written registers
CN115080690A (en) NLP-based automatic correction method and system for test paper text
Kehler Coherence establishment as a source of explanation in linguistic theory
Yu et al. Extraction of implicit quantity relations for arithmetic word problems in chinese
Yang et al. Detecting senior executives’ personalities for predicting corporate behaviors: an attention-based deep learning approach
Lee Natural Language Processing: A Textbook with Python Implementation
Sun The application of improving machine learning algorithm and voice technology in the teaching evaluation of ideological and political education
Verbeke et al. Differential subject marking in Nepali imperfective constructions: A probabilistic grammar approach
Almog Would you believe that?
Peng et al. Readability assessment for Chinese L2 sentences: an extended knowledge base and comprehensive evaluation model-based method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant