CN110275953B - Personality classification method and apparatus
- Publication number: CN110275953B
- Application number: CN201910540702.1A
- Authority
- CN
- China
- Prior art keywords
- personality
- neural network
- recurrent neural
- preset
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The personality classification method and device provided by this application obtain a test text to be analyzed, preprocess it so that it is converted into word vectors that a neural network model can process, and input the word vectors into a recurrent neural network. Data output by a preset network layer in the recurrent neural network is spliced with the data in a personality correlation coefficient table, which records preset degrees of correlation between different personality traits, and the result is input into a classification layer to obtain the personality classification of the author of the test text. Because the preset correlations between different personality traits are incorporated while the recurrent neural network analyzes the test text, the predicted personality classification is more accurate.
Description
Technical Field
The application relates to the field of data processing, in particular to a personality classification method and device.
Background
Personality is the dynamic organization of an individual's intrinsic psychophysiological systems that determines the person's unique adaptation to his or her environment. Personality psychology offers many structural models of personality; among them, the Big Five (five-factor) model is widely used because of its stability, measurability, high reliability, and wide range of application. The Big Five model comprises five personality traits, namely openness, conscientiousness, extraversion, agreeableness, and neuroticism, and each person's personality type can be determined from these five traits.
Personality classification on text data obtains the personality type of the author of the tested text, such as a blog post or an essay, by analyzing its content. In personality classification, each trait is usually divided into two classes, high and low, according to some threshold, which may be the mean or median score of that trait. At present, a common approach is to build a separate binary classification model for each trait, but this ignores the correlations between personality traits and results in low classification accuracy.
Disclosure of Invention
In order to overcome at least one of the deficiencies in the prior art, an object of the present application is to provide a personality classification method applied to a data processing device, wherein the data processing device is preset with a trained recurrent neural network, the trained recurrent neural network comprises a feature extraction layer, a classification layer and a personality correlation coefficient table recording preset correlation degrees between different personality traits, the method comprising:
acquiring a word vector of a test text;
inputting the word vector into the recurrent neural network;
and splicing the data output by a preset network layer in the recurrent neural network with the data in the personality correlation coefficient table, and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text.
Optionally, the recurrent neural network is a bidirectional recurrent neural network.
Optionally, the method further comprises:
aiming at a currently input word vector, obtaining its left-context feature vector and right-context feature vector through the bidirectional recurrent neural network;
and splicing the currently input word vector with the left-context and right-context feature vectors into a new feature vector.
Optionally, the preset network layer is a maximum pooling layer.
Optionally, the preset degrees of correlation between different personality traits are obtained by calculating the Pearson correlations between the traits.
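As a minimal sketch of how such preset correlation degrees could be precomputed, the Pearson correlation between two traits can be calculated from per-author trait scores. The score lists below are hypothetical illustration data, not values from the patent.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical questionnaire scores for two traits across five authors.
openness = [3.1, 4.2, 2.8, 3.9, 4.5]
conscientiousness = [2.9, 3.8, 3.0, 3.5, 4.1]
r = pearson_r(openness, conscientiousness)
```

Computing this coefficient for every pair of the five traits yields the entries of a personality correlation coefficient table.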
Optionally, before obtaining the word vector of the test text, the method further includes:
performing word segmentation processing on the test text to obtain a corresponding word segmentation result;
and processing the word segmentation result through a word vector conversion tool to obtain the word vector.
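The two preprocessing steps above (word segmentation, then word-vector conversion) can be sketched as follows. A real system would use a Chinese word segmenter and a trained word2vec model; both are replaced here by small hypothetical lookup tables for illustration.

```python
# Toy stand-ins for a segmenter and a word vector table (both hypothetical).
SEGMENTATIONS = {"今天天气真好": ["今天", "天气", "真好"]}
WORD_VECTORS = {"今天": [0.2, 0.3], "天气": [0.4, 0.8], "真好": [0.5, 0.9]}

def text_to_word_vectors(text):
    # Step 1: word segmentation (fall back to single characters if unknown).
    tokens = SEGMENTATIONS.get(text, list(text))
    # Step 2: convert each token to its word vector (zero vector if unknown).
    return [WORD_VECTORS.get(t, [0.0, 0.0]) for t in tokens]

vectors = text_to_word_vectors("今天天气真好")
```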
Optionally, the method further comprises a training step of the recurrent neural network:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
Another objective of the embodiments of the present application is to provide a personality classification device, which is applied to a data processing device, where the data processing device is preset with a trained recurrent neural network, the trained recurrent neural network includes a feature extraction layer, a classification layer, and a personality correlation coefficient table recording preset correlation degrees between different personality traits, and the personality classification device includes an obtaining module, an input module, and a classification module;
the acquisition module is used for acquiring word vectors of the test texts;
the input module is used for inputting the word vector into the recurrent neural network;
and the classification module is used for splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text.
Optionally, the personality classification device further includes a training module, and the training module trains the recurrent neural network by:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
Optionally, the recurrent neural network is a bidirectional recurrent neural network.
Compared with the prior art, the method has the following beneficial effects:
The personality classification method and device provided by the embodiments of the application acquire a test text to be analyzed, preprocess it so that it is converted into word vectors that a neural network model can process, and input the word vectors into a recurrent neural network. Data output by a preset network layer in the recurrent neural network is spliced with the data in a personality correlation coefficient table, which records preset degrees of correlation between different personality traits, and the result is input into a classification layer to obtain the personality classification of the author of the test text. Because the preset correlations between different personality traits are incorporated while the recurrent neural network analyzes the test text, the predicted personality classification is more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a hardware configuration diagram of a data processing device according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of a personality classification method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a framework of a recurrent neural network provided in an embodiment of the present application;
fig. 4 is a personality correlation coefficient table provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a personality classification device according to an embodiment of the present application;
fig. 6 is a second schematic structural diagram of a personality classification device according to an embodiment of the present application.
Reference numerals: 100 - data processing device; 130 - processor; 110 - personality classification device; 120 - memory; 501 - recursive layer; 502 - pooling layer; 503 - fully connected layer; 504 - softmax layer; 505 - personality correlation coefficient table; 1101 - acquisition module; 1102 - input module; 1103 - classification module; 1104 - training module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Referring to fig. 1, fig. 1 is a hardware structure diagram of a data processing device 100 according to an embodiment of the present disclosure, where the data processing device 100 includes a processor 130, a memory 120, and a personality classification apparatus 110.
The elements of the memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The personality classification device 110 includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the data processing apparatus 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the personality classification device 110.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The data processing device 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.
Referring to fig. 2, fig. 2 is a flowchart of a personality classification method applied to the data processing apparatus 100 shown in fig. 1, where the data processing apparatus 100 is preset with a trained recurrent neural network. Referring to fig. 3, fig. 3 is a network structure diagram of a recurrent neural network according to an embodiment of the present disclosure. The recurrent neural network comprises a feature extraction layer, a classification layer and a personality correlation coefficient table 505 recorded with preset correlation degrees among different personality traits; wherein the classification layer comprises a full connection layer 503 and a softmax layer 504; the feature extraction layer includes a recursive layer 501 and a pooling layer 502. The method including the respective steps will be described in detail below.
Step S100: acquiring the word vectors of the test text.
Optionally, the data processing device 100 may obtain a large amount of test text locally or from a network; a test text may be a blog post, an essay, a diary entry, a composition, or the like. Before the test text is input into the recurrent neural network, it needs to be preprocessed so that it is converted into word vectors that the recurrent neural network can process.
For example, in one possible example, the data processing apparatus 100 performs word segmentation on the test text using a vocabulary or a dictionary. It should be noted that if the test text is Chinese, words are not separated by spaces as they are in English, so the data processing apparatus 100 must first segment the test text into words before processing it. The quality of the word segmentation often affects the analysis result of the test text.
For example, consider segmenting the sentence "The weather is really good today!": a good segmentation result is "today", "weather", "really good", while a poor result is "today", "day", "air", "really good", which splits the word "weather" into its individual characters. As can be seen, different segmentation results carry completely different semantics.
The data processing device 100 then one-hot encodes the segmented test text, i.e., it uses as many bits as there are states. For example, the segmentation result "today", "weather", "really good" of "The weather is really good today!" has 3 states and therefore corresponds to 3 bits: "today" is encoded as "100", "weather" as "010", and "really good" as "001".
The data processing apparatus 100 obtains the word vectors of the test text by looking them up in a pre-trained word vector table. For example, the word vector of "today" is [0.2, 0.3], the word vector of "weather" is [0.4, 0.8], and the word vector of "really good" is [0.5, 0.9]. The pre-trained word vector table is obtained by training the skip-gram model of word2vec on an external corpus.
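The one-hot step above can be sketched directly; the function below reproduces the "as many bits as states" rule on English stand-ins for the tokens in the example.

```python
def one_hot(tokens):
    """One-hot encode a segmentation result: as many bits as there are states."""
    vocab = list(dict.fromkeys(tokens))          # distinct tokens, first-seen order
    index = {t: i for i, t in enumerate(vocab)}
    return {t: "".join("1" if i == index[t] else "0" for i in range(len(vocab)))
            for t in vocab}

codes = one_hot(["today", "weather", "really good"])
```

With three states, each token gets a 3-bit code, matching the "100", "010", "001" encodings in the example.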
Step S200: inputting the word vectors into the recurrent neural network.
Step S300: splicing the data output by a preset network layer in the recurrent neural network with the data in the personality correlation coefficient table 505, and inputting the result into the classification layer to obtain the personality type of the author corresponding to the test text.
It is worth noting that the semantic information at a given position in text data is often related to its context, so the recurrent neural network in this embodiment may be a bidirectional recurrent neural network (BRNN), which handles the contextual information in text data well.
For example, in "My phone is broken, so I intend to () a new phone", consider predicting the word that fills the blank. From the information before the blank alone, it could be filled with "buy" or "repair"; or, since a broken phone causes a bad mood, with "cry", "take a walk", or "eat a lot". But once the information after the blank ("a new phone") is taken into account, "buy" becomes far more probable.
Based on this idea, the output of the bidirectional recurrent neural network at the current time step i depends not only on the input at the preceding time step i-1 but also on the input at the following time step i+1. For the currently input word vector, the bidirectional recurrent neural network extracts a left-context feature vector and a right-context feature vector, where the left-context feature vector $c_l(w_i)$ can be expressed as:
$c_l(w_i) = f\left(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})\right)$

where $c_l(w_{i-1})$ is the forward output of the recurrent layer 501 of the bidirectional recurrent neural network at time step i-1, $e(w_{i-1})$ is the word vector at time step i-1, and $W^{(l)}$ and $W^{(sl)}$ are the corresponding weight matrices.
The right-context feature vector $c_r(w_i)$ can be expressed as:

$c_r(w_i) = f\left(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})\right)$
where $c_r(w_{i+1})$ is the backward output of the recurrent layer 501 at time step i+1, $e(w_{i+1})$ is the word vector at time step i+1, and $W^{(r)}$ and $W^{(sr)}$ are the corresponding weight matrices. The data processing device 100 splices the left-context feature vector, the word vector, and the right-context feature vector to obtain the current semantic feature $x_i$:

$x_i = [c_l(w_i);\ e(w_i);\ c_r(w_i)]$
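The two recurrences and the splicing step can be sketched numerically. This is an illustrative toy, not the trained network: the weights are hypothetical scalars (real $W^{(l)}$, $W^{(sl)}$, $W^{(r)}$, $W^{(sr)}$ are matrices), the word vectors reuse the 2-dimensional example values from the preprocessing section, and $f$ is assumed to be tanh since the patent leaves it unspecified.

```python
import math

def f(vec):
    # Element-wise tanh activation (an assumed choice for the unspecified f).
    return [math.tanh(v) for v in vec]

# Hypothetical scalar weights standing in for weight matrices.
W_l, W_sl, W_r, W_sr = 0.5, 0.4, 0.5, 0.4

def left_context(c_prev, e_prev):
    # c_l(w_i) = f(W(l) c_l(w_{i-1}) + W(sl) e(w_{i-1}))
    return f([W_l * c + W_sl * e for c, e in zip(c_prev, e_prev)])

def right_context(c_next, e_next):
    # c_r(w_i) = f(W(r) c_r(w_{i+1}) + W(sr) e(w_{i+1}))
    return f([W_r * c + W_sr * e for c, e in zip(c_next, e_next)])

def semantic_feature(c_l, e_i, c_r):
    # x_i = [c_l(w_i); e(w_i); c_r(w_i)]
    return c_l + e_i + c_r

c_l = left_context([0.0, 0.0], [0.2, 0.3])    # previous word: "today"
c_r = right_context([0.0, 0.0], [0.5, 0.9])   # next word: "really good"
x_i = semantic_feature(c_l, [0.4, 0.8], c_r)  # current word: "weather"
```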
The semantic feature is then passed through the hidden layer, where $W_h$ is the weight of $x_i$, $b_h$ is its bias, and $f$ is the activation function of the hidden layer; the calculation formula is:

$h_i = f(W_h x_i + b_h)$
Optionally, the preset network layer is a max-pooling layer, and the data processing apparatus 100 processes the output features of the recursive layer 501 through the max-pooling layer to obtain $y_{pool}$, calculated as the element-wise maximum over all positions:

$y_{pool} = \max_i h_i$
$y_{pool}$ is then spliced with the data in the personality correlation coefficient table 505 to obtain $x_f$, which is input into the fully-connected layer 503, calculated as follows:
$x_f = [y_{pool};\ r]$

$y_f = W_f x_f + b_f$
where $r$ is the preset degree of correlation between different personality traits recorded in the personality correlation coefficient table 505, obtained by calculating the Pearson correlation between the traits; $W_f$ is the weight matrix of $x_f$, $b_f$ is its bias, and $y_f$ is the output of the fully connected layer 503.
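The pooling, splicing, and fully-connected steps can be sketched with tiny hypothetical numbers; the feature values, correlation entries, weights, and bias below are all illustrative assumptions, not values from the patent.

```python
def max_pool(features):
    # y_pool: element-wise maximum over the semantic features of all positions.
    return [max(col) for col in zip(*features)]

def fully_connected(x_f, W_f, b_f):
    # y_f = W_f x_f + b_f
    return [sum(w * x for w, x in zip(row, x_f)) + b
            for row, b in zip(W_f, b_f)]

features = [[0.1, 0.7], [0.5, 0.2], [0.3, 0.9]]  # hypothetical hidden outputs
y_pool = max_pool(features)
r = [0.29, 1.0]        # hypothetical correlation-table entries
x_f = y_pool + r       # x_f = [y_pool; r]
y_f = fully_connected(x_f, [[0.1, 0.1, 0.1, 0.1]], [0.0])
```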
For example, in one possible example, a personality correlation coefficient table 505 representing the degrees of correlation between openness, conscientiousness, agreeableness, extraversion, and neuroticism is shown in FIG. 4. The degree of correlation between traits is determined by the "correlation coefficient" and the "significance". In the figure, "two-tailed" denotes the measurement standard; a "one-tailed" standard also exists. Under the two-tailed standard, a significance level less than 0.05 and greater than 0.01 is marked with one asterisk "*", and a significance level less than 0.01 is marked with two asterisks "**". As shown in fig. 4, the correlation coefficient between "openness" and "conscientiousness" is 0.29, and the significance between them is less than 0.05 and greater than 0.01.
The data processing device 100 converts the correlation coefficients and significance parameters in the personality correlation coefficient table 505 into a column vector, splices it with the $y_{pool}$ produced by the max-pooling layer, and inputs the result into the fully connected layer 503 of the classification layer to obtain the classification result for the personality type of the author of the test text. The output of the fully-connected layer 503 is connected to the softmax layer 504, through which the data processing device 100 normalizes the output of the fully-connected layer 503 to obtain, for each trait, the probability that the author of the test text has the high trait, together with the corresponding probability threshold.
For example, in one possible example, the softmax layer has 10 outputs in total: the personality probabilities of the 5 personality traits and the probability thresholds corresponding to the 5 traits. Each personality trait divides into a high trait and a low trait; for example, extraversion divides into high extraversion and low extraversion. If the personality probability of a trait is greater than or equal to the corresponding probability threshold, the trait is high; if it is less than the corresponding threshold, the trait is low. Softmax is calculated as:

$\mathrm{softmax}(z)_j = \dfrac{e^{z_j}}{\sum_k e^{z_k}}$
if softmax output is {0.05, 0.1, 0.16, 0.13, 0.06, 0.11, 0.04, 0.09, 0.14, 0.12} where 0.05 is the probability that the author has high openness, 0.1 is the threshold for high openness, 0.05<0.1 so the author has no high openness, i.e., the author has low openness; 0.16 is the probability that the author has high liability, 0.16>0.13, so the author has high liability.
Optionally, an embodiment of the present application further provides a training method for the recurrent neural network, where the training method includes:
the data processing apparatus 100 acquires a word vector corresponding to a training text, the word vector of the training text being labeled with a plurality of personality trait labels. Before obtaining the word vector of the training text, preprocessing the training text is required to obtain the word vector of the training text. The preprocessing method comprises the steps of firstly carrying out word segmentation on the training text, carrying out one-hot coding on data subjected to word segmentation, and then converting the training text in the one-hot coding form into corresponding word vectors by searching a pre-trained word vector table. In the embodiment of the application, the pre-trained word vector table is obtained by training in an external corpus by using a skip-gram in word2 vec.
And inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value. The preset loss function is calculated in the following manner:
where $Y_{d_i}$ is the set of personality traits relevant to training text $d_i$. For example, if the author of training text $d_i$ has high conscientiousness and high neuroticism, then conscientiousness and neuroticism are the relevant traits of $d_i$, while the remaining three traits, openness, extraversion, and agreeableness, are irrelevant to $d_i$; $\bar{Y}_{d_i}$ is the complement of $Y_{d_i}$, i.e., the set of traits irrelevant to $d_i$. $c_j^{d_i}$ is the output of the j-th neuron for training text $d_i$; $c_{2k-1}^{d_i}$ is the output probability of a relevant personality trait label, $c_{2j-1}^{d_i}$ is the output probability of an irrelevant personality trait label, and the larger the gap between them, the better. $c_{2k}^{d_i}$ is the output for the threshold of the relevant personality trait label 2k, and the more the relevant label's output probability exceeds this threshold, the better. $c_{2j}^{d_i}$ is the threshold of the irrelevant personality trait label 2j.
The embodiment of the application also provides a personality classification device 110. Referring to fig. 5, fig. 5 is a schematic structural diagram of the personality classification device 110, which is applied to the data processing device 100, the data processing device 100 is preset with a trained recurrent neural network, the trained recurrent neural network includes a feature extraction layer, a classification layer, and a personality correlation coefficient table 505 recorded with preset correlation degrees between different personality traits, and the personality classification device 110 includes an obtaining module 1101, an input module 1102, and a classification module 1103.
The obtaining module 1101 is configured to obtain a word vector of the test text.
In the embodiment of the present application, the obtaining module 1101 is configured to perform step S100 in fig. 2, and for a detailed description of the obtaining module 1101, reference may be made to a detailed description of step S100.
The input module 1102 is configured to input the word vector into the recurrent neural network.
In the embodiment of the present application, the input module 1102 is configured to perform step S200 in fig. 2, and reference may be made to the detailed description of step S200 for a detailed description of the input module 1102.
The classification module 1103 is configured to splice data output by a preset network layer in the recurrent neural network with data in the personality related coefficient table 505, and input the data into the classification layer to obtain the personality type of the author corresponding to the test text.
In the embodiment of the present application, the classification module 1103 is configured to perform step S300 in fig. 2, and reference may be made to the detailed description of step S300 for a detailed description of the classification module 1103.
Referring to fig. 6, the personality classification device 110 further includes a training module 1104, where the training module 1104 trains the recurrent neural network by:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
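The train-until-threshold loop described above can be sketched with a toy model; a linear layer over fixed feature vectors stands in for the recurrent network over word vectors, and the data, learning rate, and loss threshold are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the described procedure: 32 "texts" as fixed 8-dim
# feature vectors and one synthetic binary trait label per text.
X = rng.standard_normal((32, 8))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)

W = np.zeros((8, 1))
threshold, lr = 0.10, 0.5

for step in range(10_000):
    p = 1.0 / (1.0 + np.exp(-X @ W))   # forward pass
    loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    if loss < threshold:               # stop once the loss output drops below the preset threshold
        break
    grad = X.T @ (p - y) / len(X)      # backpropagated gradient of the loss
    W -= lr * grad                     # iterative weight adjustment

assert loss < threshold
```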
The recurrent neural network may be a bidirectional recurrent neural network.
In summary, the personality classification method and device provided in the embodiments of the present application obtain a test text to be analyzed, preprocess it into word vectors that a neural network model can process, and input the word vectors into a recurrent neural network. Data output by a preset network layer in the recurrent neural network is spliced with data in a personality correlation coefficient table, which records preset correlation degrees between different personality traits, and the spliced data is input into a classification layer to obtain the personality classification result of the author corresponding to the test text. Because the preset correlation degrees between different personality traits are incorporated while the recurrent neural network analyzes the test text, the predicted personality classification result is more accurate.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (6)
1. A personality classification method, applied to a data processing device that is preset with a trained recurrent neural network, the trained recurrent neural network comprising a feature extraction layer, a classification layer, and a personality correlation coefficient table in which preset correlation degrees between different personality traits are recorded, the preset correlation degrees being obtained by calculating the Pearson correlation between personality traits, the method comprising:
acquiring a word vector of a test text;
inputting the word vector into the recurrent neural network;
splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table, and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text;
the method further comprises a training step of the recurrent neural network:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value;
wherein the expression of the preset loss function E is:
E = Σ_i [1 / (|Y_i|·|Ȳ_i|)] Σ_{k∈Y_i} Σ_{j∈Ȳ_i} exp(−((c^i_{2k} − θ^i_{2k}) − (c^i_{2j} − θ^i_{2j})))
in the formula, Y_i denotes the set of personality traits relevant to the training text d_i, and Ȳ_i, the complement of Y_i, denotes the set of personality traits irrelevant to the training text d_i; c^i_{2k} is the output probability that the relevant personality trait label of the training text d_i obtains from the 2k-th neuron, c^i_{2j} is the output probability that the irrelevant personality trait label of the training text d_i obtains from the 2j-th neuron, θ^i_{2k} is the threshold corresponding to the output probability that the relevant personality trait label obtains from the 2k-th neuron, and θ^i_{2j} is the threshold corresponding to the output probability that the irrelevant personality trait label obtains from the 2j-th neuron.
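A minimal sketch of a pairwise ranking loss of this shape for one training text; the 2k-th/2j-th neuron indexing and per-neuron thresholds follow the symbol definitions in claim 1, but the exact expression appears only as an image in the source, so the pairing of neurons to traits here is an assumption:

```python
import math

def ranking_loss(probs, thresholds, relevant, irrelevant):
    """Pairwise BP-MLL-style loss over threshold-shifted output probabilities.

    probs[2*k] is the output probability trait k obtains from its 2k-th
    neuron and thresholds[2*k] the matching threshold; `relevant` and
    `irrelevant` index the traits in Y_i and its complement.
    """
    total = 0.0
    for k in relevant:
        for j in irrelevant:
            margin = (probs[2 * k] - thresholds[2 * k]) - (probs[2 * j] - thresholds[2 * j])
            total += math.exp(-margin)  # penalize relevant traits scoring below irrelevant ones
    return total / (len(relevant) * len(irrelevant))

# Relevant traits scoring above the irrelevant ones yield a small loss;
# reversing the roles makes the loss large.
probs      = [0.9, 0.0, 0.9, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, 0.0]
thresholds = [0.5, 0.0] * 5
good = ranking_loss(probs, thresholds, relevant=[0, 1], irrelevant=[2, 3, 4])
bad  = ranking_loss(probs, thresholds, relevant=[2, 3], irrelevant=[0, 1])
assert good < bad
```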
2. The personality classification method of claim 1, wherein the recurrent neural network is a bidirectional recurrent neural network.
3. The personality classification method of claim 1, wherein the predetermined network layer is a maximum pooling layer.
4. The personality classification method of claim 1, wherein obtaining word vectors for test text further comprises:
performing word segmentation processing on the test text to obtain a corresponding word segmentation result;
and processing the word segmentation result through a word vector conversion tool to obtain the word vector.
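The two steps of claim 4 can be sketched as follows; a whitespace tokenizer and a toy embedding table stand in for a real word segmenter and a trained word-vector conversion tool, and the 4-dim vectors are hypothetical:

```python
import numpy as np

# Hypothetical 4-dim word vectors; a trained word-vector tool would supply these.
embeddings = {
    "i": np.array([0.1, 0.2, 0.0, 0.5]),
    "like": np.array([0.7, 0.1, 0.3, 0.0]),
    "hiking": np.array([0.2, 0.9, 0.4, 0.1]),
}
unk = np.zeros(4)  # fallback vector for out-of-vocabulary words

def text_to_vectors(test_text: str) -> np.ndarray:
    tokens = test_text.lower().split()  # word segmentation step (toy tokenizer)
    return np.stack([embeddings.get(t, unk) for t in tokens])  # word-vector lookup

vecs = text_to_vectors("I like hiking")
assert vecs.shape == (3, 4)
```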
5. A personality classification device, applied to a data processing device that is preset with a trained recurrent neural network, the trained recurrent neural network comprising a feature extraction layer, a classification layer, and a personality correlation coefficient table in which preset correlation degrees between different personality traits are recorded, the preset correlation degrees being obtained by calculating the Pearson correlation between personality traits, the personality classification device comprising an acquisition module, an input module, and a classification module;
the acquisition module is used for acquiring word vectors of the test texts;
the input module is used for inputting the word vector into the recurrent neural network;
the classification module is used for splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text;
the personality classification device also comprises a training module, and the training module trains the recurrent neural network in the following mode:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value;
wherein the expression of the preset loss function E is:
E = Σ_i [1 / (|Y_i|·|Ȳ_i|)] Σ_{k∈Y_i} Σ_{j∈Ȳ_i} exp(−((c^i_{2k} − θ^i_{2k}) − (c^i_{2j} − θ^i_{2j})))
in the formula, Y_i denotes the set of personality traits relevant to the training text d_i, and Ȳ_i, the complement of Y_i, denotes the set of personality traits irrelevant to the training text d_i; c^i_{2k} is the output probability that the relevant personality trait label of the training text d_i obtains from the 2k-th neuron, c^i_{2j} is the output probability that the irrelevant personality trait label of the training text d_i obtains from the 2j-th neuron, θ^i_{2k} is the threshold corresponding to the output probability that the relevant personality trait label obtains from the 2k-th neuron, and θ^i_{2j} is the threshold corresponding to the output probability that the irrelevant personality trait label obtains from the 2j-th neuron.
6. The personality classification device of claim 5, wherein the recurrent neural network is a bidirectional recurrent neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910540702.1A CN110275953B (en) | 2019-06-21 | 2019-06-21 | Personality classification method and apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110275953A CN110275953A (en) | 2019-09-24 |
CN110275953B true CN110275953B (en) | 2021-11-30 |
Family
ID=67961812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910540702.1A Expired - Fee Related CN110275953B (en) | 2019-06-21 | 2019-06-21 | Personality classification method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110275953B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112487184A (en) * | 2020-11-26 | 2021-03-12 | 北京智源人工智能研究院 | User character judging method and device, memory and electronic equipment |
CN113268740B (en) * | 2021-05-27 | 2022-08-16 | 四川大学 | Input constraint completeness detection method of website system |
CN113221560B (en) * | 2021-05-31 | 2023-04-18 | 平安科技(深圳)有限公司 | Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451118A (en) * | 2017-07-21 | 2017-12-08 | 西安电子科技大学 | Sentence-level sensibility classification method based on Weakly supervised deep learning |
CN108108354A (en) * | 2017-06-18 | 2018-06-01 | 北京理工大学 | A kind of microblog users gender prediction's method based on deep learning |
US10169656B2 (en) * | 2016-08-29 | 2019-01-01 | Nec Corporation | Video system using dual stage attention based recurrent neural network for future event prediction |
CN109376784A (en) * | 2018-10-29 | 2019-02-22 | 四川大学 | A kind of personality prediction technique and personality prediction meanss |
CN109597891A (en) * | 2018-11-26 | 2019-04-09 | 重庆邮电大学 | Text emotion analysis method based on two-way length Memory Neural Networks in short-term |
CN109829154A (en) * | 2019-01-16 | 2019-05-31 | 中南民族大学 | Semantic-based personality prediction technique, user equipment, storage medium and device |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104615589A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Named-entity recognition model training method and named-entity recognition method and device |
CN105701460B (en) * | 2016-01-07 | 2019-01-29 | 王跃明 | A kind of basketball goal detection method and apparatus based on video |
CN105975504A (en) * | 2016-04-28 | 2016-09-28 | 中国科学院计算技术研究所 | Recurrent neural network-based social network message burst detection method and system |
US10049103B2 (en) * | 2017-01-17 | 2018-08-14 | Xerox Corporation | Author personality trait recognition from short texts with a deep compositional learning approach |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20211130 |