CN110275953B - Personality classification method and apparatus - Google Patents

Personality classification method and apparatus

Info

Publication number
CN110275953B
CN110275953B (application CN201910540702.1A)
Authority
CN
China
Prior art keywords
personality
neural network
recurrent neural
preset
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910540702.1A
Other languages
Chinese (zh)
Other versions
CN110275953A (en)
Inventor
林涛
吴芝明
冯豆豆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201910540702.1A priority Critical patent/CN110275953B/en
Publication of CN110275953A publication Critical patent/CN110275953A/en
Application granted granted Critical
Publication of CN110275953B publication Critical patent/CN110275953B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The personality classification method and apparatus provided by the application obtain a test text to be analyzed, preprocess the test text so that it is converted into word vectors that a neural network model can process, and input the word vectors into a recurrent neural network. Data output by a preset network layer in the recurrent neural network is spliced with the data in a personality correlation coefficient table, which records preset degrees of correlation between different personality traits, and the spliced data is input into a classification layer to obtain the personality classification result of the author of the test text. Because the preset correlations between different personality traits are taken into account while the recurrent neural network analyzes the test text, the predicted personality classification result is more accurate.

Description

Personality classification method and apparatus
Technical Field
The application relates to the field of data processing, in particular to a personality classification method and device.
Background
Personality is the dynamic organization, within an individual, of the psychophysiological systems that determine the person's unique adjustments to the environment. Personality psychology offers many structural models of personality, and among them the Big Five (five-factor) model is widely used because of its stability, measurability, high reliability, and wide range of application. The Big Five model comprises five personality traits, namely openness, conscientiousness, extraversion, agreeableness, and neuroticism, and each person's personality type can be characterized by these five traits.
Personality classification over text data obtains the personality type of the author by analyzing text content such as blogs or prose. In personality classification, each trait is generally divided into two classes, high and low, according to a certain threshold, where the threshold may be the average score or the median of the trait. At present, a common approach is to build a separate binary classification model for each trait, but this approach ignores the correlation between personality traits and therefore yields low classification accuracy.
Disclosure of Invention
In order to overcome at least one of the above deficiencies in the prior art, an object of the present application is to provide a personality classification method applied to a data processing device, where the data processing device is preset with a trained recurrent neural network, and the trained recurrent neural network comprises a feature extraction layer, a classification layer, and a personality correlation coefficient table recording preset degrees of correlation between different personality traits, the method comprising:
acquiring a word vector of a test text;
inputting the word vector into the recurrent neural network;
and splicing the data output by a preset network layer in the recurrent neural network with the data in the personality correlation coefficient table, and inputting the spliced data into the classification layer to obtain the personality type of the author corresponding to the test text.
Optionally, the recurrent neural network is a bidirectional recurrent neural network.
Optionally, the method further comprises:
for a currently input word vector, obtaining a preceding-context feature vector and a following-context feature vector of the currently input word vector through the bidirectional recurrent neural network;
and splicing the currently input word vector, the preceding-context feature vector and the following-context feature vector into a new feature vector.
Optionally, the preset network layer is a max-pooling layer.
Optionally, the preset degrees of correlation between different personality traits are obtained by calculating the Pearson correlation between the personality traits.
Optionally, before obtaining the word vector of the test text, the method further includes:
performing word segmentation processing on the test text to obtain a corresponding word segmentation result;
and processing the word segmentation result through a word vector conversion tool to obtain the word vector.
Optionally, the method further comprises a training step of the recurrent neural network:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
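As a purely illustrative sketch of this training step (not part of the claimed method), the loop can be written as follows; the names net and loss_fn, and the choice of the Adam optimizer, are assumptions made for the sketch:

```python
# Illustrative sketch only: train until the preset loss falls below a threshold.
# `net` is an assumed PyTorch model and `loss_fn` an assumed implementation of
# the preset loss function; neither name comes from the application itself.
import torch

def train(net, loss_fn, word_vectors, labels, threshold=0.01, lr=1e-3, max_epochs=1000):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(max_epochs):
        optimizer.zero_grad()
        loss = loss_fn(net(word_vectors), labels)  # preset loss function
        loss.backward()                            # back-propagation
        optimizer.step()                           # iteratively adjust the weights
        if loss.item() < threshold:                # stop below the preset threshold
            break
    return net
```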
Another objective of the embodiments of the present application is to provide a personality classification device, which is applied to a data processing device, where the data processing device is preset with a trained recurrent neural network, the trained recurrent neural network includes a feature extraction layer, a classification layer, and a personality correlation coefficient table recording preset correlation degrees between different personality traits, and the personality classification device includes an obtaining module, an input module, and a classification module;
the acquisition module is used for acquiring word vectors of the test texts;
the input module is used for inputting the word vector into the recurrent neural network;
and the classification module is used for splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text.
Optionally, the personality classification device further includes a training module, and the training module trains the recurrent neural network by:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
Optionally, the recurrent neural network is a bidirectional recurrent neural network.
Compared with the prior art, the method has the following beneficial effects:
the personality classification method and device provided by the embodiment of the application acquire the test text to be analyzed, preprocess the test text to enable the test text to be converted into word vectors capable of being processed by a neural network model, and input the word vectors into a recurrent neural network. After data output by a preset network layer in the recurrent neural network is spliced with data in a personality correlation coefficient table, the data is input into a classification layer to obtain a personality classification result of an author corresponding to the test text, and the personality correlation coefficient table records preset correlation degrees among different personality traits. Therefore, in the process of analyzing the test text through the recurrent neural network, the predicted personality classification result is more accurate by combining the preset correlation degree among different personality traits.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be regarded as limiting its scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a hardware configuration diagram of a data processing device according to an embodiment of the present application;
fig. 2 is a flowchart illustrating the steps of a personality classification method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a framework of a recurrent neural network provided in an embodiment of the present application;
fig. 4 is a personality correlation coefficient table provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a personality classification device according to an embodiment of the present application;
fig. 6 is a second schematic structural diagram of a personality classification device according to an embodiment of the present application.
Reference numerals: 100 - data processing device; 130 - processor; 110 - personality classification apparatus; 120 - memory; 501 - recurrent layer; 502 - pooling layer; 503 - fully connected layer; 504 - softmax layer; 505 - personality correlation coefficient table; 1101 - obtaining module; 1102 - input module; 1103 - classification module; 1104 - training module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Referring to fig. 1, fig. 1 is a hardware structure diagram of a data processing device 100 according to an embodiment of the present disclosure, where the data processing device 100 includes a processor 130, a memory 120, and a personality classification apparatus 110.
The elements of the memory 120 and the processor 130 are electrically connected to each other, directly or indirectly, to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The personality classification device 110 includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the data processing apparatus 100. The processor 130 is used for executing executable modules stored in the memory 120, such as software functional modules and computer programs included in the personality classification device 110.
The memory 120 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 120 is used for storing a program, and the processor 130 executes the program after receiving an execution instruction.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The data processing device 100 may be, but is not limited to, a smart phone, a Personal Computer (PC), a tablet PC, a Personal Digital Assistant (PDA), a Mobile Internet Device (MID), and the like.
Referring to fig. 2, fig. 2 is a flowchart of a personality classification method applied to the data processing device 100 shown in fig. 1, where the data processing device 100 is preset with a trained recurrent neural network. Referring to fig. 3, fig. 3 is a network structure diagram of the recurrent neural network according to an embodiment of the present disclosure. The recurrent neural network comprises a feature extraction layer, a classification layer, and a personality correlation coefficient table 505 recording preset degrees of correlation between different personality traits; the classification layer comprises a fully connected layer 503 and a softmax layer 504, and the feature extraction layer comprises a recurrent layer 501 and a pooling layer 502. The steps of the method are described in detail below.
Step S100: acquiring a word vector of the test text.
Optionally, the data processing device 100 may obtain a large amount of test text locally or from a network, and the test text may be a blog, prose, a diary, a composition, or the like. Before the test text is input into the recurrent neural network, it needs to be preprocessed so that it is converted into word vectors that the recurrent neural network can process.
For example, in one possible example, the data processing device 100 performs word segmentation on the test text through a vocabulary or a dictionary. It should be noted that, unlike English, Chinese text does not separate words with spaces. Therefore, if the test text is Chinese text data, the data processing device 100 first needs to perform word segmentation on the test text before processing it. The quality of the word segmentation often affects the analysis result of the test text.
For example, for the sentence "The weather is really nice today!", a good word segmentation result is "today", "weather", "really nice", while a poor result splits "weather" into "sky" and "air", giving "today", "sky", "air", "really nice". Different word segmentation results can thus convey completely different semantics.
The data processing device 100 one-hot encodes the segmented test text, i.e., uses one bit per distinct token. For example, the segmentation result "today", "weather", "really nice" has 3 states and therefore corresponds to 3 bits: "today" is encoded as "100", "weather" as "010", and "really nice" as "001".
The data processing device 100 then obtains the word vectors of the test text by looking up a pre-trained word vector table. For example, the word vector for "today" is [0.2, 0.3], the word vector for "weather" is [0.4, 0.8], and the word vector for "really nice" is [0.5, 0.9]. The pre-trained word vector table is obtained by training the skip-gram model of word2vec on an external corpus.
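As an illustration of this preprocessing pipeline, the following sketch assumes the jieba tokenizer and the skip-gram mode of gensim's word2vec; the toy corpus and the 2-dimensional vectors are placeholders, not the settings used in the application:

```python
# Sketch: Chinese word segmentation followed by a word-vector table lookup.
import jieba
from gensim.models import Word2Vec

# Train a skip-gram (sg=1) word2vec model on a (toy) external corpus.
corpus = [jieba.lcut("今天天气真好"), jieba.lcut("我的手机坏了")]
model = Word2Vec(sentences=corpus, vector_size=2, sg=1, min_count=1)

tokens = jieba.lcut("今天天气真好")            # -> ['今天', '天气', '真好']
word_vectors = [model.wv[t] for t in tokens]  # look up each token's word vector
```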
Step S200: inputting the word vector into the recurrent neural network.
Step S300: splicing the data output by a preset network layer in the recurrent neural network with the data in the personality correlation coefficient table 505, and inputting the spliced data into the classification layer to obtain the personality type of the author corresponding to the test text.
Optionally, it is worth noting that the semantic information at a given position in text data is often related to its context, so the recurrent neural network in this embodiment may be a Bidirectional Recurrent Neural Network (BRNN), which handles contextual information in text data well.
For example, in the sentence "My phone is broken, so I plan to ( ) a new phone", consider predicting the word that should fill the brackets. From the information before the brackets alone, both "buy" and "repair" are plausible; or, if the broken phone caused a bad mood, the brackets could even be filled with "cry about", "walk off", or "binge-eat over". But once the information after the brackets is taken into account, "buy" becomes by far the most probable filler.
Based on this idea, the output of the bidirectional recurrent neural network at the current time $i$ depends not only on the input at the previous time $i-1$ in the sequence, but also on the input at the subsequent time $i+1$. For the currently input word vector, the bidirectional recurrent neural network extracts a preceding-context feature vector and a following-context feature vector, where the preceding-context feature vector $c_l(w_i)$ can be expressed as:

$$c_l(w_i)=f\big(W^{(l)}c_l(w_{i-1})+W^{(sl)}e(w_{i-1})\big)$$

where $c_l(w_{i-1})$ is the forward output of the recurrent layer 501 of the bidirectional recurrent neural network at time $i-1$, $e(w_{i-1})$ is the word vector input to the bidirectional recurrent neural network at time $i-1$, and $W^{(l)}$ and $W^{(sl)}$ are their corresponding weight matrices.
The following-context feature vector $c_r(w_i)$ can be expressed as:

$$c_r(w_i)=f\big(W^{(r)}c_r(w_{i+1})+W^{(sr)}e(w_{i+1})\big)$$

where $c_r(w_{i+1})$ is the backward output of the recurrent layer 501 of the bidirectional recurrent neural network at time $i+1$, $e(w_{i+1})$ is the word vector input to the bidirectional recurrent neural network at time $i+1$, and $W^{(r)}$ and $W^{(sr)}$ are their corresponding weight matrices. The data processing device 100 splices the preceding-context feature vector, the current word vector and the following-context feature vector to obtain the current semantic feature $x_i$:

$$x_i=[c_l(w_i);e(w_i);c_r(w_i)]$$
Wherein, e (w)i) For the current word vector, the implied semantics are obtained as follows
Figure BDA0002102472840000084
Figure BDA0002102472840000083
WhIs xiWeight of (a), bhIs xiIs an activation function of the hidden layer, and the calculation formula is:
Figure BDA0002102472840000081
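The two recurrences and the hidden-layer projection above can be sketched in numpy as follows; the dimensions, random weights, and the use of tanh for $f$ are illustrative assumptions:

```python
# Sketch of the preceding/following context recurrences and hidden features.
import numpy as np

d, c = 4, 3                                   # word-vector / context dimensions
rng = np.random.default_rng(0)
W_l, W_sl = rng.normal(size=(c, c)), rng.normal(size=(c, d))
W_r, W_sr = rng.normal(size=(c, c)), rng.normal(size=(c, d))
W_h, b_h = rng.normal(size=(c, 2 * c + d)), np.zeros(c)

e = rng.normal(size=(5, d))                   # word vectors e(w_1)..e(w_5)
n = len(e)

cl = np.zeros((n, c))                         # preceding-context features c_l(w_i)
for i in range(1, n):
    cl[i] = np.tanh(W_l @ cl[i - 1] + W_sl @ e[i - 1])

cr = np.zeros((n, c))                         # following-context features c_r(w_i)
for i in range(n - 2, -1, -1):
    cr[i] = np.tanh(W_r @ cr[i + 1] + W_sr @ e[i + 1])

x = np.concatenate([cl, e, cr], axis=1)       # x_i = [c_l(w_i); e(w_i); c_r(w_i)]
y = np.tanh(x @ W_h.T + b_h)                  # hidden semantic features y_i
```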
Optionally, the preset network layer is a max-pooling layer, and the data processing device 100 processes the output features of the recurrent layer 501 through the max-pooling layer to obtain $y_{pool}$, calculated as:

$$y_{pool}=\max_{i} y_i$$

where the maximum is taken element-wise over all positions $i$.
$y_{pool}$ is spliced with the data in the personality correlation coefficient table 505 to obtain $x_f$, which is then input into the fully connected layer 503, calculated as:

$$x_f=[y_{pool};r]$$
$$y_f=W_f x_f+b_f$$

where $r$ is the vector of preset degrees of correlation between different personality traits recorded in the personality correlation coefficient table 505, obtained by calculating the Pearson correlation between the personality traits; $W_f$ is the weight matrix of $x_f$, $b_f$ is the bias of $x_f$, and $y_f$ is the output of the fully connected layer 503.
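Continuing the numpy sketch, the max pooling, the splicing with the correlation vector r, and the fully connected layer look as follows; the size of r and the 10-way output are assumptions drawn from the softmax example below:

```python
# Sketch: element-wise max pooling, splicing with r, fully connected layer.
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=(5, 3))               # per-position hidden features y_i
r = rng.normal(size=7)                    # flattened correlation-table entries

y_pool = y.max(axis=0)                    # y_pool = element-wise max over i
x_f = np.concatenate([y_pool, r])         # x_f = [y_pool; r]

W_f, b_f = rng.normal(size=(10, x_f.size)), np.zeros(10)
y_f = W_f @ x_f + b_f                     # input to the softmax layer (10 values)
```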
For example, in one possible example, a personality correlation coefficient table 505 representing the degrees of correlation between openness, conscientiousness, agreeableness, extraversion, and neuroticism is shown in fig. 4. The degree of correlation between traits is described by a "correlation coefficient" and a "significance". "Two-tailed" in the figure denotes the measurement standard; a corresponding "one-tailed" standard also exists. Under the two-tailed standard, a significance level below 0.05 but above 0.01 is marked with one "*", and a significance level below 0.01 is marked with two "*". As shown in fig. 4, the correlation coefficient between "openness" and "conscientiousness" is 0.29, and the significance between the two is below 0.05 and above 0.01.
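A sketch of how such a coefficient/significance pair could be computed, assuming per-author scores for two traits are available; scipy's pearsonr returns the correlation coefficient together with the two-tailed p-value:

```python
# Sketch: Pearson correlation and two-tailed significance between two traits.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
openness = rng.normal(size=100)                            # illustrative scores
conscientiousness = 0.3 * openness + rng.normal(size=100)  # correlated by design

coef, p_two_tailed = pearsonr(openness, conscientiousness)
print(f"coefficient={coef:.2f}, two-tailed p={p_two_tailed:.3f}")
```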
The data processing device 100 converts the correlation coefficients and significance parameters in the personality correlation coefficient table 505 into a column vector, splices it with the output $y_{pool}$ of the max-pooling layer, and inputs the result into the fully connected layer 503 of the classification layer to obtain the classification result for the personality type of the author corresponding to the test text. The output of the fully connected layer 503 is connected to the softmax layer 504; the data processing device 100 normalizes the output of the fully connected layer 503 through the softmax layer 504 to obtain, for each trait, the probability that the author of the test text scores high on the trait, together with the corresponding probability threshold.
For example, in one possible example, the softmax layer has 10 outputs in total: the probabilities for the 5 personality traits and the probability thresholds corresponding to those 5 traits. Each personality trait is divided into a high trait and a low trait; for example, extraversion is divided into high extraversion and low extraversion. If the probability for a trait is greater than or equal to the corresponding probability threshold, the trait is high; if it is less than the corresponding threshold, the trait is low. softmax is calculated as:

$$\mathrm{softmax}(y_f)_i=\frac{e^{y_{f,i}}}{\sum_{j}e^{y_{f,j}}}$$

If the softmax output is {0.05, 0.1, 0.16, 0.13, 0.06, 0.11, 0.04, 0.09, 0.14, 0.12}, where 0.05 is the probability that the author has high openness and 0.1 is the threshold for high openness, then 0.05 < 0.1, so the author does not have high openness, i.e., the author has low openness; 0.16 is the probability that the author has high conscientiousness, and 0.16 > 0.13, so the author has high conscientiousness.
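The decision rule of this example can be sketched as follows; pairing the outputs as (probability, threshold) per trait and the trait order are assumptions made for illustration:

```python
# Sketch: map the 10 softmax outputs to high/low decisions per trait.
softmax_out = [0.05, 0.1, 0.16, 0.13, 0.06, 0.11, 0.04, 0.09, 0.14, 0.12]
traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

probs, thresholds = softmax_out[0::2], softmax_out[1::2]
for name, p, thr in zip(traits, probs, thresholds):
    level = "high" if p >= thr else "low"
    print(f"{name}: probability={p}, threshold={thr} -> {level}")
```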
Optionally, an embodiment of the present application further provides a training method for the recurrent neural network, where the training method includes:
the data processing apparatus 100 acquires a word vector corresponding to a training text, the word vector of the training text being labeled with a plurality of personality trait labels. Before obtaining the word vector of the training text, preprocessing the training text is required to obtain the word vector of the training text. The preprocessing method comprises the steps of firstly carrying out word segmentation on the training text, carrying out one-hot coding on data subjected to word segmentation, and then converting the training text in the one-hot coding form into corresponding word vectors by searching a pre-trained word vector table. In the embodiment of the application, the pre-trained word vector table is obtained by training in an external corpus by using a skip-gram in word2 vec.
The word vectors of the training text are input into the recurrent neural network based on a preset loss function, and the weights of the recurrent neural network are iteratively adjusted through a back-propagation algorithm until the output value of the loss function is smaller than a preset threshold value. The preset loss function E is calculated as:

$$E=\sum_{d_i}\frac{1}{|C_i^{+}|\,|C_i^{-}|}\sum_{k\in C_i^{+}}\sum_{j\in C_i^{-}}\exp\!\Big(-\big[(c_{2k}^{d_i}-c_{2k+1}^{d_i})+(c_{2j+1}^{d_i}-c_{2j}^{d_i})\big]\Big)$$

summed over the training texts $d_i$, where $C_i^{+}$ is the set of personality traits relevant to training text $d_i$. For example, if the author of training text $d_i$ has high conscientiousness and high neuroticism, then conscientiousness and neuroticism are the relevant personality traits of $d_i$, while the remaining three traits, openness, extraversion and agreeableness, are the irrelevant personality traits of $d_i$. $C_i^{-}$ is the complement of $C_i^{+}$, i.e., the set of traits irrelevant to $d_i$. $c_{2k}^{d_i}$ is the output probability that relevant trait label $k$ of training text $d_i$ obtains from the $2k$-th neuron, and $c_{2j}^{d_i}$ is the output probability that irrelevant trait label $j$ obtains from the $2j$-th neuron; the larger the gap between them, the better. $c_{2k+1}^{d_i}$ is the threshold output paired with the $2k$-th neuron; the output probability of a relevant label should exceed this threshold, and the larger it is, the better. $c_{2j+1}^{d_i}$ is the threshold output paired with the $2j$-th neuron, below which the output probability of an irrelevant label should fall.
The embodiment of the application also provides a personality classification device 110. Referring to fig. 5, fig. 5 is a schematic structural diagram of the personality classification device 110, which is applied to the data processing device 100, the data processing device 100 is preset with a trained recurrent neural network, the trained recurrent neural network includes a feature extraction layer, a classification layer, and a personality correlation coefficient table 505 recorded with preset correlation degrees between different personality traits, and the personality classification device 110 includes an obtaining module 1101, an input module 1102, and a classification module 1103.
The obtaining module 1101 is configured to obtain a word vector of the test text.
In the embodiment of the present application, the obtaining module 1101 is configured to perform step S100 in fig. 2, and for a detailed description of the obtaining module 1101, reference may be made to a detailed description of step S100.
The input module 1102 is configured to input the word vector into the recurrent neural network.
In the embodiment of the present application, the input module 1102 is configured to perform step S200 in fig. 2, and reference may be made to the detailed description of step S200 for a detailed description of the input module 1102.
The classification module 1103 is configured to splice the data output by a preset network layer in the recurrent neural network with the data in the personality correlation coefficient table 505, and input the spliced data into the classification layer to obtain the personality type of the author corresponding to the test text.
In the embodiment of the present application, the classification module 1103 is configured to perform step S300 in fig. 2, and reference may be made to the detailed description of step S300 for a detailed description of the classification module 1103.
Referring to fig. 6, the personality classification device 110 further includes a training module 1104, where the training module 1104 trains the recurrent neural network by:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
and inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value.
The recurrent neural network may be a bidirectional recurrent neural network.
To sum up, the personality classification method and apparatus provided in the embodiments of the present application obtain a test text to be analyzed, preprocess the test text so that it is converted into word vectors that a neural network model can process, and input the word vectors into a recurrent neural network. Data output by a preset network layer in the recurrent neural network is spliced with the data in a personality correlation coefficient table, which records preset degrees of correlation between different personality traits, and the spliced data is input into a classification layer to obtain the personality classification result of the author of the test text. Because the preset correlations between different personality traits are taken into account while the recurrent neural network analyzes the test text, the predicted personality classification result is more accurate.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only for various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present application, and all such changes or substitutions are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. A personality classification method applied to a data processing device, the data processing device being preset with a trained recurrent neural network, the trained recurrent neural network including a feature extraction layer, a classification layer, and a personality correlation coefficient table in which preset degrees of correlation between different personality traits are recorded, wherein the preset degrees of correlation between different personality traits are obtained by calculating the Pearson correlation between the personality traits, the method comprising:
acquiring a word vector of a test text;
inputting the word vector into the recurrent neural network;
splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table, and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text;
the method further comprises a training step of the recurrent neural network:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value;
wherein the expression of the preset loss function E is:
$$E=\sum_{d_i}\frac{1}{|C_i^{+}|\,|C_i^{-}|}\sum_{k\in C_i^{+}}\sum_{j\in C_i^{-}}\exp\!\Big(-\big[(c_{2k}^{d_i}-c_{2k+1}^{d_i})+(c_{2j+1}^{d_i}-c_{2j}^{d_i})\big]\Big)$$

in the formula, $C_i^{+}$ characterizes the relevant personality traits of the training text $d_i$, and $C_i^{-}$ is the complement of $C_i^{+}$, characterizing the irrelevant personality traits of $d_i$; $c_{2k}^{d_i}$ is the output probability that a relevant personality trait label of the training text $d_i$ obtains from the $2k$-th neuron, $c_{2j}^{d_i}$ is the output probability that an irrelevant personality trait label of the training text $d_i$ obtains from the $2j$-th neuron, $c_{2k+1}^{d_i}$ is the threshold corresponding to the output probability obtained from the $2k$-th neuron, and $c_{2j+1}^{d_i}$ is the threshold corresponding to the output probability obtained from the $2j$-th neuron.
2. The personality classification method of claim 1, wherein the recurrent neural network is a bidirectional recurrent neural network.
3. The personality classification method of claim 1, wherein the predetermined network layer is a maximum pooling layer.
4. The personality classification method of claim 1, wherein obtaining word vectors for test text further comprises:
performing word segmentation processing on the test text to obtain a corresponding word segmentation result;
and processing the word segmentation result through a word vector conversion tool to obtain the word vector.
5. A personality classification device is applied to data processing equipment, a trained recurrent neural network is preset on the data processing equipment, the trained recurrent neural network comprises a feature extraction layer, a classification layer and a personality correlation coefficient table recorded with preset correlation degrees among different personality traits, wherein the preset correlation degrees among the different personality traits are obtained by calculating the Pearson correlation among the personality traits, and the personality classification device comprises an acquisition module, an input module and a classification module;
the acquisition module is used for acquiring word vectors of the test texts;
the input module is used for inputting the word vector into the recurrent neural network;
the classification module is used for splicing data output by a preset network layer in the recurrent neural network with data in the personality correlation coefficient table and inputting the data into the classification layer to obtain the personality type of the author corresponding to the test text;
the personality classification device also comprises a training module, and the training module trains the recurrent neural network in the following mode:
acquiring a word vector corresponding to a training text, wherein the word vector of the training text is marked with a plurality of personality trait labels;
inputting the word vector of the training text into the recurrent neural network based on a preset loss function, and iteratively adjusting the weight of the recurrent neural network through a back propagation algorithm until the output value of the loss function is smaller than a preset threshold value;
wherein the expression of the preset loss function E is:
$$E=\sum_{d_i}\frac{1}{|C_i^{+}|\,|C_i^{-}|}\sum_{k\in C_i^{+}}\sum_{j\in C_i^{-}}\exp\!\Big(-\big[(c_{2k}^{d_i}-c_{2k+1}^{d_i})+(c_{2j+1}^{d_i}-c_{2j}^{d_i})\big]\Big)$$

in the formula, $C_i^{+}$ characterizes the relevant personality traits of the training text $d_i$, and $C_i^{-}$ is the complement of $C_i^{+}$, characterizing the irrelevant personality traits of $d_i$; $c_{2k}^{d_i}$ is the output probability that a relevant personality trait label of the training text $d_i$ obtains from the $2k$-th neuron, $c_{2j}^{d_i}$ is the output probability that an irrelevant personality trait label of the training text $d_i$ obtains from the $2j$-th neuron, $c_{2k+1}^{d_i}$ is the threshold corresponding to the output probability obtained from the $2k$-th neuron, and $c_{2j+1}^{d_i}$ is the threshold corresponding to the output probability obtained from the $2j$-th neuron.
6. The personality classification device of claim 5, wherein the recurrent neural network is a bidirectional recurrent neural network.
CN201910540702.1A 2019-06-21 2019-06-21 Personality classification method and apparatus Expired - Fee Related CN110275953B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910540702.1A CN110275953B (en) 2019-06-21 2019-06-21 Personality classification method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910540702.1A CN110275953B (en) 2019-06-21 2019-06-21 Personality classification method and apparatus

Publications (2)

Publication Number Publication Date
CN110275953A CN110275953A (en) 2019-09-24
CN110275953B true CN110275953B (en) 2021-11-30

Family

ID=67961812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910540702.1A Expired - Fee Related CN110275953B (en) 2019-06-21 2019-06-21 Personality classification method and apparatus

Country Status (1)

Country Link
CN (1) CN110275953B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487184A (en) * 2020-11-26 2021-03-12 北京智源人工智能研究院 User character judging method and device, memory and electronic equipment
CN113268740B (en) * 2021-05-27 2022-08-16 四川大学 Input constraint completeness detection method of website system
CN113221560B (en) * 2021-05-31 2023-04-18 平安科技(深圳)有限公司 Personality trait and emotion prediction method, personality trait and emotion prediction device, computer device, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451118A (en) * 2017-07-21 2017-12-08 西安电子科技大学 Sentence-level sensibility classification method based on Weakly supervised deep learning
CN108108354A (en) * 2017-06-18 2018-06-01 北京理工大学 A kind of microblog users gender prediction's method based on deep learning
US10169656B2 (en) * 2016-08-29 2019-01-01 Nec Corporation Video system using dual stage attention based recurrent neural network for future event prediction
CN109376784A (en) * 2018-10-29 2019-02-22 四川大学 A kind of personality prediction technique and personality prediction meanss
CN109597891A (en) * 2018-11-26 2019-04-09 重庆邮电大学 Text emotion analysis method based on two-way length Memory Neural Networks in short-term
CN109829154A (en) * 2019-01-16 2019-05-31 中南民族大学 Semantic-based personality prediction technique, user equipment, storage medium and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615589A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Named-entity recognition model training method and named-entity recognition method and device
CN105701460B (en) * 2016-01-07 2019-01-29 王跃明 A kind of basketball goal detection method and apparatus based on video
CN105975504A (en) * 2016-04-28 2016-09-28 中国科学院计算技术研究所 Recurrent neural network-based social network message burst detection method and system
US10049103B2 (en) * 2017-01-17 2018-08-14 Xerox Corporation Author personality trait recognition from short texts with a deep compositional learning approach

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10169656B2 (en) * 2016-08-29 2019-01-01 Nec Corporation Video system using dual stage attention based recurrent neural network for future event prediction
CN108108354A (en) * 2017-06-18 2018-06-01 北京理工大学 A kind of microblog users gender prediction's method based on deep learning
CN107451118A (en) * 2017-07-21 2017-12-08 西安电子科技大学 Sentence-level sensibility classification method based on Weakly supervised deep learning
CN109376784A (en) * 2018-10-29 2019-02-22 四川大学 A kind of personality prediction technique and personality prediction meanss
CN109597891A (en) * 2018-11-26 2019-04-09 重庆邮电大学 Text emotion analysis method based on two-way length Memory Neural Networks in short-term
CN109829154A (en) * 2019-01-16 2019-05-31 中南民族大学 Semantic-based personality prediction technique, user equipment, storage medium and device

Also Published As

Publication number Publication date
CN110275953A (en) 2019-09-24

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN109871545B (en) Named entity identification method and device
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN107908635B (en) Method and device for establishing text classification model and text classification
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN108304373B (en) Semantic dictionary construction method and device, storage medium and electronic device
CN110275953B (en) Personality classification method and apparatus
CN110334186B (en) Data query method and device, computer equipment and computer readable storage medium
CN112632226B (en) Semantic search method and device based on legal knowledge graph and electronic equipment
US10915756B2 (en) Method and apparatus for determining (raw) video materials for news
Singh et al. HINDIA: a deep-learning-based model for spell-checking of Hindi language
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN114661872A (en) Beginner-oriented API self-adaptive recommendation method and system
CN114443846A (en) Classification method and device based on multi-level text abnormal composition and electronic equipment
CN112732863A (en) Standardized segmentation method for electronic medical records
CN112182126A (en) Model training method and device for determining matching degree, electronic equipment and readable storage medium
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
CN114417891B (en) Reply statement determination method and device based on rough semantics and electronic equipment
Wakchaure et al. A scheme of answer selection in community question answering using machine learning techniques
CN115577109A (en) Text classification method and device, electronic equipment and storage medium
CN112364666B (en) Text characterization method and device and computer equipment
CN114925175A (en) Abstract generation method and device based on artificial intelligence, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211130