CN110309308A - Text information classification method and device and electronic equipment - Google Patents
Text information classification method and device and electronic equipment
- Publication number
- CN110309308A (application number CN201910568734.2A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- text information
- vector
- word
- sorted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the invention provide a text information classification method, a device, and electronic equipment. The method comprises: acquiring text information to be classified and performing word segmentation on it to obtain segmented words; determining a target word vector for each segmented word according to a predetermined correspondence between words and word vectors; determining an emotion degree weight for each segmented word based on a predetermined weight setting rule; determining an emotion vector from the target word vectors and emotion degree weights of the segmented words; inputting the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has a sentiment orientation; and, if it does, inputting the emotion vector into a pre-trained second classification model to determine the sentiment orientation type of the text information to be classified. The electronic equipment can thereby accurately determine the sentiment orientation type of the text information to be classified, which facilitates subsequent research and processing.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a text information classification method, a device, and electronic equipment.
Background technique
At present, large amounts of text information exist in many fields. In the blockchain field, for example, there are already many projects and currency types, and new projects keep emerging; each project has its own characteristics and application scenarios and addresses different practical problems. As with anything newly emerging that is still in its early stage of development, these projects vary widely in quality, and users' evaluations of them are mixed. Such evaluations are one kind of text information. As another example, users of social platforms post large numbers of comments on videos, topics, and the like; these comments are also text information.
Such text information generally carries a certain sentiment orientation, for example like or dislike, and these sentiment orientations are of great significance to follow-up work. For example, they can guide the subsequent improvement of projects in the blockchain field. In the field of network platforms, they likewise provide important guidance for improving interaction modes, visual effects, and so on. A method that can determine the sentiment orientation type of text information is therefore needed.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a text information classification method, a device, and electronic equipment, so as to classify text information and determine its sentiment orientation type. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a text information classification method, the method comprising:
obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
inputting the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has a sentiment orientation, wherein the first classification model includes a correspondence between emotion vectors and sentiment orientations;
if the text information to be classified has a sentiment orientation, inputting the emotion vector into a pre-trained second classification model to determine the sentiment orientation type of the text information to be classified, wherein the second classification model includes a correspondence between emotion vectors and sentiment orientation types.
Optionally, the step of determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule comprises:
determining the sentiment dictionary to which each segmented word belongs, wherein a sentiment dictionary is a pre-established dictionary composed of various emotion words;
determining the default weight of the identified sentiment dictionary as the emotion degree weight of the segmented word belonging to that sentiment dictionary.
Optionally, the step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word comprises:
determining the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \sum_{i=1}^{n} w_i \cdot vec_i$;
wherein $n$ is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word $i$, and $vec_i$ is the target word vector corresponding to segmented word $i$.
Optionally, the step of inputting the emotion vector into the pre-trained second classification model to determine the sentiment orientation type of the text information to be classified comprises:
inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
determining, based on a predetermined correspondence between classification labels and sentiment orientation types, whether the sentiment orientation type of the text information to be classified is positive emotion or negative emotion.
Optionally, the training method of the classification model comprises:
obtaining an initial classification model;
determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
determining the emotion label corresponding to each text information sample according to the semantics of that text information sample;
adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, and stopping training when the number of iterations of the initial classification model reaches a preset number or the accuracy of the prediction labels output by the initial classification model reaches a preset value, thereby obtaining the classification model; wherein the classification model includes the first classification model and the second classification model.
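The two stopping conditions in the training method above can be illustrated with a minimal sketch. It assumes a simple perceptron as a stand-in for the initial classification model (the source names SVMs and neural networks as candidates, not a perceptron), and the sample vectors, labels, learning rate, and thresholds are hypothetical values chosen only for illustration.

```python
# Hypothetical emotion vector samples and their emotion labels (1 or 0).
SAMPLES = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
LABELS = [1, 1, 0, 0]


def predict(weights, bias, x):
    """Return label 1 if the linear score is positive, otherwise 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def accuracy(weights, bias, samples, labels):
    """Fraction of samples whose predicted label equals the given label."""
    correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


def train(samples, labels, max_iterations=100, target_accuracy=0.95, lr=0.1):
    """Adjust parameters until the iteration count or the accuracy reaches its preset value."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    iterations = 0
    while iterations < max_iterations:
        # Stop early once the prediction accuracy reaches the preset value.
        if accuracy(weights, bias, samples, labels) >= target_accuracy:
            break
        # One pass over the samples, applying the perceptron update rule.
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
        iterations += 1
    return weights, bias, iterations


weights, bias, iterations = train(SAMPLES, LABELS)
print(accuracy(weights, bias, SAMPLES, LABELS), iterations)
```

Under this sketch the same routine would be run twice with different labels: once to obtain the first classification model and once to obtain the second.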
Optionally, the step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance comprises:
performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples included in each text information sample;
determining the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
determining the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
In a second aspect, an embodiment of the present invention provides a text information classification device, the device comprising:
a text information obtaining module, configured to obtain text information to be classified and perform word segmentation on the text information to be classified to obtain segmented words;
a target word vector determining module, configured to determine the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
an emotion degree weight determining module, configured to determine the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
an emotion vector determining module, configured to determine the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
a sentiment orientation determining module, configured to input the emotion vector into a first classification model trained in advance by a model training module and determine whether the text information to be classified has a sentiment orientation, wherein the first classification model includes a correspondence between emotion vectors and sentiment orientations;
a classification type determining module, configured to, if the text information to be classified has a sentiment orientation, input the emotion vector into a second classification model trained in advance by the model training module and determine the sentiment orientation type of the text information to be classified, wherein the second classification model includes a correspondence between emotion vectors and sentiment orientation types.
Optionally, the emotion degree weight determining module includes:
a sentiment dictionary determining unit, configured to determine the sentiment dictionary to which each segmented word belongs, wherein a sentiment dictionary is a pre-established dictionary composed of various emotion words;
an emotion degree weight determining unit, configured to determine the default weight of the identified sentiment dictionary as the emotion degree weight of the segmented word belonging to that sentiment dictionary.
Optionally, the emotion vector determining module includes:
an emotion vector determining unit, configured to determine the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \sum_{i=1}^{n} w_i \cdot vec_i$;
wherein $n$ is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word $i$, and $vec_i$ is the target word vector corresponding to segmented word $i$.
Optionally, the classification type determining module includes:
a classification label determining unit, configured to input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
a classification type determining unit, configured to determine, based on a predetermined correspondence between classification labels and sentiment orientation types, whether the sentiment orientation type of the text information to be classified is positive emotion or negative emotion.
Optionally, the model training module includes:
an initial classification model obtaining unit, configured to obtain an initial classification model;
an emotion vector sample determining unit, configured to determine the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
an emotion label determining unit, configured to determine the emotion label corresponding to each text information sample according to the semantics of that text information sample;
a parameter adjusting unit, configured to adjust the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, and to stop training when the number of iterations of the initial classification model reaches a preset number or the accuracy of the prediction labels output by the initial classification model reaches a preset value, thereby obtaining the classification model; wherein the classification model includes the first classification model and the second classification model.
Optionally, the emotion vector sample determining unit includes:
a segmented-word sample obtaining subunit, configured to perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples included in each text information sample;
a word vector sample obtaining subunit, configured to determine the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
an emotion degree weight determining subunit, configured to determine the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
an emotion vector sample determining subunit, configured to determine the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
In a third aspect, an embodiment of the present invention provides electronic equipment comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the steps of any of the text information classification methods described above when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the text information classification methods described above.
In the scheme provided by the embodiments of the present invention, the electronic equipment first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it includes. It determines the target word vector of each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight of each segmented word based on a predetermined weight setting rule. From the target word vectors and emotion degree weights it determines the emotion vector of the text information to be classified, and inputs the emotion vector into a pre-trained first classification model to determine whether the text information has a sentiment orientation, wherein the first classification model includes a correspondence between emotion vectors and sentiment orientations. If the text information to be classified has a sentiment orientation, the emotion vector is input into a pre-trained second classification model to determine the sentiment orientation type, wherein the second classification model includes a correspondence between emotion vectors and sentiment orientation types. In this way, the electronic equipment can accurately determine the sentiment orientation type of the text information to be classified, which facilitates subsequent research and processing.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a text information classification method provided by an embodiment of the present invention;
Fig. 2 is a specific flowchart of step S103 in the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of a training method of the classification model based on the embodiment shown in Fig. 1;
Fig. 4 is a specific flowchart of step S302 in the embodiment shown in Fig. 3;
Fig. 5 is a schematic structural diagram of a text information classification device provided by an embodiment of the present invention;
Fig. 6 is a specific schematic structural diagram of the emotion degree weight determining module 530 in the embodiment shown in Fig. 5;
Fig. 7 is a specific schematic structural diagram of the emotion vector determining module 540 in the embodiment shown in Fig. 5;
Fig. 8 is a specific schematic structural diagram of the classification type determining module 550 in the embodiment shown in Fig. 5;
Fig. 9 is a specific schematic structural diagram of the model training module based on the embodiment shown in Fig. 5;
Fig. 10 is a specific schematic structural diagram of the emotion vector sample determining unit 902 in the embodiment shown in Fig. 9;
Fig. 11 is a schematic structural diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In order to classify text information and determine its sentiment orientation type, embodiments of the present invention provide a text information classification method, a device, electronic equipment, and a computer-readable storage medium.
A text information classification method provided by an embodiment of the present invention is introduced first.
As shown in Fig. 1, a text information classification method comprises:
S101: obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
The electronic equipment first obtains the text information to be classified, that is, the text information whose sentiment orientation type needs to be determined. The text information to be classified may be text generated in various scenarios, for example project comments in the blockchain field, topic comments on a forum website, or video comments on a video website; no specific limitation is imposed here. Different users may hold very different views of the same thing, and views of different things generally differ greatly as well, so the text information to be classified may show different types of sentiment orientation, for example like, dislike, or neutral.
The electronic equipment cannot process raw text directly. Therefore, after obtaining the text information to be classified, the electronic equipment may perform word segmentation on it to convert it into data that the equipment can process, thereby obtaining the segmented words that the text information includes. Any word segmentation approach from the text processing field may be used, as long as it can determine the segmented words included in the text information to be classified; no specific limitation or description is given here.
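Since the source leaves the segmentation approach open, the following is only one possible illustration: a minimal sketch of greedy forward maximum matching against a vocabulary. The function name, the vocabulary, and the unspaced toy input string are hypothetical; a production system would more likely rely on an established segmentation library.

```python
def forward_max_match(text, vocabulary, max_len=8):
    """Greedy forward maximum matching: at each position, take the longest vocabulary entry."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fall back to a single character if nothing matches
        # Try candidate lengths from longest to shortest (down to 2 characters).
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


# Hypothetical vocabulary and an unspaced toy input.
vocabulary = {"steady", "good", "prospect", "pro"}
print(forward_max_match("steadygoodprospect", vocabulary))
```

The longest-first search is what makes the sketch prefer "prospect" over the shorter entry "pro" at the same position.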
For example, suppose the text information to be classified obtained by the electronic equipment is "This currency type is fairly steady and has good prospects". Performing word segmentation on it yields the segmented words it includes: "this", "currency type", "fairly", "steady", "and", "prospects", and "very good".
S102: determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
In this step, a word vector is a vector, that is, data that the electronic equipment can process. To determine the target word vector for each segmented word, the electronic equipment may determine the correspondence between words and word vectors in advance.
When determining the correspondence between words and word vectors, in one embodiment, to make the result more targeted, the electronic equipment may obtain a large amount of text information in advance from the scenario of the text information to be classified. For example, if the text information to be classified consists of comments in the blockchain field, the electronic equipment may obtain a large number of blockchain news items and comments. In another embodiment, to obtain more comprehensive word vectors, the electronic equipment may instead obtain a large amount of text information from a variety of scenarios. Both approaches are reasonable.
After obtaining a large amount of text information, the electronic equipment may obtain word vectors using word2vec, a group of related models used to generate word vectors. These models are shallow, two-layer neural networks; once trained, a word2vec model can map each word to a vector. The electronic equipment can therefore use a word2vec model to map the words in the obtained text information to vectors, and thereby determine the correspondence between words and word vectors.
After the electronic equipment determines the segmented words included in the text information to be classified, it can determine the target word vector for each segmented word according to this correspondence. For example, suppose the correspondence between words and word vectors is as shown in the following table:
Serial number | Word | Word vector |
1 | A | a |
2 | B | b |
3 | C | c |
4 | D | d |
… | … | … |
If the electronic equipment determines that the segmented words included in the text information to be classified are word A, word C, and word D, then according to the table above it can determine that the corresponding target word vectors are word vector a, word vector c, and word vector d.
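The table lookup described above can be sketched as a dictionary lookup. The two-dimensional vector values below are hypothetical placeholders for word vectors a, b, c, and d, and skipping words that are missing from the table is an assumption of this sketch; the source does not say how such words are handled.

```python
# Hypothetical correspondence between words and word vectors (values are placeholders).
WORD_TO_VECTOR = {
    "A": [0.1, 0.3],
    "B": [0.7, -0.4],
    "C": [0.0, -0.2],
    "D": [0.5, 0.5],
}


def lookup_target_vectors(segmented_words, table):
    """Return the target word vector of each segmented word found in the table."""
    # Assumption: words absent from the table are skipped.
    return [table[word] for word in segmented_words if word in table]


print(lookup_target_vectors(["A", "C", "D"], WORD_TO_VECTOR))
```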
S103: determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
In this step, different words express emotion with different intensity, and words with stronger emotional intensity have greater reference value for subsequent processing. To better reflect the emotional intensity of the segmented words included in the text information to be classified, the electronic equipment may determine a weight setting rule in advance and then determine the emotion degree weight of each segmented word according to that rule.
As one implementation, the electronic equipment may determine from the semantics of each segmented word whether its emotional intensity is high or low, and then determine the corresponding emotion degree weight: a segmented word with high emotional intensity receives a higher emotion degree weight, and a segmented word with low emotional intensity receives a lower one.
S104: determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
In one embodiment, the electronic equipment may multiply the target word vector of each segmented word by its emotion degree weight to obtain a product, and then add the products of all segmented words; the result is the emotion vector of the text information to be classified. In other words, the emotion vector is the weighted sum of the target word vectors, with the emotion degree weights as the weights.
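The weighted summation in step S104 can be sketched in a few lines of plain Python. The function name and the example vectors and weights are hypothetical; only the multiply-then-add rule comes from the text above.

```python
def emotion_vector(word_vectors, weights):
    """Weighted sum of word vectors: multiply each vector by its weight, then add."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector and at least one vector")
    dim = len(word_vectors[0])
    result = [0.0] * dim
    for vec, weight in zip(word_vectors, weights):
        for k in range(dim):
            result[k] += weight * vec[k]
    return result


# Hypothetical inputs: two segmented words with 2-dimensional word vectors.
vectors = [[1.0, 0.0], [0.0, 2.0]]
weights = [2.0, 0.5]
print(emotion_vector(vectors, weights))  # [2.0, 1.0]
```

Because the result has the same dimension as a single word vector regardless of how many words the text contains, texts of different lengths can be fed to the same classification model.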
S105: inputting the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has a sentiment orientation; if it does, executing step S106; otherwise, stopping the classification operation;
The first classification model includes a correspondence between emotion vectors and sentiment orientations. After the electronic equipment inputs the emotion vector into the pre-trained first classification model, the model determines the sentiment orientation corresponding to the emotion vector according to that correspondence and outputs it. The electronic equipment can then determine from the output whether the text information to be classified has a sentiment orientation.
For example, the sentiment orientations in the correspondence included in the first classification model may specifically be emotional, neutral, and non-emotional. The output of the first classification model may be a label; for example, emotional and non-emotional may correspond to labels 1 and 0 respectively. If the output of the first classification model is 1, the text information to be classified has a sentiment orientation; if the output is 0, it does not.
The first classification model may be a machine learning model such as an SVM (support vector machine), a convolutional neural network, or a recurrent neural network; no specific limitation is imposed here. For clarity of presentation, the training method of the first classification model is described later.
If the text information to be classified has no sentiment orientation, it is of little help to subsequent processing and research, so the electronic equipment may stop classifying it and go on to classify other text information. If the text information to be classified has a sentiment orientation, it is of greater help to subsequent processing and research, so its sentiment orientation type is determined next.
S106: inputting the emotion vector into a pre-trained second classification model to determine the sentiment orientation type of the text information to be classified.
The second classification model may include a correspondence between emotion vectors and sentiment orientation types. After the electronic equipment inputs the emotion vector into the pre-trained second classification model, the model determines the sentiment orientation type corresponding to the emotion vector according to that correspondence and outputs it. The electronic equipment can then determine the sentiment orientation type of the text information to be classified from the output.
The second classification model may likewise be a machine learning model such as an SVM, a convolutional neural network, or a recurrent neural network; no specific limitation is imposed here. For clarity of presentation, the training method of the second classification model is described later.
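The two-stage decision in steps S105 and S106 can be sketched as a small cascade. The two stand-in models below are hypothetical threshold rules used only to make the sketch runnable; they are not trained SVMs or neural networks. The labels 1 and 0 for the first model follow the example given for S105, while the label values assigned to positive and negative emotion are assumptions.

```python
# Assumed mapping from second-model labels to sentiment orientation types.
LABEL_TO_TYPE = {1: "positive", 0: "negative"}


def classify(emotion_vec, first_model, second_model):
    """Return the sentiment orientation type, or None if the text has no sentiment orientation."""
    # Stage 1: label 1 means the text has a sentiment orientation, 0 means it does not.
    if first_model(emotion_vec) != 1:
        return None  # stop the classification operation for this text
    # Stage 2: only texts with a sentiment orientation reach the second model.
    return LABEL_TO_TYPE[second_model(emotion_vec)]


def toy_first_model(vec):
    """Hypothetical stand-in: treat vectors with small magnitude as having no sentiment."""
    return 1 if sum(abs(x) for x in vec) > 0.5 else 0


def toy_second_model(vec):
    """Hypothetical stand-in: positive component sum means positive emotion."""
    return 1 if sum(vec) > 0 else 0


print(classify([0.1, 0.1], toy_first_model, toy_second_model))
print(classify([1.0, 0.5], toy_first_model, toy_second_model))
print(classify([-1.0, -0.5], toy_first_model, toy_second_model))
```

Passing the models in as callables keeps the cascade independent of whichever model family is eventually trained.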
As can be seen, in the scheme provided by the embodiments of the present invention, the electronic equipment first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it includes. It determines the target word vector of each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight of each segmented word based on a predetermined weight setting rule. From the target word vectors and emotion degree weights it determines the emotion vector of the text information to be classified, and inputs the emotion vector into a pre-trained first classification model to determine whether the text information has a sentiment orientation, wherein the first classification model includes a correspondence between emotion vectors and sentiment orientations. If the text information to be classified has a sentiment orientation, the emotion vector is input into a pre-trained second classification model to determine the sentiment orientation type, wherein the second classification model includes a correspondence between emotion vectors and sentiment orientation types. In this way, the electronic equipment can accurately determine the sentiment orientation type of the text information to be classified, which facilitates subsequent research and processing.
As an implementation of this embodiment of the present invention, as shown in Fig. 2, the above step of determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule may include:
S201: determining the emotion dictionary to which each segmented word belongs;
To make it easier to determine the emotion degree weights of the segmented words of the text information to be classified, the electronic device may compile and categorize vocabulary in advance, dividing the words into categories according to their emotional intensity. The words in each category form one emotion dictionary; that is, an emotion dictionary is a pre-established dictionary made up of emotion words.
For example, if the emotional intensity of words is divided into strong emotion and no emotion, two emotion dictionaries can be determined, namely a strong-emotion dictionary and a no-emotion dictionary.
After determining the segmented words contained in the text information to be classified, the electronic device can look up the emotion dictionary to which each segmented word belongs. For example, if the segmented words of the text information to be classified are "I", "very good", "this" and "project", the electronic device looks up the emotion dictionary for each of them. If "I", "this" and "project" are in the no-emotion dictionary and "very good" is in the strong-emotion dictionary, it can be determined that "very good" belongs to the strong-emotion dictionary and that "I", "this" and "project" belong to the no-emotion dictionary.
S202: determining the preset weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented words belonging to that emotion dictionary.
Words of different emotional intensity have different reference value, and a word with stronger emotional intensity usually has greater reference value, so the electronic device may predetermine a preset weight for each emotion dictionary. For example, if the emotion dictionaries include a strong-emotion dictionary, a general-emotion dictionary and a no-emotion dictionary, the electronic device may predetermine their preset weights as α, β and γ respectively, where in general α > β > γ.
After determining the emotion dictionary to which each segmented word belongs, the electronic device may take the preset weight of that emotion dictionary as the emotion degree weight of the segmented words belonging to it.
For example, if the electronic device determines that "very good" belongs to the strong-emotion dictionary and that "I", "this" and "project" belong to the no-emotion dictionary, and the preset weights of the strong-emotion dictionary and the no-emotion dictionary are α and γ respectively, then the emotion degree weight of "very good" is α and the emotion degree weight of "I", "this" and "project" is γ.
It can be seen that, in this embodiment, the electronic device can determine the emotion dictionary to which each segmented word belongs and then take the preset weight of that emotion dictionary as the emotion degree weight of the segmented words belonging to it. Because the emotion dictionaries are pre-established dictionaries made up of emotion words, the electronic device can determine the emotion degree weight of each segmented word quickly and accurately.
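Steps S201 and S202 amount to a dictionary lookup followed by a weight assignment. The sketch below illustrates this under stated assumptions: the dictionary contents, the numeric values chosen for α, β and γ, and the fallback weight for words found in no dictionary are all hypothetical, since the text fixes only the ordering α > β > γ.

```python
# Hypothetical emotion dictionaries; real ones would be compiled in advance.
STRONG_EMOTION = {"very good", "terrible"}
GENERAL_EMOTION = {"fine", "okay"}
NO_EMOTION = {"I", "this", "project"}

# Assumed numeric values; the text only requires ALPHA > BETA > GAMMA.
ALPHA, BETA, GAMMA = 3.0, 2.0, 1.0

DICTIONARY_WEIGHTS = [
    (STRONG_EMOTION, ALPHA),
    (GENERAL_EMOTION, BETA),
    (NO_EMOTION, GAMMA),
]


def emotion_weight(word: str, default: float = GAMMA) -> float:
    """S201: find the dictionary containing the word; S202: return its weight.

    Falling back to `default` for unlisted words is an assumption of this sketch.
    """
    for words, weight in DICTIONARY_WEIGHTS:
        if word in words:
            return weight
    return default


if __name__ == "__main__":
    for w in ["I", "very good", "this", "project"]:
        print(w, emotion_weight(w))
```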
As an implementation of this embodiment of the present invention, the above step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight of each segmented word may include:
determining the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \frac{1}{n}\sum_{i=1}^{n} w_i \cdot vec_i$;
where $n$ is the number of segmented words contained in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word $i$, and $vec_i$ is the target word vector corresponding to segmented word $i$.
That is, the electronic device can compute a weighted sum of the target word vectors of the segmented words using their emotion degree weights, so as to highlight the role that segmented words of different emotional intensity play in determining the emotional tendency type of the text information to be classified, and then take the average of the weighted sum as the emotion vector corresponding to the text information to be classified. This makes the calculation more convenient and also prevents a segmented word with high emotional intensity from having an excessive influence.
For example, if the target word vectors of the segmented words contained in the text information to be classified are fc1, fc2, fc3, fc4 and fc5, and the electronic device determines that their emotion degree weights are $w_1$, $w_2$, $w_3$, $w_4$ and $w_5$, then according to the above formula the electronic device can determine the emotion vector $vector$ corresponding to the text information to be classified as: $vector = \frac{1}{5}(w_1 \cdot fc1 + w_2 \cdot fc2 + w_3 \cdot fc3 + w_4 \cdot fc4 + w_5 \cdot fc5)$.
It can be seen that, in this embodiment, the electronic device determines the emotion vector corresponding to the text information to be classified according to the above formula. This not only takes full account of the role that segmented words of different emotional intensity play in determining the emotional tendency type of the text information to be classified, but also simplifies the calculation and prevents a segmented word with high emotional intensity from having an excessive influence.
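The averaging of the weighted sum described in this embodiment can be sketched in a few lines. The function name `emotion_vector` and the two-dimensional toy vectors are illustrative assumptions; real word vectors would come from the predetermined correspondence between words and word vectors.

```python
from typing import List, Sequence


def emotion_vector(
    word_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Average of the weighted sum of the word vectors, one weight per word."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector and at least one word")
    n = len(word_vectors)
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for k in range(dim):
            total[k] += w * vec[k]  # weighted sum, component by component
    return [t / n for t in total]  # divide by the number of segmented words


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.0, 2.0]]  # toy target word vectors
    print(emotion_vector(vectors, [2.0, 1.0]))
```

Dividing by the word count n rather than by the sum of the weights follows the wording of the text, which takes the average of the weighted sum.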
As an implementation of this embodiment of the present invention, the above step of inputting the emotion vector into the pre-trained second classification model to determine the emotional tendency type of the text information to be classified may include:
inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model; and determining, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
In the correspondence between emotion vectors and emotional tendency types included in the second classification model, the emotional tendency type may specifically be positive emotion or negative emotion. The output of the second classification model may be a classification label; for example, positive emotion and negative emotion may correspond to classification labels 1 and -1 respectively. When the above emotion vector is input into the pre-trained second classification model, the second classification model can output a classification label according to the correspondence between emotion vectors and emotional tendency types that it includes.
The electronic device can then determine, based on the predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion. For example, with positive emotion and negative emotion corresponding to classification labels 1 and -1 respectively, if the classification label output by the second classification model is 1, the emotional tendency type of the text information to be classified is positive emotion; if the classification label output by the second classification model is -1, the emotional tendency type of the text information to be classified is negative emotion.
It can be seen that, in this embodiment, the electronic device can input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model, and then determine, based on the predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion. In this way, it can be accurately determined whether the emotional tendency type is positive or negative, which provides a sound reference for subsequent processing and research.
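The label lookup in this embodiment is a fixed mapping from classification labels to emotional tendency types. A minimal sketch, using the example labels 1 and -1 from the text; the names `LABEL_TO_TENDENCY` and `tendency_from_label` are hypothetical.

```python
# Predetermined correspondence between classification labels and tendency types,
# using the example labels 1 and -1 given in the text.
LABEL_TO_TENDENCY = {1: "positive emotion", -1: "negative emotion"}


def tendency_from_label(label: int) -> str:
    """Translate the second model's output label into a tendency type."""
    try:
        return LABEL_TO_TENDENCY[label]
    except KeyError:
        raise ValueError(f"unknown classification label: {label!r}") from None


if __name__ == "__main__":
    print(tendency_from_label(1))
    print(tendency_from_label(-1))
```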
As an implementation of this embodiment of the present invention, as shown in Fig. 3, the training method of the classification model may include the following steps.
The classification model may include the above first classification model and the above second classification model; that is, in this embodiment, the first classification model and the second classification model may be trained in the same way.
S301: obtaining an initial classification model;
First, the electronic device can obtain an initial classification model, whose parameters may be randomly initialized; this is not specifically limited here. The initial classification model may be a machine learning model such as an SVM, a convolutional neural network, or a recurrent neural network, and is not specifically limited here.
S302: determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
The electronic device may obtain a large amount of text information in advance as text information samples. To train the initial classification model, the electronic device needs to determine the emotion vector sample corresponding to each text information sample, where an emotion vector sample is a vector that characterizes the emotional intensity of the words contained in the corresponding text information sample.
S303: determining the emotion label corresponding to each text information sample according to the semantics of that text information sample;
Next, because the semantics of a text information sample indicate the emotional tendency it expresses, the electronic device can determine the emotion label corresponding to each text information sample according to its semantics.
In the first case, the above classification model is the above first classification model. Because the first classification model is used to determine whether the text information to be classified has an emotional tendency, the emotion label that the electronic device determines for a text information sample indicates whether the sample has an emotional tendency. For example, there may be three classes of emotion label (has emotion, neutral emotion and no emotion) or two classes (has emotion and no emotion); either is reasonable. Emotion labels may be represented by preset numbers, letters and the like, which is not specifically limited here.
For example, if the semantics of a text information sample are "I do not like this currency", the electronic device can determine that its emotion label is the has-emotion label; if the semantics are "no opinion on the prospects of this currency", the electronic device can determine that its emotion label is the neutral-emotion label; if the semantics are "what is this", the electronic device can determine that its emotion label is the no-emotion label.
In the second case, the above classification model is the above second classification model. Because the second classification model is used to determine the emotional tendency type of the text information to be classified, the emotion label that the electronic device determines for a text information sample indicates the sample's emotional tendency type. For example, there may be two classes of emotion label (positive emotion and negative emotion) or four classes (strong positive emotion, general positive emotion, strong negative emotion and general negative emotion); either is reasonable. Emotion labels may be represented by preset numbers, letters and the like, which is not specifically limited here.
For example, if the semantics of a text information sample are "I do not like this currency", the electronic device can determine that its emotion label is the negative emotion label; if the semantics are "this currency has good prospects", the electronic device can determine that its emotion label is the positive emotion label.
S304: adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number or the accuracy of the prediction labels output by the initial classification model reaches a preset value, then stopping training to obtain the classification model.
After the emotion vector samples and their corresponding emotion labels have been determined, the electronic device can input each emotion vector sample into the above initial classification model. The initial classification model classifies the emotion vector sample based on its current parameters and outputs the corresponding prediction label.
Before the parameters of the initial classification model are optimal, the prediction label it outputs may differ from the emotion label corresponding to the emotion vector sample. The electronic device can adjust the parameters of the initial classification model based on the difference between the prediction label and the emotion label of the corresponding emotion vector sample, so that the prediction labels output by the initial classification model become more and more accurate. The parameters may be adjusted using a gradient descent algorithm, a stochastic gradient descent algorithm or the like, which is not specifically limited or described here.
As its parameters are continually adjusted, the initial classification model gradually learns the relationship between an emotion vector sample and the emotional intensity it represents, that is, the correspondence between emotion vector samples and emotion labels, so its output becomes more and more accurate.
When the number of iterations of the initial classification model reaches the preset number, or the accuracy of the prediction labels it outputs reaches the preset value, the initial classification model is able to process emotion vectors and produce accurate emotion labels. Training can then stop, and the initial classification model at that point can be used as the above classification model.
It can be seen that, in this embodiment, the electronic device can obtain the above first classification model and the above second classification model through the above training method. This training method yields classification models that output accurate emotion labels, which further improves the accuracy of the classification results for the text information to be classified.
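The stopping rule of step S304 (a preset iteration count or a preset accuracy, whichever is reached first) can be illustrated with a small linear classifier. This is a sketch under stated assumptions, not the patent's model: a perceptron-style update is swapped in for brevity, labels are assumed to be 1 and -1, and `train_classifier`, `max_iters`, `target_accuracy` and `lr` are hypothetical names.

```python
import random
from typing import Callable, List, Sequence, Tuple


def train_classifier(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_iters: int = 100,           # preset number of iterations
    target_accuracy: float = 0.95,  # preset accuracy value
    lr: float = 0.1,
    seed: int = 0,
) -> Tuple[Callable[[Sequence[float]], int], int, float]:
    rng = random.Random(seed)
    dim = len(samples[0])
    weights: List[float] = [rng.uniform(-0.01, 0.01) for _ in range(dim)]  # S301
    bias = 0.0

    def predict(x: Sequence[float]) -> int:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score >= 0 else -1

    iteration, accuracy = 0, 0.0
    for iteration in range(1, max_iters + 1):
        for x, y in zip(samples, labels):
            if predict(x) != y:  # prediction differs from the emotion label
                for k in range(dim):
                    weights[k] += lr * y * x[k]  # adjust the parameters
                bias += lr * y
        correct = sum(predict(x) == y for x, y in zip(samples, labels))
        accuracy = correct / len(samples)
        if accuracy >= target_accuracy:  # accuracy reached the preset value
            break
    return predict, iteration, accuracy


if __name__ == "__main__":
    xs = [[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]]
    ys = [1, 1, -1, -1]
    model, iters, acc = train_classifier(xs, ys)
    print(iters, acc, model([3.0, 2.0]))
```

The same routine could be called twice with different label sets, once for the first classification model and once for the second, consistent with the statement that both may be trained in the same way.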
As an implementation of this embodiment of the present invention, as shown in Fig. 4, the above step of determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance may include:
S401: performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples contained in each text information sample;
First, to convert the text information samples obtained in advance into a data type that the electronic device can process, the electronic device can perform word segmentation on them and obtain the segmented-word samples contained in each text information sample. The word segmentation may be performed in the same way as the word segmentation of the text information to be classified described above, and is not repeated here.
S402: determining the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
The correspondence between words and word vectors may be determined in the same way as described in step S102 above, and is not repeated here. The electronic device can determine the word vector sample corresponding to each segmented-word sample by querying this correspondence.
S403: determining the emotion degree weight corresponding to each word vector sample based on a predetermined weight setting rule;
Next, the electronic device can determine the emotion degree weight corresponding to each word vector sample. So that the trained classification model is better suited to classifying the above text information to be classified, the predetermined weight setting rule may be the same as the weight setting rule in step S103 above. Based on this rule, the electronic device can determine the emotion degree weight corresponding to each word vector sample.
S404: determining the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
Similarly, so that the trained classification model is better suited to classifying the above text information to be classified, the electronic device may determine the emotion vector sample corresponding to each text information sample in the same way as it determines the emotion vector corresponding to the text information to be classified. Of course, the two ways may also differ, which does not affect the classification result for the text information to be classified.
It can be seen that, in this embodiment, the electronic device can perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples contained in each text information sample, determine the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors, determine the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule, and then determine the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight of each segmented-word sample. In this way, the emotion vector sample corresponding to each text information sample can be determined accurately, so that the trained classification model is better suited to classifying the above text information to be classified and the classification results are more accurate.
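Steps S401 to S404 can be chained into one helper that turns a text sample into an emotion vector sample. The sketch below makes several assumptions that the text does not specify: whitespace splitting stands in for real word segmentation, words without a word vector are skipped, words without a listed weight receive `default_weight`, and the lookup tables are toy data.

```python
from typing import Dict, List, Sequence


def build_emotion_vector_sample(
    text: str,
    word_vectors: Dict[str, Sequence[float]],
    word_weights: Dict[str, float],
    default_weight: float = 1.0,
) -> List[float]:
    tokens = text.split()  # S401: stand-in for a real word segmenter
    known = [t for t in tokens if t in word_vectors]  # S402: look up word vectors
    if not known:
        raise ValueError("no segmented word has a word vector")
    dim = len(word_vectors[known[0]])
    total = [0.0] * dim
    for token in known:
        weight = word_weights.get(token, default_weight)  # S403: emotion degree weight
        for k, value in enumerate(word_vectors[token]):
            total[k] += weight * value
    return [value / len(known) for value in total]  # S404: average of weighted sum


if __name__ == "__main__":
    vectors = {"good": [1.0, 0.0], "project": [0.0, 1.0]}  # toy word vectors
    weights = {"good": 3.0}                                # toy emotion weights
    print(build_emotion_vector_sample("good project", vectors, weights))
```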
Corresponding to the above method for classifying text information, an embodiment of the present invention further provides a device for classifying text information.
The device for classifying text information provided by the embodiment of the present invention is introduced below.
As shown in Fig. 5, a device for classifying text information includes:
a to-be-classified text information obtaining module 510, configured to obtain text information to be classified and perform word segmentation on the text information to be classified to obtain segmented words;
a target word vector determining module 520, configured to determine the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
an emotion degree weight determining module 530, configured to determine the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
an emotion vector determining module 540, configured to determine the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
an emotional tendency determining module 550, configured to input the emotion vector into a first classification model trained in advance by a model training module, and determine whether the text information to be classified has an emotional tendency;
where the first classification model includes the correspondence between emotion vectors and emotional tendency;
a classification type determining module 560, configured to, if the text information to be classified has an emotional tendency, input the emotion vector into a second classification model trained in advance by the model training module, and determine the emotional tendency type of the text information to be classified;
where the second classification model includes the correspondence between emotion vectors and emotional tendency types.
It can be seen that, in the solution provided by this embodiment of the present invention, the electronic device first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it contains. It determines the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule. From the target word vector and emotion degree weight of each segmented word, it determines the emotion vector corresponding to the text information to be classified, and inputs the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, where the first classification model includes the correspondence between emotion vectors and emotional tendency. If the text information to be classified has an emotional tendency, the emotion vector is input into a pre-trained second classification model to determine the emotional tendency type of the text information to be classified, where the second classification model includes the correspondence between emotion vectors and emotional tendency types. In this way, the electronic device can accurately determine the emotional tendency type of the text information to be classified, which facilitates further research and processing.
As an implementation of this embodiment of the present invention, as shown in Fig. 6, the above emotion degree weight determining module 530 may include:
an emotion dictionary determining unit 5301, configured to determine the emotion dictionary to which each segmented word belongs, where the emotion dictionary is a pre-established dictionary made up of emotion words;
an emotion degree weight determining unit 5302, configured to determine the preset weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented words belonging to that emotion dictionary.
As an implementation of this embodiment of the present invention, as shown in Fig. 7, the above emotion vector determining module 540 may include:
an emotion vector determining unit 5401, configured to determine the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \frac{1}{n}\sum_{i=1}^{n} w_i \cdot vec_i$;
where $n$ is the number of segmented words contained in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word $i$, and $vec_i$ is the target word vector corresponding to segmented word $i$.
As an implementation of this embodiment of the present invention, as shown in Fig. 8, the above classification type determining module 550 may include:
a classification label determining unit 5501, configured to input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
a classification type determining unit 5502, configured to determine, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
As an implementation of this embodiment of the present invention, as shown in Fig. 9, the above model training module may include:
an initial classification model obtaining unit 901, configured to obtain an initial classification model;
an emotion vector sample determining unit 902, configured to determine the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
an emotion label determining unit 903, configured to determine the emotion label corresponding to each text information sample according to the semantics of that text information sample;
a parameter adjusting unit 904, configured to adjust the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number or the accuracy of the prediction labels output by the initial classification model reaches a preset value, and then stop training to obtain the classification model;
where the classification model includes the first classification model and the second classification model.
As an implementation of this embodiment of the present invention, as shown in Fig. 10, the above emotion vector sample determining unit 902 may include:
a segmented-word sample obtaining subunit 9021, configured to perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples contained in each text information sample;
a word vector sample obtaining subunit 9022, configured to determine the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
an emotion degree weight determining subunit 9023, configured to determine the emotion degree weight corresponding to each word vector sample based on a predetermined weight setting rule;
an emotion vector sample determining subunit 9024, configured to determine the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
An embodiment of the present invention further provides an electronic device. As shown in Fig. 11, the electronic device may include a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 communicate with one another through the communication bus 1104;
the memory 1103 is configured to store a computer program;
the processor 1101 is configured to implement the following steps when executing the program stored in the memory 1103:
obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
inputting the emotion vector into a pre-trained first classification model, and determining whether the text information to be classified has an emotional tendency;
where the first classification model includes the correspondence between emotion vectors and emotional tendency;
if the text information to be classified has an emotional tendency, inputting the emotion vector into a pre-trained second classification model, and determining the emotional tendency type of the text information to be classified;
where the second classification model includes the correspondence between emotion vectors and emotional tendency types.
It can be seen that, in the solution provided by this embodiment of the present invention, the electronic device first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it contains. It determines the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule. From the target word vector and emotion degree weight of each segmented word, it determines the emotion vector corresponding to the text information to be classified, and inputs the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, where the first classification model includes the correspondence between emotion vectors and emotional tendency. If the text information to be classified has an emotional tendency, the emotion vector is input into a pre-trained second classification model to determine the emotional tendency type of the text information to be classified, where the second classification model includes the correspondence between emotion vectors and emotional tendency types. In this way, the electronic device can accurately determine the emotional tendency type of the text information to be classified, which facilitates further research and processing.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) and may also include non-volatile memory (Non-Volatile Memory, NVM), for example at least one magnetic disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The above step of determining the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule may include:
Determining the sentiment dictionary to which each segmented word belongs;
wherein the sentiment dictionary is a pre-established dictionary composed of various emotion words.
Determining the preset weight corresponding to the determined sentiment dictionary as the emotion degree weight of the segmented word belonging to that sentiment dictionary.
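The dictionary lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary names, their words, the weight values, and the default weight for words found in no sentiment dictionary are all assumptions made for the example.

```python
# Hypothetical sentiment dictionaries; names, words, and weights are illustrative only.
SENTIMENT_DICTIONARIES = {
    "strong": {"words": {"love", "hate"}, "weight": 2.0},
    "mild": {"words": {"like", "dislike"}, "weight": 1.5},
}
# Assumed weight for words that belong to no sentiment dictionary (not stated in the source).
DEFAULT_WEIGHT = 1.0


def emotion_degree_weight(word):
    """Return the preset weight of the sentiment dictionary that contains `word`."""
    for dictionary in SENTIMENT_DICTIONARIES.values():
        if word in dictionary["words"]:
            return dictionary["weight"]
    return DEFAULT_WEIGHT


# One weight per segmented word, in order.
weights = [emotion_degree_weight(w) for w in ["i", "love", "this", "phone"]]
```

Keeping the preset weight on the dictionary rather than on each word mirrors the text: a word inherits whatever weight its dictionary carries.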
The above step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word may include:
According to the formula [formula not reproduced in this text], determining the emotion vector, denoted vector, corresponding to the text information to be classified;
wherein n is the number of segmented words included in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
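The formula itself is not reproduced in this text, so the exact combination is unknown. As a sketch only, the following assumes the emotion vector is the weighted sum of the word vectors, with an optional division by n; both the weighted-sum form and the `average` switch are assumptions, not the patent's stated formula.

```python
def emotion_vector(weights, word_vectors, average=False):
    """Combine weighted word vectors into one emotion vector.

    Assumption: the combination is sum_i w_i * vec_i; average=True divides by n.
    """
    if len(weights) != len(word_vectors) or not word_vectors:
        raise ValueError("need one weight per word vector and at least one word")
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for w, vec in zip(weights, word_vectors):
        for k in range(dim):
            total[k] += w * vec[k]
    if average:
        n = len(word_vectors)
        total = [x / n for x in total]
    return total


# Two segmented words with weights 1.0 and 2.0 and 2-dimensional word vectors.
vector = emotion_vector([1.0, 2.0], [[1.0, 0.0], [0.5, 1.0]])
```

Whichever normalization the original formula uses, the result has the same dimension as a single word vector, so texts of any length map to a fixed-size classifier input.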
The above step of inputting the emotion vector into the pre-trained second classification model and determining the emotional tendency type of the text information to be classified may include:
Inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
Determining, based on the predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
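The two-stage decision and the label-to-type mapping can be sketched as below. The label values (1 and 0), the mapping table, and the toy decision rules standing in for trained models are all hypothetical; the source only states that a predetermined correspondence between classification labels and emotional tendency types exists.

```python
# Assumed mapping from second-model output labels to emotional tendency types.
LABEL_TO_TYPE = {1: "positive", 0: "negative"}


def classify(emotion_vector, first_model, second_model):
    """Two-stage classification: is there a tendency, and if so, which type."""
    if not first_model(emotion_vector):
        return None  # no emotional tendency, so the second model is not consulted
    label = second_model(emotion_vector)
    return LABEL_TO_TYPE[label]


# Toy stand-ins for trained models (hypothetical decision rules).
def toy_first_model(vec):
    return sum(abs(x) for x in vec) > 1.0


def toy_second_model(vec):
    return 1 if sum(vec) > 0 else 0


result = classify([2.0, 1.0], toy_first_model, toy_second_model)
```

Passing the models in as callables keeps the control flow independent of whatever classifier type is actually trained.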
The training method of the classification model may include:
Obtaining an initial classification model;
Determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
Determining the emotion label corresponding to each text information sample according to the semantics of that sample;
Adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, then stopping training to obtain the classification model.
The classification model includes the first classification model and the second classification model.
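The two stopping conditions above (a preset iteration count, or a preset accuracy) can be illustrated with a small training loop. The source does not name a model type, so a perceptron is swapped in here purely as a stand-in; the learning rate, thresholds, and toy data are assumptions.

```python
def predict(weights, bias, vec):
    """Return label 1 if the linear score is positive, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, vec))
    return 1 if score > 0 else 0


def train_classifier(samples, labels, max_iterations=100, target_accuracy=0.95, lr=0.1):
    """Adjust parameters until max_iterations is reached or accuracy hits the target."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for iteration in range(1, max_iterations + 1):
        for vec, label in zip(samples, labels):
            error = label - predict(weights, bias, vec)
            weights = [w + lr * error * x for w, x in zip(weights, vec)]
            bias += lr * error
        correct = sum(predict(weights, bias, v) == y for v, y in zip(samples, labels))
        accuracy = correct / len(samples)
        if accuracy >= target_accuracy:
            break  # preset accuracy reached before the iteration limit
    return weights, bias, iteration, accuracy


# Toy emotion vector samples with emotion labels (1 = positive, 0 = negative).
samples = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]]
labels = [1, 1, 0, 0]
weights, bias, iterations_used, accuracy = train_classifier(samples, labels)
```

The same routine could be run twice with different label sets, once for tendency versus no tendency and once for positive versus negative, which matches the statement that both models are obtained by this training method.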
The above step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance may include:
Performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples included in each text information sample;
Determining the word vector sample corresponding to each segmented word sample according to the predetermined correspondence between words and word vectors;
Determining the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
Determining the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented word sample.
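The sample-preparation steps above can be sketched end to end. Everything concrete here is hypothetical: whitespace splitting stands in for a real word segmenter, the lookup tables are toy data, words missing from the word-vector table are skipped, and the weighted sum is an assumed combination because the source formula is not reproduced in this text.

```python
# Hypothetical lookup tables; real ones would come from trained word vectors
# and pre-established sentiment dictionaries.
WORD_VECTORS = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "phone": [0.0, 1.0]}
WORD_WEIGHTS = {"good": 2.0, "bad": 2.0}
DIMENSION = 2


def emotion_vector_sample(text):
    """Turn one text information sample into an emotion vector sample."""
    # Stand-in for word segmentation; unknown words are skipped (an assumption).
    tokens = [t for t in text.lower().split() if t in WORD_VECTORS]
    total = [0.0] * DIMENSION
    for token in tokens:
        weight = WORD_WEIGHTS.get(token, 1.0)  # assumed default weight
        total = [acc + weight * x for acc, x in zip(total, WORD_VECTORS[token])]
    return total


texts = ["good phone", "bad phone"]
vector_samples = [emotion_vector_sample(t) for t in texts]
```

Using the same function for training samples and for text to be classified keeps the classifier's input representation consistent between training and inference.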
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the text information classification method described in any of the above embodiments.
It can be seen that, in the solution provided by the embodiment of the present invention, when the computer program is executed by the processor, the text information to be classified is first obtained and word segmentation is performed on it to obtain the segmented words it includes. The target word vector corresponding to each segmented word is determined according to the predetermined correspondence between words and word vectors, and the emotion degree weight corresponding to each segmented word is then determined based on the predetermined weight setting rule. The emotion vector corresponding to the text information to be classified is determined according to the target word vector and emotion degree weight corresponding to each segmented word. The emotion vector is then input into the pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes the correspondence between emotion vectors and emotional tendency. If the text information to be classified has an emotional tendency, the emotion vector is input into the pre-trained second classification model to determine the emotional tendency type of the text information to be classified, wherein the second classification model includes the correspondence between emotion vectors and emotional tendency types. In this way, the electronic device can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.
It should be noted that the above apparatus, electronic device, and computer-readable storage medium embodiments are described relatively briefly because they are substantially similar to the corresponding method embodiments; for related details, refer to the description of the method embodiments.
It should further be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
The above are only preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within its scope of protection.
Claims (10)
1. A method for classifying text information, characterized in that the method comprises:
obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
inputting the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes the correspondence between emotion vectors and emotional tendency;
if the text information to be classified has an emotional tendency, inputting the emotion vector into a pre-trained second classification model to determine the emotional tendency type of the text information to be classified, wherein the second classification model includes the correspondence between emotion vectors and emotional tendency types.
2. The method according to claim 1, characterized in that the step of determining the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule comprises:
determining the sentiment dictionary to which each segmented word belongs, wherein the sentiment dictionary is a pre-established dictionary composed of various emotion words;
determining the preset weight corresponding to the determined sentiment dictionary as the emotion degree weight of the segmented word belonging to that sentiment dictionary.
3. The method according to claim 1, characterized in that the step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word comprises:
according to the formula [formula not reproduced in this text], determining the emotion vector, denoted vector, corresponding to the text information to be classified;
wherein n is the number of segmented words included in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
4. The method according to claim 1, characterized in that the step of inputting the emotion vector into the pre-trained second classification model and determining the emotional tendency type of the text information to be classified comprises:
inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
determining, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
5. The method according to any one of claims 1-4, characterized in that the training method of the classification model comprises:
obtaining an initial classification model;
determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
determining the emotion label corresponding to each text information sample according to the semantics of that sample;
adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, then stopping training to obtain the classification model; wherein the classification model includes the first classification model and the second classification model.
6. The method according to claim 5, characterized in that the step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance comprises:
performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples included in each text information sample;
determining the word vector sample corresponding to each segmented word sample according to the predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
determining the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented word sample.
7. An apparatus for classifying text information, characterized in that the apparatus comprises:
a to-be-classified text information acquisition module, configured to obtain text information to be classified and perform word segmentation on the text information to be classified to obtain segmented words;
a target word vector determination module, configured to determine the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
an emotion degree weight determination module, configured to determine the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
an emotion vector determination module, configured to determine the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
an emotional tendency determination module, configured to input the emotion vector into a first classification model trained in advance by a model training module and determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes the correspondence between emotion vectors and emotional tendency;
a classification type determination module, configured to, if the text information to be classified has an emotional tendency, input the emotion vector into a second classification model trained in advance by the model training module and determine the emotional tendency type of the text information to be classified, wherein the second classification model includes the correspondence between emotion vectors and emotional tendency types.
8. The apparatus according to claim 7, characterized in that the emotion degree weight determination module comprises:
a sentiment dictionary determination unit, configured to determine the sentiment dictionary to which each segmented word belongs, wherein the sentiment dictionary is a pre-established dictionary composed of various emotion words;
an emotion degree weight determination unit, configured to determine the preset weight corresponding to the determined sentiment dictionary as the emotion degree weight of the segmented word belonging to that sentiment dictionary.
9. The apparatus according to claim 7, characterized in that the emotion vector determination module comprises:
an emotion vector determination unit, configured to determine, according to the formula [formula not reproduced in this text], the emotion vector, denoted vector, corresponding to the text information to be classified;
wherein n is the number of segmented words included in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
10. The apparatus according to claim 7, characterized in that the classification type determination module comprises:
a classification label determination unit, configured to input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
a classification type determination unit, configured to determine, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910568734.2A CN110309308A (en) | 2019-06-27 | 2019-06-27 | Text information classification method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910568734.2A CN110309308A (en) | 2019-06-27 | 2019-06-27 | Text information classification method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110309308A (en) | 2019-10-08 |
Family
ID=68076813
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910568734.2A Pending CN110309308A (en) | 2019-06-27 | 2019-06-27 | Text information classification method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110309308A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291564A (en) * | 2020-03-03 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Model training method and device for word vector acquisition and storage medium |
CN111401397A (en) * | 2019-11-05 | 2020-07-10 | 杭州海康威视系统技术有限公司 | Classification method, classification device, classification equipment and storage medium |
CN111460096A (en) * | 2020-03-26 | 2020-07-28 | 北京金山安全软件有限公司 | Fragment text processing method and device and electronic equipment |
CN111753082A (en) * | 2020-03-23 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Text classification method and device based on comment data, equipment and medium |
CN111783453A (en) * | 2020-07-01 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method and device for processing emotion information of text |
CN112784048A (en) * | 2021-01-26 | 2021-05-11 | 海尔数字科技(青岛)有限公司 | Method, device and equipment for emotion analysis of user questions and storage medium |
CN112860887A (en) * | 2021-01-18 | 2021-05-28 | 北京奇艺世纪科技有限公司 | Text labeling method and device |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140365208A1 (en) * | 2013-06-05 | 2014-12-11 | Microsoft Corporation | Classification of affective states in social media |
CN103761239A (en) * | 2013-12-09 | 2014-04-30 | 国家计算机网络与信息安全管理中心 | Method for performing emotional tendency classification to microblog by using emoticons |
CN105117428A (en) * | 2015-08-04 | 2015-12-02 | 电子科技大学 | Web comment sentiment analysis method based on word alignment model |
CN106649603A (en) * | 2016-11-25 | 2017-05-10 | 北京资采信息技术有限公司 | Webpage text data sentiment classification designated information push method |
CN107357889A (en) * | 2017-07-11 | 2017-11-17 | 北京工业大学 | A kind of across social platform picture proposed algorithm based on interior perhaps emotion similitude |
CN108052505A (en) * | 2017-12-26 | 2018-05-18 | 上海智臻智能网络科技股份有限公司 | Text emotion analysis method and device, storage medium, terminal |
CN109918499A (en) * | 2019-01-14 | 2019-06-21 | 平安科技(深圳)有限公司 | A kind of file classification method, device, computer equipment and storage medium |
CN109933795A (en) * | 2019-03-19 | 2019-06-25 | 上海交通大学 | Based on context-emotion term vector text emotion analysis system |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401397A (en) * | 2019-11-05 | 2020-07-10 | 杭州海康威视系统技术有限公司 | Classification method, classification device, classification equipment and storage medium |
CN111291564A (en) * | 2020-03-03 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Model training method and device for word vector acquisition and storage medium |
CN111291564B (en) * | 2020-03-03 | 2023-10-31 | 腾讯科技(深圳)有限公司 | Model training method, device and storage medium for word vector acquisition |
CN111753082A (en) * | 2020-03-23 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Text classification method and device based on comment data, equipment and medium |
CN111460096A (en) * | 2020-03-26 | 2020-07-28 | 北京金山安全软件有限公司 | Fragment text processing method and device and electronic equipment |
CN111460096B (en) * | 2020-03-26 | 2023-12-22 | 北京金山安全软件有限公司 | Method and device for processing fragmented text and electronic equipment |
CN111783453A (en) * | 2020-07-01 | 2020-10-16 | 支付宝(杭州)信息技术有限公司 | Method and device for processing emotion information of text |
CN111783453B (en) * | 2020-07-01 | 2024-05-21 | 支付宝(杭州)信息技术有限公司 | Text emotion information processing method and device |
CN112860887A (en) * | 2021-01-18 | 2021-05-28 | 北京奇艺世纪科技有限公司 | Text labeling method and device |
CN112860887B (en) * | 2021-01-18 | 2023-09-05 | 北京奇艺世纪科技有限公司 | Text labeling method and device |
CN112784048A (en) * | 2021-01-26 | 2021-05-11 | 海尔数字科技(青岛)有限公司 | Method, device and equipment for emotion analysis of user questions and storage medium |
CN112784048B (en) * | 2021-01-26 | 2023-03-28 | 海尔数字科技(青岛)有限公司 | Method, device and equipment for emotion analysis of user questions and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110309308A (en) | Text information classification method and device and electronic equipment | |
CN108182279B (en) | Object classification method, device and computer equipment based on text feature | |
CN107122346B (en) | The error correction method and device of a kind of read statement | |
US20230237328A1 (en) | Information processing method and terminal, and computer storage medium | |
EP4080889A1 (en) | Anchor information pushing method and apparatus, computer device, and storage medium | |
CN112632385A (en) | Course recommendation method and device, computer equipment and medium | |
CN108052505A (en) | Text emotion analysis method and device, storage medium, terminal | |
CN106202177A (en) | A kind of file classification method and device | |
CN109471942B (en) | Chinese comment emotion classification method and device based on evidence reasoning rule | |
CN106326984A (en) | User intention identification method and device and automatic answering system | |
CN106651057A (en) | Mobile terminal user age prediction method based on installation package sequence table | |
CN112860841A (en) | Text emotion analysis method, device and equipment and storage medium | |
CN108416032A (en) | A kind of file classification method, device and storage medium | |
CN108959329B (en) | Text classification method, device, medium and equipment | |
CN110610193A (en) | Method and device for processing labeled data | |
CN109960791A (en) | Judge the method and storage medium, terminal of text emotion | |
CN111666761A (en) | Fine-grained emotion analysis model training method and device | |
CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
CN112699283A (en) | Test paper generation method and device | |
CN106776566A (en) | The recognition methods of emotion vocabulary and device | |
CN107291775A (en) | The reparation language material generation method and device of error sample | |
CN106897282A (en) | The sorting technique and equipment of a kind of customer group | |
CN110781673A (en) | Document acceptance method and device, computer equipment and storage medium | |
CN107908649B (en) | Text classification control method | |
CN109101487A (en) | Conversational character differentiating method, device, terminal device and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191008 |