CN110309308A - Text information classification method and device and electronic equipment - Google Patents

Text information classification method and device and electronic equipment

Info

Publication number
CN110309308A
Authority
CN
China
Prior art keywords
emotion
text information
vector
word
sorted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910568734.2A
Other languages
Chinese (zh)
Inventor
汪庆辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201910568734.2A
Publication of CN110309308A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/353: Clustering; Classification into predefined classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a text information classification method, a device, and electronic equipment. The method comprises the following steps: acquiring text information to be classified, and performing word segmentation on it to obtain segmented words; determining a target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors; determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule; determining an emotion vector according to the target word vector and the emotion degree weight corresponding to each segmented word; inputting the emotion vector into a pre-trained first classification model, and determining whether the text information to be classified has an emotional tendency; and, if the text information to be classified has an emotional tendency, inputting the emotion vector into a pre-trained second classification model, and determining the emotional tendency type of the text information to be classified. In this way, the electronic equipment can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.

Description

Text information classification method, device, and electronic equipment
Technical field
The present invention relates to the technical field of information processing, and in particular to a text information classification method, device, and electronic equipment.
Background
At present, a large amount of text information exists in many fields. For example, the blockchain field currently has many projects or currencies, and new projects keep emerging; each project has its own characteristics and application scenarios and addresses different practical problems. As with any newly emerging thing in its early stage of development, the quality of these projects varies widely, and users' evaluations of them are mixed. Such evaluations are one kind of text information. As another example, on social platforms users post a large number of comments on videos, topics, and the like, and these comments are also text information.
Such text information generally carries a certain emotional tendency, for example, liking or disliking. These emotional tendencies are of great significance to follow-up work. For example, they can guide the subsequent improvement of projects in the blockchain field. In the field of network platforms, they also provide important guidance for improving interaction modes, visual effects, and so on. Therefore, a method capable of determining the emotional tendency type of text information is currently needed.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a text information classification method, device, and electronic equipment, so as to classify text information and determine its emotional tendency type. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a text information classification method, the method comprising:
Acquiring text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
Determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
Determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
Determining the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight corresponding to each segmented word;
Inputting the emotion vector into a pre-trained first classification model, and determining whether the text information to be classified has an emotional tendency, wherein the first classification model includes a correspondence between emotion vectors and emotional tendencies;
If the text information to be classified has an emotional tendency, inputting the emotion vector into a pre-trained second classification model, and determining the emotional tendency type of the text information to be classified, wherein the second classification model includes a correspondence between emotion vectors and emotional tendency types.
Optionally, the step of determining the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule comprises:
Determining the emotion dictionary to which each segmented word belongs, wherein an emotion dictionary is a pre-established dictionary composed of emotion words;
Determining the preset weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented word that belongs to that emotion dictionary.
Optionally, the step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight corresponding to each segmented word comprises:
Determining the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \sum_{i=1}^{n} w_i \cdot vec_i$;
Wherein n is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word i, and $vec_i$ is the target word vector corresponding to segmented word i.
Optionally, the step of inputting the emotion vector into the pre-trained second classification model and determining the emotional tendency type of the text information to be classified comprises:
Inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
Determining, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
Optionally, the training method of the classification model comprises:
Obtaining an initial classification model;
Determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
Determining the emotion label corresponding to each text information sample according to the semantics of that text information sample;
Adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, and stopping training when the number of iterations of the initial classification model reaches a preset number, or when the accuracy of the prediction labels output by the initial classification model reaches a preset value, to obtain the classification model; wherein the classification model includes the first classification model and the second classification model.
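The two stopping conditions in the training method above can be illustrated with a minimal sketch. It assumes a simple perceptron as a stand-in for the initial classification model; the source does not name a specific learning algorithm for this step, and the function names, sample data, learning rate, and thresholds below are all hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return label 1 if the linear score is positive, otherwise 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_classifier(
    samples: Sequence[Sequence[float]],   # emotion vector samples
    labels: Sequence[int],                # emotion labels (0 or 1)
    max_iterations: int = 100,            # hypothetical "preset number" of iterations
    target_accuracy: float = 0.95,        # hypothetical "preset value" for accuracy
    learning_rate: float = 0.1,
) -> Tuple[List[float], float, int, float]:
    # Perceptron used only as an illustrative stand-in for the initial model.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    iterations = 0
    accuracy = 0.0
    while iterations < max_iterations:          # stop condition 1: iteration count
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
        iterations += 1
        correct = sum(1 for x, y in zip(samples, labels) if predict(weights, bias, x) == y)
        accuracy = correct / len(samples)
        if accuracy >= target_accuracy:         # stop condition 2: prediction accuracy
            break
    return weights, bias, iterations, accuracy


# Invented toy data: two samples per label.
toy_samples = [[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]]
toy_labels = [1, 1, 0, 0]
weights, bias, iterations, accuracy = train_classifier(toy_samples, toy_labels)
print(iterations, accuracy)
```

The same loop could in principle be run twice with different labels, once for the first classification model and once for the second, since the source states that the training method covers both.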
Optionally, the step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance comprises:
Performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples included in each text information sample;
Determining the word vector sample corresponding to each segmented word sample according to the predetermined correspondence between words and word vectors;
Determining the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
Determining the emotion vector sample corresponding to each text information sample according to the word vector sample and the emotion degree weight corresponding to each segmented word sample.
In a second aspect, an embodiment of the present invention provides a text information classification device, the device comprising:
A to-be-classified text information acquisition module, configured to acquire text information to be classified and perform word segmentation on the text information to be classified to obtain segmented words;
A target word vector determining module, configured to determine the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
An emotion degree weight determining module, configured to determine the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
An emotion vector determining module, configured to determine the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight corresponding to each segmented word;
An emotional tendency determining module, configured to input the emotion vector into a first classification model trained in advance by a model training module, and determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes a correspondence between emotion vectors and emotional tendencies;
A classification type determining module, configured to, if the text information to be classified has an emotional tendency, input the emotion vector into a second classification model trained in advance by the model training module, and determine the emotional tendency type of the text information to be classified, wherein the second classification model includes a correspondence between emotion vectors and emotional tendency types.
Optionally, the emotion degree weight determining module includes:
An emotion dictionary determining unit, configured to determine the emotion dictionary to which each segmented word belongs, wherein an emotion dictionary is a pre-established dictionary composed of emotion words;
An emotion degree weight determining unit, configured to determine the preset weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented word that belongs to that emotion dictionary.
Optionally, the emotion vector determining module includes:
An emotion vector determining unit, configured to determine the emotion vector $vector$ corresponding to the text information to be classified according to the formula $vector = \sum_{i=1}^{n} w_i \cdot vec_i$;
Wherein n is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight corresponding to segmented word i, and $vec_i$ is the target word vector corresponding to segmented word i.
Optionally, the classification type determining module includes:
A classification label determining unit, configured to input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
A classification type determining unit, configured to determine, based on a predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
Optionally, the model training module includes:
An initial classification model acquiring unit, configured to obtain an initial classification model;
An emotion vector sample determining unit, configured to determine the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
An emotion label determining unit, configured to determine the emotion label corresponding to each text information sample according to the semantics of that text information sample;
A parameter adjusting unit, configured to adjust the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, and to stop training when the number of iterations of the initial classification model reaches a preset number, or when the accuracy of the prediction labels output by the initial classification model reaches a preset value, to obtain the classification model; wherein the classification model includes the first classification model and the second classification model.
Optionally, the emotion vector sample determining unit includes:
A segmented word sample acquiring subunit, configured to perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples included in each text information sample;
A word vector sample acquiring subunit, configured to determine the word vector sample corresponding to each segmented word sample according to the predetermined correspondence between words and word vectors;
An emotion degree weight determining subunit, configured to determine the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
An emotion vector sample determining subunit, configured to determine the emotion vector sample corresponding to each text information sample according to the word vector sample and the emotion degree weight corresponding to each segmented word sample.
In a third aspect, an embodiment of the present invention provides electronic equipment, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
The memory is configured to store a computer program;
The processor is configured to implement the steps of any of the above text information classification methods when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above text information classification methods.
In the solution provided by the embodiments of the present invention, the electronic equipment first acquires the text information to be classified and performs word segmentation on it to obtain the segmented words it includes. It determines the target word vector corresponding to each segmented word according to the predetermined correspondence between words and word vectors, then determines the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule, and determines the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight of each segmented word. It then inputs the emotion vector into the pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes the correspondence between emotion vectors and emotional tendencies. If the text information to be classified has an emotional tendency, the emotion vector is input into the pre-trained second classification model to determine the emotional tendency type of the text information to be classified, wherein the second classification model includes the correspondence between emotion vectors and emotional tendency types. In this way, the electronic equipment can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a text information classification method provided by an embodiment of the present invention;
Fig. 2 is a specific flowchart of step S103 in the embodiment shown in Fig. 1;
Fig. 3 is a flowchart of a training method of the classification model based on the embodiment shown in Fig. 1;
Fig. 4 is a specific flowchart of step S302 in the embodiment shown in Fig. 3;
Fig. 5 is a structural schematic diagram of a text information classification device provided by an embodiment of the present invention;
Fig. 6 is a specific structural schematic diagram of the emotion degree weight determining module 530 in the embodiment shown in Fig. 5;
Fig. 7 is a specific structural schematic diagram of the emotion vector determining module 540 in the embodiment shown in Fig. 5;
Fig. 8 is a specific structural schematic diagram of the classification type determining module 550 in the embodiment shown in Fig. 5;
Fig. 9 is a specific structural schematic diagram of the model training module based on the embodiment shown in Fig. 5;
Fig. 10 is a specific structural schematic diagram of the emotion vector sample determining unit 902 in the embodiment shown in Fig. 9;
Fig. 11 is a structural schematic diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In order to classify text information and determine its emotional tendency type, the embodiments of the present invention provide a text information classification method, a device, electronic equipment, and a computer-readable storage medium.
A text information classification method provided by an embodiment of the present invention is introduced first.
As shown in Fig. 1, a text information classification method comprises:
S101, acquiring text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
The electronic equipment may first acquire the text information to be classified, that is, the text information whose emotional tendency type needs to be determined. The text information to be classified may be text information generated in various scenarios, for example, project comments in the blockchain field, topic comments on forum websites, or video comments on video websites, which is not specifically limited here. Different users may hold very different views of the same thing, and views of different things generally differ greatly as well, so the text information to be classified may show different types of emotional tendency, for example, liking, disliking, or neutral.
Since the electronic equipment cannot process text information directly, after acquiring the text information to be classified it may perform word segmentation on it in order to convert it into data the electronic equipment can process, thereby obtaining the segmented words that the text information to be classified includes. The word segmentation may use any segmentation method from the field of text processing, as long as the segmented words included in the text information to be classified can be determined; it is not specifically limited or described here.
For example, suppose the text information to be classified acquired by the electronic equipment is "This currency is relatively stable, and its prospects are very good". Performing word segmentation on it yields the segmented words it includes: "this", "currency", "relatively", "stable", "and", "prospects", and "very good".
S102, determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
In this step, a word vector is a vector, which is data the electronic equipment can process. In order to determine the target word vector corresponding to each segmented word, the electronic equipment may determine the correspondence between words and word vectors in advance.
When determining the correspondence between words and word vectors, in one embodiment, to make the correspondence more targeted, the electronic equipment may obtain a large amount of text information in advance from the scenario of the text information to be classified. For example, if the text information to be classified is comments in the blockchain field, the electronic equipment may obtain a large amount of blockchain news and comments. Of course, in another embodiment, to make the obtained word vectors more comprehensive, the electronic equipment may also obtain a large amount of text information from a variety of scenarios; both approaches are reasonable.
After obtaining a large amount of text information, the electronic equipment may obtain word vectors using the word2vec technique, where word2vec is a group of related models used to generate word vectors. These models are shallow, two-layer neural networks; after training is completed, a word2vec model can be used to map each word to a vector. Therefore, the electronic equipment can use a word2vec model to map the words included in the obtained text information to vectors, and thereby determine the correspondence between words and word vectors.
After the electronic equipment determines the segmented words that the text information to be classified includes, it can determine the target word vector corresponding to each segmented word according to this correspondence. For example, the correspondence between words and word vectors is shown in the following table:
No.    Word    Word vector
1      A       a
2      B       b
3      C       c
4      D       d
If the electronic equipment determines that the segmented words included in the text information to be classified are word A, word C, and word D, then according to the correspondence shown in the table above, it can determine that the corresponding target word vectors are word vector a, word vector c, and word vector d, respectively.
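The table lookup described above can be sketched as a plain dictionary lookup. This is only an illustration: the vector values are invented placeholders for a, b, c, and d, and skipping words that are missing from the table is an assumption, since the source does not say how such words are handled.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical correspondence between words and word vectors.
# The numeric values stand in for the vectors a, b, c, and d in the table.
WORD_TO_VECTOR: Dict[str, Vector] = {
    "A": [0.1, 0.2],
    "B": [0.3, 0.4],
    "C": [0.5, 0.6],
    "D": [0.7, 0.8],
}


def lookup_target_vectors(words: Sequence[str], table: Dict[str, Vector]) -> List[Vector]:
    """Return the target word vector of each segmented word found in the table."""
    # Assumption: words absent from the table are skipped.
    return [table[word] for word in words if word in table]


# Segmented words A, C, and D map to word vectors a, c, and d.
targets = lookup_target_vectors(["A", "C", "D"], WORD_TO_VECTOR)
print(targets)
```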
S103, determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
In this step, different words express emotions of different intensity, and words with stronger emotional intensity have greater reference value for subsequent processing. Therefore, in order to better capture the emotional intensity of the segmented words included in the text information to be classified, the electronic equipment may determine a weight setting rule in advance, and then determine the emotion degree weight corresponding to each segmented word according to that rule.
As one implementation, the electronic equipment may determine from the semantics of each segmented word whether its emotional intensity is high or low, and then determine the emotion degree weight corresponding to each segmented word, where a segmented word with high emotional intensity receives a higher emotion degree weight and a segmented word with low emotional intensity receives a lower emotion degree weight.
S104, determining the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight corresponding to each segmented word;
In one embodiment, the electronic equipment may multiply the target word vector of each segmented word by its emotion degree weight to obtain a product, and then add the products of all segmented words together, taking the result as the emotion vector corresponding to the text information to be classified. That is, the target word vectors of the segmented words are summed with the emotion degree weights as coefficients to obtain the emotion vector corresponding to the text information to be classified.
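The weighted summation just described can be sketched in a few lines of plain Python. The function name and the sample vectors and weights are hypothetical; only the multiply-then-add logic comes from the text.

```python
from typing import List, Sequence


def emotion_vector(
    word_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Sum the target word vectors, each scaled by its emotion degree weight."""
    if len(word_vectors) != len(weights):
        raise ValueError("each word vector needs exactly one weight")
    if not word_vectors:
        return []
    dimension = len(word_vectors[0])
    result = [0.0] * dimension
    for vector, weight in zip(word_vectors, weights):
        for k in range(dimension):
            result[k] += weight * vector[k]
    return result


# Invented example: three segmented words with 2-dimensional word vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
degree_weights = [2.0, 1.0, 1.0]
print(emotion_vector(vectors, degree_weights))  # prints [3.0, 2.0]
```

Because the result has the same dimension as a single word vector, texts of any length map to a fixed-size input for the classification models.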
S105, inputting the emotion vector into a pre-trained first classification model, and determining whether the text information to be classified has an emotional tendency; if the text information to be classified has an emotional tendency, executing step S106; otherwise, stopping the classification operation;
The first classification model includes the correspondence between emotion vectors and emotional tendencies. Therefore, after the electronic equipment inputs the emotion vector into the pre-trained first classification model, the first classification model can determine the emotional tendency corresponding to the emotion vector according to the correspondence it includes, and output it. The electronic equipment can then determine from the output whether the text information to be classified has an emotional tendency.
For example, the emotional tendencies in the correspondence between emotion vectors and emotional tendencies included in the first classification model may specifically be having emotion, neutral emotion, and no emotion. The output of the first classification model may be a label; for example, having emotion and no emotion may correspond to labels 1 and 0, respectively. If the output of the first classification model is 1, the text information to be classified has an emotional tendency; if the output is 0, the text information to be classified has no emotional tendency.
The first classification model may be a machine learning model such as an SVM, a convolutional neural network, or a recurrent neural network, which is not specifically limited here. For clarity of presentation, the training method of the first classification model is described later.
If the text information to be classified has no emotional tendency, it is of little help to subsequent processing and research, so the electronic equipment may stop the classification operation on it and go on to classify other text information. If the text information to be classified has an emotional tendency, it is of greater help to subsequent processing and research, and its emotional tendency type is then further determined.
S106, inputting the emotion vector into a pre-trained second classification model, and determining the emotional tendency type of the text information to be classified.
The second classification model may include the correspondence between emotion vectors and emotional tendency types. In this way, after the electronic equipment inputs the emotion vector into the pre-trained second classification model, the second classification model can determine the emotional tendency type corresponding to the emotion vector according to the correspondence it includes, and output it. The electronic equipment can then determine the emotional tendency type of the text information to be classified from the output.
The second classification model may likewise be a machine learning model such as an SVM, a convolutional neural network, or a recurrent neural network, which is not specifically limited here. For clarity of presentation, the training method of the second classification model is described later.
It can be seen that, in the solution provided by the embodiments of the present invention, the electronic equipment first acquires the text information to be classified and performs word segmentation on it to obtain the segmented words it includes. It determines the target word vector corresponding to each segmented word according to the predetermined correspondence between words and word vectors, then determines the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule, and determines the emotion vector corresponding to the text information to be classified according to the target word vector and the emotion degree weight of each segmented word. It then inputs the emotion vector into the pre-trained first classification model to determine whether the text information to be classified has an emotional tendency, wherein the first classification model includes the correspondence between emotion vectors and emotional tendencies. If the text information to be classified has an emotional tendency, the emotion vector is input into the pre-trained second classification model to determine the emotional tendency type of the text information to be classified, wherein the second classification model includes the correspondence between emotion vectors and emotional tendency types. In this way, the electronic equipment can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.
As an implementation of this embodiment of the present invention, as shown in Fig. 2, the above step of determining the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule may include:
S201: determine the emotion dictionary to which each segmented word belongs;
To make it easy to determine the emotion degree weights of the segmented words of the text information to be classified, the electronic device can collect and sort vocabulary in advance and divide the words into different categories according to their emotional intensity. The words of each category form one emotion dictionary; that is, an emotion dictionary is a pre-established dictionary composed of emotion words.
For example, if the emotional intensity of words is divided into strong emotion, general emotion and no emotion, three emotion dictionaries can be determined: a strong emotion dictionary, a general emotion dictionary and a no-emotion dictionary.
After determining the segmented words included in the text information to be classified, the electronic device can look up the emotion dictionary to which each segmented word belongs. For example, suppose the segmented words are "I", "very good", "this" and "project". If "I", "this" and "project" are found in the no-emotion dictionary and "very good" is found in the strong emotion dictionary, then "very good" belongs to the strong emotion dictionary, and "I", "this" and "project" belong to the no-emotion dictionary.
S202: determine the default weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented words belonging to that emotion dictionary.
Words of different emotional intensity have different reference value, and words with stronger emotion usually have greater reference value, so the electronic device can predetermine a default weight for each emotion dictionary. For example, if the emotion dictionaries comprise a strong emotion dictionary, a general emotion dictionary and a no-emotion dictionary, the electronic device can predetermine their default weights as α, β and γ respectively, where in general α > β > γ.
After the electronic device determines the emotion dictionary to which each segmented word belongs, it can take the default weight of that emotion dictionary as the emotion degree weight of the segmented word.
For example, if the electronic device determines that "very good" belongs to the strong emotion dictionary and that "I", "this" and "project" belong to the no-emotion dictionary, and the default weights of the strong emotion dictionary and the no-emotion dictionary are α and γ respectively, then the emotion degree weight of "very good" is α, and the emotion degree weight of "I", "this" and "project" is γ.
As can be seen, in this embodiment the electronic device determines the emotion dictionary to which each segmented word belongs and then takes the default weight of that dictionary as the emotion degree weight of the segmented word. Because the emotion dictionaries are established in advance, the electronic device can determine the emotion degree weight of each segmented word quickly and accurately.
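Steps S201 and S202 amount to a dictionary lookup followed by a weight lookup, which can be sketched in Python as follows. The dictionary contents, the numeric values chosen for α, β and γ, and the function names are hypothetical; the embodiment only states that in general α > β > γ. Treating any word not listed in the other two dictionaries as a no-emotion word is also an assumption of this sketch.

```python
# Hypothetical emotion dictionaries, established in advance.
STRONG_EMOTION_WORDS = {"very good", "terrible"}
GENERAL_EMOTION_WORDS = {"fine", "okay"}

# Hypothetical default weights alpha > beta > gamma.
DEFAULT_WEIGHTS = {"strong": 3.0, "general": 2.0, "none": 1.0}


def emotion_dictionary_of(word: str) -> str:
    """S201: find the emotion dictionary a segmented word belongs to."""
    if word in STRONG_EMOTION_WORDS:
        return "strong"
    if word in GENERAL_EMOTION_WORDS:
        return "general"
    return "none"  # assumption: unlisted words carry no emotion


def emotion_degree_weight(word: str) -> float:
    """S202: use the dictionary's default weight as the word's weight."""
    return DEFAULT_WEIGHTS[emotion_dictionary_of(word)]


weights = [emotion_degree_weight(w) for w in ["I", "very good", "this", "project"]]
```

With these values, "very good" receives the weight chosen for α and the other three words receive the weight chosen for γ, matching the example in the text.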
As an implementation of this embodiment of the present invention, the above step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight of each segmented word may include:
determining the emotion vector $\mathrm{vector}$ corresponding to the text information to be classified according to the formula $\mathrm{vector} = \frac{1}{n}\sum_{i=1}^{n} w_i \cdot \mathrm{vec}_i$;
where $n$ is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight of segmented word $i$, and $\mathrm{vec}_i$ is the target word vector of segmented word $i$.
That is, the electronic device computes a weighted sum of the target word vectors of the segmented words using their emotion degree weights, which highlights the role that segmented words of different emotional intensity play in determining the emotional tendency type of the text information to be classified. It then takes the average of the weighted sum as the emotion vector of the text information to be classified. Averaging keeps the calculation simple and also prevents a segmented word with high emotional intensity from having an excessive influence.
For example, if the target word vectors of the segmented words included in the text information to be classified are fc1, fc2, fc3, fc4 and fc5, and the electronic device determines that their emotion degree weights are $w_1$, $w_2$, $w_3$, $w_4$ and $w_5$ respectively, then the emotion vector of the text information to be classified is $\mathrm{vector} = \frac{1}{5}\left(w_1 \cdot \mathrm{fc1} + w_2 \cdot \mathrm{fc2} + w_3 \cdot \mathrm{fc3} + w_4 \cdot \mathrm{fc4} + w_5 \cdot \mathrm{fc5}\right)$.
As can be seen, in this embodiment the electronic device determines the emotion vector of the text information to be classified according to the above formula. This fully accounts for the role of segmented words of different emotional intensity in determining the emotional tendency type, keeps the calculation simple, and prevents a segmented word with high emotional intensity from having an excessive influence.
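The weighted average described above (the weighted sum of the target word vectors divided by the number of segmented words) can be written as a short Python function. This is a minimal sketch; the function name `emotion_vector` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def emotion_vector(word_vectors: Sequence[Sequence[float]],
                   weights: Sequence[float]) -> List[float]:
    """Return (1/n) * sum_i w_i * vec_i for n segmented words."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector and at least one word")
    n = len(word_vectors)
    dim = len(word_vectors[0])
    total = [0.0] * dim
    for vec, w in zip(word_vectors, weights):
        for k in range(dim):
            total[k] += w * vec[k]  # accumulate the weighted sum
    return [t / n for t in total]   # average over the number of words


# Two words with 2-dimensional vectors and weights 3.0 and 1.0.
example = emotion_vector([[1.0, 0.0], [0.0, 2.0]], [3.0, 1.0])
```

Note that the divisor is the word count n rather than the sum of the weights, which follows the description of taking the average of the weighted sum.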
As an implementation of this embodiment of the present invention, the above step of inputting the emotion vector into the pre-trained second classification model and determining the emotional tendency type of the text information to be classified may include:
inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model; and determining, based on a predetermined correspondence between classification labels and emotional tendency types, that the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
In the correspondence between emotion vectors and emotional tendency types included in the second classification model, the emotional tendency type may specifically be positive emotion or negative emotion. The output of the second classification model may be a classification label; for example, positive emotion and negative emotion may correspond to the classification labels 1 and -1 respectively. When the emotion vector is input into the pre-trained second classification model, the model outputs a classification label according to the correspondence it includes between emotion vectors and emotional tendency types.
The electronic device can then determine, based on the predetermined correspondence between classification labels and emotional tendency types, whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion. For example, if positive emotion and negative emotion correspond to the classification labels 1 and -1 respectively, then when the second classification model outputs 1, the emotional tendency type of the text information to be classified is positive emotion, and when it outputs -1, the emotional tendency type is negative emotion.
As can be seen, in this embodiment the electronic device inputs the emotion vector into the pre-trained second classification model, obtains the classification label it outputs, and then determines from the predetermined correspondence between classification labels and emotional tendency types whether the emotional tendency type of the text information to be classified is positive emotion or negative emotion. In this way the emotional tendency type can be determined accurately, which provides a solid reference for subsequent processing and research.
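The mapping from classification label to emotional tendency type in the example above (1 for positive emotion, -1 for negative emotion) can be sketched in Python as a simple table lookup. The names `LABEL_TO_TENDENCY_TYPE` and `tendency_type_from_label` are hypothetical, and rejecting unknown labels with an error is an assumption of this sketch rather than something the embodiment specifies.

```python
# Predetermined correspondence between classification labels and types,
# using the label values 1 and -1 from the example in the text.
LABEL_TO_TENDENCY_TYPE = {1: "positive emotion", -1: "negative emotion"}


def tendency_type_from_label(label: int) -> str:
    """Translate the second model's classification label into a tendency type."""
    try:
        return LABEL_TO_TENDENCY_TYPE[label]
    except KeyError:
        # Assumption: a label outside the correspondence is treated as an error.
        raise ValueError(f"unknown classification label: {label}") from None


positive = tendency_type_from_label(1)
negative = tendency_type_from_label(-1)
```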
As an implementation of this embodiment of the present invention, as shown in Fig. 3, the training method of the classification model may include the following steps.
Here the classification model may be the above first classification model or the above second classification model; that is, in this embodiment the first classification model and the second classification model can be trained in the same way.
S301: obtain an initial classification model;
First, the electronic device obtains an initial classification model, whose parameters may be randomly initialized, which is not specifically limited here. The initial classification model may be a machine learning model such as an SVM, a convolutional neural network or a recurrent neural network, which is likewise not specifically limited here.
S302: determine the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
The electronic device can obtain a large amount of text information in advance as text information samples. To train the initial classification model, the electronic device needs to determine the emotion vector sample corresponding to each text information sample, where an emotion vector sample is a vector that characterizes the emotional intensity of the words included in the corresponding text information sample.
S303: determine the emotion label corresponding to each text information sample according to the semantics of that sample;
Because the semantics of a text information sample indicate the emotional tendency it expresses, the electronic device can determine the emotion label corresponding to each text information sample according to its semantics.
In the first case, the classification model is the above first classification model. Since the first classification model is used to determine whether the text information to be classified has an emotional tendency, the emotion label determined for a text information sample indicates whether that sample has an emotional tendency. For example, there may be three classes of emotion labels (has emotion, neutral emotion and no emotion) or two classes (has emotion and no emotion); both are reasonable. Emotion labels can be represented by preset numbers, letters and the like, which is not specifically limited here.
For example, if the semantics of a text information sample are "I don't like this currency", the electronic device can determine that its emotion label is the has-emotion label; if the semantics are "I have no opinion on the prospects of this currency", its emotion label is the neutral-emotion label; and if the semantics are "What is this", its emotion label is the no-emotion label.
In the second case, the classification model is the above second classification model. Since the second classification model is used to determine the emotional tendency type of the text information to be classified, the emotion label determined for a text information sample indicates its emotional tendency type. For example, there may be two classes of emotion labels (positive emotion and negative emotion) or four classes (strong positive emotion, general positive emotion, strong negative emotion and general negative emotion); both are reasonable. Emotion labels can be represented by preset numbers, letters and the like, which is not specifically limited here.
For example, if the semantics of a text information sample are "I don't like this currency", the electronic device can determine that its emotion label is the negative-emotion label; if the semantics are "The prospects of this currency are good", its emotion label is the positive-emotion label.
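To illustrate the two labelling cases, the following Python sketch derives a label set for each model from the same annotated samples. The annotation strings, the numeric label values and the function names are hypothetical. Training the second model only on samples that have an emotional tendency is an assumption of this sketch; the embodiment does not state how the samples for the two models are selected.

```python
from typing import List, Tuple

# Hypothetical annotations derived from the semantics of each sample.
ANNOTATED_SAMPLES: List[Tuple[str, str]] = [
    ("I don't like this currency", "negative"),
    ("The prospects of this currency are good", "positive"),
    ("What is this", "none"),
]


def first_model_labels(samples: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """First case: label 1 if the sample has an emotional tendency, else 0."""
    return [(text, 0 if tendency == "none" else 1) for text, tendency in samples]


def second_model_labels(samples: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Second case: label 1 for positive emotion and -1 for negative emotion."""
    mapping = {"positive": 1, "negative": -1}
    return [(text, mapping[tendency]) for text, tendency in samples
            if tendency in mapping]  # assumption: skip samples without tendency


first_labels = first_model_labels(ANNOTATED_SAMPLES)
second_labels = second_model_labels(ANNOTATED_SAMPLES)
```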
S304: adjust the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, and then stop training to obtain the classification model.
After the emotion vector samples and their corresponding emotion labels have been determined, the electronic device can input each emotion vector sample into the initial classification model, which classifies the emotion vector sample based on its current parameters and outputs the corresponding prediction label.
Before the parameters of the initial classification model have been optimized, the prediction label it outputs may differ from the emotion label of the emotion vector sample. The electronic device can adjust the parameters of the initial classification model based on the difference between each prediction label and the emotion label of the corresponding emotion vector sample, so that the prediction labels become more and more accurate. The parameters can be adjusted by a gradient descent algorithm, a stochastic gradient descent algorithm or the like, which is not specifically limited or described here.
As the parameters are continually adjusted, the initial classification model gradually learns the relationship between an emotion vector sample and the emotional intensity it represents, that is, the correspondence between emotion vector samples and emotion labels, so its output becomes more and more accurate.
When the number of iterations of the initial classification model reaches the preset number, or the accuracy of the prediction labels it outputs reaches the preset value, the initial classification model is able to process emotion vectors and produce accurate emotion labels. Training can then stop, and the initial classification model at that point can be used as the above classification model.
As can be seen, in this embodiment the electronic device can obtain the above first classification model and the above second classification model through the above training method. The training method yields classification models that output accurate emotion labels, which further improves the accuracy of the classification result for the text information to be classified.
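The stopping rule of step S304 (stop at a preset number of iterations or at a preset accuracy, whichever comes first) can be sketched with a small training loop in Python. A plain perceptron update is used here purely as a stand-in for the parameter adjustment; the embodiment mentions SVMs, neural networks and gradient descent but does not prescribe this update rule. Starting from zero parameters instead of random ones, the ±1 labels and all names are likewise assumptions of this sketch.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return the prediction label 1 or -1 for one emotion vector sample."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def accuracy(weights, bias, samples, labels) -> float:
    correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return correct / len(samples)


def train(samples: List[List[float]], labels: List[int],
          max_iterations: int = 100, target_accuracy: float = 1.0,
          learning_rate: float = 0.1) -> Tuple[List[float], float, int]:
    weights = [0.0] * len(samples[0])  # zero start keeps the sketch deterministic
    bias = 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:  # prediction differs from label
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
        if accuracy(weights, bias, samples, labels) >= target_accuracy:
            break  # preset accuracy reached before the preset iteration count
    return weights, bias, iteration


X = [[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]]
y = [1, 1, -1, -1]
w, b, used_iterations = train(X, y)
```

The loop returns the number of iterations actually used, so a caller can tell which of the two stopping conditions ended training.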
As an implementation of this embodiment of the present invention, as shown in Fig. 4, the above step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance may include:
S401: perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples included in each text information sample;
First, to convert the text information samples obtained in advance into a data type the electronic device can process, the electronic device can perform word segmentation on them and thereby obtain the segmented-word samples included in each text information sample. The word segmentation can be performed in the same way as the word segmentation of the text information to be classified described above, which is not repeated here.
S402: determine the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
The correspondence between words and word vectors can be determined in the same way as described for step S102 above, which is not repeated here. The electronic device can determine the word vector sample corresponding to each segmented-word sample by querying this correspondence.
S403: determine the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule;
Next, the electronic device can determine the emotion degree weight corresponding to each word vector sample. To make the trained classification model better suited to classifying the above text information to be classified, the predetermined weight setting rule can be the same as the weight setting rule in step S103 above. Based on this rule, the electronic device can determine the emotion degree weight corresponding to each word vector sample.
S404: determine the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
Similarly, to make the trained classification model better suited to classifying the above text information to be classified, the electronic device can determine the emotion vector sample of each text information sample in the same way as it determines the emotion vector of the text information to be classified. The two ways may also differ, which does not affect the classification result of the text information to be classified.
As can be seen, in this embodiment the electronic device performs word segmentation on the text information samples obtained in advance to obtain the segmented-word samples included in each text information sample, determines the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors, determines the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule, and then determines the emotion vector sample corresponding to each text information sample from the word vector samples and emotion degree weights. In this way the emotion vector sample of each text information sample can be determined accurately, so that the trained classification model is better suited to classifying the above text information to be classified and the classification results are more accurate.
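Steps S401 to S404 can be chained into one small Python function that turns a text information sample into an emotion vector sample. This is a sketch under several assumptions: whitespace splitting stands in for a real word segmentation tool, the word vector table and weights are made-up values, and words missing from the table are skipped. None of these details come from the embodiment.

```python
from typing import Dict, List

# Hypothetical predetermined correspondence between words and word vectors.
WORD_VECTORS: Dict[str, List[float]] = {
    "i": [0.0, 0.0],
    "love": [1.0, 0.5],
    "this": [0.0, 0.0],
    "project": [0.5, 0.5],
}
# Hypothetical weight setting rule: listed words are strong, others default.
WORD_WEIGHTS: Dict[str, float] = {"love": 3.0}
DEFAULT_WEIGHT = 1.0


def segment(text: str) -> List[str]:
    """S401: stand-in segmentation by lowercasing and splitting on whitespace."""
    return text.lower().split()


def emotion_vector_sample(text: str) -> List[float]:
    # S402: keep only words that have a word vector (assumption of this sketch).
    words = [w for w in segment(text) if w in WORD_VECTORS]
    if not words:
        raise ValueError("no known words in text sample")
    dim = len(WORD_VECTORS[words[0]])
    total = [0.0] * dim
    for word in words:
        weight = WORD_WEIGHTS.get(word, DEFAULT_WEIGHT)  # S403
        for k, value in enumerate(WORD_VECTORS[word]):
            total[k] += weight * value
    return [t / len(words) for t in total]               # S404: average


sample = emotion_vector_sample("I love this project")
```

Using the same segmentation, table and weights at training time and at classification time mirrors the recommendation in the text that the two procedures be identical.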
Corresponding to the above text information classification method, an embodiment of the present invention further provides a text information classification device.
The text information classification device provided by this embodiment of the present invention is described below.
As shown in Fig. 5, a text information classification device includes:
a to-be-classified text information obtaining module 510, configured to obtain text information to be classified and perform word segmentation on it to obtain segmented words;
a target word vector determining module 520, configured to determine the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
an emotion degree weight determining module 530, configured to determine the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
an emotion vector determining module 540, configured to determine the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
an emotional tendency determining module 550, configured to input the emotion vector into the first classification model trained in advance by a model training module and determine whether the text information to be classified has an emotional tendency;
where the first classification model includes a correspondence between emotion vectors and emotional tendency; and
a classification type determining module 560, configured to, if the text information to be classified has an emotional tendency, input the emotion vector into the second classification model trained in advance by the model training module and determine the emotional tendency type of the text information to be classified;
where the second classification model includes a correspondence between emotion vectors and emotional tendency types.
As can be seen, in the scheme provided by this embodiment of the present invention, the electronic device first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it contains. It determines the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule. From the target word vector and emotion degree weight of each segmented word, it determines the emotion vector corresponding to the text information to be classified. It then inputs the emotion vector into the pre-trained first classification model, which includes a correspondence between emotion vectors and emotional tendency, to determine whether the text information to be classified has an emotional tendency. If it does, the emotion vector is input into the pre-trained second classification model, which includes a correspondence between emotion vectors and emotional tendency types, to determine the emotional tendency type of the text information to be classified. In this way, the electronic device can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.
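One possible way to mirror the module structure of Fig. 5 in software is a single Python class whose fields and methods correspond to modules 510 to 560. This is only a sketch: the class name, the use of injected callables for the segmenter, the weight rule and the two models, and the toy values in the usage lines are all hypothetical, and the sketch assumes every segmented word has an entry in the word vector table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextClassificationDevice:
    segment: Callable[[str], List[str]]          # role of module 510
    word_vectors: Dict[str, List[float]]         # looked up by module 520
    weight_of: Callable[[str], float]            # role of module 530
    first_model: Callable[[List[float]], bool]   # consulted by module 550
    second_model: Callable[[List[float]], str]   # consulted by module 560

    def emotion_vector(self, words: List[str]) -> List[float]:
        """Role of module 540: weighted average of the target word vectors."""
        dim = len(self.word_vectors[words[0]])
        return [
            sum(self.weight_of(w) * self.word_vectors[w][k] for w in words) / len(words)
            for k in range(dim)
        ]

    def classify(self, text: str) -> Optional[str]:
        words = self.segment(text)
        if not words:
            return None  # assumption: empty input has no emotional tendency
        vector = self.emotion_vector(words)
        if not self.first_model(vector):
            return None
        return self.second_model(vector)


device = TextClassificationDevice(
    segment=str.split,
    word_vectors={"good": [1.0], "bad": [-1.0], "it": [0.0]},
    weight_of=lambda w: 1.0 if w == "it" else 2.0,
    first_model=lambda v: abs(v[0]) > 0.1,
    second_model=lambda v: "positive emotion" if v[0] > 0 else "negative emotion",
)
```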
As an implementation of this embodiment of the present invention, as shown in Fig. 6, the above emotion degree weight determining module 530 may include:
an emotion dictionary determining unit 5301, configured to determine the emotion dictionary to which each segmented word belongs, where the emotion dictionary is a pre-established dictionary composed of emotion words; and
an emotion degree weight determining unit 5302, configured to determine the default weight corresponding to the determined emotion dictionary as the emotion degree weight of the segmented words belonging to that emotion dictionary.
As an implementation of this embodiment of the present invention, as shown in Fig. 7, the above emotion vector determining module 540 may include:
an emotion vector determining unit 5401, configured to determine the emotion vector $\mathrm{vector}$ corresponding to the text information to be classified according to the formula $\mathrm{vector} = \frac{1}{n}\sum_{i=1}^{n} w_i \cdot \mathrm{vec}_i$;
where $n$ is the number of segmented words included in the text information to be classified, $w_i$ is the emotion degree weight of segmented word $i$, and $\mathrm{vec}_i$ is the target word vector of segmented word $i$.
As an implementation of this embodiment of the present invention, as shown in Fig. 8, the above classification type determining module 550 may include:
a classification label determining unit 5501, configured to input the emotion vector into the pre-trained second classification model and obtain the classification label output by the second classification model; and
a classification type determining unit 5502, configured to determine, based on a predetermined correspondence between classification labels and emotional tendency types, that the emotional tendency type of the text information to be classified is positive emotion or negative emotion.
As an implementation of this embodiment of the present invention, as shown in Fig. 9, the above model training module may include:
an initial classification model obtaining unit 901, configured to obtain an initial classification model;
an emotion vector sample determining unit 902, configured to determine the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
an emotion label determining unit 903, configured to determine the emotion label corresponding to each text information sample according to the semantics of that sample; and
a parameter adjusting unit 904, configured to adjust the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, and then stop training to obtain the classification model;
where the classification model includes the first classification model and the second classification model.
As an implementation of this embodiment of the present invention, as shown in Fig. 10, the above emotion vector sample determining unit 902 may include:
a segmented-word sample obtaining subunit 9021, configured to perform word segmentation on the plurality of text information samples obtained in advance to obtain the segmented-word samples included in each text information sample;
a word vector sample obtaining subunit 9022, configured to determine the word vector sample corresponding to each segmented-word sample according to the predetermined correspondence between words and word vectors;
an emotion degree weight determining subunit 9023, configured to determine the emotion degree weight corresponding to each word vector sample based on the predetermined weight setting rule; and
an emotion vector sample determining subunit 9024, configured to determine the emotion vector sample corresponding to each text information sample according to the word vector sample and emotion degree weight corresponding to each segmented-word sample.
An embodiment of the present invention further provides an electronic device. As shown in Fig. 11, the electronic device may include a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 communicate with one another through the communication bus 1104;
the memory 1103 is configured to store a computer program;
the processor 1101 is configured to implement the following steps when executing the program stored in the memory 1103:
obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
determining the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors;
determining the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule;
determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight corresponding to each segmented word;
inputting the emotion vector into the pre-trained first classification model, and determining whether the text information to be classified has an emotional tendency;
where the first classification model includes a correspondence between emotion vectors and emotional tendency; and
if the text information to be classified has an emotional tendency, inputting the emotion vector into the pre-trained second classification model, and determining the emotional tendency type of the text information to be classified;
where the second classification model includes a correspondence between emotion vectors and emotional tendency types.
As can be seen, in the scheme provided by this embodiment of the present invention, the electronic device first obtains the text information to be classified and performs word segmentation on it to obtain the segmented words it contains. It determines the target word vector corresponding to each segmented word according to a predetermined correspondence between words and word vectors, and then determines the emotion degree weight corresponding to each segmented word based on a predetermined weight setting rule. From the target word vector and emotion degree weight of each segmented word, it determines the emotion vector corresponding to the text information to be classified. It then inputs the emotion vector into the pre-trained first classification model, which includes a correspondence between emotion vectors and emotional tendency, to determine whether the text information to be classified has an emotional tendency. If it does, the emotion vector is input into the pre-trained second classification model, which includes a correspondence between emotion vectors and emotional tendency types, to determine the emotional tendency type of the text information to be classified. In this way, the electronic device can accurately determine the emotional tendency type of the text information to be classified, which facilitates subsequent research and processing.
The communication bus of the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the above electronic device and other devices.
The memory may include random access memory (Random Access Memory, RAM) and may also include non-volatile memory (Non-Volatile Memory, NVM), for example at least one magnetic disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The above processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP) and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The above step of determining the emotion degree weight corresponding to each segmented word based on the predetermined weight setting rule may include:
Determining the sentiment dictionary to which each segmented word belongs;
where a sentiment dictionary is a pre-established dictionary composed of various emotion words.
Determining the preset weight corresponding to the identified sentiment dictionary as the emotion degree weight of the segmented word that belongs to that sentiment dictionary.
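The dictionary-based weight lookup can be sketched as follows. The dictionary contents, the weight values, and the fallback `DEFAULT_WEIGHT` for words found in no sentiment dictionary are all assumptions made for illustration; the text does not specify them.

```python
from typing import List, Set, Tuple

# Hypothetical sentiment dictionaries, each paired with a preset weight.
SENTIMENT_DICTIONARIES: List[Tuple[Set[str], float]] = [
    ({"love", "hate"}, 3.0),     # assumed "strong emotion" dictionary
    ({"like", "dislike"}, 2.0),  # assumed "mild emotion" dictionary
]

# Assumption: words that belong to no sentiment dictionary get this weight.
DEFAULT_WEIGHT = 1.0


def emotion_degree_weight(word: str) -> float:
    """Return the preset weight of the sentiment dictionary containing the word."""
    for words, weight in SENTIMENT_DICTIONARIES:
        if word in words:
            return weight
    return DEFAULT_WEIGHT


if __name__ == "__main__":
    for token in ("love", "like", "table"):
        print(token, emotion_degree_weight(token))
```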
The above step of determining the emotion vector corresponding to the text information to be classified according to the target word vector and emotion degree weight of each segmented word may include:
According to the formula [formula not reproduced in this text], determining the emotion vector `vector` corresponding to the text information to be classified;
where n is the number of segmented words contained in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
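Because the formula itself does not survive in this text, the following is only one plausible reading of the definitions of n, w_i and vec_i: a weighted mean of the word vectors. The choice of dividing by n (rather than, say, taking a plain weighted sum) is an assumption, as is the function name.

```python
from typing import List, Sequence


def emotion_vector(
    weights: Sequence[float], vectors: Sequence[Sequence[float]]
) -> List[float]:
    """Combine word vectors vec_i with weights w_i.

    Assumption: the combination is (1/n) * sum_i(w_i * vec_i); the exact
    formula is not reproduced in the source text.
    """
    if not vectors or len(weights) != len(vectors):
        raise ValueError("need one weight per word vector, and at least one word")
    n = len(vectors)
    dim = len(vectors[0])
    total = [0.0] * dim
    for weight, vec in zip(weights, vectors):
        if len(vec) != dim:
            raise ValueError("all word vectors must have the same dimension")
        for k, component in enumerate(vec):
            total[k] += weight * component
    return [component / n for component in total]


if __name__ == "__main__":
    print(emotion_vector([2.0, 1.0], [[1.0, 0.0], [0.0, 2.0]]))
```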
The above step of inputting the emotion vector into the pre-trained second classification model and determining the sentiment orientation type of the text information to be classified may include:
Inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
Determining, based on a predetermined correspondence between classification labels and sentiment orientation types, whether the sentiment orientation type of the text information to be classified is positive emotion or negative emotion.
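The mapping from classification label to sentiment orientation type amounts to a lookup table. In this sketch the integer labels 0 and 1 and the name `LABEL_TO_ORIENTATION` are hypothetical; the text only says that the correspondence is predetermined.

```python
# Hypothetical label encoding; the actual labels are not given in the text.
LABEL_TO_ORIENTATION = {
    0: "negative emotion",
    1: "positive emotion",
}


def orientation_from_label(label: int) -> str:
    """Translate the second model's output label into an orientation type."""
    if label not in LABEL_TO_ORIENTATION:
        raise ValueError(f"unknown classification label: {label!r}")
    return LABEL_TO_ORIENTATION[label]


if __name__ == "__main__":
    print(orientation_from_label(1))
```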
The training method of the classification model may include:
Obtaining an initial classification model;
Determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
Determining, according to the semantics of each text information sample, the emotion label corresponding to that text information sample;
Adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, and then stopping training to obtain the classification model.
The classification model includes the first classification model and the second classification model.
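The two stopping conditions (a preset number of iterations, or a preset prediction accuracy) can be illustrated with a toy training loop. The text does not name a model type, so this sketch substitutes a simple perceptron; the perceptron, the learning rate, and all names here are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Binary prediction of a linear model: 1 if the score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_classifier(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_iterations: int = 100,      # the "preset number" of iterations
    target_accuracy: float = 1.0,   # the "preset value" of accuracy
    learning_rate: float = 0.1,     # assumption, not from the source
) -> Tuple[List[float], float, int]:
    """Train until the iteration cap or the accuracy target is reached."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    iterations = 0
    while iterations < max_iterations:
        correct = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
        if correct / len(samples) >= target_accuracy:
            break  # accuracy of predicted labels reached the preset value
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error:
                for k, xk in enumerate(x):
                    weights[k] += learning_rate * error * xk
                bias += learning_rate * error
        iterations += 1
    return weights, bias, iterations


if __name__ == "__main__":
    data = [[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -0.5]]
    print(train_classifier(data, [1, 1, 0, 0]))
```

The same loop could be run twice with different labels to obtain the first model (orientation present or absent) and the second model (positive or negative).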
The above step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance may include:
Performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples contained in each text information sample;
Determining, according to the predetermined correspondence between words and word vectors, the word vector sample corresponding to each segmented word sample;
Determining, based on the predetermined weight setting rule, the emotion degree weight corresponding to each word vector sample;
Determining, according to the word vector sample and emotion degree weight corresponding to each segmented word sample, the emotion vector sample corresponding to each text information sample.
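The sample-preparation steps above can be sketched end to end. Everything concrete here is assumed for illustration: whitespace splitting stands in for a real word segmenter, the tiny `WORD_VECTORS` and `WORD_WEIGHTS` tables are invented, out-of-vocabulary words are skipped, and the weighted mean is only one possible reading of the formula that is not reproduced in this text.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical word-to-vector correspondence and dictionary weights.
WORD_VECTORS: Dict[str, List[float]] = {
    "great": [1.0, 0.0],
    "awful": [-1.0, 0.0],
    "movie": [0.0, 1.0],
}
WORD_WEIGHTS: Dict[str, float] = {"great": 2.0, "awful": 2.0}
DEFAULT_WEIGHT = 1.0  # assumption for words in no sentiment dictionary


def emotion_vector_sample(text: str) -> List[float]:
    """Segment a text sample, look up vectors and weights, and combine them."""
    # Whitespace splitting is a stand-in for a real word segmenter (assumption);
    # words without a known vector are skipped (also an assumption).
    tokens = [t for t in text.lower().split() if t in WORD_VECTORS]
    if not tokens:
        raise ValueError("no known words in sample")
    dim = len(WORD_VECTORS[tokens[0]])
    total = [0.0] * dim
    for token in tokens:
        weight = WORD_WEIGHTS.get(token, DEFAULT_WEIGHT)
        for k, component in enumerate(WORD_VECTORS[token]):
            total[k] += weight * component
    return [component / len(tokens) for component in total]


def build_training_set(
    labelled_texts: Sequence[Tuple[str, int]]
) -> List[Tuple[List[float], int]]:
    """Pair each emotion vector sample with its manually assigned label."""
    return [(emotion_vector_sample(text), label) for text, label in labelled_texts]


if __name__ == "__main__":
    print(build_training_set([("great movie", 1), ("awful movie", 0)]))
```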
An embodiment of the present invention also provides a computer-readable storage medium that stores a computer program which, when executed by a processor, implements the text information classification method described in any of the above embodiments.
It can be seen that, in the solution provided by this embodiment of the present invention, when the computer program is executed by a processor, the text information to be classified is first obtained and word segmentation is performed on it to obtain the segmented words it contains. According to a predetermined correspondence between words and word vectors, the target word vector corresponding to each segmented word is determined; then, based on a predetermined weight setting rule, the emotion degree weight corresponding to each segmented word is determined. From the target word vector and emotion degree weight of each segmented word, the emotion vector corresponding to the text information to be classified is determined, and that emotion vector is input into a pre-trained first classification model to determine whether the text information has a sentiment orientation, where the first classification model contains the correspondence between emotion vectors and sentiment orientation. If the text information to be classified has a sentiment orientation, the emotion vector is input into a pre-trained second classification model to determine the sentiment orientation type of the text information, where the second classification model contains the correspondence between emotion vectors and sentiment orientation types. In this way, the sentiment orientation type of the text information to be classified can be determined accurately, which facilitates subsequent research and processing.
It should be noted that, for the above apparatus, electronic device and computer-readable storage medium embodiments, since they are substantially similar to the corresponding method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
It should further be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A method for classifying text information, characterized in that the method comprises:
Obtaining text information to be classified, and performing word segmentation on the text information to be classified to obtain segmented words;
Determining, according to a predetermined correspondence between words and word vectors, the target word vector corresponding to each segmented word;
Determining, based on a predetermined weight setting rule, the emotion degree weight corresponding to each segmented word;
Determining, according to the target word vector and emotion degree weight corresponding to each segmented word, the emotion vector corresponding to the text information to be classified;
Inputting the emotion vector into a pre-trained first classification model to determine whether the text information to be classified has a sentiment orientation, wherein the first classification model contains the correspondence between emotion vectors and sentiment orientation;
If the text information to be classified has a sentiment orientation, inputting the emotion vector into a pre-trained second classification model to determine the sentiment orientation type of the text information to be classified, wherein the second classification model contains the correspondence between emotion vectors and sentiment orientation types.
2. The method according to claim 1, characterized in that the step of determining, based on the predetermined weight setting rule, the emotion degree weight corresponding to each segmented word comprises:
Determining the sentiment dictionary to which each segmented word belongs, wherein the sentiment dictionary is a pre-established dictionary composed of various emotion words;
Determining the preset weight corresponding to the identified sentiment dictionary as the emotion degree weight of the segmented word that belongs to that sentiment dictionary.
3. The method according to claim 1, characterized in that the step of determining, according to the target word vector and emotion degree weight corresponding to each segmented word, the emotion vector corresponding to the text information to be classified comprises:
According to the formula [formula not reproduced in this text], determining the emotion vector `vector` corresponding to the text information to be classified;
Wherein n is the number of segmented words contained in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
4. The method according to claim 1, characterized in that the step of inputting the emotion vector into the pre-trained second classification model to determine the sentiment orientation type of the text information to be classified comprises:
Inputting the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
Determining, based on a predetermined correspondence between classification labels and sentiment orientation types, whether the sentiment orientation type of the text information to be classified is positive emotion or negative emotion.
5. The method according to any one of claims 1-4, characterized in that the training method of the classification model comprises:
Obtaining an initial classification model;
Determining the emotion vector samples corresponding to a plurality of text information samples obtained in advance;
Determining, according to the semantics of each text information sample, the emotion label corresponding to that text information sample;
Adjusting the parameters of the initial classification model based on the emotion vector samples and their corresponding emotion labels, until the number of iterations of the initial classification model reaches a preset number, or the accuracy of the prediction labels output by the initial classification model reaches a preset value, and then stopping training to obtain the classification model; wherein the classification model includes the first classification model and the second classification model.
6. The method according to claim 5, characterized in that the step of determining the emotion vector samples corresponding to the plurality of text information samples obtained in advance comprises:
Performing word segmentation on the plurality of text information samples obtained in advance to obtain the segmented word samples contained in each text information sample;
Determining, according to the predetermined correspondence between words and word vectors, the word vector sample corresponding to each segmented word sample;
Determining, based on the predetermined weight setting rule, the emotion degree weight corresponding to each word vector sample;
Determining, according to the word vector sample and emotion degree weight corresponding to each segmented word sample, the emotion vector sample corresponding to each text information sample.
7. An apparatus for classifying text information, characterized in that the apparatus comprises:
A to-be-classified text information obtaining module, configured to obtain text information to be classified and to perform word segmentation on the text information to be classified to obtain segmented words;
A target word vector determining module, configured to determine, according to a predetermined correspondence between words and word vectors, the target word vector corresponding to each segmented word;
An emotion degree weight determining module, configured to determine, based on a predetermined weight setting rule, the emotion degree weight corresponding to each segmented word;
An emotion vector determining module, configured to determine, according to the target word vector and emotion degree weight corresponding to each segmented word, the emotion vector corresponding to the text information to be classified;
A sentiment orientation determining module, configured to input the emotion vector into a first classification model trained in advance by a model training module, and to determine whether the text information to be classified has a sentiment orientation, wherein the first classification model contains the correspondence between emotion vectors and sentiment orientation;
A classification type determining module, configured to, if the text information to be classified has a sentiment orientation, input the emotion vector into a second classification model trained in advance by the model training module, and to determine the sentiment orientation type of the text information to be classified, wherein the second classification model contains the correspondence between emotion vectors and sentiment orientation types.
8. The apparatus according to claim 7, characterized in that the emotion degree weight determining module comprises:
A sentiment dictionary determining unit, configured to determine the sentiment dictionary to which each segmented word belongs, wherein the sentiment dictionary is a pre-established dictionary composed of various emotion words;
An emotion degree weight determining unit, configured to determine the preset weight corresponding to the identified sentiment dictionary as the emotion degree weight of the segmented word that belongs to that sentiment dictionary.
9. The apparatus according to claim 7, characterized in that the emotion vector determining module comprises:
An emotion vector determining unit, configured to determine, according to the formula [formula not reproduced in this text], the emotion vector `vector` corresponding to the text information to be classified;
Wherein n is the number of segmented words contained in the text information to be classified, w_i is the emotion degree weight corresponding to segmented word i, and vec_i is the target word vector corresponding to segmented word i.
10. The apparatus according to claim 7, characterized in that the classification type determining module comprises:
A classification label determining unit, configured to input the emotion vector into the pre-trained second classification model to obtain the classification label output by the second classification model;
A classification type determining unit, configured to determine, based on a predetermined correspondence between classification labels and sentiment orientation types, whether the sentiment orientation type of the text information to be classified is positive emotion or negative emotion.
CN201910568734.2A 2019-06-27 2019-06-27 Text information classification method and device and electronic equipment Pending CN110309308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910568734.2A CN110309308A (en) 2019-06-27 2019-06-27 Text information classification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN110309308A true CN110309308A (en) 2019-10-08

Family

ID=68076813

Country Status (1)

Country Link
CN (1) CN110309308A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191008)