CN107807914A - Recognition methods, object classification method and the data handling system of Sentiment orientation - Google Patents


Publication number
CN107807914A
Authority
CN
China
Prior art keywords
short text
pending
emotion degree
text
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610812853.4A
Other languages
Chinese (zh)
Inventor
潘林林
赵争超
林君
肖谦
张昌
张一昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610812853.4A (CN107807914A)
Priority to TW106123845A (TW201812615A)
Priority to PCT/CN2017/100060 (WO2018045910A1)
Publication of CN107807914A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

This application provides a sentiment orientation recognition method, an object classification method, and a data processing system. The emotion degree estimation model built in the sentiment orientation recognition method takes full account of the category to which a short text belongs, so the sentiment orientation determined with the model is more accurate. In addition, the object classification method provided by this application uses the text feature information, image feature information, and other feature information of an object together as the basis for classification, which improves classification accuracy.

Description

Sentiment orientation recognition method, object classification method, and data processing system
Technical field
This application relates to the technical field of data processing, and in particular to a sentiment orientation recognition method, an object classification method, and a data processing system.
Background
At present, many technical fields involve the problem of classifying objects. Typically, an object is classified according to its text into one of two classes: a first class or a second class. The text of an object can be split at punctuation marks into multiple short texts.
Because Chinese words are rich in meaning, the same short text may correspond to different classes in different contexts. For example, take user reviews of clothes as the objects: the first user review is "the clothes are light in color, just right", and the second user review is "the clothes are light in color, not vivid". Both objects contain the same short text, "the clothes are light in color". If classification relied on the text alone, the two short texts would be put in the same class, although they ought to correspond to different classes.
In these different contexts, "the clothes are light in color" in the first user review corresponds to positive emotion and ought to be assigned to the first class, whereas "the clothes are light in color" in the second user review corresponds to negative emotion and ought to be assigned to the second class. Therefore, the class of an object is currently usually determined from the sentiment orientation of its short texts.
Traditionally, the sentiment orientation of a short text is determined by manual inspection. Manual labeling is fairly accurate but inefficient, so it cannot be applied to processing short texts in batches.
Summary of the invention
In the course of research, the applicant found that a processor can be used to identify the sentiment orientation of a short text automatically. A specific implementation may be as follows:
Before the processor runs, an emotion dictionary is built. The emotion dictionary contains many positive words, for example "clothes", "big screen", "beautiful", "fast", "suitable", and "pretty", and many negative words, for example "clothes", "ugly", "slow", and "small screen".
To process a pending object, the object is first split at punctuation marks. The text between two adjacent punctuation marks is one short text, so the pending object is split into several pending short texts. For example, "the clothes fit very well, my mother likes them very much" yields two short texts after splitting: "the clothes fit very well" and "my mother likes them very much". Each short text of the pending object is a pending short text.
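The punctuation-based splitting described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `split_short_texts` and the set of punctuation marks are assumptions, since the source does not enumerate which marks count as boundaries.

```python
import re

# Punctuation treated as short-text boundaries. This set (common Chinese and
# English marks) is an assumption; the source does not list the marks used.
_BOUNDARY = re.compile(r"[,.!?;:，。！？；：、]+")

def split_short_texts(text: str) -> list[str]:
    """Split an object's text into short texts at punctuation marks."""
    parts = _BOUNDARY.split(text)
    # Drop empty fragments, e.g. the one after a trailing full stop.
    return [part.strip() for part in parts if part.strip()]

short_texts = split_short_texts(
    "The clothes fit very well, my mother likes them very much."
)
print(short_texts)
```

Each element of the returned list plays the role of one pending short text in the steps that follow.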
Fig. 1 is a flowchart of how the processor determines the sentiment orientation of a pending short text. The procedure comprises the following steps:
Step 1: The processor performs word segmentation on the pending short text to obtain a word segmentation result.
According to a preset word segmentation rule, the pending short text is divided into several words; these words are the word segmentation result.
For example, if the pending short text is "the clothes are very suitable", the result after segmentation is "clothes", "very", and "suitable". If the pending short text is "the mobile phone screen is very big", the word segmentation result is "mobile phone", "screen", "very", and "big".
Because word segmentation of the pending short text is not the focus of this application, the specific implementation of the preset word segmentation rule is not described in detail here.
Step 2: Match the word segmentation result against the emotion dictionary according to emotion matching rules.
Step 3: Determine the sentiment orientation corresponding to the pending short text.
The word segmentation result is matched against the emotion dictionary and the emotion rules. If an emotion word in the word segmentation result corresponds to positive emotion and the result contains no negation word, the short text is determined to correspond to positive emotion. If an emotion word in the word segmentation result corresponds to negative emotion and the result contains no negation word, the short text is determined to correspond to negative emotion.
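The dictionary-matching rule of Steps 2 and 3 can be sketched as below. The word lists, the function name `dictionary_orientation`, and the choice to return `None` when the rule does not apply are all illustrative assumptions; the source only states the two positive and negative conditions.

```python
# Illustrative dictionary entries (assumptions, not the applicant's dictionary).
POSITIVE_WORDS = {"beautiful", "fast", "suitable", "big"}
NEGATIVE_WORDS = {"ugly", "slow", "small"}
NEGATION_WORDS = {"not", "no", "never"}

def dictionary_orientation(tokens):
    """Return 'positive', 'negative', or None using only the dictionary rule."""
    if any(tok in NEGATION_WORDS for tok in tokens):
        # The described rule only covers short texts without a negation word.
        return None
    if any(tok in POSITIVE_WORDS for tok in tokens):
        return "positive"
    if any(tok in NEGATIVE_WORDS for tok in tokens):
        return "negative"
    return None

print(dictionary_orientation(["clothes", "very", "suitable"]))
print(dictionary_orientation(["screen", "very", "big"]))
print(dictionary_orientation(["clothes", "very", "big"]))
```

With these illustrative lists, "screen very big" and "clothes very big" receive the same label, because the rule looks only at the emotion word and never at the category of the object being reviewed.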
The processor can perform the procedure shown in Fig. 1 automatically and thereby determine the sentiment orientation of a pending short text without manual work. However, the applicant found in research that, although this automatic procedure can identify the sentiment orientation of a pending short text to some extent, the result may be inaccurate.
For example, take user reviews on Taobao as the objects. Taobao has many categories (such as clothing, electronic equipment, and mother-and-baby), and the items in each category have corresponding user reviews. The applicant found in research that, under different categories, short texts containing the same emotion word may correspond to different sentiment orientations.
For example, under the electronic equipment category, the short text "the screen is very big" has a positive sentiment orientation. Under the clothing category, the short text "the clothes are very big" has a negative sentiment orientation. In this example, the two short texts from two different categories both contain "very big", that is, the same emotion word, yet they have different sentiment orientations.
When the processor determines the sentiment orientation of a short text as in Fig. 1, it handles all objects in the same way. The existing procedure does not process short texts separately according to the category of the object, so the sentiment orientation determined in the prior art is inaccurate.
Therefore, this application provides a sentiment orientation recognition method that determines the sentiment orientation of a pending short text accurately.
To achieve the above purpose, this application provides the following technical features:
A sentiment orientation recognition method, comprising:
determining the category identifier corresponding to a pending short text, where the text between two adjacent punctuation marks of a text is called a short text;
determining the implementation of the emotion degree estimation model corresponding to the category identifier;
if the implementation is one emotion degree estimation model shared by all categories: determining the feature set corresponding to the pending short text, where each feature in the feature set includes a segmented word of the pending short text and the category identifier to which the pending short text belongs; performing emotion degree estimation on the pending short text according to the pre-trained emotion degree estimation model and the feature set of the pending short text, where the emotion degree estimation model is a model that is trained on several short text samples, labeled with sentiment orientations, from at least two categories, and that outputs a positive emotion degree and a negative emotion degree; and determining the sentiment orientation corresponding to the pending short text based on its positive emotion degree and negative emotion degree;
if the implementation is one emotion degree estimation model per category: determining the feature set corresponding to the pending short text, where each feature in the feature set includes a segmented word of the pending short text; performing emotion degree estimation on the pending short text according to the emotion degree estimation model corresponding to the category identifier and the feature set of the pending short text, where the emotion degree estimation model is a model that is trained on several short text samples, labeled with sentiment orientations, corresponding to the category identifier, and that outputs a positive emotion degree and a negative emotion degree; and determining the sentiment orientation corresponding to the pending short text based on its positive emotion degree and negative emotion degree.
Preferably, after the sentiment orientation corresponding to the pending short text is determined, the method further comprises:
outputting the sentiment orientation corresponding to the pending short text.
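The two branches of the method above differ only in which model is consulted and whether the category identifier is folded into each feature. A possible sketch of that dispatch follows; the registry structure, the `ModelEntry` type, the `"category|word"` feature encoding, and the function names are hypothetical, and the models are stand-in callables rather than trained estimators.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

Degrees = Tuple[float, float]  # (positive emotion degree, negative emotion degree)

class ModelEntry(NamedTuple):
    model: Callable[[List[str]], Degrees]
    shared: bool  # True: one model for all categories; False: one model per category

def build_features(tokens: List[str], category_id: str, shared: bool) -> List[str]:
    """Shared model: each feature carries the category identifier.
    Per-category model: each feature is just the segmented word."""
    if shared:
        return [f"{category_id}|{tok}" for tok in tokens]
    return list(tokens)

def estimate_degrees(tokens: List[str], category_id: str,
                     registry: Dict[str, ModelEntry]) -> Degrees:
    """Look up the model implementation for the category, then estimate."""
    entry = registry[category_id]
    features = build_features(tokens, category_id, entry.shared)
    return entry.model(features)
```

In this sketch, a caller would register the same shared model under every category identifier it covers, or a distinct model per identifier, and the lookup decides which feature form is built.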
A sentiment orientation recognition method, comprising:
determining the feature set corresponding to a pending short text, where the text between two adjacent punctuation marks of a text is called a short text, and each feature in the feature set includes a segmented word of the pending short text and the category identifier to which the pending short text belongs;
performing emotion degree estimation on the pending short text according to a pre-trained emotion degree estimation model and the feature set of the pending short text, where the emotion degree estimation model is a model that is trained on several short text samples, labeled with sentiment orientations, from at least two categories, and that outputs a positive emotion degree and a negative emotion degree;
determining the sentiment orientation corresponding to the pending short text based on the positive emotion degree and negative emotion degree corresponding to the pending short text.
Preferably, determining the feature set corresponding to the pending short text comprises:
obtaining the category identifier corresponding to the pending short text and the word segmentation result obtained by performing word segmentation on the pending short text;
combining each segmented word in the word segmentation result with the category identifier to obtain each feature;
taking the set of these features as the feature set of the pending short text.
Preferably, determining the feature set corresponding to the pending short text comprises:
obtaining the category identifier corresponding to the pending short text and the word segmentation result obtained by performing word segmentation on the pending short text;
combining each segmented word in the word segmentation result with the category identifier to obtain each feature;
performing feature combination on the features using an n-gram language model to obtain several combined features;
taking the set of the features and the combined features as the feature set of the pending short text.
Preferably, performing feature combination on the features using an n-gram language model to obtain several combined features comprises:
performing feature combination on the features using a bigram language model to obtain several combined features.
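One way to read the feature construction above is sketched here: each segmented word is paired with the category identifier, and bigram combination joins each pair of adjacent features. The string encodings (`"category|word"` and `"a+b"`) and the function names are assumptions made for illustration; the source does not specify how features are represented.

```python
def category_features(tokens, category_id):
    """Combine each segmented word with the category identifier."""
    return [f"{category_id}|{tok}" for tok in tokens]

def with_bigrams(features):
    """Append one combined feature per pair of adjacent features (bigrams)."""
    combined = [f"{a}+{b}" for a, b in zip(features, features[1:])]
    return features + combined

feature_set = with_bigrams(
    category_features(["screen", "very", "big"], "electronics")
)
for feature in feature_set:
    print(feature)
```

Three segmented words give three single features plus two combined features, so a word such as "big" contributes differently under "electronics" than under "clothing".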
Preferably, performing emotion degree estimation on the pending short text according to the pre-trained emotion degree estimation model and the feature set of the pending short text comprises:
inputting the feature set into the emotion degree estimation model;
after estimation by the emotion degree estimation model, outputting the positive emotion degree and negative emotion degree corresponding to the pending short text.
Preferably, determining the sentiment orientation corresponding to the pending short text based on the positive emotion degree and negative emotion degree corresponding to the pending short text comprises:
determining the larger of the positive emotion degree and the negative emotion degree;
judging whether the larger emotion degree is greater than a preset confidence level;
if the larger emotion degree is greater than the preset confidence level, determining that the sentiment orientation corresponding to the pending short text is the sentiment orientation of the larger emotion degree.
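The decision rule above reduces to a comparison followed by a threshold check. The sketch below assumes a confidence level of 0.6, returns `None` when the threshold is not exceeded, and breaks ties in favor of the positive degree; none of these three choices is stated in the source.

```python
from typing import Optional

def decide_orientation(positive: float, negative: float,
                       confidence: float = 0.6) -> Optional[str]:
    """Pick the larger emotion degree and accept it only above the threshold."""
    if positive >= negative:  # tie handling is an assumption
        larger, label = positive, "positive"
    else:
        larger, label = negative, "negative"
    return label if larger > confidence else None

print(decide_orientation(0.9, 0.1))
print(decide_orientation(0.2, 0.8))
print(decide_orientation(0.55, 0.45))
```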
Preferably, the emotion degree estimation model is:
a model that is obtained by training a maximum entropy model on the feature sets of several short texts corresponding to at least two category identifiers, and that outputs a positive emotion degree and a negative emotion degree.
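For two output classes, a maximum entropy model is equivalent to logistic regression over binary features, so a toy version can be trained with plain stochastic gradient updates. The sketch below is only an illustration of that idea: the training data, learning rate, epoch count, and function names are assumptions, and a production system would use an established maximum entropy trainer.

```python
import math
from collections import defaultdict

def train_maxent(samples, epochs=200, lr=0.5):
    """Train a two-class maximum entropy (logistic) model.

    samples: list of (feature_list, label), label 1 = positive, 0 = negative.
    Returns a dict-like mapping from feature to weight.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in samples:
            score = sum(weights[f] for f in features)
            p_pos = 1.0 / (1.0 + math.exp(-score))
            # Gradient of the log-likelihood for active binary features.
            for f in features:
                weights[f] += lr * (label - p_pos)
    return weights

def emotion_degrees(weights, features):
    """Return (positive emotion degree, negative emotion degree)."""
    score = sum(weights.get(f, 0.0) for f in features)
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return p_pos, 1.0 - p_pos

# Toy labeled short texts from two categories (illustrative data only).
samples = [
    (["electronics|screen", "electronics|big"], 1),
    (["clothing|clothes", "clothing|big"], 0),
    (["electronics|screen", "electronics|small"], 0),
    (["clothing|clothes", "clothing|suitable"], 1),
]
weights = train_maxent(samples)
print(emotion_degrees(weights, ["electronics|screen", "electronics|big"]))
print(emotion_degrees(weights, ["clothing|clothes", "clothing|big"]))
```

Because "big" is encoded together with its category identifier, the toy model can learn a positive weight for it under electronics and a negative weight under clothing.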
Preferably, after the sentiment orientation corresponding to the pending short text is determined, the method further comprises:
outputting the sentiment orientation corresponding to the pending short text.
A sentiment orientation recognition method, comprising:
determining the feature set and the category identifier corresponding to a pending short text, where the text between two adjacent punctuation marks of a text is called a short text, and each feature in the feature set includes a segmented word of the pending short text;
performing emotion degree estimation on the pending short text according to the emotion degree estimation model corresponding to the category identifier and the feature set of the pending short text, where the emotion degree estimation model is a model that is trained on several short text samples, labeled with sentiment orientations, corresponding to the category identifier, and that outputs a positive emotion degree and a negative emotion degree;
determining the sentiment orientation corresponding to the pending short text based on the positive emotion degree and negative emotion degree corresponding to the pending short text.
Preferably, determining the feature set corresponding to the pending short text comprises:
obtaining the word segmentation result produced by performing word segmentation on the pending short text;
performing word combination on the segmented words using an n-gram language model to obtain several combined words;
taking the set of the segmented words and the combined words as the feature set of the pending short text, where each word corresponds to one feature.
Preferably, determining the feature set corresponding to the pending short text comprises:
obtaining the word segmentation result produced by performing word segmentation on the pending short text;
taking the word segmentation result as the feature set of the pending short text, where each segmented word corresponds to one feature.
Preferably, after the sentiment orientation corresponding to the pending short text is determined, the method further comprises:
outputting the sentiment orientation corresponding to the pending short text.
A sentiment orientation recognition system, comprising:
a data providing device, configured to send several objects;
a processor, configured to receive the objects sent by the data providing device, build an emotion degree estimation model from the short texts of the objects, and determine the sentiment orientation of a pending short text using the emotion degree estimation model.
Preferably, the processor is further configured to build the correspondence between emotion degree estimation models and the category identifiers to which objects belong.
Preferably, the system further comprises a receiving device;
the processor is further configured to output the sentiment orientation of the pending short text;
the receiving device is configured to receive the sentiment orientation of the pending short text.
A sentiment orientation recognition system, comprising:
a data providing device, configured to send several objects;
a model building device, configured to receive the objects sent by the data providing device, build an emotion degree estimation model from the short texts of the objects, and send the emotion degree estimation model;
a processor, configured to receive the emotion degree estimation model and determine the sentiment orientation of a pending short text using the emotion degree estimation model.
Preferably, the model building device is further configured to build the correspondence between emotion degree estimation models and the category identifiers to which objects belong, and to send the correspondence to the processor.
Preferably, the system further comprises a receiving device;
the processor is further configured to output the sentiment orientation of the pending short text;
the receiving device is configured to receive the sentiment orientation of the pending short text.
An object classification method, comprising:
determining the feature information of a pending object, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text;
performing class recognition on the feature information of the pending object according to a pre-trained class recognition model, where the class recognition model is a classifier for a first class and a second class obtained by training on the feature information of several object samples.
Preferably, the feature information further includes:
feature information of a first subject that created the object; and/or
feature information of a second subject to which the object is attached.
Preferably, performing class recognition on the feature information according to the pre-trained class recognition model comprises:
inputting the feature information into the class recognition model, and determining the first-class matching degree and the second-class matching degree corresponding to the pending object;
comparing the first-class matching degree with the second-class matching degree;
if the first-class matching degree is greater than the second-class matching degree, determining that the class of the pending object is the first class;
if the second-class matching degree is greater than the first-class matching degree, determining that the class of the pending object is the second class.
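The comparison step above can be sketched as a small function. The name `recognize_class` and the `None` result for equal matching degrees are assumptions; the source does not say what happens when the two degrees are equal.

```python
from typing import Optional

def recognize_class(first_degree: float, second_degree: float) -> Optional[str]:
    """Compare the two matching degrees produced by the class recognition model."""
    if first_degree > second_degree:
        return "first"
    if second_degree > first_degree:
        return "second"
    return None  # equal degrees: behavior not specified in the source

print(recognize_class(0.8, 0.2))
print(recognize_class(0.3, 0.7))
```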
Preferably, the method further comprises:
after the pending object is determined to be of the first class, adding the pending object to an object set;
sending the objects in the object set.
Preferably, the method further comprises:
receiving multiple object samples, where the object samples come from the object set and satisfy preset rules;
adding the multiple object samples to the existing object samples used to train the class recognition model;
retraining the class recognition model based on the updated object samples.
A method for classifying user reviews, comprising:
determining the feature information of a pending user review, where the feature information includes the text feature information of the user review, the image feature information of the user review, the feature information of the seller, and the feature information of the buyer, and the text feature information includes the sentiment orientation of a short text;
performing class recognition on the feature information of the pending user review according to a pre-trained gradient boosting decision tree model, where this class recognition model is a classifier for first-class user reviews and second-class user reviews obtained by training on the feature information of several user review samples.
Preferably, the method further comprises:
after the pending user review is determined to be a first-class user review, adding the pending user review to a first-class user review set;
sending the first-class user review set.
Preferably, the method further comprises:
receiving multiple first-class user reviews, where the first-class user reviews come from the first-class user review set;
adding the multiple first-class user reviews to the existing user review samples of the class recognition model;
retraining the class recognition model based on the updated user review samples.
An object classification system, comprising:
a data providing device, configured to send several objects;
a processor, configured to receive the objects sent by the data providing device and obtain, by training on the feature information of the objects, a class recognition model that outputs a first class and a second class; to determine the feature information of a pending object, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text; to perform class recognition on the feature information of the pending object according to the class recognition model; and to output objects of the first class;
a data receiving device, configured to receive and use the objects of the first class.
An object classification system, comprising:
a data providing device, configured to send several objects;
a model building device, configured to receive the objects sent by the data providing device, obtain, by training on the feature information of the objects, a class recognition model that outputs a first class and a second class, and send the class recognition model;
a processor, configured to receive the class recognition model and determine the feature information of a pending object, where the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text; to perform class recognition on the feature information of the pending object according to the class recognition model; and to output objects of the first class;
a data receiving device, configured to receive and use the objects of the first class.
The above technical means achieve the following beneficial effects:
This application provides a sentiment orientation recognition method. The method uses several short texts, labeled with sentiment orientations and associated with categories, as training samples, trains on the feature sets obtained from those short texts, and obtains an emotion degree estimation model. Because each feature includes both a segmented word of the short text and the category identifier, the emotion degree estimation model built by this application takes full account of the category to which a short text belongs. Therefore, the sentiment orientation of a pending short text determined with the emotion degree estimation model is also more accurate.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of determining the sentiment orientation of a pending short text in the prior art;
Figs. 2a-2b are schematic structural diagrams of the sentiment orientation recognition system provided by the embodiments of this application;
Figs. 3a-3c are schematic diagrams of the correspondence between emotion degree estimation models and categories provided by the embodiments of this application;
Figs. 4a-4c are flowcharts of building an emotion degree estimation model provided by the embodiments of this application;
Fig. 5 is another flowchart of building an emotion degree estimation model provided by the embodiments of this application;
Figs. 6a-6b are further flowcharts of building an emotion degree estimation model provided by the embodiments of this application;
Fig. 7 is a flowchart of the sentiment orientation recognition method provided by the embodiments of this application;
Figs. 8a-8b are flowcharts of the sentiment orientation recognition method provided by the embodiments of this application;
Fig. 9 is a flowchart of the sentiment orientation recognition method provided by the embodiments of this application;
Fig. 10 is a flowchart of the sentiment orientation recognition method provided by the embodiments of this application;
Figs. 11a-11b are flowcharts of the sentiment orientation recognition method provided by the embodiments of this application;
Fig. 12 is a flowchart of the object classification method provided by the embodiments of this application;
Fig. 13 is a flowchart of another object classification method provided by the embodiments of this application;
Fig. 14 is a flowchart of another object classification method provided by the embodiments of this application;
Fig. 15 is a flowchart of another object classification method provided by the embodiments of this application;
Fig. 16 is a schematic structural diagram of an object classification system provided by the embodiments of this application;
Fig. 17 is a schematic structural diagram of another object classification system provided by the embodiments of this application;
Fig. 18 is a flowchart of a scenario embodiment of the object classification method provided by the embodiments of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments of this application. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this application without creative effort fall within the scope of protection of this application.
To determine the sentiment orientation of a pending short text accurately, this application proposes building an emotion degree estimation model and using it to estimate the positive emotion degree and negative emotion degree of the pending short text. The positive emotion degree represents the degree to which the pending short text belongs to positive emotion; similarly, the negative emotion degree represents the degree to which the pending short text belongs to negative emotion. Once the positive emotion degree and negative emotion degree are determined, the sentiment orientation of the pending short text can be determined.
To help those skilled in the art understand the application scenario of this application more clearly, Fig. 2a and Fig. 2b show the sentiment orientation recognition systems provided by this application.
The sentiment orientation recognition system of Fig. 2a comprises a data providing device 100 and a processor 200 connected to the data providing device 100.
The data providing device 100 is configured to send several objects to the processor 200. The processor 200 is configured to build an emotion degree estimation model from the short texts of the objects and to determine the sentiment orientation of a pending short text using the emotion degree estimation model.
This application also provides another sentiment orientation recognition system (see Fig. 2b).
The sentiment orientation recognition system of Fig. 2b comprises a data providing device 100, a model building device 300 connected to the data providing device, and a processor 200 connected to the model building device. The model building device 300 can be any processing device with processing capability.
The data providing device 100 is configured to send several objects to the model building device 300. The model building device 300 is configured to build an emotion degree estimation model from the short texts of the objects and to send the emotion degree estimation model to the processor 200. The processor 200 is configured to determine the sentiment orientation of a pending short text using the emotion degree estimation model.
In the recognition systems of Fig. 2a and Fig. 2b, the emotion degree estimation model is built by the processor 200 and by the model building device 300, respectively, and the building process is the same in both. Therefore, the processor 200 and the model building device 300 are collectively referred to as the processing device, and in the following description of building the emotion degree estimation model, "processing device" denotes either the processor 200 or the model building device 300.
The systems shown in Fig. 2a and Fig. 2b may also include a receiving device connected to the processor (not shown in the figures). After the processor determines the sentiment orientation of the pending short text, the processor is further configured to output that sentiment orientation, and the receiving device is configured to receive it, so that the receiving device can use the sentiment orientation of the pending short text in other processing.
The process of structure emotion degree estimation models is described below.Because prior art determines that the emotion of pending short text is inclined The classification of short text is not considered during, so the Sentiment orientation determined in the prior art is inaccurate.Therefore, the application The classification of short text is considered during processing equipment builds emotion degree estimation models, so as to the emotion degree estimation models of structure The positive emotion degree and negative emotion degree of pending short text can accurately be determined.
The application proposes three kinds of implementations of processing equipment structure emotion degree estimation models, is three kinds referring to Fig. 3 a-3c The schematic diagram of classification and emotion degree estimation models in implementation.
The first implementation:The corresponding emotion degree estimation models of all classifications (referring to Fig. 3 a).Second of realization side Formula:Each corresponding emotion degree estimation models of classification (referring to Fig. 3 b).The third implementation:Between the first implementation A kind of implementation between second of implementation (referring to Fig. 3 c);Assuming that there is N number of classification, then the third implementation can To build M emotion degree estimation models, wherein, M is non-zero natural number, and, 1 < M < N.
The specific implementation process of these three implementations is described in detail below:
First implementation: all categories correspond to one emotion degree estimation model.
To determine accurately the sentiment orientation of short texts under every category, this implementation builds a single emotion degree estimation model shared by all categories.
Referring to Fig. 4a, the process of building the emotion degree estimation model shared by all categories comprises the following steps:
Step S401: Determine the short text samples used to build the emotion degree estimation model.
a) Obtain several objects under each category sent by the data providing device, and split each object to obtain the short text set of each object.
The data providing device sends the objects under each category to the processing device, so the processing device obtains multiple objects under each category. To simplify subsequent processing, the processing device splits each object at its punctuation marks, turning each object into multiple short texts.
For example, take Taobao user reviews as the objects. For a user review under the apparel category, "the clothes fit very well, my mother likes them a lot", splitting at the punctuation marks yields two short texts: "the clothes fit very well" and "my mother likes them a lot". Likewise, for a user review under the electronic devices category, "the phone screen is very big, the appearance is very beautiful", splitting at the punctuation marks yields two short texts: "the phone screen is very big" and "the appearance is very beautiful".
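The punctuation-based splitting in a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `split_into_short_texts` and the particular set of boundary punctuation marks are hypothetical.

```python
import re

# Punctuation treated as short-text boundaries.
# The exact set is an assumption; the application only says "punctuation marks".
_BOUNDARY = re.compile(r"[,.;!?，。；！？、]+")


def split_into_short_texts(object_text: str) -> list[str]:
    """Split one object (for example a user review) into short texts."""
    parts = _BOUNDARY.split(object_text)
    # Drop empty fragments produced by leading or trailing punctuation.
    return [part.strip() for part in parts if part.strip()]


review = "the clothes fit very well, my mother likes them a lot."
short_texts = split_into_short_texts(review)
```

Each element of `short_texts` is then handled independently in the following steps.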
b) Screen out, from all the short texts, the short text samples used to build the emotion degree estimation model.
Experiments show that the procedure of Fig. 1 is fairly accurate when it determines that a short text expresses positive emotion, but less accurate when it determines that a short text expresses negative emotion.
In this step, the processing device therefore applies the procedure of Fig. 1 to each short text. If the procedure determines that a short text corresponds to positive emotion, the short text is accepted as a sample for building the emotion degree estimation model and is labeled as positive emotion.
If the procedure of Fig. 1 determines that a short text corresponds to negative emotion, the result is further confirmed manually. If manual confirmation finds that the short text expresses negative emotion, the short text is accepted as a sample for building the emotion degree estimation model and is labeled as negative emotion.
If manual confirmation finds that the short text expresses positive emotion, the characteristics of the short text are not distinct enough for it to serve as a sample for building the emotion degree estimation model, and the short text is discarded.
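The screening rule in b) can be summarized in a short sketch. The two callables are hypothetical placeholders: `baseline_label` stands in for the procedure of Fig. 1 and `manual_label` for human confirmation; the application does not define such interfaces.

```python
from typing import Callable, Optional, Tuple

POSITIVE = "positive"
NEGATIVE = "negative"


def select_sample(
    short_text: str,
    baseline_label: Callable[[str], str],  # hypothetical stand-in for Fig. 1
    manual_label: Callable[[str], str],    # hypothetical stand-in for manual review
) -> Optional[Tuple[str, str]]:
    """Return (short_text, label) if the text is kept as a sample, else None."""
    if baseline_label(short_text) == POSITIVE:
        # Positive results of the baseline are accepted directly.
        return short_text, POSITIVE
    # Negative results of the baseline are confirmed manually.
    if manual_label(short_text) == NEGATIVE:
        return short_text, NEGATIVE
    # Baseline and manual review disagree: the text is discarded.
    return None
```

Manual review is only invoked for texts the baseline labels negative, which matches the observation that the baseline is less reliable on negative results.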
Step S402: Determine the feature set corresponding to each short text.
While step S401 applies the procedure of Fig. 1, the word segmentation result of each short text is obtained (see step 1 of Fig. 1; not repeated here). The feature set corresponding to each short text is then determined.
This step has two execution modes. They differ in that the feature set determined in the first mode includes combined features, whereas the feature set determined in the second mode does not.
Because the feature set is determined in the same way for every short text, the process is described in detail using one target short text as an example.
Referring to Fig. 4b, the first execution mode of determining the feature set of the target short text proceeds as follows:
Step 411: Obtain the category identifier corresponding to the pending short text, and the word segmentation result obtained by performing word segmentation on the pending short text.
The processing device has already obtained the word segmentation result of the target short text in step S401. Because the target short text belongs to the same category as the object it was split from, the processing device takes the category identifier of that object as the category identifier of the target short text.
Take a target short text that belongs to the apparel category and reads "the clothes are very big". Its word segmentation result is "clothes", "very" and "big". Assuming the identifier of the apparel category is "16", the category identifier corresponding to the target short text is "16".
Take a target short text that belongs to the electronic devices category and reads "the screen is very big". Its word segmentation result is "screen", "very" and "big". Assuming the identifier of the electronic devices category is "10", the category identifier corresponding to the target short text is "10".
Step 412: Combine each token with the category identifier to obtain the features.
Short texts under different categories may contain identical tokens. To take full account of the influence of the category on the short text, the present application therefore combines each token with the category identifier to obtain a feature.
Because each feature contains a category identifier, and different categories have different identifiers, features make it possible to distinguish the tokens of different categories accurately. The trained emotion degree estimation model can thus distinguish identical tokens that occur under different categories.
Continuing the example above, for the target short text "the clothes are very big" the features may be "clothes 16", "very 16" and "big 16". For the target short text "the screen is very big" the features may be "screen 10", "very 10" and "big 10". At the feature level, the processing device can tell that "big 16" and "big 10" are two different features belonging to different categories.
In this example the token comes first and the category identifier second; the category identifier may equally come first and the token second. Other ways of combining a token with a category identifier are also possible and are not limited here.
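Step 412 can be illustrated with a small sketch. The function name `tag_with_category`, the `_` separator and the `token_first` flag are assumptions introduced here; the application only requires that token and category identifier be combined in some fixed way.

```python
def tag_with_category(tokens, category_id, sep="_", token_first=True):
    """Combine each token with the category identifier to form a feature."""
    if token_first:
        return [f"{token}{sep}{category_id}" for token in tokens]
    # Alternative order mentioned in the text: identifier first, token second.
    return [f"{category_id}{sep}{token}" for token in tokens]


apparel = tag_with_category(["clothes", "very", "big"], "16")
devices = tag_with_category(["screen", "very", "big"], "10")
```

Here `apparel[2]` and `devices[2]` are different strings even though both come from the token "big", which is what lets a single model treat the two categories differently.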
Step 413: Perform n-gram combination on the features to obtain several combined features.
Research shows that some features occur in fixed collocations, for example "no color difference", "does not fade" and "no pilling". In such a collocation, each of the two words taken alone carries negative emotion, but together they express positive emotion, so treating the words separately can cause misjudgment. This embodiment therefore combines features.
Specifically, the features of each short text are combined using an n-gram language model, where n is a non-zero natural number and one gram of the n-gram language model corresponds to one token of the short text. Combining features with the n-gram language model means merging every n adjacent features, then every n-1 adjacent features, and so on down to every 2 adjacent features.
Taking n=2 as an example, if the features of the target short text are "clothes 16", "very 16" and "big 16", combining features with the bigram language model yields the combined features "clothes 16 very 16" and "very 16 big 16".
Taking n=3 as an example, if the features of the target short text are "clothes 16", "very 16" and "big 16", combining features with the trigram language model yields the combined features "clothes 16 very 16 big 16", "clothes 16 very 16" and "very 16 big 16".
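The combination rule of step 413 (merge n adjacent features, then n-1, down to 2) can be sketched as below. The function name `combine_features` is hypothetical, and the features are written with a `_` separator (an assumption) so that the space joining a combined feature stays unambiguous.

```python
def combine_features(features, n):
    """Return all combinations of k adjacent features for k = n, n-1, ..., 2."""
    combined = []
    # Never ask for more adjacent features than the short text contains.
    for k in range(min(n, len(features)), 1, -1):
        for start in range(len(features) - k + 1):
            combined.append(" ".join(features[start:start + k]))
    return combined


features = ["clothes_16", "very_16", "big_16"]
bigrams = combine_features(features, 2)
trigrams = combine_features(features, 3)
```

The feature set of the first execution mode is then the original features followed by the combined features, for example `features + bigrams`.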
Step 414: Determine the set of the features and the combined features as the feature set of the target short text.
Continuing the above example with bigram combination, the feature set finally obtained for the target short text comprises "clothes 16", "very 16", "big 16", "clothes 16 very 16" and "very 16 big 16".
Referring to Fig. 4c, the second execution mode of determining the feature set of the target short text proceeds as follows:
Step 421: Obtain the category identifier corresponding to the pending short text, and the word segmentation result obtained by performing word segmentation on the pending short text.
Step 422: Combine each token with the category identifier to obtain the features.
Steps 421 and 422 in Fig. 4c are performed in the same way as steps 411 and 412 in Fig. 4b and are not described again.
Step 423: Determine the set of the features as the feature set of the target short text.
The procedure of Fig. 4c omits the feature combination step, so the set of features determined in step 422 is used directly as the feature set of the target short text.
Taking the target short text "the clothes are very big" as an example, the feature set finally obtained by the procedure of Fig. 4c comprises "clothes 16", "very 16" and "big 16".
Returning to Fig. 4a, step S403: Determine the sentiment orientation of each feature in the feature set of each short text, as well as the positive emotion degree and negative emotion degree of each feature, and use each feature together with its sentiment orientation, positive emotion degree and negative emotion degree as input parameters of the emotion degree estimation model.
The sentiment orientation of each short text was already determined when step S401 applied the embodiment of Fig. 1. The sentiment orientation of each feature is the same as that of its short text. Therefore, when a short text corresponds to positive emotion, every feature in its feature set is determined to correspond to positive emotion; when a short text corresponds to negative emotion, every feature in its feature set is determined to correspond to negative emotion.
The process of determining the positive emotion degree and negative emotion degree is described in detail using one feature as an example. The processing device may obtain many occurrences of the same feature, and the sentiment orientations of these occurrences may be the same or may differ.
The processing device therefore counts the total number of occurrences of the feature, the first number of occurrences that belong to positive emotion, and the second number of occurrences that belong to negative emotion. The positive emotion degree of the feature is determined from the ratio of the first number to the total number; the negative emotion degree of the feature is determined from the ratio of the second number to the total number.
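The counting described above can be sketched as follows. The application only speaks of a "proportional relationship", so using the plain ratio as the emotion degree is an assumption of this sketch, as are the function name `emotion_degrees` and the label strings.

```python
from collections import Counter


def emotion_degrees(labeled_features):
    """Map each feature to (positive emotion degree, negative emotion degree).

    labeled_features: iterable of (feature, label) pairs, where label is
    "positive" or "negative" and is inherited from the feature's short text.
    """
    total = Counter()
    positive = Counter()
    for feature, label in labeled_features:
        total[feature] += 1
        if label == "positive":
            positive[feature] += 1
    degrees = {}
    for feature, count in total.items():
        first = positive[feature]   # occurrences in positive short texts
        second = count - first      # occurrences in negative short texts
        degrees[feature] = (first / count, second / count)
    return degrees


observations = [
    ("big_16", "positive"),
    ("big_16", "negative"),
    ("big_16", "negative"),
    ("big_16", "negative"),
    ("big_10", "positive"),
]
degrees = emotion_degrees(observations)
```

With these illustrative observations, "big" in category 16 is mostly negative while "big" in category 10 is positive, which is the category effect the features are meant to capture.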
Step S404: Train according to a preset classifier model, and obtain the trained emotion degree estimation model.
The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. The training process is covered by existing techniques and is not described here.
The second implementation by which the processing device builds emotion degree estimation models is described below. In the second implementation, one emotion degree estimation model is built for each category. Because each model then covers only one category, a token is itself a feature, and tokens need not be combined with a category identifier.
The emotion degree estimation model of every category is built in the same way. The process of building the target emotion degree estimation model corresponding to a target category is therefore described in detail as an example.
Referring to Fig. 5, the process of building the target emotion degree estimation model comprises the following steps:
Step S501: Determine the short text samples used to build the target emotion degree estimation model.
a) Obtain several objects under the target category sent by the data providing device, and split each object to obtain the short text set of each object.
b) Screen out, from all the short texts, the short text samples used to build the emotion degree estimation model.
Step S501 is performed in a way similar to step S401 and is not described again.
Step S502: Determine the feature set corresponding to each short text.
While step S501 applies the procedure of Fig. 1, the word segmentation result of each short text is obtained (see step 1 of Fig. 1; not repeated here). The feature set corresponding to each short text is then determined. This step has two execution modes. They differ in that the feature set determined in the first mode includes combined features, whereas the feature set determined in the second mode does not.
Because the feature set is determined in the same way for every short text, the process is described in detail using one target short text as an example.
Referring to Fig. 6a, the first execution mode of determining the feature set of the target short text proceeds as follows:
Step 601: Obtain the word segmentation result of the target short text; each token corresponds to one feature.
Step 602: Perform n-gram combination on the features to obtain several combined features.
Step 603: Determine the set of the features and the combined features as the feature set of the target short text.
Taking the short text "the clothes are very big" and bigram combination as an example, the feature set finally obtained for the target short text in this embodiment comprises "clothes", "very", "big", "clothes very" and "very big".
Referring to Fig. 6b, the second execution mode of determining the feature set of the target short text proceeds as follows:
Step 611: Obtain the word segmentation result of the target short text; each token corresponds to one feature.
Step 612: Determine the word segmentation result as the feature set of the target short text.
The procedure of Fig. 6b omits the feature combination step, so the set of features determined in step 611 is used directly as the feature set of the target short text.
Taking the target short text "the clothes are very big" as an example, the feature set finally obtained by the procedure of Fig. 6b comprises "clothes", "very" and "big".
Returning to Fig. 5, step S503: Determine the sentiment orientation of each feature in the feature set of each short text under the target category, as well as the positive emotion degree and negative emotion degree of each feature, and use each feature under the target category together with its sentiment orientation, positive emotion degree and negative emotion degree as input parameters of the target emotion degree estimation model.
The sentiment orientation of each short text was already determined when step S501 applied the embodiment of Fig. 1. The sentiment orientation of each feature is the same as that of its short text. Therefore, when a short text corresponds to positive emotion, every feature in its feature set is determined to correspond to positive emotion; when a short text corresponds to negative emotion, every feature in its feature set is determined to correspond to negative emotion.
Step S504: Train according to a preset classifier model, and obtain the trained target emotion degree estimation model.
The preset classifier model may be a maximum entropy model, a support vector machine, a neural network algorithm, or the like. The training process is covered by existing techniques and is not described here.
Fig. 5 builds the emotion degree estimation model of one category, whereas Fig. 4 builds the emotion degree estimation model for all categories. The processing steps of the two are very similar, so for the execution of the embodiment of Fig. 5, refer to the specific execution process of Fig. 4; it is not described again here.
In the second implementation, each category corresponds to its own emotion degree estimation model. To avoid confusion, after finishing an emotion degree estimation model the processing device therefore also builds a mapping between that model and the category identifier, so that the processor can later determine accurately which emotion degree estimation model corresponds to each category.
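The mapping between category identifiers and models can be pictured as a simple lookup table. The class `ModelRegistry` and its method names are hypothetical; the application only states that a mapping is built, not how it is stored.

```python
class ModelRegistry:
    """Maps a category identifier to the model built for that category."""

    def __init__(self):
        self._by_category = {}

    def register(self, category_id, model):
        # Called by the processing device once a model has been built.
        self._by_category[category_id] = model

    def lookup(self, category_id):
        # Called by the processor before estimating a pending short text.
        try:
            return self._by_category[category_id]
        except KeyError:
            raise KeyError(
                f"no emotion degree estimation model for category {category_id!r}"
            ) from None


registry = ModelRegistry()
apparel_model = object()  # placeholder for a trained model
registry.register("16", apparel_model)
```

Raising an error for an unknown category is one possible design choice; the application does not say how an unmapped category should be handled.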
The third implementation by which the processing device builds emotion degree estimation models is described below.
The third implementation may include emotion degree estimation models that each correspond to two or more categories, and/or emotion degree estimation models that each correspond to one category. For the process of building a model that corresponds to two or more categories, refer to the embodiment shown in Fig. 4. For a model that corresponds to one category, refer to the embodiment shown in Fig. 5. Neither is described again here.
With reference to Fig. 2a and Fig. 2b, if the processing device that builds the emotion degree estimation model is the processor 200 itself, the processor 200 can use the model directly once it has been built to determine the sentiment orientation of a pending short text.
If the processing device is the model construction device 300, the model construction device 300 sends the emotion degree estimation model to the processor 200, so that the processor 200 can use the model to determine the sentiment orientation of a pending short text.
The process by which the processor 200 determines the sentiment orientation of a pending short text according to the emotion degree estimation model is described below. The emotion degree estimation model has three different implementations, and the execution of the processor 200 differs among them, so the execution of the processor is described separately for each implementation of the model.
First case:
When the emotion degree estimation model is built using the first implementation (all categories correspond to one emotion degree estimation model), the processor 200 determines the sentiment orientation of a pending short text as follows.
Referring to Fig. 7, a sentiment orientation identification method of the present application comprises the following steps:
Step S701: Determine the feature set corresponding to the pending short text, wherein each feature in the feature set comprises a token of the pending short text and the identifier of the category to which the pending short text belongs.
If the first implementation used the first execution mode to determine the feature sets of the short texts when the emotion degree estimation model was built, this step likewise uses the first execution mode to determine the feature set of the pending short text.
Referring to Fig. 8a, the first execution mode of determining the feature set corresponding to the pending short text comprises the following steps:
Step S801: Obtain the category identifier corresponding to the pending short text, and the word segmentation result obtained by performing word segmentation on the pending short text.
Step S802: Combine each token in the word segmentation result with the category identifier to obtain the features.
Step S803: Perform n-gram combination on the features to obtain several combined features.
Step S804: Determine the set of the features and the combined features as the feature set of the pending short text.
For the execution of Fig. 8a, refer to the execution of Fig. 4b; it is not described again here.
If the first implementation used the second execution mode to determine the feature sets of the short texts when the emotion degree estimation model was built, this step likewise uses the second execution mode to determine the feature set of the pending short text.
Referring to Fig. 8b, the second execution mode of determining the feature set corresponding to the pending short text comprises the following steps:
Step S811: Obtain the category identifier corresponding to the pending short text, and the word segmentation result obtained by performing word segmentation on the pending short text.
Step S812: Combine each token in the word segmentation result with the category identifier to obtain the features.
Step S813: Determine the set of the features as the feature set of the pending short text.
For the execution of Fig. 8b, refer to the execution of Fig. 4c; it is not described again here.
Returning to Fig. 7, step S702: Using the pre-trained emotion degree estimation model together with the feature set of the pending short text, estimate the emotion degree of the pending short text, wherein the emotion degree estimation model is a model that is obtained by training on several short text samples that have sentiment orientations and belong to at least two categories, and that outputs a positive emotion degree and a negative emotion degree.
The processor inputs the feature set into the emotion degree estimation model, and the model, after estimation, outputs the positive emotion degree and negative emotion degree corresponding to the feature set.
Step S703: Determine the sentiment orientation corresponding to the pending short text based on the positive emotion degree and negative emotion degree corresponding to the pending short text.
After the sentiment orientation corresponding to the pending short text is determined, it may also be output for other uses.
After step S702 has estimated the positive emotion degree and the negative emotion degree of the pending short text, the two can be compared to determine the sentiment orientation of the pending short text. If the positive emotion degree is greater than the negative emotion degree, the pending short text is determined to correspond to positive emotion; if the negative emotion degree is greater than the positive emotion degree, the pending short text is determined to correspond to negative emotion.
In some cases the positive emotion degree and the negative emotion degree differ only slightly. Taking emotion degrees expressed as probabilities as an example, the positive emotion degree may be 0.51 and the negative emotion degree 0.49. Because the two values are so close, the sentiment orientation of the pending short text cannot in principle be determined accurately. If the sentiment orientation is nevertheless determined by the simple comparison described in the preceding paragraph, errors will occur.
The present application therefore provides the following way of determining the sentiment orientation of the pending short text; see Fig. 9.
Step S901: Determine the larger of the positive emotion degree and the negative emotion degree.
The positive emotion degree and the negative emotion degree are compared to determine the larger of the two. If the positive emotion degree is greater than the negative emotion degree, the positive emotion degree is the larger emotion degree; if the negative emotion degree is greater than the positive emotion degree, the negative emotion degree is the larger emotion degree.
Step S902: Judge whether the larger emotion degree is greater than a preset confidence.
To judge whether the larger emotion degree is credible, the present application sets a preset confidence in advance. The preset confidence is the threshold above which the larger emotion degree is considered credible. The larger emotion degree is then compared with the preset confidence.
Step S903: If the larger emotion degree is greater than the preset confidence, determine that the sentiment orientation corresponding to the pending short text is the same as the sentiment orientation of the larger emotion degree.
If the larger emotion degree is greater than the preset confidence, the larger emotion degree is considered highly credible, so the sentiment orientation of the pending short text can be determined accurately. In this case, the sentiment orientation of the pending short text is the same as that of the larger emotion degree.
That is, if the larger emotion degree is the positive emotion degree, the pending short text is determined to correspond to positive emotion; if the larger emotion degree is the negative emotion degree, the pending short text is determined to correspond to negative emotion.
For example, if the larger emotion degree is 0.8 and the preset confidence is 0.7, the sentiment orientation of the pending short text can be determined accurately.
Step S904: If the larger emotion degree is not greater than the preset confidence, perform other processing to determine the sentiment orientation of the pending short text.
If the larger emotion degree is not greater than the preset confidence, the larger emotion degree is considered to have low credibility, so the sentiment orientation of the pending short text cannot be determined accurately. For example, if the larger emotion degree is 0.55 and the preset confidence is 0.7, the sentiment orientation of the pending short text cannot be determined accurately.
In this case, other processing may be performed to further determine the sentiment orientation of the pending short text. That processing is not the focus of the present application and is not described here.
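Steps S901 to S904 amount to a thresholded comparison, sketched below. The function name `decide_orientation`, the default threshold of 0.7 (taken from the example values above) and the use of `None` to signal "defer to other processing" are assumptions of this sketch; the case of two exactly equal degrees is not addressed by the application and is treated here as undecided.

```python
def decide_orientation(positive_degree, negative_degree, confidence=0.7):
    """Return "positive", "negative", or None when the result is not credible."""
    if positive_degree == negative_degree:
        return None  # tie: not covered by the text, treated as undecided
    # Step S901: pick the larger emotion degree and its orientation.
    if positive_degree > negative_degree:
        label, larger = "positive", positive_degree
    else:
        label, larger = "negative", negative_degree
    # Steps S902/S903: accept only if the larger degree exceeds the threshold.
    if larger > confidence:
        return label
    # Step S904: leave the decision to other processing.
    return None
```

For instance, `decide_orientation(0.8, 0.2)` accepts the positive orientation, while `decide_orientation(0.55, 0.45)` returns `None` because 0.55 does not exceed the preset confidence of 0.7.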
The receiving device that can also include being connected with processor in the system shown in Fig. 2 a and Fig. 2 b (does not show in diagram Go out).After the Sentiment orientation that processor determines pending short text, processor, it is additionally operable to export the feelings of the pending text Sense tendency;The receiving device, for receiving the Sentiment orientation of the pending text, wait to locate so that receiving device can utilize Manage the Sentiment orientation of text.
Second:
When the emotion degree estimation model is built using the second implementation, the processor 200 determines the sentiment orientation of the pending short text as follows. Referring to Figure 10, a method for recognizing sentiment orientation according to this application specifically includes the following steps:
Step S1001: Determine the characteristic set and the classification mark corresponding to the pending short text.
If the second implementation used the first execution mode to determine the characteristic sets of the short texts when building the emotion degree estimation model, this step also uses the first execution mode to determine the characteristic set of the pending short text.
Figure 11a shows the detailed process of the first execution mode for determining the characteristic set of the pending short text:
Step 1101: Obtain the word segmentation result produced by performing a word segmentation operation on the pending short text.
Step 1102: Combine the segmented words using an n-gram language model to obtain several combined segments.
Step 1103: Determine the set consisting of each segmented word and the combined segments as the characteristic set of the pending short text, where each segment corresponds to one feature.
The procedure in Figure 11a is similar to that of Fig. 6a; for details, see the Fig. 6a procedure, which is not repeated here.
If the second implementation used the second execution mode to determine the characteristic sets of the short texts when building the emotion degree estimation model, this step also uses the second execution mode to determine the characteristic set of the pending short text.
Figure 11b shows the detailed process of the second execution mode for determining the characteristic set of the pending short text:
Step 1111: Obtain the word segmentation result produced by performing a word segmentation operation on the pending short text.
Step 1112: Determine the word segmentation result as the characteristic set of the pending short text, where each segmented word corresponds to one feature.
The procedure in Figure 11b is similar to that of Fig. 6b; for details, see the Fig. 6b procedure, which is not repeated here.
Returning to Figure 10, the method proceeds to step S1002: According to the emotion degree estimation model corresponding to the classification mark, and in combination with the characteristic set of the pending short text, perform emotion degree estimation on the pending short text. The emotion degree estimation model is a model that outputs a positive emotion degree and a negative emotion degree, obtained by training on the characteristic sets of several short text samples that correspond to the classification mark and have a sentiment orientation.
In the second implementation there are multiple emotion degree estimation models. To obtain the model applicable to the pending short text, the processor can search the multiple emotion degree estimation models by classification mark and thereby determine the emotion degree estimation model corresponding to that classification mark.
The processor inputs the characteristic set to the emotion degree estimation model, which performs the estimation and outputs the positive emotion degree and negative emotion degree corresponding to the characteristic set.
Step S1003: Based on the positive emotion degree and negative emotion degree corresponding to the pending short text, determine the sentiment orientation corresponding to the pending short text. This step is performed in the same way as step 703 of Fig. 7 and is not repeated here.
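The per-category model lookup in step S1002 can be sketched as a dictionary keyed by classification mark. The models below are toy word-count stand-ins, not trained emotion degree estimation models; the category names, the word lists, and the function names are all hypothetical and only illustrate why the same word can contribute differently in different categories.

```python
from typing import Callable, Dict, List, Set, Tuple

# A model maps a characteristic set to (positive degree, negative degree).
Model = Callable[[List[str]], Tuple[float, float]]


def make_toy_model(positive_words: Set[str], negative_words: Set[str]) -> Model:
    """Build a stand-in model that scores by counting known words."""
    def estimate(features: List[str]) -> Tuple[float, float]:
        pos = sum(1 for f in features if f in positive_words)
        neg = sum(1 for f in features if f in negative_words)
        total = pos + neg
        if total == 0:
            return 0.5, 0.5  # no evidence either way
        return pos / total, neg / total
    return estimate


# One model per classification mark (hypothetical categories).
MODELS: Dict[str, Model] = {
    "clothing": make_toy_model({"soft", "fits"}, {"faded", "tight"}),
    "phones": make_toy_model({"fast", "bright"}, {"laggy", "hot"}),
}


def estimate_emotion_degrees(classification_mark: str,
                             features: List[str]) -> Tuple[float, float]:
    """Look up the model for the classification mark and run the estimation."""
    model = MODELS[classification_mark]
    return model(features)


print(estimate_emotion_degrees("phones", ["fast", "hot", "bright"]))
print(estimate_emotion_degrees("clothing", ["hot"]))
```

In this sketch "hot" counts as negative evidence for the hypothetical "phones" category but carries no signal for "clothing", which is the kind of category dependence the per-category models are meant to capture.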
In the system shown in Fig. 2 a and Fig. 2 b, the receiving device that can also include being connected with processor (does not show in diagram Go out).After processor determines Sentiment orientation corresponding to the pending short text, processor, it is additionally operable to export described pending The Sentiment orientation of text;The receiving device, for receiving the Sentiment orientation of the pending text.
When the emotion degree estimation model is built using the third implementation, the processor 200 can store in advance the correspondence between classification marks and emotion degree estimation models, and can build in advance the correspondence between each classification mark and the building mode of its emotion degree estimation model.
After receiving a classification mark, the processor 200 first determines the building mode of the emotion degree estimation model corresponding to that classification mark.
If the emotion degree estimation model was built using the first implementation, the sentiment orientation of the pending short text is determined by suitably adapting the process shown in Fig. 4. That is: determine the characteristic set corresponding to the pending short text, where each feature in the characteristic set includes a segmented word of the pending short text and the classification mark to which the pending short text belongs; according to the pre-trained emotion degree estimation model, and in combination with the characteristic set of the pending short text, perform emotion degree estimation on the pending short text, where the emotion degree estimation model is a model that outputs a positive emotion degree and a negative emotion degree, obtained by training on several short text samples that belong to at least two categories and have a sentiment orientation; and, based on the positive emotion degree and negative emotion degree corresponding to the pending short text, determine the sentiment orientation corresponding to the pending short text.
If the emotion degree estimation model was built using the second implementation, the sentiment orientation of the pending short text is determined by suitably adapting the process shown in Fig. 5. That is: determine the characteristic set corresponding to the pending short text, where each feature in the characteristic set includes a segmented word of the pending short text; according to the emotion degree estimation model corresponding to the classification mark, and in combination with the characteristic set of the pending short text, perform emotion degree estimation on the pending short text, where the emotion degree estimation model is a model that outputs a positive emotion degree and a negative emotion degree, obtained by training on several short text samples that correspond to the classification mark and have a sentiment orientation; and, based on the positive emotion degree and negative emotion degree corresponding to the pending short text, determine the sentiment orientation corresponding to the pending short text.
From the embodiments shown in Fig. 7 and Figure 10, it can be seen that this application has the following beneficial effects:
This application provides a method for recognizing sentiment orientation. The method trains on several short texts that have a sentiment orientation to obtain an emotion degree estimation model. Because each characteristic set includes both the segmented words of the short text and the classification mark, the emotion degree estimation model built by this application takes full account of the category to which the short text belongs. The positive emotion degree and negative emotion degree determined for the pending short text from this model are therefore more accurate than in the prior art, and so is the sentiment orientation determined from those emotion degrees.
Taking the maximum entropy model as an example, the training process by which this application builds the emotion degree estimation model is described in detail below:
Two matrices are built first: matrix A and matrix B. Matrix A contains each feature together with the positive emotion degree and negative emotion degree corresponding to that feature. Matrix B contains the two classification results: positive emotion and negative emotion. For any feature a in matrix A, its sentiment orientation is denoted by b, and $f_i(a, b)$ indicates that (a, b) occur together.
First, the expectation of $f_i(a, b)$ over the training samples is calculated. Because no variable of the model being trained is involved, this expectation is a constant once it has been calculated. The specific calculation formula is as follows:
Here, the two symbols denote, respectively, the expectation of $f_i(a, b)$ over the training samples and the empirical probability distribution of $f_i(a, b)$ in the training samples.
The probability distribution of $f_i(a, b)$ in the model is given by the following formula:
Here, the first factor denotes the probability that the sentiment orientation of a short text in the training samples is b, and $p(a \mid b)$ denotes the conditional probability of feature a given that the sentiment orientation of the short text is b.
The calculation formula for $f_i(a, b)$ in the maximum entropy model is then:
In the maximum entropy model, the expectation of $f_i(a, b)$ over the training samples should be consistent with the expectation of $f_i(a, b)$ in the model, that is:
Using the method of Lagrange multipliers, the optimal solution of objective equation (2) is solved under the constraint of formula (4). The optimal solution is as follows:
Here, the normalization factor ensures that the resulting probabilities sum to 1, and $w_i$ is the weight of feature $f_i$.
Substituting formula (5) into formula (1) yields the result of training the maximum entropy model, namely the emotion degree estimation model.
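For reference, the textbook conditional maximum entropy formulation can be written with the symbols defined above as follows. This is a sketch of the standard form only: the symbols $E_{\tilde p}$, $E_p$, $H$ and $Z(b)$ are introduced here for illustration, and how these lines map onto the numbered formulas (1) to (5) is an assumption.

```latex
\begin{align}
E_{\tilde p}(f_i) &= \sum_{a,b} \tilde p(a,b)\, f_i(a,b)
  && \text{expectation over the training samples} \\
p(a,b) &\approx \tilde p(b)\, p(a \mid b)
  && \text{distribution in the model} \\
E_{p}(f_i) &= \sum_{a,b} \tilde p(b)\, p(a \mid b)\, f_i(a,b)
  && \text{expectation in the model} \\
E_{p}(f_i) &= E_{\tilde p}(f_i)
  && \text{constraint, for every } i \\
H(p) &= -\sum_{a,b} \tilde p(b)\, p(a \mid b) \log p(a \mid b)
  && \text{entropy to be maximized} \\
p^{*}(a \mid b) &= \frac{1}{Z(b)} \exp\Big(\sum_i w_i f_i(a,b)\Big),
  \quad Z(b) = \sum_{a} \exp\Big(\sum_i w_i f_i(a,b)\Big)
  && \text{optimal solution}
\end{align}
```

Maximizing $H(p)$ subject to the expectation constraints with Lagrange multipliers $w_i$ gives the exponential form in the last line, where $Z(b)$ makes $p^{*}(a \mid b)$ sum to 1 over $a$.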
As shown in Figure 12, this application provides an object classification method, applied in a processor. In this embodiment, an object can be classified directly using the sentiment orientation of the short texts of the pending object. The method specifically includes the following steps:
Step S1201: Determine the short text information of the pending object, where the short text information includes the sentiment orientation of the short texts.
The processor can use punctuation marks to divide the pending object into several short texts. The sentiment orientation of each short text can be determined by the process provided in Fig. 7 or Figure 10 of this application, which yields the sentiment orientation of every short text in the pending object. In addition, the short text information can also include: the number of short texts in the pending object that have a positive emotion, the number that have a negative emotion, the proportion of positive short texts, the proportion of negative short texts, and so on.
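Step S1201 can be sketched as splitting on punctuation and aggregating the per-short-text orientations. In this illustration the orientation recognizer is passed in as a function, standing in for the Fig. 7 or Figure 10 process; the punctuation set, the dictionary keys, and the toy recognizer are assumptions.

```python
import re
from typing import Callable, Dict, List


def split_short_texts(text: str) -> List[str]:
    """Divide text into short texts at common Western and CJK punctuation."""
    parts = re.split(r"[,.;!?，。；！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def short_text_info(text: str,
                    orientation_of: Callable[[str], str]) -> Dict[str, float]:
    """Aggregate counts and proportions of positive and negative short texts."""
    labels = [orientation_of(s) for s in split_short_texts(text)]
    total = len(labels)
    n_pos = labels.count("positive")
    n_neg = labels.count("negative")
    return {
        "positive_count": n_pos,
        "negative_count": n_neg,
        "positive_ratio": n_pos / total if total else 0.0,
        "negative_ratio": n_neg / total if total else 0.0,
    }


def toy_orientation(short_text: str) -> str:
    """Hypothetical recognizer: anything mentioning 'bad' is negative."""
    return "negative" if "bad" in short_text else "positive"


info = short_text_info("Fabric is soft, size fits. Shipping was bad!", toy_orientation)
print(info)
```

The example review is divided into three short texts, two positive and one negative under the toy recognizer.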
Step S1202: According to a pre-trained classification identification model, perform classification identification on the short text information. The classification identification model is a classifier for a first category and a second category, obtained by training on the short text information of several objects.
The classification identification model is a classifier that outputs the first category and the second category, obtained in advance by training on the short text information of several objects. Specifically, a classification model such as a maximum entropy model, a neural network algorithm, or a support vector machine can be trained on the short text information of several objects to obtain the classification identification model. The related technical means can use training methods from the prior art and are not described here.
After the short text information of the pending object is obtained, it is input to the classification identification model; after processing by the model, the category of the pending object can be determined.
In practice, it has been found that an object can include images in addition to text. Taking a user evaluation in an e-commerce system as an example, a user evaluation may carry images of the commodity in addition to the text (the written user evaluation).
It can be understood that an object category determined from the short text information alone is not accurate enough, because the image feature information of the object is not considered; likewise, an object category determined from the image feature information alone is not accurate enough, because the short text information of the object is not considered. Therefore, this embodiment fuses the short text information with the image feature information and uses both together to determine the object category, thereby improving the accuracy of the object category.
This application further provides an object classification method. In this embodiment, multiple features of the pending object are used to classify the object. As shown in Figure 13, the method specifically includes the following steps:
Step S1301: Determine the characteristic information corresponding to the pending object, where the characteristic information includes short text information and image feature information, and the short text information includes the sentiment orientation of the short texts.
The processor can use punctuation marks to divide the pending object into several short texts. The sentiment orientation of each short text can be determined by the process provided in Fig. 7 or Figure 10 of this application, which yields the sentiment orientation of every short text in the pending object. In addition, the short text information can also include: the number of short texts in the pending object that have a positive emotion, the number that have a negative emotion, the proportion of positive short texts, the proportion of negative short texts, and so on.
The processor can process the image to obtain image feature information. The image feature information can include one or more of the following image features: image width, image height, the number of faces in the image, the number of sub-images the image contains, whether the image background is a solid color, the proportion of the image occupied by text areas, the number of dominant colors in the salient region of the image, the number of dominant colors in the image, the image "psoriasis" score (overlaid promotional clutter), the image subject quality score, the probability score that the image shows a mannequin, the probability score that the image shows a real human model, the probability score that the image shows commodity details, and so on.
Step S1302: According to a pre-trained classification identification model, perform classification identification on the characteristic information. The classification identification model is a classifier for a first category and a second category, obtained by training on the characteristic information of several objects.
The classification identification model is a classifier that outputs the first category and the second category, obtained in advance by training on the short text information and image feature information of several objects. Specifically, a classification model such as a maximum entropy model, a neural network algorithm, or a support vector machine can be trained on the short text information and image feature information of several objects to obtain the classification identification model. The related technical means can use training methods from the prior art and are not described here.
After the characteristic information of the pending object is obtained, it is sent to the classification identification model, which determines the category of the pending object.
It can be understood that the more kinds of features the characteristic information of the pending object contains, the more accurate the final result. Therefore, to further improve the accuracy of the category of the pending object, the characteristic information can also include: characteristic information of the first subject to which the pending object belongs; and/or characteristic information of the second subject to which the pending object belongs. Other characteristic information can of course also be included and is not enumerated here.
For example, taking a user evaluation as the object, the characteristic information of the first subject to which the pending object belongs is, specifically, the characteristic information of the seller (the first subject) of the commodity, for example the seller's credit grade and the seller's sales volume. The characteristic information of the second subject to which the pending object belongs is, specifically, the characteristic information of the buyer (the second subject) of the commodity, for example the buyer's credit grade, the number of non-default user evaluations the buyer has published, the number of user evaluations with images the buyer has published, and the proportion of the buyer's published user evaluations that include images.
Once short text information, image feature information, and other characteristic information are added, the characteristic information of an object comprises multiple kinds of characteristic information. To take all of them into account, this embodiment proposes training a gradient boosting decision tree (GBDT) model on several training samples to obtain the classification identification model.
The gradient boosting decision tree model is a boosting method that uses decision trees as basis functions. The model includes multiple decision trees. Multiple trees are used because a single decision tree that is split too much overfits and loses generalization ability, while one that is split too little does not learn sufficiently.
The training process of the gradient boosting decision tree model is described below:
First, estimate the initial value $F_0$.
The initial value $F_0$ can be a random value or can be equal to 0; the specific value depends on the actual situation and is not limited here.
Second, iterate M times in the following manner to obtain M decision trees:
A) Use the previous gradient boosting decision tree to update the estimates of all training samples for the multiple pieces of characteristic information.
B) Randomly select part of the samples from all training samples as the training samples for building the decision tree in this iteration.
C) According to the features that the samples include, calculate the information gain of each feature and select the feature with the largest information gain for the first split, where the left side represents the first category and the right side represents the second category. Calculate the gradient for this iteration and, in combination with the gradient, re-estimate the characteristic values of the characteristic information of the samples.
Repeat the preceding step J times to obtain a decision tree with J layers of leaf nodes.
D) For the decision tree obtained in this iteration, calculate the accuracy of the training samples on this decision tree and use the accuracy as the weight of this decision tree.
Third, linearly combine the M decision trees to obtain the final gradient boosting decision tree model.
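The feature selection in step C) relies on information gain. The sketch below shows one common way to compute it for a binary split, assuming numeric features split at a fixed threshold and integer category labels; the function names, the threshold, and the toy data are assumptions, and the sketch covers only the split selection, not the full training loop.

```python
import math
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a collection of category labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for category in set(labels):
        p = sum(1 for y in labels if y == category) / n
        result -= p * math.log2(p)
    return result


def information_gain(values: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Entropy reduction from splitting the samples at the given threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_feature(rows: List[List[float]], labels: List[int],
                 threshold: float = 0.5) -> int:
    """Index of the feature whose split has the largest information gain."""
    gains = [information_gain([row[j] for row in rows], labels, threshold)
             for j in range(len(rows[0]))]
    return max(range(len(gains)), key=gains.__getitem__)


rows = [[1, 0], [1, 1], [0, 0], [0, 1]]  # two hypothetical binary features
labels = [1, 1, 0, 0]                    # hypothetical category labels
print(best_feature(rows, labels))
```

In the toy data the first feature separates the two categories perfectly while the second carries no information, so the first feature is the one selected for the split.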
The gradient boosting decision tree model includes multiple decision trees and can be expressed as an additive model of those trees:
$$F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_i T_i(X) + \cdots + \beta_M T_M(X) \qquad \text{formula (6)}$$
Here, $F_0$ is an initial value, $T_i(X)$ denotes the matching degree between the characteristic information of the pending object and the i-th decision tree, $\beta_i$ denotes the weight of the i-th decision tree, and M denotes the total number of decision trees.
The gradient boosting decision tree model uses multiple decision trees precisely in order to achieve good results in both training accuracy and generalization ability. As a boosting algorithm, it naturally embodies the boosting idea: a series of weak classifiers is combined to form a strong classifier. It does not require each decision tree to learn very much; each tree learns some knowledge, and the knowledge learned by the individual trees is then accumulated to form a powerful model.
This application further provides an object classification method which, as shown in Figure 14, specifically includes the following steps:
Step S1401: Determine the characteristic information corresponding to the pending object.
The characteristic information includes short text information, image feature information, characteristic information of the first subject to which the pending object belongs, and characteristic information of the second subject to which the pending object belongs. The short text information includes the sentiment orientation of the short texts.
Taking a user evaluation as the object, this step can be: determine the characteristic information of the pending user evaluation, where the characteristic information includes the text feature information of the user evaluation, the image feature information of the user evaluation, the characteristic information of the seller, and the characteristic information of the buyer, and the text feature information includes the sentiment orientation of the short texts.
Step S1402: According to the pre-trained gradient boosting decision tree model, perform classification identification on the characteristic information.
Continuing with a user evaluation as the object, this step performs classification identification on the characteristic information of the pending user evaluation according to the pre-trained gradient boosting decision tree model. The classification identification model is a classifier for first-class user evaluations and second-class user evaluations, obtained by training on the characteristic information of several user evaluation samples.
As shown in Figure 15, this step specifically includes the following steps:
Step S1501: Input the characteristic information to the classification identification model, namely the gradient boosting decision tree model.
The gradient boosting decision tree model has M trees. The characteristic information is matched against each of the M trees, which yields the category determined by each tree.
Step S1502: Determine the first category matching degree and the second category matching degree corresponding to the pending object.
The first category matching degree and the second category matching degree are determined by formula (6) above.
First category matching degree: $F_1(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_i T_i(X) + \cdots + \beta_M T_M(X)$, where $T_i(X)$ denotes the matching degree between the characteristic information and the i-th tree, and $\beta_i$ denotes the weight corresponding to that tree. If a tree determines that the characteristic information corresponds to the first category, its weight is $\beta_i$; if a tree determines that the characteristic information corresponds to the second category, its weight is 0.
Second category matching degree: $F_2(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_i T_i(X) + \cdots + \beta_M T_M(X)$, where $T_i(X)$ denotes the matching degree between the characteristic information and the i-th tree, and $\beta_i$ denotes the weight corresponding to that tree. If a tree determines that the characteristic information corresponds to the second category, its weight is $\beta_i$; if a tree determines that the characteristic information corresponds to the first category, its weight is 0.
Step S1503: Compare the first category matching degree with the second category matching degree. If the first category matching degree is greater than the second category matching degree, proceed to step S1504; if the second category matching degree is greater than the first category matching degree, proceed to step S1505.
Step S1504: Determine that the category of the pending object is the first category.
Continuing with a user evaluation as the object, this step determines that the category of the pending user evaluation is the first category. If the first category is high-quality user evaluation, this step determines that the pending user evaluation is a high-quality user evaluation.
Step S1505: Determine that the category of the pending object is the second category.
Continuing with a user evaluation as the object, this step determines that the category of the pending user evaluation is the second category. If the second category is inferior user evaluation, this step determines that the pending user evaluation is an inferior user evaluation.
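Steps S1501 to S1505 can be sketched as a weighted vote over the trees. In this illustration each tree is modeled as a function that returns the category it determines together with a matching degree; that representation, the feature names, the tree weights, and the choice to fall back to the second category on an exact tie are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# A tree returns (category it determines: 1 or 2, matching degree T_i(X)).
Tree = Callable[[Features], Tuple[int, float]]


def category_match_degrees(x: Features, trees: List[Tree],
                           weights: List[float],
                           f0: float = 0.0) -> Tuple[float, float]:
    """Accumulate beta_i * T_i(X) into the category each tree determines."""
    f1 = f2 = f0
    for tree, beta in zip(trees, weights):
        category, match = tree(x)
        if category == 1:
            f1 += beta * match  # this tree contributes nothing to F_2
        else:
            f2 += beta * match  # this tree contributes nothing to F_1
    return f1, f2


def classify(x: Features, trees: List[Tree], weights: List[float]) -> int:
    """Step S1503: pick the category with the larger matching degree."""
    f1, f2 = category_match_degrees(x, trees, weights)
    return 1 if f1 > f2 else 2  # a tie falls to category 2 (assumption)


def text_tree(x: Features) -> Tuple[int, float]:
    return (1, 1.0) if x["positive_ratio"] > 0.5 else (2, 1.0)


def image_tree(x: Features) -> Tuple[int, float]:
    return (1, 1.0) if x["image_count"] >= 1 else (2, 1.0)


def seller_tree(x: Features) -> Tuple[int, float]:
    return (1, 1.0) if x["seller_credit"] >= 4 else (2, 1.0)


trees = [text_tree, image_tree, seller_tree]
weights = [0.9, 0.7, 0.6]  # hypothetical per-tree weights beta_i
review = {"positive_ratio": 0.8, "image_count": 0, "seller_credit": 5}
print(category_match_degrees(review, trees, weights))
print(classify(review, trees, weights))
```

For the example review, the text and seller trees vote for the first category and the image tree for the second, so the first category matching degree is the larger of the two.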
After the pending object is determined to be of the first category, the pending object is added to an object set, and the objects in the object set are sent. The object set can be used by other devices. During use, multiple better object samples can be determined through further screening and then sent back to the processor, so that the processor can retrain the classification identification model with the better object samples and make the model more accurate. That is, the processor can receive multiple object samples that originate from the object set; add the multiple object samples to the existing object samples used to train the classification identification model; and retrain the classification identification model based on the updated existing object samples.
Continuing with a user evaluation as the object, this process is as follows: after the pending user evaluation is determined to be a first-class user evaluation, it is added to the first-class user evaluation set, and the first-class user evaluation set is sent. The first-class user evaluation set can be used by users, and during use better user evaluations can be identified within the set. The better user evaluations can then be sent to the processing device so that the processing device retrains the classification identification model. In this way the system forms a closed loop.
That is, the processor receives multiple first-class user evaluations that originate from the first-class user evaluation set; adds the multiple first-class user evaluations to the existing user evaluation samples of the classification identification model; and retrains the classification identification model based on the updated existing user evaluation samples.
Referring to Figure 16, this application provides an object classification system, including:
A data providing device 100, configured to send several objects.
A processor 200, configured to receive the several objects sent by the data providing device and to obtain, by training on the characteristic information of the several objects, a classification identification model that outputs a first category and a second category; to determine the characteristic information of a pending object, where the characteristic information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of short texts; to perform classification identification on the characteristic information of the pending object according to the classification identification model; and to output the objects of the first category.
A data receiving device 400, configured to receive and use the objects of the first category.
While using the object set, the data receiving device 400 can determine multiple better object samples through further screening and then send those object samples back to the processor 200, so that the processor can retrain the classification identification model with the better object samples and make the model more accurate.
Referring to Figure 17, this application also provides an object classification system, including:
A data providing device 100, configured to send several objects.
A model construction device 300, configured to receive the several objects sent by the data providing device, to obtain, by training on the characteristic information of the several objects, a classification identification model that outputs a first category and a second category, and to send the classification identification model.
A processor 200, configured to receive the classification identification model and to determine the characteristic information of a pending object, where the characteristic information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of short texts; to perform classification identification on the characteristic information of the pending object according to the classification identification model; and to output the objects of the first category.
A data receiving device 400, configured to receive and use the objects of the first category.
While using the object set, the data receiving device 400 can determine multiple better object samples through further screening and then send those object samples back to the processor 200, so that the processor can retrain the classification identification model with the better object samples and make the model more accurate.
The object classification method is described in detail below using a concrete scenario embodiment.
An e-commerce system contains many user evaluations. How to filter out the high-quality user evaluations from among them is the problem to be solved by this embodiment. Because the user evaluations in an e-commerce system are large in number and varied in kind, a merchant needs to spend a lot of time finding the high-quality user evaluations for a shop, which incurs a high labor cost. At present, in the field of high-quality user evaluation recognition, the industry mainly uses two techniques: first, recognition based on short text; second, recognition based on image features.
Recognition based on short text is relatively easy to implement, but it has a limitation: it ignores the image information that buyers publish in user evaluations. In real scenarios, for example in the clothing category, users care not only about the text description in a user evaluation but also about the real appearance of the commodity, that is, the image feature information.
Recognition based on image features is effective, but it also has a limitation: it identifies high-quality user evaluations using only the image information in the user evaluation and does not consider the buyer's actual experience after purchase, that is, the short text information. It can therefore be seen that the short text information and the image feature information in a user evaluation are equally important.
In addition, the applicant has found that some other features, for example seller features and buyer features, can also assist in determining high-quality user evaluations. Therefore, this embodiment uses all of the above features as the basis for determining whether a user evaluation is a high-quality user evaluation or an inferior user evaluation. To this end, this embodiment proposes a machine learning method based on the fusion of multiple features, namely the gradient boosting decision tree model, to train on several training samples and thereby obtain the classification identification model.
Figure 18 is a flowchart, provided by the present application, of determining high-quality user evaluations. The figure shows the overall process of determining high-quality user evaluations, which mainly consists of the following parts:
(1) Build the user evaluation library
A large number of user evaluations are obtained from the user evaluation server, and some low-quality user evaluations are first filtered out using preprocessing rules. The preprocessing rules may be certain requirements that the image and the text in a high-quality user evaluation must meet; that is, the large number of user evaluations is filtered using a small number of feature dimensions of the short text and the image.
Specifically, the short text in a high-quality user evaluation cannot carry negative emotion. On this basis, if the short text in a user evaluation corresponds to negative emotion, the evaluation is determined to be a non-high-quality user evaluation. The image in a high-quality user evaluation must also meet basic requirements: the resolution of the image reaches a preset resolution, the image is not a chat screenshot, the proportion of obvious advertising slogans and watermarks in the image is below a preset value, and so on.
User evaluations in the user evaluation server that meet the above short text requirements and image feature requirements are put into the user evaluation library. User evaluations that do not meet the short text requirements and image feature requirements are determined to be non-high-quality user evaluations and are not put into the user evaluation library.
Filtering with the preprocessing rules removes some non-high-quality user evaluations. This not only reduces the number of calls to the high-quality user evaluation identification model, but also effectively filters out non-high-quality user evaluations and improves the prediction accuracy of the high-quality user evaluation identification model.
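The preprocessing rules of part (1) can be sketched as a simple rule-based filter. This is only an illustrative sketch: the `Evaluation` fields, the function names and the threshold values are assumptions, since the text only states that preset values exist.

```python
from dataclasses import dataclass

# Hypothetical preset values; the description does not give concrete numbers.
MIN_WIDTH, MIN_HEIGHT = 640, 480   # assumed "preset resolution"
MAX_AD_WATERMARK_RATIO = 0.10      # assumed "preset value" for ads and watermarks


@dataclass
class Evaluation:
    text_sentiment: str        # "positive", "negative" or "neutral"
    image_width: int
    image_height: int
    is_chat_screenshot: bool
    ad_watermark_ratio: float  # share of the image covered by ads or watermarks


def passes_prefilter(ev: Evaluation) -> bool:
    """Return True if the evaluation may enter the user evaluation library."""
    if ev.text_sentiment == "negative":
        return False  # the short text must not carry negative emotion
    if ev.image_width < MIN_WIDTH or ev.image_height < MIN_HEIGHT:
        return False  # the image must reach the preset resolution
    if ev.is_chat_screenshot:
        return False  # chat screenshots are rejected
    return ev.ad_watermark_ratio < MAX_AD_WATERMARK_RATIO


def build_library(evaluations):
    """Keep only the evaluations that satisfy every preprocessing rule."""
    return [ev for ev in evaluations if passes_prefilter(ev)]
```

Only the evaluations returned by `build_library` would then be passed to the identification model, which is how the filter reduces the number of model calls.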
(2) Determine the high-quality user evaluation set
The high-quality user evaluation identification model is used to identify the user evaluations in the user evaluation library. If the identification result is a high-quality user evaluation, the evaluation is put into the high-quality user evaluation set.
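Part (2) can be sketched as follows, assuming that the identification model returns two matching degrees (high-quality and non-high-quality) and that the larger one decides the result, as in the comparison described in claim 23. The `score` callable is a hypothetical stand-in for the trained model, not the model itself.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical model interface: (high_quality_degree, low_quality_degree).
Scorer = Callable[[Any], Tuple[float, float]]


def select_high_quality(library: Iterable[Any], score: Scorer) -> List[Any]:
    """Collect the evaluations whose high-quality degree is the larger one."""
    selected = []
    for evaluation in library:
        high, low = score(evaluation)
        if high > low:
            selected.append(evaluation)
    return selected


def toy_score(evaluation: str) -> Tuple[float, float]:
    """Toy scorer for illustration only: longer texts count as better."""
    high = min(len(evaluation) / 20.0, 1.0)
    return high, 1.0 - high
```

For example, `select_high_quality(["ok", "fits well, fabric feels great"], toy_score)` keeps only the second text, because only its first degree exceeds its second.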
(3) Use the high-quality user evaluation set
The data receiver can obtain high-quality user evaluations from the high-quality user evaluation set and use them in actual applications. While using the high-quality user evaluations in the set, the data receiver can screen them again according to a preset criterion, so as to select the high-quality user evaluations that meet the preset criterion. The high-quality user evaluations that meet the preset criterion are then sent to the processor or the model construction equipment, so that the processor or the model construction equipment can iteratively update the high-quality user evaluation identification model.
(4) Iterative update of the high-quality user evaluation identification model
The high-quality user evaluations that meet the preset criterion are used to retrain the high-quality user evaluation identification model, so that the model outputs, as far as possible, high-quality user evaluations that meet user needs.
Because the high-quality user evaluations selected from the high-quality user evaluation set all satisfy the preset rules of the seller or the operations staff, these evaluations are added back into the user evaluation library and the high-quality user evaluation identification model is updated and optimized again, so that the model better identifies the high-quality user evaluations the user expects.
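The iterative update of part (4) can be sketched as one retraining round. The `train` callable, the `(evaluation, label)` sample layout and the label value 1 for "high-quality" are assumptions made for illustration; the description does not specify them.

```python
from typing import Any, Callable, List, Tuple

Sample = Tuple[Any, int]  # (evaluation, label); label 1 = high-quality (assumed)


def iterate_update(
    train: Callable[[List[Sample]], Any],
    samples: List[Sample],
    high_quality_set: List[Any],
    meets_preset_rule: Callable[[Any], bool],
) -> Tuple[Any, List[Sample]]:
    """Add the evaluations confirmed by the preset rule and retrain the model."""
    # Keep only the evaluations the seller or operations staff would accept.
    confirmed = [ev for ev in high_quality_set if meets_preset_rule(ev)]
    # Confirmed evaluations become new positive training samples.
    updated = samples + [(ev, 1) for ev in confirmed]
    model = train(updated)
    return model, updated
```

The function returns a new sample list rather than mutating the old one, so that each round can be repeated or rolled back independently.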
From the above process it can be seen that, in this embodiment, the user no longer needs to screen the original user evaluation library one evaluation at a time, and only needs to select from the high-quality user evaluation set to quickly obtain the desired high-quality user evaluations, which effectively reduces labor cost. At the same time, the high-quality user evaluation identification model can be iteratively updated using the high-quality user evaluations provided by the merchant, so that it further identifies high-quality user evaluations that meet the merchant's expectations.
If the functions described in the method of this embodiment are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments of the present application that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computing device (which may be a personal computer, a server, a mobile computing device, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments; for the same or similar parts, the embodiments may be referred to one another.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (30)

  1. A method for recognizing sentiment orientation, characterized by comprising:
    determining the category identifier corresponding to a short text to be processed, wherein the words between two adjacent punctuation marks of a text are referred to as a short text;
    determining the implementation of the emotion degree estimation model corresponding to the category identifier;
    if the implementation of the emotion degree estimation model is one emotion degree estimation model shared by all categories, determining the feature set corresponding to the short text to be processed, wherein each feature in the feature set includes a word segment of the short text to be processed and the category identifier to which the short text to be processed belongs; performing emotion degree estimation on the short text to be processed according to the pre-trained emotion degree estimation model, in combination with the feature set of the short text to be processed, wherein the emotion degree estimation model includes a model that is obtained by training on a number of short text samples with sentiment orientation from at least two categories and that outputs a positive emotion degree and a negative emotion degree; and determining the sentiment orientation corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed;
    if the implementation of the emotion degree estimation model is one emotion degree estimation model per category, determining the feature set corresponding to the short text to be processed, wherein each feature in the feature set includes a word segment of the short text to be processed; performing emotion degree estimation on the short text to be processed according to the emotion degree estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed, wherein the emotion degree estimation model is a model that is obtained by training on a number of short text samples with sentiment orientation corresponding to the category identifier and that outputs a positive emotion degree and a negative emotion degree; and determining the sentiment orientation corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  2. The method according to claim 1, characterized in that, after determining the sentiment orientation corresponding to the short text to be processed, the method further comprises:
    outputting the sentiment orientation corresponding to the short text to be processed.
  3. A method for recognizing sentiment orientation, characterized by comprising:
    determining the feature set corresponding to a short text to be processed, wherein the words between two adjacent punctuation marks of a text are referred to as a short text, and each feature in the feature set includes a word segment of the short text to be processed and the category identifier to which the short text to be processed belongs;
    performing emotion degree estimation on the short text to be processed according to a pre-trained emotion degree estimation model, in combination with the feature set of the short text to be processed, wherein the emotion degree estimation model includes a model that is obtained by training on a number of short text samples with sentiment orientation from at least two categories and that outputs a positive emotion degree and a negative emotion degree;
    determining the sentiment orientation corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  4. The method according to claim 3, characterized in that determining the feature set corresponding to the short text to be processed comprises:
    obtaining the category identifier corresponding to the short text to be processed, and the word segmentation result produced by performing a word segmentation operation on the short text to be processed;
    combining each word segment in the word segmentation result with the category identifier to obtain the individual features;
    determining the set of the individual features as the feature set of the short text to be processed.
  5. The method according to claim 3, characterized in that determining the feature set corresponding to the short text to be processed comprises:
    obtaining the category identifier corresponding to the short text to be processed, and the word segmentation result produced by performing a word segmentation operation on the short text to be processed;
    combining each word segment in the word segmentation result with the category identifier to obtain the individual features;
    performing feature combination on the individual features using an n-gram language model to obtain a number of combined features;
    determining the set of the individual features and the combined features as the feature set of the short text to be processed.
  6. The method according to claim 5, characterized in that performing feature combination on the individual features using an n-gram language model to obtain a number of combined features comprises:
    performing feature combination on the individual features using a bigram language model to obtain a number of combined features.
  7. The method according to claim 3, characterized in that performing emotion degree estimation on the short text to be processed according to the pre-trained emotion degree estimation model, in combination with the feature set of the short text to be processed, comprises:
    inputting the feature set into the emotion degree estimation model;
    after estimation by the emotion degree estimation model, outputting the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  8. The method according to claim 3, characterized in that determining the sentiment orientation corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed comprises:
    determining the larger of the positive emotion degree and the negative emotion degree;
    judging whether the larger emotion degree is greater than a preset confidence level;
    if the larger emotion degree is greater than the preset confidence level, determining that the sentiment orientation corresponding to the short text to be processed is consistent with the sentiment orientation of the larger emotion degree.
  9. The method according to claim 3, characterized in that the emotion degree estimation model includes:
    a model that is obtained by training a maximum entropy model on the feature sets of a number of short texts corresponding to at least two category identifiers, and that outputs a positive emotion degree and a negative emotion degree.
  10. The method according to claim 3, characterized in that, after determining the sentiment orientation corresponding to the short text to be processed, the method further comprises:
    outputting the sentiment orientation corresponding to the short text to be processed.
  11. A method for recognizing sentiment orientation, characterized by comprising:
    determining the feature set and the category identifier corresponding to a short text to be processed, wherein the words between two adjacent punctuation marks of a text are referred to as a short text, and each feature in the feature set includes a word segment of the short text to be processed;
    performing emotion degree estimation on the short text to be processed according to the emotion degree estimation model corresponding to the category identifier, in combination with the feature set of the short text to be processed, wherein the emotion degree estimation model is a model that is obtained by training on a number of short text samples with sentiment orientation corresponding to the category identifier and that outputs a positive emotion degree and a negative emotion degree;
    determining the sentiment orientation corresponding to the short text to be processed based on the positive emotion degree and the negative emotion degree corresponding to the short text to be processed.
  12. The method according to claim 11, characterized in that determining the feature set corresponding to the short text to be processed comprises:
    obtaining the word segmentation result produced by performing a word segmentation operation on the short text to be processed;
    combining the word segments using an n-gram language model to obtain a number of combined word segments;
    determining the set of the word segments and the combined word segments as the feature set of the short text to be processed, each word segment corresponding to one feature.
  13. The method according to claim 11, characterized in that determining the feature set corresponding to the short text to be processed comprises:
    obtaining the word segmentation result produced by performing a word segmentation operation on the short text to be processed;
    determining the word segmentation result as the feature set of the short text to be processed, each word segment corresponding to one feature.
  14. The method according to claim 11, characterized in that, after determining the sentiment orientation corresponding to the short text to be processed, the method further comprises:
    outputting the sentiment orientation corresponding to the short text to be processed.
  15. A system for recognizing sentiment orientation, characterized by comprising:
    a data providing device, configured to send a number of objects;
    a processor, configured to receive the objects sent by the data providing device, build an emotion degree estimation model from the short texts of the objects, and determine the sentiment orientation of a short text to be processed using the emotion degree estimation model.
  16. The system according to claim 15, characterized in that
    the processor is further configured to build the correspondence between the emotion degree estimation model and the category identifier to which an object belongs.
  17. The system according to claim 15, characterized in that the system further comprises a receiving device;
    the processor is further configured to output the sentiment orientation of the text to be processed;
    the receiving device is configured to receive the sentiment orientation of the text to be processed.
  18. A system for recognizing sentiment orientation, characterized by comprising:
    a data providing device, configured to send a number of objects;
    a model construction device, configured to receive the objects sent by the data providing device, build an emotion degree estimation model from the short texts of the objects, and send the emotion degree estimation model;
    a processor, configured to receive the emotion degree estimation model and determine the sentiment orientation of a short text to be processed using the emotion degree estimation model.
  19. The system according to claim 18, characterized in that
    the model construction device is further configured to build the correspondence between the emotion degree estimation model and the category identifier to which an object belongs, and to send the correspondence to the processor.
  20. The system according to claim 18, characterized in that the system further comprises a receiving device;
    the processor is further configured to output the sentiment orientation of the text to be processed;
    the receiving device is configured to receive the sentiment orientation of the text to be processed.
  21. An object classification method, characterized by comprising:
    determining the feature information of an object to be processed, wherein the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text;
    performing category identification on the feature information of the object to be processed according to a pre-trained classification identification model, wherein the classification identification model is a classifier of a first category and a second category obtained by training on the feature information of a number of object samples.
  22. The method according to claim 21, characterized in that the feature information further includes:
    feature information of a first subject that created the object; and/or
    feature information of a second subject to which the object is attached.
  23. The method according to claim 21, characterized in that performing category identification on the feature information according to the pre-trained classification identification model comprises:
    inputting the feature information into the classification identification model, and determining the first category matching degree and the second category matching degree corresponding to the object to be processed;
    comparing the first category matching degree with the second category matching degree;
    if the first category matching degree is greater than the second category matching degree, determining that the category of the object to be processed is the first category;
    if the second category matching degree is greater than the first category matching degree, determining that the category of the object to be processed is the second category.
  24. The method according to claim 23, characterized by further comprising:
    after determining that the object to be processed is of the first category, adding the object to be processed to an object set;
    sending the objects in the object set.
  25. The method according to claim 24, characterized by further comprising:
    receiving a plurality of object samples, wherein the object samples come from the object set and satisfy a preset rule;
    adding the plurality of object samples to the existing object samples used to train the classification identification model;
    retraining the classification identification model based on the updated existing object samples.
  26. A method for classifying user evaluations, characterized by comprising:
    determining the feature information of a user evaluation to be processed, wherein the feature information includes the text feature information of the user evaluation, the image feature information of the user evaluation, the feature information of the seller and the feature information of the buyer, and the text feature information includes the sentiment orientation of a short text;
    performing category identification on the feature information of the user evaluation to be processed according to a pre-trained gradient boosting decision tree model, wherein the classification identification model is a classifier of first-class user evaluations and second-class user evaluations obtained by training on the feature information of a number of user evaluation samples.
  27. The method according to claim 26, characterized by further comprising:
    after determining that the user evaluation to be processed is a first-class user evaluation, adding the user evaluation to be processed to a first-class user evaluation set;
    sending the first-class user evaluation set.
  28. The method according to claim 26, characterized by further comprising:
    receiving a plurality of first-class user evaluations, wherein the first-class user evaluations come from the first-class user evaluation set;
    adding the plurality of first-class user evaluations to the existing user evaluation samples of the classification identification model;
    retraining the classification identification model based on the updated existing user evaluation samples.
  29. An object classification system, characterized by comprising:
    a data providing device, configured to send a number of objects;
    a processor, configured to receive the objects sent by the data providing device and to obtain, by training on the feature information of the objects, a classification identification model that outputs a first category and a second category; to determine the feature information of an object to be processed, wherein the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text; to perform category identification on the feature information of the object to be processed according to the classification identification model; and further to output objects of the first category;
    a data receiver, configured to receive and use the objects of the first category.
  30. An object classification system, characterized by comprising:
    a data providing device, configured to send a number of objects;
    a model construction device, configured to receive the objects sent by the data providing device, to obtain, by training on the feature information of the objects, a classification identification model that outputs a first category and a second category, and to send the classification identification model;
    a processor, configured to receive the classification identification model and to determine the feature information of an object to be processed, wherein the feature information includes text feature information and image feature information, and the text feature information includes the sentiment orientation of a short text; to perform category identification on the feature information of the object to be processed according to the classification identification model; and further to output objects of the first category;
    a data receiver, configured to receive and use the objects of the first category.
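As an illustration of claims 4 to 8, the feature construction and the confidence-based decision can be sketched as below. This is a sketch under stated assumptions, not the claimed implementation: the `|` and `+` separators, the function names, the reading of the bigram combination as joining adjacent features, the tie-breaking toward "positive", and the `None` result for a low-confidence case are all hypothetical choices, because the claims do not specify them. Word segmentation is assumed to have been done already.

```python
from typing import List, Optional


def build_features(tokens: List[str], category_id: str,
                   use_bigrams: bool = True) -> List[str]:
    """Build the feature set of one short text from its word segments."""
    # Claim 4: pair every word segment with the category identifier.
    unigrams = [f"{category_id}|{tok}" for tok in tokens]
    if not use_bigrams:
        return unigrams
    # Claims 5 and 6: combine adjacent features (bigram reading, assumed).
    bigrams = [f"{a}+{b}" for a, b in zip(unigrams, unigrams[1:])]
    return unigrams + bigrams


def decide_orientation(positive: float, negative: float,
                       confidence: float) -> Optional[str]:
    """Claim 8: accept the larger degree only if it exceeds the threshold."""
    if positive >= negative:  # ties go to "positive" (assumption)
        label, larger = "positive", positive
    else:
        label, larger = "negative", negative
    # The claims do not say what happens below the threshold; None is assumed.
    return label if larger > confidence else None
```

Pairing each word segment with the category identifier lets a single model shared by several categories learn that the same word can carry different sentiment in different categories.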
CN201610812853.4A 2016-09-09 2016-09-09 Recognition methods, object classification method and the data handling system of Sentiment orientation Pending CN107807914A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201610812853.4A CN107807914A (en) 2016-09-09 2016-09-09 Recognition methods, object classification method and the data handling system of Sentiment orientation
TW106123845A TW201812615A (en) 2016-09-09 2017-07-17 Sentiment orientation recognition method, object classification method and data processing system
PCT/CN2017/100060 WO2018045910A1 (en) 2016-09-09 2017-08-31 Sentiment orientation recognition method, object classification method and data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610812853.4A CN107807914A (en) 2016-09-09 2016-09-09 Recognition methods, object classification method and the data handling system of Sentiment orientation

Publications (1)

Publication Number Publication Date
CN107807914A true CN107807914A (en) 2018-03-16

Family

ID=61562512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610812853.4A Pending CN107807914A (en) 2016-09-09 2016-09-09 Recognition methods, object classification method and the data handling system of Sentiment orientation

Country Status (3)

Country Link
CN (1) CN107807914A (en)
TW (1) TW201812615A (en)
WO (1) WO2018045910A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036570A (en) * 2018-05-31 2018-12-18 北京云知声信息技术有限公司 The filter method and system of the non-case history content of Ultrasonography
CN109271627A (en) * 2018-09-03 2019-01-25 深圳市腾讯网络信息技术有限公司 Text analyzing method, apparatus, computer equipment and storage medium
CN109299782A (en) * 2018-08-02 2019-02-01 北京奇安信科技有限公司 A kind of data processing method and device based on deep learning model
CN109492226A (en) * 2018-11-10 2019-03-19 上海文军信息技术有限公司 A method of it improving the low text of Sentiment orientation accounting and prejudges accuracy rate
CN109871807A (en) * 2019-02-21 2019-06-11 百度在线网络技术(北京)有限公司 Face image processing process and device
CN110516416A (en) * 2019-08-06 2019-11-29 咪咕文化科技有限公司 Auth method, verifying end and client
CN110929026A (en) * 2018-09-19 2020-03-27 阿里巴巴集团控股有限公司 Abnormal text recognition method and device, computing equipment and medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344257A (en) * 2018-10-24 2019-02-15 平安科技(深圳)有限公司 Text emotion recognition methods and device, electronic equipment, storage medium
CN109684627A (en) * 2018-11-16 2019-04-26 北京奇虎科技有限公司 A kind of file classification method and device
CN110032645B (en) * 2019-04-17 2021-02-09 携程旅游信息技术(上海)有限公司 Text emotion recognition method, system, device and medium
CN111506733B (en) * 2020-05-29 2022-06-28 广东太平洋互联网信息服务有限公司 Object portrait generation method and device, computer equipment and storage medium
CN114443849B (en) 2022-02-09 2023-10-27 北京百度网讯科技有限公司 Labeling sample selection method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510254A (en) * 2009-03-25 2009-08-19 北京中星微电子有限公司 Method for updating gender classifier in image analysis and the gender classifier
US20110251973A1 (en) * 2010-04-08 2011-10-13 Microsoft Corporation Deriving statement from product or service reviews
CN102682124A (en) * 2012-05-16 2012-09-19 苏州大学 Emotion classifying method and device for text
CN103365867A (en) * 2012-03-29 2013-10-23 腾讯科技(深圳)有限公司 Method and device for emotion analysis of user evaluation
CN105005560A (en) * 2015-08-26 2015-10-28 苏州大学张家港工业技术研究院 Maximum entropy model-based evaluation type emotion sorting method and system
CN105069072A (en) * 2015-07-30 2015-11-18 天津大学 Emotional analysis based mixed user scoring information recommendation method and apparatus
CN105095181A (en) * 2014-05-19 2015-11-25 株式会社理光 Spam comment detection method and device
CN105550269A (en) * 2015-12-10 2016-05-04 复旦大学 Product comment analyzing method and system with learning supervising function

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN103455562A (en) * 2013-08-13 2013-12-18 西安建筑科技大学 Text orientation analysis method and product review orientation discriminator on basis of same

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510254A (en) * 2009-03-25 2009-08-19 北京中星微电子有限公司 Method for updating gender classifier in image analysis and the gender classifier
US20110251973A1 (en) * 2010-04-08 2011-10-13 Microsoft Corporation Deriving statement from product or service reviews
CN103365867A (en) * 2012-03-29 2013-10-23 腾讯科技(深圳)有限公司 Method and device for emotion analysis of user evaluation
CN102682124A (en) * 2012-05-16 2012-09-19 苏州大学 Emotion classifying method and device for text
CN105095181A (en) * 2014-05-19 2015-11-25 株式会社理光 Spam comment detection method and device
CN105069072A (en) * 2015-07-30 2015-11-18 天津大学 Emotional analysis based mixed user scoring information recommendation method and apparatus
CN105005560A (en) * 2015-08-26 2015-10-28 苏州大学张家港工业技术研究院 Maximum entropy model-based evaluation type emotion sorting method and system
CN105550269A (en) * 2015-12-10 2016-05-04 复旦大学 Product comment analyzing method and system with learning supervising function

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHOU-SHAN LI ET AL.: "Multi-Domain Sentiment Classification with Classifier Combination", 《JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY》 *
SHOUSHAN LI ET AL.: "Multi-domain Sentiment Classification", 《PROCEEDINGS OF ACL-08: HLT, SHORT PAPERS (COMPANION VOLUME)》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036570A (en) * 2018-05-31 2018-12-18 北京云知声信息技术有限公司 The filter method and system of the non-case history content of Ultrasonography
CN109036570B (en) * 2018-05-31 2021-08-31 云知声智能科技股份有限公司 Method and system for filtering non-medical record content of ultrasound department
CN109299782A (en) * 2018-08-02 2019-02-01 北京奇安信科技有限公司 A kind of data processing method and device based on deep learning model
CN109271627A (en) * 2018-09-03 2019-01-25 深圳市腾讯网络信息技术有限公司 Text analyzing method, apparatus, computer equipment and storage medium
CN109271627B (en) * 2018-09-03 2023-09-05 深圳市腾讯网络信息技术有限公司 Text analysis method, apparatus, computer device and storage medium
CN110929026A (en) * 2018-09-19 2020-03-27 阿里巴巴集团控股有限公司 Abnormal text recognition method and device, computing equipment and medium
CN110929026B (en) * 2018-09-19 2023-04-25 阿里巴巴集团控股有限公司 Abnormal text recognition method, device, computing equipment and medium
CN109492226A (en) * 2018-11-10 2019-03-19 上海文军信息技术有限公司 A method of it improving the low text of Sentiment orientation accounting and prejudges accuracy rate
CN109492226B (en) * 2018-11-10 2023-03-24 上海五节数据科技有限公司 Method for improving low text pre-segmentation accuracy rate of emotional tendency proportion
CN109871807A (en) * 2019-02-21 2019-06-11 百度在线网络技术(北京)有限公司 Face image processing process and device
CN110516416A (en) * 2019-08-06 2019-11-29 咪咕文化科技有限公司 Auth method, verifying end and client
CN110516416B (en) * 2019-08-06 2021-08-06 咪咕文化科技有限公司 Identity authentication method, authentication end and client

Also Published As

Publication number Publication date
WO2018045910A1 (en) 2018-03-15
TW201812615A (en) 2018-04-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1252505

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20180316
