CN105824898A - Label extracting method and device for network comments - Google Patents
- Publication number
- CN105824898A (application CN201610143169.1A)
- Authority
- CN
- China
- Prior art keywords
- comment
- word
- short sentence
- emotion
- mark
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a label extraction method and device for network comments. The method comprises: annotating the comment object and the emotion category of each comment short sentence; counting by comment object, that is, for each comment object, counting the number of comment short sentences whose emotion category is positive and the number whose emotion category is negative; and extracting the statistical result as a label. Compared with methods that extract labels only by semantic deduplication of comment short sentences, the label contains both the object that the comment short sentences address and the numbers of positive and negative comments on that object, so that information about one aspect of a product can be shown in a more succinct label form and the user's shopping experience is improved.
Description
Technical field
The present application relates to the field of data processing, and more particularly to a label extraction method and device for network comments.
Background art
With the rapid development of the Internet and e-commerce, both the business processes of traditional enterprises and the behavior patterns of consumers have changed enormously. The online shopping experience keeps improving, so online shopping is becoming more and more popular. Almost all e-commerce merchants encourage or invite consumers to review the goods or services they have bought, and more and more consumers are willing to share their shopping experience and the quality of the purchased goods on e-commerce platforms. As a result, the number of comments on each product on the network grows rapidly, and a single product often has thousands of comments. Taking the iPhone 5s mobile phone in the Jingdong (JD.com) store as an example, by the end of December 2015 it had close to 140,000 user comments. On the one hand, in the big-data era this large volume of comments is a valuable resource for each e-commerce platform; on the other hand, it also brings much inconvenience to merchants and consumers. The sheer volume makes the comments hard to read: few consumers browse thousands of comments before deciding whether to buy a product, so the value of the comments cannot be seen intuitively.
To extract brief, effective descriptions from a massive number of comments so that users can grasp the important information about a product in the shortest time, the traditional approach refines lengthy comments into comment phrases and then extracts labels by semantic deduplication. Results of this approach are displayed as, for example, "Everyone writes" on Taobao, "Everyone thinks" on Dianping, and "Buyer impressions" in the Jingdong store. The defect of semantic-deduplication extraction is that the label information extracted for similar products is duplicated, which in turn harms the user's shopping experience.
Summary of the invention
The problem to be solved by the present application is that existing label extraction methods for network comments rely on semantic deduplication, so the descriptions extracted for similar products are duplicated, which harms the user's shopping experience.
To solve this problem, the following scheme is proposed:
A label extraction method for network comments, based on an entity knowledge base, wherein the entity knowledge base contains attribute words of multiple domains and the attribute words are used to annotate the comment objects of comment short sentences, the method comprising:
Obtaining network comment information;
Splitting the network comment information with punctuation marks as separators to obtain a number of comment short sentences;
Annotating the emotion category of each comment short sentence; when the emotion category of a comment short sentence is annotated as neutral emotion, ending the annotation of that comment short sentence; when it is annotated as positive emotion or negative emotion, annotating the comment object of that comment short sentence;
Counting by comment object: for each comment object, counting the number of comment short sentences whose emotion category is positive emotion and the number of comment short sentences whose emotion category is negative emotion;
Extracting the statistical result as a label, the statistical result comprising the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
Preferably, annotating the comment object of the comment short sentence comprises:
Performing word segmentation and part-of-speech tagging on the comment short sentence;
Determining whether a word whose part-of-speech tag is noun exists among the attribute words of the entity knowledge base;
If it exists, annotating the comment object of the comment short sentence containing the word as that word; if it does not exist, calculating the similarity between the word and each attribute word in the entity knowledge base;
Annotating the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
Preferably, before calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
Determining whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, after calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
Determining whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, annotating the emotion category of each comment short sentence comprises:
Performing word segmentation on the comment short sentence to obtain a number of words;
Determining whether the words exist in a sentiment dictionary; if none of them exist, ending the annotation of the comment short sentence; if at least some of them exist, searching for a negation word among the several words adjacent to each word that exists in the dictionary;
If a negation word is found, changing the emotional meaning of the word to the opposite of its original emotional meaning;
Comparing the number of words with a positive emotional meaning and the number of words with a negative emotional meaning in the comment short sentence; if the number of positive words is larger, annotating the emotion category of the comment short sentence as positive emotion; if the number of negative words is larger, annotating it as negative emotion; if the two are equal, annotating it as neutral emotion.
A label extraction device for network comments, based on an entity knowledge base, wherein the entity knowledge base contains attribute words of multiple domains and the attribute words are used to annotate the comment objects of comment short sentences, the device comprising:
A comment acquiring unit, configured to obtain network comment information;
A comment splitting unit, configured to split the network comment information with punctuation marks as separators to obtain a number of comment short sentences;
An emotion annotation unit, configured to annotate the emotion category of each comment short sentence and, when the emotion category of a comment short sentence is annotated as neutral emotion, to end the annotation of that comment short sentence;
A comment object annotation unit, configured to annotate the comment object of a comment short sentence when its emotion category is annotated as positive emotion or negative emotion;
A statistics unit, configured to count by comment object: for each comment object, the number of comment short sentences whose emotion category is positive emotion and the number of comment short sentences whose emotion category is negative emotion;
A label extraction unit, configured to extract the statistical result as a label, the statistical result comprising the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
Preferably, the comment object annotation unit comprises:
A part-of-speech tagging subunit, configured to perform word segmentation and part-of-speech tagging on the comment short sentence;
A first judgment subunit, configured to determine whether a word whose part-of-speech tag is noun exists among the attribute words of the entity knowledge base; if it exists, to annotate the comment object of the comment short sentence containing the word as that word; if it does not exist, to calculate the similarity between the word and each attribute word in the entity knowledge base;
A comment object annotation subunit, configured to annotate the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
Preferably, the comment object annotation unit further comprises:
A second judgment subunit, configured to determine whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if it does not, to end the annotation of the comment short sentence; if it does, to continue the annotation of the comment short sentence.
Preferably, the comment object annotation unit further comprises:
A third judgment subunit, configured to determine whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, to end the annotation of the comment short sentence; if it does, to continue the annotation of the comment short sentence.
Preferably, the emotion annotation unit comprises:
A word segmentation unit, configured to perform word segmentation on the comment short sentence to obtain a number of words;
A fourth judgment unit, configured to determine whether the words exist in a sentiment dictionary; if none of them exist, to end the annotation of the comment short sentence; if at least some of them exist, to search for a negation word among the several words adjacent to each word that exists in the dictionary and, if a negation word is found, to change the emotional meaning of the word to the opposite of its original emotional meaning;
An emotion annotation subunit, configured to compare the number of words with a positive emotional meaning and the number of words with a negative emotional meaning in the comment short sentence; if the number of positive words is larger, to annotate the emotion category of the comment short sentence as positive emotion; if the number of negative words is larger, to annotate it as negative emotion; if the two are equal, to annotate it as neutral emotion.
As can be seen from the above technical scheme, the label extraction method for network comments disclosed in the present application is based on an entity knowledge base that contains attribute words of multiple domains, the attribute words being used to annotate the comment objects of comment short sentences. The method annotates the comment object and the emotion category of each comment short sentence. It then counts by comment object, that is, for each comment object it counts the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion, and extracts the statistical result as a label. Compared with methods that extract labels only by semantic deduplication of comment short sentences, the label contains the object that the comment short sentences address as well as the numbers of positive and negative comments on that object, so information about one aspect of a product can be shown in a more succinct label form, improving the user's shopping experience.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a label extraction method for network comments disclosed in this embodiment;
Fig. 2 is a schematic diagram of a method for annotating the comment object of a comment short sentence disclosed in this embodiment;
Fig. 3 is a schematic diagram of a method for annotating the emotion category of a comment short sentence disclosed in this embodiment;
Fig. 4 is a schematic diagram of a label extraction device for network comments disclosed in this embodiment;
Fig. 5 is a schematic diagram of a comment object annotation unit disclosed in this embodiment;
Fig. 6 is a schematic diagram of an emotion annotation unit disclosed in this embodiment;
Fig. 7 is a schematic diagram of the display after label extraction for network comments disclosed in this embodiment.
Detailed description of the invention
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.
This embodiment discloses a label extraction method for network comments that is based on an entity knowledge base. The entity knowledge base includes attribute words of multiple domains; for example, the attribute words for the mobile phone domain include appearance, battery life, processor, and system. Attribute words are used to annotate the comment object of a comment short sentence; for example, the attribute word "appearance" is used to annotate the comment object of the comment short sentence "the appearance is very beautiful" as appearance. Because of the particular nature of network comment corpora, the attribute words of different domains in the entity knowledge base have different hierarchical structures. Some attribute words have many levels: the top-level attribute words of the hotel domain are food, environment, price, service, drinks, and distance; the hyponyms of food include ingredients and dishes; the hyponyms of ingredients include meat and vegetables; and the hyponyms of meat include poultry, livestock, and game. Other attribute words, such as price, may have no hyponyms at all.
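One way to picture the hierarchy described above is a nested mapping from each attribute word to its hyponyms. The sketch below is only an illustration: the patent does not specify a storage format, and the names `HOTEL_DOMAIN` and `all_attribute_words` are hypothetical. The entries themselves are the hotel-domain examples given in the text.

```python
# Hypothetical in-memory form of one domain of the entity knowledge base.
# Each key is an attribute word; its value holds that word's hyponyms.
# An empty dict means the attribute word has no hyponyms (e.g. "price").
HOTEL_DOMAIN = {
    "food": {
        "ingredients": {
            "meat": {"poultry": {}, "livestock": {}, "game": {}},
            "vegetables": {},
        },
        "dishes": {},
    },
    "environment": {},
    "price": {},
    "service": {},
    "drinks": {},
    "distance": {},
}


def all_attribute_words(tree):
    """Flatten the hierarchy into a list of attribute words at every level."""
    words = []
    for word, hyponyms in tree.items():
        words.append(word)
        words.extend(all_attribute_words(hyponyms))
    return words
```

Flattening the tree gives the full set of attribute words that a noun from a comment short sentence can be matched against, regardless of level.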
Because of the particular nature of network comment corpora, the entity knowledge base is built mainly from a network comment corpus. First, network comment information is obtained from the major e-commerce platforms, such as Taobao and Jingdong. The comment information is then processed to obtain comment short sentences, word segmentation and part-of-speech tagging are performed, and all nouns are extracted. After low-frequency nouns and nouns that appear in only a few products are filtered out, the remaining nouns serve as the attribute words at each level of the different domains and are used to build the entity knowledge base.
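The noun filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the input is assumed to be already segmented and tagged, the noun tag is assumed to be `"n"`, and the function name and both cut-off values (`min_count`, `min_products`) are hypothetical, since the text gives no concrete numbers.

```python
from collections import Counter, defaultdict


def candidate_attribute_words(tagged_sentences, min_count=2, min_products=2):
    """Collect nouns that are frequent enough and appear in enough products.

    tagged_sentences: iterable of (product_id, [(word, pos_tag), ...]).
    The tag "n" for nouns and both thresholds are illustrative assumptions.
    """
    frequency = Counter()
    products = defaultdict(set)
    for product_id, tokens in tagged_sentences:
        for word, tag in tokens:
            if tag == "n":
                frequency[word] += 1
                products[word].add(product_id)
    # Drop low-frequency nouns and nouns seen in too few products.
    return sorted(
        word
        for word in frequency
        if frequency[word] >= min_count and len(products[word]) >= min_products
    )
```

The surviving nouns are only candidates; arranging them into levels for each domain is a separate step that the text does not detail.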
Fig. 1 is a schematic diagram of a label extraction method for network comments disclosed in this embodiment.
Referring to Fig. 1, the label extraction method for network comments is based on an entity knowledge base that contains attribute words of multiple domains, the attribute words being used to annotate the comment objects of comment short sentences. The method includes:
Step S11: obtain network comment information.
For example, obtain the network comment information for a product in an online shop on an e-commerce platform such as Taobao, Jingdong, Qunar, Ctrip, or Dianping.
Step S12: split the network comment information with punctuation marks as separators to obtain a number of comment short sentences.
Step S12 preprocesses the network comment information. A single piece of network comment information may comment on a product from several angles, so punctuation marks such as ",", ".", "?", "!" and ":" are used as separators to divide lengthy network comment information into comment short sentences.
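The splitting in step S12 can be sketched with a regular expression. This is a minimal sketch, not the patent's implementation: the function name is hypothetical, and the separator set (full-width and ASCII comma, period, question mark, exclamation mark, colon, plus the semicolon) is an assumption that extends the examples listed in the text.

```python
import re

# Assumed separator set: the punctuation named in step S12 in both
# full-width and ASCII forms, plus the semicolon as an illustrative extra.
_SEPARATORS = re.compile(r"[,，.。?？!！:：;；]+")


def split_into_short_sentences(comment):
    """Split one piece of comment text into non-empty comment short sentences."""
    return [part.strip() for part in _SEPARATORS.split(comment) if part.strip()]
```

Note that treating the ASCII period as a separator also splits decimals such as "5.5"; a production splitter would need to guard against that.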
Step S13: annotate the emotion category of each comment short sentence. When the emotion category of a comment short sentence is annotated as neutral emotion, end the annotation of that comment short sentence; when it is annotated as positive emotion or negative emotion, perform step S14.
Through the emotion category recognition in step S13, the user's like, dislike, or neutral attitude toward a product can be obtained from the description in a comment short sentence. Annotating the emotion category of comment short sentences helps users understand a given attribute of a product more clearly when shopping. Comment short sentences that express neutral emotion have little reference value for users, so no label is extracted from them.
Step S14: annotate the comment object of the comment short sentence.
Step S14 annotates the comment object of a comment short sentence. Annotating the comment object means identifying the object evaluated in the comment short sentence, that is, which angle or aspect of the product the commenter is commenting on. For example, "the environment of this restaurant is very good" evaluates the restaurant in terms of its environment; the evaluation angle "environment" is identified in the comment short sentence, and the comment object of the short sentence "the environment of this restaurant is very good" is annotated as environment.
Step S15: count by comment object: for each comment object, count the number of comment short sentences whose emotion category is positive emotion and the number of comment short sentences whose emotion category is negative emotion.
In step S15, after the emotion category and the comment object of the comment short sentences have been annotated, statistics are gathered by comment object: for each comment object, the number of comment short sentences expressing positive emotion and the number of comment short sentences expressing negative emotion are counted.
Step S16: extract the statistical result as a label, the statistical result comprising the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
For the label extraction in step S16, a label may take the form "environment (154,145)", which means that in the network comments 154 comment short sentences about the environment express positive emotion and 145 express negative emotion.
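Steps S15 and S16 amount to a grouped count followed by string formatting. The sketch below assumes the annotations arrive as (comment object, emotion category) pairs, with neutral short sentences already discarded in step S13; the function name and the string values "positive" and "negative" are hypothetical. The output format follows the "environment (154,145)" example in the text.

```python
from collections import defaultdict


def extract_labels(annotated):
    """Build labels such as "environment (154,145)" from annotated short sentences.

    annotated: iterable of (comment_object, emotion) pairs, where emotion is
    "positive" or "negative" (illustrative encodings, not from the patent).
    """
    counts = defaultdict(lambda: [0, 0])  # comment object -> [positive, negative]
    for comment_object, emotion in annotated:
        if emotion == "positive":
            counts[comment_object][0] += 1
        elif emotion == "negative":
            counts[comment_object][1] += 1
    return [f"{obj} ({pos},{neg})" for obj, (pos, neg) in counts.items()]
```

Labels come out in the order in which each comment object is first seen; sorting by total count would be an equally reasonable choice for display.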
The label extraction method for network comments disclosed in this embodiment annotates the comment object and the emotion category of each comment short sentence. It then counts by comment object, that is, for each comment object it counts the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion, and extracts the statistical result as a label. Compared with methods that extract labels only by semantic deduplication of comment short sentences, the label contains the object that the comment short sentences address as well as the numbers of positive and negative comments on that object, so information about one aspect of a product can be shown in a more succinct label form, improving the user's shopping experience.
Fig. 2 is a schematic diagram of a method for annotating the comment object of a comment short sentence disclosed in this embodiment.
When step S14 above annotates the comment object of a comment short sentence, a semantic similarity method is used for comment short sentences that contain no attribute word of the entity knowledge base, in order to improve the coverage of the annotation. Referring to Fig. 2, the method includes:
Step S21: perform word segmentation and part-of-speech tagging on the comment short sentence. The Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) can be used for this purpose: the comment short sentence is divided into words, and each word is tagged with its part of speech, such as noun or verb.
Step S22: determine whether a word whose part-of-speech tag is noun exists among the attribute words of the entity knowledge base. If it exists, perform step S23; if it does not, perform step S24.
Step S23: annotate the comment object of the comment short sentence containing the word as that word.
Step S24: calculate the similarity between the word and each attribute word in the entity knowledge base. word2vec is a tool that converts words into vectors; similarity computed in the vector space represents the semantic similarity of the words. If the word does not exist among the attribute words of the knowledge base, the word2vec tool is used to calculate the similarity between the word and each attribute word in the entity knowledge base, and the attribute word with the greatest similarity to the word is found.
Step S25: annotate the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
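Steps S22 to S25 can be sketched as an exact lookup followed by a nearest-neighbor search over word vectors. The sketch below makes several assumptions that the text does not state: cosine similarity is used as the similarity measure, the vectors are supplied as a plain dictionary (in practice they would come from a trained word2vec model), and the function names are hypothetical. The text also does not say how a short sentence with several nouns is resolved; here the first exact match wins, and otherwise the best-scoring pair wins.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def annotate_comment_object(nouns, attribute_words, vectors):
    """Return (attribute_word, similarity) for a short sentence's nouns, or None.

    nouns: nouns found by segmentation and part-of-speech tagging (step S21).
    attribute_words: list of attribute words from the entity knowledge base.
    vectors: dict mapping a word to its vector (assumed to come from word2vec).
    """
    # Steps S22/S23: a noun that is itself an attribute word is the comment object.
    for noun in nouns:
        if noun in attribute_words:
            return noun, 1.0
    # Steps S24/S25: otherwise pick the most similar attribute word.
    best = None
    for noun in nouns:
        if noun not in vectors:
            continue
        for attribute in attribute_words:
            if attribute not in vectors:
                continue
            similarity = cosine(vectors[noun], vectors[attribute])
            if best is None or similarity > best[1]:
                best = (attribute, similarity)
    return best
```

Returning the similarity alongside the attribute word keeps the value available for any later threshold check.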
To improve the accuracy of comment object annotation, a word-frequency threshold and/or a similarity threshold can be set. If a word does not exist among the attribute words of the knowledge base, then before step S24 it is determined whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if it does not, the annotation of the comment short sentence containing the word ends; if it does, step S24 is performed. After step S24, it is determined whether the maximum similarity exceeds a preset similarity threshold; if it does not, the annotation of the comment short sentence containing the word ends; if it does, step S25 is performed.
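The two checks can be sketched as a gate around the similarity computation. This is an illustration only: the function name is hypothetical, the similarity computation is passed in as a callable so that it runs only when the word-frequency check passes, and the default thresholds (100 and 0.35) are illustrative values, not settings prescribed by the text.

```python
def gated_max_similarity(word_count, compute_max_similarity,
                         freq_threshold=100, sim_threshold=0.35):
    """Apply the word-frequency check before step S24 and the similarity check after.

    word_count: occurrences of the word in the network comment information.
    compute_max_similarity: zero-argument callable standing in for step S24.
    Returns the maximum similarity if both thresholds are exceeded, else None.
    """
    if word_count <= freq_threshold:      # does not exceed: end annotation
        return None
    similarity = compute_max_similarity() # step S24 runs only for frequent words
    if similarity <= sim_threshold:       # does not exceed: end annotation
        return None
    return similarity                     # proceed to step S25
```

Both comparisons are strict, following the wording "exceeds" in the text; whether a value exactly at a threshold should pass is not specified.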
To verify the effect of the similarity threshold and the word-frequency threshold on comment object annotation, 500 comment short sentences were taken at random from the part of the corpus that contains no attribute word of the entity knowledge base, and comment object annotation was carried out with different similarity thresholds and word-frequency thresholds. The experimental results are shown in the tables below:

| Similarity threshold | 0 | 0.1 | 0.2 | 0.25 | 0.3 | 0.35 | 0.4 |
| Accuracy | 0.464 | 0.4914 | 0.5683 | 0.6412 | 0.6585 | 0.7079 | 0.6792 |

| Word-frequency threshold | 0 | 10 | 30 | 50 | 80 | 100 | 1000 |
| Similarity threshold | 0.35 | 0.35 | 0.35 | 0.35 | 0.35 | 0.35 | 0.35 |
| Accuracy | 0.7079 | 0.7362 | 0.7624 | 0.7824 | 0.8011 | 0.8032 | 0.8053 |
Fig. 3 is a schematic diagram of a method for annotating the emotion category of a comment short sentence disclosed in this embodiment.
Referring to Fig. 3, the method for annotating the emotion category of each comment short sentence includes:
Step S31: perform word segmentation on the comment short sentence to obtain a number of words. ICTCLAS is used to segment the comment short sentence.
Step S32: determine whether the words exist in a sentiment dictionary. If none of them exist, end the annotation of the comment short sentence; if at least some of them exist, perform step S33.
Commonly used sentiment dictionaries at present include the Chinese sentiment polarity dictionary of National Taiwan University, the Chinese emotion vocabulary ontology of Dalian University of Technology, and the HowNet sentiment analysis word set. The words in a comment short sentence may all be absent from the sentiment dictionary, or may all be present. If none are present, the emotion category cannot be annotated, and annotating the comment object then becomes meaningless, so the annotation of that comment short sentence ends. As long as one word in the comment short sentence appears in the sentiment dictionary, the emotion category expressed by the comment short sentence can be judged.
Step S33: search for a negation word among the several words adjacent to each word that exists in the dictionary.
Step S34: if a negation word is found, change the emotional meaning of the word to the opposite of its original emotional meaning. For example, in the comment short sentence "this mobile phone is not good-looking", the negation word "not" is found, so the polarity of the positive emotion word "good-looking" is negated and it becomes a negative emotion word.
Step S35: compare the number of words with a positive emotional meaning and the number of words with a negative emotional meaning in the comment short sentence. If the former is larger, annotate the emotion category of the comment short sentence as positive emotion; if the latter is larger, annotate it as negative emotion; if the two are equal, annotate it as neutral emotion.
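Steps S32 to S35 can be sketched as a dictionary lookup with a negation window and a majority vote. The sketch assumes the short sentence is already segmented into words (step S31). The function name, the encoding of polarity as +1 or -1, and the window of two words on either side are hypothetical choices; the text says only "several adjacent words".

```python
def annotate_emotion(words, sentiment_dict, negation_words, window=2):
    """Annotate a segmented short sentence as positive, negative, or neutral.

    sentiment_dict maps a word to +1 (positive) or -1 (negative).
    Returns None when no word is in the dictionary (annotation ends, step S32).
    """
    positive = negative = 0
    found = False
    for i, word in enumerate(words):
        polarity = sentiment_dict.get(word)
        if polarity is None:
            continue
        found = True
        # Step S33: look for a negation word among the adjacent words.
        nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if any(w in negation_words for w in nearby):
            polarity = -polarity  # step S34: flip the emotional meaning
        if polarity > 0:
            positive += 1
        else:
            negative += 1
    if not found:
        return None
    # Step S35: compare the two counts.
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"
```

A symmetric window can flip a word because of a negation that actually belongs to a neighboring word; restricting the window to preceding words is a common refinement, but the text does not specify either choice.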
Fig. 4 is a schematic diagram of a label extraction device for network comments disclosed in this embodiment.
Referring to Fig. 4, the label extraction device for network comments disclosed in this embodiment includes:
A comment acquisition unit 11, configured to obtain network comment information.
A comment splitting unit 12, configured to split the network comment information using punctuation marks as separators, obtaining a number of short comment sentences.
A sentiment annotation unit 13, configured to annotate the sentiment category of each short comment sentence; when the sentiment category of a short comment sentence is annotated as neutral, annotation of that sentence ends.
A comment object annotation unit 14, configured to annotate the comment object of a short comment sentence when its sentiment category is annotated as positive or negative.
A statistics unit 15, configured to compile statistics by comment object, counting for each comment object the number of short comment sentences whose sentiment category is positive and the number whose sentiment category is negative.
A label extraction unit 16, configured to extract the statistical result as a label, the statistical result including the comment object and the numbers of positive and negative short comment sentences corresponding to that comment object.
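The splitting performed by unit 12 and the counting performed by units 15 and 16 can be sketched as below. This is only an illustration under stated assumptions: the separator set in `PUNCTUATION`, the function names, and the representation of an annotated sentence as a `(comment_object, sentiment)` pair are hypothetical and not taken from the application.

```python
import re
from collections import defaultdict

# Assumed separator set covering common Chinese and English punctuation marks.
PUNCTUATION = r"[,.!?;:，。！？；：、]+"


def split_comment(comment):
    """Split a comment into short sentences on punctuation, dropping empty pieces."""
    return [part.strip() for part in re.split(PUNCTUATION, comment) if part.strip()]


def extract_labels(annotated):
    """Aggregate (comment_object, sentiment) pairs into per-object counts.

    Neutral sentences and sentences without a comment object are skipped,
    mirroring the point at which their annotation ends.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for comment_object, sentiment in annotated:
        if comment_object is None or sentiment not in ("positive", "negative"):
            continue
        counts[comment_object][sentiment] += 1
    return dict(counts)
```

Each key of the returned dictionary, together with its two counts, corresponds to one extracted label.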
Fig. 5 is a schematic diagram of a comment object annotation unit disclosed in this embodiment.
Referring to Fig. 5, the comment object annotation unit 14 includes:
A part-of-speech tagging sub-unit 141, configured to perform word segmentation and part-of-speech tagging on the short comment sentence.
A first judgment sub-unit 142, configured to determine whether a word tagged as a noun exists among the attribute words of the entity knowledge base; if it does, the comment object of the short comment sentence containing the word is annotated as that word; if it does not, the similarity between the word and each attribute word in the entity knowledge base is calculated.
A comment object annotation sub-unit 143, configured to annotate the comment object of the short comment sentence containing the word as the attribute word most similar to the word.
To improve the accuracy of comment object annotation, the comment object annotation unit may further include a second judgment sub-unit and/or a third judgment sub-unit. The second judgment sub-unit is configured to determine whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if not, annotation of the short comment sentence ends; if so, annotation continues. The third judgment sub-unit is configured to determine whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if not, annotation of the short comment sentence ends; if so, annotation continues.
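The lookup, frequency check and similarity check described for sub-units 142 and 143 and the optional second and third judgment sub-units can be sketched as follows. The attribute word list, the threshold values and the function names are hypothetical. In particular, `difflib.SequenceMatcher` is used only as a stand-in similarity measure so that the sketch runs with the standard library; it is not the similarity computation of the application.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical attribute words standing in for the entity knowledge base.
ATTRIBUTE_WORDS = ["noodles", "environment", "service", "price"]


def similarity(a, b):
    """Stand-in string similarity in [0, 1] (an assumption, see lead-in)."""
    return SequenceMatcher(None, a, b).ratio()


def annotate_word(word, corpus_counts, min_freq=2, min_sim=0.6):
    """Return the attribute word used as comment object, or None if annotation ends."""
    # First judgment: the noun is itself an attribute word.
    if word in ATTRIBUTE_WORDS:
        return word
    # Second judgment: the word must occur more often than the frequency threshold.
    if corpus_counts.get(word, 0) <= min_freq:
        return None
    # Pick the attribute word with the highest similarity to the noun.
    best = max(ATTRIBUTE_WORDS, key=lambda attr: similarity(word, attr))
    # Third judgment: the maximum similarity must exceed the similarity threshold.
    if similarity(word, best) <= min_sim:
        return None
    return best
```

Here `corpus_counts` is assumed to be a `Counter` (or plain dictionary) of noun occurrences over the whole body of comment information, so that rare nouns are discarded before any similarity is computed.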
Fig. 6 is a schematic diagram of a sentiment annotation unit disclosed in this embodiment.
Referring to Fig. 6, the sentiment annotation unit includes:
A word segmentation unit 131, configured to segment the short comment sentence into a number of words.
A fourth judging unit 132, configured to determine whether the words exist in a sentiment dictionary; if none of them exist, annotation of the short comment sentence ends; if some of them exist, a negation word is searched for among the several words adjacent to each word that exists, and if a negation word is found, the sentiment of that word is changed to the opposite of its original sentiment.
A sentiment annotation sub-unit 133, configured to compare the number of words with positive sentiment in the short comment sentence against the number of words with negative sentiment; if the former is larger, the sentiment category of the short comment sentence is annotated as positive; if the latter is larger, it is annotated as negative; if the two are equal, it is annotated as neutral.
After labels are extracted from network comments by the label extraction method disclosed in this application, they are displayed in layers. Referring to Fig. 7, the left side of the page is the label area 1 and the right side is the comment area 2. When the page opens, the label area shows the first-layer attribute word labels "food (121.111), environment (245.152) ...". When the user clicks the first-layer attribute word label "food (121.111)", the next layer of attribute words under food is displayed; clicking "wheaten food (5.4)" among them displays the next layer again, and so on down to the last layer of attribute words designed in the entity knowledge base. Each attribute word label carries the numbers of positive and negative user evaluations of the commodity. The comment area shows the comments corresponding to the clicked attribute, with positive and negative evaluations marked in different colors so that users can browse them easily.
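The layered display can be modelled as a small tree in which clicking a label reveals only its direct children. The sketch below is illustrative: the `LabelNode` class and `layer_labels` function are hypothetical, and reading the two numbers in a label such as "food (121.111)" as the positive count followed by the negative count is an assumption about the example's format.

```python
from dataclasses import dataclass, field


@dataclass
class LabelNode:
    """One attribute word label with its evaluation counts and sub-attributes."""
    name: str
    positive: int = 0
    negative: int = 0
    children: list = field(default_factory=list)

    def text(self):
        # Mirrors the "name (positive.negative)" style of the example labels
        # (the order of the two counts is assumed).
        return f"{self.name} ({self.positive}.{self.negative})"


def layer_labels(node):
    """Labels shown when `node` is clicked: its direct children only."""
    return [child.text() for child in node.children]


# Hypothetical tree built from the labels quoted in the description.
root = LabelNode("root", children=[
    LabelNode("food", 121, 111, [LabelNode("wheaten food", 5, 4)]),
    LabelNode("environment", 245, 152),
])
```

Calling `layer_labels(root)` gives the first-layer labels, and calling it on the "food" node gives the layer beneath it.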
Since the device embodiment essentially corresponds to the method embodiment, the relevant parts can be found in the description of the method embodiment. The device embodiment described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "includes", "comprises" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article or device. Without further limitation, an element introduced by the statement "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments can be cross-referenced.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A label extraction method for network comments, characterized in that it is based on an entity knowledge base, the entity knowledge base containing attribute words of multiple fields, the attribute words being used to annotate the comment objects of short comment sentences, the method comprising:
obtaining network comment information;
splitting the network comment information using punctuation marks as separators to obtain a number of short comment sentences;
annotating the sentiment category of each short comment sentence; when the sentiment category of a short comment sentence is annotated as neutral, ending the annotation of that short comment sentence; when the sentiment category of a short comment sentence is annotated as positive or negative, annotating the comment object of that short comment sentence;
compiling statistics by comment object, counting for each comment object the number of short comment sentences whose sentiment category is positive and the number of short comment sentences whose sentiment category is negative;
extracting the statistical result as a label, the statistical result including the comment object and the numbers of positive and negative short comment sentences corresponding to the comment object.
2. The method according to claim 1, characterized in that annotating the comment object of the short comment sentence comprises:
performing word segmentation and part-of-speech tagging on the short comment sentence;
determining whether a word tagged as a noun exists among the attribute words of the entity knowledge base;
if it exists, annotating the comment object of the short comment sentence containing the word as that word; if it does not exist, calculating the similarity between the word and each attribute word in the entity knowledge base;
annotating the comment object of the short comment sentence containing the word as the attribute word most similar to the word.
3. The method according to claim 2, characterized in that, before calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
determining whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if not, ending the annotation of the short comment sentence; if so, continuing the annotation of the short comment sentence.
4. The method according to claim 2, characterized in that, after calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
determining whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if not, ending the annotation of the short comment sentence; if so, continuing the annotation of the short comment sentence.
5. The method according to claim 1, characterized in that annotating the sentiment category of each short comment sentence comprises:
segmenting the short comment sentence into a number of words;
determining whether the words exist in a sentiment dictionary; if none of them exist, ending the annotation of the short comment sentence; if some of them exist, searching for a negation word among the several words adjacent to each word that exists;
if a negation word is found, changing the sentiment of the word to the opposite of its original sentiment;
comparing the number of words with positive sentiment in the short comment sentence against the number of words with negative sentiment; if the number of words with positive sentiment is larger, annotating the sentiment category of the short comment sentence as positive; if the number of words with negative sentiment is larger, annotating it as negative; if the two are equal, annotating it as neutral.
6. A label extraction device for network comments, characterized in that it is based on an entity knowledge base, the entity knowledge base containing attribute words of multiple fields, the attribute words being used to annotate the comment objects of short comment sentences, the device comprising:
a comment acquisition unit, configured to obtain network comment information;
a comment splitting unit, configured to split the network comment information using punctuation marks as separators to obtain a number of short comment sentences;
a sentiment annotation unit, configured to annotate the sentiment category of each short comment sentence and, when the sentiment category of a short comment sentence is annotated as neutral, to end the annotation of that short comment sentence;
a comment object annotation unit, configured to annotate the comment object of a short comment sentence when its sentiment category is annotated as positive or negative;
a statistics unit, configured to compile statistics by comment object, counting for each comment object the number of short comment sentences whose sentiment category is positive and the number of short comment sentences whose sentiment category is negative;
a label extraction unit, configured to extract the statistical result as a label, the statistical result including the comment object and the numbers of positive and negative short comment sentences corresponding to the comment object.
7. The device according to claim 6, characterized in that the comment object annotation unit comprises:
a part-of-speech tagging sub-unit, configured to perform word segmentation and part-of-speech tagging on the short comment sentence;
a first judgment sub-unit, configured to determine whether a word tagged as a noun exists among the attribute words of the entity knowledge base; if it exists, the comment object of the short comment sentence containing the word is annotated as that word; if it does not exist, the similarity between the word and each attribute word in the entity knowledge base is calculated;
a comment object annotation sub-unit, configured to annotate the comment object of the short comment sentence containing the word as the attribute word most similar to the word.
8. The device according to claim 7, characterized in that the comment object annotation unit further comprises:
a second judgment sub-unit, configured to determine whether the number of occurrences of the word in the network comment information exceeds a preset word-frequency threshold; if not, the annotation of the short comment sentence ends; if so, the annotation of the short comment sentence continues.
9. The method according to claim 7, characterized in that the comment object annotation unit further comprises:
a third judgment sub-unit, configured to determine whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if not, the annotation of the short comment sentence ends; if so, the annotation of the short comment sentence continues.
10. The method according to claim 6, characterized in that the sentiment annotation unit comprises:
a word segmentation unit, configured to segment the short comment sentence into a number of words;
a fourth judging unit, configured to determine whether the words exist in a sentiment dictionary; if none of them exist, the annotation of the short comment sentence ends; if some of them exist, a negation word is searched for among the several words adjacent to each word that exists, and if a negation word is found, the sentiment of that word is changed to the opposite of its original sentiment;
a sentiment annotation sub-unit, configured to compare the number of words with positive sentiment in the short comment sentence against the number of words with negative sentiment; if the number of words with positive sentiment is larger, the sentiment category of the short comment sentence is annotated as positive; if the number of words with negative sentiment is larger, it is annotated as negative; if the two are equal, it is annotated as neutral.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610143169.1A CN105824898A (en) | 2016-03-14 | 2016-03-14 | Label extracting method and device for network comments |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105824898A true CN105824898A (en) | 2016-08-03 |
Family
ID=56988091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610143169.1A Pending CN105824898A (en) | 2016-03-14 | 2016-03-14 | Label extracting method and device for network comments |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105824898A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678335A (en) * | 2012-09-05 | 2014-03-26 | 阿里巴巴集团控股有限公司 | Method and device for identifying commodity with labels and method for commodity navigation |
CN104933130A (en) * | 2015-06-12 | 2015-09-23 | 百度在线网络技术(北京)有限公司 | Comment information marking method and comment information marking device |
CN105095288A (en) * | 2014-05-14 | 2015-11-25 | 腾讯科技(深圳)有限公司 | Data analysis method and data analysis device |
CN105354183A (en) * | 2015-10-19 | 2016-02-24 | Tcl集团股份有限公司 | Analytic method, apparatus and system for internet comments of household electrical appliance products |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190104154A (en) * | 2017-01-18 | 2019-09-06 | 알리바바 그룹 홀딩 리미티드 | How to display service objects, how to handle map data, clients and servers |
JP7175276B2 (en) | 2017-01-18 | 2022-11-18 | アリババ・グループ・ホールディング・リミテッド | Method, Client and Server for Displaying Service Objects and Processing Map Data |
KR102446246B1 (en) * | 2017-01-18 | 2022-09-22 | 알리바바 그룹 홀딩 리미티드 | Service object display method, map data processing method, client and server |
JP2020509453A (en) * | 2017-01-18 | 2020-03-26 | アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited | Method for displaying service objects and processing map data, client and server |
CN107436922A (en) * | 2017-07-05 | 2017-12-05 | 北京百度网讯科技有限公司 | Text label generation method and device |
CN107436922B (en) * | 2017-07-05 | 2021-06-08 | 北京百度网讯科技有限公司 | Text label generation method and device |
CN107247709B (en) * | 2017-07-28 | 2021-03-16 | 广州多益网络股份有限公司 | Encyclopedic entry label optimization method and system |
CN107247709A (en) * | 2017-07-28 | 2017-10-13 | 广州多益网络股份有限公司 | The optimization method and system of a kind of encyclopaedia entry label |
CN107491531B (en) * | 2017-08-18 | 2019-05-17 | 华南师范大学 | Chinese network comment sensibility classification method based on integrated study frame |
CN107491531A (en) * | 2017-08-18 | 2017-12-19 | 华南师范大学 | Chinese network comment sensibility classification method based on integrated study framework |
CN109829033B (en) * | 2017-11-23 | 2023-04-18 | 阿里巴巴集团控股有限公司 | Data display method and terminal equipment |
CN109829033A (en) * | 2017-11-23 | 2019-05-31 | 阿里巴巴集团控股有限公司 | Method for exhibiting data and terminal device |
CN108038725A (en) * | 2017-12-04 | 2018-05-15 | 中国计量大学 | A kind of electric business Customer Satisfaction for Product analysis method based on machine learning |
CN109522412A (en) * | 2018-11-14 | 2019-03-26 | 北京神州泰岳软件股份有限公司 | Text emotion analysis method, device and medium |
CN109522412B (en) * | 2018-11-14 | 2021-02-26 | 鼎富智能科技有限公司 | Text emotion analysis method, device and medium |
CN109684641A (en) * | 2018-12-26 | 2019-04-26 | 广东工业大学 | A kind of data extraction device, method, electronic equipment and storage medium |
CN109684641B (en) * | 2018-12-26 | 2023-04-07 | 广东工业大学 | Data extraction device and method, electronic equipment and storage medium |
CN109885687A (en) * | 2018-12-29 | 2019-06-14 | 深兰科技(上海)有限公司 | A kind of sentiment analysis method, apparatus, electronic equipment and the storage medium of text |
CN110378725A (en) * | 2019-06-28 | 2019-10-25 | 联想(北京)有限公司 | A kind of information processing method, terminal and storage medium |
CN110490663A (en) * | 2019-08-23 | 2019-11-22 | 联想(北京)有限公司 | A kind of data processing method, device and electronic equipment |
CN112215014A (en) * | 2020-10-13 | 2021-01-12 | 平安国际智慧城市科技股份有限公司 | Portrait generation method, apparatus, medium and device based on user comment |
CN112800180A (en) * | 2021-02-04 | 2021-05-14 | 北京易车互联信息技术有限公司 | Automatic extraction scheme of comment text labels |
CN114398473A (en) * | 2022-01-19 | 2022-04-26 | 平安国际智慧城市科技股份有限公司 | Enterprise portrait generation method, device, server and storage medium |
CN114692644A (en) * | 2022-03-11 | 2022-07-01 | 粤港澳大湾区数字经济研究院(福田) | Text entity labeling method, device, equipment and storage medium |
CN114692644B (en) * | 2022-03-11 | 2024-06-11 | 粤港澳大湾区数字经济研究院(福田) | Text entity labeling method, device, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Chen Wenliang, Ma Chunping, Zhang Min. Inventor before: Chen Wenliang, Ma Chunping |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160803 |