CN105824898A - Label extracting method and device for network comments - Google Patents

Label extracting method and device for network comments

Info

Publication number
CN105824898A
CN105824898A CN201610143169.1A CN201610143169A CN105824898A CN 105824898 A CN105824898 A CN 105824898A CN 201610143169 A CN201610143169 A CN 201610143169A CN 105824898 A CN105824898 A CN 105824898A
Authority
CN
China
Prior art keywords
comment
word
short sentence
emotion
mark
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610143169.1A
Other languages
Chinese (zh)
Inventor
陈文亮
马春平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201610143169.1A
Publication of CN105824898A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a label extracting method and device for network comments. The label extracting method comprises the following steps: annotating each comment short sentence with its comment target and emotion category; and, for each comment target, counting the number of comment short sentences whose emotion category is positive and the number whose emotion category is negative, and extracting the statistical result as a label. Compared with methods that extract labels by semantic de-duplication of comment short sentences, the label contains both the target commented on and the numbers of positive and negative comments about that target, so that information about one aspect of a product can be shown in a succinct label form and the shopping experience of the user is improved.

Description

A label extraction method and device for network comments
Technical field
The present application relates to the field of data processing and, more specifically, to a label extraction method and device for network comments.
Background
With the rapid development of the Internet and e-commerce, both the business processes of traditional enterprises and the behavior patterns of consumers have changed enormously. The online shopping experience keeps improving, which makes online shopping ever more popular. Almost all e-commerce vendors encourage or invite consumers to review the goods or services they have bought, and more and more consumers are willing to share their shopping experience and the quality of the purchased goods on e-commerce platforms. As a result, the number of comments on each product on the network is growing rapidly, and a single product often has thousands of comments. Taking the iPhone 5s mobile phone on the JD.com mall as an example, by December 2015 it had received close to 140,000 user comments. On the one hand, in the era of big data this large volume of comments is a valuable resource for e-commerce platforms; on the other hand, it also causes considerable inconvenience to merchants and consumers. The sheer volume makes the comments hard to read: few consumers browse thousands of comments before deciding to buy, so the value of the comments cannot be presented intuitively.
To extract brief, effective descriptions from a massive number of comments so that users can learn the important information about a product in the shortest possible time, the traditional approach refines lengthy comments into comment phrases and then extracts labels by semantic de-duplication. Examples of the displayed results are "Everybody writes" on Taobao, "Everybody thinks" on Dazhong Dianping and "Buyer impressions" on the JD.com mall. The drawback of extraction by semantic de-duplication is that the label information extracted for similar products is repetitive, which in turn harms the user's shopping experience.
Summary of the invention
The problem addressed by the present application is that existing label extraction methods for network comments rely on semantic de-duplication, so that the descriptions extracted for similar products are repetitive, which harms the user's shopping experience.
To solve the above problem, the following scheme is proposed:
A label extraction method for network comments, based on an entity knowledge base, the entity knowledge base comprising attribute words of multiple fields, the attribute words being used to annotate the comment object of a comment short sentence, the method including:
Obtaining network comment information;
Splitting the network comment information using punctuation marks as separators to obtain several comment short sentences;
Annotating the emotion category of each comment short sentence; when the emotion category of a comment short sentence is annotated as neutral emotion, ending the annotation of that comment short sentence; when the emotion category is annotated as positive emotion or negative emotion, annotating the comment object of that comment short sentence;
Compiling statistics by comment object: for each comment object, counting the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion;
Extracting the statistical result as a label, the statistical result including the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
Preferably, annotating the comment object of the comment short sentence includes:
Performing word segmentation and part-of-speech tagging on the comment short sentence;
Determining whether a word tagged as a noun is among the attribute words of the entity knowledge base;
If it is, annotating the comment object of the comment short sentence containing the word as that word; if it is not, calculating the similarity between the word and each attribute word in the entity knowledge base;
Annotating the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
Preferably, before calculating the similarity between the word and each attribute word in the entity knowledge base, the method also includes:
Determining whether the number of occurrences of the word in the network comment information exceeds a preset word frequency threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, after calculating the similarity between the word and each attribute word in the entity knowledge base, the method also includes:
Determining whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, annotating the emotion category of each comment short sentence includes:
Performing word segmentation on the comment short sentence to obtain several words;
Determining whether the words are in a sentiment dictionary; if none of them is, ending the annotation of the comment short sentence; if at least some of them are, searching for a negation word among the several words adjacent to each word found in the dictionary;
If a negation word is found, changing the emotional meaning of the word to the opposite of its original emotional meaning;
Comparing the number of words with positive emotional meaning in the comment short sentence with the number of words with negative emotional meaning; if there are more words with positive emotional meaning, annotating the emotion category of the comment short sentence as positive emotion; if there are more words with negative emotional meaning, annotating it as negative emotion; if the two numbers are equal, annotating it as neutral emotion.
A label extraction device for network comments, based on an entity knowledge base, the entity knowledge base comprising attribute words of multiple fields, the attribute words being used to annotate the comment object of a comment short sentence, the device including:
A comment acquiring unit, for obtaining network comment information;
A comment splitting unit, for splitting the network comment information using punctuation marks as separators to obtain several comment short sentences;
An emotion annotation unit, for annotating the emotion category of each comment short sentence and, when the emotion category of a comment short sentence is annotated as neutral emotion, ending the annotation of that comment short sentence;
A comment object annotation unit, for annotating the comment object of a comment short sentence when its emotion category is annotated as positive emotion or negative emotion;
A statistics unit, for compiling statistics by comment object: for each comment object, counting the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion;
A label extraction unit, for extracting the statistical result as a label, the statistical result including the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
Preferably, the comment object annotation unit includes:
A part-of-speech tagging subunit, for performing word segmentation and part-of-speech tagging on the comment short sentence;
A first judging subunit, for determining whether a word tagged as a noun is among the attribute words of the entity knowledge base; if it is, annotating the comment object of the comment short sentence containing the word as that word; if it is not, calculating the similarity between the word and each attribute word in the entity knowledge base;
A comment object annotation subunit, for annotating the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
Preferably, the comment object annotation unit also includes:
A second judging subunit, for determining whether the number of occurrences of the word in the network comment information exceeds a preset word frequency threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, the comment object annotation unit also includes:
A third judging subunit, for determining whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, ending the annotation of the comment short sentence; if it does, continuing the annotation of the comment short sentence.
Preferably, the emotion annotation unit includes:
A word segmentation unit, for performing word segmentation on the comment short sentence to obtain several words;
A fourth judging unit, for determining whether the words are in a sentiment dictionary; if none of them is, ending the annotation of the comment short sentence; if at least some of them are, searching for a negation word among the several words adjacent to each word found in the dictionary; and, if a negation word is found, changing the emotional meaning of the word to the opposite of its original emotional meaning;
An emotion annotation subunit, for comparing the number of words with positive emotional meaning in the comment short sentence with the number of words with negative emotional meaning; if there are more words with positive emotional meaning, annotating the emotion category of the comment short sentence as positive emotion; if there are more words with negative emotional meaning, annotating it as negative emotion; if the two numbers are equal, annotating it as neutral emotion.
As can be seen from the above technical scheme, the label extraction method for network comments disclosed in the present application is based on an entity knowledge base that comprises attribute words of multiple fields, the attribute words being used to annotate the comment object of a comment short sentence. The method annotates each comment short sentence with its comment object and emotion category. Statistics are then compiled by comment object: for each comment object, the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion are counted, and the statistical result is extracted as a label. Compared with methods that extract labels only by semantic de-duplication of comment short sentences, the label contains the object commented on and the numbers of positive and negative comments about that object, so information about one aspect of a product can be shown in a more succinct label form, improving the user's shopping experience.
Brief description of the drawings
To illustrate the technical schemes of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a label extraction method for network comments disclosed in this embodiment;
Fig. 2 is a schematic diagram of a method for annotating the comment object of a comment short sentence disclosed in this embodiment;
Fig. 3 is a schematic diagram of a method for annotating the emotion category of a comment short sentence disclosed in this embodiment;
Fig. 4 is a schematic diagram of a label extraction device for network comments disclosed in this embodiment;
Fig. 5 is a schematic diagram of a comment object annotation unit disclosed in this embodiment;
Fig. 6 is a schematic diagram of an emotion annotation unit disclosed in this embodiment;
Fig. 7 is a schematic diagram of the display after label extraction from network comments disclosed in this embodiment.
Detailed description of the embodiments
The technical schemes in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present application without creative effort fall within the scope of protection of the present application.
This embodiment discloses a label extraction method for network comments that is based on an entity knowledge base. The entity knowledge base includes attribute words of multiple fields; for example, the attribute words for the mobile phone field include appearance, battery life, processor and system. Attribute words are used to annotate the comment object of a comment short sentence; for example, the attribute word "appearance" is used to annotate the comment object of the comment short sentence "the appearance is very beautiful" as "appearance". Because of the particular nature of network comment corpora, the hierarchy of attribute words in the entity knowledge base differs from field to field. Some attribute words have many levels: in the hotel field the top-level attribute words are food, environment, price, service, drinks and distance; the hyponym attribute words of "food" include raw material and dishes; the hyponym attribute words of "raw material" include meat and vegetables; and the hyponym attribute words of "meat" include poultry, livestock and game. Other attribute words, such as "price", may have no hyponym attribute words at all.
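As a minimal sketch of how such a hierarchy could be held in memory, the snippet below stores the hotel-field example as a nested dictionary. The nested-dict layout and the helper names `all_attribute_words` and `path_to` are assumptions made for illustration; the patent does not prescribe a storage format.

```python
# Hypothetical layout: each attribute word maps to a dict of its hyponyms.
# The words are taken from the hotel example above; the structure is assumed.
HOTEL_KB = {
    "food": {
        "raw material": {
            "meat": {"poultry": {}, "livestock": {}, "game": {}},
            "vegetables": {},
        },
        "dishes": {},
    },
    "environment": {},
    "price": {},      # an attribute word with no hyponyms
    "service": {},
    "drinks": {},
    "distance": {},
}


def all_attribute_words(tree):
    """Return every attribute word in the tree, at any level, depth-first."""
    words = []
    for word, hyponyms in tree.items():
        words.append(word)
        words.extend(all_attribute_words(hyponyms))
    return words


def path_to(tree, target, prefix=()):
    """Return the chain of hypernyms ending at target, or None if absent."""
    for word, hyponyms in tree.items():
        path = prefix + (word,)
        if word == target:
            return path
        found = path_to(hyponyms, target, path)
        if found is not None:
            return found
    return None
```

Flattening the tree gives the set of attribute words that a noun is checked against, while `path_to` recovers the broader attributes a hyponym belongs to.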
Because of the particular nature of network comment corpora, the entity knowledge base is built mainly from network comment corpora. First, network comment information is obtained from the major e-commerce platforms, such as Taobao and JD.com. The comment information is then processed into comment short sentences, word segmentation and part-of-speech tagging are performed, and all nouns are extracted. After low-frequency nouns and nouns that occur for only a few products are filtered out, the remaining nouns serve as the attribute words at each level of the different fields and are used to build the entity knowledge base.
Fig. 1 is a schematic diagram of a label extraction method for network comments disclosed in this embodiment.
Referring to Fig. 1, the label extraction method for network comments is based on an entity knowledge base that comprises attribute words of multiple fields, the attribute words being used to annotate the comment object of a comment short sentence. The method includes:
Step S11: obtain network comment information.
For example, obtain the network comment information for a given product in a given online shop on an e-commerce platform such as Taobao, JD.com, Qunar, Ctrip or Dazhong Dianping.
Step S12: split the network comment information using punctuation marks as separators to obtain several comment short sentences.
Step S12 pre-processes the network comment information. A single network comment may comment on the product from several angles, so a lengthy comment is split into comment short sentences using punctuation marks such as ",", "。", "?", "!" and ":" as separators.
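The splitting in step S12 can be sketched with a regular expression. This is an illustrative sketch only: the function name `split_into_short_sentences` is hypothetical, and adding the ASCII forms of the separators alongside the full-width ones is an assumption for mixed-language input.

```python
import re

# Full-width separators as listed for step S12, plus their ASCII forms
# (the ASCII forms are an assumption, not taken from the patent).
SEPARATORS = ",。?!:" + ",.?!:"
_SPLIT_RE = re.compile("[" + re.escape(SEPARATORS) + "]+")


def split_into_short_sentences(comment):
    """Split one comment into short sentences at punctuation marks.

    Runs of adjacent separators count as one, and empty pieces are dropped.
    Note: splitting on the ASCII "." would also break decimals like "5.5".
    """
    pieces = _SPLIT_RE.split(comment)
    return [piece.strip() for piece in pieces if piece.strip()]
```

For instance, `split_into_short_sentences("Nice look, poor battery. Recommended!")` yields three short sentences, each of which can then be annotated on its own.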
Step S13: annotate the emotion category of each comment short sentence. When the emotion category of a comment short sentence is annotated as neutral emotion, end the annotation of that comment short sentence; when it is annotated as positive emotion or negative emotion, perform step S14.
The emotion category identification in step S13 determines, from the descriptive content of a comment short sentence, whether the user likes, dislikes or is neutral about the product. Annotating the emotion category of comment short sentences helps users understand a particular attribute of a product more clearly when shopping. Comment short sentences that express neutral emotion have little reference value for users and are not used for label extraction.
Step S14: annotate the comment object of the comment short sentence.
The comment object annotation in step S14 identifies the object being evaluated in the comment short sentence, that is, the angle or aspect of the product on which the commentator is expressing an opinion. For example, "the environment of this restaurant is very good" evaluates the restaurant in terms of its environment; the evaluation angle "environment" is identified, and the comment object of the short sentence "the environment of this restaurant is very good" is annotated as "environment".
Step S15: compile statistics by comment object: for each comment object, count the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion.
In step S15, once the comment short sentences have been annotated with their emotion categories and comment objects, statistics are compiled by comment object: for each comment object, the number of comment short sentences expressing positive emotion and the number expressing negative emotion are counted.
Step S16: extract the statistical result as a label, the statistical result including the comment object together with the number of positive-emotion comment short sentences and the number of negative-emotion comment short sentences corresponding to that comment object.
For the label extraction in step S16, a label may take the form "environment (154,145)", which means that for the comment object "environment" the network comments contain 154 comment short sentences expressing positive emotion and 145 expressing negative emotion.
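Steps S15 and S16 amount to a grouped count followed by formatting. The sketch below assumes the annotated short sentences arrive as `(comment_object, emotion)` pairs with neutral ones already discarded; the function names and the pair representation are illustrative assumptions.

```python
from collections import defaultdict


def count_by_comment_object(annotated):
    """Step S15 sketch: map each comment object to (positive, negative) counts.

    annotated: iterable of (comment_object, emotion) pairs, where emotion is
    "positive" or "negative". Any other emotion value is ignored.
    """
    counts = defaultdict(lambda: [0, 0])
    for comment_object, emotion in annotated:
        if emotion == "positive":
            counts[comment_object][0] += 1
        elif emotion == "negative":
            counts[comment_object][1] += 1
    return {obj: (pos, neg) for obj, (pos, neg) in counts.items()}


def format_label(comment_object, positive, negative):
    """Step S16 sketch: render a label in the form 'environment (154,145)'."""
    return f"{comment_object} ({positive},{negative})"
```

Keeping the counts separate from the formatting lets the same statistics feed a different display, such as the one shown in Fig. 7.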
The label extraction method for network comments disclosed in this embodiment annotates each comment short sentence with its comment object and with the emotion category it expresses. Statistics are then compiled by comment object: for each comment object, the number of comment short sentences whose emotion category is positive emotion and the number whose emotion category is negative emotion are counted, and the statistical result is extracted as a label. Compared with methods that extract labels only by semantic de-duplication of comment short sentences, the label contains the object commented on and the numbers of positive and negative comments about that object, so information about one aspect of a product can be shown in a more succinct label form, improving the user's shopping experience.
Fig. 2 is a schematic diagram of a method for annotating the comment object of a comment short sentence disclosed in this embodiment.
When the comment object of a comment short sentence is annotated in step S14, semantic similarity is used to annotate comment short sentences that contain no attribute word from the entity knowledge base, in order to improve annotation coverage. Referring to Fig. 2, the method includes:
Step S21: perform word segmentation and part-of-speech tagging on the comment short sentence. The Chinese lexical analysis system ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) can be used to split the comment short sentence into words and to tag the part of speech of each word, for example as a noun or a verb.
Step S22: determine whether a word tagged as a noun is among the attribute words of the entity knowledge base; if it is, perform step S23; if it is not, perform step S24.
Step S23: annotate the comment object of the comment short sentence containing the word as that word.
Step S24: calculate the similarity between the word and each attribute word in the entity knowledge base. word2vec is a tool that converts words into vectors; similarity computed in the vector space represents the semantic similarity of the words. If the word is not among the attribute words of the knowledge base, the word2vec tool is used to calculate the similarity between the word and each attribute word in the entity knowledge base, and the attribute word with the greatest similarity to the word is then found.
Step S25: annotate the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
To improve the accuracy of comment object annotation, a word frequency threshold and/or a similarity threshold can be set. If the word is not among the attribute words of the knowledge base, then before step S24 it is determined whether the number of occurrences of the word in the network comment information exceeds the preset word frequency threshold; if it does not, the annotation of the comment short sentence containing the word ends; if it does, step S24 is performed. After step S24, it is determined whether the maximum similarity exceeds the preset similarity threshold; if it does not, the annotation of the comment short sentence containing the word ends; if it does, step S25 is performed.
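The decision flow of steps S22 to S25, including both optional thresholds, can be sketched as follows. This is a simplified illustration, not the patented implementation: word vectors are passed in as a plain dictionary instead of being loaded from a trained word2vec model, cosine similarity is assumed as the similarity measure, and the function names and default thresholds (chosen only to match values that appear in the experiment) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mark_comment_object(noun, attribute_words, vectors, word_freq,
                        freq_threshold=100, sim_threshold=0.35):
    """Return the comment object for one noun, or None if annotation ends."""
    if noun in attribute_words:                      # S22 -> S23
        return noun
    if word_freq.get(noun, 0) <= freq_threshold:     # frequency check before S24
        return None
    if noun not in vectors:                          # assumption: no vector, give up
        return None
    best_word, best_sim = None, -1.0
    for attr in attribute_words:                     # S24: compare with every attribute word
        if attr in vectors:
            sim = cosine(vectors[noun], vectors[attr])
            if sim > best_sim:
                best_word, best_sim = attr, sim
    if best_word is None or best_sim <= sim_threshold:   # similarity check after S24
        return None
    return best_word                                 # S25
```

The function handles a single noun; how a short sentence with several nouns is resolved is not specified in the text and is left open here.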
To verify the effect of the similarity threshold and the word frequency threshold on comment object annotation, 500 comment short sentences were drawn at random from the corpus that contains no attribute word of the entity knowledge base, and comment objects were annotated using different similarity thresholds and word frequency thresholds. The experimental results are shown in the tables below:
Similarity threshold      0       0.1     0.2     0.25    0.3     0.35    0.4
Accuracy                  0.464   0.4914  0.5683  0.6412  0.6585  0.7079  0.6792

Word frequency threshold  0       10      30      50      80      100     1000
Similarity threshold      0.35    0.35    0.35    0.35    0.35    0.35    0.35
Accuracy                  0.7079  0.7362  0.7624  0.7824  0.8011  0.8032  0.8053
Fig. 3 is a schematic diagram of a method for annotating the emotion category of a comment short sentence disclosed in this embodiment.
Referring to Fig. 3, the method for annotating the emotion category of each comment short sentence includes:
Step S31: perform word segmentation on the comment short sentence to obtain several words. ICTCLAS is used to segment the comment short sentence.
Step S32: determine whether the words are in a sentiment dictionary; if none of them is, end the annotation of the comment short sentence; if at least some of them are, perform step S33.
Commonly used sentiment dictionaries at present include the Chinese sentiment polarity dictionary of National Taiwan University, the Chinese emotion vocabulary ontology of Dalian University of Technology, and the HowNet sentiment analysis word set. The words of a comment short sentence may all be absent from the sentiment dictionary, or they may all be present. If none is present, the emotion category cannot be annotated and annotating the comment object is therefore pointless, so the annotation of that comment short sentence ends. As long as one word of the comment short sentence appears in the sentiment dictionary, the emotion category expressed by the comment short sentence can be determined.
Step S33: search for a negation word among the several words adjacent to each word found in the dictionary.
Step S34: if a negation word is found, change the emotional meaning of the word to the opposite of its original emotional meaning. For example, in the comment short sentence "this mobile phone is not good-looking" the negation word "not" is found, so the polarity of the positive sentiment word "good-looking" is reversed and it is treated as a negative sentiment word.
Step S35: compare the number of words with positive emotional meaning in the comment short sentence with the number of words with negative emotional meaning. If the former is larger, annotate the emotion category of the comment short sentence as positive emotion; if the latter is larger, annotate it as negative emotion; if the two are equal, annotate it as neutral emotion.
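Steps S32 to S35 can be sketched as a dictionary lookup with a negation window. The sketch takes already segmented words as input (segmentation with ICTCLAS is outside its scope). The function name `label_emotion`, the `+1`/`-1` polarity encoding, the symmetric window and its default size of 2 are assumptions; the text only says "several adjacent words".

```python
def label_emotion(words, lexicon, negation_words, window=2):
    """Return "positive", "negative" or "neutral" for one short sentence.

    words: list of segmented words; lexicon: word -> +1 or -1.
    Returns None when no word is in the lexicon (annotation ends, step S32).
    """
    positive = negative = 0
    matched = False
    for i, word in enumerate(words):
        polarity = lexicon.get(word)
        if polarity is None:
            continue
        matched = True
        # S33: look for a negation word within `window` words on either side.
        nearby = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if any(w in negation_words for w in nearby):
            polarity = -polarity                      # S34: reverse the polarity
        if polarity > 0:
            positive += 1
        else:
            negative += 1
    if not matched:
        return None
    if positive > negative:                           # S35: majority decides
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"
```

With a lexicon of `{"good-looking": 1}` and the negation set `{"not"}`, the words `["this", "phone", "not", "good-looking"]` are labeled negative, matching the example in step S34. Double negation is not handled in this sketch.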
Fig. 4 is a schematic diagram of a tag extraction device for network comments disclosed in this embodiment.
Referring to Fig. 4, the tag extraction device for network comments disclosed in this embodiment includes:
A comment acquiring unit 11, configured to obtain network comment information.
A comment splitting unit 12, configured to split the network comment information using punctuation marks as separators, obtaining several comment short sentences.
An emotion labeling unit 13, configured to label the emotional category of each comment short sentence and, when the emotional category of a comment short sentence is labeled as neutral emotion, to terminate the labeling of that comment short sentence.
A comment object labeling unit 14, configured to label the comment object of a comment short sentence when its emotional category is labeled as positive emotion or negative emotion.
A statistics unit 15, configured to compile statistics by comment object, counting for each comment object the number of comment short sentences whose emotional category is positive emotion and the number whose emotional category is negative emotion.
A tag extraction unit 16, configured to extract the statistical result as tags, the statistical result including each comment object together with its corresponding number of positive-emotion comment short sentences and number of negative-emotion comment short sentences.
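How units 11 to 16 fit together can be sketched in Python as follows. This is only an illustrative outline: the punctuation set, the function names `split_comment` and `extract_tags`, and the two labeling callbacks passed in as parameters are assumptions, not part of the disclosure.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Assumed separator set: common Western and Chinese punctuation marks.
_PUNCTUATION = re.compile(r"[,.;!?，。；！？、]+")


def split_comment(comment: str) -> List[str]:
    """Comment splitting unit: cut a comment into short sentences."""
    return [part.strip() for part in _PUNCTUATION.split(comment) if part.strip()]


def extract_tags(
    comments: Iterable[str],
    label_emotion: Callable[[str], Optional[str]],
    label_object: Callable[[str], Optional[str]],
) -> Dict[str, Tuple[int, int]]:
    """Return {comment object: (positive count, negative count)}."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for comment in comments:
        for sentence in split_comment(comment):
            emotion = label_emotion(sentence)
            # Neutral or unlabeled short sentences are dropped here.
            if emotion not in ("positive", "negative"):
                continue
            target = label_object(sentence)
            if target is None:
                continue  # no comment object could be labeled
            counts[target][0 if emotion == "positive" else 1] += 1
    return {name: (pos, neg) for name, (pos, neg) in counts.items()}
```

Passing the two labeling steps in as callbacks keeps the splitting and counting logic independent of how emotion and comment objects are actually determined.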
Fig. 5 is a schematic diagram of a comment object labeling unit disclosed in this embodiment.
Referring to Fig. 5, the comment object labeling unit 14 includes:
A part-of-speech tagging sub-unit 141, configured to perform word segmentation and part-of-speech tagging on the comment short sentence.
A first judgment sub-unit 142, configured to judge whether a word tagged as a noun exists among the attribute words of the entity knowledge base; if it does, the comment object of the comment short sentence containing the word is labeled as that word; if it does not, the similarity between the word and each attribute word in the entity knowledge base is calculated.
A comment object labeling sub-unit 143, configured to label the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
To improve the accuracy of comment object labeling for comment short sentences, the comment object labeling unit may further include a second judgment sub-unit and/or a third judgment sub-unit. The second judgment sub-unit is configured to judge whether the number of occurrences of the word in the network comment information exceeds a preset word frequency threshold; if it does not, the labeling of the comment short sentence is terminated, and if it does, the labeling continues. The third judgment sub-unit is configured to judge whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, the labeling of the comment short sentence is terminated, and if it does, the labeling continues.
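The decision flow of the first, second and third judgment sub-units can be sketched as below. The attribute word list, both threshold values and the function names are hypothetical. The `similarity` function is a character-overlap stand-in built on Python's `difflib`, used only so that the example runs; it is not claimed to be the similarity measure intended by the patent.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Optional

ATTRIBUTE_WORDS = ["screen", "battery", "price"]  # hypothetical knowledge base
WORD_FREQ_THRESHOLD = 2      # assumed preset word frequency threshold
SIMILARITY_THRESHOLD = 0.6   # assumed preset similarity threshold


def similarity(a: str, b: str) -> float:
    """Stand-in similarity: character-overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def label_object(noun: str, corpus_counts: Counter) -> Optional[str]:
    """Label the comment object for a noun, or return None to terminate."""
    # First judgment: the noun is itself an attribute word.
    if noun in ATTRIBUTE_WORDS:
        return noun
    # Second judgment: rare words are discarded before similarity is computed.
    if corpus_counts[noun] <= WORD_FREQ_THRESHOLD:
        return None
    # Pick the attribute word with the greatest similarity to the noun.
    best = max(ATTRIBUTE_WORDS, key=lambda attr: similarity(noun, attr))
    # Third judgment: the best match must exceed the similarity threshold.
    if similarity(noun, best) <= SIMILARITY_THRESHOLD:
        return None
    return best
```

Here `corpus_counts` is assumed to hold how often each word occurs across all collected comments, which is what the word frequency check needs.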
Fig. 6 is a schematic diagram of an emotion labeling unit disclosed in this embodiment.
Referring to Fig. 6, the emotion labeling unit includes:
A word segmentation unit 131, configured to segment the comment short sentence into several words.
A fourth judging unit 132, configured to judge whether the words exist in the sentiment dictionary; if none of them exists, the labeling of the comment short sentence is terminated; if some of them do not exist, a negation word is searched for among the several words adjacent to each word that does exist, and if a negation word is found, the emotional meaning of the word is changed to the opposite of its original emotional meaning.
An emotion labeling sub-unit 133, configured to compare the number of words with a positive emotional meaning in the comment short sentence with the number of words with a negative emotional meaning; if the former is larger, the emotional category of the comment short sentence is labeled as positive emotion; if the latter is larger, it is labeled as negative emotion; if the two are equal, it is labeled as neutral emotion.
After tags have been extracted from network comments by the tag extraction method disclosed in this application, they are displayed in a layered manner. Referring to Fig. 7, the left side of the page is tag area 1 and the right side is comment area 2. When the page is opened, the tag area shows the first-layer attribute word tags "food (121.111), environment (245.152) ...". When the user clicks the first-layer attribute word tag "food (121.111)", the next layer of attribute words under food is displayed; clicking "wheaten food (5.4)" among them displays the following layer, and so on down to the last layer of attribute words defined in the entity knowledge base. Each attribute word tag contains the numbers of positive and negative evaluations that users have given the commodity. The comment area shows the comments corresponding to the clicked attribute, with positive and negative evaluations marked in different colors to make browsing easier.
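The layered display can be modeled as a tree of attribute word tags, as in the following Python sketch. The class `TagNode`, the function `visible_tags` and the reading of a tag such as "food (121.111)" as "name (positive count.negative count)" are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TagNode:
    """One attribute word tag and the layer of tags beneath it."""
    name: str
    positive: int = 0
    negative: int = 0
    children: Dict[str, "TagNode"] = field(default_factory=dict)

    def text(self) -> str:
        # Assumed tag format: name (positive.negative)
        return f"{self.name} ({self.positive}.{self.negative})"


def visible_tags(root: TagNode, clicked_path: List[str]) -> List[str]:
    """Return the tags of the layer revealed after clicking along a path."""
    node = root
    for name in clicked_path:
        node = node.children[name]
    return [child.text() for child in node.children.values()]
```

An empty `clicked_path` corresponds to the freshly opened page, which shows only the first layer; each further click appends one attribute word to the path.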
Since the device embodiments essentially correspond to the method embodiments, the relevant parts may be understood by reference to the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
In this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for parts that are the same or similar the embodiments may be referred to one another.
The above description of the disclosed embodiments enables those skilled in the art to implement or use this application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A tag extraction method for network comments, characterized in that it is based on an entity knowledge base, the entity knowledge base comprising attribute words of multiple fields, the attribute words being used to label the comment objects of comment short sentences, the method comprising:
Obtaining network comment information;
Splitting the network comment information using punctuation marks as separators to obtain several comment short sentences;
Labeling the emotional category of each comment short sentence; when the emotional category of a comment short sentence is labeled as neutral emotion, terminating the labeling of that comment short sentence; when the emotional category of a comment short sentence is labeled as positive emotion or negative emotion, labeling the comment object of that comment short sentence;
Compiling statistics by comment object, counting for each comment object the number of comment short sentences whose emotional category is positive emotion and the number of comment short sentences whose emotional category is negative emotion;
Extracting the statistical result as tags, the statistical result comprising each comment object together with its corresponding number of positive-emotion comment short sentences and number of negative-emotion comment short sentences.
2. The method according to claim 1, characterized in that labeling the comment object of the comment short sentence comprises:
Performing word segmentation and part-of-speech tagging on the comment short sentence;
Judging whether a word tagged as a noun exists among the attribute words of the entity knowledge base;
If it exists, labeling the comment object of the comment short sentence containing the word as that word; if it does not exist, calculating the similarity between the word and each attribute word in the entity knowledge base;
Labeling the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
3. The method according to claim 2, characterized in that, before calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
Judging whether the number of occurrences of the word in the network comment information exceeds a preset word frequency threshold; if it does not, terminating the labeling of the comment short sentence; if it does, continuing the labeling of the comment short sentence.
4. The method according to claim 2, characterized in that, after calculating the similarity between the word and each attribute word in the entity knowledge base, the method further comprises:
Judging whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, terminating the labeling of the comment short sentence; if it does, continuing the labeling of the comment short sentence.
5. The method according to claim 1, characterized in that labeling the emotional category of each comment short sentence comprises:
Segmenting the comment short sentence into several words;
Judging whether the words exist in a sentiment dictionary; if none of them exists, terminating the labeling of the comment short sentence; if some of them do not exist, searching for a negation word among the several words adjacent to each word that does exist;
If a negation word is found, changing the emotional meaning of the word to the opposite of its original emotional meaning;
Comparing the number of words with a positive emotional meaning in the comment short sentence with the number of words with a negative emotional meaning; if the number of words with a positive emotional meaning is larger, labeling the emotional category of the comment short sentence as positive emotion; if the number of words with a negative emotional meaning is larger, labeling the emotional category of the comment short sentence as negative emotion; if the two are equal, labeling the emotional category of the comment short sentence as neutral emotion.
6. A tag extraction device for network comments, characterized in that it is based on an entity knowledge base, the entity knowledge base comprising attribute words of multiple fields, the attribute words being used to label the comment objects of comment short sentences, the device comprising:
A comment acquiring unit, configured to obtain network comment information;
A comment splitting unit, configured to split the network comment information using punctuation marks as separators to obtain several comment short sentences;
An emotion labeling unit, configured to label the emotional category of each comment short sentence and, when the emotional category of a comment short sentence is labeled as neutral emotion, to terminate the labeling of that comment short sentence;
A comment object labeling unit, configured to label the comment object of a comment short sentence when its emotional category is labeled as positive emotion or negative emotion;
A statistics unit, configured to compile statistics by comment object, counting for each comment object the number of comment short sentences whose emotional category is positive emotion and the number of comment short sentences whose emotional category is negative emotion;
A tag extraction unit, configured to extract the statistical result as tags, the statistical result comprising each comment object together with its corresponding number of positive-emotion comment short sentences and number of negative-emotion comment short sentences.
7. The device according to claim 6, characterized in that the comment object labeling unit comprises:
A part-of-speech tagging sub-unit, configured to perform word segmentation and part-of-speech tagging on the comment short sentence;
A first judgment sub-unit, configured to judge whether a word tagged as a noun exists among the attribute words of the entity knowledge base; if it exists, the comment object of the comment short sentence containing the word is labeled as that word; if it does not exist, the similarity between the word and each attribute word in the entity knowledge base is calculated;
A comment object labeling sub-unit, configured to label the comment object of the comment short sentence containing the word as the attribute word with the greatest similarity to the word.
8. The device according to claim 7, characterized in that the comment object labeling unit further comprises:
A second judgment sub-unit, configured to judge whether the number of occurrences of the word in the network comment information exceeds a preset word frequency threshold; if it does not, the labeling of the comment short sentence is terminated; if it does, the labeling of the comment short sentence continues.
9. The method according to claim 7, characterized in that the comment object labeling unit further comprises:
A third judgment sub-unit, configured to judge whether the maximum similarity between the word and the attribute words exceeds a preset similarity threshold; if it does not, the labeling of the comment short sentence is terminated; if it does, the labeling of the comment short sentence continues.
10. The method according to claim 6, characterized in that the emotion labeling unit comprises:
A word segmentation unit, configured to segment the comment short sentence into several words;
A fourth judging unit, configured to judge whether the words exist in a sentiment dictionary; if none of them exists, the labeling of the comment short sentence is terminated; if some of them do not exist, a negation word is searched for among the several words adjacent to each word that does exist, and if a negation word is found, the emotional meaning of the word is changed to the opposite of its original emotional meaning;
An emotion labeling sub-unit, configured to compare the number of words with a positive emotional meaning in the comment short sentence with the number of words with a negative emotional meaning; if the number of words with a positive emotional meaning is larger, the emotional category of the comment short sentence is labeled as positive emotion; if the number of words with a negative emotional meaning is larger, it is labeled as negative emotion; if the two are equal, it is labeled as neutral emotion.
CN201610143169.1A 2016-03-14 2016-03-14 Label extracting method and device for network comments Pending CN105824898A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610143169.1A CN105824898A (en) 2016-03-14 2016-03-14 Label extracting method and device for network comments

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610143169.1A CN105824898A (en) 2016-03-14 2016-03-14 Label extracting method and device for network comments

Publications (1)

Publication Number Publication Date
CN105824898A true CN105824898A (en) 2016-08-03

Family

ID=56988091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610143169.1A Pending CN105824898A (en) 2016-03-14 2016-03-14 Label extracting method and device for network comments

Country Status (1)

Country Link
CN (1) CN105824898A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678335A (en) * 2012-09-05 2014-03-26 阿里巴巴集团控股有限公司 Method and device for identifying commodity with labels and method for commodity navigation
CN104933130A (en) * 2015-06-12 2015-09-23 百度在线网络技术(北京)有限公司 Comment information marking method and comment information marking device
CN105095288A (en) * 2014-05-14 2015-11-25 腾讯科技(深圳)有限公司 Data analysis method and data analysis device
CN105354183A (en) * 2015-10-19 2016-02-24 Tcl集团股份有限公司 Analytic method, apparatus and system for internet comments of household electrical appliance products

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190104154A (en) * 2017-01-18 2019-09-06 알리바바 그룹 홀딩 리미티드 How to display service objects, how to handle map data, clients and servers
JP7175276B2 (en) 2017-01-18 2022-11-18 アリババ・グループ・ホールディング・リミテッド Method, Client and Server for Displaying Service Objects and Processing Map Data
KR102446246B1 (en) * 2017-01-18 2022-09-22 알리바바 그룹 홀딩 리미티드 Service object display method, map data processing method, client and server
JP2020509453A (en) * 2017-01-18 2020-03-26 アリババ・グループ・ホールディング・リミテッドAlibaba Group Holding Limited Method for displaying service objects and processing map data, client and server
CN107436922A (en) * 2017-07-05 2017-12-05 北京百度网讯科技有限公司 Text label generation method and device
CN107436922B (en) * 2017-07-05 2021-06-08 北京百度网讯科技有限公司 Text label generation method and device
CN107247709B (en) * 2017-07-28 2021-03-16 广州多益网络股份有限公司 Encyclopedic entry label optimization method and system
CN107247709A (en) * 2017-07-28 2017-10-13 广州多益网络股份有限公司 The optimization method and system of a kind of encyclopaedia entry label
CN107491531B (en) * 2017-08-18 2019-05-17 华南师范大学 Chinese network comment sensibility classification method based on integrated study frame
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework
CN109829033B (en) * 2017-11-23 2023-04-18 阿里巴巴集团控股有限公司 Data display method and terminal equipment
CN109829033A (en) * 2017-11-23 2019-05-31 阿里巴巴集团控股有限公司 Method for exhibiting data and terminal device
CN108038725A (en) * 2017-12-04 2018-05-15 中国计量大学 A kind of electric business Customer Satisfaction for Product analysis method based on machine learning
CN109522412A (en) * 2018-11-14 2019-03-26 北京神州泰岳软件股份有限公司 Text emotion analysis method, device and medium
CN109522412B (en) * 2018-11-14 2021-02-26 鼎富智能科技有限公司 Text emotion analysis method, device and medium
CN109684641A (en) * 2018-12-26 2019-04-26 广东工业大学 A kind of data extraction device, method, electronic equipment and storage medium
CN109684641B (en) * 2018-12-26 2023-04-07 广东工业大学 Data extraction device and method, electronic equipment and storage medium
CN109885687A (en) * 2018-12-29 2019-06-14 深兰科技(上海)有限公司 A kind of sentiment analysis method, apparatus, electronic equipment and the storage medium of text
CN110378725A (en) * 2019-06-28 2019-10-25 联想(北京)有限公司 A kind of information processing method, terminal and storage medium
CN110490663A (en) * 2019-08-23 2019-11-22 联想(北京)有限公司 A kind of data processing method, device and electronic equipment
CN112215014A (en) * 2020-10-13 2021-01-12 平安国际智慧城市科技股份有限公司 Portrait generation method, apparatus, medium and device based on user comment
CN112800180A (en) * 2021-02-04 2021-05-14 北京易车互联信息技术有限公司 Automatic extraction scheme of comment text labels
CN114398473A (en) * 2022-01-19 2022-04-26 平安国际智慧城市科技股份有限公司 Enterprise portrait generation method, device, server and storage medium
CN114692644A (en) * 2022-03-11 2022-07-01 粤港澳大湾区数字经济研究院(福田) Text entity labeling method, device, equipment and storage medium
CN114692644B (en) * 2022-03-11 2024-06-11 粤港澳大湾区数字经济研究院(福田) Text entity labeling method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Chen Wenliang

Inventor after: Ma Chunping

Inventor after: Zhang Min

Inventor before: Chen Wenliang

Inventor before: Ma Chunping

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160803