CN107480257A - Product feature extracting method based on pattern match - Google Patents

Product feature extracting method based on pattern match

Info

Publication number
CN107480257A
Authority
CN
China
Prior art keywords
word
product feature
governing
centre
product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710694361.4A
Other languages
Chinese (zh)
Inventor
徐新胜
林静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University filed Critical China Jiliang University
Priority to CN201710694361.4A priority Critical patent/CN107480257A/en
Publication of CN107480257A publication Critical patent/CN107480257A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The present invention proposes a product feature extraction method based on pattern matching, comprising the following steps: 1. comment corpus acquisition; 2. Chinese natural language processing; 3. product feature extraction. The innovation of the method lies in the five criteria, proposed in step 3, that a product feature must satisfy; steps 1 and 2 are the groundwork for product feature extraction. The invention aims to provide a convenient, efficient method for extracting product features, and it extends existing product feature extraction methods. With the invention, researchers can extract product features quickly and effectively while improving the precision, recall, and F-value of the extraction.

Description

Product feature extracting method based on pattern match
Technical field:
The invention belongs to the field of text mining and relates to a product feature extraction method based on pattern matching, which is an unsupervised product feature extraction method.
Background:
With the development of network technology and the diversification of network applications, people can obtain and share information anytime and anywhere through electronic devices, and the user-centred Web 2.0 era has quietly arrived. The pace of modern life is fast and workloads are heavy, so online shopping, being convenient and quick, attracts more and more people to buy products over the Internet, and e-commerce has grown vigorously in China. As of December 2016, China had 731 million Internet users, an Internet penetration rate of 53.2%; of these, 467 million were online shoppers, or 63.8% of all Internet users. So that manufacturers and e-commerce operators can better understand how products fare in the market, e-commerce websites generally allow consumers to post comments about products. These product review texts contain rich, valuable information. Used effectively, they can help manufacturers improve product design, raise product quality, and strengthen market competitiveness, and they can help e-commerce operators adopt suitable operating and sales strategies and increase sales volume.
To provide manufacturers and e-commerce operators with more automated, intelligent text-mining tools, experts and scholars in China and abroad have carried out a great deal of research. For mining and using English online review text, researchers abroad have proposed a variety of effective mining methods and achieved substantial results. Mining of Chinese online review text started later. At present, research concentrates on product feature extraction, judging the sentiment polarity and intensity of comments, and analysing the results of comment mining. Of these, product feature extraction is the foundational task of product review text mining, and the quality of the extracted product features directly affects subsequent research.
The present invention proposes a product feature extraction method based on pattern matching. It is an unsupervised extraction method that can improve the precision, recall, and F-value of product feature extraction.
Summary of the invention:
To extract genuine product features quickly and effectively from massive, multi-source, heterogeneous product review text, the invention provides a product feature extraction method based on pattern matching. It is an efficient and convenient product feature extraction method and an extension of existing ones.
The technical solution adopted by the invention to solve the technical problem is as follows:
A product feature extraction method based on pattern matching, characterised in that the method comprises the following steps:
Step 1, comment corpus acquisition: Using a web crawler tool, collect the usage comments on a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved comments to reduce noise in the data, obtaining a true, reliable, unstructured comment corpus;
Step 2, Chinese natural language processing: Using Chinese natural language processing tools, perform in turn initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis on the comment corpus, obtaining structured sentiment analysis results that are saved to the database;
Step 3, product feature extraction: Define five criteria for product features, mark product features in the sentiment analysis results according to these five criteria, extract the words marked as product features, and generate the product feature set.
In the above product feature extraction method based on pattern matching, in step 1, the openness of the network and the diversity and dispersion of online comments mean that review text crawled from e-commerce platforms contains a large amount of noise; if product features were extracted from it directly, the results could deviate considerably from reality. To obtain realistic results, the original comment set therefore needs to be filtered and cleaned to reduce noise. Data preprocessing includes deleting blank and useless comments, deleting superfluous punctuation within comments, deleting redundant words within comments, deleting comments shorter than 4 characters, correcting wrongly written characters, replacing traditional Chinese characters with simplified ones, deleting duplicate comments, and so on.
In the above product feature extraction method based on pattern matching, in step 3, the five criteria for product features are as follows:
First, a product feature cannot be a stop word;
Second, a product feature is a noun or noun phrase that occurs frequently in the comment corpus;
Third, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word;
Fourth, a product feature is a word that satisfies the seven extraction rules;
Fifth, a product feature is a domain term that is not a single character.
In the above product feature extraction method based on pattern matching, the seven extraction rules that a product feature satisfies fall into two classes according to the part of speech of the centre word, as follows:
First, when the centre word is an adjective:
① when the relation between a word and the centre word is "SBV", i.e. the word's governing word is the centre word itself, the word is a product feature; ② when the word's governing word is not the centre word, but a direct "COO" dependency relation exists between the governing word and the centre word, the word is a product feature; ③ when the word's governing word is not the centre word, but an indirect "COO" dependency relation exists between the governing word and the centre word, the word is a product feature;
Second, when the centre word is a verb and the word's governing word is not the centre word:
④ when a direct "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑤ when a direct "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑥ when an indirect "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑦ when an indirect "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature.
The invention uses a web crawler tool to obtain massive, multi-source, heterogeneous product usage review text from e-commerce platform websites, turns the unstructured data into structured data through shallow and deep Chinese text information processing techniques, and marks and extracts product features using the five defined criteria. With the method of the invention, researchers can extract product features quickly and effectively while improving the precision, recall, and F-value of the extraction.
Brief description of the drawings:
Fig. 1 is the overall flow chart of the invention.
Fig. 2 is the technical roadmap of product feature extraction in the invention.
Fig. 3 shows how the result fields change during product feature extraction in the invention.
Fig. 4 is the flow chart of comment corpus acquisition in the invention.
Fig. 5 is an example of the syntactic analysis result for a comment sentence.
Fig. 6 illustrates the types of dependency relation between words.
Fig. 7 illustrates the seven extraction rules for product features.
Fig. 8 shows part of the product feature marking results.
Embodiments:
The invention is further described below with reference to the accompanying drawings.
The invention crawls large e-commerce platforms with a web crawler tool to obtain massive, multi-source, heterogeneous Chinese online user review text, applies Chinese natural language processing to it, and extracts product features according to the five defined criteria, improving the precision, recall, and F-value of product feature extraction.
The product feature extraction method based on pattern matching comprises three steps: comment corpus acquisition, Chinese natural language processing, and product feature extraction, as shown in Fig. 1.
The techniques involved in the method and their roadmap are shown in Fig. 2, which also indicates the result produced by each technique. Data acquisition and data preprocessing are the techniques used in step 1; initial segmentation with part-of-speech tagging, optimised segmentation with part-of-speech tagging, syntactic analysis, and sentiment analysis are basic natural language processing techniques used in step 2; product feature marking and extraction are the techniques of step 3.
The results produced over the whole extraction process, and how their fields change, are shown in Fig. 3. The comment corpus has only two fields: sequence number and comment text. The initial and optimised segmentation and part-of-speech tagging results each have 3 fields: sequence number, word form, and part of speech. The syntactic analysis result has 6 fields: sequence number, word form, part of speech, dependency relation, governing word, and governing word part of speech. The sentiment analysis result has 7 fields: the previous six plus the sentiment mark. The product feature marking result has 8 fields: the previous seven plus the product feature mark. The product feature set has two fields: sequence number and product feature.
Each step is described in detail below.
Step 1, comment corpus acquisition: Using a web crawler tool, collect the usage comments on a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved comments to reduce noise in the data, obtaining a true, reliable, unstructured comment corpus.
The process of comment corpus acquisition is shown in Fig. 4. Crawling rules are defined for the web crawler tool, data is crawled from the target large e-commerce platform, and the crawled results are stored in the local database as the original comment text. The original comment text is then preprocessed to generate the comment corpus, which is also stored in the database.
Because of the openness of the network and the diversity and dispersion of online comments, review text crawled from e-commerce platforms contains a large amount of noise; if text mining were applied to it directly, the results could deviate considerably from reality. To obtain realistic results, the original comment set therefore needs to be filtered and cleaned to reduce noise. Preprocessing includes deleting blank and useless comments, deleting superfluous punctuation within comments, deleting redundant words within comments, deleting comments shorter than 4 characters, correcting wrongly written characters, replacing traditional Chinese characters with simplified ones, deleting duplicate comments, and so on.
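A few of the preprocessing operations listed above can be sketched in Python. This is only an illustration under stated assumptions: the function name `preprocess`, the punctuation set, and the rule that collapses a letter repeated three or more times are hypothetical choices, not details given in the source. Correcting wrongly written characters and converting traditional to simplified characters need external resources and are left out.

```python
import re

MIN_CHARS = 4  # the text drops comments shorter than 4 characters


def preprocess(comments):
    """Return cleaned, de-duplicated comments in their original order."""
    cleaned, seen = [], set()
    for text in comments:
        text = text.strip()
        # Collapse runs of the same punctuation mark, e.g. "!!!" -> "!".
        text = re.sub(r"([!?.,~。，！？])\1+", r"\1", text)
        # Assumed redundancy rule: a letter or Chinese character repeated
        # three or more times in a row is reduced to a single occurrence.
        text = re.sub(r"([^\W\d_])\1{2,}", r"\1", text)
        # Drop blank or too-short comments and exact duplicates.
        if len(text) < MIN_CHARS or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

Keeping the first occurrence of each comment preserves the sequence-number order of the corpus; a real pipeline would probably also need near-duplicate detection, which this sketch does not attempt.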
Step 2, Chinese natural language processing: Using Chinese natural language processing tools, perform in turn initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis on the comment corpus, obtaining structured sentiment analysis results that are saved to the database.
2.1) Word segmentation and part-of-speech tagging
Comments that customers post on e-commerce platforms are meant for communication and sharing and are unstructured natural language in text form; to mine valuable information from them, they must first be converted into structured data by word segmentation. The comment corpus is segmented with ICTCLAS, and the segmented corpus is also part-of-speech tagged with ICTCLAS. To improve the precision of product feature extraction, the second-level tag set, which marks parts of speech in finer detail, is chosen.
With the rapid development of society, many new words have appeared. A Chinese word segmenter that has not been updated cannot recognise these new words correctly and splits them wrongly during segmentation. For example, "cost performance" (性价比) is split by ICTCLAS into three single-character words, glossed as "property", "valency", and "ratio". To solve this problem and improve segmentation accuracy, new word discovery is performed on the initial segmentation results, the domain-specific new words identified are added to the user dictionary, and ICTCLAS is then used to perform optimised segmentation and second-level part-of-speech tagging on the comment corpus.
New word discovery comprises four steps: constructing repeated strings, frequency filtering, cohesion filtering, and left/right entropy filtering. Repeated strings are constructed with the N-gram algorithm, combined with exclusion lists such as a filter word list, a filter part-of-speech list, and a stop word list. Frequency filtering removes repeated strings whose frequency is below a threshold. Cohesion filtering removes repeated strings whose cohesion is below a threshold; the cohesion of a repeated string is represented by its mutual information (Mutual Information, MI), calculated as:
$$MI(R) = \log \frac{P_{xy}}{P_x P_y}$$
where x and y are the two substrings that make up the repeated string R, P_xy is the probability that R occurs in the initial segmentation results, and P_x and P_y are the probabilities that the substrings x and y each occur in the initial segmentation results. Left/right entropy filtering removes repeated strings whose left entropy or right entropy is below a threshold; the left entropy and right entropy of a repeated string are calculated as:
$$H_L(R) = -\sum_{a} p(a \mid R) \log p(a \mid R)$$
$$H_R(R) = -\sum_{b} p(b \mid R) \log p(b \mid R)$$
where p(a | R) is the probability that string a is the left-adjacent word of the repeated string R, and p(b | R) is the probability that string b is the right-adjacent word of R.
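The cohesion and left/right entropy scores used in new word discovery can be sketched as two small Python functions. This is a minimal sketch, not the inventors' implementation: the function names are hypothetical, the base-2 logarithm is an assumption (the source does not state a base), and the probabilities are assumed to have been estimated from the initial segmentation results beforehand.

```python
import math
from collections import Counter


def mutual_information(p_xy, p_x, p_y):
    """Cohesion of a repeated string R = x + y from its occurrence probabilities."""
    return math.log2(p_xy / (p_x * p_y))


def branch_entropy(neighbours):
    """Entropy of the words seen next to R on one side (left or right).

    `neighbours` lists the adjacent word for every occurrence of R, so the
    relative frequency of each word estimates p(a | R) or p(b | R).
    """
    counts = Counter(neighbours)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A candidate whose neighbours vary a lot (high entropy on both sides) and whose parts co-occur far more often than chance (high mutual information) behaves like a free-standing word, which is why both filters keep only strings above their thresholds.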
2.2) Syntactic analysis
Dependency parsing is one of the key techniques in natural language processing: it identifies the grammatical components of a sentence, such as subject, predicate, and object, and attributive, adverbial, and complement, and analyses the relations between them. Here the Language Technology Platform (LTP) developed by Harbin Institute of Technology is used to determine the dependency relations between the components of a sentence. Because ICTCLAS and LTP use different part-of-speech tag sets, the tag set is converted before dependency parsing.
Fig. 5 shows the dependency parse of one comment sentence. As the parse in Fig. 5 shows, dependency relations hold directly between the words of a sentence; each dependency relation connects two words, one called the governing word and the other the dependent. A dependency relation is drawn as a directed arc, the dependency arc, which points from the governing word to the dependent. Each arc carries a label, the relation type, which states what dependency relation holds between the governing word and the dependent. A dependent, a relation type, and a governing word form a dependency pair: the dependent depends on the governing word through that relation. In Fig. 5, (mobile phone, SBV, good) is a dependency pair in which "mobile phone" is the dependent, "good" is the governing word, and "SBV" (subject-verb) indicates that an "SBV" dependency relation holds between them; the pair says that "mobile phone" depends on "good" through "SBV".
The centre word of a sentence is not governed by any other component: the word whose dependency relation with "Root" is "HED" is the centre word. In Fig. 5, the dependency relation between "good" and "Root" is "HED", so "good" is the centre word of this comment.
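The dependency pairs and the centre-word lookup described above can be represented with a very small data structure. In this sketch the `Dep` type and `centre_word` function are hypothetical names, the field names `tk`, `pRel`, and `pWd` are borrowed from the marking table of Fig. 8, and the parse is written by hand rather than produced by LTP; the "ADV" arc for "very" is an assumed addition to the Fig. 5 example.

```python
from typing import List, NamedTuple, Optional


class Dep(NamedTuple):
    tk: str    # dependent word
    pRel: str  # relation type on the dependency arc
    pWd: str   # governing word ("Root" for the centre word)


def centre_word(parse: List[Dep]) -> Optional[str]:
    """Return the word attached to Root by an HED arc, if there is one."""
    for dep in parse:
        if dep.pRel == "HED" and dep.pWd == "Root":
            return dep.tk
    return None


# Hand-written parse loosely following the Fig. 5 example.
parse = [
    Dep("mobile phone", "SBV", "good"),
    Dep("very", "ADV", "good"),
    Dep("good", "HED", "Root"),
]
print(centre_word(parse))
```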
2.3) Sentiment analysis
Analysis of massive, multi-source, heterogeneous Chinese online review text shows that a comment is the user's evaluation of using the purchased goods, and that users generally express their views with adjectives, nouns, or verbs. A sentiment word dictionary is compiled here and used to judge whether the governing word of each word in the syntactic analysis result is a sentiment word: if the governing word of a word is a sentiment word, the word's sentiment mark isOp is set to "Y"; otherwise it is set to "N".
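The sentiment marking step amounts to a dictionary lookup on each word's governing word. A minimal sketch follows; the lexicon contents and the function name `mark_sentiment` are hypothetical, while the field names `pWd` and `isOp` follow the text.

```python
# Hypothetical sentiment lexicon; the source compiles its own dictionary.
SENTIMENT_WORDS = {"good", "clear", "like"}


def mark_sentiment(rows, lexicon=SENTIMENT_WORDS):
    """Add isOp = "Y" when a row's governing word (pWd) is a sentiment word."""
    return [dict(row, isOp="Y" if row["pWd"] in lexicon else "N") for row in rows]
```

For example, a row for "mobile phone" governed by "good" would be marked "Y", while the row for "good" itself, governed by "Root", would be marked "N". The input rows are copied rather than modified, so the syntactic analysis result stays available unchanged.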
Step 3, product feature extraction: Define five criteria for product features, mark product features in the sentiment analysis results according to these five criteria, extract the words marked as product features, and generate the product feature set.
In Chinese product reviews the dependency relations between two words are very complex. Two types of dependency relation are defined to describe the grammatical link between words: direct and indirect. Direct dependency relation: one word depends directly on another, as in (a) of Fig. 6, where A depends directly on B through a dependency relation. Indirect dependency relation: one word depends on another through one or more intermediate words, as in (b) and (c) of Fig. 6, where A depends directly on an intermediate word through a dependency relation and the intermediate word in turn depends on B through one or more "COO" relations, so that A depends indirectly on B.
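The distinction between direct and indirect dependency relations can be sketched as a walk along "COO" arcs. The function name `relation_to` and the `heads` mapping (word to relation and governing word) are hypothetical; keying by word is a simplification, and real code would key by token position so that repeated words in a sentence do not collide.

```python
def relation_to(word, target, heads):
    """Classify how `word` depends on `target`.

    `heads` maps each word to (relation, governing word). Returns
    ("direct", rel), ("indirect", rel), or None when no such link exists.
    """
    if word not in heads:
        return None
    rel, head = heads[word]
    if head == target:
        return ("direct", rel)
    # From the intermediate word onwards, only COO arcs may be followed.
    seen = {word}  # guards against malformed, cyclic input
    while head in heads and head not in seen:
        seen.add(head)
        next_rel, next_head = heads[head]
        if next_rel != "COO":
            return None
        if next_head == target:
            return ("indirect", rel)
        head = next_head
    return None
```

With `heads = {"A": ("VOB", "M"), "M": ("COO", "B")}`, the call `relation_to("A", "B", heads)` reports an indirect "VOB" relation, matching case (b) of Fig. 6 as described in the text.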
3.1) Product feature marking
Analysis of a large amount of Chinese review text shows that product features need to satisfy the following five criteria:
First, a product feature cannot be a stop word
Stop words are words that are used very frequently but carry no meaning by themselves and only play a role within a complete sentence, such as the conjunction "and". A product feature, by contrast, is a content word: it has lexical and semantic meaning and can serve as a syntactic component of a sentence. A product feature therefore cannot be a stop word.
Second, a product feature is a noun or noun phrase that occurs frequently in the comment corpus
Third, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word
Fourth, a product feature is a word that satisfies the seven extraction rules
Fifth, a product feature is a domain term that is not a single character
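Criteria one, two, three, and five are simple per-word checks and can be sketched as a single predicate; criterion four, the seven extraction rules, needs the sentence structure and is not covered here. The sketch assumes the criteria are combined conjunctively. The function name, the frequency threshold `min_freq`, the test for nouns by a tag starting with "n", and the stop word and domain term sets are all hypothetical; the field names `tk`, `pos`, `pRel`, and `isOp` follow the marking table of Fig. 8.

```python
from collections import Counter


def passes_basic_criteria(row, freq, stop_words, domain_terms, min_freq=2):
    """Check criteria 1, 2, 3 and 5 for one row of the sentiment analysis result."""
    tk = row["tk"]
    return (
        tk not in stop_words                      # criterion 1: not a stop word
        and row["pos"].startswith("n")            # criterion 2: a noun ...
        and freq[tk] >= min_freq                  # ... that occurs frequently
        and row["pRel"] == "SBV"                  # criterion 3: SBV relation ...
        and row["isOp"] == "Y"                    # ... to a sentiment word
        and len(tk) > 1 and tk in domain_terms    # criterion 5: multi-character domain term
    )


# Usage with hypothetical data: word frequencies counted over the whole corpus.
freq = Counter(["屏幕", "屏幕", "屏幕", "的"])
row = {"tk": "屏幕", "pos": "n", "pRel": "SBV", "isOp": "Y"}
print(passes_basic_criteria(row, freq, stop_words={"的"}, domain_terms={"屏幕"}))
```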
The seven extraction rules in criterion four were derived by combining the definition of the dependency relation types with the sentiment analysis results, according to whether a direct or an indirect dependency relation exists between a word's governing word and the centre word, as shown in Fig. 7.
The seven rules fall into two classes according to the part of speech of the centre word, as follows:
(1) When the centre word is an adjective
① When the relation between a word and the centre word is "SBV", i.e. the word's governing word is the centre word itself, the word is a product feature, as shown in (a) of Fig. 7.
② When the word's governing word is not the centre word, but a direct "COO" dependency relation exists between the governing word and the centre word, the word is a product feature, as shown in (b) of Fig. 7.
③ When the word's governing word is not the centre word, but an indirect "COO" dependency relation exists between the governing word and the centre word, the word is a product feature, as shown in (c) of Fig. 7.
(2) When the centre word is a verb and the word's governing word is not the centre word
④ When a direct "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature, as shown in (d) of Fig. 7.
⑤ When a direct "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature, as shown in (f) of Fig. 7.
⑥ When an indirect "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature, as shown in (e) of Fig. 7.
⑦ When an indirect "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature, as shown in (g) of Fig. 7.
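The seven rules can be sketched as a lookup keyed on the centre word's part of speech and on how the word's governing word links to the centre word. This is an illustration under assumptions, not the patented implementation: the names `link`, `RULES`, and `matched_rule` are hypothetical, parts of speech are reduced to "a" (adjective) and "v" (verb), the governing word is assumed to be the dependent side of its link to the centre word, and words are used as keys although real code would use token positions.

```python
def link(word, target, heads):
    """How `word` depends on `target`: ("direct" | "indirect", rel) or None."""
    rel, head = heads.get(word, (None, None))
    if head == target:
        return ("direct", rel)
    hops = 0
    while head in heads and hops < len(heads):  # hop limit guards against cycles
        next_rel, head = heads[head]
        if next_rel != "COO":                   # indirect links run over COO arcs only
            return None
        if head == target:
            return ("indirect", rel)
        hops += 1
    return None


# (centre word POS, link kind, link relation) -> rule number, for rules 2 to 7.
RULES = {
    ("a", "direct", "COO"): 2, ("a", "indirect", "COO"): 3,
    ("v", "direct", "COO"): 4, ("v", "direct", "VOB"): 5,
    ("v", "indirect", "COO"): 6, ("v", "indirect", "VOB"): 7,
}


def matched_rule(word, centre, centre_pos, heads):
    """Return the number of the extraction rule that `word` satisfies, or None."""
    rel, governor = heads[word]
    if governor == centre:
        # Rule 1: the word attaches to an adjective centre word by SBV.
        return 1 if (centre_pos == "a" and rel == "SBV") else None
    found = link(governor, centre, heads)
    return RULES.get((centre_pos, *found)) if found else None


# "The phone is good, the screen is clear", parsed by hand for illustration.
heads = {
    "phone": ("SBV", "good"), "good": ("HED", "Root"),
    "screen": ("SBV", "clear"), "clear": ("COO", "good"),
}
print(matched_rule("phone", "good", "a", heads), matched_rule("screen", "good", "a", heads))
```

Keeping the rules in a table rather than in nested conditionals makes it easy to check the sketch against Fig. 7 one entry at a time.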
Fig. 8 shows part of the product feature marking results, which have 8 fields: no is the sequence number, tk the word form, pos the part of speech, pRel the dependency relation, pWd the governing word, pPos the governing word's part of speech, isOp the sentiment mark, and isPF the product feature mark. The word form and part of speech are produced by word segmentation and part-of-speech tagging; the dependency relation, governing word, and governing word part of speech are produced by syntactic analysis; the sentiment mark is produced by sentiment analysis; and the product feature mark is produced by product feature marking.
3.2) Product feature extraction
The words marked as product features in the product feature marking results are extracted to generate the product feature set.
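The extraction step can be sketched as a filter over the marking results. The function name `extract_features` is hypothetical, and the assumption that isPF holds "Y" for marked words mirrors the "Y"/"N" convention the text gives for isOp; the source does not state the isPF values explicitly.

```python
def extract_features(rows):
    """Collect distinct words marked as product features, numbered from 1.

    The result mirrors the two fields of the product feature set:
    sequence number and product feature.
    """
    features = []
    for row in rows:
        if row["isPF"] == "Y" and row["tk"] not in features:
            features.append(row["tk"])
    return list(enumerate(features, start=1))
```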
The invention uses a web crawler tool to crawl user review text related to a specified product from large e-commerce platforms, applies a series of processing steps to it, and marks and extracts product features according to the five defined criteria to generate the product feature set. With the method of the invention, product features can be extracted quickly and efficiently, and the precision, recall, and F-value of product feature extraction are improved.
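The accuracy (precision), recall, and F-value named above can be computed by comparing the extracted feature set with a manually labelled one. The source does not say how it evaluates the method, so the standard set-based definitions are assumed here, and the function name and example sets are hypothetical.

```python
def precision_recall_f(extracted, gold):
    """Set-based precision, recall and F-value of extracted product features."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Hypothetical example: 2 of 4 extracted words are among the 3 labelled features.
p, r, f = precision_recall_f(
    {"screen", "battery", "price", "it"}, {"screen", "battery", "camera"}
)
```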

Claims (3)

1. A product feature extraction method based on pattern matching, characterised in that the method comprises the following steps:
Step 1: Comment corpus acquisition
Using a web crawler tool, collect the usage comments on a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved comments to reduce noise in the data, obtaining a true, reliable, unstructured comment corpus;
Step 2: Chinese natural language processing
Using Chinese natural language processing tools, perform in turn initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis on the comment corpus, obtaining structured sentiment analysis results that are saved to the database;
Step 3: Product feature extraction
Define five criteria for product features, mark product features in the sentiment analysis results according to these five criteria, extract the words marked as product features, and generate the product feature set.
2. The product feature extraction method based on pattern matching according to claim 1, characterised in that, in step 3, the five criteria for product features are as follows:
First, a product feature cannot be a stop word;
Second, a product feature is a noun or noun phrase that occurs frequently in the comment corpus;
Third, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word;
Fourth, a product feature is a word that satisfies the seven extraction rules;
Fifth, a product feature is a domain term that is not a single character.
3. The product feature extraction method based on pattern matching according to claim 2, characterised in that the seven extraction rules that a product feature satisfies fall into two classes according to the part of speech of the centre word, as follows:
First, when the centre word is an adjective:
① when the relation between a word and the centre word is "SBV", i.e. the word's governing word is the centre word itself, the word is a product feature; ② when the word's governing word is not the centre word, but a direct "COO" dependency relation exists between the governing word and the centre word, the word is a product feature; ③ when the word's governing word is not the centre word, but an indirect "COO" dependency relation exists between the governing word and the centre word, the word is a product feature;
Second, when the centre word is a verb and the word's governing word is not the centre word:
④ when a direct "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑤ when a direct "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑥ when an indirect "COO" dependency relation exists between the word's governing word and the centre word, the word is a product feature; ⑦ when an indirect "VOB" dependency relation exists between the word's governing word and the centre word, the word is a product feature.
CN201710694361.4A 2017-08-14 2017-08-14 Product feature extracting method based on pattern match Pending CN107480257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710694361.4A CN107480257A (en) 2017-08-14 2017-08-14 Product feature extracting method based on pattern match

Publications (1)

Publication Number Publication Date
CN107480257A true CN107480257A (en) 2017-12-15

Family

ID=60600424

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710694361.4A Pending CN107480257A (en) 2017-08-14 2017-08-14 Product feature extracting method based on pattern match

Country Status (1)

Country Link
CN (1) CN107480257A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101189640A (en) * | 2005-05-31 | 2008-05-28 | 日本电气株式会社 | Pattern collation method, pattern collation system, and pattern collation program
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method
US20150186912A1 (en) * | 2010-06-07 | 2015-07-02 | Affectiva, Inc. | Analysis in response to mental state expression requests
CN106649260A (en) * | 2016-10-19 | 2017-05-10 | 中国计量大学 | Product feature structure tree construction method based on comment text mining
CN106708966A (en) * | 2016-11-29 | 2017-05-24 | 中国计量大学 | Similarity calculation-based junk comment detection method
CN106776574A (en) * | 2016-12-28 | 2017-05-31 | Tcl集团股份有限公司 | User comment text method for digging and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination