CN107480257A - Product feature extracting method based on pattern match - Google Patents
- Publication number
- CN107480257A
- Authority
- CN
- China
- Prior art keywords
- word
- product feature
- governing
- centre
- product
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Abstract
The present invention proposes a product feature extraction method based on pattern matching, comprising three steps: 1, review corpus acquisition; 2, Chinese natural language processing; 3, product feature extraction. The innovation of the method lies in the five criteria, proposed in step 3, that a product feature must satisfy; steps 1 and 2 are the groundwork for product feature extraction. The invention aims to provide a convenient and efficient way to extract product features and extends existing product feature extraction methods. With it, researchers can extract product features quickly and effectively while improving the precision, recall, and F-measure of the extraction.
Description
Technical field:
The invention belongs to the field of text mining and relates to a product feature extraction method based on pattern matching, which is an unsupervised product feature extraction method.
Background art:
With the development of network technology and the diversification of online services, people can obtain or share information through electronic devices anytime and anywhere, and the user-centred Web 2.0 era has quietly arrived. Modern life is fast-paced and workloads are heavy, so the convenience and speed of online shopping attract more and more people to buy products over the internet, and e-commerce has grown vigorously in China. By December 2016, China had 731 million internet users, an internet penetration rate of 53.2%; of these, 467 million were online shoppers, or 63.8% of all internet users. So that manufacturers and e-commerce sellers can better understand how their products fare in the market, e-commerce websites generally allow consumers to post reviews of the products. These product review texts contain rich and valuable information. Used well, they can help manufacturers improve product design, raise product quality, and become more competitive, and they can help e-commerce sellers adopt suitable operating and sales strategies and increase sales volume.
To give manufacturers and e-commerce sellers more automated and intelligent text mining tools, experts and scholars in China and abroad have carried out a great deal of research. For mining English online review text, researchers abroad have proposed a variety of effective methods and achieved substantial results. Mining of Chinese online review text started later. At present, research concentrates on product feature extraction, judging the sentiment polarity and intensity of reviews, and analysing the mining results. Of these, product feature extraction is the groundwork of product review text mining, and the quality of the extracted product features directly affects the follow-up research.
The present invention proposes a product feature extraction method based on pattern matching. It is an unsupervised extraction method and can improve the precision, recall, and F-measure of product feature extraction.
Summary of the invention:
To extract genuine product features quickly and efficiently from massive, multi-source, heterogeneous product review text, the invention provides a product feature extraction method based on pattern matching. It is an efficient and convenient product feature extraction method and an extension of existing product feature extraction methods.
The technical solution adopted by the invention to solve this technical problem is as follows:
A product feature extraction method based on pattern matching, characterised in that the method comprises the following steps:
Step 1, review corpus acquisition: using a web crawler tool, collect the user reviews of a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved reviews to reduce noise in the data, obtaining a true, reliable, unstructured review corpus;
Step 2, Chinese natural language processing: using Chinese natural language processing tools, perform on the review corpus, in turn, initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis, obtaining a structured sentiment analysis result that is saved to the database;
Step 3, product feature extraction: define five criteria for product features, mark product features in the sentiment analysis result according to these five criteria, extract the words marked as product features, and generate the product feature set.
In step 1 of the above method, the openness of the network and the diversity and dispersion of online reviews mean that review text crawled from an e-commerce platform contains a large amount of noise. If product features were extracted from it directly, the results could deviate considerably from reality. To obtain results that reflect reality, the original review set must therefore be filtered and cleaned to reduce noise. Data preprocessing includes deleting blank and useless reviews, deleting superfluous punctuation within reviews, deleting redundant words within reviews, deleting reviews shorter than 4 characters, correcting wrongly written characters, replacing traditional Chinese characters with simplified ones, and deleting redundant (duplicate) reviews.
In step 3 of the above method, the five criteria for product features are as follows:
One, a product feature cannot be a stop word;
Two, a product feature is a noun or noun phrase that appears frequently in the review corpus;
Three, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word;
Four, a product feature is a word that satisfies the seven extraction rules;
Five, a product feature is a domain term of more than one character.
In the above method, the seven extraction rules that a product feature satisfies fall into two classes according to the part of speech of the centre word:
One, when the centre word is an adjective:
① when the relation between a word and the centre word is "SBV", that is, when the governing word of the word is the centre word itself, the word is a product feature;
② when the governing word of the word is not the centre word, but a direct "COO" dependency exists between the governing word and the centre word, the word is a product feature;
③ when the governing word of the word is not the centre word, but an indirect "COO" dependency exists between the governing word and the centre word, the word is a product feature;
Two, when the centre word is a verb and the governing word of the word is not the centre word:
④ when a direct "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑤ when a direct "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑥ when an indirect "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑦ when an indirect "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature.
The invention uses a web crawler tool to obtain massive, multi-source, heterogeneous product review text from e-commerce platform websites, turns the unstructured data into structured data through shallow and deep Chinese text processing techniques, and marks and extracts product features using the five defined criteria. With the method of the invention, researchers can extract product features quickly and effectively while improving the precision, recall, and F-measure of product feature extraction.
Brief description of the drawings:
Fig. 1 is the overall flow chart of the invention.
Fig. 2 is the technical roadmap of product feature extraction in the invention.
Fig. 3 shows how the fields of the results change during the product feature extraction process of the invention.
Fig. 4 is the flow chart of review corpus acquisition in the invention.
Fig. 5 is an example of the syntactic analysis result for a review sentence.
Fig. 6 illustrates the types of dependency relation between words.
Fig. 7 illustrates the seven extraction rules for product features.
Fig. 8 shows part of the product feature marking results.
Detailed description of embodiments:
The invention is further described below with reference to the drawings.
The invention crawls information from a large e-commerce platform with a web crawler tool to obtain massive, multi-source, heterogeneous Chinese user review text, applies Chinese natural language processing to it, and extracts product features according to the five defined criteria, improving the precision, recall, and F-measure of product feature extraction.
The product feature extraction method based on pattern matching comprises three steps: review corpus acquisition, Chinese natural language processing, and product feature extraction, as shown in Fig. 1.
The techniques involved in the method and their roadmap are shown in Fig. 2, which also indicates the result produced by each technique. Data collection and data preprocessing are the techniques used in step 1. Initial word segmentation and part-of-speech tagging, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis are basic natural language processing techniques and belong to step 2. Product feature marking and extraction are the techniques of step 3.
The results produced throughout the extraction process, and how their fields change, are shown in Fig. 3:
- the review corpus has 2 fields: sequence number and review text;
- the initial and the optimised word segmentation and part-of-speech tagging results each have 3 fields: sequence number, token, and part of speech;
- the syntactic analysis result has 6 fields: sequence number, token, part of speech, dependency relation, governing word, and governing word part of speech;
- the sentiment analysis result has 7 fields: sequence number, token, part of speech, dependency relation, governing word, governing word part of speech, and sentiment flag;
- the product feature marking result has 8 fields: sequence number, token, part of speech, dependency relation, governing word, governing word part of speech, sentiment flag, and product feature flag;
- the product feature set has 2 fields: sequence number and product feature.
Each step is described in detail below.
Step 1, review corpus acquisition: using a web crawler tool, collect the user reviews of a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved reviews to reduce noise in the data, obtaining a true, reliable, unstructured review corpus.
The review corpus acquisition process is shown in Fig. 4. Crawling rules are defined for the web crawler tool, the target e-commerce platform is crawled, and the crawled results are stored in a local database as the original review text. The original review text is then preprocessed to generate the review corpus, which is also stored in the database.
Because of the openness of the network and the diversity and dispersion of online reviews, review text crawled from an e-commerce platform contains a large amount of noise. If text mining were applied to it directly, the results could deviate considerably from reality. To obtain results that reflect reality, the original review set must be filtered and cleaned to reduce noise. Preprocessing includes deleting blank and useless reviews, deleting superfluous punctuation within reviews, deleting redundant words within reviews, deleting reviews shorter than 4 characters, correcting wrongly written characters, replacing traditional Chinese characters with simplified ones, and deleting redundant (duplicate) reviews.
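The cleaning steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the punctuation set, and the rule that caps a repeated character at two occurrences are hypothetical choices, not taken from the patent, and the sketch leaves out wrong-character correction and traditional-to-simplified conversion, which would need external resources.

```python
import re

MIN_CHARS = 4  # the text drops reviews shorter than 4 characters

def clean_comment(text: str) -> str:
    """Normalise one raw review (hypothetical helper)."""
    # Remove all whitespace; Chinese text does not separate words with spaces.
    text = re.sub(r"\s+", "", text)
    # Collapse a run of the same punctuation mark into a single mark.
    text = re.sub(r"([!?,.;:~!?,。;:、~])\1+", r"\1", text)
    # Cap any character repeated three or more times at two occurrences.
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)
    return text

def build_corpus(raw_comments):
    """Drop blank, too-short, and duplicate reviews, keeping first occurrences."""
    seen = set()
    corpus = []
    for raw in raw_comments:
        cleaned = clean_comment(raw)
        if len(cleaned) < MIN_CHARS or cleaned in seen:
            continue
        seen.add(cleaned)
        corpus.append(cleaned)
    return corpus
```

Deduplicating after normalisation means that two reviews differing only in repeated punctuation count as the same review, which seems to match the intent of deleting redundant reviews.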
Step 2, Chinese natural language processing: using Chinese natural language processing tools, perform on the review corpus, in turn, initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis, obtaining a structured sentiment analysis result that is saved to the database.
2.1) Word segmentation and part-of-speech tagging
The reviews that customers post on an e-commerce platform are written to communicate and share, and they are unstructured natural language in text form. To mine valuable information from them, they must first be converted into structured data by word segmentation. The review corpus is segmented with ICTCLAS, and the segmented corpus is also part-of-speech tagged with ICTCLAS. To improve the precision of product feature extraction, the second-level tag set, which marks parts of speech in finer detail, is used.
As society develops rapidly, many new words appear. A Chinese word segmenter that has not been updated cannot recognise these new words and splits them wrongly during segmentation. For example, ICTCLAS splits "cost performance" into the three single-character words "property", "price", and "ratio". To solve this problem and improve segmentation accuracy, new word discovery is performed on the initial segmentation result, the domain-specific new words identified are added to the user dictionary, and ICTCLAS is then run again to produce the optimised segmentation and second-level part-of-speech tagging of the review corpus.
New word discovery consists of four steps: constructing repeated strings, frequency filtering, cohesion filtering, and left/right entropy filtering.
Repeated strings are constructed with the N-Gram algorithm, combined with exclusion lists such as a filter word list, a filter part-of-speech list, and a stop word list.
Frequency filtering removes repeated strings whose frequency is below a threshold.
Cohesion filtering removes repeated strings whose cohesion is below a threshold. The cohesion of a repeated string is measured by its mutual information (MI), calculated as:
$$MI(R) = \log \frac{P_{xy}}{P_x P_y}$$
where x and y are the 2 substrings that make up the repeated string R, $P_{xy}$ is the probability that R occurs in the initial segmentation result, and $P_x$ and $P_y$ are the probabilities that the substrings x and y occur individually in the initial segmentation result.
Left/right entropy filtering removes repeated strings whose left entropy or right entropy is below a threshold. The left entropy and right entropy of a repeated string are calculated as:
$$H_L(R) = -\sum_{a} p(a \mid R) \log p(a \mid R)$$
$$H_R(R) = -\sum_{b} p(b \mid R) \log p(b \mid R)$$
where $p(a \mid R)$ is the probability that string a is the left neighbour of the repeated string R, and $p(b \mid R)$ is the probability that string b is the right neighbour of R.
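The cohesion and left/right entropy filters described above can be sketched as follows. This is a minimal illustration that assumes the standard pointwise mutual information and Shannon entropy forms with base-2 logarithms; the function names and the thresholds are hypothetical, since the text does not give concrete threshold values.

```python
import math
from collections import Counter

def mutual_information(p_xy: float, p_x: float, p_y: float) -> float:
    """Cohesion of a repeated string R = x + y (assumed PMI form, log base 2)."""
    return math.log2(p_xy / (p_x * p_y))

def neighbor_entropy(neighbors) -> float:
    """Entropy of the observed left (or right) neighbours of a repeated string."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def keep_candidate(mi, left_entropy, right_entropy, mi_min, entropy_min) -> bool:
    """A candidate survives only if cohesion and both entropies reach the thresholds."""
    return mi >= mi_min and left_entropy >= entropy_min and right_entropy >= entropy_min
```

A string that always appears with the same neighbour has zero entropy on that side, which suggests it is a fragment of a longer word rather than a word in its own right; that is why low-entropy candidates are filtered out.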
2.2) Syntactic analysis
Dependency parsing is one of the key techniques in natural language processing. It identifies the grammatical constituents of a sentence, such as subject, predicate, and object, and attributive, adverbial, and complement, and analyses the relations between them. Here the Language Technology Platform (LTP) developed by Harbin Institute of Technology is used to determine the dependency relations between the constituents of a sentence. Because ICTCLAS and LTP use different part-of-speech tag sets, the tags are converted from one set to the other before dependency parsing.
Fig. 5 shows the dependency parsing result of one review sentence. As the result in Fig. 5 shows, dependency relations hold directly between the words of a sentence, and each dependency relation connects two words, one called the governing word and the other the dependent word. A dependency relation is drawn as a directed arc, the dependency arc, which points from the governing word to the dependent word. Each dependency arc carries a label, the relation type, which states what kind of dependency holds between the governing word and the dependent word. The dependent word, the relation type, and the governing word together form a dependency pair, meaning that the dependent word depends on the governing word through that relation. In Fig. 5, (mobile phone, SBV, good) is a dependency pair: "mobile phone" is the dependent word, "good" is the governing word, and "SBV" is the relation between them, so the pair states that "mobile phone" depends on "good" through "SBV".
The centre word of a sentence is not governed by any other constituent; that is, the word whose dependency relation with "Root" is "HED" is the centre word. In Fig. 5 the relation between "good" and "Root" is "HED", so "good" is the centre word of this review.
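To make the dependency pair and centre word notions concrete, here is a small sketch. The `Arc` type and the `find_centre_word` helper are hypothetical names chosen for illustration; they are not part of LTP, and the arcs are written by hand rather than produced by a parser.

```python
from typing import NamedTuple, Optional

class Arc(NamedTuple):
    """One dependency pair: (dependent word, relation type, governing word)."""
    dependent: str
    relation: str
    governor: str

def find_centre_word(arcs) -> Optional[str]:
    """Return the word attached to "Root" by the "HED" relation, if any."""
    for arc in arcs:
        if arc.relation == "HED" and arc.governor == "Root":
            return arc.dependent
    return None

# The example pair from the text, plus the arc that makes "good" the centre word.
arcs = [Arc("mobile phone", "SBV", "good"), Arc("good", "HED", "Root")]
centre = find_centre_word(arcs)
```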
2.3) Sentiment analysis
Analysis of massive, multi-source, heterogeneous Chinese online review text shows that a review is the user's evaluation of a purchased product after using it, and that users generally express their opinions with adjectives, nouns, or verbs. A sentiment word dictionary is therefore compiled and used to judge whether the governing word of each word in the syntactic analysis result is a sentiment word. If the governing word of a word is a sentiment word, the sentiment flag isOp of that word is set to "Y"; otherwise it is set to "N".
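The flagging rule above amounts to a dictionary lookup on each word's governing word. A minimal sketch, assuming each row of the syntactic analysis result is a dict using the field names `tk` and `pWd` that the document gives for Fig. 8; the function name and the sample dictionary are hypothetical.

```python
def mark_sentiment(rows, sentiment_words):
    """Add isOp = "Y" when a row's governing word is in the sentiment dictionary."""
    return [
        dict(row, isOp="Y" if row["pWd"] in sentiment_words else "N")
        for row in rows
    ]

rows = [
    {"tk": "mobile phone", "pWd": "good"},
    {"tk": "good", "pWd": "Root"},
]
marked = mark_sentiment(rows, sentiment_words={"good", "clear"})
```

Note that the flag describes the governing word, not the word itself: "mobile phone" is flagged because it depends on the sentiment word "good".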
Step 3, product feature extraction: define five criteria for product features, mark product features in the sentiment analysis result according to these five criteria, extract the words marked as product features, and generate the product feature set.
In Chinese product reviews the dependency relations between two words are quite complex. Two types of dependency are defined to describe the grammatical link between words: direct dependency and indirect dependency.
Direct dependency: one word depends directly on another word. As shown in (a) of Fig. 6, A depends directly on B through a dependency relation.
Indirect dependency: one word depends on another word through one or more intermediate words. As shown in (b) and (c) of Fig. 6, A depends directly on an intermediate word through a dependency relation, and the intermediate word in turn depends on B through one or more "COO" relations, so A depends indirectly on B.
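The two dependency types can be told apart by walking the chain of governing words. The sketch below is one possible reading of the definition, with assumptions stated: `governor_of` is a hypothetical mapping from each word to its (relation, governing word) pair, and words are identified by their surface string, which a real implementation would replace with token positions because a word may occur more than once in a sentence.

```python
def dependency_kind(word, target, governor_of):
    """Classify how `word` depends on `target`.

    Returns ("direct", rel) if word attaches to target directly,
    ("indirect", rel) if word attaches to an intermediate word that
    reaches target through one or more "COO" arcs, otherwise None.
    """
    if word not in governor_of:
        return None
    relation, governor = governor_of[word]
    if governor == target:
        return ("direct", relation)
    current, visited = governor, set()
    while current in governor_of and current not in visited:
        visited.add(current)
        rel, nxt = governor_of[current]
        if rel != "COO":          # only "COO" links may carry the chain onward
            return None
        if nxt == target:
            return ("indirect", relation)
        current = nxt
    return None
```

For example, with `{"A": ("SBV", "M"), "M": ("COO", "B")}` the call `dependency_kind("A", "B", ...)` reports an indirect "SBV" dependency, matching case (b) of Fig. 6 as described in the text.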
3.1) Product feature marking
Analysis of a large amount of Chinese review text leads to the conclusion that a product feature must satisfy the following five criteria:
One, a product feature cannot be a stop word
Stop words are words that are used very frequently but carry no meaning of their own and only play a role inside a complete sentence, such as "and". A product feature, by contrast, is a content word: it has lexical and semantic meaning and can serve as a syntactic constituent of a sentence. A product feature therefore cannot be a stop word.
Two, a product feature is a noun or noun phrase that appears frequently in the review corpus
Three, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word
Four, a product feature is a word that satisfies the seven extraction rules
Five, a product feature is a domain term of more than one character
The seven extraction rules in criterion four were derived by combining the definitions of the dependency types with the sentiment analysis result, according to whether a direct or an indirect dependency exists between the governing word of a word and the centre word, as shown in Fig. 7.
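One way to combine the five criteria into a single check is sketched below. It is an illustration only: the function name, the `min_freq` threshold, and the convention that noun tags start with "n" are assumptions, the result of the seven extraction rules is passed in as a precomputed flag, and the "domain term" part of criterion five is reduced to a length test because the text does not say how domain membership is decided.

```python
def satisfies_criteria(row, freq, stopwords, rule_four_ok, min_freq=2):
    """Return True if a sentiment-analysis row meets all five criteria.

    row          -- dict with the fields tk, pos, pRel, isOp
    freq         -- mapping from token to its count in the review corpus
    rule_four_ok -- whether the token matched one of the seven extraction rules
    min_freq     -- hypothetical cut-off for "appears frequently"
    """
    tk = row["tk"]
    return (
        tk not in stopwords                                      # criterion one
        and row["pos"].startswith("n")                           # criterion two: noun
        and freq.get(tk, 0) >= min_freq                          # criterion two: frequent
        and row["pRel"] == "SBV" and row["isOp"] == "Y"          # criterion three
        and rule_four_ok                                         # criterion four
        and len(tk) > 1                                          # criterion five
    )
```

A row for "屏幕" (screen) tagged as a noun, attached by "SBV" to a sentiment word, and seen five times in the corpus would pass; the single character "电" would fail criterion five regardless of the other fields.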
These seven rules fall into two classes according to the part of speech of the centre word:
(1) When the centre word is an adjective
① When the relation between a word and the centre word is "SBV", that is, when the governing word of the word is the centre word itself, the word is a product feature, as shown in (a) of Fig. 7.
② When the governing word of the word is not the centre word, but a direct "COO" dependency exists between the governing word and the centre word, the word is a product feature, as shown in (b) of Fig. 7.
③ When the governing word of the word is not the centre word, but an indirect "COO" dependency exists between the governing word and the centre word, the word is a product feature, as shown in (c) of Fig. 7.
(2) When the centre word is a verb and the governing word of the word is not the centre word
④ When a direct "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature, as shown in (d) of Fig. 7.
⑤ When a direct "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature, as shown in (f) of Fig. 7.
⑥ When an indirect "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature, as shown in (e) of Fig. 7.
⑦ When an indirect "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature, as shown in (g) of Fig. 7.
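The seven rules can be sketched as a single predicate. This is a hedged reading rather than the patent's implementation: `governor_of` is a hypothetical mapping from each word to its (relation, governing word) pair, the part-of-speech codes "a" (adjective) and "v" (verb) are assumed, words are keyed by surface string for brevity, and an indirect dependency is taken to mean a first arc followed by one or more "COO" arcs, as in the definition given for Fig. 6.

```python
def link_to_centre(word, centre, governor_of):
    """Relation by which `word` reaches `centre`, directly or via a COO chain."""
    if word not in governor_of:
        return None
    relation, current = governor_of[word]
    visited = set()
    while current != centre:
        if current not in governor_of or current in visited:
            return None
        visited.add(current)
        rel, nxt = governor_of[current]
        if rel != "COO":              # intermediate links must all be COO
            return None
        current = nxt
    return relation

def matches_extraction_rule(word, centre, centre_pos, governor_of):
    """Check the seven extraction rules for one candidate word."""
    if word not in governor_of:
        return False
    relation, governor = governor_of[word]
    if centre_pos == "a":
        if governor == centre:
            return relation == "SBV"                                     # rule 1
        return link_to_centre(governor, centre, governor_of) == "COO"    # rules 2, 3
    if centre_pos == "v" and governor != centre:
        # rules 4 and 6 (COO), rules 5 and 7 (VOB)
        return link_to_centre(governor, centre, governor_of) in ("COO", "VOB")
    return False

# "The screen is clear and the battery is durable", written as hand-made arcs.
governor_of = {
    "screen": ("SBV", "clear"), "clear": ("HED", "Root"),
    "battery": ("SBV", "durable"), "durable": ("COO", "clear"),
}
```

In the hand-made example, "screen" matches rule ① and "battery" matches rule ②, while the centre word "clear" itself matches none. The predicate covers criterion four only; the other criteria, such as the sentiment flag, are checked separately.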
Fig. 8 shows part of the product feature marking results, which have 8 fields: no is the sequence number, tk the token, pos the part of speech, pRel the dependency relation, pWd the governing word, pPos the governing word part of speech, isOp the sentiment flag, and isPF the product feature flag. The token and part of speech are produced by word segmentation and part-of-speech tagging; the dependency relation, governing word, and governing word part of speech are produced by syntactic analysis; the sentiment flag is produced by sentiment analysis; and the product feature flag is produced by product feature marking.
3.2) Product feature extraction
The words marked as product features in the product feature marking result are extracted to generate the product feature set.
The invention uses a web crawler tool to crawl the user review text related to a specified product from a large e-commerce platform, applies a series of processing steps to it, and marks and extracts product features according to the five defined criteria to generate the product feature set. With the method of the invention, product features can be extracted efficiently, and the precision, recall, and F-measure of product feature extraction are improved.
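The extraction step reduces to filtering on the product feature flag. A minimal sketch, assuming rows are dicts with the `tk` and `isPF` fields named for Fig. 8 and that "Y" marks a product feature in the same way as for isOp; the function name is hypothetical, and removing repeated tokens is an assumption implied by the word "set".

```python
def extract_features(marked_rows):
    """Build the two-field product feature set (sequence number, product feature)."""
    unique = {}
    for row in marked_rows:
        if row["isPF"] == "Y":
            unique.setdefault(row["tk"], None)   # dict keeps first-seen order
    return [{"no": i, "feature": tk} for i, tk in enumerate(unique, start=1)]

marked_rows = [
    {"tk": "screen", "isPF": "Y"},
    {"tk": "clear", "isPF": "N"},
    {"tk": "battery", "isPF": "Y"},
    {"tk": "screen", "isPF": "Y"},
]
feature_set = extract_features(marked_rows)
```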
Claims (3)
1. A product feature extraction method based on pattern matching, characterised in that the method comprises the following steps:
Step 1: review corpus acquisition
Using a web crawler tool, collect the user reviews of a specified product from a large e-commerce platform and save them to a local database; then preprocess the saved reviews to reduce noise in the data, obtaining a true, reliable, unstructured review corpus;
Step 2: Chinese natural language processing
Using Chinese natural language processing tools, perform on the review corpus, in turn, initial word segmentation and part-of-speech tagging, new word identification, optimised word segmentation and part-of-speech tagging, syntactic analysis, and sentiment analysis, obtaining a structured sentiment analysis result that is saved to the database;
Step 3: product feature extraction
Define five criteria for product features, mark product features in the sentiment analysis result according to these five criteria, extract the words marked as product features, and generate the product feature set.
2. The product feature extraction method based on pattern matching according to claim 1, characterised in that in step 3 the five criteria for product features are as follows:
One, a product feature cannot be a stop word;
Two, a product feature is a noun or noun phrase that appears frequently in the review corpus;
Three, the dependency relation between a product feature and its governing word is "SBV", and the governing word is a sentiment word;
Four, a product feature is a word that satisfies the seven extraction rules;
Five, a product feature is a domain term of more than one character.
3. The product feature extraction method based on pattern matching according to claim 2, characterised in that the seven extraction rules that a product feature satisfies fall into two classes according to the part of speech of the centre word:
One, when the centre word is an adjective:
① when the relation between a word and the centre word is "SBV", that is, when the governing word of the word is the centre word itself, the word is a product feature;
② when the governing word of the word is not the centre word, but a direct "COO" dependency exists between the governing word and the centre word, the word is a product feature;
③ when the governing word of the word is not the centre word, but an indirect "COO" dependency exists between the governing word and the centre word, the word is a product feature;
Two, when the centre word is a verb and the governing word of the word is not the centre word:
④ when a direct "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑤ when a direct "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑥ when an indirect "COO" dependency exists between the governing word of the word and the centre word, the word is a product feature;
⑦ when an indirect "VOB" dependency exists between the governing word of the word and the centre word, the word is a product feature.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710694361.4A CN107480257A (en) | 2017-08-14 | 2017-08-14 | Product feature extracting method based on pattern match |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710694361.4A CN107480257A (en) | 2017-08-14 | 2017-08-14 | Product feature extracting method based on pattern match |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107480257A true CN107480257A (en) | 2017-12-15 |
Family
ID=60600424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710694361.4A Pending CN107480257A (en) | 2017-08-14 | 2017-08-14 | Product feature extracting method based on pattern match |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480257A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101189640A (en) * | 2005-05-31 | 2008-05-28 | 日本电气株式会社 | Pattern collation method, pattern collation system, and pattern collation program |
CN101751458A (en) * | 2009-12-31 | 2010-06-23 | 暨南大学 | Network public sentiment monitoring system and method |
US20150186912A1 (en) * | 2010-06-07 | 2015-07-02 | Affectiva, Inc. | Analysis in response to mental state expression requests |
CN106649260A (en) * | 2016-10-19 | 2017-05-10 | 中国计量大学 | Product feature structure tree construction method based on comment text mining |
CN106708966A (en) * | 2016-11-29 | 2017-05-24 | 中国计量大学 | Similarity calculation-based junk comment detection method |
CN106776574A (en) * | 2016-12-28 | 2017-05-31 | Tcl集团股份有限公司 | User comment text method for digging and device |
-
2017
- 2017-08-14 CN CN201710694361.4A patent/CN107480257A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||