CN103886051A - Comment analysis method based on entities and features - Google Patents
Comment analysis method based on entities and features
- Publication number
- CN103886051A
- Authority
- CN
- China
- Prior art keywords
- comment
- entity
- module
- feature
- mainly used
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a comment analysis method based on entities and features. It belongs to the field of natural language processing and is aimed at analyzing comment text. The comment text is processed with natural language processing techniques to obtain an entity tree and the features of the related entities. These entities and features are then used to extract further information from the text. The method supports comment analysis tasks such as public opinion analysis, relation extraction, and sentiment orientation analysis.
Description
Technical field
The invention belongs to the field of natural language processing and, more specifically, relates to a comment analysis method based on entities and features.
Background art
With the arrival of the Web 2.0 era, the volume of review information on the Internet has grown explosively. Suppose your company releases a new product. The release brings related reports from various media, as well as comments on the major portal websites. Faced with these comments, you may urgently want to know which aspects of the product users care about most and how users evaluate the product. Obtaining this information manually is almost impossible, which creates a need for computers to process the data and produce the desired results. The comment text analysis method based on entities and features of the present invention builds entities and the features of related entities, analyzes the above data, and obtains the results.
Summary of the invention
The ultimate purpose of the present invention is to analyze comment text. By extracting entities and features from a large number of comment texts, the invention builds its own entity and feature framework, which then supports comment text analysis and information extraction.
To achieve this purpose, the analysis method for comment text based on entities and features mainly consists of the following modules:
- Comment data acquisition module. Mainly used to collect comment data from the relevant domain. A large amount of comment text data is obtained by a web crawler or by other methods.
- Data preprocessing module. Mainly used to split the comment text into sentences. After the text is split into sentences, a word segmentation and part-of-speech tagging tool is applied to each sentence.
- Entity extraction module. Mainly used to extract the entities in the comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.
- Entity ontology tree building module. Mainly used to build an ontology tree from the nouns among the entities. Words of different categories are placed on different branches of the ontology tree, and the hierarchical relationships between words are also reflected in the tree.
- Entity feature extraction module. Mainly used to extract the features of the related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts entity features using syntactic dependency relations together with word co-occurrence.
- Comment analysis module. Mainly used to analyze unprocessed comment text using the entities and features, and to obtain the relevant information extraction results.
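The preprocessing and entity extraction steps can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names, the punctuation set used for splitting, the `"n"` noun tag, and the `min_count` threshold are all assumptions made for illustration, and the input is assumed to be already segmented and tagged by whatever tool is in use. The manual review step that the patent pairs with word frequency is not modeled.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split a comment into sentences on common Chinese and English end marks."""
    parts = re.split(r"[。！？!?.;；]+", text)
    return [p.strip() for p in parts if p.strip()]


def candidate_entities(tagged_sentences, min_count=2):
    """Return nouns that occur at least min_count times, most frequent first.

    tagged_sentences is a list of sentences, each a list of (word, pos) pairs.
    The tag "n" for nouns is an assumed convention, not taken from the patent.
    """
    counts = Counter(
        word
        for sentence in tagged_sentences
        for word, pos in sentence
        if pos == "n"
    )
    return [word for word, count in counts.most_common() if count >= min_count]
```

The list returned by `candidate_entities` would only be a candidate list; in the workflow described above, a person would still confirm or discard each noun.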
The purpose of the invention is achieved as follows. The data acquisition module and the data preprocessing module are called to obtain preliminarily processed data. The entity extraction module, the entity ontology tree building module, and the entity feature extraction module are then called to obtain the corresponding training results. Finally, the comment analysis module encapsulates these modules. Once encapsulation is complete, any new comment text that enters the system is analyzed by the comment analysis module to obtain the final result.
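The train-then-encapsulate flow described above could look roughly like the following Python sketch. Everything named here is hypothetical: `CommentAnalyzer`, `toy_tagger`, and `LEXICON` do not come from the patent, the toy tagger stands in for a real segmentation and part-of-speech tool, and sentence-level co-occurrence is used as the only feature signal (the patent also mentions syntactic dependency, which is omitted here).

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy lexicon standing in for a real POS tagger.
LEXICON = {"screen": "n", "battery": "n", "bright": "a", "weak": "a", "drains": "v"}


def toy_tagger(sentence):
    """Tag whitespace-separated words; unknown words get the tag "x"."""
    return [(w, LEXICON.get(w, "x")) for w in sentence.lower().split()]


@dataclass
class CommentAnalyzer:
    """Holds trained entities and features and applies them to new comments."""
    tagger: Callable[[str], list]
    min_count: int = 2
    entities: set = field(default_factory=set)
    features: dict = field(default_factory=dict)

    def train(self, sentences):
        tagged = [self.tagger(s) for s in sentences]
        # Entity extraction: keep nouns that are frequent enough.
        counts = Counter(w for sent in tagged for w, pos in sent if pos == "n")
        self.entities = {w for w, c in counts.items() if c >= self.min_count}
        # Feature extraction: adjectives and verbs co-occurring with an entity.
        for sent in tagged:
            present = [w for w, _ in sent if w in self.entities]
            descriptors = [w for w, pos in sent if pos in ("a", "v")]
            for entity in present:
                self.features.setdefault(entity, set()).update(descriptors)

    def analyze(self, sentence):
        """Map each known entity in the sentence to its known features found there."""
        words = [w for w, _ in self.tagger(sentence)]
        result = {}
        for entity in words:
            if entity in self.entities:
                known = self.features.get(entity, set())
                result[entity] = [w for w in words if w in known]
        return result


analyzer = CommentAnalyzer(tagger=toy_tagger)
analyzer.train(["screen bright", "screen is bright", "battery weak", "battery drains"])
report = analyzer.analyze("the battery drains fast")
```

Keeping the trained state inside one object mirrors the idea of the comment analysis module wrapping the other modules, so that new text only needs a single call.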
Brief description of the drawings
Fig. 1 is a schematic block diagram of a specific implementation of the comment analysis method based on entities and features of the present invention.
Embodiment
Specific embodiments of the present invention are described below with reference to the accompanying drawing, so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Fig. 1 is a schematic block diagram of a specific implementation of the comment analysis method based on entities and features of the present invention.
In this embodiment, as shown in Fig. 1, the comment analysis method based on entities and features comprises a data acquisition module 101, a data preprocessing module 102, an entity extraction module 103, an entity ontology tree building module 104, an entity feature extraction module 105, an entity and feature construction module 201, unprocessed comments 106, and an analysis result 107.
In this example, the data acquisition module 101 is called to obtain the relevant data, which is passed to the data preprocessing module 102. The data preprocessing module splits the comments into paragraphs, splits the long sentences in the comments, splits the short sentences in the comments, and performs word segmentation and part-of-speech tagging. The preprocessed data is then passed to the entity extraction module 103 and the entity feature extraction module 105. After the entity extraction module 103 extracts the entities, the data is passed to the entity ontology tree building module 104. At the same time, the entity feature extraction module 105 extracts the corresponding features. The entity extraction module 103, the entity ontology tree building module 104, and the entity feature extraction module 105 all belong to the entity and feature construction module 201. Once the entity and feature construction module 201 is complete, it is used to process the unprocessed comments 106, which yields the analysis result 107.
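The ontology tree produced by module 104 can be pictured with a small Python sketch. The patent does not specify a data structure, so the child-to-parent mapping, the function names, and the example words below are assumptions chosen for illustration; the mapping is assumed to be acyclic and, in line with the manual participation mentioned earlier, could be curated by hand.

```python
def build_tree(parent_of):
    """Build {node: [children]} from a child -> parent mapping.

    Root nodes have parent None. The mapping is assumed to contain no cycles.
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(child, [])
        if parent is not None:
            children.setdefault(parent, []).append(child)
    return children


def path_to_root(node, parent_of):
    """Return the chain from a word up to its root, showing its hierarchy level."""
    path = [node]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


# Hypothetical entity nouns for a phone review domain.
parent_of = {
    "phone": None,
    "screen": "phone",
    "battery": "phone",
    "resolution": "screen",
}
tree = build_tree(parent_of)
branch = path_to_root("resolution", parent_of)
```

Words of different categories end up under different parents, and `path_to_root` recovers the hierarchical relationship between a word and its ancestors.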
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, variations are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.
Claims (1)
1. A comment analysis method based on entities and features, consisting of the following modules:
- Comment data acquisition module. Mainly used to collect comment data from the relevant domain. A large amount of comment text data is obtained by a web crawler or by other methods.
- Data preprocessing module. Mainly used to split the comment text into sentences. After the text is split into sentences, a word segmentation and part-of-speech tagging tool is applied to each sentence.
- Entity extraction module. Mainly used to extract the entities in the comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.
- Entity ontology tree building module. Mainly used to build an ontology tree from the nouns among the entities. Words of different categories are placed on different branches of the ontology tree, and the hierarchical relationships between words are also reflected in the tree.
- Entity feature extraction module. Mainly used to extract the features of the related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts entity features using syntactic dependency relations together with word co-occurrence.
- Comment analysis module. Mainly used to analyze unprocessed comment text using the entities and features, and to obtain the relevant information extraction results.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410093275.4A CN103886051A (en) | 2014-03-13 | 2014-03-13 | Comment analysis method based on entities and features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410093275.4A CN103886051A (en) | 2014-03-13 | 2014-03-13 | Comment analysis method based on entities and features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103886051A true CN103886051A (en) | 2014-06-25 |
Family
ID=50954943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410093275.4A Pending CN103886051A (en) | 2014-03-13 | 2014-03-13 | Comment analysis method based on entities and features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103886051A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528625A (en) * | 2020-12-11 | 2021-03-19 | 北京百度网讯科技有限公司 | Event extraction method and device, computer equipment and readable storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101196904A (en) * | 2007-11-09 | 2008-06-11 | 清华大学 | News keyword abstraction method based on word frequency and multi-component grammar |
US20090119156A1 (en) * | 2007-11-02 | 2009-05-07 | Wise Window Inc. | Systems and methods of providing market analytics for a brand |
US20090265332A1 (en) * | 2008-04-18 | 2009-10-22 | Biz360 Inc. | System and Methods for Evaluating Feature Opinions for Products, Services, and Entities |
WO2010042888A1 (en) * | 2008-10-10 | 2010-04-15 | The Regents Of The University Of California | A computational method for comparing, classifying, indexing, and cataloging of electronically stored linear information |
CN102968408A (en) * | 2012-11-23 | 2013-03-13 | 西安电子科技大学 | Method for identifying substance features of customer reviews |
CN103077164A (en) * | 2012-12-27 | 2013-05-01 | 新浪网技术(中国)有限公司 | Text analysis method and text analyzer |
CN103370707A (en) * | 2011-02-24 | 2013-10-23 | 瑞典爱立信有限公司 | Method and server for media classification |
CN103544255A (en) * | 2013-10-15 | 2014-01-29 | 常州大学 | Text semantic relativity based network public opinion information analysis method |
-
2014
- 2014-03-13 CN CN201410093275.4A patent/CN103886051A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112528625A (en) * | 2020-12-11 | 2021-03-19 | 北京百度网讯科技有限公司 | Event extraction method and device, computer equipment and readable storage medium |
CN112528625B (en) * | 2020-12-11 | 2024-02-23 | 北京百度网讯科技有限公司 | Event extraction method, device, computer equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103853834B (en) | Text structure analysis-based Web document abstract generation method | |
WO2014085832A3 (en) | Event investigation within an online research system | |
CN103077164A (en) | Text analysis method and text analyzer | |
CN103246641A (en) | Text semantic information analyzing system and method | |
WO2014210387A3 (en) | Concept extraction | |
CN103970898A (en) | Method and device for extracting information based on multistage rule base | |
MY194297A (en) | A method and device for providing search engine label | |
CN105426379A (en) | Keyword weight calculation method based on position of word | |
CN102999523A (en) | Intelligence digitizing method | |
CN103886051A (en) | Comment analysis method based on entities and features | |
MY167959A (en) | System and method for semantic-level sentiment analysis of text | |
CN108205542A (en) | A kind of analysis method and system of song comment | |
Luporini | Metaphor in times of crisis: Metaphorical representations of the global crisis in The Financial Times and II Sole 24 Ore 2008 | |
김양희 et al. | An extensive analysis of studies on ESP in Korea: From 2007 to 2016 | |
Heo et al. | Feature extraction to detect hoax articles | |
Nagar et al. | News sentiment analysis using R to predict stock market trends | |
Prayudha | The Cohesion and Coherence of the Editorials in The Jakarta Post | |
de la Fuente | Sampling for machine translation evaluation | |
Pavliuk et al. | The Use of Computer Technologies in the Lexicography | |
임혜원 et al. | A Study on the Perception and Value Evaluation of Regional Tourism Brands Using Big Data | |
Kwon et al. | Automated procedure for extracting safety regulatory information using natural language processing techniques and ontology | |
Wang et al. | Text Mining to Facilitate Geoscience Knowledge Discovery | |
Ting | China-Australia executive leadership program: Cross-border leadership development in the Asian century | |
Dewey et al. | L2 development during study abroad in China | |
Wahyu et al. | FIGURES OF SPEECH; METAPHOR USED IN HOUSING ADVERTISEMENTS IN “THE POINT” NEWSPAPER IN NOVEMBER 2009 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20140625 |