CN103886051A - Comment analysis method based on entities and features - Google Patents

Comment analysis method based on entities and features

Info

Publication number
CN103886051A
Authority
CN
China
Prior art keywords
comment
entity
module
feature
mainly used
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410093275.4A
Other languages
Chinese (zh)
Inventor
秦志光
周尔强
罗熹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201410093275.4A priority Critical patent/CN103886051A/en
Publication of CN103886051A publication Critical patent/CN103886051A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a comment analysis method based on entities and features, belongs to the field of natural language processing, and aims at conducting comment text analysis. The comment text is processed with the relevant natural language processing means to obtain its entity tree and the features of the related entities. Information extraction is then conducted on the text using these entities and features. The method helps comment analysis work such as public opinion analysis, relation extraction and orientation analysis.

Description

A comment analysis method based on entities and features
Technical field
The invention belongs to the field of natural language processing and, more specifically, relates to a comment analysis method based on entities and features.
Background
With the arrival of the Web 2.0 era, the volume of review information on the Internet has grown explosively. Suppose your company has released a new product. The release brings coverage from various media outlets as well as related comments on the major portal websites. Faced with these comments, you may urgently want to know which aspects of the product users pay most attention to and how users actually rate it. Obtaining this information manually is nearly impossible, which creates a demand for computers to process the data and produce the desired results. The entity- and feature-based comment text analysis method of the present invention analyzes such data and obtains results by building entities and the features of related entities.
Summary of the invention
The ultimate purpose of the present invention is to analyze comment text. By extracting entities and features from a large number of comment texts, the invention builds its own entity and feature analysis framework, which in turn supports comment text analysis and information extraction.
To achieve the above purpose, the entity- and feature-based comment text analysis method of the present invention mainly consists of the following modules:
- Comment data acquisition module. Mainly used to collect comment data in the relevant domain. A large amount of comment text data is obtained with a web crawler or by other methods.
- Data preprocessing module. Mainly used to split the comment text into sentences. After the text is split into sentences, a word segmentation and part-of-speech tagging tool is used to segment and tag them.
- Entity extraction module. Mainly used to extract the entities in comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.
- Entity ontology tree building module. Mainly used to build an ontology tree from the entity nouns. Words of different categories are placed on different branches of the ontology tree, and the hierarchical relationships between words are also reflected in the tree.
- Entity feature extraction module. Mainly used to extract the features of related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts entity features using syntactic dependency relations and word co-occurrence.
- Comment analysis module. Mainly used to analyze unprocessed comment text using the entities and features, and to obtain the relevant information extraction results.
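As an illustration of the preprocessing and entity extraction steps listed above, the following is a minimal sketch and not the patented implementation. It assumes the comments have already been segmented and tagged by some external tool, so each sentence arrives as a list of (word, tag) pairs with noun tags starting with "n". The function names, the `min_freq` threshold, and the `rejected` set that stands in for manual participation are all hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naively split comment text on Chinese and English end marks.

    Splitting on "." also breaks decimals such as "5.5", so a real
    system would need a more careful rule.
    """
    parts = re.split(r"[。！？!?.;；\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def candidate_entities(tagged_sentences, min_freq=2):
    """Count nouns across all sentences and keep the frequent ones."""
    counts = Counter(
        word
        for sentence in tagged_sentences
        for word, tag in sentence
        if tag.startswith("n")  # assumption: noun tags begin with "n"
    )
    return [(w, c) for w, c in counts.most_common() if c >= min_freq]


def confirm_entities(candidates, rejected):
    """Stand-in for manual review: drop candidates a reviewer rejected."""
    return [word for word, _ in candidates if word not in rejected]
```

A reviewer would typically look at the frequency-ranked candidates and reject generic nouns before the remaining words are accepted as entities.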
The purpose of the invention is achieved as follows. The data acquisition module and the data preprocessing module are called to obtain preliminarily processed data. The entity extraction module, the entity ontology tree building module, and the entity feature extraction module are then called to obtain the corresponding training results. Finally, the comment analysis module encapsulates these modules; once encapsulation is complete, any new comment text is analyzed by the comment analysis module to obtain the final result.
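The two-stage flow described above (train on collected comments, then analyze new ones through a single encapsulating module) could be sketched as follows. This is only an assumed design: the class `CommentAnalyzer`, its methods, and the two toy stage functions are hypothetical names, the input is assumed to be pre-tagged (word, tag) pairs, and the tree building step is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]  # one sentence as (word, pos_tag) pairs


@dataclass
class CommentAnalyzer:
    """Wraps the training stages and applies their results to new text."""
    extract_entities: Callable[[List[Tagged]], Set[str]]
    extract_features: Callable[[List[Tagged], Set[str]], Dict[str, Set[str]]]
    entities: Set[str] = field(default_factory=set)
    features: Dict[str, Set[str]] = field(default_factory=dict)

    def train(self, corpus: List[Tagged]) -> None:
        # Run the encapsulated stages once and keep their results.
        self.entities = self.extract_entities(corpus)
        self.features = self.extract_features(corpus, self.entities)

    def analyze(self, sentence: Tagged) -> List[Tuple[str, str]]:
        # Report (entity, feature) pairs that both occur in the sentence.
        words = [w for w, _ in sentence]
        hits = []
        for entity in words:
            if entity in self.entities:
                known = self.features.get(entity, set())
                hits.extend((entity, w) for w in words if w in known)
        return hits


def nouns_as_entities(corpus):
    """Toy entity stage: every word tagged "n" is treated as an entity."""
    return {w for sentence in corpus for w, tag in sentence if tag == "n"}


def adjectives_in_same_sentence(corpus, entities):
    """Toy feature stage: adjectives sharing a sentence with an entity."""
    feats: Dict[str, Set[str]] = {}
    for sentence in corpus:
        adjectives = {w for w, tag in sentence if tag == "a"}
        for w, _ in sentence:
            if w in entities:
                feats.setdefault(w, set()).update(adjectives)
    return feats
```

Passing the stages in as callables keeps the wrapper independent of how entities and features are actually extracted, which matches the idea of one module encapsulating the others.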
Brief description of the drawings
Fig. 1 is a schematic block diagram of a specific implementation of the entity- and feature-based comment analysis method of the present invention.
Embodiment
Specific embodiments of the invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Fig. 1 is a schematic block diagram of a specific implementation of the entity- and feature-based comment analysis method of the present invention.
In this embodiment, as shown in Fig. 1, the entity- and feature-based comment analysis method of the present invention comprises a data acquisition module 101, a data preprocessing module 102, an entity extraction module 103, an entity ontology tree building module 104, an entity feature extraction module 105, an entity and feature construction module 201, unprocessed comments 106, and an analysis result 107.
In this example, the data acquisition module 101 is called to obtain the relevant data, which is passed to the data preprocessing module 102. The data preprocessing module splits the comments into paragraphs, long sentences, and short sentences, and then performs word segmentation and part-of-speech tagging. The preprocessed data is passed from the data preprocessing module 102 to the entity extraction module 103 and the entity feature extraction module 105. After extracting the entities, the entity extraction module 103 passes the data to the entity ontology tree building module 104. At the same time, the entity feature extraction module 105 extracts the corresponding features. The entity extraction module 103, the entity ontology tree building module 104, and the entity feature extraction module 105 all belong to the entity and feature construction module 201. Once the entity and feature construction module 201 is complete, it is used to process the unprocessed comments 106, and the analysis result 107 is obtained after processing.
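The text does not say how module 104 derives the entity tree or how module 105 counts co-occurrences, so the following is only a sketch under assumptions. It assumes the hierarchy is supplied as a hypothetical child-to-parent mapping (for example from manual review) and that features are adjectives, verbs, or nouns appearing in the same sentence as an entity. Dependency-based extraction is not shown because it would need an external parser. All function names, tags, and the `min_count` threshold are illustrative.

```python
from collections import Counter, defaultdict


def build_tree(parent_of):
    """Turn a child -> parent mapping into parent -> [children] branches."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return dict(children)


def path_to_root(word, parent_of):
    """Walk upward from a word to show its place in the hierarchy."""
    path, seen = [word], {word}
    while path[-1] in parent_of:
        parent = parent_of[path[-1]]
        if parent in seen:  # guard against accidental cycles
            break
        path.append(parent)
        seen.add(parent)
    return path


def cooccurring_features(tagged_sentences, entities,
                         feature_tags=("a", "v", "n"), min_count=1):
    """Count words that share a sentence with an entity."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        present = {w for w, _ in sentence if w in entities}
        for entity in present:
            for word, tag in sentence:
                if word != entity and tag in feature_tags:
                    counts[entity][word] += 1
    return {
        entity: [w for w, c in counter.most_common() if c >= min_count]
        for entity, counter in counts.items()
    }
```

Raising `min_count` trades recall for precision: rare co-occurrences are dropped, which reduces noise from words that appear next to an entity only by chance.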
Although illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept are protected.

Claims (1)

1. A comment analysis method based on entities and features, consisting of the following modules:
- Comment data acquisition module. Mainly used to collect comment data in the relevant domain. A large amount of comment text data is obtained with a web crawler or by other methods.
- Data preprocessing module. Mainly used to split the comment text into sentences. After the text is split into sentences, a word segmentation and part-of-speech tagging tool is used to segment and tag them.
- Entity extraction module. Mainly used to extract the entities in comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.
- Entity ontology tree building module. Mainly used to build an ontology tree from the entity nouns. Words of different categories are placed on different branches of the ontology tree, and the hierarchical relationships between words are also reflected in the tree.
- Entity feature extraction module. Mainly used to extract the features of related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts entity features using syntactic dependency relations and word co-occurrence.
- Comment analysis module. Mainly used to analyze unprocessed comment text using the entities and features, and to obtain the relevant information extraction results.
CN201410093275.4A 2014-03-13 2014-03-13 Comment analysis method based on entities and features Pending CN103886051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410093275.4A CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410093275.4A CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Publications (1)

Publication Number Publication Date
CN103886051A true CN103886051A (en) 2014-06-25

Family

ID=50954943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410093275.4A Pending CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Country Status (1)

Country Link
CN (1) CN103886051A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196904A (en) * 2007-11-09 2008-06-11 清华大学 News keyword abstraction method based on word frequency and multi-component grammar
US20090119156A1 (en) * 2007-11-02 2009-05-07 Wise Window Inc. Systems and methods of providing market analytics for a brand
US20090265332A1 (en) * 2008-04-18 2009-10-22 Biz360 Inc. System and Methods for Evaluating Feature Opinions for Products, Services, and Entities
WO2010042888A1 (en) * 2008-10-10 2010-04-15 The Regents Of The University Of California A computational method for comparing, classifying, indexing, and cataloging of electronically stored linear information
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
CN103370707A (en) * 2011-02-24 2013-10-23 瑞典爱立信有限公司 Method and server for media classification
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528625A (en) * 2020-12-11 2021-03-19 北京百度网讯科技有限公司 Event extraction method and device, computer equipment and readable storage medium
CN112528625B (en) * 2020-12-11 2024-02-23 北京百度网讯科技有限公司 Event extraction method, device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140625