CN110837729A - Dependency syntax analysis-based online comment viewpoint extraction method - Google Patents
- Publication number
- CN110837729A (application number CN201911062773.1A)
- Authority
- CN
- China
- Prior art keywords
- sentences
- keywords
- online comment
- analyzed
- dependency
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of information extraction and discloses an online comment viewpoint extraction method based on dependency syntax analysis, comprising the following steps: S1, designing a network data extractor, extracting all online comment viewpoints in a web page with the network data extractor, and collating them to obtain a data library of online comment viewpoints; S2, extracting from the data library the sentences that contain the required keywords, as the sentences to be analyzed; S3, obtaining the keyword target of each sentence to be analyzed from S2: a, taking the first two words of the sentence to be analyzed as pre-keywords and judging whether both are head words (central words); if so, using a pre-established prediction model to determine the specific dependency relationship between the two pre-keywords. The stated advantage of the method is greatly improved extraction efficiency.
Description
Technical Field
The invention relates to the technical field of information extraction, in particular to an online comment viewpoint extraction method based on dependency syntax analysis.
Background
With the development and popularity of the Internet, the Web has changed the way consumers feed back opinions to a great extent. Today, product users can publish views of product performance through merchants' websites, web forums, BBSs, and blogs. These online reviews provide a sufficient and valuable source of information for market research.
By tracking this information, merchants and manufacturers can obtain consumer feedback in time, and potential consumers can use it as an important reference. However, browsing web pages, collecting comments, and analyzing viewpoints by hand is time-consuming, laborious, and inefficient. There are two main reasons. First, network information is massive: an active forum may generate hundreds of topic threads every day. Second, in many cases only a few sentences in a long review are the subjective comments of interest, while the other sentences are either objective descriptions or opinions unrelated to the product. Both factors greatly reduce the efficiency of extracting useful comments.
Disclosure of Invention
(I) Technical problem to be solved
In view of the defects of the prior art, the invention provides an online comment viewpoint extraction method based on dependency syntax analysis that greatly improves extraction efficiency. It addresses two problems: network information is massive, with an active forum generating hundreds of topic threads every day; and in many cases only a few sentences in a long review are the subjective comments of interest, while the other sentences are either objective descriptions or opinions unrelated to the product, which greatly reduces the efficiency of extracting useful comments.
(II) Technical solution
To greatly improve extraction efficiency, the invention provides the following technical solution: an online comment viewpoint extraction method based on dependency syntax analysis, comprising the following steps:
S1. Design a network data extractor, extract all online comment viewpoints in a web page with the network data extractor, and collate them to obtain a data library of online comment viewpoints;
S2. Extract from the data library the sentences that contain the required keywords, as the sentences to be analyzed;
S3. Obtain the keyword target of each sentence to be analyzed from S2:
a. Take the first two words of the sentence to be analyzed as pre-keywords and judge whether both are head words (central words); if so, use a pre-established prediction model to determine the specific dependency relationship between the two pre-keywords;
b. Repeat step a until all sentences to be analyzed in the data library have been analyzed;
S4. Compare the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determine the sentences that match the keyword target, and store them separately;
S5. Use the sentences obtained in S4 to look up the corresponding original online comment sentences in the data library, and collect these into an online comment viewpoint extraction database.
Preferably, the collation method in S1 is as follows: extract all online comment viewpoints from the first page to the last page, separate the individual groups of comments with spacers, remove the web pages' own tags, and gather the result into a single data text.
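The collation step can be illustrated with a minimal sketch. The patent does not name a language, a parser, or a spacer string, so the Python standard-library `HTMLParser`, the `SPACER` value, and the function names `strip_tags` and `collate` below are all assumptions made for illustration.

```python
from html.parser import HTMLParser

SPACER = "\n-----\n"  # hypothetical spacer; the patent does not specify one


class _TextOnly(HTMLParser):
    """Collects text nodes and drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def strip_tags(html_fragment):
    """Return the visible text of one HTML fragment, without tags."""
    parser = _TextOnly()
    parser.feed(html_fragment)
    parser.close()
    return " ".join(parser.parts)


def collate(pages):
    """pages: list of pages, each a list of raw HTML comment fragments."""
    comments = []
    for page in pages:  # first page to last page, in order
        for fragment in page:
            text = strip_tags(fragment)
            if text:
                comments.append(text)
    # One data text, with a spacer between comments.
    return SPACER.join(comments)


pages = [
    ["<div class='c'>The <b>battery</b> lasts long.</div>", "<p>Screen is dim.</p>"],
    ["<p>Fast delivery.</p>"],
]
corpus = collate(pages)
```

Splitting `corpus` on `SPACER` recovers the individual comments, which is what later steps would need in order to map a selected sentence back to its original comment.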
Preferably, the prediction model in step a is built as follows:
first, extract the keywords from the prepared sample sentences;
build a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decompose the constructed backbone dependency tree into a complete set of backbone implementation targets;
for each specific operation in the complete set of backbone implementation targets, extract the backbone feature points that carry the corresponding features;
take the specific operation of the current backbone implementation target as the target class for all of the current features;
over a large batch of sample sentences, obtain the final prediction model by using a network code learning sequence algorithm.
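The model-building steps above read like the preparation of training data for a transition-based dependency parser: a tree is decomposed into a sequence of operations, and each operation becomes the target class for the features observed at that point. The sketch below shows one common way to do this, a static arc-standard oracle. The arc-standard system, the three feature names, and the function name `oracle_instances` are assumptions; the patent names neither a transition system nor a feature set, and the learning algorithm itself is not sketched here.

```python
from collections import Counter

ROOT = 0  # index of the artificial root node


def oracle_instances(words, heads):
    """Decompose a projective dependency tree into (features, action) pairs.

    words: list of tokens.
    heads: heads[i] is the 1-based index of the head of words[i], 0 for root.
    """
    head_of = {i + 1: h for i, h in enumerate(heads)}
    pending = Counter(heads)  # dependents not yet attached, per head
    stack, buffer = [ROOT], list(range(1, len(words) + 1))
    tokens = ["<ROOT>"] + list(words)
    instances = []
    while buffer or len(stack) > 1:
        action = "SHIFT"
        if len(stack) >= 2:
            s0, s1 = stack[-1], stack[-2]
            if s1 != ROOT and head_of[s1] == s0 and pending[s1] == 0:
                action = "LEFT-ARC"   # s1 becomes a dependent of s0
            elif head_of[s0] == s1 and pending[s0] == 0:
                action = "RIGHT-ARC"  # s0 becomes a dependent of s1
        if action == "SHIFT" and not buffer:
            raise ValueError("tree is not projective; arc-standard cannot derive it")
        # Features of the current configuration; the action is the target class.
        features = {
            "s0": tokens[stack[-1]],
            "s1": tokens[stack[-2]] if len(stack) >= 2 else None,
            "b0": tokens[buffer[0]] if buffer else None,
        }
        instances.append((features, action))
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            pending[s0] -= 1
            del stack[-2]
        else:
            pending[s1] -= 1
            stack.pop()
    return instances


words = ["The", "battery", "lasts", "long"]
heads = [2, 3, 0, 3]  # The<-battery, battery<-lasts, lasts<-ROOT, long<-lasts
instances = oracle_instances(words, heads)
actions = [action for _, action in instances]
```

Running the oracle over a large batch of annotated sample sentences would yield the labeled instances on which a classifier could be trained to predict the next operation from the features.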
Preferably, in step S2, a keyword database is first built from the keywords on which the staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the content of the keyword database and analyzed to obtain the required key sentences.
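A minimal sketch of this keyword-based selection follows. The sentence splitter, the case-insensitive substring match, and the names `split_sentences` and `select_sentences` are assumptions for illustration; the patent does not say how sentences are delimited or how a keyword match is decided.

```python
import re


def split_sentences(text):
    """Split on sentence-final punctuation (Latin or CJK); a rough heuristic."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def select_sentences(sentences, keywords):
    """Return (index, sentence, matched keywords) for sentences that hit a keyword."""
    lowered = [k.lower() for k in keywords]
    selected = []
    for idx, sentence in enumerate(sentences):
        hits = [k for k in lowered if k in sentence.lower()]
        if hits:
            selected.append((idx, sentence, hits))
    return selected


keyword_db = ["battery", "shipping"]  # hypothetical keyword database
text = "The battery lasts long. Shipping was slow! I had lunch."
sentences = split_sentences(text)
to_analyze = select_sentences(sentences, keyword_db)
```

Keeping the sentence index alongside each hit makes it straightforward to return to the original comment text once the analysis is done.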
Preferably, the data library in S1 is a long text.
Preferably, in step a, the core product word of each central target word is further extracted according to the dependency relationships among the central target words of the selected sentence to be analyzed.
(III) Advantageous effects
Compared with the prior art, the invention provides an online comment viewpoint extraction method based on dependency syntax analysis, which has the following beneficial effects:
1. Through dependency-syntax-based analysis, the method quickly obtains the relations among the keywords in online comment viewpoints, quickly identifies the targeted topic, and extracts the useful comment viewpoints into a database. Staff no longer need to pick them out of massive network information by hand, which greatly improves extraction efficiency and makes the method convenient to use.
2. The collation step performs a preliminary sort of the extracted comment information and directly removes the noise that would interfere with subsequent extraction, which speeds up the subsequent analysis.
Drawings
Fig. 1 is a schematic flow diagram of the online comment viewpoint extraction method based on dependency syntax analysis according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to Fig. 1, an online comment viewpoint extraction method based on dependency syntax analysis includes the following steps:
S1. Design a network data extractor, extract all online comment viewpoints in a web page with the network data extractor, and collate them to obtain a data library of online comment viewpoints;
S2. Extract from the data library the sentences that contain the required keywords, as the sentences to be analyzed;
S3. Obtain the keyword target of each sentence to be analyzed from S2:
a. Take the first two words of the sentence to be analyzed as pre-keywords and judge whether both are head words (central words); if so, use a pre-established prediction model to determine the specific dependency relationship between the two pre-keywords;
b. Repeat step a until all sentences to be analyzed in the data library have been analyzed;
S4. Compare the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determine the sentences that match the keyword target, and store them separately;
S5. Use the sentences obtained in S4 to look up the corresponding original online comment sentences in the data library, and collect these into an online comment viewpoint extraction database.
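Steps S4 and S5 can be sketched as a filter over parsed sentences followed by a lookup of the original text. Everything concrete here is an assumption: the `Arc` record, the relation labels (written in the style of Universal Dependencies), the contents of `preset_relations`, and the function names. The patent does not describe the format of the preset sentence relations in the sample database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: str
    relation: str
    dependent: str


def matches(arcs, keyword, preset_relations):
    """True if the keyword takes part in at least one preset relation."""
    return any(
        arc.relation in preset_relations and keyword in (arc.head, arc.dependent)
        for arc in arcs
    )


def extract_viewpoints(parsed, keyword, preset_relations, originals):
    """parsed: list of (comment id, arcs); originals: comment id -> original text."""
    # S4: keep the sentences whose relations match the presets.
    kept_ids = [cid for cid, arcs in parsed if matches(arcs, keyword, preset_relations)]
    # S5: map the kept sentences back to the original comment text.
    return [originals[cid] for cid in kept_ids]


originals = {
    0: "The battery is great!!",
    1: "I bought the battery on Monday.",
}
parsed = [
    (0, [Arc("great", "nsubj", "battery")]),
    (1, [Arc("bought", "obj", "battery")]),
]
preset_relations = {"nsubj", "amod"}  # hypothetical opinion-bearing relations
viewpoints = extract_viewpoints(parsed, "battery", preset_relations, originals)
```

In this toy data the second comment mentions the keyword but only as the object of a purchase, so it is filtered out as an objective description rather than a viewpoint.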
The collation method in S1 is as follows: extract all online comment viewpoints from the first page to the last page, separate the individual groups of comments with spacers, remove the web pages' own tags, and gather the result into a single data text.
The prediction model in step a is built as follows:
first, extract the keywords from the prepared sample sentences;
build a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decompose the constructed backbone dependency tree into a complete set of backbone implementation targets;
for each specific operation in the complete set of backbone implementation targets, extract the backbone feature points that carry the corresponding features;
take the specific operation of the current backbone implementation target as the target class for all of the current features;
over a large batch of sample sentences, obtain the final prediction model by using a network code learning sequence algorithm.
In step S2, a keyword database is first built from the keywords on which the staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the content of the keyword database and analyzed to obtain the required key sentences.
The data library in S1 is a long text.
In step a, the core product word of each central target word is further extracted according to the dependency relationships among the central target words of the selected sentence to be analyzed.
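The patent does not say how the core product word is derived from the dependency relationships, so the following is only one possible reading: starting from an opinion word, follow a subject or adjectival-modifier arc to the noun being evaluated, then prepend its compound modifiers. The relation labels (`nsubj`, `amod`, `compound`, in the style of Universal Dependencies), the word-based arc triples, and the function name `core_product_word` are all assumptions.

```python
def core_product_word(arcs, opinion_word):
    """arcs: list of (head, relation, dependent) triples for one sentence."""
    target = None
    for head, relation, dependent in arcs:
        if relation == "nsubj" and head == opinion_word:
            target = dependent  # "life" in "battery life is short"
        elif relation == "amod" and dependent == opinion_word:
            target = head       # "screen" in "bright screen"
    if target is None:
        return None
    # Prepend compound modifiers so "life" becomes "battery life".
    modifiers = [dep for head, rel, dep in arcs if rel == "compound" and head == target]
    return " ".join(modifiers + [target])


arcs = [
    ("short", "nsubj", "life"),
    ("life", "compound", "battery"),
    ("short", "cop", "is"),
]
product = core_product_word(arcs, "short")
```

A real implementation would work on token indices rather than word strings, since the same word can occur more than once in a sentence.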
In summary, the online comment viewpoint extraction method based on dependency syntax analysis proceeds as follows. All online comment viewpoints in a web page are extracted by a network data extractor and collated into a data library of online comment viewpoints. Sentences containing the required keywords are extracted from the data library as the sentences to be analyzed. To obtain the keyword target of each sentence to be analyzed from S2, the first two words of the sentence are taken as pre-keywords and it is judged whether both are head words (central words); if so, a pre-established prediction model determines the specific dependency relationship between the two pre-keywords. The sentences analyzed in S3 are compared with the preset sentence relations in a sample database, and the sentences that match the keyword target are determined and stored separately. The sentences obtained in S4 are used to look up the corresponding original online comment sentences in the data library, and these are collected into an online comment viewpoint extraction database. In this way the relations among the keywords in online comment viewpoints are obtained quickly, the targeted topic is identified quickly, and the useful comment viewpoints are extracted into a database without staff having to pick them out of massive network information, which greatly improves extraction efficiency and makes the method convenient to use. In addition, the collation step performs a preliminary sort of the large volume of extracted comment information and directly removes the noise that would interfere with subsequent extraction, which speeds up the subsequent analysis.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. An online comment viewpoint extraction method based on dependency syntax analysis, characterized by comprising the following steps:
S1. designing a network data extractor, extracting all online comment viewpoints in a web page with the network data extractor, and collating them to obtain a data library of online comment viewpoints;
S2. extracting from the data library the sentences that contain the required keywords, as the sentences to be analyzed;
S3. obtaining the keyword target of each sentence to be analyzed from S2:
a. taking the first two words of the sentence to be analyzed as pre-keywords and judging whether both are head words (central words); if so, using a pre-established prediction model to determine the specific dependency relationship between the two pre-keywords;
b. repeating step a until all sentences to be analyzed in the data library have been analyzed;
S4. comparing the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determining the sentences that match the keyword target, and storing them separately;
S5. using the sentences obtained in S4 to look up the corresponding original online comment sentences in the data library, and collecting these into an online comment viewpoint extraction database.
2. The dependency parsing-based online comment viewpoint extraction method as claimed in claim 1, wherein: the collation method in S1 specifically comprises extracting all online comment viewpoints from the first page to the last page, separating the individual groups of comments with spacers, removing the web pages' own tags, and gathering the result into a single data text.
3. The dependency parsing-based online comment viewpoint extraction method as claimed in claim 1, wherein: the prediction model in step a is built as follows:
first, extracting the keywords from the prepared sample sentences;
building a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decomposing the constructed backbone dependency tree into a complete set of backbone implementation targets;
for each specific operation in the complete set of backbone implementation targets, extracting the backbone feature points that carry the corresponding features;
taking the specific operation of the current backbone implementation target as the target class for all of the current features;
over a large batch of sample sentences, obtaining the final prediction model by using a network code learning sequence algorithm.
4. The dependency parsing-based online comment viewpoint extraction method as claimed in claim 1, wherein: in step S2, a keyword database is first built from the keywords on which the staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the content of the keyword database and analyzed to obtain the required key sentences.
5. The dependency parsing-based online comment viewpoint extraction method as claimed in claim 1, wherein: the data library in S1 is a long text.
6. The dependency parsing-based online comment viewpoint extraction method as claimed in claim 1, wherein: in step a, the core product word of each central target word is further extracted according to the dependency relationships among the central target words of the selected sentence to be analyzed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911062773.1A CN110837729A (en) | 2019-11-03 | 2019-11-03 | Dependency syntax analysis-based online comment viewpoint extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911062773.1A CN110837729A (en) | 2019-11-03 | 2019-11-03 | Dependency syntax analysis-based online comment viewpoint extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110837729A (en) | 2020-02-25 |
Family
ID=69576000
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911062773.1A Pending CN110837729A (en) | 2019-11-03 | 2019-11-03 | Dependency syntax analysis-based online comment viewpoint extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110837729A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112651211A (en) * | 2020-12-11 | 2021-04-13 | 北京大米科技有限公司 | Label information determination method, device, server and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Asghar et al. | Sentiment analysis on youtube: A brief survey | |
US10002371B1 (en) | System, method, and computer program product for searching summaries of online reviews of products | |
CN107544988B (en) | Method and device for acquiring public opinion data | |
CN102509233A (en) | User online action information-based recommendation method | |
CN104036038A (en) | News recommendation method and system | |
CN103310003A (en) | Method and system for predicting click rate of new advertisement based on click log | |
William et al. | CLICK-ID: A novel dataset for Indonesian clickbait headlines | |
CN105468649B (en) | Method and device for judging matching of objects to be displayed | |
CN103177036A (en) | Method and system for label automatic extraction | |
CN111160019A (en) | Public opinion monitoring method, device and system | |
Henrys | Importance of web scraping in e-commerce and e-marketing | |
Koumpouri et al. | Evaluation of four approaches for" sentiment analysis on movie reviews" the kaggle competition | |
Rani et al. | Study and comparision of vectorization techniques used in text classification | |
CN113392329A (en) | Content recommendation method and device, electronic equipment and storage medium | |
US11295078B2 (en) | Portfolio-based text analytics tool | |
CN105760502A (en) | Commercial quality emotional dictionary construction system based on big data text mining | |
Kim et al. | A user opinion and metadata mining scheme for predicting box office performance of movies in the social network environment | |
Guo et al. | An opinion feature extraction approach based on a multidimensional sentence analysis model | |
CN107665442B (en) | Method and device for acquiring target user | |
Song et al. | Extracting product features from online reviews for sentimental analysis | |
CN112989053A (en) | Periodical recommendation method and device | |
Kim et al. | Product recommendation system based user purchase criteria and product reviews | |
CN110837729A (en) | Dependency syntax analysis-based online comment viewpoint extraction method | |
CN113705217B (en) | Literature recommendation method and device for knowledge learning in electric power field | |
Jeong et al. | Determining the titles of Web pages using anchor text and link analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200225 |