CN110837729A - Dependency syntax analysis-based online comment viewpoint extraction method - Google Patents

Info

Publication number
CN110837729A
Authority
CN
China
Prior art keywords
sentences
keywords
online comment
analyzed
dependency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911062773.1A
Other languages
Chinese (zh)
Inventor
李孟婷
姜同强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Technology and Business University
Original Assignee
Beijing Technology and Business University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Technology and Business University
Priority to CN201911062773.1A
Publication of CN110837729A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of information extraction and discloses an online comment viewpoint extraction method based on dependency syntax analysis, comprising the following steps: S1, designing a network data extractor, extracting all online comment viewpoints in a webpage through the network data extractor, and obtaining a data library of the online comment viewpoints after induction and collation; S2, extracting sentences from the data library based on the required keywords as the sentences to be analyzed; S3, obtaining the keyword target of each sentence to be analyzed in S2: a. taking the first two words in the sentence to be analyzed as pre-keywords and judging whether both pre-keywords are central words; if so, determining the specific dependency relationship between the two pre-keywords by using a pre-established prediction model. The method has the advantage of greatly improving extraction efficiency.

Description

Dependency syntax analysis-based online comment viewpoint extraction method
Technical Field
The invention relates to the technical field of information extraction, in particular to an online comment viewpoint extraction method based on dependency syntax analysis.
Background
With the development and popularity of the Internet, the Web has greatly changed the way consumers give feedback. Today, product users can publish their views on product performance through merchants' websites, web forums, BBSs, and blogs. These online reviews provide a rich and valuable source of information for market research.
By tracking this information, merchants and manufacturers can obtain consumer feedback in a timely manner, and potential consumers can use it as an important reference. However, browsing web pages, collecting comments, and analyzing viewpoints by hand is time-consuming, labor-intensive, and inefficient. There are two main reasons. First, network information is massive: an active forum can generate hundreds of topic posts every day. Second, in many cases only a few sentences in a long review are the subjective comments of interest, while the other sentences are either objective descriptions or opinions unrelated to the product. Both factors greatly reduce the efficiency of extracting useful reviews.
Disclosure of Invention
(I) technical problem to be solved
Aiming at the defects of the prior art, the invention provides an online comment viewpoint extraction method based on dependency syntax analysis, which has the advantage of greatly improving extraction efficiency. It addresses the following problems: network information is massive, and an active forum can generate hundreds of topic posts every day; in many cases, only a few sentences in a long review are the subjective comments of interest, while the other sentences are either objective descriptions or opinions unrelated to the product, which severely reduces the efficiency of extracting useful comments.
(II) technical scheme
In order to greatly improve extraction efficiency, the invention provides the following technical scheme: an online comment viewpoint extraction method based on dependency syntax analysis, comprising the following steps:
S1, designing a network data extractor, extracting all online comment viewpoints in a webpage through the network data extractor, and obtaining a data library of the online comment viewpoints after induction and collation;
S2, extracting sentences from the data library based on the required keywords as the sentences to be analyzed;
S3, obtaining the keyword target of each sentence to be analyzed in S2:
a. taking the first two words in the sentence to be analyzed as pre-keywords and judging whether both pre-keywords are central words; if so, determining the specific dependency relationship between the two pre-keywords by using a pre-established prediction model;
b. repeating step a until all the sentences to be analyzed in the data library have been analyzed;
S4, comparing the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determining the sentences that match the keyword targets, and storing them separately;
S5, feeding the sentences that match the keyword targets obtained in S4 back to the data library to retrieve the corresponding original online comment sentences, and finally collecting them to obtain an online comment viewpoint extraction database.
Preferably, the induction and collation in S1 first extracts all online comment viewpoints from the first page to the last page, separates the groups of comments with spacers, removes the webpage's tags, and collates the content into a single data text.
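The collation step can be illustrated with a minimal Python sketch. It assumes the extractor has already returned each page as a list of raw HTML comment fragments. The spacer string `SEPARATOR` and the function names `strip_tags` and `collate` are hypothetical; the patent names neither a spacer nor an implementation.

```python
from html.parser import HTMLParser

SEPARATOR = "\n=====\n"  # hypothetical spacer; the patent does not specify one


class _TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Remove markup from one comment and normalise its whitespace."""
    stripper = _TagStripper()
    stripper.feed(fragment)
    stripper.close()
    # Collapse the whitespace left behind by the removed tags.
    return " ".join("".join(stripper.parts).split())


def collate(pages):
    """Join every comment, from the first page to the last, into one text."""
    comments = [strip_tags(c) for page in pages for c in page]
    return SEPARATOR.join(c for c in comments if c)
```

Splitting the result on `SEPARATOR` recovers the individual comments, which is what allows a later step to map a matched sentence back to its original comment.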
Preferably, the prediction model in step a is built as follows:
first, extracting the keywords from prepared sample sentences;
establishing a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decomposing the constructed backbone dependency tree into a complete set of backbone implementation targets;
extracting the backbone feature points with corresponding features for each specific operation behavior in the complete set of backbone implementation targets;
taking the specific behavior content of the current backbone implementation target as the target class corresponding to all current features;
and, for a large batch of sample sentences, obtaining the final prediction model by using a network code learning sequence algorithm.
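The patent does not name the parsing algorithm behind these steps. One plausible reading, offered here only as an assumption, is the usual training setup of a transition-based dependency parser: a gold tree is decomposed into a sequence of parser actions, features are extracted from the parser state before each action, and the action itself serves as the class label. The sketch below uses the standard arc-standard oracle for projective trees; the three-slot feature template and the function name `arc_standard_oracle` are hypothetical.

```python
def arc_standard_oracle(words, heads):
    """Decompose a projective dependency tree into (features, action) pairs.

    words: tokens, with an artificial ROOT at index 0.
    heads: heads[i] is the index of the head of token i (heads[0] is None).
    """
    stack, buffer = [0], list(range(1, len(words)))
    # Number of dependents each token still has to collect.
    pending = [0] * len(words)
    for h in heads[1:]:
        pending[h] += 1

    examples = []
    while buffer or len(stack) > 1:
        s0 = stack[-1]
        s1 = stack[-2] if len(stack) > 1 else None
        # Hypothetical feature template: top two stack words, next buffer word.
        feats = {
            "s0": words[s0],
            "s1": words[s1] if s1 is not None else None,
            "b0": words[buffer[0]] if buffer else None,
        }
        if s1 is not None and s1 != 0 and heads[s1] == s0:
            action = "LEFT-ARC"   # s1 becomes a dependent of s0
            pending[s0] -= 1
            stack.pop(-2)
        elif s1 is not None and heads[s0] == s1 and pending[s0] == 0:
            action = "RIGHT-ARC"  # s0 becomes a dependent of s1
            pending[s1] -= 1
            stack.pop()
        elif buffer:
            action = "SHIFT"      # move the next word onto the stack
            stack.append(buffer.pop(0))
        else:
            raise ValueError("tree is not projective")
        examples.append((feats, action))
    return examples
```

Each returned pair is one training example: the feature dictionary is the input and the action string is the class. Any multi-class classifier could then be trained on examples gathered from a large batch of sample sentences; which learner the patent intends is not stated.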
Preferably, in step S2, a keyword database is first built from the keywords on which staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the contents of the keyword database and analyzed to obtain the required key sentences.
Preferably, the data library in S1 is a long text.
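The keyword comparison in S2 can be sketched as a simple filter over the sentences of the collated text. This is a minimal illustration under stated assumptions: matching is by case-insensitive substring, sentences are split on common terminators, and the names `split_sentences` and `select_sentences` are hypothetical. The patent does not specify how the comparison is carried out.

```python
import re


def split_sentences(text):
    """Split text on common Chinese and Western sentence terminators.

    Note: splitting on '.' is naive and would also break decimals.
    """
    return [s.strip() for s in re.split(r"[。！？!?.\n]+", text) if s.strip()]


def select_sentences(data_text, keywords):
    """Return (sentence, matched_keywords) for sentences containing a keyword."""
    selected = []
    for sentence in split_sentences(data_text):
        lowered = sentence.lower()
        hits = [k for k in keywords if k.lower() in lowered]
        if hits:
            selected.append((sentence, hits))
    return selected
```

For example, with the keywords `["battery", "screen"]`, the text "The battery lasts two days. I bought it in May. The screen is too dim!" yields the first and third sentences and drops the second.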
Preferably, in step a, the core product word of each central target word of the selected sentence to be analyzed is further extracted according to the dependency relationship between the central target words.
(III) advantageous effects
Compared with the prior art, the invention provides an online comment viewpoint extraction method based on dependency syntax analysis, which has the following beneficial effects:
1. With the online comment viewpoint extraction method based on dependency syntax analysis, the connections among the keywords in the online comment viewpoints can be obtained quickly through dependency-syntax-based analysis, the targeted topics can be located quickly, and useful comment viewpoints can be extracted and integrated into a database. Staff do not need to pick them out of massive network information, so extraction efficiency is greatly improved and the method is convenient to use.
2. With the online comment viewpoint extraction method based on dependency syntax analysis, the induction and collation step preliminarily sorts the extracted comment information and directly removes the extraneous information that would affect subsequent extraction, so that subsequent analysis is faster.
Drawings
Fig. 1 is a schematic flow diagram of an online comment opinion extraction method based on dependency parsing according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an online comment opinion extraction method based on dependency syntax analysis includes the following steps:
S1, designing a network data extractor, extracting all online comment viewpoints in a webpage through the network data extractor, and obtaining a data library of the online comment viewpoints after induction and collation;
S2, extracting sentences from the data library based on the required keywords as the sentences to be analyzed;
S3, obtaining the keyword target of each sentence to be analyzed in S2:
a. taking the first two words in the sentence to be analyzed as pre-keywords and judging whether both pre-keywords are central words; if so, determining the specific dependency relationship between the two pre-keywords by using a pre-established prediction model;
b. repeating step a until all the sentences to be analyzed in the data library have been analyzed;
S4, comparing the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determining the sentences that match the keyword targets, and storing them separately;
S5, feeding the sentences that match the keyword targets obtained in S4 back to the data library to retrieve the corresponding original online comment sentences, and finally collecting them to obtain an online comment viewpoint extraction database.
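Steps S4 and S5 can be sketched as a match followed by a lookup. The sketch assumes that S3 yields, for each analyzed sentence, the index of its original comment together with a keyword and a predicted dependency relation, and that the sample database is a set of preset (keyword, relation) pairs. The `Analyzed` record, the function name `match_and_recover`, and the relation labels used in the example are all hypothetical; the patent does not define these data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analyzed:
    comment_id: int  # index of the original comment in the data library
    keyword: str     # keyword target found in S3
    relation: str    # dependency relation predicted in S3


def match_and_recover(analyzed, preset_relations, data_library):
    """S4: keep sentences whose (keyword, relation) pair is preset.
    S5: return the corresponding original comments, without duplicates."""
    matched_ids = []
    for item in analyzed:
        is_match = (item.keyword, item.relation) in preset_relations
        if is_match and item.comment_id not in matched_ids:
            matched_ids.append(item.comment_id)
    return [data_library[i] for i in matched_ids]
```

Keeping the comment index alongside each analyzed sentence is what makes the feedback in S5 a direct lookup rather than a second text search.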
The induction and collation in S1 first extracts all online comment viewpoints from the first page to the last page, separates the groups of comments with spacers, removes the webpage's tags, and collates the content into a single data text.
The prediction model in step a is built as follows:
first, extracting the keywords from prepared sample sentences;
establishing a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decomposing the constructed backbone dependency tree into a complete set of backbone implementation targets;
extracting the backbone feature points with corresponding features for each specific operation behavior in the complete set of backbone implementation targets;
taking the specific behavior content of the current backbone implementation target as the target class corresponding to all current features;
and, for a large batch of sample sentences, obtaining the final prediction model by using a network code learning sequence algorithm.
In step S2, a keyword database is first built from the keywords on which staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the contents of the keyword database and analyzed to obtain the required key sentences.
The data library in S1 is a long text.
In step a, the core product word of each central target word of the selected sentence to be analyzed is further extracted according to the dependency relationship between the central target words.
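Extracting a core product word from dependency relations can be sketched as follows. The patent does not state which relations are used, so the sketch assumes Universal Dependencies style labels and two common patterns: an `nsubj` edge from an opinion word to the product word ("battery is great") and an `amod` edge from the product word to its modifier ("great battery"). The function name `extract_pairs` and the edge format are hypothetical.

```python
def extract_pairs(edges):
    """Return (product_word, opinion_word) pairs for one sentence.

    edges: iterable of (head, relation, dependent) triples.
    """
    pairs = []
    for head, rel, dep in edges:
        if rel == "nsubj":    # "battery is great": great -nsubj-> battery
            pairs.append((dep, head))
        elif rel == "amod":   # "great battery": battery -amod-> great
            pairs.append((head, dep))
    return pairs
```

Other relations, such as the copula edge in "battery is great", are ignored here; a fuller rule set would depend on the parser's label inventory and on the language of the comments.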
In summary, in the online comment viewpoint extraction method based on dependency syntax analysis, all online comment viewpoints in a webpage are extracted through a network data extractor, and a data library of the online comment viewpoints is obtained after induction and collation. Sentences are extracted from the data library based on the required keywords as the sentences to be analyzed. The keyword target of each sentence to be analyzed in S2 is obtained by taking the first two words of the sentence as pre-keywords and judging whether both are central words; if so, a pre-established prediction model determines the specific dependency relationship between the two pre-keywords. The analyzed sentences from S3 are compared with the preset sentence relations in a sample database, and the sentences that match the keyword targets are determined and stored separately. The matching sentences obtained in S4 are fed back to the data library to retrieve the corresponding original online comment sentences, which are finally collected into an online comment viewpoint extraction database. In this way, the connections among the keywords in the online comment viewpoints can be obtained quickly, the targeted topics can be located quickly, and useful comment viewpoints can be extracted and integrated into the database without staff having to pick them out of massive network information, which greatly improves extraction efficiency and makes the method convenient to use. In addition, the induction and collation step preliminarily sorts the large volume of extracted comment information and directly removes the extraneous information that would affect subsequent extraction, so that subsequent analysis is faster.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. An online comment viewpoint extraction method based on dependency syntax analysis, characterized by comprising the following steps:
S1, designing a network data extractor, extracting all online comment viewpoints in a webpage through the network data extractor, and obtaining a data library of the online comment viewpoints after induction and collation;
S2, extracting sentences from the data library based on the required keywords as the sentences to be analyzed;
S3, obtaining the keyword target of each sentence to be analyzed in S2:
a. taking the first two words in the sentence to be analyzed as pre-keywords and judging whether both pre-keywords are central words; if so, determining the specific dependency relationship between the two pre-keywords by using a pre-established prediction model;
b. repeating step a until all the sentences to be analyzed in the data library have been analyzed;
S4, comparing the analyzed sentences obtained in S3 with the preset sentence relations in a sample database, determining the sentences that match the keyword targets, and storing them separately;
S5, feeding the sentences that match the keyword targets obtained in S4 back to the data library to retrieve the corresponding original online comment sentences, and finally collecting them to obtain an online comment viewpoint extraction database.
2. The online comment viewpoint extraction method based on dependency syntax analysis as claimed in claim 1, wherein: the induction and collation in S1 specifically comprises extracting all online comment viewpoints from the first page to the last page, separating the groups of comments with spacers, removing the webpage's own tags, and collating the content into a single data text.
3. The online comment viewpoint extraction method based on dependency syntax analysis as claimed in claim 1, wherein: the prediction model in step a is built as follows:
first, extracting the keywords from prepared sample sentences;
establishing a dependency syntax tree pointing to the keywords to obtain a backbone dependency tree;
decomposing the constructed backbone dependency tree into a complete set of backbone implementation targets;
extracting the backbone feature points with corresponding features for each specific operation behavior in the complete set of backbone implementation targets;
taking the specific behavior content of the current backbone implementation target as the target class corresponding to all current features;
and, for a large batch of sample sentences, obtaining the final prediction model by using a network code learning sequence algorithm.
4. The online comment viewpoint extraction method based on dependency syntax analysis as claimed in claim 1, wherein: in step S2, a keyword database is first built from the keywords on which staff want to obtain information, covering a comprehensive range; the online comment viewpoints obtained in S1 are then compared one by one against the contents of the keyword database and analyzed to obtain the required key sentences.
5. The online comment viewpoint extraction method based on dependency syntax analysis as claimed in claim 1, wherein: the data library in S1 is a long text.
6. The online comment viewpoint extraction method based on dependency syntax analysis as claimed in claim 1, wherein: in step a, the core product word of each central target word of the selected sentence to be analyzed is further extracted according to the dependency relationship between the central target words.
CN201911062773.1A 2019-11-03 2019-11-03 Dependency syntax analysis-based online comment viewpoint extraction method Pending CN110837729A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911062773.1A CN110837729A (en) 2019-11-03 2019-11-03 Dependency syntax analysis-based online comment viewpoint extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911062773.1A CN110837729A (en) 2019-11-03 2019-11-03 Dependency syntax analysis-based online comment viewpoint extraction method

Publications (1)

Publication Number Publication Date
CN110837729A (en) 2020-02-25

Family

ID=69576000

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911062773.1A Pending CN110837729A (en) 2019-11-03 2019-11-03 Dependency syntax analysis-based online comment viewpoint extraction method

Country Status (1)

Country Link
CN (1) CN110837729A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651211A (en) * 2020-12-11 2021-04-13 北京大米科技有限公司 Label information determination method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200225
