CN112905766A - Method for extracting core viewpoints from subjective answer text

Method for extracting core viewpoints from subjective answer text

Info

Publication number
CN112905766A
Authority
CN
China
Prior art keywords
viewpoints
subjective
model
text
answer text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110178549.XA
Other languages
Chinese (zh)
Inventor
封黎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha Ranxing Information Technology Co ltd
Original Assignee
Changsha Ranxing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-02-09
Filing date
2021-02-09
Publication date
2021-06-04
Application filed by Changsha Ranxing Information Technology Co ltd
Priority to CN202110178549.XA
Publication of CN112905766A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of online questionnaire text processing, and particularly relates to a method for extracting core viewpoints from subjective answer text, comprising the following steps: S1, inputting data to obtain the questionnaire title, the subjective questions and the answer text data; S2, classifying the industry; S3, extracting viewpoints from the text; S4, merging and counting the viewpoints. The invention simplifies the text information so that a user can quickly understand the viewpoints that respondents express in subjective questions.

Description

Method for extracting core viewpoints from subjective answer text
Technical Field
The invention belongs to the technical field of online questionnaire text processing, and particularly relates to a method for extracting a core viewpoint from a subjective answer text.
Background
In existing online questionnaires, subjective answer texts vary in length, contain complex viewpoints, and follow no obvious pattern, so a user cannot quickly and comprehensively grasp the information in them. The purpose of viewpoint extraction is to simplify the text information so that a user can quickly understand the viewpoints of respondents to subjective questions.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method for extracting a core viewpoint from a subjective answer text.
The technical solution of the invention is realized as follows:
A method for extracting core viewpoints from subjective answer text comprises the following steps:
S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
Further, in step S2, the industry is classified according to the questionnaire title and the subjective questions acquired in step S1. Because questionnaires cover many industries, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based and is applied to the questionnaire title and the subjective questions. The industries are currently divided into catering and hotels, medical health, university education, other education (including primary schools, kindergartens, student teaching, and courses), enterprise management, and other industries.
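A rule-based classifier of this kind can be sketched as follows. The patent names only the six categories and states that the rules operate on the questionnaire title and the subjective questions; the keyword lists, the first-match ordering, and the function name `classify_industry` below are illustrative assumptions, and a real system would match keywords in the original Chinese text.

```python
from typing import List, Tuple

# Hypothetical keyword rules. Only the category names come from the patent;
# every keyword below is an illustrative assumption.
INDUSTRY_RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("catering and hotels", ("restaurant", "hotel", "dining", "catering")),
    ("medical health", ("hospital", "clinic", "doctor", "patient")),
    ("university education", ("university", "college", "campus", "dormitory")),
    ("other education", ("primary school", "kindergarten", "course", "training")),
    ("enterprise management", ("employee", "company", "management", "staff")),
]
DEFAULT_INDUSTRY = "other industries"


def classify_industry(title: str, question: str) -> str:
    """Return the first industry whose keywords occur in the title or question."""
    text = f"{title} {question}".lower()
    for industry, keywords in INDUSTRY_RULES:
        if any(keyword in text for keyword in keywords):
            return industry
    # No rule fired: fall back to the catch-all category.
    return DEFAULT_INDUSTRY


if __name__ == "__main__":
    print(classify_industry("University dormitory satisfaction survey",
                            "What do you think of your dormitory?"))
```

Because the rules are tried in order, a questionnaire that matches several categories is assigned to the first one listed; this ordering is a design choice of the sketch, not something the patent specifies.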
Further, the specific steps of step S3 are as follows:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and having the model perform sequence labeling on the texts in batches. For example, for the text 'the dormitory air is poor, not ventilated', the labeling result is [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ].
The sequence labeling model is trained with BERT, and the label set is [ 'B-OPI', 'I-OPI', 'B-ASP', 'I-ASP', 'O' ]. In training, a sequence labeling model is first trained on one portion of the data and serves as the pre-trained model; this pre-trained model is then further trained on sequence labeling data from the different industries. The two sets of data do not overlap;
c. processing the sequence labeling result and integrating the viewpoints. For example, the labeling result [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ] is integrated into the viewpoints [ 'poor air', 'no ventilation' ].
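Step c can be illustrated with a small decoder for the label sequence. This is a sketch under several assumptions that the patent does not state: labels are assigned one per character of the original Chinese text, ASP denotes an aspect and OPI an opinion, an aspect span directly followed by an opinion span forms one viewpoint, and an opinion span with no adjacent aspect is a viewpoint on its own. The Chinese sentence used here is an assumed reconstruction chosen to be consistent with the ten labels; the patent gives only the English rendering.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (kind, start, end_exclusive)


def bio_spans(tags: List[str]) -> List[Span]:
    """Collect contiguous B-/I- spans; a stray I- tag is treated like 'O'."""
    spans: List[Span] = []
    kind, start = None, 0
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and kind == tag[2:]:
            continue  # the current span keeps growing
        if kind is not None:
            spans.append((kind, start, i))  # close the open span
            kind = None
        if tag.startswith("B-"):
            kind, start = tag[2:], i  # open a new span
    if kind is not None:
        spans.append((kind, start, len(tags)))
    return spans


def extract_viewpoints(text: str, tags: List[str]) -> List[str]:
    """Integrate aspect and opinion spans into viewpoint strings."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    spans = bio_spans(tags)
    viewpoints: List[str] = []
    i = 0
    while i < len(spans):
        kind, start, end = spans[i]
        nxt = spans[i + 1] if i + 1 < len(spans) else None
        if kind == "ASP" and nxt is not None and nxt[0] == "OPI" and nxt[1] == end:
            # Aspect directly followed by an opinion: join them.
            viewpoints.append(text[start:nxt[2]])
            i += 2
        else:
            if kind == "OPI":
                viewpoints.append(text[start:end])
            i += 1  # an aspect without an opinion is dropped
    return viewpoints


# Assumed reconstruction of the example sentence (10 characters, 10 labels).
SENTENCE = "宿舍空气很差，不通风"
TAGS = ["O", "O", "B-ASP", "I-ASP", "B-OPI", "I-OPI",
        "O", "B-OPI", "I-OPI", "I-OPI"]

if __name__ == "__main__":
    print(extract_viewpoints(SENTENCE, TAGS))  # ['空气很差', '不通风']
```

The two strings returned correspond to the viewpoints 'poor air' and 'no ventilation' given in the example.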
Further, in step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.
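The merging and counting in step S4 can be sketched as a greedy grouping. The patent does not name a similarity measure or a threshold, so the character-level ratio from Python's standard `difflib.SequenceMatcher`, the threshold of 0.6, and the choice to count occurrences per merged group are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def merge_and_count(viewpoints: List[str],
                    threshold: float = 0.6) -> List[Tuple[str, int]]:
    """Greedily merge similar viewpoints and count each merged group.

    Each viewpoint joins the first existing group whose representative
    (the first viewpoint seen in that group) is at least `threshold`
    similar to it; otherwise it starts a new group.
    """
    groups: List[List] = []  # each entry is [representative, count]
    for viewpoint in viewpoints:
        for group in groups:
            similarity = SequenceMatcher(None, group[0], viewpoint).ratio()
            if similarity >= threshold:
                group[1] += 1
                break
        else:
            groups.append([viewpoint, 1])
    # Most frequent viewpoints first; ties keep first-seen order.
    groups.sort(key=lambda group: group[1], reverse=True)
    return [(representative, count) for representative, count in groups]


if __name__ == "__main__":
    extracted = ["poor air", "no ventilation", "poor air!", "poor air"]
    for representative, count in merge_and_count(extracted):
        print(representative, count)
```

A semantic measure, such as cosine similarity between sentence embeddings, could replace the character-level ratio without changing the grouping loop.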
The scheme has the following effect:
It simplifies the text information so that a user can quickly understand the viewpoints of respondents to subjective questions.
Drawings
Fig. 1 is a flowchart of a method for extracting core viewpoints from subjective answer text according to Embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.
It should be noted that all directional indicators (such as upper, lower, left, right, front and rear) in the embodiments of the present invention are used only to explain the relative positions and movements of the components in a specific posture (as shown in the drawing); if the specific posture changes, the directional indicators change accordingly.
In addition, descriptions involving "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. Furthermore, the technical solutions of the various embodiments may be combined with each other, but only where a person skilled in the art can realize the combination; when a combination is contradictory or cannot be realized, it should be considered not to exist and falls outside the protection scope of the present invention.
Example 1
As shown in Fig. 1, a method for extracting core viewpoints from subjective answer text includes the following steps:
S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
In step S2, the industry is classified according to the questionnaire title and the subjective questions acquired in step S1. Because questionnaires cover many industries, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based and is applied to the questionnaire title and the subjective questions. The industries are currently divided into catering and hotels, medical health, university education, other education (including primary schools, kindergartens, student teaching and course training), enterprise management, and other industries.
The specific steps of step S3 are as follows:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and having the model perform sequence labeling on the texts in batches. For example, for the text 'the dormitory air is poor, not ventilated', the labeling result is [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ].
The sequence labeling model is trained with BERT, and the label set is [ 'B-OPI', 'I-OPI', 'B-ASP', 'I-ASP', 'O' ]. In training, a sequence labeling model is first trained on one portion of the data and serves as the pre-trained model; this pre-trained model is then further trained on sequence labeling data from the different industries. The two sets of data do not overlap;
c. processing the sequence labeling result and integrating the viewpoints. For example, the labeling result [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ] is integrated into the viewpoints [ 'poor air', 'no ventilation' ].
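The preprocessing in step a can be sketched as a simple filter. The patent does not define what counts as meaningless content, so the three heuristics below (no word characters at all, a known filler answer, or a single character repeated) and the `FILLER_ANSWERS` list are assumptions chosen for illustration.

```python
import re
from typing import Iterable, List

# Hypothetical filler answers that carry no viewpoint; not from the patent.
FILLER_ANSWERS = {"none", "n/a", "no", "nothing", "无", "没有"}


def is_valid_answer(text: str) -> bool:
    """Return False for empty answers and for answers judged meaningless."""
    stripped = text.strip()
    if not stripped:
        return False  # empty or whitespace only
    if not re.search(r"\w", stripped):
        return False  # only punctuation or symbols, such as "???"
    if stripped.lower() in FILLER_ANSWERS:
        return False  # known filler answer
    if len(stripped) > 1 and len(set(stripped)) == 1:
        return False  # one character repeated, such as "aaaa"
    return True


def preprocess(answers: Iterable[str]) -> List[str]:
    """Keep only valid answers, with surrounding whitespace removed."""
    return [answer.strip() for answer in answers if is_valid_answer(answer)]


if __name__ == "__main__":
    raw = ["", "   ", "???", "none", "aaaa", " The dormitory air is poor "]
    print(preprocess(raw))  # ['The dormitory air is poor']
```

In Python 3, `\w` matches Unicode word characters, so Chinese answers pass the second check without any extra handling.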
Meanwhile, in step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.
The scheme can simplify the text information, and enables a user to quickly know the viewpoint of the answerer in the subjective question.
Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims (4)

1. A method for extracting core viewpoints from subjective answer text, characterized by comprising the following steps:
S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
2. The method for extracting core viewpoints from subjective answer text according to claim 1, wherein in step S2 the industry is classified, according to the questionnaire title and the subjective questions obtained in step S1, into catering and hotels, medical health, university education, other education, enterprise management, and other industries.
3. The method for extracting core viewpoints from subjective answer text according to claim 1, wherein step S3 comprises the following steps:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and having the model perform sequence labeling on the texts in batches; in training, a sequence labeling model is first trained on one portion of the data and serves as the pre-trained model, and this pre-trained model is then further trained on sequence labeling data from the different industries, the two sets of data not overlapping;
c. processing the sequence labeling result and integrating the viewpoints.
4. The method for extracting core viewpoints from subjective answer text according to claim 1, wherein in step S4 similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110178549.XA | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110178549.XA | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text

Publications (1)

Publication Number | Publication Date
CN112905766A (en) | 2021-06-04

Family

ID=76123199

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202110178549.XA | Pending | CN112905766A (en) | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text

Country Status (1)

Country | Link
CN (1) | CN112905766A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104063497A (en) * | 2014-07-04 | 2014-09-24 | 百度在线网络技术(北京)有限公司 | Viewpoint processing method and device and searching method and device
CN104331394A (en) * | 2014-08-29 | 2015-02-04 | 南通大学 | Text classification method based on viewpoint
CN106777236A (en) * | 2016-12-27 | 2017-05-31 | 北京百度网讯科技有限公司 | The exhibiting method and device of the Query Result based on depth question and answer
CN108984521A (en) * | 2018-06-20 | 2018-12-11 | 国家计算机网络与信息安全管理中心 | Personage's viewpoint abstracting method in a kind of media event
CN109582948A (en) * | 2017-09-29 | 2019-04-05 | 北京国双科技有限公司 | The method and device that evaluated views extract
CN110163257A (en) * | 2019-04-23 | 2019-08-23 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and the computer storage medium of drawing-out structure information
CN111401061A (en) * | 2020-03-19 | 2020-07-10 | 昆明理工大学 | Method for identifying news opinion involved in case based on BERT and BiLSTM-Attention
JP2020177367A (en) * | 2019-04-16 | 2020-10-29 | ナレルシステム株式会社 | Computer system for edge-driven collaborative ai, and program and method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604