CN112905766A - Method for extracting core viewpoints from subjective answer text - Google Patents
- Publication number
- CN112905766A (application CN202110178549.XA)
- Authority
- CN
- China
- Prior art keywords
- viewpoints
- subjective
- model
- text
- answer text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the technical field of online questionnaire text processing, and particularly relates to a method for extracting core viewpoints from subjective answer text. The method comprises the following steps: S1, inputting data to obtain the questionnaire title, the subjective question and the answer text data; S2, classifying the industry; S3, extracting viewpoints from the text; S4, merging and counting the viewpoints. The invention simplifies the text information so that a user can quickly understand the answerers' viewpoints on a subjective question.
Description
Technical Field
The invention belongs to the technical field of online questionnaire text processing, and particularly relates to a method for extracting a core viewpoint from a subjective answer text.
Background
In existing online questionnaires, subjective answer texts vary in length, express complex viewpoints and follow no obvious pattern, so a user cannot quickly and comprehensively grasp the information they contain. The purpose of viewpoint extraction is to simplify this text information so that a user can quickly understand the answerers' viewpoints on a subjective question.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a method for extracting a core viewpoint from a subjective answer text.
The technical scheme of the invention is realized as follows:
A method for extracting core viewpoints from subjective answer text comprises the following steps:
S1: inputting data to obtain the questionnaire title, the subjective question and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
Further, in step S2, the industry is classified according to the questionnaire title and the subjective question obtained in step S1. Because questionnaires cover every industry, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based, applied to the questionnaire title and the subjective question, and the industries are currently divided into catering and hotels, medical health, university education, other education (including primary schools, kindergartens, student teaching and courses), enterprise management, and other industries.
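As an illustration only, the rule-based classification of step S2 could be sketched as a keyword lookup over the questionnaire title and the subjective question. The patent does not disclose its rules, so the keyword lists, the `INDUSTRY_KEYWORDS` table and the `classify_industry` function below are hypothetical; only the six industry names come from the text.

```python
from typing import Dict, Tuple

# Hypothetical keyword rules; the patent names the industries but not the rules.
INDUSTRY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "catering and hotels": ("restaurant", "hotel", "catering", "dining"),
    "medical health": ("hospital", "clinic", "patient", "medical"),
    "university education": ("university", "college", "undergraduate"),
    "other education": ("primary school", "kindergarten", "student", "course"),
    "enterprise management": ("employee", "company", "enterprise", "staff"),
}


def classify_industry(title: str, question: str) -> str:
    """Return the first industry whose keywords occur in the title or question."""
    text = f"{title} {question}".lower()
    # Dictionaries keep insertion order, so earlier rules take priority.
    for industry, keywords in INDUSTRY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return industry
    # Anything that matches no rule falls into the catch-all category.
    return "other industries"


print(classify_industry("University dormitory satisfaction survey",
                        "What should be improved in the dormitory?"))
# university education
```

Putting "university education" ahead of "other education" is a design choice of this sketch: a questionnaire that mentions both a university and students is resolved by rule order rather than by scoring.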
Further, the specific steps of step S3 are as follows:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and using it to perform sequence labeling on the texts in batches. For example, for the answer 'the dormitory air is poor, not ventilated', the sequence labeling result (one label per token of the original-language text) is ['O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI'].
The sequence labeling model is trained using BERT, and the label set is ['B-OPI', 'I-OPI', 'B-ASP', 'I-ASP', 'O']. Training proceeds in two stages: a sequence labeling model is first trained on one portion of the data to serve as a pre-trained model, and this pre-trained model is then further trained on labeled data from each industry. The two sets of data do not overlap;
c. processing the sequence labeling result and integrating the viewpoints. For example, from the sequence labeling result ['O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI'], the integrated viewpoints are ['poor air', 'no ventilation'].
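Step c can be sketched as decoding the label sequence into spans and then joining each opinion span to the aspect span that directly precedes it. This is a minimal sketch under assumptions: the patent gives only the input labels and the output viewpoints, so the joining rule and the function names `decode_spans` and `integrate_viewpoints` are hypothetical, and the Chinese sentence is a reconstruction of the translated example chosen to match its ten labels, not text taken from the patent.

```python
from typing import List, Tuple

Span = Tuple[str, str, int, int]  # (kind, text, start, end)


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Turn B-/I-/O labels into (kind, text, start, end) spans."""
    raw = []        # finished spans as [kind, start, end)
    current = None  # span currently being extended
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if current:
                raw.append(current)
            current = [tag[2:], i, i + 1]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[2] = i + 1
        else:  # 'O', or an 'I-' label that continues nothing
            if current:
                raw.append(current)
            current = None
    if current:
        raw.append(current)
    return [(kind, "".join(tokens[s:e]), s, e) for kind, s, e in raw]


def integrate_viewpoints(tokens: List[str], tags: List[str]) -> List[str]:
    """Assumed rule: an opinion absorbs an aspect that ends where it starts."""
    viewpoints = []
    previous = None
    for span in decode_spans(tokens, tags):
        kind, text, start, _ = span
        if kind == "OPI":
            if previous and previous[0] == "ASP" and previous[3] == start:
                viewpoints.append(previous[1] + text)
            else:
                viewpoints.append(text)
        previous = span
    return viewpoints


# Assumed reconstruction of the example answer, one label per character.
tokens = list("宿舍空气不好,不通风")
tags = ['O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O',
        'B-OPI', 'I-OPI', 'I-OPI']
print(integrate_viewpoints(tokens, tags))
# ['空气不好', '不通风']  (i.e. 'poor air', 'no ventilation')
```

An aspect with no adjacent opinion is dropped in this sketch; whether the patented method keeps such aspects is not stated in the text.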
Further, in step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.
The scheme has the following effect:
The scheme simplifies the text information and enables a user to quickly understand the answerers' viewpoints on a subjective question.
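The merging and counting of step S4 can be illustrated with a greedy grouping pass. The patent names neither a similarity measure nor a threshold, so this sketch assumes the character-based ratio from Python's standard `difflib.SequenceMatcher` and a cut-off of 0.6; `merge_and_count` and `SIMILARITY_THRESHOLD` are hypothetical names, and counting per merged group is one reading of "counting the viewpoints".

```python
from difflib import SequenceMatcher
from typing import List, Tuple

SIMILARITY_THRESHOLD = 0.6  # assumed cut-off, not given in the patent


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for the patent's measure)."""
    return SequenceMatcher(None, a, b).ratio()


def merge_and_count(viewpoints: List[str],
                    threshold: float = SIMILARITY_THRESHOLD
                    ) -> List[Tuple[str, int]]:
    """Greedily merge each viewpoint into the first sufficiently similar group."""
    groups: List[list] = []  # each entry is [representative, count]
    for view in viewpoints:
        for group in groups:
            if similarity(group[0], view) >= threshold:
                group[1] += 1
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([view, 1])
    # Most frequent viewpoints first; the sort is stable for ties.
    groups.sort(key=lambda group: group[1], reverse=True)
    return [(representative, count) for representative, count in groups]


print(merge_and_count(["poor air", "poor air quality",
                       "no ventilation", "poor air"]))
# [('poor air', 3), ('no ventilation', 1)]
```

Each new viewpoint is compared only with the first member of a group, which keeps the pass simple but makes the result depend on input order; an embedding-based similarity could be substituted without changing the grouping logic.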
Drawings
Fig. 1 is a flowchart of a method for extracting core viewpoints from subjective answer text according to Embodiment 1 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all the directional indicators (such as upper, lower, left, right, front and rear … …) in the embodiment of the present invention are only used to explain the relative position relationship between the components, the movement situation, etc. in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indicator is changed accordingly.
In addition, the descriptions related to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Example 1
As shown in Fig. 1, a method for extracting core viewpoints from subjective answer text includes the following steps:
S1: inputting data to obtain the questionnaire title, the subjective question and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
In step S2, the industry is classified according to the questionnaire title and the subjective question obtained in step S1. Because questionnaires cover every industry, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based, applied to the questionnaire title and the subjective question, and the industries are currently divided into catering and hotels, medical health, university education, other education (including primary schools, kindergartens, student teaching and course training), enterprise management, and other industries.
The specific steps of step S3 are as follows:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and using it to perform sequence labeling on the texts in batches. For example, for the answer 'the dormitory air is poor, not ventilated', the sequence labeling result (one label per token of the original-language text) is ['O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI'].
The sequence labeling model is trained using BERT, and the label set is ['B-OPI', 'I-OPI', 'B-ASP', 'I-ASP', 'O']. Training proceeds in two stages: a sequence labeling model is first trained on one portion of the data to serve as a pre-trained model, and this pre-trained model is then further trained on labeled data from each industry. The two sets of data do not overlap;
c. processing the sequence labeling result and integrating the viewpoints. For example, from the sequence labeling result ['O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI'], the integrated viewpoints are ['poor air', 'no ventilation'].
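Step a of the embodiment (deleting empty texts and texts with meaningless content) could look like the following filter. The patent does not define "meaningless", so the three heuristics used here (no letters or digits, a known filler answer, a single repeated character) and the names `FILLER_ANSWERS`, `is_valid_answer` and `preprocess` are assumptions made for illustration.

```python
from typing import Iterable, List, Optional

# Hypothetical filler answers that carry no viewpoint.
FILLER_ANSWERS = {"none", "n/a", "na", "no", "nothing", "无", "没有"}


def is_valid_answer(text: Optional[str]) -> bool:
    """Return False for empty answers and for answers judged meaningless."""
    cleaned = text.strip() if text else ""
    if not cleaned:
        return False  # empty or whitespace only
    if not any(ch.isalnum() for ch in cleaned):
        return False  # punctuation or symbols only, e.g. '???'
    if cleaned.lower() in FILLER_ANSWERS:
        return False  # stock non-answers
    if len(cleaned) > 1 and len(set(cleaned)) == 1:
        return False  # one character repeated, e.g. 'aaaa'
    return True


def preprocess(answers: Iterable[Optional[str]]) -> List[str]:
    """Keep only valid answers, with surrounding whitespace removed."""
    return [answer.strip() for answer in answers if is_valid_answer(answer)]


print(preprocess(["", "   ", "???", "none", "aaaa",
                  " The dormitory air is poor "]))
# ['The dormitory air is poor']
```

The surviving answers would then be passed in batches to the industry-specific labeling model of step b.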
In step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.
The scheme can simplify the text information, and enables a user to quickly know the viewpoint of the answerer in the subjective question.
Finally, the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.
Claims (4)
1. A method for extracting core viewpoints from subjective answer text, characterized by comprising the following steps:
S1: inputting data to obtain the questionnaire title, the subjective question and the answer text data;
S2: classifying the industry;
S3: extracting viewpoints from the text;
S4: merging and counting the viewpoints.
2. The method for extracting core viewpoints from subjective answer text as claimed in claim 1, wherein in step S2 the industry is classified, according to the questionnaire title and the subjective question obtained in step S1, into catering and hotels, medical health, university education, other education, enterprise management and other industries.
3. The method for extracting core viewpoints from subjective answer text as claimed in claim 1, wherein step S3 comprises the following steps:
a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;
b. selecting the model corresponding to the industry and using it to perform sequence labeling on the texts in batches; the model is trained by first training a sequence labeling model on one portion of the data to serve as a pre-trained model and then further training this pre-trained model on labeled data from each industry, the two sets of data not overlapping;
c. processing the sequence labeling result and integrating the viewpoints.
4. The method as claimed in claim 1, wherein in step S4 similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the viewpoints are counted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110178549.XA CN112905766A (en) | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110178549.XA CN112905766A (en) | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112905766A true CN112905766A (en) | 2021-06-04 |
Family
ID=76123199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110178549.XA Pending CN112905766A (en) | 2021-02-09 | 2021-02-09 | Method for extracting core viewpoints from subjective answer text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112905766A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063497A (en) * | 2014-07-04 | 2014-09-24 | 百度在线网络技术(北京)有限公司 | Viewpoint processing method and device and searching method and device |
CN104331394A (en) * | 2014-08-29 | 2015-02-04 | 南通大学 | Text classification method based on viewpoint |
CN106777236A (en) * | 2016-12-27 | 2017-05-31 | 北京百度网讯科技有限公司 | The exhibiting method and device of the Query Result based on depth question and answer |
CN108984521A (en) * | 2018-06-20 | 2018-12-11 | 国家计算机网络与信息安全管理中心 | Personage's viewpoint abstracting method in a kind of media event |
CN109582948A (en) * | 2017-09-29 | 2019-04-05 | 北京国双科技有限公司 | The method and device that evaluated views extract |
CN110163257A (en) * | 2019-04-23 | 2019-08-23 | 百度在线网络技术(北京)有限公司 | Method, apparatus, equipment and the computer storage medium of drawing-out structure information |
CN111401061A (en) * | 2020-03-19 | 2020-07-10 | 昆明理工大学 | Method for identifying news opinion involved in case based on BERT and BiLSTM-Attention |
JP2020177367A (en) * | 2019-04-16 | 2020-10-29 | ナレルシステム株式会社 | Computer system for edge-driven collaborative ai, and program and method therefor |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210604 |