CN112905766A - Method for extracting core viewpoints from subjective answer text

Info

Publication number: CN112905766A
Application number: CN202110178549.XA
Authority: CN
Inventors: 封黎
Original assignee: Changsha Ranxing Information Technology Co ltd
Current assignee: Changsha Ranxing Information Technology Co ltd
Priority date: 2021-02-09
Filing date: 2021-02-09
Publication date: 2021-06-04

Abstract

The invention belongs to the technical field of online questionnaire text processing, and in particular relates to a method for extracting core viewpoints from subjective answer text, comprising the following steps: S1, inputting data to obtain the questionnaire title, the subjective questions and the answer text data; S2, classifying the industry; S3, extracting viewpoints from the text; S4, merging and counting the viewpoints. The invention simplifies the text information so that a user can quickly understand the viewpoints of respondents to subjective questions.

Description

Method for extracting core viewpoints from subjective answer text

Technical Field

The invention belongs to the technical field of online questionnaire text processing, and particularly relates to a method for extracting a core viewpoint from a subjective answer text.

Background

In existing online questionnaires, subjective answer texts vary in length, express complex viewpoints and follow no obvious pattern, so a user cannot quickly and comprehensively grasp the information they contain. The purpose of viewpoint extraction is to simplify the text information so that a user can quickly understand the viewpoints of respondents to subjective questions.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a method for extracting a core viewpoint from a subjective answer text.

The technical scheme of the invention is realized as follows:

A method for extracting core viewpoints from subjective answer text comprises the following steps:

S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;

S2: classifying the industry;

S3: extracting viewpoints from the text;

S4: merging and counting the viewpoints.

Further, in step S2, industry classification is performed according to the questionnaire title and the subjective questions acquired in step S1. Because questionnaires cover many industries, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based, applied to the questionnaire title and the subjective questions. At present the industries are divided into catering and hotels, medical and health, university education, other education (including primary schools, kindergartens, student teaching, and course training), enterprise management, and other industries.
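The rule-based classification of step S2 can be illustrated with a minimal sketch. The patent does not disclose its actual rules, so the keyword lists, the category identifiers and the function name classify_industry below are hypothetical and chosen only for illustration; the sketch assumes simple keyword matching on the title and question.

```python
from typing import Dict, List

# Hypothetical keyword rules; the source does not list its real rules.
INDUSTRY_KEYWORDS: Dict[str, List[str]] = {
    "catering_hotel": ["restaurant", "hotel", "dining", "canteen"],
    "medical_health": ["hospital", "clinic", "doctor", "patient"],
    "university_education": ["university", "college", "campus", "dormitory"],
    "other_education": ["kindergarten", "primary school", "training course", "tutoring"],
    "enterprise_management": ["employee", "company", "management", "workplace"],
}
DEFAULT_INDUSTRY = "other"


def classify_industry(title: str, question: str) -> str:
    """Return the first industry whose keywords appear in the title or question."""
    text = f"{title} {question}".lower()
    for industry, keywords in INDUSTRY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return industry
    # No rule matched: fall back to the catch-all category.
    return DEFAULT_INDUSTRY
```

In this sketch the first matching rule wins, so the order of the dictionary entries acts as a priority order; a real rule set might instead score all categories and pick the best match.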

Further, the specific steps of the step S3 are as follows:

a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;

b. selecting the model corresponding to the industry, and having the model perform sequence labeling on the texts in batches. For example, for the text 'the dormitory air is poor, not ventilated', the labeling result is [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ].

The sequence labeling model is trained using BERT, and the label set is [ 'B-OPI', 'I-OPI', 'B-ASP', 'I-ASP', 'O' ]. In training, one portion of the data is first used to train a sequence labeling model that serves as the pre-trained model; starting from this pre-trained model, sequence labeling models are then trained for the different industries. The two groups of training data do not overlap;

c. processing the sequence labeling result and integrating the viewpoints. For example, from the labeling result [ 'O', 'O', 'B-ASP', 'I-ASP', 'B-OPI', 'I-OPI', 'O', 'B-OPI', 'I-OPI', 'I-OPI' ], the integrated viewpoints are [ 'poor air', 'no ventilation' ].
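Step c can be sketched as a small decoder over the labels. The sketch rests on several assumptions not stated in the source: that ASP and OPI presumably stand for aspect and opinion spans, that labels are assigned one per character of the original Chinese text, and that an opinion is attached to the aspect immediately preceding it. The Chinese sentence in the usage note is an assumed back-translation chosen to match the ten labels; the source gives only the English rendering. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Collect (type, text) spans from BIO labels, in order of appearance."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new span starts; close any span that is still open.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag closes any open span.
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_tokens)))
    return spans


def integrate_viewpoints(spans: List[Tuple[str, str]]) -> List[str]:
    """Attach each opinion to the aspect immediately preceding it, if any."""
    viewpoints: List[str] = []
    pending_aspect: Optional[str] = None
    for span_type, text in spans:
        if span_type == "ASP":
            pending_aspect = text
        elif span_type == "OPI":
            viewpoints.append((pending_aspect or "") + text)
            pending_aspect = None
    return viewpoints
```

With the assumed input list("宿舍空气不好，不通风") and the ten labels from the example, extract_spans yields one aspect span and two opinion spans, and integrate_viewpoints returns ['空气不好', '不通风'], which corresponds to the translated result [ 'poor air', 'no ventilation' ]. Tokens are joined without spaces, which suits character-level Chinese input but not word-level English input.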

Further, in step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the number of viewpoints is counted.
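The merging and counting of step S4 can be sketched as a greedy grouping. The source does not specify the similarity measure or the threshold, so this sketch substitutes the character-level ratio from Python's difflib and an arbitrary threshold of 0.6; both are assumptions, as is the function name merge_and_count.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, a, b).ratio()


def merge_and_count(viewpoints: List[str], threshold: float = 0.6) -> Dict[str, int]:
    """Greedily merge each viewpoint into the first similar representative and count."""
    counts: Dict[str, int] = {}
    for viewpoint in viewpoints:
        for representative in counts:
            if similarity(viewpoint, representative) >= threshold:
                counts[representative] += 1
                break
        else:
            # No sufficiently similar representative exists yet.
            counts[viewpoint] = 1
    return counts
```

The first viewpoint seen in each group serves as its representative, so the result depends on input order. A semantic measure such as embedding similarity could be swapped in for similarity without changing the grouping logic.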

The scheme has the following effects:

The scheme simplifies the text information and enables a user to quickly understand the viewpoints of respondents to subjective questions.

Drawings

Fig. 1 is a flowchart of the method for extracting core viewpoints from subjective answer text according to Example 1 of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that all directional indicators (such as upper, lower, left, right, front and rear) in the embodiments of the present invention are only used to explain the relative positional relationship, movement and the like of the components in a specific posture (as shown in the drawing), and if the specific posture changes, the directional indicators change accordingly.

In addition, the descriptions related to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

Example 1

As shown in Fig. 1, a method for extracting core viewpoints from subjective answer text includes the following steps:

S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;

S2: classifying the industry;

S3: extracting viewpoints from the text;

S4: merging and counting the viewpoints.

In step S2, industry classification is performed according to the questionnaire title and the subjective questions acquired in step S1. Because questionnaires cover many industries, dividing them by industry makes viewpoint extraction more accurate. The classification is rule-based, applied to the questionnaire title and the subjective questions. At present the industries are divided into catering and hotels, medical and health, university education, other education (including primary schools, kindergartens, student teaching and course training), enterprise management, and other industries.

In step S3: a. the text data are preprocessed, and invalid texts, including empty texts and texts with meaningless content, are deleted.
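The preprocessing of step a could look like the following minimal sketch. The source does not define what counts as meaningless content, so the heuristics here are assumptions: an answer is dropped if it is empty, contains no Latin letter or CJK character, or matches a hypothetical list of filler answers. The names FILLER_ANSWERS, is_valid_answer and preprocess are illustrative only.

```python
import re
from typing import List

# Hypothetical filler answers treated as meaningless; not taken from the source.
FILLER_ANSWERS = {"none", "n/a", "no", "nothing", "无", "没有"}

# Matches at least one Latin letter or common CJK character.
HAS_WORD_CHAR = re.compile(r"[A-Za-z\u4e00-\u9fff]")


def is_valid_answer(text: str) -> bool:
    """Return False for empty, symbol-only, digit-only or filler answers."""
    stripped = text.strip()
    if not stripped:
        return False
    if not HAS_WORD_CHAR.search(stripped):
        return False
    return stripped.lower() not in FILLER_ANSWERS


def preprocess(answers: List[str]) -> List[str]:
    """Keep only valid answers, with surrounding whitespace removed."""
    return [answer.strip() for answer in answers if is_valid_answer(answer)]
```

For instance, preprocess(["", "   ", "...", "123", "None", "The air is poor "]) keeps only "The air is poor".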

In step S4, the viewpoints are merged and counted: similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the number of viewpoints is counted.

Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions should be covered by the claims of the present invention.

Claims

1. A method for extracting core viewpoints from subjective answer text, characterized by comprising the following steps:

S1: inputting data to obtain the questionnaire title, the subjective questions and the answer text data;

S2: classifying the industry;

S3: extracting viewpoints from the text;

S4: merging and counting the viewpoints.

2. The method for extracting core viewpoints from subjective answer text as claimed in claim 1, wherein in step S2, the industries are classified into catering and hotels, medical and health, university education, other education, enterprise management and other industries according to the questionnaire title and the subjective questions obtained in step S1.

3. The method for extracting core viewpoints from subjective answer text as claimed in claim 1, wherein step S3 comprises the following steps:

a. preprocessing the text data and deleting invalid texts, including empty texts and texts with meaningless content;

b. selecting the model corresponding to the industry, and having the model perform sequence labeling on the texts in batches; in training, one portion of the data is first used to train a sequence labeling model that serves as the pre-trained model, and starting from this pre-trained model, sequence labeling models are then trained for the different industries, the two groups of training data not overlapping;

c. processing the sequence labeling result and integrating the viewpoints.

4. The method for extracting core viewpoints from subjective answer text as claimed in claim 1, wherein in step S4, similarity is calculated between the extracted viewpoints, viewpoints with high similarity are merged, and the number of viewpoints is counted.