CN111881694A - Chapter point detection method, device, equipment and storage medium - Google Patents

Chapter point detection method, device, equipment and storage medium

Info

Publication number
CN111881694A
CN111881694A
Authority
CN
China
Prior art keywords
sentence
answer
question stem
chapter
semantic representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010776952.8A
Other languages
Chinese (zh)
Inventor
李荣真
胡阳
付瑞吉
王士进
胡国平
文皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202010776952.8A priority Critical patent/CN111881694A/en
Publication of CN111881694A publication Critical patent/CN111881694A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/335 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/194 - Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a chapter key point detection method, apparatus, device and storage medium. When key point detection is performed on a chapter to be tested, both the question stem corresponding to the chapter and the answer library corresponding to that question stem are taken into account: the question stem specifies the key points the chapter should express, and the answer library provides rich auxiliary information. In addition to the information of the chapter to be tested itself, the method further considers the question stem information, screens out model essays from the answer library based on the question stem, and determines the text units belonging to the chapter key points by reference to the model essays, so that the key point detection result is more accurate.

Description

Chapter point detection method, device, equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a chapter key point detection method, apparatus, device, and storage medium.
Background
The key points of a chapter represent its main content and central idea. Chapter key point detection is a focus of current research in discourse analysis and has wide application, for example in question answering, chatbots, and Chinese and English composition evaluation.
With the development and maturation of artificial intelligence technology, the traditional education field has gradually started a new round of exploration into education informatization. At the current stage of basic education in China, the main form of assessing students' learning is still various types of examinations. Under these circumstances, teachers bear great pressure in grading, especially the grading of Chinese and English compositions, which occupies a large amount of their time and energy. Completing the evaluation with the aid of a computer can therefore reduce the workload of manual grading, improve the accuracy and objectivity of the evaluation, and is of great significance to the teaching process. Chapter key point detection can help a teacher evaluate a composition faster and more accurately, and can also be used for composition quality control; for example, if a composition contains no key point that answers the question, it is likely to be off-topic. Therefore, a scheme capable of accurately detecting chapter key points is urgently needed.
Disclosure of Invention
In view of the above problems, the present application provides a chapter key point detection method, apparatus, device and storage medium, so as to achieve the purpose of accurately detecting chapter key points. The specific scheme is as follows:
a chapter point detection method comprises the following steps:
acquiring a semantic representation of the question stem corresponding to a chapter to be tested and a semantic representation of each text unit in the chapter to be tested;
selecting a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem based on the semantic representation of the question stem;
and acquiring the semantic representation of the model essay, and determining the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
Preferably, the obtaining of the semantic representation of the stem corresponding to the discourse to be tested includes:
determining sentence characteristic vectors of each sentence in the question stem;
performing attention operation on sentence feature vectors among sentences in the question stem to obtain the attention score of each sentence;
and determining a context representation vector of the question stem as a semantic representation of the question stem based on the sentence feature vector of each sentence in the question stem and the attention score thereof.
Preferably, the determining a sentence feature vector of each sentence in the stem includes:
segmenting each sentence in the question stem to obtain a segmentation result of each sentence;
determining a word embedding vector of each participle in each sentence, and determining a sentence characteristic vector of the corresponding sentence based on the word embedding vector of each participle.
Preferably, the selecting, from an answer library corresponding to the question stem, a model essay meeting the answer key points specified by the question stem based on the semantic representation of the question stem includes:
for each answer chapter in the answer library, determining a sentence feature vector of each sentence in the answer chapter;
determining the association degree of each sentence and the answer key point specified by the question stem based on the sentence characteristic vector of each sentence in the answer chapters and the semantic representation of the question stem;
selecting target sentences of which the association degrees meet the set association degree conditions in the answer chapters, and forming a key point sentence set corresponding to the answer chapters by the target sentences;
and selecting, based on the size of the key point sentence set corresponding to each answer chapter in the answer library and a set model essay selection strategy, the key point sentence sets corresponding to target answer chapters to form the model essays.
Preferably, the determining the association degree of each sentence with the answer main point specified by the question stem based on the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem includes:
and performing attention operation on the semantic representation of the question stem and the sentence characteristic vector of each sentence in the answer chapters respectively to obtain the attention score of each sentence in the answer chapters relative to the question stem, wherein the attention score is used as the association degree of the sentence and the answer key point specified by the question stem.
Preferably, the process of determining the degree of association between each sentence and the answer point specified by the question stem further comprises:
performing attention operation on sentence characteristic vectors among sentences in the answer chapters to obtain attention scores in the chapters of each sentence;
and adding the attention scores in the sections of each sentence in the answer sections and the attention scores relative to the question stem to obtain the total attention score of each sentence, wherein the total attention score is used as the association degree of the sentence and the answer key points specified by the question stem.
Preferably, the obtaining of the semantic representation of the model essay includes:
determining a context representation vector of the model essay, as the semantic representation of the model essay, based on the sentence feature vector of each target sentence in the model essay and the degree of association of each target sentence with the answer key points specified by the question stem.
Preferably, there are a plurality of model essays;
the determining, based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essays, the text units belonging to the chapter key points in the chapter to be tested includes:
for each text unit, counting how many of the similarities between the text unit and the semantic representations of the model essays exceed a set similarity threshold;
and determining whether the text unit belongs to the chapter key points according to a set voting strategy, based on the number of similarities exceeding the set similarity threshold and the number of model essays.
Preferably, the method further comprises:
for a text unit determined to belong to the chapter key points, determining the confidence that the text unit belongs to the chapter key points based on the similarities between the text unit and the semantic representations of the model essays.
A chapter key point detection apparatus includes:
a semantic representation acquisition unit, configured to acquire a semantic representation of the question stem corresponding to a chapter to be tested, and a semantic representation of each text unit in the chapter to be tested;
a model essay selection unit, configured to select, based on the semantic representation of the question stem, a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem;
and a chapter key point determination unit, configured to acquire the semantic representation of the model essay, and determine the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
A chapter point detection device includes: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the chapter point detection method.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the chapter point detection method as described above.
By means of the above technical scheme, when key point detection is performed on a chapter to be tested, the present application considers both the question stem information corresponding to the chapter and the answer library corresponding to that question stem, where the question stem specifies the key points the chapter should express and the answer library can provide rich auxiliary information. On this basis, the semantic representation of the question stem and the semantic representation of each text unit in the chapter to be tested are acquired; a model essay meeting the answer key points specified by the question stem is then selected from the answer library based on the semantic representation of the question stem; the semantic representation of the model essay is acquired; and the text units belonging to the chapter key points are determined based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay. Thus, besides the information of the chapter to be tested itself, the question stem information is further considered, the model essays are screened out of the answer library based on the semantic representation of the question stem, and the text units belonging to the chapter key points are then determined by reference to the model essays, so that the key point detection result is more accurate.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic view illustrating a flow of chapter point detection provided in an embodiment of the present application;
FIG. 2a illustrates a stem diagram of an English composition;
FIG. 2b illustrates a schematic diagram of an English composition;
FIG. 3 is a flowchart of a chapter key point detection method according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a process for determining semantic representation of a stem according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a process of determining semantic representations of answers chapters according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a chapter point detection device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a chapter point detection device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides a chapter main point detection scheme, which can accurately detect text units, such as sentences or phrases, belonging to chapter main points in a chapter to be detected.
The scheme can be realized based on a terminal with data processing capacity, and the terminal can be a mobile phone, a computer, a server, a cloud terminal and the like.
First, the overall description of the chapter point detection scheme of the present application is provided with reference to the chapter point detection flow illustrated in fig. 1.
The chapter key point detection scheme of the present application is applicable to any scenario requiring chapter key point detection, for example detecting the key points of compositions written by students.
The chapter on which key point detection needs to be performed is defined as the chapter to be tested. It can be understood that the chapter to be tested is written in response to a corresponding question stem, and the question stem specifies the answer key point information. Meanwhile, an answer library corresponding to the question stem may also exist; the answer library contains several answer chapters other than the chapter to be tested, and these answer chapters are also written in response to the same question stem. Taking an examination scenario as an example, for a given question stem, different students complete different answer chapters, and these answer chapters can be put into the answer library.
As shown in fig. 2a and 2b, taking an English composition as an example, fig. 2a illustrates the question stem of the composition, in which the key point information of the composition is specified. Fig. 2b illustrates a corresponding English composition, which can serve as the chapter to be tested and can also serve as an answer chapter in the answer library.
In the chapter key point detection scheme of the present application, the semantic representation of the question stem corresponding to the chapter to be tested is acquired. Furthermore, based on the semantic representation of the question stem, model essays meeting the answer key points specified by the question stem can be selected from the answer library, and the semantic representation of each model essay is then acquired.
For the chapter to be tested, the semantic representation of each text unit in the chapter can be obtained, taking text units as the unit of analysis. A text unit may be a sentence, a word, or a phrase.
Because the model essay meets the answer key points specified by the question stem, similarity calculation can be carried out, with the model essay as a reference, between the semantic representation of the model essay and the semantic representation of each text unit in the chapter to be tested. Text units with a high similarity to the model essay can then be screened out based on the calculation result and regarded as belonging to the chapter key points, which gives the key point detection result of the chapter to be tested.
Therefore, when the chapter main point detection scheme is used for detecting the main points of the to-be-detected chapter, the question stem information corresponding to the to-be-detected chapter and the answer library corresponding to the question stem are considered, wherein the question stem specifies the main points to be expressed by the chapter, and the answer library can provide rich auxiliary information, so that the chapter main point detection result is more accurate.
Next, with reference to fig. 3, the chapter key point detection method of the present application may include the following steps:
s100, obtaining semantic representation of the stem corresponding to the to-be-tested discourse and semantic representation of each text unit in the to-be-tested discourse.
Specifically, the chapters to be tested can be completed based on the corresponding question stem, and the question stem specifies the information of the key points of the answer. In this step, the semantic representation of the stem and the semantic representation of each text unit in the chapter to be tested are obtained. The text unit may be a sentence, a word, or a phrase. The semantic representation can embody semantic representation characteristics of corresponding question stems or text units.
Further optionally, the semantic representation in this embodiment may also incorporate paragraph structure information. For example, the semantic representation of the question stem may include the semantic features of the question stem and the position information of each sentence in the question stem; correspondingly, the semantic representation of each text unit in the chapter to be tested may include the semantic features of the text unit and the position information of the text unit in the chapter to be tested.
There is a certain regularity in the distribution of chapter key points; for example, they generally appear at the beginning of each paragraph. Therefore, by merging paragraph structure information into the semantic representation, this embodiment makes the semantic representation of the question stem and of the text units in the chapter to be tested richer, which facilitates more accurate key point detection later.
S110, selecting, based on the semantic representation of the question stem, a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem.
It can be understood that there may be a corresponding answer library based on the question stem, and the answer library includes a plurality of answers chapters other than the chapters to be tested, and the answers chapters are also completed based on the question stem.
The answer chapters in the answer library may have multiple copies, some of which contain the answer key points specified by the question stem, and some of which may not contain or only contain part of the answer key points. Because the question stem specifies the information of the answering key points, the step can refer to the semantic representation of the question stem and select the model essay of the answering key points meeting the question stem specification from the answering database.
S120, acquiring the semantic representation of the model essay, and determining the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
Specifically, for the selected model essay, this step further acquires its semantic representation, and the similarity between the semantic representation of the model essay and the semantic representation of each text unit in the chapter to be tested can then be calculated. It can be understood that a text unit with a higher similarity is more similar to the model essay, that is, the probability that the text unit belongs to the chapter key points is higher. Therefore, in this step, the text units belonging to the chapter key points in the chapter to be tested can be determined based on the similarity between each text unit and the semantic representation of the model essay, which gives the key point detection result of the chapter to be tested.
In summary, when key point detection is performed on a chapter to be tested, the method considers both the question stem information corresponding to the chapter and the answer library corresponding to the question stem, where the question stem specifies the key points the chapter should express and the answer library provides rich auxiliary information. On this basis, the semantic representation of the question stem and of each text unit in the chapter to be tested are acquired, a model essay meeting the answer key points specified by the question stem is selected from the answer library based on the semantic representation of the question stem, the semantic representation of the model essay is acquired, and the text units belonging to the chapter key points are determined based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay. Thus, besides the information of the chapter to be tested, the question stem information is further considered, the model essays are screened out of the answer library based on the semantic representation of the question stem, and the text units belonging to the chapter key points are then determined by reference to the model essays, so that the key point detection result is more accurate.
Further, the process of obtaining the semantic representation of the stem corresponding to the chapter to be tested in the step S100 is introduced. An optional implementation is provided in this embodiment, as follows:
and S1, determining sentence feature vectors of each sentence in the question stem.
It is understood that the question stem may be composed of several sentences. In this step, a sentence feature vector is determined for each sentence constituting the question stem; the sentence feature vector characterizes the semantic features of the sentence.
Wherein, the process of determining the sentence feature vector may comprise:
s11, performing word segmentation on each sentence in the stem to obtain a word segmentation result of each sentence.
When segmenting the sentences in the question stem into words, an existing word segmentation tool can be used. For example, for a question stem in English, the CoreNLP tokenizer developed by Stanford University can be used to split the sentences of the question stem into words, while retaining the sentence and paragraph to which each word belongs. For a question stem in Chinese, a Chinese word segmentation tool can be used to split the sentences into words.
Optionally, data cleaning may be performed after word segmentation, including: filtering stop words; removing non-distinctive tokens such as the possessive 's; filtering unknown words; and so on. The final word segmentation result of each sentence is obtained after the word segmentation result is cleaned.
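For illustration only, the following is a minimal Python sketch of the word segmentation and data cleaning described above, assuming an English question stem. A simple regular-expression tokenizer stands in for the CoreNLP tool mentioned above, and the stop-word list and the example sentences are hypothetical placeholders, not part of the disclosed method.

import re

# Hypothetical stop-word list; in practice this would come from a standard resource.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are"}

def segment_and_clean(sentence: str) -> list[str]:
    # Simple regex tokenization as a stand-in for a full word segmentation tool.
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    cleaned = []
    for tok in tokens:
        if tok.endswith("'s"):
            tok = tok[:-2]            # remove the non-distinctive possessive 's
        tok = tok.strip("'")
        if not tok or tok in STOP_WORDS:
            continue                  # filter stop words and empty tokens
        if not tok.isalpha():
            continue                  # filter unknown or garbled tokens
        cleaned.append(tok)
    return cleaned

# Hypothetical question-stem sentences.
stem_sentences = [
    "Suppose you are Li Hua.",
    "Write about your school's sports meeting and your feelings.",
]
word_segments = [segment_and_clean(s) for s in stem_sentences]   # W0..Wn per sentence
print(word_segments)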
With reference to the flow of determining the semantic representation of the question stem illustrated in fig. 4, after word segmentation the question stem yields a word segmentation result for each sentence; the question stem illustrated in fig. 4 contains three sentences S0-S2, and the words of each sentence are denoted W0-Wn.
And S12, determining a word embedding vector of each participle in each sentence, and determining a sentence characteristic vector of the corresponding sentence based on the word embedding vector of each participle.
Specifically, a word embedding module of the model may be used to obtain the word embedding vector of each word in a sentence.
As shown in FIG. 4, for the words W0-Wn, the corresponding word embedding vectors are denoted H0-Hn, respectively.
After the word embedding vector of each word in a sentence is determined, the sentence feature vector of the sentence may be determined based on these word embedding vectors. For example, the word embedding vectors of the words in the sentence are fed into a BiLSTM network; the BiLSTM network takes the hidden-layer feature H_forward corresponding to the last word in the forward pass and the hidden-layer feature H_backward corresponding to the first word in the backward pass, and the feature vector obtained by concatenating H_forward and H_backward is the sentence feature vector of the sentence.
As shown in fig. 4, after the word embedding vectors of the words contained in each of the sentences S0-S2 are fed into the BiLSTM network, the sentence feature vectors SV0-SV2 corresponding to the sentences are obtained.
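For illustration only, the following is a minimal PyTorch sketch of the sentence encoding just described: a BiLSTM runs over the word embedding vectors, and the forward hidden state of the last word is concatenated with the backward hidden state of the first word to give the sentence feature vector. The dimensions and the random embeddings are illustrative assumptions.

import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    # Encode one sentence's word-embedding sequence into a single sentence feature vector.
    def __init__(self, embed_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, word_embeddings: torch.Tensor) -> torch.Tensor:
        # word_embeddings: (batch, num_words, embed_dim), i.e. H0..Hn for each sentence
        outputs, _ = self.bilstm(word_embeddings)            # (batch, num_words, 2*hidden)
        hidden = self.bilstm.hidden_size
        h_forward = outputs[:, -1, :hidden]                  # forward state at the last word
        h_backward = outputs[:, 0, hidden:]                  # backward state at the first word
        return torch.cat([h_forward, h_backward], dim=-1)    # sentence feature vector SV

# Example: three stem sentences S0-S2, each padded to 12 words, with 128-dim embeddings.
encoder = SentenceEncoder()
dummy_embeddings = torch.randn(3, 12, 128)
sentence_vectors = encoder(dummy_embeddings)                 # shape (3, 256): SV0-SV2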
S2, performing an attention operation among the sentence feature vectors of the sentences in the question stem to obtain an attention score for each sentence.
As shown in fig. 4, an attention operation (BiAttention) is performed among the sentence feature vectors of the sentences in the question stem, that is, the sentence feature vector of each sentence is attended against the sentence feature vectors of all the other sentences, so that an attention score is obtained for each sentence.
The attention score of each sentence can be understood as the importance degree of the sentence, and if all sentences in the question stem are closely related to the current sentence, the attention score of the current sentence is high.
By calculating the attention score of each sentence, the degree of closeness of association of other sentences in the question stem to the current sentence, namely the degree of importance of the current sentence in the whole question stem, can be focused.
As shown in fig. 4, SV0 performs attention operations with SV1 and SV2 to obtain a corresponding attention score C0. Similarly, SV1 gets the corresponding attention score C1, and SV2 gets the corresponding attention score C2.
S3, determining a context representation vector of the question stem, as the semantic representation of the question stem, based on the sentence feature vector of each sentence in the question stem and its attention score.
Specifically, the attention score of the sentence may be used as a weight, and the sentence feature vectors of each sentence in the question stem are weighted and added, and the result is used as the context expression vector of the question stem.
As shown in fig. 4, the context representation vector of the stem is equal to:
SV0*C0+SV1*C1+SV2*C2
in the context expression vector of the question stem determined based on the method of the embodiment, the importance degree of each sentence in the question stem is considered, and the sentence with higher importance degree has greater influence on the context expression vector of the question stem, whereas the sentence with lower importance degree has less influence on the context expression vector of the question stem.
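For illustration only, the following sketch computes per-sentence attention scores and the context representation vector of the question stem. The exact form of the attention operation is not fixed by the description above; summed dot products followed by softmax normalization is assumed here as one plausible realization.

import numpy as np

def stem_context_vector(sentence_vectors: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # sentence_vectors: (num_sentences, dim) array of SV0..SVk.
    sims = sentence_vectors @ sentence_vectors.T        # pairwise dot products
    np.fill_diagonal(sims, 0.0)                         # ignore self-similarity
    raw_scores = sims.sum(axis=1)                       # how strongly the others relate to each sentence
    scores = np.exp(raw_scores - raw_scores.max())
    scores = scores / scores.sum()                      # attention scores C0..Ck
    # Context vector of the stem: SV0*C0 + SV1*C1 + ...
    context = (scores[:, None] * sentence_vectors).sum(axis=0)
    return scores, context

# SV0-SV2 from the encoder above (random stand-ins here).
sv = np.random.randn(3, 256)
attention_scores, stem_vector = stem_context_vector(sv)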
Further, in step S100, the process of obtaining the semantic representation of each text unit in the chapter to be tested may follow the above step S1 for determining the sentence feature vector of each sentence in the question stem. That is, each text unit in the chapter to be tested may be segmented into words to determine word embedding vectors, and the feature vector of the text unit, used as its semantic representation, may be determined based on the word embedding vectors of the words in the text unit.
In another embodiment of the present application, the process in step S110 of selecting, from the answer library corresponding to the question stem, a model essay meeting the answer key points specified by the question stem based on the semantic representation of the question stem is described.
In an alternative embodiment, the steps may include:
and S1, determining sentence characteristic vectors of each sentence in the answer chapters for each answer chapter in the answer library.
The sentence feature vectors of the sentences in an answer chapter are determined in the same way as the sentence feature vectors of the sentences in the question stem described above, which is not repeated here.
To distinguish them from the sentence feature vectors SV of the sentences in the question stem, the sentence feature vectors of the sentences in an answer chapter are denoted CV in this embodiment. Taking an answer chapter containing 8 sentences as an example, the corresponding sentence feature vectors are denoted CV1-CV8.
S2, determining the association degree of each sentence and the answer main point stated by the question stem based on the sentence characteristic vector of each sentence in the answer chapters and the semantic representation of the question stem.
Specifically, the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem have been determined in the foregoing. The sentence feature vector represents information contained in the corresponding sentence, and the semantic representation of the question stem contains information of the answer point specified by the question stem, so that the association degree between each sentence and the answer point specified by the question stem can be determined based on the sentence feature vector of the sentence and the semantic representation of the question stem.
In an alternative embodiment, the semantic representation of the question stem may be performed attention operation with the sentence feature vector of each sentence in the answer chapters, to obtain an attention score of each sentence in the answer chapters relative to the question stem, as a degree of association between the sentence and the answer key point specified by the question stem.
Specifically, attention operation is respectively performed on the semantic representation of the question stem and the sentence feature vector of each sentence in the answer chapter, so that an attention score of each sentence relative to the question stem is obtained and is used for measuring the degree of closeness between each sentence and the question stem.
Further, since the question stem specifies information of the answer point, the attention score can be regarded as a degree of association between the sentence and the answer point specified by the question stem.
In another alternative embodiment, on the basis of the above attention score of each sentence with respect to the question stem, the relevance among the sentences within the chapter is further considered: an attention operation is performed among the sentence feature vectors of the sentences in the answer chapter to obtain an intra-chapter attention score for each sentence. The intra-chapter attention score reflects how closely the other sentences in the answer chapter relate to the current sentence, that is, the importance of the current sentence within the whole answer chapter.
On the basis, the intra-chapter attention scores of each sentence in the answer chapters are added with the attention scores relative to the question stem to obtain the total attention score of each sentence, and the total attention score is used as the association degree of the sentence and the answer key point specified by the question stem.
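For illustration only, the following sketch computes the degree of association of each sentence in an answer chapter with the answer key points specified by the question stem, as the sum of an intra-chapter attention score and an attention score relative to the question stem. Dot-product attention is again an assumption, since the description does not fix the exact attention form.

import numpy as np

def association_scores(answer_sentence_vectors: np.ndarray,
                       stem_vector: np.ndarray) -> np.ndarray:
    # answer_sentence_vectors: (num_sentences, dim) array CV0..CVn for one answer chapter.
    # stem_vector: context representation vector of the question stem.
    sims = answer_sentence_vectors @ answer_sentence_vectors.T
    np.fill_diagonal(sims, 0.0)
    intra_scores = sims.sum(axis=1)                       # intra-chapter attention scores A0..An
    stem_scores = answer_sentence_vectors @ stem_vector   # attention scores relative to the stem C0..Cn
    return intra_scores + stem_scores                     # total attention scores F0..Fn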
S3, selecting, in the answer chapter, the target sentences whose degrees of association meet a set association condition, the target sentences forming the key point sentence set corresponding to the answer chapter.
Specifically, taking the attention score of a sentence in the answer chapter (which may be its attention score with respect to the question stem, or its total attention score) as the degree of association between the sentence and the answer key points specified by the question stem, an attention score threshold may be preset; the target sentences in the answer chapter whose attention scores exceed this threshold form the key point sentence set corresponding to the answer chapter.
It can be understood that a corresponding key point sentence set can be determined for each answer chapter in the answer library.
S4, selecting, based on the size of the key point sentence set corresponding to each answer chapter in the answer library and a set model essay selection strategy, the key point sentence sets corresponding to target answer chapters to form the model essays.
It can be understood that the key point sentence sets corresponding to different answer chapters may differ in size, that is, in the number of target sentences they contain. A model essay selection strategy can be preset, for example selecting the key point sentence sets whose size exceeds a set value, or selecting the top-K key point sentence sets when ranked by size, and so on.
On this basis, each selected key point sentence set forms one model essay.
With the model essay determination method described above, a model essay is not the whole of some answer chapter in the answer library; rather, the answer chapter is denoised so that only its key point sentence set is retained, and the key point sentence sets satisfying the condition are used as the model essays. The model essays are therefore more accurate, and in turn the key point detection performed on the chapter to be tested by reference to the model essays is more accurate.
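For illustration only, the following sketch builds the key point sentence set of each answer chapter by thresholding the total attention scores and then applies a top-K-by-size selection strategy to obtain the model essays. The threshold, the value of K and the dot-product attention are assumptions of the sketch.

import numpy as np

def select_model_essays(answer_chapters: list[np.ndarray],
                        stem_vector: np.ndarray,
                        score_threshold: float,
                        top_k: int = 3) -> list[dict]:
    # answer_chapters: one (num_sentences, dim) array of sentence feature vectors per chapter.
    candidates = []
    for chapter_vectors in answer_chapters:
        # Total attention score: intra-chapter attention plus attention relative to the stem.
        sims = chapter_vectors @ chapter_vectors.T
        np.fill_diagonal(sims, 0.0)
        total_scores = sims.sum(axis=1) + chapter_vectors @ stem_vector
        keep = total_scores > score_threshold
        if keep.any():
            candidates.append({
                "target_sentence_vectors": chapter_vectors[keep],   # key point sentence set
                "association_degrees": total_scores[keep],
            })
    # Set model essay selection strategy: keep the top-K largest key point sentence sets.
    candidates.sort(key=lambda c: len(c["association_degrees"]), reverse=True)
    return candidates[:top_k]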
Further, the process of obtaining the semantic representation of the model essay in step S120 is introduced.
As introduced in the foregoing embodiments, a model essay determined by the present application consists of a key point sentence set that contains several target sentences, and a sentence feature vector has been determined for each target sentence. Furthermore, the degree of association between each target sentence and the answer key points specified by the question stem, such as its attention score (which may be the attention score of the target sentence with respect to the question stem, or its total attention score), has also been determined.
On this basis, in this embodiment, a context representation vector of the model essay can be determined, as the semantic representation of the model essay, from the sentence feature vector of each target sentence in the model essay and the degree of association between each target sentence and the answer key points specified by the question stem.
Specifically, the degree of association between each target sentence and the answer key points specified by the question stem may be used as a weight, the sentence feature vectors of the target sentences in the model essay may be weighted and summed, and the result may be used as the context representation vector of the model essay.
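For illustration only, a corresponding sketch of the model essay's context representation vector, computed as the association-weighted sum of its target-sentence feature vectors (as in the worked example that follows); normalizing the weights is an extra assumption made here for numerical stability.

import numpy as np

def model_essay_representation(target_sentence_vectors: np.ndarray,
                               association_degrees: np.ndarray) -> np.ndarray:
    # Weighted sum of target-sentence vectors, with association degrees as weights.
    weights = association_degrees / (association_degrees.sum() + 1e-8)
    return (weights[:, None] * target_sentence_vectors).sum(axis=0)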
Next, a description will be given by way of a specific example.
The composition illustrated in fig. 2b is taken as an answer chapter; it contains 9 sentences: S0-S8.
Referring to fig. 5, a flow diagram illustrating a process for determining semantic representations of an answer chapter is shown.
In the manner illustrated in FIG. 5, word segmentation is performed on S0-S8 respectively, and the segmentation results are denoted W0-Wn. A word embedding vector H0-Hn is determined for each word.
The word embedding vectors of the words of each sentence in S0-S8 are passed through the BiLSTM network to obtain the sentence feature vectors CV0-CV8 of the sentences.
BiAttention is performed among the sentence feature vectors of the sentences in the answer chapter, that is, each sentence is attended against the sentence feature vectors of all the other sentences, giving the intra-chapter attention scores A0-A8 of the sentences S0-S8.
Meanwhile, using the context representation vector of the question stem (the stem context vector) determined in the above embodiment, attention operations are performed between the stem context vector and CV0-CV8 respectively, giving the attention scores C0-C8 of each sentence relative to the question stem.
At this point, the intra-chapter attention scores A0-A8 and the attention scores C0-C8 relative to the question stem have been obtained for each sentence in S0-S8.
The intra-chapter attention score of each sentence in S0-S8 is added to its attention score relative to the question stem to obtain the total attention scores F0-F8.
In this embodiment, an attention score threshold may be preset, and the total attention score of each sentence is compared with the attention score threshold, so as to determine whether each sentence is a main point sentence.
For convenience of calculation, the obtained total attention score F0-F8 of each sentence may be normalized in the present embodiment. Assuming the results after normalization are shown in table 1 below:
[Table 1: normalized total attention scores F0-F8 of the sentences S0-S8]
As shown in Table 1, the total attention scores of the four sentences S1, S3, S5 and S7 exceed the threshold, so the answer chapter contains four key point sentences. These four sentences form a model essay.
Further, to determine the semantic representation of the model essay, the total attention score of each of the four key point sentences may be used as a weight, and the sentence feature vectors CV of the four key point sentences may be weighted and summed; the result is the semantic representation (chapter context vector) of the model essay:
CV1*F1+CV3*F3+CV5*F5+CV7*F7.
In another embodiment of the present application, the process in step S120 of determining the text units belonging to the chapter key points in the chapter to be tested, based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay, is described.
After the semantic representation of each text unit in the chapter to be tested and the semantic representation of the model essay have been obtained as in the above embodiments, the similarity between each text unit and the semantic representation of the model essay can be calculated.
It should be noted that there may be several model essays; in that case the similarity between each text unit and each model essay is calculated.
Further, whether each text unit belongs to the point of the chapter is determined based on the similarity calculation result.
One alternative way is, for each text unit, to judge whether any of its similarities to the semantic representations of the model essays exceeds a set similarity threshold; if so, the text unit is considered to belong to the chapter key points.
In consideration of the problem of errors, in order to make the detection of the points of the chapters more accurate, the detection of the points of the chapters may be performed in another manner in this embodiment, that is, a voting method is adopted, which is specifically as follows:
and judging the number of similarity between each text unit and the semantic representation of each model sentence, wherein the number of the similarity exceeds a set similarity threshold.
And determining whether the text unit belongs to the main points of sections according to a set voting strategy based on the number exceeding the set similarity threshold and the number of the model texts.
If the voting strategy is set, the number of the voting strategy exceeds the set similarity threshold value accounts for more than half of the number of the model texts, the voting is considered to be successful, namely the corresponding text unit is considered to belong to the main point of the chapter, otherwise, the voting is considered to be failed, and the corresponding text unit does not belong to the main point of the chapter.
Further optionally, a text unit determined to belong to the chapter key points may be highlighted in the chapter to be tested so as to be displayed intuitively to the user.
Still further optionally, for a text unit determined to belong to the chapter key points, this embodiment may also determine the confidence that the text unit belongs to the chapter key points, based on the similarities between the text unit and the semantic representations of the model essays.
For example, the average of the similarities between the text unit and the semantic representations of the model essays may be taken as the confidence that the text unit belongs to the chapter key points.
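For illustration only, the following sketch implements the similarity calculation, the majority-vote decision and the confidence estimate described above. Cosine similarity and the specific threshold value are assumptions; the description only requires some similarity measure, a set threshold and a set voting strategy.

import numpy as np

def detect_key_point_units(unit_vectors: np.ndarray,
                           model_essay_vectors: np.ndarray,
                           sim_threshold: float = 0.8) -> list[dict]:
    # unit_vectors: (num_units, dim) semantic representations of the text units.
    # model_essay_vectors: (num_essays, dim) semantic representations of the model essays.
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

    sims = normalize(unit_vectors) @ normalize(model_essay_vectors).T   # cosine similarities
    results = []
    for i, unit_sims in enumerate(sims):
        votes = int((unit_sims > sim_threshold).sum())
        # Set voting strategy: more than half of the model essays must agree.
        is_key_point = votes > len(model_essay_vectors) / 2
        results.append({
            "unit_index": i,
            "is_key_point": is_key_point,
            # Confidence: mean similarity to the model essays, for units judged as key points.
            "confidence": float(unit_sims.mean()) if is_key_point else None,
        })
    return results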
The following describes the chapter point detection device provided in the embodiment of the present application, and the chapter point detection device described below and the chapter point detection method described above may be referred to in correspondence with each other.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a chapter point detection device disclosed in the embodiment of the present application.
As shown in fig. 6, the apparatus may include:
the semantic representation acquisition unit 11, configured to acquire a semantic representation of the question stem corresponding to a chapter to be tested, and a semantic representation of each text unit in the chapter to be tested;
the model essay selection unit 12, configured to select, based on the semantic representation of the question stem, a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem;
and the chapter key point determination unit 13, configured to acquire the semantic representation of the model essay, and determine the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
Optionally, the process of acquiring the semantic representation of the stem corresponding to the discourse to be detected by the semantic representation acquiring unit may include:
determining sentence characteristic vectors of each sentence in the question stem;
performing attention operation on sentence feature vectors among sentences in the question stem to obtain the attention score of each sentence;
and determining a context representation vector of the question stem as a semantic representation of the question stem based on the sentence feature vector of each sentence in the question stem and the attention score thereof.
Optionally, the process of determining the sentence feature vector of each sentence in the stem by the semantic representation obtaining unit may include:
segmenting each sentence in the question stem to obtain a segmentation result of each sentence;
determining a word embedding vector of each participle in each sentence, and determining a sentence characteristic vector of the corresponding sentence based on the word embedding vector of each participle.
Optionally, the process by which the model essay selection unit selects, from the answer library corresponding to the question stem, a model essay meeting the answer key points specified by the question stem based on the semantic representation of the question stem may include:
for each answer chapter in the answer library, determining a sentence feature vector of each sentence in the answer chapter;
determining the association degree of each sentence and the answer key point specified by the question stem based on the sentence characteristic vector of each sentence in the answer chapters and the semantic representation of the question stem;
selecting target sentences of which the association degrees meet the set association degree conditions in the answer chapters, and forming a key point sentence set corresponding to the answer chapters by the target sentences;
and selecting, based on the size of the key point sentence set corresponding to each answer chapter in the answer library and the set model essay selection strategy, the key point sentence sets corresponding to target answer chapters to form the model essays.
Optionally, the process by which the model essay selection unit determines the degree of association of each sentence with the answer key points specified by the question stem, based on the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem, may include:
and performing attention operation on the semantic representation of the question stem and the sentence characteristic vector of each sentence in the answer chapters respectively to obtain the attention score of each sentence in the answer chapters relative to the question stem, wherein the attention score is used as the association degree of the sentence and the answer key point specified by the question stem.
Optionally, the process by which the model essay selection unit determines the degree of association of each sentence with the answer key points specified by the question stem, based on the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem, may further include:
performing attention operation on sentence characteristic vectors among sentences in the answer chapters to obtain attention scores in the chapters of each sentence;
and adding the attention scores in the sections of each sentence in the answer sections and the attention scores relative to the question stem to obtain the total attention score of each sentence, wherein the total attention score is used as the association degree of the sentence and the answer key points specified by the question stem.
Optionally, the process by which the chapter key point determination unit acquires the semantic representation of the model essay may include:
determining a context representation vector of the model essay, as the semantic representation of the model essay, based on the sentence feature vector of each target sentence in the model essay and the degree of association of each target sentence with the answer key points specified by the question stem.
Optionally, there may be a plurality of model essays; the process by which the chapter key point determination unit determines, based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essays, the text units belonging to the chapter key points in the chapter to be tested may include:
for each text unit, counting how many of the similarities between the text unit and the semantic representations of the model essays exceed a set similarity threshold;
and determining whether the text unit belongs to the chapter key points according to a set voting strategy, based on the number of similarities exceeding the set similarity threshold and the number of model essays.
Further optionally, the process by which the chapter key point determination unit determines the text units belonging to the chapter key points may further include:
for a text unit determined to belong to the chapter key points, determining the confidence that the text unit belongs to the chapter key points based on the similarities between the text unit and the semantic representations of the model essays.
The chapter key point detection apparatus provided by the embodiment of the present application can be applied to chapter key point detection equipment, such as a terminal: a mobile phone, a computer, and so on. Optionally, fig. 7 shows a block diagram of the hardware structure of the chapter key point detection equipment; referring to fig. 7, the hardware structure may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
the memory 3 may include a high-speed RAM memory, and may also include a non-volatile memory, for example at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
acquiring a semantic representation of the question stem corresponding to a chapter to be tested and a semantic representation of each text unit in the chapter to be tested;
selecting a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem based on the semantic representation of the question stem;
and acquiring the semantic representation of the model essay, and determining the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
Alternatively, the detailed function and the extended function of the program may be as described above.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
acquiring a semantic representation of the question stem corresponding to a chapter to be tested and a semantic representation of each text unit in the chapter to be tested;
selecting a model essay meeting the answer key points specified by the question stem from an answer library corresponding to the question stem based on the semantic representation of the question stem;
and acquiring the semantic representation of the model essay, and determining the text units belonging to the chapter key points in the chapter to be tested based on the similarity between each text unit in the chapter to be tested and the semantic representation of the model essay.
Alternatively, the detailed function and the extended function of the program may be as described above.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, the embodiments may be combined as needed, and the same and similar parts may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A chapter point detection method, characterized by comprising:
acquiring a semantic representation of a question stem corresponding to a chapter to be detected and a semantic representation of each text unit in the chapter to be detected;
selecting, from an answer library corresponding to the question stem and based on the semantic representation of the question stem, a model essay that satisfies the answer key points specified by the question stem;
and acquiring a semantic representation of the model essay, and determining, based on the similarity between each text unit in the chapter to be detected and the semantic representation of the model essay, the text units in the chapter to be detected that belong to the chapter key points.
2. The method according to claim 1, wherein acquiring the semantic representation of the question stem corresponding to the chapter to be detected comprises:
determining a sentence feature vector of each sentence in the question stem;
performing an attention operation among the sentence feature vectors of the sentences in the question stem to obtain an attention score of each sentence;
and determining, based on the sentence feature vector of each sentence in the question stem and its attention score, a context representation vector of the question stem as the semantic representation of the question stem.
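As a rough illustration of claim 2, the sketch below scores the stem sentences against one another with a dot-product attention operation and pools them into a single context representation vector; the dot-product scoring and the softmax normalisation are assumptions, not the claimed implementation.

```python
import numpy as np

def stem_context_vector(sentence_vectors):
    """Attention-weighted pooling of stem sentence feature vectors (illustrative only)."""
    S = np.stack(sentence_vectors)            # shape: (num_sentences, dim)
    scores = (S @ S.T).sum(axis=1)            # attention operation among sentences -> one score per sentence
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax-normalised attention scores
    return weights @ S                        # context representation vector of the question stem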
3. The method according to claim 2, wherein determining the sentence feature vector of each sentence in the question stem comprises:
performing word segmentation on each sentence in the question stem to obtain a word segmentation result of each sentence;
and determining a word embedding vector of each segmented word in each sentence, and determining the sentence feature vector of the corresponding sentence based on the word embedding vectors of the segmented words.
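One possible reading of claim 3, sketched with an assumed toy embedding table, whitespace splitting in place of a real word segmenter, and mean pooling; all of these choices are stand-ins for illustration only.

```python
import numpy as np

def sentence_feature_vector(sentence, embeddings, dim=4):
    """Mean-pool the word embedding vectors of the segmented words (illustrative)."""
    words = sentence.split()                  # stand-in for a real Chinese word segmenter
    vecs = [embeddings.get(w, np.zeros(dim)) for w in words]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# toy embedding table with arbitrary values
emb = {"low": np.array([0.1, 0.2, 0.0, 0.3]), "carbon": np.array([0.4, 0.1, 0.2, 0.0])}
print(sentence_feature_vector("low carbon", emb))
```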
4. The method according to claim 1, wherein selecting, from the answer library corresponding to the question stem and based on the semantic representation of the question stem, the model essay that satisfies the answer key points specified by the question stem comprises:
for each answer chapter in the answer library, determining a sentence feature vector of each sentence in the answer chapter;
determining, based on the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem, an association degree between each sentence and the answer key points specified by the question stem;
selecting target sentences in the answer chapter whose association degrees satisfy a set association degree condition, the target sentences forming a key point sentence set corresponding to the answer chapter;
and selecting, based on the size of the key point sentence set corresponding to each answer chapter in the answer library and a set model essay selection strategy, the key point sentence set corresponding to a target answer chapter to form the model essay.
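Purely as a sketch of the selection flow in claim 4: sentences whose association degree with the stem exceeds a threshold form each answer chapter's key point sentence set, and the largest set is kept as the model essay. The dot-product association measure, the threshold, the "largest set" strategy, and all names are assumptions.

```python
import numpy as np

def select_model_essay(stem_vec, answer_chapters, sent_encoder, assoc_threshold=0.3):
    """answer_chapters: list of answer chapters, each a list of sentences."""
    key_point_sets = []
    for chapter in answer_chapters:
        # association degree of each sentence with the answer key points specified by the stem
        assoc = [float(np.dot(sent_encoder(s), stem_vec)) for s in chapter]
        # target sentences whose association degree satisfies the set condition
        targets = [s for s, a in zip(chapter, assoc) if a >= assoc_threshold]
        key_point_sets.append(targets)
    # one possible selection strategy: keep the largest key point sentence set as the model essay
    return max(key_point_sets, key=len)
```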
5. The method according to claim 4, wherein determining, based on the sentence feature vector of each sentence in the answer chapter and the semantic representation of the question stem, the association degree between each sentence and the answer key points specified by the question stem comprises:
performing an attention operation between the semantic representation of the question stem and the sentence feature vector of each sentence in the answer chapter to obtain an attention score of each sentence in the answer chapter relative to the question stem, the attention score serving as the association degree between the sentence and the answer key points specified by the question stem.
6. The method according to claim 5, wherein the process of determining the association degree between each sentence and the answer key points specified by the question stem further comprises:
performing an attention operation among the sentence feature vectors of the sentences in the answer chapter to obtain an intra-chapter attention score of each sentence;
and adding the intra-chapter attention score of each sentence in the answer chapter and its attention score relative to the question stem to obtain a total attention score of each sentence, the total attention score serving as the association degree between the sentence and the answer key points specified by the question stem.
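The combination described in claims 5 and 6 can be pictured as follows: each answer sentence receives an attention score relative to the stem plus an intra-chapter attention score, and their sum serves as the association degree. The dot-product scoring below is only an assumed stand-in for the attention operation.

```python
import numpy as np

def association_degrees(stem_vec, sentence_vecs):
    S = np.stack(sentence_vecs)               # shape: (num_sentences, dim)
    stem_scores = S @ stem_vec                # attention score of each sentence relative to the stem
    intra_scores = (S @ S.T).sum(axis=1)      # intra-chapter attention score of each sentence
    return stem_scores + intra_scores         # total attention score = association degree
```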
7. The method according to claim 4, wherein acquiring the semantic representation of the model essay comprises:
determining, based on the sentence feature vector of each target sentence in the model essay and its association degree with the answer key points specified by the question stem, a context representation vector of the model essay as the semantic representation of the model essay.
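An illustrative counterpart of claim 7: the model essay representation is formed as an association-degree-weighted combination of the target sentence vectors. The softmax weighting is an assumption about one way the association degrees could be turned into weights.

```python
import numpy as np

def model_essay_representation(target_sentence_vecs, assoc_degrees):
    S = np.stack(target_sentence_vecs)
    w = np.exp(np.asarray(assoc_degrees) - np.max(assoc_degrees))
    w /= w.sum()                              # normalise association degrees into weights
    return w @ S                              # context representation vector of the model essay
```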
8. The method according to claim 1, wherein there are a plurality of model essays;
the determining, based on the similarity between each text unit in the chapter to be detected and the semantic representation of the model essay, the text units belonging to the chapter key points comprises:
for each text unit, counting, among the similarities between the text unit and the semantic representation of each model essay, the number of similarities exceeding a set similarity threshold;
and determining, based on the number exceeding the set similarity threshold and the number of model essays, whether the text unit belongs to the chapter key points according to a set voting strategy.
9. The method according to claim 8, further comprising:
for a text unit determined to belong to the chapter key points, determining, based on the similarities between the text unit and the semantic representations of the model essays, a confidence that the text unit belongs to the chapter key points.
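Claims 8 and 9 can be illustrated with a simple majority vote over several model essays and a mean-similarity confidence; both the majority rule and the averaging are assumptions about one possible voting strategy and confidence measure, not the claimed one.

```python
import numpy as np

def vote_key_point(unit_vec, essay_vecs, sim_threshold=0.5):
    """Return (is_key_point, confidence) for one text unit against several model essays."""
    sims = [float(np.dot(unit_vec, e)) for e in essay_vecs]
    hits = sum(s >= sim_threshold for s in sims)
    is_key_point = hits * 2 > len(essay_vecs)     # assumed voting strategy: simple majority
    confidence = float(np.mean(sims)) if is_key_point else 0.0
    return is_key_point, confidence
```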
10. A chapter point detection device, characterized by comprising:
a semantic representation acquisition unit, configured to acquire a semantic representation of a question stem corresponding to a chapter to be detected and a semantic representation of each text unit in the chapter to be detected;
a model essay selection unit, configured to select, from an answer library corresponding to the question stem and based on the semantic representation of the question stem, a model essay that satisfies the answer key points specified by the question stem;
and a chapter key point determination unit, configured to acquire a semantic representation of the model essay and determine, based on the similarity between each text unit in the chapter to be detected and the semantic representation of the model essay, the text units in the chapter to be detected that belong to the chapter key points.
11. Chapter point detection equipment, characterized by comprising: a memory and a processor;
the memory is configured to store a program;
the processor is configured to execute the program to implement the steps of the chapter point detection method according to any one of claims 1 to 9.
12. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the chapter point detection method according to any one of claims 1 to 9.
CN202010776952.8A 2020-08-05 2020-08-05 Chapter point detection method, device, equipment and storage medium Pending CN111881694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010776952.8A CN111881694A (en) 2020-08-05 2020-08-05 Chapter point detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010776952.8A CN111881694A (en) 2020-08-05 2020-08-05 Chapter point detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111881694A 2020-11-03

Family

ID=73210542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010776952.8A Pending CN111881694A (en) 2020-08-05 2020-08-05 Chapter point detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111881694A (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831558A (en) * 2012-07-20 2012-12-19 桂林电子科技大学 System and method for automatically scoring college English compositions independent of manual pre-scoring
CN106897384A (en) * 2017-01-23 2017-06-27 科大讯飞股份有限公司 One kind will bring out the theme automatic evaluation method and device
CN107329995A (en) * 2017-06-08 2017-11-07 北京神州泰岳软件股份有限公司 A kind of controlled answer generation method of semanteme, apparatus and system
US20200134263A1 (en) * 2017-07-13 2020-04-30 National Institute Of Information And Communications Technology Non-factoid question-answering device
WO2019041517A1 (en) * 2017-08-29 2019-03-07 平安科技(深圳)有限公司 Electronic device, question recognition and confirmation method, and computer-readable storage medium
CN108959246A (en) * 2018-06-12 2018-12-07 北京慧闻科技发展有限公司 Answer selection method, device and electronic equipment based on improved attention mechanism
CN109446505A (en) * 2018-10-31 2019-03-08 广东小天才科技有限公司 A kind of model essay generation method and system
CN109670168A (en) * 2018-11-14 2019-04-23 华南师范大学 Short answer automatic scoring method, system and storage medium based on feature learning
CN109271642A (en) * 2018-11-26 2019-01-25 科大讯飞股份有限公司 Text wants point detecting method, device, equipment, storage medium and appraisal procedure
CN110196893A (en) * 2019-05-05 2019-09-03 平安科技(深圳)有限公司 Non- subjective item method to go over files, device and storage medium based on text similarity
CN110287291A (en) * 2019-07-03 2019-09-27 桂林电子科技大学 A kind of unsupervised English short essay sentence is digressed from the subject analysis method
CN110413741A (en) * 2019-08-07 2019-11-05 山东山大鸥玛软件股份有限公司 A kind of intelligently reading method towards subjective item

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI WU et al.: "Automated Chinese-English Translation Scoring Based on Answer Knowledge Base", IEEE *
LIU Wei et al.: "Research on Automatic Assessment of Subjective Questions", Journal of Beijing University of Posts and Telecommunications (Social Sciences Edition), vol. 18, no. 4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560500A (en) * 2020-12-17 2021-03-26 中科讯飞互联(北京)信息科技有限公司 Text processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN110442718B (en) Statement processing method and device, server and storage medium
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN108073568B (en) Keyword extraction method and device
CN110019843B (en) Knowledge graph processing method and device
CN109783631B (en) Community question-answer data verification method and device, computer equipment and storage medium
CN109189895B (en) Question correcting method and device for oral calculation questions
CN112163424A (en) Data labeling method, device, equipment and medium
CN111507573A (en) Business staff assessment method, system, device and storage medium
CN109766547B (en) Sentence similarity calculation method
CN112036705A (en) Quality inspection result data acquisition method, device and equipment
CN112580896A (en) Knowledge point prediction method, knowledge point prediction device, knowledge point prediction equipment and storage medium
CN110969005B (en) Method and device for determining similarity between entity corpora
CN110852071A (en) Knowledge point detection method, device, equipment and readable storage medium
Sinha et al. NLP-based automatic answer evaluation
CN117077679B (en) Named entity recognition method and device
CN111881694A (en) Chapter point detection method, device, equipment and storage medium
CN112256576A (en) Man-machine dialogue corpus testing method, device, equipment and storage medium
CN112559711A (en) Synonymous text prompting method and device and electronic equipment
CN109977400B (en) Verification processing method and device, computer storage medium and terminal
CN114896382A (en) Artificial intelligent question-answering model generation method, question-answering method, device and storage medium
CN114328895A (en) News abstract generation method and device and computer equipment
CN112817996A (en) Illegal keyword library updating method, device, equipment and storage medium
CN112434148A (en) Intelligent robot response method and device based on artificial intelligence
CN112560431A (en) Method, apparatus, device, storage medium, and computer program product for generating test question tutoring information
CN117573985B (en) Information pushing method and system applied to intelligent online education system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination