CN107783958B

CN107783958B - Target statement identification method and device

Info

Publication number: CN107783958B
Application number: CN201610792978.5A
Authority: CN
Inventors: 施亮亮; 付瑞吉; 胡国平; 宋巍; 秦兵; 刘挺
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2016-08-31
Filing date: 2016-08-31
Publication date: 2021-07-02
Anticipated expiration: 2036-08-31
Also published as: CN107783958A

Abstract

An embodiment of the invention provides a target sentence identification method and device. The method comprises: acquiring a text to be processed, the text comprising one or more natural language sentences; extracting identification features of each sentence, the identification features comprising a first feature and/or a second feature, where the first feature describes the sentence at the semantic level and the second feature describes it at the literal level; and identifying the target sentences in the text according to a pre-constructed target sentence identification model and the identification features of each sentence. The invention can automatically find the sentences that are target sentences (such as elegant sentences), greatly improving the efficiency of target sentence recognition. Because the identification criteria rest on objective features and models, the results are objective, which avoids the subjectivity of manual identification.

Description

Target statement identification method and device

Technical Field

The invention relates to the field of natural language processing, and in particular to a target sentence recognition method and device.

Background

When reading an article (for example, a student composition or other text), readers often look for certain target sentences, such as elegant sentences, for some purpose. Existing approaches generally rely on a person reading the article and pointing out its target sentences. For example, when grading a composition, a teacher may mark its elegant sentences and add comments, which helps students improve their writing. Here an elegant sentence generally means one that expresses something gracefully or with distinctive insight, such as a sentence that uses several idioms or classical quotations.

However, in the course of making the present invention, the inventors found the following. With the rapid development of information technology, education has entered the information era: numerous online education platforms have emerged, and more and more students have become used to learning online. A single platform may serve a large number of students who study and take examinations online, so a teacher faces not the traditional few dozen students of one class but tens of thousands of platform users. In this situation teachers' workload multiplies, and grading compositions in bulk is especially time-consuming and labor-intensive. Moreover, manual grading is often highly subjective: different teachers may disagree about which sentences in the same composition are target sentences, that is, the result depends entirely on who reads the article, which does not help students improve their writing. There is therefore an urgent need, in online education and similar industries, for a method that identifies target sentences efficiently and objectively.

Disclosure of Invention

The invention provides a target sentence recognition method and device for improving the efficiency of recognizing target sentences in a text.

According to a first aspect of the embodiments of the present invention, there is provided a target sentence recognition method, including:

acquiring a text to be processed, wherein the text comprises one or more natural language sentences;

extracting the identification features of each sentence, wherein the identification features comprise a first feature and/or a second feature, the first feature indicating characteristics of the sentence at the semantic level and the second feature indicating characteristics of the sentence at the literal level;

and identifying the target sentences in the text according to a pre-constructed target sentence identification model and the identification features of each sentence in the text.

Optionally, when the identification feature includes a first feature, extracting the first feature of each sentence includes:

performing word segmentation on the current sentence;

obtaining a word vector of each word after word segmentation;

and acquiring the first feature of the current sentence according to the word vector of each word of the current sentence and a pre-constructed first recognition model, wherein the first recognition model comprises, in order, an LSTM-RNN layer, a pA operation layer, a weighted summation layer and an output layer.

Optionally, the obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and the pre-constructed first recognition model includes:

inputting a word vector of each word of the current sentence into the LSTM-RNN layer;

taking the output of the LSTM-RNN layer as the input of the pA operation layer, where the pA operation layer computes the dot product of the pA vector with the value of each node to enhance the history information stored in each node;

taking the input and the output of the pA operation layer together as the input of the weighted summation layer, which performs a weighted summation of the node values using the pA-enhanced node values;

and inputting the result of the weighted summation into the output layer, obtaining the initial probability that the current sentence is a target sentence through a preset formula in the output layer, and taking this initial probability as the first feature of the sentence.

Optionally, the second feature comprises one or more of:

part-of-speech distribution, indicating the proportion of words of each part of speech in the current sentence;

average word frequency, indicating the average number of occurrences, in all collected texts, of the words in the current sentence;

maximum and minimum word frequency, indicating the maximum and minimum numbers of occurrences, in all collected texts, of the words in the current sentence;

whether the sentence contains an idiom;

non-repeated word proportion, indicating the proportion of non-repeated words in the current sentence;

number of repeated word types, indicating how many distinct repeated words the current sentence contains, where all occurrences of the same repeated word count as one type.

Optionally:

extracting the part-of-speech distribution of the current sentence, comprising:

counting the total number of words in the current sentence and calculating the ratio of the number of words of each part of speech to that total, which gives the part-of-speech distribution of the current sentence;

extracting the average word frequency of the current sentence, comprising:

counting, for each word in the current sentence, its number of occurrences in all collected texts, and taking the average of these counts as the average word frequency of the current sentence;

extracting the maximum and minimum word frequency of the current sentence, comprising:

counting, for each word in the current sentence, its number of occurrences in all collected texts, and taking the maximum and minimum of these counts as the maximum word frequency and minimum word frequency of the current sentence, respectively;

extracting the non-repeated word proportion of the current sentence, comprising:

finding the non-repeated words in the current sentence, i.e., words whose written form differs from that of every other word in the sentence, counting them, and taking the ratio of that count to the total number of words in the current sentence as the non-repeated word proportion of the current sentence;

extracting the number of repeated word types of the current sentence, comprising:

finding the repeated words in the current sentence, i.e., words that share the same written form, and taking the number of distinct repeated words as the number of repeated word types, where all occurrences of the same repeated word count as one type.

Optionally, the identifying the target sentence in the text according to the pre-established target sentence identification model and the identification feature of each sentence in the text includes:

taking the recognition features of the current sentence as the input of the target sentence recognition model;

receiving the output of the target sentence recognition model, the output being the probability that the current sentence is a target sentence;

and when the probability is greater than a preset threshold, determining that the current sentence is a target sentence.

Optionally, after the target sentence in the text is identified, the method further includes:

and marking the target sentences in the text in a preset manner.

According to a second aspect of the embodiments of the present invention, there is provided a target sentence recognition apparatus, the apparatus including:

the input module is used for acquiring a text to be processed, wherein the text comprises one or more natural language sentences;

the feature extraction module is used for extracting the identification features of each sentence, wherein the identification features comprise a first feature and/or a second feature, the first feature indicating characteristics of the sentence at the semantic level and the second feature indicating characteristics of the sentence at the literal level;

and the identification module is used for identifying the target sentences in the text according to a pre-constructed target sentence identification model and the identification features of each sentence in the text.

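Optionally, when the identification features include a first feature, the feature extraction module is used for: performing word segmentation on the current sentence; obtaining a word vector of each word after word segmentation; and acquiring the first feature of the current sentence according to these word vectors and the pre-constructed first recognition model, in the same way as in the first aspect.

Optionally, the second feature comprises one or more of the features listed for the first aspect: part-of-speech distribution, average word frequency, maximum and minimum word frequency, whether the sentence contains an idiom, non-repeated word proportion, and number of repeated word types; the feature extraction module extracts each of them as described for the first aspect.

Optionally, the identification module is configured to: take the identification features of the current sentence as the input of the target sentence recognition model; receive the output of the model, namely the probability that the current sentence is a target sentence; and determine that the current sentence is a target sentence when the probability is greater than a preset threshold.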

Optionally, the apparatus further comprises:

and the marking module is used for marking the target sentence in the text in a preset mode.

The technical scheme provided by the embodiment of the invention can have the following beneficial effects:

each natural language sentence in the text is identified according to its semantic and/or literal features and a target sentence identification model pre-constructed through training, so that target sentences (for example, elegant sentences) can be found automatically and the efficiency of target sentence identification is greatly improved. Moreover, because the identification criteria rest on objective features and models, the results are objective, which avoids the subjectivity of manual identification.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise. Furthermore, these descriptions should not be construed as limiting the embodiments, wherein elements having the same reference number designation are identified as similar elements throughout the figures, and the drawings are not to scale unless otherwise specified.

FIG. 1 is a flowchart illustrating a target sentence recognition method according to an exemplary embodiment of the present invention;

FIG. 2 is a flowchart illustrating a target sentence recognition method according to an exemplary embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a first recognition model shown in accordance with an exemplary embodiment of the present invention;

FIG. 4 is a flowchart illustrating a target sentence recognition method according to an exemplary embodiment of the present invention;

FIG. 5 is a flowchart illustrating a target sentence recognition method according to an exemplary embodiment of the present invention;

FIG. 6 is a schematic diagram illustrating a target sentence recognition apparatus according to an exemplary embodiment of the present invention;

FIG. 7 is a schematic diagram illustrating a target sentence recognition apparatus according to an exemplary embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a target sentence recognition method according to an exemplary embodiment of the present invention. The method can be used on terminals such as mobile phones and computers, or on servers.

Referring to fig. 1, the method may include:

step S101, a text to be processed is obtained, wherein the text comprises one or more natural language sentences.

For example, a student composition may be received as the text to be processed. In the present invention, a natural language sentence may be referred to simply as a "statement" or, colloquially, as a "sentence". The text can be split into sentences according to its punctuation, that is, each span of content ending with a period, question mark, exclamation mark, ellipsis or similar mark is taken as one sentence.
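As a minimal sketch of this punctuation-based splitting, assume that the sentence-final marks are the Chinese full-width 。？！ and the ellipsis ……, plus their ASCII counterparts (the exact set of marks is an assumption, and `split_sentences` is a hypothetical helper, not part of the patent). A regular expression with a capturing group keeps each terminating mark attached to its sentence:

```python
import re

# Assumed set of sentence-final marks: Chinese ellipsis "……", ASCII "...",
# and single-character periods, question marks, exclamation marks, ellipses.
_SENTENCE_END = re.compile(r"(……|\.\.\.|[。？！?!.…])")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each terminating punctuation mark."""
    # With a capturing group, re.split returns [content, mark, content, mark, ...]
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts), 2):
        content = parts[i]
        mark = parts[i + 1] if i + 1 < len(parts) else ""
        sentence = (content + mark).strip()
        if sentence:  # drop the empty tail left after a final mark
            sentences.append(sentence)
    return sentences

print(split_sentences("小草开始偷偷地从地里钻出来。春天来了！"))
# ['小草开始偷偷地从地里钻出来。', '春天来了！']
```

A production splitter would also need to avoid cutting at decimal points and abbreviations such as "e.g.", which this sketch does not handle.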

Step S102, extracting the identification features of each sentence, where the identification features include a first feature and/or a second feature, the first feature indicating characteristics of the sentence at the semantic level and the second feature indicating characteristics of the sentence at the literal level.

The first feature and the second feature describe the sentence from two different perspectives, semantic and literal, respectively. In use, the identification features of a sentence may comprise the first feature, the second feature, or a combination of both. This embodiment does not limit the specific content of the first and second features; those skilled in the art can design them for different needs and scenarios, and such designs can be used here without departing from the spirit and scope of the present invention.

Step S103, identifying the target sentence in the text according to a pre-constructed target sentence identification model and the identification characteristics of each sentence in the text.

For example, a large amount of text may be collected in advance and labeled manually to serve as training samples, and the target sentence recognition model may be constructed in advance through training. In use, the identification features of a sentence are input into the target sentence recognition model, and whether the sentence is a target sentence is judged from the output. For example, the output may be the probability that the sentence is a target sentence; in the elegant-sentence scenario, this probability may be called the elegance degree of the sentence.

In this embodiment, each natural language sentence in the text is identified according to its semantic and/or literal features and a target sentence identification model pre-constructed through training, so that target sentences (for example, elegant sentences) can be found automatically and identification efficiency is greatly improved. Moreover, because the identification criteria rest on objective features and models, the results are objective, which avoids the subjectivity of manual identification.

Referring to fig. 2, in this embodiment or some other embodiments of the present invention, when the identification feature includes a first feature, extracting the first feature of each sentence may include:

step S201, performing word segmentation on the current sentence.

This embodiment does not limit the word segmentation technique; for example, a conditional random field (CRF) method may be used to segment the text.

Step S202, obtaining word vectors of each word after word segmentation.

For example, word vectors for each word may be trained using the word2vec method.

For a sentence, its word vectors may be represented as $(w_1, w_2, \ldots, w_n)$.
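Step S202 can be sketched as a lookup into a pre-trained embedding table (for example, one trained with word2vec). This is an illustration under assumptions: the `embeddings` dictionary, the helper name `sentence_to_vectors`, and the zero vector used for out-of-vocabulary words are hypothetical choices, not details given in the patent.

```python
def sentence_to_vectors(words, embeddings, dim):
    """Return the sequence (w_1, ..., w_n) of word vectors for a segmented sentence.

    words: segmented words of one sentence.
    embeddings: hypothetical dict mapping word -> list of `dim` floats.
    Out-of-vocabulary words are mapped to a zero vector (an assumption).
    """
    zero = [0.0] * dim
    # Copy each vector so callers cannot mutate the shared embedding table.
    return [list(embeddings.get(w, zero)) for w in words]

embeddings = {"小": [0.1, 0.3], "草": [0.7, -0.2]}
print(sentence_to_vectors(["小", "草", "开始"], embeddings, 2))
# [[0.1, 0.3], [0.7, -0.2], [0.0, 0.0]]
```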

Step S203, acquiring the first feature of the current sentence according to the word vector of each word of the current sentence and a pre-constructed first recognition model, where the first recognition model comprises, in order, an LSTM-RNN layer, a pA operation layer, a weighted summation layer and an output layer. Here RNN denotes a recurrent neural network and LSTM denotes long short-term memory.

As an example, see FIG. 3, which shows an exemplary structure of the first recognition model: an LSTM-RNN layer, a pA (pseudo-attention) operation layer, a weighted summation layer and an output layer.

As an example, the obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and the pre-constructed first recognition model may specifically include:

i) the word vector for each word of the current sentence is input into the LSTM-RNN layer.

The word vectors $(w_1, w_2, \ldots, w_n)$ of the sentence are taken as the input of the LSTM-RNN layer, which encodes the current sentence and stores the history information of each word during encoding. The value $h_t$ of the $t$-th node of the LSTM-RNN layer is $h_t = \mathrm{LSTM}(w_t, h_{t-1})$, where $\mathrm{LSTM}(\cdot)$ is the function that encodes the input word vector and $h_{t-1}$, the value of the $(t-1)$-th node, carries the history information up to that node. The LSTM-RNN is prior art and is not described in detail here.

ii) taking the output of the LSTM-RNN layer as the input of the pA operation layer, and performing dot product operation on the pA operation layer by using pA vectors and the values of each node so as to enhance the historical information stored by each node.

The output of the LSTM-RNN layer is the input of the pA operation layer, which is so named because it takes the dot product of the pA vector with each node. Enhancing the history information stored in each node prevents that information from degrading over time. The enhanced value $\alpha_t$ of the $t$-th node is $\alpha_t = \mathrm{dot}(h_t, a)$, where $\mathrm{dot}(\cdot)$ is the dot-product function and $a$ is the pA vector, a model parameter whose specific values can be obtained by training on a large amount of text data. Nodes are prior art in the field of neural networks and are not described further here.

iii) taking the input and the output of the pA operation layer together as the input of the weighted summation layer, which performs a weighted summation of the node values using the pA-enhanced node values.

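Before the weighted summation, the pA-enhanced node values can be normalized. For example, with softmax normalization over the $n$ nodes, the normalized value $\beta_t$ of the $t$-th node is

$$\beta_t = \frac{\exp(\alpha_t)}{\sum_{k=1}^{n} \exp(\alpha_k)}$$

The normalized weights $\beta_t$ and the node values $h_t$ are then combined by weighted summation to obtain $h$:

$$h = \sum_{t=1}^{n} \beta_t h_t$$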

iv) inputting the result of the weighted summation into the output layer, obtaining the initial probability that the current sentence is a target sentence through a preset formula in the output layer, and taking this initial probability as the first feature of the sentence.

As an example, the preset formula may be $p = \mathrm{sigmoid}(W h + b)$, where $p$ is the output and $W$ and $b$ are model parameters whose specific values can be obtained by training on a large amount of text data.
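Steps i) to iv) can be put together in a minimal pure-Python sketch of the forward pass. It assumes a standard LSTM cell (input, forget and output gates plus a candidate cell state), softmax as the normalization before the weighted summation, and a vector-valued W with a scalar b in the output layer. The parameter names and the random initialization in `init_params` are hypothetical stand-ins for trained values; this illustrates the described layer structure and is not the patent's implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

GATES = ("i", "f", "o", "c")  # input, forget, output gates and candidate cell

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical random parameters; in practice they come from training."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    p = {}
    for g in GATES:
        p["W" + g] = [vec(input_dim) for _ in range(hidden_dim)]   # acts on w_t
        p["U" + g] = [vec(hidden_dim) for _ in range(hidden_dim)]  # acts on h_{t-1}
        p["b" + g] = [0.0] * hidden_dim
    p["a"] = vec(hidden_dim)      # pA vector
    p["W_out"] = vec(hidden_dim)  # output-layer weights W
    p["b_out"] = 0.0              # output-layer bias b
    return p

def lstm_encode(word_vectors, p, hidden_dim):
    """i) LSTM-RNN layer: h_t = LSTM(w_t, h_{t-1}) for every word vector."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    states = []
    for w in word_vectors:
        pre = {}
        for g in GATES:
            wx = matvec(p["W" + g], w)
            uh = matvec(p["U" + g], h)
            pre[g] = [x + y + b for x, y, b in zip(wx, uh, p["b" + g])]
        i_g = [sigmoid(x) for x in pre["i"]]
        f_g = [sigmoid(x) for x in pre["f"]]
        o_g = [sigmoid(x) for x in pre["o"]]
        cand = [math.tanh(x) for x in pre["c"]]
        c = [f * ck + i * gk for f, ck, i, gk in zip(f_g, c, i_g, cand)]
        h = [o * math.tanh(ck) for o, ck in zip(o_g, c)]
        states.append(h)
    return states

def first_feature(word_vectors, p, hidden_dim):
    """Initial probability that a (non-empty) sentence is a target sentence."""
    states = lstm_encode(word_vectors, p, hidden_dim)
    # ii) pA operation layer: alpha_t = dot(h_t, a)
    alphas = [dot(h_t, p["a"]) for h_t in states]
    # iii) softmax normalization (assumed), shifted by the max for stability
    m = max(alphas)
    exps = [math.exp(x - m) for x in alphas]
    total = sum(exps)
    betas = [e / total for e in exps]
    # weighted summation layer: h = sum_t beta_t * h_t
    h = [sum(beta * h_t[k] for beta, h_t in zip(betas, states))
         for k in range(hidden_dim)]
    # iv) output layer: p = sigmoid(W . h + b)
    return sigmoid(dot(p["W_out"], h) + p["b_out"])

params = init_params(input_dim=3, hidden_dim=4)
sentence = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5], [0.4, 0.4, -0.2]]
print(first_feature(sentence, params, hidden_dim=4))  # a value in (0, 1)
```

Plain lists keep the sketch dependency-free; a real system would use a tensor library and learn `a`, `W_out` and `b_out` jointly with the LSTM weights.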

Of course, in other embodiments of the present invention, the first recognition model may use other model structures, such as a CNN (convolutional neural network) or an LSTM (long short-term memory) network. Alternatively, several first recognition models built on different neural network models can each produce a first feature of the current sentence, and these features are then used together as the first feature of the current sentence.

In this embodiment or some other embodiments of the invention, the second feature may include one or more of the following:

1) Part-of-speech distribution, indicating the proportion of words of each part of speech in the current sentence;

in specific implementation, extracting part-of-speech distribution of the current sentence may include:

counting the total word number in the current sentence, and calculating the ratio of the number of words of each part of speech (such as nouns, verbs, adjectives, adverbs, conjunctions and the like) in the current sentence to the total word number to obtain the part of speech distribution of the current sentence.

For example, take the current sentence "小草开始偷偷地从地里钻出来" ("the little grass starts to sneak out of the ground"). Word segmentation with part-of-speech tagging gives 小 (little)/adjective, 草 (grass)/noun, 开始 (start)/verb, 偷偷 (stealthily)/adverb, 地 (adverbial particle)/other, 从 (from)/other, 地 (ground)/noun, 里 (inside)/other, 钻 (burrow)/verb, 出来 (come out)/verb. The sentence has 10 words in total: 2 nouns, 3 verbs, 1 adjective, 1 adverb, 0 conjunctions and 3 other words, so the part-of-speech distribution over nouns, verbs, adjectives, adverbs, conjunctions and other words is 0.2, 0.3, 0.1, 0.1, 0.0, 0.3.

2) Average word frequency, indicating the average number of occurrences, in all collected texts, of the words in the current sentence;

in specific implementation, extracting the average word frequency of the current sentence may include:

counting, for each word in the current sentence, its number of occurrences in all collected texts, and taking the average of these counts as the average word frequency of the current sentence.

3) Maximum and minimum word frequency, indicating the maximum and minimum numbers of occurrences, in all collected texts, of the words in the current sentence;

in specific implementation, extracting the maximum word frequency and the minimum word frequency of the current sentence may include:

counting, for each word in the current sentence, its number of occurrences in all collected texts, and taking the maximum and minimum of these counts as the maximum word frequency and minimum word frequency of the current sentence, respectively.

4) Whether the sentence contains an idiom;

In specific implementation, each word in the current sentence can be checked in turn against a pre-constructed idiom table: if any word is an idiom, the current sentence is considered to contain an idiom; otherwise, it is considered not to. The result can be represented as 0 or 1, for example 1 when the current sentence contains an idiom and 0 when it does not.

5) Non-repeated word proportion, indicating the proportion of non-repeated words in the current sentence;

in specific implementation, extracting the proportion of non-duplicate words of the current sentence may include:

finding the non-repeated words in the current sentence, i.e., words whose written form differs from that of every other word in the sentence, counting them, and taking the ratio of their number to the total number of words in the current sentence as the non-repeated word proportion of the current sentence.

For example, the current sentence "the little grass starts to sneak out of the ground" is segmented into 10 words. Two of them share the same written form in the original Chinese, namely the first 地 (an adverbial particle) and the second 地 ("ground"), and the other 8 words are all different, so the non-repeated word proportion of the sentence is 8/10 = 0.8.

6) Number of repeated word types, indicating how many distinct repeated words the current sentence contains, where all occurrences of the same repeated word count as one type.

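In specific implementation, extracting the number of repeated word types of the current sentence may include:

finding the repeated words in the current sentence, i.e., words that share the same written form, and taking the number of distinct repeated words as the number of repeated word types, where all occurrences of the same repeated word count as one type.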

For example, suppose the current sentence ("hello, welcome" in translation) contains the words rendered here as "hello" and "good", each appearing twice. Both are repeated words, and because their written forms differ, they are counted as two types, so the number of repeated word types of the current sentence is 2.
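The six literal features above can be computed with a few small helpers. This is a sketch under assumptions: sentences are already segmented (and part-of-speech tagged for feature 1), word frequencies are averaged over word tokens rather than distinct words, and the idiom table is a plain set supplied by the caller. All function names are hypothetical. The tagged words in the usage lines are the ten words of the part-of-speech example, with the same 2/3/1/1/0/3 tag counts.

```python
from collections import Counter

# Part-of-speech categories, in the order used in the example above.
POS_ORDER = ["noun", "verb", "adjective", "adverb", "conjunction", "other"]

def pos_distribution(tagged_words):
    """1) Share of words of each part of speech; tagged_words = [(word, pos), ...]."""
    total = len(tagged_words)
    counts = Counter(pos for _, pos in tagged_words)
    return [counts[pos] / total if total else 0.0 for pos in POS_ORDER]

def build_frequency_table(segmented_corpus):
    """Occurrences of every word across all collected, already segmented texts."""
    freq = Counter()
    for words in segmented_corpus:
        freq.update(words)
    return freq

def frequency_features(words, freq):
    """2) and 3): (average, maximum, minimum) corpus frequency of a non-empty sentence."""
    counts = [freq[w] for w in words]  # Counter yields 0 for unseen words
    return sum(counts) / len(counts), max(counts), min(counts)

def contains_idiom(words, idiom_table):
    """4) 1 if any segmented word is in the idiom table, else 0."""
    return 1 if any(w in idiom_table for w in words) else 0

def non_repeated_ratio(words):
    """5) Share of words whose written form occurs only once in the sentence."""
    counts = Counter(words)
    return sum(1 for w in words if counts[w] == 1) / len(words)

def repeated_word_types(words):
    """6) Number of distinct written forms that occur more than once."""
    return sum(1 for n in Counter(words).values() if n > 1)

tagged = [("小", "adjective"), ("草", "noun"), ("开始", "verb"), ("偷偷", "adverb"),
          ("地", "other"), ("从", "other"), ("地", "noun"), ("里", "other"),
          ("钻", "verb"), ("出来", "verb")]
words = [w for w, _ in tagged]
print(pos_distribution(tagged))    # [0.2, 0.3, 0.1, 0.1, 0.0, 0.3]
print(non_repeated_ratio(words))   # 0.8
print(repeated_word_types(words))  # 1
```

Because the helpers compare words by their written form, the particle 地 and the noun 地 count as the same word, which matches the non-repeated word example.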

Referring to fig. 4, in this embodiment or some other embodiments of the present invention, the recognizing the target sentence in the text according to the pre-constructed target sentence recognition model and the recognition feature of each sentence in the text may include:

step S401, using the recognition feature of the current sentence as the input of the target sentence recognition model.

Step S402, receiving the output of the target sentence recognition model, wherein the output is the probability that the current sentence belongs to the target sentence.

Step S403, when the probability is greater than a preset threshold, determining that the current sentence belongs to the target sentence.
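Steps S401 to S403 amount to scoring each sentence and applying a threshold. In the sketch below, `predict_proba` is a hypothetical stand-in for the trained target sentence recognition model (any classifier that maps a feature vector to a probability), and the default threshold of 0.5 is an assumption; the patent only speaks of a preset threshold.

```python
from typing import Callable, List, Sequence

def identify_target_sentences(
    features: Sequence[Sequence[float]],
    predict_proba: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> List[int]:
    """Return the indices of sentences whose probability exceeds the threshold."""
    selected = []
    for i, f in enumerate(features):
        p = predict_proba(f)  # S401/S402: feed the features, receive a probability
        if p > threshold:     # S403: strictly greater than the preset threshold
            selected.append(i)
    return selected

# Toy "model" that reads the probability from the first feature.
print(identify_target_sentences([[0.9], [0.2], [0.6]], lambda f: f[0]))  # [0, 2]
```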

By way of example, the target sentence recognition model may be a common classification model, such as a support vector machine model, a decision tree model, or the like.

The target sentence recognition model can be obtained by training in advance. For example, the identification features of sentences, together with manual labels indicating whether each sentence is a target sentence, can be used as training samples to train and update the model parameters.

The manual labels fall into two classes: the current sentence is a target sentence, or it is not. With 0 and 1 as labels, a label of 1 means the current sentence is a target sentence and a label of 0 means it is not. During labeling, the same sentence can be given to two annotators; if their labels agree, the label is considered correct; otherwise, the sentence can be submitted to a domain expert, whose label is taken as final. The model parameters are updated with the training samples, and the parameter values of the target sentence recognition model are obtained when training finishes. The specific training process is not described in detail here.
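The two-annotator labeling rule can be written as a small function. This is a sketch: `final_label` is a hypothetical name, and `ask_expert` is a hypothetical callback standing in for the domain expert's review, which is only invoked when the annotators disagree.

```python
from typing import Callable

def final_label(label_a: int, label_b: int, ask_expert: Callable[[], int]) -> int:
    """Combine two annotators' 0/1 labels for one sentence.

    If the annotators agree, their label is accepted; otherwise the sentence
    goes to a domain expert (the hypothetical ask_expert callback), whose
    label is final.
    """
    if label_a == label_b:
        return label_a
    return ask_expert()

print(final_label(1, 1, lambda: 0))  # 1: annotators agree, expert not consulted
print(final_label(0, 1, lambda: 1))  # 1: disagreement, expert decides
```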

In addition, referring to fig. 5, in this embodiment or some other embodiments of the present invention, after the target sentence in the text is identified, the method may further include:

Step S104, marking the target sentences in the text in a preset manner.

For example, when the target sentences are elegant sentences, the identified elegant sentences can be marked in the article. The present invention does not limit the specific marking method: the elegant sentences may, for instance, be shown in a different font color, in bold or underlined, or enclosed in a box.
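One possible "preset manner" is to wrap each target sentence in markup when the text is rebuilt. In this sketch the helper name and the default HTML underline tags are assumptions chosen for illustration; any pair of markers can be passed in. Sentences are joined without a separator, which suits Chinese text.

```python
def mark_sentences(sentences, target_indices, open_tag="<u>", close_tag="</u>"):
    """Rebuild the text, wrapping each target sentence in open/close markers."""
    targets = set(target_indices)
    return "".join(
        f"{open_tag}{s}{close_tag}" if i in targets else s
        for i, s in enumerate(sentences)
    )

print(mark_sentences(["A.", "B.", "C."], [1]))  # A.<u>B.</u>C.
print(mark_sentences(["春天来了！"], [0], "【", "】"))  # 【春天来了！】
```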

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.

Fig. 6 is a schematic diagram illustrating a target sentence recognition apparatus according to an exemplary embodiment of the present invention. The apparatus can be used on terminals such as mobile phones and computers, or on servers.

Referring to fig. 6, the apparatus may include:

an input module 601, configured to obtain a text to be processed, where the text includes one or more natural language sentences;

a feature extraction module 602, configured to extract an identification feature of each sentence, where the identification feature includes a first feature and/or a second feature, the first feature is used to indicate a feature of the sentence in a semantic aspect, and the second feature is used to indicate a feature of the sentence in a literal aspect;

the identifying module 603 is configured to identify a target sentence in the text according to a pre-constructed target sentence identifying model and the identifying feature of each sentence in the text.

In this embodiment or some other embodiments of the present invention, when the identification feature includes a first feature, extracting the first feature of each sentence may include:

performing word segmentation on the current sentence;

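obtaining a word vector of each word after word segmentation;

obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and a pre-constructed first recognition model.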
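The following sketch illustrates these steps together with the pA operation and weighted summation layers that claim 1 describes for the first recognition model. It is simplified under stated assumptions: whitespace splitting stands in for a Chinese word segmenter, word vectors come from a hypothetical lookup table, the LSTM-RNN encoder is omitted (each word vector is used directly as its node value), the dot-product scores are assumed to be normalized with a softmax before the weighted summation, and the output layer is not shown.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def segment(sentence: str) -> List[str]:
    # Stand-in for a Chinese word segmenter: split on whitespace.
    return sentence.split()


def word_vectors(words: Sequence[str], table: Dict[str, Vector], dim: int) -> List[Vector]:
    # Hypothetical embedding lookup; out-of-vocabulary words map to a zero vector.
    return [list(table.get(w, [0.0] * dim)) for w in words]


def pa_weighted_sum(nodes: Sequence[Vector], pa: Vector) -> Vector:
    """pA layer: score_i = pA . h_i; weighted summation: sum_i a_i * h_i,
    where a_i = softmax(scores)_i (the softmax is an assumption of this sketch)."""
    if not nodes:
        return [0.0] * len(pa)
    scores = [sum(p * h for p, h in zip(pa, node)) for node in nodes]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(a * node[k] for a, node in zip(weights, nodes)) for k in range(len(pa))]


def first_feature(sentence: str, table: Dict[str, Vector], pa: Vector) -> Vector:
    words = segment(sentence)
    # LSTM-RNN encoding omitted: node values are the word vectors themselves.
    nodes = word_vectors(words, table, len(pa))
    return pa_weighted_sum(nodes, pa)


if __name__ == "__main__":
    table = {"quiet": [1.0, 0.0], "moon": [0.0, 1.0]}
    # Equal scores give equal weights, so the result is the mean vector: [0.5, 0.5]
    print(first_feature("quiet moon", table, [0.0, 0.0]))
```

In this sketch the pA vector must have the same dimension as the node values; in the full model that would be the LSTM-RNN hidden size rather than the word-vector size.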

In this embodiment or some other embodiments of the present invention, obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and the pre-constructed first recognition model may include:

whether the sentence contains an idiom;

In this embodiment or some other embodiment of the invention:

extracting the part-of-speech distribution of the current sentence may include:

extracting the average word frequency of the current sentence may include:

extracting the maximum word frequency and the minimum word frequency of the current sentence may include:

extracting the non-repeated word proportion of the current sentence may include:

extracting the number of repeated word types of the current sentence may include:
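The literal (second) features listed above can be computed with simple counting. The sketch below assumes that the sentence has already been segmented and part-of-speech tagged, and that a corpus word-frequency table and an idiom lexicon are available; all of these inputs are hypothetical. The exact definitions of the non-repeated word proportion and the number of repeated word types are not given in this section, so the ones used here (the share of tokens whose word occurs exactly once, and the number of distinct words occurring more than once) are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def literal_features(words: Sequence[str], tags: Sequence[str],
                     word_freq: Dict[str, float], idioms: Set[str],
                     tagset: Sequence[str]) -> Dict[str, float]:
    """Compute illustrative second (literal) features for one segmented sentence."""
    n = len(words)
    counts = Counter(words)
    freqs = [word_freq.get(w, 0.0) for w in words]  # unknown words get frequency 0
    tag_counts = Counter(tags)
    feats = {
        "has_idiom": float(any(w in idioms for w in words)),
        "avg_word_freq": sum(freqs) / n if n else 0.0,
        "max_word_freq": max(freqs, default=0.0),
        "min_word_freq": min(freqs, default=0.0),
        # Assumed definition: share of tokens whose word occurs exactly once.
        "non_repeated_ratio": sum(1 for w in words if counts[w] == 1) / n if n else 0.0,
        # Assumed definition: number of distinct words occurring more than once.
        "repeated_word_types": float(sum(1 for c in counts.values() if c > 1)),
    }
    # Part-of-speech distribution: relative frequency of each tag in the tag set.
    for t in tagset:
        feats[f"pos_{t}"] = tag_counts[t] / n if n else 0.0
    return feats


if __name__ == "__main__":
    print(literal_features(["他", "一心一意", "学习", "学习"],
                           ["r", "i", "v", "v"],
                           {"他": 0.3, "一心一意": 0.01, "学习": 0.1},
                           {"一心一意"},
                           ["r", "i", "v", "n"]))
```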

In this embodiment or some other embodiments of the present invention, the identification module may be configured to:

Referring to fig. 7, in this embodiment or some other embodiments of the present invention, the apparatus may further include:

a marking module 604, configured to mark the target sentence in the text in a preset manner.

The specific manner in which each unit/module performs its operations has been described in detail in the related method embodiments and will not be elaborated here.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims

1. A target sentence recognition method, the method comprising:

extracting the identification features of each sentence, wherein the identification features comprise first features and/or second features, the first features indicating characteristics of the sentence in the semantic aspect and the second features indicating characteristics of the sentence in the literal aspect; wherein, when the identification features comprise the first features, extracting the first features of each sentence comprises:

performing word segmentation on the current sentence;

obtaining a word vector of each word after word segmentation;

obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and a pre-constructed first recognition model, wherein the first recognition model comprises, in sequence, an LSTM-RNN layer, a pA operation layer, a weighted summation layer and an output layer; the LSTM-RNN layer encodes the word vectors of the current sentence to obtain corresponding node values; the output of the LSTM-RNN layer serves as the input of the pA operation layer; and the pA operation layer is a structural layer that computes the dot product of the pA vector with the value of each node, the pA vector being a model parameter;

2. The method of claim 1, wherein obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and the pre-constructed first recognition model comprises:

3. The method of claim 1, wherein the second feature comprises one or more of:

whether the sentence contains an idiom;

4. The method of claim 3, wherein:

extracting the part-of-speech distribution of the current sentence comprises:

extracting the average word frequency of the current sentence comprises:

5. The method of claim 1, wherein the identifying the target sentence in the text according to the pre-constructed target sentence identification model and the identification feature of each sentence in the text comprises:

6. The method of claim 1, wherein after the identifying the target sentence in the text, the method further comprises:

marking the target sentence in the text in a preset manner.

7. An apparatus for recognizing a target sentence, the apparatus comprising:

a feature extraction module, configured to extract an identification feature of each sentence, wherein the identification feature includes a first feature and/or a second feature, the first feature indicating a characteristic of the sentence in the semantic aspect and the second feature indicating a characteristic of the sentence in the literal aspect; wherein, when the identification feature includes the first feature, extracting the first feature of each sentence comprises:

performing word segmentation on the current sentence;

obtaining a word vector of each word after word segmentation;

8. The apparatus of claim 7, wherein the obtaining the first feature of the current sentence according to the word vector of each word of the current sentence and the pre-constructed first recognition model comprises:

9. The apparatus of claim 7, wherein the second feature comprises one or more of:

whether the sentence contains an idiom;

10. The apparatus of claim 9, wherein:

extracting the part-of-speech distribution of the current sentence comprises:

extracting the average word frequency of the current sentence comprises:

11. The apparatus of claim 7, wherein the identification module is configured to:

12. The apparatus of claim 7, further comprising: