CN111898374B - Text recognition method, device, storage medium and electronic equipment - Google Patents

Text recognition method, device, storage medium and electronic equipment

Info

Publication number
CN111898374B
CN111898374B (application CN202010752677.6A)
Authority
CN
China
Prior art keywords
sentence
sentence pair
pair
sentences
word vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010752677.6A
Other languages
Chinese (zh)
Other versions
CN111898374A (en)
Inventor
蔡晓凤
关俊辉
叶礼伟
刘萌
李超
卢鑫鑫
刘晓靖
肖世伟
孙朝旭
张艺博
滕达
付贵
周伟强
王静
崔立鹏
曹云波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010752677.6A priority Critical patent/CN111898374B/en
Publication of CN111898374A publication Critical patent/CN111898374A/en
Application granted granted Critical
Publication of CN111898374B publication Critical patent/CN111898374B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a text recognition method, a text recognition device, a storage medium and electronic equipment, belongs to the field of computer technology, and relates to artificial intelligence and natural language processing technology. After the word vector features corresponding to a sentence pair formed by two sentences are obtained, a text feature sequence corresponding to the sentence pair is obtained from the word vector features, and the text feature sequence is then converted into a text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to each word segment. Because these weights represent the importance of each word segment in judging whether the sentence pair is a parallelism sentence pair, determining whether the sentence pair belongs to a parallelism sentence based on a text feature vector that takes these weights into account improves the accuracy of the recognition result, so that the parallelism sentences in a text can be accurately recognized.

Description

Text recognition method, device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text recognition method, a text recognition device, a storage medium, and an electronic device.
Background
In recent years, with the popularization of online education and online classes, automatic composition correction using natural language processing technology has become an increasingly urgent need.
Parallelism is a very commonly used rhetorical device, and rhetoric is an important component of the evaluation dimensions used in composition correction. A parallelism sentence is a long sentence composed of three or more clauses that are similar in structure, adjacent in position and consistent in mood. Using parallelism in writing makes the sentences tidier and more harmonious, makes the meaning clearer, and can strengthen the momentum of the whole piece. If the parallelism sentences in a composition can be identified, the composition can be evaluated more accurately in the rhetoric dimension.
Therefore, how to accurately identify parallelism sentences during automatic text correction using natural language processing technology is a problem to be solved.
Disclosure of Invention
In order to solve the above technical problems, the embodiments of the present application provide a text recognition method, a device, a storage medium and electronic equipment, which can accurately recognize parallelism in text and are thereby conducive to accurately recognizing the parallelism sentences in the text.
In order to achieve the above object, the technical solution of the embodiment of the present application is as follows:
In a first aspect, an embodiment of the present application provides a text recognition method, including:
acquiring a word vector matrix corresponding to a sentence pair formed by two sentences, and extracting word vector features from the word vector matrix;
obtaining a text feature sequence corresponding to the sentence pair according to the word vector features; the text feature sequence comprises the word vector feature elements corresponding to the word segments contained in each sentence of the sentence pair;
converting the text feature sequence into a text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to each word segment; the weights of the word vector feature elements corresponding to a word segment represent the importance of that word segment in judging whether the sentence pair is a parallelism sentence pair;
determining a sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair; the sentence pair recognition result indicates that the sentence pair is a parallelism sentence pair or that the sentence pair is a non-parallelism sentence pair.
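The four steps of the first aspect can be sketched as follows. This is an illustrative sketch only: the function names, toy feature values, uniform weights and the threshold are assumptions, and each learned model is replaced by a trivial stand-in.

```python
# Illustrative sketch of the four steps (toy stand-ins for the learned models).

def get_word_vector_matrix(pair):
    # Step 1: map each word segment of both sentences to a toy 2-d vector.
    vocab = sorted({w for sent in pair for w in sent})
    return [[float(vocab.index(w)), float(len(w))] for sent in pair for w in sent]

def extract_features(matrix):
    # Stand-in for the feature extraction network: identity here.
    return matrix

def to_feature_sequence(features):
    # Step 2 stand-in: one feature element per word segment.
    return features

def to_feature_vector(sequence, weights):
    # Step 3: weighted sum, where weights encode each word segment's importance.
    dim = len(sequence[0])
    return [sum(w * el[d] for w, el in zip(weights, sequence)) for d in range(dim)]

def recognize(vector, threshold=1.0):
    # Step 4 stand-in classifier: "parallelism" if the score clears a threshold.
    score = sum(vector)
    return "parallelism pair" if score >= threshold else "non-parallelism pair"

pair = [["we", "love", "spring"], ["we", "love", "autumn"]]
matrix = get_word_vector_matrix(pair)
seq = to_feature_sequence(extract_features(matrix))
vec = to_feature_vector(seq, weights=[1.0 / len(seq)] * len(seq))
print(recognize(vec))
```

In the patented method the stand-ins are replaced by a feature extraction network, a sequence model and a trained classifier, but the data flow is the same.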
In a second aspect, an embodiment of the present application provides a text recognition apparatus, including:
the feature extraction unit is used for obtaining a word vector matrix corresponding to a sentence pair formed by two sentences, and extracting word vector features from the word vector matrix;
the feature processing unit is used for obtaining a text feature sequence corresponding to the sentence pair according to the word vector features, where the text feature sequence comprises the word vector feature elements corresponding to the word segments contained in each sentence of the sentence pair, and for converting the text feature sequence into a text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to each word segment, where the weights represent the importance of each word segment in judging whether the sentence pair is a parallelism sentence pair;
the feature recognition unit is used for determining a sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair; the sentence pair recognition result indicates that the sentence pair is a parallelism sentence pair or that the sentence pair is a non-parallelism sentence pair.
In an alternative embodiment, the feature extraction unit is specifically configured to:
generating a word vector sequence corresponding to the sentence pair according to the word vectors of the word segments contained in each sentence in the sentence pair;
generating a part-of-speech vector sequence corresponding to the sentence pair according to the part-of-speech vectors of the word segments contained in each sentence in the sentence pair;
splicing the word vector sequence corresponding to the sentence pair with the part-of-speech vector sequence to obtain the word vector matrix corresponding to the sentence pair;
And inputting the word vector matrix into a feature extraction network model to obtain the word vector features of the word vector matrix.
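The splicing of word vectors with part-of-speech vectors can be sketched as follows. The lookup tables, vector values and tag set are hand-picked assumptions; a real system would use learned embeddings and a part-of-speech tagger.

```python
# Sketch: build the word vector matrix by splicing each word segment's
# word vector with its part-of-speech vector (toy lookup tables).

WORD_VECS = {"we": [0.1, 0.2], "love": [0.3, 0.1],
             "spring": [0.5, 0.4], "autumn": [0.6, 0.3]}
POS_VECS = {"PRON": [1.0, 0.0], "VERB": [0.0, 1.0], "NOUN": [0.5, 0.5]}

def word_vector_matrix(sentence_pair, pos_tags):
    rows = []
    for sent, tags in zip(sentence_pair, pos_tags):
        for word, tag in zip(sent, tags):
            # Concatenate the word vector with the part-of-speech vector.
            rows.append(WORD_VECS[word] + POS_VECS[tag])
    return rows

pair = [["we", "love", "spring"], ["we", "love", "autumn"]]
tags = [["PRON", "VERB", "NOUN"], ["PRON", "VERB", "NOUN"]]
matrix = word_vector_matrix(pair, tags)
print(len(matrix), len(matrix[0]))  # 6 rows; each row is word dim + POS dim
```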
In an alternative embodiment, the feature processing unit is specifically configured to: input the word vector features into a bidirectional long short-term memory (Bi-LSTM) network model to obtain the text feature sequence corresponding to the sentence pair output by the Bi-LSTM model; and input the text feature sequence into a multi-head attention mechanism model to obtain the text feature vector corresponding to the sentence pair output by the multi-head attention mechanism model; the multi-head attention mechanism model converts the text feature sequence into a text feature vector according to the weights of the word vector feature elements corresponding to each word segment;
the bidirectional long short-term memory network model and the multi-head attention mechanism model are obtained by training on training samples with category labels; a category label indicates whether the corresponding training sample is a positive sample or a negative sample, where a positive sample is a sentence pair sample composed of two sentences extracted from a parallelism sentence, and a negative sample is a sentence pair sample composed of two randomly obtained sentences.
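The positive/negative sample construction described above can be sketched as follows; the example clauses are illustrative, not taken from the patent.

```python
# Sketch: positive samples are pairs of clauses drawn from one parallelism
# sentence; negative samples are randomly drawn sentence pairs.

import itertools
import random

def positive_samples(parallelism_clauses):
    # Every pair of clauses from one parallelism sentence is a positive sample.
    return [((a, b), 1) for a, b in itertools.combinations(parallelism_clauses, 2)]

def negative_samples(sentences, n, seed=0):
    # Randomly drawn sentence pairs serve as negative samples.
    rng = random.Random(seed)
    return [(tuple(rng.sample(sentences, 2)), 0) for _ in range(n)]

clauses = ["春天像刚落地的娃娃", "春天像小姑娘", "春天像健壮的青年"]
pos = positive_samples(clauses)
neg = negative_samples(["天气很好", "我在写代码", "他去上学", "猫在睡觉"], 2)
print(len(pos), len(neg))
```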
In an alternative embodiment, the feature extraction network model includes a plurality of convolution layers of different convolution kernel widths; the feature extraction unit is specifically configured to:
Respectively inputting the word vector matrix into each convolution layer to obtain a feature vector output by each convolution layer;
and splicing all the obtained feature vectors to obtain the word vector features of the word vector matrix.
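The multi-width convolution idea can be sketched as follows. The "convolution" here is a plain window average followed by max-pooling, and the widths (2, 3, 4) are assumptions for illustration; the patent's feature extraction network would use learned kernels.

```python
# Sketch: slide kernels of several widths over the rows of the word vector
# matrix, max-pool each, and splice the pooled outputs together.

def conv_feature(matrix, width):
    # Toy "convolution": average each window of `width` rows element-wise,
    # then max-pool over all window positions.
    dim = len(matrix[0])
    windows = [
        [sum(row[d] for row in matrix[i:i + width]) / width for d in range(dim)]
        for i in range(len(matrix) - width + 1)
    ]
    return [max(w[d] for w in windows) for d in range(dim)]

def extract_word_vector_features(matrix, widths=(2, 3, 4)):
    feats = []
    for width in widths:
        feats.extend(conv_feature(matrix, width))  # splice all outputs
    return feats

matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
          [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
features = extract_word_vector_features(matrix)
print(len(features))  # 3 widths × feature dim
```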
In an alternative embodiment, the multi-headed attention mechanism model includes a plurality of attention subnetworks having different network parameters; the feature processing unit is specifically configured to:
and respectively inputting the text feature sequences into each attention sub-network, and splicing the outputs of all the attention sub-networks to obtain the text feature vector corresponding to the sentence pair.
In an alternative embodiment, the feature recognition unit is specifically configured to:
determining the probability that the sentence pair is a parallelism sentence pair based on the text feature vector corresponding to the sentence pair, and, if that probability is greater than or equal to a first threshold value, determining that the sentence pair is a parallelism sentence pair; or,
determining the probability that the sentence pair is a non-parallelism sentence pair based on the text feature vector corresponding to the sentence pair, and, if that probability is smaller than or equal to a second threshold value and the sentences contained in the sentence pair satisfy a preset parallelism condition, determining that the sentence pair is a parallelism sentence pair.
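The two decision paths can be sketched as follows; the threshold values are illustrative assumptions.

```python
# Sketch of the two decision paths for the sentence pair recognition result.

def decide(p_parallel, first_threshold=0.5):
    # First path: threshold on the parallelism probability.
    return p_parallel >= first_threshold

def decide_with_rules(p_non_parallel, rules_satisfied, second_threshold=0.5):
    # Second path: low non-parallelism probability AND the preset
    # parallelism conditions hold.
    return p_non_parallel <= second_threshold and rules_satisfied

print(decide(0.8))                   # parallelism pair
print(decide_with_rules(0.3, True))  # parallelism pair
print(decide_with_rules(0.3, False)) # rule check fails
```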
In an alternative embodiment, the feature recognition unit is specifically configured to:
and inputting the text feature vector corresponding to the sentence pair into a fully connected layer, and classifying the output of the fully connected layer through a classifier to obtain the probability that the sentence pair is a parallelism sentence pair.
In an alternative embodiment, the sentences contained in the sentence pair satisfy the preset parallelism condition when some or all of the following conditions hold:
the number of characters or word segments contained in each sentence of the sentence pair is greater than or equal to a set number;
the difference between the numbers of characters contained in the two sentences of the sentence pair is smaller than or equal to a first set difference;
the difference between the numbers of word segments contained in the two sentences of the sentence pair is smaller than or equal to a second set difference;
the matching rate of the punctuation marks contained in the two sentences of the sentence pair is greater than or equal to a set matching threshold;
the part-of-speech similarity of the word segments contained in the two sentences of the sentence pair is greater than or equal to a set similarity value;
for words that co-occur in the two sentences, the distance between the positions of each co-occurring word in the two sentences is less than or equal to a set distance value.
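Several of the conditions above can be sketched as simple rule checks; the threshold values are illustrative assumptions, not the patent's settings.

```python
# Sketch of some preset parallelism conditions as rule checks.

def char_count_close(s1, s2, max_diff=3):
    # Character-count difference within the first set difference.
    return abs(len(s1) - len(s2)) <= max_diff

def segment_count_close(seg1, seg2, max_diff=2):
    # Word-segment-count difference within the second set difference.
    return abs(len(seg1) - len(seg2)) <= max_diff

def pos_similarity(tags1, tags2):
    # Fraction of aligned positions with the same part of speech.
    n = min(len(tags1), len(tags2))
    if n == 0:
        return 0.0
    return sum(t1 == t2 for t1, t2 in zip(tags1, tags2)) / n

def cooccurrence_distance_ok(seg1, seg2, max_dist=2):
    # Each co-occurring word must sit at nearby positions in both sentences.
    shared = set(seg1) & set(seg2)
    return all(abs(seg1.index(w) - seg2.index(w)) <= max_dist for w in shared)

s1, s2 = ["we", "love", "spring"], ["we", "love", "autumn"]
print(char_count_close("我爱春天", "我爱秋天"))
print(segment_count_close(s1, s2))
print(pos_similarity(["PRON", "VERB", "NOUN"], ["PRON", "VERB", "NOUN"]))
print(cooccurrence_distance_ok(s1, s2))
```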
In an alternative embodiment, the apparatus further comprises a data acquisition unit for:
Dividing the text to be recognized into a plurality of sentences according to the specified separator;
sequentially combining every two adjacent sentences into a sentence pair;
the feature recognition unit is further configured to:
and extracting parallelism sentences from the text to be recognized according to the sentence pair recognition result corresponding to each sentence pair.
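The splitting and pairing steps can be sketched as follows; the separator set is an assumption (common Chinese and Latin sentence-ending punctuation).

```python
# Sketch: split the text to be recognized into sentences at specified
# separators, then combine adjacent sentences into candidate pairs.

import re

def split_sentences(text, separators="。！？；!?;"):
    parts = re.split("[" + re.escape(separators) + "]", text)
    return [p.strip() for p in parts if p.strip()]

def adjacent_pairs(sentences):
    return [(sentences[i], sentences[i + 1]) for i in range(len(sentences) - 1)]

text = "春天像刚落地的娃娃。春天像小姑娘。春天像健壮的青年。"
sents = split_sentences(text)
print(adjacent_pairs(sents))
```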
In an alternative embodiment, the apparatus further comprises a model training unit for:
acquiring a training sample set, wherein the training sample set comprises sentence pair samples with category labels;
extracting sentence pair samples from the training sample set, and obtaining word vector characteristics of the extracted sentence pair samples;
inputting the word vector features of a sentence pair sample into a bidirectional long short-term memory network model to be trained, to obtain the text feature sequence corresponding to the sentence pair sample;
inputting the text feature sequence corresponding to the sentence pair sample into a multi-head attention mechanism model to obtain a text feature vector corresponding to the sentence pair sample;
classifying text feature vectors corresponding to the sentence pair samples through a classifier to obtain classification results of the sentence pair samples;
determining a loss value according to the classification result of the sentence pair sample and the category label of the sentence pair sample;
and adjusting the network parameters of the bidirectional long short-term memory network model and the multi-head attention mechanism model according to the loss value until the loss value converges to a preset expected value, thereby obtaining the trained bidirectional long short-term memory network model and multi-head attention mechanism model.
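The training loop above can be sketched as follows. A toy logistic classifier stands in for the Bi-LSTM + multi-head attention stack, and the binary cross-entropy loss and gradient step are the standard ones; the real models would be trained by backpropagation through all layers.

```python
# Sketch of the described training loop on toy, hand-made feature vectors.

import math

def classify(features, weights, bias):
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # probability of "parallelism pair"

def bce_loss(p, label):
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

# Toy samples: (feature vector, category label), 1 = positive sample.
samples = [([1.0, 0.9], 1), ([0.9, 1.0], 1), ([0.1, 0.0], 0), ([0.0, 0.2], 0)]
weights, bias, lr = [0.0, 0.0], 0.0, 0.5

for epoch in range(200):
    for feats, label in samples:
        p = classify(feats, weights, bias)
        grad = p - label                 # dLoss/dz for sigmoid + BCE
        weights = [w - lr * grad * f for w, f in zip(weights, feats)]
        bias -= lr * grad

total = sum(bce_loss(classify(f, weights, bias), y) for f, y in samples)
print(classify([1.0, 0.9], weights, bias) > 0.5)
```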
In an alternative embodiment, the model training unit may be further configured to:
if the sentences contained in a sentence pair sample satisfy the preset parallelism condition, setting the category label of the sentence pair sample to indicate a positive sample;
and if the sentences contained in a sentence pair sample do not satisfy the preset parallelism condition, setting the category label of the sentence pair sample to indicate a negative sample.
In a third aspect, embodiments of the present application further provide a computer readable storage medium, in which a computer program is stored, which when executed by a processor, implements the text recognition method of the first aspect.
In a fourth aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and when the computer program is executed by the processor, the text recognition method of the first aspect is implemented.
According to the text recognition method, text recognition device, storage medium and electronic equipment provided above, after the word vector features corresponding to a sentence pair formed by two sentences are obtained, a text feature sequence corresponding to the sentence pair is obtained from the word vector features, and the text feature sequence is then converted into a text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to each word segment. Because these weights represent the importance of each word segment in judging whether the sentence pair is a parallelism sentence pair, determining whether the sentence pair belongs to a parallelism sentence based on a text feature vector that takes these weights into account improves the accuracy of the recognition result, so that the parallelism sentences in the text can be accurately recognized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a schematic view of an application scenario of a text recognition method according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a text recognition method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another text recognition method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a model used in a text recognition method according to an embodiment of the present application;
FIG. 5 is an interface schematic diagram of a text recognition method according to an embodiment of the present application;
FIG. 6 is an interface diagram of another text recognition method according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a training process of a network model according to an embodiment of the present application;
fig. 8 is a block diagram of a text recognition device according to an embodiment of the present application;
FIG. 9 is a block diagram of another text recognition device according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "comprises" and "comprising," along with their variants, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Some terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.
(1) Word vector: also referred to as word vector features, word vectors describe the semantic features of the terms contained in natural language text. A word vector generally refers to a dense vector or matrix form, understandable by a machine, obtained by converting a natural-language term into a vector; that is, a word vector is the numeric representation of a natural-language term inside the machine.
(2) Word2Vec model: an open-source word vector tool from Google that converts words into word vectors by exploiting the semantic relations among words in text data, so that words can be compared via the semantic distance relations among their word vectors.
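The "semantic distance relations among word vectors" idea can be sketched with cosine similarity; the vectors below are hand-picked assumptions for illustration, not actual Word2Vec output.

```python
# Sketch: semantically related words get vectors with higher cosine similarity.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

vectors = {
    "spring": [0.9, 0.1, 0.2],
    "autumn": [0.8, 0.2, 0.1],
    "server": [0.1, 0.9, 0.8],
}
# The two seasons are closer to each other than to an unrelated word.
print(cosine(vectors["spring"], vectors["autumn"]) >
      cosine(vectors["spring"], vectors["server"]))
```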
(3) LSTM (Long Short-Term Memory) model: a recurrent neural network commonly used to process long sequences of data, such as word vector sequences. The Bi-LSTM (Bidirectional Long Short-Term Memory) model uses two LSTMs running in opposite directions to process long-sequence data from both directions; when processing natural language text, this allows the influence of the surrounding context on the current word to be fully considered.
(4) Multi-head attention mechanism model: a model comprising a plurality of attention subnetworks with different network parameters. Each attention subnetwork can be understood as a single attention mechanism model that assigns weights to combinations of several inputs; different weights represent the different degrees of importance of different words in judging parallelism.
The word "exemplary" is used hereinafter to mean "serving as an example, embodiment, or illustration. Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The terms "first," "second," and the like herein are used for descriptive purposes only and are not to be construed as either explicit or implicit relative importance or to indicate the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature, and in the description of embodiments of the application, unless otherwise indicated, the meaning of "a plurality" is two or more.
Embodiments of the present application relate to artificial intelligence (AI) and machine learning techniques, and are designed based on natural language processing (NLP) technology and machine learning (ML) technology in artificial intelligence.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. Artificial intelligence techniques mainly include computer vision techniques, natural language processing techniques, machine learning/deep learning, and other major directions.
With research and progress of artificial intelligence technology, artificial intelligence is developed in various fields such as common smart home, intelligent customer service, virtual assistant, smart speaker, smart marketing, unmanned, automatic driving, robot, smart medical, etc., and it is believed that with the development of technology, artificial intelligence will be applied in more fields and become more and more important value.
Machine learning is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and the like. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, induction learning, and the like.
Natural language processing technology is an important direction in the fields of computer science and artificial intelligence. Various theories and methods for realizing effective communication between a person and a computer by using natural language are researched. Natural language processing is a science that integrates linguistics, computer science, and mathematics. Thus, the research in this field will involve natural language, i.e. language that people use daily, so it has a close relationship with the research in linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robotic questions and answers, knowledge graph techniques, and the like.
Text processing is a main task in natural language processing technology and is widely applicable to various application scenarios. Identifying parallelism sentences in text is an important part of text processing. For example, during composition correction, if the parallelism sentences in a composition can be identified, the composition can be evaluated more accurately in the rhetoric dimension.
Based on the above, the embodiments of the present application provide a text recognition method, a device, a storage medium and electronic equipment, which are conducive to accurately recognizing the parallelism sentences in a text. The text recognition method comprises the following steps: after the word vector features corresponding to a sentence pair formed by two sentences are obtained, the word vector features are processed by a bidirectional long short-term memory network model, which captures the relations between preceding and following words, to obtain the text feature sequence corresponding to the sentence pair; a multi-head attention mechanism model then determines the importance of each word segment in judging parallelism and produces the text feature vector corresponding to the sentence pair; finally, whether the sentence pair is a parallelism sentence pair is determined based on this text feature vector, which improves the accuracy of the recognition result.
In order to better understand the technical solution provided by the embodiments of the present application, some simple descriptions are provided below for application scenarios applicable to the technical solution provided by the embodiments of the present application, and it should be noted that the application scenarios described below are only used to illustrate the embodiments of the present application, but not limited thereto. In the specific implementation, the technical scheme provided by the embodiment of the application can be flexibly applied according to actual needs.
Fig. 1 shows an application scenario of the text recognition method provided by the embodiment of the present application, and referring to fig. 1, the application scenario includes a plurality of terminal devices 11 and a service server 12. The terminal device 11 and the service server 12 may be connected by a wired connection or a wireless connection, and transmit data. For example, the terminal device 11 and the service server 12 may be connected by a data line or by a wired network; the terminal device 11 and the service server 12 may also be connected through a radio frequency module, a bluetooth module, or a wireless network.
The terminal device 11 may be a mobile phone, a personal digital assistant (PDA), a personal computer, a notebook computer, a tablet computer, or the like. For example, an online teaching application may be installed on the terminal device 11, through which a user can take online courses and submit in-class or after-class assignments, such as compositions. The user can send the text to be recognized, i.e. the composition to be corrected, to the service server 12 via the terminal device 11. The service server 12 may receive the text to be recognized sent by each terminal device 11, and recognize and evaluate it. The service server 12 may be a single server, a server cluster or cloud computing center composed of a plurality of servers, or a virtualization platform, or may be a personal computer, a mainframe computer, a computer cluster, or the like.
In order to further explain the technical solution provided by the embodiments of the present application, the following details are described with reference to the accompanying drawings and specific implementations. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, more or fewer operation steps may be included in the method on the basis of conventional or non-inventive labor. For steps between which there is logically no necessary causal relationship, the execution order is not limited to that provided by the embodiments of the present application. When executed in an actual processing procedure or by an apparatus, the method steps may be performed sequentially or in parallel according to the order shown in the embodiments or the drawings.
Fig. 2 shows a text recognition method provided by an embodiment of the present application, which is used to determine whether a sentence pair consisting of two sentences is a rank comparison sentence pair, where a rank comparison sentence pair refers to a sentence pair consisting of two sentences taken from a rank comparison (parallelism) sentence. The method may be performed by the service server 12 of fig. 1, or by a terminal device or other electronic device. Illustratively, the following describes a specific implementation procedure of the text recognition method with the service server 12 as the execution subject. As shown in fig. 2, the text recognition method includes the following steps:
Step S201, a word vector matrix corresponding to sentence pairs formed by two sentences is obtained, and word vector features of the word vector matrix are extracted.
In one embodiment, for any sentence pair consisting of two sentences, word segmentation processing may be performed on each sentence in the sentence pair, the word vector of each obtained word segment may be determined, a word vector sequence corresponding to each sentence may be generated according to the word vectors of the word segments contained in that sentence, and a word vector matrix may be determined according to the word vector sequences. Specifically, a Cartesian product calculation may be performed on the word vector sequences corresponding to the two sentences to obtain the word vector matrix corresponding to the sentence pair. The word vector matrix is then input into a feature extraction network model to obtain the word vector features of the word vector matrix. Here, the Cartesian product calculation determines the Cartesian product, also called the direct product, of two sequences X and Y, which can be expressed as X × Y: the set of all possible ordered pairs whose first element is a member of X and whose second element is a member of Y. For example, assuming the sequence X = {a, b} and the sequence Y = {0, 1, 2}, the Cartesian product of the two sequences is {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}.
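As a minimal sketch (not the patented implementation), the Cartesian product described above can be computed with Python's itertools.product; the function name is illustrative:

```python
from itertools import product

def cartesian_pairs(seq_x, seq_y):
    """Return the Cartesian product X x Y as a list of ordered pairs."""
    return list(product(seq_x, seq_y))

# Reproduces the example from the text: X = {a, b}, Y = {0, 1, 2}
pairs = cartesian_pairs(["a", "b"], [0, 1, 2])
print(pairs)
```

In the actual method the elements of the two sequences would be the word vectors of the two sentences, so each ordered pair combines one vector from each sentence.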
In one embodiment, for any sentence pair consisting of two sentences, word segmentation processing may be performed on each sentence in the sentence pair, and part of speech may be labeled, so as to determine a word vector and a part of speech vector of each obtained word. Generating a word vector sequence corresponding to the sentence pair according to word vectors of the word segmentation contained in each sentence in the sentence pair, generating a word part vector sequence corresponding to the sentence pair according to word part vectors of the word segmentation contained in each sentence in the sentence pair, splicing the word vector sequence corresponding to the sentence pair and the word part vector sequence to obtain a word vector matrix corresponding to the sentence pair, inputting the word vector matrix into a feature extraction network model, and obtaining word vector features of the word vector matrix.
Step S202, according to the word vector characteristics, a text characteristic sequence corresponding to the sentence pair is obtained.
Alternatively, word vector features may be input into a two-way long-short term memory network model to obtain text feature sequences corresponding to sentence pairs.
The bidirectional long-short-term memory network model comprises a plurality of hidden layers, a forward LSTM unit and a backward LSTM unit, forward state transmission and backward state transmission are respectively carried out among the plurality of hidden layers, and finally the output of each hidden layer is formed into a text feature sequence.
Step S203, converting the text feature sequence into a text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to the individual word segments.
The weight of the word vector feature element corresponding to a certain word segment is used to represent the importance of that word segment for judging whether the sentence pair is a rank comparison sentence pair.
In some embodiments, the text feature sequence may be input into a multi-headed attention mechanism model to obtain text feature vectors corresponding to sentence pairs. The multi-head attention mechanism model is used for converting the text feature sequence into the text feature vector according to the weights of the word vector feature elements corresponding to each word segmentation.
The multi-head attention mechanism model may include a plurality of attention sub-networks having different network parameters, where the network parameters in each attention sub-network are used to characterize, from a different angle, the importance of the word vector feature element of each word segment for determining whether the sentence pair is a rank comparison sentence pair. The text feature sequence is input into each attention sub-network respectively, and the outputs of all the attention sub-networks are spliced to obtain the text feature vector corresponding to the sentence pair.
The two-way long-short-term memory network model and the multi-head attention mechanism model are obtained by training a training sample with category labels. The class labels are used for indicating that the corresponding training samples are positive samples or negative samples, wherein the positive samples are sentence pair samples consisting of two sentences extracted from the row comparison sentences, and the negative samples are sentence pair samples consisting of two sentences obtained randomly.
Step S204, determining sentence pair recognition results corresponding to the sentence pairs based on the text feature vectors corresponding to the sentence pairs.
The sentence pair recognition result comprises that the sentence pair is a rank comparison sentence pair or that the sentence pair is a non-rank comparison sentence pair.
In one embodiment, the probability that the sentence pair is a rank-comparison sentence pair may be determined based on the text feature vector corresponding to the sentence pair, and if the probability that the sentence pair is a rank-comparison sentence pair is greater than or equal to a set first threshold, the sentence pair is determined to be a rank-comparison sentence pair. If the probability of the sentence pair being a rank comparison sentence pair is smaller than a first threshold value, determining that the sentence pair is a non-rank comparison sentence pair.
In another embodiment, the probability that the sentence pair is a non-rank-comparison sentence pair may be determined based on the text feature vector corresponding to the sentence pair, and if the probability that the sentence pair is a non-rank-comparison sentence pair is less than or equal to a set second threshold, and the sentence pair includes sentences satisfying a preset rank condition, the sentence pair is determined to be a rank-comparison sentence pair.
The preset ranking condition satisfied by the sentence included in the sentence pair may include some or all of the following conditions: the number of characters or word fragments contained in each sentence in the sentence pair is greater than or equal to the set number; the difference of the number of characters contained in the two sentences in the sentence pair is smaller than or equal to a first set difference; the difference value of the number of the segmented words contained in the two sentences in the sentence pair is smaller than or equal to a second set difference value; the matching rate of punctuation marks contained in two sentences in the sentence pair is greater than or equal to a set matching threshold value; the part-of-speech similarity of the segmented words contained in the two sentences in the sentence pair is greater than or equal to a set similarity value; for co-occurrence words in each sentence, a distance between positions of the respective co-occurrence words in the two sentences is less than or equal to a set distance value. For example, the preset ranking condition may include only any one of the above conditions, or may include a plurality of conditions.
In some embodiments, the sentence pair consisting of two sentences may be sentence pairs consisting of sentences in the text to be recognized. For example, the text to be recognized may be a composition uploaded by the terminal device. The business server receives the text to be identified, divides the text to be identified into a plurality of sentences according to the specified separator, and sequentially forms at least two adjacent sentences into a sentence pair. For each sentence pair obtained, determining a sentence pair recognition result corresponding to the sentence pair by the method shown in fig. 2, extracting a ranking sentence from the text to be recognized according to the sentence pair recognition result corresponding to each sentence pair, and highlighting the obtained ranking sentence.
For better understanding, fig. 3 is a schematic diagram illustrating an implementation process of the text recognition method provided in an embodiment of the present application in a specific application, and as shown in fig. 3, the process includes the following steps:
step S301, a text to be recognized is acquired.
In an alternative embodiment, for example, in a composition correction scenario, the service server receives a composition document sent by a user through the terminal device, and processes the composition document as a text to be identified.
Step S302, dividing the text to be recognized into a plurality of sentences according to the specified separator.
The specified separator may be a semicolon (";"), a comma (","), or other punctuation. For example, in one embodiment, the text to be recognized may be split into multiple sentences with semicolons as separators, with the text before, after, and between the semicolons each taken as one sentence. In another embodiment, commas are used as separators, and the text before, after, and between the commas is each taken as one sentence. In another embodiment, it may first be determined whether the text to be recognized contains a semicolon; if it does, the text is divided into sentences with the semicolon as the separator, and if it does not, the text is divided into sentences with the comma as the separator.
Illustratively, the text may be divided into a plurality of sentences at a specified separator using the re.split() function in the re module of the Python language; when recognizing with a semicolon as the separator, the separator specified for re.split() is set to a semicolon, and when recognizing with a comma as the separator, it is set to a comma.
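A minimal sketch of the splitting logic described above, assuming the semicolon-first fallback behavior; the function name is illustrative:

```python
import re

def split_sentences(text):
    """Split text into sentences with re.split(), preferring the
    semicolon and falling back to the comma, as described above."""
    sep = ";" if ";" in text else ","
    # Drop empty fragments left by leading/trailing separators.
    return [s for s in re.split(re.escape(sep), text) if s.strip()]

print(split_sentences("friend is a light;friend is a fire"))
```

A production splitter would also handle full-width Chinese punctuation ("；", "，"), which this sketch omits.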
Step S303, two adjacent sentences are sequentially formed into a sentence pair.
In one embodiment, each sentence and the sentence immediately following it may be formed into a sentence pair; assuming the text to be recognized contains L sentences in total, L-1 sentence pairs are obtained. In another embodiment, each odd-numbered sentence and the adjacent even-numbered sentence may be formed into a sentence pair; if the text to be recognized contains L sentences in total, approximately L/2 sentence pairs are obtained.
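The two pairing strategies above can be sketched as follows (function names are hypothetical):

```python
def sliding_pairs(sentences):
    """Pair each sentence with the one immediately after it: L-1 pairs."""
    return list(zip(sentences, sentences[1:]))

def disjoint_pairs(sentences):
    """Pair each odd-numbered sentence with the adjacent even-numbered
    one: floor(L/2) pairs."""
    return [(sentences[i], sentences[i + 1])
            for i in range(0, len(sentences) - 1, 2)]

sents = ["s1", "s2", "s3", "s4"]
print(sliding_pairs(sents))   # 3 pairs
print(disjoint_pairs(sents))  # 2 pairs
```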
Step S304, determining the sentence pair identification result corresponding to each obtained sentence pair.
For each sentence pair, the process shown in fig. 4 may be used to identify the sentence pair, and determine the sentence pair identification result corresponding to the sentence pair, which specifically includes the following steps:
step one, generating word vector sequences corresponding to sentence pairs according to word vectors of word segmentation contained in each sentence in the sentence pairs.
Word segmentation processing may be performed on each sentence in the sentence pair to obtain a plurality of words, each of which serves as a word segment. For example, the Jieba word segmentation method or another general word segmentation method may be used to perform word segmentation on each sentence to obtain the multiple words it contains, where some words may consist of a single character and others of multiple characters. For example, taking the sentence pair consisting of the first sentence "friend is a light" and the second sentence "friend is a fire" as an example, the 3 word segments "friend", "is", and "light" are obtained after word segmentation of the first sentence, and the 3 word segments "friend", "is", and "fire" are obtained after word segmentation of the second sentence.
It should be noted that, in some embodiments, the words obtained by segmenting a sentence may include content words with actual meaning, such as nouns, verbs, and adjectives, and function words without actual meaning, such as prepositions, conjunctions, modal particles, auxiliary words, and interjections. For example, the auxiliary particle in the phrase "beautiful flowers" has no practical meaning and is a function word; in the computer field, such function words are called stop words, and ignoring them does not affect the meaning of the whole text. Therefore, the stop words among the segmented words can be removed, and the remaining words with actual meaning are used as the word segments contained in the sentence.
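A toy sketch of the stop-word removal described above, using a hypothetical miniature English stop-word list rather than a real lexicon:

```python
# Hypothetical miniature stop-word list; a real system would use a
# full stop-word lexicon for the target language.
STOP_WORDS = {"is", "a", "the", "of", "in"}

def remove_stop_words(tokens):
    """Keep only content words, dropping function (stop) words."""
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words(["friend", "is", "a", "light"]))
```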
After obtaining the word segments contained in each sentence, the word vector of each segment is obtained. In one embodiment, the word vector of each word segment in each sentence may be determined by a word vector recognition model, such as the Word2Vec model. The Word2Vec model may determine the word vector of a word segment based on the context in which the segment is located (i.e., the other segments adjacent to it in the sentence). For example, the word vector of each segment can be obtained by inputting into the Word2Vec model the sequence formed by the word segments of the first sentence "friend is a light", the separator "SEP", and the word segments of the second sentence "friend is a fire", where "SEP" is a separator used to separate the word segments of the first sentence from those of the second sentence. In another embodiment, the word vector of each word in the sentence may be looked up in a pre-stored word vector library.
After the word vector of the word segmentation contained in each sentence in the sentence pair is obtained, the word vector of each word segmentation is sequentially arranged to form a word vector sequence corresponding to the sentence pair. The dimension of the word vector may be 100, and the maximum sequence length may be 100.
And step two, generating a part-of-speech vector sequence corresponding to the sentence pair according to the part-of-speech vector of the segmentation included in each sentence in the sentence pair.
After word segmentation is performed on each sentence in the sentence pair, the part of speech of each word segment can be determined. For example, still taking the sentence pair consisting of the first sentence "friend is a light" and the second sentence "friend is a fire" as an example, the parts of speech of the 3 word segments "friend", "is", "light" contained in the first sentence are "n", "v", "n" respectively, where "n" represents a noun and "v" represents a verb. Similarly, the parts of speech corresponding to the 3 word segments in the second sentence "friend is a fire" are also "n", "v", "n". In one embodiment, the part-of-speech sequence "n v n SEP n v n" formed from the parts of speech of the word segments contained in the first and second sentences is input into the Word2Vec model to obtain the part-of-speech vector of each word segment. In another embodiment, the part-of-speech vector corresponding to the part of speech of each word in the sentence may be looked up in a pre-stored part-of-speech vector library.
After the part-of-speech vectors of the segmented words contained in each sentence in the sentence pair are obtained, the part-of-speech vectors of each segmented word are sequentially arranged to form a part-of-speech vector sequence corresponding to the sentence pair.
And thirdly, splicing the word vector sequences corresponding to the sentence pairs with the part-of-speech vector sequences to obtain word vector matrixes corresponding to the sentence pairs.
Assuming each of the two sentences in a sentence pair contains m word segments, and the word vector and part-of-speech vector corresponding to each segment are n-dimensional vectors, then taking the word vector corresponding to each segment as one row of the matrix and the part-of-speech vector corresponding to each segment as another row, the resulting word vector matrix corresponding to the sentence pair is a 4m × n matrix (2m word-vector rows plus 2m part-of-speech-vector rows).
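A sketch of assembling the 4m × n matrix with NumPy, assuming one possible row ordering (the 2m word-vector rows followed by the 2m part-of-speech rows); shapes and values are illustrative:

```python
import numpy as np

def build_matrix(word_vecs, pos_vecs):
    """Stack the 2m word vectors and 2m part-of-speech vectors of a
    sentence pair row-wise into a 4m x n word vector matrix."""
    return np.vstack([np.asarray(word_vecs), np.asarray(pos_vecs)])

m, n = 3, 4                       # 3 segments per sentence, 4-dim vectors
word_vecs = np.ones((2 * m, n))   # word vectors of both sentences
pos_vecs = np.zeros((2 * m, n))   # part-of-speech vectors of both sentences
matrix = build_matrix(word_vecs, pos_vecs)
print(matrix.shape)               # (12, 4), i.e. 4m x n
```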
In this embodiment, the word vector matrix corresponding to the sentence pair is obtained by splicing the word vector sequence and the part-of-speech vector sequence corresponding to the sentence pair; that is, when judging whether the two sentences form a rank comparison, not only the words contained in the two sentences but also the parts of speech of those words are considered. Because the parts of speech of the words at the same position in each clause of a rank comparison sentence are basically the same, considering the part-of-speech factors of the words contained in the two sentences can improve the accuracy of the judgment result.
In one embodiment, word sequences corresponding to sentence pairs and part-of-speech sequences corresponding to sentence pairs are input into a word vector recognition model, and a word vector matrix corresponding to the sentence pairs can be obtained.
And step four, inputting the word vector matrix into a feature extraction network model to obtain the word vector features of the word vector matrix.
In one embodiment, the feature extraction network model may be implemented using a fully convolutional network (FCN). The fully convolutional network may include at least one convolution layer and at least one pooling layer. The convolution layers and pooling layers may be arranged alternately, i.e., one or more convolution layers may be arranged between adjacent pooling layers. Each convolution layer contains one or more convolution kernels for extracting feature information from the input word vector matrix; a convolution kernel traverses the input word vector matrix with a certain stride to obtain at least one feature value, and the feature values form a convolution feature vector. The pooling layer performs dimension-reduction processing on the convolution feature vectors output by the convolution layer, and the last pooling layer outputs the word vector features corresponding to the word vector matrix.
In another embodiment, the feature extraction network model may be implemented using a convolutional neural network (CNN). The feature extraction network model comprises a plurality of convolution layers with different convolution kernel widths; the word vector matrix is input into each convolution layer of the feature extraction network model respectively to obtain the feature vector output by each convolution layer, and all the obtained feature vectors are spliced to obtain the word vector features of the word vector matrix. For example, as shown in fig. 4, the feature extraction network model includes 3 convolution layers with convolution kernel widths of 2, 3, and 4 respectively, and the number of filters of each convolution kernel may be 256. One-dimensional convolution operations with convolution kernels of different widths are performed on the input word vector matrix with a set stride, which may be 1, to obtain 3 feature vectors. Assuming a sentence contains n word segments, the convolution layer with kernel width 2 convolves and fuses the word vectors of every 2 consecutive segments to determine a corresponding feature vector, the convolution layer with kernel width 3 convolves and fuses the word vectors of every 3 consecutive segments, and the convolution layer with kernel width 4 convolves and fuses the word vectors of every 4 consecutive segments. A splicing layer in the network then splices the 3 obtained feature vectors to obtain the word vector features of the input word vector matrix, i.e., the word vector features corresponding to the sentence pair.
The word vector features may also be regarded as a feature sequence consisting of the features of the word vectors corresponding to the word segments in the sentence pair.
In another embodiment, the feature extraction network may also include a residual network and a feature pyramid network (FPN). The residual network includes a plurality of feature extraction layers, and the feature pyramid network includes a corresponding plurality of network layers. The word vector matrix is input into the residual network, which outputs feature response maps of multiple sizes through its feature extraction layers; these feature response maps are correspondingly input into the network layers of the feature pyramid network, and the word vector features corresponding to the word vector matrix are obtained through bottom-up feature fusion.
And fifthly, inputting the word vector features into the bidirectional long short-term memory (Bi-LSTM) network model to obtain the text feature sequence output by the hidden layers of the bidirectional long short-term memory network model.
Considering that the words in the sentences of an article are always related to the words before and after them, a Bi-LSTM model is used to process sentence pairs consisting of natural-language sentences. When processing a sentence, the Bi-LSTM model processes the data in two different directions, propagating forward and backward respectively, so that the influence of the context on the current word can be fully considered.
Specifically, the feature sequence of word vectors corresponding to the sentence pair obtained in step four is input into the Bi-LSTM model. As shown in fig. 4, the Bi-LSTM model includes a plurality of hidden layers, as well as forward LSTM cells and backward LSTM cells; forward and backward state transmission are performed among the hidden layers respectively, and finally the outputs of the hidden layers form the text feature sequence. Illustratively, the hidden-layer dimension of the Bi-LSTM model may be 256. When processing the current input data, the Bi-LSTM model provided by the embodiment of the present application refers not only to the current input but also to the input data before and after it, thereby avoiding the problem of considering only the influence of the preceding input when processing sequence data.
And step six, inputting the text feature sequence into a multi-head attention mechanism model to obtain text feature vectors corresponding to sentence pairs.
The essence of the attention mechanism is a means of screening high-value information from a large amount of information, in which different pieces of information differ in their importance to the result; this difference can be represented by assigning attention weights of different magnitudes. In other words, the attention mechanism can be understood as a mechanism for assigning weights when synthesizing multiple inputs.
The multi-head attention mechanism model in embodiments of the present application includes a plurality of attention sub-networks with different network parameters. The network parameters in each attention sub-network are used to characterize, from a different angle, the importance of the word vector feature elements of the respective word segments in the sentence pair. The weights of word vector feature elements with higher importance are increased, and the weights of those with low importance are reduced, so that key words in the sentence pair are focused on and the accuracy of the sentence pair recognition result is improved.
The text feature sequence is input into each attention sub-network respectively, and the outputs of all the attention sub-networks are spliced to obtain the text feature vector corresponding to the sentence pair. The output a_i of the i-th attention sub-network can be expressed as:

a_i = softmax(V_i · tanh(W_i · H^T))

where H is the text feature sequence output by the hidden layers of the Bi-LSTM model, which can also be understood as a feature matrix comprising the vectors [h_1, …, h_n], h_n being the vector output by the n-th hidden layer; H^T is the transpose of H; and W_i and V_i are the network parameters of the i-th attention sub-network, which can also be understood as parameter matrices. For example, the multi-head attention mechanism model shown in fig. 4 includes 3 attention sub-networks, i.e., i can take the value 1, 2, or 3. The outputs of the 3 attention sub-networks are spliced to obtain the text feature vector corresponding to the sentence pair.
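The per-head computation above can be sketched in NumPy; the shapes and random parameters are illustrative assumptions (H is taken as an n × d matrix with one hidden-state vector per row):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

def attention_head(H, W, V):
    """One attention sub-network: a_i = softmax(V_i . tanh(W_i . H^T)),
    giving attention weights over the n hidden-state vectors in H."""
    return softmax(V @ np.tanh(W @ H.T))   # shape (n,)

rng = np.random.default_rng(0)
n, d, k = 5, 8, 4             # 5 time steps, 8-dim states, 4 hidden units
H = rng.normal(size=(n, d))   # hidden-layer outputs [h_1, ..., h_n]
heads = [attention_head(H, rng.normal(size=(k, d)), rng.normal(size=k))
         for _ in range(3)]   # 3 attention sub-networks
vec = np.concatenate(heads)   # spliced text feature vector
print(vec.shape)              # (15,)
```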
The embodiment of the application adopts a multi-head attention mechanism model, and can respectively set weights for word vector characteristic elements of the segmented words from a plurality of different angles, for example, the weights can be set from the positions of the segmented words in sentences, or the weights can be set from the parts of speech of the segmented words, or the weights can be set by considering the relation between the segmented words and other segmented words in the context. From different angles, the influence of various different factors on the weight of the word vector characteristic elements of the word segmentation is comprehensively considered, so that the accuracy of sentence pair recognition results can be improved.
The two-way long-short-term memory network model in the fifth step and the multi-head attention mechanism model in the sixth step are obtained by training by using training samples with category labels. The class labels are used to indicate whether the corresponding training samples are positive or negative samples. The positive sample is a sentence pair sample consisting of two sentences extracted from the rank comparison sentences, and the negative sample is a sentence pair sample consisting of two randomly acquired sentences.
And step seven, inputting text feature vectors corresponding to sentence pairs into a full-connection layer, and classifying the output of the full-connection layer through a classifier to obtain the probability that the sentence pairs are rank-comparison sentence pairs.
The full connection layer is used for carrying out dimension reduction on the text feature vector, the text feature vector after the dimension reduction is input into the classifier, and the classifier outputs the probability that sentence pairs are rank comparison sentence pairs.
In one embodiment, the classifier may employ a sigmoid classifier. In another embodiment, the classifier may be implemented using an SVM (Support Vector Machine) classifier. The SVM classifier is a linear classifier mainly used for binary classification; based on the input text feature vector, it can determine whether the sentence pair is a rank comparison sentence pair or a non-rank comparison sentence pair. In another embodiment, the classifier may employ a Softmax classifier. The classifier may output a class (label) and a confidence (probability), i.e., the probability that the sentence pair is a rank comparison sentence pair and the probability that it is a non-rank comparison sentence pair.
And step eight, determining sentence pair identification results corresponding to the sentence pairs according to the probability that the sentence pairs are rank comparison sentence pairs.
The sentence pair recognition result comprises that the sentence pair is a rank comparison sentence pair or that the sentence pair is a non-rank comparison sentence pair. In one embodiment, if the probability that the sentence pair is a rank comparison sentence pair is greater than or equal to a set first threshold, determining that the sentence pair is a rank comparison sentence pair; if the probability of the sentence pair being a rank comparison sentence pair is smaller than a first threshold value, determining that the sentence pair is a non-rank comparison sentence pair. Illustratively, the first threshold may be 0.8 or 0.75.
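A trivial sketch of this thresholding decision (the default threshold and the label strings are illustrative):

```python
def classify_pair(prob_parallel, first_threshold=0.8):
    """Decide the recognition result from the probability that the
    sentence pair is a rank comparison sentence pair."""
    if prob_parallel >= first_threshold:
        return "rank comparison"
    return "non-rank comparison"

print(classify_pair(0.9))   # rank comparison
print(classify_pair(0.5))   # non-rank comparison
```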
The embodiment of the present application combines the characteristics of several models: the Word2Vec model converts the word segments in the sentences of the sentence pair into word vectors and generates the word vector matrix; a CNN model comprising a plurality of convolution layers with different convolution kernel widths extracts features from the word vector matrix and fuses features of word-segment sequences of different lengths to obtain the word vector features; considering that directly pooling the word vector features obtained by the CNN model easily loses much information and ignores the relations among the feature vectors of the individual words, the bidirectional long short-term memory network model processes the word vector features and characterizes the relations among the feature vectors of preceding and following words, obtaining the text feature sequence corresponding to the sentence pair; and the multi-head attention mechanism model determines the importance of each word's feature vector for judging whether the pair forms a rank comparison, so that the accuracy of the recognition result can be improved.
In another embodiment, the classifier in the step seven may further output a probability that the sentence pair is a non-rank-aligned sentence pair. And determining a sentence pair identification result corresponding to the sentence pair according to the probability that the sentence pair is a non-rank comparison sentence pair. For example, if the probability that a sentence pair is a non-rank-aligned sentence pair is less than or equal to a set minimum threshold, determining that the sentence pair is a rank-aligned sentence pair; if the probability of the sentence pair being a non-rank comparison sentence pair is greater than a set minimum threshold, determining that the sentence pair is a non-rank comparison sentence pair. Illustratively, the minimum threshold may be 0.2 or 0.3.
In another embodiment, the sentence pair recognition result may be determined by combining the probability that the sentence pair is a non-parallelism sentence pair with a preset parallelism condition. For example, if that probability is less than or equal to a set second threshold and the sentences in the pair satisfy the preset parallelism condition, the sentence pair is determined to be a parallelism sentence pair; if the probability is greater than the second threshold, or the sentences do not satisfy the preset parallelism condition, the sentence pair is determined to be a non-parallelism sentence pair. The second threshold may be, for example, 0.6 or 0.7.
Illustratively, in some embodiments, a score of the sentence pair may be determined according to the preset parallelism condition described above. For example, assume sentence pair Y consists of a first sentence and a second sentence. If the number of characters in either sentence is smaller than 4, sentence pair Y is a non-parallelism sentence pair. If both sentences contain at least 4 characters, the initial score of sentence pair Y is set to 0 and then accumulated as follows: if the difference between the character counts of the two sentences is less than or equal to 5, the score is increased by 0.5; if the difference between the word counts of the two sentences is less than or equal to 2, the score is increased by 1; if the matching rate of the punctuation marks in the two sentences is greater than or equal to 0.7, the score is increased by 0.5; if the part-of-speech similarity of the segmented words in the two sentences is greater than or equal to 0.65, the score is increased by 2; and if, for every word co-occurring in the two sentences, the distance between its positions in the two sentences is less than or equal to 0.15, the score is increased by 0.5.
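As a sketch of these scoring rules (hypothetical helper names; the per-feature comparisons such as punctuation match rate and part-of-speech similarity are assumed to be precomputed, and all thresholds follow the example values in the text):

```python
def parallelism_score(s1_chars, s2_chars, s1_words, s2_words,
                      punct_match, pos_sim, max_cooccur_dist):
    """Score a sentence pair against the preset parallelism conditions.

    Returns None when either sentence has fewer than 4 characters,
    i.e. the pair is immediately a non-parallelism pair.
    """
    if s1_chars < 4 or s2_chars < 4:
        return None
    score = 0.0
    if abs(s1_chars - s2_chars) <= 5:        # similar character counts
        score += 0.5
    if abs(s1_words - s2_words) <= 2:        # similar word counts
        score += 1.0
    if punct_match >= 0.7:                   # punctuation match rate
        score += 0.5
    if pos_sim >= 0.65:                      # part-of-speech similarity
        score += 2.0
    if max_cooccur_dist <= 0.15:             # co-occurring words aligned
        score += 0.5
    return score

def meets_parallelism_condition(score, threshold=4.0):
    """Threshold of 4 follows the example in the text."""
    return score is not None and score >= threshold
```

All thresholds here are the illustrative values from the text and would normally be tunable.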
The part-of-speech similarity of the segmented words can be determined using the Levenshtein distance. The Levenshtein distance measures the similarity between two sequences and equals the minimum number of editing operations required to transform the first sentence into the second. The editing operations may include inserting a character, deleting a character, and replacing a character. The smaller the number of editing operations, the smaller the distance between the first sentence and the second sentence, and the more similar the two sentences are.
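The classic dynamic-programming form of the Levenshtein distance can be sketched as follows; it works on strings as well as on lists (e.g. part-of-speech tag sequences, as used above):

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence `a` into sequence `b` (classic DP,
    keeping only the previous row of the DP table)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]
```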
For a word w that co-occurs in the two sentences, the distance S_w between its positions in the two sentences can be determined using the following formula:

S_w = | K_w1 / L_1 - K_w2 / L_2 |

where K_w1 is the position of the co-occurring word w in the first sentence, also called the index of w in the first sentence, i.e. which segmented word of the first sentence it is; K_w2 is the position, or index, of w in the second sentence; L_1 is the length of the first sentence, i.e. the number of characters it contains; and L_2 is the length of the second sentence, i.e. the number of characters it contains.
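A minimal sketch of this distance (hypothetical helper name; the formula is assumed to be the absolute difference of the word's relative positions, consistent with the definitions of K_w1, K_w2, L_1 and L_2 just given, and indices are taken as 1-based):

```python
def cooccurrence_distance(word, words1, words2, len1, len2):
    """Relative-position distance of a co-occurring word.

    words1/words2 are the segmented-word lists of the two sentences,
    len1/len2 their lengths in characters.
    """
    k1 = words1.index(word) + 1   # 1-based index in first sentence
    k2 = words2.index(word) + 1   # 1-based index in second sentence
    return abs(k1 / len1 - k2 / len2)
```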
The score of sentence pair Y is determined through the above process. If the score is greater than or equal to 4, sentence pair Y meets the preset parallelism condition; if the score is smaller than 4, it does not.
In another embodiment, the sentence pair recognition result may be determined by combining the probability that the sentence pair is a parallelism sentence pair, the probability that it is a non-parallelism sentence pair, and the preset parallelism condition. For example, the sentence pair is determined to be a parallelism sentence pair if either of two conditions is satisfied: the first condition is that the probability of being a parallelism sentence pair is greater than or equal to a first threshold; the second condition is that the probability of being a non-parallelism sentence pair is less than or equal to a second threshold and the sentences in the pair meet the preset parallelism condition. If neither condition is satisfied, the sentence pair is determined to be a non-parallelism sentence pair.
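The combined decision could be sketched as follows (hypothetical function; the default second threshold uses the 0.6 example value from the text, while the first threshold is an illustrative assumption, since no value is given for it):

```python
def classify_pair(p_parallel, p_non_parallel, meets_condition,
                  first_threshold=0.8, second_threshold=0.6):
    """Combine model probabilities with the rule-based condition.

    Returns True (parallelism sentence pair) when either condition holds:
      1) P(parallelism) >= first_threshold, or
      2) P(non-parallelism) <= second_threshold and the pair also
         meets the preset parallelism condition.
    """
    if p_parallel >= first_threshold:
        return True
    if p_non_parallel <= second_threshold and meets_condition:
        return True
    return False
```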
Combining model-based recognition with rule-based recognition in this way to identify parallelism sentences can further improve the accuracy of the recognition result.
Step S305, extracting parallelism sentences from the text to be recognized according to the sentence pair recognition result corresponding to each sentence pair.
Since a parallelism sentence generally contains three or more sentences, if the recognition results of two or more consecutive sentence pairs in the text to be recognized indicate parallelism sentence pairs, the sentences contained in those pairs can be considered to constitute a parallelism sentence. Extracting from the text the sentences contained in such consecutive positive pairs yields the parallelism sentences. A text to be recognized may contain several parallelism sentences; the sentences within one parallelism sentence are consecutive, while other sentences may lie between different parallelism sentences, i.e. sentence pairs belonging to different parallelism sentences do not satisfy the consecutive relation. The above method can therefore extract multiple parallelism sentences from the text to be recognized. The service server may send the recognition result to the user's terminal device, which displays it to the user.
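The extraction step amounts to grouping runs of consecutive positive sentence pairs; a minimal sketch (hypothetical function names), where `pair_is_parallel[i]` is the recognition result for the pair `(sentences[i], sentences[i+1])`:

```python
def extract_parallelism_sentences(sentences, pair_is_parallel):
    """Group consecutive positive sentence pairs into parallelism sentences.

    Two or more consecutive positive pairs, i.e. three or more
    sentences, form one parallelism sentence.
    """
    results, run_start, run_len = [], None, 0
    for i, flag in enumerate(pair_is_parallel):
        if flag:
            if run_start is None:
                run_start = i
            run_len += 1
        else:
            if run_len >= 2:   # run of >=2 pairs covers >=3 sentences
                results.append(sentences[run_start:run_start + run_len + 1])
            run_start, run_len = None, 0
    if run_len >= 2:
        results.append(sentences[run_start:run_start + run_len + 1])
    return results
```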
The text recognition method provided by the embodiments of the application can be used for automatic composition correction in online teaching and tutoring applications: extracting the parallelism sentence pairs in a student's composition yields the parallelism sentences in the composition. The method may also be executed by the terminal device 11. For example, in the "Chinese composition correction" application module shown in fig. 5, a parallelism sentence such as the following can be identified automatically: the blue sky gives us sunshine, letting us thrive; nature gives us green, giving us beautiful reveries; parents give us selfless love, letting us pass countless true feelings among people; teachers give us knowledge, turning our once-empty minds into magnificent halls; friends give us mutual help, so that we never walk alone.
This embodiment adopts CNN and Bi-LSTM models based on a multi-head attention mechanism, and can identify parallelism sentences more accurately.
In one embodiment, after the parallelism sentences in a composition are identified, they can be highlighted in the display interface. As shown in fig. 6, the parallelism sentences may be shown in bold or italic font so that they stand out in the composition and the teacher grading it can spot them easily. In another embodiment, after the parallelism sentences are identified, a score for the corresponding dimension of the composition may be produced, and comments may be given for the parallelism sentences in the composition.
For the machine learning or deep learning models to recognize parallelism sentence pairs accurately, each model used in the above process needs to be trained in advance. In one embodiment, as shown in FIG. 7, the training process includes the following steps:
step S701, a training sample set is acquired.
The training sample set includes sentence pair samples with category labels. A category label indicates whether the corresponding sentence pair sample is a positive sample or a negative sample: a positive sample is a sentence pair composed of two sentences extracted from a parallelism sentence, and a negative sample is a sentence pair composed of two randomly obtained sentences.
In one embodiment, a scrapy crawler tool may be used to crawl parallelism sentences and other articles from the network. Each crawled parallelism sentence can be split at its separator symbols (such as ';' or '。'), and the resulting sentences combined pairwise to obtain positive samples, whose category label is set to 1. Other articles, such as those containing metaphor sentences, personification sentences, or rhetorical-question sentences, are split in the same way, and their sentences are combined with each other, or with sentences split from parallelism sentences, to obtain negative samples, whose category label is set to 0. The training sample set may contain around 4000 negative samples.
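The positive-sample construction can be sketched as follows (hypothetical helper name; the exact separator set is an assumption):

```python
import itertools
import re

SEPARATORS = r"[;；。！!?？]"  # assumed clause separators

def build_positive_samples(parallelism_sentence):
    """Split a crawled parallelism sentence at separator symbols and
    combine the resulting clauses pairwise as positive samples (label 1)."""
    clauses = [c.strip()
               for c in re.split(SEPARATORS, parallelism_sentence)
               if c.strip()]
    return [((a, b), 1) for a, b in itertools.combinations(clauses, 2)]
```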
In another embodiment, multiple sentence pair samples may be obtained over a network or otherwise, for example by crawling articles and forming sentence pair samples from adjacent sentences. Category labels are then assigned as follows: if the sentences in a sentence pair sample meet the preset parallelism condition, its category label is set to positive; if they do not, the label is set to negative. The preset parallelism condition is as described above and is not repeated here. Labeling sentence pair samples in this way expands the model training corpus while reducing the manual labeling workload.
Step S702, sentence pair samples are extracted from the training sample set, and a word vector matrix corresponding to the extracted sentence pair samples is obtained.
Specifically, a jieba word segmentation tool can be used to segment and part-of-speech tag the sentences in the sample. Word vectors of the segmented words in each sentence are obtained through a Word2vec model and assembled into the word vector sequence corresponding to the sentence pair sample; likewise, the part-of-speech vectors of the segmented words are assembled into the part-of-speech vector sequence of the sample. For example, if the sample contains sentence 1 and sentence 2, the segmented words and parts of speech of each sentence are input into the Word2vec model in the form "sentence-1 word list [SEP] sentence-2 word list" and "sentence-1 part-of-speech list [SEP] sentence-2 part-of-speech list", producing the word vector sequence and part-of-speech vector sequence corresponding to the sample. The two sequences are then spliced to obtain the word vector matrix corresponding to the sentence pair sample.
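The splicing step can be sketched as follows (hypothetical function names; `word_vec` and `pos_vec` stand in for lookups into a trained Word2vec model and a part-of-speech embedding):

```python
def build_matrix(tokens1, tokens2, pos1, pos2, word_vec, pos_vec, sep="[SEP]"):
    """Build the sentence-pair word vector matrix: the two token lists are
    joined with a separator token, and each row splices the token's word
    vector with its part-of-speech vector."""
    tokens = tokens1 + [sep] + tokens2
    tags = pos1 + [sep] + pos2
    # one row per token: word vector spliced with part-of-speech vector
    return [word_vec(t) + pos_vec(p) for t, p in zip(tokens, tags)]
```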
Step S703, inputting the word vector matrix corresponding to the sentence pair sample into the feature extraction network model, and obtaining the word vector feature of the word vector matrix corresponding to the sentence pair sample.
Step S704, word vector features of sentence pair samples are input into a two-way long-short-term memory network model to be trained, and a text feature sequence corresponding to the sentence pair samples is obtained.
Step S705, inputting the text feature sequence corresponding to the sentence pair sample into the multi-head attention mechanism model to obtain the text feature vector corresponding to the sentence pair sample.
Step S706, classifying the text feature vector corresponding to the sentence pair sample through a classifier to obtain the classification result of the sentence pair sample.
In some embodiments, the text feature vector corresponding to the sentence pair sample may be input into a fully connected layer, and the output of the fully connected layer is classified by the classifier to obtain the classification result of the sentence pair sample.
Step S707, determining a loss value according to the classification result of the sentence pair sample and the category label of the sentence pair sample.
The loss function used to calculate the loss value may be, but is not limited to, a cross-entropy loss function, a contrastive loss function (Contrastive Loss), or a Center Loss function.
Step S708, judging whether the loss value converges to a preset expected value; if yes, go to step S710; if not, step S709 is performed.
Step S709, network parameters of the two-way long-short term memory network model and the multi-head attention mechanism model are adjusted according to the loss value.
After the model parameters are adjusted, the process returns to step S702, and the training process of the next round is continued.
Step S710, taking the current parameters as the parameters of the two-way long-short-term memory network model and the multi-head attention mechanism model, thereby obtaining the trained two-way long-short-term memory network model and multi-head attention mechanism model. That is, once the loss value converges to the set expected value, the current parameters are kept as the final parameters of the two models.
In the above training process, the feature extraction network model and the classifier may be pre-trained models; when the model parameters are adjusted according to the loss value, only the network parameters of the two-way long-short-term memory network model and the multi-head attention mechanism model are adjusted, while those of the feature extraction network model and the classifier are left unchanged. In another embodiment, the feature extraction network model, the two-way long-short-term memory network model, the multi-head attention mechanism model and the classifier may be trained together; the training process still follows fig. 7, but in this case the network parameters of all four models are adjusted according to the loss value. Illustratively, the parameters of each layer can be optimized with the Adam algorithm, an optimizer used in deep learning in place of plain stochastic gradient descent that copes well with sparse gradients and noise. During training, the expected value may be set to 0.00125, and each batch input to the model to be trained may contain 256 samples.
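The loop of steps S702 to S710 can be sketched in a framework-free form (hypothetical names; a plain gradient step stands in for Adam, and the toy loss below is only to make the convergence check concrete):

```python
def train_until_converged(params, grad_fn, loss_fn, lr=0.1,
                          expected=0.00125, max_rounds=10000):
    """Skeleton of the training loop of fig. 7: compute the loss,
    stop when it converges to the preset expected value (step S708/S710),
    otherwise adjust the parameters from the gradient (step S709)."""
    for _ in range(max_rounds):
        loss = loss_fn(params)
        if loss <= expected:                      # converged: keep params
            return params, loss
        g = grad_fn(params)                       # adjust parameters
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params, loss_fn(params)
```

With a real model, `loss_fn` would run the BiLSTM, multi-head attention and classifier on a 256-sample batch and `grad_fn` would come from backpropagation.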
In one embodiment, after one round of training is completed, new sentence pair samples can be obtained and labeled by the trained model combined with the preset parallelism condition to determine their category labels; the labeling results are then corrected manually, and the labeled sentence pair samples are added to the training data set to continue training the model. Labeling samples with the trained model combined with the preset parallelism condition greatly reduces the workload of manually labeling data.
Corresponding to the embodiment of the text recognition method, the embodiment of the application also provides a text recognition device. Fig. 8 is a schematic structural diagram of a text recognition device according to an embodiment of the present application; as shown in fig. 8, the text recognition apparatus includes a feature extraction unit 81, a feature processing unit 82, and a feature recognition unit 83.
The feature extraction unit 81 is configured to obtain a word vector matrix corresponding to a sentence pair composed of two sentences, and extract a word vector feature of the word vector matrix;
the feature processing unit 82 is configured to input the word vector features into the two-way long-short-term memory network model to obtain the text feature sequence corresponding to the sentence pair, and to input the text feature sequence into the multi-head attention mechanism model to obtain the text feature vector corresponding to the sentence pair; the two-way long-short-term memory network model and the multi-head attention mechanism model are trained with training samples carrying category labels; a category label indicates whether the corresponding training sample is a positive sample or a negative sample, where a positive sample is a sentence pair sample composed of two sentences extracted from a parallelism sentence and a negative sample is a sentence pair sample composed of two randomly obtained sentences;
a feature recognition unit 83, configured to determine the sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair; the sentence pair recognition result includes that the sentence pair is a parallelism sentence pair or that the sentence pair is a non-parallelism sentence pair.
In an alternative embodiment, the feature extraction unit 81 is specifically configured to:
generating a word vector sequence corresponding to the sentence pair according to the word vector of the word segmentation contained in each sentence in the sentence pair;
generating a part-of-speech vector sequence corresponding to each sentence pair according to the part-of-speech vector of the word segmentation contained in each sentence in the sentence pair;
splicing the word vector sequences corresponding to the sentence pairs with the part-of-speech vector sequences to obtain word vector matrixes corresponding to the sentence pairs;
and inputting the word vector matrix into a feature extraction network model to obtain the word vector features of the word vector matrix.
In an alternative embodiment, the feature extraction network model includes a plurality of convolution layers of different convolution kernel widths; the feature extraction unit 81 is specifically configured to:
respectively inputting the word vector matrix into each convolution layer to obtain a feature vector output by each convolution layer;
and splicing all the obtained feature vectors to obtain the word vector features of the word vector matrix.
In an alternative embodiment, the multi-headed attentiveness mechanism model includes a plurality of attentiveness subnetworks having different network parameters; the feature processing unit 82 is specifically configured to:
And respectively inputting the text feature sequences into each attention sub-network, and splicing the outputs of all the attention sub-networks to obtain text feature vectors corresponding to sentence pairs.
In an alternative embodiment, the feature recognition unit 83 is specifically configured to:
determining the probability that the sentence pair is a parallelism sentence pair based on the text feature vector corresponding to the sentence pair, and, if that probability is greater than or equal to a first threshold, determining that the sentence pair is a parallelism sentence pair; or,
determining the probability that the sentence pair is a non-parallelism sentence pair based on the text feature vector corresponding to the sentence pair, and, if that probability is less than or equal to a second threshold and the sentences in the pair meet the preset parallelism condition, determining that the sentence pair is a parallelism sentence pair.
In an alternative embodiment, the feature recognition unit 83 is specifically configured to:
inputting the text feature vector corresponding to the sentence pair into a fully connected layer, and classifying the output of the fully connected layer through a classifier to obtain the probability that the sentence pair is a parallelism sentence pair.
In an alternative embodiment, the sentences included in the sentence pair satisfy a preset parallelism condition comprising some or all of the following conditions:
the number of characters or word fragments contained in each sentence in the sentence pair is greater than or equal to the set number;
The difference of the number of characters contained in the two sentences in the sentence pair is smaller than or equal to a first set difference;
the difference value of the number of the segmented words contained in the two sentences in the sentence pair is smaller than or equal to a second set difference value;
the matching rate of punctuation marks contained in two sentences in the sentence pair is greater than or equal to a set matching threshold value;
the part-of-speech similarity of the segmented words contained in the two sentences in the sentence pair is greater than or equal to a set similarity value;
for co-occurrence words in the two sentences, the distance between the positions of each co-occurrence word in the two sentences is less than or equal to a set distance value.
In an alternative embodiment, as shown in fig. 9, the apparatus further includes a data acquiring unit 92 configured to:
dividing the text to be recognized into a plurality of sentences according to the specified separator;
sequentially combining at least two adjacent sentences into a sentence pair;
the feature recognition unit 83 is further configured to:
extracting parallelism sentences from the text to be recognized according to the sentence pair recognition result corresponding to each sentence pair.
In an alternative embodiment, as shown in fig. 9, the apparatus may further include a model training unit 91 for training the network model used in the above embodiment.
In an alternative embodiment, model training unit 91 is specifically configured to:
acquiring a training sample set, wherein the training sample set comprises sentence pair samples with category labels;
extracting sentence pair samples from the training sample set, and obtaining word vector characteristics of the extracted sentence pair samples;
inputting word vector features of sentence pair samples into a two-way long-short-term memory network model to be trained to obtain text feature sequences corresponding to the sentence pair samples;
inputting the text feature sequence corresponding to the sentence pair sample into a multi-head attention mechanism model to obtain a text feature vector corresponding to the sentence pair sample;
classifying text feature vectors corresponding to the sentence pair samples through a classifier to obtain classification results of the sentence pair samples;
determining a loss value according to the classification result of the sentence pair sample and the category label of the sentence pair sample;
and adjusting network parameters of the two-way long-short-term memory network model and the multi-head attention mechanism model according to the loss value until the loss value converges to a preset expected value, and obtaining the trained two-way long-short-term memory network model and multi-head attention mechanism model.
In an alternative embodiment, the model training unit 91 may be further configured to:
if the sentences contained in a sentence pair sample meet the preset parallelism condition, setting the category label of the sentence pair sample as a positive sample;
if the sentences contained in a sentence pair sample do not meet the preset parallelism condition, setting the category label of the sentence pair sample as a negative sample.
According to the text recognition device provided by the embodiments of the application, after the word vector features corresponding to a sentence pair composed of two sentences are obtained, the word vector features are processed by the two-way long-short-term memory network model, which characterizes the relations between preceding and following words, to obtain the text feature sequence corresponding to the sentence pair; the multi-head attention mechanism model then determines how important each word is for judging whether the sentence pair is a parallelism sentence pair, producing the text feature vector corresponding to the sentence pair; and whether the pair is a parallelism sentence pair is determined from that vector. This improves the accuracy of the recognition result, so that the parallelism sentences in a text can be identified accurately.
Corresponding to the embodiment of the text recognition method, the embodiment of the application also provides electronic equipment. The electronic device may be a server, such as the service server 12 shown in fig. 1, or a terminal device, such as a mobile terminal or a computer, such as the terminal device 11 shown in fig. 1.
The electronic device comprises at least a memory for storing data and a processor for data processing. The processor may be a microprocessor, a CPU, a GPU (Graphics Processing Unit), a DSP, or an FPGA. The memory stores operation instructions, which may be computer-executable code, through which each step in the flow of the text recognition method of the embodiments of the application is implemented.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application; as shown in fig. 10, the electronic device 100 according to the embodiment of the present application includes: a processor 101, a display 102, a memory 103, an input device 106, a bus 105, and a communication module 104; the processor 101, memory 103, input device 106, display 102, and communication module 104 are all coupled via a bus 105, and the bus 105 is used to transfer data between the processor 101, memory 103, display 102, communication module 104, and input device 106.
The memory 103 may be used to store software programs and modules, such as program instructions/modules corresponding to the text recognition method in the embodiment of the present application, and the processor 101 executes the software programs and modules stored in the memory 103, thereby executing various functional applications and data processing of the electronic device 100, such as the text recognition method provided by the embodiment of the present application. The memory 103 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program of at least one application, and the like; the storage data area may store data created according to the use of the electronic device 100 (such as text interpretation information, and related data of each trained network model, etc.), and the like. In addition, memory 103 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The processor 101 is a control center of the electronic device 100, connects various parts of the entire electronic device 100 using the bus 105 and various interfaces and lines, and performs various functions of the electronic device 100 and processes data by running or executing software programs and/or modules stored in the memory 103, and invoking data stored in the memory 103. Alternatively, the processor 101 may include one or more processing units, such as a CPU, GPU, digital processing unit, or the like.
The processor 101 may present the processing results of the text data to the user via the display 102.
The processor 101 may also be connected to a network through the communication module 104 to obtain text data, etc.
The input device 106 is mainly used to obtain input operations by a user, and the input device 106 may be different when the electronic devices are different. For example, when the electronic device is a computer, the input device 106 may be an input device such as a mouse, keyboard, etc.; when the electronic device is a portable device such as a smart phone or a tablet computer, the input device 106 may be a touch screen.
The embodiment of the application also provides a computer storage medium, wherein the computer storage medium is stored with computer executable instructions for realizing the text recognition method according to any embodiment of the application.
In some possible embodiments, aspects of the text recognition method provided by the present application may also be implemented in the form of a program product, which includes a program code for causing a computer device to perform the steps of the text recognition method according to the various exemplary embodiments of the present application described above when the program product is run on the computer device, for example, the computer device may perform the flow of the text recognition method as shown in fig. 2 in steps S201 to S204.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random Access Memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing merely describes specific embodiments of the present application; the present application is not limited thereto, and variations or substitutions readily conceived by any person skilled in the art fall within the scope of the present application.

Claims (14)

1. A text recognition method, comprising:
dividing a text to be recognized into a plurality of sentences according to specified separators, and constructing sentence pairs from two sentences among the plurality of sentences;
obtaining a word vector matrix corresponding to a sentence pair composed of two sentences, and extracting word vector features of the word vector matrix through a feature extraction network model, wherein the feature extraction network model comprises a residual network and a feature pyramid network;
obtaining a text feature sequence corresponding to the sentence pair according to the word vector features, wherein the text feature sequence comprises word vector feature elements corresponding to the segmented words contained in each sentence of the sentence pair;
converting the text feature sequence into a text feature vector corresponding to the sentence pair according to weights of the word vector feature elements corresponding to the segmented words, wherein the weight of a word vector feature element corresponding to a segmented word represents the importance of that segmented word in judging whether the sentence pair is a parallelism sentence pair;
determining a sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair, wherein the sentence pair recognition result indicates that the sentence pair is a parallelism sentence pair or a non-parallelism sentence pair;
wherein constructing sentence pairs from two sentences among the plurality of sentences comprises:
forming a sentence pair from each sentence and an adjacent sentence; or
forming a sentence pair from each odd-numbered sentence and the adjacent even-numbered sentence;
wherein determining the sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair comprises:
determining a probability that the sentence pair is a non-parallelism sentence pair based on the text feature vector corresponding to the sentence pair; and if the probability that the sentence pair is a non-parallelism sentence pair is less than or equal to a second threshold, and the sentences contained in the sentence pair satisfy a preset parallelism condition, determining that the sentence pair is a parallelism sentence pair.
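The sentence splitting and pairing recited in claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the separator set and the exact pairing strategies are assumptions for demonstration.

```python
import re

def build_sentence_pairs(text, pair_odd_even=False):
    """Split text on specified separators and construct sentence pairs.

    Two strategies from the claim: pair each sentence with its adjacent
    sentence, or pair each odd-numbered sentence with the adjacent
    even-numbered sentence.
    """
    # Split on an assumed set of Chinese/English sentence separators.
    sentences = [s.strip() for s in re.split(r"[。！？；;!?\n]", text) if s.strip()]
    if pair_odd_even:
        # Sentence 1 with 2, sentence 3 with 4, and so on.
        return list(zip(sentences[0::2], sentences[1::2]))
    # Every sentence paired with the sentence that follows it.
    return list(zip(sentences, sentences[1:]))

pairs = build_sentence_pairs("A! B? C; D!")
```

Adjacent pairing yields overlapping candidates, while odd/even pairing yields disjoint ones; both produce the sentence pairs that the downstream feature extraction consumes.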
2. The method of claim 1, wherein obtaining the word vector matrix corresponding to the sentence pair composed of two sentences and extracting the word vector features of the word vector matrix comprises:
generating a word vector sequence corresponding to the sentence pair according to the word vectors of the segmented words contained in each sentence of the sentence pair;
generating a part-of-speech vector sequence corresponding to the sentence pair according to the part-of-speech vectors of the segmented words contained in each sentence of the sentence pair;
concatenating the word vector sequence corresponding to the sentence pair with the part-of-speech vector sequence to obtain the word vector matrix corresponding to the sentence pair;
and inputting the word vector matrix into the feature extraction network model to obtain the word vector features of the word vector matrix.
3. The method of claim 2, wherein the feature extraction network model comprises a plurality of convolution layers with different convolution kernel widths, and inputting the word vector matrix into the feature extraction network model to obtain the word vector features of the word vector matrix comprises:
inputting the word vector matrix into each convolution layer respectively to obtain a feature vector output by each convolution layer;
and concatenating all the obtained feature vectors to obtain the word vector features of the word vector matrix.
4. The method of claim 1, wherein obtaining the text feature sequence corresponding to the sentence pair according to the word vector features comprises:
inputting the word vector features into a bidirectional long short-term memory (BiLSTM) network model to obtain the text feature sequence corresponding to the sentence pair output by the BiLSTM network model;
wherein converting the text feature sequence into the text feature vector corresponding to the sentence pair according to the weights of the word vector feature elements corresponding to the segmented words comprises:
inputting the text feature sequence into a multi-head attention mechanism model to obtain the text feature vector corresponding to the sentence pair output by the multi-head attention mechanism model, wherein the multi-head attention mechanism model converts the text feature sequence into the text feature vector according to the weights of the word vector feature elements corresponding to the segmented words;
wherein the BiLSTM network model and the multi-head attention mechanism model are trained with training samples carrying category labels; a category label indicates that the corresponding training sample is a positive sample or a negative sample, a positive sample being a sentence pair sample composed of two sentences extracted from a parallelism sentence, and a negative sample being a sentence pair sample composed of two randomly obtained sentences.
5. The method of claim 4, wherein the multi-head attention mechanism model comprises a plurality of attention sub-networks with different network parameters, and inputting the text feature sequence into the multi-head attention mechanism model to obtain the text feature vector corresponding to the sentence pair comprises:
inputting the text feature sequence into each attention sub-network respectively, and concatenating the outputs of all the attention sub-networks to obtain the text feature vector corresponding to the sentence pair.
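The concatenation of attention sub-network outputs in claim 5 can be illustrated with a toy sketch. The "heads" here are simple weighted averages standing in for trained attention sub-networks with different parameters; they are illustrative assumptions, not the patented networks.

```python
def attention_head(weights):
    """Build a toy attention sub-network: a normalized weighted sum of
    the sequence elements, with one scalar weight per position."""
    def head(seq):
        total = sum(weights)
        return [sum(w * x for w, x in zip(weights, seq)) / total]
    return head

def multi_head(seq, heads):
    """Run the sequence through every sub-network and concatenate the
    outputs into the sentence pair's text feature vector (claim 5)."""
    vec = []
    for h in heads:
        vec.extend(h(seq))
    return vec

# Two heads with different (assumed) parameters attend differently.
heads = [attention_head([1.0, 1.0, 1.0]), attention_head([0.0, 0.0, 1.0])]
feature_vector = multi_head([1.0, 2.0, 3.0], heads)  # one scalar per head
```

The resulting vector's dimensionality grows with the number of heads, which is why the fully connected layer in claim 6 follows the concatenation.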
6. The method of claim 1, wherein determining the probability that the sentence pair is a parallelism sentence pair based on the text feature vector corresponding to the sentence pair comprises:
inputting the text feature vector corresponding to the sentence pair into a fully connected layer, and classifying the output of the fully connected layer through a classifier to obtain the probability that the sentence pair is a parallelism sentence pair.
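The fully connected layer plus classifier of claim 6 can be sketched by hand; the weights below are illustrative toy values, not trained parameters, and softmax stands in for the unspecified classifier.

```python
import math

def fully_connected(x, weights, bias):
    """One dense layer: y_j = sum_i x_i * W[i][j] + b_j."""
    return [sum(xi * w for xi, w in zip(x, col)) + b
            for col, b in zip(zip(*weights), bias)]

def softmax(logits):
    """Convert the dense layer's logits into class probabilities."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy text feature vector and a 2-class head:
# class 0 = parallelism pair, class 1 = non-parallelism pair.
feature = [0.5, -1.0, 2.0]
W = [[0.1, -0.1], [0.2, 0.3], [0.4, -0.2]]  # 3 inputs x 2 classes
b = [0.0, 0.1]
probs = softmax(fully_connected(feature, W, b))
```

The second threshold in claim 1 is then compared against the non-parallelism probability to decide the recognition result.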
7. The method of claim 1, wherein the sentences contained in the sentence pair satisfying the preset parallelism condition comprises satisfying some or all of the following conditions:
the number of characters or segmented words contained in each sentence of the sentence pair is greater than or equal to a set number;
the difference between the numbers of characters contained in the two sentences of the sentence pair is less than or equal to a first set difference;
the difference between the numbers of segmented words contained in the two sentences of the sentence pair is less than or equal to a second set difference;
the matching rate of the punctuation marks contained in the two sentences of the sentence pair is greater than or equal to a set matching threshold;
the part-of-speech similarity of the segmented words contained in the two sentences of the sentence pair is greater than or equal to a set similarity value;
and for words co-occurring in the two sentences, the distance between the positions of each co-occurring word in the two sentences is less than or equal to a set distance value.
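A subset of claim 7's conditions can be checked with a short sketch. Only the token-count conditions and the co-occurrence distance are shown, and the thresholds (`min_tokens`, `max_len_diff`, distance of 1) are illustrative assumptions, not the "set" values of the claim.

```python
def meets_parallelism_conditions(s1_tokens, s2_tokens,
                                 min_tokens=3, max_len_diff=2):
    """Check a subset of the preset parallelism conditions (claim 7)."""
    # Each sentence must contain at least the set number of tokens.
    if min(len(s1_tokens), len(s2_tokens)) < min_tokens:
        return False
    # The token-count difference must be within the set difference.
    if abs(len(s1_tokens) - len(s2_tokens)) > max_len_diff:
        return False
    # For words occurring in both sentences, their positions must be close.
    common = set(s1_tokens) & set(s2_tokens)
    for w in common:
        if abs(s1_tokens.index(w) - s2_tokens.index(w)) > 1:
            return False
    return True
```

Because the condition check is rule-based and cheap, it serves as a filter alongside the model's probability in claim 1's decision step.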
8. The method of claim 1, wherein after determining the sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair, the method further comprises:
extracting parallelism sentences from the text to be recognized according to the sentence pair recognition result corresponding to each sentence pair.
9. The method of claim 4, wherein the training process of the BiLSTM network model and the multi-head attention mechanism model comprises:
acquiring a training sample set, wherein the training sample set comprises sentence pair samples carrying category labels;
extracting a sentence pair sample from the training sample set, and obtaining word vector features of the extracted sentence pair sample;
inputting the word vector features of the sentence pair sample into the BiLSTM network model to be trained to obtain a text feature sequence corresponding to the sentence pair sample;
inputting the text feature sequence corresponding to the sentence pair sample into the multi-head attention mechanism model to obtain a text feature vector corresponding to the sentence pair sample;
classifying the text feature vector corresponding to the sentence pair sample through a classifier to obtain a classification result of the sentence pair sample;
determining a loss value according to the classification result of the sentence pair sample and the category label of the sentence pair sample;
and adjusting network parameters of the BiLSTM network model and the multi-head attention mechanism model according to the loss value until the loss value converges to a preset expected value, thereby obtaining the trained BiLSTM network model and multi-head attention mechanism model.
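The loss-determination step of claim 9 can be illustrated with cross-entropy, a common choice for a two-class classifier; the claim does not name a specific loss, so this is an assumed instantiation with toy probabilities.

```python
import math

def cross_entropy(probs, label):
    """Loss for one sentence pair sample: -log p(true class)."""
    return -math.log(probs[label])

# Classifier output for one sample (probabilities for
# [negative, positive]) and its category label.
probs = [0.2, 0.8]
label = 1  # positive sample: pair drawn from a parallelism sentence
loss = cross_entropy(probs, label)
# Training adjusts the BiLSTM and attention parameters to drive this
# loss down until it converges to the preset expected value.
```

A confident correct prediction yields a small loss, so convergence of the loss value is a proxy for the models having fit the labeled sentence pair samples.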
10. The method of claim 9, wherein the category labels of the sentence pair samples are labeled as follows:
if the sentences contained in a sentence pair sample satisfy the preset parallelism condition, setting the category label of the sentence pair sample to positive sample;
and if the sentences contained in a sentence pair sample do not satisfy the preset parallelism condition, setting the category label of the sentence pair sample to negative sample.
11. A text recognition device, comprising:
a data acquisition unit, configured to divide a text to be recognized into a plurality of sentences according to specified separators, and construct sentence pairs from two sentences among the plurality of sentences;
a feature extraction unit, configured to obtain a word vector matrix corresponding to a sentence pair composed of two sentences, and extract word vector features of the word vector matrix through a feature extraction network model, wherein the feature extraction network model comprises a residual network and a feature pyramid network;
a feature processing unit, configured to obtain a text feature sequence corresponding to the sentence pair according to the word vector features, wherein the text feature sequence comprises word vector feature elements corresponding to the segmented words contained in each sentence of the sentence pair; and to convert the text feature sequence into a text feature vector corresponding to the sentence pair according to weights of the word vector feature elements corresponding to the segmented words, wherein the weight of a word vector feature element corresponding to a segmented word represents the importance of that segmented word in judging whether the sentence pair is a parallelism sentence pair;
a feature recognition unit, configured to determine a sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair, wherein the sentence pair recognition result indicates that the sentence pair is a parallelism sentence pair or a non-parallelism sentence pair;
wherein constructing sentence pairs from two sentences among the plurality of sentences comprises:
forming a sentence pair from each sentence and an adjacent sentence; or
forming a sentence pair from each odd-numbered sentence and the adjacent even-numbered sentence;
wherein determining the sentence pair recognition result corresponding to the sentence pair based on the text feature vector corresponding to the sentence pair comprises:
determining a probability that the sentence pair is a non-parallelism sentence pair based on the text feature vector corresponding to the sentence pair; and if the probability that the sentence pair is a non-parallelism sentence pair is less than or equal to a second threshold, and the sentences contained in the sentence pair satisfy a preset parallelism condition, determining that the sentence pair is a parallelism sentence pair.
12. The device of claim 11, wherein the feature extraction unit is specifically configured to:
generate a word vector sequence corresponding to the sentence pair according to the word vectors of the segmented words contained in each sentence of the sentence pair;
generate a part-of-speech vector sequence corresponding to the sentence pair according to the part-of-speech vectors of the segmented words contained in each sentence of the sentence pair;
concatenate the word vector sequence corresponding to the sentence pair with the part-of-speech vector sequence to obtain the word vector matrix corresponding to the sentence pair;
and input the word vector matrix into the feature extraction network model to obtain the word vector features of the word vector matrix.
13. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-10.
14. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements the method of any one of claims 1-10.
CN202010752677.6A 2020-07-30 2020-07-30 Text recognition method, device, storage medium and electronic equipment Active CN111898374B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010752677.6A CN111898374B (en) 2020-07-30 2020-07-30 Text recognition method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010752677.6A CN111898374B (en) 2020-07-30 2020-07-30 Text recognition method, device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111898374A CN111898374A (en) 2020-11-06
CN111898374B true CN111898374B (en) 2023-11-07

Family

ID=73182657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010752677.6A Active CN111898374B (en) 2020-07-30 2020-07-30 Text recognition method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111898374B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905777B (en) * 2021-03-19 2023-10-17 北京百度网讯科技有限公司 Extended query recommendation method and device, electronic equipment and storage medium
CN113657086B (en) * 2021-08-09 2023-08-15 腾讯科技(深圳)有限公司 Word processing method, device, equipment and storage medium
CN114168104A (en) * 2021-12-08 2022-03-11 杭州电子科技大学 Scene character interactive understanding system for visually impaired people
CN114742035B (en) * 2022-05-19 2023-07-07 北京百度网讯科技有限公司 Text processing method and network model training method based on attention mechanism optimization
CN116737940B (en) * 2023-08-14 2023-11-07 成都飞航智云科技有限公司 Intelligent decision method and decision system
CN116934468B (en) * 2023-09-15 2023-12-22 成都运荔枝科技有限公司 Trusted client grading method based on semantic recognition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005190284A (en) * 2003-12-26 2005-07-14 Nec Corp Information classification device and method
CN107943852A (en) * 2017-11-06 2018-04-20 首都师范大学 Chinese parallelism sentence recognition methods and system
CN108241609A (en) * 2016-12-23 2018-07-03 科大讯飞股份有限公司 The recognition methods of parallelism sentence and system
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN110427852A (en) * 2019-07-24 2019-11-08 北京旷视科技有限公司 Character recognition method, device, computer equipment and storage medium
CN110598202A (en) * 2019-06-20 2019-12-20 华中师范大学 Method for automatically identifying primary school Chinese composition ranking sentences
CN110990559A (en) * 2018-09-29 2020-04-10 北京国双科技有限公司 Method and apparatus for classifying text, storage medium, and processor
CN110990564A (en) * 2019-11-19 2020-04-10 北京信息科技大学 Negative news identification method based on emotion calculation and multi-head attention mechanism

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI608367B (en) * 2012-01-11 2017-12-11 國立臺灣師範大學 Text readability measuring system and method thereof

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005190284A (en) * 2003-12-26 2005-07-14 Nec Corp Information classification device and method
CN108241609A (en) * 2016-12-23 2018-07-03 科大讯飞股份有限公司 The recognition methods of parallelism sentence and system
CN107943852A (en) * 2017-11-06 2018-04-20 首都师范大学 Chinese parallelism sentence recognition methods and system
CN110990559A (en) * 2018-09-29 2020-04-10 北京国双科技有限公司 Method and apparatus for classifying text, storage medium, and processor
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN110598202A (en) * 2019-06-20 2019-12-20 华中师范大学 Method for automatically identifying primary school Chinese composition ranking sentences
CN110427852A (en) * 2019-07-24 2019-11-08 北京旷视科技有限公司 Character recognition method, device, computer equipment and storage medium
CN110990564A (en) * 2019-11-19 2020-04-10 北京信息科技大学 Negative news identification method based on emotion calculation and multi-head attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallelism sentence recognition and application combining CNN and structural similarity computation; Mu Wanqing; Liao Jian; Wang Suge; Journal of Chinese Information Processing (No. 02); pp. 139-146 *

Also Published As

Publication number Publication date
CN111898374A (en) 2020-11-06

Similar Documents

Publication Publication Date Title
CN111898374B (en) Text recognition method, device, storage medium and electronic equipment
CN111767405B (en) Training method, device, equipment and storage medium of text classification model
CN111708873B (en) Intelligent question-answering method, intelligent question-answering device, computer equipment and storage medium
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN111444340B (en) Text classification method, device, equipment and storage medium
CN108875074B (en) Answer selection method and device based on cross attention neural network and electronic equipment
CN110222163B (en) Intelligent question-answering method and system integrating CNN and bidirectional LSTM
CN110059160B (en) End-to-end context-based knowledge base question-answering method and device
CN111259127B (en) Long text answer selection method based on transfer learning sentence vector
WO2021121198A1 (en) Semantic similarity-based entity relation extraction method and apparatus, device and medium
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN108536735B (en) Multi-mode vocabulary representation method and system based on multi-channel self-encoder
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN112131345B (en) Text quality recognition method, device, equipment and storage medium
CN111368555B (en) Data identification method and device, storage medium and electronic equipment
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN110889505A (en) Cross-media comprehensive reasoning method and system for matching image-text sequences
CN110795544A (en) Content search method, device, equipment and storage medium
CN114239599A (en) Method, system, equipment and medium for realizing machine reading understanding
CN112905750A (en) Generation method and device of optimization model
US20230153335A1 (en) Searchable data structure for electronic documents
CN111767720A (en) Title generation method, computer and readable storage medium
CN116975221A (en) Text reading and understanding method, device, equipment and storage medium
CN114398903B (en) Intention recognition method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant