CN113850235A - Text processing method, device, equipment and medium - Google Patents


Info

Publication number
CN113850235A
Authority
CN
China
Prior art keywords
text
sample set
test question
character string
question
Prior art date
Legal status
Granted
Application number
CN202111416713.2A
Other languages
Chinese (zh)
Other versions
CN113850235B (en)
Inventor
李盼盼
秦勇
Current Assignee
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd
Priority to CN202111416713.2A
Publication of CN113850235A
Application granted
Publication of CN113850235B
Legal status: Active
Anticipated expiration

Classifications

    • G: Physics
    • G06: Computing; Calculating or Counting
    • G06F: Electric Digital Data Processing
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: Computing Arrangements Based on Specific Computational Models
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks


Abstract

The present disclosure relates to a text processing method, apparatus, device, and medium. The method comprises: acquiring an image to be processed that contains a text object; detecting the image through a detection model to obtain target detection frames, the target detection frames including a target test question detection frame surrounding the test question and a target answer detection frame surrounding the answer; recognizing the target test question detection frame and the target answer detection frame through a recognition model to obtain a test question character string and an answer character string; and executing question judging processing based on a test question discrimination model, the test question character string, and the answer character string. The test question discrimination model is obtained by training a generator on a word sample set and a text sample set, where the text sample set includes a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers to test questions. The method can improve both the speed and the accuracy of question judging.

Description

Text processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a text processing method, apparatus, device, and medium.
Background
Photo-based question judging is an important application of artificial intelligence in education. Current photo question judging technology achieves good correction results on questions that can be corrected by logic alone; however, for question types that carry semantic information, such as multiple-choice, fill-in-the-blank, and true/false questions, its processing capability is severely limited, both judging speed and accuracy suffer, and it cannot meet users' need to handle these question types effectively.
Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the present disclosure provides a text processing method, apparatus, device, and medium.
According to an aspect of the present disclosure, there is provided a text processing method including:
acquiring an image to be processed containing a text object;
detecting the image to be processed through a detection model to obtain a target detection frame; wherein the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers;
respectively identifying the target test question detection frame and the target answer detection frame through an identification model to obtain a test question character string and an answer character string;
executing question judging processing based on a test question discrimination model, the test question character string, and the answer character string; wherein the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set, and the text sample set includes: a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers to test questions.
According to another aspect of the present disclosure, there is provided a text processing apparatus including:
the image acquisition module is used for acquiring an image to be processed containing a text object;
the image detection module is used for detecting the image to be processed through a detection model to obtain a target detection frame; wherein the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers;
the text recognition module is used for respectively recognizing the target test question detection box and the target answer detection box through a recognition model to obtain a test question character string and an answer character string;
the question judging module is used for executing question judging processing based on the test question discrimination model, the test question character string, and the answer character string; wherein the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set, and the text sample set includes: a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers to test questions.
According to another aspect of the present disclosure, there is provided an electronic apparatus including: a processor; and a memory storing a program, wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the text processing method according to the above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the text processing method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure has the following advantages:
the embodiment of the disclosure provides a text processing method, a text processing device, text processing equipment and a text processing medium, wherein the method comprises the following steps: acquiring an image to be processed containing a text object; detecting the image to be processed through a detection model to obtain a target detection frame; wherein, the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers; respectively identifying the target test question detection frame and the target answer detection frame through an identification model to obtain a test question character string and an answer character string; executing question judging processing based on the question judging model, the test question character string and the answer character string; the test question distinguishing model is obtained by training a generator based on a word sample set and a text sample set; the text sample set includes: and describing the first text sample set obtained based on the standard question stem of the test question and/or the second text sample set obtained based on the standard answer of the test question. The method and the device can improve the speed and the accuracy of judging the questions.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure or of the prior art, the drawings needed in describing the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be derived from them without inventive effort.
Fig. 1 is a flowchart of a text processing method provided in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a correspondence between training data sets and models provided by an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a text processing apparatus according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure can be more clearly understood, embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Question types that carry semantic information, such as multiple-choice, fill-in-the-blank, and true/false questions, are very common in teaching and exist in large numbers; how well they are handled directly determines the applicability of photo question judging, and a broad user base needs them to be processed automatically. However, current photo question judging technology is severely limited in its ability to process these question types, and its judging speed and accuracy are poor; despite the strong demand, there is still no method that can reliably batch-correct question types with semantic information, so users' need to process them effectively goes unmet. To solve this problem, embodiments of the present disclosure provide a text processing method, apparatus, device, and medium, described in detail below.
Referring to a flowchart of a text processing method provided in fig. 1, the method may include the following steps:
step S102, acquiring an image to be processed containing a text object.
In practical applications, the image to be processed may be obtained by a user through an image selection operation, an image capturing operation, an image uploading operation, or the like, on a terminal. The text object contained in the image to be processed may be handwritten and/or printed, and may be in a language other than Chinese or English; the text object is not specifically limited here.
Step S104, detecting the image to be processed through a detection model to obtain target detection frames; wherein the target detection frames include: a target test question detection frame surrounding the test question and a target answer detection frame surrounding the answer.
In this embodiment, the detection model may be, for example, CenterNet. The image to be processed is input into the detection model, which outputs at least the following target detection frames in the image: a target test question detection frame surrounding the test question, a target answer detection frame surrounding the answer, and an overall detection frame surrounding both. The answer is generally text content, typically handwritten, with which the user answers the test question; in this embodiment, the recognition model and the test question discrimination model are subsequently used to determine whether the user's answer is correct.
Step S106, respectively recognizing the target test question detection frame and the target answer detection frame through a recognition model to obtain a test question character string and an answer character string.
The recognition model is, for example, a CRNN (Convolutional Recurrent Neural Network). In this embodiment, the screenshot corresponding to each target detection frame is input to the recognition network. For example, the test question screenshot corresponding to the target test question detection frame is input to the recognition network, which recognizes the test question in the screenshot and outputs a test question character string; similarly, the answer screenshot corresponding to the target answer detection frame is input, and the recognition network outputs an answer character string.
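Steps S104 and S106 together form a detect-then-recognize pipeline. The sketch below illustrates the data flow only; the `detect` and `recognize` stubs, box types, and example strings are hypothetical stand-ins for a trained CenterNet-style detector and CRNN recognizer.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DetectionBox:
    kind: str                        # "question", "answer", or "whole"
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in image coordinates

def detect(image) -> List[DetectionBox]:
    """Stand-in for the trained detector: returns question/answer boxes."""
    # A real detector runs a network on the image; fixed boxes keep this runnable.
    return [
        DetectionBox("question", (10, 10, 300, 60)),
        DetectionBox("answer", (310, 10, 380, 60)),
    ]

def recognize(image, box: DetectionBox) -> str:
    """Stand-in for the CRNN recognizer applied to the cropped box."""
    return {"question": "3 + 4 = ( )", "answer": "7"}[box.kind]

def extract_strings(image) -> Tuple[str, str]:
    """Steps S104/S106: detect target frames, then recognize each one."""
    boxes = detect(image)
    question = next(recognize(image, b) for b in boxes if b.kind == "question")
    answer = next(recognize(image, b) for b in boxes if b.kind == "answer")
    return question, answer

question_str, answer_str = extract_strings(image=None)
```

In a real system `detect` and `recognize` would each wrap a trained network, with the recognizer applied to the image crop defined by each frame's `bbox`.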
Step S108, executing question judging processing based on a test question discrimination model, the test question character string, and the answer character string; wherein the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set, and the text sample set includes: a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers to test questions.
In practice, the question stem of a multiple-choice, fill-in-the-blank, or true/false question is not always complete and correct. For example, the stems of multiple-choice and fill-in-the-blank questions lack some content, and the missing content must be supplied by the user's answer; the stem of a true/false question may be described correctly or incorrectly, and the conclusion must likewise be given by the user through answering. For these cases, this embodiment provides several possible test question discrimination models that perform judging on the test questions based on the test question character string and the answer character string.
In one possible embodiment, the test question discrimination model is a model that generates a sentence character string: its input is the test question character string, and its output is a sentence character string whose question-stem description is correct for the corresponding test question. A correct question-stem description means that missing content has been filled in and that erroneous content, such as factual errors, logical errors, misused concepts, and wording errors, has been corrected, so that the text description is complete and accurate. Question judging in this embodiment can be understood as: comparing the sentence character string generated by the model with the test question character string, determining from the comparison the standard test question with a correct stem description corresponding to the test question character string, and then comparing that standard test question with the answer represented by the answer character string to obtain a judging result of correct or wrong. The test question discrimination model in this embodiment may be obtained by training the generator based on the word sample set and the first text sample set labeled with standard question-stem descriptions.
In another possible embodiment, the test question discrimination model is a model that generates an answer character string: its input is the test question character string, and its output is the character string of the standard answer corresponding to the test question. Question judging in this embodiment can be understood as: comparing the answer character string generated by the model with the recognized answer character string to obtain a judging result of correct or wrong. The test question discrimination model in this embodiment may be obtained by training the generator based on the word sample set and the second text sample set labeled with standard answers.
The text processing method provided by the embodiments of the present disclosure detects the image to be processed to separate the target test question detection frame from the target answer detection frame, then recognizes each frame to obtain the test question character string and the answer character string. Because the two strings are cleanly separated, the processing limitation on question types with semantic information is relieved and judging speed improves; on this basis, question judging is executed from the perspectives of whether the question-stem description is correct and whether the answer matches the standard answer, which improves judging accuracy.
The test question discrimination model in step S108 may include a text generation model, i.e., the model described above that generates a sentence character string. On this basis, the present embodiment provides a method for executing question judging based on the text generation model, the test question character string, and the answer character string, comprising the following steps 1 to 5:
step 1, inputting the test question character string into a text generation model, and generating a prediction statement character string through the text generation model. The predicted sentence character string output by the text generation model is a sentence character string with correct content description corresponding to the test question, that is, the predicted sentence character string and the test question character string correspond to the same test question, but the content description represented by the predicted sentence character string is more correct than the test question character string.
Step 2, calculating the similarity between the test question character string and the predicted sentence character string, and judging whether the similarity reaches a preset similarity threshold.
Step 3, if it does, determining that the test question content represented by the test question character string is correctly described.
Step 4, if it does not, determining that the test question content represented by the test question character string is wrongly described.
Step 5, combining the correct/wrong judgment with the answer character string to obtain the question judging result.
Specifically, if the similarity between the test question character string and the predicted sentence character string reaches the similarity threshold, the question stem content represented by the test question character string can be determined to be correctly described. The correct answer to the test question is then determined from the predicted character string or the test question character string and compared with the answer represented by the answer character string to obtain the question judging result.
If the similarity between the test question character string and the predicted sentence character string does not reach the similarity threshold, the question stem content represented by the test question character string can be determined to be wrongly described. The correct answer is then determined from the predicted character string together with the test question character string and compared with the answer represented by the answer character string to obtain the question judging result.
For ease of understanding, this embodiment takes a true/false question as an example to describe the question judging method based on the text generation model in detail. If the similarity between the test question character string of the true/false question and the predicted sentence character string reaches the preset similarity threshold, the question stem content can be determined to be correctly described; accordingly, the answer should be "true". It is then verified whether the answer represented by the answer character string is "true": if so, the judging result is that the user answered correctly; if not, the user answered incorrectly.
If the similarity does not reach the preset threshold, the question stem content is determined to be wrongly described; accordingly, the answer should be "false". It is then verified whether the answer represented by the answer character string is "false": if so, the judging result is that the user answered correctly; if not, the user answered incorrectly.
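The true/false judging flow can be sketched as follows. The similarity metric and the threshold value are assumptions (the disclosure only specifies "a preset similarity threshold"); `difflib.SequenceMatcher` stands in for whatever string similarity the implementation actually uses, and "true"/"false" render the 对/错 answers.

```python
from difflib import SequenceMatcher

SIM_THRESHOLD = 0.9  # assumed value; the text only says "preset similarity threshold"

def judge_true_false(question: str, predicted: str, user_answer: str) -> bool:
    """If the recognized stem is close enough to the generated (corrected) stem,
    the stem is deemed correct and the expected answer is "true" (对);
    otherwise the expected answer is "false" (错)."""
    similarity = SequenceMatcher(None, question, predicted).ratio()
    expected = "true" if similarity >= SIM_THRESHOLD else "false"
    return user_answer == expected
```

For example, a stem reading "2 + 2 = 5" differs from the corrected prediction "2 + 2 = 4", so the expected answer is "false" and a user who wrote "false" is judged correct.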
Compared with existing schemes, for example question-bank-based schemes, the question judging method based on the text generation model saves the cost of building a question bank, avoids the difficulty and low precision of question-bank retrieval, shortens the overall judging pipeline, and reduces accumulated error.
The test question discrimination model in step S108 may also include a test question answering model, i.e., the model described above that generates an answer character string. On this basis, the present embodiment provides a method for executing question judging based on the test question answering model, the test question character string, and the answer character string, comprising the following two steps:
firstly, inputting test question character strings into a test question answering model, and generating reference answer character strings through the test question answering model; the reference answer character string is used for representing a reference answer of the test question corresponding to the test question character string. And then, comparing the reference answer character string with the answer character string to obtain a question judgment processing result. Of course, if the reference answer character string is consistent with the answer character string, the answer processing result is that the answer of the user to the test question is correct, otherwise, if the reference answer character string is inconsistent, the answer is wrong.
Because the test question answering model outputs the reference answer character string directly, this method further shortens the judging pipeline, reduces accumulated error, and effectively improves judging speed.
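The two-step flow above can be sketched minimally as follows; the toy model and the whitespace/case normalization are assumptions (the disclosure says only that the two strings are compared for consistency).

```python
def judge_with_answer_model(answer_model, question: str, user_answer: str) -> bool:
    """Generate a reference answer string, then compare it with the recognized
    answer string; equal means the user answered correctly."""
    reference = answer_model(question)
    # Normalization is an assumption: recognized handwriting may differ from the
    # generated reference in spacing or case.
    norm = lambda s: "".join(s.split()).lower()
    return norm(reference) == norm(user_answer)

# Toy stand-in for the trained test question answering model.
toy_answer_model = lambda q: "7" if q == "3 + 4 = ( )" else ""
```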
The text processing procedure in the embodiments of the present disclosure uses several models, including the detection model, the recognition model, the text generation model, and the test question answering model; each must be trained before it can be used for text processing.
The present embodiment first gives a way to acquire a plurality of training data sets for model training, as shown below.
(a) A text image sample set containing text objects is obtained. The image samples may be pre-collected text images, answered by a large number of users, that are awaiting correction.
(b) A text detection box is labeled for each image in the text image sample set to obtain a detection box sample set; the detection box sample set includes image samples labeled with test question detection box samples and answer detection box samples.
(c) A character string sample set is obtained by transcribing the text objects of each image in the text image sample set. Specifically, the printed and/or handwritten text objects in an image are transcribed into character strings to obtain reference character strings; the character string sample set includes image samples labeled with the reference character strings corresponding to their text objects.
(d) Labeling words and keywords of text objects of all images in the text sample image set to obtain a word sample set; wherein, the word sample set includes: a keyword sample set corresponding to the vector matrix of the keywords and a vocabulary sample set corresponding to the vector matrix of the words.
In a specific implementation, words corresponding to the text object are labeled, and keywords corresponding to the text object are labeled, where the keywords are generally part of the words, such as numbers, operation symbols, and proportional relationships. A sample set formed by vector matrixes of words in the images is a vocabulary sample set, and a sample set formed by vector matrixes of keywords in the images is a keyword sample set; the word sample set includes at least one of the vocabulary sample set and the keyword sample set.
Words and keywords may be labeled as follows. A keyword recognition model, built as a two-layer bidirectional LSTM network, is used to recognize keywords in text objects. The character strings of a text object are input into the keyword recognition model, which, using a tool such as the jieba word segmenter together with the labels, segments each input character string, gives labeled keywords a higher score, and gives other words a lower score. Each word and keyword is then encoded with a method such as word2vec or GloVe to obtain the vector matrix of the words and the vector matrix of the keywords.
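The segment-score-encode pipeline above can be sketched as follows. A real implementation would use jieba for segmentation and word2vec/GloVe embeddings; the whitespace segmenter, keyword set, score values, and random embedding table here are illustrative stand-ins.

```python
import numpy as np

# Assumed keyword set: the text says keywords are typically numbers,
# operation symbols, and relation words.
KEYWORDS = {"3", "+", "4", "="}

def segment(text: str):
    """Toy whitespace segmenter standing in for jieba word segmentation."""
    return text.split()

def score_tokens(tokens):
    """Give labeled keywords a higher score and other words a lower one."""
    return [(tok, 1.0 if tok in KEYWORDS else 0.1) for tok in tokens]

def embed(tokens, dim: int = 8, seed: int = 0):
    """Toy deterministic embedding table standing in for word2vec/GloVe."""
    rng = np.random.default_rng(seed)
    vocab = {tok: rng.standard_normal(dim) for tok in sorted(set(tokens))}
    return np.stack([vocab[t] for t in tokens])  # vector matrix: (n_tokens, dim)

tokens = segment("3 + 4 = ( )")
scores = score_tokens(tokens)
vector_matrix = embed(tokens)
```

The resulting vector matrices, one row per token, are exactly the "vector matrix of words" and "vector matrix of keywords" that make up the word sample set.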
(e) Text objects with incorrect question-stem descriptions in the text image sample set are corrected to obtain the standard question-stem description corresponding to each text object, and the standard question-stem descriptions of the text objects of each image are labeled to obtain the first text sample set.
In a specific implementation, text objects with incorrect descriptions in the text image sample set can be corrected by supplementing missing content and by fixing erroneous descriptions such as factual errors, logical errors, misused concepts, and wording errors, so that each text object describes its content correctly. The first text sample set is then determined from the corrected text objects together with the text objects that were originally correct; it can be understood that the first text sample set includes image samples labeled with character strings describing the test questions correctly.
(f) Labeling standard answers of text objects of all the images in the text sample image set to obtain a second text sample set; wherein the second set of text samples comprises: a plurality of image samples labeled with standard answers.
Referring to fig. 2, the above training data sets are used to train different models: the detection box sample set is used to train the detection model, the character string sample set is used to train the recognition model, the word sample set and the first text sample set are used to train the text generation model, and the word sample set and the second text sample set are used to train the test question answering model. The following examples describe the training process of each model separately.
In one embodiment, the detection model to be trained is trained with the detection box sample set. After training, the model can simultaneously produce a detection frame surrounding a test question, a detection frame surrounding an answer, and a detection frame surrounding both the test question and its answer. A single detection model can yield these multiple kinds of detection frames because the aspect-ratio difference between test questions and answers is very small, so inaccurate detection caused by factors such as deviation does not occur.
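The detection frames themselves are ordinary axis-aligned rectangles. The following is a sketch of the kind of box utilities a post-processing step might use, checking aspect ratios and overlap; the `(x1, y1, x2, y2)` coordinate format and all coordinate values are assumptions for illustration, not from the patent.

```python
def aspect_ratio(box):
    """Width / height of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return (x2 - x1) / (y2 - y1)

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

# Hypothetical frames for one question on a worksheet image:
question_box = (10, 10, 310, 40)   # surrounds the test question
answer_box   = (10, 50, 250, 78)   # surrounds the handwritten answer
combined_box = (10, 10, 310, 78)   # surrounds question and answer together
```

The "similar aspect ratio" observation in the text corresponds to `aspect_ratio(question_box)` and `aspect_ratio(answer_box)` being close, which is what lets one set of anchors cover both frame types.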
In one embodiment, the recognition model to be trained is trained with the character string sample set. Because the character string samples in this set include both printed and handwritten text, and both styles are represented by a large number of samples, the recognition model trained on the character string sample set can recognize printed text objects as well as handwritten text objects.
In one embodiment, the text generation model is obtained by training a preset first generator based on the word sample set and the first text sample set.
This embodiment constructs the first generator with a Transformer structure. The first generator comprises an encoder and a decoder; the encoder includes a plurality of first basic modules, and the decoder includes a plurality of second basic modules. The first and second basic modules are substantially identical in structure, each comprising: a multi-head self-attention layer, skip connections, layer normalization, and a feed-forward neural network.
Notably, no mask is added to the multi-head self-attention layer in the second basic module.
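The self-attention layer at the heart of these basic modules can be illustrated with a single-head, pure-Python sketch on list-of-list matrices. This is a minimal sketch only: a real implementation would use a deep-learning framework and add learned projections, multiple heads, the skip connections, and layer normalization named above. The `causal_mask` flag shows what the patent's "no mask is added" choice means: every position may attend to the whole sequence.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V, causal_mask=False):
    """Scaled dot-product attention for one head.

    With causal_mask=True, position i attends only to positions <= i (the
    usual decoder mask); with causal_mask=False, as in the second basic
    module described above, every position attends to the full sequence."""
    d = len(K[0])
    out = []
    for i, q in enumerate(Q):
        scores = [sum(qe * ke for qe, ke in zip(q, k)) / math.sqrt(d) for k in K]
        if causal_mask:
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out
```

With the mask on, the first output row can only copy the first value row; with the mask off, each output row is a convex combination of all value rows.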
Meanwhile, a discriminator is constructed. It resembles a Transformer encoder and comprises two identical branches, each composed of a plurality of first basic modules and each outputting a vector; the outputs of the two branches are concatenated and then fed into two fully connected layers, the last of which has 2 nodes.
In this embodiment, constructing the first generator and the discriminator with a Transformer structure greatly shortens model training and inference time and effectively improves accuracy on various tasks.
The process of training the preset first generator based on the word sample set and the first text sample set to obtain the text generation model is divided into several stages, described in the following embodiments.
The first stage: the keyword sample set, the first text sample set and a preset first loss function are used to train the encoder and decoder of the preset first generator for the first time, and training ends when the first loss function converges, yielding the first generator after the first training; the first generator is constructed using a Transformer structure.
For the first generator, the keyword sample set is input, and the first generator generates a first sentence sample from each vector matrix of keywords in the keyword sample set, thereby outputting the first sentence sample set; each first sentence sample is a sentence that correctly describes the content of a test question stem. A multi-class cross-entropy loss function is used as the first loss function: loss values are computed between the first sentence samples in the first sentence sample set and the text samples in the first text sample set, the encoder and decoder of the first generator are trained for the first time according to these loss values, and training then proceeds to the second stage.
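The multi-class cross-entropy loss used in this stage can be written out concretely. The patent does not give its exact formula, so the token-level averaging below is a plausible sketch, not the patent's definition; `pred_probs` is assumed to be a normalized probability distribution over the output vocabulary.

```python
import math

def multiclass_cross_entropy(pred_probs, target_index):
    """Cross-entropy for one token: -log of the probability of the target class."""
    return -math.log(pred_probs[target_index])

def sequence_ce(pred_seq, target_seq):
    """Average token-level cross-entropy between predicted distributions and
    target token ids, as a generated sentence sample is compared against a
    text sample from the first text sample set."""
    losses = [multiclass_cross_entropy(p, t) for p, t in zip(pred_seq, target_seq)]
    return sum(losses) / len(losses)
```

A perfect prediction (probability 1.0 on the target token at every position) gives loss 0; a uniform distribution over a vocabulary of size V gives loss log V.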
The second stage: the vocabulary sample set, the first text sample set and a preset second loss function are used to train the encoder and decoder of the first generator (after its first training) a second time, and training ends when the second loss function converges, yielding the first generator after the second training; the text generation model is obtained based on this twice-trained first generator.
Following essentially the same principle as the first stage, the second stage feeds the vocabulary sample set, whose recognition precision is higher than that of the keywords, into the first generator obtained from the first training; that generator produces a second sentence sample from each vector matrix of words, thereby outputting the second sentence sample set. Again, a multi-class cross-entropy loss function can be used as the second loss function; the once-trained first generator is trained a second time based on the loss values between the second sentence sample set and the first text sample set, after which training proceeds to the third stage.
The third stage: the output test question sentence character strings that the twice-trained first generator produces for the word sample set are obtained; these output character strings and the word sample set are input into a preset discriminator, and the twice-trained first generator is given its final training through the discriminator and a preset third loss function, ending when the third loss function converges, to obtain the finally trained first generator, which is taken as the text generation model.
During the third stage, the input of the twice-trained first generator is the vector matrices of keywords and/or the vector matrices of words from the word sample set, and its output is the second sentence sample set. The inputs of the discriminator are the word sample set and the second sentence sample set; an adversarial loss function is used as the third loss function to train the twice-trained first generator, and after training finishes, the finally trained first generator is retained as the text generation model.
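The patent does not write out its adversarial ("third") loss explicitly. A common choice it plausibly corresponds to is the standard GAN objective, sketched here as an assumption: the discriminator is pushed to score real sentence samples near 1 and generated ones near 0, while the generator is pushed to make the discriminator score its outputs near 1.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Standard GAN discriminator loss for one real/fake pair of scores in (0, 1):
    minimized when D(real) -> 1 and D(fake) -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: minimized when D(fake) -> 1."""
    return -math.log(d_fake)
```

In the setting above, `d_fake` would be the discriminator's score for a sentence the twice-trained generator produced from the word sample set, and `d_real` its score for a labeled sample from the first text sample set.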
Compared with models in other schemes, the Transformer-based first generator provided by this embodiment more easily learns content whose input and output are similar, so training is easier and a satisfactory training effect is reached.
In one embodiment, the test question answering model is obtained by training a preset second generator based on the keyword sample set and the second text sample set. The second generator can be constructed by using a Transformer structure with reference to the first generator, and the specific structure of the second generator is the same as that of the first generator.
The process of training the preset second generator based on the word sample set and the second text sample set to obtain the test question answering model may refer to the following embodiments.
The word sample set, the second text sample set and a preset fourth loss function are used to train the encoder and decoder of the preset second generator, and training ends when the fourth loss function converges, yielding the trained second generator, which is taken as the test question answering model.
For the second generator, the input is the word sample set, and the second generator generates answer samples from the vector matrices of keywords and/or the vector matrices of words in the word sample set, thereby outputting an answer sample set. A binary cross-entropy loss function is used as the fourth loss function: loss values are computed between the answer samples in the answer sample set and the standard answers in the second text sample set, the encoder and decoder of the second generator are trained according to these loss values, and the trained second generator is taken as the test question answering model.
In summary, the text processing method provided by the embodiments of the disclosure uses a Transformer structure to construct and train the text generation model and the test question answering model, which reduces training difficulty and improves the training effect. During text processing, the test questions and answers in the image to be processed are detected and recognized separately, so that test question character strings and answer character strings can be clearly distinguished; this removes the processing limitation on question types carrying semantic information and helps increase the speed of question judgment. On this basis, combining the trained text generation model or test question answering model and executing the question judgment processing on the test question character string and answer character string improves the accuracy of question judgment.
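The final comparison step can be sketched end to end: once the recognition model yields the pupil's answer string and the test question answering model yields a reference answer string, judging reduces to a string comparison. `difflib.SequenceMatcher` stands in for whatever similarity measure an implementation might choose, and the 0.8 threshold is an arbitrary illustration; neither comes from the patent.

```python
import difflib

def judge(reference_answer, answer_string, threshold=0.8):
    """Compare the model's reference answer with the recognized answer string
    and return a judgment; exact matches score 1.0, partial matches less."""
    similarity = difflib.SequenceMatcher(None, reference_answer, answer_string).ratio()
    return "correct" if similarity >= threshold else "incorrect"
```

For example, `judge("x=8", "x=8")` yields `"correct"`, while a pupil's `"x=7"` falls below the threshold and is judged `"incorrect"`.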
Based on the text processing method provided by the above embodiments, this embodiment provides a text processing apparatus. Referring to the schematic structural diagram of a text processing apparatus shown in fig. 3, the text processing apparatus 300 includes:
an image obtaining module 302, configured to obtain an image to be processed including a text object;
the image detection module 304 is configured to detect the image to be processed through the detection model to obtain a target detection frame; wherein, the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers;
the text recognition module 306 is used for respectively recognizing the target test question detection box and the target answer detection box through the recognition model to obtain a test question character string and an answer character string;
the question judging module 308 is used for executing question judgment processing based on the test question discrimination model, the test question character string and the answer character string; the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set; the text sample set includes: a first text sample set in which the test question content is described correctly, or a second text sample set labeled with standard answers.
The implementation principle and technical effects of the apparatus provided by this embodiment are the same as those of the foregoing method embodiments. For brevity, where this apparatus embodiment does not mention a detail, reference may be made to the corresponding content in the method embodiments.
An exemplary embodiment of the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform a method according to an embodiment of the disclosure.
The exemplary embodiments of the present disclosure also provide a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is adapted to cause the computer to perform a method according to an embodiment of the present disclosure.
Referring to fig. 4, the block diagram of an electronic device 400, which may be a server or a client of the present disclosure and is an example of a hardware device applicable to aspects of the present disclosure, will now be described. The term electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the electronic device 400 includes a computing unit 401 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
A number of components in the electronic device 400 are connected to the I/O interface 405, including: an input unit 406, an output unit 407, a storage unit 408, and a communication unit 409. The input unit 406 may be any type of device capable of inputting information to the electronic device 400; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 407 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 408 may include, but is not limited to, a magnetic disk and an optical disk. The communication unit 409 allows the electronic device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 401 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 401 executes the respective methods and processes described above. For example, in some embodiments, the text processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 400 via the ROM 402 and/or the communication unit 409. In some embodiments, the computing unit 401 may be configured to perform the text processing method by any other suitable means (e.g., by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of text processing, comprising:
acquiring an image to be processed containing a text object;
detecting the image to be processed through a detection model to obtain a target detection frame; wherein the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers;
respectively identifying the target test question detection frame and the target answer detection frame through an identification model to obtain a test question character string and an answer character string;
executing question judgment processing based on a test question discrimination model, the test question character string and the answer character string; the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set; the text sample set includes: a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers of test questions.
2. The method of claim 1, wherein the test question discrimination model comprises: generating a model for the text; the executing of the question-judging process based on the question-judging model, the question character string and the answer character string includes:
inputting the test question character string into the text generation model, and generating a prediction statement character string through the text generation model;
calculating the similarity between the test question character string and the prediction statement character string, and judging whether the similarity reaches a preset similarity threshold value;
if yes, determining that the test question content represented by the test question character string is correctly described;
if not, determining that the test question content represented by the test question character string is wrongly described;
and comparing the determination result of whether the similarity threshold is reached with the answer character string to obtain a question judgment processing result.
3. The method of claim 1, wherein the test question discrimination model comprises: a test question answering model; the executing of the question-judging process based on the question-judging model, the question character string and the answer character string includes:
inputting the test question character string into the test question answering model, and generating a reference answer character string through the test question answering model; the reference answer character string is used for representing a reference answer of the test question corresponding to the test question character string;
and comparing the reference answer character string with the answer character string to obtain a processing result of the judgment.
4. The method of claim 1, further comprising:
acquiring a text image sample set containing a text object;
labeling the text detection boxes of the images in the text image sample set to obtain a detection box sample set; wherein the detection box sample set comprises: image samples marked with test question detection frame samples and answer detection frame samples;
obtaining a character string sample set by transcribing and marking character strings of text objects of all images in the text sample image set; wherein the sample set of character strings comprises: marking an image sample of a reference character string corresponding to the text object;
labeling words and keywords of the text object of each image in the text sample image set to obtain a word sample set; wherein the sample set of words comprises: a keyword sample set corresponding to the vector matrix of the keywords and a vocabulary sample set corresponding to the vector matrix of the words;
modifying the text object with the wrong question-stem description in the text image sample set to obtain the standard question-stem description corresponding to the text object, and labeling the standard question-stem description of the text object of each image in the text sample image set to obtain the first text sample set; and/or labeling standard answers of the text objects of the images in the text sample image set to obtain the second text sample set.
5. The method of claim 4, wherein the test question discrimination model comprises: generating a model for the text; the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set, and comprises the following steps:
performing first training on an encoder and a decoder in a preset first generator by adopting the keyword sample set, the first text sample set and a preset first loss function until the training is finished when the first loss function is converged to obtain a first generator after the first training; wherein the first generator is constructed using a Transformer architecture;
performing secondary training on an encoder and a decoder in the first generator after the first training by adopting the vocabulary sample set, the first text sample set and a preset second loss function until the training is finished when the second loss function is converged to obtain the first generator after the secondary training;
and obtaining the text generation model based on the secondarily trained first generator.
6. The method of claim 5, wherein obtaining the text generation model based on the second trained first generator comprises:
acquiring the character string of the output test question sentence of the first generator after the secondary training aiming at the word sample set;
inputting the output test question sentence character string and the word sample set to a preset discriminator, and performing final training on the secondarily trained first generator through the discriminator and a preset third loss function until the training is finished when the third loss function is converged to obtain a finally trained first generator;
and taking the finally trained first generator as the text generation model.
7. The method of claim 5, wherein the encoder of the first generator comprises a plurality of first basis modules, and the decoder of the first generator comprises a plurality of second basis modules;
the first base module and the second base module each include: the multi-head self-attention layer in the second basic module is not added with a mask.
8. The method of claim 4, wherein the test question discrimination model comprises: a test question answering model; the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set, and comprises the following steps:
training an encoder and a decoder in a preset second generator by adopting the word sample set, the second text sample set and a preset fourth loss function until the training is finished when the fourth loss function is converged to obtain a trained second generator;
and taking the trained second generator as the test question answering model.
9. The method of claim 4, further comprising:
training a detection model to be trained according to the detection frame sample set;
and training the recognition model to be trained according to the character string sample set.
10. A text processing apparatus, comprising:
the image acquisition module is used for acquiring an image to be processed containing a text object;
the image detection module is used for detecting the image to be processed through a detection model to obtain a target detection frame; wherein the target detection frame includes: a target test question detection frame surrounding the test questions and a target answer detection frame surrounding the answers;
the text recognition module is used for respectively recognizing the target test question detection box and the target answer detection box through a recognition model to obtain a test question character string and an answer character string;
the question judging module is used for executing question judgment processing based on a test question discrimination model, the test question character string and the answer character string; the test question discrimination model is obtained by training a generator based on a word sample set and a text sample set; the text sample set includes: a first text sample set obtained based on standard question-stem descriptions of test questions and/or a second text sample set obtained based on standard answers of test questions.
11. An electronic device, characterized in that the electronic device comprises:
a processor; and
a memory for storing a program, wherein the program is stored in the memory,
wherein the program comprises instructions which, when executed by the processor, cause the processor to carry out the text processing method according to any one of claims 1 to 9.
12. A non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the text processing method according to any one of claims 1 to 9.
CN202111416713.2A 2021-11-26 2021-11-26 Text processing method, device, equipment and medium Active CN113850235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111416713.2A CN113850235B (en) 2021-11-26 2021-11-26 Text processing method, device, equipment and medium


Publications (2)

Publication Number Publication Date
CN113850235A true CN113850235A (en) 2021-12-28
CN113850235B CN113850235B (en) 2022-03-04

Family

ID=78982151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111416713.2A Active CN113850235B (en) 2021-11-26 2021-11-26 Text processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113850235B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815491A (en) * 2019-01-08 2019-05-28 平安科技(深圳)有限公司 Answer methods of marking, device, computer equipment and storage medium
US20200090539A1 (en) * 2018-08-13 2020-03-19 Hangzhou Dana Technology Inc. Method and system for intelligent identification and correction of questions
CN112308053A (en) * 2020-12-29 2021-02-02 北京易真学思教育科技有限公司 Detection model training and question judging method and device, electronic equipment and storage medium
CN113111154A (en) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 Similarity evaluation method, answer search method, device, equipment and medium
CN113407675A (en) * 2021-06-24 2021-09-17 作业帮教育科技(北京)有限公司 Automatic education subject correcting method and device and electronic equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Yongshi et al., "Research on DCGAN Model Improvement and SAR Image Generation", Computer Science *

Also Published As

Publication number Publication date
CN113850235B (en) 2022-03-04

Similar Documents

Publication Publication Date Title
CN115035538B (en) Training method of text recognition model, and text recognition method and device
CN112417102B (en) Voice query method, device, server and readable storage medium
CN116543404A (en) Table semantic information extraction method, system, equipment and medium based on cell coordinate optimization
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN114863429A (en) Text error correction method and training method based on RPA and AI and related equipment thereof
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
CN109408175B (en) Real-time interaction method and system in general high-performance deep learning calculation engine
CN115100659A (en) Text recognition method and device, electronic equipment and storage medium
CN113705207A (en) Grammar error recognition method and device
CN113792133B (en) Question judging method and device, electronic equipment and medium
CN113723367B (en) Answer determining method, question judging method and device and electronic equipment
CN113850235B (en) Text processing method, device, equipment and medium
CN114970666B (en) Spoken language processing method and device, electronic equipment and storage medium
CN115273057A (en) Text recognition method and device, dictation correction method and device and electronic equipment
CN113345409B (en) Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium
CN115273103A (en) Text recognition method and device, electronic equipment and storage medium
CN114357964A (en) Subjective question scoring method, model training method, computer device, and storage medium
CN113569112A (en) Tutoring strategy providing method, system, device and medium based on question
CN112668342A (en) Remote supervision relation extraction noise reduction system based on twin network
CN113627399B (en) Topic processing method, device, equipment and storage medium
CN115879446B (en) Text processing method, deep learning model training method, device and equipment
CN115049899B (en) Model training method, reference expression generation method and related equipment
CN113343668B (en) Method and device for solving selected questions, electronic equipment and readable storage medium
CN112580619B (en) Method and device for auxiliary modification of recognition result
CN115376133A (en) Punctuation mark judgment method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant