CN111767883A - Title correction method and device - Google Patents


Info

Publication number
CN111767883A
CN111767883A
Authority
CN
China
Prior art keywords
detection
region
detection area
sub
image description
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010645304.9A
Other languages
Chinese (zh)
Other versions
CN111767883B (en)
Inventor
杨万里
郭常圳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Ape Power Future Technology Co Ltd
Original Assignee
Beijing Ape Power Future Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Ape Power Future Technology Co Ltd filed Critical Beijing Ape Power Future Technology Co Ltd
Priority to CN202010645304.9A priority Critical patent/CN111767883B/en
Publication of CN111767883A publication Critical patent/CN111767883A/en
Application granted granted Critical
Publication of CN111767883B publication Critical patent/CN111767883B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The application provides a question correction method and apparatus, wherein the question correction method comprises: receiving a picture to be recognized, wherein the picture to be recognized comprises a question to be corrected; performing target detection on the picture to be recognized, and determining a first detection area and a second detection area corresponding to the question to be corrected; performing image description recognition on the first detection area to obtain the image description information of the first detection area, and performing text recognition on the second detection area to obtain the text information of the second detection area; and determining the correction result of the question to be corrected according to the image description information and the text information. By detecting and recognizing the two areas and comparing the image description information of the first detection area with the text information of the second detection area, graphic questions can be corrected directly, the execution efficiency is high, fewer computing resources are consumed, and the final correction accuracy is higher.

Description

Title correction method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a question correction method and apparatus, a computing device, and a computer-readable storage medium.
Background
With the development of computer technology, online teaching has grown rapidly, and corresponding teaching tool products have emerged that provide technical support and help in education and tutoring for students, teachers, and parents. Many of these teaching tool products offer a photograph-based question correction function.
Current tools for photograph-based question correction intelligently solve arithmetic questions of the primary school stage, but cannot directly correct graphic questions such as abacus questions. Correction of graphic questions is mostly handled instead by searching a question bank with the picture. However, a question can only be corrected when a corresponding question exists in the bank, and graphic questions are usually huge in number and complex in category: when the bank contains few questions, the search result is poor and an accurate answer cannot be obtained; when the bank contains many questions, search efficiency is low and search time is long. The question-bank search approach therefore depends on the quality of the question bank and on the merits of the picture search algorithm.
Therefore, how to solve the above problems and improve the correction efficiency of graphic questions has become an urgent problem for those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present application provide a question correction method and apparatus, a computing device, and a computer-readable storage medium, so as to solve technical defects in the prior art.
According to a first aspect of the embodiments of the present application, there is provided a question correction method, including:
receiving a picture to be recognized, wherein the picture to be recognized comprises a question to be corrected;
performing target detection on the picture to be recognized, and determining a first detection area and a second detection area corresponding to the question to be corrected;
performing image description recognition on the first detection area to obtain image description information of the first detection area, and performing text recognition on the second detection area to obtain text information of the second detection area;
and determining a correction result of the question to be corrected according to the image description information and the text information.
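The four steps above can be sketched as a minimal pipeline. The function names and the stubbed model calls below are illustrative assumptions, not the patent's implementation:

```python
def correct_question(picture, detect_regions, describe_image, recognize_text):
    """Grade a graphic question in `picture` (a sketch of the claimed method).

    `detect_regions`, `describe_image` and `recognize_text` stand in for the
    target-detection, image-description and text-recognition models.
    """
    # Steps 1-2: target detection yields the question area and the answer area.
    first_region, second_region = detect_regions(picture)
    # Step 3: recognize both areas.
    expected = describe_image(first_region)   # e.g. the number shown on an abacus
    answered = recognize_text(second_region)  # e.g. the handwritten answer
    # Step 4: the correction result is whether the answer matches the image.
    return expected == answered
```

With stub models that both return "231", the sketch marks the question as correct.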
Optionally, performing target detection on the picture to be recognized and determining the first detection area and the second detection area corresponding to the question to be corrected includes:
inputting the picture to be recognized into a target detection model for target detection, and determining a first detection area corresponding to the question to be corrected;
and inputting the picture to be recognized into a text box detection model for target detection, and determining a second detection area corresponding to the question to be corrected.
Optionally, the first detection region comprises at least one first detection sub-region;
the image description identification of the first detection area is performed to obtain the image description information of the first detection area, and the method comprises the following steps:
and carrying out image description identification on the first detection areas through an image description model, and acquiring image description information corresponding to each first detection area.
Optionally, the second detection region comprises at least one second detection sub-region;
performing text recognition on the second detection area to obtain the text information of the second detection area includes:
performing text recognition on the second detection sub-regions through a text recognition model, and obtaining the text information corresponding to each second detection sub-region.
Optionally, the first detection region includes at least one first detection sub-region, and the second detection region includes at least one second detection sub-region;
determining the correction result of the question to be corrected according to the image description information and the text information includes:
determining the correction result of the question to be corrected according to the image description information corresponding to each first detection sub-region and the text information corresponding to each second detection sub-region.
Optionally, determining the correction result of the question to be corrected according to the image description information corresponding to each first detection sub-region and the text information corresponding to each second detection sub-region includes:
matching a corresponding second detection sub-region for each first detection sub-region;
and comparing the image description information of each first detection sub-region with the text information of its corresponding second detection sub-region, and determining the correction result of the sub-question corresponding to each first detection sub-region.
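The per-sub-region comparison can be sketched as follows; representing the sub-region pairing as a shared key is an assumption for illustration:

```python
def grade_sub_questions(descriptions, answers):
    """Compare the image-description result of each first detection sub-region
    with the recognized text of its matched second detection sub-region.

    `descriptions` and `answers` map a sub-region id to the recognized string;
    the id pairing is assumed to come from the position-matching step.
    """
    results = {}
    for sub_id, expected in descriptions.items():
        answered = answers.get(sub_id)
        # A sub-question is correct only if an answer exists and matches.
        results[sub_id] = (answered is not None and answered == expected)
    return results
```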
Optionally, matching a corresponding second detection sub-region for each first detection sub-region includes:
acquiring a first target point coordinate, a first length and a first width corresponding to each first detection sub-region, and acquiring a second target point coordinate, a second length and a second width corresponding to each second detection sub-region;
determining the first detection sub-region position information corresponding to each first detection sub-region according to the first target point coordinate, the first length and the first width, and determining the second detection sub-region position information corresponding to each second detection sub-region according to the second target point coordinate, the second length and the second width;
and determining the second detection sub-region corresponding to each first detection sub-region according to the position information of each first detection sub-region and the position information of each second detection sub-region.
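The patent derives each sub-region's position from its target-point coordinate, length and width; one plausible concrete realization (an assumption, since the patent does not fix the rule) is to pair each first sub-region with the second sub-region whose box center is nearest:

```python
def box_center(x, y, w, h):
    """Center of a box given its target-point (top-left) coordinate and size."""
    return (x + w / 2.0, y + h / 2.0)

def match_regions(first_boxes, second_boxes):
    """Match each first detection sub-region to the second detection sub-region
    whose center is nearest - one plausible reading of position-based matching.

    Boxes are (x, y, width, height) tuples; returns (first, second) index pairs.
    """
    pairs = []
    for i, fb in enumerate(first_boxes):
        fx, fy = box_center(*fb)
        best = min(
            range(len(second_boxes)),
            key=lambda j: (box_center(*second_boxes[j])[0] - fx) ** 2
                        + (box_center(*second_boxes[j])[1] - fy) ** 2,
        )
        pairs.append((i, best))
    return pairs
```

For an exercise sheet where each answer box sits directly below its question graphic, nearest-center matching recovers the intended pairing.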
Optionally, the method further includes:
and in the case that no first detection area corresponding to the question to be corrected is detected, returning a prompt that the type of the question to be corrected does not conform to the supported question types.
According to a second aspect of the embodiments of the present application, there is provided a question correction apparatus, including:
a receiving module configured to receive a picture to be recognized, wherein the picture to be recognized comprises a question to be corrected;
a detection module configured to perform target detection on the picture to be recognized and determine a first detection area and a second detection area corresponding to the question to be corrected;
a recognition module configured to perform image description recognition on the first detection area to obtain the image description information of the first detection area, and to perform text recognition on the second detection area to obtain the text information of the second detection area;
and a determining module configured to determine a correction result of the question to be corrected according to the image description information and the text information.
According to a third aspect of the embodiments of the present application, there is provided a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the question correction method when executing the instructions.
According to a fourth aspect of the embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the question correction method.
According to the question correction method provided by the embodiments of the present application, a picture to be recognized is received, wherein the picture to be recognized comprises a question to be corrected; target detection is performed on the picture to determine the first detection area and the second detection area corresponding to the question, that is, the position of the question and the position of the answer; image description recognition is performed on the first detection area to obtain its image description information, and text recognition is performed on the second detection area to obtain its text information, that is, the question information and the answer information; and the correction result of the question is determined according to the image description information and the text information. By detecting the two areas and comparing the image description information of the first detection area with the text information of the second detection area, graphic questions can be corrected directly, without a question bank and without the complex process of searching pictures by pictures, so that the execution efficiency is high, fewer computing resources are consumed, and the final correction accuracy is higher.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flowchart of a question correction method provided in an embodiment of the present application;
FIG. 3 is a structural diagram of a YoloV3 network provided in an embodiment of the present application;
FIG. 4 is a flowchart of a question correction method according to another embodiment of the present application;
FIGS. 5a to 5g are schematic diagrams of a question correction method according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a question correction apparatus provided in an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the present application; the present application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
First, the terms involved in one or more embodiments of the present application are explained.
Image description model (Image Captioning): a classic neural network model for recognizing the two-dimensional information of an image. The input of image description is an image, and the output is description information of the image. An image description model generally includes an encoder and a decoder.
Encoder (Encoder): used in the image description model to encode the input image and extract feature information; a convolutional neural network is commonly used for the extraction. The encoder compresses the information of the input into a context vector of fixed length, so that the resulting representation vector better contains the information of the whole input.
Decoder (Decoder): used in the image description model to receive the feature information output by the encoder. The decoder usually uses a recurrent neural network, a long short-term memory network, or the like to process the feature information and produce an output sentence.
Attention mechanism (Attention): in deep learning, attention can be roughly understood as how much focus is placed on a certain vector, which may represent a local area in an image or a word in a sentence. The attention vector estimates how strongly the focused part is related to the other elements, and the values of the different parts weighted by the attention vector are taken as an approximation of the target.
Embedding (Embedding): an embedding vector has the property that objects corresponding to vectors at similar distances have similar meanings. Because an embedding can encode an object with a low-dimensional vector while retaining its meaning, it is very suitable for deep learning.
In the present application, a question correction method and apparatus, a computing device, and a computer-readable storage medium are provided, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the present application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, which enables computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 may execute the steps of the question correction method shown in FIG. 2. FIG. 2 shows a flowchart of a question correction method according to an embodiment of the present application, including steps 202 to 208.
Step 202: receiving a picture to be recognized, wherein the picture to be recognized comprises a question to be corrected.
The picture to be recognized is a picture containing the question to be corrected, sent by a user through a terminal device, for example, a photo of an exercise book taken by the user with a mobile phone, or a picture obtained by the user scanning the exercise book with application software.
The question to be corrected contained in the picture is of the graphic type, such as an abacus question or a picture recognition question, and the user needs the answer to the question to be corrected, that is, judged as correct or incorrect.
In one specific implementation provided by the present application, taking as an example an exercise book picture taken by a user with a mobile phone, the question to be corrected in the picture is a graphic abacus question, in which a number is determined by observing the number of beads on the abacus.
In another specific implementation provided by the present application, taking as an example a picture obtained by a user scanning an exercise book with application software, the question to be corrected in the picture is a graphic picture recognition question, in which the total number of sheep in the picture is determined by observing the picture.
Step 204: performing target detection on the picture to be recognized, and determining the first detection area and the second detection area corresponding to the question to be corrected.
The first detection area is the image area corresponding to the question to be corrected, that is, the question part of the question; taking the abacus question as an example, the first detection area is the area of the abacus graphic.
The second detection area is the answer area corresponding to the question to be corrected, that is, the answer part of the question; taking the abacus question as an example, the second detection area is the answer area of the abacus question.
In practical application, target detection needs to be performed on the picture to be recognized to determine the first detection area and the second detection area corresponding to the question to be corrected, that is, the image area and the answer area of the question.
Optionally, performing target detection on the picture to be recognized and determining the first detection area and the second detection area corresponding to the question to be corrected includes: inputting the picture to be recognized into a target detection model for target detection, and determining the first detection area corresponding to the question to be corrected; and inputting the picture to be recognized into a text box detection model for target detection, and determining the second detection area corresponding to the question to be corrected.
The target detection model is used to solve the target detection problem: given an image, find the targets in it, determine their positions, and classify them. A target detection model is usually trained on a fixed training set, and must both determine the position information of the targets in the image and classify them. Recognizing the picture to be recognized with a target detection model allows the first detection area to be located accurately; achieving detection with a neural network model effectively improves recognition accuracy, and the picture does not need to be searched for in a question bank.
The target detection model may be a Fast R-CNN, SSD, or YoloV3 model; the specific architecture of the target detection model is not limited in the present application.
To further explain the present application, take the target detection model as the YoloV3 model as an example. The backbone network used by the YoloV3 model is the darknet-53 network (the first 52 layers of its structure), as shown in fig. 3, which shows the structure of the YoloV3 network. DBL is the basic component of YoloV3, consisting of convolution + BN + Leaky ReLU; for YoloV3, BN and Leaky ReLU are inseparable from the convolution layer, and together they form the minimal component.
The n in resn represents a number - res1, res2, ..., res8, etc. - indicating how many res_units are contained in the res_block. It is a large component of YoloV3, which borrowed the residual structure of ResNet; using this structure allows the network to be deeper.
Concat is tensor concatenation: it concatenates an upsampled middle layer of darknet with a later layer. Concatenation differs from the residual layer's add operation: concatenation expands the dimensionality of the tensor, while add is applied directly and does not change the tensor's dimensionality.
As shown in fig. 3, YoloV3 outputs 3 feature maps Y1, Y2, and Y3 of different scales, where the depths of Y1, Y2, and Y3 are all 255 and their side lengths follow the ratio 13:26:52. 3 prediction boxes are output for each feature map, 9 prediction boxes in total, and the prediction box with the highest target existence probability score among the 9 is taken as the first detection area corresponding to the question to be corrected.
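Selecting the highest-scoring box from the candidates gathered across the three output scales can be sketched as follows; the (box, score) pair format is an assumption:

```python
def select_detection(predictions):
    """Pick the prediction box with the highest target existence score.

    `predictions` is a list of (box, score) pairs gathered from the three
    YoloV3 output scales (Y1, Y2, Y3); the pair format is an assumption.
    """
    if not predictions:
        return None  # no graphic question detected; the caller returns a prompt
    box, _score = max(predictions, key=lambda p: p[1])
    return box
```

Returning `None` when no box is found corresponds to the case below, where a prompt about an unsupported question type is returned.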
Optionally, when no first detection area corresponding to the question to be corrected is detected, a prompt that the type of the question does not conform to the supported question types is returned.
In practical application, if no first detection area corresponding to the question to be corrected is detected in the picture to be recognized, it is determined that the picture does not contain a graphic question to be corrected, and prompt information that the question type does not conform is returned.
The text box detection model is used to detect text boxes in the picture to be recognized. Like the target detection model, it outputs the position information of text in the picture and classifies the text, for example as handwritten text, printed digital text, printed text, and so on. The text box detection model is preferably a combination of YoloV3 and a Feature Pyramid Network (FPN). The feature pyramid solves the multi-scale problem in target detection: for the same picture to be recognized, both large targets and small targets can be detected. The text box detection model identifies text boxes by means of a neural network model, which is fast and highly accurate.
In a specific embodiment provided by the present application, the picture to be recognized is input into the text box detection model for text box detection. The model can recognize the different text boxes in the picture, and the text boxes correspond to different categories; for example, the category corresponding to handwritten text is 101, the category corresponding to printed digital text is 102, the category corresponding to printed text is 103, and so on.
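The category codes from this embodiment can be kept in a simple lookup; treating only handwritten boxes (category 101) as candidate answer areas is an illustrative assumption, not something the patent fixes:

```python
# Category codes taken from the embodiment above.
TEXT_BOX_CATEGORIES = {
    101: "handwritten text",
    102: "printed digital text",
    103: "printed text",
}

def answer_boxes(detections):
    """Keep only handwritten text boxes as candidate answer areas.

    `detections` is a list of (box, category) pairs; selecting category 101
    as the student's answers is an assumption for illustration.
    """
    return [box for box, cat in detections if cat == 101]
```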
Step 206: performing image description recognition on the first detection area to obtain the image description information of the first detection area, and performing text recognition on the second detection area to obtain the text information of the second detection area.
After the first detection area and the second detection area are determined, image description recognition needs to be performed on the image in the first detection area to obtain the image description information of the image area, and text recognition needs to be performed on the answer area in the second detection area to obtain the text information of the answer area.
Optionally, the first detection area comprises at least one first detection sub-region; performing image description recognition on the first detection area to obtain the image description information of the first detection area includes: performing image description recognition on the first detection sub-regions through an image description model, and obtaining the image description information corresponding to each first detection sub-region.
In practical applications, one graphic question may have a plurality of sub-questions; for example, a recognition question may contain several pictures to be recognized, and an abacus question may likewise contain several pictures. In that case the first detection area contains a plurality of first detection sub-regions, and each first detection sub-region is one sub-question.
And carrying out image description identification on each first detection subarea through an image description model, and acquiring image description information corresponding to each first detection subarea.
The image description (Image Captioning) model is a sequence-to-sequence structure comprising two parts, an encoder and a decoder. It receives an image and, by recognizing and analyzing the image, outputs a text or sentence describing it.
The encoder of the image description model uses a convolutional neural network with a residual structure (ResNet). It encodes the input image by performing convolution operations on it, extracts the visual feature information of the image to generate an encoding vector, and applies embedding to the encoding vector to reduce the dimensionality of the original feature information. The processed encoding vector is then input into the decoder of the image description model, which comprises a bidirectional long short-term memory network (LSTM) and an attention mechanism (Attention); the decoder decodes the encoding vector and outputs the description information corresponding to the image.
In one embodiment, take the first detection sub-region to be the image of an abacus question in which there are 2 beads in the hundreds column, 3 beads in the tens column, and 1 bead in the units column. The image is input into the image description model for processing, and the corresponding numerical result obtained is 231.
In another embodiment provided by the present application, take the first detection sub-region to be the image of a picture recognition question containing 3 sheep. The image is input into the image description model for processing, and the corresponding numerical result obtained is 3.
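The abacus reading in the first embodiment above amounts to combining per-column bead counts into a decimal number. A minimal sketch, which deliberately ignores the five-bead/one-bead structure of a real abacus:

```python
def abacus_value(bead_counts):
    """Combine per-place bead counts into the number an abacus shows.

    `bead_counts` maps the decimal place (0 = units, 1 = tens, 2 = hundreds)
    to the number of beads counted in that column - a simplified reading.
    """
    return sum(count * 10 ** place for place, count in bead_counts.items())
```

For the embodiment's example (2 beads in the hundreds, 3 in the tens, 1 in the units), this yields 231, matching the model's output.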
Optionally, the second detection region comprises at least one second detection sub-region; performing text description recognition on the second detection region to acquire the text information of the second detection region includes: performing text recognition on each second detection sub-region through a text recognition model, and acquiring the text information corresponding to each second detection sub-region.
The number of second detection sub-regions in the second detection region is the same as the number of first detection sub-regions, and each second detection sub-region corresponds to one first detection sub-region; that is, each sub-question image has a corresponding answer result. The second detection sub-regions contain the corresponding answer results, and text recognition is performed on each second detection sub-region to acquire the corresponding text information.
The text recognition model is based on a convolutional neural network (CNN) and a long short-term memory network (LSTM), wherein a residual structure is applied in the convolutional neural network. The convolutional neural network extracts feature information from the image of the second detection sub-region; the extracted feature information is then input into the long short-term memory network for classification and recognition, finally obtaining the character information in the second detection sub-region.
In a specific embodiment provided by the present application, taking the content in the second detection sub-area as a handwritten number "57" as an example, the handwritten number is input into the text recognition model to be processed, and the corresponding character information is obtained as "57".
In another specific embodiment provided by the present application, taking the content in the second detection sub-region as a handwritten number written out in Chinese capital form, "fifty-seven", as an example, the content is input into the text recognition model for processing, and the corresponding text information "fifty-seven" is obtained.
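Converting such a numeral written out in words into digits (needed later when the answer is compared with the numeric image description) can be sketched for simple Chinese numerals below 100; this converter is a hypothetical normalization helper, not the patent's recognizer, and real systems would handle many more forms:

```python
DIGITS = {'零': 0, '一': 1, '二': 2, '三': 3, '四': 4,
          '五': 5, '六': 6, '七': 7, '八': 8, '九': 9}

def chinese_to_int(text):
    """Convert a simple Chinese numeral below 100 (e.g. '五十七',
    i.e. 'fifty-seven') to an int."""
    if '十' in text:
        tens_part, _, units_part = text.partition('十')
        tens = DIGITS.get(tens_part, 1)   # a bare leading '十' means 10
        units = DIGITS.get(units_part, 0)
        return tens * 10 + units
    return DIGITS[text]
```

For the embodiment above, `chinese_to_int('五十七')` yields 57.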
Step 208: and determining a correction result of the subject to be corrected according to the image description information and the text information.
Optionally, the first detection region includes at least one first detection sub-region, and the second detection region includes at least one second detection sub-region; determining the correction result of the question to be corrected according to the image description information and the text information includes: determining the correction result of the question to be corrected according to the image description information corresponding to each first detection sub-region and the text information corresponding to each second detection sub-region.
The first detection region comprises at least one first detection sub-region, the second detection region comprises at least one second detection sub-region, and each first detection sub-region corresponds to one second detection sub-region.
In practical application, the correction result of the question to be corrected is determined according to the image description information corresponding to each first detection sub-region and the text information corresponding to each second detection sub-region, which includes steps S2082 to S2084:
S2082, matching a corresponding second detection sub-region for each first detection sub-region.
Specifically, a first target point coordinate, a first length and a first width corresponding to each first detection sub-region are acquired, and a second target point coordinate, a second length and a second width corresponding to each second detection sub-region are acquired; first detection sub-region position information corresponding to each first detection sub-region is determined according to the first target point coordinate, the first length and the first width, and second detection sub-region position information corresponding to each second detection sub-region is determined according to the second target point coordinate, the second length and the second width; and the second detection sub-region corresponding to each first detection sub-region is determined according to the position information of each first detection sub-region and the position information of each second detection sub-region.
The first target point coordinate may be a coordinate of any vertex of the first detection sub-region, such as an upper left corner coordinate or a lower right corner coordinate, the first length is a length of the first detection sub-region, the first width is a width of the first detection sub-region, and similarly, the second target point coordinate may be a coordinate of any vertex of the second detection sub-region, the second length is a length of the second detection sub-region, and the second width is a width of the second detection sub-region.
According to the first target point, the first length and the first width, the position information of the first detection subarea can be determined, and according to the second target point, the second length and the second width, the position information of the second detection subarea can be determined.
And matching the corresponding second detection subarea for the first detection subarea according to the position information of the first detection subarea and the position information of the second detection subarea.
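The position-based matching just described can be sketched as follows (plain Python; the `(x, y, length, width)` tuple layout and the 15-pixel threshold are illustrative assumptions, since the patent only states that a preset threshold is used):

```python
def match_regions(first_regions, second_regions, threshold=15):
    """Match each question sub-region to the answer sub-region whose
    top-left target point lies within `threshold` pixels on both axes.
    Regions are (x, y, length, width) tuples."""
    matches = {}
    for i, (ax, ay, _, _) in enumerate(first_regions):
        for j, (bx, by, _, _) in enumerate(second_regions):
            if abs(ax - bx) <= threshold and abs(ay - by) <= threshold:
                matches[i] = j   # first sub-region i pairs with second sub-region j
                break
    return matches
```

With three question boxes at (5, 10), (30, 10), (55, 9) and three answer boxes at (8, 21), (32, 20), (53, 21), each question is paired with the answer box directly below it.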
In the embodiment provided in the present application, 3 first detection sub-regions in the first detection region and 3 second detection sub-regions in the second detection region are taken as an example for explanation.
Taking the top-left vertex of each first detection sub-region as the first target point, the top-left vertex coordinates of the 3 first detection sub-regions are (5, 10), (30, 10) and (55, 9) respectively, the first length is 10 and the first width is 10, so the position information of the three first detection sub-regions is A1 = (5, 10, 10, 10), A2 = (30, 10, 10, 10) and A3 = (55, 9, 10, 10).
Taking the top-left vertex of each second detection sub-region as the second target point, the top-left vertex coordinates of the 3 second detection sub-regions are (8, 21), (32, 20) and (53, 21) respectively, the second length is 8 and the second width is 5, so the position information of the three second detection sub-regions is B1 = (8, 21, 8, 5), B2 = (32, 20, 8, 5) and B3 = (53, 21, 8, 5).
For the first detection sub-region position information A1 and the second detection sub-region position information B1, the difference between the X coordinates of the target points is 3, which is smaller than the preset threshold, and the difference between the Y coordinates is 11, which is also smaller than the preset threshold, so it can be determined that the first detection sub-region A1 corresponds to the second detection sub-region B1. By analogy, the first detection sub-region A2 corresponds to the second detection sub-region B2, and the first detection sub-region A3 corresponds to the second detection sub-region B3.
Each first detection sub-region is matched with its corresponding second detection sub-region, so that the subsequent comparison process is more targeted and more efficient.
S2084, comparing the image description information of each first detection sub-region with the text information of the second detection sub-region corresponding to that first detection sub-region, and determining the correction result corresponding to the question in each first detection sub-region.
After each first detection sub-region and its corresponding second detection sub-region are determined, the image description information of the first detection sub-region is compared with the text information of the corresponding second detection sub-region. If the two are consistent, the answer to the question in the first detection sub-region is correct; if not, the answer is wrong.
It should be noted that, when comparing the image description information and the text information, both need to be standardized, for example by removing brackets, removing units, and replacing equivalent recognition results. The standardization effectively reduces interference information and unifies different image description information and text information into the same format, which improves comparison efficiency in the subsequent comparison.
In a specific embodiment provided by the present application, the image description information is "057" and the text information is "fifty-seven". The image description information is simplified and determined to be "57", and the text information is replaced by its recognition result and converted into "57"; the image description information and the text information are then compared, and the correction result of the question is determined to be correct.
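The standardization step can be sketched as follows (plain Python; the specific unit characters stripped are illustrative assumptions, since the patent only names bracket removal, unit removal and recognition-result replacement as examples):

```python
import re

def normalize(value):
    """Standardize a recognized answer string before comparison:
    drop brackets, strip common unit characters, remove leading zeros."""
    value = re.sub(r'[()（）\[\]]', '', value)   # bracket removal
    for unit in ('个', '只', '元'):               # illustrative units to strip
        value = value.replace(unit, '')
    value = value.lstrip('0') or '0'             # '057' -> '57', but keep '0'
    return value
```

For the embodiment above, `normalize('057')` yields `'57'`, which then compares equal to the normalized text information.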
There are many ways to display the correction result: for example, a correct answer is marked with a check mark (√) and a wrong answer with a cross (×); or a correct answer is framed in green and a wrong answer in red.
According to the title correction method provided by the embodiment of the application, a picture to be recognized is received, wherein the picture to be recognized comprises the question to be corrected; target detection is performed on the picture to be recognized to determine the first detection region and the second detection region corresponding to the question to be corrected, that is, the question position and the answer position; image description recognition is performed on the first detection region to acquire the image description information of the first detection region, and text description recognition is performed on the second detection region to acquire the text information of the second detection region, that is, the question information and the answer information; and the correction result of the question to be corrected is determined according to the image description information and the text information. By detecting and comparing the image description information recognized from the first detection region with the text information recognized from the second detection region, graphic questions can be corrected directly without a question bank, omitting the complex image-retrieval process of searching for images by image; execution efficiency is high, fewer computing resources are consumed, and the final correction accuracy is higher.
Fig. 4 shows a title correction method according to an embodiment of the present application, which is described taking the correction of a bead calculation (abacus) question as an example, and includes steps 402 to 414.
Step 402: receiving a picture to be identified, wherein the picture to be identified comprises a subject to be corrected.
In the embodiment provided by the present application, referring to fig. 5a, fig. 5a is a picture to be identified provided by the embodiment of the present application, wherein the subject to be modified is a bead calculation subject.
Step 404: and inputting the picture to be identified into a target detection model for target detection, and determining at least one first detection subarea corresponding to the subject to be corrected.
In the embodiment provided by the present application, as shown in fig. 5b, fig. 5b shows a schematic view of the first detection sub-regions provided by the embodiment of the present application. The picture to be recognized is input into the target detection model for target detection, and 4 first detection sub-regions of type 301 corresponding to the question to be corrected are determined.
Step 406: performing image description recognition on the first detection sub-regions through an image description model, and acquiring the image description information corresponding to each first detection sub-region.
In the embodiment provided by the present application, as shown in fig. 5c, fig. 5c shows image description information corresponding to the first detection sub-region provided by the embodiment of the present application. And inputting each first detection subregion into an image description model for image description identification, and identifying that the image description information corresponding to each first detection subregion is 01220, 01203, 05300 and 02050 respectively from left to right.
Step 408: and inputting the picture to be recognized into a text box detection model for target detection, and determining at least one second detection subarea corresponding to the question to be corrected.
In the embodiment provided by the application, referring to fig. 5d, fig. 5d shows a schematic diagram of a second detection sub-region provided by the embodiment of the application, where a picture to be recognized is input to a text box detection model for target detection, and 4 second detection sub-regions with the type 102 corresponding to the title to be modified are determined.
Step 410: performing text recognition on the second detection sub-regions through a text recognition model, and acquiring the text information corresponding to each second detection sub-region.
In the embodiment provided by the present application, referring to fig. 5e, fig. 5e shows the text information corresponding to the second detection sub-regions provided by the embodiment of the present application. Each second detection sub-region is input into the text recognition model for text recognition, and the text information corresponding to each second detection sub-region is obtained as 1220, 1203, 5300 and 2050 from left to right.
It should be noted that the execution sequence between steps 404 to 406 and 408 to 410 is not necessarily sequential, and may be executed simultaneously.
Step 412: matching a corresponding second detection sub-region for each first detection sub-region.
In the embodiment provided by the present application, referring to fig. 5f, fig. 5f shows a schematic diagram of the matching between the first detection sub-regions and the second detection sub-regions provided by the embodiment of the present application. As shown in fig. 5f, first detection sub-region 0 corresponds to second detection sub-region 0, first detection sub-region 1 corresponds to second detection sub-region 1, first detection sub-region 2 corresponds to second detection sub-region 2, and first detection sub-region 3 corresponds to second detection sub-region 3.
Step 414: comparing the image description information of each first detection sub-region with the text information of the second detection sub-region corresponding to that first detection sub-region, and determining the correction result corresponding to the question in each first detection sub-region.
In the embodiment provided by the present application, referring to fig. 5g, fig. 5g shows a schematic diagram of the comparison between the image description information of the first detection sub-regions and the text information of the second detection sub-regions provided by the embodiment of the present application. As shown in fig. 5g, the image description information of first detection sub-region 0 is 01220 and the text information of second detection sub-region 0 is 1220; the comparison shows that 01220 equals 1220 after standardization (the leading zero is removed), so the correction result corresponding to the question in first detection sub-region 0 is determined to be correct. By analogy, the correction results corresponding to the questions in first detection sub-regions 1, 2 and 3 are all correct.
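The per-question comparison of step 414 can be sketched as follows (plain Python; the function name and the list-of-strings interface are illustrative assumptions, and only leading-zero standardization is shown):

```python
def grade(descriptions, answers):
    """Compare each recognized question value with the recognized
    handwritten answer for its matched sub-region pair; leading zeros
    are removed before comparison, so '01220' matches '1220'."""
    results = []
    for desc, ans in zip(descriptions, answers):
        d = desc.lstrip('0') or '0'
        a = ans.lstrip('0') or '0'
        results.append(d == a)   # True = correct, False = wrong
    return results
```

For the bead calculation example, `grade(['01220', '01203', '05300', '02050'], ['1220', '1203', '5300', '2050'])` marks all four questions correct.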
According to the title correction method provided by the embodiment of the application, a picture to be recognized is received, wherein the picture to be recognized comprises the question to be corrected; target detection is performed on the picture to be recognized to determine the first detection region and the second detection region corresponding to the question to be corrected, that is, the question position and the answer position; image description recognition is performed on the first detection region to acquire the image description information of the first detection region, and text description recognition is performed on the second detection region to acquire the text information of the second detection region, that is, the question information and the answer information; and the correction result of the question to be corrected is determined according to the image description information and the text information. By detecting and comparing the image description information recognized from the first detection region with the text information recognized from the second detection region, graphic questions can be corrected directly without a question bank, omitting the complex image-retrieval process of searching for images by image; execution efficiency is high, fewer computing resources are consumed, and the final correction accuracy is higher.
Corresponding to the above method embodiment, the present application further provides an embodiment of a title correction device, and fig. 6 shows a schematic structural diagram of the title correction device according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
the receiving module 602 is configured to receive a picture to be identified, wherein the picture to be identified includes a topic to be corrected;
a detection module 604, configured to perform target detection on the picture to be identified, and determine a first detection region and a second detection region corresponding to the title to be modified;
an identifying module 606 configured to perform image description identification on the first detection region to obtain image description information of the first detection region, and perform text description identification on the second detection region to obtain text information of the second detection region;
the determining module 608 is configured to determine a correction result of the to-be-corrected topic according to the image description information and the text information.
Optionally, the detection module 604 is further configured to input the picture to be recognized into a target detection model for target detection, and determine a first detection area corresponding to the title to be modified; and inputting the picture to be recognized into a text box detection model for target detection, and determining a second detection area corresponding to the question to be corrected.
Optionally, the first detection region comprises at least one first detection sub-region;
the identifying module 606 is further configured to perform image description recognition on the first detection sub-regions through an image description model, and acquire the image description information corresponding to each first detection sub-region.
Optionally, the second detection region comprises at least one second detection sub-region;
the identification module 606 is further configured to perform text recognition on the second detection sub-regions through a text recognition model, and acquire the text information corresponding to each second detection sub-region.
Optionally, the first detection region includes at least one first detection sub-region, and the second detection region includes at least one second detection sub-region;
the determining module 608 is further configured to determine a correction result of the to-be-corrected topic according to the image description information corresponding to each of the first detection sub-regions and the text information corresponding to each of the second detection sub-regions.
Optionally, the determining module 608 is further configured to match a corresponding second detection sub-region for each first detection sub-region; compare the image description information of each first detection sub-region with the text information of the second detection sub-region corresponding to that first detection sub-region; and determine the correction result corresponding to the question in each first detection sub-region.
Optionally, the determining module 608 is further configured to acquire a first target point coordinate, a first length and a first width corresponding to each first detection sub-region, and acquire a second target point coordinate, a second length and a second width corresponding to each second detection sub-region; determine first detection sub-region position information corresponding to each first detection sub-region according to the first target point coordinate, the first length and the first width, and determine second detection sub-region position information corresponding to each second detection sub-region according to the second target point coordinate, the second length and the second width; and determine the second detection sub-region corresponding to each first detection sub-region according to the position information of each first detection sub-region and the position information of each second detection sub-region.
Optionally, the apparatus further comprises:
and the returning module is configured to return a prompt that the question type of the question to be corrected does not match, in the case that the first detection region corresponding to the question to be corrected is not detected.
The title correction device provided by the embodiment of the application receives a picture to be recognized, wherein the picture to be recognized comprises the question to be corrected; performs target detection on the picture to be recognized to determine the first detection region and the second detection region corresponding to the question to be corrected, that is, the question position and the answer position; performs image description recognition on the first detection region to acquire the image description information of the first detection region, and performs text description recognition on the second detection region to acquire the text information of the second detection region, that is, the question information and the answer information; and determines the correction result of the question to be corrected according to the image description information and the text information. By detecting and comparing the image description information recognized from the first detection region with the text information recognized from the second detection region, graphic questions can be corrected directly without a question bank, omitting the complex image-retrieval process of searching for images by image; execution efficiency is high, fewer computing resources are consumed, and the final correction accuracy is higher.
An embodiment of the present application further provides a computing device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the title correction method.
An embodiment of the present application further provides a computer-readable storage medium, which stores computer instructions, and the instructions, when executed by a processor, implement the steps of the title correction method as described above.
The above is an illustrative scheme of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the title correction method belong to the same concept; for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the title correction method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain content that is subject to appropriate increase or decrease as required by legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunications signals as is required by legislation and patent practice.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (11)

1. A title correction method, characterized by comprising:
receiving a picture to be identified, wherein the picture to be identified comprises a subject to be corrected;
performing target detection on the picture to be identified, and determining a first detection area and a second detection area corresponding to the question to be corrected;
carrying out image description identification on the first detection area to obtain image description information of the first detection area, and carrying out text description identification on the second detection area to obtain text information of the second detection area;
and determining a correction result of the subject to be corrected according to the image description information and the text information.
2. The title correction method according to claim 1, wherein the performing target detection on the picture to be recognized and determining the first detection region and the second detection region corresponding to the title to be corrected comprises:
inputting the picture to be recognized into a target detection model for target detection, and determining a first detection area corresponding to the question to be corrected;
and inputting the picture to be recognized into a text box detection model for target detection, and determining a second detection area corresponding to the question to be corrected.
3. The title correction method according to claim 1, wherein the first detection region comprises at least one first detection sub-region;
the image description identification of the first detection area is performed to obtain the image description information of the first detection area, and the method comprises the following steps:
and carrying out image description identification on the first detection areas through an image description model, and acquiring image description information corresponding to each first detection area.
4. The title correction method according to claim 1, wherein the second detection region comprises at least one second detection sub-region;
performing text description recognition on the second detection area to acquire text information of the second detection area, including:
and carrying out image description identification on the second detection areas through a text identification model, and acquiring text information corresponding to each second detection area.
5. The title correction method according to claim 1, wherein the first detection region comprises at least one first detection sub-region and the second detection region comprises at least one second detection sub-region;
determining a correction result of the to-be-corrected subject according to the image description information and the text information, comprising:
and determining a correction result of the to-be-corrected topic according to the image description information corresponding to each first detection subarea and the text information corresponding to each second detection subarea.
6. The title correction method of claim 5, wherein determining the correction result of the question to be corrected according to the image description information corresponding to each of the first detection sub-regions and the text information corresponding to each of the second detection sub-regions comprises:
matching a corresponding second detection sub-region for each of the first detection sub-regions;
comparing the image description information of each first detection sub-region with the text information of the second detection sub-region corresponding to that first detection sub-region, and determining the correction result corresponding to the question in each first detection sub-region.
7. The title correction method of claim 6, wherein matching a corresponding second detection sub-region for each of the first detection sub-regions comprises:
acquiring a first target point coordinate, a first length and a first width corresponding to each first detection sub-area, and acquiring a second target point coordinate, a second length and a second width corresponding to each second detection sub-area;
determining first detection sub-region position information corresponding to each first detection sub-region according to a first target point coordinate, a first length and a first width corresponding to each first detection sub-region, and determining second detection sub-region position information corresponding to each second detection sub-region according to a second target point coordinate, a second length and a second width corresponding to each second detection sub-region;
and determining the second detection sub-region corresponding to each first detection sub-region according to the position information of each first detection sub-region and the position information of each second detection sub-region.
8. The question correction method of claim 1, wherein the method further comprises:
returning a prompt that the question type of the question to be corrected is not supported, in the case that no first detection area corresponding to the question to be corrected is detected.
9. A question correction apparatus, comprising:
a receiving module configured to receive a picture to be recognized, wherein the picture to be recognized comprises a question to be corrected;
a detection module configured to perform target detection on the picture to be recognized and determine a first detection area and a second detection area corresponding to the question to be corrected;
an identification module configured to perform image description recognition on the first detection area to acquire image description information of the first detection area, and to perform text recognition on the second detection area to acquire text information of the second detection area;
and a determining module configured to determine a correction result of the question to be corrected according to the image description information and the text information.
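The four modules of claim 9 compose a simple pipeline: receive, detect, recognize each area, then decide. The sketch below wires hypothetical `detect`, `describe_image`, and `recognize_text` callables together; those names, their return shapes, and the final equality comparison are assumptions for illustration, not the patented implementation.

```python
def correct_question(picture, detect, describe_image, recognize_text):
    """Mirror claim 9's module chain for one received picture."""
    # detection module: locate the two areas of the question to be corrected
    first_area, second_area = detect(picture)
    # identification module: image description of the first detection area
    description = describe_image(first_area)
    # identification module: text recognition on the second detection area
    text = recognize_text(second_area)
    # determining module: derive the correction result from both outputs
    return "correct" if description == text else "incorrect"
```

In a real system each callable would wrap a trained model (e.g., a detector and two recognizers); here they can be any functions with matching signatures.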
10. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-8 when executing the instructions.
11. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
CN202010645304.9A 2020-07-07 2020-07-07 Question correction method and device Active CN111767883B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010645304.9A CN111767883B (en) 2020-07-07 2020-07-07 Question correction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010645304.9A CN111767883B (en) 2020-07-07 2020-07-07 Question correction method and device

Publications (2)

Publication Number Publication Date
CN111767883A true CN111767883A (en) 2020-10-13
CN111767883B CN111767883B (en) 2024-04-12

Family

ID=72723974

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010645304.9A Active CN111767883B (en) 2020-07-07 2020-07-07 Question correction method and device

Country Status (1)

Country Link
CN (1) CN111767883B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150027634A (en) * 2013-09-04 2015-03-12 삼성전자주식회사 Method and Apparatus for scoring test
CN106775241A (en) * 2016-12-15 2017-05-31 网易(杭州)网络有限公司 Information demonstrating method, equipment and computer-readable recording medium
CN107798321A (en) * 2017-12-04 2018-03-13 海南云江科技有限公司 A kind of examination paper analysis method and computing device
KR101956714B1 (en) * 2018-06-12 2019-03-11 (주)아나키 Method and system for checking answers
CN109670504A (en) * 2018-12-28 2019-04-23 杭州大拿科技股份有限公司 Method and device is corrected in a kind of hand-written answer identification
CN110414563A (en) * 2019-06-27 2019-11-05 深圳中兴网信科技有限公司 Total marks of the examination statistical method, system and computer readable storage medium
CN110490180A (en) * 2019-07-05 2019-11-22 平安国际智慧城市科技股份有限公司 Work correction method, apparatus, storage medium and server based on image recognition
US20200090539A1 (en) * 2018-08-13 2020-03-19 Hangzhou Dana Technology Inc. Method and system for intelligent identification and correction of questions


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YANG Xiaoyan; DING Yanqiu: "Research on an Image Recognition System for Question-Bank Scoring", Electronic Components and Information Technology, no. 11 *
HUANG Songhao; XU Jianhua; DU Jialing: "Research on Intelligent Grading of Written Mathematics Homework Based on Deep Learning", Computer Knowledge and Technology, no. 12 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507879A (en) * 2020-12-08 2021-03-16 科大讯飞股份有限公司 Evaluation method, evaluation device, electronic equipment and storage medium
CN112686263A (en) * 2020-12-29 2021-04-20 科大讯飞股份有限公司 Character recognition method and device, electronic equipment and storage medium
CN112686263B (en) * 2020-12-29 2024-04-16 科大讯飞股份有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN112818975A (en) * 2021-01-27 2021-05-18 北京金山数字娱乐科技有限公司 Text detection model training method and device and text detection method and device
CN112712070A (en) * 2021-03-26 2021-04-27 北京世纪好未来教育科技有限公司 Question judging method and device for bead calculation questions, electronic equipment and storage medium
CN113095315A (en) * 2021-04-13 2021-07-09 第六镜科技(成都)有限公司 Optical character recognition method and device based on deep neural network
CN114895829A (en) * 2022-07-15 2022-08-12 广东信聚丰科技股份有限公司 Display state optimization method and system based on artificial intelligence
CN114895829B (en) * 2022-07-15 2022-09-27 广东信聚丰科技股份有限公司 Display state optimization method and system based on artificial intelligence

Also Published As

Publication number Publication date
CN111767883B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
CN111767883A (en) Title correction method and device
RU2691214C1 (en) Text recognition using artificial intelligence
RU2661750C1 (en) Symbols recognition with the use of artificial intelligence
CN111950528B (en) Graph recognition model training method and device
CN114495129B (en) Character detection model pre-training method and device
CN113469067A (en) Document analysis method and device, computer equipment and storage medium
CN111105013A (en) Optimization method of countermeasure network architecture, image description generation method and system
Imran et al. Dataset of Pakistan sign language and automatic recognition of hand configuration of urdu alphabet through machine learning
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN115862040A (en) Text error correction method and device, computer equipment and readable storage medium
CN117132763A (en) Power image anomaly detection method, device, computer equipment and storage medium
CN113033721B (en) Title correction method and computer storage medium
CN113240033B (en) Visual relation detection method and device based on scene graph high-order semantic structure
CN114120342A (en) Resume document identification method and device, computing device and storage medium
US20220222956A1 (en) Intelligent visual reasoning over graphical illustrations using a mac unit
CN116311322A (en) Document layout element detection method, device, storage medium and equipment
CN113723367B (en) Answer determining method, question judging method and device and electronic equipment
CN113837157B (en) Topic type identification method, system and storage medium
CN115935969A (en) Heterogeneous data feature extraction method based on multi-mode information fusion
CN114782958A (en) Text error detection model training method, text error detection method and text error detection device
CN114356860A (en) Dialog generation method and device
CN113435441A (en) Bi-LSTM mechanism-based four-fundamental operation formula image intelligent batch modification method
CN113822302A (en) Training method and device for target detection model
CN117709464A (en) Image processing method and system
CN114282547A (en) Job approval method, system, computing device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant