CN113111154A - Similarity evaluation method, answer search method, device, equipment and medium - Google Patents

Similarity evaluation method, answer search method, device, equipment and medium

Info

Publication number
CN113111154A
CN113111154A
Authority
CN
China
Prior art keywords
character string
image
similarity
character
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110655717.XA
Other languages
Chinese (zh)
Other versions
CN113111154B (en)
Inventor
李自荐
秦勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Century TAL Education Technology Co Ltd
Original Assignee
Beijing Century TAL Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Century TAL Education Technology Co Ltd filed Critical Beijing Century TAL Education Technology Co Ltd
Priority to CN202110655717.XA priority Critical patent/CN113111154B/en
Publication of CN113111154A publication Critical patent/CN113111154A/en
Application granted granted Critical
Publication of CN113111154B publication Critical patent/CN113111154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a similarity evaluation method, an answer search method, a device, equipment and a medium. The similarity evaluation method comprises: acquiring a target character string whose similarity is to be evaluated and a target text image; inputting the target character string and the target text image into a pre-trained similarity evaluation model; extracting, through the similarity evaluation model, character feature information of the target character string and image feature information of the target text image, and evaluating the content similarity of the target character string and the target text image based on the character feature information and the image feature information. The character feature information comprises positional and semantic relationships among character features; the image feature information comprises positional and semantic relationships among image features. The method comprehensively improves similarity evaluation accuracy, helps further improve answer search precision, and can be well applied to photo-based question grading.

Description

Similarity evaluation method, answer search method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a similarity evaluation method, an answer search method, an apparatus, a device, and a medium.
Background
Photo-based question grading is an important application of artificial intelligence technology in the field of education: it saves teachers the cost of grading and improves grading efficiency. Specifically, after answering the questions, the user takes a photograph and uploads the resulting answer image to an application capable of photo-based grading, and the application recognizes and scores the answer image.
However, current photo-based grading technology has very limited applicability. Most implementations can only handle primary-school arithmetic problems with mechanically checkable formats such as horizontal and vertical (column) calculations, and cannot handle common question types that carry semantic information. The main difficulty is that current technology struggles to accurately retrieve the corresponding correct answer from a question bank given a user-uploaded image of a semantically rich question. The inventors found that a key bottleneck is the quality of similarity evaluation: existing methods for evaluating similarity, both between text images and between character strings, suffer from low accuracy, so current photo-based grading technology is difficult to apply well to answer search for question types with semantic information.
Disclosure of Invention
In order to solve the technical problems described above or at least partially solve the technical problems, the present disclosure provides a similarity evaluation method, an answer search method, an apparatus, a device, and a medium.
According to an aspect of the embodiments of the present disclosure, there is provided a similarity evaluation method, including: acquiring a target character string with similarity to be evaluated and a target text image; inputting the target character string and the target text image into a similarity evaluation model obtained by pre-training; respectively extracting character feature information of the target character string and image feature information of the target text image through the similarity evaluation model, and evaluating content similarity of the target character string and the target text image based on the character feature information and the image feature information; the character feature information comprises a position relation and a semantic relation among character features; the image feature information includes a positional relationship and a semantic relationship between image features.
According to another aspect of the embodiments of the present disclosure, there is provided an answer search method, including: acquiring a character string corresponding to a target question of an answer to be searched; respectively calculating the content similarity of the character string and the scanned image of each question in the question bank by adopting the similarity evaluation method of any one of the preceding items; wherein each of the scanned images is associated with a corresponding answer; and taking the corresponding answer of the scanned image with the highest content similarity as the searched answer aiming at the target question.
According to another aspect of the embodiments of the present disclosure, there is provided a similarity evaluation device including: the target acquisition module is used for acquiring a target character string with similarity to be evaluated and a target text image; the model input module is used for inputting the target character string and the target text image into a similarity evaluation model obtained by pre-training; the similarity evaluation module is used for respectively extracting character characteristic information of the target character string and image characteristic information of the target text image through the similarity evaluation model and evaluating the content similarity of the target character string and the target text image based on the character characteristic information and the image characteristic information; the character feature information comprises a position relation and a semantic relation among character features; the image feature information includes a positional relationship and a semantic relationship between image features.
According to another aspect of the embodiments of the present disclosure, there is provided an answer searching apparatus including: the character string acquisition module is used for acquiring a character string corresponding to a target question of an answer to be searched; the similarity calculation module is used for respectively calculating the content similarity between the character string and the scanned image of each question in the question bank by adopting the similarity evaluation method; wherein each of the scanned images is associated with a corresponding answer; and the answer determining module is used for taking the corresponding answer of the scanned image with the highest content similarity as the searched answer aiming at the target question.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; a memory for storing the processor-executable instructions; the processor is used for reading the executable instructions from the memory and executing the instructions to realize the similarity evaluation method or the answer search method provided by the embodiment of the disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the similarity evaluation method or the answer search method provided by the embodiments of the present disclosure.
The similarity evaluation method and device provided by the embodiments of the disclosure input a target character string, whose similarity is to be evaluated, and a target text image into a pre-trained similarity evaluation model; extract, through the model, character feature information of the target character string (including positional and semantic relationships, and possibly temporal relationships, among character features) and image feature information of the target text image (including positional and semantic relationships, and possibly temporal relationships, among image features); and evaluate the content similarity between the target character string and the target text image based on the character feature information and the image feature information. This provides a cross-modal similarity evaluation method between a character string and a text image, comparing the two on the basis of their respective positional and semantic relationships. It thereby mitigates two problems at once: the low accuracy of image-to-image similarity evaluation, which cannot focus on the character level, and the low accuracy of string-to-string similarity evaluation, which lacks the completeness of image information (for example, position information is lost).
According to the answer search method and device provided by the embodiments of the disclosure, a character string corresponding to the target question whose answer is to be searched is obtained first; the above similarity evaluation method is then used to calculate the content similarity between that character string and the scanned image of each question in the question bank, where each scanned image carries a corresponding answer; finally, the answer carried by the scanned image of highest content similarity is taken as the answer found for the target question. Because this answer search method queries the question bank directly with the character string of the target question, using the cross-modal string-to-image similarity evaluation above, it comprehensively improves the accuracy of similarity evaluation and thereby further improves the precision of answer search.
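The search flow just described — score the query string against every scanned question image and return the answer attached to the best match — can be sketched as follows. The data shapes, names, and the stand-in string-equality evaluator are illustrative assumptions, not the patent's trained model:

```python
# Illustrative sketch of the answer-search flow: pick the bank entry whose
# scanned image scores highest against the query string, then return its answer.

def search_answer(query_string, question_bank, evaluate_similarity):
    """question_bank: list of dicts with 'scan' and 'answer' keys (assumed shape).
    evaluate_similarity: callable (string, scan) -> score; in the patent this
    would be the trained cross-modal similarity evaluation model."""
    best = max(question_bank,
               key=lambda item: evaluate_similarity(query_string, item["scan"]))
    return best["answer"]

if __name__ == "__main__":
    # Stand-in evaluator: identical strings score 1.0, everything else 0.0.
    bank = [{"scan": "2+2=?", "answer": "4"}, {"scan": "3*3=?", "answer": "9"}]
    print(search_answer("3*3=?", bank, lambda q, s: 1.0 if q == s else 0.0))  # prints 9
```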
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to illustrate more clearly the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; other drawings can be obtained by those skilled in the art from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a similarity evaluation method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a training method of a similarity evaluation model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a similarity evaluation model provided in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of another similarity evaluation model provided in the embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of another similarity evaluation model provided in the embodiment of the present disclosure;
fig. 6 is a schematic flowchart of an answer searching method according to an embodiment of the disclosure;
fig. 7 is a schematic structural diagram of a similarity evaluation apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an answer searching apparatus according to an embodiment of the disclosure;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description. It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
Photo-based grading in the related art can only handle some conventional calculation question types such as arithmetic; it struggles with question types that carry semantic information, such as fill-in-the-blank and question-and-answer, so its range of application is limited. Strongly semantic questions require understanding the question from a human perspective. If an electronic device such as a computer is to grade them, one theoretical strategy is to have the device solve the questions automatically, i.e., to comprehensively apply technologies such as natural language understanding and simulate a human problem-solving process; but these technologies are not yet mature enough to be put into use. For this problem the inventors propose another, currently feasible, strategy: construct a question bank in advance containing a large number of questions and their corresponding standard answers. For ease of implementation, scanned images of a large number of questions can be added to the question bank, each scanned image associated with its standard answer. The electronic device then searches the question bank for the original question matching the question in the image to be graded, and scores the answer in that image against the standard answer of the matched original question.
The task is thus to search the question bank for the corresponding scanned question image given the to-be-graded image uploaded by the user. Both the uploaded image and the scanned question image are text images, and the related art offers two approaches: evaluate similarity directly between the images, or convert the text images into character strings and then evaluate similarity between the strings. After study, the inventors describe these as follows:
as for (i) methods for evaluating similarity between images, there are three methods: 1) and (3) evaluating the similarity of the two images by using an empirical formula, and specifically judging the similarity of the two images by calculating the empirical formula directly from pixel values. However, this approach does not utilize semantic (content) information of the image at all. 2) The similarity of the two images is evaluated by using a mode identification method, the operator designed by artificial experience is needed, for example, SIFT (scale invariant feature transform) and SURF (accelerated robust feature extraction) are adopted to respectively extract feature points of the two images to form feature vectors, then, the distance between the two feature vectors is calculated by adopting measurement modes such as cosine distance, Euclidean distance or Hamming distance, and the similarity of the two images is judged according to a preset threshold value. However, this method requires manual setting of the threshold, so that the quality of the evaluation result is closely related to the manual experience to some extent, and there is a certain uncertainty. 3) And finally, respectively extracting the characteristics of the two images through the trained neural network model, and judging the similarity of the two images based on the image characteristics of the two images. Although the method can achieve a good effect, a large amount of image training data needs to be acquired and labeled, and the quantity and quality of the training data directly restrict the accuracy of the output result of the neural network model. 
A text-image similarity model built by deep learning needs positive and negative samples during training and judges similarity as a binary classification. However, the required sample data is hard to collect, and the collected positive and negative samples are usually unbalanced, so the finally trained model performs poorly and the accuracy of similarity evaluation between text images is low.
In addition, with any of the above approaches it is difficult to compare two text images accurately at the level of character semantics. For example, two text images containing handwriting may look similar while their actual contents differ; visually similar characters in particular, such as "人" (person) versus "入" (enter), or "木" (wood) versus "术" (art), can hardly be distinguished by comparing images directly. The evaluation accuracy of content similarity is therefore not high, and it is difficult to accurately pick out, from the scanned images in the question bank, the question and corresponding answer that match the user-uploaded (to-be-graded) image.
(II) Methods for evaluating similarity between character strings convert the two text images into character strings and compare the similarity of the strings. However, this loses information carried by the image itself, such as position information and association information between text elements, reducing the comparison to the character sequence and discarding two-dimensional layout. This also hurts the precision of similarity evaluation, giving poor accuracy when searching the question bank for answers.
Based on the above discussion, and in order to improve on at least one of the above problems, the embodiments of the present disclosure provide a similarity evaluation method, an answer search method, an apparatus, a device, and a medium that combine text information and image information through a cross-modal similarity evaluation method. This enables more accurate text-image search and suits scenes such as searching answers from a question bank and photo-based question grading. For ease of understanding, the details are set forth below:
fig. 1 is a schematic flowchart of a similarity evaluation method provided in an embodiment of the present disclosure, which may be executed by a similarity evaluation apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 1, the method mainly includes the following steps S102 to S106:
step S102, obtaining a target character string and a target text image of similarity to be evaluated. A text image is an image containing text, such as a scanned image containing a title.
The embodiments of the disclosure do not limit how the target character string and the target text image are acquired; any character string and text image whose similarity needs to be compared can be used, and the choice can be made flexibly per application scene. In a photo-based grading scene, for example, text detection and text recognition can be performed on the user-uploaded text image (such as a to-be-graded test paper photographed by the user), and the extracted character string is used as the target character string; the scanned question images in the question bank are then taken one by one as target text images.
And step S104, inputting the target character string and the target text image into a similarity evaluation model obtained by pre-training. The similarity evaluation model is a neural network model, and content similarity between the character strings and the text images can be evaluated more accurately through training.
And step S106, respectively extracting character characteristic information of the target character string and image characteristic information of the target text image through the similarity evaluation model, and evaluating the content similarity of the target character string and the target text image based on the character characteristic information and the image characteristic information.
The character feature information includes at least the positional and semantic relationships among character features, and may also include temporal and contextual (before-after) relationships. The image feature information likewise includes at least the positional and semantic relationships among image features, and may also include temporal and contextual relationships.
Because the character feature information and image feature information capture both positional and semantic relationships, the similarity evaluation model can evaluate the content similarity between the character string and the text image thoroughly. This mitigates both the low accuracy of image-to-image similarity evaluation, which cannot focus on the character level, and the low accuracy of string-to-string similarity evaluation, which lacks the completeness of image information (for example, position information is lost).
In some embodiments, the similarity evaluation model may be trained according to the following steps:
(1) Obtain a plurality of training sample groups, each comprising a text image sample and a character string sample, with each group labelled with the content similarity between its text image sample and its character string sample.
(2) Train an initial model with the plurality of training sample groups, and take the trained model as the similarity evaluation model.
It can be understood that the training sample groups required by the similarity evaluation model are constructed from character string samples and text image samples. Compared with related-art network models for image-to-image similarity evaluation, for which large numbers of training samples are hard to obtain and the positive and negative samples needed for classification are unbalanced, the string and image samples needed here are much easier to obtain, mainly because character strings are easy to perturb: a large number of strings can be generated cheaply, and the similarity evaluation problem can be converted from the original binary classification (similar or dissimilar) into a regression problem (for example, similarity can be measured by predicting a value between 0 and 1, with values closer to 1 meaning more similar). For understanding, refer to the flowchart of the training method of the similarity evaluation model shown in Fig. 2, which focuses on the acquisition of the training sample groups and mainly includes the following steps S202 to S210:
step S202, a text image sample is obtained, a character string of the text image sample is extracted, and the extracted character string is used as an original character string.
Step S204, tamper with the original character string by one or more of adding characters, deleting characters, and modifying characters, to obtain a plurality of tampered character strings. In practice the original string may be tampered with in a designated manner, or randomly (additions, deletions, changes), to obtain a plurality of tampered strings, the number being set per actual requirements. That is, one original character string can be derived into many character strings through tampering.
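A minimal sketch of such a tampering generator, assuming random insert/delete/substitute edits over a lowercase alphabet (the alphabet, edit counts, and function names are illustrative assumptions, not specified by the patent):

```python
# Sketch: derive tampered character strings from one original string by
# randomly inserting, deleting, or substituting characters.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def tamper(original, n_edits, rng):
    chars = list(original)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not chars:
            # Insert a random character at a random position (also the
            # fallback when the string has become empty).
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(ALPHABET))
        elif op == "delete":
            del chars[rng.randrange(len(chars))]
        else:  # substitute
            chars[rng.randrange(len(chars))] = rng.choice(ALPHABET)
    return "".join(chars)

def derive_tampered_strings(original, count, max_edits=3, seed=0):
    # Fixed seed for reproducibility; vary it for more diverse samples.
    rng = random.Random(seed)
    return [tamper(original, rng.randint(1, max_edits), rng) for _ in range(count)]

if __name__ == "__main__":
    print(derive_tampered_strings("the quick brown fox", 3))
```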
Step S206, respectively calculating the similarity between each tampered character string and the original character string, and taking the similarity between each tampered character string and the original character string as the content similarity between each tampered character string and the text image sample.
In some embodiments, the similarity between a tampered character string and the original character string can be calculated directly with a string-similarity method from the related art, and the calculation rule can be chosen freely. The embodiments of the present disclosure provide one specific implementation, per the following steps 1 to 3:
step 1, calculating an editing distance between each tampered character string and an original character string; the edit distance can be understood as the minimum number of edit operations required to convert one character string into another.
Step 2, compare the length of the tampered character string with the length of the original character string, and take the larger of the two as the maximum string length.
Step 3, calculate the similarity between the tampered character string and the original character string based on the edit distance and the maximum string length. In a specific implementation, the ratio of the edit distance to the maximum string length is computed first, and that ratio is subtracted from 1 to obtain the similarity, i.e., similarity between tampered and original string = 1 − edit distance / maximum string length. In this way the similarity between strings can be measured more objectively.
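The labelling rule of steps 1 to 3 can be sketched directly. The dynamic program below is the standard Levenshtein edit distance; the handling of two empty strings is an added assumption:

```python
# Sketch of: similarity = 1 - edit_distance / max(len(a), len(b)).

def edit_distance(a, b):
    # Row-by-row Levenshtein DP: prev[j] is the distance between the
    # current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert into a
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]

def string_similarity(tampered, original):
    max_len = max(len(tampered), len(original))
    if max_len == 0:
        return 1.0  # both empty: treat as identical (edge-case assumption)
    return 1.0 - edit_distance(tampered, original) / max_len

if __name__ == "__main__":
    print(string_similarity("kitten", "sitting"))  # 1 - 3/7
```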
Step S208: take the original character string and each tampered character string respectively as character string samples, combine the text image sample with each character string sample to obtain a plurality of training sample groups, and label each training sample group with the content similarity between its character string sample and the text image sample.
That is, for each text image sample, one extracted original character string corresponds to a plurality of (say N) tampered character strings, so N + 1 training sample groups can be formed from it; if M text image samples are extracted, M × (N + 1) training sample groups are easily obtained, which solves the difficulty of collecting training samples. In addition, the similarity between each tampered character string and the original character string is used as the labelled content similarity between that tampered character string and the text image sample. This directly converts the similarity evaluation task into a regression problem and avoids the poor training results caused by the imbalance of positive and negative samples in the training data required by related-art network models for text image similarity evaluation.
Step S210: train an initial model with the plurality of training sample groups, adjust the parameters of the initial model based on a preset loss function, and take the model obtained after training as the similarity evaluation model. In some embodiments, the preset loss function comprises a mean square error loss function.
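A minimal sketch of the mean square error loss mentioned above, applied to a batch of predicted similarities and their labelled counterparts; representing the batch as plain lists is an assumption made for illustration.

```python
def mse_loss(predicted: list[float], labelled: list[float]) -> float:
    """Mean square error between the model's predicted similarities
    and the labelled content similarities of a batch of training
    sample groups."""
    assert len(predicted) == len(labelled)
    return sum((p - t) ** 2 for p, t in zip(predicted, labelled)) / len(predicted)
```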
The embodiment of the present disclosure further provides a structural example of the similarity evaluation model. Referring to the schematic structural diagram shown in fig. 3, the model includes an image feature extraction network, a character feature extraction network, and a similarity calculation network connected to both of them.
The input of the image feature extraction network is the target text image and its output is image feature information, which may include, for example, the positional relationships, semantic relationships, temporal relationships, and contextual associations among image features. The input of the character feature extraction network is the target character string and its output is character feature information, which may likewise include the positional, semantic, temporal, and contextual relationships among character features. The similarity calculation network takes the image feature information and the character feature information as input and outputs the content similarity between the target character string and the target text image.
On the basis of fig. 3, referring to a schematic structural diagram of another similarity evaluation model shown in fig. 4, an implementation manner of an image feature extraction network and a character feature extraction network is illustrated, and as shown in fig. 4, the image feature extraction network includes a feature extraction unit and an image information extraction unit; the feature extraction unit is used for extracting an image feature vector of the target text image; the image feature vector comprises a deep layer feature and a shallow layer feature, and the image information extraction unit is used for extracting image feature information based on the image feature vector. The image information extraction unit may perform analysis based on the deep layer features and the shallow layer features to obtain a timing relationship, a positional relationship, a semantic relationship, and the like between the image features.
The character feature extraction network comprises a coding unit and a character information extraction unit; the encoding unit is used for encoding the target character string so as to convert the target character string into a character feature vector; in particular, characters may be converted to numerical vectors for subsequent network processing. The character information extraction unit is used for extracting character characteristic information based on the character characteristic vector, namely, the character characteristic vector is analyzed to obtain the time sequence relation, the position relation, the semantic relation and the like among the character characteristics.
On the basis of fig. 4, the schematic structural diagram of another similarity evaluation model shown in fig. 5 illustrates that the feature extraction unit includes a residual network and the image information extraction unit includes a first long short-term memory network. A residual network is easy to optimize and can improve recognition accuracy through considerable added depth. In addition, a first dimension conversion unit is connected between the feature extraction unit and the image information extraction unit; it performs dimension conversion on the residual network's output features before they are input to the first long short-term memory network, so that the latter can more conveniently process the features extracted by the feature extraction unit. A long short-term memory network (LSTM) is a recurrent neural network that can fully resolve the relationships among features, such as positional, temporal, semantic, and association relationships.
The residual network can be implemented with, for example, ResNet18. In a specific embodiment, a ResNet18 network is built by connecting four blocks in series, each block comprising several convolution layers. The feature map output by the first block is 1/4 of the original size, the second block's is 1/8, the third's is 1/16, and the fourth's is 1/32, with each block outputting 128 feature channels. The first dimension conversion unit then scales all four groups of 128-channel feature maps to 1/32 of the original size by interpolation and concatenates them in series into one group of 512-channel feature vectors. This 512-channel group is input to the first long short-term memory network, which processes it to obtain a 32-dimensional vector per position; after point-by-point splicing, a feature vector of the same spatial size as the input (say n × m × 32) is obtained. This feature vector can fully embody the positional, semantic, temporal, and contextual relationships among image features. It should be understood that the above is merely a simple illustration and should not be taken as a limitation.
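The feature-map bookkeeping described above can be sketched as follows; the helper name and the example input resolution are illustrative assumptions, since the text does not fix an input size.

```python
def resnet18_feature_shapes(h: int, w: int):
    """Trace the feature-map sizes described in the text: four serial
    blocks, each outputting 128 channels at 1/4, 1/8, 1/16, and 1/32
    of the input size; the dimension conversion unit then rescales
    all four groups to 1/32 size and stacks them into a single
    512-channel group. Shapes are (channels, height, width)."""
    block_shapes = [(128, h // s, w // s) for s in (4, 8, 16, 32)]
    merged = (sum(c for c, _, _ in block_shapes), h // 32, w // 32)
    return block_shapes, merged
```

For a hypothetical 256 × 512 input, the merged group would be 512 channels at 8 × 16.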
In FIG. 5, the encoding unit is illustrated as using the Word2vec or GloVe algorithm, and the character information extraction unit includes a second long short-term memory network. A second dimension conversion unit is connected between the encoding unit and the character information extraction unit; it performs dimension conversion on the encoding unit's output features before they are input to the character information extraction unit, so that the latter can process them more conveniently. In a specific implementation, the encoding unit encodes each word in the input character string through the Word2vec or GloVe algorithm, converting the character string into numerical vectors; the second dimension conversion unit then performs dimension conversion on these numerical vectors to obtain an N × M × L feature vector, where N is the specified arrangement width, M is the specified arrangement height, and L is the length of the encoding vector produced for each word. The numerical vectors may be arranged in word-appearance order (such as left to right, top to bottom) to obtain the N × M × L feature vector.
The N × M × L feature vector is then input to the character information extraction unit, which can process its L-dimensional vectors point by point; after the outputs are spliced and rearranged point by point, an N × M × 32 feature vector consistent with the input size is obtained. This feature vector can fully embody the positional, semantic, temporal, and contextual relationships among character features.
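The arrangement of per-word encoding vectors into the N × M × L grid can be sketched as follows. Zero-padding unused cells and the row-major interpretation of the N × M grid are illustrative assumptions; the Word2vec/GloVe encoding itself is assumed to have been applied already.

```python
def arrange_word_vectors(word_vectors, n: int, m: int):
    """Arrange per-word encoding vectors (each of length L) into an
    N x M x L grid in word-appearance order (left to right, top to
    bottom), zero-padding unused cells. Indexing follows the text's
    N x M x L ordering; words beyond n*m are dropped."""
    if not word_vectors:
        raise ValueError("need at least one word vector")
    L = len(word_vectors[0])
    grid = [[[0.0] * L for _ in range(m)] for _ in range(n)]
    for idx, vec in enumerate(word_vectors[: n * m]):
        grid[idx // m][idx % m] = list(vec)
    return grid
```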
In fig. 5, the similarity calculation network is further illustrated as including a feature merging layer, convolution layers, and fully connected layers, connected in sequence. The n × m × 32 feature vector output by the image feature extraction network and the N × M × 32 feature vector output by the character feature extraction network are input to the feature merging layer, which may first scale the two to the same size by interpolation and then concatenate them in series to obtain a 64-channel feature vector. The convolution layers then perform two convolution operations on this 64-channel feature vector, after which the fully connected layers follow: a 32-node fully connected layer and then a 1-node fully connected layer, the output of which is the similarity between the input character string and the image.
It should be understood that the above is only an example illustration of a similarity evaluation model, and should not be considered as a limitation, and in practical applications, the similarity evaluation model may add more network structures.
The similarity evaluation model may be obtained through the training process shown in fig. 2: during training, the input to the model is a training sample group comprising a text image sample and a character string sample; the network parameters are then adjusted by back-propagation based on the model's output and a preset loss function until the model outputs results meeting expectations, yielding the trained similarity evaluation model.
On the basis of the foregoing, fig. 6 is a flowchart illustrating an answer searching method provided in an embodiment of the present disclosure, where the answer searching method may be executed by an answer searching apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in an electronic device. As shown in fig. 6, the method mainly includes the following steps S602 to S606:
Step S602: acquire the character string corresponding to the target question whose answer is to be searched.
Step S604: using the similarity evaluation method described above, respectively calculate the content similarity between the character string and the scanned image of each question in the question bank, where each scanned image carries a corresponding answer. The question bank contains scanned images of a large number of questions entered in advance.
Step S606: take the answer carried by the scanned image with the highest content similarity as the answer searched for the target question, i.e., the standard answer to the target question, which can then be used as the reference answer when scoring the user's response to the target question.
The answer searching method directly searches the question bank for the answer carried by the scanned image with the highest content similarity to the character string of the target question. Because it uses the cross-modality similarity evaluation method between character strings and text images provided by this embodiment, similarity evaluation accuracy is comprehensively improved, which in turn improves answer search precision.
The method is well suited to photograph-based grading scenarios. In this case, acquiring the character string corresponding to the target question whose answer is to be searched comprises: acquiring an answering image uploaded by a user, and performing text detection processing and text recognition processing on the answering image in sequence to obtain the character string corresponding to the target question in the answering image. The answering image is obtained after the user photographs an answer sheet (the test paper to be corrected) with a device such as a mobile phone or tablet computer. In some embodiments, the position of the target question in the answering image can be detected by a layout analysis model and a text line detection model to obtain its coordinates; the answering image is then cropped according to those coordinates to obtain an image of the target question, which is passed to a text recognition model to obtain the corresponding character string. In addition, the positional order of the character string corresponding to the target question in the answering image can be marked.
Further, the method further comprises: acquiring the answer content corresponding to the target question in the answering image, comparing the answer content with the searched answer, and determining the user's answer score for the target question based on the comparison result. In some embodiments, the electronic device may detect the positions of the questions and the answer content in the answering image using text detection techniques, and extract the character strings corresponding to the questions and to the answer content using text recognition techniques. In practical applications, the character string corresponding to the answer content and the character string corresponding to the target question can be obtained simultaneously and in the same way; the printed target question and the handwritten answer need only be distinguished through layout analysis and/or handwriting analysis.
Specifically, the similarity between the answer content and the searched answer (the standard answer) is calculated by comparing their character strings, and the user's answer score for the target question is determined based on that similarity.
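A sketch of this scoring step, reusing the edit-distance similarity formula from the training procedure and linearly scaling it to a hypothetical full-mark value; the linear scaling is an illustrative choice, as a real grader might apply thresholds or rubric rules instead.

```python
def score_answer(user_answer: str, standard_answer: str,
                 full_marks: float = 10.0) -> float:
    """Score a user's answer by its edit-distance similarity to the
    searched (standard) answer, scaled to the question's full marks.
    full_marks and the linear scaling are illustrative assumptions."""
    max_len = max(len(user_answer), len(standard_answer))
    if max_len == 0:
        return full_marks  # both strings empty: treat as identical
    # Levenshtein distance, as in the similarity formula above
    prev = list(range(len(standard_answer) + 1))
    for i, ca in enumerate(user_answer, 1):
        curr = [i]
        for j, cb in enumerate(standard_answer, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    similarity = 1.0 - prev[-1] / max_len
    return round(similarity * full_marks, 2)
```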
To sum up, the cross-modality similarity evaluation method and answer search method provided by the embodiments of the present disclosure compare character strings and text images based on their respective positional and semantic relationships. This alleviates both the low accuracy of similarity evaluation between text images (which cannot focus on the character level) and the low accuracy of similarity evaluation between character strings (whose information is less complete than image information, e.g., position information is lost), comprehensively improving similarity evaluation accuracy and further improving answer search precision, so that the methods can be better applied to photograph-based grading.
Moreover, the training sample groups required by the similarity evaluation model in the embodiments of the present disclosure are constructed from character string samples and text image samples. Compared with related-art network models for evaluating similarity between text images, for which large numbers of training samples are hard to obtain and the positive and negative samples required for classification are unbalanced, the character string samples and text image samples needed here are easier to obtain, and the similarity evaluation task is converted into a regression problem. This avoids the difficulties of collecting training data and of constructing balanced positive and negative samples, greatly simplifies model training, and, through the quantity and quality of the training samples, helps ensure the accuracy and reliability of the trained model.
Corresponding to the foregoing similarity evaluation method, an embodiment of the present disclosure provides a similarity evaluation apparatus, which may be implemented by software and/or hardware and may be generally integrated in an electronic device, and refer to a schematic structural diagram of a similarity evaluation apparatus shown in fig. 7, which mainly includes the following modules:
a target obtaining module 702, configured to obtain a target character string with similarity to be evaluated and a target text image;
a model input module 704, configured to input the target character string and the target text image into a similarity evaluation model obtained through pre-training;
a similarity evaluation module 706, configured to extract character feature information of the target character string and image feature information of the target text image through a similarity evaluation model, and evaluate content similarity between the target character string and the target text image based on the character feature information and the image feature information;
the character feature information comprises a position relation and a semantic relation among character features; the image feature information includes a positional relationship and a semantic relationship between image features.
With character feature information and image feature information that embody positional and semantic relationships, the similarity evaluation model can fully evaluate the content similarity between a character string and a text image. This alleviates both the low accuracy of similarity evaluation between text images (which cannot focus on the character level) and the low accuracy of similarity evaluation between character strings (whose information is less complete than image information, e.g., position information is lost).
In some embodiments, the apparatus further comprises a model training module configured to: acquiring a plurality of training sample sets, wherein each training sample set comprises a text image sample and a character string sample, and the training sample sets are marked with content similarity between the text image samples and the character string samples; and training the initial model by adopting the training sample groups, and taking the model obtained after the training as a similarity evaluation model.
In some embodiments, the model training module is further configured to: acquire a text image sample, extract a character string from it, and take the extracted character string as the original character string; tamper with the original character string by one or more of adding characters, deleting characters, and modifying characters to obtain a plurality of tampered character strings; respectively calculate the similarity between each tampered character string and the original character string, and take that similarity as the content similarity between the tampered character string and the text image sample; and take the original character string and each tampered character string respectively as character string samples, combine the text image sample with each character string sample to obtain a plurality of training sample groups, and label each training sample group with the content similarity between its character string sample and the text image sample.
In some embodiments, the model training module is further configured to: for each tampered character string, calculate the edit distance between the tampered character string and the original character string; compare the length of the tampered character string with that of the original character string and take the larger of the two as the maximum character string length; and calculate the similarity between the tampered character string and the original character string based on the edit distance and the maximum character string length.
In some embodiments, the model training module is further configured to: calculate the ratio between the edit distance and the maximum character string length, and subtract the ratio from 1 to obtain the similarity between the tampered character string and the original character string.
In some embodiments, the similarity evaluation model includes an image feature extraction network, a character feature extraction network, and a similarity calculation network respectively connected to the image feature extraction network and the character feature extraction network; the input of the image feature extraction network is the target text image, and the output of the image feature extraction network is image feature information; the input of the character feature extraction network is the target character string, and the output is character feature information; and the similarity calculation network inputs the image characteristic information and the character characteristic information and outputs the content similarity of the target character string and the target text image.
In some embodiments, the image feature extraction network comprises a feature extraction unit and an image information extraction unit; the feature extraction unit is used for extracting an image feature vector of the target text image; the image information extraction unit is used for extracting image characteristic information based on the image characteristic vector.
In some embodiments, the feature extraction unit comprises a residual network, and the image information extraction unit comprises a first long short-term memory network.
In some embodiments, the character feature extraction network includes an encoding unit and a character information extraction unit; the encoding unit is used for encoding the target character string so as to convert the target character string into a character feature vector; the character information extraction unit is used for extracting character characteristic information based on the character characteristic vector.
In some embodiments, the encoding unit applies the Word2vec or GloVe algorithm, and the character information extraction unit comprises a second long short-term memory network.
The similarity evaluation device provided by the embodiment of the disclosure can execute the similarity evaluation method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatus embodiments may refer to corresponding processes in the method embodiments, and are not described herein again.
Corresponding to the answer searching method, the embodiment of the present disclosure provides an answer searching apparatus, which may be implemented by software and/or hardware, and may be generally integrated in an electronic device, and referring to a schematic structural diagram of an answer searching apparatus shown in fig. 8, the answer searching apparatus mainly includes the following modules:
a character string obtaining module 802, configured to obtain a character string corresponding to a target question of an answer to be searched;
a similarity calculation module 804, configured to calculate content similarities between the character string and the scanned image of each topic in the topic library respectively by using any one of the similarity evaluation methods described above; wherein each scanned image carries a corresponding answer;
and an answer determining module 806, configured to use a corresponding answer of the scanned image with the highest content similarity as an answer searched for the target question.
The answer searching apparatus directly searches the question bank for the answer carried by the scanned image with the highest content similarity to the character string of the target question. Because it uses the cross-modality similarity evaluation method between character strings and text images provided by this embodiment, similarity evaluation accuracy is comprehensively improved, which in turn improves answer search precision.
In some embodiments, the string obtaining module 802 is further configured to: acquiring a response image uploaded by a user; and carrying out text detection processing and text recognition processing on the answer image in sequence to obtain a character string corresponding to the target question in the answer image.
In some embodiments, the above apparatus further comprises:
the answer content acquisition module is used for acquiring the answer content corresponding to the target question in the answer image;
and the scoring module is used for comparing the answer content with the searched answers and determining the user answer score of the target question based on the comparison result.
The answer searching device provided by the embodiment of the disclosure can execute the answer searching method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatus embodiments may refer to corresponding processes in the method embodiments, and are not described herein again.
An embodiment of the present disclosure further provides an electronic device, which includes: a processor and a memory for storing processor-executable instructions; and the processor is used for reading the executable instructions from the memory and executing the instructions to realize any one of the similarity evaluation methods or any one of the answer search methods.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 9, the electronic device 900 includes one or more processors 901 and memory 902.
The processor 901 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.
Memory 902 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 901 to implement the similarity evaluation method or the answer search method of the embodiments of the present disclosure described above and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.
In one example, the electronic device 900 may further include: an input device 903 and an output device 904, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 903 may include, for example, a keyboard, a mouse, and the like.
The output device 904 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 904 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, among others.
Of course, for simplicity, only some of the components of the electronic device 900 relevant to the present disclosure are shown in fig. 9, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 900 may include any other suitable components depending on the particular application.
The computer program product may write program code for carrying out operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the similarity evaluation method or the answer search method provided by the embodiments of the present disclosure.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Embodiments of the present disclosure also provide a computer program product comprising a computer program/instruction, which when executed by a processor implements the similarity evaluation method or the answer search method in embodiments of the present disclosure.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It is noted that the singular forms "a," "an," and "the" used in this disclosure are intended to be illustrative rather than limiting; those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The foregoing describes merely exemplary embodiments of the present disclosure and is provided to enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (17)

1. A similarity evaluation method is characterized by comprising the following steps:
acquiring a target character string and a target text image whose content similarity is to be evaluated;
inputting the target character string and the target text image into a pre-trained similarity evaluation model;
respectively extracting, through the similarity evaluation model, character feature information of the target character string and image feature information of the target text image, and evaluating the content similarity between the target character string and the target text image based on the character feature information and the image feature information;
wherein the character feature information comprises positional relationships and semantic relationships among character features, and the image feature information comprises positional relationships and semantic relationships among image features.
2. The similarity evaluation method according to claim 1, wherein the similarity evaluation model is trained according to the following steps:
acquiring a plurality of training sample groups, wherein each training sample group comprises a text image sample and a character string sample and is labeled with the content similarity between the text image sample and the character string sample;
and training an initial model by using the plurality of training sample groups, and taking the trained model as the similarity evaluation model.
3. The similarity evaluation method according to claim 2, wherein the step of acquiring a plurality of training sample groups comprises:
acquiring a text image sample, extracting a character string from the text image sample, and taking the extracted character string as an original character string;
tampering with the original character string by one or more of adding characters, deleting characters, and modifying characters, to obtain a plurality of tampered character strings;
respectively calculating the similarity between each tampered character string and the original character string, and taking that similarity as the content similarity between the tampered character string and the text image sample;
and taking the original character string and each tampered character string respectively as character string samples, combining the text image sample with each character string sample to obtain a plurality of training sample groups, and labeling the content similarity between the character string sample and the text image sample in each training sample group.
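As an illustration of this sample-generation step, the tampering described in claim 3 could be sketched in pure Python roughly as follows. The function name, the 1-to-3 edit count, and the alphabet are illustrative assumptions, not part of the claimed method:

```python
import random

def tamper(original: str, num_variants: int = 5, seed: int = 0) -> list[str]:
    """Generate tampered copies of a string by randomly inserting,
    deleting, or substituting characters (hypothetical sketch of the
    sample-generation step; edit counts and alphabet are assumptions)."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    variants = []
    for _ in range(num_variants):
        chars = list(original)
        for _ in range(rng.randint(1, 3)):  # apply 1-3 random edits
            op = rng.choice(("insert", "delete", "modify"))
            if op == "insert":
                pos = rng.randint(0, len(chars))  # insertion may go at either end
                chars.insert(pos, rng.choice(alphabet))
            elif op == "delete" and chars:
                chars.pop(rng.randrange(len(chars)))
            elif chars:  # modify: replace one character in place
                chars[rng.randrange(len(chars))] = rng.choice(alphabet)
        variants.append("".join(chars))
    return variants
```

Each variant would then be paired with the text image sample and labeled with its computed similarity to the original string, as in the last step of claim 3.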
4. The similarity evaluation method according to claim 3, wherein the step of respectively calculating the similarity between each tampered character string and the original character string comprises:
for each tampered character string, calculating an edit distance between the tampered character string and the original character string;
comparing the length of the tampered character string with the length of the original character string, and taking the larger of the two as the maximum character string length;
and calculating the similarity between the tampered character string and the original character string based on the edit distance and the maximum character string length.
5. The similarity evaluation method according to claim 4, wherein the step of calculating the similarity between the tampered character string and the original character string based on the edit distance and the maximum character string length comprises:
calculating the ratio of the edit distance to the maximum character string length;
and subtracting the ratio from 1 to obtain the similarity between the tampered character string and the original character string.
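Claims 4 and 5 together define the similarity as 1 minus the ratio of the edit distance to the longer string's length. A minimal Python sketch is below; the claims do not prescribe a particular edit-distance algorithm, so the single-row dynamic-programming Levenshtein implementation here is one common choice, not necessarily the patented one:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming (single-row variant)."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))  # dp[j] = distance between a[:i] and b[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def similarity(tampered: str, original: str) -> float:
    """Similarity per claims 4-5: 1 - edit_distance / max string length."""
    max_len = max(len(tampered), len(original))
    if max_len == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(tampered, original) / max_len
```

For example, `similarity("kitten", "sitting")` gives 1 - 3/7 ≈ 0.571, since the edit distance is 3 and the longer string has length 7.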
6. The similarity evaluation method according to claim 1, wherein the similarity evaluation model comprises an image feature extraction network, a character feature extraction network, and a similarity calculation network connected to both the image feature extraction network and the character feature extraction network;
the input of the image feature extraction network is the target text image, and its output is the image feature information;
the input of the character feature extraction network is the target character string, and its output is the character feature information;
and the inputs of the similarity calculation network are the image feature information and the character feature information, and its output is the content similarity between the target character string and the target text image.
7. The similarity evaluation method according to claim 6, wherein the image feature extraction network comprises a feature extraction unit and an image information extraction unit; wherein:
the feature extraction unit is used for extracting an image feature vector of the target text image;
and the image information extraction unit is used for extracting the image feature information based on the image feature vector.
8. The similarity evaluation method according to claim 7, wherein the feature extraction unit comprises a residual network, and the image information extraction unit comprises a first long short-term memory network.
9. The similarity evaluation method according to any one of claims 6 to 8, wherein the character feature extraction network comprises an encoding unit and a character information extraction unit; wherein:
the encoding unit is used for encoding the target character string so as to convert the target character string into a character feature vector;
and the character information extraction unit is used for extracting the character feature information based on the character feature vector.
10. The similarity evaluation method according to claim 9, wherein the encoding unit uses a Word2vec algorithm or a GloVe algorithm, and the character information extraction unit comprises a second long short-term memory network.
11. An answer search method, comprising:
acquiring a character string corresponding to a target question whose answer is to be searched;
respectively calculating the content similarity between the character string and the scanned image of each question in a question bank by using the similarity evaluation method of any one of claims 1 to 10, wherein each scanned image is associated with a corresponding answer;
and taking the answer associated with the scanned image having the highest content similarity as the searched answer for the target question.
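The search in claim 11 reduces to an argmax over the question bank. A hedged sketch follows, where `score` stands in for the model's content-similarity output and the `bank` mapping of image identifiers to answers is a hypothetical interface chosen for illustration:

```python
def search_answer(question_str, bank, score):
    """Return the answer whose scanned image is most similar to the query
    string. `bank` maps image id -> associated answer; `score(q, img_id)`
    is a placeholder for the model's content-similarity output."""
    best_id = max(bank, key=lambda img_id: score(question_str, img_id))
    return bank[best_id]
```

In practice the per-image scores would come from the trained similarity evaluation model of claims 1-10 rather than a hand-written function.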
12. The answer search method of claim 11, wherein the step of acquiring the character string corresponding to the target question whose answer is to be searched comprises:
acquiring an answer image uploaded by a user;
and sequentially performing text detection processing and text recognition processing on the answer image to obtain the character string corresponding to the target question in the answer image.
13. The answer search method of claim 12, further comprising:
acquiring answer content corresponding to the target question in the answer image;
and comparing the answer content with the searched answer, and determining the user's answer score for the target question based on the comparison result.
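Claim 13 does not specify the comparison rule, so in the simplest case the scoring step could be a normalized string match. The whitespace/case normalization and binary scoring below are assumptions for illustration only:

```python
def score_answer(user_answer: str, searched_answer: str) -> int:
    """Return 1 if the user's answer matches the searched answer after
    trivial normalization, else 0 (assumed scoring rule, not the patent's)."""
    def norm(s: str) -> str:
        # strip all whitespace and ignore case before comparing
        return "".join(s.split()).lower()
    return int(norm(user_answer) == norm(searched_answer))
```

A production grader would likely use a softer comparison (e.g. the edit-distance similarity of claims 4-5) rather than exact equality.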
14. A similarity evaluation device, comprising:
the target acquisition module is used for acquiring a target character string and a target text image whose content similarity is to be evaluated;
the model input module is used for inputting the target character string and the target text image into a pre-trained similarity evaluation model;
the similarity evaluation module is used for respectively extracting, through the similarity evaluation model, character feature information of the target character string and image feature information of the target text image, and for evaluating the content similarity between the target character string and the target text image based on the character feature information and the image feature information;
wherein the character feature information comprises positional relationships and semantic relationships among character features, and the image feature information comprises positional relationships and semantic relationships among image features.
15. An answer search apparatus, comprising:
the character string acquisition module is used for acquiring a character string corresponding to a target question whose answer is to be searched;
the similarity calculation module is used for respectively calculating the content similarity between the character string and the scanned image of each question in a question bank by using the similarity evaluation method of any one of claims 1 to 10, wherein each scanned image is associated with a corresponding answer;
and the answer determination module is used for taking the answer associated with the scanned image having the highest content similarity as the searched answer for the target question.
16. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the similarity evaluation method according to any one of claims 1 to 10 or the answer search method according to any one of claims 11 to 13.
17. A computer-readable storage medium, characterized in that the storage medium stores a computer program for executing the similarity evaluation method according to any one of claims 1 to 10 or the answer search method according to any one of claims 11 to 13.
CN202110655717.XA 2021-06-11 2021-06-11 Similarity evaluation method, answer search method, device, equipment and medium Active CN113111154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110655717.XA CN113111154B (en) 2021-06-11 2021-06-11 Similarity evaluation method, answer search method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113111154A true CN113111154A (en) 2021-07-13
CN113111154B CN113111154B (en) 2021-10-29

Family

ID=76723443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110655717.XA Active CN113111154B (en) 2021-06-11 2021-06-11 Similarity evaluation method, answer search method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113111154B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967318A (en) * 2017-11-23 2018-04-27 北京师范大学 A kind of Chinese short text subjective item automatic scoring method and system using LSTM neural nets
CN109740077A (en) * 2018-12-29 2019-05-10 北京百度网讯科技有限公司 Answer searching method, device and its relevant device based on semantic indexing
CN109815932A (en) * 2019-02-02 2019-05-28 杭州大拿科技股份有限公司 A kind of paper corrects method, apparatus, electronic equipment and storage medium
CN110110585A (en) * 2019-03-15 2019-08-09 西安电子科技大学 Intelligently reading realization method and system based on deep learning, computer program
CN110675677A (en) * 2019-10-16 2020-01-10 杭州大拿科技股份有限公司 Method and device for assisting mathematics application questions
US20200090539A1 (en) * 2018-08-13 2020-03-19 Hangzhou Dana Technology Inc. Method and system for intelligent identification and correction of questions
CN111026894A (en) * 2019-12-12 2020-04-17 清华大学 Cross-modal image text retrieval method based on credibility self-adaptive matching network
CN111523306A (en) * 2019-01-17 2020-08-11 阿里巴巴集团控股有限公司 Text error correction method, device and system
CN111738269A (en) * 2020-08-25 2020-10-02 北京易真学思教育科技有限公司 Model training method, image processing device, model training apparatus, and storage medium
CN111753767A (en) * 2020-06-29 2020-10-09 广东小天才科技有限公司 Method and device for automatically correcting operation, electronic equipment and storage medium
CN112069349A (en) * 2020-09-15 2020-12-11 杭州大拿科技股份有限公司 Method for automatically filling in answer, electronic device and readable storage medium
CN112650868A (en) * 2020-12-29 2021-04-13 苏州科达科技股份有限公司 Image retrieval method, device and storage medium
CN112712069A (en) * 2021-03-25 2021-04-27 北京易真学思教育科技有限公司 Question judging method and device, electronic equipment and storage medium
CN112801217A (en) * 2021-03-19 2021-05-14 北京世纪好未来教育科技有限公司 Text similarity judgment method and device, electronic equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WDHDMX: "计算字符串的相似度算法" (String Similarity Calculation Algorithm), 《HTTP://WDHDMX.ITEYE.COM/BLOG/1343856》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723367A (en) * 2021-10-27 2021-11-30 北京世纪好未来教育科技有限公司 Answer determining method, question judging method and device and electronic equipment
CN113722466A (en) * 2021-11-03 2021-11-30 北京世纪好未来教育科技有限公司 Correction model training method, correction method, device, electronic equipment and medium
CN113722466B (en) * 2021-11-03 2022-02-15 北京世纪好未来教育科技有限公司 Correction model training method, correction method, device, electronic equipment and medium
CN113850235A (en) * 2021-11-26 2021-12-28 北京世纪好未来教育科技有限公司 Text processing method, device, equipment and medium
CN113850235B (en) * 2021-11-26 2022-03-04 北京世纪好未来教育科技有限公司 Text processing method, device, equipment and medium
WO2023092975A1 (en) * 2021-11-29 2023-06-01 上海商汤智能科技有限公司 Image processing method and apparatus, electronic device, storage medium, and computer program product
CN115292542A (en) * 2022-10-09 2022-11-04 江西风向标智能科技有限公司 Test question searching method, system, storage medium and equipment based on test question matching chart
CN115438215A (en) * 2022-11-08 2022-12-06 苏州浪潮智能科技有限公司 Image-text bidirectional search and matching model training method, device, equipment and medium
CN117315301A (en) * 2023-10-07 2023-12-29 长春理工大学 Collection information matching system and method based on image recognition
CN117315301B (en) * 2023-10-07 2024-04-09 长春理工大学 Collection information matching system and method based on image recognition

Also Published As

Publication number Publication date
CN113111154B (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN113111154B (en) Similarity evaluation method, answer search method, device, equipment and medium
CN110795543B (en) Unstructured data extraction method, device and storage medium based on deep learning
CN109271401B (en) Topic searching and correcting method and device, electronic equipment and storage medium
KR102260553B1 (en) Method for recommending related problem based on meta data
CN109583429B (en) Method and device for correcting application questions in test paper in batches
KR20200098379A (en) Method, apparatus, device and readable storage medium for image-based data processing
CN110175609B (en) Interface element detection method, device and equipment
CN109189895B (en) Question correcting method and device for oral calculation questions
CN113688889A (en) Abnormality detection method, abnormality detection device, electronic apparatus, and computer-readable storage medium
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN113177435A (en) Test paper analysis method and device, storage medium and electronic equipment
CN113469214A (en) False news detection method and device, electronic equipment and storage medium
CN110852071B (en) Knowledge point detection method, device, equipment and readable storage medium
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN114239805A (en) Cross-modal retrieval neural network, training method and device, electronic equipment and medium
Lizunov et al. Development of the combined method of identification of near duplicates in electronic scientific works
CN113505786A (en) Test question photographing and judging method and device and electronic equipment
JP2019212034A5 (en)
US20200294410A1 (en) Methods, systems, apparatuses and devices for facilitating grading of handwritten sheets
KR102058393B1 (en) Sketch-based media plagiarism inspection method and apparatus
CN113239908B (en) Question processing method, device, equipment and medium
CN110032716A (en) Character coding method and device, readable storage medium storing program for executing and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant