CN112633420A - Image similarity determination and model training method, device, equipment and medium


Info

Publication number
CN112633420A
Authority
CN
China
Prior art keywords: handwritten character, model, image, training, image similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110252999.9A
Other languages
Chinese (zh)
Other versions
CN112633420B (en)
Inventor
张蓓蓓 (Zhang Beibei)
秦勇 (Qin Yong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yizhen Xuesi Education Technology Co Ltd
Original Assignee
Beijing Yizhen Xuesi Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhen Xuesi Education Technology Co Ltd
Priority to CN202110252999.9A
Publication of CN112633420A
Application granted
Publication of CN112633420B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/08 Neural networks; Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/10 Character recognition
    • G06V 30/36 Digital ink; Matching; Classification


Abstract

The disclosure relates to the technical field of image processing, and discloses an image similarity determination and model training method, apparatus, device, and medium. The method comprises the following steps: collecting a plurality of handwritten character images, and performing a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set, where the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics; and training a preset neural network model based on the training sample set to generate an image similarity determination model. With this technical scheme, a small number of collected handwritten character images are augmented according to the principles of writing kinematics to obtain a large number of training samples, improving both model training efficiency and the model's recognition accuracy on handwritten character images.

Description

Image similarity determination and model training method, device, equipment and medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for image similarity determination and model training.
Background
Character image similarity evaluation is a specific instance of image similarity evaluation; a character image is an image whose content consists of characters, letters, or digits. Character image similarity evaluation is highly valuable in many applications, such as handwriting comparison, photo-based question search, and electronic homework correction. Through character image similarity evaluation, the handwritten characters in an image can be recognized and then compared with candidate standard character images to obtain their similarity, providing strong prior knowledge for subsequent operations such as handwriting comparison, answer grading, and search.
Conventional image similarity evaluation mostly relies on deep learning: a neural network model extracts image features, making full use of the numerical and semantic information of the image, and a loss function is optimized through the back-propagation algorithm. Once the loss function has been optimized to a sufficiently good value, the network yields good similarity evaluations of the images.
However, given the differences between a writer's handwriting and standard type, and cases where handwritten characters become extremely difficult or impossible to recognize because of arbitrary operations such as altering, overwriting, or smearing by the writer, a deep-learning-based neural network model recognizes such characters with low confidence. Improving model accuracy on hard-to-recognize handwritten characters requires a large number of training samples, yet collecting training samples of hard-to-recognize handwritten characters is difficult and labor-intensive, so model training efficiency is low.
Disclosure of Invention
To solve the above technical problem or at least partially solve the above technical problem, the present disclosure provides an image similarity determination and model training method, apparatus, device, and medium.
In a first aspect, the present disclosure provides an image similarity determination model training method, including:
collecting a plurality of handwritten character images, and performing a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set; wherein the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics;
and training a preset neural network model based on the training sample set to generate an image similarity determination model.
In some embodiments, performing the data augmentation operation based on each handwritten character image and the preset data augmentation model to generate the training sample set includes:
inputting each handwritten character image, together with set parameter values of the character deformation adjustment parameters, into the preset data augmentation model to generate augmented character images of the corresponding handwritten character images; wherein the preset data augmentation model is composed of a first preset number of log-Gaussian function response signals, and each log-Gaussian function response signal controls its degree of deformation through a second preset number of character deformation adjustment parameters;
constructing the training sample set from each of the handwritten character images and each of the augmented character images.
In some embodiments, training the preset neural network model based on the training sample set to generate the image similarity determination model includes:
constructing a positive sample set containing positive samples and a negative sample set containing negative samples based on the training sample set and a positive-to-negative sample ratio; wherein each positive sample includes any one easily recognizable handwritten character image and a third preset number of hard-to-recognize handwritten character images whose characters are the same as those in the corresponding easily recognizable handwritten character image; and each negative sample includes any one easily recognizable handwritten character image and the third preset number of other handwritten character images, where the other handwritten character images include easily recognizable and/or hard-to-recognize handwritten character images whose characters differ from those in the corresponding easily recognizable handwritten character image;
and training the preset neural network model based on the positive samples and the negative samples to generate the image similarity determination model.
In some embodiments, the preset neural network model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers, where the fourth preset number is the third preset number plus 1; the similarity estimation sub-network contains 3 fully connected layers and a loss function, and the number of nodes in the last fully connected layer is the third preset number.
In some embodiments, training the preset neural network model based on the positive samples and the negative samples to generate the image similarity determination model includes:
inputting the easily recognizable handwritten character image of any sample in the training sample set into a first branch of the preset neural network model, inputting the remaining handwritten character images of that sample into the remaining branches of the preset neural network model respectively, running the preset neural network model, and outputting a model training result; wherein the sample is a positive sample or a negative sample, and the model training result consists of the third preset number of image similarity values;
determining a loss value using the loss function based on the model training result, the sample type of the sample, and the training reference value of that sample type; wherein the sample type is a positive sample type or a negative sample type;
and if the loss value or the number of training iterations does not satisfy the model convergence condition, adjusting the model parameters of the preset neural network model based on the loss value and continuing the model training process until the loss value or the number of training iterations satisfies the model convergence condition, thereby generating the image similarity determination model.
In some embodiments, the determining, using the loss function, a loss value based on the model training result, the sample type of the sample, and the training reference value for the sample type includes:
if the sample type of the sample is the positive sample type, determining the loss value by using the loss function based on the minimum image similarity value in the model training result and the training reference value of the positive sample type;
and if the sample type of the sample is the negative sample type, determining the loss value by using the loss function based on the maximum image similarity value in the model training result and the training reference value of the negative sample type.
In some embodiments, the handwritten character images are handwritten digit images and the third preset number is 9.
In a second aspect, the present disclosure provides an image similarity determining method, including:
acquiring an easily recognizable handwritten character image and a hard-to-recognize handwritten character image;
inputting the easily recognizable handwritten character image into a first branch of an image similarity determination model, inputting the hard-to-recognize handwritten character image into each remaining branch of the image similarity determination model respectively, running the image similarity determination model, and outputting a plurality of image similarity values; wherein the image similarity determination model is trained in advance using the image similarity determination model training method according to any embodiment of the present disclosure, and the number of image similarity values is one less than the number of branches of the image similarity determination model;
and determining a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on the image similarity values.
In some embodiments, determining the target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on the image similarity values includes:
determining the mean of the image similarity values and taking the mean as the target image similarity value.
In some embodiments, the image similarity determination model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers; the similarity estimation sub-network contains 3 fully connected layers and a loss function, and the number of nodes in the last fully connected layer is the fourth preset number minus 1.
In some embodiments, the easily recognizable handwritten character image and the hard-to-recognize handwritten character image are both handwritten digit images, and the fourth preset number is 10.
In a third aspect, the present disclosure provides an image similarity determination model training apparatus, including:
the training sample set generation module, configured to collect a plurality of handwritten character images and perform a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set; wherein the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics;
and the model training module is used for training a preset neural network model based on the training sample set to generate an image similarity determination model.
In some embodiments, the training sample set generating module is specifically configured to:
inputting each handwritten character image, together with set parameter values of the character deformation adjustment parameters, into the preset data augmentation model to generate augmented character images of the corresponding handwritten character images; wherein the preset data augmentation model is composed of a first preset number of log-Gaussian function response signals, and each log-Gaussian function response signal controls its degree of deformation through a second preset number of character deformation adjustment parameters;
constructing the training sample set from each of the handwritten character images and each of the augmented character images.
In some embodiments, the model training module comprises:
the positive and negative sample set construction submodule, configured to construct a positive sample set containing positive samples and a negative sample set containing negative samples based on the training sample set and a positive-to-negative sample ratio; wherein each positive sample includes any one easily recognizable handwritten character image and a third preset number of hard-to-recognize handwritten character images whose characters are the same as those in the corresponding easily recognizable handwritten character image; and each negative sample includes any one easily recognizable handwritten character image and the third preset number of other handwritten character images, where the other handwritten character images include easily recognizable and/or hard-to-recognize handwritten character images whose characters differ from those in the corresponding easily recognizable handwritten character image;
and the model training submodule is used for training the preset neural network model based on each positive sample and each negative sample to generate the image similarity determination model.
In some embodiments, the preset neural network model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers, where the fourth preset number is the third preset number plus 1; the similarity estimation sub-network contains 3 fully connected layers and a loss function, and the number of nodes in the last fully connected layer is the third preset number.
In some embodiments, the model training submodule comprises:
the model training result output unit, configured to input the easily recognizable handwritten character image of any sample in the training sample set into a first branch of the preset neural network model, input the remaining handwritten character images of that sample into the remaining branches respectively, and run the preset neural network model to output a model training result; wherein the sample is a positive sample or a negative sample, and the model training result consists of the third preset number of image similarity values;
the loss value determination unit, configured to determine a loss value using the loss function based on the model training result, the sample type of the sample, and the training reference value of that sample type; wherein the sample type is a positive sample type or a negative sample type;
and the model parameter adjustment unit, configured to, if the loss value or the number of training iterations does not satisfy the model convergence condition, adjust the model parameters of the preset neural network model based on the loss value and continue the model training process until the loss value or the number of training iterations satisfies the model convergence condition, generating the image similarity determination model.
In some embodiments, the loss value determining unit is specifically configured to:
if the sample type of the sample is the positive sample type, determining the loss value by using the loss function based on the minimum image similarity value in the model training result and the training reference value of the positive sample type;
and if the sample type of the sample is the negative sample type, determining the loss value by using the loss function based on the maximum image similarity value in the model training result and the training reference value of the negative sample type.
In some embodiments, the handwritten character images are handwritten digit images and the third preset number is 9.
In a fourth aspect, the present disclosure provides an image similarity determination apparatus, comprising:
the handwritten character image acquisition module, configured to acquire an easily recognizable handwritten character image and a hard-to-recognize handwritten character image;
the image similarity value output module, configured to input the easily recognizable handwritten character image into a first branch of the image similarity determination model, input the hard-to-recognize handwritten character image into each remaining branch respectively, run the image similarity determination model, and output a plurality of image similarity values; wherein the image similarity determination model is trained in advance using the image similarity determination model training method according to any embodiment of the present disclosure, and the number of image similarity values is one less than the number of branches of the image similarity determination model;
and the target image similarity value determination module, configured to determine a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on the image similarity values.
In some embodiments, the target image similarity value determination module is specifically configured to:
determining the mean of the image similarity values and taking the mean as the target image similarity value.
In some embodiments, the image similarity determination model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers; the similarity estimation sub-network contains 3 fully connected layers and a loss function, and the number of nodes in the last fully connected layer is the fourth preset number minus 1.
In some embodiments, the easily recognizable handwritten character image and the hard-to-recognize handwritten character image are both handwritten digit images, and the fourth preset number is 10.
In a fifth aspect, the present disclosure provides an electronic device, including:
a processor and a memory;
the processor is configured to perform the steps of the method of any embodiment of the present disclosure by calling a program or instructions stored in the memory.
In a sixth aspect, the present disclosure provides a computer-readable storage medium storing a program or instructions for causing a computer to perform the steps of the method described in any embodiment of the present disclosure.
According to the technical solution provided by the embodiments of the present disclosure, a plurality of handwritten character images are collected and a data augmentation operation is performed based on each handwritten character image and a preset data augmentation model to generate a training sample set, where the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images and the preset data augmentation model is constructed on the basis of writing kinematics; a preset neural network model is then trained based on the training sample set to generate an image similarity determination model. Augmenting a small number of collected handwritten character images according to the principles of writing kinematics yields a large number of training samples, which avoids both the labor cost of manually collecting large numbers of hard-to-recognize handwritten character images and the low model accuracy caused by too few samples, thereby improving model training efficiency and the model's recognition accuracy on handwritten character images.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below; obviously, those skilled in the art can derive other drawings from these drawings without inventive effort.
Fig. 1 is a flowchart of an image similarity determination model training method provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of another image similarity determination model training method provided by an embodiment of the present disclosure;
fig. 3 is a flowchart of an image similarity determining method provided by an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an image similarity determination model training device provided in an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an image similarity determination apparatus provided in an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be described in further detail below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
The image similarity determination model training method provided by the embodiments of the present disclosure is mainly applicable to training similarity determination models for handwritten character images (e.g., digits, letters, symbols, and Chinese characters), and in particular to training similarity determination models for hard-to-recognize handwritten characters. The method may be executed by an image similarity determination model training apparatus, which may be implemented in software and/or hardware and integrated in an electronic device with large-scale image processing capability, such as a notebook computer, desktop computer, server, supercomputer, or server cluster.
Fig. 1 is a flowchart of an image similarity determination model training method provided in an embodiment of the present disclosure. Referring to fig. 1, the image similarity determination model training method specifically includes:
s110, collecting a plurality of handwritten character images, and performing data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set.
A handwritten character image is an image whose characters are handwritten. Handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images. An easily recognizable handwritten character image is one whose characters are clear and easy to recognize; a hard-to-recognize handwritten character image is one whose handwritten characters are extremely difficult or impossible to recognize due to arbitrary operations such as altering, overwriting, or smearing by the writer. The preset data augmentation model is constructed on the basis of writing kinematics, i.e., obtained by modeling writing kinematics, and is used to perform a data augmentation operation on input images to obtain more images. Writing kinematics describes the laws governing rapid human writing.
Specifically, a number of easily recognizable handwritten character images and a number of hard-to-recognize handwritten character images are collected in advance as the basic image samples for model training. These basic image samples are then input one by one into the preset data augmentation model, which outputs corresponding augmented images, thereby generating a training sample set containing a large number of images.
In some embodiments, performing the data augmentation operation based on each handwritten character image and the preset data augmentation model to generate the training sample set includes: inputting each handwritten character image, together with set parameter values of the character deformation adjustment parameters, into the preset data augmentation model to generate augmented character images of the corresponding handwritten character images; and constructing the training sample set from the handwritten character images and the augmented character images.
Here, the preset data augmentation model is composed of a first preset number of log-Gaussian function response signals, and each log-Gaussian function response signal controls its degree of deformation through a second preset number of character deformation adjustment parameters. A deformation adjustment parameter is a parameter for adjusting the deformation of handwritten content; a character deformation adjustment parameter adjusts the deformation of a handwritten character. A set parameter value is a parameter value obtained by introducing a certain degree of jitter on the basis of the initial (default) value of a character deformation adjustment parameter. In some embodiments, several groups of set parameter values are preset, each group corresponding to the group of character deformation adjustment parameters. In other embodiments, the set parameter values are not preset; instead, a group of set parameter values is determined during the model run by introducing random values for each character deformation adjustment parameter.
Specifically, writing kinematics holds that during rapid writing the pen-tip velocity signal is formed by superimposing several overlapping log-Gaussian function response signals, each of which controls the shape of its signal through several deformation adjustment parameters. On this basis, the preset data augmentation model of the embodiments of the present disclosure can be constructed as follows: a first preset number (for example, 6) of log-Gaussian function response signals are superimposed, and each log-Gaussian function response signal is controlled by the parameter values of a second preset number (for example, 6) of character deformation adjustment parameters; for example, the preset data augmentation model may be implemented as a SynSig2Vec model. In a specific implementation, a handwritten character image and a group of set parameter values are input into the preset data augmentation model, and the model outputs a handwritten character image with character deformation introduced (i.e., an augmented character image). The degree of character deformation depends on the degree of signal jitter caused by the group of set parameter values: the larger the jitter, the larger the deformation of the handwritten character. By switching among different groups of set parameter values, different deformation degrees can be introduced into a handwritten character image, generating different augmented character images for it; likewise, by varying the handwritten character images and the groups of set parameter values, different augmented character images can be generated for every handwritten character image. The training sample set can thus be constructed from all handwritten character images and all augmented character images. Note that an augmented character image derived from an easily recognizable handwritten character image is itself an easily recognizable handwritten character image, and one derived from a hard-to-recognize handwritten character image is itself a hard-to-recognize handwritten character image.
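For illustration, the following is a minimal sketch of this kind of kinematics-based augmentation, assuming the Sigma-Lognormal formulation commonly used in writing-kinematics models; the function names, the parameter set (D, t0, mu, sigma), the jitter scheme, and the use of 6 components are illustrative assumptions, not the patent's exact implementation:
```python
import numpy as np

def lognormal_velocity(t, D, t0, mu, sigma):
    """Speed profile of one log-Gaussian (lognormal) response signal.

    In the full Sigma-Lognormal model each component also carries start/end
    angle parameters, giving 6 deformation adjustment parameters per signal.
    """
    v = np.zeros_like(t)
    m = t > t0
    dt = t[m] - t0
    v[m] = D / (sigma * np.sqrt(2 * np.pi) * dt) * np.exp(
        -((np.log(dt) - mu) ** 2) / (2 * sigma ** 2))
    return v

def jitter(params, rel_std=0.05, rng=None):
    """Introduce a small random perturbation ("jitter") around the default
    parameter values; larger jitter means larger character deformation."""
    rng = rng or np.random.default_rng()
    return {k: v * (1 + rel_std * rng.standard_normal()) for k, v in params.items()}

# Superimpose a first preset number (here 6) of response signals to form the
# pen-tip speed signal of one augmented writing sample.
t = np.linspace(0.01, 1.5, 300)
strokes = [dict(D=1.0, t0=0.05 + 0.15 * i, mu=-1.6, sigma=0.3) for i in range(6)]
speed = sum(lognormal_velocity(t, **jitter(s)) for s in strokes)
```
Reconstructing a deformed character image additionally requires each component's angular parameters to trace the trajectory; the patent names SynSig2Vec as one concrete realization.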
S120, training the preset neural network model based on the training sample set to generate an image similarity determination model.
Specifically, the preset neural network model is iteratively trained with the character images in the training sample set so as to continuously adjust its model parameters until training finishes and the final model parameters are obtained. Applying the final model parameters to the preset neural network model yields the image similarity determination model. It can be understood that the preset neural network model and the image similarity determination model share the same model architecture and differ only in model parameters.
In some embodiments, the preset neural network model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, and each branch comprises 5 convolutional layers and 3 pooling layers; the fourth preset number is the third preset number plus 1; the similarity estimation sub-network comprises 3 fully-connected layers and a loss function, and the number of nodes of the last fully-connected layer is a third preset number.
Specifically, the model architecture of the preset neural network model is similar to that of the MatchNet model and comprises a feature extraction sub-network and a similarity estimation sub-network. The feature extraction sub-network extracts feature information from the images, such as numerical information and semantic information; the similarity estimation sub-network integrates the feature information of the images and evaluates the similarity values between them. The feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers; weights (i.e., model parameters) may or may not be shared among the branches. The similarity estimation sub-network contains 3 fully connected layers and a loss function, with a softmax function following the 3rd fully connected layer. Since the similarity estimation sub-network evaluates the image similarity between pairs of images and each image corresponds to one branch of the feature extraction sub-network, the number of output nodes of the last fully connected layer is the third preset number, which is 1 less than the fourth preset number. The fourth preset number is at least greater than 2; the larger it is, the more branches the model has, the more images participate in each round of training, and the higher the model's recognition accuracy. However, the model's running speed must also be considered: the more branches, the greater the computation and the slower the model runs. A suitable value of the fourth/third preset number should therefore be chosen by balancing recognition accuracy against running speed according to the business requirements.
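A minimal PyTorch sketch of such a multi-branch network follows. The branch count, the 5-conv/3-pool layout, the 3 fully connected layers, and the trailing softmax come from the description above; the shared branch weights, 32x32 single-channel inputs, channel sizes, and hidden widths are assumptions:
```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One feature-extraction branch: 5 convolutional and 3 pooling layers,
    mirroring the patent's description (channel sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # pooling layer 1
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # pooling layer 2
            nn.Conv2d(64, 96, 3, padding=1), nn.ReLU(),
            nn.Conv2d(96, 96, 3, padding=1), nn.ReLU(),
            nn.Conv2d(96, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # pooling layer 3
            nn.Flatten())
    def forward(self, x):
        return self.net(x)

class SimilarityModel(nn.Module):
    """Multi-branch similarity network; the last fully connected layer has
    (num_branches - 1) output nodes, one per image similarity value."""
    def __init__(self, num_branches=10, feat_dim=64 * 4 * 4):
        super().__init__()
        self.branch = Branch()            # weights shared across branches here
        self.num_branches = num_branches
        self.fc = nn.Sequential(
            nn.Linear(num_branches * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, num_branches - 1))
    def forward(self, images):            # images: (batch, num_branches, 1, 32, 32)
        feats = [self.branch(images[:, i]) for i in range(self.num_branches)]
        scores = self.fc(torch.cat(feats, dim=1))
        return torch.softmax(scores, dim=1)   # patent places softmax after FC3

model = SimilarityModel()
sims = model(torch.rand(2, 10, 1, 32, 32))    # -> (2, 9) similarity values
```
With 10 branches the network outputs 9 similarity values per input group, matching the embodiment described below.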
In some embodiments, S120 comprises: constructing a positive sample set containing each positive sample and a negative sample set containing each negative sample based on the training sample set and the positive and negative sample proportion; and training a preset neural network model based on the positive samples and the negative samples to generate an image similarity determination model.
Here, each positive sample includes any one easily recognizable handwritten character image and a third preset number of hard-to-recognize handwritten character images whose characters are the same as those in the easily recognizable handwritten character image. Each negative sample includes any one easily recognizable handwritten character image and a third preset number of other handwritten character images; the other handwritten character images may be easily recognizable and/or hard-to-recognize handwritten character images, and their characters differ from those in the corresponding easily recognizable handwritten character image. That is, the number of handwritten character images in a positive or negative sample equals the number of branches in the preset neural network model, and the first handwritten character image is always an easily recognizable one. The difference is that in a positive sample the remaining third-preset-number images contain the same character as the first image and are all hard-to-recognize, whereas in a negative sample the remaining images may be hard-to-recognize, easily recognizable, or a mixture of both, and contain characters different from those in the first easily recognizable handwritten character image.
Specifically, with a positive-to-negative sample ratio of, for example, 1:3, the images in the training sample set are combined to generate a number of positive samples forming the positive sample set and a number of negative samples forming the negative sample set. The positive and negative samples are then used as input image groups for the preset neural network model, the model is trained, and the image similarity determination model is generated. The advantage of this setup is that model training can use more comprehensive data, and the relatively larger amount of negative-sample data participating in training further ensures model accuracy.
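A sketch of this sample construction, assuming the training sample set is grouped by character label into easily recognizable and hard-to-recognize images (the data layout, helper name, and with-replacement sampling are illustrative assumptions; the 1:3 positive-to-negative ratio follows the example above):
```python
import random

def build_samples(easy, hard, n_per_sample=9, neg_ratio=3):
    """Construct positive/negative sample groups from a training sample set.

    easy / hard: dicts mapping a character label to a list of easily
    recognizable / hard-to-recognize character images. Each sample is one
    easy image followed by n_per_sample companion images (assumed layout).
    """
    positives, negatives = [], []
    labels = list(easy)
    for label in labels:
        for anchor in easy[label]:
            # positive: same character, hard-to-recognize companions
            positives.append([anchor] + random.choices(hard[label], k=n_per_sample))
            # negatives: different characters, easy and/or hard companions
            for _ in range(neg_ratio):
                pool = [img for l in labels if l != label
                        for img in easy[l] + hard[l]]
                negatives.append([anchor] + random.choices(pool, k=n_per_sample))
    return positives, negatives
```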
According to the technical scheme of this embodiment, a plurality of handwritten character images are collected, and a data augmentation operation is performed based on each handwritten character image and a preset data augmentation model to generate a training sample set, where the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images and the preset data augmentation model is constructed on the basis of writing kinematics; a preset neural network model is then trained based on the training sample set to generate an image similarity determination model. Augmenting a small number of collected handwritten character images according to the principles of writing kinematics yields a large number of training samples, avoiding both the labor cost of manually collecting large numbers of hard-to-recognize handwritten character images and the low model accuracy caused by too few samples, and thereby improving model training efficiency and the model's recognition accuracy on handwritten character images.
Fig. 2 is a flowchart of another image similarity determination model training method provided by an embodiment of the present disclosure. This embodiment elaborates the image similarity determination model training method by taking the handwritten character images to be handwritten digit images and the third preset number to be 9, i.e., the image similarity determination model has 10 branches. On this basis, training the preset neural network model based on the positive and negative samples to generate the image similarity determination model can be further optimized. Explanations of terms identical or corresponding to those in the above embodiments are omitted. Referring to fig. 2, the image similarity determination model training method includes:
s210, collecting a plurality of handwritten character images.
Specifically, some easily recognizable handwritten digit images covering the digits 0-9 and some hard-to-recognize handwritten digit images covering the digits 0-9 are collected.
S220, inputting each handwritten character image, together with set parameter values of the character deformation adjustment parameters, into the preset data augmentation model to generate augmented character images of the corresponding handwritten character images.
S230, constructing the training sample set from the handwritten character images and the augmented character images.
S240, based on the training sample set and the positive and negative sample proportion, a positive sample set containing each positive sample and a negative sample set containing each negative sample are constructed.
Specifically, a positive sample contains 1 easily recognizable handwritten digit image and 9 randomly selected hard-to-recognize handwritten digit images containing the same digit. A negative sample contains 1 easily recognizable handwritten digit image and 9 randomly selected easily recognizable and/or hard-to-recognize handwritten digit images containing other digits, i.e., digits different from the digit in the corresponding easily recognizable handwritten digit image.
S250, inputting the easily recognizable handwritten character image of any sample in the training sample set into a first branch of the preset neural network model, inputting the remaining handwritten character images of that sample into the remaining branches respectively, running the preset neural network model, and outputting a model training result.
Specifically, one sample, which may be a positive or a negative sample, is selected for each training pass. The first, easily recognizable handwritten digit image of the selected sample is input into the first branch of the preset neural network model, the remaining 9 handwritten digit images are input into the remaining 9 branches respectively, and the model computes an image similarity value between every two adjacent branches, yielding a model training result containing 9 image similarity values.
S260, determining a loss value using the loss function based on the model training result, the sample type of the sample, and the training reference value of that sample type.
The sample type is a positive sample type or a negative sample type. The training reference value is the "similarity true value" corresponding to the input sample: if the input sample is a positive sample, any output image similarity value should theoretically be 1; conversely, if the input sample is a negative sample, any output image similarity value should theoretically be 0. In this embodiment, the training reference value of the positive sample type is 1 and that of the negative sample type is 0.
Specifically, to optimize the model parameters, a loss value must be computed in each training pass and used for error back-propagation. In a specific implementation, the training reference value is determined from the sample type of the input sample, and the loss value is then calculated from the 9 image similarity values in the model training result and that training reference value.
In some embodiments, S260 comprises: if the sample type of the sample is the positive sample type, determining a loss value by using a loss function based on the minimum image similarity value in the model training result and the training reference value of the positive sample type; and if the sample type of the sample is a negative sample type, determining a loss value by using a loss function based on the maximum image similarity value in the model training result and the training reference value of the negative sample type.
Specifically, in order to maximize the difference between the image similarity corresponding to the positive samples and the image similarity corresponding to the negative samples, the loss value is calculated in the present embodiment using a loss function as shown below:
[Loss function formula, given as an image in the original: the positive-sample branch is a function of P_min and the reference value 1; the negative-sample branch is a function of P_max and the reference value 0.]
where Loss is the loss value, P_min is the minimum image similarity value, and P_max is the maximum image similarity value.
If the input sample is a positive sample, the smallest of the 9 image similarity values is selected and the loss value is computed with the positive-sample branch of the loss function; if the input sample is a negative sample, the largest of the 9 image similarity values is selected and the negative-sample branch is used. Selecting the worst-case value in each case maximizes the computed loss, and since the iteration target drives the loss toward 0, the optimization result of model training is that the image similarity values of positive samples grow ever larger while those of negative samples grow ever smaller; this also accelerates model convergence and thereby further improves model training efficiency.
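The formula itself appears only as an image in the original, so the following sketch uses a squared-error form that is consistent with the surrounding description (worst-case similarity value compared against the training reference value of 1 or 0) but is an assumption rather than the patent's verbatim loss:
```python
import torch

def worst_case_loss(sims, is_positive):
    """Hedged reconstruction of the patent's loss; the squared-error form
    is an assumption, since the original formula is shown only as an image.

    sims: tensor of shape (9,) holding the image similarity values.
    Positive samples compare their minimum similarity against reference 1;
    negative samples compare their maximum similarity against reference 0.
    """
    if is_positive:
        return (sims.min() - 1.0) ** 2   # drives P_min toward 1
    return (sims.max() - 0.0) ** 2       # drives P_max toward 0
```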
S270, if the loss value or the number of training iterations does not satisfy the model convergence condition, adjusting the model parameters of the preset neural network model based on the loss value.
The model convergence condition is a preset convergence condition; it may be that the loss value is smaller than a set threshold, that the number of training iterations reaches a set count, or a combination of both.
Specifically, if model training does not satisfy the model convergence condition, the loss value is used for error back-propagation so as to adjust the model parameters of the preset neural network model.
S280, continuing the model training process until the loss value or the number of training iterations satisfies the model convergence condition, and generating the image similarity determination model.
Specifically, after the model parameters are adjusted, the next model training pass is carried out, repeating S250 to S280 until the model convergence condition is satisfied; the training process then ends, yielding the image similarity determination model.
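A compact sketch of the S250-S280 loop, reusing worst_case_loss from the previous sketch; the Adam optimizer, learning rate, and threshold values are assumptions, since the patent specifies only the two convergence conditions (loss value and number of training iterations):
```python
import torch

def train(model, samples, labels, lr=1e-3, max_iters=10000, loss_threshold=1e-3):
    """samples: tensors of shape (10, 1, 32, 32); labels: True for positive."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for it in range(max_iters):                        # iteration-count condition
        last_loss = float("inf")
        for images, is_positive in zip(samples, labels):
            sims = model(images.unsqueeze(0))[0]       # S250: 9 similarity values
            loss = worst_case_loss(sims, is_positive)  # S260: worst-case loss
            opt.zero_grad()
            loss.backward()                            # S270: error back-propagation
            opt.step()
            last_loss = loss.item()
        if last_loss < loss_threshold:                 # loss-value condition
            return model                               # S280: converged
    return model
```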
According to the technical scheme of this embodiment, setting the image similarity determination model to 10 branches increases the number of images participating in each similarity evaluation, strengthens the model's feature extraction on handwritten character images, and thus further improves its recognition accuracy on handwritten character images. The easily recognizable handwritten character image of any sample in the training sample set is input into the first branch of the preset neural network model, the remaining handwritten character images of the sample are input into the remaining branches respectively, the model is run, and a model training result is output, where the sample is a positive or negative sample and the result consists of the third preset number of image similarity values. A loss value is then determined with the loss function based on the model training result, the sample type (positive or negative), and the training reference value of that type. If the loss value or the number of training iterations does not satisfy the model convergence condition, the model parameters are adjusted based on the loss value and training continues until the condition is satisfied, generating the image similarity determination model. Determining the loss with a suitable loss function and iterating in this way accelerates convergence of the loss function during training and thus further improves model training efficiency.
The image similarity determining method provided by the embodiments of the present disclosure is mainly applicable to determining the similarity of handwritten character images (e.g., digits, letters, symbols, and Chinese characters), and in particular to the similarity determination of hard-to-recognize handwritten character images. The method may be executed by an image similarity determining apparatus, which may be implemented in software and/or hardware and integrated in an electronic device with image processing capability, such as a mobile phone, handheld computer, tablet computer, notebook computer, desktop computer, server, or supercomputer.
Fig. 3 is a flowchart of an image similarity determining method according to an embodiment of the present disclosure. Referring to fig. 3, the image similarity determining method specifically includes:
s310, acquiring an easily-recognized handwritten character image and a difficultly-recognized handwritten character image.
Specifically, an image of the hard-to-recognize handwritten character to be recognized and an image of the easy-to-recognize handwritten character to be matched are obtained.
S320, inputting the easily-identified handwritten character image into a first branch of the image similarity determination model, respectively inputting the difficultly-identified handwritten character image into each remaining branch of the image similarity determination model, operating the image similarity determination model, and outputting a plurality of image similarity values.
The image similarity determination model is trained in advance using the image similarity determination model training method provided by any embodiment of the present disclosure. In some embodiments, the image similarity determination model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each containing 5 convolutional layers and 3 pooling layers; the similarity estimation sub-network contains 3 fully connected layers and a loss function, and the number of nodes in the last fully connected layer is the fourth preset number minus 1. Since the image similarity model computes similarity values between the images of every two adjacent model branches, the number of image similarity values is 1 less than the number of branches of the image similarity determination model.
Specifically, the easily recognizable handwritten character image is input into the first branch of the trained image similarity determination model, the hard-to-recognize handwritten character image is input into each of the remaining branches, and the model computes a plurality of image similarity values between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image.
In some embodiments, the easily recognizable handwritten character image and the hard-to-recognize handwritten character image are both handwritten digit images, and the fourth preset number is 10. In this case the image similarity determination model has 10 branches, so 9 image similarity values between the easily recognizable handwritten digit image and the hard-to-recognize handwritten digit image are obtained.
S330, determining a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on the image similarity values.
Specifically, the image similarity values may be processed by taking the maximum, the mean, or the median, by applying a threshold, or the like, so as to obtain one or more image similarity values as the target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image.
In some embodiments, S330 includes: determining the mean of the image similarity values and taking the mean as the target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image. Specifically, to avoid both the subjective interference introduced by a manually set threshold and the insufficiently smooth model outputs produced by taking the maximum or the median, this embodiment averages the image similarity values and uses the resulting mean as the target image similarity value.
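An inference sketch using the SimilarityModel sketched earlier: the easily recognizable image goes to the first branch, copies of the hard-to-recognize image fill the remaining branches per S320, and the mean of the output values is returned per S330 (tensor shapes are assumptions):
```python
import torch

def target_similarity(model, easy_img, hard_img, num_branches=10):
    """Branch 1 gets the easily recognizable image; the hard-to-recognize
    image is duplicated across the remaining branches. Returns the mean of
    the (num_branches - 1) output image similarity values."""
    images = torch.stack([easy_img] + [hard_img] * (num_branches - 1))
    with torch.no_grad():
        sims = model(images.unsqueeze(0))[0]  # (num_branches - 1,) values
    return sims.mean().item()

# usage (assumed 1x32x32 inputs, matching the earlier sketch):
# score = target_similarity(model, easy_img, hard_img)
```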
According to the technical solution of this embodiment of the disclosure, an easily recognizable handwritten character image and a hard-to-recognize handwritten character image are obtained; the easily recognizable image is input into the first branch of the image similarity determination model, the hard-to-recognize image is input into each remaining branch, and the model, trained in advance with the image similarity determination model training method of any embodiment of the disclosure, is run to output a plurality of image similarity values, their number being 1 less than the number of branches; and a target image similarity value between the two images is determined based on the image similarity values. The similarity between easily recognizable and hard-to-recognize handwritten character images is thus evaluated with a pre-trained multi-branch image similarity determination model, improving both the similarity evaluation accuracy for handwritten character images and the recognition accuracy for hard-to-recognize handwritten character images.
Fig. 4 is a schematic structural diagram of an image similarity determination model training device according to an embodiment of the present disclosure. Referring to fig. 4, the apparatus specifically includes:
a training sample set generating module 410, configured to collect a plurality of handwritten character images and perform a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set; the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics;
and the model training module 420 is configured to train a preset neural network model based on the training sample set, and generate an image similarity determination model.
In some embodiments, the training sample set generation module 410 is specifically configured to:
inputting each handwritten character image and set values of the character deformation adjustment parameters into the preset data augmentation model to generate augmented character images of the corresponding handwritten character image; the preset data augmentation model is composed of a first preset number of log-Gaussian function response signals, and each log-Gaussian function response signal controls the degree of deformation through a second preset number of character deformation adjustment parameters (a code sketch follows below);
and constructing the training sample set from each handwritten character image and each augmented character image.
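The disclosure does not give the augmentation model in closed form here, but a log-Gaussian (lognormal) response signal of the kind used in writing-kinematics models can be sketched as below. Only the ideas that a preset number of log-Gaussian response signals are summed and that perturbing their deformation adjustment parameters deforms the written character follow the text; the six per-signal parameters and the direction-steering scheme are illustrative assumptions.

import numpy as np

def lognormal_velocity(t, D, t0, mu, sigma):
    # Speed profile of one log-Gaussian response signal; D scales the stroke,
    # t0 is its onset, and mu/sigma set its log-time delay and spread.
    # Jittering these deformation adjustment parameters changes the stroke shape.
    v = np.zeros_like(t)
    m = t > t0
    tau = t[m] - t0
    v[m] = (D / (sigma * tau * np.sqrt(2.0 * np.pi))
            * np.exp(-(np.log(tau) - mu) ** 2 / (2.0 * sigma ** 2)))
    return v

def synthesize_trajectory(t, signals):
    # Sum a first preset number of response signals into a pen trajectory.
    # Each signal also carries start/end direction angles (hypothetical scheme).
    x, y = np.zeros_like(t), np.zeros_like(t)
    dt = t[1] - t[0]
    for D, t0, mu, sigma, theta_s, theta_e in signals:
        v = lognormal_velocity(t, D, t0, mu, sigma)
        phase = np.linspace(theta_s, theta_e, t.size)
        x += np.cumsum(v * np.cos(phase)) * dt
        y += np.cumsum(v * np.sin(phase)) * dt
    return x, y

t = np.linspace(0.0, 1.0, 200)
x, y = synthesize_trajectory(t, [(1.0, 0.05, -1.8, 0.30, 0.0, 0.6),
                                 (0.8, 0.30, -1.6, 0.25, 0.6, -0.4)])
# Rendering (x, y) to a raster image yields one augmented character image.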
In some embodiments, model training module 420 includes:
the positive and negative sample set construction submodule, configured to construct a positive sample set containing the positive samples and a negative sample set containing the negative samples based on the training sample set and a positive-to-negative sample ratio; a positive sample contains any one easily recognizable handwritten character image and a third preset number of hard-to-recognize handwritten character images whose characters are the same as the character in that easily recognizable image; a negative sample contains any one easily recognizable handwritten character image and the third preset number of other handwritten character images, which are easily recognizable and/or hard-to-recognize handwritten character images whose characters differ from the character in the corresponding easily recognizable image (see the sketch after this list);
and the model training submodule, configured to train the preset neural network model based on the positive samples and the negative samples to generate the image similarity determination model.
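A sketch of the sample construction performed by this submodule is given below; the dictionary layout of the image pools and the 50% positive-ratio default are assumptions for illustration, not part of the disclosure.

import random

def build_sample(easy_pool, hard_pool, char, chars, positive, n=9):
    # One sample = 1 easily recognizable anchor image plus n further images,
    # where n is the third preset number (9 for handwritten digits).
    # easy_pool / hard_pool map each character to a list of its images.
    anchor = random.choice(easy_pool[char])
    if positive:
        rest = random.choices(hard_pool[char], k=n)       # same character, hard images
    else:
        rest = []
        for _ in range(n):                                # different characters,
            other = random.choice([c for c in chars if c != char])
            pool = random.choice([easy_pool, hard_pool])  # easy and/or hard images
            rest.append(random.choice(pool[other]))
    return [anchor] + rest

def build_sets(easy_pool, hard_pool, chars, total, pos_ratio=0.5):
    # Split `total` samples into positive and negative sets by the given ratio.
    n_pos = int(total * pos_ratio)
    positives = [build_sample(easy_pool, hard_pool, random.choice(chars), chars, True)
                 for _ in range(n_pos)]
    negatives = [build_sample(easy_pool, hard_pool, random.choice(chars), chars, False)
                 for _ in range(total - n_pos)]
    return positives, negatives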
In some embodiments, the preset neural network model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network with a fourth preset number of branches, each branch containing 5 convolutional layers and 3 pooling layers, the fourth preset number being the third preset number plus 1; the similarity estimation sub-network contains 3 fully-connected layers and a loss function, and the number of nodes in the last fully-connected layer is the third preset number.
In some embodiments, the model training submodule comprises:
the model training result output unit, configured to input the easily recognizable handwritten character image of any sample in the training sample set into the first branch of the preset neural network model, input the remaining handwritten character images of that sample into the remaining branches respectively, run the preset neural network model, and output a model training result; the sample is a positive sample or a negative sample, and the model training result is the third preset number of image similarity values;
the loss value determining unit, configured to determine a loss value using the loss function based on the model training result, the sample type of the sample, and the training reference value of that sample type; the sample type is either the positive sample type or the negative sample type;
and the model parameter adjusting unit, configured to, if the loss value or the number of training iterations does not satisfy the model convergence condition, adjust the model parameters of the preset neural network model based on the loss value and continue the training process until the loss value or the number of training iterations satisfies the model convergence condition, at which point the image similarity determination model is generated.
In some embodiments, the loss value determining unit is specifically configured to:
if the sample type of the sample is the positive sample type, determine the loss value using the loss function based on the minimum image similarity value in the model training result and the training reference value of the positive sample type;
and if the sample type of the sample is the negative sample type, determine the loss value using the loss function based on the maximum image similarity value in the model training result and the training reference value of the negative sample type (see the sketch below).
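The disclosure names neither the concrete loss function nor the training reference values; the sketch below assumes a squared-error loss with a reference of 1.0 for the positive sample type and 0.0 for the negative sample type, applied to the minimum or maximum similarity value as described above.

import torch

def sample_loss(similarities, is_positive, pos_ref=1.0, neg_ref=0.0):
    # similarities: tensor of the third preset number of similarity values
    # output by the model for one sample (the model training result).
    if is_positive:
        return (similarities.min() - pos_ref) ** 2   # pull the worst match up
    return (similarities.max() - neg_ref) ** 2       # push the best mismatch down

# One training step over the model sketched earlier (illustrative):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
# sims = model(sample_images)                  # (1, third preset number)
# loss = sample_loss(sims.squeeze(0), is_positive)
# loss.backward(); optimizer.step(); optimizer.zero_grad()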
In some embodiments, the handwritten character images are handwritten digital images and the third preset number is 9.
With the image similarity determination model training device provided by this embodiment of the disclosure, a small number of collected handwritten character images are augmented on the basis of writing kinematics to obtain a large number of training samples. This avoids both the labor cost of manually collecting many hard-to-recognize handwritten character images and the low model accuracy caused by too few samples, improving model training efficiency and the model's recognition accuracy for handwritten character images.
Fig. 5 is a schematic structural diagram of an image similarity determining apparatus according to an embodiment of the present disclosure. Referring to fig. 5, the apparatus specifically includes:
a handwritten character image obtaining module 510, configured to obtain an easily recognizable handwritten character image and a hard-to-recognize handwritten character image;
an image similarity value output module 520, configured to input the easily recognizable handwritten character image into the first branch of the image similarity determination model, input the hard-to-recognize handwritten character image into each remaining branch respectively, and run the image similarity determination model to output a plurality of image similarity values; the model is trained in advance using the image similarity determination model training method provided by any embodiment of the disclosure, and the number of image similarity values is 1 less than the number of branches of the image similarity determination model;
and a target image similarity value determining module 530, configured to determine a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on the image similarity values.
In some embodiments, the target image similarity value determination module 530 is specifically configured to:
determining the mean of the image similarity values and taking the mean as the target image similarity value.
In some embodiments, the image similarity determination model includes a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network with a fourth preset number of branches, each branch containing 5 convolutional layers and 3 pooling layers; the similarity estimation sub-network contains 3 fully-connected layers and a loss function, and the number of nodes in the last fully-connected layer is the fourth preset number minus 1.
In some embodiments, the easily recognizable handwritten character image and the hard-to-recognize handwritten character image are both handwritten digital images, and the fourth preset number is 10.
With the image similarity determining apparatus provided by this embodiment of the disclosure, the image similarity between an easily recognizable handwritten character image and a hard-to-recognize handwritten character image is evaluated with a pre-trained multi-branch image similarity determination model, improving both the similarity evaluation accuracy for handwritten character images and the recognition accuracy for hard-to-recognize handwritten character images.
It should be noted that the modules, sub-modules, and units in the foregoing apparatus embodiments are divided according to functional logic only, and other divisions are possible as long as the corresponding functions can be implemented; the specific names of the functional modules/sub-modules/units serve only to distinguish them from one another and do not limit the protection scope of the disclosure.
Fig. 6 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. Referring to fig. 6, an electronic device 600 provided by an embodiment of the present disclosure includes: a processor 620 and a memory 610; the processor 620 is configured to execute the steps of the image similarity determination model training method provided in any embodiment of the present disclosure, or execute the steps of the image similarity determination method provided in any embodiment of the present disclosure, by calling a program or instructions stored in the memory 610.
The electronic device 600 shown in fig. 6 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present disclosure. As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: one or more processors 620, a memory 610, and a bus 650 that connects the various system components (including the memory 610 and the processors 620).
Bus 650 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 600 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 600 and includes both volatile and nonvolatile media, removable and non-removable media.
The memory 610 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 611 and/or cache memory 612. The electronic device 600 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, the storage system 613 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 650 by one or more data media interfaces. Memory 610 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.
A program/utility 614 having a set (at least one) of program modules 615 may be stored in, for example, memory 610; such program modules 615 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, may include an implementation of a network environment. Program modules 615 generally perform the functions and/or methods of any of the embodiments described in this disclosure.
The electronic device 600 may also communicate with one or more external devices 660 (e.g., keyboard, pointing device, display 670, etc.), one or more devices that enable a user to interact with the electronic device 600, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may be through an input/output interface (I/O interface) 630. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 640. As shown in FIG. 6, the network adapter 640 communicates with the other modules of the electronic device 600 via a bus 650. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The embodiments of the present disclosure also provide a computer-readable storage medium storing a program or instructions for causing a computer to execute the steps of the image similarity determination model training method provided in any of the embodiments of the present disclosure, or execute the steps of the image similarity determination method provided in any of the embodiments of the present disclosure.
The computer storage media of the disclosed embodiments may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be understood that the terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application. As used in the specification and claims of this disclosure, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" includes any and all combinations of one or more of the associated listed items. Relational terms such as "first," "second," "third," and "fourth" are used solely to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, or apparatus that comprises that element.
The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. An image similarity determination model training method is characterized by comprising the following steps:
collecting a plurality of handwritten character images, and performing a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set; wherein the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics;
and training a preset neural network model based on the training sample set to generate an image similarity determination model.
2. The method of claim 1, wherein performing the data augmentation operation based on each handwritten character image and the preset data augmentation model to generate the training sample set comprises:
inputting each handwritten character image and set values of the character deformation adjustment parameters into the preset data augmentation model to generate augmented character images of the corresponding handwritten character image; wherein the preset data augmentation model is composed of a first preset number of log-Gaussian function response signals, and each log-Gaussian function response signal controls the degree of deformation through a second preset number of character deformation adjustment parameters;
constructing the training sample set from each of the handwritten character images and each of the augmented character images.
3. The method of claim 1, wherein training a preset neural network model based on the training sample set to generate an image similarity determination model comprises:
constructing a positive sample set containing the positive samples and a negative sample set containing the negative samples based on the training sample set and a positive-to-negative sample ratio; wherein a positive sample comprises any one easily recognizable handwritten character image and a third preset number of hard-to-recognize handwritten character images whose characters are the same as the character in that easily recognizable handwritten character image; a negative sample comprises any one easily recognizable handwritten character image and the third preset number of other handwritten character images, the other handwritten character images being easily recognizable and/or hard-to-recognize handwritten character images whose characters differ from the character in the corresponding easily recognizable handwritten character image;
and training the preset neural network model based on the positive samples and the negative samples to generate the image similarity determination model.
4. The method of claim 3, wherein the preset neural network model comprises a feature extraction sub-network and a similarity estimation sub-network; the feature extraction sub-network is a convolutional neural network comprising a fourth preset number of branches, each branch comprising 5 convolutional layers and 3 pooling layers, the fourth preset number being the third preset number plus 1; the similarity estimation sub-network comprises 3 fully-connected layers and a loss function, and the number of nodes of the last fully-connected layer is the third preset number.
5. The method of claim 4, wherein training the pre-set neural network model based on the positive samples and the negative samples, and generating the image similarity determination model comprises:
inputting the easily recognizable handwritten character image of any sample in the training sample set into a first branch of the preset neural network model, inputting the remaining handwritten character images of the sample into the remaining branches of the preset neural network model respectively, running the preset neural network model, and outputting a model training result; wherein the sample is the positive sample or the negative sample, and the model training result is the third preset number of image similarity values;
determining a loss value using the loss function based on the model training result, the sample type of the sample, and a training reference value of the sample type; wherein the sample type is a positive sample type or a negative sample type;
if the loss value or the number of training iterations does not satisfy the model convergence condition, adjusting the model parameters of the preset neural network model based on the loss value and continuing the model training process until the loss value or the number of training iterations satisfies the model convergence condition, thereby generating the image similarity determination model.
6. The method of claim 5, wherein the determining a loss value using the loss function based on the model training result, the sample type of the sample, and the training reference value for the sample type comprises:
if the sample type of the sample is the positive sample type, determining the loss value by using the loss function based on the minimum image similarity value in the model training result and the training reference value of the positive sample type;
and if the sample type of the sample is the negative sample type, determining the loss value by using the loss function based on the maximum image similarity value in the model training result and the training reference value of the negative sample type.
7. The method according to any one of claims 3-6, wherein the handwritten character images are handwritten digital images and the third preset number is 9.
8. An image similarity determination method, comprising:
acquiring an easily recognizable handwritten character image and a hard-to-recognize handwritten character image;
inputting the easily recognizable handwritten character image into a first branch of the image similarity determination model, inputting the hard-to-recognize handwritten character image into each remaining branch of the image similarity determination model respectively, running the image similarity determination model, and outputting a plurality of image similarity values; wherein the image similarity determination model is trained in advance using the image similarity determination model training method according to any one of claims 1 to 7, and the number of the image similarity values is 1 less than the number of branches of the image similarity determination model;
determining a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on each of the image similarity values.
9. The method of claim 8, wherein the determining a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on each of the image similarity values comprises:
determining the mean of the image similarity values, and taking the mean as the target image similarity value.
10. An image similarity determination model training device, comprising:
the training sample set generating module, configured to collect a plurality of handwritten character images and perform a data augmentation operation based on each handwritten character image and a preset data augmentation model to generate a training sample set; wherein the handwritten character images include easily recognizable handwritten character images and hard-to-recognize handwritten character images, and the preset data augmentation model is constructed on the basis of writing kinematics;
and the model training module is used for training a preset neural network model based on the training sample set to generate an image similarity determination model.
11. An image similarity determination apparatus, comprising:
the handwritten character image acquisition module, configured to acquire an easily recognizable handwritten character image and a hard-to-recognize handwritten character image;
the image similarity value output module, configured to input the easily recognizable handwritten character image into a first branch of the image similarity determination model, input the hard-to-recognize handwritten character image into each remaining branch of the image similarity determination model respectively, and run the image similarity determination model to output a plurality of image similarity values; wherein the image similarity determination model is trained in advance using the image similarity determination model training method according to any one of claims 1 to 7, and the number of the image similarity values is 1 less than the number of branches of the image similarity determination model;
and the target image similarity value determining module, configured to determine a target image similarity value between the easily recognizable handwritten character image and the hard-to-recognize handwritten character image based on each image similarity value.
12. An electronic device, characterized in that the electronic device comprises:
a processor and a memory;
the processor is adapted to perform the steps of the method of any one of claims 1 to 7, or to perform the steps of the method of any one of claims 8 to 9, by calling a program or instructions stored in the memory.
13. A computer-readable storage medium, characterized in that it stores a program or instructions for causing a computer to perform the steps of the method according to any one of claims 1 to 7, or to perform the steps of the method according to any one of claims 8 to 9.
CN202110252999.9A 2021-03-09 2021-03-09 Image similarity determination and model training method, device, equipment and medium Active CN112633420B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110252999.9A CN112633420B (en) 2021-03-09 2021-03-09 Image similarity determination and model training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110252999.9A CN112633420B (en) 2021-03-09 2021-03-09 Image similarity determination and model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112633420A 2021-04-09
CN112633420B CN112633420B (en) 2021-06-29

Family

ID=75297627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110252999.9A Active CN112633420B (en) 2021-03-09 2021-03-09 Image similarity determination and model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112633420B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200167596A1 (en) * 2018-11-22 2020-05-28 Boe Technology Group Co., Ltd. Method and device for determining handwriting similarity
CN111738269A (en) * 2020-08-25 2020-10-02 北京易真学思教育科技有限公司 Model training method, image processing device, model training apparatus, and storage medium
CN111967459A (en) * 2020-10-21 2020-11-20 北京易真学思教育科技有限公司 Model training method, image recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lai, S., Jin, L., Lin, L., Zhu, Y., & Mao, H.: "SynSig2Vec: Learning Representations from Synthetic Dynamic Signatures for Real-World Verification", Proceedings of the AAAI Conference on Artificial Intelligence *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052143A (en) * 2021-04-26 2021-06-29 中国建设银行股份有限公司 Handwritten digit generation method and device
CN112990205A (en) * 2021-05-11 2021-06-18 创新奇智(北京)科技有限公司 Method and device for generating handwritten character sample, electronic equipment and storage medium
CN112990205B (en) * 2021-05-11 2021-11-02 创新奇智(北京)科技有限公司 Method and device for generating handwritten character sample, electronic equipment and storage medium
CN113408387A (en) * 2021-06-10 2021-09-17 中金金融认证中心有限公司 Method for generating handwritten text data for complex writing scene and computer product
CN114003732A (en) * 2021-07-13 2022-02-01 北京金山数字娱乐科技有限公司 Candidate entity generative model training method and device
CN113792815A (en) * 2021-09-24 2021-12-14 浪潮金融信息技术有限公司 Sample augmentation method, system and medium based on image processing technology
CN113792815B (en) * 2021-09-24 2024-06-07 浪潮金融信息技术有限公司 Sample augmentation method, system and medium based on image processing technology

Also Published As

Publication number Publication date
CN112633420B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN112633420B (en) Image similarity determination and model training method, device, equipment and medium
CN110600017B (en) Training method of voice processing model, voice recognition method, system and device
CN110288979B (en) Voice recognition method and device
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN109948149B (en) Text classification method and device
CN112016315B (en) Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN110321845B (en) Method and device for extracting emotion packets from video and electronic equipment
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN111753744B (en) Method, apparatus, device and readable storage medium for bill image classification
CN112487217A (en) Cross-modal retrieval method, device, equipment and computer-readable storage medium
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
CN114416260B (en) Image processing method, device, electronic equipment and storage medium
EP4060526A1 (en) Text processing method and device
CN110895656A (en) Text similarity calculation method and device, electronic equipment and storage medium
CN113158656A (en) Ironic content identification method, ironic content identification device, electronic device, and storage medium
CN112084920A (en) Method, device, electronic equipment and medium for extracting hotwords
CN116645683A (en) Signature handwriting identification method, system and storage medium based on prompt learning
CN113851113A (en) Model training method and device and voice awakening method and device
CN117830790A (en) Training method of multi-task model, multi-task processing method and device
CN112204506B (en) System and method for automatic language detection of handwritten text
CN113177479B (en) Image classification method, device, electronic equipment and storage medium
CN114764593A (en) Model training method, model training device and electronic equipment
CN115294581A (en) Method and device for identifying error characters, electronic equipment and storage medium
CN110738233B (en) Model training method, data classification method, device, electronic equipment and storage medium
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant