CN111738270A - Model generation method, device, equipment and readable storage medium - Google Patents

Model generation method, device, equipment and readable storage medium

Info

Publication number
CN111738270A
CN111738270A
Authority
CN
China
Prior art keywords
image
image pair
network
pair
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010866710.8A
Other languages
Chinese (zh)
Other versions
CN111738270B (en)
Inventor
Qin Yong (秦勇)
Li Bing (李兵)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yizhen Xuesi Education Technology Co Ltd
Original Assignee
Beijing Yizhen Xuesi Education Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhen Xuesi Education Technology Co Ltd
Priority to CN202010866710.8A
Publication of CN111738270A
Application granted
Publication of CN111738270B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model generation method, a device, equipment and a readable storage medium. The method includes: acquiring a plurality of image pairs and a label for each image pair, the label indicating whether the two images in the pair are similar; cropping a preset target region from the image pair to obtain a region image pair; down-sampling the image pair by a preset sampling factor to obtain a down-sampled image pair; acquiring a neural network model to be trained, wherein the neural network to be trained comprises a feature network and a classification network, the feature network comprises four branches whose weights differ, each branch comprises a plurality of serially connected basic blocks and a first fully connected layer, the outputs of the basic blocks are all connected to the first fully connected layer, and the output of the first fully connected layer is connected to the classification network; and inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to train the neural network, thereby obtaining an image similarity judgment model. The method improves the accuracy of image similarity evaluation.

Description

Model generation method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a model generation method, apparatus, device, and readable storage medium.
Background
Conventional deep-learning-based image similarity evaluation methods work very well on the image similarity problem for natural scenes. Compared with handwritten digital character images, natural scene images have higher resolution, richer content, and more detailed information. Handwritten digital character images, by contrast, are not only simple in content but also differ little from one another in detail. Conventional evaluation methods therefore perform poorly when used to evaluate the similarity of handwritten digital character images.
Disclosure of Invention
The embodiments of the application provide a model generation method, a device, equipment and a readable storage medium to solve the problems in the related art. The technical solution is as follows:
in a first aspect, an embodiment of the present application provides a model generation method, including:
acquiring a plurality of image pairs and a label for each image pair, the label indicating whether the two images in the image pair are similar;
cropping a preset target region from the image pair to obtain a region image pair;
down-sampling the image pair by a preset sampling factor to obtain a down-sampled image pair;
acquiring a neural network model to be trained, wherein the neural network to be trained comprises a feature network and a classification network, the feature network comprises four branches, the weights of the four branches are different, each branch comprises a plurality of serially connected basic blocks and a first fully connected layer, the outputs of the basic blocks are all connected to the first fully connected layer, and the output of the first fully connected layer is connected to the classification network;
and inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to train the neural network to be trained, thereby obtaining an image similarity judgment model.
In a second aspect, an embodiment of the present application provides a model generation apparatus, including:
a training image acquisition module, configured to acquire a plurality of image pairs and a label for each image pair, the label indicating whether the two images in the image pair are similar;
a training image cropping module, configured to crop a preset target region from the image pair to obtain a region image pair;
a training image down-sampling module, configured to down-sample the image pair by a preset sampling factor to obtain a down-sampled image pair;
a model acquisition module, configured to acquire a neural network model to be trained, wherein the neural network to be trained comprises a feature network and a classification network, the feature network comprises four branches, the weights of the four branches are different, each branch comprises a plurality of serially connected basic blocks and a first fully connected layer, the outputs of the basic blocks are all connected to the first fully connected layer, and the output of the first fully connected layer is connected to the classification network;
and a training module, configured to input the region image pair and the down-sampled image pair into the corresponding four branches of the feature network, so as to train the neural network to be trained and obtain an image similarity judgment model.
In a third aspect, an embodiment of the present application provides a model generation device, including a memory and a processor that communicate with each other via an internal connection path. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory so as to perform the method of any of the above aspects.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when run on a computer, performs the method of any one of the above aspects.
The advantages or beneficial effects of the above technical solution at least include the following. The four branches of the feature network share the same structure but have different weights, which improves similarity learning for images with small differences in detail. In addition, the first fully connected layer combines the feature vectors of the different basic blocks and thus attends to image information at different resolutions, which helps distinguish more reliably whether the two original images match.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 is a first flowchart of a model generation method according to an embodiment of the present application;
FIG. 2 is a diagram illustrating an example of a structure of a neural network to be trained in a model generation method according to an embodiment of the present application;
FIG. 3 is a diagram illustrating an exemplary structure of a basic block in a model generation method according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an example of the structure of branches in a model generation method according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an example of a classification network in a model generation method according to an embodiment of the present application;
FIG. 6 is a second flowchart of a model generation method according to an embodiment of the present application;
FIG. 7 is a third flowchart of a model generation method according to an embodiment of the present application;
FIG. 8 is an exemplary diagram of a model generation method according to an embodiment of the present application;
FIG. 9 is a block diagram of a model generation apparatus according to an embodiment of the present application;
FIG. 10 is a second block diagram of a model generation apparatus according to an embodiment of the present application;
FIG. 11 is a block diagram of a model generation apparatus according to an embodiment of the present application.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
FIG. 1 shows a flow diagram of a model generation method according to an embodiment of the present application. As shown in fig. 1, the model generation method may include:
s101, acquiring a label containing a plurality of sets of image pairs and image pairs, wherein the label is used for indicating whether two images in the image pairs are similar or not.
And S102, cutting a preset target area of the image pair to obtain an area image pair.
And S103, performing down-sampling on the image pair by a preset sampling multiple to obtain a down-sampled image pair.
And S104, obtaining a neural network model to be trained. The neural network to be trained comprises a feature network and a classification network. The feature network comprises four branches, and the weights of the four branches are different. Each branch comprises a plurality of serially connected basic blocks and a first fully connected layer. The outputs of the basic blocks are all connected with a first full connection layer, and the outputs of the first full connection layer are connected with a classification network.
And S105, correspondingly inputting the area image pair and the downsampled image pair into four branches of the characteristic network to train the neural network to be trained, so as to obtain an image similarity judgment model.
Referring to fig. 2, fig. 2 is an exemplary diagram of the structure of the neural network to be trained in step S104. The feature network contains branch 201, branch 202, branch 203, and branch 204. Assume that the region image pair consists of region image 101 and region image 102, and the down-sampled image pair consists of down-sampled image 103 and down-sampled image 104. Region image 101 is input to branch 201, region image 102 to branch 202, down-sampled image 103 to branch 203, and down-sampled image 104 to branch 204. The feature vectors output by the four branches are concatenated to serve as the features extracted by the feature network, and the classification network outputs the similarity result for the image pair based on these features.
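For illustration only, the following is a minimal PyTorch sketch of the four-branch structure of fig. 2. It is a sketch under stated assumptions, not part of the disclosed embodiment: `FourBranchSimilarityNet`, `branch_factory`, and the layer widths are hypothetical names and values, and the branch module is assumed to map an image to a fixed-length feature vector as described below.

```python
# Minimal sketch of the four-branch network of fig. 2 (illustrative, PyTorch).
import torch
import torch.nn as nn

class FourBranchSimilarityNet(nn.Module):
    def __init__(self, branch_factory, feat_dim=256, num_classes=2):
        super().__init__()
        # Four structurally identical branches with independent (unshared) weights.
        self.branches = nn.ModuleList([branch_factory() for _ in range(4)])
        # Classification network: stacked fully connected layers (cf. fig. 5).
        self.classifier = nn.Sequential(
            nn.Linear(4 * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, region_a, region_b, global_a, global_b):
        # Each of the four inputs goes to its own branch, as in fig. 2.
        feats = [branch(x) for branch, x in
                 zip(self.branches, (region_a, region_b, global_a, global_b))]
        fused = torch.cat(feats, dim=1)  # concatenate the four branch feature vectors
        return self.classifier(fused)    # logits; softmax is applied downstream
```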
In step S104, the basic blocks are connected in series, so that the input of each basic block except the first one is the feature vector output by the preceding basic block.
In this embodiment, first, the cropped region image pair helps the network focus on the content of the target region in the original images, while the down-sampled image pair helps it focus on the semantic information of the original images. Second, the four branches of the feature network share the same structure but have different weights, which improves similarity learning for images with small differences in detail. In addition, the first fully connected layer combines the feature vectors of the different basic blocks and thus attends to image information at different resolutions, which helps distinguish more reliably whether the two original images match.
In summary, the model generation method provided in this embodiment attends more effectively to the key semantic information expressed by an image, improves similarity learning for images with small differences in detail, and extracts feature information covering multiple resolutions of the image. The method can therefore maintain the evaluation efficiency of the image similarity judgment model while improving the accuracy of similarity evaluation.
In one embodiment, the label of each image pair may be set to 0 or 1, where 0 indicates that the two images do not match and 1 indicates that they match.
In one embodiment, the target region is a central region.
The central region of an image usually contains more semantic information. In a character image, for example, the characters are typically located at the center of the image. Cropping out the central region therefore helps the network focus on the information the image conveys.
In one embodiment, the images of the pair of region images and the pair of down-sampled images are the same size.
Optionally, each region image covers 1/4 of the area of the corresponding original image. The original image pair is down-sampled by a factor of 2 in each dimension, so each down-sampled image likewise covers 1/4 of the area of the original image.
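As a concrete illustration of these proportions, the following sketch (assuming PyTorch/torchvision tensors; the helper name is hypothetical) produces a center crop covering 1/4 of the original area and a 2x down-sampled image of the same size:

```python
# Sketch of the preprocessing described above (illustrative, torchvision).
import torchvision.transforms.functional as TF

def make_center_and_global(img):
    """img: a (C, H, W) tensor; returns (center_image, global_image)."""
    _, h, w = img.shape
    # Center crop at half the height and width: 1/4 of the original area.
    center = TF.center_crop(img, [h // 2, w // 2])
    # Down-sample by a factor of 2 per dimension: also 1/4 of the original area.
    global_img = TF.resize(img, [h // 2, w // 2], antialias=True)
    return center, global_img
```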
In one embodiment, the basic block includes a convolution layer, a batch normalization layer, and an activation function layer, the convolution layer, the batch normalization layer, and the activation function layer being alternately distributed.
Configuring several convolution operations in the basic block makes it possible to apply multiple convolutions and multiple batch normalizations to the image. In addition, the basic block may further include a pooling layer placed after the convolution layer.
Alternatively, referring to fig. 3, fig. 3 is an example diagram of a basic block in which there are two convolution layers, two batch normalization layers, and two activation function layers.
Optionally, the activation function may be a ReLU function.
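A minimal sketch of such a basic block, assuming PyTorch, is given below. It follows fig. 3 (two convolution layers, two batch normalization layers, two ReLU activations); the residual shortcut is taken from the standard Resnet18 BasicBlock, since the branches are later said to be built from Resnet18, and the channel sizes are illustrative.

```python
# Sketch of a basic block as in fig. 3 (illustrative, PyTorch).
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Alternating convolution, batch normalization, and activation layers.
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection so the shortcut matches the output shape when needed.
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))
```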
In one embodiment, within the same branch of the feature network, the first feature vectors output by the other basic blocks are down-sampled according to the size of the first feature vector output by the last basic block, yielding down-sampled second feature vectors;
the first feature vector of the last basic block and the second feature vectors of the other basic blocks are then input into the first fully connected layer.
Optionally, four basic blocks are included in the branch, that is, a first basic block, a second basic block, a third basic block, and a fourth basic block are included in the branch.
Referring to fig. 4, fig. 4 is a diagram illustrating the structure of a branch. The branch includes basic block 401, basic block 402, basic block 403, and basic block 404, connected in sequence. The feature handling process within the branch is as follows.
(1) Basic block 401 outputs a first feature vector 4011, basic block 402 outputs a first feature vector 4022, basic block 403 outputs a first feature vector 4033, and basic block 404 outputs a first feature vector 4044.
(2) The first feature vector 4011, the first feature vector 4022, and the first feature vector 4033 are each down-sampled, yielding a second feature vector 40111, a second feature vector 40222, and a second feature vector 40333 of the same size as the first feature vector 4044.
(3) The second feature vector 40111, the second feature vector 40222, the second feature vector 40333, and the first feature vector 4044 are concatenated, and the concatenated feature vector is input into the first fully connected layer.
Optionally, the feature vectors output by the multiple basic blocks are combined in a Feature Pyramid Network (FPN) manner.
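The following sketch of one branch (fig. 4) is illustrative only and assumes the BasicBlock sketched above. For simplicity it pools every block output to a 1x1 map, a special case of down-sampling all feature maps to a common size before concatenation; the channel widths and feature dimension are hypothetical.

```python
# Sketch of one branch of the feature network (fig. 4; illustrative, PyTorch).
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self, chs=(64, 128, 256, 512), feat_dim=256):
        super().__init__()
        # Four serially connected basic blocks.
        self.blocks = nn.ModuleList([
            BasicBlock(3 if i == 0 else chs[i - 1], chs[i], stride=2)
            for i in range(len(chs))
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)      # pools each map to 1x1
        self.fc = nn.Linear(sum(chs), feat_dim)  # the "first fully connected layer"

    def forward(self, x):
        outs = []
        for block in self.blocks:                # blocks run in series
            x = block(x)
            outs.append(x)
        # Pool every block output to a common size, flatten, and concatenate,
        # so the FC layer sees features at several resolutions (FPN-style).
        vecs = [self.pool(o).flatten(1) for o in outs]
        return self.fc(torch.cat(vecs, dim=1))
```

This `Branch` could serve as the `branch_factory` of the earlier four-branch sketch, e.g. `FourBranchSimilarityNet(lambda: Branch())`.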
In one embodiment, each branch of the feature network is constructed using the Resnet18 neural network model. The Resnet18 neural network model comprises at least two first basic blocks, two second basic blocks, two third basic blocks, two fourth basic blocks, and a fully connected layer, connected in sequence. Each basic block comprises two convolution layers, two pooling layers, two batch normalization layers, and two activation function layers.
The Resnet18 neural network model is used as a feature extractor, and is beneficial to acquiring high-level semantic information of the image.
In one embodiment, the classification network includes a second fully-connected layer and a normalization layer connected in series.
The classification network, which may also be referred to as a metric network, is used to measure an image distance, where the image distance represents an image similarity.
Optionally, the normalization layer uses a softmax function. The softmax function, also called the normalized exponential function, "compresses" a K-dimensional vector z of arbitrary real numbers into another K-dimensional real vector σ(z) such that each element of σ(z) lies in the range (0, 1) and all elements sum to 1; concretely, σ(z)_i = exp(z_i) / Σ_j exp(z_j).
The normalization layer may output two probability values, the first indicating the probability that the two images do not match and the second indicating the probability that they match. From these probability values, one can judge whether the image pair matches and also determine the confidence of the judgment result.
Referring to fig. 5, fig. 5 is a diagram illustrating the structure of a classification network. In this example, the classification network includes three fully connected layers and a normalization layer. The three fully connected layers are connected in sequence, and the last fully connected layer is connected to the normalization layer.
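A sketch of this classification network, assuming PyTorch, is shown below; the layer widths are illustrative. Note that PyTorch's CrossEntropyLoss expects raw logits, so in a real training setup the softmax layer would typically be applied only at prediction time.

```python
# Sketch of the classification network of fig. 5 (illustrative, PyTorch).
import torch.nn as nn

classification_net = nn.Sequential(
    nn.Linear(4 * 256, 512), nn.ReLU(),  # input: 4 concatenated branch vectors
    nn.Linear(512, 128), nn.ReLU(),
    nn.Linear(128, 2),                   # two logits: [mismatch, match]
    nn.Softmax(dim=1),                   # two probabilities summing to 1
)
```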
In one embodiment, referring to fig. 6, step S105 comprises:
S601, inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to obtain the similarity result of the classification network;
S602, optimizing the parameters of the neural network to be trained according to the similarity result, the labels of the image pairs, the loss function, and the gradient backpropagation algorithm, so as to train the neural network to be trained.
Optionally, the loss function comprises a cross-entropy loss (CrossEntropyLoss) function.
Optionally, the similarity result includes a determination result of whether the image pairs are similar, and may further include a confidence of the determination result.
In the model training process, the classification network also includes a loss function, also called the objective function. The loss value of the loss function is determined from the error between the result output by the normalization layer and the label of the image pair. During backpropagation, the partial derivatives of the loss function with respect to the weights of each neuron layer are computed layer by layer along the neural network to be trained, forming the gradient of the loss function with respect to the weight vector, which serves as the basis for modifying the weights. The network learns through this weight-modification process. When the error reaches the expected value, that is, when the loss function converges, training of the neural network ends and the image similarity judgment model is obtained.
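A minimal sketch of one training pass, assuming PyTorch and the model sketched earlier, is given below; `model`, `loader`, and the optimizer settings are illustrative assumptions, and the loader is assumed to yield preprocessed image pairs with 0/1 labels.

```python
# Sketch of the training loop described above (illustrative, PyTorch).
import torch

criterion = torch.nn.CrossEntropyLoss()  # the loss (objective) function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for region_a, region_b, global_a, global_b, label in loader:
    logits = model(region_a, region_b, global_a, global_b)
    loss = criterion(logits, label)      # error vs. the image-pair label
    optimizer.zero_grad()
    loss.backward()                      # gradient backpropagation
    optimizer.step()                     # weight modification
```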
In one embodiment, referring to fig. 7, the method of fig. 1 further comprises:
s701, obtaining an image pair to be predicted;
s702, shearing a preset target area of the image pair to be predicted to obtain the image pair of the area to be predicted;
s703, performing down-sampling on the image pair to be predicted by a preset sampling multiple to obtain a down-sampled image pair to be predicted;
and S704, inputting the image pair of the region to be predicted and the image pair of the downsampled image to be predicted into an image similarity judgment model to obtain a similarity result output by the image similarity judgment model.
The model generation method provided by the embodiments of the application is well suited to similarity evaluation of character images. Character images have low resolution, simple content, and small differences in detail. Yet their variation is quite rich, far beyond what natural scene images exhibit: different handwriting, different backgrounds, blurring, sketching, smearing, and the like make it difficult to capture regularities in the detailed information of character images. On the other hand, compared with natural scene images, the semantic information of a character image is very clear and unambiguous. The embodiments of the application attend more effectively to the key semantic information expressed by an image, improve similarity learning for images with small differences in detail, and thus benefit the accuracy of similarity evaluation for handwritten digital character images.
The embodiments of the application can be applied to photo-based correction of math exercises, photo-based question search, and the like. The difference between a writer's handwriting and the standard font magnifies the difficulty of recognizing the answer to a question. The method of the embodiments of the application can evaluate the similarity of digital character images: a handwritten digital character image is compared with candidate standard digital character images to obtain a similarity value between them, providing strong prior knowledge for subsequent operations such as grading and searching.
An example of an embodiment of the present application is given below. This example uses the Resnet18 neural network model as the feature extractor, with the aim of obtaining high-level semantic information of handwritten digital character images. Meanwhile, an FPN is used to combine the extracted shallow and deep feature information; the combined information is vectorized and then input into the metric network for similarity evaluation.
(I) Structure of the neural network model to be trained
The neural network model to be trained comprises a feature network and a classification network.
(1) Feature network
The feature network consists of 4 branches. Each branch uses a Resnet18 neural network model to extract features and combines the features by concatenation; the branches do not share weights. The outputs of the 4 branches are connected to the classification network.
The Resnet18 model is constructed by stacking basic blocks (BasicBlock) and contains 4 basic blocks. Each basic block comprises two convolution operations and two batch normalization operations, and uses ReLU as the activation function.
Within the same branch, the feature vector output by each basic block is down-sampled to the same size as the output of the last basic block. The down-sampled feature vectors of the first 3 basic blocks are concatenated with the feature vector of the last basic block and input into a fully connected layer; the vector output by the fully connected layer serves as the feature vector of the branch.
(2) Classification network
The classification network comprises 3 fully connected layers, a softmax function, and an objective function. The 3 fully connected layers are connected in sequence, and the 3rd fully connected layer is followed by the softmax function. The objective function is a cross-entropy loss (CrossEntropyLoss) function.
(II) Model generation phase
Referring to fig. 8, fig. 8 is a diagram illustrating a process of generating an image similarity determination model. The specific process is as follows:
First, the neural network to be trained is constructed.
Second, a large number of handwritten digital character images are collected and formed into image pairs, which serve as the training set for the handwritten digital character image similarity judgment model.
Third, for each image pair in the training set, the center of each image is cropped out; the cropped image is 1/4 the size of the original image and serves as the center image. Each image is also down-sampled by a factor of 2; the resulting image is likewise 1/4 the size of the original image and serves as the global image.
Fourth, the center images and global images are input into the neural network model to be trained, and the model is trained to obtain an image similarity judgment model for handwritten digital character images.
Specifically, the two center images are input into the 1st and 2nd branches of the feature network, respectively, and the two global images are input into the 3rd and 4th branches, respectively.
Within the same branch, the basic blocks are executed in turn, the feature vector output by the previous basic block serving as the input of the next basic block.
The feature vectors output by the first 3 basic blocks are down-sampled so that their size matches the size of the feature vector output by the last basic block.
The down-sampled feature vectors of the first 3 basic blocks are concatenated with the feature vector of the last basic block; the concatenated feature vector is dimension-adjusted and then input into the fully connected layer, whose output serves as the feature vector of the branch.
The feature vectors of the 4 branches are concatenated and input into the classification network to obtain the similarity result output by the classification network.
The parameters of the whole image similarity judgment model are then optimized according to the objective function, the similarity result, the labels of the image pairs, and the gradient backpropagation algorithm.
(III) Handwritten character image similarity judgment stage
In the first step, an image pair to be predicted is acquired. For example, the image pair includes a handwritten digital character image and a standard-font digital character image.
In the second step, the center of each image to be predicted is cropped out; the cropped image is 1/4 the size of the original image, and the two cropped images form the to-be-predicted center image pair. The image pair to be predicted is also down-sampled by a factor of 2; the resulting images are likewise 1/4 the size of the originals and form the to-be-predicted global image pair.
In the third step, the to-be-predicted center image pair and global image pair are input into the image similarity judgment model to obtain the similarity judgment result for the image pair.
The image similarity judgment model outputs two probability values, the first representing the probability that the two images do not match and the second representing the probability that they match. From these probability values, whether the image pair matches can be judged, and the confidence of the judgment result can also be determined.
Fig. 9 is a block diagram showing a structure of a model generation apparatus according to an embodiment of the present application. As shown in fig. 9, the apparatus may include:
a training image acquisition module 901, configured to acquire a plurality of image pairs and a label for each image pair, where the label indicates whether the two images in the image pair are similar;
a training image cropping module 902, configured to crop a preset target region from the image pair to obtain a region image pair;
a training image down-sampling module 903, configured to down-sample the image pair by a preset sampling factor to obtain a down-sampled image pair;
a model acquisition module 904, configured to acquire a neural network model to be trained, where the neural network to be trained includes a feature network and a classification network, the feature network includes four branches, the weights of the four branches are different, each branch includes a plurality of serially connected basic blocks and a first fully connected layer, the outputs of the basic blocks are connected to the first fully connected layer, and the output of the first fully connected layer is connected to the classification network;
and a training module 905, configured to input the region image pair and the down-sampled image pair into the corresponding four branches of the feature network, so as to train the neural network to be trained and obtain an image similarity judgment model.
In one embodiment, the target region is a central region.
In one embodiment, the images of the pair of region images and the pair of down-sampled images are the same size.
In one embodiment, the basic block includes a convolution layer, a batch normalization layer, and an activation function layer, the convolution layer, the batch normalization layer, and the activation function layer being alternately distributed.
In one embodiment, the feature network includes:
a feature vector down-sampling module, configured to down-sample, within the same branch, the first feature vectors output by the other basic blocks according to the size of the first feature vector output by the last basic block, to obtain down-sampled second feature vectors;
and a feature vector input module, configured to input the first feature vector of the last basic block and the second feature vectors of the other basic blocks into the first fully connected layer.
In one embodiment, the classification network includes a second fully-connected layer and a normalization layer connected in series.
In one embodiment, the training module 905 includes:
an image pair input submodule, configured to input the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to obtain the similarity result of the classification network;
and a training submodule, configured to optimize the parameters of the neural network to be trained according to the similarity result, the labels of the image pairs, the loss function, and the gradient backpropagation algorithm, so as to train the neural network to be trained.
In one embodiment, referring to fig. 10, the model generation apparatus 1000 further includes:
a to-be-predicted image acquisition module 1001, configured to acquire an image pair to be predicted;
a to-be-predicted image cropping module 1002, configured to crop a preset target region from the image pair to be predicted to obtain a to-be-predicted region image pair;
a to-be-predicted image down-sampling module 1003, configured to down-sample the image pair to be predicted by a preset sampling factor to obtain a to-be-predicted down-sampled image pair;
and a similarity prediction module 1004, configured to input the to-be-predicted region image pair and the to-be-predicted down-sampled image pair into the image similarity judgment model to obtain the similarity result output by the image similarity judgment model.
The functions of each module in each apparatus in the embodiment of the present application may refer to corresponding descriptions in the above method, and are not described herein again.
Fig. 11 shows a block diagram of a model generation apparatus according to an embodiment of the present application. As shown in fig. 11, the model generation apparatus includes a memory 1110 and a processor 1120, the memory 1110 storing a computer program executable on the processor 1120. The processor 1120, when executing the computer program, implements the model generation method of the above embodiments. There may be one or more memories 1110 and one or more processors 1120.
The model generation apparatus further includes:
the communication interface 1130 is used for communicating with an external device to perform data interactive transmission.
If the memory 1110, the processor 1120, and the communication interface 1130 are implemented independently, they may be connected to one another through a bus and communicate with one another. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in FIG. 11, but this does not mean there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 1110, the processor 1120, and the communication interface 1130 are integrated on a chip, the memory 1110, the processor 1120, and the communication interface 1130 may complete communication with each other through an internal interface.
Embodiments of the present application provide a computer-readable storage medium, which stores a computer program, and when the program is executed by a processor, the computer program implements the method provided in the embodiments of the present application.
The embodiment of the present application further provides a chip, where the chip includes a processor, and is configured to call and execute the instruction stored in the memory from the memory, so that the communication device in which the chip is installed executes the method provided in the embodiment of the present application.
An embodiment of the present application further provides a chip, including: the system comprises an input interface, an output interface, a processor and a memory, wherein the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the embodiment of the application.
It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like. A general-purpose processor may be a microprocessor or any conventional processor. Note that the processor may be a processor supporting the Advanced RISC Machine (ARM) architecture.
Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a non-volatile random access memory. The memory may be volatile or non-volatile, or may include both volatile and non-volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous DRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the present invention has been described with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. A method of model generation, comprising:
acquiring a plurality of image pairs and labels for the image pairs, the labels indicating whether the two images in each image pair are similar;
cropping a preset target region from the image pair to obtain a region image pair;
down-sampling the image pair by a preset sampling factor to obtain a down-sampled image pair;
obtaining a neural network model to be trained, wherein the neural network to be trained comprises a feature network and a classification network, the feature network comprises four branches, the weights of the four branches are different, each branch comprises a plurality of basic blocks and a first full-connection layer which are connected in series, the output of each basic block is connected with the first full-connection layer, and the output of each first full-connection layer is connected with the classification network;
and inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to train the neural network to be trained, so as to obtain an image similarity judgment model.
2. The method of claim 1, wherein the target region is a central region.
3. The method of claim 1, wherein the image sizes of the pair of region images and the pair of downsampled images are the same.
4. The method of claim 1, wherein the basic block comprises a convolutional layer, a batch normalization layer, and an activation function layer, the convolutional layer, the batch normalization layer, and the activation function layer being alternately distributed.
5. The method of claim 1, wherein,
in the same branch, down-sampling the first feature vectors output by the other basic blocks according to the size of the first feature vector output by the last basic block, to obtain down-sampled second feature vectors;
and inputting the first feature vector of the last basic block and the second feature vectors of the other basic blocks into the first fully connected layer.
6. The method of claim 1, wherein the classification network comprises a second fully-connected layer and a normalization layer connected in series.
7. The method of claim 1, wherein inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to train the neural network to be trained comprises:
inputting the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to obtain a similarity result of the classification network;
and optimizing parameters of the neural network to be trained according to the similarity result, the labels of the image pairs, the loss function, and the gradient backpropagation algorithm, so as to train the neural network to be trained.
8. The method of any one of claims 1-7, further comprising:
acquiring an image pair to be predicted;
cropping a preset target region from the image pair to be predicted to obtain a to-be-predicted region image pair;
down-sampling the image pair to be predicted by a preset sampling factor to obtain a to-be-predicted down-sampled image pair;
and inputting the to-be-predicted region image pair and the to-be-predicted down-sampled image pair into the image similarity judgment model to obtain a similarity result output by the image similarity judgment model.
9. A model generation apparatus, comprising:
a training image acquisition module, configured to acquire a plurality of image pairs and labels for the image pairs, the labels indicating whether the two images in each image pair are similar;
a training image cropping module, configured to crop a preset target region from the image pair to obtain a region image pair;
a training image down-sampling module, configured to down-sample the image pair by a preset sampling factor to obtain a down-sampled image pair;
the model acquisition module is used for acquiring a neural network model to be trained, wherein the neural network to be trained comprises a feature network and a classification network, the feature network comprises four branches, the weights of the four branches are different, each branch comprises a plurality of basic blocks and a first full connection layer which are connected in series, the output of each basic block is connected with the first full connection layer, and the output of each first full connection layer is connected with the classification network;
and a training module, configured to input the region image pair and the down-sampled image pair into the corresponding four branches of the feature network, so as to train the neural network to be trained and obtain an image similarity judgment model.
10. The apparatus of claim 9, wherein the target region is a central region.
11. The apparatus of claim 9, wherein the image sizes of the pair of region images and the pair of downsampled images are the same.
12. The apparatus of claim 9, wherein the basic block comprises a convolutional layer, a batch normalization layer, and an activation function layer, the convolutional layer, the batch normalization layer, and the activation function layer being alternately distributed.
13. The apparatus of claim 9, wherein the feature network includes:
a feature vector down-sampling module, configured to down-sample, in the same branch, the first feature vectors output by the other basic blocks according to the size of the first feature vector output by the last basic block, to obtain down-sampled second feature vectors;
and a feature vector input module, configured to input the first feature vector of the last basic block and the second feature vectors of the other basic blocks into the first fully connected layer.
14. The apparatus of claim 9, wherein the classification network comprises a second fully-connected layer and a normalization layer connected in series.
15. The apparatus of claim 9, wherein the training module comprises:
an image pair input submodule, configured to input the region image pair and the down-sampled image pair into the corresponding four branches of the feature network to obtain a similarity result of the classification network;
and a training submodule, configured to optimize the parameters of the neural network to be trained according to the similarity result, the labels of the image pairs, the loss function, and the gradient backpropagation algorithm, so as to train the neural network to be trained.
16. The apparatus of any one of claims 9-15, further comprising:
a to-be-predicted image acquisition module, configured to acquire an image pair to be predicted;
a to-be-predicted image cropping module, configured to crop a preset target region from the image pair to be predicted to obtain a to-be-predicted region image pair;
a to-be-predicted image down-sampling module, configured to down-sample the image pair to be predicted by a preset sampling factor to obtain a to-be-predicted down-sampled image pair;
and a similarity prediction module, configured to input the to-be-predicted region image pair and the to-be-predicted down-sampled image pair into the image similarity judgment model to obtain a similarity result output by the image similarity judgment model.
17. A model generation apparatus, comprising: a processor and a memory, the memory having stored therein instructions that are loaded and executed by the processor to implement the method of any of claims 1 to 8.
18. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
CN202010866710.8A 2020-08-26 2020-08-26 Model generation method, device, equipment and readable storage medium Active CN111738270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010866710.8A CN111738270B (en) 2020-08-26 2020-08-26 Model generation method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010866710.8A CN111738270B (en) 2020-08-26 2020-08-26 Model generation method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111738270A true CN111738270A (en) 2020-10-02
CN111738270B CN111738270B (en) 2020-11-13

Family

ID=72658850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010866710.8A Active CN111738270B (en) 2020-08-26 2020-08-26 Model generation method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111738270B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7885455B2 (en) * 2006-04-14 2011-02-08 UTC Fire & Security Americas Corporation, Inc Method of combining images of multiple resolutions to produce an enhanced active appearance model
CN107578404A (en) * 2017-08-22 2018-01-12 浙江大学 The complete of view-based access control model notable feature extraction refers to objective evaluation method for quality of stereo images
CN107633223A (en) * 2017-09-15 2018-01-26 深圳市唯特视科技有限公司 A kind of video human attribute recognition approach based on deep layer confrontation network
CN108491835A (en) * 2018-06-12 2018-09-04 常州大学 Binary channels convolutional neural networks towards human facial expression recognition
CN110084318A (en) * 2019-05-07 2019-08-02 哈尔滨理工大学 A kind of image-recognizing method of combination convolutional neural networks and gradient boosted tree

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112614109A (en) * 2020-12-24 2021-04-06 四川云从天府人工智能科技有限公司 Image quality evaluation method, device and computer readable storage medium
CN112926647A (en) * 2021-02-23 2021-06-08 亚信科技(成都)有限公司 Model training method, domain name detection method and device
CN112926647B (en) * 2021-02-23 2023-10-17 亚信科技(成都)有限公司 Model training method, domain name detection method and domain name detection device
CN113112518A (en) * 2021-04-19 2021-07-13 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113112518B (en) * 2021-04-19 2024-03-26 深圳思谋信息科技有限公司 Feature extractor generation method and device based on spliced image and computer equipment
CN113344040A (en) * 2021-05-20 2021-09-03 深圳索信达数据技术有限公司 Image classification method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111738270B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN111738270B (en) Model generation method, device, equipment and readable storage medium
CN113902926A (en) General image target detection method and device based on self-attention mechanism
CN110334587B (en) Training method and device of face key point positioning model and key point positioning method
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN111160375B (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN108681746B (en) Image identification method and device, electronic equipment and computer readable medium
CN111738269B (en) Model training method, image processing device, model training apparatus, and storage medium
CN109840531A (en) The method and apparatus of training multi-tag disaggregated model
CN115953665B (en) Target detection method, device, equipment and storage medium
CN111179419A (en) Three-dimensional key point prediction and deep learning model training method, device and equipment
CN113221645B (en) Target model training method, face image generating method and related device
CN112633423B (en) Training method of text recognition model, text recognition method, device and equipment
CN114037888A (en) Joint attention and adaptive NMS (network management System) -based target detection method and system
CN114419406A (en) Image change detection method, training method, device and computer equipment
CN114639165B (en) Pedestrian re-identification method, device, equipment and storage medium based on artificial intelligence
CN113781164B (en) Virtual fitting model training method, virtual fitting method and related devices
CN114299358A (en) Image quality evaluation method and device, electronic equipment and machine-readable storage medium
CN111046755A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN113723352A (en) Text detection method, system, storage medium and electronic equipment
CN111582057B (en) Face verification method based on local receptive field
CN111401335B (en) Key point detection method and device and storage medium
JPWO2015068417A1 (en) Image collation system, image collation method and program
CN116704206A (en) Image processing method, device, computer equipment and storage medium
US20230073175A1 (en) Method and system for processing image based on weighted multiple kernels
CN114707518B (en) Semantic fragment-oriented target emotion analysis method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant