CN114140664A - Training method of image processing model, and image similarity determining method and device


Info

Publication number
CN114140664A
Authority
CN
China
Prior art keywords
image
similarity
feature
processing model
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111471841.7A
Other languages
Chinese (zh)
Inventor
聂砂
刘海
贾国琛
罗奕康
戴菀庭
丁苏苏
郑江
张士存
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202111471841.7A priority Critical patent/CN114140664A/en
Publication of CN114140664A publication Critical patent/CN114140664A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/211 Selection of the most significant subset of features

Abstract

The disclosure provides a training method of an image processing model, and an image similarity determining method and device, which can be applied to the field of artificial intelligence, and particularly can be applied to the field of deep learning. The training method of the image processing model comprises the following steps: inputting a first image in the sample image into a feature extraction network to obtain a first image feature; inputting a second image in the sample image into a feature extraction network to obtain a second image feature; wherein the second image comprises a label indicating an actual similarity between the first image and the second image; determining the prediction similarity of the first image and the second image by adopting a similarity determination network based on the first image characteristic and the second image characteristic; and training the image processing model according to the prediction similarity and the actual similarity.

Description

Training method of image processing model, and image similarity determining method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the field of deep learning, and more particularly, to a training method for an image processing model, an image similarity determination method, an image similarity determination device, an electronic apparatus, a computer-readable storage medium, and a computer program product.
Background
With the development of recognition technologies such as character recognition and image recognition, more and more services adopt these technologies to acquire information. However, in practical applications, limited recognition accuracy means that recognized results may contain errors; for example, the character "上" may be misrecognized as the visually similar character "土". In order to reduce the adverse effects of erroneous recognition results, the erroneous results need to be corrected.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a training method of an image processing model and an image similarity determination method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to one aspect of the present disclosure, a training method of an image processing model is provided, wherein the image processing model includes a feature extraction network and a similarity determination network; the method comprises the following steps: inputting a first image in the sample image into the feature extraction network to obtain a first image feature; inputting a second image in the sample image into the feature extraction network to obtain a second image feature; wherein the second image includes a label indicating an actual similarity between the first image and the second image; determining a predicted similarity between the first image and the second image using the similarity determination network based on the first image feature and the second image feature; and training the image processing model according to the prediction similarity and the actual similarity.
According to an embodiment of the present disclosure, the determining the predicted similarity between the first image and the second image by using the similarity determination network based on the first image feature and the second image feature includes: and determining the prediction similarity according to a dot product of the first image feature and the second image feature.
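The dot-product step described above can be sketched as follows. This is a minimal illustration, assuming (C, H, W) feature maps that are flattened before the dot product; a real model would typically also normalize the features or pass the score through an activation, which is not shown here.

```python
import numpy as np

def predicted_similarity(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Dot product of two flattened feature maps as a similarity score.

    A sketch of the idea in the text above: the more similar the two
    feature maps, the larger their dot product.
    """
    return float(np.dot(feat_a.ravel(), feat_b.ravel()))

# Toy (C, H, W) = (2, 2, 2) feature maps: identical maps score highest.
f1 = np.ones((2, 2, 2))
f2 = np.ones((2, 2, 2))
f3 = np.zeros((2, 2, 2))
print(predicted_similarity(f1, f2))  # 8.0
print(predicted_similarity(f1, f3))  # 0.0
```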
According to an embodiment of the present disclosure, the second image includes a plurality of sub-images; the second image feature comprises a sub-image feature of each of the plurality of sub-images; the determining the prediction similarity based on the dot product of the first image feature and the second image feature may include: and determining the prediction similarity between the first image and each sub-image according to the dot product result of the first image characteristic and the sub-image characteristic of each sub-image.
According to an embodiment of the present disclosure, the determining the prediction similarity according to a dot product of the first image feature and the second image feature includes: determining a distance feature between the first image feature and the second image feature by using a dot product operation; and flattening the distance features to obtain the prediction similarity.
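The flattening step can be illustrated with a toy distance feature. The shape (3, 1, 1) below is a hypothetical example of a per-sub-image dot-product result that still carries singleton dimensions; flattening reduces it to one score per sub-image, matching a label vector such as (1, 0, 0).

```python
import numpy as np

# Hypothetical distance feature produced by the dot-product step:
# one scalar per sub-image, still carrying singleton dimensions.
distance = np.array([[[3.2]], [[0.4]], [[0.1]]])  # shape (3, 1, 1)

# Flattening collapses it to one predicted similarity per sub-image,
# so its dimension matches that of the label.
predicted = distance.reshape(-1)                   # shape (3,)
print(predicted.shape)  # (3,)
```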
According to an embodiment of the present disclosure, the first image includes a target character; the second image includes a plurality of sub-images, a part of the sub-images including a first character similar to the target character, and another part of the sub-images including a second character dissimilar to the target character.
According to an embodiment of the disclosure, the method further comprises: adding the target character to the template image to obtain the first image; acquiring at least one similar character of the target character, and adding the at least one similar character to the template image respectively to obtain a part of image; and acquiring at least one character which is dissimilar to the target character from a preset word stock, and adding the at least one character to the template image respectively to obtain the other part of the image.
According to another aspect of the present disclosure, there is provided an image similarity determining method including: inputting a plurality of images to be recognized into a feature extraction network in an image processing model to obtain a first feature matrix; performing a dot multiplication operation on the first feature matrix and the transposed matrix of the first feature matrix to obtain a distance matrix; and determining the similarity between the plurality of images to be recognized based on the distance matrix; wherein the image processing model is obtained by training with the training method of the image processing model according to any one of the foregoing embodiments.
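The matrix-times-transpose step above can be sketched in a few lines. The toy setup is an assumption: three images, each already encoded by the feature extraction network into a 4-dimensional feature vector stored row-wise.

```python
import numpy as np

# Assumed toy setup: 3 images, each encoded as a 4-d feature vector.
F = np.array([
    [1.0, 0.0, 1.0, 0.0],   # image A
    [1.0, 0.0, 1.0, 0.0],   # image B (identical to A)
    [0.0, 1.0, 0.0, 1.0],   # image C (disjoint from A)
])

# Dot-multiplying the feature matrix with its transpose yields an
# N x N distance matrix in a single operation; entry D[i, j] is the
# dot-product similarity of image i and image j.
D = F @ F.T
print(D)
```

Because `D` is symmetric with all pairwise scores computed at once, the similarity between every pair of input images is read off directly, e.g. A-B similar (2.0) and A-C dissimilar (0.0).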
According to another aspect of the present disclosure, there is provided a training apparatus for an image processing model, wherein the image processing model includes a feature extraction network and a similarity determination network; the above-mentioned device includes: the system comprises a first characteristic determining module, a second characteristic determining module, a predicting module and a model training module; the first characteristic determining module is used for inputting a first image in the sample image into the characteristic extraction network to obtain a first image characteristic; the second characteristic determining module is used for inputting a second image in the sample image into the characteristic extraction network to obtain a second image characteristic; wherein the second image includes a label indicating an actual similarity between the first image and the second image; the prediction module is used for determining the prediction similarity between the first image and the second image by adopting the similarity determination network based on the first image characteristic and the second image characteristic; and the model training module is used for training the image processing model according to the prediction similarity and the actual similarity.
According to another aspect of the present disclosure, there is provided an image similarity determination apparatus including: the characteristic extraction module is used for inputting a plurality of images to be identified into a characteristic extraction network in the image processing model to obtain a first characteristic matrix; a dot multiplication module, configured to perform dot multiplication on the first feature matrix and a transposed matrix of the first feature matrix to obtain a distance matrix; the similarity determining module is used for determining the similarity between the images to be identified based on the distance matrix; the image processing model is obtained by training by adopting a training device of the image processing model.
According to another aspect of the present disclosure, there is provided an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a training method and/or an image similarity determination method for the image processing model.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described image processing model training method and/or image similarity determination method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the above-described image processing model training method and/or image similarity determination method.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
fig. 1 schematically illustrates an application scenario diagram of a training method of an image processing model and an image similarity determination method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically shows a flow diagram of a method of training an image processing model according to an embodiment of the present disclosure;
FIG. 3 schematically shows a block diagram of a feature extraction network according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a schematic diagram of a training method of an image processing model according to an embodiment of the present disclosure;
FIG. 5 schematically shows a flow chart of an image similarity determination method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a training apparatus for an image processing model according to an embodiment of the present disclosure;
fig. 7 schematically shows a block diagram of the structure of an image similarity determination apparatus according to an embodiment of the present disclosure; and
fig. 8 schematically shows a block diagram of an electronic device adapted to implement a training method and/or an image similarity determination method of an image processing model according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have a alone, B alone, C alone, a and B together, a and C together, B and C together, and/or A, B, C together, etc.).
According to an embodiment of the present disclosure, error correction of an address may be achieved by correcting the characters describing the address. For example, if the character "土" is determined to have a high similarity to the character "上", the misrecognized string "土海" can be corrected to "上海" (Shanghai).
In one embodiment, a model lexicon can be constructed in advance, with the structural regularities of characters summarized manually. For example, such regularities may include: characters having a top-bottom structure, a left-right structure, a common radical such as the grass radical, and so on. When error correction is needed, a character with high similarity to the erroneous character can be selected from the pre-constructed model lexicon and used to replace the erroneous character, thereby realizing the error-correction function.
With this technical scheme, on the one hand, similarity can only be determined for characters that exist in the model lexicon; for characters absent from the lexicon, their similarity to other characters cannot be determined, so the range of application is narrow. On the other hand, similarity determination relies on manually summarized rules, which are subjective and therefore limited in accuracy.
The embodiment of the disclosure provides a training method of an image processing model, wherein the image processing model comprises a feature extraction network and a similarity determination network; the method comprises the following steps: inputting a first image in the sample image into a feature extraction network to obtain a first image feature; inputting a second image in the sample image into a feature extraction network to obtain a second image feature; wherein the second image comprises a label indicating an actual similarity between the first image and the second image; determining the prediction similarity of the first image and the second image by adopting a similarity determination network based on the first image characteristic and the second image characteristic; and training the image processing model according to the prediction similarity and the actual similarity.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 schematically shows an application scenario diagram of a training method of an image processing model and an image similarity determination method and apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be any electronic device with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.
The electronic device 110 may process a plurality of input images 120 to be recognized, for example, to obtain the similarity 130 among the plurality of images 120 to be recognized. For example, features of the images 120 to be recognized may be extracted, and the similarity 130 may be determined based on the extracted features.
According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a server 140. The electronic device 110 may be communicatively coupled to the server 140 via a network, which may include wireless or wired communication links.
Illustratively, the server 140 may be configured to train the image processing model 150, and transmit the trained image processing model 150 to the electronic device 110 in response to a model obtaining request transmitted by the electronic device 110, so as to facilitate the electronic device 110 to process the image. In an embodiment, the electronic device 110 may further send the image to be recognized 120 to the server 140 through a network, and the server processes the obtained image to be recognized 120 according to the trained image processing model 150.
According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a database 160, the database 160 may maintain a vast number of images, any two of the images constitute an image pair, each image pair may have a label indicating an actual similarity between the two included images. Server 140 may access database 160 and extract partial image pairs from database 160 and use the extracted image pairs as sample images to train image processing model 150.
In training the image processing model 150, a loss function may be used to determine the loss of the image processing model 150 based on the predicted similarity and the actual similarity indicated by the label, and the model training may be completed by minimizing the model loss.
It should be noted that the training method of the image processing model provided by the present disclosure may be executed by the server 140, and the image similarity determination method provided by the present disclosure may be executed by the electronic device 110 or the server 140. Accordingly, the training apparatus of the image processing model provided by the present disclosure may be disposed in the server 140, and the image similarity determination apparatus provided by the present disclosure may be disposed in the electronic device 110 or the server 140.
It should be understood that the number and type of electronic devices, servers, and databases in FIG. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases, as the implementation requires.
The following describes in detail a training method of an image processing model according to an embodiment of the present disclosure with reference to fig. 2 to 4 based on the scenario described in fig. 1.
FIG. 2 schematically shows a flow chart of a method of training an image processing model according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the image processing model of this embodiment includes operations S210 to S240. The image processing model comprises a feature extraction network and a similarity determination network.
In operation S210, a first image in the sample image is input to the feature extraction network, and a first image feature is obtained.
According to an embodiment of the present disclosure, the sample image may be an image containing a target object. The target object may be text, a graphic, or another object, and the text may include characters of various languages, such as Chinese characters and English letters.
According to an embodiment of the present disclosure, the feature extraction network may be a convolutional neural network, a Spatial Pyramid Pooling (SPP) network, a Feature Pyramid Network (FPN), or the like. The obtained first image feature may be a feature map of size (C, H, W) corresponding to one first image, where C is the number of convolution kernels of the last convolutional layer in the feature extraction network, and H and W are the height and width of the feature map, respectively.
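How the (C, H, W) feature-map size arises can be checked with the standard convolution output-size formula. This helper is illustrative only (not part of the patent text); the 50 × 50 input and 64-kernel last layer are assumed values.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard convolution output-size formula:
    floor((size - kernel + 2 * padding) / stride) + 1.
    """
    return (size - kernel + 2 * padding) // stride + 1

# A 50x50 input through a 3x3 conv with stride 1 and padding 1 keeps
# H and W; with 64 kernels in the last layer, the feature map is
# (C, H, W) = (64, 50, 50).
h = conv_output_size(50, 3, stride=1, padding=1)
print((64, h, h))  # (64, 50, 50)
```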
In operation S220, a second image in the sample image is input to the feature extraction network, and a second image feature is obtained.
According to an embodiment of the present disclosure, the second image comprises a label indicating an actual similarity between the first image and the second image.
In one example, the first image is similar to the second image, e.g., text is similar, colors of the images are similar, etc. The label included in the second image may be a label indicating that the second image is similar to the first image, for example the label may be represented by a label value of 1.
In another example, the first image is dissimilar to the second image; e.g., the second image includes text that differs markedly from the text in the first image, or the second image includes no text at all but, say, an animal. The label included in the second image may then indicate that the second image is dissimilar to the first image; for example, the label may be represented by a label value of 0.
In another example, the second image may include a plurality of sub-images, each sub-image having a label indicating an actual similarity between the each sub-image and the first image. The second image may comprise a sub-image similar to the first image and/or a sub-image dissimilar to the first image.
In operation S230, a predicted similarity between the first image and the second image is determined using the similarity determination network based on the first image feature and the second image feature.
According to an embodiment of the present disclosure, the similarity determination network may employ a regression network or other network capable of determining the similarity between two images according to the feature map. The predicted similarity may be expressed in terms of a probability of similarity between the second image and the first image.
In operation S240, the image processing model is trained according to the predicted similarity and the actual similarity.
According to an embodiment of the present disclosure, a predetermined loss function may be employed to calculate the loss of the image processing model from the predicted similarity and the actual similarity, and the network weights in the image processing model may be adjusted via a back-propagation algorithm or the like to complete the training. The predetermined loss function may include a cross-entropy loss function, a mean square error loss function, and the like, which is not limited by the present disclosure.
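The two loss functions mentioned above can be sketched directly from their definitions. The predicted similarities and the (1, 0, 0)-style label below are toy values, and a real implementation would use a framework's built-in losses rather than these hand-rolled versions.

```python
import numpy as np

def mse_loss(pred, target) -> float:
    """Mean squared error between predicted and actual similarity."""
    p, t = np.asarray(pred, float), np.asarray(target, float)
    return float(np.mean((p - t) ** 2))

def bce_loss(pred, target, eps: float = 1e-7) -> float:
    """Binary cross-entropy over similar (1) / dissimilar (0) labels."""
    p = np.clip(np.asarray(pred, float), eps, 1 - eps)
    t = np.asarray(target, float)
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))

# Predicted similarities vs. the actual-similarity label of the
# second image's sub-images.
pred = [0.9, 0.2, 0.1]
label = [1.0, 0.0, 0.0]
print(mse_loss(pred, label))   # 0.02
print(bce_loss(pred, label))
```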
According to the embodiment of the present disclosure, after the model is trained with the training method of this embodiment, the similarity among a plurality of images can be determined by the trained model; compared with related-art schemes that require manually summarizing image regularities, this can improve the accuracy of the determined similarity to some extent. In addition, for images that do not appear in the training set, image features can still be extracted by the trained image processing model and compared to determine similarity, so the image processing model of this embodiment has a wider range of application than the related-art scheme of constructing a model lexicon.
According to another embodiment of the present disclosure, the operation S230 may include the following operations: and determining the prediction similarity according to the dot product result of the first image characteristic and the second image characteristic.
According to an embodiment of the present disclosure, a dot product operation is employed to determine the predicted similarity between the first image feature and the second image feature. The dot product result can represent the predicted similarity between the two features: the more similar the features, the larger their dot product. Determining the predicted similarity via a dot product keeps the amount of computation low, thereby improving training efficiency.
According to another embodiment of the present disclosure, the second image includes a plurality of sub-images. The second image feature comprises a sub-image feature of each of the plurality of sub-images. Accordingly, the operation of determining the prediction similarity from the dot product of the first image feature and the second image feature may include the operations of: and determining the prediction similarity between the first image and each sub-image according to the dot product result of the first image characteristic and the sub-image characteristic of each sub-image.
It should be noted that when the second image includes a plurality of sub-images, the first image feature may have fewer dimensions than the second image feature. For example, if the number of first images is 1, the first image may be represented by data of size (H, W, n), where H and W are the height and width of the first image and n is its number of channels (e.g., n = 3 for a three-channel RGB image). If the second image includes N sub-images, the second image may be represented by data of size (N, H, W, n). Accordingly, after the first image and the second image pass through the feature extraction network, the output first image feature has one dimension fewer than the second image feature.
To give the first image feature the same dimensionality as the second image feature, a dimension-alignment step may be performed. In one example, this may include inputting the first image feature into a dimension expansion layer (such as a Lambda layer) to obtain a dimension-expanded first image feature. In another example, the first image feature may be copied multiple times to obtain a plurality of first image features in one-to-one correspondence with the plurality of sub-image features; the corresponding pairs are then dot-multiplied to determine the predicted similarity between the first image and each sub-image.
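The dimension-expansion strategy can be mimicked with NumPy broadcasting. The shapes below are assumed toy values: one first-image feature vector of dimension d, and N sub-image feature vectors stacked as an (N, d) matrix.

```python
import numpy as np

first = np.array([1.0, 2.0, 0.0])            # first-image feature, shape (3,)
subs = np.array([[1.0, 2.0, 0.0],            # similar sub-image
                 [0.0, 0.0, 1.0]])           # dissimilar sub-image, shape (2, 3)

# Expanding the first feature to shape (1, d) lets broadcasting stand
# in for the "copy N times" strategy described above: one dot product
# per sub-image in a single vectorized operation.
expanded = first[np.newaxis, :]               # shape (1, 3)
scores = np.sum(expanded * subs, axis=1)      # per-sub-image dot products
print(scores)  # [5. 0.]
```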
According to an embodiment of the present disclosure, the second image comprises a plurality of sub-images, each sub-image comprising a label indicating the actual similarity between that sub-image and the first image. For example, if the first image includes the character "土", the plurality of sub-images included in the second image may include the characters "王", "吃", and "箱", respectively, and the label of the second image may be (1, 0, 0), indicating that "王" is similar to "土" while "吃" and "箱" are not.
In the embodiment of the disclosure, the image processing model is trained by setting a plurality of sub-images for one first image, so that the image processing model can learn more accurate weight, thereby improving the precision of the image processing model.
According to another embodiment of the present disclosure, the operation of determining the predicted similarity according to the dot product result of the first image feature and the second image feature may include: first, determining a distance feature between the first image feature and the second image feature using a dot product operation; and then flattening the distance feature to obtain the predicted similarity.
According to the embodiment of the present disclosure, the dimension of the distance feature obtained by the dot product operation may differ from the dimension of the label included in the second image. The distance feature is therefore flattened, so that the dimension of the obtained predicted similarity is the same as the dimension of the label.
According to another embodiment of the present disclosure, the first image includes a target character. The second image includes a plurality of sub-images, a portion of the images of the plurality of sub-images including a first character similar to the target character, another portion of the images of the plurality of sub-images including a second character dissimilar to the target character.
According to the embodiment of the disclosure, the characters may include characters, such as chinese characters, english letters, and the like, and the language of the characters is not limited in the embodiment of the disclosure. The characters can be selected and determined by the user based on actual needs, and can also be obtained based on a character library, and the method for obtaining the characters is not limited by the disclosure.
With the adoption of the scheme of the embodiment of the disclosure, on one hand, the first image and the second image respectively comprise characters, so that the trained image processing model can be used for determining the similarity of the characters, for example, the similarity between two characters. On the other hand, the second image includes a first character similar to the target character and a second character dissimilar to the target character, and the image processing model can learn based on the first character and the second character and learn positive examples and negative examples, thereby enabling the trained image processing model to have higher precision.
According to another embodiment of the present disclosure, the training method of the image processing model may further include the following operations: adding the target character to a template image to obtain the first image; acquiring at least one similar character of the target character, and adding the at least one similar character to the template image respectively to obtain one portion of the images; and acquiring at least one character dissimilar to the target character from a predetermined word stock, and adding the at least one character to the template image respectively to obtain another portion of the images.
According to an embodiment of the present disclosure, the template image may be an image of a predetermined size; for example, an image of 50 × 50 pixels may be adopted as the template image. The target character may be any character; for example, the character "soil" is selected as the target character, and writing the target character into the template image yields the first image. Similar characters can be selected from a similar character library; for example, characters similar to the character "soil" are searched from an existing similar character library to obtain "king", "working" and "upper", and each similar character is then written into the template image respectively to obtain one portion of the images in the second image. The dissimilar characters may be selected from a predetermined word stock; for example, characters other than the above similar characters are randomly selected from the Xinhua dictionary as dissimilar characters.
According to the embodiment of the disclosure, by generating the sample images through adding characters to a template image, the sizes of the sample images can be kept consistent, which facilitates subsequent feature extraction from the sample images. In addition, the second image can be determined based on the acquired similar characters and the acquired dissimilar characters respectively, so that the label of the second image can be determined automatically without manual labeling, reducing the workload of operators.
In one example, to improve the accuracy of the trained image processing model, the following scheme may be adopted: the sizes of the plurality of characters, including the target character, the similar characters and the dissimilar characters, are kept uniform, that is, the heights and widths of the characters are kept consistent. The positions of the characters may also be kept consistent, that is, each character is added to a predetermined region of the template image, thereby obtaining the first image and the second image. In one example, the fonts of the characters may also be kept identical.
In one example, to improve the accuracy of the image processing model, the number of sample images may be increased. For example, a plurality of sets of sample images is determined for each target character, where each set may include 1 similar character and a plurality of dissimilar characters, and the image processing model is then trained over the plurality of sets of sample images. For the target character "soil", for example, an image including "king, tour, department, audit, logo, food, swell, house, supervisor" may be taken as a first second image, an image including "worker, cigarette, sweet, greedy, porridge, suck, prisoner, tamper, brand" as a second second image, and an image including "worker, frame, nation, puppet, niao, tread, first sight, heavy, clam" as a third second image. The labels of the three second images are all (1, 0, 0, 0, 0, 0, 0, 0, 0).
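The grouping described above can be sketched in Python. This is a minimal illustration rather than code from the disclosure; the helper name `build_sample_sets`, the character pools, and the set sizes are illustrative assumptions.

```python
import random

def build_sample_sets(target, similar_chars, dissimilar_pool,
                      num_sets=3, negatives_per_set=8):
    """Build several second-image character sets for one target character.

    The first image of every set is rendered from `target`; each set pairs
    1 similar character with several dissimilar ones, so the label is always
    (1, 0, ..., 0) and requires no manual labeling.
    """
    sets = []
    for i in range(num_sets):
        positive = similar_chars[i % len(similar_chars)]
        negatives = random.sample(dissimilar_pool, negatives_per_set)
        chars = [positive] + negatives          # 9 sub-images per second image
        label = [1] + [0] * negatives_per_set   # label determined automatically
        sets.append((chars, label))
    return sets

# Target character "soil" with similar characters "king", "working", "upper".
sets = build_sample_sets("土", ["王", "工", "上"],
                         list("渤审标食胀房监烟甜粥吸囚改牌架国"))
print(sets[0][1])  # [1, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because the similar character always occupies the first position, the three labels in the example above come out identical, matching the (1, 0, 0, 0, 0, 0, 0, 0, 0) label stated in the text.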
Fig. 3 schematically shows a block diagram of a feature extraction network according to an embodiment of the present disclosure.
As shown in FIG. 3, the feature extraction network 300 of this embodiment may include an input layer 310, a scaling layer 320, convolutional layers 330, 350 and 370, pooling layers 340, 360 and 380, a Dropout layer 390, a flatten layer 3100, and a fully-connected layer 3110. It is to be understood that the numbers of convolutional layers and pooling layers are merely examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
In the input layer 310, an array obtained by converting a plurality of sample images selected by one iteration training may be input.
The scaling layer 320 is used to scale the elements in the input array proportionally. For an array obtained by image conversion, the value of each element ranges from 0 to 255, where 0 represents black and 255 represents white; the scaling layer 320 may divide each element by 255 to complete the scaling of the array.
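As a minimal sketch (assuming NumPy arrays; not code from the disclosure), the scaling layer's effect is simply an element-wise division by 255:

```python
import numpy as np

# Toy batch of one 2x3 grayscale image with values in [0, 255].
batch = np.array([[[0, 64, 128],
                   [192, 224, 255]]], dtype=np.float32)

scaled = batch / 255.0  # 0 (black) maps to 0.0, 255 (white) maps to 1.0
print(scaled.min(), scaled.max())  # 0.0 1.0
```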
Convolutional layers 330, 350, 370 are used to extract features from the input array. Each filter of the convolutional layers 330, 350, 370 extracts one feature, and the number of filters determines the number of features the convolutional layer outputs. The first convolutional layer 330 may use 16 filters, the second convolutional layer 350 may use 32 filters, and the third convolutional layer 370 may use 64 filters.
The pooling layers 340, 360, 380 are sandwiched between successive convolutional layers 330, 350, 370, i.e., the first pooling layer 340 is located between the first convolutional layer 330 and the second convolutional layer 350, the second pooling layer 360 is located between the second convolutional layer 350 and the third convolutional layer 370, and the third pooling layer 380 is located at the output side of the third convolutional layer 370. Pooling layers 340, 360, 380 are used to compress the amount of data and parameters, reduce overfitting, and reduce the output dimensionality. The pooling layers 340, 360, 380 may employ a maximum pooling layer or an average pooling layer.
Dropout layer 390 is used to discard a portion of the neurons from the neural network during training, preventing overfitting.
The flatten layer 3100 is used to convert a multi-dimensional array into a low-dimensional array, such as a one-dimensional array; the amount of data is unchanged by the conversion.
The fully-connected layer 3110 takes the output of the flatten layer as input and maps it to the similarity space.
It should be understood that the structure of the feature extraction network described above is merely an example, and that the structure may be adjusted as necessary.
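To make the layer ordering concrete, the following shape walk-through traces a 50 × 50 template image through the network. The 3 × 3 'same'-padded convolutions and 2 × 2 pooling windows are assumptions; the disclosure only fixes the filter counts (16, 32, 64) and the conv → pool → conv → pool → conv → pool → flatten → fully-connected order.

```python
def conv_same(h, w, filters):
    # A 'same'-padded convolution keeps the spatial size, changes channels.
    return h, w, filters

def pool_2x2(h, w, c):
    # A 2x2 pooling layer halves each spatial dimension (floor division).
    return h // 2, w // 2, c

h, w, c = 50, 50, 1  # 50x50 single-channel template image
for filters in (16, 32, 64):
    h, w, c = conv_same(h, w, filters)
    h, w, c = pool_2x2(h, w, c)

flat_len = h * w * c  # length of the flatten layer's one-dimensional output
print((h, w, c), flat_len)  # (6, 6, 64) 2304
# The fully-connected layer then maps these values to the
# 128-dimensional feature used in Fig. 4.
```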
Fig. 4 schematically illustrates a principle schematic of a training method of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 4, in the embodiment 400 of the present disclosure, a first image may be input to a feature extraction network 410, resulting in a first image feature. The first image feature may be represented, for example, by a feature array of size (None, 128), where None is the number of samples selected for one training iteration and may take a value such as 32, 64 or 256. The second image is input into the feature extraction network 410 to obtain a second image feature. The second image feature may be represented, for example, by a feature array of size (None, 9, 128), where 9 denotes the number of sub-images included in the second image. The first image feature is dimension-expanded using a dimension expansion layer 420 to obtain an expanded first image feature, for example, expanding the (None, 128) feature to a (None, 9, 128) feature. A dot multiplication operation 430 is then performed on the second image feature and the expanded first image feature, and the dot multiplication result is flattened by a flattening layer 440 to obtain the predicted similarity. A loss function is then calculated from the flattened result and the label of the second image, and the image processing model is trained accordingly.
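The expand → dot-multiply → flatten head of Fig. 4 can be sketched with NumPy broadcasting. Random features stand in for the network outputs here, and batch size 32 is one of the example values named above:

```python
import numpy as np

rng = np.random.default_rng(0)
first = rng.normal(size=(32, 128))      # first image features, (None, 128)
second = rng.normal(size=(32, 9, 128))  # 9 sub-image features each, (None, 9, 128)

expanded = first[:, np.newaxis, :]      # (32, 1, 128), broadcasts to (32, 9, 128)
pred = (second * expanded).sum(axis=-1) # dot product along the feature axis

print(pred.shape)  # (32, 9): one predicted similarity per sub-image,
                   # matching the 9-element label of each second image
```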
Fig. 5 schematically shows a flowchart of an image similarity determination method according to an embodiment of the present disclosure.
As shown in fig. 5, the image similarity determining method 500 of this embodiment may include operations S510 to S530. The image similarity determining method 500 may be implemented by using a feature extraction network in the image processing model obtained by training the training method of the image processing model described above.
In operation S510, a plurality of images to be recognized are input to a feature extraction network in an image processing model, so as to obtain a first feature matrix. The operation S510 may determine the first feature matrix by a method similar to the method described in the foregoing operations S210 to S220, which is not described herein again. The first feature matrix is composed of a plurality of feature vectors corresponding to a plurality of images to be recognized one by one.
In operation S520, a dot product operation is performed on the first feature matrix and the transposed matrix of the first feature matrix to obtain a distance matrix.
In operation S530, a similarity between the plurality of images to be recognized is determined based on the distance matrix.
According to the embodiment of the disclosure, the element in the ith row and the jth column in the distance matrix represents the similarity between the ith image and the jth image in the plurality of images to be recognized.
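A minimal NumPy sketch of operations S510 to S530 follows. Random features stand in for the feature extraction network's output, and the L2 normalisation is an added assumption so that each image's self-similarity is exactly 1:

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.normal(size=(5, 128))  # first feature matrix: 5 images to recognize
features /= np.linalg.norm(features, axis=1, keepdims=True)  # assumed L2 norm

distance = features @ features.T  # dot product with the transpose, shape (5, 5)

# Element (i, j) is the similarity between image i and image j.
print(distance.shape)     # (5, 5)
sim_1_3 = distance[1, 3]  # similarity between images 1 and 3
```

Because the matrix is symmetric, each pairwise similarity can be looked up from either triangle, which is what makes the precomputed-matrix usage described below possible.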
In one example, all images to be recognized may be input into the feature extraction network in the image processing model in advance to obtain a distance matrix. When the similarity between two images needs to be determined, the predetermined distance matrix is called to look up the similarity. For example, all characters in a predetermined word stock (for example, 11200 characters in the Xinhua dictionary) are respectively converted into images to obtain 11200 images, and then the 11200 images are input into the feature extraction network in the image processing model to obtain a distance matrix. When the similarities between "up" and "soil" and between "Bohai" and "soil" need to be determined, the distance matrix is called to look up the similarity between "up" and "soil" and the similarity between "Bohai" and "soil".
In another example, a plurality of images to be recognized may be input into the feature extraction network in the image processing model and the similarity calculated directly. For example, when the similarities between "up" and "soil" and between "Bohai" and "soil" are required, images respectively including "up", "Bohai" and "soil" are input into the feature extraction network in the image processing model to obtain the features of "up", "Bohai" and "soil", and the similarities can then be calculated directly from these features.
Based on the training method of the image processing model, the disclosure also provides a training device of the image processing model. The apparatus will be described in detail below with reference to fig. 6.
Fig. 6 schematically shows a block diagram of the structure of a training apparatus of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 6, the training apparatus 600 for an image processing model of this embodiment includes a first feature determination module 610, a second feature determination module 620, a prediction module 630, and a model training module 640. The image processing model comprises a feature extraction network and a similarity determination network.
The first feature determining module 610 is configured to input a first image in the sample image into a feature extraction network, so as to obtain a first image feature. In an embodiment, the first characteristic determining module 610 may be configured to perform the operation S210 described above, which is not described herein again.
The second feature determining module 620 is configured to input a second image in the sample image into a feature extraction network to obtain a second image feature; wherein the second image comprises a label indicating an actual similarity between the first image and the second image. In an embodiment, the second characteristic determining module 620 may be configured to perform the operation S220 described above, which is not described herein again.
The prediction module 630 is configured to determine a predicted similarity between the first image and the second image using a similarity determination network based on the first image feature and the second image feature. In an embodiment, the prediction module 630 may be configured to perform the operation S230 described above, which is not described herein again.
The model training module 640 is configured to train the image processing model according to the prediction similarity and the actual similarity. In an embodiment, the model training module 640 may be configured to perform the operation S240 described above, which is not described herein again.
According to an embodiment of the present disclosure, the prediction module 630 is further configured to determine the prediction similarity according to a dot product of the first image feature and the second image feature.
According to an embodiment of the present disclosure, the second image comprises a plurality of sub-images, and the second image feature comprises a sub-image feature of each of the plurality of sub-images. The prediction module 630 comprises a first prediction sub-module for determining the predicted similarity between the first image and each sub-image according to the dot multiplication result of the first image feature and the sub-image feature of each sub-image.
According to an embodiment of the present disclosure, the first prediction sub-module includes a distance feature determination sub-module and a flattening processing sub-module. The distance feature determination sub-module is used for determining the distance feature between the first image feature and the second image feature by a dot multiplication operation. The flattening processing sub-module is used for flattening the distance feature to obtain the predicted similarity.
According to an embodiment of the present disclosure, the first image includes a target character. The second image includes a plurality of sub-images, a portion of the images of the plurality of sub-images including a first character similar to the target character, another portion of the images of the plurality of sub-images including a second character dissimilar to the target character.
According to an embodiment of the present disclosure, the apparatus further comprises a first sample determination module, a second sample determination module, and a third sample determination module. The first sample determining module is used for adding the target character on the template image to obtain a first image. The second sample determining module is used for obtaining at least one similar character of the target character, and adding the at least one similar character on the template image respectively to obtain a part of image. And the third sample determining module is used for acquiring at least one character which is dissimilar to the target character from the preset word stock, and adding at least one character on the template image respectively to obtain another part of image.
According to an embodiment of the present disclosure, any plurality of the first feature determination module 610, the second feature determination module 620, the prediction module 630, and the model training module 640 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first feature determination module 610, the second feature determination module 620, the prediction module 630, and the model training module 640 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first feature determination module 610, the second feature determination module 620, the prediction module 630, and the model training module 640 may be implemented at least in part as a computer program module that, when executed, may perform corresponding functions.
Based on the image similarity determining method, the disclosure also provides an image similarity determining device. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of the structure of an image similarity determination apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the image similarity determination apparatus 700 of this embodiment may include a feature extraction module 710, a dot product module 720, and a similarity determination module 730.
The feature extraction module 710 is configured to input a plurality of images to be recognized into a feature extraction network in the image processing model, so as to obtain a first feature matrix. The feature extraction network in the image processing model may be obtained by training with the training apparatus of the image processing model described above. In an embodiment, the feature extraction module 710 may be configured to perform the operation S510 described above, which is not described herein again.
The dot multiplication module 720 is configured to perform dot multiplication on the first feature matrix and the transposed matrix of the first feature matrix to obtain a distance matrix. The dot product module 720 can be used to perform the operation S520 described above, and will not be described herein.
The similarity determination module 730 is configured to determine similarity between the plurality of images to be recognized based on the distance matrix. The similarity determining module 730 may be configured to perform the operation S530 described above, and is not described herein again.
In the technical scheme of the present disclosure, the processes of acquiring, storing, using, processing, transmitting, providing, disclosing and applying the related images all conform to the regulations of related laws and regulations, and necessary security measures are taken without violating the good custom of the public order.
Fig. 8 schematically shows a block diagram of an electronic device adapted to implement a training method and/or an image similarity determination method of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.
In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 800 may also include input/output (I/O) interface 805, input/output (I/O) interface 805 also connected to bus 804, according to an embodiment of the present disclosure. Electronic device 800 may also include one or more of the following components connected to I/O interface 805: an input portion 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted on the storage section 808 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code causes the computer system to implement the training method of the image processing model and/or the image similarity determination method provided by the embodiments of the present disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 801. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via communication section 809, and/or installed from removable media 811. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or combinations of features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or combinations are not expressly recited in the present disclosure. In particular, various combinations and/or combinations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A training method of an image processing model is provided, wherein the image processing model comprises a feature extraction network and a similarity determination network; the method comprises the following steps:
inputting a first image in the sample image into the feature extraction network to obtain a first image feature;
inputting a second image in the sample image into the feature extraction network to obtain a second image feature; wherein the second image comprises a label indicating an actual similarity between the first image and the second image;
determining a predicted similarity between the first image and the second image using the similarity determination network based on the first image feature and the second image feature; and
and training the image processing model according to the prediction similarity and the actual similarity.
2. The method of claim 1, the determining a predicted similarity between the first image and the second image using the similarity determination network based on the first image feature and the second image feature, comprising:
and determining the prediction similarity according to the dot product result of the first image characteristic and the second image characteristic.
3. The method of claim 2, wherein the second image comprises a plurality of sub-images; the second image feature comprises a sub-image feature of each of the plurality of sub-images; the determining the prediction similarity according to the dot product of the first image feature and the second image feature comprises:
and determining the prediction similarity between the first image and each sub-image according to the dot product result of the first image characteristic and the sub-image characteristic of each sub-image.
4. The method of claim 2, wherein the determining the predicted similarity from the dot product of the first image feature and the second image feature comprises:
determining a distance feature between the first image feature and the second image feature by a dot multiplication operation; and
and flattening the distance features to obtain the prediction similarity.
5. The method of claim 1, wherein:
the first image includes a target character;
the second image includes a plurality of sub-images, a portion of the images of the plurality of sub-images including a first character similar to the target character, another portion of the images of the plurality of sub-images including a second character dissimilar to the target character.
6. The method of claim 5, further comprising:
adding the target character to a template image to obtain the first image;
acquiring at least one similar character of the target character, and adding the at least one similar character to the template image respectively to obtain a part of image; and
and acquiring at least one character which is dissimilar to the target character from a preset word stock, and adding the at least one character to the template image respectively to obtain the other part of the image.
7. An image similarity determination method, comprising:
inputting a plurality of images to be identified into a feature extraction network in an image processing model to obtain a first feature matrix;
performing dot multiplication operation on the first characteristic matrix and a transposed matrix of the first characteristic matrix to obtain a distance matrix; and
determining similarity between the plurality of images to be identified based on the distance matrix;
wherein the image processing model is trained using the method of any one of claims 1 to 6.
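Claim 7's inference step multiplies the feature matrix by its own transpose so that every pairwise similarity is obtained in a single operation. A minimal sketch, with assumed shapes and example values:

```python
import numpy as np

def pairwise_similarity(features: np.ndarray) -> np.ndarray:
    """Claim-7 style distance matrix: feature matrix times its transpose.

    features: (n, d) matrix, one row per image to be identified.
    Returns an (n, n) matrix whose (i, j) entry is the dot-product
    similarity between image i and image j.
    """
    return features @ features.T

# Hypothetical features for three images; rows 0 and 1 are identical,
# row 2 is orthogonal to both
feats = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
sim = pairwise_similarity(feats)
```

The resulting matrix is symmetric, so only the upper triangle needs to be inspected when comparing the images pairwise.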
8. A training device for an image processing model, wherein the image processing model comprises a feature extraction network and a similarity determination network; the device comprises:
a first feature determining module, configured to input a first image in the sample images into the feature extraction network to obtain a first image feature;
a second feature determining module, configured to input a second image in the sample images into the feature extraction network to obtain a second image feature, wherein the second image comprises a label indicating an actual similarity between the first image and the second image;
a prediction module, configured to determine a predicted similarity between the first image and the second image by using the similarity determination network based on the first image feature and the second image feature; and
a model training module, configured to train the image processing model according to the predicted similarity and the actual similarity.
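Claim 8's model training module trains the model according to the predicted and actual similarity, but the patent does not specify a loss function. One common choice, shown purely as an illustrative assumption, is a binary cross-entropy loss over the similarity scores:

```python
import numpy as np

def similarity_loss(predicted: np.ndarray, actual: np.ndarray) -> float:
    """Binary cross-entropy between raw predicted similarity scores and
    0/1 actual-similarity labels. The loss choice is an assumption; the
    claim only requires training 'according to the predicted similarity
    and the actual similarity'."""
    p = 1.0 / (1.0 + np.exp(-predicted))  # squash raw scores into (0, 1)
    eps = 1e-12                            # guard against log(0)
    return float(-np.mean(actual * np.log(p + eps)
                          + (1.0 - actual) * np.log(1.0 - p + eps)))

# Confident correct predictions yield a much smaller loss than confident
# wrong ones, which is the gradient signal the training module would use
good = similarity_loss(np.array([10.0, -10.0]), np.array([1.0, 0.0]))
bad = similarity_loss(np.array([-10.0, 10.0]), np.array([1.0, 0.0]))
```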
9. An image similarity determination apparatus comprising:
a feature extraction module, configured to input a plurality of images to be identified into a feature extraction network in the image processing model to obtain a first feature matrix;
a dot product module, configured to perform a dot product operation on the first feature matrix and the transpose of the first feature matrix to obtain a distance matrix; and
a similarity determining module, configured to determine similarity between the plurality of images to be identified based on the distance matrix;
wherein the image processing model is trained using the apparatus of claim 8.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 7.
12. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7.
CN202111471841.7A 2021-12-02 2021-12-02 Training method of image processing model, and image similarity determining method and device Pending CN114140664A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111471841.7A CN114140664A (en) 2021-12-02 2021-12-02 Training method of image processing model, and image similarity determining method and device


Publications (1)

Publication Number Publication Date
CN114140664A true CN114140664A (en) 2022-03-04

Family

ID=80388023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111471841.7A Pending CN114140664A (en) 2021-12-02 2021-12-02 Training method of image processing model, and image similarity determining method and device

Country Status (1)

Country Link
CN (1) CN114140664A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170893A (en) * 2022-08-29 2022-10-11 荣耀终端有限公司 Training method of common-view gear classification network, image sorting method and related equipment


Similar Documents

Publication Publication Date Title
US11657602B2 (en) Font identification from imagery
US11710293B2 (en) Target detection method and apparatus, computer-readable storage medium, and computer device
US20210271917A1 (en) Image processing method and apparatus, electronic device, and storage medium
US11507800B2 (en) Semantic class localization digital environment
US10789504B2 (en) Method and device for extracting information in histogram
US20200320273A1 (en) Remote sensing image recognition method and apparatus, storage medium and electronic device
WO2021135254A1 (en) License plate number recognition method and apparatus, electronic device, and storage medium
WO2017067456A1 (en) Method and device for recognizing character string in image
US11768876B2 (en) Method and device for visual question answering, computer apparatus and medium
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
US11887270B2 (en) Multi-scale transformer for image analysis
AU2021354030B2 (en) Processing images using self-attention based neural networks
CN109858327B (en) Character segmentation method based on deep learning
US20180365594A1 (en) Systems and methods for generative learning
CN114140664A (en) Training method of image processing model, and image similarity determining method and device
CN115512340A (en) Intention detection method and device based on picture
US11200676B2 (en) Shift invariant loss for deep learning based image segmentation
CN115937875A (en) Text recognition method and device, storage medium and terminal
US11837000B1 (en) OCR using 3-dimensional interpolation
CN117058437B (en) Flower classification method, system, equipment and medium based on knowledge distillation
US11983903B2 (en) Processing images using self-attention based neural networks
US20240062560A1 (en) Unified scene text detection and layout analysis
CN115841596A (en) Multi-label image classification method and training method and device of multi-label image classification model
CN115761449A (en) High-resolution image small target detection method
CN115775386A (en) User interface component identification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination