CN113344189A - Neural network training method and device, computer equipment and storage medium - Google Patents
Neural network training method and device, computer equipment and storage medium
- Publication number
- CN113344189A (application number CN202110696473.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- image
- sample image
- training
- images
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present disclosure provides a neural network training method, an apparatus, a computer device and a storage medium, the method comprising: determining, by using a target network to be trained, a first image feature of each first sample image in a first sample group and a second image feature of the corresponding second sample image in a second sample group corresponding to the first sample group; obtaining a first class representation of each first sample image based on the first image features, and a second class representation of each second sample image based on the second image features; determining first similarity information based on the first class representations of the first sample images; determining second similarity information based on the second class representations of the second sample images; and training the target network to be trained with the first similarity information as the supervision information of the second sample group and the second similarity information as the supervision information of the first sample group.
Description
Technical Field
The present disclosure relates to the field of computer and image processing technologies, and in particular, to a neural network training method and apparatus, a computer device, and a storage medium.
Background
With the development of artificial intelligence technology, neural networks are applied more and more frequently in the aspects of image classification and the like. According to different classification requirements, training the neural network through the sample image to obtain a neural network model, and then classifying through the neural network model.
Currently, during training, a neural network treats each sample image as its own class and classifies the sample images accordingly. However, some sample images are similar to one another, and assigning them to different classes reduces classification accuracy.
Disclosure of Invention
The embodiment of the disclosure at least provides a training method and device of a neural network, computer equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a training method for a neural network, including:
determining a first image feature of each first sample image in a first sample group and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
classifying the plurality of first sample images based on the first image features to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image features to obtain a second class representation of each second sample image;
determining first similarity information between every two first sample images in the first sample group based on the first class characterization of each first sample image;
determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image;
and taking the first similarity information as supervision information of the second sample group, taking the second similarity information as supervision information of the first sample group, and training the target network to be trained.
The method and the device cluster the sample images and use the first similarity information and the second similarity information as supervision information. On one hand, classification is performed according to similar samples, which can effectively improve classification accuracy compared with treating each image as its own class. On the other hand, the similarity information of each group of enhanced images serves as the supervision information for the other group of enhanced images, enabling self-supervised learning of the target network and improving its training efficiency. Meanwhile, each original sample image is enhanced only twice to obtain two corresponding enhanced images (the first sample image and the second sample image); this avoids the heavy overlap of image features that arises when the same original sample image is enhanced many times, which would reduce the diversity of the sample images.
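As a rough illustration of how class representations and pairwise similarity information might be computed, the following is a minimal NumPy sketch. It is not the patent's implementation: the feature dimension, cluster count, and function names are all illustrative.

```python
import numpy as np

def class_representations(features, class_weights):
    # A softmax over class logits gives each sample a "class representation"
    logits = features @ class_weights          # (N, C)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)    # rows sum to 1

def pairwise_similarity(reps):
    # Cosine similarity between every two samples' class representations
    norm = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return norm @ norm.T                       # (N, N), symmetric

rng = np.random.default_rng(0)
feats1 = rng.normal(size=(4, 8))   # first sample group features (hypothetical)
W = rng.normal(size=(8, 3))        # 3 hypothetical clusters
sim1 = pairwise_similarity(class_representations(feats1, W))
```

The same computation applied to the second sample group's features would yield the second similarity information.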
In an optional embodiment, the training the target network to be trained by using the first similarity information as the supervision information of the second sample group and using the second similarity information as the supervision information of the first sample group includes:
determining a first training loss based on the first class representation of each first sample image and the second similarity information;
determining a second training loss based on the second class representation of each second sample image and the first similarity information;
training the target network to be trained based on the first training loss and the second training loss.
In the embodiment of the disclosure, the second similarity information relatively accurately characterizes the classification result of the first sample images, and the first similarity information relatively accurately characterizes the classification result of the second sample images. Training the target network with the first training loss (determined from each first sample image's class representation and the second similarity information) and the second training loss (determined from each second sample image's class representation and the first similarity information) therefore improves training accuracy.
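The swapped-supervision idea — scoring one view's class representations against the other view's similarity information — could be sketched as below. This is a hypothetical formulation; the patent does not specify the exact loss form.

```python
import numpy as np

def swapped_loss(probs, target_sim):
    # Treat the other view's similarity matrix as soft pairwise targets:
    # samples deemed similar should have matching class distributions.
    pair_prob = probs @ probs.T                   # prob. two samples share a class
    pair_prob = np.clip(pair_prob, 1e-8, 1 - 1e-8)
    # Binary cross-entropy between pairwise targets and pairwise agreement
    return -np.mean(target_sim * np.log(pair_prob)
                    + (1 - target_sim) * np.log(1 - pair_prob))

rng = np.random.default_rng(1)
probs1 = rng.dirichlet(np.ones(3), size=4)        # first-view class representations
probs2 = rng.dirichlet(np.ones(3), size=4)        # second-view class representations
sim1 = (probs1 @ probs1.T > 0.5).astype(float)    # first similarity info (binarized)
sim2 = (probs2 @ probs2.T > 0.5).astype(float)    # second similarity info (binarized)
loss1 = swapped_loss(probs1, sim2)  # first loss: first reps vs second similarity
loss2 = swapped_loss(probs2, sim1)  # second loss: second reps vs first similarity
total = loss1 + loss2
```

Note how each view is trained against the other view's similarity labels, never its own, which is what makes the scheme self-supervising.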
In an optional embodiment, the method further comprises: determining a third training loss based on the first image features of each first sample image and the corresponding second image features of each second sample image;
the training the target network to be trained based on the first training loss and the second training loss includes:
training the target network to be trained based on the first training loss, the second training loss, and the third training loss.
According to the embodiment of the disclosure, each sample image (the first sample image or the second sample image) is used as a class to establish a third training loss, and then the supervision information obtained by clustering similar sample images is combined to train the target network, so that the target network can learn the capabilities of classifying each sample image and classifying similar samples, and the training accuracy of the target network is improved.
In an alternative embodiment, the determining the third training loss based on the first image feature of each first sample image and the corresponding second image feature of each second sample image includes:
for each first sample image, determining loss information corresponding to the first sample image based on first image features of the first sample image and second image features of a second sample image corresponding to the first sample image;
and determining a third training loss based on the loss information corresponding to each first sample image.
According to the embodiment of the disclosure, the third training loss can be accurately determined from the image features of the two enhanced images of the same original sample image. Training with this loss drives the target network to maximize the consistency between different enhanced images of the same original sample image, thereby improving the classification precision for such images.
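A common instance-level contrastive loss matching this description is the InfoNCE objective used in methods such as SimCLR. The sketch below is an illustrative stand-in for the third training loss, not the patent's exact formula; the temperature value is an assumption.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.5):
    # z1[i] and z2[i] are features of two augmentations of original image i.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature             # (N, N) similarity logits
    # The matching augmentation (diagonal) is the positive for each row.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(2)
base = rng.normal(size=(4, 8))
z1 = base + 0.05 * rng.normal(size=(4, 8))   # first enhanced views
z2 = base + 0.05 * rng.normal(size=(4, 8))   # second enhanced views
loss_aligned = info_nce(z1, z2)
loss_random = info_nce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8)))
```

Features of paired views of the same original image produce a lower loss than unrelated features, which is exactly the consistency the embodiment describes.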
In an optional embodiment, the determining first similarity information between every two first sample images in the first sample group based on the first class representation of each first sample image includes:
based on the first class representation of each first sample image, connecting every two first sample images whose similarity is greater than a first preset threshold to generate at least one first connected graph;
for each first connected graph, determining similarity information corresponding to the first connected graph based on the first class representations of all first sample images in the first connected graph;
determining first similarity information between every two first sample images in the first sample group based on the similarity information corresponding to each first connected graph;
the determining second similarity information between every two second sample images in the second sample group based on the second class representation of each second sample image includes:
based on the second class representation of each second sample image, connecting every two second sample images whose similarity is greater than a second preset threshold to generate at least one second connected graph;
for each second connected graph, determining similarity information corresponding to the second connected graph based on the second class representations of all second sample images in the second connected graph;
and determining second similarity information between every two second sample images in the second sample group based on the similarity information corresponding to each second connected graph.
In the connected graphs formed in the embodiment of the disclosure, any pair of sample images within the same connected subgraph is similar, while sample images in different connected subgraphs are not. Determining similarity information through connected graphs is faster and more accurate, which improves both training efficiency and accuracy.
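The connected-graph step can be illustrated with a thresholded union-find over a pairwise similarity matrix. This is a sketch under the assumption that "similar" means similarity above the preset threshold; the threshold value and matrix are illustrative.

```python
import numpy as np

def connected_similarity(sim, threshold):
    # Union-find over pairs whose similarity exceeds the threshold;
    # samples in the same connected component are labelled mutually similar.
    n = sim.shape[0]
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > threshold:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(n)]
    return np.array([[float(roots[i] == roots[j]) for j in range(n)]
                     for i in range(n)])

sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.0],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.0, 0.8, 1.0]])
labels = connected_similarity(sim, threshold=0.5)
# Samples {0, 1} and {2, 3} form two separate connected subgraphs
```

Within each component every pair is labelled similar (1.0), and pairs across components are labelled dissimilar (0.0), matching the description above.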
In an optional embodiment, the method further comprises:
respectively obtaining a plurality of third sample images corresponding to each original sample image; wherein the third sample image is an enhanced image of the corresponding original sample image;
and simultaneously using each third sample image in the third sample images as a first sample image and a second sample image to train the target network.
According to the embodiment of the disclosure, a plurality of third sample images corresponding to each original sample image are obtained and are used as the first sample image and the second sample image, so that the number and diversity of the sample images are increased, and the training precision is improved.
In an optional embodiment, the separately acquiring a plurality of third sample images corresponding to each original sample image includes:
determining a current first iteration number;
and respectively acquiring a plurality of third sample images generated based on each original sample image under the condition that the first iteration times are less than the preset iteration times.
According to the embodiment of the disclosure, under the condition of less iteration times, the number and diversity of the sample images are increased by directly using the enhanced images of the original sample images, and the training precision can be effectively improved.
In an optional embodiment, the separately acquiring a plurality of third sample images corresponding to each original sample image includes:
for each original sample image, determining the target image feature corresponding to each other original sample image except that original sample image, wherein the target image feature corresponding to another original sample image is the first image feature or the second image feature corresponding to that other original sample image;
respectively determining first image characteristics or second image characteristics of the original sample image and third similarity information between target image characteristics corresponding to each other original sample image;
and acquiring a plurality of third sample images corresponding to the original sample image based on the determined third similarity information.
According to the embodiment of the disclosure, a plurality of third sample images for target network training can be determined accurately through the similarity between the image features, so that the number and diversity of the sample images are increased, and the training accuracy of the target network is improved.
In an optional embodiment, the obtaining, based on the determined third similarity information, a plurality of third sample images corresponding to the original sample image includes:
screening the target image characteristics of which the similarity accords with preset conditions;
and generating a plurality of third sample images based on the first sample image or the second sample image corresponding to the target image features obtained by screening.
The embodiment of the disclosure can generate the enhanced images of the plurality of first sample images or the second sample images, namely, the third sample image, on the basis of the first sample image or the second sample image, thereby increasing the number and diversity of the sample images and being beneficial to improving the training accuracy of the target network.
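The screening step might look like a simple top-k nearest-neighbor search in feature space. This is an illustrative sketch: the value of k and the cosine metric are assumptions, since the patent only states that the similarity must meet a preset condition.

```python
import numpy as np

def screen_neighbors(features, anchor_idx, k):
    # Rank all other samples by cosine similarity to the anchor and keep the
    # k closest as sources for extra ("third") sample images.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f[anchor_idx]
    sims[anchor_idx] = -np.inf            # exclude the anchor itself
    return np.argsort(sims)[::-1][:k]     # indices of the k most similar samples

rng = np.random.default_rng(3)
feats = rng.normal(size=(6, 8))
feats[4] = feats[0] + 0.01 * rng.normal(size=8)   # sample 4 is nearly sample 0
neighbors = screen_neighbors(feats, anchor_idx=0, k=2)
```

The images behind the returned indices would then be enhanced again to produce the third sample images.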
In an optional embodiment, the determining, for each original sample image, a target image feature corresponding to each other original sample image except the original sample image includes:
determining a current second iteration number;
and under the condition that the second iteration times are larger than the preset iteration times, determining the target image characteristics corresponding to each other original sample image except the original sample image aiming at each original sample image.
According to the embodiment of the disclosure, under the condition of more iteration times, the performance of the target network is higher, the more accurate first image feature or second image feature can be determined, and the third sample image with higher precision can be determined based on the first image feature or the second image feature, so that the training accuracy of the target network can be improved.
In an optional embodiment, after acquiring the plurality of third sample images, the method further includes:
determining a third image characteristic of each third sample image by using a target network to be trained;
the training the target network with each of the third sample images as a first sample image and a second sample image at the same time includes:
classifying the plurality of third sample images based on the third image characteristics to obtain a third class representation of each third sample image;
for each third sample image, respectively determining second loss information between each first sample image and the third sample image based on the third class characterization of the third sample image and the first class characterization of each first sample image;
for each third sample image, determining third loss information between each second sample image and the third sample image based on the third class characterization of the third sample image and the second class characterization of each second sample image, respectively;
determining a fourth training loss based on the determined second loss information and the third loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, and the fourth training loss.
According to the method and the device, a fourth training loss is determined by utilizing a third class representation corresponding to a newly added third sample image, and the first training loss, the second training loss and the third training loss are combined, so that the training precision of the target network is facilitated.
In an optional embodiment, the training the target network to be trained based on the first training loss, the second training loss, the third training loss, and the fourth training loss includes:
for each third sample image, determining fourth loss information between each first sample image and the third sample image based on third image features of the third sample image and first image features of each first sample image, respectively;
for each third sample image, determining fifth loss information between each second sample image and the third sample image based on third image features of the third sample image and second image features of each second sample image, respectively;
determining a fifth training loss based on the determined fourth loss information and the fifth loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, the fourth training loss, and the fifth training loss.
According to the embodiment of the disclosure, a fifth training loss is determined by using a third image feature corresponding to a newly added third sample image, and the first training loss, the second training loss, the third training loss and the fourth training loss are combined, so that the training precision of the target network is facilitated.
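Combining the five training losses could be as simple as a weighted sum. The uniform weights below are an assumption; the patent does not specify how the losses are balanced.

```python
def total_loss(losses, weights=None):
    # Combine the training losses; uniform weights unless specified.
    weights = weights if weights is not None else [1.0] * len(losses)
    return sum(w * l for w, l in zip(weights, losses))

# Illustrative values for the first through fifth training losses
losses = [0.8, 0.7, 1.2, 0.5, 0.6]
total = total_loss(losses)
```

In practice the weights would be tuned, and the total would be backpropagated through the target network at each iteration.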
In a second aspect, an embodiment of the present disclosure further provides a training apparatus for a neural network, including:
determining a first image feature of each first sample image in a first sample group and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
classifying the plurality of first sample images based on the first image characteristics to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image characteristics to obtain a second class representation of each second sample image;
determining first similarity information between every two first sample images in the first sample group based on the first class characterization of each first sample image;
determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image;
and taking the first similarity information as supervision information of the second sample group, taking the second similarity information as supervision information of the first sample group, and training the target network to be trained.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect described above, or any possible implementation of the first aspect.
In a fourth aspect, this disclosed embodiment also provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps in the first aspect or any one of the possible implementation manners of the first aspect.
For the description of the effects of the training apparatus, the computer device, and the readable storage medium of the neural network, reference is made to the description of the training method of the neural network, and details are not repeated here.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a training method of a neural network provided by an embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of the effect of a sample image provided by an embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of network training provided by an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a connectivity graph provided by an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an effect of clustering sample images provided by an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training apparatus for a neural network provided by an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
With the development of artificial intelligence technology, neural networks are applied more and more frequently in the aspects of image classification and the like. According to different classification requirements, training the neural network through the sample image to obtain a neural network model, and then classifying through the neural network model.
Currently, a neural network is trained by treating each sample image as its own class. However, some sample images are similar to one another, and assigning them to different classes reduces classification accuracy.
Based on this, the embodiment of the disclosure clusters the sample images and uses the first similarity information and the second similarity information as supervision information. On one hand, classification is performed according to similar samples, which can effectively improve classification accuracy compared with treating each image as its own class. On the other hand, the similarity information of each group of enhanced images serves as the supervision information for the other group of enhanced images, enabling self-supervised learning of the target network and improving its training efficiency. Meanwhile, each original sample image is enhanced only twice to obtain two corresponding enhanced images (the first sample image and the second sample image); this avoids the heavy overlap of image features that arises when the same original sample image is enhanced many times, which would reduce the diversity of the sample images.
The above-mentioned drawbacks were identified by the inventors through practical and careful study. Accordingly, both the discovery of these problems and the solutions proposed in the present disclosure should be regarded as the inventors' contribution.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a training method of a neural network disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the training method of the neural network provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example, a server or other processing devices, and in some possible implementations, the training method of the neural network may be implemented by a processor calling computer readable instructions stored in a memory.
The following describes a training method of a neural network provided by an embodiment of the present disclosure by taking an execution subject as a server.
First, note that the neural network training method provided in the embodiments of the present disclosure trains a network based on self-supervised contrastive learning. A network obtained through contrastive learning training has strong representational capability and generally requires only a small amount of labeled data for fine-tuning to achieve excellent performance, and can serve downstream computer vision tasks such as classification, segmentation, and detection.
A network based on self-supervised learning can learn by itself from unlabeled images and generate labels for those images.
Contrast learning can simultaneously maximize the consistency between different enhanced images of the same original sample image and minimize the consistency between enhanced images of different original sample images. Therefore, based on the network obtained by the self-supervision contrast learning training, after the same unlabeled original sample image is enhanced for multiple times, different enhancements can be identified from the same original sample image, and therefore the similarity of different enhancement images can be maximized. If after several enhancements of different unlabeled original sample images, different enhancements from different original sample images can be identified, the similarity of the different enhanced images can be minimized.
Referring to fig. 1, a flowchart of a training method of a neural network provided in an embodiment of the present disclosure is shown, where the method includes S101 to S105, where:
s101: determining a first image feature of each first sample image in a plurality of first sample groups and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; wherein the first sample image and the corresponding second sample image are different enhanced images of the same original sample image.
In the embodiment of the present disclosure, the original sample image is an unlabeled sample image, and the first sample image and the second sample image are enhanced images obtained by performing image enhancement processing on the original sample image. Image enhancement is an image processing technique that adds information to, or transforms the data of, an original sample image by some means, selectively emphasizing features of interest in the original sample image or suppressing (masking) unwanted features in it. The image enhancement processing may include flipping, cropping, color transformation, changing resolution, and the like, but is not limited to this list; any feasible enhancement processing falls within the scope of the present disclosure.
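The enhancement operations listed above can be sketched with plain array operations. The transforms below (flip, crop, brightness jitter) and all names are illustrative assumptions, not the disclosure's implementation:

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """One random enhancement: horizontal flip, crop, and brightness jitter."""
    view = image.copy()
    if rng.random() < 0.5:                       # random horizontal flip
        view = view[:, ::-1]
    h, w = view.shape
    top = int(rng.integers(0, h // 4 + 1))       # random 3/4-size crop
    left = int(rng.integers(0, w // 4 + 1))
    view = view[top:top + 3 * h // 4, left:left + 3 * w // 4]
    return np.clip(view + rng.uniform(-0.1, 0.1), 0.0, 1.0)  # brightness jitter

rng = np.random.default_rng(0)
original = rng.random((32, 32))                  # stand-in unlabeled original image
first_view = augment(original, rng)              # first sample image x1
second_view = augment(original, rng)             # second sample image x2
```

Each call draws its own random parameters, so the two views of the same original differ, matching the requirement that the first and second sample images be different enhanced images.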
The first sample image and the second sample image are enhanced images derived from the same original sample image. As shown in the effect schematic diagram of the sample images in fig. 2, three original sample images are included, and each original sample image corresponds to one first sample image and one second sample image. The first sample image and the second sample image corresponding to the same original sample image are obtained by performing image enhancement processing on that original sample image, and both include partial image features of the original sample image. As can be seen from fig. 2, the first sample image and the second sample image are obtained by different image enhancement processing, so the first sample image and the second sample image may be different enhanced images.
In some possible embodiments, image enhancement processing may be performed on each original sample image multiple times to obtain multiple different sample images, thereby increasing the number of sample images and improving training accuracy. However, when the original sample image is enhanced a large number of times, different enhanced images may overlap heavily in image features without providing additional image features, which harms the diversity of the sample images. In the embodiment of the present disclosure, therefore, only two kinds of image enhancement processing are performed on the original sample image, yielding two enhanced images: the first sample image and the second sample image.
In a specific embodiment, two enhanced images of the same original sample image may be divided into two corresponding sample groups to obtain a first sample group and a second sample group, where the first sample group includes any one of the two sample images corresponding to the same original sample image, and the second sample group includes the remaining one of the two sample images corresponding to the same original sample image.
In a further embodiment, when there are many original sample images, the original sample images may be divided into a plurality of original sample image groups. For each original sample image group, the two sample images of each original sample image in the group are divided into two corresponding sample groups, forming a sample group pair corresponding to that original sample image group; a plurality of sample group pairs corresponding to the plurality of original sample image groups may thus be obtained. For the first sample group and the second sample group in each sample group pair, the first sample group includes either one of the two sample images corresponding to the same original sample image, and the second sample group includes the remaining one.
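The grouping described above can be sketched as follows; the function name and the stand-in augmentation are hypothetical, shown only to make the pairing of the two sample groups concrete:

```python
def build_sample_group_pair(originals, augment_fn):
    """Split two enhanced views of each original into a paired group structure:
    first_group[i] and second_group[i] come from the same original image."""
    first_group, second_group = [], []
    for x in originals:
        first_group.append(augment_fn(x))    # first sample image x1
        second_group.append(augment_fn(x))   # corresponding second sample image x2
    return first_group, second_group

first_group, second_group = build_sample_group_pair(
    originals=[1, 2, 3],                # stand-ins for original sample images
    augment_fn=lambda x: ("view", x),   # stand-in for image enhancement
)
```

The index alignment is the point: position i in the first sample group and position i in the second sample group always trace back to the same original sample image.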
The target network to be trained refers to a convolutional neural network to be trained based on contrastive learning; it can perform feature extraction on an input sample image to obtain the image features corresponding to that sample image. Specifically, each first sample image may be input into the target network to be trained to obtain the first image feature of each first sample image, and each second sample image may be input into the target network to be trained to obtain the second image feature of each second sample image.
In some embodiments, the target network to be trained may be a neural network for image classification, which may include a feature extraction module for extracting image features and a classifier for classifying based on the extracted image features. The first sample image and the second sample image may be input to a target network to be trained, and a first image feature and a second image feature output by a feature extraction module in the target network to be trained are obtained.
In the network training process, after the first sample group and the second sample group are obtained, the first sample group and the second sample group may be input into a target network to be trained, and an image feature corresponding to each sample image in each sample image group is obtained.
In the embodiment of the present disclosure, an original sample image group including a plurality of original sample images may be denoted {x}, the first sample image corresponding to each original sample image denoted x1, and the second sample image corresponding to each first sample image x1 denoted x2; each original sample group {x} then corresponds to one first sample group {x1} and one second sample group {x2}. As in the flowchart of network training shown in fig. 3, the first sample group and the second sample group may be respectively input into the target network, and the first image feature h1 = F(x1) corresponding to each first sample image in the first sample group and the second image feature h2 = F(x2) corresponding to each second sample image in the second sample group may be extracted based on the target network.
S102: and classifying the plurality of first sample images based on the first image characteristics to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image characteristics to obtain a second class representation of each second sample image.
In the embodiment of the present disclosure, S102 mainly implements the self-supervised learning between the first sample images and the second sample images, using embedded vectors to explore the relationship between different enhanced images.
As described above, the first image feature and the second image feature are image features extracted based on the target network. In order to learn the relationship between the enhanced images corresponding to different original sample images, the image features extracted by the target network need to be converted into class characterizations in an embedding space, which then serve as supervision information to pull similar images together in the embedding space. As shown in fig. 3, the head structure Φ may be used to obtain the class characterizations in the embedding space: the first image feature h1 corresponding to each first sample image and the second image feature h2 corresponding to each second sample image are input into the head structure Φ, yielding the first class characterization V1 = Φ(h1) in the embedding space corresponding to each first sample image and the second class characterization V2 = Φ(h2) in the embedding space corresponding to each second sample image.
Next, based on the first class representation in the embedding space, the cosine distance between the first sample images may be calculated, so as to perform classification processing on the multiple first sample images, thereby obtaining first similarity information y1 between each first sample image and other first sample images in the same first sample group. Similarly, based on the second class characterization in the embedding space, the multiple second sample images are classified to obtain second similarity information y2 between each second sample image and other second sample images in the same second sample group.
In one possible implementation, the first similarity information and the second similarity information may be determined in a connectivity graph manner.
Specifically, based on the first class characterizations in the embedding space, any two first sample images whose similarity is greater than a first preset threshold may be connected, generating at least one first connected subgraph. Here, the cosine similarity between the first class characterizations may be computed to determine the distance between two first sample images. Similarity is treated as transitive: when the first first sample image is similar to the second first sample image, and the second first sample image is similar to the third first sample image, the first first sample image is considered similar to the third first sample image; that is, after the connection between the first and second images and the connection between the second and third images are constructed, the connection between the first and third images is constructed as well. Thus, within one first connected subgraph, any two first sample images are connected to each other.
When the plurality of first sample images generate at least one first connected subgraph, for each first connected subgraph, the similarity information corresponding to that subgraph is determined based on the first class characterizations of the first sample images it contains; then, the first similarity information between every two first sample images in the first sample group is determined based on the similarity information corresponding to each first connected subgraph. Any pair of first sample images within one first connected subgraph is similar, while first sample images in different first connected subgraphs are dissimilar.
Similarly, any two second sample images whose similarity is greater than a preset threshold may be connected based on the second class characterizations in the embedding space, generating at least one second connected subgraph. Here, the cosine similarity between the second class characterizations may be computed to determine the distance between two second sample images, and similarity is again treated as transitive: when the first second sample image is similar to the second second sample image, and the second second sample image is similar to the third second sample image, the first second sample image is considered similar to the third second sample image; that is, after the connection between the first and second images and the connection between the second and third images are constructed, the connection between the first and third images is constructed. Thus, within one second connected subgraph, any two second sample images are connected to each other.
When the plurality of second sample images generate at least one second connected subgraph, for each second connected subgraph, determining similarity information corresponding to the second connected subgraph based on a second class representation corresponding to the second sample image in the second connected subgraph; and then, determining second similarity information between every two second sample images in the second sample group based on the similarity information corresponding to each second connected subgraph. Wherein any pair of second sample images in each second connected subgraph is similar, but the second sample images between different second connected subgraphs are not.
As shown in the schematic connectivity graph of fig. 4, which contains first sample images A–J, three connected subgraphs are formed. In the connected subgraph A-B-C-D-E, B, C, and E are first sample images whose similarity with A is greater than the first preset threshold, and D is a first sample image whose similarity with C is greater than the first preset threshold. By the transitivity described above (if the first first sample image is similar to the second, and the second is similar to the third, then the first is similar to the third), A, B, C, D, and E are all mutually similar first sample images, so the connected subgraph A-B-C-D-E can be formed. Each connected subgraph comprises a plurality of similar first sample images, and the connected-subgraph approach allows the similarity relationships among the first class characterizations of the first sample images to be determined quickly and accurately, improving training efficiency and accuracy.
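The connected-subgraph construction can be sketched as a union-find over a thresholded cosine-similarity matrix. This is an assumed implementation consistent with the description, with illustrative vectors chosen so that transitivity links samples whose direct similarity is below the threshold:

```python
import numpy as np

def similarity_labels(characterizations: np.ndarray, threshold: float) -> np.ndarray:
    """Connect pairs with cosine similarity above threshold, take connected
    subgraphs via union-find, and emit 1/0 pairwise similarity labels
    (1 = same subgraph, 0 = different subgraphs)."""
    v = characterizations / np.linalg.norm(characterizations, axis=1, keepdims=True)
    cos = v @ v.T                                   # pairwise cosine similarity
    n = len(v)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]           # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cos[i, j] > threshold:               # connect similar samples
                parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    return (roots[:, None] == roots[None, :]).astype(int)

# Vectors at 0, 20, 40, and 170 degrees: sample 0 ~ 1 and 1 ~ 2 directly, so
# 0 and 2 land in one subgraph by transitivity even though cos(0, 2) < 0.9.
labels = similarity_labels(np.array([
    [1.0, 0.0], [0.9397, 0.3420], [0.7660, 0.6428], [-0.9848, 0.1736],
]), threshold=0.9)
```

The returned matrix is exactly the 1/0 similarity information described in S103/S104.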
S103: first similarity information between every two first sample images in the first sample group is determined based on the first category characterization of each first sample image.
S104: and determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image.
In S103 and S104, the cosine similarity may be used to respectively calculate first similarity information between two first sample images and second similarity information between two second sample images.
Specifically, the first similarity information may be represented as 1 or 0, where 1 represents similarity between two sample images, and 0 represents dissimilarity between two sample images.
S105: and taking the first similarity information as supervision information of the second sample group, taking the second similarity information as supervision information of the first sample group, and training the target network to be trained.
In the embodiment of the present disclosure, the first similarity information is used as the supervision information of the second sample group, the second similarity information is used as the supervision information of the first sample group, and the parameter of the target network is adjusted, so that after the parameter is adjusted, the similarity information between the second sample images in the second sample group obtained based on the prediction result of the target network on the second sample group is closer to the first similarity information, and the similarity information between the first sample images in the first sample group obtained based on the prediction result of the target network on the first sample group is closer to the second similarity information.
Alternatively, the first training loss may be determined based on the first class characterization of each first sample image and the second similarity information, and the second training loss may be determined based on the second class characterization of each second sample image and the first similarity information.
Based on the determined first training loss and second training loss, their sum may be used as the training loss for training the target network to be trained, namely: L_swap = L_sup(V1, y2) + L_sup(V2, y1), where L_sup(V1, y2) is the first training loss and L_sup(V2, y1) is the second training loss. Here vi, vj, and vk respectively denote the elements of V1 corresponding to the i-th, j-th, and k-th first sample images of the first sample group, or the elements of V2 corresponding to the i-th, j-th, and k-th second sample images of the second sample group; τ is a temperature parameter, and N is the number of first sample images (equivalently, of second sample images). Training with this loss until training is completed yields the trained target network.
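A hedged numpy sketch of the swapped loss L_swap = L_sup(V1, y2) + L_sup(V2, y1): each group's class characterizations are scored against the 1/0 similarity labels derived from the other group. The supervised-contrastive form of L_sup below is an assumption consistent with the text, not the disclosure's exact formula:

```python
import numpy as np

def l_sup(v: np.ndarray, y: np.ndarray, tau: float = 0.5) -> float:
    """Supervised contrastive loss: pull together samples labeled similar (y=1)."""
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = v @ v.T / tau
    n = len(v)
    off_diag = ~np.eye(n, dtype=bool)
    # log of the denominator: sum over all other samples k != i
    log_den = np.log(np.exp(logits)[off_diag].reshape(n, n - 1).sum(axis=1))
    loss = 0.0
    for i in range(n):
        positives = [j for j in range(n) if j != i and y[i, j] == 1]
        if positives:                              # average over labeled positives
            loss += -np.mean([logits[i, j] - log_den[i] for j in positives])
    return loss / n

v1 = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])    # first class characterizations
v2 = np.array([[1.0, 0.1], [1.0, 0.0], [0.1, 1.0]])    # second class characterizations
y1 = y2 = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])  # labels from each group
l_swap = l_sup(v1, y2) + l_sup(v2, y1)                 # swapped supervision
```

The swap is the key design point: V1 is never scored against its own labels y1, which forces the two augmented views to agree on cluster structure.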
In the embodiment of the present disclosure, the head structure Φ serves as an auxiliary to the single head structure used in the prior art. On one hand, based on the first image features of the first sample images and the second image features of the second sample images, similar-sample clustering can be performed on enhanced images from different original sample images, effectively improving classification accuracy; on the other hand, the similarity information of each group of enhanced images is used as the supervision information of the other group of enhanced images, so self-supervised learning can be performed on the target network, improving its training efficiency. The head structure Φ overcomes the defect of prior-art instance discrimination with a single head structure, in which each sample image is treated as its own category for classification and similar sample images cannot be grouped together.
To fully explain the training process of the target network, the following describes it with reference to the head structure G used for instance discrimination and the auxiliary head structure Φ.
As in fig. 3, after extracting the first image feature h1 of the first sample image and the second image feature h2 of the second sample image based on the target network, the first image feature h1 and the second image feature h2 may be input into the head structure G, the fourth image feature z1 of the first sample image and the fifth image feature z2 of the second sample image are obtained, and then the third training loss may be determined based on the fourth image feature z1 of each first sample image and the fifth image feature z2 of each second sample image.
For each first sample image, loss information may be determined based on the fourth image feature z1 of the first sample image and the fifth image feature z2 of its corresponding second sample image, and the third training loss is determined based on the loss information corresponding to each first sample image (or each second sample image), namely: L_NCE = -(1/N) Σ_i log( exp(sim(z_i, z_j)/τ) / Σ_{k≠i} exp(sim(z_i, z_k)/τ) ), where τ is a temperature parameter, N is the number of first sample images (equivalently, of second sample images), and z_i and z_j are the fourth image feature of the first sample image and the fifth image feature of the second sample image corresponding to the same original sample image.
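The instance-discrimination loss can be sketched in the standard InfoNCE/NT-Xent form consistent with the description, where the positive pair for each sample is its projected counterpart view from the same original image; this is an assumption, not the disclosure's exact implementation:

```python
import numpy as np

def l_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.5) -> float:
    """InfoNCE over 2N projected features; positives are same-origin view pairs."""
    z = np.concatenate([z1, z2])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    logits = z @ z.T / tau
    np.fill_diagonal(logits, -np.inf)            # a sample is not its own negative
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # counterpart view
    log_prob = logits[np.arange(2 * n), pos] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())

z1 = np.random.default_rng(1).random((4, 8))     # fourth image features z1
z2 = z1 + 0.01                                   # fifth image features z2 (near views)
loss = l_nce(z1, z2)
```

Setting the diagonal to negative infinity makes each sample's self-similarity contribute zero to the denominator, so only genuine negatives compete with the positive pair.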
Considering that the diversity of the sample images plays a key role in contrastive learning, and that the effect of contrastive learning improves as the variety of sample images increases, in a possible implementation a plurality of third sample images corresponding to each original sample image may be acquired respectively; each of these third sample images is then used simultaneously as a first sample image and a second sample image to train the target network.
Here, the third sample images may be obtained by directly performing enhancement processing on the original sample images. In a specific implementation, a preset number of original sample images may each be enhanced once to obtain a plurality of third sample images, and different third sample images may be produced by different enhancement modes. The enhancement used to generate a third sample image differs from those used to generate the first and second sample images. For example, a first sample image may be obtained by a first enhancement of the original sample image (e.g., increasing the contrast), a second sample image by a second enhancement (which may be the same kind as the first, here also increasing the contrast), and a third sample image by a third enhancement (e.g., decreasing the contrast).
In the training process, the third sample image can be added into the first sample group to form a first sample image, and the target network is trained; and simultaneously adding the third sample image into the second sample group to form a second sample image, and training the target network.
In a possible implementation manner, for each original sample image, determining a target image feature corresponding to each other original sample image except the original sample image, where the target image feature corresponding to the other original sample image is a first image feature or a second image feature corresponding to the other original sample image; respectively determining the similarity between the first image characteristic or the second image characteristic of the original sample image and the target image characteristic corresponding to each other original sample image; and acquiring a plurality of third sample images corresponding to the original sample image based on the determined similarity.
As already explained above, for each original sample image, enhancement yields a first sample image and a corresponding second sample image. A target image feature can be determined for each original sample image other than the current one: the target image feature is either the first image feature or the second image feature corresponding to that other original sample image. That is, for each other original sample image, only one of its first and second image features is selected as its target image feature. The main consideration is that a first sample image and its corresponding second sample image come from the same original sample image, so the similarity between the first image feature and the second image feature is high, and there is no need to generate third sample images based on both at the same time; therefore either the first image feature or the second image feature is selected as the data set from which the third sample images are chosen.
Specifically, the similarity between the first image feature (or second image feature) of the original sample image and the target image feature corresponding to each other original sample image may be determined; the first or second sample images corresponding to the target image features may then be screened in descending order of similarity, and image enhancement processing performed on the screened first or second sample images to obtain a plurality of corresponding third sample images.
Considering that the target network and the head structures do not yet extract image features accurately when few training iterations have been performed, the acquisition mode of the third sample images may be selected according to the current iteration number and a preset iteration number.
In a possible embodiment, a current first iteration number may first be determined; and then respectively acquiring a plurality of third sample images generated based on each original sample image under the condition that the first iteration times are less than the preset iteration times. That is, when the current iteration number is small, the third sample image can be obtained by generating the third sample image from the original sample image, and in the process, the third sample image can be obtained without using a target network and a target head structure to extract image features, so that the accuracy of the image features is not affected.
In a possible embodiment, the current second iteration number is first determined; then, in the case where the second iteration number is greater than the preset iteration number, the target image feature corresponding to each other original sample image is determined for each original sample image. That is, when the current iteration number is large, the third sample images may be generated by performing image enhancement processing on the first or second sample images corresponding to the screened target image features of the other original sample images. Because the second iteration number is greater than the preset iteration number, the current target network and head structures have a stronger capability of extracting image features, so the extracted image features are more accurate.
In a specific implementation process, when the current first iteration number falls within the first 25% of the preset total iteration number, a plurality of third sample images generated directly from each original sample image may be obtained; when the current second iteration number falls within the last 75% of the preset total iteration number, image enhancement processing may be performed on the first or second sample images corresponding to the screened target image features to generate the third sample images.
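The iteration-based switch can be sketched as a small selector. The 25%/75% split follows the text (assuming the two regimes meet at the 25% boundary, which the text leaves unspecified); the function name and return values are illustrative:

```python
def third_sample_source(iteration: int, total_iterations: int) -> str:
    """Pick how third sample images are acquired at this training iteration."""
    if iteration < 0.25 * total_iterations:      # early: features unreliable,
        return "enhance_original"                # enhance originals directly
    return "enhance_most_similar"                # later: screen by learned features
```

The rationale mirrors the text: early features are too weak to screen similar samples reliably, so feature-based selection is deferred until the network has trained long enough.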
After the plurality of third sample images are obtained, determining a third image feature of each third sample image by using a target network to be trained, and classifying the plurality of third sample images based on the third image features to obtain a third class representation of each third sample image; for each third sample image, respectively determining second loss information between each first sample image and the third sample image based on the third class characterization of the third sample image and the first class characterization of each first sample image; for each third sample image, respectively determining third loss information between each second sample image and the third sample image based on the third class characterization of the third sample image and the second class characterization of each second sample image; determining a fourth training loss based on the determined second loss information and the third loss information; and training the target network to be trained based on the first training loss, the second training loss, the third training loss and the fourth training loss.
The third image feature may be an image feature extracted by the target network. Then, a third image feature may be input into the head structure phi to obtain a third class representation of the third sample image, then, based on the third class representation of the third sample image and the first class representation of each first sample image, fourth similarity information of the third sample image and each first sample image may be calculated, and then, second loss information between each first sample image and the third sample image may be determined according to the fourth similarity information and the first similarity information. And calculating fifth similarity information of the third sample image and each second sample image according to the third class characterization of the third sample image and the second class characterization of each second sample image, and then determining third loss information between each second sample image and the third sample image according to the fifth similarity information and the second similarity information.
Next, a fourth training loss may be determined by a weighted summation of the second loss information between each first sample image and each third sample image and the third loss information between each second sample image and each third sample image; this loss may be denoted L_cswap. The fourth training loss is the training loss between similar samples obtained using the head structure Φ after the third sample images are added.
Further, fourth loss information between each first sample image and each third sample image can be respectively determined according to the third image characteristics of the third sample image and the first image characteristics of each first sample image; for each third sample image, determining fifth loss information between each second sample image and the third sample image based on third image features of the third sample image and second image features of each second sample image respectively; determining a fifth training loss based on the determined fourth loss information and the fifth loss information; and training the target network to be trained based on the first training loss, the second training loss, the third training loss, the fourth training loss and the fifth training loss.
In the above process, the third image feature may be input into the head structure G to obtain a sixth image feature of the third sample image, and then, for each third sample image, fourth loss information between each first sample image and the third sample image is respectively determined based on the sixth image feature of the third sample image and a seventh image feature of the first image feature of each first sample image after being processed by the head structure G; and for each third sample image, determining fifth loss information between each second sample image and the third sample image respectively based on the sixth image feature of the third sample image and the eighth image feature of each second sample image after the head structure G processing.
Wherein, the determining the fourth loss information between each of the first sample images and the third sample image based on the sixth image feature of the third sample image and the seventh image feature of each of the first sample images, respectively, may be calculating similarity information between the third sample image and each of the first sample images based on the sixth image feature of the third sample image and the seventh image feature of each of the first sample images, and then determining the fourth loss information between each of the first sample images and the third sample image based on the similarity information, respectively. The process for the fifth loss information is similar and will not be described here.
Next, the fourth loss information between each first sample image and the third sample image may be weighted and summed with the fifth loss information between each second sample image and the third sample image to determine a fifth training loss, which may be denoted L_cNCE.
Finally, a total loss may be obtained from the first training loss, the second training loss, the third training loss, the fourth training loss, and the fifth training loss: L_overall = L_NCE + λ·L_cNCE + β·L_swap + γ·L_cswap, where L_swap is the sum of the first training loss and the second training loss, L_NCE is the third training loss, L_cswap is the fourth training loss, L_cNCE is the fifth training loss, and λ, β, γ are weight hyperparameters.
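The overall objective reduces to a plain weighted sum, as sketched below; the weight values in the usage line are illustrative, not taken from the disclosure:

```python
def overall_loss(l_nce: float, l_cnce: float, l_swap: float, l_cswap: float,
                 lam: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """L_overall = L_NCE + lam*L_cNCE + beta*L_swap + gamma*L_cswap."""
    return l_nce + lam * l_cnce + beta * l_swap + gamma * l_cswap

total = overall_loss(0.5, 0.3, 0.2, 0.1)   # illustrative loss values
```

The hyperparameters λ, β, γ control how strongly the auxiliary (third-sample and swapped) terms contribute relative to the base instance-discrimination loss.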
The trained target network can cluster similar sample images rather than treating each sample image as its own class, which improves classification accuracy. For example, in the effect graph shown in fig. 5 after the sample images are clustered, the sample images containing airplanes form one cluster, those containing parrots another, those containing cats another, and those containing giraffes another.
The method and the device cluster the sample images and use the first similarity information and the second similarity information as supervision information. On the one hand, classification is performed over groups of similar samples, which can effectively improve classification accuracy compared with treating each image as its own category; on the other hand, the similarity information of each group of enhanced images serves as the supervision information for the other group of enhanced images, enabling self-supervised learning of the target network and improving its training efficiency. Meanwhile, each original sample image is enhanced twice to obtain two corresponding enhanced images (the first sample image and the second sample image); this avoids the situation in which enhancing the same original sample image many more times yields enhanced images whose features largely overlap, which would reduce the diversity of the sample images.
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written implies neither a strict order of execution nor any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide a training apparatus for a neural network corresponding to the training method for a neural network. Since the principle by which the apparatus solves the problem is similar to that of the training method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 6, there is shown a schematic architecture diagram of a training apparatus for a neural network according to an embodiment of the present disclosure, the apparatus includes: a first determination module 601, a classification module 602, a second determination module 603, a third determination module 604, and a training module 605; wherein,
a first determining module 601, configured to determine, by using a target network to be trained, a first image feature of each first sample image in a first sample group, and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
a classification module 602, configured to perform classification processing on the multiple first sample images based on the first image feature to obtain a first class representation of each first sample image, and perform classification processing on the multiple second sample images based on the second image feature to obtain a second class representation of each second sample image;
a second determining module 603, configured to determine, based on the first class characterization of each first sample image, first similarity information between every two first sample images in the first sample group;
a third determining module 604, configured to determine second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image;
a training module 605, configured to use the first similarity information as the supervision information of the second sample group, use the second similarity information as the supervision information of the first sample group, and train the target network to be trained.
The method and the device have the advantage that the sample images are clustered and the first similarity information and the second similarity information are used as supervision information. On the one hand, classification is performed over groups of similar samples, which can effectively improve classification accuracy compared with treating each image as its own category; on the other hand, the similarity information of each group of enhanced images serves as the supervision information for the other group of enhanced images, enabling self-supervised learning of the target network and improving its training efficiency. Meanwhile, each original sample image is enhanced twice to obtain two corresponding enhanced images (the first sample image and the second sample image); this avoids the situation in which enhancing the same original sample image many more times yields enhanced images whose features largely overlap, which would reduce the diversity of the sample images.
In a possible implementation, the training module 605 is specifically configured to: determining a first training loss based on the first class characteristics of each first sample image and the second similarity information;
determining a second training loss based on a second class characteristic of each second sample image and the first similarity information;
training the target network to be trained based on the first training loss and the second training loss.
In a possible embodiment, the apparatus further comprises:
a fourth determining module, configured to determine a third training loss based on the first image feature of each first sample image and the corresponding second image feature of each second sample image;
the training module 605 is specifically configured to: training the target network to be trained based on the first training loss, the second training loss, and the third training loss.
In a possible implementation manner, the fourth determining module is specifically configured to: for each first sample image, determining loss information corresponding to the first sample image based on first image features of the first sample image and second image features of a second sample image corresponding to the first sample image;
and determining a third training loss based on the loss information corresponding to each first sample image.
In a possible implementation, the classification module 602 is specifically configured to:
based on the first class representation of each first sample image, connecting two first sample images with similarity greater than a first preset threshold value to generate at least one first connection graph;
for each first connection diagram, determining similarity information corresponding to the first connection diagram based on first class representations corresponding to all first sample images in the first connection diagram;
determining first similarity information between every two first sample images in the first sample group based on the similarity information corresponding to each first connection graph;
the determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image comprises:
based on the second category characterization of each second sample image, connecting two second sample images with similarity greater than a second preset threshold value to generate at least one second connected graph;
for each second connected graph, determining similarity information corresponding to the second connected graph based on second class representations corresponding to all second sample images in the second connected graph;
and determining second similarity information between every two second sample images in the second sample group based on the similarity information corresponding to each second connected graph.
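The connection-graph construction described by the classification module can be sketched as follows. Cosine similarity between class characterizations and union-find grouping are illustrative assumptions, since the modules above do not fix the similarity measure or the graph algorithm:

```python
import numpy as np

def connected_graphs(characterizations, threshold):
    """Group samples into connection graphs: two samples are linked when the
    cosine similarity of their class characterizations exceeds `threshold`,
    and linkage is transitive (union-find). Returns lists of sample indices,
    one list per connection graph."""
    n = len(characterizations)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    for i in range(n):
        for j in range(i + 1, n):
            if cos(characterizations[i], characterizations[j]) > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two graphs

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Per-graph similarity information can then be derived from the class characterizations of all samples within each returned group.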
In a possible embodiment, the apparatus further comprises:
the acquisition module is used for respectively acquiring a plurality of third sample images corresponding to each original sample image; wherein the third sample image is an enhanced image of the corresponding original sample image;
a training module, configured to train the target network by using each of the plurality of third sample images simultaneously as both a first sample image and a second sample image.
in a possible implementation manner, the obtaining module is specifically configured to:
determining a current first iteration number;
and respectively acquiring a plurality of third sample images generated based on each original sample image under the condition that the first iteration times are less than the preset iteration times.
In a possible implementation manner, the obtaining module is specifically configured to:
determining target image features corresponding to other original sample images except the original sample image for each original sample image, wherein the target image features corresponding to the other original sample images are first image features or second image features corresponding to the other original sample images;
respectively determining first image characteristics or second image characteristics of the original sample image and third similarity information between target image characteristics corresponding to each other original sample image;
and acquiring a plurality of third sample images corresponding to the original sample image based on the determined third similarity information.
In a possible implementation manner, the obtaining module is specifically configured to:
screening the target image characteristics of which the similarity accords with preset conditions;
and generating a plurality of third sample images based on the first sample image or the second sample image corresponding to the target image features obtained by screening.
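The screening step can be sketched as a nearest-neighbour selection. The "keep the k most similar features" condition is an assumption standing in for the unspecified preset condition:

```python
import numpy as np

def screen_target_features(similarities, k=2):
    """Screen the target image features whose similarity meets a preset
    condition — assumed here to be 'the k most similar'. Returns indices
    of the selected target features, most similar first; third sample
    images are then generated from the corresponding first or second
    sample images."""
    order = np.argsort(np.asarray(similarities))[::-1]  # descending similarity
    return order[:k].tolist()
```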
In a possible implementation manner, the obtaining module is specifically configured to:
determining a current second iteration number;
and under the condition that the second iteration times are larger than the preset iteration times, determining the target image characteristics corresponding to each other original sample image except the original sample image aiming at each original sample image.
In a possible embodiment, the apparatus further comprises:
a fifth determining module, configured to determine, by using a target network to be trained, a third image feature of each third sample image;
the training module is specifically configured to perform classification processing on the plurality of third sample images based on the third image features to obtain a third class representation of each third sample image;
for each third sample image, respectively determining second loss information between each first sample image and the third sample image based on the third class characterization of the third sample image and the first class characterization of each first sample image;
for each third sample image, determining third loss information between each second sample image and the third sample image based on the third class characterization of the third sample image and the second class characterization of each second sample image, respectively;
determining a fourth training loss based on the determined second loss information and the third loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, and the fourth training loss.
In a possible embodiment, the training module is specifically configured to determine, for each third sample image, fourth loss information between each first sample image and the third sample image based on third image features of the third sample image and first image features of each first sample image, respectively;
for each third sample image, determining fifth loss information between each second sample image and the third sample image based on third image features of the third sample image and second image features of each second sample image, respectively;
determining a fifth training loss based on the determined fourth loss information and the fifth loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, the fourth training loss, and the fifth training loss.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, the embodiments of the disclosure also provide a computer device. Referring to fig. 7, a schematic structural diagram of a computer device 700 provided in an embodiment of the present disclosure includes a processor 701, a memory 702, and a bus 703. The memory 702 is used for storing execution instructions and includes an internal memory 7021 and an external memory 7022. The internal memory 7021 temporarily stores operation data in the processor 701 and data exchanged with the external memory 7022, such as a hard disk; the processor 701 exchanges data with the external memory 7022 through the internal memory 7021. When the computer device 700 runs, the processor 701 communicates with the memory 702 through the bus 703, causing the processor 701 to execute the following instructions:
determining a first image feature of each first sample image in a first sample group and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
classifying the plurality of first sample images based on the first image characteristics to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image characteristics to obtain a second class representation of each second sample image;
determining first similarity information between every two first sample images in the first sample group based on the first class characterization of each first sample image;
determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image;
and taking the first similarity information as supervision information of the second sample group, taking the second similarity information as supervision information of the first sample group, and training the target network to be trained.
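The cross-view supervision executed by the instructions above can be sketched as below. The shared-prototype classifier, softmax assignment, and cross-entropy form are illustrative assumptions — the embodiment specifies only that each group's similarity information supervises the other group:

```python
import numpy as np

def l2_normalize(x):
    """Row-wise L2 normalisation."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def swapped_prediction_losses(feats1, feats2, prototypes, temperature=0.1):
    """Features of the first and second sample groups (two augmented views
    of the same originals) are classified against shared prototypes; the
    soft class assignments of one view supervise the predictions of the
    other view, and vice versa."""
    def soft_assign(feats):
        logits = l2_normalize(feats) @ l2_normalize(prototypes).T / temperature
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    p1, p2 = soft_assign(feats1), soft_assign(feats2)
    # second-group assignments supervise the first group, and vice versa
    loss1 = -np.mean(np.sum(p2 * np.log(p1 + 1e-12), axis=1))
    loss2 = -np.mean(np.sum(p1 * np.log(p2 + 1e-12), axis=1))
    return loss1, loss2
```

In training, the two losses would be summed and backpropagated through the target network producing `feats1` and `feats2`.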
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the training method for a neural network described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The embodiments of the present disclosure also provide a computer program product, where the computer program product carries a program code, and instructions included in the program code may be used to execute the steps of the neural network training method described in the foregoing method embodiments, which may be referred to specifically for the foregoing method embodiments, and are not described herein again.
The computer program product may be implemented by hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (15)
1. A method of training a neural network, comprising:
determining a first image feature of each first sample image in a first sample group and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
classifying the plurality of first sample images based on the first image characteristics to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image characteristics to obtain a second class representation of each second sample image;
determining first similarity information between every two first sample images in the first sample group based on the first class characterization of each first sample image;
determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image;
and taking the first similarity information as supervision information of the second sample group, taking the second similarity information as supervision information of the first sample group, and training the target network to be trained.
2. The method according to claim 1, wherein the training the target network to be trained by using the first similarity information as the supervision information of the second sample group and the second similarity information as the supervision information of the first sample group comprises:
determining a first training loss based on the first class characteristics of each first sample image and the second similarity information;
determining a second training loss based on a second class characteristic of each second sample image and the first similarity information;
training the target network to be trained based on the first training loss and the second training loss.
3. The method of claim 2, further comprising:
determining a third training loss based on the first image features of each first sample image and the corresponding second image features of each second sample image;
the training the target network to be trained based on the first training loss and the second training loss includes:
training the target network to be trained based on the first training loss, the second training loss, and the third training loss.
4. The method of claim 3, wherein determining a third training loss based on the first image features of each first sample image and the corresponding second image features of each second sample image comprises:
for each first sample image, determining loss information corresponding to the first sample image based on first image features of the first sample image and second image features of a second sample image corresponding to the first sample image;
and determining a third training loss based on the loss information corresponding to each first sample image.
5. The method according to any one of claims 1 to 4, wherein the determining of the first similarity information between each two first sample images in the first sample group based on the first class characterization of each first sample image comprises:
based on the first class representation of each first sample image, connecting two first sample images with similarity greater than a first preset threshold value to generate at least one first connection graph;
for each first connection diagram, determining similarity information corresponding to the first connection diagram based on first class representations corresponding to all first sample images in the first connection diagram;
determining first similarity information between every two first sample images in the first sample group based on the similarity information corresponding to each first connection graph;
the determining second similarity information between every two second sample images in the second sample group based on the second class characterization of each second sample image comprises:
based on the second category characterization of each second sample image, connecting two second sample images with similarity greater than a second preset threshold value to generate at least one second connected graph;
for each second connected graph, determining similarity information corresponding to the second connected graph based on second class representations corresponding to all second sample images in the second connected graph;
and determining second similarity information between every two second sample images in the second sample group based on the similarity information corresponding to each second connected graph.
6. The method of claim 3, further comprising:
respectively obtaining a plurality of third sample images corresponding to each original sample image; wherein the third sample image is an enhanced image of the corresponding original sample image;
and simultaneously using each third sample image in the third sample images as a first sample image and a second sample image to train the target network.
7. The method according to claim 6, wherein the separately acquiring a plurality of third sample images corresponding to each original sample image comprises:
determining a current first iteration number;
and respectively acquiring a plurality of third sample images generated based on each original sample image under the condition that the first iteration times are less than the preset iteration times.
8. The method according to claim 6, wherein the separately acquiring a plurality of third sample images corresponding to each original sample image comprises:
determining target image features corresponding to other original sample images except the original sample image for each original sample image, wherein the target image features corresponding to the other original sample images are first image features or second image features corresponding to the other original sample images;
respectively determining first image characteristics or second image characteristics of the original sample image and third similarity information between target image characteristics corresponding to each other original sample image;
and acquiring a plurality of third sample images corresponding to the original sample image based on the determined third similarity information.
9. The method according to claim 8, wherein the obtaining a plurality of third sample images corresponding to the original sample image based on the determined third similarity information comprises:
screening the target image characteristics of which the similarity accords with preset conditions;
and generating a plurality of third sample images based on the first sample image or the second sample image corresponding to the target image features obtained by screening.
10. The method of claim 8, wherein the determining, for each original sample image, the target image feature corresponding to each other original sample image except the original sample image comprises:
determining a current second iteration number;
and under the condition that the second iteration times are larger than the preset iteration times, determining the target image characteristics corresponding to each other original sample image except the original sample image aiming at each original sample image.
11. The method of any one of claims 6 to 10, further comprising, after acquiring the plurality of third sample images:
determining a third image characteristic of each third sample image by using a target network to be trained;
the training the target network with each of the third sample images as a first sample image and a second sample image at the same time includes:
classifying the plurality of third sample images based on the third image characteristics to obtain a third class representation of each third sample image;
for each third sample image, respectively determining second loss information between each first sample image and the third sample image based on the third class characterization of the third sample image and the first class characterization of each first sample image;
for each third sample image, determining third loss information between each second sample image and the third sample image based on the third class characterization of the third sample image and the second class characterization of each second sample image, respectively;
determining a fourth training loss based on the determined second loss information and the third loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, and the fourth training loss.
12. The method of claim 11, wherein training the target network to be trained based on the first training loss, the second training loss, the third training loss, and the fourth training loss comprises:
for each third sample image, determining fourth loss information between each first sample image and the third sample image based on third image features of the third sample image and first image features of each first sample image, respectively;
for each third sample image, determining fifth loss information between each second sample image and the third sample image based on third image features of the third sample image and second image features of each second sample image, respectively;
determining a fifth training loss based on the determined fourth loss information and the fifth loss information;
training the target network to be trained based on the first training loss, the second training loss, the third training loss, the fourth training loss, and the fifth training loss.
13. An apparatus for training a neural network, comprising:
the first determining module is used for determining a first image feature of each first sample image in a first sample group and a second image feature of a second sample image corresponding to each first sample image in a second sample group corresponding to the first sample group by using a target network to be trained; the first sample image and the corresponding second sample image are different enhanced images of the same original sample image;
the classification module is used for classifying the plurality of first sample images based on the first image characteristics to obtain a first class representation of each first sample image, and classifying the plurality of second sample images based on the second image characteristics to obtain a second class representation of each second sample image;
the second determining module is used for determining first similarity information between every two first sample images in the first sample group based on the first class representation of each first sample image;
a third determining module, configured to determine second similarity information between every two second sample images in the second sample group based on a second class characterization of each second sample image;
and the training module is used for taking the first similarity information as the supervision information of the second sample group and taking the second similarity information as the supervision information of the first sample group to train the target network to be trained.
14. A computer device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when a computer device is run, the machine-readable instructions when executed by the processor performing the steps of the method of training a neural network of any one of claims 1 to 12.
15. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the method of training a neural network according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110696473.XA CN113344189B (en) | 2021-06-23 | 2021-06-23 | Neural network training method and device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110696473.XA CN113344189B (en) | 2021-06-23 | 2021-06-23 | Neural network training method and device, computer equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113344189A true CN113344189A (en) | 2021-09-03 |
CN113344189B CN113344189B (en) | 2022-10-18 |
Family
ID=77477939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110696473.XA Active CN113344189B (en) | 2021-06-23 | 2021-06-23 | Neural network training method and device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113344189B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792821A (en) * | 2021-11-15 | 2021-12-14 | 北京爱笔科技有限公司 | Model training method and device for extracting human skeleton features |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950633A (en) * | 2020-08-12 | 2020-11-17 | 深圳市商汤科技有限公司 | Neural network training method, neural network target detection method, neural network training device, neural network target detection device and storage medium |
US20210089824A1 (en) * | 2019-09-24 | 2021-03-25 | Beijing Sensetime Technology Development Co., Ltd. | Image processing method, image processing device, and storage medium |
Non-Patent Citations (2)
Title |
---|
Shan You et al.: "GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet", IEEE Xplore *
Xu Chuanyun et al.: "Research on few-shot product image classification based on deep metric learning", Journal of Chongqing University of Technology (Natural Science) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113792821A (en) * | 2021-11-15 | 2021-12-14 | 北京爱笔科技有限公司 | Model training method and device for extracting human skeleton features |
CN113792821B (en) * | 2021-11-15 | 2022-02-15 | 北京爱笔科技有限公司 | Model training method and device for extracting human skeleton features |
Also Published As
Publication number | Publication date |
---|---|
CN113344189B (en) | 2022-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hamilton et al. | Unsupervised semantic segmentation by distilling feature correspondences | |
Krizhevsky et al. | ImageNet classification with deep convolutional neural networks | |
Mao et al. | Deep residual pooling network for texture recognition | |
Byeon et al. | Scene labeling with LSTM recurrent neural networks | |
Tao et al. | Principal component 2-D long short-term memory for font recognition on single Chinese characters | |
Van der Maaten et al. | Visualizing data using t-SNE. | |
Donahue et al. | DeCAF: A deep convolutional activation feature for generic visual recognition | |
Huang et al. | Multiple attention Siamese network for high-resolution image change detection | |
Roy et al. | Revisiting deep hyperspectral feature extraction networks via gradient centralized convolution | |
CN108960260B (en) | Classification model generation method, medical image classification method and medical image classification device | |
JP2012038244A (en) | Learning model creation program, image identification information giving program, learning model creation device, image identification information giving device | |
Saha et al. | A Lightning fast approach to classify Bangla Handwritten Characters and Numerals using newly structured Deep Neural Network | |
Hidru et al. | EquiNMF: Graph regularized multiview nonnegative matrix factorization | |
JP6107531B2 (en) | Feature extraction program and information processing apparatus | |
CN113569895A (en) | Image processing model training method, processing method, device, equipment and medium | |
Mathur et al. | FishResNet: Automatic fish classification approach in underwater scenario | |
Liang et al. | CEModule: A computation efficient module for lightweight convolutional neural networks | |
CN114037056A (en) | Method and device for generating neural network, computer equipment and storage medium | |
Lin et al. | Two stream active query suggestion for active learning in connectomics | |
Zhou et al. | Attention transfer network for nature image matting | |
CN113344189B (en) | Neural network training method and device, computer equipment and storage medium | |
Everett et al. | Protocaps: A fast and non-iterative capsule network routing method | |
Dewan et al. | A system for offline character recognition using auto-encoder networks | |
CN109614581B (en) | Non-negative matrix factorization clustering method based on dual local learning | |
Dong et al. | A supervised dictionary learning and discriminative weighting model for action recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||