WO2022027987A1 - Training method for an image recognition model, and image recognition method - Google Patents
Training method for an image recognition model, and image recognition method
- Publication number
- WO2022027987A1 · PCT/CN2021/084760 · CN2021084760W
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- the invention relates to the technical field of artificial intelligence, in particular to a training method of an image recognition model and an image recognition method.
- Fine-grained recognition is also called fine-grained image recognition. Unlike general image analysis tasks, fine-grained image recognition requires more detailed recognition categories and a finer recognition granularity: it must distinguish many sub-categories within one large category, and the objects to be distinguished and identified differ only subtly.
- fine-grained image recognition has a wide range of application scenarios in life and industry.
- As an image recognition technology it is an indispensable and important technology in the field of artificial intelligence.
- the fine-grained image recognition technology can greatly improve the existing recognition technology and help improve the accuracy of related upper-layer technologies.
- the existing fine-grained classification models can be divided into two categories according to the strength of the supervision information they use: "classification models based on strong supervision information" and "classification models based on weak supervision information".
- the classification model based on strong supervision information introduces two kinds of additional manual annotation information in the training process, namely the target annotation frame and the key part annotation points.
- the strongly supervised classification model can detect foreground objects with the help of the target annotation frame and eliminate the noise interference caused by the background, while the key part annotation points can be used to locate the key points with significant differences in the target; at these key points, the local features of the image can be efficiently extracted. Therefore, through the positioning provided by these two kinds of additional information, the strongly supervised classification model can better extract object information at precise locations, eliminate the interference caused by the picture background and by irrelevant information on other objects, and obtain higher accuracy and better results.
- the classification model based on weakly supervised information does not use any additional manual annotation information, and relies only on pictures and their classification labels to complete the training and learning of the entire algorithm.
- This type of algorithm does not require a lot of manual input, and is more convenient and concise in practical application scenarios.
- the accuracy of the classification model algorithm based on weak supervision information is not as good as that of the classification model algorithm based on strong supervision information.
- the classification model algorithm based on weakly supervised information has been introduced into convolutional neural network for training, and its accuracy has been greatly improved, and it has gradually become a trend in fine-grained image recognition research.
- the key point of the fine-grained recognition algorithm is how to dig out the subtle differences in the image, that is, the extraction of local features.
- the task of fine-grained recognition is challenging due to the difficulty in finding discriminative features.
- A weakly supervised fine-grained recognition algorithm cannot accurately locate the target position and key points with the help of manual annotation information, and can only extract local features from the pictures themselves.
- For a picture there are a lot of local features extracted. How to eliminate erroneous interference features among the many local features and learn useful features is a difficult problem.
- existing local feature extraction usually uses an enumeration method, which extracts component areas from the whole image at different steps or scales, and then extracts features from those component areas.
- the embodiments of the present invention provide an image recognition model training method and an image recognition method.
- an embodiment of the present invention provides a training method for an image recognition model, including: after recording a first image matrix of a sample picture, dividing and shuffling the sample picture to obtain a second image matrix of the shuffled sample picture; inputting the first image matrix into a first convolutional neural network, extracting a first picture feature and obtaining a first picture classification result through the first convolutional neural network; inputting the second image matrix into a second convolutional neural network, extracting a second picture feature and obtaining a second picture classification result through the second convolutional neural network; solving a preset distillation loss function according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in their feature calculation processes; and solving a preset classification loss function according to the first picture classification result and the second picture classification result, where the smaller the classification loss function is, the closer the classification results of the two convolutional neural networks are to the true values.
- dividing and shuffling the sample picture specifically includes: first, dividing the image into a plurality of image blocks; then, either performing the shuffling operation on the image blocks in the row direction first and then in the column direction, or performing it in the column direction first and then in the row direction.
- performing the shuffling operation on the image blocks in the row direction includes: for each image block in each row, within a preset first step-size range, exchanging positions in the row direction with the image block at the position given by the value of a first random variable; performing the shuffling operation on the image blocks in the column direction includes: for each image block in each column, within a preset second step-size range, exchanging positions in the column direction with the image block at the position given by the value of a second random variable.
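- As a concrete illustration of the step above (a sketch, not the patent's reference implementation), the row and column shuffles can be written in Python: each block is swapped with a block at most `step` positions away in its row, and the column pass reuses the same routine on the transposed block grid. All function names here are illustrative.

```python
import numpy as np

def shuffle_rows(blocks, step, rng):
    """Swap each block with a block at most `step` positions away in its row.

    blocks: array of shape (M, N, bh, bw) holding an M x N grid of image blocks.
    """
    M, N, _, _ = blocks.shape
    out = blocks.copy()
    for i in range(M):
        for j in range(N):
            # value of the "first random variable": an offset within the step range
            offset = rng.integers(-step, step + 1)
            k = min(max(j + offset, 0), N - 1)  # clamp to a valid column index
            out[i, [j, k]] = out[i, [k, j]]     # exchange positions in the row
    return out

def shuffle_blocks(blocks, step, rng):
    """Row-direction shuffle followed by column-direction shuffle."""
    rows_done = shuffle_rows(blocks, step, rng)
    # column shuffle: transpose the block grid, reuse the row shuffle, transpose back
    cols_done = shuffle_rows(rows_done.transpose(1, 0, 2, 3), step, rng)
    return cols_done.transpose(1, 0, 2, 3)

rng = np.random.default_rng(0)
blocks = np.arange(4 * 4).reshape(4, 4, 1, 1).astype(float)
shuffled = shuffle_blocks(blocks, step=1, rng=rng)
```

Because the shuffle only exchanges blocks, the multiset of blocks is preserved; only their positions change, within the step-size range.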
- solving the preset distillation loss function according to the first picture feature and the second picture feature includes: obtaining a global flow matrix according to the first picture features extracted from two adjacent convolutional layers of the first convolutional neural network; obtaining a local flow matrix according to the second picture features extracted from two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function by calculating the L2-norm distance between the global flow matrix and the local flow matrix.
- F 1 ⁇ R h ⁇ w ⁇ m represents the image feature of the upper c1 layer in the two adjacent layers
- F 2 ⁇ R h ⁇ w ⁇ m represents the image feature of the lower c2 layer in the two adjacent layers
- h, w, m represents the height, width and number of channels of the image feature respectively
- s represents the serial number of the image height feature
- t represents the serial number of the image width feature
- x represents the input image
- W represents the weight parameter of the neural network.
- W global represents the global flow matrix
- W local represents the local flow matrix
- L flow (W global , W local ) represents the distillation loss function obtained from the global flow matrix and the local flow matrix
- ⁇ 1 represents the weight coefficient
- l represents the flow matrix
- the flow matrix includes the global flow matrix and the local flow matrix
- n represents the number of flow matrices for one picture; the global flow matrices and the local flow matrices are equal in number
- x represents the input picture
- N represents the number of pictures
- an embodiment of the present invention provides an image recognition method based on the above-mentioned image recognition model, including: after recording a first image matrix of an input picture, dividing and shuffling the input picture to obtain a second image matrix of the shuffled input picture; inputting the first image matrix into the first convolutional neural network and obtaining a first output vector of its fully connected layer; inputting the second image matrix into the second convolutional neural network and obtaining a second output vector of its fully connected layer; and obtaining the picture recognition result according to the first output vector and the second output vector.
- obtaining the picture recognition result from the first output vector and the second output vector includes: adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result according to the third output vector.
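- The inference rule above can be sketched in a few lines; the two branch outputs are replaced here by precomputed logit vectors (the networks themselves are assumed), and the recognized class is taken as the argmax of their sum.

```python
import numpy as np

def recognize(first_output, second_output):
    """Add the two fully-connected-layer output vectors and pick the top class."""
    third_output = np.asarray(first_output) + np.asarray(second_output)
    return int(np.argmax(third_output))

# Illustrative logits from the two branches (not real network outputs)
global_logits = [0.2, 1.5, 0.1]
local_logits = [0.4, 0.9, 0.3]
pred = recognize(global_logits, local_logits)
```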
- an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the first aspect when the processor executes the computer program or the steps of the method provided in the second aspect.
- an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the method provided in the first aspect or the second aspect.
- in the training method and the image recognition method of the image recognition model provided by the embodiments of the present invention, the image matrix of the original picture and the image matrix of the shuffled picture are input into two convolutional neural network branches during model training, and the features and classification results extracted by the two convolutional neural networks are learned and trained jointly. This facilitates the capture of local features and the extraction of more effective features, requires no manual annotation information, can achieve the same accuracy as strongly supervised fine-grained recognition, reduces the time and space consumption of the algorithm, and improves the robustness of the system.
- FIG. 1 is a flowchart of a training method for an image recognition model provided by an embodiment of the present invention
- FIG. 2 is a flowchart of a training method for an image recognition model provided by another embodiment of the present invention.
- FIG. 3 is a flowchart of an image recognition method provided by an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of an image recognition model training device provided by an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of an image recognition device provided by an embodiment of the present invention.
- FIG. 6 illustrates a schematic diagram of the physical structure of an electronic device.
- FIG. 1 is a flowchart of a training method for an image recognition model provided by an embodiment of the present invention. As shown in Figure 1, the method includes:
- Step 101 After recording the first image matrix of the sample picture, segment and scramble the sample picture, so as to obtain the second image matrix of the scrambled sample picture.
- the picture can be represented by an image matrix, and the elements in the image matrix can be the gray value of each pixel.
- the image recognition model obtained by the image recognition model training method provided in the embodiment of the present invention can realize weakly supervised and fine-grained image recognition.
- Fine local detail feature representation is the key to fine-grained recognition. This is because for fine-grained recognition, local details are more important than global structure, since images from different fine-grained categories usually have the same global structure or shape, but only different local details. Shuffling and reorganizing the image allows the algorithm to discard global structural information and retain local detail information, forcing the model network to focus on distinguishing local regions for recognition. The image shuffling step effectively destroys the global structure. At this time, in order to recognize these randomly shuffled images, the classification network must find identifiable local regions and learn them. Such operations force the neural network to focus on the details in the picture.
- the training method of the image recognition model provided by the embodiment of the present invention combines the original picture and the shuffled picture for training. Therefore, before the sample picture is shuffled, the first image matrix of the sample picture needs to be stored in advance; the first image matrix is the image matrix of the sample picture before shuffling. Then the sample picture is divided and shuffled to obtain a second image matrix of the shuffled sample picture, where the second image matrix is the image matrix of the sample picture after shuffling.
- Step 102: Input the first image matrix into the first convolutional neural network, extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and input the second image matrix into the second convolutional neural network, extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network.
- two convolutional neural networks are used for learning and training.
- the input of the first convolutional neural network is the first image matrix of the original picture, and the input of the second convolutional neural network is the second image matrix of the shuffled picture.
- the feature extraction part is divided into two branches, namely global feature extraction and local feature extraction.
- the basic structure used by these two branches is the same; for example, resnet50 can be used to extract features.
- the difference is that the global features are obtained from the original picture through the first convolutional neural network, also called the convolutional neural network f global , while the local features are obtained from the shuffled picture Φ(I) through the second convolutional neural network, also called the convolutional neural network f local . The extracted global features (first picture features) and local features (second picture features) are each passed through a fully connected layer to obtain the global feature classification result (first picture classification result) and the local feature classification result (second picture classification result).
- Step 103 Solve a preset distillation loss function according to the first picture feature and the second picture feature.
- the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in their feature calculation processes.
- the knowledge distillation step is completed by using the intermediate features of each layer in the two convolutional neural networks.
- KD (Knowledge Distillation)
- the embodiment of the present invention proposes a new knowledge distillation algorithm, which does not directly learn the features of the teacher network but instead learns the process by which the teacher network computes its features. This escapes the constraint imposed by the depth of the neural network model, achieves better generality, and also improves model recognition performance on fine-grained recognition, a difficult task in computer vision.
- a preset distillation loss function is solved according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in their feature calculation processes; and a preset classification loss function is solved according to the first picture classification result and the second picture classification result, where the smaller the classification loss function is, the closer the classification results of the first convolutional neural network and the second convolutional neural network are to the true values.
- the classification loss function can be expressed as the difference between the sum of the output vectors of the first convolutional neural network and the second convolutional neural network and the true value.
- the classification loss function can be defined as the cross-entropy between the classification result obtained from the sum of the output vectors of the two convolutional neural networks and the classification truth value, i.e. L cls = −Σ l · log(y), where y is the classification result obtained from the summed output vectors:
- l represents the classification truth value of the image
- log represents the logarithmic function
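- The classification loss described above can be sketched numerically as follows. The softmax normalization of the summed logits is an assumption (the text names only the logarithm, the truth value l, and the sum of the output vectors); the truth value is taken as a one-hot label, so the sum reduces to the log-probability of the true class.

```python
import numpy as np

def classification_loss(y_global, y_local, label):
    """Cross-entropy of the summed branch outputs against the true class index."""
    z = np.asarray(y_global) + np.asarray(y_local)
    p = np.exp(z - z.max())
    p /= p.sum()                      # softmax over the summed logits (assumed)
    return -float(np.log(p[label]))  # -l * log(p) with one-hot truth value l

loss = classification_loss([2.0, 0.1], [1.0, 0.2], label=0)
```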
- Step 104: Optimize the first convolutional neural network and the second convolutional neural network by continuously minimizing the distillation loss function and the classification loss function; training ends when the distillation loss function is less than a preset first threshold and the classification loss function is less than a preset second threshold, thereby obtaining a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
- the distillation loss function and the classification loss function are continuously reduced, so as to gradually optimize the model.
- the training ends when the distillation loss function is smaller than the preset first threshold and the classification loss function is smaller than the preset second threshold, thereby obtaining a trained image recognition model.
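- The stopping rule above can be sketched as a loop that runs one optimization step at a time and exits once both losses fall below their thresholds. The `model_step` callable and the shrinking stand-in losses are illustrative assumptions; a real implementation would compute the distillation and classification losses from the two networks.

```python
def train(model_step, threshold_distill, threshold_cls, max_iters=1000):
    """Optimize until both losses drop below their preset thresholds.

    model_step is assumed to perform one optimization step and return the
    current (distillation_loss, classification_loss) pair.
    """
    for _ in range(max_iters):
        distill_loss, cls_loss = model_step()
        if distill_loss < threshold_distill and cls_loss < threshold_cls:
            return distill_loss, cls_loss  # both criteria met: training ends
    return distill_loss, cls_loss

# Stand-in for real optimization: losses that shrink each step
state = {"d": 1.0, "c": 1.0}
def fake_step():
    state["d"] *= 0.8
    state["c"] *= 0.9
    return state["d"], state["c"]

final_d, final_c = train(fake_step, threshold_distill=0.05, threshold_cls=0.05)
```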
- the training method provided by the embodiment of the present invention is broadly divided into two parts: a destruction-and-reconstruction part and a knowledge distillation part. The destruction-and-reconstruction part shuffles the picture in an orderly way, destroying the structural information in the picture and ensuring that the algorithm extracts finer local information; the knowledge distillation part distills and condenses the features extracted from the destroyed picture, extracting the most effective features to improve the model recognition rate and further improve the accuracy of the algorithm.
- the knowledge distillation part may include the process of model optimization using the distillation loss function and the classification loss function.
- the image matrix of the original picture and the image matrix of the shuffled picture are respectively input into two convolutional neural network branches, and the features and classification results extracted by the two convolutional neural networks are combined for learning and training. This facilitates the capture of local features and the extraction of more effective features, achieves the same accuracy as strongly supervised fine-grained recognition without any manual annotation information, reduces the time and space consumption of the algorithm on the model, and improves the robustness of the system.
- the dividing and shuffling the sample picture specifically includes: first, dividing the image into a plurality of image blocks; then, first performing a shuffling operation on the image blocks in the row direction , and then perform the shuffling operation of the image blocks in the column direction; or, first perform the shuffling operation of the image blocks in the column direction, and then perform the shuffling operation of the image blocks in the row direction.
- when the sample picture is divided and shuffled, it is first divided and then shuffled. During division, the image is divided into multiple image blocks, such as M×N image blocks. After division, the image blocks are shuffled.
- the image block scramble operation in the row direction may be performed first, and then the image block scramble operation in the column direction may be performed; or the image block scramble operation in the column direction may be performed first, and then the image block scramble operation in the row direction may be performed.
- the embodiment of the present invention improves the flexibility and orderliness of the system by shuffling the image blocks in the row and column directions in succession after dividing the picture.
- performing the shuffling operation on the image blocks in the row direction includes: for each image block in each row, within a preset first step-size range, exchanging positions in the row direction with the image block at the corresponding position according to the value of a first random variable; performing the shuffling operation on the image blocks in the column direction includes: for each image block in each column, within a preset second step-size range, exchanging positions in the column direction with the image block at the corresponding position according to the value of a second random variable.
- the idea of destruction and reorganization proposed by the embodiment of the present invention is how to effectively destroy the picture, so that the structural information of the picture is disrupted and the local information of the picture is highlighted.
- Dividing the sample picture into different image blocks is essentially dividing the first image matrix into different block matrices.
- image shuffling is the core of orderly and controllable shuffling of the picture, that is, the block matrix of the picture is replaced within a controllable range to control the noise introduced by the scrambling operation and at the same time highlight the local features of the image.
- the moving step size of the image block can be limited.
- the moving step size of the image block in the row direction, can be set within the range of the first step length.
- the first moving step size may be represented by a first random variable. When each image block moves, the first random variable may have different values, but all are within the range of the first moving step size.
- the moving step size of the image block in the column direction, can be set within the range of the second step size.
- the second moving step size may be represented by a second random variable, and when each image block moves, the second random variable may have different values, but all are within the range of the second moving step size.
- the picture can be divided into N ⁇ N blocks, that is, there are the same number of blocks in the row direction and the column direction.
- the movement of image blocks in row and column directions can also be set to a uniform step size.
- the image shuffling step can be divided into two sub-operations: division and shuffling. First, the input image is divided into small local blocks, and then a random algorithm is used to shuffle them, giving the shuffled image.
- the specific operations are as follows:
- the image is firstly divided into N ⁇ N sub-regions R i,j , where i and j are the corresponding row block numbers and column block numbers, respectively.
- the scrambled picture ⁇ (I) is obtained, and the value of its sub-region ⁇ (i,j) can be expressed as:
- the image shuffling step effectively destroys the global structure.
- in order to recognize these randomly shuffled images, the classification network must find identifiable local regions and learn them. Such an operation forces the neural network to focus on the details in the picture; the parameter k ensures that the selected local regions jitter only within adjacent regions, thereby controlling the noise introduced by the shuffling operation while highlighting the local features of the picture.
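- One way to realize the jitter controlled by k described above is a sketch under the assumption that, as in destruction-and-construction style shuffling, each block index is perturbed by a uniform offset in [-k, k] and the perturbed indices are sorted; under this scheme every block provably moves fewer than 2k positions, which is what keeps the jitter within adjacent regions.

```python
import numpy as np

def jitter_permutation(n, k, rng):
    """Permutation of range(n) in which every element moves fewer than 2k positions.

    Each index i is given the sort key i + U(-k, k); sorting the keys yields a
    permutation that is random locally but preserves coarse ordering globally.
    """
    keys = np.arange(n) + rng.uniform(-k, k, size=n)
    return np.argsort(keys, kind="stable")

rng = np.random.default_rng(42)
perm = jitter_permutation(8, k=2, rng=rng)
# largest displacement of any block under this permutation
max_move = max(abs(int(p) - i) for i, p in enumerate(perm))
```

Applying `perm` independently to the rows and to the columns of the block grid gives the full two-direction shuffle.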
- the embodiment of the present invention uses random variables with preset thresholds to scramble image blocks in the row and column directions, and on the basis of highlighting local features, it is ensured that the local area is within the adjacent area. Dither to control the noise introduced by the shuffling operation.
- solving the preset distillation loss function according to the first picture feature and the second picture feature includes: obtaining a global flow matrix according to the first picture features extracted from two adjacent convolutional layers of the first convolutional neural network; obtaining a local flow matrix according to the second picture features extracted from two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function using the L2-norm distance between the global flow matrix and the local flow matrix.
- when solving the preset distillation loss function according to the first picture feature and the second picture feature, a global flow matrix is obtained according to the first picture features extracted from two adjacent convolutional layers of the first convolutional neural network; the global flow matrix reflects the change relationship of the features between two adjacent convolutional layers in the first convolutional neural network. A local flow matrix is obtained according to the second picture features; the local flow matrix reflects the change relationship of the features between two adjacent convolutional layers in the second convolutional neural network. The preset distillation loss function is then solved by calculating the L2-norm distance between the global flow matrix and the local flow matrix.
- the L2-norm distance indicates how close the feature changes between two adjacent layers of the two convolutional neural networks are; therefore, the smaller the L2-norm distance and the smaller the value of the distillation loss function, the closer the feature changes of the two networks between adjacent layers.
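- A numpy sketch of the flow-matrix computation and the distance used in the distillation loss, under the assumption that the flow matrix takes the standard flow-of-solution-procedure form (an inner product of adjacent-layer features over spatial positions, normalized by h·w):

```python
import numpy as np

def flow_matrix(f1, f2):
    """Flow matrix G in R^{m x n} between adjacent-layer features.

    f1: features of the upper layer, shape (h, w, m)
    f2: features of the lower layer, shape (h, w, n)
    """
    h, w, _ = f1.shape
    # G[i, j] = sum over spatial positions (s, t) of f1[s,t,i] * f2[s,t,j] / (h*w)
    return np.einsum("sti,stj->ij", f1, f2) / (h * w)

def distillation_loss(global_flows, local_flows, lam=1.0):
    """Mean squared L2 distance between paired global and local flow matrices."""
    dists = [np.sum((g - l) ** 2) for g, l in zip(global_flows, local_flows)]
    return lam * float(np.mean(dists))

rng = np.random.default_rng(1)
f1 = rng.standard_normal((4, 4, 3))
f2 = rng.standard_normal((4, 4, 5))
G = flow_matrix(f1, f2)
loss_same = distillation_loss([G], [G])  # identical flows give zero loss
```

When the global and local flow matrices coincide, the loss is zero, matching the statement that a smaller loss means the two networks are closer in their feature calculation processes.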
- the new knowledge distillation algorithm proposed by the embodiment of the present invention calculates the flow matrices of the two networks to obtain the change relationship of the features between the layers of each network; by drawing the two networks together and fusing them, the student network learns the "solution process" by which the teacher network computes its features, thereby improving the accuracy of fine-grained recognition.
- in the algorithm flow proposed in the embodiment of the present invention, there is no strict role division between the teacher network and the student network; instead, the global feature extraction network (the first convolutional neural network) and the local feature extraction network (the second convolutional neural network) approach and integrate with each other to achieve the effect of knowledge distillation.
- the embodiments of the present invention can continuously fuse the global features and local features extracted from the pictures, mutually fusing, distilling and refining them.
- Such a process can extract features that contribute more to the model's recognition rate, better improve the accuracy of fine-grained recognition, and also eliminate in this way the noise introduced by scrambling the pictures.
- by learning the process through which features change between the two networks, the flow matrix distillation method possesses good model generalization, which can overcome the limitations of knowledge distillation and perform well even for very deep neural networks.
- the embodiment of the present invention uses the flow matrix distillation method to learn the process through which features change between the two networks, so that it possesses better model generalization and can overcome the limitations of knowledge distillation, performing well even for very deep neural networks.
- the expressions of the global flow matrix and the local flow matrix obtained from the picture features of two adjacent layers are:
- F₁ ∈ R^{h×w×m} represents the picture feature of the upper layer c1 of the two adjacent layers
- F₂ ∈ R^{h×w×m} represents the picture feature of the lower layer c2 of the two adjacent layers
- h, w, m represent the height, width and number of channels of the picture feature, respectively
- s represents the serial number of the image height feature
- t represents the serial number of the image width feature
- x represents the input image
- W represents the weight parameter of the neural network.
- the flow matrix G ∈ R^{m×n} is defined as:
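The formula itself is rendered as an image in the source and is not reproduced in this text. Assuming the patent follows the standard "flow of solution procedure" (FSP) formulation, which the surrounding definitions match, the flow matrix would be computed as:

```latex
G_{s,t}(x;W) \;=\; \sum_{i=1}^{h}\sum_{j=1}^{w}\frac{F^{1}_{i,j,s}(x;W)\times F^{2}_{i,j,t}(x;W)}{h\times w}
```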
- the embodiment of the present invention improves the practicability by giving the expression of the flow matrix.
- the expression of the distillation loss function is:
- W_global represents the global flow matrix
- W_local represents the local flow matrix
- L_flow(W_global, W_local) represents the distillation loss function obtained from the global flow matrix and the local flow matrix
- λ1 represents the weight coefficient
- l represents the serial number of a flow matrix
- the flow matrices include the global flow matrices and the local flow matrices
- n represents the number of flow matrices for one picture, where the numbers of global flow matrices and local flow matrices are the same
- x represents the input picture
- N represents the number of pictures
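As above, the loss formula appears only as an image in the source. Under the same FSP assumption, and using only the symbols defined in this passage, the distillation loss would read:

```latex
L_{\mathrm{flow}}(W_{\mathrm{global}},W_{\mathrm{local}})
= \frac{1}{N}\sum_{x}\sum_{l=1}^{n}\lambda_{1}
\left\lVert G^{l}_{\mathrm{global}}(x;W_{\mathrm{global}})-G^{l}_{\mathrm{local}}(x;W_{\mathrm{local}})\right\rVert_{2}^{2}
```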
- the global flow matrix G_global(x; W_global) of the global feature extraction network and the local flow matrix G_local(x; W_local) of the local feature extraction network are calculated respectively, and then the knowledge distillation loss function L_flow(W_global, W_local) is calculated. Since one flow matrix can be calculated from two layers, multiple flow matrices correspond to one picture.
- the above distillation loss function is obtained by combining the L2 norm distances of the flow matrices of each picture. In the embodiment of the present invention, each flow matrix is considered equally important, so the same weight coefficient λ1 can be used in the loss function.
- the embodiment of the present invention obtains the distillation loss function by combining the L2 norm distances of the flow matrices of each picture, which improves the reliability of the distillation loss function.
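The flow-matrix computation and the loss above can be sketched in a few lines of numpy. This is an illustrative reading of the description, not the patent's reference implementation: the function names, the h×w normalization, and the per-pair averaging follow the standard FSP formulation and are assumptions.

```python
import numpy as np

def flow_matrix(F1, F2):
    """FSP-style flow matrix between two adjacent feature maps.

    F1: (h, w, m) picture features of the upper layer c1.
    F2: (h, w, n) picture features of the lower layer c2.
    Returns G in R^{m x n}: G[s, t] = sum_{i,j} F1[i,j,s] * F2[i,j,t] / (h*w).
    """
    h, w, _ = F1.shape
    return np.einsum('ijs,ijt->st', F1, F2) / (h * w)

def distill_loss(global_flows, local_flows, lam=1.0):
    """Squared L2 distance between paired global/local flow matrices,
    weighted by lam and averaged over the number of pairs."""
    n = len(global_flows)
    return lam * sum(np.sum((g - l) ** 2)
                     for g, l in zip(global_flows, local_flows)) / n
```

Minimizing `distill_loss` pulls the two branches toward the same layer-to-layer feature "solution procedure", which is the distillation effect the text describes.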
- FIG. 2 is a flowchart of a training method for an image recognition model provided by another embodiment of the present invention.
- an embodiment of the present invention proposes a training method for an image recognition model based on destruction-reconstruction and knowledge distillation. The method does not require any manual annotation information, can achieve the same accuracy as strongly supervised fine-grained recognition, and can reduce the time and space consumption of the algorithm on the model.
- the method is generally divided into two parts: the destruction and reorganization part and the knowledge distillation part.
- the destruction and reorganization part realizes the orderly disruption of the image, destroys the structural information in the image, and ensures that the algorithm can extract finer local information;
- the knowledge distillation part distills and condenses the features extracted from the destroyed images to extract the most effective features, improving the recognition rate of the model and further improving the accuracy of the algorithm.
- first, the algorithm performs the image destruction step to scramble the images in an orderly manner, that is, the perturbation amplitude is controlled while scrambling, so as to effectively control the noise introduced by the scrambling.
- through this step, the original structural information of the image is destroyed, and the algorithm is forced to focus on the local information points in the image and to extract more effective and accurate local information.
- after the destruction-and-reconstruction part ends, the algorithm enters the knowledge distillation part, which is completed jointly by two branches.
- the scrambled image and the original image obtained earlier pass through convolutional neural networks for local feature and global feature extraction respectively, and then through fully connected layers to obtain the local classification results and the global classification results.
- meanwhile, the local flow matrices and global flow matrices required by the algorithm are calculated from the layer-wise results of the two convolutional neural networks, and the knowledge distillation algorithm is then used to distill and condense the extracted features, further obtaining the features most effective for improving the model recognition rate and helping adjust the parameters of the convolutional neural networks.
- this enables the algorithm to fuse global and local features to classify images at fine granularity, so as to effectively improve the accuracy of fine-grained recognition.
- FIG. 3 is a flowchart of an image recognition method provided by an embodiment of the present invention.
- the method may use the image recognition model trained in any of the above embodiments to perform image recognition.
- the method includes:
- Step 201 After recording the first image matrix of the input picture, segment and scramble the input picture, so as to obtain the second image matrix of the scrambled input picture.
- after recording the first image matrix of the input image, the input image can be segmented and scrambled according to the segmentation and scrambling rules used during model training, so as to obtain the second image matrix of the scrambled input image.
- unlike the sample pictures used during training, the first image matrix in the embodiment of the present invention corresponds to the input picture that actually needs to be recognized
- and the second image matrix corresponds to the scrambled input picture.
- Step 202 Input the first image matrix into the first convolutional neural network, and obtain the first output vector of the fully connected layer through the first convolutional neural network; and, input the second image matrix To the second convolutional neural network, the second output vector of the fully connected layer is obtained through the second convolutional neural network.
- the first image matrix is input into the first convolutional neural network to obtain the first output vector of the fully connected layer, and the size of each element in the first output vector can represent the probability that the picture belongs to the corresponding class.
- the second image matrix is input into the second convolutional neural network to obtain the second output vector of the fully connected layer, and the size of each element in the second output vector can represent the probability that the picture belongs to the corresponding class.
- Step 203 Obtain a picture recognition result according to the first output vector and the second output vector.
- the image recognition result can be obtained by combining the first output vector and the second output vector.
- the first output vector and the second output vector may be weighted and summed, and the category to which the picture belongs is determined according to the sizes of the elements in the resulting vector.
- obtaining the picture recognition result according to the first output vector and the second output vector includes: adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result according to the third output vector.
- specifically, the first output vector and the second output vector can be directly added to obtain a third output vector,
- and the category to which the picture belongs is determined according to the sizes of the elements in the third output vector, so as to obtain the picture recognition result.
- the embodiment of the present invention obtains a third output vector by adding the first output vector and the second output vector, and obtains a picture recognition result according to the third output vector, which improves the simplicity.
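As a minimal sketch of Step 203, the two branch outputs can be added element-wise and the index of the largest element taken as the predicted category (the function name is illustrative, not from the patent):

```python
import numpy as np

def recognize(first_output, second_output):
    """Add the two fully connected layer output vectors (the third output
    vector) and return the index of the largest element as the category."""
    third = np.asarray(first_output) + np.asarray(second_output)
    return int(np.argmax(third))
```

A weighted sum, as mentioned above, would simply scale the two vectors before adding them.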
- FIG. 4 is a schematic structural diagram of an apparatus for training an image recognition model according to an embodiment of the present invention.
- the device includes a picture scrambling module 10, a feature extraction and classification module 20, a loss function calculation module 30 and a model optimization module 40, wherein: the picture scrambling module 10 is used for: after recording the first image matrix of the sample picture, segmenting and scrambling the sample picture, so as to obtain the second image matrix of the scrambled sample picture; the feature extraction and classification module 20 is used for: inputting the first image matrix into the first convolutional neural network, and extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network;
- the loss function calculation module 30 is used for: solving the preset distillation loss function according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the feature calculation process;
- the model optimization module 40 is used for: optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function.
- the image matrix of the original picture and the image matrix of the scrambled picture are respectively input into two convolutional neural network branches, and the features and classification results extracted by the two convolutional neural networks are combined for learning and training. This facilitates capturing local features and extracting more effective features, achieves the same accuracy as strongly supervised fine-grained recognition without any manual annotation information, reduces the time and space consumption of the algorithm on the model, and improves system robustness.
- FIG. 5 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention.
- the device includes an image processing module 100, an output vector acquisition module 200 and an image recognition module 300, wherein: the image processing module 100 is used for: after recording the first image matrix of the input picture, segmenting and scrambling the input picture, so as to obtain the second image matrix of the scrambled input picture; the output vector acquisition module 200 is used for: inputting the first image matrix into the first convolutional neural network, and obtaining the first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining the second output vector of the fully connected layer through the second convolutional neural network;
- the image recognition module 300 is configured to: obtain a picture recognition result according to the first output vector and the second output vector.
- the device provided in the embodiment of the present invention is used for the above method, and the specific function may refer to the above method flow, which will not be repeated here.
- FIG. 6 illustrates a schematic diagram of the physical structure of an electronic device.
- the electronic device may include: a processor (processor) 610, a communication interface (Communications Interface) 620, a memory (memory) 630 and a communication bus 640,
- the processor 610 , the communication interface 620 , and the memory 630 communicate with each other through the communication bus 640 .
- the processor 610 can call the logic instructions in the memory 630 to execute the training method of the image recognition model. The method includes: after recording the first image matrix of the sample picture, segmenting and scrambling the sample picture, so as to obtain
- the second image matrix of the scrambled sample picture; inputting the first image matrix into the first convolutional neural network, and extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network; solving
- a preset distillation loss function according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the feature calculation process; and,
- solving a preset classification loss function according to the first picture classification result and the second picture classification result,
- where the smaller the classification loss function is, the closer the classification results of the first convolutional neural network and the second convolutional neural network are to the true values; the first convolutional neural network and the second convolutional neural network are optimized by continuously optimizing the distillation loss function and the classification loss function, and the training ends when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, thereby obtaining the trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
- the processor 610 can call the logic instructions in the memory 630 to execute the image recognition method. The method includes: after recording the first image matrix of the input picture, segmenting and scrambling the input picture, so as to obtain the second image matrix of the scrambled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining the first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining the second output vector of the fully connected layer through the second convolutional neural network; and obtaining the image recognition result according to the first output vector and the second output vector.
- the above-mentioned logic instructions in the memory 630 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product.
- in essence, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product.
- the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention.
- the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
- an embodiment of the present invention also provides a computer program product
- the computer program product includes a computer program stored on a non-transitory computer-readable storage medium
- the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the training method of the image recognition model provided by the above method embodiments. The method includes: after recording the first image matrix of the sample picture, segmenting and scrambling the sample picture, thereby obtaining the second image matrix of the scrambled sample picture; inputting the first image matrix into the first convolutional neural network, and extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network; and solving a preset distillation loss function according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the feature calculation process
- the computer can also execute the image recognition method provided by the above method embodiments. The method includes: after recording the first image matrix of the input picture, segmenting and scrambling the input picture, so as to obtain the second image matrix of the scrambled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining the first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining the second output vector of the fully connected layer through the second convolutional neural network; and obtaining the picture recognition result according to the first output vector and the second output vector.
- an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the training method of the image recognition model provided by the above embodiments
- the method includes: after recording the first image matrix of the sample picture, segmenting and scrambling the sample picture, so as to obtain the second image matrix of the scrambled sample picture; inputting the first image matrix into the first convolutional neural network, and extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network; and solving the preset distillation loss function according to the first picture feature and the second picture feature, where the smaller the distillation loss function is, the closer the two networks are in the feature calculation process;
- the training ends when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, thereby obtaining a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
- when executed by the processor, the computer program also implements the image recognition method provided by the above embodiments. The method includes: after recording the first image matrix of the input picture, segmenting and scrambling the input picture, thereby obtaining the second image matrix of the scrambled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining the first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining the second output vector of the fully connected layer through the second convolutional neural network; and obtaining the image recognition result according to the first output vector and the second output vector.
- the device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
- each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware.
- in essence, the above-mentioned technical solutions, or the parts that contribute to the prior art, can be embodied in the form of software products. The computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic disks, optical discs, etc., and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or some parts of the embodiments.
Abstract
An embodiment of the present invention provides a training method for an image recognition model and an image recognition method. The training method includes: after recording the first image matrix of a sample picture, segmenting and scrambling the picture to obtain a second image matrix; extracting picture features and obtaining picture classification results through the corresponding convolutional neural networks; solving a distillation loss function from the picture features and a classification loss function from the classification results; optimizing the model by optimizing the distillation loss function and the classification loss function, and ending training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, thereby obtaining a trained image recognition model. The embodiments of the present invention facilitate capturing local features and extracting more effective features, can achieve the same accuracy as strongly supervised fine-grained recognition without any manual annotation information, reduce the time and space consumption of the algorithm on the model, and improve robustness.
Description
The present invention relates to the technical field of artificial intelligence, and in particular to a training method for an image recognition model and an image recognition method.
Fine-grained recognition is also called fine recognition. Unlike existing general image analysis tasks, fine-grained image recognition must identify more detailed categories at a finer granularity, distinguishing more specific sub-classes within one large class and discriminating between objects with only subtle differences.
For example, general image classification only needs to distinguish the two broad object classes "bird" and "flower", whereas fine-grained image classification must distinguish fine-grained sub-classes under the class "flower", i.e., whether it is a "Chinese rose" or a "rose". Fine-grained image recognition therefore requires finding subtle differences between sub-classes of the same category, which greatly increases its difficulty and challenge.
At present, fine-grained image recognition has a wide range of application scenarios in daily life and industry. As an image recognition technology, it is an indispensable and important technology in the field of artificial intelligence. At the same time, because the granularity it distinguishes is finer, fine-grained image recognition can substantially improve existing recognition technology and help raise the accuracy of related upper-layer technologies.
Existing fine-grained classification models can be divided into two categories according to the strength of the supervision information they use: "classification models based on strong supervision information" and "classification models based on weak supervision information".
Among them, classification models based on strong supervision information introduce two kinds of additional manual annotation information during training: object bounding boxes and key-part annotation points. With the object bounding boxes, a strongly supervised classification model can detect the foreground object and exclude the noise interference brought by the background; the key-part annotation points can be used to locate key points where objects differ significantly, from which local features of the picture can be extracted efficiently. Therefore, with the localization provided by these two kinds of additional information, a strongly supervised classification model can better extract object information at precise locations, exclude interference from the picture background and irrelevant information on other objects, obtain relatively high accuracy, and achieve good results.
Classification models based on weak supervision information are the opposite: they do not use any additional manual annotation information and complete the training and learning of the whole algorithm with only the pictures and their classification labels. This type of algorithm requires no large manual investment and is more convenient and concise in practical application scenarios. In general, the accuracy of classification algorithms based on weak supervision information is inferior to that of algorithms based on strong supervision information. However, thanks to the development of deep learning in recent years, weakly supervised classification algorithms have introduced convolutional neural networks for training, their accuracy has improved considerably, and they have gradually become the trend in fine-grained image recognition research.
The key point of a fine-grained recognition algorithm is how to mine the subtle differences in a picture, i.e., the extraction of local features. Because discriminative features are hard to find, fine-grained recognition is a very challenging task. A weakly supervised fine-grained recognition algorithm cannot rely on manual annotation to accurately locate target positions and key parts, and can only extract local features from the picture itself. A single picture yields very many local features, and how to exclude erroneous interfering features among them and learn useful ones is a difficult problem. Existing local feature extraction usually uses enumeration, cropping part regions over the whole image with different strides or scales and then extracting features from the part regions. This method is very time-consuming, and is easily disturbed by background information, extracting a large number of region features useless for recognition. In addition, different lighting conditions and improper shooting angles also interfere with weakly supervised fine-grained recognition. In these cases, the accuracy of weakly supervised fine-grained recognition is low and its robustness poor. It therefore remains a considerable challenge for weakly supervised fine-grained recognition to achieve good robustness and a high recognition rate.
Summary of the Invention
To solve the problems in the prior art, embodiments of the present invention provide a training method for an image recognition model and an image recognition method.
In a first aspect, an embodiment of the present invention provides a training method for an image recognition model, including: after recording the first image matrix of a sample picture, segmenting and scrambling the sample picture to obtain the second image matrix of the scrambled sample picture; inputting the first image matrix into a first convolutional neural network, and extracting the first picture feature and obtaining the first picture classification result through the first convolutional neural network; and inputting the second image matrix into a second convolutional neural network, and extracting the second picture feature and obtaining the second picture classification result through the second convolutional neural network; solving a preset distillation loss function from the first picture feature and the second picture feature, where a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in their feature calculation processes; and solving a preset classification loss function from the first picture classification result and the second picture classification result, where a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the true values; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, ending training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, thereby obtaining a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
Further, segmenting and scrambling the sample picture specifically includes: first, dividing the image into multiple image blocks; then, performing the scrambling operation on the image blocks in the row direction followed by the scrambling operation on the image blocks in the column direction; or performing the scrambling operation in the column direction followed by the scrambling operation in the row direction.
Further, the scrambling operation on the image blocks in the row direction includes: for each image block in each row, within a preset first step-length range, swapping positions in the row direction with the image block at the corresponding position according to the value of a first random variable; the scrambling operation on the image blocks in the column direction includes: for each image block in each column, within a preset second step-length range, swapping positions in the column direction with the image block at the corresponding position according to the value of a second random variable.
Further, solving the preset distillation loss function from the first picture feature and the second picture feature includes: obtaining a global flow matrix from the first picture features extracted by two adjacent convolutional layers of the first convolutional neural network, and obtaining a local flow matrix from the second picture features extracted by two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function by calculating the L2 norm distance between the global flow matrix and the local flow matrix.
进一步地,通过相邻两层的图片特征得到的所述全局流矩阵和所述局部流矩阵的表达式为:
其中,F
1∈R
h×w×m表示相邻两层中上面c1层的图片特征,F
2∈R
h×w×m表示相邻两层中下面c2层的图片特征,h,w,m分别表示图片特征的高度、宽度和通道数,s表示图片高度特征的序号,t表示图片宽度特征的序号,x表示输入的图片,W表示神经网络的权重参数。
Further, the expression of the distillation loss function is as follows, where W_global denotes the global flow matrix, W_local denotes the local flow matrix, L_flow(W_global, W_local) denotes the distillation loss function obtained from the global flow matrix and the local flow matrix, λ1 denotes the weight coefficient, l denotes the serial number of a flow matrix (the flow matrices include the global flow matrices and the local flow matrices), n denotes the number of flow matrices for one picture (the numbers of global flow matrices and local flow matrices are the same), x denotes the input picture, N denotes the number of pictures, G^l_global denotes the l-th global flow matrix of picture x, G^l_local denotes the l-th local flow matrix of picture x, and ‖·‖₂ denotes the L2 norm distance calculation.
In a second aspect, an embodiment of the present invention provides an image recognition method based on the above image recognition model, including: after recording the first image matrix of an input picture, segmenting and scrambling the input picture to obtain the second image matrix of the scrambled input picture; inputting the first image matrix into the first convolutional neural network and obtaining the first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network and obtaining the second output vector of the fully connected layer through the second convolutional neural network; and obtaining the picture recognition result from the first output vector and the second output vector.
Further, obtaining the picture recognition result from the first output vector and the second output vector includes: adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result from the third output vector.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements the steps of the method provided in the first or second aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method provided in the first or second aspect.
In the training method for an image recognition model and the image recognition method provided by the embodiments of the present invention, the image matrix of the original picture and the image matrix of the scrambled picture are input into two convolutional neural network branches respectively during model training, and the features and classification results extracted by the two convolutional neural networks are combined for learning and training. This facilitates capturing local features and extracting more effective features, achieves the same accuracy as strongly supervised fine-grained recognition without any manual annotation information, reduces the time and space consumption of the algorithm on the model, and improves system robustness.
In order to explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a training method for an image recognition model provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a training method for an image recognition model provided by another embodiment of the present invention;
FIG. 3 is a flowchart of an image recognition method provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for training an image recognition model provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention;
FIG. 6 illustrates a schematic diagram of the physical structure of an electronic device.
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a flowchart of a training method for an image recognition model provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 101: After recording the first image matrix of a sample picture, segment and scramble the sample picture to obtain the second image matrix of the scrambled sample picture.
A picture can be represented by an image matrix, whose elements may be the gray values of the individual pixels. The image recognition model obtained by the training method provided by the embodiment of the present invention can realize weakly supervised fine-grained image recognition.
Fine local detail representation is the key to fine-grained recognition. For fine-grained recognition, local details matter more than global structure, because images from different fine-grained categories usually share the same global structure or shape and differ only in local details. Scrambling and recombining the picture lets the algorithm discard global structural information and keep local detail information, forcing the network's attention onto discriminative local regions for recognition. The picture scrambling step effectively destroys the global structure; to recognize these randomly scrambled images, the classification network must find recognizable local regions and learn from them. Such an operation forces the neural network to focus on the details in the picture.
In the training method for an image recognition model provided by the embodiment of the present invention, the original picture and the scrambled picture are combined for training. Therefore, before the sample picture is scrambled, its first image matrix, i.e., the image matrix before scrambling, needs to be stored in advance. The sample picture is then segmented and scrambled to obtain the second image matrix, i.e., the image matrix of the scrambled sample picture.
Step 102: Input the first image matrix into a first convolutional neural network, and extract the first picture feature and obtain the first picture classification result through the first convolutional neural network; and input the second image matrix into a second convolutional neural network, and extract the second picture feature and obtain the second picture classification result through the second convolutional neural network.
The embodiment of the present invention uses convolutional neural networks for learning and training, and includes two convolutional neural networks: the input of the first convolutional neural network is the first image matrix of the original picture, and the input of the second convolutional neural network is the second image matrix of the scrambled picture.
The feature extraction part is therefore divided into two branches: global feature extraction and local feature extraction. The two branches use the same basic structure; for example, both can use resnet50 to extract features. The difference is that the local features are obtained from the scrambled picture φ(I) through one convolutional neural network, which may be called f_local, while the global features are obtained from the original picture through the other convolutional neural network, which may be called f_global. The extracted global features (first picture features) and local features (second picture features) then pass through fully connected layers to obtain the global feature classification result (first picture classification result) and the local feature classification result (second picture classification result), respectively.
Step 103: Solve a preset distillation loss function from the first picture feature and the second picture feature, where a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in their feature calculation processes; and solve a preset classification loss function from the first picture classification result and the second picture classification result, where a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the true values.
For the two feature streams obtained above (the first picture feature and the second picture feature), the intermediate features of each layer of the two convolutional neural networks are used to complete the knowledge distillation step. The concept of knowledge distillation (KD) was first proposed by Hinton and is mostly used in convolutional neural networks. Its idea lies in knowledge transfer: extracting knowledge from a well-trained teacher neural network to train a student network, so that the student improves recognition accuracy while keeping few model parameters. But this method has its limitations and is difficult to optimize for very deep neural networks. It is better to teach someone to fish than to give them a fish: the embodiment of the present invention proposes a new knowledge distillation algorithm that does not directly learn the teacher network's features but instead learns the procedure by which the teacher network computes its features. This escapes the constraint of the depth of the neural network model, achieves good generality, and can effectively improve model recognition and performance even when facing fine-grained recognition, one of the harder tasks in computer vision.
Therefore, in the embodiment of the present invention, a preset distillation loss function is solved from the first picture feature and the second picture feature, where a smaller distillation loss function indicates that the two networks are closer in their feature calculation processes; and a preset classification loss function is solved from the first picture classification result and the second picture classification result, where a smaller classification loss function indicates that the classification results of the two networks are closer to the true values. The classification loss function can be expressed as the difference between the sum of the output vectors of the first and second convolutional neural networks and the true value.
For the input image I and the scrambled picture φ(I), the global feature extraction convolutional neural network f_global and the local feature extraction convolutional neural network f_local produce the corresponding global feature output vector C(I) and local feature output vector C(φ(I)), respectively. The classification loss function can therefore be defined as:
Step 104: Optimize the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function; training ends when the distillation loss function is smaller than the preset first threshold and the classification loss function is smaller than the preset second threshold, thereby obtaining a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
Smaller distillation and classification loss functions indicate a better-optimized model. Through feedback to the neural networks, the distillation loss function and the classification loss function are continuously reduced, and the model is thereby optimized step by step. Training ends when the distillation loss function is smaller than the preset first threshold and the classification loss function is smaller than the preset second threshold, and the trained image recognition model is obtained.
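A schematic sketch of this stopping rule follows, assuming a `model_step` callback that performs one optimization step and returns the current pair of losses. The callback, the threshold names, and the iteration budget are placeholders for illustration, not the patent's API:

```python
def train_until_converged(model_step, t1, t2, max_iters=10000):
    """Optimize until L_flow < t1 and L_cls < t2 (the preset first and
    second thresholds).  `model_step` performs one optimization step and
    returns the current (L_flow, L_cls) pair."""
    for _ in range(max_iters):
        l_flow, l_cls = model_step()
        if l_flow < t1 and l_cls < t2:
            return True   # both losses below their thresholds: training ends
    return False          # budget exhausted before both thresholds were met
```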
The training method provided by the embodiment of the present invention is generally divided into two parts: the destruction-and-reconstruction part and the knowledge distillation part. The destruction-and-reconstruction part scrambles the picture in an orderly manner, destroying the structural information in the picture and ensuring that the algorithm extracts finer local information; the knowledge distillation part distills and condenses the features extracted from the destroyed picture, extracting the features most effective for improving the model recognition rate and further improving the accuracy of the algorithm. The knowledge distillation part may include the process of model optimization using the distillation loss function and the classification loss function.
By inputting the image matrix of the original picture and the image matrix of the scrambled picture into two convolutional neural network branches respectively, and combining the features and classification results extracted by the two networks for learning and training, the embodiment of the present invention facilitates capturing local features and extracting more effective features, achieves the same accuracy as strongly supervised fine-grained recognition without any manual annotation information, reduces the time and space consumption of the algorithm on the model, and improves system robustness.
Further, based on the above embodiment, segmenting and scrambling the sample picture specifically includes: first dividing the image into multiple image blocks; then scrambling the image blocks in the row direction and then in the column direction, or scrambling them in the column direction and then in the row direction.
When segmenting and scrambling the sample picture, segmentation is performed first, followed by scrambling. During segmentation, the image is divided into multiple image blocks, e.g., M×N image blocks. After segmentation, the image blocks are scrambled: first in the row direction and then in the column direction, or first in the column direction and then in the row direction.
On the basis of the above embodiment, the embodiment of the present invention improves the flexibility and orderliness of the system by segmenting the picture and then scrambling the image blocks in the row and column directions in turn.
Further, based on the above embodiment, scrambling the image blocks in the row direction includes: for each image block in each row, within a preset first step-length range, swapping positions in the row direction with the image block at the corresponding position according to the value of a first random variable; scrambling the image blocks in the column direction includes: for each image block in each column, within a preset second step-length range, swapping positions in the column direction with the image block at the corresponding position according to the value of a second random variable.
The idea of destruction and reconstruction proposed in the embodiment of the present invention lies in how to destroy the picture effectively, so that the structural information of the picture is scrambled while its local information is highlighted. Segmenting the sample picture into different image blocks is in essence segmenting the first image matrix into different block matrices. Picture scrambling, as the first step of the algorithm, centers on orderly and controllable scrambling, i.e., permuting the block matrices of the picture within a controllable range, so as to control the noise introduced by the scrambling operation while highlighting the local features of the picture.
Specifically, the movement step length of the image blocks can be limited. For example, in the row direction the movement step length can be set within the first step-length range; it can be represented by a first random variable, which may take a different value for each moved image block but always stays within the first range. In the column direction the movement step length can be set within the second step-length range; it can be represented by a second random variable, which likewise may take a different value for each moved block but stays within the second range. When an image block moves, it swaps positions with the image block at the corresponding position.
Of course, for a square picture, the picture can be divided into N×N blocks, i.e., the same number of blocks in the row and column directions. The movement of image blocks in the row and column directions can also be set to a uniform step length. Taking this as an example, the picture scrambling method is further explained:
The picture scrambling step can be divided into two sub-operations: segmentation and scrambling. The input image is first divided into small local blocks, which are then scrambled with a random algorithm to obtain the scrambled picture. The specific operation is as follows:
For an input image I, the image is first uniformly divided into N×N sub-regions R_{i,j}, where i and j are the corresponding row and column block numbers. The algorithm scrambles the segmented sub-regions through the following mechanism: for the regions of the j-th row, the algorithm first generates a vector q_j of size N whose i-th element is q_{j,i} = i + r, where r is a random variable uniformly distributed in (-k, k), and k is a tunable parameter of the algorithm (1 ≤ k < N) that characterizes the range perturbed by the scrambling mechanism. Through such a scrambling mechanism, a new sequence can be obtained, with each element varying within the range:
Through the above operation, the row scrambling of the picture is completed. After row scrambling, column scrambling is performed by a similar rule, and the following relation can likewise be obtained:
After row scrambling and column scrambling, the input picture becomes the scrambled picture φ(I), and the value of its sub-region σ(i, j) can be expressed as:
The picture scrambling step effectively destroys the global structure. To recognize these randomly scrambled images, the classification network must find recognizable local regions and learn from them. This operation forces the neural network to focus on the details in the picture, and the parameter k ensures that the selected local regions jitter only within neighboring areas, thereby controlling the noise introduced by the scrambling operation and highlighting the local features of the picture.
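The row-and-column scrambling mechanism described above (q_{j,i} = i + r with r uniform in (-k, k), then sorting to obtain a permutation) can be sketched in numpy as follows. The function name and the block-assembly details are assumptions; the sketch handles a single-channel image whose sides are divisible by n:

```python
import numpy as np

def scramble(img, n, k, rng=None):
    """Split a grayscale image (H, W) into an n x n grid of blocks and
    shuffle them, rows first and then columns, with jitter parameter k
    (1 <= k < n).  H and W are assumed divisible by n."""
    rng = np.random.default_rng() if rng is None else rng
    H, W = img.shape[:2]
    bh, bw = H // n, W // n
    blocks = [[img[i*bh:(i+1)*bh, j*bw:(j+1)*bw] for j in range(n)]
              for i in range(n)]

    def perm():
        # q_i = i + r, r ~ U(-k, k); sorting q yields a permutation whose
        # elements stay within a neighborhood controlled by k
        q = np.arange(n) + rng.uniform(-k, k, size=n)
        return np.argsort(q)

    for i in range(n):                      # scramble within each row
        p = perm()
        blocks[i] = [blocks[i][j] for j in p]
    cols = []
    for j in range(n):                      # then scramble within each column
        p = perm()
        cols.append([blocks[i][j] for i in p])
    for j in range(n):
        for i in range(n):
            blocks[i][j] = cols[j][i]
    return np.block(blocks)                 # reassemble phi(I)
```

Because the jitter radius is bounded by k, every block of the output stays near its original grid position, which is how the mechanism controls the noise introduced by scrambling.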
On the basis of the above embodiment, the embodiment of the present invention scrambles the image blocks in the row and column directions using random variables with preset thresholds, which, while highlighting local features, ensures that local regions jitter within neighboring areas, thereby controlling the noise introduced by the scrambling operation.
Further, based on the above embodiment, solving the preset distillation loss function from the first picture feature and the second picture feature includes: obtaining a global flow matrix from the first picture features extracted by two adjacent convolutional layers of the first convolutional neural network, and obtaining a local flow matrix from the second picture features extracted by two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function by calculating the L2 norm distance between the global flow matrix and the local flow matrix.
When solving the preset distillation loss function from the first picture feature and the second picture feature, the global flow matrix is obtained from the first picture features extracted by two adjacent convolutional layers of the first convolutional neural network and reflects how features change between those two adjacent layers; the local flow matrix is obtained from the second picture features extracted by two adjacent convolutional layers of the second convolutional neural network and reflects how features change between those two adjacent layers; and the preset distillation loss function is solved by calculating the L2 norm distance between the global flow matrix and the local flow matrix. The L2 norm distance indicates how closely the feature changes of the two adjacent layers of the two convolutional neural networks match: the smaller the L2 norm distance and the value of the distillation loss function, the closer the feature changes of the adjacent layers of the two networks are.
The new knowledge distillation algorithm proposed in the embodiment of the present invention, also called the flow matrix distillation method, computes the flow matrices of the two networks to obtain how features change between each pair of layers of the two networks; through the mutual approach and fusion of the two flow matrices, the student network learns the "solution procedure" by which the teacher network computes its features, thereby improving the accuracy of fine-grained recognition. In the algorithm flow proposed in the embodiment of the present invention there is no strict division of roles between teacher network and student network; instead, the global feature extraction network (the first convolutional neural network) and the local feature extraction network (the second convolutional neural network) approach and fuse with each other to achieve the effect of knowledge distillation.
Through continuous optimization of the loss functions (including the distillation loss function and the classification loss function), the embodiment of the present invention can continuously fuse the global and local features extracted from the picture, mutually fusing, distilling and refining them. Such a process can extract features that contribute more to the model recognition rate, better improve the accuracy of fine-grained recognition, and also eliminate in this way the noise introduced by scrambling the picture. Meanwhile, by learning the process by which features change between the two networks, the flow matrix distillation method possesses good model generalization and can overcome the limitations of knowledge distillation, performing well even for very deep neural networks.
On the basis of the above embodiment, the embodiment of the present invention adopts the flow matrix distillation method and learns the process by which features change between the two networks, so that it possesses good model generalization and can overcome the limitations of knowledge distillation, performing well even for very deep neural networks.
Further, on the basis of the above embodiments, the global flow matrix and the local flow matrix obtained from the picture features of two adjacent layers are G(x; W_global) and G(x; W_local) respectively, with elements given by the expression:
G_{i,j}(x; W) = Σ_{s=1}^{h} Σ_{t=1}^{w} [ F¹_{s,t,i}(x; W) · F²_{s,t,j}(x; W) ] / (h × w)
where F¹ ∈ R^{h×w×m} denotes the picture features of the upper layer c1 of the two adjacent layers, F² ∈ R^{h×w×n} denotes the picture features of the lower layer c2, h and w denote the height and width of the picture features and m and n their numbers of channels, s is the index over the feature height, t is the index over the feature width, x is the input picture, and W denotes the weight parameters of the neural network.
For a teacher network, the aim is to learn the process by which features change within the network, i.e. the relationship between the features obtained at two adjacent layers. The flow matrix G ∈ R^{m×n} is therefore defined as above.
By computing the flow matrices of the first convolutional neural network and the second convolutional neural network separately, and continuously optimizing the L2-norm distance between them, the effect of knowledge distillation is achieved.
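Under the flow-matrix definition above, computing the two branches' flow matrices and the L2 distance between them can be sketched as follows. This is a hedged illustration: random arrays stand in for real convolutional features, and the function names are ours, not the patent's.

```python
import numpy as np

def flow_matrix(f1, f2):
    """FSP-style flow matrix for adjacent-layer features.
    f1: (h, w, m) upper-layer features, f2: (h, w, n) lower-layer
    features -> G in R^{m x n}, averaged over spatial positions."""
    h, w = f1.shape[:2]
    return np.einsum('stm,stn->mn', f1, f2) / (h * w)

def flow_distill_loss(global_pairs, local_pairs, lam=1.0):
    """Sum of squared L2 distances between corresponding global and
    local flow matrices, each weighted by the same coefficient lam."""
    loss = 0.0
    for (g1, g2), (l1, l2) in zip(global_pairs, local_pairs):
        diff = flow_matrix(g1, g2) - flow_matrix(l1, l2)
        loss += lam * np.sum(diff ** 2)
    return loss
```

In training, this loss would be added to the classification losses of the two branches and minimized jointly, as the text describes.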
On the basis of the above embodiments, the embodiment of the present invention improves practicality by giving the expression of the flow matrix.
Further, on the basis of the above embodiments, the distillation loss function is expressed as:
L_flow(W_global, W_local) = (1/N) · Σ_x Σ_{l=1}^{n} λ₁ · ‖ G_global^{(l)}(x; W_global) − G_local^{(l)}(x; W_local) ‖₂²
where W_global denotes the weight parameters of the global branch, identifying the global flow matrices, and W_local denotes those of the local branch, identifying the local flow matrices; L_flow(W_global, W_local) denotes the distillation loss function obtained from the global flow matrices and the local flow matrices; λ₁ denotes the weight coefficient; l denotes the index of a flow matrix, the flow matrices including the global flow matrices and the local flow matrices; n denotes the number of flow matrices for one picture, the numbers of global and local flow matrices being the same; x denotes an input picture; N denotes the number of pictures; G_global^{(l)}(x; W_global) denotes the l-th global flow matrix of picture x; G_local^{(l)}(x; W_local) denotes the l-th local flow matrix of picture x; and ‖·‖₂² denotes the squared L2-norm distance computation.
First, the global flow matrices G_global(x; W_global) of the global feature extraction network and the local flow matrices G_local(x; W_local) of the local feature extraction network are computed separately, and then the knowledge distillation loss function L_flow(W_global, W_local) is calculated. Since one flow matrix can be computed from each pair of adjacent layers, a picture has multiple flow matrices; the distillation loss function above is obtained by aggregating the L2-norm distances of the flow matrices over all pictures. In the embodiments of the present invention, every flow matrix is considered equally important, so the same weight coefficient λ₁ can be used throughout the loss function.
On the basis of the above embodiments, the embodiment of the present invention obtains the distillation loss function by aggregating the L2-norm distances of the flow matrices of all pictures, improving the reliability of the distillation loss function.
Fig. 2 is a flowchart of a training method for an image recognition model provided by another embodiment of the present invention. As shown in Fig. 2, the embodiment of the present invention proposes a training method for an image recognition model based on destruction-reconstruction and knowledge distillation which, without any manual annotation information, achieves the same accuracy as strongly supervised fine-grained recognition, while reducing the time and space consumption of the algorithm on the model side. The method as a whole consists of two parts: a destruction-reconstruction part and a knowledge distillation part. The destruction-reconstruction part shuffles the picture in an ordered manner, destroying the structural information in the picture and ensuring that the algorithm extracts finer local information; the knowledge distillation part distills and condenses the features extracted from the destroyed picture, extracting the features most effective for improving the model's recognition rate and further improving the accuracy of the algorithm.
First, the algorithm performs the picture destruction step, shuffling the picture in an ordered manner, i.e. controlling the perturbation amplitude while shuffling, so as to effectively control the noise introduced by shuffling. This step destroys the original structural information of the picture and forces the algorithm to focus on local information points in the picture, extracting more effective and more precise local information.
After the destruction-reconstruction part, the algorithm enters the knowledge distillation part, which is carried out jointly by two branches. The shuffled picture and the original picture obtained above are passed through convolutional neural networks to extract local and global features respectively, and then through fully connected layers to obtain local and global classification results. Meanwhile, the local flow matrices and global flow matrices required by the algorithm are computed from the layer-wise results of the two convolutional neural networks. The knowledge distillation algorithm then distills and condenses the extracted features, further obtaining the features most effective for improving the model's recognition rate and aiding the parameter adjustment of the convolutional neural networks, so that the algorithm can fuse global and local features for fine-grained image classification and effectively improve fine-grained recognition accuracy.
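The overall loop, optimizing both branches until the loss falls below a preset threshold, can be illustrated with a deliberately simplified stand-in. Here two linear-softmax "branches" replace the global and local CNNs, only the classification (cross-entropy) loss is kept, and the distillation term is omitted; every name below is our own illustrative choice.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def train_step(Wg, Wl, x_orig, x_shuf, y, lr=0.1):
    """One joint optimization step for two linear 'branches' standing in
    for the global and local CNNs. The classification loss is the
    cross-entropy of each branch's output against the label y."""
    pg, pl = softmax(x_orig @ Wg), softmax(x_shuf @ Wl)
    loss = -np.log(pg[y]) - np.log(pl[y])
    gg, gl = pg.copy(), pl.copy()    # grad of CE w.r.t. logits: p - onehot(y)
    gg[y] -= 1.0
    gl[y] -= 1.0
    Wg -= lr * np.outer(x_orig, gg)  # update both branches in place
    Wl -= lr * np.outer(x_shuf, gl)
    return loss

def train(Wg, Wl, data, threshold, max_iter=1000):
    """Iterate until the summed classification loss drops below the
    preset threshold (the 'second threshold' in the text)."""
    loss = float('inf')
    for _ in range(max_iter):
        loss = sum(train_step(Wg, Wl, xo, xs, y) for xo, xs, y in data)
        if loss < threshold:
            break
    return loss
```

A real implementation would of course use deep CNN branches and include the flow-matrix distillation term, with its own threshold, in the stopping condition.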
Fig. 3 is a flowchart of an image recognition method provided by an embodiment of the present invention. The method may perform image recognition using an image recognition model trained by any of the above embodiments. The method includes:
Step 201: after recording a first image matrix of an input picture, splitting and shuffling the input picture so as to obtain a second image matrix of the shuffled input picture.
After the first image matrix of the input picture is recorded, the input picture can be split and shuffled according to the picture splitting and shuffling rules used during model training, so as to obtain the second image matrix of the shuffled input picture. Unlike the sample pictures used during training, the first image matrix in the embodiment of the present invention corresponds to the input picture that actually needs to be recognized, and the second image matrix corresponds to the shuffled input picture.
Step 202: inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network.
The first image matrix is input into the first convolutional neural network, and a first output vector of the fully connected layer is obtained through the first convolutional neural network; the magnitude of each element in the first output vector may represent the probability that the picture belongs to the corresponding class. The second image matrix is input into the second convolutional neural network, and a second output vector of the fully connected layer is obtained through the second convolutional neural network; likewise, the magnitude of each element in the second output vector may represent the probability that the picture belongs to the corresponding class.
Step 203: obtaining a picture recognition result according to the first output vector and the second output vector.
The picture recognition result can be obtained by combining the first output vector and the second output vector. For example, the first and second output vectors can be summed with weights, and the class of the picture determined from the magnitudes of the elements of the resulting output vector.
By performing image recognition with the image recognition model obtained by the above training method, the embodiment of the present invention achieves weakly supervised fine-grained image recognition, reaching the same accuracy as strongly supervised fine-grained recognition without any manual annotation information.
Further, on the basis of the above embodiments, obtaining the picture recognition result according to the first output vector and the second output vector includes: adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result according to the third output vector.
When obtaining the picture recognition result according to the first output vector and the second output vector, the first output vector and the second output vector can simply be added to obtain a third output vector, and the class of the picture determined from the magnitudes of the elements of the third output vector, thereby obtaining the picture recognition result.
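This fusion step amounts to one vector addition and an argmax. A small sketch (assuming, as the text states, that larger elements mean higher class confidence; `fuse_predict` is an illustrative name):

```python
import numpy as np

def fuse_predict(v_global, v_local, class_names=None):
    """Add the two fully-connected output vectors (the first and second
    output vectors) and pick the index of the largest element of the
    resulting third output vector as the recognition result."""
    v3 = np.asarray(v_global, dtype=float) + np.asarray(v_local, dtype=float)
    idx = int(np.argmax(v3))
    return class_names[idx] if class_names is not None else idx
```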
On the basis of the above embodiments, the embodiment of the present invention adds the first output vector and the second output vector to obtain a third output vector and derives the picture recognition result from the third output vector, which improves simplicity.
Fig. 4 is a schematic structural diagram of an image recognition model training apparatus provided by an embodiment of the present invention. As shown in Fig. 4, the apparatus includes a picture shuffling module 10, a feature extraction and classification module 20, a loss function calculation module 30 and a model optimization module 40, wherein: the picture shuffling module 10 is configured to: after recording a first image matrix of a sample picture, split and shuffle the sample picture so as to obtain a second image matrix of the shuffled sample picture; the feature extraction and classification module 20 is configured to: input the first image matrix into a first convolutional neural network, and extract a first picture feature and obtain a first picture classification result through the first convolutional neural network; and input the second image matrix into a second convolutional neural network, and extract a second picture feature and obtain a second picture classification result through the second convolutional neural network; the loss function calculation module 30 is configured to: solve a preset distillation loss function according to the first picture feature and the second picture feature, wherein a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in their feature computation procedures; and solve a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the ground truth; and the model optimization module 40 is configured to: optimize the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, training ending when the distillation loss function is below a preset first threshold and the classification loss function is below a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
By inputting the image matrix of the original picture and the image matrix of the shuffled picture into two convolutional-neural-network branches and learning and training on the features and classification results extracted by both networks, the embodiment of the present invention facilitates local feature capture and the extraction of more effective features; without any manual annotation information it achieves the same accuracy as strongly supervised fine-grained recognition, reduces the time and space consumption of the algorithm on the model side, and improves the robustness of the system.
Fig. 5 is a schematic structural diagram of an image recognition apparatus provided by an embodiment of the present invention. As shown in Fig. 5, the apparatus includes an image processing module 100, an output vector obtaining module 200 and an image recognition module 300, wherein: the image processing module 100 is configured to: after recording a first image matrix of an input picture, split and shuffle the input picture so as to obtain a second image matrix of the shuffled input picture; the output vector obtaining module 200 is configured to: input the first image matrix into the first convolutional neural network, and obtain a first output vector of the fully connected layer through the first convolutional neural network; and input the second image matrix into the second convolutional neural network, and obtain a second output vector of the fully connected layer through the second convolutional neural network; and the image recognition module 300 is configured to: obtain a picture recognition result according to the first output vector and the second output vector.
By performing image recognition with the image recognition model obtained by the above training method, the embodiment of the present invention achieves weakly supervised fine-grained image recognition, reaching the same accuracy as strongly supervised fine-grained recognition without any manual annotation information.
The device provided by the embodiments of the present invention is used for the above methods; for its specific functions reference may be made to the above method flows, which are not repeated here.
Fig. 6 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communications interface 620 and the memory 630 communicate with one another through the communication bus 640. The processor 610 may call logic instructions in the memory 630 to execute the training method for an image recognition model, the method including: after recording a first image matrix of a sample picture, splitting and shuffling the sample picture so as to obtain a second image matrix of the shuffled sample picture; inputting the first image matrix into a first convolutional neural network, and extracting a first picture feature and obtaining a first picture classification result through the first convolutional neural network; and inputting the second image matrix into a second convolutional neural network, and extracting a second picture feature and obtaining a second picture classification result through the second convolutional neural network; solving a preset distillation loss function according to the first picture feature and the second picture feature, wherein a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in their feature computation procedures; and solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the ground truth; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, training ending when the distillation loss function is below a preset first threshold and the classification loss function is below a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network. Alternatively, the processor 610 may call logic instructions in the memory 630 to execute the image recognition method, the method including: after recording a first image matrix of an input picture, splitting and shuffling the input picture so as to obtain a second image matrix of the shuffled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and obtaining a picture recognition result according to the first output vector and the second output vector.
In addition, when the logic instructions in the above memory 630 are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part thereof that contributes to the prior art, or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, an embodiment of the present invention further provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, enable the computer to execute the training method for an image recognition model provided by the above method embodiments, the method including: after recording a first image matrix of a sample picture, splitting and shuffling the sample picture so as to obtain a second image matrix of the shuffled sample picture; inputting the first image matrix into a first convolutional neural network, and extracting a first picture feature and obtaining a first picture classification result through the first convolutional neural network; and inputting the second image matrix into a second convolutional neural network, and extracting a second picture feature and obtaining a second picture classification result through the second convolutional neural network; solving a preset distillation loss function according to the first picture feature and the second picture feature, wherein a smaller distillation loss function indicates that the two networks are closer in their feature computation procedures; and solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the two networks are closer to the ground truth; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, training ending when the distillation loss function is below a preset first threshold and the classification loss function is below a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network. Alternatively, when the program instructions are executed by a computer, the computer can execute the image recognition method provided by the above method embodiments, the method including: after recording a first image matrix of an input picture, splitting and shuffling the input picture so as to obtain a second image matrix of the shuffled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and obtaining a picture recognition result according to the first output vector and the second output vector.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, being implemented to execute the training method for an image recognition model provided by the above embodiments, the method including: after recording a first image matrix of a sample picture, splitting and shuffling the sample picture so as to obtain a second image matrix of the shuffled sample picture; inputting the first image matrix into a first convolutional neural network, and extracting a first picture feature and obtaining a first picture classification result through the first convolutional neural network; and inputting the second image matrix into a second convolutional neural network, and extracting a second picture feature and obtaining a second picture classification result through the second convolutional neural network; solving a preset distillation loss function according to the first picture feature and the second picture feature, wherein a smaller distillation loss function indicates that the two networks are closer in their feature computation procedures; and solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the two networks are closer to the ground truth; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, training ending when the distillation loss function is below a preset first threshold and the classification loss function is below a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network. Alternatively, the computer program, when executed by a processor, is implemented to execute the image recognition method provided by the above embodiments, the method including: after recording a first image matrix of an input picture, splitting and shuffling the input picture so as to obtain a second image matrix of the shuffled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and obtaining a picture recognition result according to the first output vector and the second output vector.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above implementations, those skilled in the art can clearly understand that the implementations can be realized by means of software plus a necessary general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution in essence, or the part thereof that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
- A training method for an image recognition model, comprising: after recording a first image matrix of a sample picture, splitting and shuffling the sample picture so as to obtain a second image matrix of the shuffled sample picture; inputting the first image matrix into a first convolutional neural network, and extracting a first picture feature and obtaining a first picture classification result through the first convolutional neural network; and inputting the second image matrix into a second convolutional neural network, and extracting a second picture feature and obtaining a second picture classification result through the second convolutional neural network; solving a preset distillation loss function according to the first picture feature and the second picture feature, wherein a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in their feature computation procedures; and solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the ground truth; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, training ending when the distillation loss function is below a preset first threshold and the classification loss function is below a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
- The training method for an image recognition model according to claim 1, wherein splitting and shuffling the sample picture specifically comprises: first, dividing the image into a plurality of image blocks; and then performing the shuffling operation on the image blocks in the row direction before performing the shuffling operation on the image blocks in the column direction, or performing the shuffling operation on the image blocks in the column direction before performing the shuffling operation on the image blocks in the row direction.
- The training method for an image recognition model according to claim 2, wherein performing the shuffling operation on the image blocks in the row direction comprises: for each image block in each row, within a preset first step-length range, swapping positions in the row direction with the image block at the corresponding position according to the value of a first random variable; and performing the shuffling operation on the image blocks in the column direction comprises: for each image block in each column, within a preset second step-length range, swapping positions in the column direction with the image block at the corresponding position according to the value of a second random variable.
- The training method for an image recognition model according to claim 1, wherein solving the preset distillation loss function according to the first picture feature and the second picture feature comprises: obtaining a global flow matrix from the first picture features extracted by two adjacent convolutional layers of the first convolutional neural network, and obtaining a local flow matrix from the second picture features extracted by two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function by computing the L2-norm distance between the global flow matrix and the local flow matrix.
- An image recognition method based on the image recognition model according to any one of claims 1 to 6, comprising: after recording a first image matrix of an input picture, splitting and shuffling the input picture so as to obtain a second image matrix of the shuffled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of the fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and obtaining a picture recognition result according to the first output vector and the second output vector.
- The image recognition method according to claim 7, wherein obtaining the picture recognition result according to the first output vector and the second output vector comprises: adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result according to the third output vector.
- An electronic device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the training method for an image recognition model according to any one of claims 1 to 6 or the steps of the image recognition method according to claim 7 or 8.
- A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the training method for an image recognition model according to any one of claims 1 to 6 or the steps of the image recognition method according to claim 7 or 8.
Applications Claiming Priority (2)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010772704.6 | 2020-08-04 | |
CN202010772704.6A (CN112016591A) | 2020-08-04 | 2020-08-04 | Training method for an image recognition model, and image recognition method
Publications (1)

Publication Number | Publication Date
---|---
WO2022027987A1 | 2022-02-10

Family

ID=73498469

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2021/084760 (WO2022027987A1) | Training method for an image recognition model, and image recognition method | 2020-08-04 | 2021-03-31

Country Status (2)

Country | Link
---|---
CN (1) | CN112016591A
WO (1) | WO2022027987A1
Legal Events

Code | Title | Description
---|---|---
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21853440; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
32PN | EP: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.06.2023)
122 | EP: PCT application non-entry in European phase | Ref document number: 21853440; Country of ref document: EP; Kind code of ref document: A1