CN112016591A - Training method of image recognition model and image recognition method - Google Patents


Info

Publication number
CN112016591A
CN112016591A
Authority
CN
China
Prior art keywords
image
picture
neural network
convolutional neural
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010772704.6A
Other languages
Chinese (zh)
Inventor
陈嘉敏
王金桥
唐明
胡建国
招继恩
朱贵波
赵朝阳
林格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nexwise Intelligence China Ltd
Original Assignee
Nexwise Intelligence China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nexwise Intelligence China Ltd filed Critical Nexwise Intelligence China Ltd
Priority to CN202010772704.6A priority Critical patent/CN112016591A/en
Publication of CN112016591A publication Critical patent/CN112016591A/en
Priority to PCT/CN2021/084760 priority patent/WO2022027987A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a training method for an image recognition model and an image recognition method. The training method comprises the following steps: after a first image matrix of a sample picture is recorded, the sample picture is segmented and scrambled to obtain a second image matrix; picture features are extracted and picture classification results are obtained through the corresponding convolutional neural networks; a distillation loss function is solved from the picture features, and a classification loss function is solved from the picture classification results; the model is then optimized by optimizing the distillation loss function and the classification loss function, and training is finished when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain the trained image recognition model. The embodiment of the invention facilitates the capture of local features and the extraction of more effective features, can reach the same accuracy as strongly supervised fine-grained recognition without any manual labeling information, reduces the time and space consumption of the algorithm on the model, and improves robustness.

Description

Training method of image recognition model and image recognition method
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a training method of an image recognition model and an image recognition method.
Background
Fine-grained recognition is also called fine recognition. Unlike general image analysis tasks, fine-grained image recognition works at a finer granularity: more subdivided subclasses must be distinguished within a single broad class, so that objects with only subtle differences can be told apart.
For example, general image classification only needs to distinguish broad object classes such as "bird" and "flower", while fine-grained image classification needs to distinguish the fine-grained subclasses under the "flower" category, e.g. to tell a Chinese rose from a rose. Fine-grained image recognition therefore requires finding subtle differences between different subclasses of the same broad class, which greatly increases its difficulty and challenge.
At present, fine-grained image recognition has wide application scenarios in daily life and industry, and as an image recognition technology it is an indispensable part of the artificial intelligence field. Because the granularity it distinguishes is finer, fine-grained image recognition can greatly improve existing recognition technology and help raise the precision of related higher-level technologies.
Existing fine-grained classification models can be divided into two categories according to the strength of the supervision information they use: classification models based on strong supervision information and classification models based on weak supervision information.
Classification models based on strong supervision information introduce two kinds of additional manual labeling information during training: target bounding boxes and key-part annotation points. With the target bounding boxes, a strongly supervised classification model can detect the foreground object and eliminate noise interference from the background; with the key-part annotation points, it can locate key points where targets differ markedly and efficiently extract local picture features at those points. Through the localization provided by these two kinds of additional information, a strongly supervised classification model can extract object information at precise locations, eliminate interference from irrelevant information in the picture background and from other objects, and thus achieve higher accuracy and better results.
In contrast, classification models based on weak supervision information use no additional manual labeling information and rely only on pictures and their class labels to complete the training and learning of the whole algorithm. Algorithms of this type require no large manual investment and are more convenient and simpler in practical application scenarios. In general, the accuracy of weakly supervised classification algorithms is inferior to that of strongly supervised ones. However, thanks to the development of deep learning in recent years, weakly supervised classification algorithms have introduced convolutional neural networks for training, greatly improving their accuracy, and they have gradually become the trend in fine-grained image recognition research.
The key point of a fine-grained recognition algorithm is how to dig out the nuances in a picture, i.e., the extraction of local features. Because discriminative features are hard to find, fine-grained recognition is a challenging task. A weakly supervised fine-grained recognition algorithm cannot use manual labeling information to accurately locate the target and its key points, and can only extract local features from the pictures themselves. A single picture yields many local features, and how to eliminate wrong, interfering features among them and learn the useful ones is a difficult problem. Existing local feature extraction generally uses an enumeration method, cropping component regions from the full image with different step sizes or scales and then extracting features for each region. However, this method is time-consuming and susceptible to interference from background information, and thus extracts a large number of region features that are useless for recognition. In addition, different illumination conditions and improper shooting angles also interfere with weakly supervised fine-grained recognition, lowering its accuracy and robustness. Achieving better robustness and a higher recognition rate for weakly supervised fine-grained recognition therefore remains challenging.
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a training method for an image recognition model and an image recognition method.
In a first aspect, an embodiment of the present invention provides a training method for an image recognition model, including: after a first image matrix of a sample picture is recorded, segmenting and scrambling the sample picture to obtain a second image matrix of the scrambled sample picture; inputting the first image matrix into a first convolutional neural network, extracting first picture features through the first convolutional neural network and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second picture features through the second convolutional neural network and obtaining a second picture classification result; solving a preset distillation loss function according to the first picture features and the second picture features, wherein the smaller the distillation loss function, the closer the first convolutional neural network and the second convolutional neural network are in their feature calculation processes; solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein the smaller the classification loss function, the closer the first convolutional neural network and the second convolutional neural network are to the true value in their classification results; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, and thereby obtaining a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
Further, the segmenting and scrambling of the sample picture specifically includes: first dividing the image into a plurality of image blocks; then scrambling the image blocks in the row direction followed by scrambling the image blocks in the column direction, or scrambling the image blocks in the column direction followed by scrambling the image blocks in the row direction.
Further, scrambling the image blocks in the row direction includes: for each image block in each row, exchanging positions with the image block at the corresponding position in the row direction, within a preset first step range, according to the value of a first random variable. Scrambling the image blocks in the column direction includes: for each image block in each column, exchanging positions with the image block at the corresponding position in the column direction, within a preset second step range, according to the value of a second random variable.
Further, solving the preset distillation loss function according to the first picture features and the second picture features comprises: obtaining a global flow matrix from the first picture features extracted from two adjacent convolutional layers in the first convolutional neural network, and obtaining a local flow matrix from the second picture features extracted from two adjacent convolutional layers in the second convolutional neural network; and solving the preset distillation loss function by calculating the L2 norm distance between the global flow matrix and the local flow matrix.
Further, the expressions of the global stream matrix and the local stream matrix obtained through the picture features of the two adjacent layers are as follows:
G_{i,j}(x; W) = Σ_{s=1}^{h} Σ_{t=1}^{w} [ F1_{s,t,i}(x; W) × F2_{s,t,j}(x; W) ] / (h × w)

wherein F1 ∈ R^{h×w×m} represents the picture features of the upper layer c1 of the two adjacent layers, F2 ∈ R^{h×w×m} represents the picture features of the lower layer c2 of the two adjacent layers; h, w and m respectively represent the height, width and channel number of the picture features; s indexes the height dimension of the picture features and t indexes the width dimension; x represents the input picture; and W represents the weight parameters of the neural network.
Further, the distillation loss function is expressed by:
L_flow(W_global, W_local) = λ1 · (1/N) · Σ_x Σ_{l=1}^{n} || G^l_global(x) − G^l_local(x) ||₂²

wherein W_global represents the global flow matrix, W_local represents the local flow matrix, and L_flow(W_global, W_local) represents the distillation loss function derived from the global flow matrix and the local flow matrix; λ1 represents a weight coefficient; l represents the index of a flow matrix, the flow matrices comprising the global flow matrices and the local flow matrices; n represents the number of flow matrices for one picture, the number of global flow matrices being the same as the number of local flow matrices; x represents an input picture; N represents the number of pictures; G^l_global(x) represents the l-th global flow matrix of picture x; G^l_local(x) represents the l-th local flow matrix of picture x; and || · ||₂² represents the squared L2 norm distance.
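The flow-matrix and distillation-loss formulas above can be sketched in NumPy. This is a minimal illustration under the reading that the flow matrix is the spatial inner product of two adjacent layers' feature maps; the function names `flow_matrix` and `distillation_loss` are illustrative, not from the patent:

```python
import numpy as np

def flow_matrix(f1, f2):
    """Flow matrix between two adjacent layers' picture features.

    f1, f2: arrays of shape (h, w, m) for the upper and lower layer.
    Returns an (m, m) matrix: the sum over all h*w spatial positions of
    f1[s,t,i] * f2[s,t,j], divided by h*w, as in the patent's formula.
    """
    h, w, _ = f1.shape
    return np.einsum('sti,stj->ij', f1, f2) / (h * w)

def distillation_loss(global_flows, local_flows, lam=1.0):
    """Weighted squared-L2 distance between paired global and local flow
    matrices for one picture (N = 1 in the patent's formula)."""
    return lam * sum(np.sum((g - l) ** 2)
                     for g, l in zip(global_flows, local_flows))
```

When the two branches compute features in the same way, paired flow matrices coincide and the distillation loss is zero, which is the optimization target of this step.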
In a second aspect, an embodiment of the present invention provides an image recognition method based on the image recognition model, including: after a first image matrix of an input picture is recorded, the input picture is segmented and disordered, and a second image matrix of the disordered input picture is obtained; inputting the first image matrix into the first convolution neural network, and acquiring a first output vector of a full connection layer through the first convolution neural network; inputting the second image matrix into the second convolutional neural network, and acquiring a second output vector of the full-connection layer through the second convolutional neural network; and obtaining a picture identification result according to the first output vector and the second output vector.
Further, obtaining a picture identification result by the first output vector and the second output vector comprises: and adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture identification result according to the third output vector.
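The inference rule of the second aspect — add the two fully connected layers' output vectors and read off the class — can be sketched as follows (assuming the vectors are per-class scores; the function name is illustrative):

```python
import numpy as np

def recognize(output_global, output_local):
    """Add the first and second output vectors to obtain the third
    output vector, then return the index of its largest entry as the
    picture identification result."""
    combined = np.asarray(output_global) + np.asarray(output_local)
    return int(np.argmax(combined))
```

For example, with branch outputs [0.1, 2.0, 0.3] and [0.2, 0.1, 0.0], the combined vector is [0.3, 2.1, 0.3] and class 1 is predicted.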
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method as provided in the first aspect or the second aspect when executing the computer program.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first or second aspect.
According to the training method of the image recognition model and the image recognition method provided by the embodiments of the invention, during model training the image matrix of the original picture and the image matrix of the scrambled picture are input into two convolutional neural network branches respectively, and the features and classification results extracted by the two convolutional neural networks are combined for learning and training. This facilitates the capture and extraction of local features and yields more effective features; the same accuracy as strongly supervised fine-grained recognition can be achieved without any manual labeling information; the time and space consumption of the algorithm on the model can be reduced; and system robustness is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a training method for an image recognition model according to an embodiment of the present invention;
FIG. 2 is a flowchart of a training method for an image recognition model according to another embodiment of the present invention;
FIG. 3 is a flowchart of an image recognition method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an image recognition model training apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention;
fig. 6 illustrates a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a training method of an image recognition model according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 101, after recording a first image matrix of a sample picture, segmenting and scrambling the sample picture, thereby obtaining a second image matrix of the sample picture after scrambling.
A picture can be characterized by an image matrix, whose elements may be the gray values of the pixels. The image recognition model obtained by the training method provided by the embodiment of the invention can realize weakly supervised fine-grained image recognition.
Fine local detail feature representation is the key to fine-grained recognition. Local details matter more than the global structure here, because images from different fine-grained classes usually share the same global structure or shape and differ only in local details. Scrambling and recombining the pictures makes the algorithm discard global structure information while retaining local detail information, forcing the attention of the model network onto the distinctive local regions used for recognition. The picture scrambling step effectively destroys the global structure; to recognize these randomly scrambled images, the classification network must then find and learn the recognizable local regions. Such operations force the neural network to focus on the details in the picture.
The training method of the image recognition model provided by the embodiment of the invention combines the original picture and the scrambled picture for training. Therefore, before the sample picture is scrambled, its first image matrix needs to be stored in advance; the first image matrix is the image matrix of the sample picture before scrambling. The sample picture is then segmented and scrambled to obtain the second image matrix, i.e., the image matrix of the scrambled sample picture.
Step 102, inputting the first image matrix into a first convolutional neural network, extracting a first picture characteristic through the first convolutional neural network and obtaining a first picture classification result; and inputting the second image matrix into a second convolutional neural network, extracting second image characteristics through the second convolutional neural network and obtaining a second image classification result.
The embodiment of the invention adopts a convolutional neural network for learning and training, and comprises two convolutional neural networks, wherein the input of the first convolutional neural network is a first image matrix of an original picture, and the input of the second convolutional neural network is a second image matrix of a disordered picture.
Thus, the feature extraction part is divided into two branches: global feature extraction and local feature extraction. The two branches use the same infrastructure; for example, both can use ResNet-50 to extract features. The difference is that the global features are obtained by passing the original picture through the first convolutional neural network, which may also be called the convolutional neural network f_global, while the local features are obtained by passing the scrambled picture φ(I) through the second convolutional neural network, also called the convolutional neural network f_local. The extracted global features (first picture features) and local features (second picture features) then pass through a fully connected layer to obtain a global feature classification result (first picture classification result) and a local feature classification result (second picture classification result), respectively.
Step 103, solving a preset distillation loss function according to the first picture characteristic and the second picture characteristic, wherein the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the characteristic calculation process is; and solving a preset classification loss function according to the first image classification result and the second image classification result, wherein the smaller the classification loss function is, the closer the first convolutional neural network and the second convolutional neural network are to a true value on the classification result is.
For the two feature streams obtained above (the first and second picture features), the knowledge distillation step is completed using the intermediate features of the layers in the two convolutional neural networks. Knowledge Distillation (KD) was first proposed by Hinton and is mostly used in convolutional neural networks; its core idea is knowledge transfer, i.e., extracting knowledge from a well-trained teacher network to train a student network, so that the student improves its recognition accuracy while keeping its model parameters small. However, this method has limitations: it is difficult to optimize a very deep neural network with it. In the spirit of teaching one to fish rather than giving one a fish, the embodiment of the invention provides a new knowledge distillation algorithm that does not directly learn the features of the teacher network but instead learns the teacher network's feature calculation process. This skips the depth constraint on the neural network model, achieves relatively good generality, and can effectively improve the recognition rate and performance of the model on fine-grained recognition tasks that are difficult for computer vision.
Therefore, in the embodiment of the present invention, a preset distillation loss function is solved according to the first picture feature and the second picture feature, and a smaller distillation loss function indicates that the first convolutional neural network and the second convolutional neural network are closer in feature calculation flow; and solving a preset classification loss function according to the first image classification result and the second image classification result, wherein the smaller the classification loss function is, the closer the first convolutional neural network and the second convolutional neural network are to a true value on the classification result is. Wherein the classification loss function may be expressed as a difference between a sum of output vectors of the first convolutional neural network and the second convolutional neural network and a true value.
For the input picture I and the scrambled picture φ(I), the global feature extraction convolutional neural network f_global and the local feature extraction convolutional neural network f_local yield the corresponding global feature output vector C(I) and local feature output vector C(φ(I)), respectively. Thus, the classification loss function can be defined as:

L_cls = − Σ_{I ∈ 𝔻} l · log( C(I) + C(φ(I)) )

where l represents the true classification label of the image, log represents the logarithmic function, and 𝔻 represents the collection of pictures.
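Under the reading that C(·) is a vector of per-class probabilities and l a one-hot true label, the classification loss for a single picture can be sketched in NumPy as follows (an illustration under that assumption, not the patent's exact implementation; `eps` is added only for numerical safety):

```python
import numpy as np

def classification_loss(probs_global, probs_local, one_hot_label, eps=1e-12):
    """Cross-entropy of the summed branch outputs against the true label,
    i.e. -sum( l * log( C(I) + C(phi(I)) ) ) for one picture I."""
    combined = np.asarray(probs_global) + np.asarray(probs_local)
    return float(-np.sum(np.asarray(one_hot_label) * np.log(combined + eps)))
```

When both branches place all their mass on the true class (the combined probability there reaches 1), the loss is zero; the further the combined output drifts from the true label, the larger the loss.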
And 104, optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, and finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed by the first convolutional neural network and the second convolutional neural network.
The smaller the distillation loss function and the classification loss function are, the better optimized the model is. Both loss functions are continuously reduced through feedback to the neural networks, so that the model is gradually optimized. Training is finished when the distillation loss function is smaller than the preset first threshold and the classification loss function is smaller than the preset second threshold, yielding the trained image recognition model.
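The stopping rule of this step — keep optimizing until each loss drops below its threshold — can be sketched structurally. The optimization step itself is abstracted as a caller-supplied `step_fn` (a stand-in, not the patent's networks) that performs one update and reports the pair of losses:

```python
def train(step_fn, thresh_distill, thresh_cls, max_iters=10000):
    """Run optimization steps until the distillation loss is below the
    first threshold AND the classification loss is below the second.

    step_fn() performs one optimization step and returns the pair
    (distillation_loss, classification_loss).
    """
    for it in range(max_iters):
        d_loss, c_loss = step_fn()
        if d_loss < thresh_distill and c_loss < thresh_cls:
            return it + 1, d_loss, c_loss  # training finished
    raise RuntimeError("thresholds not reached within max_iters")
```

With decreasing losses, the loop terminates at the first iteration where both conditions hold simultaneously, matching the dual-threshold criterion described above.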
The training method provided by the embodiment of the invention is generally divided into two parts: a destruction-and-recombination part and a knowledge distillation part. The destruction-and-recombination part scrambles the pictures in an orderly manner, destroying the structural information in the pictures so that the algorithm extracts finer local information. The knowledge distillation part distills and concentrates the features extracted from the destroyed picture, extracting the features most effective for improving the model recognition rate and further improving the accuracy of the algorithm. The knowledge distillation part may include the process of model optimization using the distillation loss function and the classification loss function.
According to the embodiment of the invention, the image matrix of the original picture and the image matrix of the scrambled picture are input into two convolutional neural network branches respectively, and the features and classification results extracted by the two convolutional neural networks are combined for learning and training. This facilitates the capture and extraction of local features and yields more effective features; the same accuracy as strongly supervised fine-grained recognition can be achieved without any manual labeling information; the time and space consumption of the algorithm on the model can be reduced; and system robustness is improved.
Further, based on the above embodiment, segmenting and scrambling the sample picture specifically includes: first dividing the image into a plurality of image blocks; then scrambling the image blocks in the row direction followed by scrambling the image blocks in the column direction, or scrambling the image blocks in the column direction followed by scrambling the image blocks in the row direction.
When the sample picture is segmented and scrambled, it is first segmented and then scrambled. During segmentation, the image is divided into a plurality of image blocks, for example M × N image blocks. After segmentation, the image blocks are scrambled: either the row-direction scrambling is performed first and the column-direction scrambling second, or the column-direction scrambling is performed first and the row-direction scrambling second.
On the basis of the above embodiment, the embodiment of the invention scrambles the image blocks in the row and column directions in sequence after the picture is segmented, improving the flexibility and orderliness of the system.
Further, based on the above embodiment, the performing the operation of scrambling the image blocks in the row direction includes: for each image block in each row, exchanging positions with the image blocks at corresponding positions in the row direction within a preset first step length range according to the value of a first random variable; the performing of the operation of scrambling the image blocks in the column direction includes: and for each image block in each column, exchanging the position of the image block in the column direction with the image block in the corresponding position according to the value of a second random variable within a preset second step length range.
The core of the destruction-and-reconstruction idea provided by the embodiment of the present invention is how to destroy the picture effectively, so that the structural information of the picture is disrupted while its local information is highlighted. Dividing the sample picture into different image blocks is, in essence, dividing the first image matrix into different block matrices. Scrambling the picture in an orderly and controllable manner means swapping the block matrices of the picture within a controllable range, so that the noise introduced by the scrambling operation is controlled and the local features of the picture can still be highlighted.
Specifically, the moving step size of each image block may be limited. For example, the step size by which an image block moves in the row direction may be confined to the first step-length range. This step may be represented by a first random variable, which can take a different value each time an image block moves, but always within the first step-length range. Similarly, the step size in the column direction may be confined to the second step-length range and represented by a second random variable, which can likewise take a different value for each move but always within the second step-length range. Whenever an image block moves, it exchanges positions with the image block at the target location.
Of course, for a square picture, the picture may be sliced into N × N blocks, i.e., with the same number of blocks in the row and column directions. When moving, the image blocks may also use a uniform step size in the row and column directions. Taking this case as an example, the picture-scrambling method is further described below:
the picture scrambling step can be divided into two sub-operations: cutting and disturbing. Firstly, an input image is divided into local small blocks, and then a random algorithm is used for scrambling the small blocks, so that a scrambled picture can be obtained. The specific operation is as follows:
For an input image I, the image is first divided into N × N sub-regions R_{i,j}, where i and j are the corresponding row and column block numbers. The algorithm shuffles the cut sub-regions by the following mechanism: for the j-th row, the algorithm first generates a vector q_j of size N whose i-th element is

q_{j,i} = i + r, with r drawn uniformly from (-k, k),

where k is an adjustable parameter of the algorithm (1 ≤ k < N) that characterizes the range perturbed by the scrambling mechanism. Sorting q_j under this scrambling mechanism yields a new permutation σ_j^row of the sub-regions in the j-th row, and the variation range of each element satisfies:

∀ i ∈ {1, ..., N}: |σ_j^row(i) − i| < 2k

Through the above operation, the row scrambling of the picture is completed. Applying the same rule to the columns after row scrambling yields the analogous relationship:

∀ j ∈ {1, ..., N}: |σ_i^col(j) − j| < 2k

After the input picture is subjected to row scrambling and column scrambling, the scrambled picture φ(I) is obtained, in which the sub-region originally at position (i, j) is moved to position

σ(i, j) = (σ_j^row(i), σ_i^col(j)).
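The jittered scrambling mechanism above can be sketched in code as follows. This is an illustrative reconstruction under the stated rule (permutation by sorting q_{j,i} = i + r with r uniform on (-k, k), rows first, then columns); the function names are our own:

```python
import numpy as np

def jittered_permutation(n, k, rng):
    """Permutation obtained by sorting q_i = i + r, r ~ U(-k, k).

    Every element moves by strictly fewer than 2k positions.
    """
    q = np.arange(n) + rng.uniform(-k, k, size=n)
    return np.argsort(q)

def destroy_image(blocks, k, rng):
    """Scramble an (N, N, bh, bw) grid of blocks: each row, then each column."""
    n = blocks.shape[0]
    out = blocks.copy()
    for j in range(n):                 # row scrambling: permute within row j
        out[j] = out[j][jittered_permutation(n, k, rng)]
    for i in range(n):                 # column scrambling: permute within column i
        out[:, i] = out[:, i][jittered_permutation(n, k, rng)]
    return out
```

A smaller k keeps every block near its original position, which is how the mechanism bounds the noise introduced by the scrambling.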
The picture scrambling step effectively destroys the global structure; to identify these randomly scrambled images, the classification network must then find and learn identifiable local regions. This operation forces the neural network to focus on details in the picture, while the parameter k ensures that each local region jitters only within a neighboring area, thereby controlling the noise introduced by the scrambling operation and highlighting the local features of the picture.
On the basis of the above embodiments, the embodiments of the present invention scramble the image blocks in the row and column directions using random variables confined to a preset range, ensuring that each local region jitters only within a neighboring area while local features are highlighted, thereby controlling the noise introduced by the scrambling operation.
Further, according to the above embodiment, solving the preset distillation loss function according to the first picture feature and the second picture feature includes: obtaining a global flow matrix from the first picture features extracted at two adjacent convolutional layers of the first convolutional neural network, and obtaining a local flow matrix from the second picture features extracted at two adjacent convolutional layers of the second convolutional neural network; and solving the preset distillation loss function by calculating the L2-norm distance between the global flow matrix and the local flow matrix.
Specifically, when solving the preset distillation loss function according to the first picture feature and the second picture feature, a global flow matrix is obtained from the first picture features extracted at two adjacent convolutional layers of the first convolutional neural network; this matrix reflects how the features change between those two layers. A local flow matrix is likewise obtained from the second picture features extracted at two adjacent convolutional layers of the second convolutional neural network, reflecting how the features change between those two layers. The preset distillation loss function is then solved by calculating the L2-norm distance between the global flow matrix and the local flow matrix. This distance measures how close the feature changes of the two networks' adjacent layers are: the smaller the L2-norm distance, the smaller the distillation loss value, and the closer the feature changes of the two networks.
The new knowledge distillation algorithm provided by the embodiment of the present invention, also called the flow-matrix distillation method, obtains the change relationship of the features between the layers of the two networks by calculating their flow matrices; by drawing the two flow matrices toward each other and fusing them, the student network can learn the teacher network's "solution process" for computing features, thereby improving the accuracy of fine-grained recognition. In the algorithm flow provided by the embodiment of the present invention, the roles of teacher network and student network are not strictly divided; instead, the knowledge distillation effect is achieved by the mutual approach and fusion of the global feature extraction network (the first convolutional neural network) and the local feature extraction network (the second convolutional neural network).
By continuously optimizing the loss functions (including the distillation loss function and the classification loss function), the embodiment of the present invention can continuously fuse the global and local features extracted from the picture, distilling and refining them against each other. This process extracts the features that contribute most to the model recognition rate, better improves the accuracy of fine-grained recognition, and eliminates the noise caused by scrambling the pictures. Meanwhile, by learning the process of feature change between the two networks, the flow-matrix distillation method achieves good model generalization, overcomes the limitations of conventional knowledge distillation, and remains effective even for deep neural networks.
On the basis of the above embodiment, the flow-matrix distillation method learns the process of feature change between the two networks, giving it better model generalization, overcoming the limitations of knowledge distillation, and remaining effective even for deep neural networks.
Further, based on the above embodiment, the global flow matrix and the local flow matrix are obtained from the picture features of two adjacent layers as follows. Let F1 ∈ R^{h×w×m} denote the picture features of the upper layer c1 of the two adjacent layers and F2 ∈ R^{h×w×n} those of the lower layer c2, where h and w denote the height and width of the feature maps, m and n their channel numbers, s and t the channel indices of F1 and F2 respectively, x the input picture, and W the weight parameters of the neural network.
For the teacher network, the goal is to learn the process of feature change within its network, i.e., the relationship between the features obtained at two adjacent layers. The flow matrix G ∈ R^{m×n} is therefore defined element-wise as:

G_{s,t}(x; W) = (1 / (h · w)) · Σ_{i=1}^{h} Σ_{j=1}^{w} F1_{i,j,s}(x; W) · F2_{i,j,t}(x; W)
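A minimal sketch of the flow-matrix computation above, assuming the element-wise definition given (a channel-by-channel inner product averaged over spatial positions); shapes and names are illustrative:

```python
import numpy as np

def flow_matrix(f1, f2):
    """G[s, t] = sum_{i,j} f1[i, j, s] * f2[i, j, t] / (h * w).

    f1: (h, w, m) features of the upper layer; f2: (h, w, n) of the lower layer.
    Returns the (m, n) flow matrix relating the two layers' features.
    """
    h, w, _ = f1.shape
    return np.einsum('ijs,ijt->st', f1, f2) / (h * w)

f1 = np.random.rand(7, 7, 16)    # upper-layer feature map
f2 = np.random.rand(7, 7, 32)    # lower-layer feature map
g = flow_matrix(f1, f2)          # flow matrix of shape (16, 32)
```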
Knowledge distillation can thus be achieved by separately calculating the flow matrices of the first convolutional neural network and the second convolutional neural network and continuously optimizing the L2-norm distance between them.
On the basis of the above embodiments, the embodiments of the present invention improve the practicability by giving the expression of the flow matrix.
Further, based on the above embodiment, the distillation loss function is expressed as:

L_flow(W_global, W_local) = (1 / N) · Σ_x Σ_{l=1}^{n} λ_1 · ‖ G_global^l(x; W_global) − G_local^l(x; W_local) ‖_2^2

wherein W_global denotes the parameters of the global flow matrices and W_local those of the local flow matrices; L_flow(W_global, W_local) denotes the distillation loss function obtained from the global flow matrices and the local flow matrices; λ_1 denotes a weight coefficient; l denotes the sequence number of a flow matrix, the flow matrices including both the global and the local flow matrices; n denotes the number of flow matrices for one picture, the number of global flow matrices being the same as the number of local flow matrices; x denotes an input picture; N denotes the number of pictures; G_global^l(x; W_global) denotes the l-th global flow matrix of picture x; G_local^l(x; W_local) denotes the l-th local flow matrix of picture x; and ‖·‖_2^2 denotes the squared L2-norm distance.
First, the global flow matrices G_global(x; W_global) of the global feature extraction network and the local flow matrices G_local(x; W_local) of the local feature extraction network are calculated, and then the knowledge distillation loss function L_flow(W_global, W_local) is computed. Since one flow matrix is calculated from each pair of adjacent layers, a single picture corresponds to multiple flow matrices. The L2-norm distances of all flow matrices of each picture are aggregated to yield the distillation loss function above. In the embodiment of the present invention, each flow matrix is considered equally important, so the same weight coefficient λ_1 may be used throughout the loss function.
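Under the loss expression above (uniform weight λ_1, squared L2 distances summed over each picture's flow-matrix pairs and averaged over pictures), a sketch could look like the following; the data layout and names are assumptions of ours:

```python
import numpy as np

def flow_distillation_loss(global_flows, local_flows, lam=1.0):
    """global_flows and local_flows: one list of flow matrices per picture.

    L = (1 / N) * sum over pictures x and matrix indices l of
        lam * || G_global^l(x) - G_local^l(x) ||_2^2
    """
    n_pictures = len(global_flows)
    loss = 0.0
    for g_mats, l_mats in zip(global_flows, local_flows):
        for g_mat, l_mat in zip(g_mats, l_mats):
            loss += lam * np.sum((g_mat - l_mat) ** 2)
    return loss / n_pictures
```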
On the basis of the above embodiments, the embodiments of the present invention obtain the distillation loss function by synthesizing the L2 norm distances of the flow matrix of each picture, thereby improving the reliability of the distillation loss function.
Fig. 2 is a flowchart of a training method of an image recognition model according to another embodiment of the present invention. As shown in Fig. 2, the embodiment of the present invention provides a training method for an image recognition model based on destruction-and-reconstruction and knowledge distillation, which can achieve the same accuracy as strongly supervised fine-grained recognition without any manually labeled information, while reducing the time and space consumption of the algorithm at the model level. The method is generally divided into two parts: a destruction-and-reconstruction part and a knowledge distillation part. The destruction-and-reconstruction part realizes the ordered scrambling of the pictures, destroying their structural information and ensuring that the algorithm extracts more precise local information; the knowledge distillation part distills and concentrates the features extracted from the destroyed picture, extracting the features most effective for improving the model recognition rate and further improving the accuracy of the algorithm.
First, the algorithm performs the picture destruction step, scrambling the pictures in an orderly manner: the perturbation amplitude is controlled while the pictures are scrambled, so that the noise introduced by the scrambling is effectively controlled. Through this step, the original structural information of the picture is destroyed, forcing the algorithm to pay attention to local information points in the picture and to extract more effective and precise local information.
After the destruction-and-reconstruction part is completed, the algorithm enters the knowledge distillation part, which is carried out jointly by two branches. The local features of the scrambled picture and the global features of the original picture are each extracted by a convolutional neural network, and the local and global classification results are then obtained through fully connected layers. Meanwhile, the local and global flow matrices required by the algorithm are calculated from the layer outputs of the two convolutional neural networks. The knowledge distillation algorithm then distills and concentrates the extracted features to obtain those most effective for improving the model recognition rate, which facilitates the parameter adjustment of the convolutional neural networks. The algorithm can thus fuse the global and local features for fine-grained classification of the picture, effectively improving fine-grained recognition accuracy.
Fig. 3 is a flowchart of an image recognition method according to an embodiment of the present invention. The method can be used for image recognition by applying the image recognition model obtained by training in any embodiment. The method comprises the following steps:
step 201, after recording a first image matrix of an input picture, segmenting and scrambling the input picture, thereby obtaining a second image matrix of the scrambled input picture.
After the first image matrix of the input image is recorded, the input image can be segmented and scrambled according to the rules of image segmentation and scrambling during model training, so that the second image matrix of the scrambled input image is obtained. Different from the sample pictures during training, the first image matrix in the embodiment of the invention corresponds to the input pictures which need to be identified actually, and the second image matrix corresponds to the input pictures after disorder.
Step 202, inputting the first image matrix into the first convolutional neural network, and acquiring a first output vector of a fully connected layer through the first convolutional neural network; and inputting the second image matrix into the second convolutional neural network, and acquiring a second output vector of the fully connected layer through the second convolutional neural network.
The first image matrix is input into the first convolutional neural network, and a first output vector of the fully connected layer is acquired through it; the magnitude of each element in the first output vector can represent the probability that the picture belongs to the corresponding category. The second image matrix is input into the second convolutional neural network, and a second output vector of the fully connected layer is acquired through it; the magnitude of each element in the second output vector likewise represents the probability that the picture belongs to the corresponding category.
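Interpreting fully connected outputs as class probabilities is commonly done with a softmax; the sketch below is our illustration of that reading, not a detail stated in the embodiment:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D vector of FC-layer outputs."""
    z = logits - np.max(logits)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))   # sums to 1; largest logit wins
```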
And 203, obtaining a picture identification result according to the first output vector and the second output vector.
The picture identification result can be obtained by combining the first output vector and the second output vector. For example, the two vectors may be summed in a weighted manner, and the category of the picture determined according to the magnitudes of the elements in the resulting vector.
The embodiment of the present invention can realize weakly supervised fine-grained image recognition using the image recognition model obtained by the above training method, requires no manual annotation information, and can achieve the same accuracy as strongly supervised fine-grained recognition.
Further, based on the above embodiment, the obtaining a picture identification result according to the first output vector and the second output vector includes: and adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture identification result according to the third output vector.
When the picture identification result is obtained according to the first output vector and the second output vector, the two vectors can be directly added to obtain a third output vector, and the category of the picture determined according to the magnitudes of the elements in the third output vector, thereby obtaining the picture identification result.
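The fusion rule above (add the two branch outputs, then pick the class with the largest element) can be sketched as follows; the variable names are illustrative:

```python
import numpy as np

def predict(first_output, second_output):
    """Add the two branches' output vectors and take the arg-max class."""
    third_output = first_output + second_output
    return int(np.argmax(third_output))

global_scores = np.array([0.1, 0.7, 0.2])    # first (global) branch output
local_scores = np.array([0.3, 0.4, 0.1])     # second (local) branch output
pred = predict(global_scores, local_scores)  # class 1 (0.7 + 0.4 is largest)
```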
On the basis of the above embodiment, the embodiment of the present invention obtains the third output vector by adding the first output vector and the second output vector, and obtains the picture recognition result according to the third output vector, thereby improving the simplicity.
Fig. 4 is a schematic structural diagram of an image recognition model training apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus includes a picture scrambling module 10, a feature extracting and classifying module 20, a loss function calculating module 30, and a model optimizing module 40, wherein: the picture scrambling module 10 is configured to: after a first image matrix of a sample picture is recorded, the sample picture is segmented and disordered, and a second image matrix of the disordered sample picture is obtained; the feature extraction and classification module 20 is configured to: inputting the first image matrix into a first convolution neural network, extracting first picture features through the first convolution neural network and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second image characteristics through the second convolutional neural network and obtaining a second image classification result; the loss function calculation module 30 is configured to: solving a preset distillation loss function according to the first picture characteristic and the second picture characteristic, wherein the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the characteristic calculation process is; solving a preset classification loss function according to the first image classification result and the second image classification result, wherein the smaller the classification loss function is, the closer the first convolutional neural network and the second convolutional neural network are to a true value on the classification result is; the model optimization module 40 is configured to: and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation 
loss function and the classification loss function, and finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed by the first convolutional neural network and the second convolutional neural network.
According to the embodiment of the present invention, the image matrix of the original picture and the image matrix of the scrambled picture are respectively input into two convolutional neural network branches, and the features extracted by the two networks and their classification results are combined for learning and training. This facilitates the capture and extraction of local features and yields more effective features; the same accuracy as strongly supervised fine-grained recognition can be achieved without any manual annotation information, the time and space consumption of the algorithm is reduced at the model level, and system robustness is improved.
Fig. 5 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus includes an image processing module 100, an output vector obtaining module 200, and an image recognition module 300, wherein: the image processing module 100 is configured to: after a first image matrix of an input picture is recorded, the input picture is segmented and disordered, and a second image matrix of the disordered input picture is obtained; the output vector obtaining module 200 is configured to: inputting the first image matrix into the first convolution neural network, and acquiring a first output vector of a full connection layer through the first convolution neural network; inputting the second image matrix into the second convolutional neural network, and acquiring a second output vector of the full-connection layer through the second convolutional neural network; the image recognition module 300 is configured to: and obtaining a picture identification result according to the first output vector and the second output vector.
The embodiment of the present invention can realize weakly supervised fine-grained image recognition using the image recognition model obtained by the above training method, requires no manual annotation information, and can achieve the same accuracy as strongly supervised fine-grained recognition.
The device provided by the embodiment of the present invention is used for the method, and specific functions may refer to the above method flow, which is not described herein again.
Fig. 6 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 6: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform a method of training an image recognition model, the method comprising: after a first image matrix of a sample picture is recorded, the sample picture is segmented and disordered, and a second image matrix of the disordered sample picture is obtained; inputting the first image matrix into a first convolution neural network, extracting first picture features through the first convolution neural network and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second image characteristics through the second convolutional neural network and obtaining a second image classification result; solving a preset distillation loss function according to the first picture characteristic and the second picture characteristic, wherein the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the characteristic calculation process is; solving a preset classification loss function according to the first image classification result and the second image classification result, wherein the smaller the classification loss function is, the closer the first convolutional neural network and the second convolutional neural network are to a true value on the classification result is; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss 
function, and finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed by the first convolutional neural network and the second convolutional neural network. Alternatively, the processor 610 may invoke logic instructions in the memory 630 to perform an image recognition method comprising: after a first image matrix of an input picture is recorded, the input picture is segmented and disordered, and a second image matrix of the disordered input picture is obtained; inputting the first image matrix into the first convolution neural network, and acquiring a first output vector of a full connection layer through the first convolution neural network; inputting the second image matrix into the second convolutional neural network, and acquiring a second output vector of the full-connection layer through the second convolutional neural network; and obtaining a picture identification result according to the first output vector and the second output vector.
In addition, the logic instructions in the memory 630 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can perform the method for training an image recognition model provided by the above-mentioned embodiments of the method, where the method includes: after a first image matrix of a sample picture is recorded, the sample picture is segmented and disordered, and a second image matrix of the disordered sample picture is obtained; inputting the first image matrix into a first convolution neural network, extracting first picture features through the first convolution neural network and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second image characteristics through the second convolutional neural network and obtaining a second image classification result; solving a preset distillation loss function according to the first picture characteristic and the second picture characteristic, wherein the smaller the distillation loss function is, the closer the first convolutional neural network and the second convolutional neural network are in the characteristic calculation process is; solving a preset classification loss function according to the first image classification result and the second image classification result, wherein the smaller the classification loss function is, the closer the first convolutional neural network and the second convolutional neural network are to a true value on the classification result is; and optimizing the first convolutional neural network and the second convolutional neural network by continuously optimizing the distillation loss function and the classification loss function, and finishing training 
when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed by the first convolutional neural network and the second convolutional neural network. Or, when the program instructions are executed by a computer, the computer can execute the image recognition method provided by the above method embodiments, and the method comprises: after a first image matrix of an input picture is recorded, the input picture is segmented and disordered, and a second image matrix of the disordered input picture is obtained; inputting the first image matrix into the first convolution neural network, and acquiring a first output vector of a full connection layer through the first convolution neural network; inputting the second image matrix into the second convolutional neural network, and acquiring a second output vector of the full-connection layer through the second convolutional neural network; and obtaining a picture identification result according to the first output vector and the second output vector.
In yet another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the method for training an image recognition model provided in the foregoing embodiments, the method comprising: recording a first image matrix of a sample picture, then segmenting and scrambling the sample picture to obtain a second image matrix of the scrambled sample picture; inputting the first image matrix into a first convolutional neural network, extracting first picture features through the first convolutional neural network, and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second picture features through the second convolutional neural network, and obtaining a second picture classification result; solving a preset distillation loss function according to the first picture features and the second picture features, wherein a smaller distillation loss function indicates that the feature calculation processes of the first convolutional neural network and the second convolutional neural network are closer; solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the true values; and optimizing the first convolutional neural network and the second convolutional neural network by continuously minimizing the distillation loss function and the classification loss function, and finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network. Alternatively, when executed by a processor, the computer program performs the image recognition method provided by the above embodiments, the method comprising: recording a first image matrix of an input picture, then segmenting and scrambling the input picture to obtain a second image matrix of the scrambled input picture; inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of a fully connected layer through the first convolutional neural network; inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and obtaining a picture recognition result according to the first output vector and the second output vector.
The above-described apparatus embodiments are merely illustrative; units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units: they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement this without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or certainly by hardware alone. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, magnetic disk or optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A training method of an image recognition model is characterized by comprising the following steps:
recording a first image matrix of a sample picture, then segmenting and scrambling the sample picture to obtain a second image matrix of the scrambled sample picture;
inputting the first image matrix into a first convolutional neural network, extracting first picture features through the first convolutional neural network and obtaining a first picture classification result; inputting the second image matrix into a second convolutional neural network, extracting second picture features through the second convolutional neural network and obtaining a second picture classification result;
solving a preset distillation loss function according to the first picture features and the second picture features, wherein a smaller distillation loss function indicates that the feature calculation processes of the first convolutional neural network and the second convolutional neural network are closer; solving a preset classification loss function according to the first picture classification result and the second picture classification result, wherein a smaller classification loss function indicates that the classification results of the first convolutional neural network and the second convolutional neural network are closer to the true values; and
optimizing the first convolutional neural network and the second convolutional neural network by continuously minimizing the distillation loss function and the classification loss function, and finishing training when the distillation loss function is smaller than a preset first threshold and the classification loss function is smaller than a preset second threshold, so as to obtain a trained image recognition model constructed from the first convolutional neural network and the second convolutional neural network.
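The training loop of claim 1 can be illustrated with a deliberately simplified sketch. The two convolutional networks are replaced here by toy linear models, the "classification result" is taken to be the first three feature dimensions, and all sizes, thresholds and the learning rate are arbitrary illustrative assumptions; the patent does not specify any of them.

```python
import numpy as np

# Toy sketch of the claim-1 loop: two stand-in "networks" (single linear
# maps), a feature-matching distillation loss, a cross-entropy
# classification loss, and the dual-threshold stopping rule.
rng = np.random.default_rng(0)
W_g = rng.normal(size=(8, 3)) * 0.1   # stand-in for the first (global) CNN
W_l = rng.normal(size=(8, 3)) * 0.1   # stand-in for the second (local) CNN

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.normal(size=3)                # one "picture"
x /= np.linalg.norm(x)                # normalised for a stable step size
label = 1                             # its true class (3 toy classes)
tau1, tau2 = 1e-3, 0.5                # preset first / second thresholds
lr = 0.05
one_hot = np.eye(3)[label]

for step in range(20000):
    f_g, f_l = W_g @ x, W_l @ x                    # "picture features"
    p_g, p_l = softmax(f_g[:3]), softmax(f_l[:3])  # "classification results"
    distill = np.sum((f_g - f_l) ** 2)             # distillation loss
    classify = -np.log(p_g[label]) - np.log(p_l[label])  # classification loss
    if distill < tau1 and classify < tau2:         # claim-1 stopping rule
        break
    # gradient step on both losses for both networks
    g_Wg = 2 * np.outer(f_g - f_l, x)
    g_Wl = -2 * np.outer(f_g - f_l, x)
    g_Wg[:3] += np.outer(p_g - one_hot, x)
    g_Wl[:3] += np.outer(p_l - one_hot, x)
    W_g -= lr * g_Wg
    W_l -= lr * g_Wl
```

With real networks the same structure holds: compute both losses, update both branches jointly, and stop once both losses fall below their preset thresholds.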
2. The method for training an image recognition model according to claim 1, wherein segmenting and scrambling the sample picture specifically comprises:
first dividing the picture into a plurality of image blocks; then performing the operation of scrambling the image blocks in the row direction, followed by the operation of scrambling the image blocks in the column direction; or, first performing the operation of scrambling the image blocks in the column direction, followed by the operation of scrambling the image blocks in the row direction.
3. The method for training an image recognition model according to claim 2, wherein performing the operation of scrambling the image blocks in the row direction comprises: for each image block in each row, exchanging positions with the image block at the corresponding position in the row direction, within a preset first step-length range, according to the value of a first random variable;
and performing the operation of scrambling the image blocks in the column direction comprises: for each image block in each column, exchanging positions with the image block at the corresponding position in the column direction, within a preset second step-length range, according to the value of a second random variable.
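A minimal NumPy sketch of the neighbourhood-constrained block shuffle of claims 2 and 3, assuming a square n-by-n block grid and implementing the bounded random exchange by jittering each block index with an offset drawn from [-k, k] and re-sorting (a common realisation of this kind of constraint; the grid size n and step length k are illustrative assumptions):

```python
import numpy as np

def shuffle_blocks(img, n, k, rng):
    """Split img into an n-by-n grid of blocks, scramble the blocks in the
    row direction, then in the column direction.  Each block's index is
    jittered by a random offset in [-k, k] and the blocks are re-sorted,
    so a block only moves within a bounded neighbourhood (the 'preset
    step-length range')."""
    h, w = img.shape[:2]
    bh, bw = h // n, w // n
    blocks = [[img[i*bh:(i+1)*bh, j*bw:(j+1)*bw] for j in range(n)]
              for i in range(n)]
    for i in range(n):                 # scramble within each row
        order = np.argsort(np.arange(n) + rng.uniform(-k, k, size=n))
        blocks[i] = [blocks[i][j] for j in order]
    for j in range(n):                 # then scramble within each column
        order = np.argsort(np.arange(n) + rng.uniform(-k, k, size=n))
        col = [blocks[i][j] for i in range(n)]
        for i in range(n):
            blocks[i][j] = col[order[i]]
    return np.concatenate([np.concatenate(row, axis=1) for row in blocks],
                          axis=0)

img = np.arange(36, dtype=float).reshape(6, 6)
out = shuffle_blocks(img, n=3, k=1, rng=np.random.default_rng(0))
```

The scrambled picture keeps exactly the same pixels as the original, only their block positions change, which is what allows the second branch to learn local detail from the same image content.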
4. The method for training an image recognition model according to claim 1, wherein solving a preset distillation loss function according to the first picture features and the second picture features comprises:
acquiring a global flow matrix according to the first picture features extracted from two adjacent convolutional layers in the first convolutional neural network, and acquiring a local flow matrix according to the second picture features extracted from two adjacent convolutional layers in the second convolutional neural network;
solving the preset distillation loss function by calculating the L2-norm distance between the global flow matrix and the local flow matrix.
5. The method for training an image recognition model according to claim 4, wherein the global flow matrix and the local flow matrix obtained from the picture features of two adjacent layers are expressed as:

$$G_{s,t}(x; W) = \sum_{i=1}^{h} \sum_{j=1}^{w} \frac{F^{1}_{i,j,s}(x; W) \times F^{2}_{i,j,t}(x; W)}{h \times w}$$

wherein $F^{1} \in R^{h \times w \times m}$ represents the picture features of the upper layer c1 of the two adjacent layers, $F^{2} \in R^{h \times w \times m}$ represents the picture features of the lower layer c2 of the two adjacent layers; h, w and m respectively represent the height, width and number of channels of the picture features; s and t respectively represent channel indices of $F^{1}$ and $F^{2}$; x represents the input picture; and W represents the weight parameters of the neural network.
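A short NumPy sketch of the flow-matrix computation of claim 5; the einsum performs the sum over the spatial positions i, j for every channel pair (s, t), as in the formula above (array sizes are arbitrary illustrative choices):

```python
import numpy as np

def flow_matrix(F1, F2):
    """G[s, t] = sum_{i,j} F1[i, j, s] * F2[i, j, t] / (h * w),
    where F1 and F2 are the (h, w, m) feature maps of two adjacent
    convolutional layers."""
    h, w = F1.shape[:2]
    return np.einsum('ijs,ijt->st', F1, F2) / (h * w)

rng = np.random.default_rng(0)
F1 = rng.normal(size=(4, 4, 5))
F2 = rng.normal(size=(4, 4, 5))
G = flow_matrix(F1, F2)     # one (5, 5) flow matrix per adjacent layer pair
```

Each entry of G is the normalised inner product of one channel map of the upper layer with one channel map of the lower layer, so G summarises how information "flows" between the two layers.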
6. The method for training an image recognition model according to claim 5, wherein the distillation loss function is expressed as:

$$L_{flow}(W_{global}, W_{local}) = \lambda_{1} \cdot \frac{1}{N} \sum_{x} \sum_{l=1}^{n} \left\| G_{l}^{global}(x) - G_{l}^{local}(x) \right\|_{2}^{2}$$

wherein $W_{global}$ represents the global flow matrix and $W_{local}$ represents the local flow matrix; $L_{flow}(W_{global}, W_{local})$ represents the distillation loss function derived from the global flow matrix and the local flow matrix; $\lambda_{1}$ represents a weight coefficient; l represents the index of a flow matrix, the flow matrices comprising the global flow matrices and the local flow matrices; n represents the number of flow matrices for one picture, the number of global flow matrices being the same as the number of local flow matrices; x represents an input picture; N represents the number of pictures; $G_{l}^{global}(x)$ represents the l-th global flow matrix of picture x; $G_{l}^{local}(x)$ represents the l-th local flow matrix of picture x; and $\|\cdot\|_{2}^{2}$ represents the squared L2-norm distance calculation.
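The distillation loss of claim 6 reduces to matching each global flow matrix with its local counterpart and averaging over pictures; a minimal NumPy sketch (the weight λ1 = 1 and the toy matrices are illustrative assumptions):

```python
import numpy as np

def distillation_loss(global_mats, local_mats, lam1=1.0):
    """L_flow = lam1 * (1/N) * sum over pictures x of
    sum_{l=1}^{n} ||G_l^global(x) - G_l^local(x)||_2^2,
    where global_mats / local_mats hold one list of flow matrices
    per picture."""
    N = len(global_mats)
    total = 0.0
    for g_list, l_list in zip(global_mats, local_mats):
        total += sum(np.sum((g - l) ** 2) for g, l in zip(g_list, l_list))
    return lam1 * total / N

# one picture, one pair of 2x2 flow matrices
loss = distillation_loss([[np.ones((2, 2))]], [[np.zeros((2, 2))]])
```

Driving this value down forces the local (scrambled-input) branch to reproduce the layer-to-layer information flow of the global branch.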
7. An image recognition method based on an image recognition model trained by the method of any one of claims 1 to 6, comprising:
recording a first image matrix of an input picture, then segmenting and scrambling the input picture to obtain a second image matrix of the scrambled input picture;
inputting the first image matrix into the first convolutional neural network, and obtaining a first output vector of a fully connected layer through the first convolutional neural network; inputting the second image matrix into the second convolutional neural network, and obtaining a second output vector of the fully connected layer through the second convolutional neural network; and
obtaining a picture recognition result according to the first output vector and the second output vector.
8. The image recognition method according to claim 7, wherein obtaining the picture recognition result according to the first output vector and the second output vector comprises:
adding the first output vector and the second output vector to obtain a third output vector, and obtaining the picture recognition result according to the third output vector.
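The fusion step of claim 8 is a plain element-wise sum of the two fully connected output vectors. A minimal sketch; taking the arg-max of the third vector as the final class is an assumption for illustration, since the claim only states that the result is obtained from that vector:

```python
import numpy as np

def recognize(v_global, v_local):
    """Add the two branch output vectors into a third vector and take its
    arg-max as the picture recognition result (arg-max is assumed)."""
    v3 = np.asarray(v_global) + np.asarray(v_local)
    return int(np.argmax(v3)), v3

cls, fused = recognize([0.2, 1.5, 0.3], [0.4, 0.9, 0.1])
```

Summing the logits lets evidence from the intact picture and the scrambled picture reinforce each other before the decision is taken.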
9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method for training an image recognition model according to any one of claims 1 to 6 or the steps of the image recognition method according to any one of claims 7 to 8.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the training method of the image recognition model according to any one of claims 1 to 6 or the steps of the image recognition method according to any one of claims 7 to 8.
CN202010772704.6A 2020-08-04 2020-08-04 Training method of image recognition model and image recognition method Pending CN112016591A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010772704.6A CN112016591A (en) 2020-08-04 2020-08-04 Training method of image recognition model and image recognition method
PCT/CN2021/084760 WO2022027987A1 (en) 2020-08-04 2021-03-31 Image recognition model training method, and image recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010772704.6A CN112016591A (en) 2020-08-04 2020-08-04 Training method of image recognition model and image recognition method

Publications (1)

Publication Number Publication Date
CN112016591A (en) 2020-12-01

Family

ID=73498469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010772704.6A Pending CN112016591A (en) 2020-08-04 2020-08-04 Training method of image recognition model and image recognition method

Country Status (2)

Country Link
CN (1) CN112016591A (en)
WO (1) WO2022027987A1 (en)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900779B (en) * 2022-04-12 2023-06-06 东莞市晨新电子科技有限公司 Audio compensation method, system and electronic equipment
CN114979470A (en) * 2022-05-12 2022-08-30 咪咕文化科技有限公司 Camera rotation angle analysis method, device, equipment and storage medium
CN115061427B (en) * 2022-06-28 2023-04-14 浙江同发塑机有限公司 Material layer uniformity control system of blow molding machine and control method thereof
CN115356434B (en) * 2022-07-14 2023-06-02 福建省杭氟电子材料有限公司 Gas monitoring system and method for hexafluorobutadiene storage place
CN116245832B (en) * 2023-01-30 2023-11-14 浙江医准智能科技有限公司 Image processing method, device, equipment and storage medium
CN116544146B (en) * 2023-05-22 2024-04-09 浙江固驰电子有限公司 Vacuum sintering equipment and method for power semiconductor device
CN116563795A (en) * 2023-05-30 2023-08-08 北京天翊文化传媒有限公司 Doll production management method and doll production management system
CN116469132B (en) * 2023-06-20 2023-09-05 济南瑞泉电子有限公司 Fall detection method, system, equipment and medium based on double-flow feature extraction
CN117274903B (en) * 2023-09-25 2024-04-19 安徽南瑞继远电网技术有限公司 Intelligent early warning device and method for electric power inspection based on intelligent AI chip
CN117690007B (en) * 2024-02-01 2024-04-19 成都大学 High-frequency workpiece image recognition method
CN117853875B (en) * 2024-03-04 2024-05-14 华东交通大学 Fine-granularity image recognition method and system

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537277A (en) * 2018-04-10 2018-09-14 湖北工业大学 A kind of image classification knowledge method for distinguishing
CN108776807A (en) * 2018-05-18 2018-11-09 复旦大学 It is a kind of based on can the double branch neural networks of skip floor image thickness grain-size classification method
CN109948425A (en) * 2019-01-22 2019-06-28 中国矿业大学 A kind of perception of structure is from paying attention to and online example polymerize matched pedestrian's searching method and device
CN109977980A (en) * 2017-12-28 2019-07-05 航天信息股份有限公司 A kind of method for recognizing verification code and device
CN110084281A (en) * 2019-03-31 2019-08-02 华为技术有限公司 Image generating method, the compression method of neural network and relevant apparatus, equipment
CN110674938A (en) * 2019-08-21 2020-01-10 浙江工业大学 Anti-attack defense method based on cooperative multi-task training
CN110717525A (en) * 2019-09-20 2020-01-21 浙江工业大学 Channel adaptive optimization anti-attack defense method and device
CN110930356A (en) * 2019-10-12 2020-03-27 上海交通大学 Industrial two-dimensional code reference-free quality evaluation system and method
CN111160275A (en) * 2019-12-30 2020-05-15 深圳元戎启行科技有限公司 Pedestrian re-recognition model training method and device, computer equipment and storage medium
CN111260055A (en) * 2020-01-13 2020-06-09 腾讯科技(深圳)有限公司 Model training method based on three-dimensional image recognition, storage medium and equipment
CN111353539A (en) * 2020-02-29 2020-06-30 武汉大学 Cervical OCT image classification method and system based on double-path attention convolutional neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10776662B2 (en) * 2017-11-09 2020-09-15 Disney Enterprises, Inc. Weakly-supervised spatial context networks to recognize features within an image
CN108596026B (en) * 2018-03-16 2020-06-30 中国科学院自动化研究所 Cross-view gait recognition device and training method based on double-flow generation countermeasure network
CN109492765A (en) * 2018-11-01 2019-03-19 浙江工业大学 Image incremental learning algorithm based on transfer models
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN111415318B (en) * 2020-03-20 2023-06-13 山东大学 Unsupervised related filtering target tracking method and system based on jigsaw task
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAO CHAOYANG ET AL.: "Real-time Multi-Scale Face Detector on Embedded Devices", Sensors, 1 May 2019 (2019-05-01), pages 1-22 *
GUAN WENJIE: "Fine-Grained Classification and Detection of Objects Based on Attention Mechanism and Knowledge Distillation" (in Chinese), Information Science and Technology Series (《信息科技辑》), 15 July 2019 (2019-07-15), pages 1-5 *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022027987A1 (en) * 2020-08-04 2022-02-10 杰创智能科技股份有限公司 Image recognition model training method, and image recognition method
CN112966709A (en) * 2021-01-27 2021-06-15 中国电子进出口有限公司 Deep learning-based fine vehicle type identification method and system
CN112966709B (en) * 2021-01-27 2022-09-23 中国电子进出口有限公司 Deep learning-based fine vehicle type identification method and system
CN112862095A (en) * 2021-02-02 2021-05-28 浙江大华技术股份有限公司 Self-distillation learning method and device based on characteristic analysis and readable storage medium
CN112862095B (en) * 2021-02-02 2023-09-29 浙江大华技术股份有限公司 Self-distillation learning method and device based on feature analysis and readable storage medium
CN113052772A (en) * 2021-03-23 2021-06-29 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN113011387A (en) * 2021-04-20 2021-06-22 上海商汤科技开发有限公司 Network training and human face living body detection method, device, equipment and storage medium
CN113011387B (en) * 2021-04-20 2024-05-24 上海商汤科技开发有限公司 Network training and human face living body detection method, device, equipment and storage medium
CN113191426A (en) * 2021-04-28 2021-07-30 深圳市捷顺科技实业股份有限公司 Vehicle identification model creation method, vehicle identification method and related components
CN113269117A (en) * 2021-06-04 2021-08-17 重庆大学 Knowledge distillation-based pedestrian re-identification method
CN113627421A (en) * 2021-06-30 2021-11-09 华为技术有限公司 Image processing method, model training method and related equipment
CN113706642A (en) * 2021-08-31 2021-11-26 北京三快在线科技有限公司 Image processing method and device
CN114118379A (en) * 2021-12-02 2022-03-01 北京百度网讯科技有限公司 Neural network training method, image processing method, device, equipment and medium
CN114299349A (en) * 2022-03-04 2022-04-08 南京航空航天大学 Crowd-sourced image learning method based on multi-expert system and knowledge distillation
CN114299349B (en) * 2022-03-04 2022-05-13 南京航空航天大学 Crowdsourcing image learning method based on multi-expert system and knowledge distillation
CN114817742A (en) * 2022-05-18 2022-07-29 平安科技(深圳)有限公司 Knowledge distillation-based recommendation model configuration method, device, equipment and medium
CN114821203A (en) * 2022-06-29 2022-07-29 中国科学院自动化研究所 Fine-grained image model training and identifying method and device based on consistency loss

Also Published As

Publication number Publication date
WO2022027987A1 (en) 2022-02-10

Similar Documents

Publication Publication Date Title
CN112016591A (en) Training method of image recognition model and image recognition method
Zhang et al. Animal detection from highly cluttered natural scenes using spatiotemporal object region proposals and patch verification
Salman et al. Fish species classification in unconstrained underwater environments based on deep learning
Spampinato et al. Automatic fish classification for underwater species behavior understanding
Bautista et al. Convolutional neural network for vehicle detection in low resolution traffic videos
Ahmad et al. Visual features based boosted classification of weeds for real-time selective herbicide sprayer systems
CN110674874B (en) Fine-grained image identification method based on target fine component detection
Altenberger et al. A non-technical survey on deep convolutional neural network architectures
Bianco et al. Predicting image aesthetics with deep learning
CN110909618B (en) Method and device for identifying identity of pet
Ahmed et al. Automated weed classification with local pattern-based texture descriptors.
CN107633226A (en) A kind of human action Tracking Recognition method and system
CN110046574A (en) Safety cap based on deep learning wears recognition methods and equipment
Kounalakis et al. A robotic system employing deep learning for visual recognition and detection of weeds in grasslands
CN113673607A (en) Method and device for training image annotation model and image annotation
Škrabánek et al. Detection of grapes in natural environment using support vector machine classifier
Anas et al. Detecting abnormal fish behavior using motion trajectories in ubiquitous environments
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
Mezenner et al. Local Directional Patterns for Plant Leaf Disease Detection
CN115409938A (en) Three-dimensional model construction method, device, equipment and storage medium
Wang et al. Eigen-evolution dense trajectory descriptors
Goyal et al. Moving Object Detection in Video Streaming Using Improved DNN Algorithm
Dwiwijaya et al. Identification of Herbal Plants Using Morphology Method and K-Nearest Neighbour Algorithm (KNN)
Eghbali et al. Deep Convolutional Neural Network (CNN) for Large-Scale Images Classification
Lillywhite et al. Automated fish taxonomy using evolution-constructed features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination