CN110598723A - Artificial neural network adjusting method and device - Google Patents
- Publication number
- CN110598723A (application CN201810608259.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
An Artificial Neural Network (ANN) adjustment method and apparatus are provided. The last fully connected layer of the ANN is a classifier, and the normalized weights of that layer represent the feature centers of the classes. The method comprises: adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and completing the training of the ANN once the distribution of the included angles of the feature centers tends to be uniform. By constraining the angles of the feature centers with a loss function, the invention makes the included angles between the feature centers of the classes more uniform, which effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation.
Description
Technical Field
The invention relates to deep learning, in particular to an adjusting method and device for an artificial neural network.
Background
In recent years, Artificial Neural Networks (ANN) have made significant progress in fields such as object detection and image classification. In engineering practice, however, the classes of the labeled data are often imbalanced. When there are many classes to be classified, as in a face recognition task, this class imbalance has a large impact on the classification result because intra-class deviation and inter-class deviation are not sufficiently distinguished.
To solve this problem, various improvements have been proposed, such as data enhancement and local feature labeling. However, these solutions still fail to solve the problem of inaccurate classification results caused by class imbalance in the labeled data.
In view of the above, there is still a need for an improved neural network tuning method.
Disclosure of Invention
According to the invention, the angles of the feature centers are constrained by a loss function, so that the included angles between the feature centers of the classes become more uniform. This effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation.
According to an aspect of the present invention, there is provided an Artificial Neural Network (ANN) adjustment method, wherein the last fully connected layer of the ANN is a classifier and the normalized weights of that layer represent the feature centers of the classes, the method comprising: adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and completing the training of the ANN once the distribution of the included angles of the feature centers tends to be uniform. This eliminates the inaccurate classification caused by an uneven distribution of feature center angles, which in turn results from an imbalanced training data set.
Here, the first loss function that constrains the included angles of the feature centers may be a common softmax loss function. The training data input to the ANN may be an unbalanced data set in which the classes are unequally represented. The ANN may be a face recognition neural network.
Preferably, the adjustment method of the present invention may further include: before the ANN is adjusted with the first loss function, training the ANN with a second loss function to determine the initial feature centers of each class. The included angle constraint loss function of the invention can therefore be used either for training or for fine-tuning the network, as required.
During training and/or fine-tuning, the ANN may be jointly adjusted using an unconstrained third loss function together with the first loss function. The first loss function and the third loss function may be given different weights in the adjustment of the ANN. Preferably, the weight of the first loss function increases gradually as the iteration progresses. Alternatively, the weight of the loss constraint of the first loss function increases gradually as the iteration progresses. The timing and strength of the angle constraint can therefore be chosen flexibly according to the practical application.
Constraining the included angles of the feature centers may include: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with a variance as close to zero as possible. Preferably, the feature center angle needs to be greater than a predetermined threshold.
According to another aspect of the present invention, there is provided an Artificial Neural Network (ANN) adjusting apparatus, wherein a last fully-connected layer of the artificial neural network is a classifier for classification and normalized weights of the layer represent feature centers of classes, the apparatus comprising: an included angle adjusting means for adjusting the ANN with a first loss function that constrains an included angle of the feature center; and the iteration device is used for finishing the training of the ANN under the condition that the included angle distribution of each feature center tends to be uniform.
Preferably, the apparatus may further comprise: an initial training device for training the ANN with a second loss function to determine the initial feature centers of each class before the ANN is adjusted with the first loss function.
Preferably, the apparatus may further comprise: and the joint adjusting device is used for jointly adjusting the ANN by using the unconstrained third loss function and the first loss function. The first and third loss functions may be given different weights in the adjustment of the ANN by the joint adjustment means.
The iteration means may be such that the weight of the first loss function increases progressively as the iteration progresses. Alternatively or additionally, the iteration means may be such that the weight of the loss constraint of the first loss function increases gradually as the iteration progresses.
Constraining the included angles of the feature centers may include: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with a variance as close to zero as possible. Preferably, the feature center angle needs to be greater than a predetermined threshold.
According to an aspect of the present invention, an Artificial Neural Network (ANN) deployment method is provided, including: deploying the neural network model adjusted as above on a fixed-point computing platform comprising at least in part an FPGA, a GPU, and/or an ASIC to perform inference. The ANN may be a face recognition neural network, and the ANN as deployed does not include the last fully connected layer.
According to yet another aspect of the invention, a computing device is presented, comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the face recognition neural network adjustment method as described above.
According to another aspect of the present invention, a non-transitory machine-readable storage medium is proposed, on which executable code is stored, which, when executed by a processor of an electronic device, causes the processor to perform the face recognition neural network adjusting method as described above.
According to a further aspect of the invention, a fixed-point computing platform is proposed, which is at least partly composed of an FPGA, a GPU and/or an ASIC, for performing inferential computations based on a fixed-point neural network model obtained according to the above method.
By constraining the angles of the feature centers, the ANN adjustment method and apparatus of the invention make the included angles between the feature centers of the classes more uniform. This effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation, and is particularly suitable when the training data set is unbalanced. The angle constraint can be applied through a loss function, and the timing and weight with which it is added can be selected according to the specific application, so that a more uniform distribution of feature center angles is achieved while normal convergence of the network is preserved.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
Fig. 1 shows a schematic diagram of a typical CNN.
Fig. 2 shows a flow diagram of an ANN adjustment method according to an embodiment of the present invention.
Fig. 3A and 3B show examples of features mapped to a two-dimensional plane under unbalanced data set training, before and after optimization with the adjustment scheme of the present invention.
Fig. 4 shows a schematic diagram of an ANN adjustment apparatus according to an embodiment of the present invention.
Fig. 5 shows a schematic structural diagram of a computing device which can be used for implementing the above adjustment method according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The scheme of the present application is applicable to various Artificial Neural Networks (ANN), including Deep Neural Networks (DNN), Recurrent Neural Networks (RNN), and Convolutional Neural Networks (CNN). Some background is given below, using the CNN as an example.
CNN basic concept
CNNs achieve state-of-the-art performance in a wide range of vision-related tasks. To aid in understanding the CNN-based classification algorithms (e.g., face recognition algorithms) analyzed in this application, the basic concepts of CNNs and of fixed-point quantization are first introduced.
As shown in fig. 1, a typical CNN consists of a series of layers that run in order.
A CNN is composed of an input layer, an output layer and a plurality of hidden layers connected in series. The first layer of the CNN reads an input value, such as an input image, and outputs a series of activation values (which may also be referred to as a feature map). Each subsequent layer reads the activation values generated by the previous layer and outputs new activation values. The final classifier outputs the probability of each class to which the input image may belong.
These layers can be broadly divided into weighted layers (e.g., convolutional layers, fully connected layers, batch normalization layers, etc.) and unweighted layers (e.g., pooling layers, ReLU layers, Softmax layers, etc.). Here, the CONV layer (convolutional layer) takes a series of feature maps as input and convolves them with convolution kernels to obtain output activation values. The pooling layer is typically connected to the CONV layer and outputs the maximum or average value of each partition (sub-area) in each feature map, thereby reducing the amount of computation through sub-sampling while maintaining some degree of displacement, scale and deformation invariance. A CNN may alternate between convolutional and pooling layers multiple times, gradually reducing the spatial resolution and increasing the number of feature maps. The CNN may then connect to at least one fully connected layer, which applies a linear transformation to the input feature vector to produce a one-dimensional vector output comprising a plurality of feature values.
In general, the operation of weighted layers can be represented as:
Y=WX+b,
where W is the weight value, b is the bias, X is the input activation value, and Y is the output activation value.
The operation of the unweighted layer can be represented as:
Y=f(X),
where f(X) is a non-linear function.
Here, "weights" refer to the parameters in the hidden layers; in a broad sense they can include the biases. They are values learned through the training process and remain unchanged at inference. Activation values, also referred to as feature values, are the values passed between layers: starting from the input layer, the output of each layer is obtained by an operation on its input values and its weights. Unlike the weights, the distribution of the activation values varies dynamically with the input data sample.
Before a CNN is used for inference (e.g., image classification), it first needs to be trained. The parameters of the various layers of the neural network model, such as weights and biases, are determined by feeding in a large amount of training data.
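The two layer types above can be illustrated with a minimal sketch. It uses plain Python lists, and the function names `weighted_layer` and `relu` as well as the numeric values are illustrative assumptions, not part of the original disclosure. ReLU is chosen here only as one example of a non-linear f.

```python
def weighted_layer(W, X, b):
    """Weighted layer: Y = WX + b, with W a list of rows, X and b vectors."""
    return [sum(w * x for w, x in zip(row, X)) + bias
            for row, bias in zip(W, b)]


def relu(X):
    """Unweighted layer: Y = f(X), here with f chosen as ReLU."""
    return [max(0.0, x) for x in X]


# Hypothetical weights, bias and input activation values.
W = [[1.0, -2.0], [0.5, 0.5]]
b = [0.0, -1.0]
X = [3.0, 1.0]

Y = weighted_layer(W, X, b)   # output activation values of the weighted layer
A = relu([-1.0, 2.0])         # negative activations are clipped to zero
```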
Training of CNN
Training a model means learning (determining) ideal values for all weights and biases from labeled samples. The weights and biases determined in this way enable high-accuracy inference on the input feature values during the neural network deployment phase, e.g. a correct classification of the input pictures.
In supervised learning, a machine learning algorithm learns parameters by examining multiple samples and attempting to find a model that minimizes losses, a process known as empirical risk minimization.
The loss is a penalty for a poor prediction. That is, the loss is a numerical value representing how inaccurate the model's prediction is for a single sample. If the prediction of the model is completely accurate, the loss is zero; otherwise the loss is larger. The goal of training the model is to find a set of weights and biases that gives a low loss on average over all samples.
In the training process of the neural network, a loss function may be defined to quantify how well the current weights and biases let the network fit all of the network inputs. The goal of training the network can thus be translated into minimizing the loss function with respect to the weights and biases. Typically, a gradient descent algorithm (in multi-layer neural network training, the back propagation algorithm) is used to carry out this minimization.
The back propagation algorithm involves a repeated iterative process of forward propagation and backward propagation. In forward propagation, the neurons of successive layers are connected through weight matrices, so that stimuli (feature values) are continuously transmitted from each layer to the next through the excitation function of each layer. In backward propagation, the error of the current layer is derived in reverse from the error of the next layer. The weights and biases are thus continuously adjusted through the iterative process of forward and backward propagation, so that the loss function gradually approaches its minimum and the training of the neural network is completed.
ANN adjustment scheme of the invention
In recent years, Artificial Neural Networks (ANN) have made significant progress in fields such as object detection and image classification. However, the classes of the labeled data are often imbalanced during network training. When there are many classes to be classified, as in a face recognition task, this class imbalance has a large impact on the classification result because intra-class deviation and inter-class deviation are not sufficiently distinguished.
For example, in practical face recognition applications, the available face data often follows an unbalanced long-tail distribution, and the number of pictures per ID (corresponding to one face) varies from one to several hundred. This is catastrophic for a data-driven neural network: more data does not necessarily improve algorithm performance and may even have the opposite effect. The face recognition problem on unbalanced data sets therefore needs to be solved.
Various algorithms have been proposed to address this problem, mainly involving data enhancement and local feature labeling. However, these solutions still fail to solve the problem of inaccurate classification results caused by class imbalance in the labeled data.
In view of the above, the present invention provides an improved scheme for the loss function. Unlike other methods that work in the Euclidean distance domain, the present invention makes its improvement in the angular domain. Compared with the Euclidean distance domain, the angular domain is more primitive and natural and is closer to the essence of the feature distribution, so better performance can be obtained.
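As a concrete illustration of a per-sample loss, the sketch below computes a softmax cross-entropy loss for one sample. The choice of cross-entropy, the helper names `softmax` and `cross_entropy`, and the logits are assumptions made for illustration; the text above does not prescribe a particular loss at this point.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Loss for a single sample: small when the true class gets high probability."""
    return -math.log(softmax(logits)[label])


# Hypothetical scores for three classes; class 0 is strongly preferred.
logits = [5.0, 0.0, 0.0]
good = cross_entropy(logits, 0)   # true class matches the prediction: low loss
bad = cross_entropy(logits, 1)    # true class was given low probability: high loss
```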
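The iteration of forward and backward passes can be sketched on the smallest possible model, a single weighted layer y = w*x + b trained with a mean squared error loss. The model, the loss, the learning rate and the sample data are all illustrative assumptions; a real multi-layer network would propagate the error through every layer.

```python
def train(samples, lr=0.1, steps=500):
    """Gradient descent on y = w*x + b with a mean squared error loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Forward propagation: compute predictions with the current parameters.
        preds = [w * x + b for x, _ in samples]
        # Backward propagation: gradients of the loss w.r.t. w and b.
        grad_w = sum(2.0 * (p - y) * x for p, (x, y) in zip(preds, samples)) / n
        grad_b = sum(2.0 * (p - y) for p, (_, y) in zip(preds, samples)) / n
        # Parameter update: step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical labeled samples drawn from y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train(samples)
```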
Fig. 2 shows a flow diagram of an ANN adjustment method according to an embodiment of the present invention. Here, the last fully-connected layer of the ANN may be a classifier for classification and the normalized weight value of that layer represents the feature center of each class.
In step S210, the ANN is adjusted by a first loss function that constrains the included angle of the feature center. In step S220, the training of the ANN is completed under the condition that the distribution of the included angles of the feature centers tends to be uniform.
The ANN adjustment method provided by the invention is particularly suited to application scenarios in which there are many classes and the training data set follows an unbalanced long-tail distribution. In one embodiment, the training data input to the ANN may be an unbalanced data set in which the classes are unequally represented. In another embodiment, the ANN is a face recognition neural network.
During training of the face recognition neural network, the input activation value of the last fully connected layer serves as the feature. By normalizing the weights of the last fully connected layer, the feature center of each class, represented by the normalized weights, can be obtained. For a class with fewer pictures, the feature center of that class forms smaller included angles with the feature centers of adjacent classes, so the feature space of the class is smaller and the probability that the classifier misclassifies it increases, which is clearly a disadvantage.
FIG. 3A illustrates a prior-art example of features mapped to a two-dimensional plane under unbalanced data set training. For convenience of explanation, the drawing takes classification into ten IDs as an example. It should be understood that real applications need to classify many more categories, for example on the order of tens of thousands. As shown in the figure, the feature center of a class with more pictures forms a larger included angle with the feature centers of adjacent classes, whereas the feature center of a class with fewer pictures forms a smaller included angle with its neighbors; the feature space of such a class is smaller, and the probability of it being misclassified by the classifier is larger.
According to the adjustment scheme of the invention, a constraint can be added to the included angle between the feature center of each class and its adjacent feature centers, so that the feature centers of all classes are uniformly distributed over the whole feature space. The feature spaces of the classes are then equal in size, so the neural network can concentrate on optimizing the distance between each feature and the feature center of its class, which improves the performance of the face recognition algorithm.
FIG. 3B shows an example of features trained with the angle constraint of the present invention, mapped onto a two-dimensional plane. As shown in the figure, the feature center distribution optimized according to the invention is more uniform and compact, so the probability that the corresponding features are classified correctly can be improved accordingly.
Depending on the application scenario, the ANN can be trained directly with the first loss function that constrains the included angles of the feature centers; alternatively, the ANN may first be trained with a second loss function to determine the initial feature centers of each class before it is adjusted with the first loss function. In other words, the first loss function of the present invention, which constrains the included angles of the feature centers, may serve as a training loss function or as a fine-tuning loss function. In the fine-tuning case, a face model may first be trained with another loss function so that the features already have a certain degree of separation. The first loss function of the invention is then used for fine-tuning, which makes the distribution of the feature centers more uniform and improves the recognition result.
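The relationship between the weights of the last fully connected layer and the feature centers can be sketched as follows. The sketch assumes one weight row per class and uses a hypothetical two-dimensional weight matrix; the helper names `normalize` and `angle_deg` are illustrative.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def angle_deg(u, v):
    """Included angle between two vectors, in degrees."""
    cos = sum(a * b for a, b in zip(normalize(u), normalize(v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp rounding error


# Hypothetical weights of the last fully connected layer, one row per class.
fc_weights = [[2.0, 0.0], [3.0, 3.0], [0.0, 0.5]]

# The normalized rows are taken as the feature centers of the classes.
centers = [normalize(row) for row in fc_weights]

angle_01 = angle_deg(centers[0], centers[1])   # angle between classes 0 and 1
angle_02 = angle_deg(centers[0], centers[2])   # angle between classes 0 and 2
```

Because only the direction of each row matters after normalization, rows with very different magnitudes can still yield feature centers that are close together in angle.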
The fine tuning or training of the ANN using the first function may also be used in conjunction with other loss functions. In one embodiment, the ANN is jointly adjusted using an unconstrained third loss function and the first loss function. Preferably, the first loss function and the third loss function are given different weights during the adjustment of the ANN.
In one embodiment, the third loss function may be a loss function that gives different weights to different classes. For example, it may give sparse classes a weight of 1 and give dominant classes, whose number of data samples exceeds a certain value, a weight that is inversely related to their probability of being correctly predicted. This reduces the excessive influence of the dominant classes on network training and improves the prediction accuracy for rare classes. Such a function can be combined with the loss function of the present invention that adjusts the feature center angles, further improving the classification accuracy of the multi-class network.
In one embodiment, the weight of the first loss function increases gradually as the iteration progresses. In another embodiment, the weight of the loss constraint of the first loss function increases gradually as the iteration progresses. In other words, the adjustment scheme of the present invention can train the network in an annealing manner. Specifically, at network initialization the features are close to randomly distributed and the variance of the feature center angles is at its largest, so applying the constraint directly in training may instead cause the network to fail to converge. A weight term related to the number of iterations may therefore be added; its value increases from 0 to 1 as the number of iterations increases and eventually stabilizes at 1. In this way, once the features have acquired a preliminary distribution, the feature centers can be driven toward a uniform distribution.
In one embodiment, constraining the included angles of the feature centers comprises: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with a variance as close to zero as possible. Preferably, the feature center angle needs to be greater than a predetermined threshold to meet the minimum requirement for discrimination.
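A class-weighted loss of this kind might look like the sketch below. The text only states that the weight of a dominant class is inversely related to its probability of being correctly predicted; the specific form `1 - p_correct`, the sample-count threshold of 100 and the function names are assumptions chosen for illustration.

```python
import math


def class_weight(sample_count, p_correct, threshold):
    """Weight 1 for sparse classes; a smaller weight for well-predicted dominant classes."""
    if sample_count <= threshold:
        return 1.0
    # Assumed form: the weight falls as the probability of a correct prediction rises.
    return 1.0 - p_correct


def weighted_loss(p_correct, sample_count, threshold=100):
    """Cross-entropy of the true class, scaled by the class weight."""
    return -class_weight(sample_count, p_correct, threshold) * math.log(p_correct)


sparse = weighted_loss(0.9, sample_count=5)       # rare class keeps the full loss
dominant = weighted_loss(0.9, sample_count=1000)  # dominant class is down-weighted
```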
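The iteration-dependent weight term can be sketched as a simple ramp. The linear shape of the ramp, the ramp length of 1000 iterations and the names `anneal_weight` and `joint_loss` are assumptions; the text only requires that the weight rise from 0 to 1 and then stay at 1.

```python
def anneal_weight(iteration, ramp_iters):
    """Weight term that rises linearly from 0 to 1 and then stays at 1."""
    return min(1.0, iteration / ramp_iters)


def joint_loss(third_loss, first_loss, iteration, ramp_iters=1000):
    """Unconstrained loss plus the angle-constraint loss scaled by the annealed weight."""
    return third_loss + anneal_weight(iteration, ramp_iters) * first_loss


start = joint_loss(2.0, 4.0, iteration=0)      # constraint switched off at initialization
middle = joint_loss(2.0, 4.0, iteration=500)   # constraint at half strength
late = joint_loss(2.0, 4.0, iteration=5000)    # constraint at full strength
```

Any monotone schedule that saturates at 1 would satisfy the description equally well; the linear ramp is simply the easiest to read.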
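One way to turn this constraint into a number is sketched below. It interprets "two adjacent feature centers" as the two centers with the smallest included angle to the given center, uses the variance of the resulting feature center angles as the penalty, and adds a hinge term for the threshold. These interpretations, the function names and the two-dimensional example centers are assumptions, not the formula of the disclosure.

```python
import math


def angle(u, v):
    """Included angle between two vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))


def center_angles(centers):
    """Feature center angle: mean angle to the two nearest (adjacent) centers."""
    result = []
    for i, c in enumerate(centers):
        others = sorted(angle(c, d) for j, d in enumerate(centers) if j != i)
        result.append((others[0] + others[1]) / 2.0)
    return result


def angle_constraint_loss(centers, min_angle=0.0):
    """Variance of the feature center angles plus a penalty below the threshold."""
    angles = center_angles(centers)
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    hinge = sum(max(0.0, min_angle - a) for a in angles)
    return variance + hinge


def unit(theta_deg):
    """Unit vector in the plane at the given polar angle."""
    t = math.radians(theta_deg)
    return [math.cos(t), math.sin(t)]


uniform = [unit(d) for d in (0, 90, 180, 270)]   # evenly spread centers
skewed = [unit(d) for d in (0, 10, 20, 180)]     # three crowded centers, one isolated

loss_uniform = angle_constraint_loss(uniform)
loss_skewed = angle_constraint_loss(skewed)
```

Evenly spread centers give a variance of essentially zero, while crowded centers give a clearly positive loss, which is the behavior the constraint is meant to reward and penalize respectively.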
In one embodiment, the first loss function that constrains the included angle of the feature centers is a softmax loss function.
It is to be understood that "first", "second" and "third" are used herein to illustrate that the loss functions for applying the angle constraint, the loss functions for performing the initial training of the network, and the loss functions for performing the joint training mentioned in the present invention are not identical, but there is no provision or suggestion on the order or relationship of the three. For example, the first and second functions herein may both be softmax loss functions, but since the first function is a softmax loss function that imposes an angular constraint, it is still different from the original softmax loss function used to train the initial network.
ANN adjusting device
The above adjustment method of the present invention can be implemented by a specific ANN adjustment apparatus. Fig. 4 shows a schematic diagram of an ANN adjustment apparatus according to an embodiment of the present invention. Here, the last fully-connected layer of the ANN is the classifier for classification and the normalized weight value of that layer represents the feature center of each class.
As shown, the ANN adjustment means 400 may include an angle adjustment means 410 and an iteration means 420. The angle adjustment means 410 may be configured to adjust the ANN with a first loss function that constrains the angle of the feature center. The iterative means 420 may be used to complete the training of the ANN while making the distribution of angles between the feature centers uniform.
Preferably, the ANN adjustment means may further comprise initial training means 430 operable to train the ANN using a second loss function to determine initial feature centers of each class prior to adjusting the ANN using the first loss function.
Preferably, the ANN adjustment means may further comprise joint adjustment means 440 operable to jointly adjust the ANN using the unconstrained third loss function and the first loss function. The first loss function and the third loss function are given different weights in the adjustment of the ANN by the joint adjusting means 440. The iteration means 420 may make the weight of the first loss function gradually increase as the iteration progresses. In a parallel scheme, the iterating unit 420 may also make the weight of the loss constraint of the first loss function gradually increase as the iteration progresses.
Corresponding to the adjustment method, constraining the included angles of the feature centers may likewise include: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and causing all feature center angles to tend to be equal, with their variance as close to zero as possible.
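The feature center angle and its variance can be sketched as follows. This is an illustrative reading rather than the patented loss itself: it assumes that the "two adjacent feature centers" are the two centers with the smallest included angle to the center in question, and the function names are hypothetical. Each row of `weights` stands for the weight vector of one class in the last fully-connected layer, and at least three classes are required.

```python
import math


def normalize(v):
    """Scale a weight vector to unit length to obtain a feature center."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def angle(u, v):
    """Included angle in radians between two unit vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding error


def feature_center_angles(weights):
    """For each class, average the angles to its two nearest feature centers
    (assumed meaning of 'adjacent'). Requires at least three classes."""
    centers = [normalize(w) for w in weights]
    result = []
    for i, c in enumerate(centers):
        others = sorted(angle(c, d) for j, d in enumerate(centers) if j != i)
        result.append((others[0] + others[1]) / 2.0)
    return result


def angle_variance(angles):
    """Population variance of the feature center angles; a constraint term
    would push this value toward zero."""
    mean = sum(angles) / len(angles)
    return sum((a - mean) ** 2 for a in angles) / len(angles)
```

For four centers lying along the positive and negative coordinate axes of a plane, every feature center angle is a right angle and the variance is essentially zero; when two centers crowd together, the variance becomes positive, which is the situation the constraint penalizes.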
In one embodiment, the present invention also includes an Artificial Neural Network (ANN) deployment method for deploying a neural network model adjusted as described above on a fixed-point computing platform that at least partly comprises an FPGA, a GPU and/or an ASIC, in order to perform inference, for example for a face recognition task. The bit width used for fixed-point quantization may be determined by the bit width of the FPGA, GPU and/or ASIC.
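As a rough illustration of how a hardware bit width bounds the quantized values, the sketch below uses a common signed fixed-point scheme with saturation. The text does not specify the quantization scheme; the split into `bit_width` and `frac_bits` and both function names are assumptions for illustration only.

```python
def quantize_fixed_point(values, bit_width: int, frac_bits: int):
    """Map floats to signed fixed-point integers with `frac_bits` fractional
    bits, saturating to the range representable in `bit_width` bits."""
    scale = 1 << frac_bits
    q_min = -(1 << (bit_width - 1))
    q_max = (1 << (bit_width - 1)) - 1
    return [max(q_min, min(q_max, round(v * scale))) for v in values]


def dequantize(q_values, frac_bits: int):
    """Recover the approximate float values from fixed-point integers."""
    scale = 1 << frac_bits
    return [q / scale for q in q_values]
```

With 8 bits and 6 fractional bits, values outside roughly [-2, 2) saturate, which is one reason the position of the binary point is usually chosen per layer from the observed value range.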
The ANN may be a face recognition neural network, and the ANN as deployed does not include the last fully-connected layer. The face recognition neural network used for training further includes a penultimate fully-connected layer that outputs the extracted face feature vectors. Because the last layer is not used for output in the deployed network, the penultimate fully-connected layer used in training can serve as the output layer of the neural network model actually deployed on the hardware computing platform. The face feature vectors output by the neural network model during inference are compared with existing face features (e.g., face features stored in a database) to perform face recognition.
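The comparison step can be sketched as a nearest-match lookup. The text does not state which similarity measure is used; cosine similarity is assumed here because it matches the angle-based treatment of feature centers, and the `gallery` dictionary, the `threshold` value and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the included angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identify(feature, gallery, threshold=0.5):
    """Return the identity whose stored feature is most similar to `feature`,
    or None when even the best match falls below `threshold`."""
    best_name, best_score = None, -1.0
    for name, stored in gallery.items():
        score = cosine_similarity(feature, stored)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Usage with a toy two-dimensional gallery (real face features are much longer).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
match = identify([0.9, 0.1], gallery)
```

Because the deployed network ends at the feature-extraction layer, new identities can be enrolled by adding vectors to the gallery without retraining the classifier.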
In addition, face recognition may collect a large amount of application data in actual application scenarios, and this data can be used to fine-tune the deployed fixed-point model directly on the hardware platform, thereby achieving a hot update. Thus, in one embodiment, the deployment method of the present invention may further include using the verification of the inference to fine-tune the deployed neural network.
Those skilled in the art will appreciate that, although the above description refers specifically to face recognition tasks, the adjustment scheme of the present invention is equally applicable to other multi-class classification networks, for example for classifying vehicle models or animal species.
Fig. 5 shows a schematic structural diagram of a computing device which can be used for implementing the above adjustment method according to an embodiment of the present invention.
Referring to fig. 5, computing device 500 includes memory 510 and processor 520.
The processor 520 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 520 may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, processor 520 may be implemented using custom circuitry, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 510 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 520 or other modules of the computer. The persistent storage may be a read-write storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, the persistent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the persistent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a read-write storage device or a volatile read-write storage device, such as dynamic random access memory. The system memory may store instructions and data that some or all of the processors require at runtime. Further, the memory 510 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be employed. In some embodiments, memory 510 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, or the like. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.
The memory 510 has executable code stored thereon which, when executed by the processor 520, causes the processor 520 to perform the neural network adjustment method described above.
In actual use, the computing device 500 may be a general-purpose computing device that includes a mass storage device 510 and a CPU 520 and is used to train the neural network. The neural network for classification obtained according to the adjustment scheme of the present invention may be executed on a fixed-point computing platform implemented at least in part by an FPGA, a GPU and/or an ASIC.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (23)
1. A method of Artificial Neural Network (ANN) tuning, wherein a last fully-connected layer of the ANN is a classifier for classification and normalized weights for that layer represent feature centers of the classes, the method comprising:
adjusting the ANN with a first loss function that constrains the included angles between the feature centers; and
completing the training of the ANN under the condition that the distribution of included angles between the feature centers tends to be uniform.
2. The method of claim 1, further comprising:
before adjusting the ANN using the first loss function, training the ANN using a second loss function to determine initial feature centers of each class.
3. The method of claim 1, wherein constraining the included angles of the feature centers comprises:
averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and
causing all feature center angles to tend to be equal, with their variance as close to zero as possible.
4. The method of claim 3, wherein the feature center angle is required to be greater than a predetermined threshold.
5. The method of claim 1, wherein the first loss function that constrains the included angle of the feature centers is a softmax loss function.
6. The method of claim 1, further comprising:
jointly adjusting the ANN using an unconstrained third loss function and the first loss function.
7. The method of claim 6, wherein the first and third loss functions are given different weights during the adjustment of the ANN.
8. The method of claim 6, wherein the weight of the first loss function gradually increases as the iteration progresses.
9. The method of claim 1, wherein the weight of the loss constraint of the first loss function gradually increases as the iteration progresses.
10. The method of claim 1, wherein the training data input to the ANN is an unbalanced data set in which the data are unevenly distributed across classes.
11. The method of claim 1, wherein the ANN is a face recognition neural network.
12. An Artificial Neural Network (ANN) adjustment apparatus, wherein a last fully-connected layer of the ANN is a classifier for classification and normalized weights of that layer represent feature centers of each class, the apparatus comprising:
an included angle adjustment means for adjusting the ANN with a first loss function that constrains the included angles between the feature centers; and
an iteration means for completing the training of the ANN under the condition that the distribution of included angles between the feature centers tends to be uniform.
13. The apparatus of claim 12, further comprising:
an initial training means for training the ANN using a second loss function, before the ANN is adjusted using the first loss function, to determine initial feature centers of each class.
14. The apparatus of claim 12, wherein constraining the included angles of the feature centers comprises:
averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and
causing all feature center angles to tend to be equal, with their variance as close to zero as possible.
15. The apparatus of claim 12, further comprising:
a joint adjustment means for jointly adjusting the ANN using an unconstrained third loss function and the first loss function.
16. The apparatus of claim 15, wherein the first and third loss functions are given different weights in the adjustment of the ANN by the joint adjustment means.
17. The apparatus of claim 15, wherein the iterating means causes the weight of the first loss function to gradually increase as the iteration progresses.
18. The apparatus of claim 12, wherein the iterating means causes the weight of the loss constraint of the first loss function to gradually increase as the iteration progresses.
19. An Artificial Neural Network (ANN) deployment method, comprising:
deploying the neural network model of any one of claims 1-11 on a fixed-point computing platform comprising at least in part an FPGA, a GPU, and/or an ASIC to perform inference.
20. The method of claim 19, wherein the ANN is a face recognition neural network, and the ANN as deployed does not include the last fully-connected layer.
21. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-11.
22. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-11.
23. A fixed-point computing platform, composed at least in part of an FPGA, a GPU and/or an ASIC, for deploying an artificial neural network obtained according to the method of any one of claims 1-11 to perform inference computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810608259.2A CN110598723B (en) | 2018-06-13 | 2018-06-13 | Artificial neural network adjusting method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110598723A (en) | 2019-12-20 |
CN110598723B (en) | 2023-12-12 |
Family
ID=68849618
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810608259.2A Active CN110598723B (en) | 2018-06-13 | 2018-06-13 | Artificial neural network adjusting method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110598723B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170068887A1 (en) * | 2014-09-26 | 2017-03-09 | Samsung Electronics Co., Ltd. | Apparatus for classifying data using boost pooling neural network, and neural network training method therefor |
CN106682734A (en) * | 2016-12-30 | 2017-05-17 | 中国科学院深圳先进技术研究院 | Method and apparatus for increasing generalization capability of convolutional neural network |
CN107103281A (en) * | 2017-03-10 | 2017-08-29 | 中山大学 | Face identification method based on aggregation Damage degree metric learning |
CN107886062A (en) * | 2017-11-03 | 2018-04-06 | 北京达佳互联信息技术有限公司 | Image processing method, system and server |
CN108009625A (en) * | 2016-11-01 | 2018-05-08 | 北京深鉴科技有限公司 | Method for trimming and device after artificial neural network fixed point |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113408715A (en) * | 2020-03-17 | 2021-09-17 | 杭州海康威视数字技术股份有限公司 | Fixed-point method and device for neural network |
WO2021185125A1 (en) * | 2020-03-17 | 2021-09-23 | 杭州海康威视数字技术股份有限公司 | Fixed-point method and apparatus for neural network |
Also Published As
Publication number | Publication date |
---|---|
CN110598723B (en) | 2023-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20210089922A1 (en) | Joint pruning and quantization scheme for deep neural networks | |
CN110413255B (en) | Artificial neural network adjusting method and device | |
US10275719B2 (en) | Hyper-parameter selection for deep convolutional networks | |
US10373050B2 (en) | Fixed point neural network based on floating point neural network quantization | |
US20190164057A1 (en) | Mapping and quantification of influence of neural network features for explainable artificial intelligence | |
US20190332941A1 (en) | Learning a truncation rank of singular value decomposed matrices representing weight tensors in neural networks | |
US11574198B2 (en) | Apparatus and method with neural network implementation of domain adaptation | |
US11586924B2 (en) | Determining layer ranks for compression of deep networks | |
US11100374B2 (en) | Apparatus and method with classification | |
US20190354865A1 (en) | Variance propagation for quantization | |
US11704556B2 (en) | Optimization methods for quantization of neural network models | |
US20200265307A1 (en) | Apparatus and method with multi-task neural network | |
US20220108180A1 (en) | Method and apparatus for compressing artificial neural network | |
CN110598837A (en) | Artificial neural network adjusting method and device | |
CN110555340A (en) | neural network computing method and system and corresponding dual neural network implementation | |
US11449758B2 (en) | Quantization and inferencing for low-bitwidth neural networks | |
CN112132255A (en) | Batch normalization layer fusion and quantification method for model inference in artificial intelligence neural network engine | |
EP3882823A1 (en) | Method and apparatus with softmax approximation | |
US11694301B2 (en) | Learning model architecture for image data semantic segmentation | |
CN110598723B (en) | Artificial neural network adjusting method and device | |
CN110633722B (en) | Artificial neural network adjusting method and device | |
US11347968B2 (en) | Image enhancement for realism | |
CN110610227B (en) | Artificial neural network adjusting method and neural network computing platform | |
WO2021158830A1 (en) | Rounding mechanisms for post-training quantization | |
WO2023220892A1 (en) | Expanded neural network training layers for convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20200908. Address after: Unit 01-19, 10 / F, 101, 6 / F, building 5, yard 5, Anding Road, Chaoyang District, Beijing 100029. Applicant after: Xilinx Electronic Technology (Beijing) Co.,Ltd. Address before: 100083, 17 floor, four building four, 1 Wang Zhuang Road, Haidian District, Beijing. Applicant before: BEIJING DEEPHI INTELLIGENT TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | ||