CN110598723B - Artificial neural network adjusting method and device

Artificial neural network adjusting method and device

Info

Publication number
CN110598723B
Authority
CN
China
Prior art keywords
neural network
network model
loss function
feature
centers
Prior art date
Legal status
Active
Application number
CN201810608259.2A
Other languages
Chinese (zh)
Other versions
CN110598723A (en)
Inventor
高梓桁
Current Assignee
Xilinx Technology Beijing Ltd
Original Assignee
Xilinx Technology Beijing Ltd
Priority date
Filing date
Publication date
Application filed by Xilinx Technology Beijing Ltd filed Critical Xilinx Technology Beijing Ltd
Priority to CN201810608259.2A priority Critical patent/CN110598723B/en
Publication of CN110598723A publication Critical patent/CN110598723A/en
Application granted granted Critical
Publication of CN110598723B publication Critical patent/CN110598723B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Abstract

An Artificial Neural Network (ANN) tuning method and apparatus are presented. The last fully connected layer of the ANN is a classifier, and the normalized weights of that layer represent the feature center of each class. The method comprises: adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and completing training of the ANN once the distribution of the included angles of the feature centers becomes uniform. By using a loss function to constrain the angles of the feature centers, the application makes the included angles between the feature centers of the classes more uniform, thereby mitigating the adverse effect on classification results caused by insufficient distinction between intra-class and inter-class deviation.

Description

Artificial neural network adjusting method and device
Technical Field
The application relates to deep learning, in particular to a method and a device for adjusting an artificial neural network.
Background
In recent years, Artificial Neural Networks (ANNs) have made significant progress in fields such as object detection and image classification. In practice, however, the classes of labeled data are often imbalanced. When there are many classes to distinguish, as in a face recognition task, this class imbalance can strongly degrade the classification result because intra-class and inter-class deviations are not sufficiently separated.
Various improvements have been proposed to address this, such as data augmentation and local feature labeling. However, these approaches still fail to solve the inaccurate classification results caused by class imbalance in the labeled data.
In view of this, there remains a need for an improved neural network tuning method.
Disclosure of Invention
By using a loss function to constrain the angles of the feature centers, the application makes the included angles between the feature centers of the classes more uniform, thereby effectively mitigating the adverse effect on classification results caused by insufficient distinction between intra-class and inter-class deviation.
According to one aspect of the present application, an Artificial Neural Network (ANN) tuning method is provided, wherein the last fully connected layer of the ANN is a classifier and the normalized weights of that layer represent the feature center of each class. The method comprises: adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and completing training of the ANN once the distribution of the included angles of the feature centers becomes uniform. This removes the inaccurate classification caused by an uneven distribution of feature center angles, for example due to an unbalanced training data set.
The first loss function that constrains the angles of the feature centers may be a common softmax loss function. The training data fed into the ANN may be an unbalanced data set in which the classes are unevenly distributed, and the ANN may be a face recognition neural network.
Preferably, the adjusting method may further include: before adjusting the ANN with the first loss function, training the ANN with a second loss function to determine an initial feature center for each class. The angle-constraint loss function can therefore be used either for training the network from scratch or for fine-tuning it, as required.
During the training and/or fine-tuning described above, the ANN may be adjusted jointly by an unconstrained third loss function and the first loss function, and the two loss functions may be given different weights during the adjustment. Preferably, the weight of the first loss function increases gradually as the iterations proceed. Alternatively, the weight of the loss constraint within the first loss function increases gradually as the iterations proceed. The timing and strength of the angle constraint can thus be chosen flexibly for the practical application.
Constraining the included angles of the feature centers may include: taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and driving all feature center angles toward equality so that their variance approaches zero. Preferably, each feature center angle must also be greater than a predetermined threshold.
According to another aspect of the present application, an Artificial Neural Network (ANN) tuning apparatus is provided, wherein the last fully connected layer of the ANN is a classifier and the normalized weights of that layer represent the feature center of each class. The apparatus comprises: an angle adjusting device for adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and an iteration device for completing training of the ANN once the distribution of the included angles of the feature centers becomes uniform.
Preferably, the apparatus may further include an initial training device for training the ANN with a second loss function, before the ANN is adjusted with the first loss function, so as to determine an initial feature center for each class.
Preferably, the apparatus may further include a joint adjustment device for jointly adjusting the ANN with an unconstrained third loss function and the first loss function. The first and third loss functions may be given different weights while the joint adjustment device adjusts the ANN.
The iteration device may cause the weight of the first loss function to increase gradually as the iterations proceed. Alternatively or additionally, it may cause the weight of the loss constraint within the first loss function to increase gradually as the iterations proceed.
Constraining the included angles of the feature centers may include: taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and driving all feature center angles toward equality so that their variance approaches zero. Preferably, each feature center angle must also be greater than a predetermined threshold.
According to one aspect of the present application, an Artificial Neural Network (ANN) deployment method is presented, comprising: deploying the neural network model adjusted as described above on a fixed-point computing platform, implemented at least in part by an FPGA, GPU and/or ASIC, to perform inference. The ANN may be a face recognition neural network, and the deployed ANN does not include the last fully connected layer.
According to yet another aspect of the present application, there is provided a computing device comprising: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the face recognition neural network adjustment method described above.
According to another aspect of the present application, a non-transitory machine-readable storage medium is presented, having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the face recognition neural network tuning method described above.
According to a further aspect of the application, a fixed-point computing platform is proposed, which is at least partly constituted by an FPGA, a GPU and/or an ASIC, for performing an inference calculation based on a fixed-point neural network model obtained according to the above method.
The ANN adjusting method and device of the application constrain the angles of the feature centers so that the included angles between the feature centers of the classes become more uniform, which mitigates the adverse effect on classification results caused by insufficient distinction between intra-class and inter-class deviation and is particularly suitable when the training data set is unbalanced. The angle constraint may be imposed through a loss function, and the timing and weight of its addition may be chosen for the specific application, achieving a more uniform distribution of feature center angles while ensuring normal convergence of the network.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout exemplary embodiments of the disclosure.
Fig. 1 shows a schematic diagram of a typical CNN.
Fig. 2 shows a flow chart of an ANN adjustment method according to an embodiment of the present application.
Figs. 3A and 3B illustrate examples of features mapped to a two-dimensional plane after training on an unbalanced data set, before and after optimization with the adjustment scheme of the present application.
Fig. 4 shows a schematic diagram of an ANN adjustment device according to an embodiment of the present application.
FIG. 5 illustrates a schematic diagram of a computing device that may be used to implement the adjustment methods described above, according to one embodiment of the application.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The scheme of the application is applicable to various Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). A background explanation is given below with CNN as an example.
CNN basic concept
CNNs achieve state-of-the-art performance in a wide range of vision-related tasks. To aid understanding of the CNN-based classification algorithms (e.g., face recognition algorithms) analyzed in the present application, the basics of CNNs and the basic concept of fixed-point quantization are first introduced.
As shown in fig. 1, a typical CNN consists of a series of layers that run in order.
A CNN consists of an input layer, an output layer and a number of hidden layers connected in series. The first layer reads the input values, such as an input image, and outputs a series of activation values (also referred to as feature maps). Each subsequent layer reads the activation values produced by the previous layer and outputs new activation values. The final classifier outputs the probability of each class to which the input image may belong.
These layers can be broadly divided into weighted layers (e.g., convolutional layers, fully connected layers, batch normalization layers) and unweighted layers (e.g., pooling layers, ReLU layers, softmax layers). A convolutional (CONV) layer takes a series of feature maps as input and convolves them with convolution kernels to obtain output activation values. A pooling layer is typically connected to a CONV layer and outputs the maximum or average value of each sub-area in each feature map, reducing the computational effort by sub-sampling while maintaining some degree of invariance to displacement, scale and deformation. A CNN may alternate several times between convolutional and pooling layers, progressively reducing the spatial resolution while increasing the number of feature maps, and may then be connected to at least one fully connected layer, which applies a linear transformation to the input feature vector to produce a one-dimensional output vector of feature values.
In general, the operation of the weighted layers can be expressed as:
Y=WX+b,
wherein W is a weight value, b is a bias, X is an input activation value, and Y is an output activation value.
The operation of the unweighted layers can be expressed as:
Y=f(X),
wherein f (X) is a nonlinear function.
Here, "weights" refers to the parameters of the hidden layers (broadly including biases); they are learned during training and remain unchanged at inference time. Activation values, also called feature values, are the values passed between layers, computed from the input values and the weights, from the input layer to the output of each layer. Unlike the weights, the distribution of the activation values changes dynamically with the input data samples.
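As a purely illustrative sketch (not part of the patent), the weighted operation Y=WX+b and the unweighted operation Y=f(X) can be written out as follows; the shapes and values are arbitrary.

import numpy as np

def weighted_layer(x, w, b):
    # e.g. a fully connected layer: learned weight matrix w and bias b
    return w @ x + b

def unweighted_layer(x):
    # e.g. a ReLU layer: a fixed nonlinear function with no learned parameters
    return np.maximum(x, 0.0)

x = np.array([1.0, -2.0, 3.0])    # input activation values
w = np.random.randn(4, 3) * 0.1   # weight matrix of the hidden layer
b = np.zeros(4)                   # bias
y = unweighted_layer(weighted_layer(x, w, b))
print(y.shape)  # (4,)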
Before a CNN can be used for inference (e.g., image classification), it must first be trained. The parameters of the layers of the neural network model, such as weights and biases, are determined by feeding in large amounts of training data.
Training of CNN
A trained model represents the ideal values of all weights and biases as learned (determined) from labeled samples. These determined weights and biases enable high-accuracy inference on the input feature values at the deployment stage, for example, correct classification of input pictures.
In supervised learning, machine learning algorithms learn parameters by examining multiple samples and attempting to find a model that minimizes losses, a process known as empirical risk minimization.
The loss is a penalty for a bad prediction; that is, it is a value that quantifies how inaccurate the model's prediction is for a single sample. If the prediction is completely accurate the loss is zero; otherwise the loss is larger. The goal of training is to find a set of weights and biases whose average loss over all samples is low.
During training, a loss function is defined to quantify how well the current weights and biases fit all of the training inputs. The goal of training the network thus becomes minimizing this loss function over the weights and biases. Typically, this minimization is performed with a gradient descent algorithm (for multi-layer neural networks, via the back-propagation algorithm).
The back-propagation algorithm involves an iterative process of forward and backward passes. In the forward pass, neurons in adjacent layers are connected through weight matrices, so the activations (feature values) are propagated from one layer to the next through each layer's activation function. In the backward pass, the error of a layer is derived from the error of the following layer. By iterating the forward and backward passes, the weights and biases are continuously adjusted so that the loss function gradually approaches its minimum, completing the training of the neural network.
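As a generic sketch of the forward/backward iteration described above (assuming a PyTorch-style model, loss function and optimizer, none of which are specified by the patent):

import torch

def train_step(model, loss_fn, optimizer, x, y):
    optimizer.zero_grad()
    logits = model(x)          # forward pass: activations flow layer by layer
    loss = loss_fn(logits, y)  # the loss quantifies how well weights and biases fit the data
    loss.backward()            # backward pass: errors propagate back through the layers
    optimizer.step()           # gradient descent update of weights and biases
    return loss.item()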
ANN adjustment scheme of the application
In recent years, Artificial Neural Networks (ANNs) have made significant progress in fields such as object detection and image classification, but class imbalance in the labeled data often occurs during network training. When there are many classes to distinguish, as in a face recognition task, this class imbalance can strongly degrade the classification result because intra-class and inter-class deviations are not sufficiently separated.
For example, in practical face recognition scenarios, the face data that can be obtained typically follows an unbalanced long-tail distribution, and the number of pictures per ID (one ID corresponding to one face) ranges from a single image to hundreds. For a data-driven neural network this is severe: adding more data does not necessarily improve algorithm performance and may even have the opposite effect. Face recognition on unbalanced data sets is therefore a particularly pressing problem.
Various algorithms have been proposed to address this problem, mainly involving data augmentation and local feature labeling. However, these solutions still fail to remove the inaccurate classification results caused by class imbalance in the labeled data.
In view of this, the present application proposes an improvement to the loss function. Unlike other methods that operate in the Euclidean distance domain, the present application works in the angular domain. Compared with the Euclidean distance domain, the angular domain is simpler and more natural and is closer to the essence of the feature distribution, so better performance can be obtained.
Fig. 2 shows a flow chart of an ANN adjustment method according to an embodiment of the present application. Here, the last fully connected layer of the ANN may be a classifier and the normalized weights of that layer represent the feature centers of the classes.
In step S210, the ANN is adjusted with a first loss function that constrains the included angles of the feature centers. In step S220, training of the ANN is completed once the distribution of the included angles of the feature centers becomes uniform.
The ANN adjustment method of the application targets in particular scenarios with many classes and a training data set with an unbalanced long-tail distribution. In one embodiment, the training data fed into the ANN is an unbalanced data set in which the classes are unevenly distributed. In another embodiment, the ANN is a face recognition neural network.
During training of a face recognition neural network, the activation values fed into the last fully connected layer serve as the features. By normalizing the weights of the last fully connected layer, the feature center represented by each normalized weight vector is obtained. For a class with fewer pictures, the included angle between its feature center and those of adjacent classes is smaller, so its feature space is smaller and the probability of that class being misclassified by the classifier increases, which is clearly disadvantageous.
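As a sketch only, assuming a PyTorch-style weight matrix for the last fully connected layer (the names num_classes, feat_dim and fc_weight are illustrative, not from the patent), the feature centers and the included angle between any two of them can be obtained as follows.

import torch
import torch.nn.functional as F

num_classes, feat_dim = 10, 128
fc_weight = torch.randn(num_classes, feat_dim)  # weights of the last fully connected layer

# L2-normalize each class's weight vector; row i is the feature center of class i
feature_centers = F.normalize(fc_weight, p=2, dim=1)

def center_angle(i, j):
    # included angle (in radians) between the feature centers of classes i and j
    cos_ij = torch.clamp(feature_centers[i] @ feature_centers[j], -1.0, 1.0)
    return torch.acos(cos_ij)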
FIG. 3A illustrates an example of features mapped to a two-dimensional plane after training on an unbalanced data set in the prior art. For convenience of explanation, the classifier in the figure distinguishes ten IDs; in real applications there are many more classes, for example on the order of tens of thousands. As shown in the figure, a class with more pictures has a larger included angle between its feature center and those of adjacent classes, while a class with fewer pictures has a smaller included angle, a smaller feature space, and a higher probability of being misclassified by the classifier.
According to the adjustment scheme of the application, a constraint can be added on the included angle between each class's feature center and its adjacent feature centers, so that the feature centers of all classes are distributed uniformly over the whole feature space. The feature space of each class then has the same size, so the neural network can concentrate on optimizing the distance between each feature and its class's feature center, which further improves the performance of the face recognition algorithm.
FIG. 3B shows an example of features mapped to a two-dimensional plane after training with the angle constraint of the present application. As shown in the figure, the optimized feature center distribution is more uniform and the features are more compact, so the probability of classifying the corresponding features correctly improves accordingly.
Depending on the application scenario, the ANN can be trained directly with the first loss function that constrains the included angles of the feature centers, or it can first be trained with a second loss function to determine an initial feature center for each class before being adjusted with the first loss function. In other words, the angle-constraining first loss function of the present application can serve either as a training loss or as a fine-tuning loss. In the fine-tuning case, a face model is first trained with another loss function to provide a degree of separation between features; the first loss function is then used for fine-tuning, making the distribution of the feature centers more uniform and improving the recognition result.
Training or fine-tuning the ANN with the first loss function may also be combined with other loss functions. In one embodiment, the ANN is adjusted jointly by an unconstrained third loss function and the first loss function. Preferably, the first and third loss functions are given different weights during the adjustment.
In one embodiment, the third loss function gives different weights to different classes. For example, it may weight rare classes by 1 and weight dominant classes (those whose number of data samples exceeds a particular value) by a factor inversely related to their probability of being predicted correctly. This reduces the excessive influence of the dominant classes on network training and improves prediction accuracy on the sparse classes. Such a function can be combined with the loss function that adjusts the feature center angles, further improving the classification accuracy of the multi-class network.
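A hedged sketch of this per-class weighting follows; the count threshold and the exact inverse relation are illustrative assumptions rather than values from the patent.

import torch
import torch.nn.functional as F

def class_weighted_loss(logits, labels, class_counts, count_threshold=1000):
    probs = torch.softmax(logits, dim=1)
    p_correct = probs[torch.arange(labels.size(0)), labels]  # probability of the true class
    weights = torch.ones_like(p_correct)                     # rare classes keep weight 1
    dominant = class_counts[labels] > count_threshold        # samples belonging to dominant classes
    # dominant classes are down-weighted the more confidently they are already predicted
    weights[dominant] = (1.0 - p_correct[dominant]).detach()
    ce = F.cross_entropy(logits, labels, reduction="none")
    return (weights * ce).mean()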
In one embodiment, the weight of the first loss function increases gradually as the iterations proceed. In another embodiment, the weight of the loss constraint within the first loss function increases gradually as the iterations proceed. In other words, the adjustment scheme of the application can train the network in an annealing manner. Specifically, at network initialization the features are roughly randomly distributed and the variance of the feature center angles is largest, so applying the constraint directly may instead cause the network to fail to converge. A weight term tied to the number of iterations can therefore be added, increasing from 0 to 1 as the iterations accumulate and finally stabilizing at 1. This lets the feature centers become uniformly distributed after the features have taken on a preliminary distribution.
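A minimal sketch of this annealing schedule (the ramp length ramp_iters is an illustrative assumption, not a value from the patent):

def constraint_weight(iteration, ramp_iters=10000):
    # ramps from 0 to 1 as training iterations accumulate, then stays at 1
    return min(1.0, iteration / ramp_iters)

# e.g. total_loss = unconstrained_loss + constraint_weight(it) * angle_constraint_loss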
In one embodiment, constraining the included angles of the feature centers includes: taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and driving all feature center angles toward equality so that their variance approaches zero. Preferably, each feature center angle must also be greater than a predetermined threshold to meet a minimum requirement for separability.
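As an illustrative sketch of the constraint above (one possible reading, not the patent's reference implementation): the interpretation of "adjacent" as the two nearest centers by angle, and the optional minimum-angle term, are assumptions.

import math
import torch
import torch.nn.functional as F

def feature_center_angle_loss(fc_weight, min_angle=None):
    # fc_weight: (C, D) weights of the last fully connected layer
    centers = F.normalize(fc_weight, p=2, dim=1)               # per-class feature centers
    cos = (centers @ centers.t()).clamp(-1 + 1e-7, 1 - 1e-7)   # pairwise cosines
    angles = torch.acos(cos)                                    # pairwise included angles
    # mask each center's zero angle to itself before looking for its neighbours
    eye = torch.eye(angles.size(0), dtype=torch.bool, device=angles.device)
    angles = angles.masked_fill(eye, math.pi)
    # the two smallest angles per center are taken as the angles to its two adjacent centers
    nearest_two, _ = torch.topk(angles, k=2, dim=1, largest=False)
    center_angle = nearest_two.mean(dim=1)                      # feature center angle of each class
    loss = center_angle.var()                                   # push all center angles toward equality
    if min_angle is not None:
        # optional penalty keeping every center angle above a predetermined threshold
        loss = loss + F.relu(min_angle - center_angle).mean()
    return loss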
In one embodiment, the first loss function that constrains the included angle of the feature centers is a softmax loss function.
It should be understood that the terms "first," "second," and "third" are used here only to indicate that the loss function applying the angle constraint, the loss function used for the initial training of the network, and the loss function used for joint training are not exactly the same; they do not specify or imply any order or relationship among the three. For example, the first and second loss functions may both be softmax loss functions, but because the first one is a softmax loss function that imposes an angle constraint, it still differs from the plain softmax loss function used to train the initial network.
ANN adjusting device
The above adjustment method of the present application can be implemented by a specific ANN adjustment device. Fig. 4 shows a schematic diagram of an ANN adjustment device according to an embodiment of the present application. Here, the last fully connected layer of the ANN is a classifier for classification and the normalized weights of that layer represent the feature centers of the classes.
As shown, the ANN adjustment apparatus 400 may comprise an angle adjustment device 410 and an iteration device 420. The angle adjustment device 410 may be used to adjust the ANN with a first loss function constraining the included angles of the feature centers. The iteration device 420 may be used to complete training of the ANN once the distribution of the included angles of the feature centers becomes uniform.
Preferably, the ANN adjustment apparatus may further comprise an initial training device 430, which may be used to train the ANN with a second loss function, before the ANN is adjusted with the first loss function, so as to determine an initial feature center for each class.
Preferably, the ANN adjustment apparatus may further comprise a joint adjustment device 440, which may be used to jointly adjust the ANN with an unconstrained third loss function and the first loss function. The first and third loss functions may be given different weights while the joint adjustment device 440 adjusts the ANN. The iteration device 420 may cause the weight of the first loss function to increase gradually as the iterations proceed. Alternatively or additionally, the iteration device 420 may cause the weight of the loss constraint within the first loss function to increase gradually as the iterations proceed.
As in the adjustment method, constraining the included angles of the feature centers may include: taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and driving all feature center angles toward equality so that their variance approaches zero.
In one embodiment, the application further comprises an Artificial Neural Network (ANN) deployment method for deploying the neural network model obtained by the adjustment described above on a fixed-point computing platform, implemented at least in part by an FPGA, GPU and/or ASIC, to perform inference, for example a face recognition task. The bit width of the fixed-point quantization may be determined by the bit width supported by the FPGA, GPU and/or ASIC.
The ANN may be a face recognition neural network, and the deployed ANN does not include the last fully connected layer. The face recognition neural network used for training also contains a penultimate fully connected layer that outputs the extracted face feature vector. Since the last layer is dropped, this penultimate fully connected layer from training serves as the output layer of the neural network model actually deployed on the hardware computing platform. The face feature vector output by the model during inference is compared with existing face features (e.g., face features stored in a database) to perform face recognition.
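As a sketch only of the deployed inference path (feature_extractor stands for the trained network truncated after the penultimate fully connected layer; the cosine-similarity comparison and the threshold value are assumptions, not taken from the patent):

import torch
import torch.nn.functional as F

def recognize(feature_extractor, image, gallery_features, threshold=0.5):
    feat = F.normalize(feature_extractor(image), dim=-1)  # (1, D) face feature from the penultimate layer
    gallery = F.normalize(gallery_features, dim=-1)       # (N, D) face features stored in a database
    sims = (gallery @ feat.t()).squeeze(1)                # cosine similarity to each stored identity
    best = int(torch.argmax(sims))
    score = float(sims[best])
    return (best, score) if score > threshold else (None, score)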
In addition, a face recognition system collects a large amount of application data in its actual deployment scenario, and the deployed fixed-point model can be fine-tuned directly on the hardware platform with this data, achieving a hot-update effect. Thus, in one embodiment, the deployment method of the present application may further comprise using the verified inference results to fine-tune the deployed neural network.
It will be apparent to those skilled in the art that, while described above mainly in connection with face recognition tasks, the adjustment scheme of the present application is equally applicable to other multi-class networks, such as classification of vehicle models or animal species.
FIG. 5 illustrates a schematic diagram of a computing device that may be used to implement the adjustment methods described above, according to one embodiment of the application.
Referring to fig. 5, a computing device 500 includes a memory 510 and a processor 520.
Processor 520 may be a multi-core processor or may include multiple processors. In some embodiments, processor 520 may comprise a general-purpose host processor and one or more special coprocessors, such as a Graphics Processing Unit (GPU) or a Digital Signal Processor (DSP). In some embodiments, processor 520 may be implemented using custom circuitry, for example an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
Memory 510 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by processor 520 or other modules of the computer. The persistent storage may be a readable and writable, non-volatile device that does not lose stored instructions and data even after the computer is powered down. In some embodiments, the persistent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory). In other embodiments, the persistent storage may be a removable storage device (e.g., a diskette or an optical drive). The system memory may be a read-write memory device or a volatile read-write memory device, such as dynamic random access memory, and may store instructions and data needed by some or all of the processors at runtime. Furthermore, memory 510 may comprise any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some embodiments, memory 510 may include a readable and/or writable removable storage device such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density disc, a flash memory card (e.g., an SD card, a mini SD card, a micro-SD card), or a magnetic floppy disk. Computer-readable storage media do not contain carrier waves or transient electronic signals transmitted wirelessly or over wires.
The memory 510 has executable code stored thereon which, when executed by the processor 520, causes the processor 520 to perform the neural network tuning method described above.
In actual use, the computing device 500 may be a general purpose computing device comprising mass storage 510 and CPU 520, the device being configured to perform training of neural networks. The neural network for classification obtained according to the tuning scheme of the present application may be executed on a fixed-point computing platform implemented at least in part by an FPGA, GPU, and/or ASIC.
Furthermore, the method according to the application may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the steps defined in the above-mentioned method of the application.
Alternatively, the application may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the application.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of embodiments of the application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (19)

1. A neural network model tuning method, wherein the neural network model is used for classifying pictures, a last fully connected layer is a classifier for classification, and the normalized weights of that layer represent the feature center of each class, the method comprising:
adjusting the neural network model with a first loss function that constrains the included angles of the feature centers in the angular domain; and
completing the adjustment of the neural network model once the distribution of the included angles of the feature centers in the angular domain becomes uniform,
wherein the method comprises the following steps:
training the neural network model using a second loss function to determine initial feature centers for each class prior to tuning the neural network model using the first loss function,
wherein constraining the included angles of the feature centers includes:
taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and
making all feature center angles tend to be equal with a variance as close to zero as possible.
2. The method of claim 1, wherein the feature center angle is required to be greater than a predetermined threshold.
3. The method of claim 1, wherein the first loss function that constrains the included angle of the feature centers is a softmax loss function.
4. The method of claim 1, further comprising:
jointly tuning the neural network model using an unconstrained third loss function and the first loss function.
5. The method of claim 4, wherein the first and third loss functions are weighted differently during the adjustment of the neural network model.
6. The method of claim 4, wherein the weight of the first loss function increases gradually with increasing iteration.
7. The method of claim 1, wherein the weight of the loss constraint of the first loss function gradually increases as the iteration deepens.
8. The method of claim 1, wherein the training data input to the neural network model is an unbalanced data set with unequal data distribution of the classes.
9. The method of claim 1, wherein the neural network model is a face recognition neural network model.
10. A neural network model tuning apparatus, wherein the neural network model is used for classifying pictures, a last fully connected layer is a classifier for classification, and the normalized weights of that layer represent the feature center of each class, the apparatus comprising:
an angle adjusting device for adjusting the neural network model with a first loss function that constrains the included angles of the feature centers in the angular domain;
an iteration device for completing the adjustment of the neural network model once the distribution of the included angles of the feature centers in the angular domain becomes uniform; and
an initial training device for training the neural network model with a second loss function to determine an initial feature center for each class prior to adjusting the neural network model with the first loss function,
wherein constraining the included angles of the feature centers includes:
taking the average of the included angles between each feature center and its two adjacent feature centers as that feature center's angle; and
making all feature center angles tend to be equal with a variance as close to zero as possible.
11. The apparatus of claim 10, further comprising:
a joint adjustment device for jointly adjusting the neural network model using an unconstrained third loss function and the first loss function.
12. The apparatus of claim 11, wherein the first and third loss functions are weighted differently during an adjustment of the neural network model by the joint adjustment apparatus.
13. The apparatus of claim 11, wherein the iterating means causes the weight of the first loss function to gradually increase as the iteration deepens.
14. The apparatus of claim 10, wherein the iterating means causes a weight of a loss constraint of the first loss function to gradually increase as an iteration deepens.
15. A neural network model deployment method, comprising:
deploying the neural network model adjusted by the method of any of claims 1-9 on a fixed-point computing platform, implemented at least in part by an FPGA, GPU and/or ASIC, to perform inference.
16. The method of claim 15, wherein the neural network model is a face recognition neural network model and the penultimate fully connected layer from training is used as the output layer of the neural network model actually deployed on the hardware computing platform.
17. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-9.
18. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-9.
19. A fixed-point computing platform, implemented at least in part by FPGAs, GPUs and/or ASICs, for performing inference computation based on a neural network model obtained by the method of any of claims 1-9.
CN201810608259.2A 2018-06-13 2018-06-13 Artificial neural network adjusting method and device Active CN110598723B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810608259.2A CN110598723B (en) 2018-06-13 2018-06-13 Artificial neural network adjusting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810608259.2A CN110598723B (en) 2018-06-13 2018-06-13 Artificial neural network adjusting method and device

Publications (2)

Publication Number Publication Date
CN110598723A CN110598723A (en) 2019-12-20
CN110598723B (en) 2023-12-12

Family

ID=68849618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810608259.2A Active CN110598723B (en) 2018-06-13 2018-06-13 Artificial neural network adjusting method and device

Country Status (1)

Country Link
CN (1) CN110598723B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408715A (en) * 2020-03-17 2021-09-17 杭州海康威视数字技术股份有限公司 Fixed-point method and device for neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682734A (en) * 2016-12-30 2017-05-17 中国科学院深圳先进技术研究院 Method and apparatus for increasing generalization capability of convolutional neural network
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN107886062A (en) * 2017-11-03 2018-04-06 北京达佳互联信息技术有限公司 Image processing method, system and server
CN108009625A (en) * 2016-11-01 2018-05-08 北京深鉴科技有限公司 Method for trimming and device after artificial neural network fixed point

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102445468B1 (en) * 2014-09-26 2022-09-19 Samsung Electronics Co., Ltd. Apparatus for data classification based on boost pooling neural network, and method for training the appatratus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009625A (en) * 2016-11-01 2018-05-08 北京深鉴科技有限公司 Method for trimming and device after artificial neural network fixed point
CN106682734A (en) * 2016-12-30 2017-05-17 中国科学院深圳先进技术研究院 Method and apparatus for increasing generalization capability of convolutional neural network
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN107886062A (en) * 2017-11-03 2018-04-06 北京达佳互联信息技术有限公司 Image processing method, system and server

Also Published As

Publication number Publication date
CN110598723A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
US20210089922A1 (en) Joint pruning and quantization scheme for deep neural networks
US10373050B2 (en) Fixed point neural network based on floating point neural network quantization
US10223635B2 (en) Model compression and fine-tuning
US11238346B2 (en) Learning a truncation rank of singular value decomposed matrices representing weight tensors in neural networks
US20160224903A1 (en) Hyper-parameter selection for deep convolutional networks
US20170091619A1 (en) Selective backpropagation
US11586924B2 (en) Determining layer ranks for compression of deep networks
WO2016182659A1 (en) Bit width selection for fixed point neural networks
US11443514B2 (en) Recognizing minutes-long activities in videos
US20190354865A1 (en) Variance propagation for quantization
CN112508186A (en) Method for training neural network for image recognition and neural network device
US20200265307A1 (en) Apparatus and method with multi-task neural network
US20220121949A1 (en) Personalized neural network pruning
US11449758B2 (en) Quantization and inferencing for low-bitwidth neural networks
CN113065632A (en) Method and apparatus for validating training of neural networks for image recognition
CN110598837A (en) Artificial neural network adjusting method and device
CN110598723B (en) Artificial neural network adjusting method and device
US20210110268A1 (en) Learned threshold pruning for deep neural networks
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN110633722B (en) Artificial neural network adjusting method and device
US20230076290A1 (en) Rounding mechanisms for post-training quantization
US11636698B2 (en) Image processing method and apparatus with neural network adjustment
US20230306233A1 (en) Simulated low bit-width quantization using bit shifted neural network parameters
CN116109841B (en) Zero sample target detection method and device based on dynamic semantic vector
US20240160926A1 (en) Test-time adaptation via self-distilled regularization

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200908

Address after: Unit 01-19, 10 / F, 101, 6 / F, building 5, yard 5, Anding Road, Chaoyang District, Beijing 100029

Applicant after: Xilinx Electronic Technology (Beijing) Co.,Ltd.

Address before: 100083, 17 floor, four building four, 1 Wang Zhuang Road, Haidian District, Beijing.

Applicant before: BEIJING DEEPHI INTELLIGENT TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right
GR01 Patent grant
GR01 Patent grant