CN110598723A

CN110598723A - Artificial neural network adjusting method and device

Info

Publication number: CN110598723A
Application number: CN201810608259.2A
Authority: CN
Inventors: 高梓桁
Original assignee: Beijing Shenjian Intelligent Technology Co Ltd
Current assignee: Xilinx Technology Beijing Ltd
Priority date: 2018-06-13
Filing date: 2018-06-13
Publication date: 2019-12-20
Anticipated expiration: 2038-06-13
Also published as: CN110598723B

Abstract

An artificial neural network (ANN) adjustment method and apparatus are provided. The last fully connected layer of the ANN is a classifier, and the normalized weights of that layer represent the feature centers of the classes. The method comprises: adjusting the ANN with a first loss function that constrains the included angles of the feature centers; and completing the training of the ANN once the distribution of the included angles of the feature centers tends to be uniform. By constraining the angles of the feature centers through the loss function, the invention makes the included angles between the feature centers of the classes more uniform, which effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation.

Description

Artificial neural network adjusting method and device

Technical Field

The invention relates to deep learning, in particular to an adjusting method and device for an artificial neural network.

Background

In recent years, artificial neural networks (ANNs) have made significant progress in fields such as object detection and image classification. In engineering practice, however, the classes of the labeled data are often imbalanced. When there are many classes to distinguish, as in a face recognition task, this imbalance has a large impact on the classification result because intra-class deviation and inter-class deviation are not sufficiently differentiated.

To solve the above problems, various improvements have been proposed, such as data augmentation and local feature labeling. However, these solutions still fail to solve the problem of inaccurate classification results caused by class imbalance in the labeled data.

In view of the above, there is still a need for an improved neural network tuning method.

Disclosure of Invention

According to the invention, the angles of the feature centers are constrained by means of the loss function, so that the included angles between the feature centers of the classes become more uniform, which effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation.

According to an aspect of the present invention, there is provided an Artificial Neural Network (ANN) tuning method, wherein a last fully connected layer of the ANN is a classifier for classification and normalized weights of the layer represent feature centers of each class, the method comprising: adjusting the ANN with a first loss function that constrains an included angle of the feature center; and finishing the training of the ANN under the condition that the included angle distribution of each feature center tends to be uniform. Thus, the problem of inaccurate classification caused by uneven distribution of feature center angles due to imbalance of the training data set is eliminated.

Here, the first loss function that constrains the angle of the feature center may be a common softmax loss function. The training data input to the ANN may be an unbalanced data set with unequal distribution of data of various types. And the ANN network may be a face recognition neural network.

Preferably, the adjusting method of the present invention may further include: before the ANN is adjusted by using the first loss function, the ANN is trained by using the second loss function to determine the initial feature centers of each type. Therefore, the included angle constraint loss function of the invention can be used for training or fine tuning of the network as required.

During training and/or fine-tuning, the ANN may be jointly adjusted using an unconstrained third loss function and the first loss function. The first loss function and the third loss function may be given different weights in the adjustment of the ANN. Preferably, the weight of the first loss function increases gradually as the iteration progresses. Alternatively, the weight of the loss constraint of the first loss function increases gradually as the iteration progresses. The timing and strength of the angle constraint can therefore be chosen flexibly according to the practical application.

Constraining the included angles of the feature centers may include: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with their variance as close to zero as possible. Preferably, each feature center angle needs to be greater than a predetermined threshold.

According to another aspect of the present invention, there is provided an Artificial Neural Network (ANN) adjusting apparatus, wherein a last fully-connected layer of the artificial neural network is a classifier for classification and normalized weights of the layer represent feature centers of classes, the apparatus comprising: an included angle adjusting means for adjusting the ANN with a first loss function that constrains an included angle of the feature center; and the iteration device is used for finishing the training of the ANN under the condition that the included angle distribution of each feature center tends to be uniform.

Preferably, the apparatus may further comprise: an initial training means for training the ANN with a second loss function to determine the initial feature centers of the classes before the ANN is adjusted with the first loss function.

Preferably, the apparatus may further comprise: a joint adjusting means for jointly adjusting the ANN with an unconstrained third loss function and the first loss function. The first and third loss functions may be given different weights in the adjustment of the ANN by the joint adjusting means.

The iteration means may be such that the weight of the first loss function increases progressively as the iteration progresses. Alternatively or additionally, the iteration means may be such that the weight of the loss constraint of the first loss function increases gradually as the iteration progresses.

According to an aspect of the present invention, an artificial neural network (ANN) deployment method is provided, including: deploying the neural network model adjusted as above on a fixed-point computing platform comprising, at least in part, an FPGA, a GPU, and/or an ASIC to perform inference. The ANN may be a face recognition neural network, and the ANN as deployed does not include the last fully connected layer.

According to yet another aspect of the invention, a computing device is presented, comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the face recognition neural network adjustment method as described above.

According to another aspect of the present invention, a non-transitory machine-readable storage medium is proposed, on which executable code is stored, which, when executed by a processor of an electronic device, causes the processor to perform the face recognition neural network adjusting method as described above.

According to a further aspect of the invention, a fixed-point computing platform is proposed, which is at least partly composed of an FPGA, a GPU and/or an ASIC, for performing inferential computations based on a fixed-point neural network model obtained according to the above method.

By constraining the angles of the feature centers, the ANN adjusting method and apparatus make the included angles between the feature centers of the classes more uniform, which effectively addresses the adverse effect on classification results caused by insufficient discrimination between intra-class deviation and inter-class deviation. The method and apparatus are particularly suitable when the training data set is unbalanced. The angle constraint can be applied through a loss function, and the time at which it is added and its weight can be chosen according to the specific application, so that a more uniform distribution of feature center angles is achieved while normal convergence of the network is ensured.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.

Fig. 1 shows a schematic diagram of a typical CNN.

Fig. 2 shows a flow diagram of an ANN adjustment method according to an embodiment of the present invention.

Fig. 3A and 3B show an example of feature mapping to a two-dimensional plane under unbalanced dataset training before and after optimization using the adaptation scheme of the present invention.

Fig. 4 shows a schematic diagram of an ANN adjustment apparatus according to an embodiment of the present invention.

Fig. 5 shows a schematic structural diagram of a computing device which can be used for implementing the above adjustment method according to an embodiment of the present invention.

Detailed Description

Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The scheme of the present application is applicable to various artificial neural networks (ANNs), including deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs). Some background is given below, using a CNN as an example.

CNN basic concept

CNNs achieve state-of-the-art performance in a wide range of vision-related tasks. To aid understanding of the CNN-based classification algorithms (e.g., face recognition algorithms) analyzed in this application, the basics of CNNs and of fixed-point quantization are introduced first.

As shown in fig. 1, a typical CNN consists of a series of layers that run in order.

A CNN is composed of an input layer, an output layer, and a plurality of hidden layers connected in series. The first layer of the CNN reads an input value, such as an input image, and outputs a series of activation values (which may also be referred to as a feature map). Each subsequent layer reads the activation values generated by the previous layer and outputs new activation values. The final classifier outputs the probability of each class to which the input image may belong.

These layers can be broadly divided into weighted layers (e.g., convolutional layers, fully connected layers, batch normalization layers) and unweighted layers (e.g., pooling layers, ReLU layers, Softmax layers). Here, a CONV layer (convolutional layer) takes a series of feature maps as input and convolves them with convolution kernels to obtain output activation values. A pooling layer is typically connected to a CONV layer and outputs the maximum or average value of each sub-area in each feature map, thereby reducing the amount of computation through sub-sampling while maintaining some degree of invariance to displacement, scale, and deformation. A CNN may alternate between convolutional and pooling layers multiple times, gradually reducing the spatial resolution and increasing the number of feature maps. The network may then be connected to at least one fully connected layer, which applies a linear transformation to the input feature vector to produce a one-dimensional vector output comprising a plurality of feature values.

In general, the operation of weighted layers can be represented as:

Y = WX + b,

where W is the weight value, b is the bias, X is the input activation value, and Y is the output activation value.

The operation of the unweighted layer can be represented as:

Y = f(X),

where f(X) is a non-linear function.
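The two layer types above can be illustrated with a minimal Python sketch. It assumes a single input vector, a fully connected layer as the weighted layer, and ReLU as the non-linear function f; the function names `dense` and `relu` are hypothetical and chosen only for this illustration.

```python
def dense(W, x, b):
    """Weighted layer: Y = WX + b for a single input vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def relu(x):
    """Unweighted layer: Y = f(X), with f chosen here as element-wise ReLU."""
    return [max(0.0, v) for v in x]


W = [[1.0, -2.0], [0.5, 0.5]]   # weights (learned during training)
b = [0.0, -1.0]                 # biases (learned during training)
x = [3.0, 2.0]                  # input activation values
y = relu(dense(W, x, b))        # output activation values
```

The weights and biases stay fixed at inference time, while `x` and `y` change with every input sample.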

Here, "weights" refers to the parameters in the hidden layers, which in a broad sense can include the biases; they are values learned through the training process and remain unchanged at inference. Activation values, also referred to as feature values, are the values passed between layers: starting from the input layer, the output of each layer is obtained by an operation on its input values and its weights. Unlike the weights, the distribution of the activation values varies dynamically with the input data sample.

Before a CNN is used for inference (e.g., image classification), it must first be trained. The parameters of the various layers of the neural network model, such as weights and biases, are determined by feeding in a large amount of training data.

Training of CNN

Training a model means learning (determining) good values for all weights and biases from labeled samples. In the deployment phase of the neural network, the weights and biases so determined allow high-accuracy inference on the input feature values, e.g., correct classification of the input pictures.

In supervised learning, a machine learning algorithm learns parameters by examining multiple samples and attempting to find a model that minimizes losses, a process known as empirical risk minimization.

The loss is a penalty for a poor prediction. That is, the loss is a numerical value indicating how inaccurate the model's prediction is for a single sample. If the prediction of the model is completely accurate, the loss is zero; otherwise the loss is larger. The goal of training the model is to find a set of weights and biases whose loss is small on average over all samples.
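As an illustration of a per-sample loss, the sketch below uses softmax followed by cross-entropy, which is one common choice for classifiers and is assumed here only for concreteness. The names `softmax` and `cross_entropy` are hypothetical helpers.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_class):
    """Loss for one sample: near zero when the true class gets probability near 1."""
    return -math.log(softmax(logits)[true_class])


good = cross_entropy([10.0, 0.0, 0.0], true_class=0)   # confident and correct
bad = cross_entropy([0.0, 10.0, 0.0], true_class=0)    # confident and wrong
```

A confident correct prediction yields a loss close to zero, while a confident wrong prediction yields a large loss.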

In the training process of the neural network, a loss function may be defined to quantify how well the current weights and biases make the network outputs fit the training samples. The goal of training the network can thus be translated into minimizing the loss function with respect to the weights and biases. Typically, a gradient descent algorithm is used to carry out this minimization (in multi-layer neural network training, together with a back-propagation algorithm).

The back-propagation algorithm involves repeated iterations of forward propagation and backward propagation. In forward propagation, the neurons in successive layers are connected through weight matrices, so that the stimulus (feature values) is passed from each layer to the next through the activation function of each layer. In backward propagation, the error of the current layer is derived in reverse from the error of the next layer. The weights and biases are thus adjusted continuously through iterations of forward and backward propagation, so that the loss function gradually approaches its minimum and the training of the neural network is completed.
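The iterative minimization can be sketched on the smallest possible case: a single linear neuron y = w*x + b trained with a mean squared loss. With only one layer, back-propagation reduces to computing the gradient directly. The function names, learning rate, and step count are assumptions made for this illustration.

```python
def mean_squared_loss(w, b, samples):
    """Average squared error of the prediction w*x + b over all samples."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train_linear(samples, lr=0.1, steps=500):
    """Plain gradient descent on the mean squared loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        # Forward pass: prediction error for every sample.
        errors = [(w * x + b - y, x) for x, y in samples]
        # Backward pass: gradient of the loss with respect to w and b.
        grad_w = sum(2 * e * x for e, x in errors) / n
        grad_b = sum(2 * e for e, _ in errors) / n
        # Update step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # generated by y = 2x + 1
w, b = train_linear(samples)
```

After training, `w` and `b` approach 2 and 1, and the loss approaches its minimum of zero.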

ANN adjustment scheme of the invention

In recent years, artificial neural networks (ANNs) have made significant progress in fields such as object detection and image classification. However, class imbalance in the labeled data often occurs during network training. When there are many classes to distinguish, as in a face recognition task, this imbalance has a large impact on the classification result because intra-class deviation and inter-class deviation are not sufficiently differentiated.

For example, in the practical application scenario of a face recognition network, the face data that can be obtained often follows an unbalanced long-tail distribution, with the number of pictures per ID (corresponding to one face) varying from one to several hundred. This is catastrophic for a data-driven neural network: an increase in data does not necessarily improve algorithm performance and may even have the opposite effect. The face recognition problem on unbalanced data sets therefore needs to be solved.

Various algorithms have been proposed to address this problem, mainly involving data augmentation and local feature labeling. However, these solutions still fail to solve the problem of inaccurate classification results caused by class imbalance in the labeled data.

In view of the above, the present invention provides an improved scheme for the loss function. Unlike other methods, which work in the Euclidean distance domain, the present invention makes its improvement in the angular domain. Compared with the Euclidean distance domain, the angular domain is more primitive and natural and is closer to the essence of the feature distribution, so that better performance can be obtained.

Fig. 2 shows a flow diagram of an ANN adjustment method according to an embodiment of the present invention. Here, the last fully-connected layer of the ANN may be a classifier for classification and the normalized weight value of that layer represents the feature center of each class.

In step S210, the ANN is adjusted by a first loss function that constrains the included angle of the feature center. In step S220, the training of the ANN is completed under the condition that the distribution of the included angles of the feature centers tends to be uniform.

The ANN adjusting method provided by the invention is particularly suited to application scenarios with many classes and a training data set that follows an unbalanced long-tail distribution. In one embodiment, the training data input to the ANN may be an unbalanced data set in which the classes are unequally represented. In another embodiment, the ANN is a face recognition neural network.

During training of the face recognition neural network, the input activation of the last fully connected layer is taken as the feature. By normalizing the weights of the last fully connected layer, the feature center of each class, represented by the normalized weights, can be obtained. For a class with few pictures, the included angles between its feature center and the feature centers of adjacent classes are small, so the feature space of the class is small and the probability that the classifier misclassifies samples of that class increases, which is clearly disadvantageous.
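The relationship between the weights of the last fully connected layer and the feature centers can be sketched as follows. The sketch assumes that each row of the weight matrix belongs to one class, that no row is all zeros, and that L2 normalization is used; the function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def feature_centers(fc_weights):
    """One feature center per class: the normalized weight row of that class."""
    return [normalize(row) for row in fc_weights]


def angle_between(u, v):
    """Included angle in radians between two unit vectors."""
    cos = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, cos)))   # clamp against rounding error


# Toy weight matrix of a 3-class classifier with 2-dimensional features.
centers = feature_centers([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
angle_01 = angle_between(centers[0], centers[1])   # about pi/2
angle_02 = angle_between(centers[0], centers[2])   # about pi/4
```

A class whose center has small angles to its neighbours, such as class 2 in this toy example, occupies a narrower region of the feature space.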

FIG. 3A illustrates an example of features mapped to a two-dimensional plane after prior-art training on an unbalanced data set. For convenience of explanation, the drawing takes a classifier with ten ID classes as an example. It should be understood that real applications need to distinguish many more classes, for example on the order of tens of thousands. As shown in the figure, the feature center of a class with many pictures forms larger included angles with the adjacent class feature centers, whereas the feature center of a class with few pictures forms smaller included angles with the adjacent class feature centers; the feature space of such a class is smaller, and its probability of being misclassified by the classifier is larger.

According to the adjustment scheme of the invention, a constraint can be added to the included angle between the feature center of each class and its adjacent feature centers, so that the feature centers of all classes are uniformly distributed over the whole feature space. The feature spaces of the classes then have the same size, so that the neural network can concentrate on optimizing the distance between each feature and the feature center of its class, which improves the performance of the face recognition algorithm.

FIG. 3B shows an example of the mapping of features trained using the angle constraint of the present invention onto a two-dimensional plane. As shown in the figure, the feature center distribution optimized based on the invention is more uniform and compact, so that the classification probability of the corresponding feature can be correspondingly improved.

Depending on the application scenario, the ANN can be trained directly with the first loss function that constrains the included angles of the feature centers; alternatively, the ANN may first be trained with a second loss function to determine the initial feature centers of the classes before being adjusted with the first loss function. In other words, the first loss function of the present invention may serve as a training loss function or as a fine-tuning loss function. In the fine-tuning case, a face model may first be trained with another loss function so that the features already have a certain degree of separation. The first loss function of the invention is then used for fine-tuning, which makes the distribution of the feature centers more uniform and improves recognition.

Fine-tuning or training of the ANN with the first loss function may also be combined with other loss functions. In one embodiment, the ANN is jointly adjusted using an unconstrained third loss function and the first loss function. Preferably, the first loss function and the third loss function are given different weights during the adjustment of the ANN.

In one embodiment, the third loss function may be a loss function that gives different weights to different classes. For example, such a function may give sparse classes a weight of 1 and give each dominant class, i.e., a class whose number of data samples exceeds a certain value, a weight that is inversely related to its probability of being correctly predicted. This reduces the excessive influence of dominant classes on network training and improves prediction accuracy for rare classes. Such a function can be combined with the loss function of the present invention that adjusts the feature center angles, thereby further improving the classification accuracy of a multi-class network.
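One possible reading of such a class-weighted loss is sketched below. The text only states that the weight of a dominant class is inversely related to its probability of being correctly predicted; the concrete form 1 - p, the cross-entropy base loss, and all names and thresholds are assumptions made for illustration.

```python
import math


def class_weight(sample_count, p_correct, dominant_threshold):
    """Weight 1 for sparse classes; a smaller weight for well-predicted dominant classes."""
    if sample_count <= dominant_threshold:
        return 1.0                 # sparse class keeps its full weight
    return 1.0 - p_correct         # assumed form: shrinks as p_correct grows


def weighted_loss(p_correct, sample_count, dominant_threshold):
    """Cross-entropy of one sample, scaled by the weight of its class."""
    weight = class_weight(sample_count, p_correct, dominant_threshold)
    return -weight * math.log(p_correct)


# Same predicted probability, but the sample belongs to classes of different size.
rare = weighted_loss(p_correct=0.9, sample_count=5, dominant_threshold=100)
dominant = weighted_loss(p_correct=0.9, sample_count=500, dominant_threshold=100)
```

With the same predicted probability, the sample from the dominant class contributes less to the total loss than the sample from the rare class.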

In one embodiment, the weight of the first loss function increases gradually as the iteration progresses. In another embodiment, the weight of the loss constraint of the first loss function increases gradually as the iteration progresses. In other words, the adjustment scheme of the present invention can train the network in an annealed manner. Specifically, at network initialization the features are close to randomly distributed and the variance of the feature center angles is at its largest, so applying the constraint directly during training may prevent the network from converging. A weight term related to the number of iterations may therefore be added; its value increases from 0 to 1 as the number of iterations grows and eventually stabilizes at 1. In this way, the feature centers are driven toward a uniform distribution only after the features have acquired a preliminary distribution.
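The iteration-dependent weight term can be sketched as follows. The text specifies only that the weight grows from 0 to 1 and then stays at 1; the linear ramp, the parameter `ramp_iterations`, and the additive combination of the two loss terms are assumptions made for this illustration.

```python
def constraint_weight(iteration, ramp_iterations):
    """Weight that rises linearly from 0 to 1, then stays at 1."""
    if ramp_iterations <= 0:
        return 1.0                                   # no ramp: full constraint at once
    return min(1.0, max(0.0, iteration / ramp_iterations))


def total_loss(classification_loss, angle_loss, iteration, ramp_iterations):
    """Classification loss plus the annealed angle-constraint term."""
    weight = constraint_weight(iteration, ramp_iterations)
    return classification_loss + weight * angle_loss


early = total_loss(2.0, 4.0, iteration=0, ramp_iterations=1000)     # constraint off
middle = total_loss(2.0, 4.0, iteration=250, ramp_iterations=1000)  # constraint at 25%
late = total_loss(2.0, 4.0, iteration=5000, ramp_iterations=1000)   # constraint fully on
```

Early iterations are driven almost entirely by the classification loss, which lets the features settle into a preliminary distribution before the angle constraint takes full effect.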

In one embodiment, constraining the included angles of the feature centers comprises: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with their variance as close to zero as possible. Preferably, each feature center angle needs to be greater than a predetermined threshold in order to meet the minimum requirement for discrimination.
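The constraint described above can be sketched as a penalty on the feature centers. The sketch assumes that the "two adjacent feature centers" are the two centers with the smallest included angle, that the centers are unit vectors, and that the threshold is enforced by a squared shortfall term; the exact form of the loss is not given in the text, and all names are hypothetical.

```python
import math


def center_angles(centers):
    """Mean angle (radians) from each unit-norm center to its two nearest centers.

    Requires at least three centers so that every center has two neighbours.
    """
    result = []
    for i, c in enumerate(centers):
        angles = sorted(
            math.acos(max(-1.0, min(1.0, sum(a * b for a, b in zip(c, other)))))
            for j, other in enumerate(centers)
            if j != i
        )
        result.append((angles[0] + angles[1]) / 2)
    return result


def angle_constraint(centers, min_angle=0.0):
    """Variance of the feature center angles plus a penalty for angles below min_angle."""
    angles = center_angles(centers)
    mean = sum(angles) / len(angles)
    variance = sum((a - mean) ** 2 for a in angles) / len(angles)
    shortfall = sum(max(0.0, min_angle - a) ** 2 for a in angles) / len(angles)
    return variance + shortfall


def on_circle(degrees):
    """Unit vectors in the plane at the given angles, used as toy feature centers."""
    return [(math.cos(math.radians(d)), math.sin(math.radians(d))) for d in degrees]


uniform = on_circle([0, 90, 180, 270])   # evenly spread centers
skewed = on_circle([0, 10, 20, 180])     # three crowded centers and one isolated
```

For evenly spread centers the penalty is essentially zero, while crowded centers produce a clearly positive value that training would push down.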

In one embodiment, the first loss function that constrains the included angle of the feature centers is a softmax loss function.

It is to be understood that "first", "second" and "third" are used herein to illustrate that the loss functions for applying the angle constraint, the loss functions for performing the initial training of the network, and the loss functions for performing the joint training mentioned in the present invention are not identical, but there is no provision or suggestion on the order or relationship of the three. For example, the first and second functions herein may both be softmax loss functions, but since the first function is a softmax loss function that imposes an angular constraint, it is still different from the original softmax loss function used to train the initial network.

ANN adjusting device

The above adjustment method of the present invention can be implemented by a specific ANN adjustment apparatus. Fig. 4 shows a schematic diagram of an ANN adjustment apparatus according to an embodiment of the present invention. Here, the last fully-connected layer of the ANN is the classifier for classification and the normalized weight value of that layer represents the feature center of each class.

As shown, the ANN adjustment apparatus 400 may include an angle adjustment means 410 and an iteration means 420. The angle adjustment means 410 may be configured to adjust the ANN with a first loss function that constrains the included angles of the feature centers. The iteration means 420 may be used to complete the training of the ANN once the distribution of the included angles of the feature centers tends to be uniform.

Preferably, the ANN adjustment means may further comprise initial training means 430 operable to train the ANN using a second loss function to determine initial feature centers of each class prior to adjusting the ANN using the first loss function.

Preferably, the ANN adjustment apparatus may further comprise joint adjustment means 440 operable to jointly adjust the ANN using the unconstrained third loss function and the first loss function. The first loss function and the third loss function are given different weights in the adjustment of the ANN by the joint adjustment means 440. The iteration means 420 may make the weight of the first loss function increase gradually as the iteration progresses. Alternatively, the iteration means 420 may make the weight of the loss constraint of the first loss function increase gradually as the iteration progresses.

As in the adjustment method, constraining the included angles of the feature centers may include: averaging the included angles between each feature center and its two adjacent feature centers to obtain the feature center angle of that feature center; and making all feature center angles tend to be equal, with their variance as close to zero as possible.

In one embodiment, the present invention also includes an artificial neural network (ANN) deployment method for deploying a neural network model adjusted as described above on a fixed-point computing platform comprising, at least in part, an FPGA, a GPU and/or an ASIC to perform inference, for example for a face recognition task. The bit width used for fixed-point quantization may be determined by the bit width of the FPGA, GPU and/or ASIC.
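The text does not describe the quantization procedure itself. As a rough illustration of how a bit width bounds the representable values, the sketch below uses symmetric linear quantization with a single scale factor, which is an assumed scheme and not taken from the disclosure; all names are hypothetical.

```python
def quantize(values, bit_width=8):
    """Map floats to signed integers of the given bit width with one shared scale."""
    qmax = 2 ** (bit_width - 1) - 1                  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values) or 1.0     # avoid a zero scale
    scale = max_abs / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the integers."""
    return [q * scale for q in quantized]


weights = [1.0, -0.5, 0.25, 0.1]
q, scale = quantize(weights, bit_width=8)
restored = dequantize(q, scale)
```

A larger bit width gives a smaller scale and therefore a smaller rounding error per value, at the cost of wider arithmetic on the target hardware.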

The ANN may be a face recognition neural network, and the ANN as deployed does not include the last fully connected layer. The face recognition neural network used for training further includes a penultimate fully connected layer for outputting the extracted face feature vector. Because the last layer is not included in the deployed network, the penultimate fully connected layer from training can serve as the output layer of the neural network model actually deployed on the hardware computing platform. The face feature vectors output by the deployed model during inference are compared with existing face features (e.g., face features stored in a database) to perform face recognition.
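The comparison step can be sketched as a nearest-match search over stored feature vectors. The text does not name a similarity measure; cosine similarity is assumed here because it depends only on the angle between features. The gallery contents, the acceptance threshold, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identify(feature, gallery, threshold=0.5):
    """Return the ID of the best-matching stored feature, or None if none is close enough."""
    best_id, best_score = None, threshold
    for person_id, stored in gallery.items():
        score = cosine_similarity(feature, stored)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id


# Toy database of stored face features (2-dimensional for readability).
gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
match = identify([0.9, 0.1], gallery)
no_match = identify([-1.0, -1.0], gallery)
```

Because the classifier layer is dropped at deployment, new identities can be enrolled by adding feature vectors to the database without retraining the network.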

In addition, face recognition in a real application scenario may collect a large amount of application data, and this data can be used to fine-tune the deployed fixed-point model directly on the hardware platform, achieving a hot update. Thus, in one embodiment, the deployment method of the present invention may further comprise using verification of the inference results to fine-tune the deployed neural network.

It will be appreciated by those skilled in the art that although described in the above description in particular in connection with face recognition tasks, the inventive adaptation is equally applicable to other multi-classification networks, such as classification of vehicle models and animal species.

Referring to fig. 5, computing device 500 includes memory 510 and processor 520.

The processor 520 may be a multi-core processor or may include a plurality of processors. In some embodiments, processor 520 may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, processor 520 may be implemented using custom circuitry, such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).

The memory 510 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by the processor 520 or other modules of the computer. The persistent storage may be a readable and writable storage device, and may be a non-volatile storage device that does not lose stored instructions and data even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is used as the persistent storage. In other embodiments, the persistent storage may be a removable storage device (e.g., a floppy disk or an optical drive). The system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as dynamic random access memory. The system memory may store some or all of the instructions and data that the processor requires at runtime. Further, the memory 510 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory); magnetic and/or optical disks may also be employed. In some embodiments, the memory 510 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not include carrier waves or transitory electronic signals transmitted wirelessly or by wire.

The memory 510 has stored thereon executable code which, when executed by the processor 520, causes the processor 520 to perform the neural network adjustment method described above.

In actual use, the computing device 500 may be a general purpose computing device including a mass storage device 510 and a CPU 520 for performing training of a neural network. The neural network for classification obtained according to the adaptation scheme of the present invention may be executed on a fixed-point computing platform implemented at least in part by an FPGA, a GPU and/or an ASIC.

Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above-described method of the invention.
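By way of illustration only, one fragment of such program code could combine the angle-constraining first loss function with an unconstrained third loss function, giving the first loss function a weight that grows as the iterations progress. The linear ramp, the default weights, and the names `constraint_weight` and `joint_loss` are hypothetical choices for this sketch; the disclosure states only that the weights may differ and that the weight of the first loss function gradually increases.

```python
def constraint_weight(step, total_steps, max_weight=1.0):
    """Weight of the angle-constraining loss at a given iteration.

    Assumption: a linear ramp from 0 to max_weight; any monotonically
    increasing schedule would match "gradually increases".
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_weight * progress


def joint_loss(first_loss, third_loss, step, total_steps, third_weight=1.0):
    """Weighted sum of the constrained (first) and unconstrained (third) losses."""
    first_weight = constraint_weight(step, total_steps)
    return first_weight * first_loss + third_weight * third_loss
```

Starting with a small weight lets the unconstrained loss shape the feature centers first, after which the angle constraint is tightened gradually.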

Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.

Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

1. A method of Artificial Neural Network (ANN) tuning, wherein a last fully-connected layer of the ANN is a classifier for classification and normalized weights for that layer represent feature centers of the classes, the method comprising:

adjusting the ANN with a first loss function that constrains an included angle of the feature centers;

and finishing the training of the ANN when the distribution of the included angles of the feature centers tends to be uniform.

2. The method of claim 1, further comprising:

training the ANN with a second loss function to determine initial feature centers of each class before the ANN is adjusted with the first loss function.

3. The method of claim 1, wherein constraining the included angle of the feature centers comprises:

averaging the included angles between each feature center and its two adjacent feature centers to obtain a feature center angle for that feature center; and

causing all feature center angles to tend toward equality, with their variance as close to zero as possible.

4. The method of claim 3, wherein each feature center angle is required to be greater than a predetermined threshold.

5. The method of claim 1, wherein the first loss function that constrains the included angle of the feature centers is a softmax loss function.

6. The method of claim 1, further comprising:

jointly adjusting the ANN using an unconstrained third loss function and the first loss function.

7. The method of claim 6, wherein the first and third loss functions are given different weights during the adjustment of the ANN.

8. The method of claim 6, wherein the weight of the first loss function gradually increases as the iteration progresses.

9. The method of claim 1, wherein the weight of the loss constraint of the first loss function gradually increases as the iteration progresses.

10. The method of claim 1, wherein the training data input to the ANN is an imbalanced data set in which the data is unevenly distributed across classes.

11. The method of claim 1, wherein the ANN is a face recognition neural network.

12. An Artificial Neural Network (ANN) tuning apparatus in which a last fully-connected layer of the artificial neural network is a classifier for classification and normalized weights of the layer represent feature centers of each class, the apparatus comprising:

an included angle adjusting means for adjusting the ANN with a first loss function that constrains an included angle of the feature centers;

and an iterating means for finishing the training of the ANN when the distribution of the included angles of the feature centers tends to be uniform.

13. The apparatus of claim 12, further comprising:

an initial training means for training the ANN with a second loss function to determine initial feature centers of each class before the ANN is adjusted with the first loss function.

14. The apparatus of claim 12, wherein constraining the included angle of the feature centers comprises:

15. The apparatus of claim 12, further comprising:

a joint adjusting means for jointly adjusting the ANN using an unconstrained third loss function and the first loss function.

16. The apparatus of claim 15, wherein the first and third loss functions are given different weights when the joint adjusting means adjusts the ANN.

17. The apparatus of claim 15, wherein the iterating means causes the weight of the first loss function to gradually increase as the iteration progresses.

18. The apparatus of claim 12, wherein the iterating means causes the weight of the loss constraint of the first loss function to gradually increase as the iteration progresses.

19. An Artificial Neural Network (ANN) deployment method, comprising:

deploying a neural network model obtained by the method of any one of claims 1-11 on a fixed-point computing platform implemented at least in part by an FPGA, a GPU, and/or an ASIC to perform inference.

20. The method of claim 19, wherein the ANN is a face recognition neural network, and the ANN as deployed does not include the last fully-connected layer.

21. A computing device, comprising:

a processor; and

a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any one of claims 1-11.

22. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any one of claims 1-11.

23. A fixed-point computing platform, implemented at least in part by an FPGA, a GPU, and/or an ASIC, for performing inference with an artificial neural network obtained according to the method of any one of claims 1-11.