CN111832342A

CN111832342A - Neural network, training and using method, device, electronic equipment and medium

Info

Publication number: CN111832342A
Application number: CN201910305394.4A
Authority: CN
Inventors: 陈长国
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2020-10-27

Abstract

The embodiment of the disclosure discloses a neural network, which comprises a plurality of neurons, and is characterized in that the activation function of at least one neuron in the neurons has a fractional rational function form. The embodiment of the disclosure also discloses a method for training the neural network, a method for processing data by using the neural network, a device, an electronic device and a readable storage medium. The neural network using the activation function with the form of the fractional rational function can be quickly converged during training, and the requirement of on-line neural network training is met.

Description

Neural network, training and using method, device, electronic equipment and medium

Technical Field

The present disclosure relates to the field of computer application technologies, and in particular, to a neural network, a training method, a training apparatus, a using method, an electronic device, and a readable storage medium.

Background

An Artificial Neural Network (ANN, simply referred to as "Neural Network") abstracts and models a human brain neuron Network from the viewpoint of information processing, and forms different networks in different connection modes. A neural network includes a large number of nodes (or neurons) connected to each other. Each neuron represents a particular output function, called the excitation function. Each connection between two neurons represents a weighted value, called weight, for the signal passing through the connection. The neuron of the neural network processes data input into the neural network based on corresponding weight, activation function, connection relation between the neuron and other neurons and the like, and obtains an output result of the neural network.

In recent years, the research and application of neural networks are deepened, and great progress has been made, which has successfully solved many practical problems that are difficult to solve by modern computers in the fields of pattern recognition, intelligent robots, automatic control, prediction estimation, biology, medicine, economy and the like, and shows good intelligent characteristics.

Disclosure of Invention

In order to solve the problems in the related art, embodiments of the present disclosure provide a neural network, a training and using method, an apparatus, an electronic device, and a readable storage medium.

In a first aspect, an embodiment of the present disclosure provides a method of training a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractal rational function form, the method including:

inputting training samples into the neural network;

processing the training samples by neurons in the neural network, generating output results, wherein the at least one neuron uses an activation function having the form of the fractional rational function for the processing;

adjusting parameters of the neural network to optimize the output result.

With reference to the first aspect, the present disclosure provides in a first implementation manner of the first aspect:

the neural network is used for image classification, the training sample comprises an image, and the output result comprises a category to which the image belongs; or

The neural network is used for target detection, the training sample comprises an image, and the output result comprises a class to which a target contained in the image belongs and/or target frame coordinates of the target; or

The neural network is used for positioning the face feature points, the training samples comprise face images, and the output result comprises position coordinates of the face feature points.

With reference to the first aspect, in a second implementation manner of the first aspect, the fractional rational function is:

wherein α ≧ 1, β >0, γ >0 and the values of α, β, γ are such that the value of f (x) is in the range of [ -1,1], x being the result of linear processing of the input signal transmitted to the neuron using the activation function in the form of a fractional rational function.

With reference to the second implementation manner of the first aspect, the present disclosure provides in a third implementation manner of the first aspect:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

With reference to the first aspect, the present disclosure provides in a fourth implementation manner of the first aspect:

the activation function includes at least one parameter;

the adjusting parameters of the neural network includes adjusting parameters of the activation function and/or adjusting other parameters of the neural network.

With reference to the fourth implementation manner of the first aspect, the present disclosure provides in a fifth implementation manner of the first aspect:

at least some of the plurality of neurons using the same activation function, the adjusting parameters of the activation function comprising adjusting parameters of respective activation functions of the at least some neurons; or

The adjusting the parameters of the activation function includes adjusting the parameters of the activation function of each neuron, respectively.

With reference to the first aspect, the present disclosure provides in a sixth implementation manner of the first aspect:

the neural network is any one or combination of several of the following: a convolutional neural network, a fully-connected neural network, a recurrent neural network; and/or

The adjusting the parameters of the neural network comprises adjusting the parameters of the neural network through any one or a combination of the following: genetic algorithm, genetic programming, evolution strategy, evolution programming and gradient descent optimization algorithm.

With reference to the first aspect, the present disclosure provides in a seventh implementation manner of the first aspect, the processing the training samples by neurons in the neural network, including:

performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;

applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;

outputting the first activation processing result from the first neuron.

In a second aspect, an embodiment of the present disclosure provides a method for processing data using a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method including:

inputting data to be processed into the neural network;

processing the data to be processed through neurons in the neural network to generate a processing result, wherein the at least one neuron uses an activation function in the form of the fractional rational function to perform the processing;

and outputting the processing result.

With reference to the second aspect, the present disclosure provides in a first implementation manner of the second aspect:

the neural network is used for image classification, the data to be processed comprise images, and the processing result comprises the category to which the images belong; or

The neural network is used for target detection, the data to be processed comprise images, and the processing result comprises the category of the target contained in the images and/or the target frame coordinates of the target; or

The neural network is used for positioning the face feature points, the data to be processed comprise face images, and the processing result comprises the position coordinates of the face feature points.

With reference to the second aspect, the present disclosure provides in a second implementation manner of the second aspect: the fractional rational function is:

With reference to the second implementation manner of the second aspect, the present disclosure provides in a third implementation manner of the second aspect:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

With reference to the third implementation manner of the second aspect, the present disclosure provides in a fourth implementation manner of the second aspect:

at least some of the plurality of neurons use the same activation function.

With reference to the second aspect, the present disclosure provides in a fifth implementation manner of the second aspect:

the neural network is any one or combination of several of the following: convolutional neural networks, fully-connected neural networks, recursive neural networks.

With reference to the second aspect, the present disclosure provides in a sixth implementation manner of the second aspect:

processing the data to be processed by neurons in the neural network, including:

performing linear processing on at least one second input signal transmitted to a first neuron of the neurons in response to the data to be processed to obtain a second linear processing result;

applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;

outputting the second activation processing result from the first neuron.

In a third aspect, an embodiment of the present disclosure provides an apparatus for training a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractal rational function form, the apparatus including:

a first input module configured to input training samples into the neural network;

a first processing module configured to process the training samples by neurons in the neural network, generating output results, wherein the at least one neuron uses an activation function having the form of the fractional rational function for the processing;

an adjustment module configured to adjust parameters of the neural network to optimize the output result.

With reference to the third aspect, the present disclosure provides in a first implementation manner of the third aspect:

With reference to the third aspect, in a second implementation manner of the third aspect, the fractional rational function is:

With reference to the second implementation manner of the third aspect, the present disclosure provides in a third implementation manner of the third aspect:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

With reference to the third aspect, the present disclosure provides in a fourth implementation manner of the third aspect:

the activation function includes at least one parameter;

With reference to the fourth implementation manner of the third aspect, the present disclosure provides in a fifth implementation manner of the third aspect:

With reference to the third aspect, the present disclosure provides in a sixth implementation manner of the third aspect:

With reference to the third aspect, in a seventh implementation manner of the third aspect, the processing the training samples by the neurons in the neural network includes:

outputting the first activation processing result from the first neuron.

In a fourth aspect, an embodiment of the present disclosure provides an apparatus for processing data using a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:

a second input module configured to input data to be processed into the neural network;

a second processing module configured to process the data to be processed through neurons in the neural network to generate a processing result, wherein the at least one neuron performs the processing using an activation function having the form of the fractional rational function;

an output module configured to output the processing result.

With reference to the fourth aspect, the present disclosure provides in a first implementation manner of the fourth aspect:

With reference to the fourth aspect, the present disclosure provides in a second implementation manner of the fourth aspect: the fractional rational function is:

With reference to the second implementation manner of the fourth aspect, in a third implementation manner of the fourth aspect:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

With reference to the third implementation manner of the fourth aspect, in a fourth implementation manner of the fourth aspect:

at least some of the plurality of neurons use the same activation function.

With reference to the fourth aspect, the present disclosure provides in a fifth implementation manner of the fourth aspect:

With reference to the fourth aspect, the present disclosure provides in a sixth implementation manner of the fourth aspect:

outputting the second activation processing result from the first neuron.

In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory, where:

the memory is to store one or more computer instructions;

the one or more computer instructions are executable by the processor to implement the method according to any one of the first to sixth implementation forms of the first aspect.

In a sixth aspect, the present disclosure provides a readable storage medium having stored thereon computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method according to any one of the first to sixth implementation manners of the first aspect.

In a seventh aspect, an embodiment of the present disclosure provides a neural network, including a plurality of neurons, wherein an activation function of at least one of the plurality of neurons has a fractional rational function form.

With reference to the seventh aspect, the present disclosure provides in a first implementation manner of the seventh aspect:

the fractional rational function is:

With reference to the first implementation manner of the seventh aspect, the present disclosure is in a second implementation manner of the seventh aspect:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

With reference to the seventh aspect, the present disclosure provides in a third implementation manner of the seventh aspect:

at least some of the plurality of neurons use the same activation function.

With reference to the seventh aspect, the present disclosure in a fourth implementation manner of the seventh aspect:

In an eighth aspect, an embodiment of the present disclosure provides an electronic device including the neural network according to any one of the fourth implementation manners of the seventh aspect to the seventh aspect.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:

FIG. 1 shows a schematic diagram of an exemplary neural network;

FIG. 2 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure;

FIG. 3 shows a block diagram of an electronic device incorporating the neural network described above, in accordance with an embodiment of the present disclosure;

FIG. 4 shows a flow diagram of a method of training a neural network in accordance with an embodiment of the present disclosure;

FIG. 5 illustrates a flow diagram for processing the training samples by neurons in the neural network according to an embodiment of the disclosure;

FIG. 6 shows a flow diagram of a method of processing data using a neural network in accordance with an embodiment of the present disclosure;

FIG. 7 illustrates a flow diagram for processing the data to be processed by neurons in the neural network according to an embodiment of the disclosure;

FIG. 8 shows a block diagram of an apparatus for training a neural network, according to an embodiment of the present disclosure;

FIG. 9 shows a block diagram of an apparatus for processing data using a neural network, in accordance with an embodiment of the present disclosure;

FIG. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure;

FIG. 11 illustrates a schematic block diagram of a computer system suitable for use in implementing a method of training a neural network and/or a method of processing data using a neural network according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.

In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.

It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows a schematic structural diagram of an exemplary neural network.

As shown in fig. 1, the exemplary neural network 100 includes an input layer 110, a first hidden layer 120, a second hidden layer 130, and an output layer 140. The input layer 110 includes neurons u₁Neuron u₂Neuron u₃The first hidden layer 120 includes neurons h₁Neuron h₂Neuron h₃Neuron h₄The second hidden layer 130 includes neurons v₁Neuron v₂Neuron v₃Neuron v₄The output layer 140 includes neurons z.

Neuron u₁Neuron u₂Neuron u₃Respectively being a signal U₁Signal U₂Signal U₃Neuron h₁Neuron h₂Neuron h₃Neuron h₄Respectively as signal H₁Signal H₂Signal H₃Signal H₄Neuron v₁Nerve, nerveYuan v₂Neuron v₃Neuron v₄Respectively as a signal V₁Signal V₂Signal V₃Signal V₄The output signal of neuron z is signal OUT.

As shown IN FIG. 1, input data IN enters the neural network 100 through the input layer 110, neurons u₁Neuron u₂Neuron u₃Respectively output signal U₁Signal U₂Signal U₃. Signal U₁Signal U₂Signal U₃The neuron passes through the first hidden layer 120, the second hidden layer 130 and the output layer 140, and the neuron performs corresponding processing to obtain an output result OUT.

The following is the neuron h₁For example, the processing of signals by neurons is described.

As shown in FIG. 1, neuron u₁And neuron h₁The weight of the connection between is w_h11Neuron u₂And neuron h₁The weight of the connection between is w_h12Neuron u₃And neuron h₁The weight of the connection between is w_h13. Neuron h₁Is biased by b_h1The activation function is f_h1。

From neuron u₁Neuron u₂Neuron u₃To neuron h₁The transmitted signals are respectively signals U₁Signal U₂Signal U₃. Neuron h₁For signal U₁Signal U₂Signal U₃Taking a weighted sum, applying an offset b to the weighted sum_h1Then applying the activation function f to the result obtained_h1To obtain an output signal

Similarly, any neuron h in the first hidden layer 120_j(1. ltoreq. j. ltoreq.4) of an output signal

Wherein the neuron u_iAnd neuron h_jThe weight of the connection between is w_hjiNeuron h_jIs biased by b_hjThe activation function is f_hj。

Any neuron v in the second hidden layer 130_j(1. ltoreq. j. ltoreq.4) of an output signal

Wherein, the neuron h_iAnd neuron v_jThe weight of the connection between is w_vjiNeuron v_jIs biased by b_vjThe activation function is f_vj。

Output signal of neuron z of output layer 140

Wherein, the neuron v_iThe weight of the connection to neuron z is w_ziBias of neuron z is b_zThe activation function is f_z。

It will be appreciated that what has been described above in connection with fig. 1 is merely an example of a neural network. The neural network of various connection relations, weights, biases and/or activation functions can be designed according to actual needs, such as a fully-connected neural network, a convolutional neural network, a recurrent neural network, etc., which is not limited by the present disclosure.

In practical use, a neural network is generally trained by using training data to determine values or specific forms of at least one or more of the parameters of the weight, the bias and the activation function of the neural network. Commonly used activation functions include sigmoid functions, tanh functions, relu functions, and the like.

In making the present disclosure, the inventors have discovered that training a neural network is generally time consuming, and in order to meet the requirements of on-line neural network training, it is desirable to provide a neural network that converges more quickly.

In this regard, embodiments of the present disclosure provide an activation function having a fractional rational function form. The neural network using the activation function with the form of the fractional rational function can be quickly converged during training, and the requirement of on-line neural network training is met.

Fig. 2 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure.

As shown in fig. 2, the neural network 200 is different from the neural network 100 shown in fig. 1 in that the activation function F of at least one neuron has a fractal rational function form.

According to an embodiment of the present disclosure, all activation functions of the neural network may have a fractional rational function form. For example, the activation functions of the neurons of the first hidden layer 120, the second hidden layer 130, and the output layer 140 in fig. 2 may all be in the form of fractional rational functions.

According to an embodiment of the present disclosure, the neural network may have a fractional rational function form of an activation function of only a part of neurons. For example, the activation functions of the neurons of any one or two of the first hidden layer 120, the second hidden layer 130, and the output layer 140 in fig. 2 may have a fractal rational function form, and other neurons may have other forms. Alternatively, the activation functions of any one or more of the neurons in the first hidden layer 120, the second hidden layer 130, and the output layer 140 in fig. 2 may have a fractal rational function form, while the activation functions of other neurons may have other forms, and the plurality of neurons may be distributed in the same or different layers.

According to embodiments of the present disclosure, the activation functions of at least some of the neurons in the neural network may be the same. For example, in a neuron whose activation function is in the form of a fractional rational function, there may be some neurons whose activation functions are the same, and some of the neurons may be distributed in the same or different layers.

According to an embodiment of the present disclosure, the fractional rational function is:

For example, in FIG. 2, any neuron h in the neuron first hidden layer 120_j(1. ltoreq. j. ltoreq.4) results of linear processing

Wherein the neuron u_iAnd neuron h_jThe weight of the connection between is w_hjiNeuron h_jIs biased by b_hj。

Any neuron v in the second hidden layer 130_j(1. ltoreq. j. ltoreq.4) results of linear processing

Wherein, the neuron h_iAnd neuron v_jThe weight of the connection between is w_vjiNeuron v_jIs biased by b_vj。

Linear processing results of neurons z of output layer 140

Wherein, the neuron v_iThe weight of the connection to neuron z is w_ziBias of neuron z is b_z。

According to an embodiment of the present disclosure, oc ═ 1, β ═ 1, γ ═ 1; alternatively, β ═ 2, and γ ═ 1.

According to the embodiment of the disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recursive neural networks.

Fig. 3 shows a block diagram of an electronic device incorporating the neural network described above, according to an embodiment of the present disclosure.

As shown in fig. 3, the electronic device 300 includes the neural network 200 described above. According to an embodiment of the present disclosure, the electronic device 300 may be any one of: computing equipment, terminal equipment and a server.

FIG. 4 shows a flow diagram of a method of training a neural network in accordance with an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the neural network comprises a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form. According to the embodiment of the disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recursive neural networks.

As shown in fig. 4, the method includes steps S401 to S403.

In step S401, training samples are input into the neural network.

In step S402, the training samples are processed by neurons in the neural network to generate an output result, wherein the at least one neuron performs the processing using an activation function having the form of the fractional rational function.

In step S403, parameters of the neural network are adjusted to optimize the output result.

According to the embodiment of the disclosure, the activation function in the form of the fractional rational function is adopted, so that the neural network can be rapidly converged during training, and the requirement of on-line neural network training is met.

According to an embodiment of the present disclosure, the neural network may be used for image classification, the training samples include images, and the output result includes a class to which the images belong. For example, the neural network may be trained using a plurality of known classes of images, and parameters of the neural network may be adjusted to optimize the classification results of the images. For example, the training sample images include images of four categories of cat, dog, cup and hat, and the classification result of the training sample images is made as accurate as possible by training a neural network.

According to an embodiment of the present disclosure, the neural network is used for target detection, the training sample includes an image, and the output result includes a category to which a target included in the image belongs and/or a target frame coordinate of the target. For example, the neural network may be trained using a plurality of images of objects containing known classes and/or object box coordinates, and parameters of the neural network adjusted to optimize the class and/or object box coordinates to which the detected objects belong. For example, the training sample image is an image including an object (e.g., a cat or a dog), and the output results from training the neural network are the class of the object (e.g., whether it is a cat or a dog) and/or the object box coordinates (e.g., the coordinates of the four vertices of a box that substantially encloses the object) in the training sample image. By training the neural network, the detected class to which the target belongs and/or the target frame coordinate are/is made as accurate as possible.

According to the embodiment of the disclosure, the neural network is used for positioning the face feature points, the training samples comprise face images, and the output result comprises the coordinates of the positions of the face feature points. For example, the neural network may be trained using a plurality of face images with known landmark position coordinates, and parameters of the neural network may be adjusted to optimize the located face landmark position coordinates. For example, the training sample image is a face image whose feature points are known. The facial feature points may be, for example, a plurality of predetermined points such as, but not limited to, the corners of the eyes, corners of the mouth, the tip of the nose, the brow tail, and the like. And outputting the position coordinates of the positioned human face feature points. The coordinates of the human face characteristic points obtained by positioning are accurate as much as possible by training the neural network.

According to an embodiment of the present disclosure, the neural network used for image classification, target detection, and face feature point localization may be a convolutional neural network or a fully-connected neural network.

According to an embodiment of the present disclosure, the activation function comprises at least one parameter, and the adjusting a parameter of the neural network comprises adjusting a parameter of the activation function and/or adjusting another parameter of the neural network. For example, the other parameters may include weights and/or biases.

In particular, parameters of the activation function may be fixed at the time of training, e.g. set empirically, while other parameters, such as weights and/or biases, are adjusted based on training data.

Alternatively, other parameters, such as weights and/or biases, may be fixed at the time of training, for example set empirically, while parameters of the activation function are adjusted based on training data.

Alternatively, the parameters of the activation function and other parameters may be adjusted based on the training data.

According to an embodiment of the disclosure, at least some of the neurons use the same activation function, the adjusting the parameters of the activation function comprises adjusting the parameters of the respective activation function of the at least some neurons, or the adjusting the parameters of the activation function comprises adjusting the parameters of the activation function of each neuron separately.

Those skilled in the art can select parameters to be adjusted according to the neural network used and the application scenario thereof, so as to meet the requirements of different training speeds, accuracies, computing resources, storage resources, communication resources, and the like, which is not specifically limited by the present disclosure.

FIG. 5 shows a flow diagram for processing the training samples by neurons in the neural network, according to an embodiment of the disclosure.

As shown in fig. 5, processing the training sample by the neurons in the neural network includes steps S4021 to S4023.

In step S4021, at least one first input signal transmitted to a first one of the neurons in response to the training sample is linearly processed to obtain a first linear processing result.

According to an embodiment of the present disclosure, the first neuron may be any neuron in the neural network other than an input layer neuron. The at least one first input signal transmitted to the first neuron in response to the training sample comprises an input signal generated by a "last hop" neuron of the first neuron and transmitted to the first neuron in response to the training sample.

For example, if the neuron h in FIG. 2 is used_jAs a first neuron, at least one first input signal includes its "last hop" neuron u₁、u₂、u₃An output signal U 'generated in response to the training samples'₁、U’₂、U’₃Then the linear processing may be to signal U'₁、U’₂、U’₃Weighted sum and bias, e.g. neuron h_j(1. ltoreq. j. ltoreq.4) results of linear processing

If it is a neuron v in FIG. 2_jAs a first neuron, at least one first input signal includes its "last hop" neuron h₁、h₂、h₃、h₄An output signal H 'generated in response to the training samples'₁、H’₂、H’₃、H’₄Then the linear processing may be to signal H'₁、H’₂、H’₃、H’₄Weighted sum and bias, e.g. neuron v_j(1. ltoreq. j. ltoreq.4) results of linear processing

If the spirit in figure 2 is usedThe channel element z is used as the first neuron, and the at least one first input signal comprises its "last-hop" neuron v₁、v₂、v₃、v₄An output signal V 'generated in response to the training samples'₁、V’₂、V’₃、V’₄. The linear processing may be to signal V'₁、V’₂、V’₃、V’₄Weighted sum and biased, e.g. linear processing of neuron z

In step S4022, the activation function of the first neuron is applied to the first linear processing result to obtain a first activation processing result.

For example, if the neuron h in FIG. 2 is used_jThe first neuron is H 'as the result of the first activation treatment'_j＝F_hj(x’_hj)，H’_jIs the neuron h_jThe signal output in response to the training sample, which is also neuron h_jA signal transmitted to its "next hop" neuron in response to a training sample.

If it is a neuron v in FIG. 2_jAs the first neuron, the result of the first activation treatment is V'_j＝F_vj(x’_vj)，V’_jI.e. neuron v_jThe signal output in response to the training sample, also neuron v_jA signal transmitted to its "next hop" neuron in response to a training sample.

If neuron z in FIG. 2 is taken as the first neuron, the result of the first activation process is OUT'_j＝F_z(x’_z) And OUT' is the output result of the neural network.

In step S4023, the first activation processing result is output from the first neuron.

As described above, for example, if the spirit in fig. 2 is usedJingyuan h_jAs the first neuron element, the result of the first activation treatment H'_jIs the neuron h_jA signal transmitted to its "next hop" neuron in response to a training sample.

If it is a neuron v in FIG. 2_jAs the first neuron, the result of the first activation treatment V'_jI.e. neuron v_jA signal transmitted to its "next hop" neuron in response to a training sample.

If the neuron z in fig. 2 is used as the first neuron, the first activation processing result OUT' is an output result of the neural network.

According to an embodiment of the present disclosure, the adjusting the parameter of the neural network includes adjusting the parameter of the neural network by any one or a combination of the following: genetic Algorithms (Genetic Algorithms), Genetic Programming (Genetic Programming), Evolution Strategies (Evolution Strategies), Evolution Programming (Evolution Programming), gradient descent optimization Algorithms.

The neural network according to the embodiment of the disclosure adopts the activation function with the form of the fractional rational function, and can be quickly converged during training, so that the neural network is suitable for more complex parameter optimization methods such as genetic algorithm, genetic programming, evolution strategy and evolution programming.

FIG. 6 shows a flow diagram of a method of processing data using a neural network, in accordance with an embodiment of the present disclosure.

As shown in fig. 6, the method includes steps S601 to S603.

In step S601, data to be processed is input to the neural network.

In step S602, processing the data to be processed by a neuron in the neural network to generate a processing result, wherein the at least one neuron performs the processing by using an activation function having the form of the fractional rational function;

in step S603, the processing result is output.

According to the embodiment of the disclosure, the neural network can be used for image classification, the data to be processed comprises an image, and the processing result comprises a category to which the image belongs.

According to the embodiment of the disclosure, the neural network can be used for target detection, the data to be processed comprises an image, and the processing result comprises a class to which a target contained in the image belongs and/or target frame coordinates of the target.

According to the embodiment of the disclosure, the neural network can be used for positioning the face feature points, the data to be processed comprises a face image, and the processing result comprises the coordinates of the positions of the face feature points.

According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function.

FIG. 7 shows a flow diagram for processing the data to be processed by neurons in the neural network according to an embodiment of the disclosure.

As shown in fig. 7, processing the data to be processed by the neurons in the neural network includes steps S6021 to S6023.

In step S6021, at least one second input signal transmitted to a first neuron of the neurons in response to the data to be processed is linearly processed to obtain a second linear processing result.

According to an embodiment of the present disclosure, the first neuron may be any neuron in the neural network other than an input layer neuron. The at least one second input signal transmitted to the first neuron in response to the to-be-processed data comprises an input signal generated by a "last-hop" neuron of the first neuron and transmitted to the first neuron in response to the to-be-processed data.

For example, if the neuron h in FIG. 2 is used_jAs a first neuron, the at least one second input signal includes its "last hop" neuron u₁、u₂、u₃An output signal U generated in response to the data to be processed "₁、U”₂、U”₃Then the linear processing may be to the signal U "₁、U”₂、U”₃Weighted sum and bias, e.g. neuron h_j(1. ltoreq. j. ltoreq.4) results of linear processing

If it is a neuron v in FIG. 2_jAs a first neuron, the at least one second input signal includes its "last hop" neuron h₁、h₂、h₃、h₄An output signal H generated in response to the data to be processed "₁、H”₂、H”₃、H”₄Then the linear processing can be to the signal H "₁、H”₂、H”₃、H”₄Weighted sum and bias, e.g. neuron v_j(1. ltoreq. j. ltoreq.4) results of linear processing

Wherein, the neuron h_iAnd neuron v_jIn betweenThe weight of the connection is w_vjiNeuron v_jIs biased by b_vj。

If neuron z in FIG. 2 is taken as the first neuron, then at least one of the second input signals includes its "last hop" neuron v₁、v₂、v₃、v₄An output signal V generated in response to the data to be processed "₁、V”₂、V”₃、V”₄. The linear processing may be on the signal V "₁、V”₂、V”₃、V”₄Weighted sum and biased, e.g. linear processing of neuron z

In step S6022, the activation function of the first neuron is applied to the second linear processing result to obtain a second activation processing result.

For example, if the neuron h in FIG. 2 is used_jAs the first neuron, the result of the second activation processing is H ″_j＝F_hj(x″_hj)，H”_jIs the neuron h_jThe signals output in response to the data to be processed, also neurons h_jA signal transmitted to its "next hop" neuron in response to data to be processed.

If it is a neuron v in FIG. 2_jAs the first neuron, the result of the second activation processing is V ″_j＝F_vj(x″_vj)，V”_jI.e. neuron v_jThe signals output in response to the data to be processed, also neurons v_jA signal transmitted to its "next hop" neuron in response to data to be processed.

If the neuron z in FIG. 2 is used as the first neuron, the result of the second activation process is OUT ″_j＝F_z(x″_z) And OUT is the processing result of the neural network.

In step S6023, the second activation processing result is output from the first neuron.

As described above, for example, if neuron h in fig. 2 is used_jAs the first neuron, the second activation processing result H "_jIs the neuron h_jA signal transmitted to its "next hop" neuron in response to data to be processed.

If it is a neuron v in FIG. 2_jAs the first neuron, the second activation processing result V "_jI.e. neuron v_jA signal transmitted to its "next hop" neuron in response to data to be processed.

If the neuron z in fig. 2 is used as the first neuron, the second activation processing result OUT ″ is a processing result of the neural network.

Fig. 8 shows a block diagram of an apparatus for training a neural network according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the neural network comprises a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form. The means may be implemented by software, hardware or a combination of both.

As shown in fig. 8, the apparatus 800 for training a neural network includes a first input module 810, a first processing module 820, and an adjusting module 830.

The first input module 810 is configured to input training samples into the neural network.

The first processing module 820 is configured to process the training samples by neurons in the neural network, generating output results, wherein the at least one neuron uses an activation function having the form of the fractional rational function for the processing.

The adjustment module 830 is configured to adjust parameters of the neural network to optimize the output result.

According to an embodiment of the present disclosure, the neural network is used for image classification, the training samples include images, and the output result includes a category to which the images belong; or

According to an embodiment of the present disclosure, the activation function comprises at least one parameter, and the adjusting a parameter of the neural network comprises adjusting a parameter of the activation function and/or adjusting another parameter of the neural network.

According to an embodiment of the disclosure, at least some of the neurons use the same activation function, and the adjusting parameters of the activation function comprises adjusting parameters of respective activation functions of the at least some neurons.

Alternatively, according to an embodiment of the present disclosure, the adjusting the parameters of the activation function includes adjusting the parameters of the activation function of each neuron, respectively.

According to an embodiment of the present disclosure, the adjusting the parameter of the neural network includes adjusting the parameter of the neural network by any one or a combination of the following: genetic algorithm, genetic programming, evolution strategy, evolution programming and gradient descent optimization algorithm.

According to an embodiment of the present disclosure, processing the training samples by neurons in the neural network comprises:

outputting the first activation processing result from the first neuron.

Fig. 9 illustrates a block diagram of an apparatus for processing data using a neural network according to an embodiment of the present disclosure.

As shown in fig. 9, the apparatus 900 for processing data using a neural network includes a second input module 910, a second processing module 920, and an output module 930.

The second input module 910 is configured to input data to be processed into the neural network;

the second processing module 920 is configured to process the data to be processed through neurons in the neural network, and generate a processing result, wherein the at least one neuron performs the processing by using an activation function having the form of the fractional rational function;

the output module 930 is configured to output the processing result.

According to an embodiment of the present disclosure, the neural network is used for image classification, the data to be processed includes an image, and the processing result includes a category to which the image belongs; or

According to an embodiment of the present disclosure, processing the data to be processed by the neurons in the neural network includes:

outputting the second activation processing result from the first neuron.

Fig. 10 shows a block diagram of an electronic device according to an embodiment of the present disclosure.

As shown in fig. 10, the electronic device 1000 includes a memory 1001 and a processor 1002. The memory 1001 is used to store one or more computer instructions.

According to an embodiment of the present disclosure, the one or more computer instructions are executed by the processor 1002 to implement the steps of:

inputting training samples into a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form;

adjusting parameters of the neural network to optimize the output result.

outputting the first activation processing result from the first neuron.

inputting data to be processed into a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form;

and outputting the processing result.

outputting the second activation processing result from the first neuron.

As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU)1101, which can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM)1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the system 1100 are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.

The following components are connected to the I/O interface 1105: an input portion 1106 including a keyboard, mouse, and the like; an output portion 1107 including a signal output unit such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, or the like. The communication section 1109 performs communication processing via a network such as the internet. A driver 1110 is also connected to the I/O interface 1105 as necessary. A removable medium 1111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1110 as necessary, so that a computer program read out therefrom is mounted into the storage section 1108 as necessary.

In particular, the above described methods may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a medium readable thereby, the computer program comprising program code for performing the above-described object class determination method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 1109 and/or installed from the removable medium 1111.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.

As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium contained in the electronic device or the computer system in the above-described embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims

1. A method of training a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having the form of a fractional rational function, the method comprising:

inputting training samples into the neural network;

adjusting parameters of the neural network to optimize an output result.

2. The method of claim 1, wherein:

3. The method of claim 1, wherein the fractional rational function is:

4. The method of claim 3, wherein:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

5. The method of claim 1, wherein:

the activation function includes at least one parameter;

6. The method of claim 5, wherein:

7. The method of claim 1, wherein:

8. The method of claim 1, wherein processing the training samples by neurons in the neural network comprises:

outputting the first activation processing result from the first neuron.

9. A method of processing data using a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method comprising:

inputting data to be processed into the neural network;

and outputting the processing result.

10. The method of claim 9, wherein:

11. The method of claim 9, wherein the fractional rational function is:

12. The method of claim 11, wherein:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

13. The method of claim 12, wherein at least some of the plurality of neurons use the same activation function.

14. The method of claim 9, wherein the neural network is any one or a combination of: convolutional neural networks, fully-connected neural networks, recursive neural networks.

15. The method of claim 9, wherein processing the data to be processed by the neurons in the neural network comprises:

outputting the second activation processing result from the first neuron.

16. An apparatus for training a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:

17. The apparatus of claim 16, wherein:

18. The apparatus of claim 16, wherein the fractional rational function is:

19. The apparatus of claim 18, wherein:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

20. The apparatus of claim 16, wherein:

the activation function includes at least one parameter;

21. The apparatus of claim 20, wherein:

22. The apparatus of claim 16, wherein:

23. The apparatus of claim 16, wherein processing the training samples by neurons in the neural network comprises:

outputting the first activation processing result from the first neuron.

24. An apparatus for processing data using a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:

an output module configured to output the processing result.

25. The apparatus of claim 24, wherein:

26. The apparatus of claim 24, wherein the fractional rational function is:

27. The apparatus of claim 26, wherein:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

28. The apparatus of claim 27, wherein at least some of the plurality of neurons use the same activation function.

29. The apparatus of claim 24, wherein the neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recursive neural networks.

30. The apparatus of claim 24, wherein processing the data to be processed by the neurons in the neural network comprises:

outputting the second activation processing result from the first neuron.

31. An electronic device comprising a processor and a memory, wherein:

the memory is to store one or more computer instructions;

the one or more computer instructions being executable by the processor to implement the method of any one of claims 1-14.

32. A readable storage medium having stored thereon computer instructions, wherein the one or more computer instructions are executable by the processor to implement the method of any one of claims 1-14.

33. A neural network comprising a plurality of neurons, wherein the activation function of at least one of the plurality of neurons has a fractional rational function form.

34. The neural network of claim 33, wherein the fractional rational function is:

35. The neural network of claim 34, wherein:

α ═ 1, β ═ 1, and γ ═ 1; or

∝＝2，β＝2，γ＝1。

36. The neural network of claim 33, wherein at least some of the plurality of neurons use the same activation function.

37. The neural network of claim 33, wherein the neural network is any one or a combination of: convolutional neural networks, fully-connected neural networks, recursive neural networks.

38. An electronic device comprising a neural network as claimed in any one of claims 33 to 37.