CN111832342B - Neural network, training and using method and device, electronic equipment and medium - Google Patents


Info

Publication number
CN111832342B
Authority
CN
China
Prior art keywords: neural network, neuron, activation function, processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910305394.4A
Other languages
Chinese (zh)
Other versions
CN111832342A (en)
Inventor
陈长国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910305394.4A
Publication of CN111832342A
Application granted
Publication of CN111832342B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology


Abstract

Embodiments of the present disclosure disclose a neural network comprising a plurality of neurons, wherein an activation function of at least one of the plurality of neurons has a fractional rational function form. The embodiments of the disclosure also disclose a method for training the neural network, a method for processing data using the neural network, corresponding apparatuses, an electronic device, and a readable storage medium. A neural network whose activation function takes the fractional rational form converges quickly during training, meeting the requirements of online neural network training.

Description

Neural network, training and using method and device, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer application technologies, and in particular, to a neural network, training and using methods, apparatuses, electronic devices, and readable storage media.
Background
An artificial neural network (ANN, hereinafter simply "neural network") abstracts and models the human brain's neural network from an information-processing perspective, forming different networks according to different connection modes. A neural network includes a large number of interconnected nodes (neurons). Each neuron applies a specific output function, called an excitation (activation) function. Each connection between two neurons carries a weight applied to the signal passing through that connection. The neurons of the neural network process the data input into the network based on their weights, activation functions, connection relationships with other neurons, and so on, to obtain the network's output result.
In recent years, research on and application of neural networks have advanced continuously. Neural networks have successfully solved many practical problems that are difficult for modern computers in fields such as pattern recognition, intelligent robotics, automatic control, predictive estimation, biology, medicine, and economics, and have shown good intelligent characteristics.
Disclosure of Invention
To solve the problems in the related art, embodiments of the present disclosure provide a neural network, and training and using methods, apparatuses, electronic devices, and readable storage media.
In a first aspect, embodiments of the present disclosure provide a method of training a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method comprising:
Inputting a training sample into the neural network;
Processing the training samples by the neurons in the neural network to generate an output result, wherein the at least one neuron performs its processing using the activation function in the fractional rational function form;
Parameters of the neural network are adjusted to optimize the output result.
With reference to the first aspect, in a first implementation manner of the first aspect, the present disclosure:
the neural network is used for image classification, the training sample comprises an image, and the output result comprises the category to which the image belongs; or
the neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinates of the target; or
the neural network is used for locating facial feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the facial feature points.
With reference to the first aspect, in a second implementation manner of the first aspect, the fractional rational function is:
wherein α is greater than or equal to 1, β is greater than 0, and γ is greater than 0, the values of α, β, and γ being such that the value of F(x) lies in the interval [-1, 1], and x is the result of the linear processing applied to the input signals transmitted to the neuron that uses the fractional rational activation function.
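The formula itself appears as an image in the original publication and is not reproduced in this text. As an illustrative sketch only, a fractional rational activation consistent with the stated constraints and with the parameter choices given later (α = β = γ = 1 yields the softsign-style function x/(1 + |x|)) would be F(x) = αx/(β|x| + γ); this exact form is an assumption, not a quotation of the patent's formula.

```python
# Hypothetical fractional rational activation, assumed to be of the form
# F(x) = alpha * x / (beta * |x| + gamma).  The form is consistent with
# the constraints in the text (bounded in [-1, 1] for suitable alpha,
# beta, gamma) but is an assumption, since the patent's formula image
# is not reproduced here.
def fractional_rational(x, alpha=1.0, beta=1.0, gamma=1.0):
    return alpha * x / (beta * abs(x) + gamma)

# With alpha = beta = gamma = 1 this reduces to x / (1 + |x|),
# whose values lie strictly inside the interval [-1, 1].
```

Unlike sigmoid or tanh, this form involves no exponentials, which is one plausible reason such an activation would be cheap to evaluate during training.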
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the present disclosure:
α = 1, β = 1, γ = 1; or
α = 2, β = 2, γ = 1.
With reference to the first aspect, in a fourth implementation manner of the first aspect, the present disclosure:
the activation function includes at least one parameter;
the adjusting of the parameters of the neural network includes adjusting parameters of the activation function and/or adjusting other parameters of the neural network.
With reference to the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the present disclosure:
at least some of the plurality of neurons use the same activation function, and the adjusting of the parameters of the activation function comprises adjusting the parameters of the respective activation functions of the at least some neurons; or
the adjusting of the parameters of the activation function comprises adjusting the parameters of the activation function of each neuron separately.
With reference to the first aspect, in a sixth implementation manner of the first aspect, the present disclosure:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks; and/or
The adjusting of the parameters of the neural network includes adjusting the parameters of the neural network by any one or a combination of the following: genetic algorithms, genetic programming, evolution strategies, evolutionary programming, gradient descent optimization algorithms.
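As a minimal sketch of the last listed option, gradient descent, the following trains a single neuron on one (input, target) pair. The activation x/(1 + |x|) is an assumed fractional rational form (the patent's formula image is not reproduced in this text), and a real training run would iterate over many samples and adjust all weights and biases of the network.

```python
# Minimal gradient-descent sketch for one neuron with an assumed
# fractional rational activation f(x) = x / (1 + |x|).
def f(x):
    return x / (1 + abs(x))

def f_prime(x):
    # Derivative of x / (1 + |x|) is 1 / (1 + |x|)**2.
    return 1.0 / (1 + abs(x)) ** 2

def train_step(w, b, inp, target, lr=0.1):
    x = w * inp + b                     # linear processing of the input
    y = f(x)                            # activation
    grad = (y - target) * f_prime(x)    # gradient of 0.5 * (y - target)**2
    return w - lr * grad * inp, b - lr * grad

w, b = 0.0, 0.0
for _ in range(200):
    w, b = train_step(w, b, inp=1.0, target=0.5)
```

After these 200 steps the neuron's output for input 1.0 sits close to the target 0.5, illustrating the "adjust parameters to optimize the output result" step of the method.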
With reference to the first aspect, in a seventh implementation manner of the first aspect, the processing, by a neuron in the neural network, the training sample includes:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
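The three steps just listed (linear processing of the input signals, applying the activation function, outputting the result) can be sketched for a single neuron as follows; the activation x/(1 + |x|) is an assumed fractional rational form, and the weights and bias in the usage line are illustrative placeholders.

```python
# One neuron's processing: weighted sum of the input signals plus bias
# (the linear processing), then an activation function applied to the
# result.  The activation x / (1 + |x|) is an assumed fractional
# rational form.
def neuron_forward(inputs, weights, bias):
    x = sum(w * s for w, s in zip(weights, inputs)) + bias
    return x / (1 + abs(x))

out = neuron_forward([1.0, 2.0], weights=[0.5, 0.5], bias=0.0)
# x = 1.5, so the neuron outputs 1.5 / 2.5 = 0.6
```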
In a second aspect, in an embodiment of the present disclosure, there is provided a method of processing data using a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method comprising:
Inputting data to be processed into the neural network;
Processing the data to be processed by the neurons in the neural network to generate a processing result, wherein the at least one neuron performs its processing using the activation function in the fractional rational function form;
And outputting the processing result.
With reference to the second aspect, in a first implementation manner of the second aspect, the present disclosure:
the neural network is used for image classification, the data to be processed comprises an image, and the processing result comprises the category to which the image belongs; or
the neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category of a target contained in the image and/or the target frame coordinates of the target; or
the neural network is used for locating facial feature points, the data to be processed comprises a face image, and the processing result comprises the position coordinates of the facial feature points.
With reference to the second aspect, in a second implementation manner of the second aspect, the fractional rational function is:
wherein α is greater than or equal to 1, β is greater than 0, and γ is greater than 0, the values of α, β, and γ being such that the value of F(x) lies in the interval [-1, 1], and x is the result of the linear processing applied to the input signals transmitted to the neuron that uses the fractional rational activation function.
With reference to the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the present disclosure:
α = 1, β = 1, γ = 1; or
α = 2, β = 2, γ = 1.
With reference to the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect:
at least some of the plurality of neurons use the same activation function.
With reference to the second aspect, in a fifth implementation manner of the second aspect, the present disclosure:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
With reference to the second aspect, in a sixth implementation manner of the second aspect, the present disclosure:
Processing the data to be processed by neurons in the neural network, including:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
In a third aspect, in an embodiment of the present disclosure, there is provided an apparatus for training a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:
A first input module configured to input training samples into the neural network;
A first processing module configured to process the training samples by the neurons in the neural network to generate an output result, wherein the at least one neuron performs its processing using the activation function in the fractional rational function form;
And the adjustment module is configured to adjust parameters of the neural network to optimize the output result.
With reference to the third aspect, in a first implementation manner of the third aspect, the present disclosure:
the neural network is used for image classification, the training sample comprises an image, and the output result comprises the category to which the image belongs; or
the neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinates of the target; or
the neural network is used for locating facial feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the facial feature points.
With reference to the third aspect, in a second implementation manner of the third aspect, the fractional rational function is:
wherein α is greater than or equal to 1, β is greater than 0, and γ is greater than 0, the values of α, β, and γ being such that the value of F(x) lies in the interval [-1, 1], and x is the result of the linear processing applied to the input signals transmitted to the neuron that uses the fractional rational activation function.
With reference to the second implementation manner of the third aspect, in a third implementation manner of the third aspect, the present disclosure:
α = 1, β = 1, γ = 1; or
α = 2, β = 2, γ = 1.
With reference to the third aspect, in a fourth implementation manner of the third aspect, the present disclosure:
the activation function includes at least one parameter;
the adjusting of the parameters of the neural network includes adjusting parameters of the activation function and/or adjusting other parameters of the neural network.
With reference to the fourth implementation manner of the third aspect, the present disclosure is in a fifth implementation manner of the third aspect:
at least some of the plurality of neurons use the same activation function, and the adjusting of the parameters of the activation function comprises adjusting the parameters of the respective activation functions of the at least some neurons; or
the adjusting of the parameters of the activation function comprises adjusting the parameters of the activation function of each neuron separately.
With reference to the third aspect, in a sixth implementation manner of the third aspect, the present disclosure:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks; and/or
The adjusting of the parameters of the neural network includes adjusting the parameters of the neural network by any one or a combination of the following: genetic algorithms, genetic programming, evolution strategies, evolutionary programming, gradient descent optimization algorithms.
With reference to the third aspect, in a seventh implementation manner of the third aspect, the processing, by a neuron in the neural network, the training sample includes:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
In a fourth aspect, in an embodiment of the present disclosure, there is provided an apparatus for processing data using a neural network, the neural network including a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:
a second input module configured to input data to be processed into the neural network;
A second processing module configured to process the data to be processed by the neurons in the neural network to generate a processing result, wherein the at least one neuron performs its processing using the activation function in the fractional rational function form;
and the output module is configured to output the processing result.
With reference to the fourth aspect, in a first implementation manner of the fourth aspect, the present disclosure:
the neural network is used for image classification, the data to be processed comprises an image, and the processing result comprises the category to which the image belongs; or
the neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category of a target contained in the image and/or the target frame coordinates of the target; or
the neural network is used for locating facial feature points, the data to be processed comprises a face image, and the processing result comprises the position coordinates of the facial feature points.
With reference to the fourth aspect, in a second implementation manner of the fourth aspect, the fractional rational function is:
wherein α is greater than or equal to 1, β is greater than 0, and γ is greater than 0, the values of α, β, and γ being such that the value of F(x) lies in the interval [-1, 1], and x is the result of the linear processing applied to the input signals transmitted to the neuron that uses the fractional rational activation function.
With reference to the second implementation manner of the fourth aspect, in a third implementation manner of the fourth aspect, the present disclosure:
α = 1, β = 1, γ = 1; or
α = 2, β = 2, γ = 1.
With reference to the third implementation manner of the fourth aspect, in a fourth implementation manner of the fourth aspect, the present disclosure:
at least some of the plurality of neurons use the same activation function.
With reference to the fourth aspect, in a fifth implementation manner of the fourth aspect, the present disclosure:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
With reference to the fourth aspect, in a sixth implementation manner of the fourth aspect, the present disclosure:
Processing the data to be processed by neurons in the neural network, including:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory, wherein:
the memory is used for storing one or more computer instructions;
The one or more computer instructions are executed by the processor to implement the method according to any one of the implementations of the first and second aspects.
In a sixth aspect, embodiments of the present disclosure provide a readable storage medium having computer instructions stored thereon, wherein the computer instructions are executed by a processor to implement the method according to any one of the implementations of the first and second aspects.
In a seventh aspect, embodiments of the present disclosure provide a neural network including a plurality of neurons, wherein an activation function of at least one of the plurality of neurons has a fractional rational function form.
With reference to the seventh aspect, in a first implementation manner of the seventh aspect, the present disclosure:
The fractional rational function is:
wherein α is greater than or equal to 1, β is greater than 0, and γ is greater than 0, the values of α, β, and γ being such that the value of F(x) lies in the interval [-1, 1], and x is the result of the linear processing applied to the input signals transmitted to the neuron that uses the fractional rational activation function.
With reference to the first implementation manner of the seventh aspect, in a second implementation manner of the seventh aspect, the present disclosure:
α = 1, β = 1, γ = 1; or
α = 2, β = 2, γ = 1.
With reference to the seventh aspect, in a third implementation manner of the seventh aspect, the present disclosure:
at least some of the plurality of neurons use the same activation function.
With reference to the seventh aspect, in a fourth implementation manner of the seventh aspect, the present disclosure:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
In an eighth aspect, an embodiment of the disclosure provides an electronic device including a neural network according to the seventh aspect or any implementation manner thereof.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary neural network;
FIG. 2 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure;
FIG. 3 illustrates a block diagram of an electronic device incorporating the neural network described above, according to an embodiment of the present disclosure;
FIG. 4 illustrates a flow chart of a method of training a neural network according to an embodiment of the present disclosure;
FIG. 5 illustrates a flow chart of processing the training samples through neurons in the neural network, according to an embodiment of the disclosure;
FIG. 6 illustrates a flow chart of a method of processing data using a neural network, according to an embodiment of the present disclosure;
FIG. 7 illustrates a flow chart of processing the data to be processed by neurons in the neural network, according to an embodiment of the disclosure;
FIG. 8 shows a block diagram of an apparatus for training a neural network, according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of an apparatus for processing data using a neural network, according to an embodiment of the disclosure;
FIG. 10 shows a block diagram of an electronic device according to an embodiment of the disclosure;
FIG. 11 shows a schematic diagram of a computer system suitable for implementing a method of training a neural network and/or a method of processing data using a neural network according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. In addition, for the sake of clarity, portions irrelevant to description of the exemplary embodiments are omitted in the drawings.
In this disclosure, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of features, numbers, steps, acts, components, portions, or combinations thereof disclosed in this specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, acts, components, portions, or combinations thereof are present or added.
In addition, it should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a schematic structural diagram of an exemplary neural network.
As shown in Fig. 1, the exemplary neural network 100 includes an input layer 110, a first hidden layer 120, a second hidden layer 130, and an output layer 140. The input layer 110 includes neurons u1, u2, and u3; the first hidden layer 120 includes neurons h1, h2, h3, and h4; the second hidden layer 130 includes neurons v1, v2, v3, and v4; and the output layer 140 includes neuron z.
The output signals of neurons u1, u2, and u3 are signals U1, U2, and U3 respectively; the output signals of neurons h1, h2, h3, and h4 are signals H1, H2, H3, and H4 respectively; the output signals of neurons v1, v2, v3, and v4 are signals V1, V2, V3, and V4 respectively; and the output signal of neuron z is signal OUT.
As shown in Fig. 1, input data IN enters the neural network 100 through the input layer 110, and neurons u1, u2, and u3 output signals U1, U2, and U3 respectively. Signals U1, U2, and U3 are transmitted through the neurons of the first hidden layer 120, the second hidden layer 130, and the output layer 140, each of which performs its corresponding processing, to obtain the output result OUT.
The processing of the signal by the neuron will be described below by taking the neuron h 1 as an example.
As shown in Fig. 1, the connection between neuron u1 and neuron h1 has weight w_h11, the connection between neuron u2 and neuron h1 has weight w_h12, and the connection between neuron u3 and neuron h1 has weight w_h13. Neuron h1 has bias b_h1 and activation function f_h1.
The signals transmitted from neurons u1, u2, and u3 to neuron h1 are signals U1, U2, and U3 respectively. Neuron h1 computes the weighted sum of signals U1, U2, and U3, adds the bias b_h1, and then applies the activation function f_h1 to the result to obtain the output signal H1 = f_h1(w_h11·U1 + w_h12·U2 + w_h13·U3 + b_h1).
Similarly, the output signal of any neuron hj (1 ≤ j ≤ 4) in the first hidden layer 120 is Hj = f_hj(Σi w_hji·Ui + b_hj), where the connection between neuron ui and neuron hj has weight w_hji, the bias of neuron hj is b_hj, and its activation function is f_hj.
The output signal of any neuron vj (1 ≤ j ≤ 4) in the second hidden layer 130 is Vj = f_vj(Σi w_vji·Hi + b_vj), where the connection between neuron hi and neuron vj has weight w_vji, the bias of neuron vj is b_vj, and its activation function is f_vj.
The output signal of neuron z of the output layer 140 is OUT = f_z(Σi w_zi·Vi + b_z), where the connection between neuron vi and neuron z has weight w_zi, the bias of neuron z is b_z, and its activation function is f_z.
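The layer-by-layer computation described above, for the 3-4-4-1 structure of Fig. 1, can be sketched as follows. The weights here are random placeholders, the biases are zero, and the single shared activation x/(1 + |x|) is an assumed fractional rational form standing in for the figure's generic f_hj, f_vj, and f_z.

```python
import random

def f(x):
    # Assumed fractional rational activation; values lie in (-1, 1).
    return x / (1 + abs(x))

def layer(inputs, weights, biases):
    # Each neuron: weighted sum of its inputs, plus bias, then activation.
    return [f(sum(w * s for w, s in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def random_layer(n_out, n_in, rng):
    # Placeholder weights in [-1, 1] and zero biases, for illustration.
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
w1, b1 = random_layer(4, 3, rng)   # first hidden layer (h1..h4)
w2, b2 = random_layer(4, 4, rng)   # second hidden layer (v1..v4)
w3, b3 = random_layer(1, 4, rng)   # output layer (neuron z)

signals_u = [0.5, -0.2, 0.9]       # outputs U1, U2, U3 of the input layer
signals_h = layer(signals_u, w1, b1)
signals_v = layer(signals_h, w2, b2)
out = layer(signals_v, w3, b3)[0]  # the network's output signal OUT
```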
It will be appreciated that the above description in connection with fig. 1 is merely an example of a neural network. Various connection relationships, weights, offsets, and/or neural networks that activate functions, such as fully connected neural networks, convolutional neural networks, recurrent neural networks, etc., may be designed according to actual needs, and this disclosure is not limited in this regard.
In practical use, a neural network is generally first trained on training data to determine the values or specific forms of one or more of its parameters, such as the weights, biases, and activation functions. Common activation functions include the sigmoid, tanh, and ReLU functions.
In making the present disclosure, the inventors found that training a neural network is generally time consuming; to meet the needs of online neural network training, a neural network that converges more rapidly is desirable.
In view of this, embodiments of the present disclosure propose an activation function having a fractional rational function form. A neural network whose activation function takes this form converges quickly during training, meeting the requirements of online neural network training.
Fig. 2 shows a schematic structural diagram of a neural network according to an embodiment of the present disclosure.
As shown in fig. 2, the neural network 200 differs from the neural network 100 shown in fig. 1 in that the activation function F of at least one neuron has a fractional rational function form.
According to embodiments of the present disclosure, all activation functions of the neural network may have the fractional rational function form. For example, the activation functions of the neurons of the first hidden layer 120, the second hidden layer 130, and the output layer 140 in Fig. 2 may all take the fractional rational form.
According to embodiments of the present disclosure, only some of the neurons' activation functions may have the fractional rational form. For example, the activation functions of the neurons of any one or two of the first hidden layer 120, the second hidden layer 130, and the output layer 140 in Fig. 2 may have the fractional rational form while the other neurons' activation functions take other forms. Alternatively, the activation functions of any one or more individual neurons in these layers may have the fractional rational form while those of the other neurons take other forms; the neurons concerned may be distributed in the same layer or in different layers.
According to embodiments of the present disclosure, the activation functions of at least some neurons in the neural network may be identical. For example, among the neurons whose activation functions take the fractional rational form, some may share the same activation function; these neurons may be distributed in the same layer or in different layers.
According to an embodiment of the present disclosure, the split rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
For example, in FIG. 2, the linear processing result of any neuron h j (1 ≤ j ≤ 4) in the first hidden layer 120 is x hj = Σ i w hji u i + b hj (summing over i = 1, 2, 3), where the connection between neuron u i and neuron h j has weight w hji and the bias of neuron h j is b hj.
The linear processing result of any neuron v j (1 ≤ j ≤ 4) in the second hidden layer 130 is x vj = Σ i w vji h i + b vj (summing over i = 1, ..., 4), where the connection between neuron h i and neuron v j has weight w vji and the bias of neuron v j is b vj.
The linear processing result of the neuron z of the output layer 140 is x z = Σ i w zi v i + b z (summing over i = 1, ..., 4), where the connection between neuron v i and neuron z has weight w zi and the bias of neuron z is b z.
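The three linear-processing steps described above can be sketched in a few lines of NumPy. The layer sizes (3-4-4-1) and the weight/bias names (w hji, b hj, etc.) follow fig. 2; the bounded activation `act` is only an illustrative stand-in for the disclosure's fractional rational function, whose exact formula is given by the equation above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from fig. 2: 3 input neurons u, two hidden layers of 4
# neurons (h and v), and a single output neuron z.
w_h, b_h = rng.normal(size=(4, 3)), np.zeros(4)  # weights w_hji, biases b_hj
w_v, b_v = rng.normal(size=(4, 4)), np.zeros(4)  # weights w_vji, biases b_vj
w_z, b_z = rng.normal(size=(1, 4)), np.zeros(1)  # weights w_zi, bias b_z

def linear(w, b, signals):
    # x_j = sum_i w_ji * signal_i + b_j  (weighted sum plus bias)
    return w @ signals + b

def act(x):
    # Illustrative rational activation, bounded in (-1, 1); stand-in only.
    return x / (1.0 + np.abs(x))

u = np.array([0.5, -1.0, 2.0])    # signals from the input layer
x_h = linear(w_h, b_h, u)         # linear result of each neuron h_j
x_v = linear(w_v, b_v, act(x_h))  # linear result of each neuron v_j
x_z = linear(w_z, b_z, act(x_v))  # linear result of the output neuron z
out = act(x_z)                    # network output
```

Each layer is thus just a weighted sum plus a bias, followed by the activation of the receiving neuron.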
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
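The disclosure's exact parameterized formula appears only as an image in the original patent and is not reproduced in text form here. As a hedged illustration of the family, the softsign function x/(1+|x|) is one well-known fractional rational function that satisfies the stated bound F(x) ∈ [-1, 1]:

```python
import numpy as np

def softsign(x):
    """x / (1 + |x|): a fractional rational function that is smooth,
    monotonic, and bounded in (-1, 1)."""
    x = np.asarray(x, dtype=float)
    return x / (1.0 + np.abs(x))

def softsign_grad(x):
    """Derivative 1 / (1 + |x|)^2: strictly positive everywhere, so
    gradients never vanish completely, which aids convergence."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs(x)) ** 2
```

For example, softsign(0) = 0, softsign(1) = 0.5, and the output approaches ±1 as x grows large in magnitude, while remaining strictly inside the interval.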
According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
Fig. 3 shows a block diagram of an electronic device incorporating the neural network described above, according to an embodiment of the disclosure.
As shown in fig. 3, the electronic device 300 includes the neural network 200 described above. According to an embodiment of the present disclosure, the electronic device 300 may be any of the following: computing equipment, terminal equipment and a server.
Fig. 4 shows a flowchart of a method of training a neural network, according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the neural network includes a plurality of neurons, and an activation function of at least one of the neurons has a fractional rational function form. According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
As shown in fig. 4, the method includes steps S401 to S403.
In step S401, a training sample is input to the neural network.
In step S402, the training samples are processed by neurons in the neural network, generating output results, wherein the at least one neuron performs the processing using an activation function in the form of the fractional rational function.
In step S403, parameters of the neural network are adjusted to optimize the output result.
According to the embodiments of the disclosure, adopting an activation function in fractional rational function form allows the neural network to converge quickly during training, meeting the requirements of online neural network training.
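As a minimal sketch of steps S401 to S403 (input a sample, process it through the neurons, adjust the parameters), the loop below trains a single linear-plus-activation neuron by gradient descent on synthetic data. The activation is an assumed softsign-like rational function, not the disclosure's exact formula, and all names are illustrative.

```python
import numpy as np

def act(x):              # assumed rational activation (illustrative)
    return x / (1.0 + np.abs(x))

def act_grad(x):         # its derivative, 1 / (1 + |x|)^2
    return 1.0 / (1.0 + np.abs(x)) ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))                   # S401: training samples
y = act(X @ np.array([0.5, -1.0, 2.0]))        # synthetic targets
w, b, lr = rng.normal(size=3) * 0.1, 0.0, 0.5

init_loss = float(((act(X @ w + b) - y) ** 2).mean())
for _ in range(500):
    z = X @ w + b                              # linear processing
    out = act(z)                               # S402: activation processing
    grad_z = (out - y) * act_grad(z)           # backprop through activation
    w -= lr * X.T @ grad_z / len(X)            # S403: adjust parameters
    b -= lr * grad_z.mean()
final_loss = float(((act(X @ w + b) - y) ** 2).mean())
```

Running the loop drives the mean squared error down from its initial value, i.e., the output result is optimized in the sense of step S403.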
According to an embodiment of the disclosure, the neural network may be used for image classification, the training sample comprises an image, and the output result comprises the category to which the image belongs. For example, the neural network may be trained using a plurality of images of known categories, and the parameters of the neural network are adjusted to optimize the classification results. For example, the training sample images may include images of four categories (cats, dogs, cups, and hats), and the neural network is trained so that the classification of the training sample images is as accurate as possible.
According to an embodiment of the disclosure, the neural network may be used for target detection, the training sample comprises an image, and the output result comprises the category to which a target contained in the image belongs and/or the target frame coordinates of the target. For example, the neural network may be trained using a plurality of images containing targets of known categories and/or known target frame coordinates, and the parameters of the neural network are adjusted to optimize the detected categories and/or target frame coordinates. For example, the training sample image is an image containing a target (e.g., a cat or a dog), and the output results are the category of the target (e.g., cat or dog) and/or its target frame coordinates (e.g., the coordinates of the four vertices of a box that substantially encloses the target) in the training sample image. Training the neural network makes the detected category and/or target frame coordinates as accurate as possible.
According to an embodiment of the disclosure, the neural network may be used for locating facial feature points, the training sample comprises a facial image, and the output result comprises the position coordinates of the facial feature points. For example, the neural network may be trained using facial images with known feature point position coordinates, and the parameters of the neural network are adjusted to optimize the located coordinates. For example, the training sample image is a facial image whose feature points are known. The facial feature points may be a plurality of preset points, such as, but not limited to, the eye corners, mouth corners, nose tip, eyebrow heads, and eyebrow tails. The output result is the located position coordinates of the facial feature points. Training the neural network makes the located facial feature point coordinates as accurate as possible.
According to embodiments of the present disclosure, the neural network for image classification, object detection, and face feature point localization may be a convolutional neural network or a fully-connected neural network.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, the activation function includes at least one parameter, and the adjusting the parameter of the neural network includes adjusting the parameter of the activation function and/or adjusting other parameters of the neural network. For example, the other parameters may include weights and/or biases.
In particular, the parameters of the activation function may be fixed during training (e.g., set empirically), while the other parameters, such as the weights and/or biases, are adjusted based on the training data.
Alternatively, the other parameters, such as the weights and/or biases, may be fixed during training (e.g., set empirically), while the parameters of the activation function are adjusted based on the training data.
Alternatively, both the parameters of the activation function and the other parameters may be adjusted based on the training data.
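The three alternatives above amount to choosing which subset of parameters the optimizer may update. A minimal sketch, with all parameter names illustrative rather than taken from the patent:

```python
# Parameters of a single neuron: activation-function parameters plus
# "other" parameters (weights and bias). Names are illustrative only.
params = {
    "alpha": 1.0, "beta": 1.0, "gamma": 1.0,   # activation-function parameters
    "weights": [0.1, -0.2, 0.3], "bias": 0.0,  # other parameters
}

TRAINABLE = {
    "fix_activation":   {"weights", "bias"},         # alternative 1
    "fix_others":       {"alpha", "beta", "gamma"},  # alternative 2
    "train_everything": set(params),                 # alternative 3
}

def trainable(config):
    """Return the names of parameters the optimizer may update."""
    return TRAINABLE[config]
```

An optimizer would then update only the parameters returned by `trainable(...)` and leave the rest at their empirically chosen values.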
According to an embodiment of the present disclosure, when at least some of the plurality of neurons use the same activation function, adjusting the parameters of the activation function may comprise jointly adjusting the parameters of the activation function shared by those neurons, or may comprise separately adjusting the parameters of the activation function of each neuron.
The parameters to be adjusted can be selected by a person skilled in the art according to the neural network used and the application scenario thereof, so as to meet the requirements of different training speeds, precision, computing resources, storage resources, communication resources and the like, and the present disclosure is not limited in detail.
Fig. 5 illustrates a flow chart of processing the training samples through neurons in the neural network, according to an embodiment of the disclosure.
As shown in fig. 5, the training samples are processed by neurons in the neural network, including steps S4021-S4023.
In step S4021, at least one first input signal transmitted to a first neuron of the neurons in response to the training sample is linearly processed to obtain a first linear processing result.
According to an embodiment of the present disclosure, the first neuron may be any neuron in the neural network other than an input-layer neuron. The at least one first input signal transmitted to the first neuron in response to the training sample comprises the input signals generated by the "previous-hop" neurons of the first neuron in response to the training sample and transmitted to the first neuron.
For example, if neuron h j in FIG. 2 is used as the first neuron, the at least one first input signal comprises the output signals U' 1, U' 2, U' 3 generated by its "previous-hop" neurons u 1, u 2, u 3 in response to the training sample. The linear processing may be a weighted sum plus a bias of the signals U' 1, U' 2, U' 3, e.g., the linear processing result of neuron h j (1 ≤ j ≤ 4) is x' hj = Σ i w hji U' i + b hj, where the connection between neuron u i and neuron h j has weight w hji and the bias of neuron h j is b hj.
If neuron v j in FIG. 2 is used as the first neuron, the at least one first input signal comprises the output signals H' 1, H' 2, H' 3, H' 4 generated by its "previous-hop" neurons h 1, h 2, h 3, h 4 in response to the training sample. The linear processing may be a weighted sum plus a bias of these signals, e.g., the linear processing result of neuron v j (1 ≤ j ≤ 4) is x' vj = Σ i w vji H' i + b vj, where the connection between neuron h i and neuron v j has weight w vji and the bias of neuron v j is b vj.
If neuron z in FIG. 2 is used as the first neuron, the at least one first input signal comprises the output signals V' 1, V' 2, V' 3, V' 4 generated by its "previous-hop" neurons v 1, v 2, v 3, v 4 in response to the training sample. The linear processing may be a weighted sum plus a bias of these signals, e.g., the linear processing result of neuron z is x' z = Σ i w zi V' i + b z, where the connection between neuron v i and neuron z has weight w zi and the bias of neuron z is b z.
In step S4022, an activation function of the first neuron is applied to the first linear processing result, and a first activation processing result is obtained.
For example, if neuron h j in fig. 2 is used as the first neuron, the first activation processing result is H' j = F hj(x' hj), where H' j is the signal that neuron h j outputs in response to the training sample and is also the signal that neuron h j transmits to its "next-hop" neurons in response to the training sample.
If neuron v j in fig. 2 is used as the first neuron, the first activation processing result is V' j = F vj(x' vj), where V' j is the signal that neuron v j outputs in response to the training sample and is also the signal that neuron v j transmits to its "next-hop" neurons in response to the training sample.
If the neuron z in fig. 2 is taken as the first neuron, the first activation processing result is OUT' = F z(x' z), and OUT' is the output result of the neural network.
In step S4023, the first activation processing result is output from the first neuron.
As described above, if, for example, neuron H j in fig. 2 is the first neuron, then the first activation process result H' j is the signal that neuron H j transmits to its "next-hop" neuron in response to the training sample.
If neuron V j in FIG. 2 is used as the first neuron, then the first activation process result V' j is the signal that neuron V j transmits to its "next hop" neuron in response to the training sample.
If the neuron z in fig. 2 is taken as the first neuron, the first activation processing result OUT' is the output result of the neural network.
According to an embodiment of the present disclosure, the adjusting the parameter of the neural network includes adjusting the parameter of the neural network by any one or a combination of several of: genetic algorithm (Genetic Algorithms), genetic programming (Genetic Programming), evolutionary strategy (Evolution Strategies), evolutionary programming (Evolution Programming), gradient descent optimization algorithm.
Because the neural network according to the embodiments of the disclosure adopts an activation function in fractional rational function form and can converge quickly during training, it is also suitable for relatively complex parameter optimization methods such as genetic algorithms, genetic programming, evolution strategies, and evolutionary programming.
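As a hedged sketch of one such method, the simple (1+4) evolution strategy below tunes three activation parameters of an assumed rational form a·x / (b + c·|x|) toward a target response. The functional form and all names are illustrative assumptions, not the patent's formula.

```python
import numpy as np

def act(x, a, b, c):
    # Assumed rational activation a*x / (b + c*|x|); illustrative only.
    return a * x / (b + c * np.abs(x))

rng = np.random.default_rng(2)
X = rng.normal(size=(128,))
target = act(X, 1.0, 1.0, 1.0)          # response of the "ideal" parameters

def fitness(p):
    # Negative mean squared error against the target: higher is better.
    return -float(((act(X, *p) - target) ** 2).mean())

best = np.array([2.0, 2.0, 1.0])        # start from a different setting
start_fitness = fitness(best)
for _ in range(200):                    # (1+4) evolution strategy
    children = best + rng.normal(scale=0.1, size=(4, 3))
    for child in children:
        if fitness(child) > fitness(best):
            best = child                # keep the fitter parameter vector
```

Mutation plus selection needs only fitness evaluations, no gradients, which is why such derivative-free methods pair naturally with a fast-converging network.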
Fig. 6 shows a flowchart of a method of processing data using a neural network, according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the neural network includes a plurality of neurons, and an activation function of at least one of the neurons has a fractional rational function form. According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
As shown in fig. 6, the method includes steps S601 to S603.
In step S601, data to be processed is input to the neural network.
In step S602, the data to be processed is processed by the neurons in the neural network, generating a processing result, wherein the at least one neuron performs the processing using the activation function in the form of the fractional rational function.
In step S603, the processing result is output.
According to an embodiment of the disclosure, the neural network may be used for image classification, the data to be processed includes an image, and the processing result includes a category to which the image belongs.
According to an embodiment of the disclosure, the neural network may be used for target detection, the data to be processed includes an image, and the processing result includes a category to which a target included in the image belongs and/or a target frame coordinate of the target.
According to the embodiment of the disclosure, the neural network may be used for positioning the face feature points, the data to be processed includes a face image, and the processing result includes position coordinates of the face feature points.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function.
Fig. 7 shows a flowchart of processing the data to be processed by neurons in the neural network, according to an embodiment of the disclosure.
As shown in fig. 7, the data to be processed is processed by neurons in the neural network, including steps S6021 to S6023.
At step S6021, at least one second input signal transmitted to a first neuron among the neurons in response to the data to be processed is subjected to linear processing to obtain a second linear processing result.
According to an embodiment of the present disclosure, the first neuron may be any neuron in the neural network other than an input-layer neuron. The at least one second input signal transmitted to the first neuron in response to the data to be processed comprises the input signals generated by the "previous-hop" neurons of the first neuron in response to the data to be processed and transmitted to the first neuron.
For example, if neuron h j in FIG. 2 is used as the first neuron, the at least one second input signal comprises the output signals U″ 1, U″ 2, U″ 3 generated by its "previous-hop" neurons u 1, u 2, u 3 in response to the data to be processed. The linear processing may be a weighted sum plus a bias of the signals U″ 1, U″ 2, U″ 3, e.g., the linear processing result of neuron h j (1 ≤ j ≤ 4) is x″ hj = Σ i w hji U″ i + b hj, where the connection between neuron u i and neuron h j has weight w hji and the bias of neuron h j is b hj.
If neuron v j in FIG. 2 is used as the first neuron, the at least one second input signal comprises the output signals H″ 1, H″ 2, H″ 3, H″ 4 generated by its "previous-hop" neurons h 1, h 2, h 3, h 4 in response to the data to be processed. The linear processing may be a weighted sum plus a bias of these signals, e.g., the linear processing result of neuron v j (1 ≤ j ≤ 4) is x″ vj = Σ i w vji H″ i + b vj, where the connection between neuron h i and neuron v j has weight w vji and the bias of neuron v j is b vj.
If neuron z in FIG. 2 is taken as the first neuron, the at least one second input signal comprises the output signals V″ 1, V″ 2, V″ 3, V″ 4 generated by its "previous-hop" neurons v 1, v 2, v 3, v 4 in response to the data to be processed. The linear processing may be a weighted sum plus a bias of these signals, e.g., the linear processing result of neuron z is x″ z = Σ i w zi V″ i + b z, where the connection between neuron v i and neuron z has weight w zi and the bias of neuron z is b z.
In step S6022, the activation function of the first neuron is applied to the second linear processing result, and a second activation processing result is obtained.
For example, if neuron h j in fig. 2 is used as the first neuron, the second activation processing result is H″ j = F hj(x″ hj), where H″ j is the signal that neuron h j outputs in response to the data to be processed and is also the signal that neuron h j transmits to its "next-hop" neurons in response to the data to be processed.
If neuron v j in fig. 2 is used as the first neuron, the second activation processing result is V″ j = F vj(x″ vj), where V″ j is the signal that neuron v j outputs in response to the data to be processed and is also the signal that neuron v j transmits to its "next-hop" neurons in response to the data to be processed.
If the neuron z in fig. 2 is taken as the first neuron, the second activation processing result is OUT″ = F z(x″ z), and OUT″ is the processing result of the neural network.
In step S6023, the second activation processing result is output from the first neuron.
As described above, for example, if the neuron H j in fig. 2 is taken as the first neuron, the second activation processing result H "j is a signal that the neuron H j transmits to its" next-hop "neuron in response to the data to be processed.
If neuron V j in FIG. 2 is used as the first neuron, then the second activation processing result V "j is the signal that neuron V j transmits to its" next hop "neuron in response to the data to be processed.
If the neuron z in fig. 2 is taken as the first neuron, the second activation processing result OUT "is the processing result of the neural network.
Fig. 8 shows a block diagram of a structure of an apparatus for training a neural network according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the neural network comprises a plurality of neurons, and an activation function of at least one of the neurons has a fractional rational function form. The apparatus may be implemented in software, hardware or a combination of both.
As shown in fig. 8, the apparatus 800 for training a neural network includes a first input module 810, a first processing module 820, and an adjustment module 830.
The first input module 810 is configured to input training samples into the neural network.
The first processing module 820 is configured to process the training samples by neurons in the neural network, generating output results, wherein the at least one neuron performs the processing using an activation function in the fractional rational function form.
The adjustment module 830 is configured to adjust parameters of the neural network to optimize the output result.
According to an embodiment of the disclosure, the neural network is used for image classification, the training sample comprises an image, and the output result comprises a category to which the image belongs; or alternatively
The neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the face feature points.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, the activation function includes at least one parameter, and the adjusting the parameter of the neural network includes adjusting the parameter of the activation function and/or adjusting other parameters of the neural network.
According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function, and the adjusting the parameters of the activation function comprises adjusting the parameters of the respective activation functions of the at least some neurons.
Or according to an embodiment of the present disclosure, said adjusting the parameters of the activation function comprises adjusting the parameters of the activation function of each neuron separately.
According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
According to an embodiment of the present disclosure, the adjusting the parameter of the neural network includes adjusting the parameter of the neural network by any one or a combination of several of: genetic algorithm, genetic programming, evolution strategy, evolution programming, gradient descent optimization algorithm.
According to an embodiment of the present disclosure, processing the training samples by neurons in the neural network includes:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
Fig. 9 shows a block diagram of an apparatus for processing data using a neural network, according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, the neural network comprises a plurality of neurons, and an activation function of at least one of the neurons has a fractional rational function form. The apparatus may be implemented in software, hardware or a combination of both.
As shown in fig. 9, the apparatus 900 for processing data using a neural network includes a second input module 910, a second processing module 920, and an output module 930.
The second input module 910 is configured to input data to be processed into the neural network;
the second processing module 920 is configured to process the data to be processed by neurons in the neural network, generating a processing result, wherein the at least one neuron performs the processing using an activation function in the fractional rational function form;
the output module 930 is configured to output the processing result.
According to an embodiment of the disclosure, the neural network is used for classifying images, the data to be processed includes images, and the processing result includes a category to which the images belong; or alternatively
The neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the data to be processed comprises face images, and the processing result comprises face feature point position coordinates.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function.
According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
According to an embodiment of the present disclosure, processing the data to be processed by neurons in the neural network includes:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
Fig. 10 shows a block diagram of an electronic device according to an embodiment of the disclosure.
As shown in fig. 10, the electronic device 1000 includes a memory 1001 and a processor 1002. The memory 1001 is used to store one or more computer instructions.
According to an embodiment of the present disclosure, the one or more computer instructions are executed by the processor 1002 to perform the steps of:
Inputting a training sample into a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form;
Processing the training samples by neurons in the neural network, generating output results, wherein the at least one neuron performs the processing using an activation function in the fractional rational function form;
Parameters of the neural network are adjusted to optimize the output result.
According to an embodiment of the disclosure, the neural network is used for image classification, the training sample comprises an image, and the output result comprises a category to which the image belongs; or alternatively
The neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the face feature points.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, the activation function includes at least one parameter, and the adjusting the parameter of the neural network includes adjusting the parameter of the activation function and/or adjusting other parameters of the neural network.
According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function, and the adjusting the parameters of the activation function comprises adjusting the parameters of the respective activation functions of the at least some neurons.
Or according to an embodiment of the present disclosure, said adjusting the parameters of the activation function comprises adjusting the parameters of the activation function of each neuron separately.
According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
According to an embodiment of the present disclosure, the adjusting the parameter of the neural network includes adjusting the parameter of the neural network by any one or a combination of several of: genetic algorithm, genetic programming, evolution strategy, evolution programming, gradient descent optimization algorithm.
According to an embodiment of the present disclosure, processing the training samples by neurons in the neural network includes:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
According to an embodiment of the present disclosure, the one or more computer instructions are executed by the processor 1002 to perform the steps of:
Inputting data to be processed into a neural network, wherein the neural network comprises a plurality of neurons, and an activation function of at least one neuron has a fractional rational function form;
Processing the data to be processed by neurons in the neural network to generate a processing result, wherein the at least one neuron performs the processing using an activation function in the fractional rational function form;
And outputting the processing result.
According to an embodiment of the disclosure, the neural network is used for classifying images, the data to be processed includes images, and the processing result includes a category to which the images belong; or alternatively
The neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the data to be processed comprises face images, and the processing result comprises face feature point position coordinates.
According to an embodiment of the present disclosure, the fractional rational function is:
where α ≥ 1, β ≠ 0, and γ ≠ 0, the values of α, β, γ being chosen such that the value of F(x) lies in the interval [-1, 1]; x is the result of linear processing of the input signals transmitted to the neuron that uses the activation function in fractional rational function form.
According to an embodiment of the present disclosure, α=1, β=1, γ=1; or α=2, β=2, γ=1.
According to an embodiment of the present disclosure, at least some of the plurality of neurons use the same activation function.
According to an embodiment of the present disclosure, the neural network is any one or a combination of several of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
According to an embodiment of the present disclosure, processing the data to be processed by neurons in the neural network includes:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
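The two-step per-neuron computation just described (linear processing of the input signals, then application of the neuron's activation function to the linear result) can be sketched as follows; all names here, and the use of tanh as a stand-in activation, are illustrative assumptions rather than the patent's implementation:

```python
import math

def neuron_forward(inputs, weights, bias, activation):
    # Linear processing: weighted sum of the input signals plus a bias term.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation processing: apply the neuron's activation function to z.
    return activation(z)

# Example: tanh stands in for the (unreproduced) fractional rational activation.
result = neuron_forward([0.5, -1.0], [0.8, 0.2], 0.1, math.tanh)
```

The activation function is passed in as a parameter so the same forward pass works whether each neuron has its own parameterized activation or several neurons share one, matching the alternatives recited in the claims.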
Fig. 11 illustrates a schematic of a computer system suitable for use in implementing a method of training a neural network and/or a method of processing data using a neural network in accordance with an embodiment of the present disclosure.
As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU) 1101, which can execute various processes in the above-described embodiments in accordance with a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the system 1100 are also stored. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read therefrom is installed into the storage section 1108 as needed.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods described above. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present disclosure may be implemented in software or in programmable hardware. The units or modules described may also be provided in a processor, the names of which in some cases do not constitute a limitation of the unit or module itself.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium contained in the electronic device or the computer system in the above-described embodiments, or a stand-alone computer-readable storage medium that is not assembled into a device. The computer-readable storage medium stores one or more programs used by one or more processors to perform the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention referred to in this disclosure is not limited to the specific combination of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, solutions formed by substituting the above features with technical features having similar functions disclosed in the present disclosure (but not limited thereto).

Claims (34)

1. A method of training a neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method comprising:
Inputting a training sample into the neural network;
Processing the training samples by neurons in the neural network to generate an output result, wherein the at least one neuron performs the processing using an activation function in the form of the fractional rational function;
adjusting parameters of the neural network to optimize an output result;
the neural network is used for classifying images, the training sample comprises images, and the output result comprises the category to which the images belong; or alternatively
The neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the face feature points.
2. The method of claim 1, wherein the fractional rational function is:
wherein α is equal to or greater than 1, β is equal to or greater than 0, γ is equal to or greater than 0, and the values of α, β, γ are such that the value of F(x) lies in the interval [-1, 1], x being the result of linear processing of an input signal transmitted to the neuron that uses the activation function in the form of a fractional rational function.
3. The method according to claim 2, characterized in that:
α=1, β=1, γ=1; or alternatively
α=2, β=2, γ=1.
4. The method according to claim 1, characterized in that:
the activation function includes at least one parameter;
the adjusting of the parameters of the neural network includes adjusting parameters of the activation function and/or adjusting other parameters of the neural network.
5. The method according to claim 4, wherein:
at least some of the plurality of neurons use the same activation function, the adjusting parameters of the activation function comprising adjusting parameters of respective activation functions of the at least some neurons; or alternatively
The adjusting the parameters of the activation function includes adjusting the parameters of the activation function of each neuron separately.
6. The method according to claim 1, characterized in that:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks; and/or
The adjusting the parameters of the neural network includes adjusting the parameters of the neural network by any one or a combination of the following: genetic algorithm, genetic programming, evolution strategy, evolution programming, gradient descent optimization algorithm.
7. The method of claim 1, wherein processing the training samples through neurons in the neural network comprises:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
8. A method of processing data using a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the method comprising:
Inputting data to be processed into the neural network;
Processing the data to be processed by neurons in the neural network to generate a processing result, wherein the at least one neuron performs the processing using an activation function in the form of the fractional rational function;
Outputting the processing result;
The neural network is used for classifying images, the data to be processed comprises images, and the processing result comprises the category to which the images belong; or alternatively
The neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the data to be processed comprises face images, and the processing result comprises face feature point position coordinates.
9. The method of claim 8, wherein the fractional rational function is:
wherein α is equal to or greater than 1, β is equal to or greater than 0, γ is equal to or greater than 0, and the values of α, β, γ are such that the value of F(x) lies in the interval [-1, 1], x being the result of linear processing of an input signal transmitted to the neuron that uses the activation function in the form of a fractional rational function.
10. The method according to claim 9, wherein:
α=1, β=1, γ=1; or alternatively
α=2, β=2, γ=1.
11. The method of claim 10, wherein at least some of the plurality of neurons use the same activation function.
12. The method of claim 8, wherein the neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
13. The method of claim 8, wherein processing the data to be processed by neurons in the neural network comprises:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
14. An apparatus for training a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:
A first input module configured to input training samples into the neural network;
A first processing module configured to process the training samples by neurons in the neural network to generate an output result, wherein the at least one neuron performs the processing using an activation function in the form of the fractional rational function;
an adjustment module configured to adjust parameters of the neural network to optimize the output result;
the neural network is used for classifying images, the training sample comprises images, and the output result comprises the category to which the images belong; or alternatively
The neural network is used for target detection, the training sample comprises an image, and the output result comprises the category of a target contained in the image and/or the target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the training sample comprises a face image, and the output result comprises the position coordinates of the face feature points.
15. The apparatus of claim 14, wherein the fractional rational function is:
wherein α is equal to or greater than 1, β is equal to or greater than 0, γ is equal to or greater than 0, and the values of α, β, γ are such that the value of F(x) lies in the interval [-1, 1], x being the result of linear processing of an input signal transmitted to the neuron that uses the activation function in the form of a fractional rational function.
16. The apparatus according to claim 15, wherein:
α=1, β=1, γ=1; or alternatively
α=2, β=2, γ=1.
17. The apparatus according to claim 14, wherein:
the activation function includes at least one parameter;
the adjusting of the parameters of the neural network includes adjusting parameters of the activation function and/or adjusting other parameters of the neural network.
18. The apparatus according to claim 17, wherein:
at least some of the plurality of neurons use the same activation function, the adjusting parameters of the activation function comprising adjusting parameters of respective activation functions of the at least some neurons; or alternatively
The adjusting the parameters of the activation function includes adjusting the parameters of the activation function of each neuron separately.
19. The apparatus according to claim 14, wherein:
The neural network is any one or a combination of the following: convolutional neural networks, fully-connected neural networks, recurrent neural networks; and/or
The adjusting the parameters of the neural network includes adjusting the parameters of the neural network by any one or a combination of the following: genetic algorithm, genetic programming, evolution strategy, evolution programming, gradient descent optimization algorithm.
20. The apparatus of claim 14, wherein processing the training samples through neurons in the neural network comprises:
Performing linear processing on at least one first input signal transmitted to a first one of the neurons in response to the training sample to obtain a first linear processing result;
Applying an activation function of the first neuron to the first linear processing result to obtain a first activation processing result;
outputting the first activation processing result from the first neuron.
21. An apparatus for processing data using a neural network, the neural network comprising a plurality of neurons, an activation function of at least one of the neurons having a fractional rational function form, the apparatus comprising:
a second input module configured to input data to be processed into the neural network;
A second processing module configured to process the data to be processed by neurons in the neural network to generate a processing result, wherein the at least one neuron performs the processing using an activation function in the form of the fractional rational function;
an output module configured to output the processing result;
The neural network is used for classifying images, the data to be processed comprises an image, and the processing result comprises the category to which the image belongs; or alternatively
The neural network is used for target detection, the data to be processed comprises an image, and the processing result comprises the category to which a target contained in the image belongs and/or the target frame coordinates of the target; or alternatively
The neural network is used for positioning face feature points, the data to be processed comprises a face image, and the processing result comprises face feature point position coordinates.
22. The apparatus of claim 21, wherein the fractional rational function is:
wherein α is equal to or greater than 1, β is equal to or greater than 0, γ is equal to or greater than 0, and the values of α, β, γ are such that the value of F(x) lies in the interval [-1, 1], x being the result of linear processing of an input signal transmitted to the neuron that uses the activation function in the form of a fractional rational function.
23. The apparatus according to claim 22, wherein:
α=1, β=1, γ=1; or alternatively
α=2, β=2, γ=1.
24. The apparatus of claim 23, wherein at least some of the plurality of neurons use the same activation function.
25. The apparatus of claim 21, wherein the neural network is any one or a combination of: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
26. The apparatus of claim 21, wherein processing the data to be processed by neurons in the neural network comprises:
Performing linear processing on at least one second input signal transmitted to a first neuron in the neurons in response to the data to be processed to obtain a second linear processing result;
Applying the activation function of the first neuron to the second linear processing result to obtain a second activation processing result;
outputting the second activation processing result from the first neuron.
27. An electronic device comprising a processor and a memory, wherein:
the memory is used for storing one or more computer instructions;
the one or more computer instructions being executable by the processor to implement the method of any one of claims 1-13.
28. A readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the method of any one of claims 1-13.
29. A neural network comprising a plurality of neurons, wherein an activation function of at least one neuron of the plurality of neurons has a fractional rational function form;
The neural network is used for classifying images, a training sample of the neural network comprises images, and an output result of the neural network comprises the category to which the images belong; or alternatively
The neural network is used for target detection, a training sample of the neural network comprises an image, and an output result of the neural network comprises a category to which a target contained in the image belongs and/or a target frame coordinate of the target; or alternatively
The neural network is used for positioning the face feature points, the training sample of the neural network comprises a face image, and the output result of the neural network comprises the position coordinates of the face feature points.
30. The neural network of claim 29, wherein the fractional rational function is:
wherein α is equal to or greater than 1, β is equal to or greater than 0, γ is equal to or greater than 0, and the values of α, β, γ are such that the value of F(x) lies in the interval [-1, 1], x being the result of linear processing of an input signal transmitted to the neuron that uses the activation function in the form of a fractional rational function.
31. The neural network of claim 30, wherein:
α=1, β=1, γ=1; or alternatively
α=2, β=2, γ=1.
32. The neural network of claim 31, wherein at least some of the plurality of neurons use the same activation function.
33. The neural network of claim 29, wherein the neural network is any one or a combination of: convolutional neural networks, fully-connected neural networks, recurrent neural networks.
34. An electronic device comprising a neural network according to any one of claims 29 to 33.
CN201910305394.4A 2019-04-16 2019-04-16 Neural network, training and using method and device, electronic equipment and medium Active CN111832342B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910305394.4A CN111832342B (en) 2019-04-16 2019-04-16 Neural network, training and using method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN111832342A CN111832342A (en) 2020-10-27
CN111832342B true CN111832342B (en) 2024-06-21

Family

ID=72915102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910305394.4A Active CN111832342B (en) 2019-04-16 2019-04-16 Neural network, training and using method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111832342B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5517667A (en) * 1993-06-14 1996-05-14 Motorola, Inc. Neural network that does not require repetitive training
US5748847A (en) * 1995-12-21 1998-05-05 Maryland Technology Corporation Nonadaptively trained adaptive neural systems
WO2007020456A2 (en) * 2005-08-19 2007-02-22 Axeon Limited Neural network method and apparatus
WO2014060001A1 (en) * 2012-09-13 2014-04-24 FRENKEL, Christina Multitransmitter model of the neural network with an internal feedback
CN104463209B (en) * 2014-12-08 2017-05-24 福建坤华仪自动化仪器仪表有限公司 Method for recognizing digital code on PCB based on BP neural network
US10831444B2 (en) * 2016-04-04 2020-11-10 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN107516112A (en) * 2017-08-24 2017-12-26 北京小米移动软件有限公司 Object type recognition methods, device, equipment and storage medium
CN108229298A (en) * 2017-09-30 2018-06-29 北京市商汤科技开发有限公司 The training of neural network and face identification method and device, equipment, storage medium
CN108875779A (en) * 2018-05-07 2018-11-23 深圳市恒扬数据股份有限公司 Training method, device and the terminal device of neural network
CN109272115A (en) * 2018-09-05 2019-01-25 宽凳(北京)科技有限公司 A kind of neural network training method and device, equipment, medium

Non-Patent Citations (2)

Title
Neural-network-based prediction of road traffic pollutant concentration; Yang Zhongzhen; Journal of Jilin University (Engineering and Technology Edition); 705-708 *
Process neural networks and their application in time-varying information processing; He Xingui; CAAI Transactions on Intelligent Systems; 1-8 *

Also Published As

Publication number Publication date
CN111832342A (en) 2020-10-27

Similar Documents

Publication Publication Date Title
Hussain et al. A real time face emotion classification and recognition using deep learning model
US20230047151A1 (en) Systems and Methods for Neural Networks Allocating Capital
CN107609638B (en) method for optimizing convolutional neural network based on linear encoder and interpolation sampling
CN111797895B (en) Training method, data processing method, system and equipment for classifier
KR20200031163A (en) Neural network structure creation method and device, electronic device, storage medium
CN110674714A (en) Human face and human face key point joint detection method based on transfer learning
CN107292352B (en) Image classification method and device based on convolutional neural network
CN112257815A (en) Model generation method, target detection method, device, electronic device, and medium
CN110889446A (en) Face image recognition model training and face image recognition method and device
CN112529146B (en) Neural network model training method and device
CN109858505A (en) Classifying identification method, device and equipment
CN109815988A (en) Model generating method, classification method, device and computer readable storage medium
CN115761240B (en) Image semantic segmentation method and device for chaotic back propagation graph neural network
CN108229536A (en) Optimization method, device and the terminal device of classification prediction model
CN111159279B (en) Model visualization method, device and storage medium
CN115239760B (en) Target tracking method, system, equipment and storage medium
Khashman An emotional system with application to blood cell type identification
CN113762005B (en) Feature selection model training and object classification methods, devices, equipment and media
Song et al. Where-what network 5: Dealing with scales for objects in complex backgrounds
CN111832342B (en) Neural network, training and using method and device, electronic equipment and medium
US20220383073A1 (en) Domain adaptation using domain-adversarial learning in synthetic data systems and applications
CN116384516A (en) Cost sensitive cloud edge cooperative method based on ensemble learning
US11599783B1 (en) Function creation for database execution of deep learning model
Bhattacharjya et al. A genetic algorithm for intelligent imaging from quantum-limited data
Abualola et al. Development of Colorization of Grayscale Images Using CNN-SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant