CN112036549B - Neural network optimization method and device, electronic equipment and storage medium - Google Patents

Neural network optimization method and device, electronic equipment and storage medium

Info

Publication number
CN112036549B
Authority
CN
China
Prior art keywords
node
nodes
neural network
output
data
Prior art date
Legal status
Active
Application number
CN202010889955.2A
Other languages
Chinese (zh)
Other versions
CN112036549A (en)
Inventor
袁坤 (Yuan Kun)
李全全 (Li Quanquan)
闫俊杰 (Yan Junjie)
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Sensetime Technology Co Ltd
Priority to CN202010889955.2A
Publication of CN112036549A
Application granted
Publication of CN112036549B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06N3/048 - Activation functions
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a neural network optimization method and apparatus, an electronic device, and a storage medium. The method comprises: inputting sample data into a neural network, wherein the sample data corresponds to annotation data; mapping transformation operations in the neural network to nodes, the neural network comprising at least one such node; connecting the nodes of the neural network by edges to obtain a connection relationship, wherein an edge represents a data transmission relationship between nodes, an output edge of a node carries a path weight, and the path weight is determined according to the aggregation result of the node's input data; and adjusting the path weights of the nodes according to the difference between the output data of the neural network and the annotation data.

Description

Neural network optimization method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a neural network optimization method and apparatus, an electronic device, and a storage medium.
Background
In the design of neural networks, the same network structure is typically used for all samples in a data set; however, the differences between samples make feature representation more difficult.
At present, the capacity and representation capability of a network are usually increased by adding convolutional layers or widening existing ones, but this introduces extra parameters and computation and raises the cost of deploying the model in practice. How to effectively improve the representation capability of a network in the face of sample differences therefore remains an urgent problem to be solved.
Disclosure of Invention
The present disclosure provides an optimization scheme for a neural network.
According to an aspect of the present disclosure, a method for optimizing a neural network is provided, the method including: inputting sample data into a neural network, wherein the sample data corresponds to annotation data; mapping transformation operations in the neural network to nodes, the neural network including at least one of the nodes; connecting the nodes of the neural network through edges to obtain a connection relationship, wherein the edges represent a data transmission relationship between the nodes, the output edges of the nodes have path weights, and the path weights are determined according to an aggregation result of the input data of the nodes; and adjusting the path weights corresponding to the nodes according to the difference between the output data of the neural network and the annotation data.
In combination with any one of the embodiments provided by the present disclosure, the determining the path weight according to the aggregation result of the input data of the node includes: obtaining transformation characteristics obtained by carrying out transformation operation corresponding to the nodes on the aggregation result of the input data; and determining the path weight corresponding to the output edge of the node according to the transformation characteristic and the number of the output edges of the node.
In combination with any embodiment provided by the present disclosure, the node is configured with a routing unit, the routing unit includes a pooling layer, a fully-connected layer, and an activation layer, and the determining, according to the transformation feature and the number of output edges of the node, the path weight corresponding to the output edge of the node includes: down-sampling the transformation feature through the pooling layer to obtain a feature vector, wherein the dimensionality of the feature vector is determined according to the number of output edges of the node; performing full-connection processing on the feature vector through the fully-connected layer to obtain a fully-connected feature; and inputting the fully-connected feature into the activation layer to obtain the path weight of the output edge of the node corresponding to the fully-connected feature.
In combination with any embodiment provided by the present disclosure, the path weights corresponding to the nodes at each stage of the neural network are stored in an adjacency matrix, each row of the adjacency matrix represents the path weights corresponding to the input edges of a node, and each column of the adjacency matrix represents the path weights corresponding to the output edges of a node, where the input edges of a node are the output edges of the preceding nodes connected to it.
In combination with any one of the embodiments provided by the present disclosure, the sample data is a batch of training sample data, and the method further includes: storing a plurality of adjacency matrices obtained from the batch of training sample data in the same cache region, wherein the plurality of adjacency matrices are used for storing the path weights of the nodes at each stage of the neural network corresponding to the batch of training sample data.
In combination with any one of the embodiments provided by the present disclosure, the connecting the nodes of the neural network by edges to obtain a connection relationship includes: connecting each pair of nodes among the nodes of the neural network to obtain a fully-connected relationship.
In combination with any embodiment provided by the present disclosure, the method further comprises: in response to the adjusted path weight being smaller than a set threshold, removing an edge corresponding to the path weight; and/or, in response to the adjusted path weight being greater than or equal to the set threshold, retaining the edge corresponding to the path weight.
In combination with any embodiment provided by the present disclosure, the method further comprises: and adjusting the network parameters of the nodes while adjusting the path weights corresponding to the nodes.
In combination with any of the embodiments provided by the present disclosure, the transformation operation includes one or more of summing, convolution, normalization, activation.
In combination with any of the embodiments provided by the present disclosure, for each node, the features of preceding nodes are aggregated via at least one input edge and input to the node, where an input edge is an edge connected to the input end of the node; the features generated by the node are output to subsequent nodes via at least one output edge, where an output edge is an edge connected to the output end of the node.
In combination with any one of the embodiments provided by the present disclosure, the mapping transformation operations in the neural network to nodes includes: mapping an input of the neural network to a first one of the nodes; mapping an output of the neural network to a last one of the nodes.
According to an aspect of the present disclosure, a neural network optimization apparatus is provided, the apparatus including: an input unit for inputting sample data into the neural network, wherein the sample data corresponds to annotation data; a mapping unit for mapping transformation operations in the neural network to nodes, the neural network comprising at least one of the nodes; a connection unit for connecting the nodes of the neural network through edges to obtain a connection relationship, wherein the edges represent the data transmission relationship among the nodes, the output edges of the nodes have path weights, and the path weights are determined according to the aggregation result of the input data of the nodes; and an optimization unit for adjusting the path weight corresponding to the node according to the difference between the output data of the neural network and the annotation data.
In combination with any one of the embodiments provided by the present disclosure, the determining the path weight according to the aggregation result of the input data of the node includes: obtaining transformation characteristics obtained by carrying out transformation operation corresponding to the nodes on the aggregation result of the input data; and determining the path weight corresponding to the output edge of the node according to the transformation characteristic and the number of the output edges of the node.
In combination with any embodiment provided by the present disclosure, the node is configured with a routing unit, the routing unit includes a pooling layer, a fully-connected layer, and an activation layer, and the determining, according to the transformation feature and the number of output edges of the node, the path weight corresponding to the output edge of the node includes: down-sampling the transformation feature through the pooling layer to obtain a feature vector, wherein the dimensionality of the feature vector is determined according to the number of output edges of the node; performing full-connection processing on the feature vector through the fully-connected layer to obtain a fully-connected feature; and inputting the fully-connected feature into the activation layer to obtain the path weight of the output edge of the node corresponding to the fully-connected feature.
In combination with any embodiment provided by the present disclosure, the path weights corresponding to the nodes at each stage of the neural network are stored in an adjacency matrix, each row of the adjacency matrix represents the path weights corresponding to the input edges of a node, and each column of the adjacency matrix represents the path weights corresponding to the output edges of a node, where the input edges of a node are the output edges of the preceding nodes connected to it.
In combination with any embodiment provided by the present disclosure, the sample data is a batch of training sample data, and the apparatus further includes a batch processing unit configured to store, in the same cache region, a plurality of adjacency matrices obtained from the batch of training sample data, where the plurality of adjacency matrices are used to store the path weights of the nodes at each stage of the neural network corresponding to the batch of training sample data.
In combination with any one of the embodiments provided by the present disclosure, the connection unit is specifically configured to: connect each pair of nodes among the nodes of the neural network to obtain a fully-connected relationship.
In combination with any embodiment provided by the present disclosure, the apparatus further includes a determining unit, configured to remove an edge corresponding to the path weight in response to that the adjusted path weight is smaller than a set threshold; and/or, in response to the adjusted path weight being greater than or equal to the set threshold, retaining the edge corresponding to the path weight.
In combination with any embodiment provided by the present disclosure, the apparatus further includes a parameter adjusting unit, configured to adjust a network parameter of the node while adjusting the path weight corresponding to the node.
In combination with any of the embodiments provided by the present disclosure, the transformation operation includes one or more of summing, convolution, normalization, activation.
In combination with any of the embodiments provided by the present disclosure, for each node, the features of preceding nodes are aggregated via at least one input edge and input to the node, where an input edge is an edge connected to the input end of the node; the features generated by the node are output to subsequent nodes via at least one output edge, where an output edge is an edge connected to the output end of the node.
In combination with any embodiment provided by the present disclosure, the mapping unit is specifically configured to: mapping an input of the neural network to a first one of the nodes; mapping an output of the neural network to a last one of the nodes.
According to an aspect of the present disclosure, an electronic device is provided, which includes a memory and a processor, the memory is used for storing computer instructions executable on the processor, and the processor is used for executing the computer instructions to implement the method of any embodiment provided by the present disclosure.
According to an aspect of the present disclosure, a computer-readable storage medium is proposed, on which a computer program is stored, which when executed by a first processing device implements the method of any of the embodiments provided by the present disclosure.
According to the optimization method, the optimization device, the electronic device and the storage medium for the neural network provided by any embodiment of the disclosure, sample data is input into the neural network, transformation operation in the neural network is mapped to nodes, and the nodes corresponding to the neural network are connected through edges to obtain a complete graph, wherein the output edges of the nodes have path weights determined according to the aggregation result of the input data of the nodes; and adjusting the path weight corresponding to the node according to the difference between the output data of the neural network and the labeled data corresponding to the sample data. The path weight corresponding to each node is related to the input data of the node, so that the path weight corresponding to each node is related to sample data, different feature fusion modes can be adopted for feature expression aiming at the sample, and the representation capability of the neural network can be improved under the condition of not increasing extra calculation amount and parameter amount.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present specification or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in this specification, and other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of an optimization method of a neural network according to at least one embodiment of the present disclosure;
fig. 2 is a connection diagram in an optimization method of a neural network according to at least one embodiment of the present disclosure;
fig. 3 is a schematic diagram of a node in an optimization method of a neural network according to at least one embodiment of the present disclosure;
fig. 4A is a schematic diagram of an adjacency matrix in an optimization method of a neural network according to at least one embodiment of the present disclosure;
FIG. 4B is a schematic illustration of an adjacency matrix in a batch process;
fig. 5 is a schematic structural diagram of an optimization apparatus of a neural network according to at least one embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions are described below clearly and completely with reference to the drawings in one or more embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art from one or more embodiments of the disclosure without creative effort shall fall within the scope of protection of the disclosure.
Fig. 1 is a flowchart of an optimization method of a neural network according to at least one embodiment of the present disclosure. As shown in fig. 1, the method includes steps 101 to 104.
In step 101, sample data is input to the neural network, the sample data corresponding to the annotation data.
The neural network may be an existing neural network or a newly constructed one; its structure may consist of a plurality of directly stacked convolutional layers, a plurality of parallel modules, a plurality of residual modules, and so on. The present disclosure does not limit the initial structure of the neural network to be optimized.
The annotation data corresponding to the sample data is the label attached to the sample data. For example, for a classification task, the annotation data corresponding to a sample image is the class label of the target object in the image.
In step 102, transformation operations (transformations) in the neural network are mapped to nodes, the neural network comprising at least one of the nodes.
Transformation operations in the neural network include one or more of aggregation (e.g., summation), convolution, normalization, and activation. A transformation operation is mapped to, i.e., represented by, a node. Those skilled in the art will appreciate that nodes may also represent other transformation operations and are not limited to those described above.
The number of nodes corresponding to the neural network is determined according to transformation operation performed by the neural network, and the neural network comprises at least one node.
In step 103, the nodes corresponding to the neural network are connected through edges to obtain a connection relationship.
In the embodiment of the present disclosure, the connections between nodes may be mapped to edges according to the initial structure of the neural network, that is, the data transmission relationship between the nodes is characterized by the edges. In addition, a connection relation graph obtained by connecting nodes corresponding to the neural network through edges can be represented by an acyclic graph, namely, a connection graph, so that the neural network structure is shown from a topological view.
The output edges of a node have path weights, and the path weights are determined according to the aggregation result of the input data of the node. For a node in the neural network, the path weight of each of its output edges is determined from the sum of the features carried by its input edges; since the input features of a node depend on the sample data fed into the neural network, the path weights of the node's output edges are sample-dependent.
In step 104, the path weight corresponding to the node is adjusted according to the difference between the output data of the neural network and the annotation data.
In the embodiment of the present disclosure, the path weights corresponding to the output edges of the nodes may be adjusted, that is, the way in which each node fuses its input features is adjusted, so that the difference between the output data of the neural network and the annotation data becomes smaller and smaller, thereby optimizing the neural network.
In the embodiment of the disclosure, sample data is input into a neural network, transformation operations in the neural network are mapped to nodes, and the nodes corresponding to the neural network are connected through edges to obtain a complete graph, wherein output edges of the nodes have path weights determined according to an aggregation result of the input data of the nodes; and adjusting the path weight corresponding to the node according to the difference between the output data of the neural network and the labeled data corresponding to the sample data. The path weight corresponding to each node is related to the input data of the node, so that the path weight corresponding to each node is related to sample data, different feature fusion modes can be adopted for feature expression aiming at the sample, and the representation capability of the neural network can be improved under the condition of not increasing extra calculation amount and parameter amount.
In some embodiments, for each node in the connection graph, the features of preceding nodes are aggregated via at least one input edge and input to the node, where an input edge is an edge connected to the input end of the node; the features generated by the node are output to subsequent nodes via at least one output edge, where an output edge is an edge connected to the output end of the node. A preceding node is a node whose index is smaller than that of the current node, and a subsequent node is a node whose index is larger.
In one example, the input to the initial neural network maps to a first node in the set N of nodes; the output of the initial neural network is mapped to the last node in the set of nodes.
Fig. 2 is a connection diagram in a neural network optimization method according to at least one embodiment of the present disclosure, in which nodes 1 to 8 mapped by a transform operation in a neural network and edges directly connected between the nodes are shown, and input data is input to the neural network from the node 1 and output from the node 8.
The connection graph of the neural network may be represented as (Ν, ε), where Ν denotes the set of nodes and ε denotes the set of edges. For the i-th node, the corresponding mapping function may be denoted o_i(·), and the set of edges may be written as ε = {e_(i,j) | 1 ≤ i < j ≤ N}, where e_(i,j) denotes the edge connecting the i-th node directly to the j-th node. An edge e_(i,j) carries a path weight a_ij that characterizes the importance of the connection. For each node, the number of input edges is called its in-degree and the number of output edges its out-degree.
In the connection graph, the output of the j-th node can be represented by formula (1):

x_j = o_j( Σ_{e_(i,j) ∈ ε} a_ij · x_i ; w_j^o )    (1)

where x_i denotes the input feature from the i-th node and w_j^o denotes the convolution weight of the j-th node.
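To make formula (1) concrete, here is a minimal PyTorch sketch of a single node that aggregates the weighted features from its input edges and then applies its transformation operation o_j(·). The conv-BN-ReLU choice for o_j and all class and argument names are illustrative assumptions, not the patent's reference implementation.

```python
import torch
from torch import nn

class GraphNode(nn.Module):
    """One node of the connection graph, implementing formula (1):
    x_j = o_j( sum_i a_ij * x_i ; w_j^o )."""

    def __init__(self, channels: int):
        super().__init__()
        # Illustrative transformation o_j: convolution + normalization + activation.
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, in_feats, path_weights):
        # in_feats: features x_i from the preceding nodes;
        # path_weights: the path weights a_ij carried by the input edges.
        aggregated = sum(a * x for a, x in zip(path_weights, in_feats))
        return self.transform(aggregated)
```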
In some embodiments, the path weight of the output edge of the node may be determined from the aggregated result of the input data of the node by the following method.
First, the transformation feature obtained by performing the node's transformation operation on the aggregation result of the input data is obtained.
Fig. 3 is a schematic diagram of the transformation operation of a node in a neural network optimization method according to at least one embodiment of the present disclosure; it illustrates the transformation operation of node 6 from the connection graph of Fig. 2.
As can be seen from Fig. 3, node 6 receives the feature x_3 from node 3 via input edge e_(3,6) and the feature x_5 from node 5 via input edge e_(5,6); the features x_3 and x_5 are aggregated (added) to obtain the aggregated feature x'_6, and performing a convolution operation on x'_6 yields the transformation feature x_6 of node 6.
Then, the path weights corresponding to the output edges of the node are determined according to the transformation feature and the number of output edges of the node.
As shown in Fig. 3, from the transformation feature x_6 and the number of output edges of the node (the output edges here are e_(6,7) and e_(6,8), i.e. the out-degree is 2), the path weight a_67 of output edge e_(6,7) and the path weight a_68 of output edge e_(6,8) can be determined.
In this example, node 6 passes the transformation feature x_6 to the subsequent node 7 via output edge e_(6,7), scaled by the path weight a_67, and to the subsequent node 8 via output edge e_(6,8), scaled by the path weight a_68.
In the embodiment of the disclosure, the path weight corresponding to the output edge is predicted according to the aggregation result of the input data of the node, so that the feature fusion related to the sample data input to the neural network is realized, and the representation capability of the neural network is improved.
In some embodiments, the prediction of the path weight corresponding to the output edge of the node may be implemented by configuring a routing unit for each node or a part of nodes corresponding to the neural network.
In one example, the routing unit may include a pooling layer, a fully-connected layer, and an activation layer, and the determining, by the routing unit configured for the node, the path weight corresponding to the output edge of the node includes: obtaining the transformation feature produced by the node transforming the aggregation result of its input data; down-sampling the transformation feature through the pooling layer to obtain a feature vector, wherein the dimensionality of the feature vector is determined according to the number of output edges of the node; performing full-connection processing on the feature vector through the fully-connected layer to obtain a fully-connected feature; and inputting the fully-connected feature into the activation layer, e.g. a sigmoid function, which constrains it to the range [0, 1], to obtain the path weights of the output edges of the node.
In one example, the transformation feature x_i, obtained by the node performing its transformation operation on the input aggregated feature (the sum of the features carried by its input edges), is obtained first; this transformation feature x_i can be regarded as global feature information. Global average pooling is first applied to the transformation feature x_i to obtain GAP(x_i); the path weights for the output edges of the node are then generated by the fully-connected operation F(·) and the sigmoid activation function σ(·). The routing mechanism may be represented as:

a_i = σ( F( GAP(x_i) ) ) = σ( W_i · GAP(x_i) + b_i )    (2)

where W_i and b_i are the weight and bias parameters of the fully-connected layer, and a_i denotes the path weights.
It will be appreciated by those skilled in the art that other downsampling schemes or other activation functions may be used, and the present disclosure is not limited in this respect.
In the embodiment of the disclosure, the path weight of the output edge of the node is obtained by down-sampling, fully connecting and activating the transformation characteristics of the node, so that the path weight corresponding to the output edge can be predicted according to the aggregation result of the input data of the node, and thus the prediction related to the sample data input to the neural network is realized.
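One possible reading of this routing mechanism (equation (2)) is sketched below in PyTorch; the module name and its interface are assumptions for illustration. For node 6 of Fig. 2, `RoutingUnit(channels, out_degree=2)` would emit a_67 and a_68 for every sample in the batch.

```python
import torch
from torch import nn

class RoutingUnit(nn.Module):
    """Predicts the path weights of a node's output edges from its
    transformation feature: pooling -> fully-connected -> activation."""

    def __init__(self, channels: int, out_degree: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # global average pooling
        self.fc = nn.Linear(channels, out_degree)  # one output per output edge
        self.act = nn.Sigmoid()                    # constrains weights to [0, 1]

    def forward(self, x):
        z = self.pool(x).flatten(1)   # (B, C) feature vector
        return self.act(self.fc(z))   # (B, out_degree) per-sample path weights
```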
In some embodiments, the path weights corresponding to the nodes at each stage of the neural network may be stored in an adjacency matrix. As shown in Fig. 4A, each row of the adjacency matrix holds the weights of a node's input edges, and each column holds the weights of a node's output edges; for node pairs without an edge connection, the corresponding weight is zero. The number of nonzero entries in a row gives the in-degree of the corresponding node, and the number of nonzero entries in a column gives its out-degree.
Since a plurality of adjacency matrices can be stored in the same buffer, batch training can be supported efficiently. For example, the 4 adjacency matrices corresponding to the batch shown in Fig. 4B may be stored in the same buffer.
For a training batch containing B samples, the output of the j-th node can be written as:

x_j^b = o_j( Σ_{e_(i,j) ∈ ε} a_ij^b · x_i^b ; w_j^o ),  b = 1, …, B    (3)

where a_ij^b ∈ [0, 1] denotes the path weight of edge e_(i,j) for the b-th sample.
In the disclosed embodiments, batch training can be supported efficiently by storing the path weights in adjacency matrices held in a shared buffer.
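The following sketch shows how per-sample adjacency matrices might be held in one buffer, using the row/column convention described above (row j stores the input-edge weights of node j, column i the output-edge weights of node i, with 0-based indices); all shapes and sample values are illustrative assumptions.

```python
import torch

B, N, C, H, W = 4, 8, 16, 32, 32           # illustrative batch and feature sizes

# One N x N adjacency matrix per sample, in a single buffer:
# adjacency[b, j, i] = a_ij for sample b (zero where no edge exists).
adjacency = torch.zeros(B, N, N)

# Suppose the routing unit of node 6 (index 5) produced weights for its
# two output edges to nodes 7 and 8 (indices 6 and 7):
a6 = torch.rand(B, 2)                      # stand-in for a RoutingUnit output
adjacency[:, 6, 5] = a6[:, 0]              # a_67: row of node 7, column of node 6
adjacency[:, 7, 5] = a6[:, 1]              # a_68: row of node 8, column of node 6

# Aggregated input of node 8 (index 7) for the whole batch, as in formula (3):
feats = torch.randn(N, B, C, H, W)         # features x_i of all nodes
w = adjacency[:, 7, :]                     # (B, N) input-edge weights of node 8
agg = (w.t().reshape(N, B, 1, 1, 1) * feats).sum(dim=0)   # (B, C, H, W)
```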
In some embodiments, each pair of nodes of the neural network may be connected to obtain a complete graph. For example, a fully-connected relationship graph (which may also be called a complete graph) may be obtained by connecting all nodes directly to the input and the output and connecting every pair of nodes. In the fully-connected relationship graph, every pair of nodes has a connection relationship.
For a complete graph with N nodes, the search space includes 2^(N(N-1)/2) possible topologies. For a neural network with k such stages, the total search space can be expressed as (2^(N(N-1)/2))^k = 2^(k·N(N-1)/2). Compared with neuron-based and block-based approaches, this provides a broader search space.
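As a quick check of these counts, a short snippet (the function name is mine):

```python
def search_space_size(n_nodes: int, k_stages: int = 1) -> int:
    """Each of the N(N-1)/2 node pairs of a complete graph independently
    keeps or drops its edge; with k stages the counts multiply."""
    return 2 ** (n_nodes * (n_nodes - 1) // 2 * k_stages)

print(search_space_size(8))      # 268435456, i.e. 2**28 topologies for 8 nodes
print(search_space_size(8, 3))   # 2**84 topologies for a 3-stage network
```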
In some embodiments, a threshold is also set for each node using a learnable weight τ to control connectivity. The control mechanism can be expressed by equation (4):

â_ij = a_ij, if a_ij ≥ τ;   â_ij = 0, if a_ij < τ    (4)

where a_ij denotes a path weight and τ denotes the set threshold.
That is, in response to the adjusted path weight being less than the set threshold, the edge corresponding to the path weight may be removed, and/or in response to the adjusted path weight being greater than or equal to the set threshold, the edge corresponding to the path weight may be retained.
In the inference process, if the path weight of a connection is smaller than the set threshold, the corresponding edge can be marked as closed to save computation; a node itself can be discarded if all of its input edges or all of its output edges are closed. Conversely, if the path weight of a connection is greater than or equal to the set threshold, the corresponding edge is retained; and the larger the path weight of a connection, the more important the connection.
In the embodiment of the present disclosure, the connection between nodes is controlled by using the learnable weight τ as the set threshold of each node, so that the feature fusion can be performed in a continuous manner, and the optimization effect is improved.
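A minimal sketch of this gating rule (equation (4)) follows; treating τ as a per-node learnable parameter follows the text, while the gradient treatment of the hard threshold is not specified here and is left untouched in this illustration.

```python
import torch

def gate_edges(a: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    """Equation (4): close (zero out) output edges whose path weight falls
    below the node's threshold tau; keep the others unchanged."""
    return torch.where(a >= tau, a, torch.zeros_like(a))

a = torch.tensor([0.05, 0.40, 0.90])   # predicted path weights of one node
tau = torch.tensor(0.10)               # learnable threshold of that node
print(gate_edges(a, tau))              # tensor([0.0000, 0.4000, 0.9000])
```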
In some embodiments, the path weights of the edges may be optimized, and the network parameters of the nodes may be optimized as well.
In one example, the optimization objective may be expressed as:

min_{w_r, w_o} E_{(x, y)} [ L( f(x; w_r, w_o), y ) ]    (5)

where x denotes an input sample, y denotes the annotation data corresponding to x, w_r denotes the path weights (routing parameters), and w_o denotes the network parameters.
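One way objective (5) could look in training code is sketched below; `build_connection_graph_network` and `loader` are hypothetical stand-ins, and the loss and optimizer choices are assumptions rather than the patent's prescription.

```python
import torch
from torch import nn

model = build_connection_graph_network()   # hypothetical: GraphNodes + RoutingUnits
criterion = nn.CrossEntropyLoss()          # difference between output and annotation data
# model.parameters() covers both w_r (routing units) and w_o (node transformations),
# so a single optimizer adjusts path weights and network parameters together.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for x, y in loader:                        # (sample data, annotation data) batches
    logits = model(x)                      # path weights are re-predicted per sample
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()                        # gradients flow to both w_r and w_o
    optimizer.step()
```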
Fig. 5 is a neural network optimization apparatus provided in at least one embodiment of the present disclosure, the apparatus including: an input unit 501, configured to input sample data to a neural network, where the sample data corresponds to labeled data; a mapping unit 502 for mapping transformation operations in the neural network to nodes, the neural network comprising at least one of the nodes; a connection unit 503, configured to connect nodes corresponding to the neural network through edges to obtain a connection relationship, where the edges represent a data transmission relationship between the nodes, an output edge of the node has a path weight, and the path weight is determined according to an aggregation result of input data of the node; an optimizing unit 504, configured to adjust a path weight corresponding to the node according to a difference between output data of the neural network and the labeled data.
In some embodiments, the determining of the path weight from the aggregated result of the input data of the node comprises: obtaining transformation characteristics obtained by carrying out transformation operation corresponding to the nodes on the aggregation result of the input data; and determining the path weight corresponding to the output edge of the node according to the transformation characteristic and the number of the output edges of the node.
In some embodiments, the node is configured with a routing unit, the routing unit includes a pooling layer, a fully-connected layer, and an activation layer, and the determining the path weight corresponding to the output edge of the node according to the transformation feature and the number of output edges of the node includes: down-sampling the transformation feature through the pooling layer to obtain a feature vector, wherein the dimensionality of the feature vector is determined according to the number of output edges of the node; performing full-connection processing on the feature vector through the fully-connected layer to obtain a fully-connected feature; and inputting the fully-connected feature into the activation layer to obtain the path weight of the output edge of the node corresponding to the fully-connected feature.
In some embodiments, the path weights corresponding to the nodes at each stage of the neural network are stored in an adjacency matrix, each row of the adjacency matrix representing the path weights corresponding to the input edges of a node, and each column representing the path weights corresponding to the output edges of a node, wherein the input edges of a node are the output edges of the preceding nodes connected to it.
In some embodiments, the sample data is a batch of training sample data, and the apparatus further includes a batch processing unit configured to store, in the same cache region, a plurality of adjacency matrices obtained from the batch of training sample data, where the plurality of adjacency matrices are used to store the path weights of the nodes at each stage of the neural network corresponding to the batch of training sample data.
In some embodiments, the connection unit is specifically configured to: connect each pair of nodes among the nodes of the neural network to obtain a fully-connected relationship.
In some embodiments, the apparatus further includes a determining unit, configured to remove an edge corresponding to the path weight in response to the adjusted path weight being smaller than a set threshold; and/or, in response to the adjusted path weight being greater than or equal to the set threshold, retaining the edge corresponding to the path weight.
In some embodiments, the apparatus further includes a parameter adjusting unit, configured to adjust a network parameter of the node while adjusting the path weight corresponding to the node.
In some embodiments, the transformation operation comprises one or more of summing, convolving, normalizing, activating.
In some embodiments, for each node, the features of preceding nodes are aggregated via at least one input edge and input to the node, where an input edge is an edge connected to the input end of the node; the features generated by the node are output to subsequent nodes via at least one output edge, where an output edge is an edge connected to the output end of the node.
In some embodiments, the mapping unit is specifically configured to: mapping an input of the neural network to a first one of the nodes; mapping an output of the neural network to a last one of the nodes.
Fig. 6 is an electronic device provided in at least one embodiment of the present disclosure, and the electronic device includes a memory and a processor, the memory is used for storing computer instructions executable on the processor, and the processor is used for executing the computer instructions to implement the optimization method of a neural network according to any one of the embodiments of the present disclosure.
At least one embodiment of the present specification also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the optimization method of a neural network according to any one of the embodiments of the present specification.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (13)

1. A method for optimizing a neural network, the method comprising:
inputting sample data into a neural network, wherein the sample data corresponds to annotation data;
mapping transformation operations in the neural network to nodes, the neural network including at least one of the nodes;
connecting nodes corresponding to the neural network through edges to obtain a connection relation, wherein the edges represent a data transmission relation between the nodes, the output edges of the nodes have path weights, and the path weights are determined according to an aggregation result of input data of the nodes;
adjusting the path weight corresponding to the node according to the difference between the output data of the neural network and the annotation data; wherein determining the path weight according to the aggregation result of the input data of the node comprises the following steps:
obtaining transformation characteristics obtained by carrying out transformation operation corresponding to the nodes on the aggregation result of the input data;
determining the path weight corresponding to the output edge of the node according to the transformation characteristic and the number of the output edges of the node;
the node is configured with a routing unit, and the routing unit is used for realizing the prediction of the path weight corresponding to the output edge of the node.
2. The method of claim 1, wherein the routing unit comprises a pooling layer, a fully-connected layer, and an activation layer, and wherein determining the path weight corresponding to the output edge of the node according to the transformation feature and the number of output edges of the node comprises:
down-sampling the transformation feature through the pooling layer to obtain a feature vector, wherein the dimensionality of the feature vector is determined according to the number of output edges of the node;
performing full-connection processing on the feature vector through the fully-connected layer to obtain a fully-connected feature;
and inputting the fully-connected feature into the activation layer to obtain the path weight of the output edge of the node corresponding to the fully-connected feature.
3. The method of claim 1, wherein the path weights corresponding to the nodes at each stage of the neural network are stored in an adjacency matrix, each row of the adjacency matrix represents the path weights corresponding to the input edges of a node, and each column of the adjacency matrix represents the path weights corresponding to the output edges of a node, wherein the input edges of a node are the output edges of the preceding nodes connected with the node.
4. The method of claim 3, wherein the sample data is batch training sample data, the method further comprising:
and storing a plurality of adjacency matrices obtained from the batch of training sample data in the same cache region, wherein the plurality of adjacency matrices are used for storing the path weights of the nodes at each stage of the neural network corresponding to the batch of training sample data.
5. The method according to any one of claims 1 to 4, wherein the connecting the nodes corresponding to the neural network by edges to obtain a connection relationship comprises:
and connecting each pair of nodes in each node corresponding to the neural network to obtain a full-connection relation.
6. The method according to any one of claims 1 to 4, further comprising:
in response to the adjusted path weight being smaller than a set threshold, removing an edge corresponding to the path weight; and/or,
and in response to the adjusted path weight being greater than or equal to the set threshold, retaining the edge corresponding to the path weight.
7. The method according to any one of claims 1 to 4, further comprising: and adjusting the network parameters of the nodes while adjusting the path weights corresponding to the nodes.
8. The method of any one of claims 1 to 4, wherein the transformation operation comprises one or more of summing, convolution, normalization, activation.
9. The method of claim 8, wherein for each node, features of preceding nodes are aggregated via at least one input edge and input to the node, the input edge being an edge connected to an input of the node; the features generated by the node are output to a subsequent node via at least one output edge, the output edge being an edge connected to an output of the node.
10. The method of claim 9, wherein mapping transformation operations in a neural network to nodes comprises:
mapping an input of the neural network to a first one of the nodes;
mapping an output of the neural network to a last one of the nodes.
11. An apparatus for optimizing a neural network, the apparatus comprising:
the input unit is used for inputting sample data into the neural network, wherein the sample data corresponds to annotation data;
a mapping unit for mapping transformation operations in the neural network to nodes, the neural network comprising at least one of the nodes;
the connection unit is used for connecting nodes corresponding to the neural network through edges to obtain a connection relation, wherein the edges represent the data transmission relation among the nodes, the output edges of the nodes have path weights, and the path weights are determined according to the aggregation result of the input data of the nodes;
the optimization unit is used for adjusting the path weight corresponding to the node according to the difference between the output data of the neural network and the annotation data;
wherein determining the path weight according to the aggregation result of the input data of the node comprises the following steps:
obtaining transformation characteristics obtained by carrying out transformation operation corresponding to the nodes on the aggregation result of the input data;
determining the path weight corresponding to the output edge of the node according to the transformation characteristic and the number of the output edges of the node;
the node is configured with a routing unit, and the routing unit is used for realizing the prediction of the path weight corresponding to the output edge of the node.
12. An electronic device comprising a memory and a processor, the memory for storing computer instructions executable on the processor, the processor for executing the computer instructions to implement the method of any one of claims 1 to 10.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a first processing device, carries out the method of any one of claims 1 to 10.
CN202010889955.2A 2020-08-28 2020-08-28 Neural network optimization method and device, electronic equipment and storage medium Active CN112036549B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010889955.2A CN112036549B (en) 2020-08-28 2020-08-28 Neural network optimization method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010889955.2A CN112036549B (en) 2020-08-28 2020-08-28 Neural network optimization method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112036549A (en) 2020-12-04
CN112036549B (en) 2021-09-10

Family

ID: 73587137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010889955.2A Active CN112036549B (en) 2020-08-28 2020-08-28 Neural network optimization method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112036549B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581488A (en) * 2020-05-14 2020-08-25 上海商汤智能科技有限公司 Data processing method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650922B * 2016-09-29 2019-05-03 Tsinghua University (清华大学) Hardware neural network conversion method, computing device, software and hardware cooperative system
WO2020139895A1 (en) * 2018-12-24 2020-07-02 The Trustees Of Columbia University In The City Of New York Circuits and methods for in-memory computing
CN109977830A (en) * 2019-03-16 2019-07-05 四川大学 Face fusion detection method based on color and vein binary channels convolutional neural networks and Recognition with Recurrent Neural Network
CN111027610B * 2019-12-03 2022-02-25 Tencent Healthcare (Shenzhen) Co., Ltd. (腾讯医疗健康(深圳)有限公司) Image feature fusion method, apparatus, and medium


Also Published As

Publication number Publication date
CN112036549A (en) 2020-12-04


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant