CN112308227B - Neural network architecture searching method, device, terminal equipment and storage medium - Google Patents

Neural network architecture searching method, device, terminal equipment and storage medium

Info

Publication number
CN112308227B
Authority
CN
China
Prior art keywords
neural network
data
target
basic operation
input attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011203194.7A
Other languages
Chinese (zh)
Other versions
CN112308227A (en)
Inventor
朱威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011203194.7A priority Critical patent/CN112308227B/en
Publication of CN112308227A publication Critical patent/CN112308227A/en
Application granted granted Critical
Publication of CN112308227B publication Critical patent/CN112308227B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a neural network architecture searching method, device, terminal equipment and storage medium, applicable to digital healthcare. The method comprises the following steps: determining a search space and a training data set for constructing a target neural network; adjusting the weight matrix and the architecture parameters in an initial neural network based on a plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters; and determining the target neural network according to the adjusted weight matrix and the basic operation parameter, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameter. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can be improved.

Description

Neural network architecture searching method, device, terminal equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a neural network architecture searching method, device, terminal equipment, and storage medium.
Background
A neural network with good performance often has a delicate network structure and requires substantial design effort from highly skilled and experienced human experts. For example, graph neural networks are very popular for analyzing non-Euclidean data such as social networks, biomedical data, and knowledge graphs, and much research progress has been made using them as tools. However, current graph neural networks do not generalize well: different graph-structured data require different graph network architectures, and designing a graph neural network requires a great deal of manual work and domain knowledge. In addition, inputting too many attribute features causes the graph neural network to overfit and consumes computing resources, while using too few attribute features leads to underfitting. Therefore, how to determine the attribute features for a graph neural network while determining the graph network architecture is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a neural network architecture searching method, a device, terminal equipment and a storage medium, which can improve the performance of a neural network model and the efficiency of feature selection of the neural network.
In a first aspect, an embodiment of the present application provides a neural network architecture searching method, where the method includes:
determining a search space and a training data set for constructing a target neural network, wherein the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n types of input attribute feature data, and one sample data comprises m types of input attribute feature data, wherein n is greater than m;
adjusting a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, wherein the initial neural network comprises a plurality of nodes, every two adjacent nodes in the plurality of nodes are connected through at least two basic operations, the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values, one type of input attribute feature data corresponds to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting any two adjacent nodes is the probability that the basic operation is the target basic operation between the two adjacent nodes;
and determining a target neural network according to the adjusted weight matrix and the basic operation parameter, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameter.
With reference to the first aspect, in one possible implementation manner, determining the target neural network according to the adjusted weight matrix and the basic operation parameter includes:
determining, from the adjusted basic operation parameter, the confidence corresponding to each basic operation between any two adjacent nodes;
determining the basic operation corresponding to the maximum confidence among the confidences corresponding to the basic operations between any two adjacent nodes as the target basic operation between the two adjacent nodes;
and generating the target neural network according to the adjusted weight matrix and the target basic operation between every two adjacent nodes.
With reference to the first aspect, in one possible implementation manner, the determining, according to the adjusted feature weight parameter, target input attribute feature data for the target neural network includes:
determining the number x of input attribute feature data for the target neural network, wherein x is smaller than n;
sorting the n feature weight values included in the adjusted feature weight parameter in descending order;
and determining the x types of input attribute feature data indicated by the first x feature weight values after the descending sort as the target input attribute feature data for the target neural network.
With reference to the first aspect, in one possible implementation manner, the determining, according to the adjusted feature weight parameter, target input attribute feature data for the target neural network includes:
acquiring a feature weight threshold;
determining, from the n feature weight values included in the adjusted feature weight parameter, a plurality of feature weight values greater than or equal to the feature weight threshold;
and determining the multiple types of input attribute feature data indicated by the plurality of feature weight values as the target input attribute feature data for the target neural network.
With reference to the first aspect, in one possible implementation manner, the plurality of nodes include a plurality of layers of nodes, and an upper layer node and a lower layer node are connected through at least two basic operations;
In the initial neural network, the input of the first-layer node comprises the m types of input attribute feature data in each sample data, and the output of the first-layer node is determined by those m types of input attribute feature data together with the feature weight value corresponding to each of them; the input of any layer node after the first layer is determined by the output of the node in the layer above it, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations; and the output of the last-layer node is used for adjusting the weight matrix and the architecture parameters in the initial neural network.
With reference to the first aspect, in one possible implementation manner, the input of any layer node is determined by applying each basic operation between that node and the node in the layer above it to the output of the upper-layer node, and then weighting and summing the resulting values with the confidences corresponding to the respective basic operations.
With reference to the first aspect, in a possible implementation manner, the sample data include sample drug data, and the input attribute feature data in the sample drug data include atomic attribute feature data corresponding to each atom composing the drug molecule, where the atomic attribute features include one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity.
In a second aspect, an embodiment of the present application provides a neural network architecture search apparatus, including:
The data preparation module is used for determining a search space and a training data set for constructing a target neural network, wherein the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n types of input attribute feature data, and one sample data comprises m types of input attribute feature data, wherein n is greater than m;
The parameter adjustment module is used for adjusting a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, wherein the initial neural network comprises a plurality of nodes, every two adjacent nodes in the plurality of nodes are connected through at least two basic operations, the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values, one type of input attribute feature data corresponds to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting any two adjacent nodes is the probability that the basic operation is the target basic operation between the two adjacent nodes;
The network generation module is used for determining a target neural network according to the adjusted weight matrix and the basic operation parameter, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameter.
With reference to the second aspect, in one possible implementation manner, the network generating module includes a target neural network determining unit, where the target neural network determining unit includes:
the first processing subunit is used for determining, from the adjusted basic operation parameter, the confidence corresponding to each basic operation between any two adjacent nodes;
a target basic operation determining subunit, configured to determine the basic operation corresponding to the maximum confidence among the confidences corresponding to the basic operations between any two adjacent nodes as the target basic operation between the two adjacent nodes;
and the second processing subunit is used for generating a target neural network according to the adjusted weight matrix and target basic operation between every two adjacent nodes.
With reference to the second aspect, in one possible implementation manner, the network generating module includes a first target input attribute feature data determining unit, where the first target input attribute feature data determining unit includes:
a third processing subunit, configured to determine a number x of input attribute feature data for the target neural network, where x is less than n;
The sorting subunit is used for sorting the n feature weight values included in the adjusted feature weight parameter in descending order;
and the fourth processing subunit is used for determining the x types of input attribute feature data indicated by the first x feature weight values after the descending sort as the target input attribute feature data for the target neural network.
With reference to the second aspect, in one possible implementation manner, the network generating module includes a second target input attribute feature data determining unit, where the second target input attribute feature data determining unit includes:
The threshold determining subunit is used for acquiring a feature weight threshold;
a fifth processing subunit, configured to determine, from the n feature weight values included in the adjusted feature weight parameter, a plurality of feature weight values greater than or equal to the feature weight threshold;
and a sixth processing subunit, configured to determine the multiple types of input attribute feature data indicated by the plurality of feature weight values as the target input attribute feature data for the target neural network.
With reference to the second aspect, in one possible implementation manner, the plurality of nodes include a plurality of layers of nodes, and an upper layer node and a lower layer node are connected through at least two basic operations;
In the initial neural network, the input of the first-layer node comprises the m types of input attribute feature data in each sample data, and the output of the first-layer node is determined by those m types of input attribute feature data together with the feature weight value corresponding to each of them; the input of any layer node after the first layer is determined by the output of the node in the layer above it, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations; and the output of the last-layer node is used for adjusting the weight matrix and the architecture parameters in the initial neural network.
With reference to the second aspect, in one possible implementation manner, the input of any layer node is determined by applying each basic operation between that node and the node in the layer above it to the output of the upper-layer node, and then weighting and summing the resulting values with the confidences corresponding to the respective basic operations.
With reference to the second aspect, in one possible embodiment, the sample data include sample drug data, and the input attribute feature data in the sample drug data include atomic attribute feature data corresponding to each atom composing the drug molecule, where the atomic attribute features include one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a processor and a memory, and the processor and the memory are connected to each other. The memory is configured to store a computer program supporting the terminal device to perform the method provided by the first aspect and/or any of the possible implementation manners of the first aspect, the computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method provided by the first aspect and/or any of the possible implementation manners of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method provided by the first aspect and/or any of the possible implementations of the first aspect.
In the embodiment of the application, the adjusted weight matrix and architecture parameters can be obtained by determining a search space and a training data set for constructing the target neural network and adjusting the weight matrix and the architecture parameters in the initial neural network based on a plurality of sample data. The target neural network is then determined according to the adjusted weight matrix and the basic operation parameter, and the target input attribute feature data for the target neural network are determined according to the adjusted feature weight parameter. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a neural network architecture searching method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an initial neural network according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a neural network architecture searching device according to an embodiment of the present application;
fig. 4 is another schematic structural diagram of a neural network architecture searching device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The neural network architecture searching method provided by the embodiment of the application can be widely applied to the construction of various neural networks and to feature selection for neural networks. Such neural networks include convolutional neural networks, recurrent neural networks, graph convolutional neural networks, and the like, which are not limited herein. For convenience of description, a graph neural network is taken as an example below. According to the method, by determining a search space and a training data set for constructing the target neural network and adjusting the weight matrix and the architecture parameters in the initial neural network based on a plurality of sample data, the adjusted weight matrix and architecture parameters can be obtained. The target neural network is then determined according to the adjusted weight matrix and the basic operation parameter, and the target input attribute feature data for the target neural network are determined according to the adjusted feature weight parameter. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can be improved.
The method and the related device according to the embodiments of the present application are described in detail below with reference to fig. 1 to 5. The method provided by the embodiment of the application comprises data processing stages such as acquiring a search space and a training data set, adjusting the weight matrix and the architecture parameters in the initial neural network based on a plurality of sample data included in the training data set, determining the target neural network according to the adjusted weight matrix and the basic operation parameter, and determining the target input attribute feature data for the target neural network according to the adjusted feature weight parameter. The implementation of these data processing stages is described with reference to fig. 1 below.
Referring to fig. 1, fig. 1 is a flow chart of a neural network architecture searching method according to an embodiment of the present application. The method provided by the embodiment of the application can comprise the following steps S101 to S103:
s101, determining a search space and a training data set for constructing a target neural network.
In some possible implementations, the search space and the training data set may be determined based on a target task, where the search space comprises a plurality of basic operations and the training data set comprises a plurality of sample data. The plurality of sample data comprise n types of input attribute feature data, and one sample data comprises m types of input attribute feature data, where n is greater than m. It should be understood that the target task is to build the target neural network. For example, the target task may be to construct a face recognition neural network (i.e., the target neural network is a face recognition neural network) for recognizing a face from an input image and outputting the corresponding person's name. For another example, the target task may be to construct a drug classification graph neural network (i.e., the target neural network is a drug classification graph neural network) for outputting whether a drug molecule can be used to treat a disease according to the input feature data of the drug molecule.
Generally, the basic operations included in the search space may be convolutions, pooling operations, or combinations of convolution and pooling, determined according to the actual application scenario and not limited herein. For example, if the target task is to construct a face recognition neural network, a convolutional neural network may generally be employed, so the basic operations included in the search space may include one or more of average pooling with a 3×3 pooling kernel, max pooling with a 3×3 pooling kernel, separable convolution with a 5×5 convolution kernel, and so on, without limitation. For another example, if the target task is to construct a drug classification graph neural network, a graph neural network may typically be employed, so the basic operations included in the search space may include one or more of operations on attention coefficients, activation functions, message passing functions, and the like, without limitation. The activation functions may include the Sigmoid function, the Tanh function, the Rectified Linear Unit (ReLU) function, linear functions, and the like, and the message passing functions may include the identity function and the like, without limitation.
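For illustration, a search space of this kind can be sketched as a mapping from operation names to constructors for the candidate basic operations named above. This is a minimal sketch assuming PyTorch (the patent does not prescribe a framework); the channel count and kernel details are illustrative only.

    import torch.nn as nn

    def build_search_space(channels):
        # Candidate basic operations for a convolutional target network;
        # each entry maps an operation name to a constructor.
        return {
            "avg_pool_3x3": lambda: nn.AvgPool2d(3, stride=1, padding=1),
            "max_pool_3x3": lambda: nn.MaxPool2d(3, stride=1, padding=1),
            "sep_conv_5x5": lambda: nn.Sequential(
                # depthwise 5x5 then pointwise 1x1 = separable convolution
                nn.Conv2d(channels, channels, 5, padding=2, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.ReLU(),
            ),
        }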
It should be appreciated that the sample data included in the training data set are data associated with the target task. For example, if the target task is to construct a face recognition neural network, the sample data in the training data set may be sample face picture data, where the face recognition neural network is configured to output, for each input face picture, the corresponding person's name. For another example, if the target task is to construct a drug classification graph neural network, the sample data in the training data set may be sample drug data, where the input attribute feature data in the sample drug data comprise atomic attribute feature data corresponding to each atom composing a drug molecule, and the atomic attribute features include one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity. It should be understood that the drug classification graph neural network is used to output the class of each input drug molecule based on the input attribute feature data it includes, where the class indicates whether the drug molecule can be used to treat a disease (i.e., the drug classification graph neural network may be a classification model). In general, many attribute features may describe a given object, but feeding too many of them into a model consumes computing resources and slows down training; therefore, the input attribute feature data for the target neural network in the present application are the feature vectors corresponding to a subset of all the attribute features of the object.
For example, assume that the target neural network is a drug classification graph neural network, where any one drug molecule comprises a plurality of atoms, i.e., the drug molecule is composed of multiple atoms. The attributes of each atom may include 7 attribute features (i.e., n = 7): atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity. Therefore, in order to increase the training speed and reduce GPU memory usage, sample data including 3 types of attribute features (i.e., m = 3) can be randomly drawn from the 7 types of attribute features for training of the neural network, so as to obtain a drug classification graph neural network meeting the target convergence condition. The 3 types of input attribute feature data included in each sample data may differ: for example, one sample data may include atom type, number of chemical bonds, and formal charge, while another sample data may include atom type, number of connected hydrogen atoms, and atomic orbital, as determined by the actual application scenario and not limited herein.
It should be appreciated that in the neural network training phase, by building sample data that include m of the n types of input attribute feature data for training of the target neural network, that is, by selecting part of the attribute features from all attribute features as input to the drug classification graph neural network during training, data enhancement can be achieved and the training data set can be enlarged. This is because each sample data contains only part of the attribute features of a sample, so multiple sample data can be generated from the same sample. For example, if a sample includes 7 types of attribute features (i.e., n = 7) and any sample data includes 3 of them (i.e., m = 3), at most C(7, 3) = 35 sample data can be generated from that sample. Concretely, if the 7 types of attribute features are attribute feature 1 through attribute feature 7, one part of the sample data in the training data set may include attribute feature 1, attribute feature 2, and attribute feature 3; another part may include attribute feature 4, attribute feature 5, and attribute feature 6; another part may include attribute feature 1, attribute feature 2, and attribute feature 7; another part may include attribute feature 3, attribute feature 4, and attribute feature 5; and so on, without limitation.
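This combinatorial enlargement can be sketched in plain Python as follows; the attribute names and placeholder values are illustrative, not prescribed by the patent.

    from itertools import combinations
    from math import comb

    ATTRIBUTES = ["atom_type", "num_bonds", "formal_charge", "chirality",
                  "num_hydrogens", "atomic_orbital", "aromaticity"]   # n = 7

    def augment(sample, m=3):
        # Generate every m-attribute view of one sample: C(n, m) sample data.
        return [{k: sample[k] for k in subset}
                for subset in combinations(ATTRIBUTES, m)]

    sample = {k: 0.0 for k in ATTRIBUTES}   # placeholder feature values
    views = augment(sample)
    assert len(views) == comb(7, 3) == 35   # 35 sample data from one sample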
S102, adjusting the weight matrix and the architecture parameters in the initial neural network based on the plurality of sample data to obtain the adjusted weight matrix and the adjusted architecture parameters.
In some possible implementations, the weight matrix and the architecture parameters in the initial neural network may be adjusted based on the plurality of sample data to obtain the adjusted weight matrix and architecture parameters. The m types of attribute features included in different sample data may differ. It can be appreciated that since the input attribute feature data fed into the initial neural network are only part of all the attribute features an object includes, a randomly initialized feature weight value may be assigned to each type of attribute feature before training. One type of input attribute feature data corresponds to one feature weight value, and the feature weight values corresponding to all the attribute features constitute the feature weight parameter. It will be appreciated that the sum of the feature weight values corresponding to the attribute features equals 1.
For example, suppose the target neural network is a drug classification graph neural network, where a drug molecule is composed of multiple atoms and the attribute features of each atom include 7 types (i.e., n = 7): atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity. The feature weight parameter may thus include 7 feature weight values, with one type of input attribute feature data corresponding to one feature weight value. Specifically, suppose atom type corresponds to feature weight value 1, number of chemical bonds to feature weight value 2, formal charge to feature weight value 3, atomic chirality to feature weight value 4, number of connected hydrogen atoms to feature weight value 5, atomic orbital to feature weight value 6, and aromaticity to feature weight value 7. At random initialization, each feature weight value may be set to 1/7, i.e., feature weight value 1 = feature weight value 2 = feature weight value 3 = feature weight value 4 = feature weight value 5 = feature weight value 6 = feature weight value 7 = 1/7.
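A minimal sketch of this initialization, assuming PyTorch and assuming the sum-to-1 constraint is maintained by a softmax over trainable logits (the patent only requires that the feature weight values sum to 1; the softmax parameterization is one common choice in differentiable architecture search):

    import torch

    n = 7
    # Trainable logits; the softmax keeps the n feature weight values summing to 1.
    feature_logits = torch.zeros(n, requires_grad=True)
    feature_weights = torch.softmax(feature_logits, dim=0)  # each equals 1/7 at init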
It should be appreciated that the initial neural network may comprise a plurality of nodes, wherein each two adjacent nodes in the plurality of nodes are connected by at least two basic operations. Generally, the architecture parameters may include feature weight parameters and basic operational parameters. The characteristic weight parameters comprise n characteristic weight values, and one type of input attribute characteristic data corresponds to one characteristic weight value. The basic operation parameters comprise the confidence coefficient corresponding to each basic operation between every two adjacent nodes, wherein the confidence coefficient corresponding to any basic operation connecting any two adjacent nodes is the probability value of any basic operation serving as the target basic operation between any two adjacent nodes.
It will be appreciated that the more nodes the initial neural network includes, the more parameters it has and the more computing resources it requires; conversely, the fewer nodes, the fewer parameters and the fewer computing resources required. Take as an example an initial neural network comprising 3 nodes (node 0, node 1, and node 2) and a search space comprising 6 basic operations (basic operation 1 through basic operation 6). Referring to fig. 2, fig. 2 is a schematic diagram of an initial neural network according to an embodiment of the application, in which any two adjacent nodes are connected by three basic operations. As shown in fig. 2, node 0 and node 1 are connected through basic operation 1, basic operation 2, and basic operation 3, with corresponding confidences w11 = 0.2, w12 = 0.5, and w13 = 0.3; node 1 and node 2 are connected through basic operation 4, basic operation 5, and basic operation 6, with corresponding confidences w21 = 0.2, w22 = 0.5, and w23 = 0.3. The confidences corresponding to the basic operations between any two adjacent nodes are called the basic operation parameters. In general, the sum of the confidences of the basic operations connecting any two adjacent nodes in the initial neural network is 1, i.e., w11 + w12 + w13 = 1 and w21 + w22 + w23 = 1.
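An edge of this kind can be sketched as a confidence-weighted "mixed operation" in the style of differentiable architecture search. This is a sketch assuming PyTorch; the softmax over trainable logits enforces the sum-to-1 property stated above.

    import torch
    import torch.nn as nn

    class MixedOp(nn.Module):
        """One edge of the initial network: all candidate basic operations run
        in parallel and are combined by their confidences."""
        def __init__(self, ops):
            super().__init__()
            self.ops = nn.ModuleList(ops)
            self.alpha = nn.Parameter(torch.zeros(len(ops)))  # architecture logits

        def forward(self, x):
            conf = torch.softmax(self.alpha, dim=0)  # confidences, summing to 1
            return sum(w * op(x) for w, op in zip(conf, self.ops))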
Specifically, the plurality of nodes comprise multiple layers of nodes, where an upper-layer node and a lower-layer node are connected through at least two basic operations. In the initial neural network, the input of the first-layer node comprises the m types of input attribute feature data in each sample data, and its output is determined by those m types of input attribute feature data together with their corresponding feature weight values; the input of any layer node after the first layer is determined by the output of the node in the layer above it, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations; and the output of the last-layer node is used for adjusting the weight matrix and the architecture parameters in the initial neural network. Generally, the output of the first-layer node is the product of each input attribute feature data and its corresponding feature weight value. The input of any layer node after the first layer is obtained by applying each basic operation between that node and the upper-layer node to the output of the upper-layer node, and then weighting and summing the resulting values with the confidences corresponding to the respective basic operations.
For example, referring to fig. 2 together, the first-layer node of the initial neural network is node 0, the second-layer node is node 1, and the third-layer node is node 2. Node 0 holds the feature weight parameter, consisting of feature weight value 1 through feature weight value 6; node 1 holds weight matrix 1 and node 2 holds weight matrix 2. Input attribute feature data 1 corresponds to feature weight value 1, input attribute feature data 2 to feature weight value 2, and so on up to input attribute feature data 6 and feature weight value 6. Assume each sample data input into node 0 includes 3 types of input attribute feature data: for example, the sample data set includes sample data 1 (input attribute feature data 1, 2, and 3) and sample data 2 (input attribute feature data 4, 5, and 6). When sample data 1 is input into node 0, the output X1 of node 0 consists of input attribute feature data 1 multiplied by feature weight value 1, input attribute feature data 2 multiplied by feature weight value 2, and input attribute feature data 3 multiplied by feature weight value 3; when sample data 2 is input into node 0, its output consists of input attribute feature data 4 multiplied by feature weight value 4, input attribute feature data 5 multiplied by feature weight value 5, and input attribute feature data 6 multiplied by feature weight value 6.
The input X2 of node 1 is determined by weighted summation of the first output value obtained by applying basic operation 1 to the output X1 of node 0, the second output value obtained by applying basic operation 2 to X1, and the third output value obtained by applying basic operation 3 to X1, each weighted by its corresponding confidence: input X2 of node 1 = first output value × w11 + second output value × w12 + third output value × w13. After computation with weight matrix 1 in node 1, suppose node 1 outputs X3. The input X4 of node 2 then equals the weighted summation of the fourth output value obtained by applying basic operation 4 to X3, the fifth output value obtained by applying basic operation 5 to X3, and the sixth output value obtained by applying basic operation 6 to X3, each weighted by its corresponding confidence: input X4 of node 2 = fourth output value × w21 + fifth output value × w22 + sixth output value × w23; and so on, until the output of the last-layer node is used for adjusting the weight matrix and the architecture parameters in the initial neural network.
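Putting the pieces together, the fig. 2 computation can be sketched as follows, reusing the MixedOp above. Linear layers stand in for the basic operations and weight matrices, which is an illustrative simplification rather than the patent's prescribed operations.

    import torch
    import torch.nn as nn

    class TinySupernet(nn.Module):
        def __init__(self, dim=6):
            super().__init__()
            self.feature_logits = nn.Parameter(torch.zeros(dim))  # one weight per feature
            self.edge1 = MixedOp([nn.Linear(dim, dim) for _ in range(3)])  # ops 1-3
            self.edge2 = MixedOp([nn.Linear(dim, dim) for _ in range(3)])  # ops 4-6
            self.W1 = nn.Linear(dim, dim)   # weight matrix 1 (node 1)
            self.W2 = nn.Linear(dim, dim)   # weight matrix 2 (node 2)

        def forward(self, x):
            x1 = x * torch.softmax(self.feature_logits, dim=0)  # node 0 output X1
            x3 = self.W1(self.edge1(x1))                        # node 1 output X3
            return self.W2(self.edge2(x3))                      # node 2 output

    out = TinySupernet()(torch.randn(4, 6))  # a batch of 4 six-feature samples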
It can be understood that the larger the confidence corresponding to a basic operation between two adjacent nodes, the larger the probability that the two nodes are connected by that operation, that is, the larger the probability that it is chosen as the target basic operation between them. For example, among the basic operations connecting node 0 and node 1, since the confidence w12 = 0.5 of basic operation 2 > the confidence w13 = 0.3 of basic operation 3 > the confidence w11 = 0.2 of basic operation 1, the probability of connecting node 0 and node 1 through basic operation 2 is highest, i.e., basic operation 2 is most likely the target basic operation between node 0 and node 1. Likewise, between node 1 and node 2, since the confidence w22 = 0.5 of basic operation 5 > the confidence w23 = 0.3 of basic operation 6 > the confidence w21 = 0.2 of basic operation 4, basic operation 5 is most likely the target basic operation between node 1 and node 2.
In general, the weights inherent to a neural network are referred to as its weight matrix. Take a face recognition neural network as an example, for which a convolutional neural network can generally be used. A convolutional neural network may include an input layer, convolutional layers, pooling layers, and a neural network layer. A convolutional layer may include a number of convolution operators (i.e., weight matrices), which in image processing act as filters that extract specific information from the input image matrix. Different weight matrices may be used to extract different features of an image: for example, one weight matrix extracts image edge information, another extracts a particular color, and yet another blurs noise in the image. A neural network therefore generally uses multiple weight matrices for feature extraction, such as weight matrix 1 and weight matrix 2 shown in fig. 2. In practical applications, the weight values in these matrices are obtained through extensive training, and each trained weight matrix can extract information from an input image and help the convolutional neural network make correct predictions. Accordingly, the application can train the initial neural network based on the plurality of sample data in the training data set to obtain the adjusted weight matrix and architecture parameters. To optimize the weight matrix and the architecture parameters simultaneously, the embodiment of the application can split the sample data in the training data set into two data sets: on one data set, the architecture parameters are held fixed and the weight matrix is adjusted by gradient descent; on the other data set, the weight matrix is held fixed and the architecture parameters are adjusted by gradient descent. The two steps alternate until the convergence condition is met, after which the adjusted weight matrix and architecture parameters are obtained.
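A sketch of this alternating scheme, in the first-order style of differentiable architecture search. Assumptions: PyTorch, a supplied loss function, and a fixed epoch count standing in for the convergence test; hyperparameter values are illustrative.

    import torch

    def search(model, w_params, arch_params, train_loader, valid_loader,
               loss_fn, epochs=50):
        opt_w = torch.optim.SGD(w_params, lr=0.025, momentum=0.9)
        opt_a = torch.optim.Adam(arch_params, lr=3e-4)
        for _ in range(epochs):
            for (xw, yw), (xa, ya) in zip(train_loader, valid_loader):
                # Architecture parameters fixed: step only the weight matrices.
                opt_w.zero_grad()
                loss_fn(model(xw), yw).backward()
                opt_w.step()
                # Weight matrices fixed: step only the architecture parameters
                # (feature weight values and operation confidences).
                opt_a.zero_grad()
                loss_fn(model(xa), ya).backward()
                opt_a.step()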
S103, determining a target neural network according to the adjusted weight matrix and the basic operation parameter, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameter.
In some possible embodiments, after the adjusted weight matrix and architecture parameters are acquired, the target neural network may be determined according to the adjusted weight matrix and the basic operation parameter included in the adjusted architecture parameters, and the target input attribute feature data for the target neural network may be determined according to the feature weight parameter included in the adjusted architecture parameters. Specifically, the confidence corresponding to each basic operation between any two adjacent nodes can be determined from the adjusted basic operation parameter; the basic operation with the maximum confidence between any two adjacent nodes is then determined as the target basic operation between those nodes; and the target neural network is generated according to the adjusted weight matrix and the target basic operations between every two adjacent nodes. Optionally, after the target neural network is generated, it can be trained again to obtain a target neural network with a better model effect.
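A sketch of this discretization step, assuming the edges are MixedOp instances as above:

    import torch

    def derive_architecture(edges):
        # For every edge, keep the basic operation with the highest confidence.
        chosen = []
        for edge in edges:
            conf = torch.softmax(edge.alpha, dim=0)
            chosen.append(int(conf.argmax()))  # index of the target basic operation
        return chosen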
Further, by determining the number x of input attribute feature data for the target neural network and sorting the n feature weight values included in the adjusted feature weight parameter in descending (or ascending) order, the x types of input attribute feature data indicated by the first x feature weight values after a descending sort (or by the last x after an ascending sort) may be determined as the target input attribute feature data for the target neural network, where x is smaller than n, e.g., x = m. Optionally, a feature weight threshold may be obtained, a plurality of feature weight values greater than or equal to this threshold determined from the n feature weight values included in the adjusted feature weight parameter, and the multiple types of input attribute feature data indicated by those feature weight values determined as the target input attribute feature data for the target neural network. That is, the determined target input attribute feature data may be used as the result of feature selection. Accordingly, when the target neural network is retrained later, sample data including the target input attribute feature data can be used as training samples, so as to obtain a target neural network with a better model effect; and when the retrained target neural network is used, the target input attribute feature data included in the data to be processed are extracted as the model input.
For example, assume that the target neural network is a drug classification graph neural network, where a drug molecule comprises multiple atoms and the attribute features of each atom include 7 types: atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity. After the initial neural network is trained on the training data set, assume atom type corresponds to feature weight value 1 = 0.3, number of chemical bonds to feature weight value 2 = 0.2, formal charge to feature weight value 3 = 0.01, atomic chirality to feature weight value 4 = 0.09, number of connected hydrogen atoms to feature weight value 5 = 0.04, atomic orbital to feature weight value 6 = 0.06, and aromaticity to feature weight value 7 = 0.3. With a feature weight threshold of 0.1, atom type, number of chemical bonds, and aromaticity can be determined as the target input attribute feature data for the drug classification graph neural network. That is, when drug classification is subsequently performed with the drug classification graph neural network, the 3 types of feature data (atom type, number of chemical bonds, and aromaticity) of the drug molecules to be classified can be input into the trained network for classification.
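Both selection rules can be sketched as follows (assuming PyTorch), checked against the numbers in the example above:

    import torch

    def select_features(feature_weights, x=None, threshold=None):
        # Top-x rule: indices of the x largest feature weight values.
        if x is not None:
            return torch.argsort(feature_weights, descending=True)[:x].tolist()
        # Threshold rule: indices of values >= the feature weight threshold.
        return (feature_weights >= threshold).nonzero(as_tuple=True)[0].tolist()

    weights = torch.tensor([0.30, 0.20, 0.01, 0.09, 0.04, 0.06, 0.30])
    assert select_features(weights, threshold=0.1) == [0, 1, 6]
    # -> atom type, number of chemical bonds, aromaticity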
In the embodiment of the application, the adjusted weight matrix and architecture parameters can be obtained by determining a search space and a training data set for constructing the target neural network and adjusting the weight matrix and the architecture parameters in the initial neural network based on a plurality of sample data. The target neural network is then determined according to the adjusted weight matrix and the basic operation parameter, and the target input attribute feature data for the target neural network are determined according to the adjusted feature weight parameter. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can be improved.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a neural network architecture searching device according to an embodiment of the present application. The neural network architecture searching device provided by the embodiment of the application comprises the following components:
a data preparation module 31, configured to determine a search space and a training data set for constructing a target neural network, where the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n types of input attribute feature data, and one sample data comprises m types of input attribute feature data, where n is greater than m;
a parameter adjustment module 32, configured to adjust a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, where the initial neural network comprises a plurality of nodes, every two adjacent nodes in the plurality of nodes are connected through at least two basic operations, the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values, one type of input attribute feature data corresponds to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting any two adjacent nodes is the probability that the basic operation is the target basic operation between the two adjacent nodes;
the network generating module 33 is configured to determine a target neural network according to the adjusted weight matrix and the basic operation parameter, and determine target input attribute feature data for the target neural network according to the adjusted feature weight parameter.
Referring to fig. 4, fig. 4 is another schematic structural diagram of a neural network architecture searching device according to an embodiment of the application.
In some possible embodiments, the network generating module 33 includes a target neural network determining unit 331, and the target neural network determining unit 331 includes:
a first processing subunit 3310, configured to determine, from the adjusted basic operation parameter, the confidence corresponding to each basic operation between any two adjacent nodes;
a target basic operation determining subunit 3311, configured to determine the basic operation corresponding to the maximum confidence among the confidences corresponding to the basic operations between any two adjacent nodes as the target basic operation between the two adjacent nodes;
a second processing subunit 3312, configured to generate a target neural network according to the adjusted weight matrix and the target basic operation between each two adjacent nodes.
In some possible embodiments, the network generating module 33 includes a first target input attribute feature data determining unit 332, where the first target input attribute feature data determining unit 332 includes:
a third processing subunit 3320, configured to determine a number x of input attribute feature data for the target neural network, where x is less than n;
a sorting subunit 3321, configured to sort the n feature weight values included in the adjusted feature weight parameter in descending order;
a fourth processing subunit 3322, configured to determine the x types of input attribute feature data indicated by the first x feature weight values after the descending sort as the target input attribute feature data for the target neural network.
In some possible embodiments, the network generating module 33 includes a second target input attribute feature data determining unit 333, and the second target input attribute feature data determining unit 333 includes:
A threshold determination subunit 3330, configured to obtain a feature weight threshold;
A fifth processing subunit 3331, configured to determine a plurality of feature weight values greater than or equal to the feature weight threshold from n feature weight values included in the adjusted feature weight parameters;
The sixth processing subunit 3332 is configured to determine the multiple types of input attribute feature data indicated by the plurality of feature weight values as the target input attribute feature data for the target neural network.
In some possible embodiments, the plurality of nodes include a plurality of layers of nodes, and the upper layer node and the lower layer node are connected through at least two basic operations;
In the initial neural network, the input of the first-layer node comprises the m types of input attribute feature data in each sample data, and the output of the first-layer node is determined by those m types of input attribute feature data together with the feature weight value corresponding to each of them; the input of any layer node after the first layer is determined by the output of the node in the layer above it, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations; and the output of the last-layer node is used for adjusting the weight matrix and the architecture parameters in the initial neural network.
In some possible embodiments, the input of any layer node is determined by applying each basic operation between that node and the node in the layer above it to the output of the upper-layer node, and then weighting and summing the resulting values with the confidences corresponding to the respective basic operations.
In some possible embodiments, the sample data include sample drug data, and the input attribute feature data in the sample drug data include atomic attribute feature data corresponding to each atom composing the drug molecule, where the atomic attribute features include one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity.
In the embodiment of the application, the neural network architecture search apparatus can obtain the adjusted weight matrix and architecture parameters by determining the search space and the training data set for constructing the target neural network and adjusting the weight matrix and the architecture parameters in the initial neural network based on the plurality of sample data. It then determines the target neural network according to the adjusted weight matrix and basic operation parameters, and determines the target input attribute feature data for the target neural network according to the adjusted feature weight parameters. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can both be improved.
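One plausible way to realise the adjustment step summarised above is DARTS-style alternating optimisation; the sketch below is an assumption about the schedule (the embodiment only states that both parameter groups are adjusted on the sample data), and `weight_params()` / `arch_params()` are hypothetical accessor names:

```python
def adjust_step(model, loss_fn, train_batch, val_batch, w_opt, a_opt):
    """One alternating update: the weight matrix on a training batch, then the
    architecture parameters (feature weight values plus per-connection
    confidences) on a validation batch. The bi-level split is an assumption."""
    x_tr, y_tr = train_batch
    w_opt.zero_grad()
    loss_fn(model(x_tr), y_tr).backward()
    w_opt.step()  # adjust the weight matrix

    x_va, y_va = val_batch
    a_opt.zero_grad()
    loss_fn(model(x_va), y_va).backward()
    a_opt.step()  # adjust the architecture parameters
```

Here `w_opt` and `a_opt` would be built over `model.weight_params()` and `model.arch_params()` respectively, for example with `torch.optim.SGD` and `torch.optim.Adam`.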
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 5, the terminal device in this embodiment may include: one or more processors 401, a memory 402, and a transceiver 403. The processor 401, the memory 402, and the transceiver 403 are connected by a bus 404. The memory 402 is used to store a computer program comprising program instructions, and the processor 401 is used to execute the program instructions stored in the memory 402 to perform the following operations:
determining a search space and a training data set for constructing a target neural network, wherein the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n classes of input attribute feature data, and one sample data comprises m classes of input attribute feature data, wherein n is greater than m;
Adjusting a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, wherein the initial neural network comprises a plurality of nodes and every two adjacent nodes among the plurality of nodes are connected through at least two basic operations; the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values with one class of input attribute feature data corresponding to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting two adjacent nodes is the probability that this basic operation is the target basic operation between those two nodes;
And determining a target neural network according to the adjusted weight matrix and the basic operation parameters, and determining target input attribute characteristic data for the target neural network according to the adjusted characteristic weight parameters.
In some possible embodiments, the processor 401 is configured to:
Determining, from the adjusted basic operation parameters, the confidence corresponding to each basic operation between any two adjacent nodes;
Determining, as the target basic operation between any two adjacent nodes, the basic operation with the maximum confidence among those confidences;
and generating a target neural network according to the adjusted weight matrix and target basic operation between every two adjacent nodes.
In some possible embodiments, the processor 401 is configured to:
determining the number x of classes of input attribute feature data for the target neural network, wherein x is smaller than n;
sorting, in descending order, the n feature weight values included in the adjusted feature weight parameters;
and determining the x classes of input attribute feature data indicated by the first x feature weight values after the descending sort as the target input attribute feature data for the target neural network.
In some possible embodiments, the processor 401 is configured to:
Acquiring a feature weight threshold;
Determining, from the n feature weight values included in the adjusted feature weight parameters, a plurality of feature weight values greater than or equal to the feature weight threshold;
And determining the classes of input attribute feature data indicated by the plurality of feature weight values as the target input attribute feature data for the target neural network.
In some possible embodiments, the plurality of nodes include a plurality of layers of nodes, and an upper-layer node and a lower-layer node are connected through at least two basic operations.
In the initial neural network, the input of the first-layer node includes the m classes of input attribute feature data in each sample data, and the output of the first-layer node is determined by those m classes of input attribute feature data together with the feature weight value corresponding to each class. The input of any layer node after the first layer is determined by the output of that node's upper-layer node, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations. The output of the last-layer node among the multiple layers of nodes is used to adjust the weight matrix and the architecture parameters in the initial neural network.
In some possible embodiments, the input of any layer node is obtained by applying each basic operation between that node and its upper-layer node to the output of the upper-layer node, and computing a weighted sum of the resulting values with the confidences corresponding to the respective basic operations.
In some possible embodiments, the sample data includes sample drug data, and the input attribute feature data in the sample drug data includes atomic attribute feature data corresponding to each atom constituting the drug molecule, where the atomic attribute features include one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity.
It should be appreciated that in some possible embodiments, the processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory 402 may include read-only memory and random access memory, and provides instructions and data to the processor 401. A portion of the memory 402 may also include non-volatile random access memory; for example, the memory 402 may also store device type information.
In a specific implementation, the terminal device may execute, through its built-in functional modules, the implementations provided by the steps in fig. 1; for details, refer to the implementations provided by those steps, which are not described here again.
In the embodiment of the application, the terminal device can obtain the adjusted weight matrix and architecture parameters by determining the search space and the training data set for constructing the target neural network and adjusting the weight matrix and the architecture parameters in the initial neural network based on the plurality of sample data. It then determines the target neural network according to the adjusted weight matrix and basic operation parameters, and determines the target input attribute feature data for the target neural network according to the adjusted feature weight parameters. By adopting the embodiment of the application, the performance of the neural network model and the efficiency of feature selection for the neural network can both be improved.
The embodiment of the application also provides a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, implement the neural network architecture search method provided by the steps in fig. 1; for details, refer to the implementations provided by those steps, which are not described here again.
The computer-readable storage medium may be an internal storage unit of the neural network architecture search apparatus provided in any one of the foregoing embodiments or of the terminal device, for example, a hard disk or memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
The terms "first," "second," "third," "fourth" and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items.

Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two; to clearly illustrate the interchangeability of hardware and software, the elements and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functions in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The method and related apparatus provided in the embodiments of the present application are described with reference to the flowcharts and/or structural diagrams provided in the embodiments; each flow and/or block of the flowcharts and/or structural diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions.

These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the structural diagrams.

Claims (8)

1. A neural network architecture search method, the method comprising:
Determining a search space and a training data set for constructing a target neural network, wherein the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n classes of input attribute feature data, and one sample data comprises m classes of input attribute feature data, wherein n is greater than m; wherein the target neural network is a drug classification graph neural network, and the basic operations comprise an attention coefficient operation, an activation function, or a message passing function; the sample data comprise sample drug data, and the input attribute feature data in the sample drug data comprise atomic attribute feature data corresponding to each atom constituting a drug molecule, wherein the atomic attribute features comprise one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity;
Adjusting a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, wherein the initial neural network comprises a plurality of nodes and every two adjacent nodes among the plurality of nodes are connected through at least two basic operations; the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values with one class of input attribute feature data corresponding to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting two adjacent nodes is the probability that this basic operation is the target basic operation between those two nodes;
Determining a target neural network according to the adjusted weight matrix and basic operation parameters, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameters, wherein the number of classes of the target input attribute feature data is smaller than n;
The determining the target neural network according to the adjusted weight matrix and the basic operation parameters comprises the following steps:
Determining, from the adjusted basic operation parameters, the confidence corresponding to each basic operation between any two adjacent nodes;
Determining, as the target basic operation between any two adjacent nodes, the basic operation with the maximum confidence among those confidences;
and generating a target neural network according to the adjusted weight matrix and target basic operation between every two adjacent nodes.
2. The method of claim 1, wherein determining target input attribute feature data for the target neural network based on the adjusted feature weight parameters comprises:
Determining the number x of classes of input attribute feature data for the target neural network, x being less than n;
Sorting, in descending order, the n feature weight values included in the adjusted feature weight parameters;
and determining the x classes of input attribute feature data indicated by the first x feature weight values after the descending sort as the target input attribute feature data for the target neural network.
3. The method of claim 1, wherein determining target input attribute feature data for the target neural network based on the adjusted feature weight parameters comprises:
Acquiring a feature weight threshold;
Determining, from the n feature weight values included in the adjusted feature weight parameters, a plurality of feature weight values greater than or equal to the feature weight threshold;
And determining the classes of input attribute feature data indicated by the plurality of feature weight values as the target input attribute feature data for the target neural network.
4. A method according to any one of claims 1-3, wherein the plurality of nodes comprises a plurality of layers of nodes, and wherein an upper layer node and a lower layer node are connected by at least two basic operations;
in the initial neural network, the input of the first-layer node comprises the m classes of input attribute feature data in each sample data, and the output of the first-layer node is determined by those m classes of input attribute feature data together with the feature weight value corresponding to each class; the input of any layer node after the first layer is determined by the output of that node's upper-layer node, the basic operations between the two nodes, and the confidence corresponding to each of those basic operations; and the output of the last-layer node among the multiple layers of nodes is used to adjust the weight matrix and the architecture parameters in the initial neural network.
5. The method according to claim 4, wherein the input of any layer node is obtained by applying each basic operation between that node and its upper-layer node to the output of the upper-layer node, and computing a weighted sum of the resulting values with the confidences corresponding to the respective basic operations.
6. A neural network architecture search apparatus, the apparatus comprising:
The data preparation module is used for determining a search space and a training data set for constructing a target neural network, wherein the search space comprises a plurality of basic operations, the training data set comprises a plurality of sample data, the plurality of sample data comprise n classes of input attribute feature data, and one sample data comprises m classes of input attribute feature data, wherein n is greater than m; wherein the target neural network is a drug classification graph neural network, and the basic operations comprise an attention coefficient operation, an activation function, or a message passing function; the sample data comprise sample drug data, and the input attribute feature data in the sample drug data comprise atomic attribute feature data corresponding to each atom constituting a drug molecule, wherein the atomic attribute features comprise one or more of atom type, number of chemical bonds, formal charge, atomic chirality, number of connected hydrogen atoms, atomic orbital, and aromaticity;
The parameter adjustment module is used for adjusting a weight matrix and architecture parameters in an initial neural network based on the plurality of sample data to obtain an adjusted weight matrix and adjusted architecture parameters, wherein the initial neural network comprises a plurality of nodes and every two adjacent nodes among the plurality of nodes are connected through at least two basic operations; the architecture parameters comprise a feature weight parameter and a basic operation parameter, the feature weight parameter comprises n feature weight values with one class of input attribute feature data corresponding to one feature weight value, the basic operation parameter comprises the confidence corresponding to each basic operation between every two adjacent nodes, and the confidence corresponding to any basic operation connecting two adjacent nodes is the probability that this basic operation is the target basic operation between those two nodes;
The network generation module is used for determining a target neural network according to the adjusted weight matrix and the basic operation parameters, and determining target input attribute feature data for the target neural network according to the adjusted feature weight parameters, wherein the number of classes of the target input attribute feature data is smaller than n;
Wherein the network generation module includes a target neural network determination unit including:
the first processing subunit is used for determining, from the adjusted basic operation parameters, the confidence corresponding to each basic operation between any two adjacent nodes;
The target basic operation determining subunit is used for determining, as the target basic operation between any two adjacent nodes, the basic operation with the maximum confidence among those confidences;
and the second processing subunit is used for generating a target neural network according to the adjusted weight matrix and target basic operation between every two adjacent nodes.
7. A terminal device comprising a processor and a memory, said processor and memory being interconnected;
The memory is for storing a computer program comprising program instructions, the processor being configured to invoke the program instructions to perform the method of any of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-5.
CN202011203194.7A 2020-11-02 2020-11-02 Neural network architecture searching method, device, terminal equipment and storage medium Active CN112308227B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011203194.7A CN112308227B (en) 2020-11-02 2020-11-02 Neural network architecture searching method, device, terminal equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112308227A CN112308227A (en) 2021-02-02
CN112308227B true CN112308227B (en) 2024-05-28

Family

ID=74333714

Country Status (1)

Country Link
CN (1) CN112308227B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860534B (en) * 2021-03-17 2022-10-25 上海壁仞智能科技有限公司 Hardware architecture performance evaluation and performance optimization method and device
CN114358202A (en) * 2022-01-11 2022-04-15 平安科技(深圳)有限公司 Information pushing method and device based on drug molecule image classification
CN114997377B (en) * 2022-07-07 2024-09-13 清华大学 Architecture searching method, device, equipment and storage medium for super-large scale graph

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612134A (en) * 2020-05-20 2020-09-01 鼎富智能科技有限公司 Neural network structure searching method and device, electronic equipment and storage medium
CN111738403A (en) * 2020-04-26 2020-10-02 华为技术有限公司 Neural network optimization method and related equipment
CN111814966A (en) * 2020-08-24 2020-10-23 国网浙江省电力有限公司 Neural network architecture searching method, neural network application method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant