WO2020211037A1 - Accelerator detection method and verification platform - Google Patents

Accelerator detection method and verification platform

Info

Publication number
WO2020211037A1
WO2020211037A1 PCT/CN2019/083225 CN2019083225W WO2020211037A1 WO 2020211037 A1 WO2020211037 A1 WO 2020211037A1 CN 2019083225 W CN2019083225 W CN 2019083225W WO 2020211037 A1 WO2020211037 A1 WO 2020211037A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
neural network
nodes
parent
current node
Prior art date
Application number
PCT/CN2019/083225
Other languages
English (en)
French (fr)
Inventor
王耀杰
林蔓虹
陈琳
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2019/083225 (WO2020211037A1)
Priority to CN201980009150.XA (CN111656370A)
Publication of WO2020211037A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • This application relates to the technical field of neural networks, and more specifically, to an accelerator detection method and verification platform.
  • After a neural network is generated, it generally needs to be loaded onto an accelerator to run before it can be used for data processing.
  • the performance of the accelerator may directly affect the subsequent use of neural networks for data processing. Therefore, how to better detect the performance of the accelerator is a problem that needs to be solved.
  • This application provides an accelerator detection method, a neural network generation method, a data processing method, and related devices to better perform accelerator detection.
  • an accelerator detection method includes: generating at least one target neural network; translating the at least one target neural network into neural network instructions; inputting the neural network instructions to the accelerator and to the software model matching the accelerator, respectively, for execution, and determining the difference in the output results of the neural network instructions; determining the instructions that are abnormal during the operation of the accelerator according to the difference in the output results of the neural network instructions.
  • the performance of the accelerator can be effectively tested.
  • generating at least one target neural network includes: generating multiple target neural networks.
  • different neural networks can be used to detect the performance of the accelerator, and the performance of the accelerator can be better detected.
  • the aforementioned target neural network is a convolutional neural network.
  • the target neural network generated in this application can also be other types of neural networks other than the convolutional neural network, such as a feedforward neural network, a recurrent neural network, and so on.
  • generating at least one target neural network includes: determining the number of generations of the target neural network, and the node types and the number of nodes of all generations of the target neural network, where the target neural network is any one of the above-mentioned at least one target neural network; determining a target connection mode connecting all nodes in the target neural network according to preset node connection requirements; generating the target neural network according to the target connection mode.
  • the target neural network can be finally generated in this way, which makes it possible to generate various types of neural networks more flexibly and conveniently.
  • Furthermore, when multiple types of neural networks are generated, the performance of the accelerator can be better tested.
  • determining the target connection method of connecting all nodes in the target neural network according to preset node connection requirements includes: determining the candidate parent node of the current node according to the node connection requirements, where the current node and The candidate parent node meets the node connection requirements; the actual parent node of the current node is selected from the candidate parent nodes; the connection relationship between the current node and the actual parent node of the current node is determined to finally generate the target connection mode.
  • the aforementioned candidate parent node may also be referred to as a candidate node of the parent node.
  • determining the candidate parent node of the current node according to the node connection requirements includes: determining the candidate parent node of the current node according to at least one of the following connection relationships: when the node type of the current node is Concat or Eltwise, the current node has multiple parent nodes, and the number of parent nodes of the current node is less than or equal to the number of candidate parent nodes of the current node; when the node type of the parent node of the current node is Active, the node type of the current node is a type other than Active; when the node type of the parent node of the current node is Global Pooling, the node type of the current node is Global Pooling; when the node type of the parent node of the current node is FC, the node type of the current node is FC or Concat; when the node type of the parent node of the current node is Conv, Eltwise, Pooling, and Concat,
  • selecting the actual parent node of the current node from the candidate parent nodes includes: determining the probability that each of the candidate parent nodes is the actual parent node of the current node according to a probability density function; The actual parent node of the current node is determined from the candidate parent nodes according to the probability that each node in the candidate parent nodes is the actual parent node of the current node.
  • determining the actual parent node of the current node from the candidate parent nodes according to the probability that each of the candidate parent nodes is the actual parent node of the current node includes: A node whose probability as the actual parent node of the current node is greater than the preset probability value is determined as the actual parent node of the current node.
  • the above method further includes: adjusting the probability of each of the candidate parent nodes as the actual parent node of the current node according to the expectation and variance of the probability density function.
  • the width and depth of the target neural network can be adjusted, so that the target neural network whose depth and width meet the requirements can be generated.
  • the expectation and variance of the probability density function can be adjusted according to the requirements of the depth and width of the target neural network to be generated.
  • the greater the variance of the probability density function the greater the probability that the nodes in the adjacent generation are selected, the narrower the width of the network and the deeper the depth.
  • the aforementioned probability density function is a Gaussian function.
  • the above-mentioned generating the target neural network according to the target connection mode includes: determining the effective target connection relationship from the target connection relationship according to the preset effective node connection relationship; generating according to the effective target connection relationship Target neural network.
  • the above-mentioned effective connection relationship between nodes includes at least one of the following relationships: when the node type of the current node is Eltwise, the numbers of channels of the multiple inputs of the current node are the same; when the node type of the current node is FC or GlobalPooling, the nodes connected after the current node can only be of the FC, GlobalPooling, or act type.
  • determining the number of generations of the target neural network to be generated, as well as the node type and number of nodes of all generations of the target neural network, includes: determining the number of generations of the target neural network, and the node type and number of nodes of all generations of the target neural network, according to the operational requirements of the target neural network.
  • the above operational requirements for the target neural network can be the required amount of calculation (size).
  • when the required amount of calculation is small, fewer generations can be set for the target neural network, and fewer nodes can be set for each generation; when the required amount of calculation is large, more generations can be set for the target neural network, and more nodes can be set for each generation.
  • the above operational requirements for the target neural network can also be the complexity of the calculation. When the calculation complexity is low, fewer generations can be set for the target neural network, and fewer nodes can be set for each generation; when the calculation complexity is higher, more generations can be set for the target neural network, and more nodes can be set for each generation.
  • a method for generating a neural network includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, the target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode.
  • the target neural network can be finally generated in this way, which makes it possible to generate various types of neural networks more flexibly and conveniently.
  • a data processing method includes: determining the number of generations of the target neural network to be generated, and the node types and the number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, the target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode; using the target neural network for data processing.
  • the target neural network can be finally generated in this way, which makes it possible to generate various types of neural networks more flexibly and conveniently, so that a specific neural network can be used for data processing in a more targeted manner.
  • an accelerator verification platform comprising: a memory for storing code; at least one processor for executing the code stored in the memory to perform the following operations: generating at least one target neural network; Translate at least one target neural network into neural network instructions; input the neural network instructions into the accelerator and the software model matching the accelerator for execution, and determine the difference in the output results of the neural network instructions; according to the difference in the output results of the neural network instructions Determine the abnormal instruction during accelerator operation.
  • a device for generating a neural network including: a memory for storing code; at least one processor for executing the code stored in the memory to perform the following operations: determining the target neural network to be generated The number of generations, and the node type and number of nodes of each generation; determine the target connection mode connecting each node in the target neural network according to preset node connection requirements; generate the target neural network according to the target connection mode.
  • a data processing device, comprising: a memory for storing code; at least one processor for executing the code stored in the memory to perform the following operations: determining the number of generations of the target neural network to be generated, and the node types and the number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, the target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode; using the target neural network for data processing.
  • a computer-readable storage medium on which instructions for executing any one of the first, second, and third aspects are stored.
  • a computer program product which includes instructions for executing any one of the methods in the first, second, and third aspects.
  • FIG. 1 is a schematic diagram of the neural network structure;
  • FIG. 2 is a schematic flowchart of an accelerator detection method according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a neural network generation process according to an embodiment of the present application;
  • FIG. 4 is a schematic diagram of the determined number of generations of the target neural network, the number of nodes in each generation, and the node types;
  • FIG. 5 is a schematic diagram of a possible node connection relationship of a neural network;
  • FIG. 6 is a schematic diagram of a possible node connection relationship of a neural network;
  • FIG. 7 is a schematic diagram of a possible node connection relationship of a neural network;
  • FIG. 8 is a schematic diagram of a possible node connection relationship of a neural network;
  • FIG. 9 is a schematic diagram of a neural network generation process according to an embodiment of the present application;
  • FIG. 10 is a schematic block diagram of a verification platform of an accelerator according to an embodiment of the present application;
  • FIG. 11 is a schematic block diagram of an apparatus for generating a neural network according to an embodiment of the present application;
  • FIG. 12 is a schematic block diagram of a data processing device according to an embodiment of the present application.
  • Figure 1 is a schematic diagram of the neural network structure.
  • the neural network in FIG. 1 may be a convolutional neural network or other types of neural networks, which is not limited in this application.
  • the structure of the neural network mainly includes three parts: node, generation, and tree.
  • the neural network includes nodes 1 to 9, which together form the 0th to 4th generations, and the nodes included in each generation are as follows:
  • the 0th generation: node 1;
  • the first generation: node 2, node 3, node 4;
  • the second generation: node 5, node 6;
  • the third generation: node 7, node 8;
  • the 4th generation: node 9.
  • the node of the previous generation can be used as the parent node of the node of the subsequent generation, and the node of the subsequent generation can be the child node of the node of the previous generation.
  • nodes from the 1st to 4th generation can be used as child nodes of the 0th generation node, and the 1st generation node can be used as the parent node of the 2nd to 4th generation nodes.
  • the above-mentioned nodes in the 0th to 4th generations together form the tree of the neural network.
  • Each node is used to describe a computing layer (for example, a convolutional layer).
  • the information contained in each node and the meaning of the corresponding information are as follows:
  • node_header: the header information of the node;
  • the header information of the above node includes sequence, gen_id, and node_id, where sequence is the overall sequence number of the node, gen_id is the generation index number (the index number of the generation where the node is located), and node_id is the index number of the node within the generation;
  • node_t: the node type; for example, the node type here can include Input/Eltwise/Concat/Conv/Pool/Relu/Prelu/innerproduct/GlobalPooling, etc.;
  • node_name: the node name (of the node);
  • top: the node name of the top node (of the node), where the top node is a child node of the node;
  • bottom[]: the node name of each bottom node (of the node), where a bottom node is a parent node of the node, and the number of bottom nodes is parent_num;
  • if_n/c/h/w[]: the batch number, channel number, height, and width of each input (of the node), where the number of inputs of the node is equal to parent_num;
  • A generation is used to organize at least one node. If a generation contains multiple nodes, nodes in the same generation cannot be connected to each other. The nodes in the current generation can only connect to nodes in generations whose gen_id is less than the gen_id of the current generation (that is, cross-generation connection is supported).
  • the information contained in the generation and the meaning of the corresponding information are as follows:
  • gen_id: the generation index number;
  • node_num: the number of nodes contained in the generation, where node_num is less than or equal to the maximum width of the neural network;
  • nodes: instances of the nodes included in the generation;
  • node_tq[]: the type of each node included in the generation.
  • Trees are used to organize multiple generations and describe the connection relationships of all nodes in the network.
  • the information contained in the tree and the meaning of the corresponding information are as follows:
  • gen_num: the number of generations contained in the tree, where gen_num is less than or equal to the maximum depth of the network;
  • gens[]: instances of the generations contained in the tree, where the number of entries in gens[] is equal to gen_num.
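  • The node, generation, and tree records described above can be sketched as plain data classes. This is only an illustrative sketch: the field names follow the text, but the Python types, the `if_nchw` spelling, the computed `parent_num`/`node_num`/`gen_num` properties, and the helper `can_connect` are assumptions introduced here, not part of the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NodeHeader:
    sequence: int  # overall sequence number of the node
    gen_id: int    # index of the generation that holds the node
    node_id: int   # index of the node inside its generation


@dataclass
class Node:
    node_header: NodeHeader
    node_t: str                                      # node type, e.g. "Conv"
    node_name: str
    top: str = ""                                    # name of the child node
    bottom: List[str] = field(default_factory=list)  # names of the parent nodes
    # (batch, channel, height, width) of each input; hypothetical field name
    if_nchw: List[Tuple[int, int, int, int]] = field(default_factory=list)

    @property
    def parent_num(self) -> int:
        return len(self.bottom)


@dataclass
class Generation:
    gen_id: int
    nodes: List[Node] = field(default_factory=list)

    @property
    def node_num(self) -> int:
        return len(self.nodes)


@dataclass
class Tree:
    gens: List[Generation] = field(default_factory=list)

    @property
    def gen_num(self) -> int:
        return len(self.gens)


def can_connect(child: Node, parent: Node) -> bool:
    """A node may only take parents from generations with a smaller gen_id."""
    return parent.node_header.gen_id < child.node_header.gen_id
```

  • Deriving the counts from the lists, rather than storing them separately, keeps them consistent with the node and generation instances by construction.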
  • the neural network structure introduced above in conjunction with FIG. 1 is only one possible structure of the neural network in the embodiment of the present application, and the neural network in the embodiment of the present application may also have other structures.
  • the specific structure and form of the network are not limited.
  • a possible structure of the neural network in the embodiment of the present application is briefly introduced above in conjunction with FIG. 1, and the detection method of the accelerator in the embodiment of the present application is described in detail below in conjunction with FIG. 2.
  • Fig. 2 is a schematic flowchart of an accelerator detection method according to an embodiment of the present application.
  • the method shown in FIG. 2 can be executed by an electronic device or a server, where the electronic device can be a mobile terminal (for example, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, an Internet of Things device, or another device containing a processor.
  • the method shown in FIG. 2 includes steps 110 to 140, which are respectively described in detail below.
  • the aforementioned at least one target neural network is multiple target neural networks.
  • the above-mentioned target neural network may be a convolutional neural network, or may be another type of neural network, such as a feedforward neural network, a recurrent neural network, and so on.
  • step 120 prepares the aforementioned at least one target neural network to be loaded into the accelerator or software model for execution. Before loading, it is generally necessary to translate the aforementioned at least one target neural network into instructions that the accelerator or software model can execute.
  • the above-mentioned software model matched with the accelerator may be a software model for comparing the performance of the accelerator, and the software model may simulate the operation behavior of the accelerator.
  • the above neural network instructions are input to the accelerator to obtain a first output result and to the software model to obtain a second output result; the difference in the output results of the neural network instructions can be obtained by comparing the first output result with the second output result.
  • in step 140, when the output results differ, the accelerator instruction corresponding to that output result can be regarded as an instruction that is abnormal during the operation of the accelerator.
  • the instructions that are abnormal during the operation of the accelerator can be used to locate problems in the accelerator, so that the design of the accelerator can be further improved or corrected to improve its performance.
  • the performance of the accelerator can be effectively tested.
  • different neural networks can be used to detect the performance of the accelerator, and the performance of the accelerator can be better detected.
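  • The comparison between the accelerator and its matching software model can be sketched as follows. This is a minimal sketch under stated assumptions: the two runner callables, the string representation of an instruction, the tolerance `tol`, and the stand-in functions `fake_model` and `fake_accelerator` are all hypothetical and only illustrate the flow of obtaining two output results and flagging the instructions whose results differ.

```python
from typing import Callable, List, Sequence

Runner = Callable[[str], Sequence[float]]


def find_abnormal_instructions(
    instructions: Sequence[str],
    run_on_accelerator: Runner,
    run_on_model: Runner,
    tol: float = 0.0,
) -> List[int]:
    """Return the indices of instructions whose two output results differ."""
    abnormal = []
    for idx, instr in enumerate(instructions):
        first = run_on_accelerator(instr)   # first output result
        second = run_on_model(instr)        # second output result
        differs = len(first) != len(second) or any(
            abs(a - b) > tol for a, b in zip(first, second)
        )
        if differs:
            abnormal.append(idx)
    return abnormal


# Hypothetical stand-ins: the "model" is the reference behaviour and the
# "accelerator" deviates on a single instruction.
def fake_model(instr: str) -> List[float]:
    return [float(len(instr))]


def fake_accelerator(instr: str) -> List[float]:
    offset = 1.0 if instr == "eltwise_1" else 0.0
    return [float(len(instr)) + offset]
```

  • For example, `find_abnormal_instructions(["conv_0", "eltwise_1", "fc_2"], fake_accelerator, fake_model)` flags only the second instruction, which is the one the stand-in accelerator computes differently.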
  • There are many ways to generate the at least one target neural network in step 110.
  • the process of generating at least one target neural network in step 110 will be described in detail below with reference to FIG. 3.
  • Fig. 3 is a schematic diagram of a neural network generation process in an embodiment of the present application.
  • the process shown in FIG. 3 includes steps 210 to 230, which are respectively described in detail below.
  • the target neural network determined in the above step 210 may be any one of the at least one target neural network in the above step 110.
  • the generation number of the target neural network may be randomly determined first, and then the node type and the number of nodes of each generation of nodes may be randomly determined.
  • the number of generations of the target neural network can be randomly determined to be 5 (the neural network in FIG. 1 has 5 generations).
  • the number of generations of the target neural network may be determined within a certain numerical range (for example, the depth range of the neural network).
  • the number of generations of the target neural network can be randomly determined to be 12 within the range of values [10,20].
  • the node type of each generation node can be determined from all available node types. The number of nodes in each generation can be determined within a certain range of values (for example, the width range of the neural network).
  • in step 210, the number of generations of the target neural network, as well as the node type and number of nodes of each generation, can also be set according to specific (operation) requirements.
  • the node type of each generation can be determined according to the available node types of Input/Eltwise/Concat/Conv/Pool/Relu/Prelu/Innerproduct/GlobalPooling.
  • the number of nodes from generation 0 to generation 4 can be randomly determined to be 1, 3, 2, 2, and 1, respectively.
  • when determining the node type and number of nodes in each generation, you can first determine the node type of each generation, first determine the number of nodes in each generation, or determine both at the same time (this application does not limit the order in which the node type and the number of nodes of each generation are determined).
  • the number of nodes in each generation can be greater than or equal to the number of node types in the generation (that is, the number of node types in each generation is less than or equal to the number of nodes in the generation).
  • the number of generations of the target neural network determined in step 210 is 4 (the 0th to 3rd generations), and the number of nodes included in the 0th to 3rd generations is specifically as follows:
  • the number of nodes of the 0th generation node is 1;
  • the number of nodes of the first generation node is 3;
  • the number of nodes of the second generation node is 2;
  • the number of nodes of the 3rd generation node is 1.
  • the node types included in the 0th to 3rd generations are as follows:
  • the node type of the 0th generation node is Input
  • the node types of the first generation nodes include FC, Eltwise and GlobalPooling;
  • the node types of the second generation nodes are Concat and FC;
  • the node type of the 3rd generation node is Eltwise.
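  • The random choice of the number of generations, the number of nodes per generation, and the node types in step 210 can be sketched as below. The function name `random_skeleton`, its parameters, the default ranges, and the decision to give generation 0 a single Input node are assumptions made for illustration; the type list is taken from the available node types named in the text.

```python
import random
from typing import List, Optional, Tuple

# Available node types listed in the text (Input is reserved for generation 0 here).
NODE_TYPES = [
    "Eltwise", "Concat", "Conv", "Pool",
    "Relu", "Prelu", "Innerproduct", "GlobalPooling",
]


def random_skeleton(
    depth_range: Tuple[int, int] = (2, 5),
    max_width: int = 10,
    rng: Optional[random.Random] = None,
) -> List[List[str]]:
    """Return one list of node types per generation, without any connections."""
    rng = rng or random.Random()
    gen_num = rng.randint(*depth_range)      # number of generations (depth)
    skeleton = [["Input"]]                   # assumption: generation 0 is one Input node
    for _ in range(1, gen_num):
        node_num = rng.randint(1, max_width)  # number of nodes (width) of this generation
        skeleton.append([rng.choice(NODE_TYPES) for _ in range(node_num)])
    return skeleton
```

  • Passing a seeded `random.Random` instance makes a generated skeleton reproducible, which is convenient when an abnormal result has to be replayed.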
  • the above-mentioned node connection requirements may be rules that can meet the normal use requirements of the neural network.
  • the node connection requirements may be preset. Specifically, the node connection requirements may be set through experience and the requirements of the neural network to be generated.
  • multiple connection relationships between the nodes in the target neural network can be determined according to the node connection requirements in step 220, and after the multiple connection relationships are obtained, one of them can be selected (arbitrarily) as the final connection relationship.
  • the foregoing node connection requirements may include at least one of the following conditions:
  • the node type of the first generation node is Input type
  • Table 1 shows the node types of the parent nodes that can be connected when the current node is of different node types, where Y indicates that it can be connected, and N indicates that it cannot be connected.
  • before step 230 is performed, the validity of the multiple node connection relationships can be judged, and a valid node connection relationship can be selected from them.
  • FC type and GlobalPooling type nodes (including nodes immediately after the current node and nodes located after the current node in subsequent generations) cannot be connected to other types of nodes except FC, GlobalPooling and act types.
  • the node type of the node immediately following the FC type and GlobalPooling type node, and the node after the FC type and the GlobalPooling type node in the subsequent generation can only be the FC, GlobalPooling or act type.
  • the node type of node 6 is Eltwise, and the number of input channels on each side of node 6 is 1, so the input channels of node 6 meet the above condition (4). However, for node 11, which is also of the Eltwise type, the number of input channels on the left is 2 and the number of input channels on the right is 1; the two are inconsistent, which does not meet the above condition (4).
  • Therefore, the connection relationship shown in FIG. 5 does not meet the above condition (4).
  • If the connection relationships determined in step 220 include the invalid connection relationship shown in FIG. 5, that connection relationship needs to be excluded.
  • the node type of node 1 is FC, and the node type of node 2 is Relu. Since the node type of node 1 is FC, only nodes of the FC, GlobalPooling, or act type can be connected after node 1, so the connection relationship between node 1 and node 2 does not meet the above condition (5). In addition, the node type of node 3 is GlobalPooling and the node type of node 4 is Prelu; only nodes of the FC, GlobalPooling, or act type can be connected after node 3, so the connection relationship between node 3 and node 4 does not satisfy the above condition (5).
  • Therefore, the connection relationship shown in FIG. 6 does not meet the aforementioned condition (5).
  • If the connection relationships determined in step 220 include the invalid connection relationship shown in FIG. 6, that connection relationship should be excluded.
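  • The two validity conditions discussed with FIG. 5 and FIG. 6 can be sketched as small predicates. The function names and the string encoding of node types are assumptions; the allowed set after an FC or GlobalPooling node is taken literally from the text, and the check below only looks at a direct parent-child edge, whereas the text also applies the restriction to nodes in later generations.

```python
from typing import List

# Types allowed after an FC or GlobalPooling node, as named in the text.
ALLOWED_AFTER_FC_OR_GP = {"FC", "GlobalPooling", "act"}


def eltwise_inputs_valid(input_channels: List[int]) -> bool:
    """Condition (4): all inputs of an Eltwise node have the same channel count."""
    return len(set(input_channels)) <= 1


def fc_gp_edge_valid(parent_type: str, child_type: str) -> bool:
    """Condition (5), direct edge only: restrict what may follow FC/GlobalPooling."""
    if parent_type in ("FC", "GlobalPooling"):
        return child_type in ALLOWED_AFTER_FC_OR_GP
    return True
```

  • With these predicates, the inputs of node 11 in FIG. 5 (2 and 1 channels) fail `eltwise_inputs_valid`, and the FC to Relu and GlobalPooling to Prelu edges of FIG. 6 fail `fc_gp_edge_valid`, matching the conclusions in the text.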
  • in step 220, when determining the parent node of a node, there may be multiple candidate nodes. Any node that meets the above conditions (1) to (5) can be used as a candidate parent node of the current node (also called a candidate node of the parent node), and which of the candidate parent nodes are selected as the actual parent nodes of the current node can be determined according to the probability density function.
  • the above-mentioned probability density function may be a Gaussian function.
  • the Gaussian function as a whole meets the basic requirement that the closer a generation is, the higher its probability of being selected. Specifically, the expected value of the Gaussian function may be kept consistent with the generation index value minus 1, so that the expected value of the Gaussian function does not affect the control of the network form.
  • the variance in the Gaussian function it is possible to control the shape of the Gaussian function, thereby controlling the probability of nodes in each generation being selected.
  • the greater the variance of the Gaussian function the greater the probability that the nodes in the neighboring generation will be selected, the deeper the depth, and the narrower the width of the network.
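  • The Gaussian weighting of candidate parents can be sketched as follows. This is a simplified sketch: it weights whole generations rather than individual candidate nodes, places the expected value at gen_id - 1 as described above, and normalizes the weights so they sum to 1. The function names and the parameter `sigma` (the standard deviation, i.e. the square root of the variance) are assumptions.

```python
import math
import random
from typing import List, Optional


def parent_generation_weights(gen_id: int, sigma: float) -> List[float]:
    """Normalized Gaussian weight of each earlier generation 0 .. gen_id - 1."""
    mean = gen_id - 1  # expected value sits on the adjacent generation
    raw = [
        math.exp(-((g - mean) ** 2) / (2.0 * sigma ** 2))
        for g in range(gen_id)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def pick_parent_generation(
    gen_id: int, sigma: float, rng: Optional[random.Random] = None
) -> int:
    """Sample the generation from which a parent node is taken."""
    rng = rng or random.Random()
    weights = parent_generation_weights(gen_id, sigma)
    return rng.choices(range(gen_id), weights=weights, k=1)[0]
```

  • Here `sigma` only controls how quickly the weights fall off with generation distance; a threshold on the resulting probabilities, as described for the preset probability value, could be applied to the same weights instead of sampling.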
  • the target neural network can be constructed according to the connection relationship of each node, or a prototxt file (a file containing the connection relationship of each node in the target neural network) can be output, so that the prototxt file can then be input to the configuration tool and translated into neural network instructions for the accelerator to execute.
  • of_h represents the height of the output feature map of the node;
  • if_h represents the height of the input feature map of the node;
  • pad_h is the number of rows of elements padded onto the input feature map for the convenience of calculation, usually filled with 0;
  • dilation_h represents the number of elements interpolated in the input feature map of the node (dilation_h is greater than 0), and the interpolated value is usually 0;
  • kernel_h represents the size of the convolution kernel during the convolution operation;
  • stride_h represents the sliding step length of the convolution kernel or the pooling window in the height direction;
  • Pool_size represents the size of the window during pooling processing.
  • Condition A The size of the feature map output by the parent node is equal to the size of the feature map input by the child node.
  • the size of the feature map output in the parent node must be the same as the size of the feature map input by the child node.
  • when the number of generations of the target neural network determined in step 210, and the node types and number of nodes in each generation, are as shown in Figure 4, continuing on this basis with step 220 yields the node connection relationships shown in Figure 7 and Figure 8.
  • the node connection relationships shown in Fig. 7 and Fig. 8 are analyzed. The analysis shows that both satisfy condition (4), but in Fig. 7 the connection between node 3 and node 6 does not meet condition (5). Fig. 8 satisfies condition (5) in addition to condition (4). Therefore, the node connection relationship shown in Fig. 8 can be determined to be valid, and in step 230 the neural network can be constructed according to the node connection relationship shown in Fig. 8.
  • Fig. 9 is a schematic diagram of a neural network generation process according to an embodiment of the present application.
  • the process shown in FIG. 9 can be performed by an electronic device (for the definition and explanation of the electronic device, please refer to the related content in the method shown in FIG. 2).
  • the process shown in FIG. 9 includes steps 1001 to 1011, each of which is described in detail below.
  • Step 1001 represents starting to generate a neural network.
  • a value can be randomly selected within a certain range as the number of generations of the neural network.
  • the number of nodes of each generation can be randomly generated within a certain network width.
  • the width of the neural network cannot exceed 10, so you can choose a value between 1 and 10 as the number of nodes in each generation.
  • the node type of each node can be randomly generated from all available node types.
  • Step 1002 and step 1003 here are equivalent to step 210 above.
  • the relevant definitions and explanations of step 210 above are also applicable to step 1002 and step 1003. To avoid repetition, step 1002 and step 1003 are not described in detail here.
  • each node in each generation can be instantiated according to the node types and the number of nodes of each generation, that is, the node instances in each generation are determined from the node types and node counts of that generation, where one node can correspond to one instance or to multiple instances.
  • the node here is more inclined to a logical concept, and the node instance is an entity that the node actually depends on, on which various data processing tasks of the node can be executed.
  • Configuring the header information (node_header) of each node is to generate the total sequence number (sequence) of each node, the generation index number (gen_id) and the node index number (node_id) in the generation.
  • the generation index number (gen_id) of each generation can be generated from top to bottom, the total sequence number (sequence) of each node can be generated in order from generation 0 to generation N (N is the number of the last generation of the neural network), and within each generation the node index number (node_id) of each node is generated in a certain order.
  • sequence represents the sequence number of the node in the entire neural network.
  • step 1006 the candidate node of the parent node of the current node is to be calculated, so that the parent node can be subsequently selected from the candidate node.
  • step 1006 you can start from the bottom layer, and select candidate parent nodes from previous generations for each node in each layer layer by layer.
  • the candidate parent node of the current node may not only come from the previous generation of the current node, but also from all generations before the current node.
  • when determining the candidate parent nodes for each node, they can be selected according to certain node connection requirements (which can be one or more of conditions (1) to (3) above): the nodes in the previous generation that meet the node connection requirements are taken as the candidate parent nodes of the current node.
  • node 5 and node 6 in the second generation can be selected as candidate parent nodes of node 7 in the third generation.
  • a probability density function can be used to calculate the probability that each of the candidate parent nodes is the parent node of the current node, and the nodes whose probability is greater than a certain value are taken as the actual parent nodes of the current node.
  • the number of the above-mentioned candidate parent nodes may be multiple, and the number of parent nodes selected from the candidate parent nodes may be one or multiple.
  • the parent node selected from the candidate parent nodes is the actual parent node of the current node.
  • for example, if a node has 6 candidate parent nodes and the probability density function gives the probabilities of these 6 candidates being the actual parent node of the current node as 70%, 60%, 65%, 45%, 40% and 30%, then the candidates with probabilities of 70%, 60%, and 65% can be determined as the actual parent nodes of the current node (one or more candidate parent nodes can be selected as actual parent nodes).
  • only the candidate parent node with the highest corresponding probability may be used as the actual parent node of the current node (that is, the candidate parent node corresponding to a probability of 70% is used as the actual parent node of the current node).
  • the aforementioned probability density function may specifically be a Gaussian function.
  • in step 1006, after the actual parent nodes of the current node are selected from the candidate parent nodes, if there are multiple actual parent nodes, a parent node can be selected arbitrarily or randomly from them for connection.
  • step 1008 it is necessary to determine whether the currently existing connection is valid.
  • each connection relationship can be judged against conditions (4) and (5) above: a connection relationship that satisfies both conditions is valid, and one that fails either of them is invalid.
  • when the connection is determined to be valid, step 1009 is executed; when it is determined to be invalid, execution continues with step 1006.
  • each node can be connected according to the effective connection relationship determined in step 1008.
  • step 1009a may be performed.
  • the intra-node parameters of each node can be determined according to the above formula (1), formula (2) and the constraints of condition A.
  • the prototxt file contains the connection relationship of each node in the neural network to be generated. After the prototxt file is generated, it is convenient to construct or generate the neural network according to the prototxt file.
  • Step 1011 represents the end of the neural network generation process.
  • the neural network generation method specifically includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining the target connection mode of all nodes in the target neural network according to the preset node connection requirements; generating the target neural network according to the target connection mode.
  • the target neural network can be finally generated, so that various types of neural networks can be generated more flexibly and conveniently.
  • the target neural network generated above can be used to process data. Therefore, this application can also protect a data processing method.
  • the method includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining the target connection mode connecting all nodes in the target neural network according to the preset node connection requirements; generating the target neural network according to the target connection mode; using the target neural network for data processing.
  • the target neural network can be finally generated, so that various types of neural networks can be generated more flexibly and conveniently, and a specific neural network can then be used for data processing in a more targeted way.
  • using the target neural network to perform data processing includes: obtaining input data; using the target neural network to perform data processing on the input data to obtain output data.
  • the aforementioned input data may be data that needs to be processed by a neural network, and further, the input data may be data that needs to be processed by a neural network in the field of artificial intelligence.
  • the aforementioned input data may be image data to be processed, and the aforementioned output data may be a classification result or a recognition result of the image.
  • the input data may also be voice data to be recognized, and the output result may be a voice recognition result.
  • the following describes the verification platform of the accelerator of the embodiment of the present application with reference to FIG. 10. It should be understood that the verification platform of the accelerator shown in FIG. 10 can execute each step of the accelerator detection method of the embodiment of the present application. Repeated description is omitted.
  • Fig. 10 is a schematic block diagram of a verification platform of an accelerator according to an embodiment of the present application.
  • the accelerator verification platform 2000 shown in FIG. 10 includes:
  • the memory 2001 is used to store codes
  • At least one processor 2002 is configured to execute the code stored in the memory to perform the following operations:
  • the instruction that is abnormal during the operation of the accelerator is determined.
  • FIG. 10 only shows one processor 2002.
  • the verification platform 2000 shown in FIG. 10 may include one or more processors 2002.
  • Fig. 11 is a schematic block diagram of an apparatus for generating a neural network according to an embodiment of the present application. It should be understood that the device 3000 shown in FIG. 11 can execute each step of the method for generating a neural network in the embodiment of the present application, and the device 3000 shown in FIG. 11 includes:
  • the memory 3001 is used to store codes
  • At least one processor 3002 is configured to execute codes stored in the memory to perform the following operations:
  • the instruction that is abnormal during the operation of the accelerator is determined.
  • FIG. 11 only shows one processor 3002.
  • the apparatus 3000 shown in FIG. 11 may include one or more processors 3002.
  • Fig. 12 is a schematic block diagram of a data processing device according to an embodiment of the present application. It should be understood that the device 4000 shown in FIG. 12 can execute each step of the data processing method of the embodiment of the present application, and the device 4000 shown in FIG. 12 includes:
  • the memory 4001 is used to store codes
  • At least one processor 4002 is configured to execute codes stored in the memory to perform the following operations:
  • the target neural network is used for data processing.
  • FIG. 12 shows only one processor 4002.
  • the apparatus 4000 shown in FIG. 12 may include one or more processors 4002.
  • the accelerator verification platform 2000, the device 3000, and the device 4000 described above may specifically be an electronic device or a server, where the electronic device may be a device that contains a processor, such as a mobile terminal (for example, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, or an Internet of Things device.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
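  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means.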
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

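A detection method and a verification platform for an accelerator. The detection method includes: generating at least one target neural network (110); translating the at least one target neural network into neural network instructions (120); inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions (130); and determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs (140). The at least one generated target neural network makes it possible to test the performance of the accelerator effectively.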

Description

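Detection method and verification platform for an accelerator
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to anyone reproducing this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
This application relates to the field of neural network technology, and more specifically to a detection method and a verification platform for an accelerator.
Background
After a neural network is generated, it generally has to be loaded onto an accelerator to run before it can be used for data processing. The performance of the accelerator may directly affect how well subsequent data processing with the neural network works, so how to better test the performance of an accelerator is a problem that needs to be solved.
Summary of the invention
This application provides a detection method for an accelerator, a method for generating a neural network, a data processing method, and related apparatuses, so as to test accelerators better.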
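In a first aspect, a detection method for an accelerator is provided. The method includes: generating at least one target neural network; translating the at least one target neural network into neural network instructions; inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions; and determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.
In this application, the at least one generated target neural network makes it possible to test the performance of the accelerator effectively.
Optionally, generating at least one target neural network includes: generating multiple target neural networks.
When multiple target neural networks are generated, different neural networks can be used to test the performance of the accelerator, which allows the performance of the accelerator to be tested better.
Optionally, the target neural network is a convolutional neural network.
It should be understood that the target neural network generated in this application may be a convolutional neural network or another type of neural network, for example, a feedforward neural network or a recurrent neural network.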
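In some implementations of the first aspect, generating at least one target neural network includes: determining the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network, where the target neural network is any one of the at least one target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; and generating the target neural network according to the target connection mode.
In this application, the number of generations, the number of nodes, and the node types of the target neural network to be generated are determined first, and the target connection mode is then generated in combination with the preset node connection requirements, so that the target neural network can finally be generated and multiple types of neural networks can be generated more flexibly and conveniently. Furthermore, once multiple types of neural networks have been generated, the performance of the accelerator can be tested better.
In some implementations of the first aspect, determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network includes: determining candidate parent nodes of a current node according to the node connection requirements, where the current node and the candidate parent nodes satisfy the node connection requirements; selecting the actual parent node of the current node from the candidate parent nodes; and determining the connection relationship between the current node and its actual parent node, so as to finally generate the target connection mode.
The candidate parent nodes may also be referred to as candidate nodes for the parent node.
In some implementations of the first aspect, determining candidate parent nodes of the current node according to the node connection requirements includes determining the candidate parent nodes of the current node according to at least one of the following connection relationships: when the node type of the current node is Concat or Eltwise, the current node has multiple parent nodes, and the number of parent nodes of the current node is less than or equal to the number of candidate parent nodes of the current node; when the node type of the parent node of the current node is Active, the node type of the current node is a type other than Active; when the node type of the parent node of the current node is Global Pooling, the node type of the current node is Global Pooling; when the node type of the parent node of the current node is FC, the node type of the current node is FC or Concat; when the node type of the parent node of the current node is Conv, Eltwise, Pooling, or Concat, the node type of the current node may be any one of Conv, Eltwise, Pooling, Active, Global Pooling, Concat, and FC.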
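The parent-type rules listed above can be sketched as a lookup table. This is only an illustration: the dictionary layout, the name `may_connect`, and the choice to read "a type other than Active" as the seven listed types minus Active are assumptions, and parent types the paragraph does not mention (such as Input) are simply absent from the table.

```python
# Child types allowed for each parent type, following the rules listed above.
# ANY_CHILD and may_connect are hypothetical names used for illustration.
ANY_CHILD = {"Conv", "Eltwise", "Pooling", "Active", "GlobalPooling", "Concat", "FC"}

ALLOWED_CHILDREN = {
    "Active": ANY_CHILD - {"Active"},   # assumption: "other than Active" within the listed types
    "GlobalPooling": {"GlobalPooling"},
    "FC": {"FC", "Concat"},
    "Conv": ANY_CHILD,
    "Eltwise": ANY_CHILD,
    "Pooling": ANY_CHILD,
    "Concat": ANY_CHILD,
}


def may_connect(parent_type: str, child_type: str) -> bool:
    """Return True if a node of child_type may take a node of parent_type as its parent."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())
```

A generator could call `may_connect` while filtering the nodes of earlier generations down to candidate parents.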
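In some implementations of the first aspect, selecting the actual parent node of the current node from the candidate parent nodes includes: determining, according to a probability density function, the probability that each of the candidate parent nodes is the actual parent node of the current node; and determining the actual parent node of the current node from the candidate parent nodes according to those probabilities.
In some implementations of the first aspect, determining the actual parent node of the current node from the candidate parent nodes according to the probability that each candidate parent node is the actual parent node of the current node includes: determining as the actual parent node of the current node each candidate parent node whose probability of being the actual parent node is greater than a preset probability value.
In some implementations of the first aspect, the method further includes: adjusting, according to the expectation and variance of the probability density function, the probability that each of the candidate parent nodes is the actual parent node of the current node.
By adjusting the expectation and variance of the probability density function, the width and depth of the target neural network can be adjusted, so that a target neural network whose depth and width meet the requirements can be generated.
Specifically, the expectation and variance of the probability density function can be adjusted according to the depth and width required of the target neural network to be generated.
Generally speaking, the larger the variance of the probability density function, the greater the probability that nodes in neighboring generations are selected, the narrower the network becomes, and the deeper it becomes.
In some implementations of the first aspect, the probability density function is a Gaussian function.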
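In some implementations of the first aspect, generating the target neural network according to the target connection mode includes: determining valid target connection relationships from the target connection relationships according to preset valid node connection relationships; and generating the target neural network according to the valid target connection relationships.
In some implementations of the first aspect, the valid node connection relationships include at least one of the following: when the node type of the current node is Eltwise, the multiple inputs of the current node have the same number of channels; when the node type of the current node is FC or GlobalPooling, only nodes of type FC, GlobalPooling, or act may be connected after the current node.
In some implementations of the first aspect, determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network, includes: determining them according to the operation requirements on the target neural network.
The operation requirement on the target neural network may be a requirement on the amount of computation. When the required amount of computation is small, the target neural network can be given fewer generations and fewer nodes per generation; when it is large, the target neural network can be given more generations and more nodes per generation.
The operation requirement on the target neural network may also be the complexity of the computation. When the complexity is low, the target neural network can be given fewer generations and fewer nodes per generation; when it is high, the target neural network can be given more generations and more nodes per generation.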
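In a second aspect, a method for generating a neural network is provided. The method includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; and generating the target neural network according to the target connection mode.
In this application, the number of generations, the number of nodes, and the node types of the target neural network to be generated are determined first, and the target connection mode is then generated in combination with the preset node connection requirements, so that the target neural network can finally be generated and multiple types of neural networks can be generated more flexibly and conveniently.
In a third aspect, a data processing method is provided. The method includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode; and using the target neural network for data processing.
In this application, the number of generations, the number of nodes, and the node types of the target neural network to be generated are determined first, and the target connection mode is then generated in combination with the preset node connection requirements, so that the target neural network can finally be generated and multiple types of neural networks can be generated more flexibly and conveniently; a specific neural network can then be used in a more targeted way to process the corresponding data.
It should be understood that, for the specific way the target neural network is generated in the second and third aspects and for the definitions and explanations of the related information, reference may be made to the related content of the first aspect above.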
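In a fourth aspect, a verification platform for an accelerator is provided. The verification platform includes: a memory for storing code; and at least one processor for executing the code stored in the memory to perform the following operations: generating at least one target neural network; translating the at least one target neural network into neural network instructions; inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions; and determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.
In a fifth aspect, an apparatus for generating a neural network is provided, including: a memory for storing code; and at least one processor for executing the code stored in the memory to perform the following operations: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of each generation; determining, according to preset node connection requirements, a target connection mode connecting the nodes in the target neural network; and generating the target neural network according to the target connection mode.
In a sixth aspect, a data processing apparatus is provided, including: a memory for storing code; and at least one processor for executing the code stored in the memory to perform the following operations: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode; and using the target neural network for data processing.
In a seventh aspect, a computer-readable storage medium is provided, storing instructions for executing any one of the methods of the first, second, and third aspects.
In an eighth aspect, a computer program product is provided, containing instructions for executing any one of the methods of the first, second, and third aspects.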
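Brief description of the drawings
Fig. 1 is a schematic diagram of a neural network structure;
Fig. 2 is a schematic flowchart of the accelerator detection method according to an embodiment of this application;
Fig. 3 is a schematic diagram of the neural network generation process according to an embodiment of this application;
Fig. 4 is a schematic diagram of the determined number of generations of the target neural network and the number and types of nodes in each generation;
Fig. 5 is a schematic diagram of one possible node connection relationship of a neural network;
Fig. 6 is a schematic diagram of one possible node connection relationship of a neural network;
Fig. 7 is a schematic diagram of one possible node connection relationship of a neural network;
Fig. 8 is a schematic diagram of one possible node connection relationship of a neural network;
Fig. 9 is a schematic diagram of the neural network generation process according to an embodiment of this application;
Fig. 10 is a schematic block diagram of the accelerator verification platform according to an embodiment of this application;
Fig. 11 is a schematic block diagram of the apparatus for generating a neural network according to an embodiment of this application;
Fig. 12 is a schematic block diagram of the data processing apparatus according to an embodiment of this application.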
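Detailed description
The embodiments of this application are described in detail below with reference to the drawings.
For a better understanding of the embodiments of this application, the structure of the neural network in these embodiments and the related information are first described with reference to Fig. 1.
Fig. 1 is a schematic diagram of a neural network structure.
It should be understood that the neural network in Fig. 1 may be a convolutional neural network or another type of neural network; this application does not limit this.
In Fig. 1, the structure of the neural network consists mainly of three parts: nodes, generations, and a tree.
In Fig. 1, the neural network includes nodes 1 to 9, which together make up generations 0 to 4. The nodes in each generation are as follows:
Generation 0: node 1;
Generation 1: node 2, node 3, node 4;
Generation 2: node 5, node 6;
Generation 3: node 7, node 8;
Generation 4: node 9.
As shown in Fig. 1, nodes in earlier generations can serve as parent nodes of nodes in later generations, and nodes in later generations can serve as child nodes of nodes in earlier generations. For example, the nodes of generations 1 to 4 can be child nodes of the generation 0 node, and the generation 1 nodes can be parent nodes of the nodes of generations 2 to 4.
As shown in Fig. 1, the nodes of generations 0 to 4 together form the tree of the neural network.
The information related to nodes, generations, and the tree is described in detail below.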
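Each node describes one computing layer (for example, a convolutional layer). The information contained in each node and its meaning are as follows:
node_header: the header information of the node;
The header information of the node includes sequence, gen_id, and node_id, where sequence is the overall sequence number of the node, gen_id is the generation index (the index of the generation the node belongs to), and node_id is the index of the node within its generation;
parent_num: the number of parent nodes (of this node); for nodes of type Concat/Eltwise, parent_num ≥ 2, and for nodes of other types, parent_num = 1;
parents[]: the parent nodes (of this node); the number of parent nodes equals parent_num;
node_t: the node type; for example, the node types here may include Input/Eltwise/Concat/Conv/Pool/Relu/Prelu/innerproduct/GlobalPooling;
node_name: the node name (of this node);
top: the node name of the top node (of this node), where the top node is a child node of this node;
bottom[]: the node names of the bottom nodes (of this node), where a bottom node is a parent node of this node and the number of bottom nodes is parent_num;
if_n/c/h/w[]: the batch size, number of channels, height, and width of each input (of this node), where the number of inputs of the node equals parent_num;
of_n/c/h/w: the batch size, number of channels, height, and width of the output of this node.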
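A generation organizes at least one node. If a generation contains multiple nodes, the nodes within the same generation cannot be connected to each other; a node in the current generation can only connect to nodes in generations whose gen_id is smaller than that of the current generation (that is, connections across generations are supported). The information contained in a generation and its meaning are as follows:
gen_id: the generation index;
node_num: the number of nodes in the generation; node_num is less than or equal to the maximum width of the neural network;
nodes: the instances of the nodes in the generation;
node_tq[]: the types of the nodes in the generation.
The tree organizes multiple generations and describes the connection relationships of all nodes in the network. The information contained in the tree and its meaning are as follows:
gen_num: the number of generations in the tree; gen_num is less than or equal to the maximum depth of the network;
gens[]: the instances of the generations in the tree; the number of entries in gens[] equals gen_num.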
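The node, generation, and tree records described above could be modelled as follows. This is a minimal sketch under stated assumptions: the class names and the use of Python dataclasses are illustrative, only a subset of the listed fields is included, and the counts (parent_num, node_num, gen_num) are derived from list lengths rather than stored.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeHeader:
    sequence: int  # overall sequence number of the node in the network
    gen_id: int    # index of the generation the node belongs to
    node_id: int   # index of the node within its generation


@dataclass
class Node:
    header: NodeHeader
    node_t: str                                   # node type, e.g. "Conv"
    node_name: str
    parents: List["Node"] = field(default_factory=list)

    @property
    def parent_num(self) -> int:
        return len(self.parents)


@dataclass
class Generation:
    gen_id: int
    nodes: List[Node] = field(default_factory=list)

    @property
    def node_num(self) -> int:
        return len(self.nodes)


@dataclass
class Tree:
    gens: List[Generation] = field(default_factory=list)

    @property
    def gen_num(self) -> int:
        return len(self.gens)
```

Deriving the counts from the lists keeps them from drifting out of sync with the actual contents, which is a design choice of this sketch rather than something the text requires.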
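It should be understood that the neural network structure described above with reference to Fig. 1 is only one possible structure of the neural network in the embodiments of this application. The neural network may also have other structures, and this application does not limit its specific structure or form.
Having briefly described one possible structure of the neural network with reference to Fig. 1, the accelerator detection method of the embodiments of this application is described in detail below with reference to Fig. 2.
Fig. 2 is a schematic flowchart of the accelerator detection method according to an embodiment of this application. The method shown in Fig. 2 can be executed by an electronic device or a server, where the electronic device may be a device that contains a processor, such as a mobile terminal (for example, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, or an Internet of Things device. The method shown in Fig. 2 includes steps 110 to 140, which are described in detail below.
110. Generate at least one target neural network.
Optionally, the at least one target neural network is multiple target neural networks.
The target neural network may be a convolutional neural network or another type of neural network, for example, a feedforward neural network or a recurrent neural network.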
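120. Translate the at least one target neural network into neural network instructions.
It should be understood that the purpose of step 120 is to load the at least one target neural network into the accelerator or the software model for execution. Before loading, the at least one target neural network generally needs to be translated into instructions that the accelerator or the software model can execute.
130. Input the neural network instructions into the accelerator and into the software model matching the accelerator for execution, and determine the difference between the output results of the neural network instructions.
It should be understood that the software model matching the accelerator may be a software model used as a reference for the accelerator's performance; the software model can simulate the computing behavior of the accelerator.
Suppose that inputting the neural network instructions into the accelerator yields a first output result and inputting them into the software model yields a second output result. Comparing the first output result with the second output result gives the difference between the output results of the neural network instructions.
140. Determine, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.
In step 140, when the output results differ, the accelerator instruction corresponding to that output result can be identified as an instruction that behaves abnormally while the accelerator runs. Identifying such instructions helps locate problems in the accelerator and further improve or correct its design, thereby improving its performance.
In this application, the at least one generated target neural network makes it possible to test the performance of the accelerator effectively.
Furthermore, when multiple target neural networks are generated, different neural networks can be used to test the performance of the accelerator, which allows its performance to be tested better.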
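The comparison in steps 130 and 140 can be sketched as below. The sketch assumes that both the accelerator and the software model expose one output vector per instruction, in the same order; that interface, the function name, and the tolerance parameter are assumptions for illustration and are not described in the text.

```python
def find_abnormal_instructions(accel_outputs, model_outputs, tol=0.0):
    """Return indices of instructions whose accelerator output differs from the model output.

    accel_outputs, model_outputs: sequences of per-instruction output vectors
    (hypothetical layout). tol is an assumed absolute tolerance.
    """
    abnormal = []
    for idx, (a, m) in enumerate(zip(accel_outputs, model_outputs)):
        # A length mismatch or any element outside the tolerance marks the instruction.
        if len(a) != len(m) or any(abs(x - y) > tol for x, y in zip(a, m)):
            abnormal.append(idx)
    return abnormal
```

With a bit-accurate software model, `tol` would stay at 0; a looser tolerance only makes sense if the model is known to differ in rounding.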
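There are many ways to generate at least one target neural network in step 110. The process is described in detail below with reference to Fig. 3.
Fig. 3 is a schematic diagram of the neural network generation process according to an embodiment of this application.
The process shown in Fig. 3 includes steps 210 to 230, which are described in detail below.
210. Determine the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network.
The target neural network determined in step 210 may be any one of the at least one target neural network in step 110.
Specifically, in step 210, the number of generations of the target neural network can first be determined randomly, and the node types and number of nodes of each generation can then be determined randomly.
For example, as shown in Fig. 1, the number of generations of the target neural network can be randomly determined to be 5 (the neural network in Fig. 1 has 5 generations).
In addition, in step 210, the number of generations of the target neural network can be determined within a certain range of values (for example, the depth range of the neural network). For example, the number of generations can be randomly determined to be 12 within the range [10, 20].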
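After the number of generations of the target neural network is determined, the node types of each generation can be chosen from all available node types, and the number of nodes in each generation can be chosen within a certain range of values (for example, the width range of the neural network).
It should be understood that in step 210 the number of generations of the target neural network, and the node types and number of nodes of each generation, can also be set according to specific (computing) requirements.
For example, if the neural network is to perform simple computations, the target neural network can be given fewer generations and fewer nodes per generation; if it is to perform very complex computations, it can be given more generations and more nodes per generation.
Optionally, the node types of each generation can be determined from the available node types Input/Eltwise/Concat/Conv/Pool/Relu/Prelu/Innerproduct/GlobalPooling.
For example, taking the neural network shown in Fig. 1, after the number of generations is randomly determined to be 5, the number of nodes in generations 0 to 4 can be randomly determined to be 1, 3, 2, 2, and 1 respectively.
When determining the node types and number of nodes of each generation, the node types can be determined first, the number of nodes can be determined first, or both can be determined at the same time (this application does not limit the order in which they are determined).
It should be understood that the number of nodes in each generation may be greater than or equal to the number of node types in that generation (the number of node types in a generation is less than or equal to its number of nodes).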
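Step 210 can be sketched as a small random generator. The depth range [10, 20] and the width limit of 10 are the example values mentioned in the text; placing a single Input node in generation 0, excluding Input from later generations, and the function name are assumptions of this sketch.

```python
import random

# Node types available to generations after the first (Input is reserved for generation 0 here).
NODE_TYPES = ["Eltwise", "Concat", "Conv", "Pool", "Relu", "Prelu",
              "Innerproduct", "GlobalPooling"]


def random_structure(depth_range=(10, 20), max_width=10, rng=None):
    """Return a list of generations, each a list of node type names."""
    rng = rng or random.Random()
    gen_num = rng.randint(*depth_range)        # number of generations
    gens = [["Input"]]                         # assumption: generation 0 is one Input node
    for _ in range(1, gen_num):
        node_num = rng.randint(1, max_width)   # number of nodes in this generation
        gens.append([rng.choice(NODE_TYPES) for _ in range(node_num)])
    return gens
```

Passing a seeded `random.Random` makes a generated structure reproducible, which helps when a failing network has to be regenerated for debugging.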
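The number of generations of the target neural network determined in step 210, and the node types and number of nodes of each generation, are described below with reference to the drawings.
For example, as shown in Fig. 4, the number of generations of the target neural network determined in step 210 is 4 (generations 0 to 3), and the number of nodes in generations 0 to 3 is as follows:
Generation 0 has 1 node;
Generation 1 has 3 nodes;
Generation 2 has 2 nodes;
Generation 3 has 1 node.
The node types in generations 0 to 3 are as follows:
The node type of generation 0 is Input;
The node types of generation 1 include FC, Eltwise, and GlobalPooling;
The node types of generation 2 are Concat and FC;
The node type of generation 3 is Eltwise.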
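220. Determine the connection relationships of the nodes in the target neural network according to the preset node connection requirements.
The node connection requirements may be rules that ensure the neural network can be used normally. They may be set in advance; specifically, they can be set based on experience and on the requirements of the neural network to be generated.
It should be understood that step 220 may yield multiple possible sets of connection relationships between the nodes of the target neural network. Once they have been obtained, one of them can be selected (arbitrarily) as the final connection relationship.
Optionally, the node connection requirements may include at least one of the following conditions:
(1) The node type of the first-generation node is the Input type;
(2) When the node type of the current node is Concat or Eltwise, the number of parent nodes of the current node is less than or equal to the number of candidate nodes for its parent;
(3) The connection between the current node and its parent node, in terms of node types, satisfies the relationship shown in Table 1.
Table 1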
Figure PCTCN2019083225-appb-000001
Figure PCTCN2019083225-appb-000002
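Table 1 shows the node types of the parent nodes that the current node can connect to for each node type of the current node, where Y means the connection is allowed and N means it is not.
It should be understood that step 220 can yield multiple node connection relationships. Before step 230 is executed, the validity of these node connection relationships can be checked, and step 230 is executed after the valid ones have been selected.
Specifically, when checking validity, it can be determined whether the node connection relationships satisfy conditions (4) and (5) below. Those that satisfy both conditions are determined to be valid, and step 230 is executed according to them.
(4) The multiple inputs of a node of type Eltwise must have the same number of channels;
(5) A node of type FC or GlobalPooling cannot be followed (whether by the node immediately after it or by nodes in later generations located after it) by nodes of any type other than FC, GlobalPooling, and act.
Specifically, the node immediately following a node of type FC or GlobalPooling, and the nodes in later generations located after it, can only be of type FC, GlobalPooling, or act.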
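For example, in the neural network structure shown in Fig. 5, the node type of node 6 is Eltwise and both of its inputs have 1 channel, so the inputs of node 6 satisfy condition (4). However, for node 11, which is also of type Eltwise, the input on the left has 2 channels and the input on the right has 1 channel; the two differ, so condition (4) is not satisfied.
Therefore, the connection relationship shown in Fig. 5 does not satisfy condition (4). When the node connection relationships determined in step 220 include the invalid connection relationship shown in Fig. 5, it must be excluded.
As another example, in the neural network shown in Fig. 6, the node type of node 1 is FC and the node type of node 2 is Relu. Because node 1 is of type FC, only nodes of type FC, GlobalPooling, or act can be connected after it, so the connection between node 1 and node 2 does not satisfy condition (5). In addition, the node type of node 3 is GlobalPooling and the node type of node 4 is Prelu; node 3 can only be connected to nodes of type FC, GlobalPooling, or act, so the connection between node 3 and node 4 does not satisfy condition (5).
Therefore, the connection relationship shown in Fig. 6 does not satisfy condition (5). When the node connection relationships determined in step 220 include the invalid connection relationship shown in Fig. 6, it must be excluded.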
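Conditions (4) and (5) can be sketched as two small predicates. The sketch is deliberately narrow: it checks only a direct parent-child edge for condition (5), not every descendant in later generations, and it uses the labels FC, GlobalPooling, and act literally as they appear in the text without deciding which concrete node types count as act. The function names are illustrative assumptions.

```python
# Types allowed after an FC or GlobalPooling node, using the labels from condition (5).
TAIL_TYPES = {"FC", "GlobalPooling", "act"}


def eltwise_inputs_ok(input_channels):
    """Condition (4): all inputs of an Eltwise node have the same channel count."""
    return len(set(input_channels)) <= 1


def edge_ok(parent_type, child_type, allowed_after_tail=TAIL_TYPES):
    """Condition (5), restricted to one direct edge (a simplification of this sketch)."""
    if parent_type in ("FC", "GlobalPooling"):
        return child_type in allowed_after_tail
    return True
```

For the Eltwise example above, `eltwise_inputs_ok([1, 1])` accepts node 6 and `eltwise_inputs_ok([2, 1])` rejects node 11.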
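In addition, in step 220, when determining the parent node of a node, there may be multiple candidate nodes. In that case, any node that satisfies conditions (1) to (5) above can serve as a candidate parent node of the current node (also called a candidate node for the parent node), but which of the candidate parent nodes are selected as the actual parent nodes of the current node can be determined by a probability density function.
Optionally, the probability density function may be a Gaussian function, because the Gaussian function as a whole meets the basic requirement that the closer a generation is, the higher its probability of being selected. Specifically, the expected value of the Gaussian function may be kept equal to the generation index minus 1; the expected value does not affect control over the shape of the network. By adjusting the variance of the Gaussian function, the shape of the function can be controlled, and with it the probability that nodes in each generation are selected. Generally speaking, the larger the variance of the Gaussian function, the greater the probability that nodes in neighboring generations are selected, the deeper the network becomes, and the narrower its width.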
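The Gaussian weighting described above can be sketched as follows. It assumes that each candidate is scored by an unnormalized Gaussian evaluated at the candidate's generation index, with the mean set to the current generation index minus 1, and that the scores are then normalized to sum to 1; the normalization, the function name, and the default standard deviation are assumptions of this sketch.

```python
import math


def parent_probabilities(current_gen, candidate_gens, sigma=1.0):
    """Weight each candidate's generation with a Gaussian centred on current_gen - 1."""
    mu = current_gen - 1  # expected value: generation index minus 1
    weights = [math.exp(-((g - mu) ** 2) / (2 * sigma ** 2)) for g in candidate_gens]
    total = sum(weights)
    return [w / total for w in weights]  # assumption: normalize so the weights sum to 1
```

Here `sigma` (the square root of the variance) sets how quickly the weight falls off as a candidate's generation gets further from the mean.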
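230. Generate the target neural network according to the target connection relationship.
In step 230, once the connection relationships of the nodes have been determined, the target neural network can be constructed according to them, or a prototxt file (which contains the connection relationship of each node in the target neural network) can be output, so that the prototxt file can later be input to the configuration tool and translated into neural network instructions for the accelerator to execute.
It should be understood that when the target neural network is generated according to the connection relationships, the intra-node parameters of each node also need to be determined, where the types, number, and values of a node's intra-node parameters depend on its node type.
For example, for a node of type Conv, of_h must satisfy formula (1).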
of_h=(if_h[0]+2×pad_h–(dilation_h×(kernel_h-1)+1))/stride_h+1 (1)
For a node of type Pool, of_h must satisfy formula (2).
of_h=(if_h[0]+2×pad_h–Pool_size)/stride_h+1 (2)
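In formulas (1) and (2), of_h is the height of the node's output feature map, if_h is the height of the node's input feature map, pad_h is the number of rows of elements padded onto the node's input feature map for ease of calculation (usually filled with 0), dilation_h is the number of elements interpolated into the node's input feature map (dilation_h is greater than 0; the interpolated value is usually 0), kernel_h is the size of the convolution kernel used in the convolution operation, stride_h is the sliding step of the convolution kernel or the pooling window in the height direction, and Pool_size is the size of the window used in pooling.
For a node of type Concat, of_c equals the sum of all if_c; for a node of type Eltwise, of_c should be the same as each if_c.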
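Formulas (1) and (2) and the channel rules for Concat and Eltwise can be written directly as functions. The sketch assumes integer division that rounds down; the formulas in the text do not state the rounding mode, so this is an assumption, as are the function names.

```python
def conv_out_h(if_h, pad_h, dilation_h, kernel_h, stride_h):
    """Formula (1): output height of a Conv node (floor division is an assumption)."""
    return (if_h + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1


def pool_out_h(if_h, pad_h, pool_size, stride_h):
    """Formula (2): output height of a Pool node (floor division is an assumption)."""
    return (if_h + 2 * pad_h - pool_size) // stride_h + 1


def concat_out_c(if_c):
    """Concat: output channels are the sum of all input channels."""
    return sum(if_c)


def eltwise_out_c(if_c):
    """Eltwise: output channels equal every input's channel count."""
    if len(set(if_c)) != 1:
        raise ValueError("Eltwise inputs must have the same number of channels")
    return if_c[0]
```

For instance, a 7×7 kernel with stride 2, padding 3, and dilation 1 on a 224-row input gives (224 + 6 − 7) // 2 + 1 = 112 output rows.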
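In addition, when determining the intra-node parameters of each node, condition A below must also be satisfied.
Condition A: the size of the feature map output by the parent node equals the size of the feature map input to the child node.
Because the feature map output by the parent node is the feature map input to the child node, the size of the parent node's output feature map must match the size of the child node's input feature map.
Generating the target neural network from the determined connection relationships in step 230 is described below with reference to Fig. 7 and Fig. 8.
For example, when the number of generations of the target neural network determined in step 210, and the node types and number of nodes of each generation, are as shown in Fig. 4, continuing on this basis with step 220 yields the node connection relationships shown in Fig. 7 and Fig. 8.
Next, the node connection relationships shown in Fig. 7 and Fig. 8 are analyzed against conditions (4) and (5). The analysis shows that both satisfy condition (4), but in Fig. 7 the connection between node 3 and node 6 does not satisfy condition (5). Fig. 8 satisfies condition (5) in addition to condition (4). Therefore, the node connection relationship shown in Fig. 8 can be determined to be valid, and in step 230 the neural network can be constructed according to the node connection relationship shown in Fig. 8.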
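For a better understanding of the flow of the neural network generation method of the embodiments of this application, the specific execution flow of the generation process is described in detail below with reference to Fig. 9.
Fig. 9 is a schematic diagram of the neural network generation process according to an embodiment of this application. The process shown in Fig. 9 can be executed by an electronic device (for the definition and explanation of the electronic device, see the related content of the method shown in Fig. 2). The process includes steps 1001 to 1011, which are described in detail below.
1001. Start.
Step 1001 marks the start of neural network generation.
1002. Randomly generate the number of generations of the neural network.
It should be understood that in step 1002 a value can be randomly selected within a certain range as the number of generations of the neural network.
1003. Randomly generate the number of nodes and the node types of each generation.
In step 1003, the number of nodes in each generation can be randomly generated within a certain network width range. For example, if the width of the neural network cannot exceed 10, a value between 1 and 10 can be chosen arbitrarily as the number of nodes in each generation.
When the node type of each node is randomly generated, it can be drawn from all available node types.
Steps 1002 and 1003 correspond to step 210 above, and the definitions and explanations given for step 210 apply equally to them. To avoid repetition, steps 1002 and 1003 are not described in detail here.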
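1004. Instantiate each node according to its node type.
Specifically, in step 1004, each node in each generation can be instantiated according to the node types and the number of nodes of each generation. That is, the node instances in each generation are determined from the node types and node counts of that generation, where one node can correspond to one instance or to multiple instances.
It should be understood that a node here is more of a logical concept, while a node instance is the entity the node actually relies on, on which the node's data processing tasks can be executed.
1005. Configure the header information and the number of parent nodes of each node.
Configuring the header information (node_header) of each node means generating the node's overall sequence number (sequence), its generation index (gen_id), and its node index within the generation (node_id).
For example, the generation index (gen_id) of each generation can be generated from top to bottom, the overall sequence number (sequence) of each node can be generated in order from generation 0 to generation N (N is the number of the last generation of the neural network), and within each generation the node index (node_id) of each node is generated in a certain order.
Here, sequence is the sequence number of a node within the whole neural network.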
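The header configuration in step 1005 can be sketched as a single pass over the generations. The zero-based numbering, the dictionary representation of a header, and the function name are assumptions of this sketch; the text only states the order in which the numbers are generated.

```python
def assign_headers(node_counts):
    """Assign sequence, gen_id and node_id given the number of nodes per generation.

    node_counts: list where entry i is the number of nodes in generation i.
    Numbering starts at 0 for all three fields (an assumption).
    """
    headers = []
    sequence = 0
    for gen_id, count in enumerate(node_counts):   # generation 0 to generation N
        for node_id in range(count):               # fixed order within a generation
            headers.append({"sequence": sequence, "gen_id": gen_id, "node_id": node_id})
            sequence += 1
    return headers
```

Called with the node counts of the Fig. 1 example, `assign_headers([1, 3, 2, 2, 1])` produces nine headers, one per node.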
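1006. Compute the candidate nodes for the parent node of each node.
Specifically, in step 1006, the candidate nodes for the parent node of the current node are computed so that the parent node can later be selected from them.
In step 1006, starting from the bottom layer, candidate parent nodes can be selected from earlier generations for each node in each layer, layer by layer.
It should be understood that the candidate parent nodes of the current node can come not only from the generation immediately before the current node but also from any generation before it.
When determining the candidate parent nodes for each node, they can be selected according to certain node connection requirements (which can be one or more of conditions (1) to (3) above): the nodes in the previous generation that satisfy the node connection requirements are taken as the candidate parent nodes of the current node.
For example, as shown in Fig. 4, nodes 5 and 6 in generation 2 can be selected as candidate parent nodes of node 7 in generation 3.
In addition, in step 1006, once the candidate parent nodes of a node have been determined, a probability density function can be used to calculate the probability that each candidate parent node is the parent node of the current node, and the nodes whose probability is greater than a certain value are taken as the actual parent nodes of the current node.
It should be understood that there may be multiple candidate parent nodes, and one or more parent nodes may be selected from them. The parent nodes selected from the candidate parent nodes are the actual parent nodes of the current node.
For example, suppose a node has 6 candidate parent nodes and the probability density function gives the probabilities of these 6 candidates being the actual parent node of the current node as 70%, 60%, 65%, 45%, 40%, and 30%. The candidates with probabilities of 70%, 60%, and 65% can then be determined as the actual parent nodes of the current node (one or more candidate parent nodes can be selected as actual parent nodes).
In the above example, it is also possible to take only the candidate parent node with the highest probability as the actual parent node of the current node (that is, the candidate with a probability of 70%).
The probability density function may specifically be a Gaussian function.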
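The two selection strategies in the example above (keep every candidate above a threshold, or keep only the most probable one) can be sketched as one function. The function name and the convention of passing `None` to request the single best candidate are assumptions; the text does not state the threshold value, so the 0.5 used in the usage note below is also an assumption.

```python
def pick_actual_parents(probabilities, threshold=None):
    """Return the indices of the candidates chosen as actual parent nodes.

    With a threshold, every candidate whose probability exceeds it is kept;
    with threshold=None, only the single most probable candidate is kept.
    """
    if threshold is None:
        best = max(range(len(probabilities)), key=probabilities.__getitem__)
        return [best]
    return [i for i, p in enumerate(probabilities) if p > threshold]
```

With the six probabilities from the example and an assumed threshold of 0.5, the first three candidates are selected; with no threshold, only the 70% candidate is.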
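1007. Randomly pick actual parent nodes of the current node to connect.
After the actual parent nodes of the current node have been selected from the candidate parent nodes in step 1006, if there are multiple actual parent nodes, parent nodes can be selected arbitrarily or randomly from them for connection.
1008. Determine whether the current connection is valid.
In step 1008, it is determined whether the connections that currently exist are valid. In practice, each connection relationship can be judged against conditions (4) and (5) above: a connection relationship that satisfies both conditions is valid, and one that fails either of them is invalid.
When the connection is determined to be valid, step 1009 is executed; when it is determined to be invalid, execution continues with step 1006.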
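1009. Connect the nodes.
In step 1009, the nodes can be connected according to the valid connection relationships determined in step 1008.
It should be understood that step 1009a can also be executed after step 1009.
1009a. Determine the intra-node parameters of each node.
The intra-node parameters of each node can be determined according to formula (1), formula (2), and the constraint of condition A above.
1010. Print the prototxt file.
The prototxt file contains the connection relationship of each node in the neural network to be generated. Once the prototxt file has been generated, the neural network can later be constructed or generated from it.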
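Writing the connection relationships out as text might look like the sketch below. The text only says that the prototxt file contains the connection relationship of each node; the Caffe-style `layer { name / type / bottom / top }` layout used here, the dictionary input format, and the function name are assumptions.

```python
def to_prototxt(nodes):
    """Render nodes as Caffe-style layer blocks (layout is an assumption).

    nodes: list of dicts with 'name', 'type' and 'bottom' (list of parent node names).
    """
    blocks = []
    for n in nodes:
        lines = ["layer {", f'  name: "{n["name"]}"', f'  type: "{n["type"]}"']
        lines += [f'  bottom: "{b}"' for b in n["bottom"]]  # one line per parent node
        lines.append(f'  top: "{n["name"]}"')               # output blob named after the node
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n".join(blocks)
```

Intra-node parameters such as kernel size or stride are left out of this sketch; a real file would need them for the configuration tool to translate the network.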
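1011. End.
Step 1011 marks the end of the neural network generation process.
The accelerator detection method of the embodiments of this application has been described in detail above with reference to Fig. 1 to Fig. 9.
In fact, this application can also protect a method for generating a neural network, which specifically includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; and generating the target neural network according to the target connection mode.
In this application, the number of generations, the number of nodes, and the node types of the target neural network to be generated are determined first, and the target connection mode is then generated in combination with the preset node connection requirements, so that the target neural network can finally be generated and multiple types of neural networks can be generated more flexibly and conveniently.
The generated target neural network can be used to process data, so this application can also protect a data processing method, which includes: determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network; determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network; generating the target neural network according to the target connection mode; and using the target neural network for data processing.
In this application, the number of generations, the number of nodes, and the node types of the target neural network to be generated are determined first, and the target connection mode is then generated in combination with the preset node connection requirements, so that the target neural network can finally be generated and multiple types of neural networks can be generated more flexibly and conveniently; a specific neural network can then be used in a more targeted way to process the corresponding data.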
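Optionally, using the target neural network for data processing includes: obtaining input data; and processing the input data with the target neural network to obtain output data.
The input data may be data that needs to be processed by a neural network; further, it may be data in the field of artificial intelligence that needs to be processed by a neural network.
For example, the input data may be image data to be processed, and the output data may be a classification result or recognition result for the image. As another example, the input data may be speech data to be recognized, and the output result may be a speech recognition result.
It should be understood that, for the specific way the neural network is generated in the above generation method and data processing method and for the definitions and explanations of the related information, reference may be made to the related content on the neural network generation process above (for example, the content related to Fig. 2).
The accelerator verification platform of the embodiments of this application is described below with reference to Fig. 10. It should be understood that the verification platform shown in Fig. 10 can execute each step of the accelerator detection method of the embodiments of this application; repeated descriptions are omitted where appropriate below.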
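Fig. 10 is a schematic block diagram of the accelerator verification platform according to an embodiment of this application. The accelerator verification platform 2000 shown in Fig. 10 includes:
a memory 2001 for storing code;
at least one processor 2002 for executing the code stored in the memory to perform the following operations:
generating at least one target neural network;
translating the at least one target neural network into neural network instructions;
inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions;
determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.
It should be understood that Fig. 10 shows only one processor 2002 for ease of illustration; in fact, the verification platform 2000 shown in Fig. 10 may include one or more processors 2002.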
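Fig. 11 is a schematic block diagram of the apparatus for generating a neural network according to an embodiment of this application. It should be understood that the apparatus 3000 shown in Fig. 11 can execute each step of the method for generating a neural network of the embodiments of this application. The apparatus 3000 shown in Fig. 11 includes:
a memory 3001 for storing code;
at least one processor 3002 for executing the code stored in the memory to perform the following operations:
generating at least one target neural network;
translating the at least one target neural network into neural network instructions;
inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions;
determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.
It should be understood that Fig. 11 shows only one processor 3002 for ease of illustration; in fact, the apparatus 3000 shown in Fig. 11 may include one or more processors 3002.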
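Fig. 12 is a schematic block diagram of the data processing apparatus according to an embodiment of this application. It should be understood that the apparatus 4000 shown in Fig. 12 can execute each step of the data processing method of the embodiments of this application. The apparatus 4000 shown in Fig. 12 includes:
a memory 4001 for storing code;
at least one processor 4002 for executing the code stored in the memory to perform the following operations:
determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network;
determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network;
generating the target neural network according to the target connection mode;
using the target neural network for data processing.
It should be understood that Fig. 12 shows only one processor 4002 for ease of illustration; in fact, the apparatus 4000 shown in Fig. 12 may include one or more processors 4002.
The accelerator verification platform 2000, the apparatus 3000, and the apparatus 4000 described above may specifically be an electronic device or a server, where the electronic device may be a device that contains a processor, such as a mobile terminal (for example, a smart phone), a computer, a personal digital assistant, a wearable device, a vehicle-mounted device, or an Internet of Things device.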
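The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).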
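A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division of the units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The above are only specific embodiments of this application, but the scope of protection of this application is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in this application shall be covered by the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.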

Claims (44)

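  1. A detection method for an accelerator, characterized by comprising:
    generating at least one target neural network;
    translating the at least one target neural network into neural network instructions;
    inputting the neural network instructions into the accelerator and into a software model matching the accelerator for execution, and determining the difference between the output results of the neural network instructions;
    determining, according to the difference between the output results of the neural network instructions, the instructions that behave abnormally while the accelerator runs.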
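  2. The method according to claim 1, characterized in that generating at least one target neural network comprises:
    determining the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network, wherein the target neural network is any one of the at least one target neural network;
    determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network;
    generating the target neural network according to the target connection mode.
  3. The method according to claim 2, characterized in that determining, according to preset node connection requirements, a target connection mode connecting all nodes in the target neural network comprises:
    determining candidate parent nodes of the current node according to the node connection requirements, wherein the current node and the candidate parent nodes satisfy the node connection requirements;
    selecting the actual parent node of the current node from the candidate parent nodes;
    determining the connection relationship between the current node and the actual parent node of the current node, so as to finally generate the target connection mode.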
  4. 如权利要求3所述的方法,其特征在于,所述根据所述节点连接要求确定所述当前节点的候选父节点,包括:
    根据以下连接关系中的至少一种,确定所述当前节点的候选父节点;
    在当前节点的节点类型为Concat或Eltwise时,所述当前节点的父节点个数为多个,且所述当前节点的父节点个数小于或者等于所述当前节点的候选父节点个数;
    在所述当前节点的父节点的节点类型为Active时,所述当前节点的节点类型为Active之外的类型;
    在所述当前节点的父节点的节点类型为Global Pooling时,所述当前节点的节点类型为Global Pooling;
    在所述当前节点的父节点的节点类型为FC时,所述当前节点的节点类型为FC或者Concat;
    在所述当前节点的父节点的节点类型为Conv、Eltwise、Pooling以及Concat时,所述当前节点的节点类型可以为Conv、Eltwise、Pooling、Active、Global Pooling、Concat以及FC中的任意一种。
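  5. The method according to claim 3 or 4, wherein selecting the actual parent node of the current node from the candidate parent nodes comprises:
    determining, according to a probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node;
    determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node.
  6. The method according to claim 5, wherein determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node comprises:
    determining, as the actual parent node of the current node, each node among the candidate parent nodes whose probability of being an actual parent node of the current node is greater than a preset probability value.
  7. The method according to claim 5 or 6, wherein the method further comprises:
    adjusting, according to the expectation and variance of the probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node.
  8. The method according to any one of claims 5-7, wherein the probability density function is a Gaussian function.
  9. The method according to any one of claims 1-8, wherein generating the target neural network according to the target connection manner comprises:
    determining a valid target connection relationship from the target connection relationship according to a preset valid node connection relationship;
    generating the target neural network according to the valid target connection relationship.
  10. The method according to claim 9, wherein the valid node connection relationship comprises at least one of the following relationships:
    when the node type of the current node is Eltwise, the multiple inputs of the current node have the same number of channels;
    when the node type of the current node is FC or GlobalPooling, no node of a type other than FC, GlobalPooling, and act can be connected after the current node.
  11. The method according to any one of claims 1-10, wherein determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network, comprises:
    determining, according to the operation requirements on the target neural network, the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network.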
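  12. A method for generating a neural network, characterized by comprising:
    determining the number of generations of a target neural network to be generated, and the node types and number of nodes of all generations of the target neural network;
    determining, according to a preset node connection requirement, a target connection manner for connecting all nodes in the target neural network;
    generating the target neural network according to the target connection manner.
  13. The method according to claim 12, wherein determining, according to the preset node connection requirement, the target connection manner for connecting all nodes in the target neural network comprises:
    determining candidate parent nodes of a current node according to the node connection requirement, wherein the current node and the candidate parent nodes satisfy the node connection requirement;
    selecting an actual parent node of the current node from the candidate parent nodes;
    determining the connection relationship between the current node and the actual parent node of the current node, so as to finally generate the target connection manner.
  14. The method according to claim 13, wherein determining the candidate parent nodes of the current node according to the node connection requirement comprises:
    determining the candidate parent nodes of the current node according to at least one of the following connection relationships:
    when the node type of the current node is Concat or Eltwise, the current node has multiple parent nodes, and the number of parent nodes of the current node is less than or equal to the number of candidate parent nodes of the current node;
    when the node type of a parent node of the current node is Active, the node type of the current node is a type other than Active;
    when the node type of a parent node of the current node is Global Pooling, the node type of the current node is Global Pooling;
    when the node type of a parent node of the current node is FC, the node type of the current node is FC or Concat;
    when the node type of a parent node of the current node is Conv, Eltwise, Pooling, or Concat, the node type of the current node may be any one of Conv, Eltwise, Pooling, Active, Global Pooling, Concat, and FC.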
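  15. The method according to claim 13 or 14, wherein selecting the actual parent node of the current node from the candidate parent nodes comprises:
    determining, according to a probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node;
    determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node.
  16. The method according to claim 15, wherein determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node comprises:
    determining, as the actual parent node of the current node, each node among the candidate parent nodes whose probability of being an actual parent node of the current node is greater than a preset probability value.
  17. The method according to claim 15 or 16, wherein the method further comprises:
    adjusting, according to the expectation and variance of the probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node.
  18. The method according to any one of claims 15-17, wherein the probability density function is a Gaussian function.
  19. The method according to any one of claims 12-18, wherein generating the target neural network according to the target connection manner comprises:
    determining a valid target connection relationship from the target connection relationship according to a preset valid node connection relationship;
    generating the target neural network according to the valid target connection relationship.
  20. The method according to claim 19, wherein the valid node connection relationship comprises at least one of the following relationships:
    when the node type of the current node is Eltwise, the multiple inputs of the current node have the same number of channels;
    when the node type of the current node is FC or GlobalPooling, no node of a type other than FC, GlobalPooling, and act can be connected after the current node.
  21. The method according to any one of claims 12-20, wherein determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network, comprises:
    determining, according to the operation requirements on the target neural network, the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network.
  22. A data processing method, characterized by comprising:
    determining the number of generations of a target neural network to be generated, and the node types and number of nodes of all generations of the target neural network;
    determining, according to a preset node connection requirement, a target connection manner for connecting all nodes in the target neural network;
    generating the target neural network according to the target connection manner;
    performing data processing using the target neural network.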
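  23. A verification platform for an accelerator, characterized by comprising:
    a memory configured to store code;
    at least one processor configured to execute the code stored in the memory so as to perform the following operations:
    generating at least one target neural network;
    translating the at least one target neural network into neural network instructions;
    inputting the neural network instructions into an accelerator and into a software model matching the accelerator, respectively, for execution, and determining the difference between the output results of the neural network instructions;
    determining, according to the difference between the output results of the neural network instructions, the instruction at which an abnormality occurs during operation of the accelerator.
  24. The verification platform according to claim 23, wherein generating the at least one target neural network comprises:
    determining the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network, wherein the target neural network is any one of the at least one target neural network;
    determining, according to a preset node connection requirement, a target connection manner for connecting all nodes in the target neural network;
    generating the target neural network according to the target connection manner.
  25. The verification platform according to claim 24, wherein determining, according to the preset node connection requirement, the target connection manner for connecting all nodes in the target neural network comprises:
    determining candidate parent nodes of a current node according to the node connection requirement, wherein the current node and the candidate parent nodes satisfy the node connection requirement;
    selecting an actual parent node of the current node from the candidate parent nodes;
    determining the connection relationship between the current node and the actual parent node of the current node, so as to finally generate the target connection manner.
  26. The verification platform according to claim 25, wherein determining the candidate parent nodes of the current node according to the node connection requirement comprises:
    determining the candidate parent nodes of the current node according to at least one of the following connection relationships:
    when the node type of the current node is Concat or Eltwise, the current node has multiple parent nodes, and the number of parent nodes of the current node is less than or equal to the number of candidate parent nodes of the current node;
    when the node type of a parent node of the current node is Active, the node type of the current node is a type other than Active;
    when the node type of a parent node of the current node is Global Pooling, the node type of the current node is Global Pooling;
    when the node type of a parent node of the current node is FC, the node type of the current node is FC or Concat;
    when the node type of a parent node of the current node is Conv, Eltwise, Pooling, or Concat, the node type of the current node may be any one of Conv, Eltwise, Pooling, Active, Global Pooling, Concat, and FC.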
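  27. The verification platform according to claim 25 or 26, wherein selecting the actual parent node of the current node from the candidate parent nodes comprises:
    determining, according to a probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node;
    determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node.
  28. The verification platform according to claim 27, wherein determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node comprises:
    determining, as the actual parent node of the current node, each node among the candidate parent nodes whose probability of being an actual parent node of the current node is greater than a preset probability value.
  29. The verification platform according to claim 27 or 28, wherein the operations further comprise:
    adjusting, according to the expectation and variance of the probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node.
  30. The verification platform according to any one of claims 27-29, wherein the probability density function is a Gaussian function.
  31. The verification platform according to any one of claims 23-30, wherein generating the target neural network according to the target connection manner comprises:
    determining a valid target connection relationship from the target connection relationship according to a preset valid node connection relationship;
    generating the target neural network according to the valid target connection relationship.
  32. The verification platform according to claim 31, wherein the valid node connection relationship comprises at least one of the following relationships:
    when the node type of the current node is Eltwise, the multiple inputs of the current node have the same number of channels;
    when the node type of the current node is FC or GlobalPooling, no node of a type other than FC, GlobalPooling, and act can be connected after the current node.
  33. The verification platform according to any one of claims 23-32, wherein determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network, comprises:
    determining, according to the operation requirements on the target neural network, the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network.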
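  34. An apparatus for generating a neural network, characterized by comprising:
    a memory configured to store code;
    at least one processor configured to execute the code stored in the memory so as to perform the following operations:
    determining the number of generations of a target neural network to be generated, and the node types and number of nodes of all generations of the target neural network;
    determining, according to a preset node connection requirement, a target connection manner for connecting all nodes in the target neural network;
    generating the target neural network according to the target connection manner.
  35. The apparatus according to claim 34, wherein determining, according to the preset node connection requirement, the target connection manner for connecting all nodes in the target neural network comprises:
    determining candidate parent nodes of a current node according to the node connection requirement, wherein the current node and the candidate parent nodes satisfy the node connection requirement;
    selecting an actual parent node of the current node from the candidate parent nodes;
    determining the connection relationship between the current node and the actual parent node of the current node, so as to finally generate the target connection manner.
  36. The apparatus according to claim 35, wherein determining the candidate parent nodes of the current node according to the node connection requirement comprises:
    determining the candidate parent nodes of the current node according to at least one of the following connection relationships:
    when the node type of the current node is Concat or Eltwise, the current node has multiple parent nodes, and the number of parent nodes of the current node is less than or equal to the number of candidate parent nodes of the current node;
    when the node type of a parent node of the current node is Active, the node type of the current node is a type other than Active;
    when the node type of a parent node of the current node is Global Pooling, the node type of the current node is Global Pooling;
    when the node type of a parent node of the current node is FC, the node type of the current node is FC or Concat;
    when the node type of a parent node of the current node is Conv, Eltwise, Pooling, or Concat, the node type of the current node may be any one of Conv, Eltwise, Pooling, Active, Global Pooling, Concat, and FC.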
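  37. The apparatus according to claim 35 or 36, wherein selecting the actual parent node of the current node from the candidate parent nodes comprises:
    determining, according to a probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node;
    determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node.
  38. The apparatus according to claim 37, wherein determining the actual parent node of the current node from the candidate parent nodes according to the probability of each of the candidate parent nodes being an actual parent node of the current node comprises:
    determining, as the actual parent node of the current node, each node among the candidate parent nodes whose probability of being an actual parent node of the current node is greater than a preset probability value.
  39. The apparatus according to claim 37 or 38, wherein the operations further comprise:
    adjusting, according to the expectation and variance of the probability density function, the probability of each of the candidate parent nodes being an actual parent node of the current node.
  40. The apparatus according to any one of claims 37-39, wherein the probability density function is a Gaussian function.
  41. The apparatus according to any one of claims 34-40, wherein generating the target neural network according to the target connection manner comprises:
    determining a valid target connection relationship from the target connection relationship according to a preset valid node connection relationship;
    generating the target neural network according to the valid target connection relationship.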
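  42. The apparatus according to claim 41, wherein the valid node connection relationship comprises at least one of the following relationships:
    when the node type of the current node is Eltwise, the multiple inputs of the current node have the same number of channels;
    when the node type of the current node is FC or GlobalPooling, no node of a type other than FC, GlobalPooling, and act can be connected after the current node.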
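  43. The apparatus according to any one of claims 34-42, wherein determining the number of generations of the target neural network to be generated, and the node types and number of nodes of all generations of the target neural network, comprises:
    determining, according to the operation requirements on the target neural network, the number of generations of the target neural network, and the node types and number of nodes of all generations of the target neural network.
  44. A data processing apparatus, characterized by comprising:
    a memory configured to store code;
    at least one processor configured to execute the code stored in the memory so as to perform the following operations:
    determining the number of generations of a target neural network to be generated, and the node types and number of nodes of all generations of the target neural network;
    determining, according to a preset node connection requirement, a target connection manner for connecting all nodes in the target neural network;
    generating the target neural network according to the target connection manner;
    performing data processing using the target neural network.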
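
The comparison step recited in the accelerator detection claims (running the same instructions on the accelerator and on a matching software model, then locating the instruction whose outputs differ) can be sketched as follows. This is a hedged illustration only: the tuple instruction format, the two toy back ends, and the `tolerance` parameter are hypothetical stand-ins, not anything specified in the application.

```python
def find_abnormal_instructions(instructions, run_on_accelerator, run_on_model,
                               tolerance=0.0):
    """Run each instruction on both back ends and return those whose outputs differ.

    Each back end is a callable that takes one instruction and returns a list
    of numbers. The result is a list of (index, instruction) pairs.
    """
    abnormal = []
    for index, instruction in enumerate(instructions):
        hw_out = run_on_accelerator(instruction)
        sw_out = run_on_model(instruction)
        mismatch = len(hw_out) != len(sw_out) or any(
            abs(a - b) > tolerance for a, b in zip(hw_out, sw_out)
        )
        if mismatch:
            abnormal.append((index, instruction))
    return abnormal


# Toy stand-ins for the two execution targets (hypothetical).
def software_model(instruction):
    op, a, b = instruction
    return [a + b] if op == "add" else [a * b]


def faulty_accelerator(instruction):
    op, a, b = instruction
    return [a + b] if op == "add" else [a * b + 1]  # deliberately wrong for "mul"


program = [("add", 1, 2), ("mul", 3, 4), ("add", 5, 6)]
print(find_abnormal_instructions(program, faulty_accelerator, software_model))
# prints [(1, ('mul', 3, 4))]
```

Comparing per instruction rather than only at the end of the network is what lets the check point at the failing instruction instead of merely reporting that the final outputs disagree.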

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/083225 WO2020211037A1 (zh) 2019-04-18 2019-04-18 Accelerator detection method and verification platform
CN201980009150.XA CN111656370A (zh) 2019-04-18 2019-04-18 Accelerator detection method and verification platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/083225 WO2020211037A1 (zh) 2019-04-18 2019-04-18 Accelerator detection method and verification platform

Publications (1)

Publication Number Publication Date
WO2020211037A1 true WO2020211037A1 (zh) 2020-10-22

Family

ID=72348949

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/083225 WO2020211037A1 (zh) 2019-04-18 2019-04-18 Accelerator detection method and verification platform

Country Status (2)

Country Link
CN (1) CN111656370A (zh)
WO (1) WO2020211037A1 (zh)

Also Published As

Publication number Publication date
CN111656370A (zh) 2020-09-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19925087

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19925087

Country of ref document: EP

Kind code of ref document: A1