CN112346916A - Test method, test device and related product - Google Patents


Info

Publication number
CN112346916A
CN112346916A (application CN201910735622.1A)
Authority
CN
China
Prior art keywords
result
graph
artificial intelligence
calculation
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910735622.1A
Other languages
Chinese (zh)
Inventor
Inventor not announced
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201910735622.1A
Publication of CN112346916A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2263Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • G06F8/427Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure relates to a test method, a test apparatus and a related product. The product comprises a controller unit that includes an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is configured to store calculation instructions associated with an artificial neural network operation; the instruction processing unit is configured to parse a calculation instruction into a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue containing the operation instructions or calculation instructions to be executed in queue order. In this way, the related product can improve the operation efficiency of neural network model computations.

Description

Test method, test device and related product
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a test method, an apparatus, and a related product.
Background
In the technical field of artificial intelligence, neural network algorithms have become very popular machine learning algorithms in recent years, performing well in fields such as image recognition, speech recognition, and natural language processing. As neural network algorithms have developed, their complexity has kept increasing, and the scale of the models has grown steadily in pursuit of better recognition accuracy.
Disclosure of Invention
According to a first aspect of the present disclosure, there is provided a testing method, the method comprising:
parsing a test file of the neural network to obtain a first calculation graph;
obtaining a first operation result through operation by a general-purpose processor according to the first calculation graph;
obtaining a second operation result through operation by an artificial intelligence processor according to the first calculation graph;
and obtaining a test result of the neural network according to the first operation result and the second operation result.
According to a second aspect of the present disclosure, there is provided a test apparatus comprising:
the analysis unit is used for analyzing the test file of the neural network to obtain a first calculation graph;
the first operation unit is used for obtaining a first operation result through the operation of a general processor according to the first calculation graph;
the second operation unit is used for obtaining a second operation result through the operation of the artificial intelligence processor according to the first calculation graph;
and the test result generating unit is used for obtaining the test result of the neural network according to the first operation result and the second operation result.
According to a third aspect of the present disclosure, there is provided a test apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of the first aspect described above.
According to the test method, apparatus and related products of the embodiments of the present disclosure, the basic programming library of any neural network under test can be tested by comparing the deviation between operation results on a general-purpose processor and on an artificial intelligence processor. The test can thus be completed independently of any neural network framework, improving developer efficiency.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of an application example according to the present disclosure.
FIG. 2 shows a flow diagram of a testing method according to an embodiment of the present disclosure.
FIG. 3 shows a flow diagram of a testing method according to an embodiment of the present disclosure.
FIG. 4 shows a flow diagram of a testing method according to an embodiment of the present disclosure.
FIG. 5 shows a flow diagram of a testing method according to an embodiment of the present disclosure.
FIG. 6 shows a flow diagram of a testing method according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of a test apparatus according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of a test apparatus according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
With the emergence of big data and the improvement of computing power, neural networks have once again attracted wide attention. In recent years, deep learning technologies based on deep neural networks have advanced rapidly, producing various network structures such as Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Generative Adversarial Networks (GAN), and artificial intelligence has reached or even surpassed human-level performance in many fields. With the continuous development of neural networks, many deep learning frameworks such as PyTorch, TensorFlow, and PaddlePaddle have been developed, making artificial intelligence research more accessible.
At present, the basic library of a deep learning framework generally supports mainstream networks such as convolutional neural networks and recurrent neural networks. However, for an arbitrary network constructed by an upper-layer framework, when a problem is encountered, the lower-layer basic library can hardly reproduce the environment on its own and can only rely on the network model and reproduction environment provided by the upper-layer framework, which undoubtedly increases the burden on developers to some extent. Therefore, when the basic library is currently used to test an artificial intelligence processor, the test of the basic library is difficult to separate completely from the upper-layer framework, which reduces the generality and application range of the test.
To improve the generality of basic-library testing, FIG. 1 shows a schematic diagram of an application example according to the present disclosure. As shown in this application example, the basic library can be tested using a json file that describes a neural network operation supported by the basic library. The test process may be as follows: the json file to be tested is run separately on a general-purpose processor and on an artificial intelligence processor, the two operation results are compared, and the comparison result is taken as the test result.
Since the json file needs to be run on both the general-purpose processor and the artificial intelligence processor, the test process of this application example can be divided into three phases according to the operation requirements: build time, compile time, and runtime.
The build-time process may be as follows: the json file used for testing is parsed into a computation graph, the operation nodes contained in the computation graph are topologically sorted, and in subsequent phases the operation nodes are executed in the order given by the topological sort. The json file can be generated automatically by the basic library through a predetermined program. In this application example, the generated json file can be represented as a json array composed of nodes, where each node is a json object. A node may contain four <key, value> pairs, whose four keys are denoted name, op, inputs and attrs, representing respectively the node's name, operation, input nodes and other attributes. The value types corresponding to the four keys may be, respectively, a character string, a character string, a character-string array, and a json array composed of json objects.
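A minimal illustration of the node format described above. The concrete operation names ("input", "conv", "relu") and the attribute keys are hypothetical examples, not values specified by the patent:

```python
import json

# A hypothetical test file: a json array of nodes, each a json object with
# the four <key, value> pairs name, op, inputs and attrs described above.
test_file = """
[
  {"name": "data",  "op": "input", "inputs": [], "attrs": []},
  {"name": "conv1", "op": "conv",  "inputs": ["data"],
   "attrs": [{"kernel_size": "3"}, {"stride": "1"}]},
  {"name": "relu1", "op": "relu",  "inputs": ["conv1"], "attrs": []}
]
"""

nodes = json.loads(test_file)
for node in nodes:
    # name and op are strings, inputs is a string array, attrs a json array
    print(node["name"], node["op"], node["inputs"])
```

Because the file is plain json, such inputs can be generated automatically by a program, which is what makes the test process easy to automate.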
After the computation graph is obtained by parsing the json file, operations can be performed based on it, entering the compile-time phase. If the computation graph is to be run on an artificial intelligence processor, the compilation approach can be determined flexibly according to the actual structure of the computation graph.
In this application example, the compile-time process on the artificial intelligence processor may be as follows. First, according to the structure of the computation graph, determine whether to compile it layer by layer or as a fused whole. For layer-by-layer compilation, the graph is traversed layer by layer following its topological sort; according to the traversal, each operation node in the computation graph is mapped to an operator, and the compile function of each operator is called in turn to obtain the compilation result. For fusion compilation, the topologically sorted nodes are added directly into a fusion operator, input and output data are attached to the fusion operator, and its compile function is called to obtain the compilation result.
After the compilation result is obtained, the runtime phase begins. In this application example, the runtime can proceed either on a general-purpose processor or on an artificial intelligence processor. On the general-purpose processor, the runtime process may be: map each operation node in the computation graph to an operator, call the compute function of each operator in turn, and calculate directly, completing the runtime process. On the artificial intelligence processor, the runtime process may be: first determine from the compilation phase whether to run in layer-by-layer mode or fusion mode. In layer-by-layer mode, the compute function of each operator is called layer by layer, the instruction data is copied to the artificial intelligence processor, and a task stream is created for asynchronous computation to obtain the calculation result. In fusion mode, the compute function of the single fusion operator is called, the instruction data is copied to the artificial intelligence processor, and a task stream is created for asynchronous computation to obtain the calculation result.
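The distinction between the two runtime modes can be sketched as follows. The operator objects are hypothetical stand-ins; a real basic library would additionally copy instruction data to the artificial intelligence processor and run the computation asynchronously on a task stream:

```python
def run_layer_by_layer(sorted_ops, inputs):
    """Call each operator's compute function, one call per operator,
    following the topological order."""
    value = inputs
    for op in sorted_ops:
        value = op(value)
    return value

def run_fused(sorted_ops, inputs):
    """Wrap the whole topological order inside a single fusion operator,
    so only one compute call crosses the host/device boundary."""
    def fused(value):
        for op in sorted_ops:
            value = op(value)
        return value
    return fused(inputs)

# Two stand-in operators; both modes must produce the same result.
ops = [lambda x: x + 1, lambda x: x * 2]
assert run_layer_by_layer(ops, 3) == run_fused(ops, 3) == 8
```

The two modes are numerically equivalent; fusion mainly reduces per-operator dispatch overhead.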
Through the above process, the operation results of running the json file on the general-purpose processor and on the artificial intelligence processor are obtained. The two results can then be compared, and the comparison result is taken as the test result. In this application example, the test result may be obtained by computing an error rate s based on the mean square error (MSE) between the two results; the specific calculation of s may be:
[Two equations, rendered as images in the original publication, define the mean square error MSE of the two operation results and the error rate s computed from it.]
Through the above calculation, the test result s of the neural network is obtained. When the value of s does not exceed a user-defined threshold, the test is considered to pass, that is, the basic library under test passes the current test; when s exceeds the threshold, the test is considered to fail and the basic library can be debugged accordingly. Since the threshold against which s is compared is user-defined, its specific value is not limited in this application example. In this technical scheme, the test result is obtained by comparing the operation results of the test file on the general-purpose processor and on the artificial intelligence processor, so the test process does not depend on the network model and reproduction environment provided by the basic library's original upper-layer framework. This greatly improves the flexibility and generality of the test; and because the test no longer depends on the upper-layer framework and reproduction environment, the structure of the neural network can be defined freely during testing, greatly improving the robustness of basic-library testing. Meanwhile, since the test process is based on a json file that can be generated automatically by a program, the automation and generality of the test process can be further improved.
FIG. 2 shows a flow diagram of a testing method according to an embodiment of the present disclosure. The testing method may be performed by a terminal device or other processing device, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the test method may be implemented by a processor calling computer readable instructions stored in a memory. In one example, the testing method may be implemented by a general purpose processor and an artificial intelligence processor. As shown, the method may include:
and step S11, analyzing the test file of the neural network to obtain a first calculation chart.
And step S12, obtaining a first operation result through the operation of the general-purpose processor according to the first calculation chart.
And step S13, obtaining a second operation result through the operation of the artificial intelligence processor according to the first calculation chart.
And step S14, obtaining a test result of the neural network according to the first operation result and the second operation result.
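Steps S11 to S14 can be sketched as a single test harness. The parser, the two processor back-ends and the maximum-deviation comparison below are all hypothetical stand-ins, not the patent's implementation:

```python
def run_test(test_file, run_on_general, run_on_ai, threshold=1e-3):
    graph = parse(test_file)           # step S11: first calculation graph
    first = run_on_general(graph)      # step S12: first operation result
    second = run_on_ai(graph)          # step S13: second operation result
    deviation = max(abs(a - b) for a, b in zip(first, second))
    return deviation <= threshold      # step S14: test result

# Stand-in parser and back-ends: the "graph" is just a list of numbers,
# and both "processors" evaluate it identically.
parse = lambda f: [float(x) for x in f.split()]
identity = lambda g: g
print(run_test("1.0 2.0 3.0", identity, identity))  # → True
```

The key property the harness captures is that the two back-ends are interchangeable callables, so the test does not depend on any particular upper-layer framework.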
In the test method provided by the above disclosed embodiment, the test file of the neural network is parsed to obtain a first calculation graph, the graph is run separately on the general-purpose processor and the artificial intelligence processor to obtain a first operation result and a second operation result, and the test result of the neural network is obtained from these two results. The basic programming library of any neural network under test can thus be tested by comparing the deviation between the operation results on the two processors. Because the test result comes from comparing the results of running the test file on the general-purpose processor and the artificial intelligence processor, the test process does not depend on the network model and reproduction environment provided by the basic library's original upper-layer framework. This greatly improves the flexibility and generality of the test: the test is completed without depending on any neural network framework, improving developer efficiency.
In the above-described embodiments, the implementation of the test file of the neural network is not limited: any file format that can be parsed and applied to testing may serve as the implementation form of the test file. As the above disclosed embodiments show, the test file of the neural network is parsed to obtain the first calculation graph. Therefore, in a possible implementation, the test file may include a plurality of objects, and these objects may be parsed into the corresponding operation nodes of the first calculation graph. In one possible implementation, the test file of the neural network may be a json file whose objects are json objects; the generation manner and specific content of the json file are not limited by the embodiments of the present disclosure and are not limited to the following disclosed embodiments. In one possible implementation, a json object may include at least one key-value pair indicating an attribute of the operation node to which the json object corresponds. In one example, the json file can be generated automatically by a program; in another example, it can be edited and generated manually. In one example, a json file may include a json array composed of nodes, each node being a json object. Since the implementation of the json object is not limited in the above disclosed embodiment, the specific composition of a node, that is, the specific content it contains, is likewise not limited and can be set flexibly according to the actual situation. In one example, a node may contain four key-value pairs, namely <key, value> pairs.
The specific content represented by each key-value pair is likewise not limited in the embodiments of the present disclosure and is not limited to the following embodiments. In one example, the keys of the four key-value pairs may be: name, representing the name of the corresponding operation node in the computation graph; op, representing the specific operation of that node; inputs, representing the node's input nodes; and attrs, representing the node's other attributes. In one example, the other attributes may be parameters unique to the operation node, for example the stride of a convolution, the size of a convolution kernel, and other related attributes. The specific type of the value corresponding to each key is also not limited in the embodiments of the present disclosure. In one example, the value corresponding to name may be represented as a character string; the value corresponding to op as a character string; the value corresponding to inputs as an array of character strings; and the value corresponding to attrs as a json array composed of json objects. Since the specific content of the other attributes is not limited, in one possible implementation a node may also include five <key, value> pairs, whose five keys are name, op, inputs, attrs1 and attrs2, that is, two key-value pairs are used to indicate the node's other attributes.
Because the test file of the neural network includes a json file, the operation and the test can be performed on the parsing result of the json file to obtain the final test result, and since the json file can be generated automatically by a program, the automation and efficiency of the test process can be greatly improved. Meanwhile, the json file can also be custom-edited, so the structure of the neural network can be defined freely during testing, greatly improving the robustness of basic-library testing and widening the application range of the test.
As seen from the above disclosed embodiments, the implementation form of the test file of the neural network is not limited and can be determined flexibly according to the actual situation. Accordingly, as the test file varies, the process of parsing it changes correspondingly, so the implementation of step S11 is not limited. FIG. 3 shows a flowchart of a testing method according to an embodiment of the present disclosure; as shown in the figure, in one possible implementation, step S11 may include:
step S111, reading all objects included in the test file of the neural network.
Step S112, converting each object into a corresponding operation node.
Step S113, topologically sorting the operation nodes according to the objects to obtain the first calculation graph.
As seen from the foregoing disclosure, the test file of the neural network may include a plurality of objects, and the objects may be parsed into the corresponding operation nodes of the first calculation graph. Thus, in a possible implementation, the first calculation graph may be obtained by reading all objects included in the test file, converting each object into a corresponding operation node, and topologically sorting the operation nodes according to the objects. The specific ways of reading the test file, converting objects into operation nodes, and topologically sorting the nodes are not limited in the embodiments of the present disclosure and can be determined flexibly according to the concrete form of the test file. In one example, for a test file that is a json file composed of nodes, the process of parsing it into the first calculation graph may be as follows. First, all nodes contained in the json file are read. As proposed in the above-described disclosed embodiments, each node may contain key-value pairs, among which are a key and value indicating the node's name and operation; based on the name and operation, each node can be converted into a corresponding operation node, yielding all operation nodes of the first calculation graph. Because the node's key-value pairs also include a key and value indicating its input nodes, the nodes can be topologically sorted based on these input relations: the input nodes of each operation node are determined and the connections between operation nodes are completed, finishing the construction of the first calculation graph.
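The topological sorting of step S113 can be sketched with Kahn's algorithm, using each parsed node's inputs list as its dependency edges (the node dicts here are hypothetical examples):

```python
from collections import deque

def topological_sort(nodes):
    """nodes: list of dicts with "name" and "inputs" keys, as parsed
    from the json test file. Returns node names in execution order."""
    indegree = {n["name"]: len(n["inputs"]) for n in nodes}
    consumers = {n["name"]: [] for n in nodes}
    for n in nodes:
        for dep in n["inputs"]:
            consumers[dep].append(n["name"])
    ready = deque(name for name, d in indegree.items() if d == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for consumer in consumers[name]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                ready.append(consumer)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a valid computation graph")
    return order

graph = [
    {"name": "relu1", "inputs": ["conv1"]},
    {"name": "data",  "inputs": []},
    {"name": "conv1", "inputs": ["data"]},
]
print(topological_sort(graph))  # → ['data', 'conv1', 'relu1']
```

The resulting order is what the later compile-time and runtime phases traverse when executing the operation nodes one by one.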
By reading all objects included in the test file, converting each object into a corresponding operation node, and topologically sorting the operation nodes according to the objects, the first calculation graph is obtained.
After the first calculation graph is obtained, the first operation result and the second operation result may be obtained through steps S12 and S13, respectively. The order of steps S12 and S13 is not limited in the embodiments of the present disclosure; there is no required precedence between them, and they are simply two independent processes. In one possible implementation, the first operation result is obtained through step S12 and then the second operation result through step S13. In another possible implementation, the second operation result is obtained through step S13 and then the first operation result through step S12. Further, because steps S12 and S13 are carried out by a general-purpose processor and an artificial intelligence processor respectively, they can also be carried out simultaneously on the two hardware devices, obtaining the first and second operation results at the same time.
The specific implementation manner of step S12 can be flexibly determined according to practical situations, and is not limited to the following disclosed embodiments. Fig. 4 shows a flowchart of a testing method according to an embodiment of the present disclosure, and as shown in the figure, in one possible implementation, step S12 may include:
step S121, performing operation initialization on the first calculation graph to obtain a first initialization result.
And step S122, respectively calling the operation function of each operation node contained in the first calculation graph in the general-purpose processor according to the first initialization result to obtain a first operation result.
In the above disclosed embodiment, the implementation of step S121, that is, the process of performing operation initialization on the first calculation graph, is not limited. In one possible implementation, the operation initialization may include acquiring input data and/or initializing weights. The initialization of the first calculation graph may include both, or only the acquisition of input data, or only the weight initialization. Acquiring input data may mean acquiring the overall input data of the first calculation graph; the specific acquisition process is not limited in the embodiments of the present disclosure. In one example, the input data may be obtained by reading the test file of the neural network; in another example, the input data may be generated randomly. Some operation nodes in the first calculation graph may carry weights that need to be initialized before operation, so to ensure that the operation process runs through completely, the weights in these operation nodes must be initialized; the specific initialization manner and process are not limited in the embodiments of the present disclosure and can be chosen flexibly according to the actual situation.
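A minimal sketch of the operation initialization described above, assuming a hypothetical set of weighted operations and illustrative tensor sizes; a fixed random seed keeps test runs reproducible:

```python
import random

WEIGHTED_OPS = {"conv", "fc"}  # hypothetical set of ops that carry weights

def initialize(graph, input_size, seed=0):
    """Step S121 sketch: randomly generate the graph's input data and an
    initialized weight vector for every weighted operation node."""
    rng = random.Random(seed)  # fixed seed -> reproducible test runs
    inputs = [rng.uniform(-1.0, 1.0) for _ in range(input_size)]
    weights = {
        node["name"]: [rng.uniform(-0.1, 0.1) for _ in range(4)]
        for node in graph if node["op"] in WEIGHTED_OPS
    }
    return inputs, weights

graph = [{"name": "conv1", "op": "conv"}, {"name": "relu1", "op": "relu"}]
inputs, weights = initialize(graph, input_size=3)
print(len(inputs), sorted(weights))  # → 3 ['conv1']
```

Reading the input data from the test file instead of generating it randomly would simply replace the `inputs` line with a file read.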
By acquiring input data and initializing weights, the initialization of the calculation graph can be fully completed, ensuring that the subsequent operation process based on the calculation graph proceeds normally and smoothly, and thereby guaranteeing the completeness and accuracy of the test.
After the initialization of the first computation graph is completed, step S122 may be performed by the general-purpose processor on the initialized first computation graph to obtain a first operation result. In the embodiments of the present disclosure, the type and model of the general-purpose processor performing the operation can be selected flexibly according to the actual situation and are not limited to the following cases. In one example, the general-purpose processor may be a CPU; in another example, it may be a GPU. Since the general-purpose processor can vary, the implementation of step S122 may vary accordingly; that is, the process by which the general-purpose processor operates on the initialized first computation graph can be determined flexibly according to the processor at hand. In one example, for a CPU, step S122 may be implemented as follows: in the CPU, for the initialized first computation graph, the operation functions of the operators corresponding to each operation node are called directly, one by one, in the topological order of the operation nodes in the first computation graph, thereby completing the operation on the first computation graph and obtaining the first operation result. For a GPU, the process of implementing step S122 is substantially the same as in the disclosed embodiment above and is not repeated here.
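The CPU flow above, calling each node's operation function in topological order, can be sketched with Kahn's algorithm; the graph representation and `op_funcs` mapping are hypothetical stand-ins for the operators' operation functions:

```python
from collections import deque

def run_graph_topologically(nodes, op_funcs, inputs):
    """Call each operation node's function in topological order.

    `nodes` maps node name -> list of predecessor names; `op_funcs` maps
    node name -> a function taking the list of predecessor outputs.
    Source nodes (no predecessors) read their value from `inputs`.
    """
    # Count unmet dependencies and build the successor lists.
    indegree = {n: len(preds) for n, preds in nodes.items()}
    successors = {n: [] for n in nodes}
    for n, preds in nodes.items():
        for p in preds:
            successors[p].append(n)

    ready = deque(n for n, d in indegree.items() if d == 0)
    results = {}
    while ready:
        n = ready.popleft()
        args = [results[p] for p in nodes[n]] if nodes[n] else [inputs[n]]
        results[n] = op_funcs[n](args)  # directly call the node's operation function
        for s in successors[n]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return results
```

The final entry of `results` for the graph's output node would correspond to the first operation result.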
Further, in the embodiment of the present disclosure, after obtaining the first operation result, the first operation result may be stored in the general-purpose processor, and wait for the comparison process in the subsequent step S14.
In this way, the first operation result is obtained by performing operation initialization on the first computation graph and then calling the corresponding operation functions in the general-purpose processor according to the first initialization result.
Likewise, the specific implementation manner of step S13 can be flexibly determined according to practical situations, and is not limited to the following disclosed embodiments. Fig. 5 shows a flowchart of a testing method according to an embodiment of the present disclosure, and as shown in the figure, in one possible implementation, step S13 may include:
and S131, determining the operation mode of the artificial intelligence processor according to the first calculation chart.
And step S132, when the operation mode is a layer-by-layer operation mode, sequentially operating each operation node included in the first calculation graph through the artificial intelligence processor to obtain a second operation result.
And step S133, when the operation mode is the fusion operation mode, the artificial intelligence processor performs operation by using the first calculation graph as a fusion operation node to obtain a second operation result.
In the above-described embodiment, the operation mode of the artificial intelligence processor is determined; the specific determination manner is not limited, and whether the artificial intelligence processor performs layer-by-layer operation or fusion operation on the first computation graph may be decided according to the actual situation of the first computation graph. In one example, if all the operation nodes in the first computation graph can operate together as a single fused operation node, the first computation graph can be operated in the fusion mode. In another example, if the operation nodes cannot operate as a whole, the first computation graph may be operated in the layer-by-layer mode. In a further example, if some operation nodes in the first computation graph can be operated as one fused operation node while the remaining nodes cannot be fused with them, a mixed mode of fusion operation and layer-by-layer operation may be considered: the fusible part is operated in the fusion mode first, and the remaining part is then operated layer by layer.
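The three cases above reduce to a simple decision rule; this sketch assumes a hypothetical `fusible` predicate deciding whether a node can join the fused whole, which the patent leaves unspecified:

```python
def choose_operation_mode(nodes, fusible):
    """Decide between "fusion", "layer_by_layer", and a "mixed" mode.

    Mirrors the three examples in the text: fuse everything if possible,
    fall back to layer-by-layer if nothing fuses, and mix otherwise.
    """
    fusible_nodes = [n for n in nodes if fusible(n)]
    if len(fusible_nodes) == len(nodes):
        return "fusion"          # every node can run as one fused node
    if not fusible_nodes:
        return "layer_by_layer"  # nothing can be fused
    return "mixed"               # fuse what we can, run the rest layer by layer
```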
As can be seen from the above disclosure, after the operation mode of the artificial intelligence processor is determined, in the layer-by-layer operation mode each operation node included in the first computation graph is operated in sequence in step S132 to obtain the second operation result, while in the fusion operation mode the first computation graph may be operated as one fused operation node in step S133 to obtain the second operation result. Through this process, the operation mode of the artificial intelligence processor can be selected flexibly according to the actual situation of the first computation graph: the fusion mode is used when the first computation graph can be fused, which improves the efficiency of obtaining the second operation result, and the layer-by-layer mode is used when it cannot, which ensures the accuracy of the second operation result. The operation efficiency of the artificial intelligence processor is thus greatly improved while the accuracy of the operation result is guaranteed, and the flexibility of the operation is improved as well.
The specific implementation manner of step S132 may be flexibly selected according to practical situations, and is not limited to the following disclosed embodiments, and in one possible implementation manner, step S132 may include:
and step S1321, performing structure optimization on the first calculation graph in the artificial intelligence processor to obtain a second calculation graph.
Step S1322 is to perform operation initialization on the second calculation graph to obtain a second initialization result.
And step S1323, traversing the second calculation graph according to the second initialization result, and calling the compiling function of each operation node contained in the second calculation graph in the artificial intelligence processor respectively according to the traversing sequence to obtain a first compiling result.
Step S1324, copying the first compilation result to the artificial intelligence processor, and calling the operation function of each operation node included in the second calculation graph in the artificial intelligence processor, respectively, to obtain a second operation result.
In the above disclosed embodiment, the implementation of step S1321 is not limited; that is, the manner of performing structure optimization on the first computation graph in the artificial intelligence processor to obtain the second computation graph is not limited. Any manner that simplifies the structure of the first computation graph, such that operating on the resulting second computation graph runs more efficiently or consumes fewer operation resources than operating on the first computation graph without affecting the operation result, can serve as an implementation of step S1321. In one example, step S1321 may be implemented by merging mergeable operation nodes into one operation node to obtain the second computation graph; which operation nodes can be merged may be determined according to the actual situation of the first computation graph and is not specifically limited in the embodiments of the present disclosure.
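The node-merging example might be sketched as follows; treating the graph as a topologically ordered list and using an assumed `can_merge` predicate on adjacent operators is a simplification for illustration:

```python
def merge_mergeable_nodes(nodes, can_merge):
    """Fold runs of mergeable adjacent nodes into single fused nodes.

    `nodes` is a topologically ordered list of operator names; `can_merge`
    decides whether the next operator may join the current fused node.
    """
    merged = []
    for op in nodes:
        if merged and can_merge(merged[-1][-1], op):
            merged[-1].append(op)   # extend the current fused node
        else:
            merged.append([op])     # start a new node
    return ["+".join(group) for group in merged]
```

A common instance of this optimization is folding an activation into the preceding layer, as in the usage below.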
After the second computation graph optimized based on the first computation graph is obtained, the second computation graph may be initialized by performing operation in step S1322 to obtain a second initialization result. The implementation manner of performing operation initialization on the second computation graph is the same as that of performing operation initialization on the first computation graph in the above disclosed embodiment, and only the object of operation initialization is changed, so reference may be made to the above disclosed embodiment, and a detailed implementation manner of step S1322 is not described herein again.
After the operation initialization of the second computation graph is completed, the second computation graph may be compiled according to the second initialization result, and the first compilation result is obtained through step S1323. The implementation of step S1323 is not limited; that is, the specific compiling process of the second computation graph can be determined flexibly according to its actual situation, and any manner in which the operation nodes contained in the second computation graph are traversed and compiled layer by layer may serve as an implementation of step S1323, which is not limited in the embodiments of the present disclosure. In one example, the process of compiling the second computation graph may be: traversing the second computation graph layer by layer, and compiling in turn the operators corresponding to the operation nodes contained in each layer. The embodiments of the present disclosure likewise do not limit the specific manner of the layer-by-layer traversal; in one example, the traversal of the second computation graph may be implemented in breadth-first order.
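The breadth-first traversal with a per-operator compile call can be sketched as follows; `compile_fn` is a hypothetical stand-in for the operators' compiling functions:

```python
from collections import deque

def compile_breadth_first(graph, compile_fn, roots):
    """Traverse a computation graph in breadth-first order, calling each
    operation node's compile function once; returns the ordered results.

    `graph` maps node -> list of successor nodes; `roots` are the nodes
    with no predecessors (the graph's first layer).
    """
    seen = set(roots)
    queue = deque(roots)
    compiled = []
    while queue:
        node = queue.popleft()
        compiled.append(compile_fn(node))   # compile this layer's operator
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return compiled
```

The concatenated results would correspond to the "first compilation result", i.e. the instruction data copied to the artificial intelligence processor in the next step.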
After the first compilation result is obtained, the artificial intelligence processor may perform an operation based on it in step S1324 to obtain the second operation result. The implementation of step S1324 is likewise not limited and may be determined flexibly according to the actual situation of the first compilation result. In one possible implementation, step S1324 may proceed as follows: the first compilation result obtained in step S1323, that is, the instruction data described in the disclosed embodiment above, is copied to the artificial intelligence processor, so that the artificial intelligence processor can operate on it to obtain the second operation result. In the embodiments of the present disclosure, the process by which the artificial intelligence processor operates on the first compilation result may be: a task flow is created based on the first compilation result; since the artificial intelligence processor may have multiple operation cores that can asynchronously compute, at the same time, the parts of the first compilation result that can run in parallel, the artificial intelligence processor can perform asynchronous calculation on the first compilation result according to the actual situation of the task flow, thereby obtaining the second operation result. The specific manner in which each operation core operates based on the task flow is not limited in the embodiments of the present disclosure; it may be selected flexibly according to the actual situation of the task flow and is not limited to the following disclosed embodiments.
The structure of the first computation graph is optimized in the artificial intelligence processor to obtain the second computation graph. After the second computation graph is operation-initialized, it is traversed according to the second initialization result, and the compiling functions of the operation nodes it contains are called in the traversal order to obtain the first compilation result. The first compilation result is then copied to the artificial intelligence processor, where the operation functions of the operation nodes contained in the second computation graph are called respectively to obtain the second operation result. Through this process, the compiling function and operation function of each operator in the second computation graph, optimized from the first computation graph, can be called in sequence to realize layer-by-layer compilation and operation, which effectively ensures the accuracy of the operation in the layer-by-layer mode. Meanwhile, since the second computation graph is optimized from the first computation graph, the efficiency of the operation in the artificial intelligence processor, and hence of the test, can be further improved.
The specific implementation manner of step S133 may also be flexibly selected according to practical situations, and is not limited to the following disclosed embodiments, and in a possible implementation manner, step S133 may include:
and step S1331, taking the first computation graph as a fusion operation node, and generating a third computation graph based on the fusion operation node.
Step S1332, performing operation initialization on the third computation graph to obtain a third initialization result.
And step S1333, calling a compiling function of the fusion operation node in the artificial intelligence processor according to the third initialization result to obtain a second compiling result.
And step S1334, copying the second compiling result to the artificial intelligence processor, and calling the operation function of the fusion operation node in the artificial intelligence processor to obtain a second operation result.
In the above-described embodiment, the implementation of step S1331 is not limited; that is, the specific manner of taking the first computation graph as a fusion operation node in the artificial intelligence processor and generating the third computation graph based on that fusion operation node is not specifically limited in the embodiments of the present disclosure. Any manner in which the first computation graph is regarded as a whole, taken as one fusion operation node, and used to generate the third computation graph may serve as an implementation of step S1331. In one example, the first computation graph may be used directly as a fusion operator, input and output data may be added to the fusion operator, and the computation graph corresponding to the fusion operator with the added input and output data may be taken as the third computation graph.
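The wrapping described in the example, treating the whole first computation graph as one fusion operator and attaching input/output data, can be sketched as follows; the dict layout is an assumed illustration of the idea, not a concrete graph format from the patent:

```python
def build_fused_graph(first_graph_nodes, input_names, output_names):
    """Wrap an entire computation graph as a single fusion operator and
    attach input/output data bindings, yielding a "third computation graph"."""
    fusion_op = {
        "type": "fusion",
        "body": list(first_graph_nodes),  # the whole first graph, as one node
    }
    return {
        "inputs": list(input_names),    # added input data
        "nodes": [fusion_op],           # exactly one fused operation node
        "outputs": list(output_names),  # added output data
    }
```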
After the third computation graph fused based on the first computation graph is obtained, the third computation graph may be initialized in step S1332, so as to obtain a third initialization result. The implementation manner of performing operation initialization on the third computation graph is the same as the manner of performing operation initialization on the first computation graph and performing operation initialization on the second computation graph in the above disclosed embodiment, and only the object of operation initialization is changed, so reference may be made to the above disclosed embodiment, and a detailed implementation manner of step S1332 is not described herein again.
After the operation initialization of the third computation graph is completed, the third computation graph may be compiled according to the third initialization result, and the second compilation result is obtained through step S1333. The implementation of step S1333 is not limited; that is, the specific compiling process of the third computation graph can be determined flexibly according to its actual situation, and any process that compiles the fusion operator corresponding to the third computation graph may serve as an implementation of step S1333, which is not limited in the embodiments of the present disclosure. In one example, the process of compiling the third computation graph may be: directly calling the compiling function of the fusion operator corresponding to the third computation graph, thereby obtaining instruction data compiled from the third computation graph as the second compilation result.
After the second compilation result is obtained, the artificial intelligence processor may perform an operation based on the second compilation result in step S1334 to obtain the second operation result. The implementation of step S1334 is likewise not limited and may be determined flexibly according to the actual situation of the second compilation result. In one possible implementation, step S1334 may proceed as follows: the second compilation result obtained in step S1333, that is, the instruction data described in the disclosed embodiment above, is copied to the artificial intelligence processor, so that the artificial intelligence processor can perform an operation based on the second compilation result to obtain the second operation result. In the embodiments of the present disclosure, the process by which the artificial intelligence processor operates on the second compilation result may be: first, a task flow is created based on the second compilation result, and the second compilation result is then calculated according to the actual situation of the task flow to obtain the second operation result. The specific operation manner is not limited in the embodiments of the present disclosure and may be selected flexibly according to the actual situation of the task flow; in one example, the artificial intelligence processor may directly call the operation function of the fusion operator to obtain the second operation result.
The first computation graph is taken as a fusion operation node, and a third computation graph is generated based on that fusion operation node. The third computation graph is operation-initialized, the compiling function of the fusion operation node is called according to the third initialization result to obtain the second compilation result, the second compilation result is copied to the artificial intelligence processor, and the operation function of the fusion operation node is called in the artificial intelligence processor to obtain the second operation result. Through this process, the compiling function and operation function of the operator of the fused third computation graph can be called effectively to realize fused compilation and operation, which effectively improves the operation efficiency of the artificial intelligence processor in the fusion mode and thus the efficiency of the whole testing process.
After the first operation result is obtained through step S12 and the second operation result is obtained through step S13, a test result of the neural network may be obtained from the first operation result and the second operation result through step S14. The implementation of step S14 is not limited in the embodiments of the present disclosure; that is, how the test result of the neural network is obtained from the first and second operation results is not limited to the following disclosed embodiments. Any manner that compares the first operation result with the second operation result and reasonably quantifies the difference between them as the basis of the test result may serve as an implementation of step S14. Fig. 6 shows a flowchart of a testing method according to an embodiment of the present disclosure; as shown in the figure, in one possible implementation, step S14 may include:
Step S141, calculating a mean square error between the first operation result and the second operation result.
Step S142, obtaining an error rate between the first operation result and the second operation result according to a ratio between the mean square error and the first operation result, and using the error rate as a test result of the neural network.
In the above-described disclosed embodiment, the hardware device executing step S14 is not limited in the embodiments of the present disclosure. In one possible implementation, the test result may be calculated by the general-purpose processor; in another, by the artificial intelligence processor; in yet another, a third-party computing device may be used, instead of the general-purpose processor or artificial intelligence processor that performed the earlier operations, to calculate the test result. In one example, for simplicity, the second operation result obtained by the artificial intelligence processor in step S13 may be copied into the general-purpose processor and compared with the first operation result obtained by the general-purpose processor in step S12 to obtain the test result.
In one example, the mean square error between the first operation result and the second operation result is calculated, and the error rate between them, obtained from the ratio of the mean square error to the first operation result, is taken as the test result of the neural network. The process can be described by formulas. In one application embodiment of the present disclosure, let y denote the first operation result and y' the second operation result; the mean square error ψ of the first and second operation results may then be:

ψ = (1/n) · Σᵢ₌₁ⁿ (yᵢ − y′ᵢ)²

In the above formula, ψ is the mean square error between the first operation result and the second operation result. Based on this mean square error, its ratio to the first operation result y may be taken as the error rate s, that is, the test result, which may be expressed as:

s = ψ / y
After the test result is obtained, it may be compared with a preset threshold. If the test result is greater than the threshold, the difference between the second operation result and the first operation result is too large, indicating that the result the artificial intelligence processor obtains when operating through the basic library has a significant problem; the problem can then be located further, so that correction and debugging of the basic library can be completed. If the test result is not greater than the threshold, the result obtained when the artificial intelligence processor operates through the basic library is sufficiently accurate, and the test passes. The threshold used for comparison with the test result can be set flexibly according to the actual situation and is not numerically limited here.
By calculating the mean square error between the first and second operation results and obtaining the test result of the neural network from the ratio of the mean square error to the first operation result, the difference between the two operation results can be quantified effectively. A representative test result with reference value is thus obtained, providing an effective basis for subsequently deciding whether the basic library needs correction and further debugging.
Fig. 7 shows a block diagram of a testing device according to an embodiment of the present disclosure, as shown, the device 20 includes:
and the analysis unit 21 is configured to analyze the test file of the neural network to obtain a first calculation graph.
And a first operation unit 22, configured to obtain a first operation result through a general-purpose processor operation according to the first computation graph.
And the second operation unit 23 is configured to obtain a second operation result through the artificial intelligence processor according to the first calculation graph.
And the test result generating unit 24 is configured to obtain a test result of the neural network according to the first operation result and the second operation result.
In one possible implementation manner, the parsing unit is configured to: reading all objects included in a test file of the neural network; respectively converting the objects into corresponding operation nodes; and carrying out topological sequencing on the operation nodes according to the objects to obtain a first calculation graph.
In one possible implementation, the test file of the neural network includes a json file, and the object includes a json object, where the json object includes at least one key-value pair for indicating an attribute of an operation node corresponding to the json object.
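The parsing unit's flow, reading the json objects, converting them to operation nodes, and topologically sorting them, might look like the sketch below; the concrete file layout (a top-level "nodes" array whose objects carry "name" and "inputs" key-value pairs) is an assumption for illustration:

```python
import json

def parse_test_file(json_text):
    """Parse a neural-network test file into a topologically sorted node list.

    Assumes an acyclic graph; each json object's key-value pairs describe
    the attributes of its corresponding operation node.
    """
    objects = json.loads(json_text)["nodes"]
    nodes = {o["name"]: o.get("inputs", []) for o in objects}

    # Repeatedly place every node whose predecessors are already placed.
    ordered, placed = [], set()
    while len(ordered) < len(nodes):
        for name, preds in nodes.items():
            if name not in placed and all(p in placed for p in preds):
                ordered.append(name)
                placed.add(name)
    return ordered
```

The resulting ordered node list corresponds to the first computation graph consumed by the operation units.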
In one possible implementation, the first arithmetic unit is configured to: carrying out operation initialization on the first calculation graph to obtain a first initialization result; and respectively calling the operation function of each operation node contained in the first calculation graph in the general-purpose processor according to the first initialization result to obtain a first operation result.
In one possible implementation, the second arithmetic unit includes: the operation mode determining subunit is used for determining the operation mode of the artificial intelligence processor according to the first calculation graph; the layer-by-layer operation subunit is used for sequentially operating each operation node included in the first calculation graph through the artificial intelligence processor when the operation mode is the layer-by-layer operation mode to obtain a second operation result; and the fusion operation subunit is used for performing operation by using the first calculation graph as a fusion operation node through the artificial intelligence processor when the operation mode is the fusion operation mode to obtain a second operation result.
In one possible implementation, the layer-by-layer operation subunit is configured to: performing structure optimization on the first calculation graph in the artificial intelligence processor to obtain a second calculation graph; performing operation initialization on the second calculation graph to obtain a second initialization result; traversing the second calculation graph according to the second initialization result, and respectively calling the compiling function of each operation node contained in the second calculation graph in the artificial intelligence processor according to the traversing sequence to obtain a first compiling result; and copying the first compiling result to an artificial intelligence processor, and calling the operation function of each operation node contained in the second calculation graph in the artificial intelligence processor respectively to obtain a second operation result.
In one possible implementation, the fusion operation subunit is configured to: taking the first computation graph as a fusion operation node, and generating a third computation graph based on the fusion operation node; carrying out operation initialization on the third calculation graph to obtain a third initialization result; calling a compiling function of the fusion operation node in the artificial intelligence processor according to the third initialization result to obtain a second compiling result; and copying the second compiling result to the artificial intelligence processor, and calling the operation function of the fusion operation node in the artificial intelligence processor to obtain a second operation result.
In one possible implementation, the operation initialization includes: acquiring input data, and/or weight initialization.
In one possible implementation, the test result generating unit is configured to: calculating the mean square error of the first operation result and the second operation result; and obtaining an error rate between the first operation result and the second operation result according to the ratio of the mean square error to the first operation result, and using the error rate as a test result of the neural network.
Fig. 8 is a block diagram illustrating an arithmetic device 1300 according to an example embodiment. For example, the apparatus 1300 may be provided as a server. Referring to fig. 8, apparatus 1300 includes a processing component 1322, which further includes one or more processors, and memory resources, represented by memory 1332, for storing instructions, such as application programs, that may be executed by processing component 1322. The application programs stored in memory 1332 may include one or more modules that each correspond to a set of instructions. Further, processing component 1322 is configured to execute instructions to perform the methods described above.
The apparatus 1300 may also include a power component 1326 configured to perform power management for the apparatus 1300, a wired or wireless network interface 1350 configured to connect the apparatus 1300 to a network, and an input-output (I/O) interface 1358. The apparatus 1300 may operate based on an operating system stored in the memory 1332, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1332, is also provided that includes computer program instructions that are executable by the processing component 1322 of the apparatus 1300 to perform the methods described above.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuitry, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing may be better understood in light of the following clauses:
Clause A1, a method of testing, the method comprising:
parsing a test file of a neural network to obtain a first computation graph;
obtaining a first operation result through operation of a general-purpose processor according to the first computation graph;
obtaining a second operation result through operation of an artificial intelligence processor according to the first computation graph;
and obtaining a test result of the neural network according to the first operation result and the second operation result.
Clause A2, the method of clause A1, parsing a test file of the neural network to obtain a first computation graph, comprising:
reading all objects included in a test file of the neural network;
respectively converting the objects into corresponding operation nodes;
and performing topological sorting on the operation nodes according to the objects to obtain a first computation graph.
Clause A3, the method of clause a2, the test file for the neural network comprising a json file, the objects comprising json objects, wherein,
the json object comprises at least one key-value pair for indicating the attribute of the operation node corresponding to the json object.
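Clauses A2 and A3 together can be sketched as follows. The key names `name` and `inputs` are hypothetical — the patent only says each json object carries key-value pairs describing its operation node. Kahn's algorithm is one standard way to perform the topological sorting:

```python
import json
from collections import deque

def parse_to_graph(text):
    # Each json object is one operation node; its key-value pairs carry
    # the node's attributes. The keys "name" and "inputs" are assumptions
    # made for this sketch.
    nodes = {obj["name"]: obj for obj in json.loads(text)}

    # Kahn's algorithm: emit a node only after all of its inputs have been
    # emitted, yielding a topologically sorted "first computation graph".
    indegree = {n: len(obj.get("inputs", [])) for n, obj in nodes.items()}
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m, obj in nodes.items():
            if n in obj.get("inputs", []):
                indegree[m] -= 1
                if indegree[m] == 0:
                    ready.append(m)
    return order
```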
Clause A4, the method according to any one of clauses A1 to A3, wherein the performing, by a general-purpose processor, an operation according to the first computation graph to obtain a first operation result includes:
carrying out operation initialization on the first calculation graph to obtain a first initialization result;
and according to the first initialization result, respectively calling the operation function of each operation node contained in the first calculation graph in the general processor to obtain a first operation result.
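A minimal sketch of clause A4: operation initialization supplies input data and per-node weights (clause A8), then the operation function of each node is called in order on the general-purpose processor. The operation set, data layout, and key names here are invented for illustration:

```python
# Operation functions looked up per node type; each takes the current
# activation and the node's weight (which may be None).
OP_FUNCTIONS = {
    "matmul": lambda x, w: [[sum(a * b for a, b in zip(row, col))
                             for col in zip(*w)] for row in x],
    "relu":   lambda x, _: [[max(v, 0.0) for v in row] for row in x],
}

def initialize(graph):
    # "First initialization result": acquire input data and weights.
    return {"input": [[1.0, -2.0]],
            "weights": {n["name"]: n.get("weight") for n in graph}}

def run_on_cpu(graph, init):
    # Call the operation function of each operation node in order;
    # the graph is assumed already topologically sorted.
    x = init["input"]
    for node in graph:
        x = OP_FUNCTIONS[node["op"]](x, init["weights"][node["name"]])
    return x  # first operation result
```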
Clause A5, the method according to any one of clauses A1 to A4, wherein the obtaining a second operation result by performing an operation by an artificial intelligence processor according to the first computation graph includes:
determining an operation mode of the artificial intelligence processor according to the first calculation graph;
when the operation mode is a layer-by-layer operation mode, sequentially operating each operation node included in the first calculation graph through the artificial intelligence processor to obtain a second operation result;
and when the operation mode is the fusion operation mode, the artificial intelligence processor is used for operating the first calculation graph as a fusion operation node to obtain a second operation result.
Clause A6, the method according to clause A5, wherein the sequentially performing, by the artificial intelligence processor, an operation on each operation node included in the first computation graph to obtain a second operation result includes:
performing structure optimization on the first calculation graph in an artificial intelligence processor to obtain a second calculation graph;
performing operation initialization on the second calculation graph to obtain a second initialization result;
traversing the second calculation graph according to the second initialization result, and respectively calling a compiling function of each operation node contained in the second calculation graph in the artificial intelligence processor according to a traversing sequence to obtain a first compiling result;
and copying the first compiling result to the artificial intelligence processor, and calling the operation function of each operation node contained in the second calculation graph in the artificial intelligence processor respectively to obtain a second operation result.
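The layer-by-layer path of clause A6 — optimize the graph structure, initialize, compile each node in traversal order, copy the compiled result to the device, then run — can be emulated as below. No real device API is named in the patent, so compilation and execution are represented by plain strings:

```python
def run_layer_by_layer(graph):
    # Structure optimization: drop no-op nodes (second computation graph).
    optimized = [n for n in graph if n["op"] != "identity"]
    # Compile each node in traversal order (first compiling result).
    compiled = [f"kernel_{n['op']}" for n in optimized]
    # Copy the compiled kernels to the artificial intelligence processor
    # (emulated as a list copy), then call each node's operation function.
    device_kernels = list(compiled)
    return [k + ":run" for k in device_kernels]  # second operation result
```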
Clause A7, the method according to clause A5, wherein the performing, by the artificial intelligence processor, an operation on the first computation graph as a fusion operation node to obtain a second operation result includes:
taking the first computation graph as a fusion operation node, and generating a third computation graph based on the fusion operation node;
performing operation initialization on the third computation graph to obtain a third initialization result;
calling a compiling function of the fusion operation node in the artificial intelligence processor according to the third initialization result to obtain a second compiling result;
and copying the second compiling result to the artificial intelligence processor, and calling the operation function of the fusion operation node in the artificial intelligence processor to obtain a second operation result.
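The fusion path of clause A7 differs in that the whole first computation graph becomes a single fusion operation node, so the device compiles once and launches one kernel instead of one per layer. The device calls are again emulated with strings for illustration:

```python
def run_fused(graph):
    # Wrap the whole graph in one fusion operation node and build the
    # third computation graph around it.
    fusion_node = {"op": "fused", "body": [n["op"] for n in graph]}
    # Compile the fusion node once (second compiling result), copy it to
    # the device, and call its single operation function.
    compiled = "kernel_" + "_".join(fusion_node["body"])
    return compiled + ":run"  # second operation result
```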
Clause A8, the method of any one of clauses A4, A6, or A7, wherein the operation initialization comprises:
acquiring input data, and/or initializing weights.
Clause A9, the method of clause A1, wherein the obtaining a test result of the neural network from the first operation result and the second operation result comprises:
calculating a mean square error of the first operation result and the second operation result;
and according to the ratio of the mean square error to the first operation result, obtaining an error rate between the first operation result and the second operation result as a test result of the neural network.
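Clause A9's comparison can be sketched directly. The patent does not pin down the normalization of "the ratio of the mean square error to the first operation result", so dividing the MSE by the mean absolute value of the first operation result is one plausible reading, flagged here as an assumption:

```python
def error_rate(first, second):
    # Mean square error between the two operation results.
    n = len(first)
    mse = sum((a - b) ** 2 for a, b in zip(first, second)) / n
    # Assumed normalization: mean absolute value of the first result
    # (the general-purpose processor's output, taken as the reference).
    scale = sum(abs(a) for a in first) / n
    return mse / scale  # error rate used as the test result
```

A real test harness would compare this rate against a threshold to decide pass or fail for the artificial intelligence processor.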
Clause B10, a test device, comprising:
the analysis unit is used for analyzing the test file of the neural network to obtain a first calculation graph;
the first operation unit is used for obtaining a first operation result through the operation of a general processor according to the first calculation graph;
the second operation unit is used for obtaining a second operation result through the operation of the artificial intelligence processor according to the first calculation graph;
and the test result generating unit is used for obtaining the test result of the neural network according to the first operation result and the second operation result.
Clause B11, the apparatus of clause B10, the parsing unit to:
reading all objects included in a test file of the neural network;
respectively converting the objects into corresponding operation nodes;
and performing topological sorting on the operation nodes according to the objects to obtain a first computation graph.
Clause B12, the apparatus of clause B11, the test file for the neural network comprising a json file, the objects comprising json objects, wherein,
the json object comprises at least one key-value pair for indicating the attribute of the operation node corresponding to the json object.
Clause B13, the apparatus according to any one of clauses B10 to B12, the first arithmetic unit being configured to:
carrying out operation initialization on the first calculation graph to obtain a first initialization result;
and according to the first initialization result, respectively calling the operation function of each operation node contained in the first calculation graph in the general processor to obtain a first operation result.
Clause B14, the apparatus according to any one of clauses B10 to B13, the second arithmetic unit comprising:
an operation mode determining subunit, configured to determine an operation mode of the artificial intelligence processor according to the first computation graph;
the layer-by-layer operation subunit is used for sequentially operating each operation node included in the first calculation graph through the artificial intelligence processor when the operation mode is the layer-by-layer operation mode to obtain a second operation result;
and the fusion operation subunit is used for performing operation by using the first calculation graph as a fusion operation node through the artificial intelligence processor when the operation mode is the fusion operation mode to obtain a second operation result.
Clause B15, the apparatus of clause B14, the layer-by-layer operations subunit to:
performing structure optimization on the first calculation graph in an artificial intelligence processor to obtain a second calculation graph;
performing operation initialization on the second calculation graph to obtain a second initialization result;
traversing the second calculation graph according to the second initialization result, and respectively calling a compiling function of each operation node contained in the second calculation graph in the artificial intelligence processor according to a traversing sequence to obtain a first compiling result;
and copying the first compiling result to the artificial intelligence processor, and calling the operation function of each operation node contained in the second calculation graph in the artificial intelligence processor respectively to obtain a second operation result.
Clause B16, the apparatus of clause B14, the fusion operations subunit being configured to:
taking the first computation graph as a fusion operation node, and generating a third computation graph based on the fusion operation node;
performing operation initialization on the third computation graph to obtain a third initialization result;
calling a compiling function of the fusion operation node in the artificial intelligence processor according to the third initialization result to obtain a second compiling result;
and copying the second compiling result to the artificial intelligence processor, and calling the operation function of the fusion operation node in the artificial intelligence processor to obtain a second operation result.
Clause B17, the apparatus of any one of clauses B13, B15, or B16, wherein the operation initialization comprises:
acquiring input data, and/or initializing weights.
Clause B18, the apparatus of clause B10, the test result generating unit to:
calculating a mean square error of the first operation result and the second operation result;
and according to the ratio of the mean square error to the first operation result, obtaining an error rate between the first operation result and the second operation result as a test result of the neural network.
Clause C19, a test device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of clauses A1 to A9.
Clause D20, a non-transitory computer-readable storage medium having computer program instructions stored thereon that, when executed by a processor, implement the method of any one of clauses A1 to A9.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method of testing, the method comprising:
parsing a test file of a neural network to obtain a first computation graph;
obtaining a first operation result through operation of a general-purpose processor according to the first computation graph;
obtaining a second operation result through operation of an artificial intelligence processor according to the first computation graph;
and obtaining a test result of the neural network according to the first operation result and the second operation result.
2. The method of claim 1, wherein parsing the test file of the neural network to obtain the first computational graph comprises:
reading all objects included in a test file of the neural network;
respectively converting the objects into corresponding operation nodes;
and performing topological sorting on the operation nodes according to the objects to obtain the first computation graph.
3. The method of claim 2, wherein the test file for the neural network comprises a json file and the objects comprise json objects, wherein,
the json object comprises at least one key-value pair for indicating the attribute of the operation node corresponding to the json object.
4. The method according to any one of claims 1 to 3, wherein the performing, by a general-purpose processor, an operation according to the first computation graph to obtain a first operation result includes:
carrying out operation initialization on the first calculation graph to obtain a first initialization result;
and according to the first initialization result, respectively calling the operation function of each operation node contained in the first calculation graph in the general processor to obtain a first operation result.
5. The method according to any one of claims 1 to 4, wherein the performing, by an artificial intelligence processor, an operation according to the first computation graph to obtain a second operation result comprises:
determining an operation mode of the artificial intelligence processor according to the first calculation graph;
when the operation mode is a layer-by-layer operation mode, sequentially operating each operation node included in the first calculation graph through the artificial intelligence processor to obtain a second operation result;
and when the operation mode is the fusion operation mode, the artificial intelligence processor is used for operating the first calculation graph as a fusion operation node to obtain a second operation result.
6. The method of claim 5, wherein the sequentially operating, by the artificial intelligence processor, each operation node included in the first computation graph to obtain a second operation result comprises:
performing structure optimization on the first calculation graph in an artificial intelligence processor to obtain a second calculation graph;
performing operation initialization on the second calculation graph to obtain a second initialization result;
traversing the second calculation graph according to the second initialization result, and respectively calling a compiling function of each operation node contained in the second calculation graph in the artificial intelligence processor according to a traversing sequence to obtain a first compiling result;
and copying the first compiling result to the artificial intelligence processor, and calling the operation function of each operation node contained in the second calculation graph in the artificial intelligence processor respectively to obtain a second operation result.
7. The method according to claim 5, wherein said performing, by the artificial intelligence processor, an operation on the first computation graph as a fusion operation node to obtain a second operation result comprises:
taking the first computation graph as a fusion operation node, and generating a third computation graph based on the fusion operation node;
performing operation initialization on the third computation graph to obtain a third initialization result;
calling a compiling function of the fusion operation node in the artificial intelligence processor according to the third initialization result to obtain a second compiling result;
and copying the second compiling result to the artificial intelligence processor, and calling the operation function of the fusion operation node in the artificial intelligence processor to obtain a second operation result.
8. A test apparatus, comprising:
the analysis unit is used for analyzing the test file of the neural network to obtain a first calculation graph;
the first operation unit is used for obtaining a first operation result through the operation of a general processor according to the first calculation graph;
the second operation unit is used for obtaining a second operation result through the operation of the artificial intelligence processor according to the first calculation graph;
and the test result generating unit is used for obtaining the test result of the neural network according to the first operation result and the second operation result.
9. A test apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1-7.
10. A non-transitory computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of any of claims 1 to 7.
CN201910735622.1A 2019-08-09 2019-08-09 Test method, test device and related product Withdrawn CN112346916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910735622.1A CN112346916A (en) 2019-08-09 2019-08-09 Test method, test device and related product


Publications (1)

Publication Number Publication Date
CN112346916A true CN112346916A (en) 2021-02-09

Family

ID=74366966



Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120226943A1 (en) * 2011-03-01 2012-09-06 International Business Machines Corporation System and method to efficiently identify bad components in a multi-node system utilizing multiple node topologies
US9600386B1 (en) * 2013-05-31 2017-03-21 Sandia Corporation Network testbed creation and validation


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661301A (en) * 2022-05-24 2022-06-24 深圳思谋信息科技有限公司 Graphics processing unit compiling method, device, compiling acceleration library and storage medium
CN114661301B (en) * 2022-05-24 2022-09-06 深圳思谋信息科技有限公司 Graphics processing unit compiling method, device, compiling acceleration library and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210209