CN112766512A - Deep learning framework diagnosis system, method, device, equipment and medium based on meta-operator - Google Patents

Publication number
CN112766512A
Authority
CN
China
Prior art keywords
deep learning
operator
calculation
model
operators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110098292.7A
Other languages
Chinese (zh)
Other versions
CN112766512B (en
Inventor
刘譞哲
马郓
谷典典
黄罡
孙艳春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Information Technology Institute (tianjin Binhai)
Original Assignee
Peking University Information Technology Institute (tianjin Binhai)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Information Technology Institute (tianjin Binhai) filed Critical Peking University Information Technology Institute (tianjin Binhai)
Priority to CN202110098292.7A priority Critical patent/CN112766512B/en
Publication of CN112766512A publication Critical patent/CN112766512A/en
Application granted granted Critical
Publication of CN112766512B publication Critical patent/CN112766512B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a meta-operator-based deep learning framework diagnosis system, method, apparatus, device, and medium. The system comprises: a user interface module, which provides the user with a programming interface for building a deep learning model and diagnosing the operators in the model; a model static graph module, which constructs the structure of the model's static computational graph from the user's code; a debugger comprising a state recorder and a data flow executor, wherein the state recorder records the results of executing the computational graph after each operator replacement, and the data flow executor performs forward computation and backward gradient computation according to the structure of the static computational graph; a meta-operator module, which provides the basic computation units with which the system performs the various types of computation; an operator implementation module, which provides the numerical computation and/or multidimensional array operations of the different deep learning frameworks; and a device management module, which manages the CPU and GPU hardware and provides a unified interface.

Description

Deep learning framework diagnosis system, method, device, equipment and medium based on meta-operator
Technical Field
The present disclosure relates to the field of deep learning technologies, and more particularly, to a deep learning framework diagnostic system, method, apparatus, device, and medium based on meta-operators.
Background
Deep learning is widely applied in many fields, such as image recognition, speech recognition, machine translation, and autonomous driving. Using the programming interfaces provided by deep learning frameworks, users can easily design, train, and test deep learning models. Today a wide variety of deep learning frameworks flourish and compete, further promoting the practical adoption of deep learning technology in industry.
In deep learning frameworks, computation is typically organized in units of operators, with different operators defining different types of numerical computation. However, each framework may contain errors in its implementation of these operators. Such errors can make the predictions of a deep learning model inaccurate and may even have serious consequences; for example, an error in the deep learning framework used by Uber led to a fatal crash of an autonomous vehicle. Diagnosing deep learning frameworks has therefore received increasing attention in recent years.
There is still little work on testing and error diagnosis for deep learning frameworks. CRADLE is a system that automatically detects and locates errors in deep learning frameworks. Given a deep learning model, CRADLE uses distance metrics to compare the model's outputs when different deep learning frameworks serve as the back end, thereby detecting inconsistent computations, and it tracks the propagation of anomalous data to determine where in the model the inconsistency arises.
However, current deep learning framework diagnosis systems have the following limitations. They can only identify, by comparison and inference, operators whose results differ substantially between frameworks, and they cannot verify the accuracy of the diagnosis. In addition, they can only diagnose computational errors in the inference process of a deep learning model, not those in the training process.
Disclosure of Invention
The present disclosure addresses the technical problem that prior-art deep learning framework diagnosis systems cannot meet the accuracy requirements of deep learning diagnosis.
To achieve the above technical object, the present disclosure provides a deep learning framework diagnosis system based on meta-operators, including:
a user interface module, which provides the user with a programming interface for building a deep learning model and diagnosing the operators in the model;
a model static graph module, which constructs the structure of the model's static computational graph from the user's code;
a debugger comprising a state recorder and a data flow executor, wherein the state recorder records the results of executing the computational graph after each operator replacement, and the data flow executor performs forward computation and backward gradient computation according to the structure of the static computational graph;
a meta-operator module, which provides the basic computation units with which the system performs the various types of computation;
an operator implementation module, which provides the numerical computation and/or multidimensional array operations of the different deep learning frameworks;
and a device management module, which manages the CPU and GPU hardware and provides a unified interface.
To achieve the above technical object, the present disclosure further provides a deep learning framework diagnosis method based on meta-operators, applied to the above deep learning framework diagnosis system based on meta-operators, including:
building a deep learning model with the user interface module of the system, and specifying two different deep learning frameworks whose computation results are to be compared;
diagnosing the operators in the deep learning model one by one, and recording the parameters of the model after each operator replacement;
comparing the difference between the model's computation results before and after an operator's underlying implementation is replaced with a preset threshold, so as to locate the erroneous operator;
comparing the underlying implementations, in the two deep learning frameworks, of the operators whose results differ most before and after replacement, and checking whether those operators are implemented incorrectly in the deep learning frameworks.
Further, diagnosing the operators in the deep learning model one by one specifically includes:
diagnosing the operators in the deep learning model one by one by fine-grained operator replacement.
Further, the parameters specifically include:
the inference result of the model and/or the gradients of the model parameters.
Further, the comparison with the preset threshold specifically measures the difference between the model's successive computation results by the mean absolute deviation.
Further, the underlying implementations, in the two deep learning frameworks, of the operators whose results differ most before and after replacement are compared to check whether those operators are implemented incorrectly in the deep learning frameworks.
Further, the method also includes:
verifying the error localization result, which specifically includes:
implementing the model with each of the two deep learning frameworks, and then replacing the erroneous implementation with the implementation from the other deep learning framework;
and comparing the model's computation results before and after the replacement; if the two results are close, the operator computation error has been located accurately.
To achieve the above technical object, the present disclosure further provides a deep learning framework diagnosis apparatus based on meta-operators, including:
a construction module, used to build a deep learning model and to specify two different deep learning frameworks whose computation results are to be compared;
a diagnosis module, used to diagnose the operators in the deep learning model one by one and to record the parameters of the model after each operator replacement;
a locating module, used to compare the difference between the model's computation results before and after an operator's underlying implementation is replaced with a preset threshold, so as to locate the erroneous operator;
and a search module, used to compare the underlying implementations, in the two deep learning frameworks, of the operators whose results differ most before and after replacement, and to check whether those operators are implemented incorrectly in the deep learning frameworks.
The search module performs the search by manual comparison.
To achieve the above technical object, the present disclosure further provides a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the above deep learning framework diagnosis method based on meta-operators.
To achieve the above technical object, the present disclosure further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above deep learning framework diagnosis method based on meta-operators when executing the computer program.
The beneficial effects of the present disclosure are:
1. The common computation logic of operators in different deep learning frameworks is abstracted into "meta-operators", and a meta-operator can be bound to a concrete implementation without changing the model code, so that fine-grained operator replacement is achieved efficiently.
2. A deep learning framework diagnosis system based on meta-operators is designed and implemented. Through fine-grained operator replacement, it diagnoses and locates computational errors in deep learning frameworks and verifies the error localization.
3. The disclosed system sacrifices some performance to meet its functional requirements, and can effectively detect and locate differences in operator computation between deep learning frameworks.
Drawings
Fig. 1 shows a schematic structural diagram of embodiment 1 of the present disclosure;
Fig. 2 shows a class diagram of the addition meta-operator of embodiment 1 of the present disclosure;
Fig. 3 shows a schematic flow diagram of embodiment 2 of the present disclosure;
Fig. 4 shows a schematic structural diagram of embodiment 3 of the present disclosure;
Fig. 5 shows a schematic structural diagram of embodiment 5 of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
In response to the above needs and challenges, the present disclosure designs a deep learning framework diagnosis system based on meta-operators. The system abstracts the computation logic common to operators in different deep learning frameworks, such as forward computation and gradient computation, into "meta-operators", so that the underlying implementation of a specified operator can be changed without modifying the model's code.
The disclosed system operates as follows. The user builds a deep learning model with the meta-operator interface provided by the system, prepares the model's input data, and specifies the two deep learning frameworks to be diagnosed. The diagnosis system first implements every operator in the model with one of the frameworks, and then replaces the operators' underlying implementations with those of the other framework one at a time, in the reverse of the topological order of the model's computational graph, that is, from the bottom up. This replacement order ensures that an operator receives the same input data before and after its underlying implementation is replaced. After each fine-grained operator replacement, the diagnosis system automatically runs forward computation and backward gradient propagation on the model with the user-specified data as input, and records how the model's output and the gradients of the model parameters change as a result of the replacement. The diagnosis system then sorts the replacements in descending order of the change in model output and of the change in parameter gradients. The more the model's output changes after an operator's underlying implementation is replaced, the more likely it is that the operator's forward computation is implemented incorrectly in one of the frameworks; the more the gradients of the trainable parameters change, the more likely it is that the operator's gradient computation is implemented incorrectly in one of the frameworks.
Finally, the user compares the underlying implementations, in the two deep learning frameworks, of the top-ranked operators to check whether they are implemented incorrectly, and verifies the diagnosis result with the disclosed system.
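The replacement workflow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: the two "frameworks" are hypothetical dictionaries of plain Python functions (`FRAMEWORK_A`, `FRAMEWORK_B`), the model is a simple chain of three operators, only the forward output is compared, and `FRAMEWORK_B` contains a deliberately wrong `scale` so that the loop has something to find.

```python
# Hypothetical stand-ins for two deep learning frameworks: each maps an
# operator name to that framework's implementation of the operator.
FRAMEWORK_A = {
    "scale": lambda xs: [2.0 * x for x in xs],
    "shift": lambda xs: [x + 1.0 for x in xs],
    "square": lambda xs: [x * x for x in xs],
}
FRAMEWORK_B = {
    "scale": lambda xs: [2.5 * x for x in xs],  # injected bug for the demo
    "shift": lambda xs: [x + 1.0 for x in xs],
    "square": lambda xs: [x * x for x in xs],
}

# The model is a chain of operators, listed in topological order.
MODEL = ["scale", "shift", "square"]


def run_model(binding, inputs):
    """Forward pass using whichever implementation each operator is bound to."""
    values = inputs
    for op in MODEL:
        values = binding[op](values)
    return values


def mean_abs_dev(a, b):
    """Mean absolute deviation between two equal-length result vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def diagnose(inputs):
    """Replace operators one by one, last operator first, and rank the changes."""
    binding = {op: FRAMEWORK_A[op] for op in MODEL}  # start fully on framework A
    previous = run_model(binding, inputs)
    diffs = {}
    for op in reversed(MODEL):  # reverse topological order
        binding[op] = FRAMEWORK_B[op]  # swap one operator's implementation
        current = run_model(binding, inputs)
        diffs[op] = mean_abs_dev(previous, current)
        previous = current
    # Largest change first: these operators are the most suspicious.
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


ranking = diagnose([1.0, 2.0])
```

Because operators are swapped starting from the end of the chain, everything upstream of the operator being replaced is still bound to framework A, so that operator sees identical inputs before and after the swap.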
The diagnosis system of the present disclosure introduces the concept of the "meta-operator". Meta-operators are the system's units of computation. Each meta-operator extracts what operators with the same semantics have in common across deep learning frameworks, implements the attributes and methods shared by those operators, and can perform both the forward inference computation and the gradient computation of the operator. The computation of each meta-operator is carried out by the operator computation method of an existing deep learning framework, and the user can specify or change which framework's method is used.
Embodiment 1:
As shown in Fig. 1,
the present disclosure provides a deep learning framework diagnosis system based on meta-operators, comprising:
a user interface module, which provides the user with a programming interface for building a deep learning model and diagnosing the operators in the model;
a model static graph module, which constructs the structure of the model's static computational graph from the user's code;
a debugger comprising a state recorder and a data flow executor, wherein the state recorder records the results of executing the computational graph after each operator replacement, and the data flow executor performs forward computation and backward gradient computation according to the structure of the static computational graph;
a meta-operator module, which provides the basic computation units with which the system performs the various types of computation;
an operator implementation module, which provides the numerical computation and/or multidimensional array operations of the different deep learning frameworks;
and a device management module, which manages the CPU and GPU hardware and provides a unified interface.
A deep learning framework performs computation in units of operators. An operator carries out an operation, such as a numerical computation or a multidimensional array operation, by running its underlying implementation code in the framework. Training and using a deep learning model requires every operator in the model to perform forward computation to produce the inference result, or gradient computation to update the model weights.
Apart from a few operator types such as logical judgments, most operators implemented by any deep learning framework have two kinds of computation logic: forward computation and gradient computation. Based on this shared logic, operators with the same semantics in different deep learning frameworks are abstracted here into the same meta-operator. A meta-operator implements the attributes and methods common to these operators and also provides an interface for calling each framework's implementation of the operator's computation. The user only needs to specify, through a parameter, which framework implements the operator; the meta-operator subclass is then bound through the interface to the underlying framework used in the computation, which achieves fine-grained replacement of the operator's underlying implementation. The diagnosis system takes the meta-operator as its unit of computation.
Taking addition as an example, the class diagram of the addition meta-operator is shown in Fig. 2:
All meta-operators, including addition, inherit from the meta-operator base class, which defines the attributes and methods that every meta-operator has. The framework attribute indicates the deep learning framework that implements the operator's underlying computation. The name attribute is the operator's name and distinguishes the operators in a model. input_nodes is a list of nodes: the input nodes of the node where the operator sits in the static graph of the deep learning model. Similarly, output_nodes is a list of nodes: all the nodes that take the operator's node as an input. output_values holds the node's forward computation result and is assigned once the operator has performed its first forward computation. The computer_output() method performs the operator's forward computation; it is not implemented in the base class and must be implemented by each concrete meta-operator class that inherits from the base class. Similarly, the gradient method performs the backward gradient computation, with the grad parameter holding the current gradient; it too must be implemented in the concrete meta-operator classes that inherit from the base class.
Different deep learning frameworks implement automatic gradient computation differently. For example, when a scalar is added to a matrix, the gradient PyTorch computes for the scalar is itself a scalar, whereas the gradient TensorFlow gives is a matrix. To let the diagnosis system compare the gradient results of different frameworks, the meta-operator post-processes the shape of the gradient when an underlying implementation such as TensorFlow is used, so that the gradient shape stays consistent whichever underlying implementation the meta-operator uses.
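The shape alignment described above can be illustrated with a small sketch. It assumes gradients are held as plain Python numbers or rectangular nested lists, and that a matrix-shaped gradient for a scalar operand is reduced by summing all of its entries; `align_gradient`, `shape_of`, and `flatten` are hypothetical helper names, not functions of PyTorch or TensorFlow.

```python
def flatten(value):
    """Yield every number in a scalar or an arbitrarily nested list."""
    if isinstance(value, (int, float)):
        yield float(value)
    else:
        for item in value:
            yield from flatten(item)


def shape_of(value):
    """Return the shape of a scalar or rectangular nested list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


def align_gradient(grad, operand):
    """Give grad the same shape as the operand it belongs to.

    Assumption: only the scalar-operand case from the text is handled,
    by summing every entry of the matrix-shaped gradient.
    """
    if shape_of(grad) == shape_of(operand):
        return grad  # shapes already agree, nothing to do
    if shape_of(operand) == ():
        return sum(flatten(grad))  # collapse matrix gradient to a scalar
    raise ValueError("unsupported shape mismatch")


# A 2x2 gradient reported for a scalar operand collapses to one number,
# so it can be compared with a backend that already returns a scalar.
matrix_style = align_gradient([[1.0, 1.0], [1.0, 1.0]], 3.0)
scalar_style = align_gradient(4.0, 3.0)
```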
The Add meta-operator inherits from the meta-operator base class and is the meta-operator for addition in the diagnosis system. It has several forward computation and gradient computation methods, each of which calls, through the interface, the operator's computation method in a particular deep learning framework to complete the computation.
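A minimal sketch of the class structure described above might look as follows. The attribute names and `computer_output` follow the text; `backward` is an assumed name, since the text does not name the gradient method, and the attribute is written `output_value` here for a single result. The two backends `framework_a` and `framework_b` are hypothetical plain-Python stand-ins for real deep learning frameworks.

```python
class Constant:
    """Minimal input node that just holds a value."""

    def __init__(self, value):
        self.output_value = value
        self.output_nodes = []


class MetaOperator:
    """Base class: attributes shared by every meta-operator."""

    def __init__(self, name, framework, input_nodes=()):
        self.name = name                # distinguishes operators in a model
        self.framework = framework      # backend that performs the computation
        self.input_nodes = list(input_nodes)
        self.output_nodes = []
        self.output_value = None        # set by the first forward computation
        for node in self.input_nodes:
            node.output_nodes.append(self)

    def computer_output(self):
        raise NotImplementedError  # implemented by concrete meta-operators

    def backward(self, grad):
        raise NotImplementedError  # implemented by concrete meta-operators


class Add(MetaOperator):
    """Addition meta-operator with one forward/gradient pair per backend."""

    # Hypothetical backends: framework name -> (forward, gradient).
    _BACKENDS = {
        "framework_a": (lambda x, y: x + y, lambda grad: (grad, grad)),
        "framework_b": (lambda x, y: sum((x, y)), lambda grad: (1 * grad, 1 * grad)),
    }

    def computer_output(self):
        forward, _ = self._BACKENDS[self.framework]
        x, y = (node.output_value for node in self.input_nodes)
        self.output_value = forward(x, y)
        return self.output_value

    def backward(self, grad=1.0):
        _, gradient = self._BACKENDS[self.framework]
        return gradient(grad)


a, b = Constant(2.0), Constant(3.0)
add = Add("add", "framework_a", [a, b])
result_a = add.computer_output()
add.framework = "framework_b"  # rebind the backend; the model code is unchanged
result_b = add.computer_output()
```

Rebinding is a single attribute assignment in this sketch, which is what makes replacing one operator at a time cheap.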
Data flow executor
Imitating the way TensorFlow executes computational graphs, the system also provides a session class that starts the execution of the data flow in a computational graph. A session must be created before the data flow can be executed.
When the session class executes a computational graph, the user specifies a "target node". Starting from that node, the session finds all the nodes it depends on in the computational graph by breadth-first search. The nodes are computed in the reverse of the breadth-first search order, which ensures that the nodes a node depends on have already been computed when the node itself is computed. The close() method of the session class closes the session by clearing all computation results in the session's computational graph.
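The execution order described above can be sketched as follows. `Node`, `execution_order`, and this `Session` are hypothetical names for illustration. One detail is an assumption of the sketch rather than something stated in the text: a breadth-first search with a plain visited set can place a node too late in the reversed order when the node is reachable through paths of different lengths, so the sketch searches level by level and keeps each node at its deepest level.

```python
class Node:
    """Graph node: either holds a value or computes one from its inputs."""

    def __init__(self, name, input_nodes=(), fn=None, value=None):
        self.name = name
        self.input_nodes = list(input_nodes)
        self.fn = fn                # None for nodes that only hold a value
        self.output_value = value


def execution_order(target):
    """Level-by-level breadth-first search from the target, deepest level first."""
    depth = {}
    frontier = [target]
    level = 0
    while frontier:
        next_frontier, seen = [], set()
        for node in frontier:
            depth[node] = level     # later (deeper) visits overwrite earlier ones
            for dep in node.input_nodes:
                if dep not in seen:
                    seen.add(dep)
                    next_frontier.append(dep)
        frontier = next_frontier
        level += 1
    return sorted(depth, key=lambda n: depth[n], reverse=True)


class Session:
    def __init__(self):
        self._computed = []

    def run(self, target):
        """Compute every dependency of target, then target itself."""
        for node in execution_order(target):
            if node.fn is not None:
                args = [dep.output_value for dep in node.input_nodes]
                node.output_value = node.fn(*args)
                self._computed.append(node)
        return target.output_value

    def close(self):
        """Clear the results this session computed."""
        for node in self._computed:
            node.output_value = None
        self._computed.clear()


# x feeds both mul and add, so it is reachable at two different depths.
x = Node("x", value=2.0)
w = Node("w", value=3.0)
mul = Node("mul", [x, w], fn=lambda a, b: a * b)
add = Node("add", [mul, x], fn=lambda a, b: a + b)

session = Session()
order = [node.name for node in execution_order(add)]
output = session.run(add)
session.close()
```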
State recorder
During diagnosis of a deep learning framework, each time the underlying implementation of a meta-operator is replaced, the data flow executor executes the computational graph to obtain the model's computation results. The state recorder records the forward inference result and the gradient computation result obtained from each execution, compares the results before and after each operator replacement, and finally ranks the meta-operators by the size of the difference caused by replacing them, giving the final diagnosis and localization result.
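A state recorder along these lines could be sketched as below. The class layout, the method names, and the operator labels `op_a`, `op_b`, `op_c` are assumptions made for illustration, and the numbers passed in are made-up inputs rather than measured results. Outputs are flat lists of floats and gradients are lists of such lists, one per parameter.

```python
def mean_abs_dev(a, b):
    """Mean absolute deviation between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class StateRecorder:
    """Keeps the previous run's results and scores each operator replacement."""

    def __init__(self, baseline_output, baseline_grads):
        self._last_output = baseline_output
        self._last_grads = baseline_grads
        self.records = []  # (operator name, output difference, gradient difference)

    def record(self, op_name, output, grads):
        """Compare this run with the previous one, then remember this run."""
        out_diff = mean_abs_dev(self._last_output, output)
        grad_diff = sum(
            mean_abs_dev(before, after)
            for before, after in zip(self._last_grads, grads)
        ) / len(grads)
        self.records.append((op_name, out_diff, grad_diff))
        self._last_output, self._last_grads = output, grads

    def ranking(self, by="output"):
        """Operator names, largest difference first."""
        index = 1 if by == "output" else 2
        ordered = sorted(self.records, key=lambda rec: rec[index], reverse=True)
        return [rec[0] for rec in ordered]


recorder = StateRecorder([1.0, 2.0], [[0.5, 0.5]])
recorder.record("op_c", [1.0, 2.0], [[0.5, 0.5]])  # replacement changed nothing
recorder.record("op_b", [1.5, 2.5], [[0.5, 0.7]])  # replacement changed both
recorder.record("op_a", [1.5, 2.5], [[0.5, 0.7]])  # no further change
```

Each replacement is scored against the run immediately before it, so a difference is attributed only to the operator that was just replaced.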
Model static graph
In general, once the model definition and parameter solving of a deep learning task have been abstracted, a unique computation logic can be determined. This logic is represented by a directed acyclic graph called a computational graph. The computational graph defines how data flows, how data is computed, and how the computations depend on one another.
The system represents a deep learning model in a form similar to the static graph in TensorFlow. By function, the nodes of the computational graph fall into four main categories: compute nodes (operators), storage nodes (variables), constant nodes (constants), and data nodes (providers). Each compute node corresponds to a meta-operator and is mainly responsible for expressing algorithm logic or for flow control. Storage nodes are typically used to store model weights. Constant nodes define fixed values in the computational graph that cannot be trained. Data nodes define attributes of the input data such as type and shape, and are a uniform abstraction of the data.
The edges of a computational graph are directed; an edge's direction is usually the direction of forward computation, and it defines the relationship between two nodes. Edges fall into two categories: data edges, which transmit data, and control edges, which define dependencies. All nodes are connected by data edges or control edges. A node with in-degree 0 has no predecessors and can be executed immediately; a node with in-degree greater than 0 can be executed only after all the nodes it depends on have finished executing.
The structure of the static computational graph stores the nodes of each type in unordered lists, and the relationships between nodes are defined by attributes of each node. The enter() method of a computational graph makes it the current computational graph, and the exit() method restores the computational graph that was current before it.
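One plausible reading of the enter and exit behaviour is Python's context-manager protocol, sketched below. This is an assumption: the text does not say that `__enter__` and `__exit__` are used, and `Graph`, `current_graph`, and `constant` are hypothetical names for illustration.

```python
_current_graph = None  # module-level "current computational graph"


class Graph:
    """Static computational graph holding one list per node category."""

    def __init__(self):
        self.operators = []    # compute nodes
        self.variables = []    # storage nodes (trainable weights)
        self.constants = []    # constant nodes
        self.data_nodes = []   # input data descriptions
        self._previous = None

    def __enter__(self):
        """Make this graph current, remembering the one it replaces."""
        global _current_graph
        self._previous = _current_graph
        _current_graph = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        """Restore the graph that was current before this one."""
        global _current_graph
        _current_graph = self._previous
        return False  # do not swallow exceptions


def current_graph():
    return _current_graph


def constant(value):
    """Create a constant node and register it in the current graph."""
    node = {"kind": "constant", "value": value}
    _current_graph.constants.append(node)
    return node


with Graph() as outer:
    constant(1.0)
    with Graph() as inner:
        constant(2.0)
        inside = current_graph()
    after_inner = current_graph()
after_outer = current_graph()
```

Nodes created inside a `with` block land in that block's graph, and leaving the block hands control back to the enclosing graph.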
User interface module
The diagnosis system provides operator function interfaces similar to those of the individual deep learning frameworks, and a deep learning model is built through similar function calls; one example is a univariate linear regression model whose loss function is the mean squared error.
In the framework diagnosis stage, after specifying the model's input data and the two deep learning frameworks to be examined, the user only needs to start a diagnosis session; the diagnosis system then automatically uses the model the user has built to compare the computation results of the two specified frameworks.
Embodiment 2:
As shown in Fig. 3,
the present disclosure further provides a deep learning framework diagnosis method based on meta-operators, applied to the deep learning framework diagnosis system based on meta-operators of Embodiment 1, the method including:
S201: building a deep learning model with the user interface module of the system, and specifying two different deep learning frameworks whose computation results are to be compared;
S202: diagnosing the operators in the deep learning model one by one, and recording the parameters of the model after each operator replacement;
S203: comparing the difference between the model's computation results before and after an operator's underlying implementation is replaced with a preset threshold, so as to locate the erroneous operator;
S204: comparing the underlying implementations, in the two deep learning frameworks, of the operators whose results differ most before and after replacement, and checking whether those operators are implemented incorrectly in the deep learning frameworks.
Further, diagnosing the operators in the deep learning model one by one specifically includes:
diagnosing the operators in the deep learning model one by one by fine-grained operator replacement.
Further, the parameters specifically include:
the inference result of the model and/or the gradients of the model parameters.
Further, the comparison with the preset threshold specifically measures the difference between the model's successive computation results by the mean absolute deviation.
Further, the underlying implementations, in the two deep learning frameworks, of the operators whose results differ most before and after replacement are compared to check whether those operators are implemented incorrectly in the deep learning frameworks.
Further, the method also includes:
verifying the error localization result, which specifically includes:
implementing the model with each of the two deep learning frameworks, and then replacing the erroneous implementation with the implementation from the other deep learning framework;
and comparing the model's computation results before and after the replacement; if the two results are close, the operator computation error has been located accurately.
After a deep learning model has been built from the diagnosis system's meta-operators and two deep learning frameworks have been specified, the first algorithm yields a set of inference results and a set of gradient computation results. Any two adjacent results in a set are obtained from the same input data, before and after the underlying implementation of one meta-operator is replaced. Adjacent results in each of the two sets are therefore compared to infer whether an operator's implementations in the two deep learning frameworks differ significantly.
The system measures the difference between successive computation results of the model by the mean absolute deviation. For forward inference, let $Y_O$ be the model's forward inference result before the underlying implementation $O$ of a meta-operator is replaced, and $Y_{O'}$ the forward inference result after $O$ is replaced with $O'$. If both result vectors contain $n$ elements, the difference between the two results is:
$$\delta_{C,O,O'} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_{O,i} - Y_{O',i}\right|$$
If, after the underlying implementation $O$ of a meta-operator is replaced with $O'$, the difference $\delta_{C,O,O'}$ between the model's computation results is greater than the user-specified threshold $T_C$, one of the two implementations $O$ and $O'$ in the two deep learning frameworks may contain a computation error; the operator with the computation error is thereby located in the model.
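As a worked illustration of the mean absolute deviation and the threshold test, assume (purely for the example) forward results $Y_O = (0.2, 0.5, 0.3)$ and $Y_{O'} = (0.1, 0.5, 0.6)$ with $n = 3$, and a threshold $T_C = 0.05$:

```latex
\[
\delta_{C,O,O'}
  = \frac{|0.2 - 0.1| + |0.5 - 0.5| + |0.3 - 0.6|}{3}
  = \frac{0.1 + 0 + 0.3}{3}
  \approx 0.133
\]
```

Since $0.133 > T_C = 0.05$, the replaced operator would be flagged as a candidate for a forward computation error in one of the two frameworks.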
For the set of gradient calculation results, the difference between the gradients of each parameter in two adjacent calculations can likewise be measured by the mean absolute deviation. After the bottom-layer implementation of a meta-operator is replaced, the diagnostic system calculates the difference between each parameter's gradient before and after the replacement, and then averages these differences over all parameters in the deep learning model to measure the overall influence of the replacement on the parameter gradients. If gradients can be obtained for m parameters in the deep learning model, then, writing $G_{O,j}$ and $G_{O',j}$ for the gradient of the j-th parameter before and after the replacement (each with $n_j$ elements), the overall gradient difference before and after the bottom-layer implementation O of a meta-operator is replaced with O' is:
$$\delta_{G,O,O'} = \frac{1}{m} \sum_{j=1}^{m} \frac{1}{n_j} \sum_{k=1}^{n_j} \left| G_{O,j,k} - G_{O',j,k} \right|$$
Similarly, if, after the bottom-layer implementation O of a meta-operator is replaced with O', the overall gradient difference $\delta_{G,O,O'}$ of the model is greater than a user-specified threshold $T_G$, there may be an error in the gradient computation of one of the two implementations O or O' in the two deep learning frameworks. In this way, the operator with the gradient computation error is located in the model.
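The overall gradient difference can be sketched in the same way: a mean absolute difference per parameter, then an average over all parameters. The dictionary layout (parameter name mapped to a flattened gradient) and the sample values are assumptions made only for this illustration.

```python
from typing import Dict, Sequence


def overall_gradient_difference(grads_o: Dict[str, Sequence[float]],
                                grads_o_prime: Dict[str, Sequence[float]]) -> float:
    """Average, over all parameters, of the per-parameter mean absolute
    difference between gradients before and after the replacement."""
    if not grads_o or grads_o.keys() != grads_o_prime.keys():
        raise ValueError("both runs must report gradients for the same parameters")
    per_parameter = []
    for name, g in grads_o.items():
        g_prime = grads_o_prime[name]
        if len(g) != len(g_prime) or not g:
            raise ValueError(f"gradient shape mismatch for parameter {name!r}")
        per_parameter.append(sum(abs(a - b) for a, b in zip(g, g_prime)) / len(g))
    return sum(per_parameter) / len(per_parameter)


# Hypothetical flattened gradients for m = 2 parameters.
grads_before = {"w": [1.0, 2.0], "b": [0.5]}
grads_after = {"w": [1.0, 3.0], "b": [0.0]}
delta_g = overall_gradient_difference(grads_before, grads_after)
```

Averaging per parameter first keeps a large tensor from dominating the result purely because it has more elements.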
Finally, the diagnostic system ranks the meta-operator replacements by the difference in forward inference results and by the overall gradient difference before and after each replacement; the operators involved in the top-ranked replacements are more likely to contain errors. Then, for operators that rank highly in either of the two rankings, or whose results before and after replacing the bottom-layer implementation differ by more than the threshold, the user can manually compare the bottom-layer implementation code in the two deep learning frameworks for further confirmation.
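The ranking step can be sketched as two independent sorts plus a threshold filter. The operator names, difference values, and thresholds below are invented for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List


def rank_by_difference(deltas: Dict[str, float]) -> List[str]:
    """Operators sorted from largest to smallest difference."""
    return sorted(deltas, key=lambda op: deltas[op], reverse=True)


def flag_for_review(delta_c: Dict[str, float], delta_g: Dict[str, float],
                    t_c: float, t_g: float) -> List[str]:
    """Operators whose forward or gradient difference exceeds its threshold."""
    return [op for op in delta_c
            if delta_c[op] > t_c or delta_g.get(op, 0.0) > t_g]


# Hypothetical per-operator differences measured after each replacement.
delta_c = {"conv2d": 0.02, "relu": 0.0, "softmax": 0.3}
delta_g = {"conv2d": 0.4, "relu": 0.0, "softmax": 0.1}

forward_ranking = rank_by_difference(delta_c)
gradient_ranking = rank_by_difference(delta_g)
to_review = flag_for_review(delta_c, delta_g, t_c=0.1, t_g=0.2)
```

Keeping the two rankings separate matters here: in the sample data "conv2d" looks harmless in the forward ranking but leads the gradient ranking, so it is still flagged for manual comparison.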
Example three:
the present disclosure can also provide a deep learning framework diagnosis apparatus based on a metaoperator, including:
the building module 301 is used for building a deep learning model and appointing two different deep learning frames for comparing calculation results;
a diagnosis module 302, configured to diagnose each operator in the deep learning model one by one, and record parameters of the model after each operator replacement;
the positioning module 303 is configured to compare the calculation results of the models before and after implementation of the replacement operator bottom layer with a preset threshold value to complete positioning of the error operator;
and the searching module 304 is configured to compare, for operators whose results differ greatly before and after replacement, the bottom-layer implementations of those operators in the two deep learning frameworks, and to determine whether the operators' implementations in the deep learning frameworks contain errors.
The searching module performs this search by manual comparison.
The construction module 301 of the present disclosure is connected to the diagnosis module 302, the positioning module 303, and the search module 304 in sequence.
Example four:
the present disclosure can also provide a computer storage medium having stored thereon a computer program for implementing the steps of the above-described meta-operator based deep learning framework diagnostic method when executed by a processor.
The computer storage medium of the present disclosure may be implemented with a semiconductor memory, a magnetic core memory, a magnetic drum memory, or a magnetic disk memory.
Semiconductor memories are mainly used as the memory elements of computers, and there are two types: MOS and bipolar memory elements. MOS elements offer high integration and a simple process, but are slower. Bipolar elements have a complex process, high power consumption, and low integration, but are fast. The introduction of NMOS and CMOS made MOS memory dominant among semiconductor memories. NMOS is fast; for example, a 1K-bit SRAM from Intel has an access time of 45 ns. CMOS power consumption is low; a 4K-bit CMOS static memory has an access time of 300 ns. The semiconductor memories described above are all random access memories (RAM), that is, their contents can be read and written at random during operation. A semiconductor read-only memory (ROM), by contrast, can be read at random but cannot be written during operation, and is used to store fixed programs and data. ROMs are classified into non-rewritable fuse-type ROM and PROM, and rewritable EPROM.
Magnetic core memory is low in cost and highly reliable, and has more than 20 years of practical use experience. Magnetic core memories were widely used as main memories before the mid-1970s. The storage capacity can reach more than 10 bits, and the fastest access time is 300 ns. The typical international magnetic core memory has a capacity of 4 MS-8 MB and an access cycle of 1.0-1.5 μs. Since the rapid development of semiconductor memory displaced magnetic core memory as main memory, magnetic core memory can still be used as large-capacity expansion memory.
Drum memory is an external memory based on magnetic recording. Although it is being replaced by disk memory, it is still used as external memory for real-time process control computers and for medium and large computers because of its fast information access speed and stable, reliable operation. To meet the needs of small computers and microcomputers, subminiature magnetic drums have emerged, which are small, lightweight, highly reliable, and convenient to use.
Magnetic disk memory is an external memory based on magnetic recording. It combines the advantages of drum and tape storage: its storage capacity is larger than that of a drum, its access speed is faster than that of tape storage, and it can be stored off-line. Magnetic disks are therefore widely used as large-capacity external storage in various computer systems. Magnetic disk memories are generally classified into two main categories: hard disk memories and floppy disk memories.
Hard disk memories come in a wide variety. By structure they are divided into a replaceable type, in which the disk can be exchanged, and a fixed type, in which it cannot. Both replaceable and fixed disks are available in multi-disk and single-disk structures, and are further divided into fixed-head and movable-head types. A fixed-head disk has a small capacity, a low recording density, a high access speed, and a high cost. A movable-head disk has a high recording density (1000 to 6250 bits/inch) and therefore a large capacity, but a lower access speed than a fixed-head disk. The storage capacity of a magnetic disk product can reach several hundred megabytes with a bit density of 6250 bits per inch and a track density of 475 tracks per inch. The disk pack of a multi-disk replaceable disk memory can be exchanged, so such a memory has a large off-line capacity as well as large on-line capacity and high speed, can store large volumes of information data, and is widely applied in online information retrieval systems and database management systems.
Example five:
the present disclosure also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the deep learning framework diagnosis method based on meta-operators described above are implemented.
Fig. 5 is a schematic diagram of the internal structure of the electronic device in one embodiment. As shown in Fig. 5, the electronic device includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium of the electronic device stores an operating system, a database, and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, can cause the processor to implement the deep learning framework diagnosis method based on meta-operators. The processor of the electronic device provides computing and control capabilities and supports the operation of the entire device. The memory of the electronic device may store computer readable instructions that, when executed by the processor, cause the processor to perform the deep learning framework diagnosis method based on meta-operators. The network interface of the electronic device is used for connecting to and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in Fig. 5 is merely a block diagram of the structures relevant to the disclosed solution and does not limit the computing devices to which the solution applies; a particular computing device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.
The electronic device includes, but is not limited to, a smart phone, a computer, a tablet, a wearable smart device, an artificial intelligence device, a mobile power supply, and the like.
The processor may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips. The processor is a Control Unit of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (for example, executing remote data reading and writing programs, etc.) stored in the memory and calling data stored in the memory.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connected communication between the memory and at least one processor or the like.
Fig. 5 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Further, the computer usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created during use of the electronic device, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A deep learning framework diagnostic system based on meta-operators, comprising:
the user interface module is used for providing a programming interface for setting up a deep learning model and diagnosing an operator in the model for a user;
the model static graph module is used for constructing a structure of a model static calculation graph according to the code of the user;
the debugger comprises a state recorder and a data flow executor, wherein the state recorder is used for recording the calculation result executed by the calculation graph after each operator is replaced; the data flow executor is used for executing forward calculation and backward gradient calculation operations according to the structure of the static calculation graph;
the meta-operator module is used for providing a basic calculation unit for the system to execute various types of calculation;
the operator implementation module is used for numerical calculation and/or multidimensional array operation of different deep learning frames;
and the equipment management module is used for managing the CPU and the GPU which are realized by hardware and providing a uniform interface.
2. A deep learning framework diagnosis method based on meta-operators, applied to the deep learning framework diagnosis system based on meta-operators as claimed in claim 1, characterized by comprising:
using the user interface module in the system to build a deep learning model, and appointing two different deep learning frames for comparing calculation results;
diagnosing each operator in the deep learning model one by one, and recording parameters of the model after each operator replacement;
comparing the calculation results of the models before and after the replacement operator bottom layer is realized with a preset threshold value to complete the positioning of the error operator;
and, for operators whose results differ greatly before and after replacement, comparing the bottom-layer implementations of the operators in the two deep learning frameworks to find whether the operators' implementations in the deep learning frameworks contain errors.
3. The method according to claim 2, wherein the diagnosing each operator in the deep learning model one by one specifically comprises:
and diagnosing each operator in the deep learning model one by one in an operator fine-grained replacement mode.
4. The method according to claim 2, wherein the parameters specifically include:
the inference of the model and/or the gradient of the model parameters.
5. The method according to claim 2, wherein the comparison between the calculation results of the model before and after the implementation of the replacement operator bottom layer and the preset threshold is performed by measuring the difference between the calculation results of the model each time by mean absolute deviation.
6. The method according to claim 2, wherein the operators with larger difference in results before and after the implementation of the replacement operator are implemented at the bottom layers of the two deep learning frames, and whether the operators have implementation errors in the deep learning frames is searched.
7. The method of any one of claims 2 to 6, further comprising:
verifying the error localization result again, which specifically includes:
implementing the model separately in each of the two deep learning frameworks, and then replacing the implementation suspected of the error with the corresponding implementation from the other deep learning framework;
and comparing the calculation results of the model before and after the replacement; if the two calculation results are similar, the operator calculation error has been located accurately.
8. A deep learning framework diagnosis device based on meta-operators is characterized by comprising:
the construction module is used for building a deep learning model and appointing two different deep learning frames for comparing calculation results;
the diagnosis module is used for diagnosing each operator in the deep learning model one by one and recording the parameters of the model after each operator replacement;
the positioning module is used for comparing the calculation results of the models before and after the replacement operator bottom layer is realized with a preset threshold value so as to complete the positioning of the error operator;
and the searching module is used for comparing, for operators whose results differ greatly before and after replacement, the bottom-layer implementations of the operators in the two deep learning frameworks, and for finding whether the operators' implementations in the deep learning frameworks contain errors.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps corresponding to the deep learning framework diagnosis method based on meta-operators as claimed in any one of claims 2 to 7 when executing the computer program.
10. A computer storage medium having stored thereon computer program instructions, wherein said program instructions, when executed by a processor, are adapted to carry out the steps corresponding to the meta-operator based deep learning framework diagnostic method as claimed in any one of claims 2 to 7.
CN202110098292.7A 2021-01-25 2021-01-25 Deep learning framework diagnosis system, method, device, equipment and medium based on meta-operator Active CN112766512B (en)

Publications (2)

Publication Number Publication Date
CN112766512A (en) 2021-05-07
CN112766512B (en) 2022-10-28

Family

ID=75707177

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant