CN112912837A - Neural network compiling method, device, equipment, storage medium and program product

Neural network compiling method, device, equipment, storage medium and program product

Info

Publication number
CN112912837A
CN112912837A
Authority
CN
China
Prior art keywords
information
compiling
grouping
neural network
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201880098337.7A
Other languages
Chinese (zh)
Other versions
CN112912837B (en)
Inventor
蒋国跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitmain Technologies Inc
Original Assignee
Bitmain Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitmain Technologies Inc filed Critical Bitmain Technologies Inc
Publication of CN112912837A publication Critical patent/CN112912837A/en
Application granted granted Critical
Publication of CN112912837B publication Critical patent/CN112912837B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G06F 7/487 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Nonlinear Science (AREA)
  • Stored Programmes (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A neural network compiling method, apparatus, device, storage medium and program product. The method comprises: acquiring intermediate representation information corresponding to the neural network, and determining grouping information according to the intermediate representation information (101); and compiling the groups of the neural network online according to the grouping information (102). Because the intermediate representation information is determined in advance according to the layer grouping result of the neural network and therefore contains the layer grouping information of the network, the grouping information can be determined from the acquired intermediate representation information and the neural network compiled on that basis, so the network can be compiled according to the grouping information even when tensor dimensions differ between layers.

Description

Neural network compiling method, device, equipment, storage medium and program product

Technical Field
The present application relates to the field of neural networks, and for example, to a neural network compiling method, apparatus, device, storage medium, and program product.
Background
In recent years, the achievements of deep learning in image recognition, speech recognition and similar tasks have made artificial intelligence one of the most active fields. The core of deep learning is the neural network; to reach high recognition accuracy, networks have grown deeper in the number of layers, which in turn places greater demands on computational power.
To accommodate the high computational demands of neural networks, various neural network processors (also referred to as AI chips) have been proposed. To run a neural network, it must be mapped onto such a dedicated processor for execution, and compiling the neural network is a crucial part of this process.
In the prior art, processor instructions can be generated from a neural network, but this approach supports only a single tensor dimension within the network. To support multiple tensor dimensions, the tensor dimensions of the data transmitted in the network must be determined in advance and instructions compiled for each tensor dimension separately. The prior-art compiling method is therefore limited and inconvenient in practice.
The above background is provided only to aid understanding of the present application and does not constitute an acknowledgement or admission that any of the matter referred to forms part of the common general knowledge relevant to the present application.
Disclosure of Invention
The embodiment of the disclosure provides an online compiling method for a neural network, comprising:
acquiring intermediate representation information corresponding to a neural network, wherein the intermediate representation information is determined in advance according to a result of performing layer grouping on the neural network;
determining grouping information of the neural network according to the intermediate representation information;
and compiling the groups of the neural network online according to the grouping information.
The embodiment of the present disclosure further provides an online compiling apparatus for a neural network, comprising:
an acquisition module, configured to acquire intermediate representation information corresponding to the neural network, wherein the intermediate representation information is determined in advance according to a result of performing layer grouping on the neural network;
a first determining module, configured to determine grouping information of the neural network according to the intermediate representation information;
and a compiling module, configured to compile the groups of the neural network online according to the grouping information.
The embodiment of the disclosure also provides a computer comprising the above neural network online compiling device.
The embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions configured to execute the above neural network online compiling method.
The embodiment of the present disclosure also provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the above neural network online compiling method.
An embodiment of the present disclosure further provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor, which when executed by the at least one processor, cause the at least one processor to perform the neural network online compilation method described above.
The neural network compiling method, apparatus, device, storage medium and program product provided by the disclosure comprise: acquiring intermediate representation information corresponding to a neural network, the intermediate representation information being determined in advance according to the result of layer grouping the network; determining the grouping information of the neural network according to the intermediate representation information; and compiling the groups of the neural network online according to the grouping information. Because the intermediate representation information is determined in advance from the layer grouping result and therefore carries the layer grouping information of the network, the grouping information can be derived from the acquired intermediate representation information and the neural network compiled on that basis, so the network can be compiled according to the grouping information even when tensor dimensions differ between layers.
Drawings
One or more embodiments are illustrated by way of example in the accompanying figures, which do not constitute a limitation on the embodiments; elements with the same reference numerals denote like elements, and:
FIG. 1 is a flow chart illustrating a neural network online compilation method in accordance with an exemplary embodiment of the present invention;
FIG. 2 is a flowchart illustrating a neural network online compilation method according to another exemplary embodiment of the present invention;
FIG. 2A is a flowchart illustrating online compilation of a group based on segmented data according to an exemplary embodiment of the present invention;
FIG. 3 is a block diagram illustrating an online neural network compiling apparatus according to an exemplary embodiment of the present invention;
FIG. 4 is a block diagram illustrating an online neural network compiling apparatus according to another exemplary embodiment of the present invention;
FIG. 5 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
So that the features and elements of the disclosed embodiments can be understood in detail, reference is made below to the embodiments, some of which are illustrated in the appended drawings. In the following description, numerous details are set forth for purposes of explanation in order to provide a thorough understanding of the disclosed embodiments; however, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices are shown in simplified form to keep the drawings uncluttered.
The method provided by this embodiment compiles a neural network based on a grouping of the network's layers performed in advance. The specific grouping method may be an existing one from the prior art; some layer grouping considerations are described in this embodiment, but any method that groups the layers of the neural network can be used, and this embodiment does not limit it.
Some neural network processors employ local storage that can be managed by software: the computing layers of the neural network are deployed by software to compute in the local storage so as to achieve high performance. Local storage here means the internal storage of the processor. In order to place the layers of the neural network in local storage for calculation as much as possible, avoiding high-overhead global storage accesses, and to utilize the local storage more efficiently, the layers of the neural network can be grouped and fused, and the layers then run in units of groups. The method provided by this embodiment compiles the neural network based on this layer grouping result so as to support data of different tensor dimensions.
Fig. 1 is a flowchart illustrating an online neural network compiling method according to an exemplary embodiment of the present invention.
As shown in fig. 1, the online neural network compiling method provided in this embodiment includes:
step 101, obtaining intermediate representation information corresponding to the neural network, and determining grouping information according to the intermediate representation information.
The intermediate representation information is determined in advance according to the result of performing layer grouping on the neural network.
Specifically, in the method provided in this embodiment, layer grouping may be performed on the neural network in advance, and the intermediate representation information determined according to the grouping result. The layer grouping may be executed by a grouping device while the method provided by this embodiment is executed by a compiling device; the grouping device and the compiling device may be the same device or different devices.
Further, a storage unit for storing the intermediate representation information may be provided, and the compiling device may read the intermediate representation information from it.
In practical application, the intermediate representation information may carry an identifier corresponding to the neural network, so that the intermediate representation information for the neural network to be compiled can be obtained according to that identifier.
The intermediate representation information may carry several kinds of layer grouping information of the neural network. The layer grouping information may include: the number of layer groups (for example, the layers of the network are divided into 50 groups); the input data tensor dimensions, output data tensor dimensions and identifiers of the input and output data of the whole network, where a tensor is a multidimensional data storage form; the layers contained in each group; and the input and output data tensor dimensions and data identifiers corresponding to each layer.
Specifically, if the amount of data handled by the neural network is large, the tensor dimensions of the data transmitted between layers can be further cut during layer grouping so that the data can be processed group by group. In that case the intermediate representation information may also include inter-layer tensor dimension cutting information, as well as time step information: which layers are calculated at each time step, which tensors need to be transferred from external storage to local storage, and so on.
The format protocol of the intermediate representation information may also be agreed in advance; for example, the top level of the information describes the neural network, the second level describes each group in the network, and the third level describes each layer within each group.
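For illustration only, the three-level format just described might be modelled as nested records. The following is a minimal sketch in Python; the names NetworkIR, GroupIR and LayerIR and all field names are hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LayerIR:
    """Third level: one layer inside a group."""
    name: str
    input_ids: List[str]      # identifiers of the layer's input data
    output_ids: List[str]     # identifiers of the layer's output data
    input_dims: List[int]     # input data tensor dimensions
    output_dims: List[int]    # output data tensor dimensions

@dataclass
class GroupIR:
    """Second level: one layer group of the neural network."""
    group_id: int
    layers: List[LayerIR]
    # optional inter-layer tensor dimension cutting information
    cut_info: Dict[str, List[int]] = field(default_factory=dict)
    # time step -> actions at that step (layer computations, data transfers)
    time_steps: Dict[int, List[str]] = field(default_factory=dict)

@dataclass
class NetworkIR:
    """Top level: the whole neural network."""
    network_id: str           # identifier used to look the IR up
    num_groups: int           # e.g. the layers are divided into 50 groups
    input_dims: List[int]     # input data tensor dimensions of the network
    output_dims: List[int]    # output data tensor dimensions of the network
    groups: List[GroupIR]
```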
Information can be extracted from the intermediate representation information to determine the grouping information of the neural network: for example, how many groups the layers of the network are divided into, which layers belong to each group, which layers run at each time step, and which data tensors are transmitted. Specifically, the information required for compilation may be extracted from the intermediate representation information as the grouping information, and the grouping information may include the information corresponding to each group.
Step 102, compiling the groups of the neural network online according to the grouping information.
Further, the actions performed at each time step, for example which layer is calculated at that step, may be determined according to the grouping information, and the corresponding layer calculation performed at the corresponding time step.
In practical application, the grouping information may further include the input and output data tensors of each layer, so the input tensors corresponding to the layers can be acquired from this input/output information and each group of the neural network calculated on that basis. Even if the tensor dimensions transmitted between layers differ, the method provided by this embodiment can process the inter-layer tensors according to the intermediate representation information, without compiling the neural network for each inter-layer tensor dimension in advance.
In the compiling process, the groups may be compiled one by one: for example, the first group is compiled first and then the second. Each group may be compiled based on the grouping information.
Further, the grouping information may include data transfer information tied to time steps; for example, if the output tensor M of layer A is to be written to external storage at the 10th time step, the output tensor of layer A can be written to external storage based on this information.
Specifically, the control processor may determine the specific execution instructions according to the grouping information and send them to the neural network processor, which executes the corresponding instructions, thereby running the neural network.
Furthermore, the control processor and the neural network processor may be arranged in the same neural network processor chip and be responsible for different functions, with the chip arranged in the compiling device.
In practical application, if the grouping information further includes inter-layer tensor dimension cutting information, the control processor may also generate corresponding data cutting instructions according to this information, so that the neural network processor cuts the inter-layer tensors accordingly.
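Putting steps 101 and 102 together, the per-group, per-time-step flow described above might look like the following sketch. It reuses the hypothetical NetworkIR structure from the earlier example; build_instruction and send stand in for whatever interface the control processor actually exposes, which the patent does not specify.

```python
def compile_network_online(ir: NetworkIR, control_processor) -> None:
    """Sketch of step 102: compile the network group by group,
    walking the time steps recorded in the grouping information."""
    for group in ir.groups:                    # first group, then second, ...
        for t in sorted(group.time_steps):     # actions in time-step order
            for action in group.time_steps[t]:
                # hypothetical calls: turn one action (a layer computation
                # or a tensor transfer) into an execution instruction and
                # hand it to the neural network processor
                instruction = control_processor.build_instruction(action)
                control_processor.send(instruction)
```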
The neural network online compiling method provided by this embodiment comprises: acquiring intermediate representation information corresponding to a neural network, the intermediate representation information being determined in advance according to the result of layer grouping the network; determining the grouping information of the neural network according to the intermediate representation information; and compiling the groups of the neural network online according to the grouping information. Because the intermediate representation information is determined in advance from the layer grouping result and therefore carries the layer grouping information of the network, the grouping information can be derived from it and the neural network compiled on that basis, so the network can be compiled according to the grouping information even when tensor dimensions differ between layers.
Fig. 2 is a flowchart illustrating an online neural network compiling method according to another exemplary embodiment of the present invention.
As shown in fig. 2, the method for online compiling a neural network provided in this embodiment includes:
step 201, obtaining intermediate representation information corresponding to the neural network, and determining grouping information of the neural network according to the intermediate representation information.
The intermediate representation information is determined in advance according to the result of performing layer grouping on the neural network.
The specific principle and implementation of step 201 are similar to those of step 101, and are not described herein again.
Specifically, the grouping information includes input data tensor information, output data tensor information, and preset tensor information of the data processed by each group.
The input data tensor information may include the input tensor information corresponding to each group as well as that of each layer; likewise, the output data tensor information may include the output tensor information corresponding to each group as well as that of each layer.
Step 202, determining grouped input data tensor information and preset tensor information according to the grouped information.
Specifically, in the compiling process the groups may be compiled in order, with the group being processed taken as the current group. The data information corresponding to the current group is acquired from the grouping information, and the input data tensor information and the preset tensor information of the current group are determined.
The input data of the current group may be determined from the input data tensor information; for example, if data A is input into the current group, an instruction to acquire data A from external storage may be generated and sent to the neural network processor, which performs the action.
The preset tensor information refers to the largest data dimensions the current group can process at a time, and may specifically cover two dimensions: the batch size and the height slice. The preset tensor information may also be cutting information for the input data tensor, for example how many cuts to make along the batch-size and height dimensions.
Step 203, segmenting the group's input data according to the preset tensor information and the input data tensor information to obtain segmented data.
Further, the input data tensor information includes an identifier of the input data, and the preset tensor information includes a preset tensor dimension. The input data of the current group can be determined from the input data identifier, and the data dimensions processed by the group at a time are determined by the preset tensor dimension.
In practical application, the input data corresponding to the current group is determined from the data identifier and segmented according to the preset tensor dimension, so that the dimensions of each segment are smaller than or equal to the preset tensor dimension. The input data can be cut along the two dimensions of batch size and height slice so that the cut size matches the preset tensor dimension. If the preset tensor information is instead cutting information for the input data tensor, the tensor dimensions of the input data are cut directly according to that cutting information.
Specifically, the preset tensor dimension of each group's input data is determined when the neural network is layer-grouped: for a group whose input data tensor dimensions equal the preset tensor dimension, the group can compute on that input and the storage unit in the processor can hold the data generated by the computation. If the input data dimensions are larger than the preset tensor dimension, the data generated during the group's actual computation would exceed the storage space in the processor and normal computation would be impossible.
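As an illustration of step 203, the sketch below cuts an input tensor along the batch and height dimensions so that every segment is no larger than the preset tensor dimensions. The NCHW layout and the NumPy API are assumptions for the example; the patent does not fix a concrete data layout.

```python
import numpy as np

def segment_input(data: np.ndarray, max_batch: int, max_height: int) -> list:
    """Cut `data` (layout N, C, H, W) along batch (N) and height (H)
    so that each segment's dimensions are <= the preset tensor dims."""
    n, _, h, _ = data.shape
    segments = []
    for n0 in range(0, n, max_batch):          # batch-size dimension
        for h0 in range(0, h, max_height):     # height-slice dimension
            segments.append(data[n0:n0 + max_batch, :, h0:h0 + max_height, :])
    return segments

# e.g. a (4, 3, 224, 224) input with max_batch=2, max_height=112
# yields 4 segments, each of shape (2, 3, 112, 224)
```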
Step 204, compiling the group online according to the segmented data.
Further, an instruction to acquire the segmented data may be generated and sent to the neural network processor, together with instructions to compute the group piece by piece from the segmented data. The neural network processor reads the segmented data according to these instructions and calculates the current group on it.
FIG. 2A is a flowchart illustrating online compilation of a group based on segmented data according to an exemplary embodiment of the present invention.
In practical application, the grouping information further includes global tensor information and time step information.
At this time, as shown in fig. 2A, step 204 further includes:
step 2041, one of the segmented data is read according to a preset reading rule.
The rule for reading the segmented data may be preset; for example, if the input data is divided into 4 parts along the height dimension, the first part may be read first, then the second, and so on up to the fourth.
Specifically, when the input data is divided along several dimensions, the segmented data can be read according to the association of the segments across those dimensions. For example, if the input data is divided into 4 parts in total, cut along both batch size and data height, the four segments can be denoted (batch, height) = (1,1), (1,2), (2,1), (2,2) and read in the order (1,1), then (1,2), then (2,1), and finally (2,2).
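The fixed reading order in this example can be produced mechanically. The sketch below is one hypothetical realization in which the batch index varies slowest and the height index fastest, matching (1,1), (1,2), (2,1), (2,2).

```python
from itertools import product

def read_schedule(batch_parts: int, height_parts: int) -> list:
    """Preset reading rule (an assumption): iterate height within batch."""
    return list(product(range(1, batch_parts + 1),
                        range(1, height_parts + 1)))

assert read_schedule(2, 2) == [(1, 1), (1, 2), (2, 1), (2, 2)]
```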
Step 2042, determining the step size and the processing intercept point for processing the segmented data according to the group's global tensor information.
Further, the grouping information may include global tensor information for each group, and the global tensor information of the current group may be acquired from it.
In practical applications, neural network layers such as convolutional and pooling layers have a convolution kernel size, a data padding size and a kernel movement stride. When a group processes segmented data, the data is actually handled by the several layers the group contains, and a layer does not consume a segment in one pass but scans it repeatedly, each scan covering the kernel size; after one scan, the start position of the next scan is determined by the kernel movement stride. The kernel sizes, padding sizes and stride parameters of the layers in a group are synthesized to determine the group's overall kernel size, padding size and stride, and these group-level parameters are the global tensor information. The grouping information may include the global tensor information corresponding to each group.
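The patent states that the per-layer parameters are synthesized into group-level values but does not give the combination rule. One standard choice, offered here only as a plausible reading, is the usual receptive-field composition for stacked convolution/pooling layers:

```python
def fuse_conv_params(layers: list) -> tuple:
    """Combine per-layer (kernel, stride, padding) tuples, listed in
    execution order, into one group-level (kernel, stride, padding).
    One spatial dimension only, for brevity; this composition rule is
    an assumption, not quoted from the patent."""
    k_eff, s_eff, p_eff = 1, 1, 0
    for k, s, p in layers:
        k_eff += (k - 1) * s_eff   # receptive field grows by k-1 input steps
        p_eff += p * s_eff         # padding as seen at the group boundary
        s_eff *= s                 # strides multiply through the stack
    return k_eff, s_eff, p_eff

# two 3x3 stride-1 convolutions scan like a single 5x5 stride-1 window
assert fuse_conv_params([(3, 1, 0), (3, 1, 0)]) == (5, 1, 0)
```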
When processing of a piece of segmented data starts, the current time step can be denoted t, and the step size and processing intercept point of the first pass over the data are determined from the global tensor information: that is, how much data the first pass covers and where it stops. From that intercept point and the global tensor information, the data and intercept point of the next pass are determined in turn.
Step 2043, determining the partition sub-data according to the step size and the processing intercept point, and generating instructions according to the time step information in the grouping information and the partition sub-data.
The partition sub-data is determined from the step size and the processing intercept point. For the first piece, the data between the start of the segment and the intercept point is taken as the partition sub-data; for each later piece, the data within one step size after the previous processing intercept point is taken.
In the method provided in this embodiment, the grouping information may further include the time step information corresponding to each group, i.e. the specific actions to execute at each time step, for example transferring data or performing a layer calculation.
The time step information for step t+1 of the current group can then be acquired and an action instruction generated for it: a data transfer instruction if data is to be moved, or an instruction to perform a layer calculation on the partition sub-data if layer calculation information is included. The generated instructions may be sent to the neural network processor for execution.
Specifically, in step 2043 the current time step is incremented by 1 and the action instruction corresponding to that time step is generated, which may be a data transfer instruction or a layer calculation instruction. At some time step a layer in the group may process the partition sub-data and output other data, and other layers in the group may take that output as input for further processing. All time steps corresponding to the group are therefore traversed until the time step information of the current group is exhausted, at which point the group has finished processing the partition sub-data.
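Steps 2042-2043 can be pictured as walking one segment with a window: the first piece of partition sub-data runs from the segment start to the first processing intercept point, and each later piece covers one step beyond the previous intercept point. A hypothetical one-dimensional sketch:

```python
def sub_segments(seg_len: int, first_window: int, step: int):
    """Yield (begin, end) index pairs for the partition sub-data of one
    segment. `first_window` and `step` stand for the values derived
    from the global tensor information; a 1-D assumption for brevity."""
    end = min(first_window, seg_len)
    yield 0, end                               # first partition sub-data
    while end < seg_len:                       # intercept not yet at the tail
        begin, end = end, min(end + step, seg_len)
        yield begin, end                       # step range after the intercept

# e.g. a segment of length 10, first window 4, step 3 yields
# (0, 4), (4, 7), (7, 10)
```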
Step 2044, determining whether the segmented data has been fully processed according to the processing intercept point.
After the instructions have been generated for the partition sub-data, it can be judged whether the processing intercept point corresponding to the sub-data lies at the tail of the segmented data. If so, the currently acquired segmented data is considered fully processed; if not, it is determined that the currently acquired segmented data has not been fully processed.
If not, continue to step 2042 until the currently acquired segmentation data is processed.
If the processing of the currently acquired segmentation data is completed, step 2045 is executed.
Step 2045, determining whether all the segmented data have been processed. If so, the current group can be considered compiled and the next group can be compiled; otherwise, step 2041 is executed until all the segmented data are processed.
Each piece of segmented data may have a corresponding identifier, and processed pieces may be marked, so that whether any unprocessed segmented data remains can be determined by comparing the marked pieces against all the pieces.
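One simple realization of this bookkeeping, assuming each segment carries an identifier, is to keep a set of processed identifiers and compare it with the full set:

```python
def all_segments_processed(processed: set, all_ids: set) -> bool:
    """Step 2045 check: every segment identifier has been marked."""
    return processed == all_ids

all_ids = {(1, 1), (1, 2), (2, 1), (2, 2)}
processed = set()
processed.add((1, 1))          # mark a segment once it is processed
print(all_segments_processed(processed, all_ids))  # False
```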
Based on steps 2041-2045, a group can be compiled online according to the input data tensor information, the preset tensor information, the global tensor information and the time step information. Steps 202-204 may be performed for each group in the neural network to compile every group.
Optionally, in the method provided by this embodiment, the intermediate representation information may further include the number of groups N of the neural network.
In that case, the method provided in this embodiment may further include, after step 201:
Step 205, determining the number of groups whose compilation is complete, and determining from this number and the group count whether all groups have been compiled.
If not, step 202 is executed again, i.e. the step of compiling the groups of the neural network online according to the grouping information continues, specifically by compiling the next group.
If so, all the groups have been compiled and the compiling process may end.
In actual application, a compiling identifier may be set to determine whether all the groups are compiled.
Before step 202, the compiling identifier may be initialized to 0.
The compiling identifier marks the number of groups already compiled, which is initially 0.
After step 204, specifically after it is determined in step 2045 that all the segmented data have been processed, 1 is added to the compiling identifier, and the identifier is compared with the value N. If they are consistent, the compiling process ends; if not, step 202 is executed again, and after step 204 finishes, 1 is again added to the compiling identifier and the comparison with N repeated, until the identifier equals N. Based on the method provided by this embodiment, all the groups in the neural network are compiled and none is omitted.
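Steps 205-208 amount to a counter loop over the groups. A minimal sketch, again reusing the hypothetical NetworkIR structure and assuming num_groups equals the length of the group list:

```python
def compile_all_groups(ir: NetworkIR, compile_one_group) -> None:
    """Sketch of steps 205-208: count compiled groups and compare
    the compiling identifier with the group number N."""
    compiled = 0                           # compiling identifier initialized to 0
    while compiled != ir.num_groups:       # compare identifier with N
        compile_one_group(ir.groups[compiled])
        compiled += 1                      # add 1 after each group is compiled
    # identifier == N: all groups compiled, none omitted
```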
FIG. 3 is a block diagram illustrating an online neural network compiling apparatus according to an exemplary embodiment of the present invention.
As shown in FIG. 3, the neural network online compiling device provided in this embodiment includes:
an obtaining module 31, configured to obtain intermediate representation information corresponding to a neural network, where the intermediate representation information is determined according to a grouping result by performing layer grouping on the neural network in advance;
a first determining module 32, configured to determine grouping information of the neural network according to the intermediate representation information;
and the compiling module 33 is configured to compile the groups of the neural network online according to the grouping information.
The online neural network compiling device provided by this embodiment comprises an acquisition module for acquiring intermediate representation information corresponding to a neural network, the intermediate representation information being determined in advance according to the result of layer grouping the network; a first determining module for determining the grouping information of the neural network according to the intermediate representation information; and a compiling module for compiling the groups of the neural network online according to the grouping information. In the apparatus provided by this embodiment, the intermediate representation information is determined in advance from the layer grouping result and therefore carries the layer grouping information of the network; the grouping information can be derived from the acquired intermediate representation information and the neural network compiled on that basis, so the network can be compiled according to the grouping information even when tensor dimensions differ between layers.
The specific principle and implementation of the neural network online compiling device provided by this embodiment are similar to those of the embodiment shown in fig. 1, and are not described herein again.
FIG. 4 is a block diagram illustrating an online neural network compiling apparatus according to another exemplary embodiment of the present invention.
As shown in FIG. 4, on the basis of the above embodiment, in the neural network online compiling apparatus provided by this embodiment the grouping information includes input data tensor information, output data tensor information, and preset tensor information of the data processed by each group;
the compiling module 33 includes:
a first determining unit 331 configured to determine, according to the grouping information, tensor information of input data of the group and preset tensor information;
a dividing unit 332, configured to segment the group's input data according to the preset tensor information and the input data tensor information to obtain segmented data;
and a compiling unit 333, configured to compile the group online according to the segmented data.
Optionally, the input data tensor information includes an input data identifier; the preset tensor information comprises preset tensor dimensionality;
the dividing unit 332 is specifically configured to:
determining input data of the group according to the input data identification;
and segmenting the input data according to the preset tensor dimension to obtain segmented data, wherein the dimension of the segmented data is smaller than or equal to the preset tensor dimension.
Optionally, the grouping information further includes global tensor information and time step information;
the compiling unit 333 is specifically configured to:
reading one piece of the segmented data according to a preset reading rule;
determining the step size and the processing intercept point for processing the segmented data according to the group's global tensor information;
determining the partition sub-data according to the step size and the processing intercept point, and generating instructions according to the time step information in the grouping information and the partition sub-data;
and judging whether the segmented data has been fully processed according to the processing intercept point; if not, continuing to execute the step of determining the step size and the processing intercept point according to the group's global tensor information.
Optionally, if the compiling unit 333 determines that the segmented data has been fully processed, the compiling module 33 further includes a second determining unit 334 configured to determine whether all the segmented data have been processed; if not, the compiling unit 333 continues to execute the step of reading one piece of the segmented data according to the preset reading rule.
Optionally, the intermediate representation information includes the number of groups of the neural network;
the device further comprises:
a second determining module 34, configured to determine the number of groups whose compilation is complete and to determine from this number and the group count whether all groups have been compiled; if not, the compiling module 33 continues to execute the step of compiling the groups of the neural network online according to the grouping information.
The second determining module 34 is specifically configured to:
initializing a compiling identifier to be 0;
after the compiling module 33 compiles a group of the neural network online according to the grouping information, the second determining module 34 obtains a new compiling identifier by adding 1 to the compiling identifier;
the second determining module 34 determining whether all groups have been compiled according to the compiling count and the group count includes:
comparing the compiling identifier with the number of groups; if they are consistent, determining that all groups have been compiled.
The embodiment of the disclosure also provides a computer, which comprises the neural network online compiling device.
The embodiment of the present disclosure further provides a computer-readable storage medium, in which computer-executable instructions are stored, and the computer-executable instructions are configured to execute the neural network online compiling method.
The disclosed embodiments also provide a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the above neural network online compilation method.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
FIG. 5 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present invention.
As shown in FIG. 5, the electronic device provided in this embodiment includes:
at least one processor 50 (one processor 50 is taken as an example in FIG. 5) and a memory 51, and may further include a communication interface 52 and a bus 53. The processor 50, the communication interface 52 and the memory 51 may communicate with each other via the bus 53. The communication interface 52 may be used for information transfer. The processor 50 may call logic instructions in the memory 51 to perform the neural network online compiling method of the above embodiments.
In addition, the logic instructions in the memory 51 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
The memory 51 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 50 executes the software program, the instructions and the modules stored in the memory 51, thereby executing the functional application and the data processing, that is, implementing the neural network online compiling method in the above method embodiment.
The memory 51 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. Further, the memory 51 may include a high-speed random access memory, and may also include a nonvolatile memory.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. And the aforementioned storage medium may be a non-transitory storage medium comprising: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes, and may also be a transient storage medium.
Although the terms "first," "second," etc. may be used in this application to describe various elements, these elements should not be limited by these terms; the terms are only used to distinguish one element from another. For example, a first element could be termed a second element and, similarly, a second element could be termed a first element, without changing the meaning of the description, so long as all occurrences of the "first element" are renamed consistently and all occurrences of the "second element" are renamed consistently. The first and second elements are both elements but may not be the same element.
The words used in this application are words of description only and do not limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application encompasses any and all possible combinations of one or more of the associated listed items. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The various aspects, implementations, or features of the described embodiments can be used alone or in any combination. Aspects of the described embodiments may be implemented by software, hardware, or a combination of software and hardware. The described embodiments may also be embodied by a computer-readable medium having computer-readable code stored thereon, the computer-readable code comprising instructions executable by at least one computing device. The computer readable medium can be associated with any data storage device that can store data which can be read by a computer system. Exemplary computer readable media can include read-only memory, random-access memory, CD-ROMs, HDDs, DVDs, magnetic tape, and optical data storage devices, among others. The computer readable medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
The above description of the technology may refer to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration embodiments in which the described embodiments may be practiced. These embodiments, while described in sufficient detail to enable those skilled in the art to practice them, are non-limiting; other embodiments may be utilized and changes may be made without departing from the scope of the described embodiments. For example, the order of operations described in a flowchart is non-limiting, and thus the order of two or more operations illustrated in and described in accordance with the flowchart may be altered in accordance with several embodiments. As another example, in several embodiments, one or more operations illustrated in and described with respect to the flowcharts are optional or may be eliminated. Additionally, certain steps or functions may be added to the disclosed embodiments, or two or more steps may be permuted in order. All such variations are considered to be encompassed by the disclosed embodiments and the claims.
Additionally, terminology is used in the foregoing description of the technology to provide a thorough understanding of the described embodiments. However, no unnecessary detail is required to implement the described embodiments. Accordingly, the foregoing description of the embodiments has been presented for purposes of illustration and description. The embodiments presented in the foregoing description and the examples disclosed in accordance with these embodiments are provided solely to add context and aid in the understanding of the described embodiments. The above description is not intended to be exhaustive or to limit the described embodiments to the precise form disclosed. Many modifications, alternative uses, and variations are possible in light of the above teaching. In some instances, well known process steps have not been described in detail in order to avoid unnecessarily obscuring the described embodiments.

Claims (18)

  1. An online neural network compiling method, comprising:
    acquiring intermediate representation information corresponding to a neural network, wherein the intermediate representation information is determined in advance according to a result of performing layer grouping on the neural network;
    determining grouping information of the neural network according to the intermediate representation information;
    and compiling the groups of the neural network online according to the grouping information.
  2. The method according to claim 1, wherein the grouping information includes input data tensor information, output data tensor information, and preset tensor information of the data processed by the group;
    the compiling the groups of the neural network online according to the grouping information comprises:
    determining the input data tensor information and the preset tensor information of the group according to the grouping information;
    segmenting the group's input data according to the preset tensor information and the input data tensor information to obtain segmented data;
    and compiling the group online according to the segmented data.
  3. The method of claim 2, wherein the input data tensor information comprises an input data identification; the preset tensor information comprises preset tensor dimensionality;
    the segmenting the group's input data according to the preset tensor information and the input data tensor information to obtain segmented data comprises:
    determining input data of the group according to the input data identification;
    and segmenting the input data according to the preset tensor dimension to obtain segmented data, wherein the dimension of the segmented data is smaller than or equal to the preset tensor dimension.
  4. The method of claim 2, wherein the grouping information further comprises: global tensor information, time step information;
    the compiling the group online according to the segmented data comprises:
    reading one piece of the segmented data according to a preset reading rule;
    determining a step size and a processing intercept point for processing the segmented data according to the group's global tensor information;
    determining partition sub-data according to the step size and the processing intercept point, and generating instructions according to the time step information in the grouping information and the partition sub-data;
    and judging whether the segmented data has been fully processed according to the processing intercept point; if not, continuing to execute the step of determining the step size and the processing intercept point according to the group's global tensor information.
  5. The method according to claim 4, wherein if it is determined that the segmented data has been fully processed, it is determined whether all the segmented data have been processed; if not, the step of reading one piece of the segmented data according to the preset reading rule continues to be executed.
  6. The method of claim 1, wherein the intermediate representation information includes a number of groups of the neural network;
    the method further comprises:
    determining the number of groups whose compilation is complete, and determining from this number and the number of groups whether all groups have been compiled; if not, continuing to execute the step of compiling the groups of the neural network online according to the grouping information.
  7. The method of claim 6,
    the determining the number of groups whose compilation is complete comprises:
    initializing a compiling identifier to 0;
    after a group of the neural network is compiled online according to the grouping information, obtaining a new compiling identifier by adding 1 to the compiling identifier;
    the determining whether all groups have been compiled comprises:
    comparing the compiling identifier with the number of groups; if they are consistent, determining that all groups have been compiled.
  8. An online neural network compiling device, comprising:
    the acquisition module is used for acquiring intermediate representation information corresponding to the neural network, wherein the intermediate representation information is determined in advance according to a result of performing layer grouping on the neural network;
    a first determining module, configured to determine grouping information of the neural network according to the intermediate representation information;
    and the compiling module is used for compiling the groups of the neural network online according to the grouping information.
  9. The apparatus according to claim 8, wherein the grouping information includes input data tensor information, output data tensor information, and preset tensor information of the data processed by the group;
    the compiling module comprises:
    a first determining unit configured to determine, according to the grouping information, input data tensor information and preset tensor information of the group;
    the dividing unit is used for segmenting the group's input data according to the preset tensor information and the input data tensor information to obtain segmented data;
    and the compiling unit is used for compiling the group online according to the segmented data.
  10. The apparatus of claim 9, wherein the input data tensor information comprises an input data identification; the preset tensor information comprises preset tensor dimensionality;
    the segmentation unit is specifically configured to:
    determining input data of the group according to the input data identification;
    and segmenting the input data according to the preset tensor dimension to obtain segmented data, wherein the dimension of the segmented data is smaller than or equal to the preset tensor dimension.
  11. The apparatus of claim 9, wherein the grouping information further comprises: global tensor information, time step information;
    the compiling unit is specifically configured to:
    reading one piece of the segmented data according to a preset reading rule;
    determining a step size and a processing intercept point for processing the segmented data according to the group's global tensor information;
    determining partition sub-data according to the step size and the processing intercept point, and generating instructions according to the time step information in the grouping information and the partition sub-data;
    and judging whether the segmented data has been fully processed according to the processing intercept point; if not, continuing to execute the step of determining the step size and the processing intercept point according to the group's global tensor information.
  12. The apparatus of claim 11, wherein if the compiling unit determines that the segmented data has been fully processed, the compiling module further includes a second determining unit configured to determine whether all the segmented data have been processed; if not, the compiling unit continues to perform the step of reading one piece of the segmented data according to the preset reading rule.
  13. The apparatus of claim 8, wherein the intermediate representation information comprises a number of groups of the neural network;
    the device further comprises:
    and the second determining module is used for determining the compiling quantity after compiling is finished, determining whether all the groups are compiled according to the compiling quantity and the grouping quantity, and if not, continuing to execute the step of compiling the groups of the neural network on line according to the grouping information by the compiling module.
  14. The apparatus of claim 13, wherein the second determining module is specifically configured to:
    initialize a compiling identifier to 0; and
    after the compiling module compiles a group of the neural network online, add 1 to the compiling identifier to obtain a new compiling identifier;
    wherein determining, by the second determining module, whether all the groups have been compiled according to the compiling quantity and the grouping quantity comprises:
    comparing the compiling identifier with the grouping quantity, and if they are equal, determining that all the groups have been compiled.
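Claims 13 and 14 reduce to a counter that starts at 0, is incremented after each online compile, and is compared with the grouping quantity. A stateful sketch follows; the class and method names are hypothetical:

```python
class SecondDeterminingModule:
    def __init__(self, grouping_quantity: int):
        self.grouping_quantity = grouping_quantity
        self.compiling_identifier = 0          # claim 14: initialized to 0

    def on_group_compiled(self) -> None:
        self.compiling_identifier += 1         # add 1 after each online compile

    def all_compiled(self) -> bool:
        # Equal identifier and grouping quantity: every group is compiled.
        return self.compiling_identifier == self.grouping_quantity

# Drive the compile loop until every group has been compiled.
tracker = SecondDeterminingModule(grouping_quantity=3)
while not tracker.all_compiled():
    tracker.on_group_compiled()                # stands in for one online compile
```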
  15. A computer comprising the apparatus of any one of claims 8-14.
  16. An electronic device, comprising:
    at least one processor; and
    a memory communicatively coupled to the at least one processor; wherein,
    the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method of any one of claims 1-7.
  17. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed, perform the method of any one of claims 1-7.
  18. A computer program product, characterized in that the computer program product comprises a computer program stored on a computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to carry out the method of any one of claims 1-7.
CN201880098337.7A 2018-11-08 2018-11-08 Neural network compiling method, device, equipment, storage medium and program product Active CN112912837B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/114543 WO2020093304A1 (en) 2018-11-08 2018-11-08 Method, apparatus, and device for compiling neural network, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112912837A true CN112912837A (en) 2021-06-04
CN112912837B CN112912837B (en) 2024-02-13

Family

ID=70611242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880098337.7A Active CN112912837B (en) 2018-11-08 2018-11-08 Neural network compiling method, device, equipment, storage medium and program product

Country Status (2)

Country Link
CN (1) CN112912837B (en)
WO (1) WO2020093304A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114428616A (en) * 2022-04-01 2022-05-03 北京清微智能信息技术有限公司 Method for optimizing replacement cost in neural network compiling stage

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114385867A (en) * 2020-10-16 2022-04-22 中科寒武纪科技股份有限公司 Apparatus, method and computer program product for processing multidimensional data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355244A (en) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 CNN (convolutional neural network) construction method and system
CN106547605A (en) * 2016-09-29 2017-03-29 乐视控股(北京)有限公司 Compiled code sending method and compiled code dispensing device
CN107103113A (en) * 2017-03-23 2017-08-29 中国科学院计算技术研究所 Towards the Automation Design method, device and the optimization method of neural network processor
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array
CN108363559A (en) * 2018-02-13 2018-08-03 北京旷视科技有限公司 Multiplication processing method, equipment and the computer-readable medium of neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297B (en) * 2016-06-02 2019-07-12 中国科学院自动化研究所 The acceleration of depth convolutional neural networks based on tensor resolution and compression method
CN106650922B (en) * 2016-09-29 2019-05-03 清华大学 Hardware neural network conversion method, computing device, software and hardware cooperative system
CN107239315B (en) * 2017-04-11 2019-11-15 赛灵思公司 Programming model towards neural network heterogeneous computing platforms

Also Published As

Publication number Publication date
CN112912837B (en) 2024-02-13
WO2020093304A1 (en) 2020-05-14

Similar Documents

Publication Publication Date Title
CN110298035B (en) Word vector definition method, device, equipment and storage medium based on artificial intelligence
US11677686B2 (en) Packet forwarding method, apparatus, device, and system
JP6352958B2 (en) Graph index search device and operation method of graph index search device
CN103617226B (en) A kind of matching regular expressions method and device
KR102207408B1 (en) Method, apparatus and computer readable medium for image processing
US12026607B1 (en) Memory operation for systolic array
CN112912837A (en) Neural network compiling method, device, equipment, storage medium and program product
CN114968612B (en) Data processing method, system and related equipment
KR20210014561A (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN108875914B (en) Method and device for preprocessing and post-processing neural network data
CN115344805A (en) Material auditing method, computing equipment and storage medium
KR102305575B1 (en) Method and system for highlighting similar areas using similarity between images
CN108897858B (en) Distributed cluster index fragmentation evaluation method and device and electronic equipment
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN110502975B (en) Batch processing system for pedestrian re-identification
CN109213972B (en) Method, device, equipment and computer storage medium for determining document similarity
CN113377998A (en) Data loading method and device, electronic equipment and storage medium
CN117196015A (en) Operator execution method, device, electronic equipment and storage medium
CN117313166A (en) Data filling method, device, computer equipment and storage medium
CN110019295B (en) Database retrieval method, device, system and storage medium
CN112955906B (en) Neural network layer grouping method, device, equipment, storage medium and program product
CN114387588A (en) Character recognition method and device, electronic equipment and storage medium
CN109934037B (en) Two-dimensional code image finding method, positioning method, server and storage medium
CN113840169A (en) Video processing method and device, computing equipment and storage medium
CN113537392A (en) Similar image identification method and device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant