CN111415004A - Method and apparatus for outputting information - Google Patents
- Publication number: CN111415004A (application CN202010184800.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Embodiments of the present disclosure disclose a method and apparatus for outputting information. One embodiment of the method comprises: acquiring an input feature map of at least one input channel and a convolution kernel; performing non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list; for each input channel, performing a traversal multiply-add calculation between each non-zero element in the non-zero element index list and the input feature map of that channel to obtain a slice (partial plane) of the output feature map corresponding to that non-zero element; and for each output channel, accumulating the slices corresponding to all non-zero elements in the non-zero element index list to obtain and output the output feature map of that channel. This implementation exploits the sparsity of convolutional neural networks to accelerate computation, greatly reduces the amount of calculation, improves the execution speed of visual detection tasks, and is suitable for general-purpose hardware architectures.
Description
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for outputting information.
Background
Convolution refers to an operator in a deep neural network for visual detection tasks; it completes feature extraction by sliding different convolution kernels over an input image and running multiply-add operations. Convolutional neural networks have become the most widely used model in deep learning. As deep learning is applied more and more on mobile devices, including phones, automobiles, Internet-of-Things hardware, and other computationally constrained devices, and since most of the computation of a convolutional neural network lies in its convolutions, efficient implementation of convolution computation becomes especially necessary.

Han et al. have shown that the weight parameters of convolutional neural networks generally exhibit sparsity of 20% to 80%; that is, after the model is optimized by pruning, its sparsity reaches roughly 20% to 80% without affecting inference accuracy. Sparsity is the ratio of the number of zero-valued elements among the model's weight parameters to the total number of weight parameters. These zero values add computational work and consume significant compute power, yet contribute nothing to the result. To compute convolutional neural networks efficiently on mobile devices, the convolution operator of a sparse model must be re-implemented, and an efficient calculation method that avoids zero-value computation must be found.
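For concreteness, the sparsity ratio just defined can be sketched as follows (a minimal illustration; the function name and kernel values are ours, not from the patent):

```python
def sparsity(weights):
    """Ratio of zero-valued elements to the total number of weight parameters."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)

# A 3x3 convolution kernel with 7 of its 9 weights pruned to zero.
kernel = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
```

Here `sparsity(kernel)` is 7/9, i.e. roughly 78%, within the 20% to 80% range cited above.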
Parameter sparsification by weight pruning usually damages the kernel's regular structure: convolution can no longer be computed by sliding a regular convolution kernel, so most convolution accelerations (im2col and Winograd) become ineffective, and most current sparse-convolution acceleration approaches target specific hardware.

Existing sparse acceleration schemes for specific hardware can obtain speedups by exploiting hardware features, but they lack generality, their application scenarios are too limited, and their deployment costs are high. Currently, most mobile deep learning is still deployed on general-purpose CPU devices, so a general sparse-convolution acceleration method that is easy to integrate and quick to deploy is still needed.
Disclosure of Invention
Embodiments of the present disclosure propose methods and apparatuses for outputting information.
In a first aspect, an embodiment of the present disclosure provides a method for outputting information, including: acquiring an input feature map of at least one input channel and a convolution kernel; performing non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list; for each input channel, performing a traversal multiply-add calculation between each non-zero element in the non-zero element index list and the input feature map of that channel to obtain a slice of the output feature map corresponding to that non-zero element; and for each output channel, accumulating the slices of the output feature map corresponding to all non-zero elements in the non-zero element index list to obtain and output the output feature map of that channel.

In some embodiments, the method further comprises: accelerating, through instruction reordering and data prefetching, the inner loop of the sparse convolution calculation and the traversal over the output feature map.

In some embodiments, performing non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list includes: traversing the sparse weight parameter matrix of the convolution kernel and storing the weight parameters greater than a preset threshold into the non-zero element index list.

In some embodiments, before performing the non-zero index extraction, the method further comprises: defining lists in_ptr, w_ptr, out_ptr, and out_cnt for storing indexes, used respectively to store the starting data address of the input feature map, the non-zero weight parameter address, the starting data address of the output feature map, and the number of slices corresponding to the starting data address of the output feature map.

In some embodiments, after performing the non-zero index extraction, the method further comprises: storing the addresses of the non-zero elements into the w_ptr list; calculating, from the corresponding input channel, the channel starting address of the input feature map and the starting offset of the sliding traversal, obtaining the starting address of each traversal, and storing it into in_ptr; and judging whether the starting address of the output feature map for the current calculation has already been stored into out_ptr: if not, storing it into out_ptr; otherwise, increasing the corresponding slice count by 1 and updating it in the out_cnt list.

In some embodiments, performing, for each non-zero element in the non-zero element index list, a traversal multiply-add calculation between the non-zero element and the input feature map of the input channel includes: traversing the out_ptr list and reading the count value at the corresponding subscript of the out_cnt list; looping over the count value and reading the input index list in_ptr and the weight index list w_ptr according to the loop variable; and traversing each pixel in the input feature map of the input channel, computing a slice by multiplying and adding the non-zero element with the pixel at the corresponding position of the input feature map in turn.
In a second aspect, an embodiment of the present disclosure provides an apparatus for outputting information, including: an acquisition unit configured to acquire an input feature map of at least one input channel and a convolution kernel; an index extraction unit configured to perform non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list; a calculation unit configured to perform, for each input channel, a traversal multiply-add calculation between each non-zero element in the non-zero element index list and the input feature map of that channel, obtaining a slice of the output feature map corresponding to that non-zero element; and an output unit configured to accumulate, for each output channel, the slices of the output feature map corresponding to all non-zero elements in the non-zero element index list to obtain and output the output feature map of that channel.

In some embodiments, the apparatus further comprises an acceleration unit configured to: accelerate, through instruction reordering and data prefetching, the inner loop of the sparse convolution calculation and the traversal over the output feature map.

In some embodiments, the index extraction unit is further configured to: traverse the sparse weight parameter matrix of the convolution kernel and store the weight parameters greater than a preset threshold into the non-zero element index list.

In some embodiments, the apparatus further comprises a definition unit configured to: before the non-zero index extraction, define lists in_ptr, w_ptr, out_ptr, and out_cnt for storing indexes, used respectively to store the starting data address of the input feature map, the non-zero weight parameter address, the starting data address of the output feature map, and the number of slices corresponding to the starting data address of the output feature map.

In some embodiments, the apparatus further comprises a storage unit configured to: after the non-zero index extraction, store the addresses of the non-zero elements into the w_ptr list; calculate, from the corresponding input channel, the channel starting address of the input feature map and the starting offset of the sliding traversal, obtain the starting address of each traversal, and store it into in_ptr; and judge whether the starting address of the output feature map for the current calculation has already been stored into out_ptr: if not, store it into out_ptr; otherwise, increase the corresponding slice count by 1 and update it in the out_cnt list.

In some embodiments, the calculation unit is further configured to: traverse the out_ptr list and read the count value at the corresponding subscript of the out_cnt list; loop over the count value and read the input index list in_ptr and the weight index list w_ptr according to the loop variable; and traverse each pixel in the input feature map of the input channel, computing a slice by multiplying and adding the non-zero element with the pixel at the corresponding position of the input feature map in turn.
In a third aspect, an embodiment of the present disclosure provides an electronic device for outputting information, including: one or more processors; a storage device having one or more programs stored thereon which, when executed by one or more processors, cause the one or more processors to implement a method as in any one of the first aspects.
In a fourth aspect, embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements a method as in any one of the first aspect.
The method and apparatus for outputting information can exploit the sparsity of convolutional neural networks to accelerate computation and greatly reduce the amount of calculation. This implementation of the sparse convolution operator suits general-purpose hardware architectures such as CPUs and DSPs. It depends on no specific hardware or software, is easy to integrate into a deep learning inference framework, effectively reduces the amount of computation, and improves the execution speed of visual detection tasks.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting information, according to the present disclosure;
FIGS. 3a-3c are schematic diagrams of the convolution process of a method for outputting information according to the present disclosure;
FIG. 4 is a flow diagram of yet another embodiment of a method for outputting information in accordance with the present disclosure;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for outputting information according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the disclosed method for outputting information or apparatus for outputting information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as an image recognition application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen and supporting image browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background recognition server that provides recognition functions for images displayed on the terminal devices 101, 102, 103. The background recognition server may analyze and otherwise process the received data such as the image recognition request, and feed back a processing result (for example, the category of people and objects in the image) to the terminal device.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the method for outputting information provided by the embodiment of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for outputting information is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting information in accordance with the present disclosure is shown. The method for outputting information comprises the following steps:
Step 201: acquire an input feature map of at least one input channel and a convolution kernel.

In the present embodiment, the execution subject of the method for outputting information (e.g., the server shown in fig. 1) may receive an original image to be recognized from a terminal through a wired or wireless connection. The server stores the original convolutional neural network model, which the present disclosure may optimize. The original image can serve as an input feature map; if the convolutional neural network has multiple layers, the feature map output by one convolutional layer serves as the input feature map of the next. The convolution kernel consists of a trained weight parameter matrix.

Step 202: perform non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list.

In this embodiment, the essence of sparse acceleration is to keep zero-valued elements out of the calculation; the sparse weight parameter matrix of the convolution kernel can therefore be traversed and the indices of the elements whose weight is not 0 extracted directly to generate the non-zero element index list.
In some optional implementations of this embodiment, for further pruning, the sparse weight parameter matrix of the convolution kernel is traversed and only the weight parameters greater than a preset threshold are stored in the non-zero element index list.
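A hedged sketch of this extraction step (the function and parameter names are ours; the default threshold of 0 reproduces plain non-zero extraction, and a positive threshold implements the further-pruning variant):

```python
def extract_nonzero_indices(kernel, threshold=0.0):
    """Traverse the kernel matrix and return (row, col, value) tuples for
    weights whose magnitude exceeds the threshold."""
    return [
        (i, j, v)
        for i, row in enumerate(kernel)
        for j, v in enumerate(row)
        if abs(v) > threshold
    ]

kernel = [
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 0.2],
    [0.0, 0.0, 0.0],
]
index_list = extract_nonzero_indices(kernel)
```

With this kernel, `index_list` holds only the two non-zero positions, and raising the threshold to 0.3 prunes the 0.2 weight as well.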
In some optional implementations of this embodiment, lists in_ptr, w_ptr, out_ptr, and out_cnt for storing indexes are defined, used respectively to store the starting data address of the input feature map, the non-zero weight parameter address, the starting data address of the output feature map, and the number of slices corresponding to the starting data address of the output feature map.
As shown in fig. 3a, the address of each non-zero element is stored in the w_ptr list.
A channel starting address base_input_addr of the input feature map and a sliding-traversal starting offset are calculated from the corresponding input channel; the starting address of each traversal, base_input_addr + offset, is obtained and stored into in_ptr. Here offset is the position offset, within the input feature map, of the first non-zero value of the convolution kernel's weight parameters.
The offset is calculated as follows. Assuming the convolution kernel shape is r×s (r rows and s columns) and the width of the input feature map is w:

for (int i = 0; i < r; i++)
    for (int j = 0; j < s; j++)
        offset = i * w + j;
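The loop above can be checked with a small sketch (assuming a 3×3 kernel and an input feature map of width 5; the function name is ours):

```python
def kernel_offsets(r, s, w):
    """offset = i*w + j: the linear offset, in a row-major input feature map
    of width w, of the input pixel aligned with kernel position (i, j)."""
    return {(i, j): i * w + j for i in range(r) for j in range(s)}

offsets = kernel_offsets(3, 3, 5)
```

For example, the kernel element at row 1, column 2 lands 1*5 + 2 = 7 pixels past the traversal's starting address.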
Whether the starting address of the output feature map for the current calculation has already been stored into out_ptr is then judged; if not, it is stored into out_ptr; otherwise, the corresponding slice count is increased by 1 and updated in the out_cnt list.
Step 203: for each input channel, perform a traversal multiply-add calculation between each non-zero element in the non-zero element index list and the input feature map of that channel, obtaining a slice of the output feature map corresponding to the non-zero element.

In this embodiment, a convolution operator is usually implemented as several nested loops, and the traversal order of the loops can be defined as a data flow. Different data flows yield different memory access performance and hence different computational performance; here, the data flow of the convolution operator is optimized specifically for the computational characteristics of sparse convolution.
The premise of data-flow optimization is that changing the loop order of the convolution must not affect the correctness of the result. Taking a single-channel input feature map and a single element of the convolution kernel and performing a traversal multiply-add yields one slice of the output feature map, called a partial plane. This slice is not a complete feature map; only when the partial planes corresponding to all elements of the convolution kernel are accumulated is the complete feature map obtained. The specific process is shown in fig. 3b.
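The partial plane for one non-zero element can be sketched as follows (a stride of 1 and no padding are assumed; the names are ours, not the patent's):

```python
def partial_plane(inp, weight, di, dj, out_h, out_w):
    """Slice of the output feature map contributed by the single kernel
    element `weight` at kernel position (di, dj): the weight times the
    correspondingly shifted window of the input feature map."""
    return [[weight * inp[oi + di][oj + dj] for oj in range(out_w)]
            for oi in range(out_h)]

inp = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
# With a 3x3 kernel on this 4x4 input the output is 2x2; take the
# kernel element at position (1, 1) with value 2.0.
plane = partial_plane(inp, 2.0, 1, 1, 2, 2)
```

Each output pixel of the slice is just the weight times one shifted input pixel, which is why zero-valued weights can be skipped without touching the input at all.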
Starting from the initial address, each pixel of the input feature map is traversed and multiplied in turn by one element of the convolution kernel; the resulting output feature map is one slice.
For a convolution kernel of size r×s whose sparse weight tensor has sparsity `sparsity`, the average number of partial planes is r*s*(1-sparsity), which removes a large amount of computation compared with non-sparse convolution.
the specific data flow for the sparse convolution is:
(1) traverse the out_ptr list and read the count value cnt at the corresponding subscript of the out_cnt list;

(2) loop cnt times, reading the input index list in_ptr and the weight index list w_ptr according to the loop variable;

(3) traverse the w×h input feature map and compute the partial planes in turn.
Step 204: for each output channel, accumulate the slices of the output feature map corresponding to all non-zero elements in the non-zero element index list to obtain and output the output feature map of that channel.
In this embodiment, the partial planes are accumulated to obtain the output feature map, as shown in fig. 3c. Slice accumulation is direct addition of corresponding pixels: each slice is w×h, and the pixels at corresponding positions are summed.
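Putting steps 202-204 together, accumulating partial planes over the non-zero elements reproduces the dense convolution result. A single-channel sketch (stride 1, no padding; this is our illustration of the scheme, not the patent's literal pointer-list implementation):

```python
def sparse_conv2d(inp, kernel):
    """Sum of partial planes over the non-zero kernel elements only;
    zero-valued weights never enter the multiply-add."""
    r, s = len(kernel), len(kernel[0])
    out_h, out_w = len(inp) - r + 1, len(inp[0]) - s + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(r):
        for j in range(s):
            v = kernel[i][j]
            if v == 0.0:
                continue  # skip zero weights entirely
            # Accumulate this element's partial plane into the output.
            for oi in range(out_h):
                for oj in range(out_w):
                    out[oi][oj] += v * inp[oi + i][oj + j]
    return out

inp = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
kernel = [[1.0, 0.0],
          [0.0, 2.0]]  # 50% sparse: only 2 of 4 weights contribute
out = sparse_conv2d(inp, kernel)
```

Only two of the four kernel weights generate partial planes here, matching the r*s*(1-sparsity) count given above.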
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for outputting information is shown. The process 400 of the method for outputting information includes the steps of:
Step 401: acquire an input feature map of at least one input channel and a convolution kernel.

Step 402: perform non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list.
Steps 401-402 are substantially the same as steps 201-202 and are therefore not described again.
In this embodiment, lists in_ptr, w_ptr, out_ptr, and out_cnt for storing indexes are defined, used respectively to store the starting data address of the input feature map, the non-zero weight parameter address, the starting data address of the output feature map, and the number of slices corresponding to the starting data address of the output feature map. For the specific process, refer to step 202.
In this embodiment, data prefetching can significantly improve memory access performance and keep the instruction pipeline from stalling. The inner loop of the sparse convolution calculation and the traversal over the feature map can be accelerated through instruction reordering and data prefetching.
In this embodiment, each output index corresponds to an output index count. For example, if the count is 4, step 406 needs to be performed 4 times.
Step 406: take one weight index and one input index at a time, and perform a traversal multiply-add between the non-zero element pointed to by the weight index and the pixels starting at the input feature map address pointed to by the input index, obtaining one slice of the output feature map.
In this embodiment, to further increase computation speed, acceleration can be achieved through instruction reordering and data prefetching. For example, if the count of the output index is at least 4, then 4 input indexes and 4 weight indexes are taken, and the non-zero elements pointed to by the 4 weight indexes are each traversal-multiply-added with the input feature map regions pointed to by the 4 input indexes, yielding several slices of the output feature map. Traversal here means iterating from each input index, because each index represents a starting address.
The specific implementation process is as follows:
(1) take 4 input indexes and 4 non-zero weight parameters at a time for calculation;
(2) Loading an input index 1, prefetching an input index 2, and calculating a partial plane of the output index 1;
(3) prefetching an input index 3, and calculating a partial plane of an output index 2;
(4) prefetching an input index 4, and calculating a partial plane of an output index 3;
"fetch" refers to the traversal of an index from an index array.
"load" is the loading of data from memory into a register based on an index (which is in fact a memory address).
"prefetch" relates to the instruction pipeline. When a multiply-add program is compiled into assembly instructions, many memory-access and compute instructions enter the pipeline, and data must be loaded from memory into registers before a multiply or add can execute. If each computation issues its memory access on demand, efficiency suffers because memory access is slow. Prefetching loads the data required by the next compute instruction into registers while the current compute instruction executes; by the time it finishes, the next data is already in registers, saving memory-access time and achieving acceleration.
The weight parameters are loaded into registers only once and need not be re-read while traversing the feature map, so they require no prefetching; the input indexes represent starting addresses that must be traversed, so prefetching is needed to accelerate the traversal. Input index 1 corresponds to the first non-zero weight parameter.
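The 4-way schedule above concerns registers and the hardware prefetcher, which Python cannot express; the following sketch mirrors only its loop structure (process four partial planes per iteration, then a tail loop for the remainder), as our assumption about how the unrolling is organized:

```python
def accumulate_planes_by_four(planes):
    """Accumulate equally sized partial planes four at a time, with a
    scalar tail loop for the leftover planes."""
    out_h, out_w = len(planes[0]), len(planes[0][0])
    out = [[0.0] * out_w for _ in range(out_h)]
    k = 0
    while k + 4 <= len(planes):
        # Body of the unrolled loop: four planes per pass.
        p0, p1, p2, p3 = planes[k], planes[k + 1], planes[k + 2], planes[k + 3]
        for oi in range(out_h):
            for oj in range(out_w):
                out[oi][oj] += (p0[oi][oj] + p1[oi][oj]
                                + p2[oi][oj] + p3[oi][oj])
        k += 4
    for p in planes[k:]:  # fewer than 4 planes remain
        for oi in range(out_h):
            for oj in range(out_w):
                out[oi][oj] += p[oi][oj]
    return out

# Five 1x2 partial planes: four go through the unrolled body, one through the tail.
planes = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]], [[9.0, 10.0]]]
```

In the real implementation each pass of the unrolled body would also overlap its loads with prefetches for the next pass; here the unrolling only illustrates the grouping.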
If the count of the output index is less than 4, a single input index and weight index are taken for traversal calculation: the non-zero element pointed to by the weight index is traversal-multiply-added with the input feature map region pointed to by the input index, yielding one slice of the output feature map.
The detailed process is substantially the same as step 203, and therefore, the detailed description is omitted.
Step 407 is substantially the same as step 204, and therefore is not described in detail.
If output indexes remain unread, steps 404 to 407 continue to be performed.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for outputting information in the present embodiment embodies steps accelerated by instruction out-of-order and data prefetching. Therefore, the scheme described in the embodiment can further improve the execution speed of the visual detection task.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2 and is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for outputting information of the present embodiment includes: an acquisition unit 501, an index extraction unit 502, a calculation unit 503, and an output unit 504. The acquisition unit 501 is configured to acquire an input feature map and a convolution kernel of at least one input channel; the index extraction unit 502 is configured to perform non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list; the calculation unit 503 is configured to, for each input channel, perform a traversal multiply-add calculation on each non-zero element in the non-zero element index list and the input feature map of that input channel, to obtain a tangent plane of the output feature map corresponding to the non-zero element of the input channel; and the output unit 504 is configured to, for each output channel, accumulate the tangent planes of the output feature maps corresponding to all non-zero elements in the non-zero element index list to obtain and output the output feature map of the output channel.
In the present embodiment, the specific processing of the acquisition unit 501, the index extraction unit 502, the calculation unit 503, and the output unit 504 of the apparatus 500 for outputting information may refer to step 201, step 202, step 203, and step 204, respectively, in the corresponding embodiment of fig. 2.
In some optional implementations of this embodiment, the apparatus 500 further comprises an acceleration unit (not shown in the drawings) configured to: in the inner loop of the sparse convolution calculation and the traversal calculation process of the output feature map, perform acceleration through out-of-order instruction execution and data prefetching.
In some optional implementations of this embodiment, the index extraction unit 502 is further configured to: traverse the sparse weight parameter matrix of the convolution kernel, and store the weight parameters greater than a preset threshold value into the non-zero element index list.
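The threshold-based index extraction performed by this unit might look as follows in C. This is a sketch under assumptions not fixed by the text: a row-major flat kernel layout, and comparison on the weight's magnitude via `fabsf` (pruned kernels typically contain negative weights, although the text says only "greater than a preset threshold").

```c
#include <math.h>
#include <stddef.h>

/* Scan a flattened sparse convolution kernel of `len` weights and record
 * the flat indices of those whose magnitude exceeds `thresh` into `idx`
 * (which must hold at least `len` entries).  Returns the number of
 * non-zero entries found. */
size_t extract_nonzero(const float *w, size_t len, float thresh,
                       size_t *idx) {
    size_t cnt = 0;
    for (size_t i = 0; i < len; ++i)
        if (fabsf(w[i]) > thresh)   /* magnitude test is an assumption */
            idx[cnt++] = i;
    return cnt;
}
```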
In some optional implementations of this embodiment, the apparatus 500 further comprises a defining unit (not shown in the drawings) configured to: before non-zero index extraction, define lists in_ptr, w_ptr, out_ptr and out_cnt for storing indexes, which are respectively used for storing the starting data addresses of the input feature map, the non-zero weight parameter addresses, the starting data addresses of the output feature map, and the numbers of tangent planes corresponding to the starting data addresses of the output feature map.
In some optional implementations of this embodiment, the apparatus 500 further comprises a storage unit (not shown in the drawings) configured to: after non-zero index extraction, store the addresses of the non-zero elements into the w_ptr list; calculate, according to the corresponding input channel, the channel starting address of the input feature map and the starting offset of the sliding traversal, obtain the starting address of each traversal, and store it into the in_ptr list; and judge whether the output feature map starting address corresponding to the current calculation has already been stored into out_ptr; if not, store it into out_ptr, otherwise increment the corresponding tangent-plane count by 1 and update it in the out_cnt list.
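The out_ptr/out_cnt bookkeeping described above, i.e. append a new output starting address or bump the tangent-plane count of one already recorded, can be sketched as follows. The linear search and the caller-managed list length are illustrative assumptions, not details given by the text.

```c
#include <stddef.h>

/* Record an output-plane start address.  If `addr` is already present in
 * out_ptr, its tangent-plane count in out_cnt is incremented; otherwise
 * it is appended with a count of 1.  `n` is the current list length; the
 * (possibly grown) length is returned.  Arrays must have spare capacity. */
size_t record_out(float **out_ptr, size_t *out_cnt, size_t n, float *addr) {
    for (size_t i = 0; i < n; ++i) {
        if (out_ptr[i] == addr) {   /* already stored: bump the count */
            out_cnt[i]++;
            return n;
        }
    }
    out_ptr[n] = addr;              /* first occurrence: append */
    out_cnt[n] = 1;
    return n + 1;
}
```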
In some optional implementations of this embodiment, the calculation unit 503 is further configured to: traverse the out_ptr list and read the count value at the corresponding subscript in the out_cnt list; traverse over the count value, reading the input index list in_ptr and the weight index list w_ptr according to the loop variable of the traversal; and traverse each pixel in the input feature map of the input channel, sequentially computing a tangent plane by multiplying and adding the non-zero element with the pixel at the corresponding position of the input feature map.
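The traversal multiply-add itself, pairing each non-zero weight with its input starting address and accumulating one tangent plane at a time into the output plane, might be sketched as below. Unit stride and an output plane of `plane` contiguous elements are assumptions for illustration; the real addressing depends on kernel size, stride, and padding.

```c
#include <stddef.h>

/* Inner loops of the sparse convolution for one output plane: walk its
 * `cnt` tangent planes, each pairing one non-zero weight (w_ptr) with one
 * input start address (in_ptr), and accumulate elementwise multiply-adds
 * into the output plane of `plane` elements. */
void accumulate_planes(float *out, size_t plane,
                       const float *const *in_ptr, const float *w_ptr,
                       size_t cnt) {
    for (size_t k = 0; k < cnt; ++k)        /* one tangent plane per pair */
        for (size_t p = 0; p < plane; ++p)  /* traverse the feature map   */
            out[p] += w_ptr[k] * in_ptr[k][p];
}
```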
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device/server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
In general, the following may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, magnetic tape, hard disk, etc.; and communication devices 609.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire an input feature map and a convolution kernel of at least one input channel; perform non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list; for each input channel, perform a traversal multiply-add calculation on each non-zero element in the non-zero element index list and the input feature map of the input channel to obtain a tangent plane of the output feature map corresponding to the non-zero element of the input channel; and for each output channel, accumulate the tangent planes of the output feature maps corresponding to all the non-zero elements in the non-zero element index list to obtain and output the output feature map of the output channel.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor including an acquisition unit, an index extraction unit, a calculation unit, and an output unit. The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as a "unit that acquires an input feature map and a convolution kernel of at least one input channel".
The foregoing description covers only the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions in which the above features are interchanged with features of similar function disclosed in (but not limited to) the present disclosure.
Claims (14)
1. A method for outputting information, comprising:
acquiring an input feature map and a convolution kernel of at least one input channel;
performing non-zero index extraction on the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list;
for each input channel, performing traversal multiplication and addition calculation on each non-zero element in the non-zero element index list and the input feature map of the input channel to obtain a tangent plane of an output feature map corresponding to the non-zero element of the input channel;
and for each output channel, accumulating the tangent planes of the output feature maps corresponding to all the non-zero elements in the non-zero element index list to obtain and output the output feature map of the output channel.
2. The method of claim 1, wherein the method further comprises:
in the inner loop of the sparse convolution calculation and the traversal calculation process of the output feature map, performing acceleration through out-of-order instruction execution and data prefetching.
3. The method of claim 1, wherein the performing non-zero index extraction on the sparse weight parameter matrix as the convolution kernel to obtain a non-zero element index list comprises:
traversing the sparse weight parameter matrix of the convolution kernel, and storing the weight parameters greater than a preset threshold value into the non-zero element index list.
4. The method of claim 1, wherein prior to performing non-zero index extraction, the method further comprises:
defining lists in_ptr, w_ptr, out_ptr and out_cnt for storing indexes, which are respectively used for storing the starting data addresses of the input feature map, the non-zero weight parameter addresses, the starting data addresses of the output feature map, and the numbers of tangent planes corresponding to the starting data addresses of the output feature map.
5. The method of claim 4, wherein after performing non-zero index extraction, the method further comprises:
storing the address of the non-zero element into a w _ ptr list;
calculating a channel starting address of the input feature map and a starting offset of the sliding traversal according to the corresponding input channel, obtaining the starting address of each traversal, and storing it into the in_ptr list;
and judging whether the output feature map starting address corresponding to the current calculation has been stored into out_ptr; if not, storing it into out_ptr, otherwise incrementing the corresponding tangent-plane count by 1 and updating it in the out_cnt list.
6. The method of claim 5, wherein the performing a traversal multiply-add calculation on each non-zero element in the non-zero element index list and the input feature map of the input channel comprises:
traversing the out_ptr list, and reading the count value at the corresponding subscript in the out_cnt list;
traversing over the count value, and reading the input index list in_ptr and the weight index list w_ptr according to the loop variable of the traversal;
and traversing each pixel in the input feature map of the input channel, sequentially computing a tangent plane by multiplying and adding the non-zero element with the pixel at the corresponding position of the input feature map.
7. An apparatus for outputting information, comprising:
an acquisition unit configured to acquire an input feature map and a convolution kernel of at least one input channel;
the index extraction unit is configured to extract a non-zero index of the sparse weight parameter matrix serving as the convolution kernel to obtain a non-zero element index list;
the calculation unit is configured to perform traversal multiplication and addition calculation on each non-zero element in the non-zero element index list and an input feature map of the input channel for each input channel, so as to obtain a tangent plane of an output feature map corresponding to the non-zero element of the input channel;
and the output unit is configured to, for each output channel, accumulate the tangent planes of the output feature maps corresponding to all the non-zero elements in the non-zero element index list to obtain and output the output feature map of the output channel.
8. The apparatus of claim 7, wherein the apparatus further comprises an acceleration unit configured to:
in the inner loop of the sparse convolution calculation and the traversal calculation process of the output feature map, performing acceleration through out-of-order instruction execution and data prefetching.
9. The apparatus of claim 7, wherein the index extraction unit is further configured to:
traversing the sparse weight parameter matrix of the convolution kernel, and storing the weight parameters greater than a preset threshold value into the non-zero element index list.
10. The apparatus of claim 7, wherein the apparatus further comprises a definition unit configured to:
before non-zero index extraction, defining lists in_ptr, w_ptr, out_ptr and out_cnt for storing indexes, which are respectively used for storing the starting data addresses of the input feature map, the non-zero weight parameter addresses, the starting data addresses of the output feature map, and the numbers of tangent planes corresponding to the starting data addresses of the output feature map.
11. The apparatus of claim 10, wherein the apparatus further comprises a storage unit configured to:
after non-zero index extraction, storing the address of a non-zero element into a w _ ptr list;
calculating a channel starting address of the input feature map and a starting offset of the sliding traversal according to the corresponding input channel, obtaining the starting address of each traversal, and storing it into the in_ptr list;
and judging whether the output feature map starting address corresponding to the current calculation has been stored into out_ptr; if not, storing it into out_ptr, otherwise incrementing the corresponding tangent-plane count by 1 and updating it in the out_cnt list.
12. The apparatus of claim 11, wherein the computing unit is further configured to:
traversing the out_ptr list, and reading the count value at the corresponding subscript in the out_cnt list;
traversing over the count value, and reading the input index list in_ptr and the weight index list w_ptr according to the loop variable of the traversal;
and traversing each pixel in the input feature map of the input channel, sequentially computing a tangent plane by multiplying and adding the non-zero element with the pixel at the corresponding position of the input feature map.
13. An electronic device for outputting information, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010184800.9A CN111415004B (en) | 2020-03-17 | 2020-03-17 | Method and device for outputting information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010184800.9A CN111415004B (en) | 2020-03-17 | 2020-03-17 | Method and device for outputting information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111415004A true CN111415004A (en) | 2020-07-14 |
CN111415004B CN111415004B (en) | 2023-11-03 |
Family
ID=71492977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010184800.9A Active CN111415004B (en) | 2020-03-17 | 2020-03-17 | Method and device for outputting information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111415004B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI742802B (en) * | 2020-08-18 | 2021-10-11 | 創鑫智慧股份有限公司 | Matrix calculation device and operation method thereof |
WO2023004670A1 (en) * | 2021-07-29 | 2023-02-02 | Qualcomm Incorporated | Channel-guided nested loop transformation and scalar replacement |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201346755A (en) * | 2011-12-20 | 2013-11-16 | Intel Corp | System and method for out-of-order prefetch instructions in an in-order pipeline |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
CN107451652A (en) * | 2016-05-31 | 2017-12-08 | 三星电子株式会社 | The efficient sparse parallel convolution scheme based on Winograd |
US20180046916A1 (en) * | 2016-08-11 | 2018-02-15 | Nvidia Corporation | Sparse convolutional neural network accelerator |
CN107909148A (en) * | 2017-12-12 | 2018-04-13 | 北京地平线信息技术有限公司 | For performing the device of the convolution algorithm in convolutional neural networks |
CN107944555A (en) * | 2017-12-07 | 2018-04-20 | 广州华多网络科技有限公司 | Method, storage device and the terminal that neutral net is compressed and accelerated |
WO2018073975A1 (en) * | 2016-10-21 | 2018-04-26 | Nec Corporation | Improved sparse convolution neural network |
CN108510066A (en) * | 2018-04-08 | 2018-09-07 | 清华大学 | A kind of processor applied to convolutional neural networks |
CN109344698A (en) * | 2018-08-17 | 2019-02-15 | 西安电子科技大学 | EO-1 hyperion band selection method based on separable convolution sum hard threshold function |
CN109359726A (en) * | 2018-11-27 | 2019-02-19 | 华中科技大学 | A kind of convolutional neural networks optimization method based on winograd algorithm |
US20190108436A1 (en) * | 2017-10-06 | 2019-04-11 | Deepcube Ltd | System and method for compact and efficient sparse neural networks |
CN109840585A (en) * | 2018-01-10 | 2019-06-04 | 中国科学院计算技术研究所 | A kind of operation method and system towards sparse two-dimensional convolution |
CN109857744A (en) * | 2019-02-13 | 2019-06-07 | 上海燧原智能科技有限公司 | Sparse tensor computation method, apparatus, equipment and storage medium |
CN109993297A (en) * | 2019-04-02 | 2019-07-09 | 南京吉相传感成像技术研究院有限公司 | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing |
CN109993683A (en) * | 2017-12-29 | 2019-07-09 | 英特尔公司 | Machine learning sparse calculation mechanism, the algorithm calculations micro-architecture and sparsity for training mechanism of any neural network |
CN110062233A (en) * | 2019-04-25 | 2019-07-26 | 西安交通大学 | The compression method and system of the sparse weight matrix of the full articulamentum of convolutional neural networks |
CN110070178A (en) * | 2019-04-25 | 2019-07-30 | 北京交通大学 | A kind of convolutional neural networks computing device and method |
CN110796238A (en) * | 2019-10-29 | 2020-02-14 | 上海安路信息科技有限公司 | Convolutional neural network weight compression method and system |
2020
- 2020-03-17 CN CN202010184800.9A patent/CN111415004B/en active Active
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201346755A (en) * | 2011-12-20 | 2013-11-16 | Intel Corp | System and method for out-of-order prefetch instructions in an in-order pipeline |
CN107451652A (en) * | 2016-05-31 | 2017-12-08 | 三星电子株式会社 | The efficient sparse parallel convolution scheme based on Winograd |
US20180046916A1 (en) * | 2016-08-11 | 2018-02-15 | Nvidia Corporation | Sparse convolutional neural network accelerator |
WO2018073975A1 (en) * | 2016-10-21 | 2018-04-26 | Nec Corporation | Improved sparse convolution neural network |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
US20190108436A1 (en) * | 2017-10-06 | 2019-04-11 | Deepcube Ltd | System and method for compact and efficient sparse neural networks |
CN107944555A (en) * | 2017-12-07 | 2018-04-20 | 广州华多网络科技有限公司 | Method, storage device and the terminal that neutral net is compressed and accelerated |
CN107909148A (en) * | 2017-12-12 | 2018-04-13 | 北京地平线信息技术有限公司 | For performing the device of the convolution algorithm in convolutional neural networks |
CN109993683A (en) * | 2017-12-29 | 2019-07-09 | 英特尔公司 | Machine learning sparse calculation mechanism, the algorithm calculations micro-architecture and sparsity for training mechanism of any neural network |
CN109840585A (en) * | 2018-01-10 | 2019-06-04 | 中国科学院计算技术研究所 | A kind of operation method and system towards sparse two-dimensional convolution |
CN108510066A (en) * | 2018-04-08 | 2018-09-07 | 清华大学 | A kind of processor applied to convolutional neural networks |
CN109344698A (en) * | 2018-08-17 | 2019-02-15 | 西安电子科技大学 | EO-1 hyperion band selection method based on separable convolution sum hard threshold function |
CN109359726A (en) * | 2018-11-27 | 2019-02-19 | 华中科技大学 | A kind of convolutional neural networks optimization method based on winograd algorithm |
CN109857744A (en) * | 2019-02-13 | 2019-06-07 | 上海燧原智能科技有限公司 | Sparse tensor computation method, apparatus, equipment and storage medium |
CN109993297A (en) * | 2019-04-02 | 2019-07-09 | 南京吉相传感成像技术研究院有限公司 | A kind of the sparse convolution neural network accelerator and its accelerated method of load balancing |
CN110062233A (en) * | 2019-04-25 | 2019-07-26 | 西安交通大学 | The compression method and system of the sparse weight matrix of the full articulamentum of convolutional neural networks |
CN110070178A (en) * | 2019-04-25 | 2019-07-30 | 北京交通大学 | A kind of convolutional neural networks computing device and method |
CN110796238A (en) * | 2019-10-29 | 2020-02-14 | 上海安路信息科技有限公司 | Convolutional neural network weight compression method and system |
Non-Patent Citations (6)
Title |
---|
ANGSHUMAN PARASHAR等: "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks", 《ACM SIGARCH COMPUTER ARCHITECTURE NEWS》, vol. 45, no. 2, pages 27 - 40, XP033268524, DOI: 10.1145/3079856.3080254 * |
INTERESTING233333: "卷积神经网络(三)", vol. 1, pages 162 - 2, Retrieved from the Internet <URL:《https://blog.csdn.net/lipengfei0427/article/details/100180374》> * |
XUHAO CHEN等: "Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs", pages 1 - 9 * |
付世航: "深度卷积算法优化与硬件加速", no. 2019, pages 135 - 271 * |
周国飞;: "一种支持稀疏卷积的深度神经网络加速器的设计", no. 04, pages 109 - 111 * |
李林鹏: "压缩卷积神经网络的FPGA加速研究与实现", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 2020, pages 135 - 767 * |
Also Published As
Publication number | Publication date |
---|---|
CN111415004B (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112364860B (en) | Training method and device of character recognition model and electronic equipment | |
CN110413812B (en) | Neural network model training method and device, electronic equipment and storage medium | |
CN112650790B (en) | Target point cloud plane determining method and device, electronic equipment and storage medium | |
CN111368973B (en) | Method and apparatus for training a super network | |
CN110362750B (en) | Target user determination method, device, electronic equipment and computer readable medium | |
CN110826567A (en) | Optical character recognition method, device, equipment and storage medium | |
CN112650841A (en) | Information processing method and device and electronic equipment | |
US20200389182A1 (en) | Data conversion method and apparatus | |
CN110633423A (en) | Target account identification method, device, equipment and storage medium | |
CN111415004B (en) | Method and device for outputting information | |
CN110633434A (en) | Page caching method and device, electronic equipment and storage medium | |
CN113255812B (en) | Video frame detection method and device and electronic equipment | |
CN111783731B (en) | Method and device for extracting video features | |
CN111782933A (en) | Method and device for recommending book list | |
CN110378282A (en) | Image processing method and device | |
CN113240108B (en) | Model training method and device and electronic equipment | |
CN113220922B (en) | Image searching method and device and electronic equipment | |
CN115761248B (en) | Image processing method, device, equipment and storage medium | |
CN111950572A (en) | Method, apparatus, electronic device and computer-readable storage medium for training classifier | |
CN113177174B (en) | Feature construction method, content display method and related device | |
CN113283115B (en) | Image model generation method and device and electronic equipment | |
CN114724639B (en) | Preprocessing acceleration method, device, equipment and storage medium | |
CN110826497B (en) | Vehicle weight removing method and device based on minimum distance method and storage medium | |
CN113033770A (en) | Neural network model testing method and device and electronic equipment | |
CN116403201A (en) | Text recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
TA01 | Transfer of patent application right |
Effective date of registration: 20211011 Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Address before: 2 / F, baidu building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085 Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. |
|
GR01 | Patent grant | ||
GR01 | Patent grant |