CN116126548A - Method, system, equipment and storage medium for reducing resource occupation in NPU


Info

Publication number
CN116126548A
Authority
CN
China
Prior art keywords
information
feature
image
matrix
positioning information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310422636.4A
Other languages
Chinese (zh)
Other versions
CN116126548B (en)
Inventor
黄茂芹 (Huang Maoqin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Saifang Technology Co., Ltd.
Original Assignee
Guangdong Saifang Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Saifang Technology Co., Ltd.
Priority to CN202310422636.4A
Publication of CN116126548A
Application granted
Publication of CN116126548B
Legal status: Active

Classifications

    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/40: Extraction of image or video features
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of deep learning, and in particular to a method, a system, equipment and a storage medium for reducing resource occupation in an NPU (neural-network processing unit). The method comprises: acquiring information to be processed, preprocessing it, and performing preliminary feature extraction to generate feature matrices corresponding to the preliminary features; sorting the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screening all feature matrices of the sorted matrix group to generate a basic matrix group, which is then sent to a cache unit of the neural network processor; and positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, sending the positioning information to the neural network processor for parsing, and calling the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation. The invention reduces the occupation of computing resources and improves the calculation speed.

Description

Method, system, equipment and storage medium for reducing resource occupation in NPU
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, a system, an apparatus, and a storage medium for reducing resource occupation in an NPU.
Background
The most time-consuming operators in a deep learning model are usually convolutions, and convolution is essentially matrix multiply-add computation, so deep learning training and inference can be accelerated by accelerating matrix multiply-add on an NPU (embedded neural-network processor). However, the NPU's main job, especially for image workloads, is accelerating convolution, and convolution occupies considerable computing resources in the process, which slows down computation; a method and a system for reducing computing resource occupation in the NPU are therefore needed.
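To make the point that convolution reduces to matrix multiply-add concrete, here is a minimal NumPy sketch (an illustration added for this write-up, not part of the patent) of the standard im2col lowering, which turns a 2-D convolution into a single matrix multiplication of exactly the kind an NPU accelerates:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a 2-D image into one column."""
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_gemm(x, k):
    """Valid 2-D convolution (cross-correlation convention, as in deep
    learning frameworks) expressed as one matrix multiply."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (k.ravel() @ im2col(x, kh, kw)).reshape(out_h, out_w)

x, k = np.random.rand(6, 6), np.random.rand(3, 3)
direct = np.array([[(x[i:i + 3, j:j + 3] * k).sum() for j in range(4)]
                   for i in range(4)])
assert np.allclose(conv2d_as_gemm(x, k), direct)
```

Hardware NPUs typically apply the same lowering in tiled form, which is why the matrix multiply-add path dominates their resource budget.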
Disclosure of Invention
The invention aims to provide a method, a system, equipment and a storage medium for reducing resource occupation in NPU, which can reduce occupation of computing resources and improve computing speed.
To achieve the purpose, the invention adopts the following technical scheme:
a method of reducing resource occupancy in an NPU, comprising the steps of:
s1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
s2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
s3, sequencing all the feature matrixes according to a preset calculation sequence to obtain a sequencing matrix group containing a specific sequencing sequence;
s4, screening all feature matrixes of the sorting matrix group to generate a basic matrix group, and sending the basic matrix group into a cache unit of the neural network processor;
s5, positioning all feature matrixes of the sorting matrix group, generating positioning information only containing the position information of each feature matrix, and sending the positioning information to the neural network processor;
and S6, analyzing the positioning information by the neural network processor, and calling the corresponding feature matrix in the basic matrix group from the cache unit according to the analyzed positioning information to perform association calculation.
Preferably, in S1, the preprocessing of the information to be processed specifically includes the following steps:
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format.
Preferably, in S2, the preliminary feature extraction performed on the preprocessed information to generate a feature matrix corresponding to the preliminary features specifically includes the following steps:
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image;
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
S23, importing the image with the preliminary features into a preset feature conversion model to generate the feature matrix corresponding to the preliminary features.
Preferably, in S22, the HOG image feature extraction algorithm specifically includes the following steps:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
Preferably, in S4, the screening of all feature matrices of the sorted matrix group specifically includes the following steps:
retaining only one copy of each set of identical feature matrices to generate a basic matrix group in which all feature matrices are distinct, wherein the basic matrix group contains every kind of feature matrix present in the sorted matrix group.
Preferably, in S5, the positioning information includes a separation symbol for separating different feature matrices.
Preferably, in S6, the neural network processor parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation, which specifically includes the following steps:
S61, the neural network processor separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation.
A system for reducing computing resource occupancy in an NPU, implementing a method for reducing computing resource occupancy in an NPU as described above, the system comprising:
the reading module is used for acquiring information to be processed and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module is used for sending the preprocessed information to the central processing unit for preliminary feature extraction and generating the feature matrices corresponding to the preliminary features;
the sorting module is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module is used for positioning all the feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module of the neural network processor;
and the calculation module is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
An apparatus comprising at least one processor, at least one memory, and a data bus, the processor comprising a central processor and a neural network processor; wherein: the central processing unit, the neural network processor and the memory complete mutual communication through the data bus; the memory stores program instructions for execution by the processor, the at least one processor invoking the program instructions to perform a method of reducing computing resource usage in an NPU as described above.
A storage medium having stored thereon a computer program which when executed by at least one processor implements a method of reducing computing resource occupation in an NPU as described above.
One of the above technical solutions has the following beneficial effects: to address the problem of computing resource occupation in the NPU, the design stores the matrices to be calculated in a cache unit and keeps only one copy of each identical matrix, so that less storage is occupied; calling the matrices to be calculated directly from the cache unit in real time during calculation reduces the occupation of computing resources and improves the calculation speed.
Drawings
FIG. 1 is a flow chart of a method of reducing computing resource usage in an NPU in accordance with the present invention;
FIG. 2 is a schematic diagram of a system for reducing computing resource usage in an NPU in accordance with the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to the present invention.
In the accompanying drawings: 1. a reading module; 2. an initial extraction module; 3. a sequencing module; 4. a matrix extraction module; 5. a positioning information conversion module; 6. a computing module; 7. a central processing unit; 8. a neural network processor; 9. a memory; 10. a data bus.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
Example 1
Referring to fig. 1, to solve the problem of computing resource occupation in the NPU, the present design stores the matrices to be calculated in a cache unit and keeps only one copy of each identical matrix, so that less storage is occupied; the matrices to be calculated are sorted, and during the calculation they are called directly from the cache unit in real time, which reduces the occupation of computing resources and improves the calculation speed. The method specifically comprises the following steps:
Specifically, for image and video acquisition, a client or a front-end page may be provided to allow the user to input the image or video information to be processed.
S1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format. Because image information cannot be obtained from a video directly, images are acquired from the video by means of key-frame extraction.
In some embodiments of the invention, the picture format is the RMVB format. To let the deep learning picture classification capture more characteristics, high-definition pictures are used as far as possible, where high definition means images of 720P and above; such pictures usually come in the AVI, RMVB or MPEG formats. Among the three, RMVB has the highest compression rate while still retaining more information than the AVI and MPEG formats, so the RMVB format is adopted.
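The patent leaves the key-frame selection method of S13 open. Purely as an illustration, the sketch below uses OpenCV and samples one frame per second as a stand-in for key-frame extraction; the fixed interval is an assumption, and the subsequent conversion of each frame to the RMVB format is omitted:

```python
import cv2

def extract_key_frames(video_path, every_n_seconds=1.0):
    """Sample one frame per interval as a stand-in for key-frame extraction."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    step = max(1, int(fps * every_n_seconds))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # BGR image, ready for channel separation
        idx += 1
    cap.release()
    return frames
```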
S2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image; the image in the information to be processed is an RGB image, because the RGB mode is an illuminant, additive color mode with a wide color expression range, which provides more features for image recognition.
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
The HOG image feature extraction algorithm specifically comprises the following steps:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
S23, importing the image with the preliminary features into a preset feature conversion model to generate a feature matrix corresponding to the preliminary features.
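As a rough illustration of S221 and S222 above, the sketch below computes the per-pixel gradient direction and magnitude of one grayscale cell and accumulates an unsigned orientation histogram. The 8 x 8 cell and 9-bin unsigned layout are the common HOG defaults, assumed here; the patent does not fix these parameters:

```python
import numpy as np

def cell_hog(gray, bins=9):
    """Per-pixel gradient magnitude/direction plus one orientation histogram."""
    gx, gy = np.zeros_like(gray), np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]  # central differences
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist = np.zeros(bins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // (180.0 / bins)) % bins] += m  # vote weighted by magnitude
    return mag, ang, hist

mag, ang, hist = cell_hog(np.random.rand(8, 8))  # one 8 x 8 cell
```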
S3, sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
To facilitate positioning of the feature matrices, they need to be arranged in a certain order, and this order is determined automatically by the order required in the calculation, for example the order in which the matrix features will be retrieved during the calculation. A minimal sketch of this sorting step follows.
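Representing the preset calculation order as one consumption index per matrix is an assumption here; the patent only requires that the matrices end up in the order the downstream computation uses them:

```python
import numpy as np

def build_sorted_group(feature_matrices, calc_order):
    """Arrange the feature matrices into the preset calculation order.

    calc_order[i] is the position at which feature_matrices[i] will be
    consumed downstream (assumed representation of the preset order).
    """
    return [feature_matrices[i] for i in np.argsort(calc_order)]
```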
S4, screening all feature matrices of the sorted matrix group; specifically, only one copy of each set of identical feature matrices is retained, generating a basic matrix group in which all feature matrices are distinct and which contains every kind of feature matrix in the sorted matrix group; the basic matrix group is then sent to the cache unit of the neural network processor 8;
After the sorting is completed, the feature matrices making up the matrix group are screened. For example, suppose the group comprises the following third-order matrices:
[five example third-order matrices; images not reproduced]
After screening, this yields:
[the three distinct third-order matrices; images not reproduced]
All distinct constituent matrices of the sorted matrix group are thus obtained. Compared with the prior art, in which all the matrices, repetitions included, are transmitted to the NPU in sequence for calculation, this saves the resources the repeated matrices would occupy.
S5, positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the neural network processor 8, wherein the positioning information includes separation symbols for separating different feature matrices;
After the screening above, the matrices in the cache unit alone no longer encode the correct calculation order, so each feature matrix is positioned according to that order; that is, each feature matrix in the matrix group is associatively located to its identical matrix in the cache unit.
And S6, the neural network processor 8 parses the positioning information and, according to the parsed positioning information, calls the corresponding feature matrices of the basic matrix group from the cache unit to perform the associated calculation.
S61, the neural network processor 8 separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
In deep learning, several features often need to be operated on at the same time, so the positioning information may cover more than one feature; the separation symbols are used to tell the features apart. Once a separation symbol is read, the positioning information is split into the several pieces of positioning information to be calculated, one per feature.
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation. During the calculation, the NPU therefore calls the matrices in the cache unit directly according to the preset positioning information, which reduces the occupation of computing resources and improves the calculation speed.
It should be noted that the neural network processor is the NPU.
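As an end-to-end illustration of S4 through S6, the sketch below keeps one copy of each distinct matrix (assuming byte-level equality is what makes two feature matrices "the same"), encodes the positioning information as index lists joined by a separation symbol, and lets the NPU side split it per feature and fetch from its cache in order. The ',' and '|' symbols, the string encoding and the combine callback are assumptions chosen for readability; the patent leaves all of these encodings open:

```python
import numpy as np

SEP = "|"  # assumed separation symbol between features

def screen(sorted_group):
    """S4: keep one copy of each distinct matrix; record where each went."""
    basic_group, index_of, positions = [], {}, []
    for m in sorted_group:
        key = m.tobytes()  # exact-duplicate test (assumed equality criterion)
        if key not in index_of:
            index_of[key] = len(basic_group)
            basic_group.append(m)
        positions.append(index_of[key])
    return basic_group, positions

def encode_positioning(per_feature_positions):
    """S5: [[0, 1, 0], [2]] -> '0,1,0|2'."""
    return SEP.join(",".join(map(str, p)) for p in per_feature_positions)

def npu_compute(positioning_info, cache, combine):
    """S6: parse the positioning information and call cached matrices in order."""
    results = []
    for feature_info in positioning_info.split(SEP):                  # S61
        matrices = [cache[int(i)] for i in feature_info.split(",")]   # S62
        results.append(combine(matrices))
    return results

# Toy run: three matrices with a repetition shrink to a two-matrix cache,
# while the positioning information preserves the original calculation order.
group = [np.eye(3), np.ones((3, 3)), np.eye(3)]
basic, pos = screen(group)          # len(basic) == 2, pos == [0, 1, 0]
info = encode_positioning([pos])
out = npu_compute(info, basic, lambda ms: sum(m.sum() for m in ms))
```

With this layout the cache holds only the distinct matrices, while the positioning string alone is enough for the NPU to walk the full calculation order.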
Example 2
Referring to fig. 2, a system for reducing computing resource occupation in an NPU according to the present invention implements a method for reducing computing resource occupation in an NPU as described above, where the system includes:
the reading module 1 is used for acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module 2 is used for performing preliminary feature extraction on the preprocessed information and generating the feature matrices corresponding to the preliminary features;
the sorting module 3 is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module 4 is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group, and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module 5 is used for positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module 6 of the neural network processor;
and the calculation module 6 is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
Example 3
Referring to fig. 3, an electronic device according to the present invention includes at least one processor, at least one memory 9 and a data bus 10, where the processor includes a central processing unit 7 and a neural network processor 8; the central processing unit 7, the neural network processor 8 and the memory 9 communicate with each other through the data bus 10; the memory 9 stores program instructions executable by the processor, and the at least one processor invokes the program instructions to perform the method of reducing computing resource occupation in an NPU described above, for example implementing the following:
The central processing unit 7 acquires the information to be processed and preprocesses it, the information to be processed comprising image or video information; performs preliminary feature extraction on the preprocessed information to generate the feature matrices corresponding to the preliminary features; sorts all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screens all feature matrices of the sorted matrix group to generate a basic matrix group and sends the basic matrix group to the cache unit of the neural network processor 8; and positions all feature matrices of the sorted matrix group, generates positioning information containing only the position information of each feature matrix, and sends the positioning information to the neural network processor 8. The neural network processor 8 parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
Example 4
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by the at least one processor, implements the method of reducing computing resource occupation in an NPU described above, for example implementing the following:
The central processing unit 7 acquires the information to be processed and preprocesses it, the information to be processed comprising image or video information; performs preliminary feature extraction on the preprocessed information to generate the feature matrices corresponding to the preliminary features; sorts all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screens all feature matrices of the sorted matrix group to generate a basic matrix group and sends the basic matrix group to the cache unit of the neural network processor 8; and positions all feature matrices of the sorted matrix group, generates positioning information containing only the position information of each feature matrix, and sends the positioning information to the neural network processor 8. The neural network processor 8 parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
The memory 9 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The technical principle of the present invention is described above in connection with the specific embodiments. The description is made for the purpose of illustrating the general principles of the invention and should not be taken in any way as limiting the scope of the invention. Other embodiments of the invention will occur to those skilled in the art from consideration of this specification without the exercise of inventive faculty, and such equivalent modifications and alternatives are intended to be included within the scope of the invention as defined in the claims.

Claims (10)

1. A method for reducing computing resource occupancy in an NPU, comprising the steps of:
S1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
S2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
S3, sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
S4, screening all feature matrices of the sorted matrix group to generate a basic matrix group, and sending the basic matrix group to a cache unit of the neural network processor;
S5, positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the neural network processor;
and S6, parsing the positioning information by the neural network processor, and calling the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
2. The method for reducing computing resource occupation in an NPU according to claim 1, wherein in S1, the preprocessing of the information to be processed specifically includes the following steps:
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format.
3. The method for reducing computing resource occupation in an NPU according to claim 2, wherein in S2, the preliminary feature extraction performed on the preprocessed information to generate a feature matrix corresponding to the preliminary features specifically includes the following steps:
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image;
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
S23, importing the image with the preliminary features into a preset feature conversion model to generate the feature matrix corresponding to the preliminary features.
4. The method for reducing computing resource occupation in an NPU according to claim 3, wherein in S22, the HOG image feature extraction algorithm comprises the steps of:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
5. The method for reducing computing resource occupation in an NPU according to claim 4, wherein in S4, the screening of all feature matrices of the sorted matrix group specifically comprises the steps of:
retaining only one copy of each set of identical feature matrices to generate a basic matrix group in which all feature matrices are distinct, wherein the basic matrix group contains every kind of feature matrix present in the sorted matrix group.
6. The method of claim 5, wherein in S5 the positioning information includes separation symbols for separating different feature matrices.
7. The method for reducing computing resource occupation in an NPU according to claim 6, wherein in S6, the neural network processor parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation, specifically comprising the following steps:
S61, the neural network processor separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation.
8. A system for reducing computing resource occupancy in an NPU, implementing the method for reducing computing resource occupancy in an NPU as recited in any one of claims 1 to 7, the system comprising:
the reading module is used for acquiring information to be processed and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module is used for performing preliminary feature extraction on the preprocessed information and generating the feature matrices corresponding to the preliminary features;
the sorting module is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module is used for positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module of the neural network processor;
and the calculation module is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
9. An apparatus comprising at least one processor, at least one memory, and a data bus, the processor comprising a central processor and a neural network processor; wherein: the central processing unit, the neural network processor and the memory complete mutual communication through the data bus; the memory stores program instructions for execution by the processor, the at least one processor invoking the program instructions to perform a method of reducing computing resource usage in an NPU as recited in any of claims 1-7.
10. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements a method of reducing computing resource usage in an NPU as claimed in any of claims 1 to 7.
CN202310422636.4A 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU Active CN116126548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310422636.4A CN116126548B (en) 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU


Publications (2)

Publication Number Publication Date
CN116126548A 2023-05-16
CN116126548B (en) 2023-08-01

Family

ID=86312202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310422636.4A Active CN116126548B (en) 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU

Country Status (1)

Country Link
CN (1) CN116126548B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103825845A (en) * 2014-03-17 2014-05-28 北京航空航天大学 Matrix decomposition-based packet scheduling algorithm of reconfigurable VOQ (virtual output queuing) structure switch
CN108205680A (en) * 2017-12-29 2018-06-26 深圳云天励飞技术有限公司 Image characteristics extraction integrated circuit, method, terminal
CN109190756A (en) * 2018-09-10 2019-01-11 中国科学院计算技术研究所 Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
US20200279153A1 (en) * 2019-03-01 2020-09-03 Microsoft Technology Licensing, Llc Deriving a concordant software neural network layer from a quantized firmware neural network layer
CN113469350A (en) * 2021-07-07 2021-10-01 武汉魅瞳科技有限公司 Deep convolutional neural network acceleration method and system suitable for NPU
CN115734091A (en) * 2021-08-27 2023-03-03 思特威(上海)电子科技股份有限公司 Image sensor, image processing method, terminal, and computer storage medium


Also Published As

Publication number Publication date
CN116126548B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US10282643B2 (en) Method and apparatus for obtaining semantic label of digital image
CN111950723A (en) Neural network model training method, image processing method, device and terminal equipment
KR100708130B1 (en) Apparatus and method for extracting moving image
CN107943811B (en) Content publishing method and device
CN110930327B (en) Video denoising method based on cascade depth residual error network
CN113051236B (en) Method and device for auditing video and computer-readable storage medium
CN114494981B (en) Action video classification method and system based on multi-level motion modeling
CN111476124A (en) Camera detection method and device, electronic equipment and system
CN111428590B (en) Video clustering segmentation method and system
CN112163120A (en) Classification method, terminal and computer storage medium
CN112188236B (en) Video interpolation frame model training method, video interpolation frame generation method and related device
CN109871790B (en) Video decoloring method based on hybrid neural network model
CN111310516B (en) Behavior recognition method and device
CN105979283A (en) Video transcoding method and device
CN116126548B (en) Method, system, equipment and storage medium for reducing resource occupation in NPU
EP4047547A1 (en) Method and system for removing scene text from images
CN112532938B (en) Video monitoring system based on big data technology
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN111241365B (en) Table picture analysis method and system
CN108921792B (en) Method and device for processing pictures
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN108287817B (en) Information processing method and device
CN112016554B (en) Semantic segmentation method and device, electronic equipment and storage medium
CN110647898A (en) Image processing method, image processing device, electronic equipment and computer storage medium
US11024067B2 (en) Methods for dynamic management of format conversion of an electronic image and devices thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant