CN116126548A - Method, system, equipment and storage medium for reducing resource occupation in NPU


Info

Publication number
CN116126548A
Authority
CN
China
Prior art keywords
information
feature
image
matrix
positioning information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310422636.4A
Other languages
Chinese (zh)
Other versions
CN116126548B (en)
Inventor
黄茂芹 (Huang Maoqin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Saifang Technology Co., Ltd.
Original Assignee
Guangdong Saifang Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Saifang Technology Co., Ltd.
Priority to CN202310422636.4A
Publication of CN116126548A
Application granted
Publication of CN116126548B
Legal status: Active

Classifications

    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06V10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/40: Extraction of image or video features
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of deep learning, and in particular to a method, a system, equipment and a storage medium for reducing resource occupation in an NPU (neural-network processing unit). The method comprises: acquiring information to be processed, preprocessing it, and performing preliminary feature extraction to generate feature matrices corresponding to the preliminary features; sorting the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screening all feature matrices of the sorted matrix group to generate a basic matrix group, which is then sent to a cache unit of the neural network processor; and positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, sending the positioning information to the neural network processor for parsing, and calling the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation. The invention reduces the occupation of computing resources and improves the calculation speed.

Description

Method, system, equipment and storage medium for reducing resource occupation in NPU
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, a system, an apparatus, and a storage medium for reducing resource occupation in an NPU.
Background
The most time-consuming operators in a deep learning model are usually convolutions, and convolution is essentially matrix multiply-add computation, so deep learning training and inference can be accelerated by accelerating matrix multiply-add on an NPU (embedded neural-network processor). However, the NPU's main job, especially for image workloads, is accelerating convolution, and convolution occupies considerable computing resources in the process, which slows down computation; a method and a system for reducing computing resource occupation in the NPU are therefore needed.
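To make the point that convolution reduces to matrix multiply-add concrete, here is a minimal NumPy sketch (an illustration added for this write-up, not part of the patent) of the standard im2col lowering, which turns a 2-D convolution into a single matrix multiplication of exactly the kind an NPU accelerates:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a 2-D image into one column."""
    h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv2d_as_gemm(x, k):
    """Valid 2-D convolution (cross-correlation convention, as in deep
    learning frameworks) expressed as one matrix multiply."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (k.ravel() @ im2col(x, kh, kw)).reshape(out_h, out_w)

x, k = np.random.rand(6, 6), np.random.rand(3, 3)
direct = np.array([[(x[i:i + 3, j:j + 3] * k).sum() for j in range(4)]
                   for i in range(4)])
assert np.allclose(conv2d_as_gemm(x, k), direct)
```

Hardware NPUs typically apply the same lowering in tiled form, which is why the matrix multiply-add path dominates their resource budget.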
Disclosure of Invention
The invention aims to provide a method, a system, equipment and a storage medium for reducing resource occupation in NPU, which can reduce occupation of computing resources and improve computing speed.
To achieve the purpose, the invention adopts the following technical scheme:
a method of reducing resource occupancy in an NPU, comprising the steps of:
s1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
s2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
s3, sequencing all the feature matrixes according to a preset calculation sequence to obtain a sequencing matrix group containing a specific sequencing sequence;
s4, screening all feature matrixes of the sorting matrix group to generate a basic matrix group, and sending the basic matrix group into a cache unit of the neural network processor;
s5, positioning all feature matrixes of the sorting matrix group, generating positioning information only containing the position information of each feature matrix, and sending the positioning information to the neural network processor;
and S6, analyzing the positioning information by the neural network processor, and calling the corresponding feature matrix in the basic matrix group from the cache unit according to the analyzed positioning information to perform association calculation.
Preferably, in S1, the preprocessing of the information to be processed specifically includes the following steps:
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format.
Preferably, in S2, the preliminary feature extraction performed on the preprocessed information to generate a feature matrix corresponding to the preliminary features specifically includes the following steps:
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image;
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
S23, importing the image with the preliminary features into a preset feature conversion model to generate the feature matrix corresponding to the preliminary features.
Preferably, in S22, the HOG image feature extraction algorithm specifically includes the following steps:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
Preferably, in S4, the screening of all feature matrices of the sorted matrix group specifically includes the following steps:
retaining only one copy of each set of identical feature matrices to generate a basic matrix group in which all feature matrices are distinct, wherein the basic matrix group contains every kind of feature matrix present in the sorted matrix group.
Preferably, in S5, the positioning information includes a separation symbol for separating different feature matrices.
Preferably, in S6, the neural network processor parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation, which specifically includes the following steps:
S61, the neural network processor separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation.
A system for reducing computing resource occupancy in an NPU, implementing a method for reducing computing resource occupancy in an NPU as described above, the system comprising:
the reading module is used for acquiring information to be processed and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module is used for sending the preprocessed information to the central processing unit for preliminary feature extraction and generating the feature matrices corresponding to the preliminary features;
the sorting module is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module is used for positioning all the feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module of the neural network processor;
and the calculation module is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
An apparatus comprising at least one processor, at least one memory, and a data bus, the processor comprising a central processor and a neural network processor; wherein: the central processing unit, the neural network processor and the memory complete mutual communication through the data bus; the memory stores program instructions for execution by the processor, the at least one processor invoking the program instructions to perform a method of reducing computing resource usage in an NPU as described above.
A storage medium having stored thereon a computer program which when executed by at least one processor implements a method of reducing computing resource occupation in an NPU as described above.
One of the above technical solutions has the following beneficial effects: to address the problem of computing resource occupation in the NPU, the design stores the matrices to be calculated in a cache unit and keeps only one copy of each identical matrix, so that less storage is occupied; calling the matrices to be calculated directly from the cache unit in real time during calculation reduces the occupation of computing resources and improves the calculation speed.
Drawings
FIG. 1 is a flow chart of a method of reducing computing resource usage in an NPU in accordance with the present invention;
FIG. 2 is a schematic diagram of a system for reducing computing resource usage in an NPU in accordance with the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to the present invention.
In the accompanying drawings: 1. a reading module; 2. an initial extraction module; 3. a sequencing module; 4. a matrix extraction module; 5. a positioning information conversion module; 6. a computing module; 7. a central processing unit; 8. a neural network processor; 9. a memory; 10. a data bus.
Detailed Description
The technical scheme of the invention is further described below by the specific embodiments with reference to the accompanying drawings.
Example 1
Referring to fig. 1, to solve the problem of computing resource occupation in the NPU, the present design stores the matrices to be calculated in a cache unit and keeps only one copy of each identical matrix, so that less storage is occupied; the matrices to be calculated are sorted, and during the calculation they are called directly from the cache unit in real time, which reduces the occupation of computing resources and improves the calculation speed. The method specifically comprises the following steps:
Specifically, for image and video acquisition, a client or a front-end page may be provided to allow the user to input the image or video information to be processed.
S1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format. Because image information cannot be obtained from a video directly, images are acquired from the video by means of key-frame extraction.
In some embodiments of the invention, the picture format is the RMVB format. To let the deep learning picture classification capture more characteristics, high-definition pictures are used as far as possible, where high definition means images of 720P and above; such pictures usually come in the AVI, RMVB or MPEG formats. Among the three, RMVB has the highest compression rate while still retaining more information than the AVI and MPEG formats, so the RMVB format is adopted.
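The patent leaves the key-frame selection method of S13 open. Purely as an illustration, the sketch below uses OpenCV and samples one frame per second as a stand-in for key-frame extraction; the fixed interval is an assumption, and the subsequent conversion of each frame to the RMVB format is omitted:

```python
import cv2

def extract_key_frames(video_path, every_n_seconds=1.0):
    """Sample one frame per interval as a stand-in for key-frame extraction."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS is unknown
    step = max(1, int(fps * every_n_seconds))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(frame)  # BGR image, ready for channel separation
        idx += 1
    cap.release()
    return frames
```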
S2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image; the image in the information to be processed is an RGB image, because the RGB mode is an illuminant, additive color mode with a wide color expression range, which provides more features for image recognition.
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
The HOG image feature extraction algorithm specifically comprises the following steps:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
S23, importing the image with the preliminary features into a preset feature conversion model to generate a feature matrix corresponding to the preliminary features.
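As a rough illustration of S221 and S222 above, the sketch below computes the per-pixel gradient direction and magnitude of one grayscale cell and accumulates an unsigned orientation histogram. The 8 x 8 cell and 9-bin unsigned layout are the common HOG defaults, assumed here; the patent does not fix these parameters:

```python
import numpy as np

def cell_hog(gray, bins=9):
    """Per-pixel gradient magnitude/direction plus one orientation histogram."""
    gx, gy = np.zeros_like(gray), np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]  # central differences
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist = np.zeros(bins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[int(a // (180.0 / bins)) % bins] += m  # vote weighted by magnitude
    return mag, ang, hist

mag, ang, hist = cell_hog(np.random.rand(8, 8))  # one 8 x 8 cell
```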
S3, sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
To facilitate positioning of the feature matrices, they need to be arranged in a certain order, and this order is determined automatically by the order required in the calculation, for example the order in which the matrix features will be retrieved during the calculation. A minimal sketch of this sorting step follows.
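Representing the preset calculation order as one consumption index per matrix is an assumption here; the patent only requires that the matrices end up in the order the downstream computation uses them:

```python
import numpy as np

def build_sorted_group(feature_matrices, calc_order):
    """Arrange the feature matrices into the preset calculation order.

    calc_order[i] is the position at which feature_matrices[i] will be
    consumed downstream (assumed representation of the preset order).
    """
    return [feature_matrices[i] for i in np.argsort(calc_order)]
```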
S4, screening all feature matrices of the sorted matrix group; specifically, only one copy of each set of identical feature matrices is retained, generating a basic matrix group in which all feature matrices are distinct and which contains every kind of feature matrix in the sorted matrix group; the basic matrix group is then sent to the cache unit of the neural network processor 8;
After the sorting is completed, the feature matrices making up the matrix group are screened. For example, suppose the group comprises the following third-order matrices:
[five example third-order matrices; images not reproduced]
After screening, this yields:
[the three distinct third-order matrices; images not reproduced]
All distinct constituent matrices of the sorted matrix group are thus obtained. Compared with the prior art, in which all the matrices, repetitions included, are transmitted to the NPU in sequence for calculation, this saves the resources the repeated matrices would occupy.
S5, positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the neural network processor 8, wherein the positioning information includes separation symbols for separating different feature matrices;
After the screening above, the matrices in the cache unit alone no longer encode the correct calculation order, so each feature matrix is positioned according to that order; that is, each feature matrix in the matrix group is associatively located to its identical matrix in the cache unit.
And S6, the neural network processor 8 parses the positioning information and, according to the parsed positioning information, calls the corresponding feature matrices of the basic matrix group from the cache unit to perform the associated calculation.
S61, the neural network processor 8 separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
In deep learning, several features often need to be operated on at the same time, so the positioning information may cover more than one feature; the separation symbols are used to tell the features apart. Once a separation symbol is read, the positioning information is split into the several pieces of positioning information to be calculated, one per feature.
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation. During the calculation, the NPU therefore calls the matrices in the cache unit directly according to the preset positioning information, which reduces the occupation of computing resources and improves the calculation speed.
It should be noted that the neural network processor is the NPU.
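As an end-to-end illustration of S4 through S6, the sketch below keeps one copy of each distinct matrix (assuming byte-level equality is what makes two feature matrices "the same"), encodes the positioning information as index lists joined by a separation symbol, and lets the NPU side split it per feature and fetch from its cache in order. The ',' and '|' symbols, the string encoding and the combine callback are assumptions chosen for readability; the patent leaves all of these encodings open:

```python
import numpy as np

SEP = "|"  # assumed separation symbol between features

def screen(sorted_group):
    """S4: keep one copy of each distinct matrix; record where each went."""
    basic_group, index_of, positions = [], {}, []
    for m in sorted_group:
        key = m.tobytes()  # exact-duplicate test (assumed equality criterion)
        if key not in index_of:
            index_of[key] = len(basic_group)
            basic_group.append(m)
        positions.append(index_of[key])
    return basic_group, positions

def encode_positioning(per_feature_positions):
    """S5: [[0, 1, 0], [2]] -> '0,1,0|2'."""
    return SEP.join(",".join(map(str, p)) for p in per_feature_positions)

def npu_compute(positioning_info, cache, combine):
    """S6: parse the positioning information and call cached matrices in order."""
    results = []
    for feature_info in positioning_info.split(SEP):                  # S61
        matrices = [cache[int(i)] for i in feature_info.split(",")]   # S62
        results.append(combine(matrices))
    return results

# Toy run: three matrices with a repetition shrink to a two-matrix cache,
# while the positioning information preserves the original calculation order.
group = [np.eye(3), np.ones((3, 3)), np.eye(3)]
basic, pos = screen(group)          # len(basic) == 2, pos == [0, 1, 0]
info = encode_positioning([pos])
out = npu_compute(info, basic, lambda ms: sum(m.sum() for m in ms))
```

With this layout the cache holds only the distinct matrices, while the positioning string alone is enough for the NPU to walk the full calculation order.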
Example 2
Referring to fig. 2, a system for reducing computing resource occupation in an NPU according to the present invention implements a method for reducing computing resource occupation in an NPU as described above, where the system includes:
the reading module 1 is used for acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module 2 is used for performing preliminary feature extraction on the preprocessed information and generating the feature matrices corresponding to the preliminary features;
the sorting module 3 is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module 4 is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group, and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module 5 is used for positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module 6 of the neural network processor;
and the calculation module 6 is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
Example 3
Referring to fig. 3, an electronic device according to the present invention includes at least one processor, at least one memory 9 and a data bus 10, where the processor includes a central processing unit 7 and a neural network processor 8; the central processing unit 7, the neural network processor 8 and the memory 9 communicate with each other through the data bus 10; the memory 9 stores program instructions executable by the processor, and the at least one processor invokes the program instructions to perform the method of reducing computing resource occupation in an NPU described above, for example implementing the following:
The central processing unit 7 acquires the information to be processed and preprocesses it, the information to be processed comprising image or video information; performs preliminary feature extraction on the preprocessed information to generate the feature matrices corresponding to the preliminary features; sorts all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screens all feature matrices of the sorted matrix group to generate a basic matrix group and sends the basic matrix group to the cache unit of the neural network processor 8; and positions all feature matrices of the sorted matrix group, generates positioning information containing only the position information of each feature matrix, and sends the positioning information to the neural network processor 8. The neural network processor 8 parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
Example 4
The present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by the at least one processor, implements the method of reducing computing resource occupation in an NPU described above, for example implementing the following:
The central processing unit 7 acquires the information to be processed and preprocesses it, the information to be processed comprising image or video information; performs preliminary feature extraction on the preprocessed information to generate the feature matrices corresponding to the preliminary features; sorts all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order; screens all feature matrices of the sorted matrix group to generate a basic matrix group and sends the basic matrix group to the cache unit of the neural network processor 8; and positions all feature matrices of the sorted matrix group, generates positioning information containing only the position information of each feature matrix, and sends the positioning information to the neural network processor 8. The neural network processor 8 parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
The memory 9 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The technical principle of the present invention is described above in connection with the specific embodiments. The description is made for the purpose of illustrating the general principles of the invention and should not be taken in any way as limiting the scope of the invention. Other embodiments of the invention will occur to those skilled in the art from consideration of this specification without the exercise of inventive faculty, and such equivalent modifications and alternatives are intended to be included within the scope of the invention as defined in the claims.

Claims (10)

1. A method for reducing computing resource occupancy in an NPU, comprising the steps of:
S1, acquiring information to be processed, and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
S2, carrying out preliminary feature extraction on the preprocessed information to generate a feature matrix corresponding to the preliminary features;
S3, sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
S4, screening all feature matrices of the sorted matrix group to generate a basic matrix group, and sending the basic matrix group to a cache unit of the neural network processor;
S5, positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the neural network processor;
and S6, parsing the positioning information by the neural network processor, and calling the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation.
2. The method for reducing computing resource occupation in an NPU according to claim 1, wherein in S1, the preprocessing of the information to be processed specifically includes the following steps:
S11, judging whether the information to be processed is image information or video information; if it is image information, executing S12, and if it is video information, executing S13;
S12, converting the image information into the RMVB format;
S13, extracting key frames from the video information to obtain key frame images, and converting the key frame images into the RMVB format.
3. The method for reducing computing resource occupation in an NPU according to claim 2, wherein in S2, the preliminary feature extraction performed on the preprocessed information to generate a feature matrix corresponding to the preliminary features specifically includes the following steps:
S21, performing color channel separation on the preprocessed image to separate it into a red channel image, a green channel image and a blue channel image;
S22, carrying out preliminary feature extraction on the separated image through the HOG image feature extraction algorithm to obtain an image with preliminary features;
S23, importing the image with the preliminary features into a preset feature conversion model to generate the feature matrix corresponding to the preliminary features.
4. The method for reducing computing resource occupation in an NPU according to claim 3, wherein in S22, the HOG image feature extraction algorithm comprises the steps of:
S221, performing grayscale conversion, gamma correction, overlapping block normalization and segmentation on the separated image;
S222, calculating the gradient direction and the gradient amplitude of each pixel of the segmented image, and forming a histogram of oriented gradients from the gradient direction and gradient amplitude of each pixel;
S223, convolving the histogram of oriented gradients with the kernel of the gradient operator to extract HOG features, and concatenating the extracted HOG features to form the preliminary features.
5. The method for reducing computing resource occupation in an NPU according to claim 4, wherein in S4, the screening of all feature matrices of the sorted matrix group specifically comprises the steps of:
retaining only one copy of each set of identical feature matrices to generate a basic matrix group in which all feature matrices are distinct, wherein the basic matrix group contains every kind of feature matrix present in the sorted matrix group.
6. The method of claim 5, wherein in S5 the positioning information includes separation symbols for separating different feature matrices.
7. The method for reducing computing resource occupation in an NPU according to claim 6, wherein in S6, the neural network processor parses the positioning information and calls the corresponding feature matrices of the basic matrix group from the cache unit according to the parsed positioning information to perform the associated calculation, specifically comprising the following steps:
S61, the neural network processor separates the positioning information into a plurality of pieces of feature positioning information corresponding to different features in the image or video, according to the separation symbols in the positioning information;
S62, calling the feature matrices from the cache unit in the order given by each piece of feature positioning information and performing the calculation.
8. A system for reducing computing resource occupancy in an NPU, implementing the method for reducing computing resource occupancy in an NPU as recited in any one of claims 1 to 7, the system comprising:
the reading module is used for acquiring information to be processed and preprocessing the information to be processed, wherein the information to be processed comprises image or video information;
the initial extraction module is used for performing preliminary feature extraction on the preprocessed information and generating the feature matrices corresponding to the preliminary features;
the sorting module is used for sorting all the feature matrices according to a preset calculation order to obtain a sorted matrix group with a specific arrangement order;
the matrix extraction module is used for screening all feature matrices of the sorted matrix group, generating a basic matrix group and sending the basic matrix group to a cache unit of the neural network processor;
the positioning information conversion module is used for positioning all feature matrices of the sorted matrix group, generating positioning information containing only the position information of each feature matrix, and sending the positioning information to the calculation module of the neural network processor;
and the calculation module is used for parsing the positioning information and calling the corresponding feature matrices from the cache unit according to the parsed positioning information to perform the calculation.
9. An apparatus comprising at least one processor, at least one memory, and a data bus, the processor comprising a central processor and a neural network processor; wherein: the central processing unit, the neural network processor and the memory complete mutual communication through the data bus; the memory stores program instructions for execution by the processor, the at least one processor invoking the program instructions to perform a method of reducing computing resource usage in an NPU as recited in any of claims 1-7.
10. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements a method of reducing computing resource usage in an NPU as claimed in any of claims 1 to 7.
CN202310422636.4A 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU Active CN116126548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310422636.4A CN116126548B (en) 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU


Publications (2)

Publication Number Publication Date
CN116126548A 2023-05-16
CN116126548B (en) 2023-08-01

Family

ID=86312202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310422636.4A Active CN116126548B (en) 2023-04-20 2023-04-20 Method, system, equipment and storage medium for reducing resource occupation in NPU

Country Status (1)

Country Link
CN (1) CN116126548B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103825845A (en) * 2014-03-17 2014-05-28 北京航空航天大学 Matrix decomposition-based packet scheduling algorithm of reconfigurable VOQ (virtual output queuing) structure switch
CN108205680A (en) * 2017-12-29 2018-06-26 深圳云天励飞技术有限公司 Image characteristics extraction integrated circuit, method, terminal
CN109190756A (en) * 2018-09-10 2019-01-11 中国科学院计算技术研究所 Arithmetic unit based on Winograd convolution and the neural network processor comprising the device
US20200279153A1 (en) * 2019-03-01 2020-09-03 Microsoft Technology Licensing, Llc Deriving a concordant software neural network layer from a quantized firmware neural network layer
CN113469350A (en) * 2021-07-07 2021-10-01 武汉魅瞳科技有限公司 Deep convolutional neural network acceleration method and system suitable for NPU
CN115734091A (en) * 2021-08-27 2023-03-03 思特威(上海)电子科技股份有限公司 Image sensor, image processing method, terminal, and computer storage medium


Also Published As

Publication number Publication date
CN116126548B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US10282643B2 (en) Method and apparatus for obtaining semantic label of digital image
CN111950723A (en) Neural network model training method, image processing method, device and terminal equipment
KR100708130B1 (en) Apparatus and method for extracting moving image
CN107943811B (en) Content publishing method and device
CN110930327B (en) Video denoising method based on cascade depth residual error network
CN113051236B (en) Method and device for auditing video and computer-readable storage medium
CN114494981B (en) Action video classification method and system based on multi-level motion modeling
CN111476124A (en) Camera detection method and device, electronic equipment and system
CN111428590B (en) Video clustering segmentation method and system
CN112163120A (en) Classification method, terminal and computer storage medium
CN112188236B (en) Video interpolation frame model training method, video interpolation frame generation method and related device
CN109871790B (en) Video decoloring method based on hybrid neural network model
CN111310516B (en) Behavior recognition method and device
CN105979283A (en) Video transcoding method and device
CN116126548B (en) Method, system, equipment and storage medium for reducing resource occupation in NPU
EP4047547A1 (en) Method and system for removing scene text from images
CN112532938B (en) Video monitoring system based on big data technology
CN112288748B (en) Semantic segmentation network training and image semantic segmentation method and device
CN111241365B (en) Table picture analysis method and system
CN108921792B (en) Method and device for processing pictures
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN108287817B (en) Information processing method and device
CN112016554B (en) Semantic segmentation method and device, electronic equipment and storage medium
CN110647898A (en) Image processing method, image processing device, electronic equipment and computer storage medium
US11024067B2 (en) Methods for dynamic management of format conversion of an electronic image and devices thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant