CN113012689A - Electronic equipment and deep learning hardware acceleration method

Info

Publication number
CN113012689A
CN113012689A
Authority
CN
China
Prior art keywords
information
chip
processed
neural network
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110407570.2A
Other languages
Chinese (zh)
Other versions
CN113012689B (en)
Inventor
韩大强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aich Technology Co Ltd
Original Assignee
Chengdu Aich Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aich Technology Co Ltd filed Critical Chengdu Aich Technology Co Ltd
Priority to CN202110407570.2A priority Critical patent/CN113012689B/en
Publication of CN113012689A publication Critical patent/CN113012689A/en
Application granted granted Critical
Publication of CN113012689B publication Critical patent/CN113012689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an electronic device and a deep learning hardware acceleration method, and relates to the field of electronic technology. The electronic device includes a system on chip containing a hardware acceleration unit, and an off-chip memory in communication with the system on chip. The off-chip memory stores the deep neural network parameters; the system on chip acquires the information to be processed and the deep neural network parameters; the hardware acceleration unit in the system on chip processes the information to be processed with a deep neural network based on the deep neural network parameters, obtains an information identification result, and generates an identification end signal; the system on chip then reads the information identification result in response to the identification end signal. The method is applied to the electronic device. Used for information identification, the method improves the timeliness with which the electronic device acquires information, thereby improving the overall performance and processing efficiency of the electronic device.

Description

Electronic equipment and deep learning hardware acceleration method
Technical Field
The invention relates to the technical field of electronics, in particular to an electronic device and a deep learning hardware acceleration method.
Background
With the development of the field of electronic technology, speech processing, as an important branch of that field, has also developed rapidly.
At present, speech processing needs to apply algorithms such as signal processing and deep learning. A general-purpose chip, however, generally adopts the von Neumann architecture (sequentially executing instruction fetching, data reading, operation, and data writing), which makes it difficult to meet the real-time processing requirement on voice. In addition, signal-processing and deep-learning algorithms place a high computational demand on the processing system, so the system's power consumption is high and the cost is high.
Disclosure of Invention
The invention aims to provide an electronic device and a deep learning hardware acceleration method, so as to solve the problems that the existing voice processing approach has difficulty meeting the real-time processing requirement on voice, and that the high computational demand of signal-processing and deep-learning algorithms leads to high system power consumption and high cost.
In a first aspect, the present invention provides an electronic device, comprising: a system on chip including a hardware acceleration unit, and an off-chip memory in communication with the system on chip;
the off-chip memory is used for storing the deep neural network parameters;
the system on chip is used for acquiring information to be processed and the deep neural network parameters;
the hardware acceleration unit in the system on chip is used for processing the information to be processed by adopting a deep neural network based on the deep neural network parameters, obtaining an information identification result and generating an identification end signal;
the system on chip is also used for reading the information identification result in response to the identification end signal.
With the above technical scheme, accelerated processing of the information to be processed is achieved through a system on chip that includes a hardware acceleration unit. This makes it possible to meet the real-time processing requirement on voice, reduces the computational demand that information processing places on the system, lowers the system's power consumption, and further reduces cost.
In a possible implementation manner, after acquiring the information to be processed, and before processing the information to be processed with the deep neural network based on the deep neural network parameters to obtain the information identification result, the system on chip is further configured to:
perform feature extraction on the information to be processed to obtain the feature-extracted information to be processed.
In a possible implementation manner, the information to be processed consists of multidimensional feature vectors with m rows and n columns, and the first-column feature values of at least two consecutive multidimensional feature vectors are consecutive.
In a possible implementation manner, the time from processing the information to be processed to obtaining the information identification result is M/N seconds, where M is a total data amount of a coefficient matrix corresponding to the system on chip, and N represents a bandwidth of the off-chip memory.
In one possible implementation, the system on chip has a first storage area and a second storage area, and the deep neural network has K fully-connected layers; the output data of the i-th fully-connected layer is stored in the first storage area, the output data of the (i-1)-th fully-connected layer is stored in the second storage area, and 2 ≤ i ≤ K.
In a possible implementation manner, the input data of each fully-connected layer are the deep neural network parameters, the information to be processed, and the historical output data of the preceding fully-connected layer corresponding to that layer; the input data and output data of each fully-connected layer satisfy: the output data equal the product of the deep neural network parameters and the information to be processed, summed with the output data of the preceding fully-connected layer.
In a possible implementation manner, the fully-connected layers include at least two hidden layers, and the deep neural network parameters, the information to be processed, and the information identification result satisfy:

G = A · B · M_1 · M_2 ⋯ M_K · F;

wherein G represents the information identification result; A represents the information to be processed; B represents the deep neural network parameters; M_i represents the hidden-layer data corresponding to the i-th hidden layer; F represents the output parameters corresponding to the output data; and i and K represent the hidden-layer index and the number of hidden layers.
In a second aspect, the present invention further provides a deep learning hardware acceleration method, applied to an electronic device having an on-chip system including a hardware acceleration unit and an off-chip memory in communication with the on-chip system, the method including:
the off-chip memory stores deep neural network parameters;
the system on chip acquires information to be processed and the deep neural network parameters;
the hardware acceleration unit in the system on chip adopts a deep neural network to process the information to be processed based on the deep neural network parameters to obtain an information identification result and generate an identification end signal;
and the system on chip responds to the identification ending signal and reads the information identification result.
The beneficial effect of the deep learning hardware acceleration method provided by the second aspect is the same as that of the electronic device described in the first aspect or any possible implementation manner of the first aspect, and details are not repeated here.
In a third aspect, the present invention further provides a computer storage medium, where instructions are stored, and when the instructions are executed, the deep learning hardware acceleration method described in the second aspect or any possible implementation manner of the second aspect is implemented.
The beneficial effect of the computer storage medium provided by the third aspect is the same as that of the deep learning hardware acceleration method described in the second aspect or any possible implementation manner of the second aspect, and details are not repeated here.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an electronic device in a speech recognition scenario according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a speech feature parameter provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a deep neural network according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an operation storage structure according to an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a saving scenario of a feature vector according to an embodiment of the present application;
FIG. 7 is a flowchart illustrating a deep learning hardware acceleration method according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a chip according to an embodiment of the present invention.
Detailed Description
In order to facilitate clear description of technical solutions of the embodiments of the present invention, in the embodiments of the present invention, terms such as "first" and "second" are used to distinguish the same items or similar items having substantially the same functions and actions. For example, the first threshold and the second threshold are only used for distinguishing different thresholds, and the sequence order of the thresholds is not limited. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order or quantity, nor do the terms "first," "second," etc. denote any order or importance.
It is to be understood that the terms "exemplary" or "such as" are used herein to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present concepts related in a concrete fashion.
In the present invention, "at least one" means one or more, "a plurality" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a and b combination, a and c combination, b and c combination, or a, b and c combination, wherein a, b and c can be single or multiple.
Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 1, the electronic device includes: a system-on-chip 10 including a hardware acceleration unit 101 and an off-chip memory 20 in communication with the system-on-chip 10;
the off-chip memory 20 is used for storing deep neural network parameters;
the system on chip 10 is configured to obtain information to be processed and a deep neural network parameter;
the hardware acceleration unit 101 in the system on chip 10 is configured to process information to be processed by using a deep neural network based on a deep neural network parameter, obtain an information recognition result, and generate a recognition end signal;
the system-on-chip 10 is further adapted to read the information recognition result in response to the recognition end signal.
For example, the information to be processed may be voice information. In an application scenario of accelerated recognition of voice information, fig. 2 illustrates a schematic diagram of an electronic device in a voice recognition scenario provided by an embodiment of the present invention. As shown in fig. 2, the system on chip may include a deep neural network (DNN) hardware acceleration engine 01, a central processing unit (CPU) 02, an on-chip random access memory (RAM) 03, a Mel-frequency cepstral coefficient (MFCC) block 07, a Flash controller 08, and a dynamic random access memory (DRAM) controller 09; the off-chip memory may include a Flash memory 04, an input first-in first-out queue (INPUT FIFO) 05, and an output first-in first-out queue (OUTPUT FIFO) 06.
The CPU can configure the relevant parameters and read the DNN coefficient vectors and the vectors corresponding to the hidden layers from the Flash memory, that is, read the deep neural network parameters; the DNN coefficient vectors include the coefficient vectors corresponding to the input layer, each hidden layer, the output layer, and so on, and all the vectors are stored in the on-chip RAM. The MFCC block extracts voice feature parameters from the input speech and stores them (i.e., the information to be processed) in the INPUT FIFO. When the valid data in the FIFO reaches a preset amount and the DNN engine is idle, the DNN hardware acceleration engine is started to perform the matrix operation. The DNN acceleration engine receives the voice feature parameters through the INPUT FIFO and reads the coefficient matrix parameters required for the matrix operation through the Flash controller. The matrix operation result (i.e., the information identification result) is stored in the OUTPUT FIFO, so that the CPU can read it and decode it to obtain the recognized voice.
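The dataflow just described (MFCC → INPUT FIFO → DNN engine → OUTPUT FIFO → CPU) can be sketched in a few lines. The following Python fragment is only a minimal behavioral model: the FIFO depth, matrix sizes, and all function names are illustrative assumptions, not elements of the patent.

```python
from collections import deque

import numpy as np

# Assumed, illustrative sizes; the patent does not specify FIFO depths here.
FIFO_DEPTH = 16
FEATURE_DIM = 90
OUTPUT_DIM = 3072

input_fifo = deque()    # stands in for INPUT FIFO 05
output_fifo = deque()   # stands in for OUTPUT FIFO 06

def mfcc_extract(frame):
    """Stand-in for the MFCC block 07: one feature vector per speech frame."""
    return np.random.randn(FEATURE_DIM).astype(np.float32)

def dnn_engine(batch, coeffs):
    """Stand-in for the DNN hardware acceleration engine 01: one matrix operation."""
    return batch @ coeffs

# Coefficient matrix as it would be read through the Flash controller 08.
coeffs = np.random.randn(FEATURE_DIM, OUTPUT_DIM).astype(np.float32)

for frame in range(64):
    input_fifo.append(mfcc_extract(frame))
    # The engine starts only when enough valid data has accumulated
    # (and, per the patent, only when the DNN engine is idle).
    if len(input_fifo) >= FIFO_DEPTH:
        batch = np.stack([input_fifo.popleft() for _ in range(FIFO_DEPTH)])
        output_fifo.append(dnn_engine(batch, coeffs))

while output_fifo:
    result = output_fifo.popleft()   # CPU 02 reads the matrix operation result
    # ...decoding of `result` into recognized speech would happen here
```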
The electronic device provided by the embodiment of the invention can realize the accelerated processing of the information to be processed through the system on chip comprising the hardware acceleration unit, further can meet the real-time processing requirement on voice, reduces the requirement of the information processing on the computing power of the system, reduces the power consumption of the system, and further reduces the cost.
Optionally, after the system on chip 10 is configured to obtain the information to be processed, the information to be processed is processed by using the deep neural network based on the deep neural network parameter, and before the information identification result is obtained, the system on chip is further configured to: and performing feature extraction on the information to be processed to obtain the information to be processed after feature extraction.
When the information to be processed is voice information, the voice information may first be denoised after it is acquired. Specifically, a log-spectral-distance method based on Voice Activity Detection (VAD) technology may be used to divide the voice information into voice frames and noise frames.
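A log-spectral-distance VAD of the kind referred to above can be sketched as follows. The noise-template length and the decision threshold are assumptions for illustration only; a production VAD would add smoothing and hangover logic.

```python
import numpy as np

def log_spectrum(frame, eps=1e-10):
    """Log power spectrum of one speech frame."""
    return np.log(np.abs(np.fft.rfft(frame)) ** 2 + eps)

def vad_log_spectral_distance(frames, n_noise=10, threshold=6.0):
    """Label frames as voice (True) or noise (False) by their log-spectral
    distance to a noise template estimated from the first n_noise frames."""
    template = np.mean([log_spectrum(f) for f in frames[:n_noise]], axis=0)
    labels = []
    for f in frames:
        dist = np.sqrt(np.mean((log_spectrum(f) - template) ** 2))
        labels.append(dist > threshold)
    return labels

# Usage on synthetic data: 50 frames of 256 samples, quiet at the start.
frames = [np.random.randn(256) * (0.1 if i < 10 else 1.0) for i in range(50)]
speech_mask = vad_log_spectral_distance(frames)
```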
Optionally, the information to be processed is multidimensional characteristic vectors in m rows and n columns, and the characteristic vector values of at least two first columns corresponding to at least two continuous multidimensional characteristic vectors are continuous.
Each frame of denoised voice information is processed by the MFCC to obtain a number of feature vector sequences; each feature vector contains the same number of nodes. Fig. 3 shows a schematic diagram of the voice feature parameters provided by an embodiment of the present invention; as shown in fig. 3, the feature vector sequences are V1, V2, V3, and so on.
An off-chip controller of the electronic equipment reads the deep neural network parameters from the off-chip memory, that is, reads the parameters required for the first-layer operation of the DNN from the Flash memory; these parameters can be understood as a parameter matrix B and are stored in the on-chip RAM of the system on chip.
The system on chip may include a DNN hardware acceleration engine that implements accelerated operation of a deep neural network with multiple hidden layers. Fig. 4 shows a network structure diagram of a DNN provided in an embodiment of the present application. As shown in fig. 4, the model corresponding to the DNN hardware acceleration engine includes a one-level input layer W, a one-level output layer E, and K levels of hidden layers, where each hidden layer contains d nodes and the output stage contains n nodes. The basic formulas of the model are:

G = A · B · M_1 · M_2 ⋯ M_K · F (1);

wherein G denotes the information identification result; A denotes the information to be processed, i.e., the parameter matrix A; B denotes the deep neural network parameters, i.e., the parameter matrix B; M_i denotes the hidden-layer data corresponding to the i-th hidden layer; F denotes the output parameters corresponding to the output data; and i and K denote the hidden-layer index and the number of hidden layers, i.e., the number of levels.

A = (a_{t,i}), a matrix of t rows and i columns (2);

wherein a represents each frame of speech data, t represents the number of rows of the parameter matrix, and i represents the number of columns of the parameter matrix.

B = (b_{i,d}), a matrix of i rows and d columns (3);

wherein b represents the input-layer parameters, i represents the number of rows of the parameter matrix, and d represents the number of columns, i.e., the number of nodes.

M = (m_d), with d elements (4);

wherein m represents the hidden-layer parameters and d represents the number of nodes of the hidden layer.

F = (f_{d,n}), a matrix of d rows and n columns (5);

wherein f represents the output-layer parameters, d represents the number of rows of the parameter matrix, and n represents the number of columns of the parameter matrix.
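Read together, formulas (1) through (5) amount to a chain of matrix products. Below is a minimal NumPy sketch of that chain; the sizes are taken from the embodiment described later, and the d-by-d shape assumed for the hidden-layer matrices is an illustrative assumption (formula (4) only fixes the node count d).

```python
import numpy as np

# Illustrative sizes from the embodiment: 11 frames per slice, 990 input
# nodes, 512 hidden nodes, 3072 output nodes, K = 5 hidden layers.
t, i, d, n, K = 11, 990, 512, 3072, 5

A = np.random.randn(t, i).astype(np.float32)                      # formula (2)
B = np.random.randn(i, d).astype(np.float32)                      # formula (3)
M = [np.random.randn(d, d).astype(np.float32) for _ in range(K)]  # formula (4)
F = np.random.randn(d, n).astype(np.float32)                      # formula (5)

# Formula (1): G = A · B · M_1 · M_2 ⋯ M_K · F
G = A @ B
for M_i in M:
    G = G @ M_i
G = G @ F        # G has shape (t, n): one identification-result row per slice
```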
Optionally, the system on chip has a first storage area and a second storage area, and the deep neural network has K fully-connected layers; the output data of the i-th fully-connected layer is stored in the first storage area, the output data of the (i-1)-th fully-connected layer is stored in the second storage area, and 2 ≤ i ≤ K.
Here the fully-connected layers comprise the one-level input layer, the one-level output layer, and the K levels of hidden layers.
Optionally, the input data of each fully-connected layer are the deep neural network parameters, the information to be processed, and the historical output data of the preceding fully-connected layer corresponding to that layer; the input data and output data of each fully-connected layer satisfy: the output data equal the product of the deep neural network parameters and the information to be processed, summed with the output data of the preceding fully-connected layer.
In this application, fig. 5 shows a schematic diagram of an operation storage structure provided by an embodiment of the present invention. As shown in fig. 5, a is one data node of matrix A, b is a node of matrix B, and c0 is the original value of the corresponding node in matrix C; that is, C is a historical-information matrix. This structure separates the computation from the control-algorithm logic and arranges the processing in arrays, which improves both the reusability of the algorithm and the reuse rate of the operation unit. After the DNN hardware acceleration engine completes its operation, it raises an interrupt and generates an identification end signal, notifying the CPU to read the information identification result (i.e., the voice data processed by the DNN) and decode it to obtain the recognized voice.
For example, as shown in fig. 5, the input data of the input layer, that is, the previous layer's output (matrix C), may be stored in G0 (GROUP0); the deep neural network parameters (B) corresponding to the current layer are read from the off-chip memory; the input data (A and C) and the neural network parameters (B) then undergo a multiply-accumulate operation, and the result of A × B + C is stored in G1 (GROUP1), which completes one layer of the DNN operation. When the next layer is computed, the output held in G1 is used as the input-layer data, and the result of its operation with the deep neural network parameters of that layer, read from the off-chip memory, is stored in G0. Alternating in this way repeatedly implements multi-layer fully-connected DNN accelerated operation, reduces the dependence on off-chip storage, reduces the number of on-chip multipliers, alleviates the memory-wall problem in electronic devices such as deep-learning chips, greatly improves the operational capability of the system, and further reduces the system's operating power consumption.
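A compact sketch of the G0/G1 ping-pong scheme follows, assuming two NumPy buffers stand in for the on-chip storage areas and that the history term C starts at zero; all sizes are illustrative.

```python
import numpy as np

d, K = 512, 5                      # illustrative hidden width and layer count
G0 = np.random.randn(1, d)         # GROUP0: input data of the current layer
G1 = np.zeros((1, d))              # GROUP1: accumulator for the current output

for layer in range(K):
    B = np.random.randn(d, d)      # layer coefficients streamed from off-chip memory
    G1 = G0 @ B + G1               # multiply-accumulate: A × B + C (c0 is history)
    G0, G1 = G1, np.zeros((1, d))  # swap: this output is the next layer's input
```

The swap is the whole point of the scheme: only two buffers are needed no matter how many layers the network has, which is what lets the design hold every intermediate result on chip.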
Optionally, a time from processing the information to be processed to obtaining the information identification result is M/N seconds, where M is a total data amount of the coefficient matrix corresponding to the system on chip, and N represents a bandwidth of the off-chip memory.
For example, suppose the total data volume of the coefficient matrices corresponding to the input layer, the hidden layers, the output layer, and so on is M bits, and the average bandwidth for continuously reading the Flash memory is N bits per second (bps); then the time to read all layer parameters into memory is M/N seconds. Since the DNN matrix computation time equals the time to read the matrix coefficients into memory, the time from preparing the input matrix A to producing the output matrix G is M/N seconds. The minimum number of rows (Slices) of the input matrix is then M/N for a frame-sliding step of 1, and M/N/2 for a frame-sliding step of 2. When the input layer includes 990 nodes, the hidden layers include 512 nodes each, and the output layer includes 3072 nodes, with the input-layer coefficient matrix stored as 32-bit floating-point numbers and the other layer parameters as 8-bit integer coefficients, the total matrix data volume is 990 × 512 × 32 + 512 × 512 × 8 × 5 + 3072 × 512 × 8 ≈ 37.5 Mbits. The effective bandwidth of the Flash memory is about 380 Mbps in practice, so the maximum delay is 37.5/380 ≈ 0.1 seconds; that is, the processing time from speech input to completion of the DNN operation is about 100 milliseconds.
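The arithmetic in the preceding paragraph can be checked directly. A short sketch (the 512 × 512 shape of the hidden-layer matrices is an assumption implied by the 512-node layers):

```python
# Coefficient-matrix data volume, in bits, for the sizes in the embodiment.
input_bits = 990 * 512 * 32        # input layer, 32-bit floating point
hidden_bits = 512 * 512 * 8 * 5    # five 512x512 hidden layers, 8-bit integers
output_bits = 3072 * 512 * 8       # output layer, 8-bit integers

M = input_bits + hidden_bits + output_bits
print(M / 2**20)                   # ~37.5 Mbits in total

N = 380e6                          # effective Flash bandwidth, ~380 Mbps
print(M / N)                       # ~0.1 s from speech input to DNN completion
```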
By way of example, assume that the information to be processed is sampled at a given sampling frequency.
After the VAD detects voice information, the electronic equipment triggers the off-chip controller to read the parameter data required for the input-layer operation from the Flash memory and store it in the INPUT FIFO. Taking a frame as the unit, the electronic device extracts voice feature parameters from the input speech through the MFCC, outputting a 90-dimensional floating-point voice feature vector per frame, which forms the input matrix A. To preserve the correlation of the data, this application processes multiple frames at a time, for example 90 × 11 = 990 values (the number of nodes forming each row of the input matrix); after processing 11 consecutive frames, the starting frame advances by one frame to form a new Slice for operation. Fig. 6 shows a schematic view of a storage scenario of the feature vectors provided in this embodiment of the present application. As shown in fig. 6, the first feature vector V0 is stored in SRAM0, the second feature vector V1 in SRAM1, and so on; the 16th feature vector is stored in SRAM15 and the 17th back in SRAM0, looping in this way. This combines ping-pong and circular operation between the off-chip SDRAM and the on-chip RAM, improving the reusability of the algorithm and the reuse rate of the operation unit.
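The sliding-window Slice construction and the 16-bank circular storage just described might be modeled as below. The bank count and the 90 × 11 slice shape come from the embodiment; everything else is illustrative.

```python
import numpy as np

NUM_BANKS, DIM, WINDOW = 16, 90, 11   # 16 SRAMs, 90-dim vectors, 11-frame slices
banks = [None] * NUM_BANKS            # stand-ins for SRAM0 .. SRAM15

def store(idx, vec):
    banks[idx % NUM_BANKS] = vec      # the 17th vector wraps back to SRAM0

def slice_at(start):
    """Concatenate 11 consecutive feature vectors into one 990-node input row."""
    return np.concatenate([banks[(start + k) % NUM_BANKS] for k in range(WINDOW)])

for f in range(20):
    store(f, np.random.randn(DIM).astype(np.float32))
    if f >= WINDOW - 1:
        row = slice_at(f - WINDOW + 1)    # the window advances one frame per slice
        assert row.size == DIM * WINDOW   # 90 × 11 = 990 nodes per input row
```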
It should be noted that in the present application the parameters of each Slice are the same, and a parallel processing manner for multiple Slices may be adopted; that is, after the Slices have accumulated to a preset number, the DNN parameters are read once and the Slices are operated on in parallel, which balances the two indexes of system bandwidth and time delay. The number of Slices processed in parallel may be greater than or equal to 1 and less than or equal to 16; this is not specifically limited in the embodiment of the present application and may be adjusted flexibly according to the actual application scenario. The electronic device can read the parameters required for the next set of Slice operations while operating on the previous set. After all the Slices have been processed, an interrupt is sent to the CPU, which processes them frame by frame; finally the normalized data are decoded by the CPU to obtain the recognized voice, which completes the identification processing, and the electronic device repeats this procedure when new voice data arrive.
A full-network operation between the input-layer matrix A in the INPUT FIFO and the parameter matrix B yields the first hidden layer M_1, and the output result is buffered in the on-chip RAM. A full-network operation between the hidden-layer matrix M_i and the parameter matrix F in the INPUT FIFO yields the hidden layer M_{i+1}.
In the application, the operation result of each layer can be stored in the on-chip RAM and used as the input matrix of the next stage: the storage area corresponding to the previous output matrix serves as the input area, and the storage area corresponding to the previous input matrix serves as the output area. By alternately using the output matrix of one stage as the input matrix of the next in this way, neural-network hardware accelerated operation with any number of layers can be achieved, which alleviates the memory-wall problem in the electronic equipment, greatly improves the operational capability of the system, and further reduces the system's operating power consumption.
For example, when the number of hidden layers is 5, the full-network operation of the hidden-layer matrix M_i with the parameter matrix F in the INPUT FIFO may be repeated five times to obtain the final input matrix M_5; the matrix M_5 is then computed with the output-layer parameter matrix to obtain the output matrix G of the DNN.
The electronic device provided by the embodiment of the invention can realize the accelerated processing of the information to be processed through the system on chip comprising the hardware acceleration unit, further can meet the real-time processing requirement on voice, reduces the requirement of the information processing on the computing power of the system, reduces the power consumption of the system, and further reduces the cost.
Fig. 7 is a flowchart illustrating a deep learning hardware acceleration method according to an embodiment of the present invention, which is applied to an electronic device including an on-chip system having a hardware acceleration unit and an off-chip memory in communication with the on-chip system. As shown in fig. 7, the deep learning hardware acceleration method includes:
step 301: the system on chip acquires information to be processed and deep neural network parameters.
After the system on chip obtains the information to be processed and the deep neural network parameters, step 302 is performed.
Step 302: the off-chip memory stores deep neural network parameters.
After the off-chip memory stores the deep neural network parameters, step 303 is performed.
Step 303: and a hardware acceleration unit in the system on chip processes the information to be processed by adopting the deep neural network based on the deep neural network parameters, obtains an information identification result and generates an identification ending signal.
After the hardware acceleration unit processes the information to be processed using the deep neural network based on the deep neural network parameters, obtains an information identification result, and generates an identification end signal, step 304 is executed.
Step 304: and the system on chip responds to the recognition end signal and reads the information recognition result.
For example, the information to be processed may be voice information, and in an application scenario of accelerated recognition of voice information, fig. 2 shows a schematic diagram of an electronic device provided by an embodiment of the present invention in a voice recognition scenario, as shown in fig. 2, a system on chip may include a DNN hardware acceleration engine 01, a CPU02, an on chip RAM03, an MFCC07, a Flash controller 08, and a DRAM controller 09, and an off chip memory may include a Flash memory 04, an INPUT FIFO05, and an OUTPUT FIFO 06.
The CPU can configure the relevant parameters and read the DNN coefficient vectors and the vectors corresponding to the hidden layers from the Flash memory, that is, read the deep neural network parameters; the DNN coefficient vectors include the coefficient vectors corresponding to the input layer, each hidden layer, the output layer, and so on, and all the vectors are stored in the on-chip RAM. The MFCC block extracts voice feature parameters from the input speech and stores them (i.e., the information to be processed) in the INPUT FIFO. When the valid data in the FIFO reaches a preset amount and the DNN engine is idle, the DNN hardware acceleration engine is started to perform the matrix operation. The DNN acceleration engine receives the voice feature parameters through the INPUT FIFO and reads the coefficient matrix parameters required for the matrix operation through the Flash controller. The matrix operation result (i.e., the information identification result) is stored in the OUTPUT FIFO, so that the CPU can read it and decode it to obtain the recognized voice.
The electronic device provided by the embodiment of the invention can realize the accelerated processing of the information to be processed through the system on chip comprising the hardware acceleration unit, further can meet the real-time processing requirement on voice, reduces the requirement of the information processing on the computing power of the system, reduces the power consumption of the system, and further reduces the cost.
The deep learning hardware acceleration method provided by the invention is applied to the electronic equipment comprising an on-chip system with a hardware acceleration unit and an off-chip memory communicated with the on-chip system, such as the electronic equipment shown in fig. 1 to 6, and is not repeated here for avoiding repetition.
The electronic device in the embodiment of the present invention may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the like; the non-mobile electronic device may be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, or a self-service machine, and the like; the embodiment of the present invention is not particularly limited in this regard.
The electronic device in the embodiment of the present invention may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; embodiments of the present invention are not specifically limited in this regard.
Fig. 8 is a schematic diagram illustrating a hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 8, the electronic device 400 includes a processor 410.
As shown in fig. 8, the processor 410 may be a central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the present invention.
As shown in fig. 8, the electronic device 400 may further include a communication line 440. Communication link 440 may include a path for transmitting information between the aforementioned components.
Optionally, as shown in fig. 8, the electronic device may further include a communication interface 420. The communication interface 420 may be one or more. Communication interface 420 may use any transceiver or the like for communicating with other devices or a communication network.
Optionally, as shown in fig. 8, the electronic device may further include a memory 430. The memory 430 is used to store computer-executable instructions for performing aspects of the present invention and is controlled for execution by the processor. The processor is used for executing the computer execution instructions stored in the memory, thereby realizing the method provided by the embodiment of the invention.
As shown in fig. 8, the memory 430 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 430 may be separate and coupled to the processor 410 via the communication line 440. The memory 430 may also be integrated with the processor 410.
Optionally, the computer-executable instructions in the embodiment of the present invention may also be referred to as application program codes, which is not specifically limited in this embodiment of the present invention.
In particular implementations, as one embodiment, processor 410 may include one or more CPUs, such as CPU0 and CPU1 in fig. 8, as shown in fig. 8.
In one embodiment, as shown in fig. 8, the terminal device may include a plurality of processors, such as processor 410 and processor 450 in fig. 8. Each of these processors may be a single core processor or a multi-core processor.
Fig. 9 is a schematic structural diagram of a chip according to an embodiment of the present invention. As shown in fig. 9, the chip 500 includes one or more (including two) processors 510.
Optionally, as shown in fig. 9, the chip further includes a communication interface 520 and a memory 530, and the memory 530 may include a read-only memory and a random access memory and provide operating instructions and data to the processor. The portion of memory may also include non-volatile random access memory (NVRAM).
In some embodiments, as shown in FIG. 9, memory 530 stores elements, execution modules or data structures, or a subset thereof, or an expanded set thereof.
In the embodiment of the present invention, as shown in fig. 9, by calling an operation instruction stored in the memory (the operation instruction may be stored in the operating system), a corresponding operation is performed.
As shown in fig. 9, the processor 510 controls the processing operation of any one of the terminal devices, and the processor 510 may also be referred to as a Central Processing Unit (CPU).
As shown in fig. 9, the memory 530 may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory 530 may also include NVRAM. In application, the processor, the communication interface, and the memory are coupled together by a bus system, which may include a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as the bus system 540 in fig. 9.
As shown in fig. 9, the method disclosed in the above embodiments of the present invention can be applied to a processor or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
In one aspect, a computer-readable storage medium is provided, in which instructions are stored, and when executed, the instructions implement the functions performed by the terminal device in the above embodiments.
In one aspect, a chip is provided, where the chip is applied in a terminal device, and the chip includes at least one processor and a communication interface, where the communication interface is coupled to the at least one processor, and the processor is configured to execute instructions to implement the functions performed by an electronic device including an on-chip system with a hardware acceleration unit and an off-chip memory in communication with the on-chip system in the foregoing embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present invention are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a terminal, a user device, or another programmable apparatus. The computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The usable medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state drive (SSD).
While the invention has been described in connection with various embodiments, other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a review of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the word "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
While the invention has been described in conjunction with specific features and embodiments thereof, it will be evident that various modifications and combinations can be made thereto without departing from the spirit and scope of the invention. Accordingly, the specification and figures are merely exemplary of the invention as defined in the appended claims and are intended to cover any and all modifications, variations, combinations, or equivalents within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An electronic device, comprising: a system on chip including a hardware acceleration unit, and an off-chip memory in communication with the system on chip;
the off-chip memory is used for storing deep neural network parameters;
the system on chip is used for acquiring information to be processed and the deep neural network parameters;
the hardware acceleration unit in the system on chip is used for processing the information to be processed by adopting a deep neural network based on the deep neural network parameters, obtaining an information identification result and generating an identification end signal;
the system on chip is also used for reading the information identification result in response to the identification end signal.
2. The electronic device of claim 1, wherein after acquiring the information to be processed and before obtaining the information identification result, the system on chip is further configured to:
perform feature extraction on the information to be processed to obtain the feature-extracted information to be processed.
3. The electronic device according to claim 1, wherein the information to be processed consists of multidimensional feature vectors with m rows and n columns, and the first-column feature values of at least two consecutive multidimensional feature vectors are consecutive.
4. The electronic device according to claim 1, wherein a time from processing the information to be processed to obtaining the information identification result is M/N seconds, where M is a total data amount of a coefficient matrix corresponding to the system on chip, and N represents a bandwidth of the off-chip memory.
5. The electronic device of claim 1, wherein the system on chip has a first storage area and a second storage area, and the deep neural network has K fully-connected layers; the output data of the i-th fully-connected layer is stored in the first storage area, the output data of the (i-1)-th fully-connected layer is stored in the second storage area, and 2 ≤ i ≤ K.
6. The electronic device according to claim 5, wherein the input data of each fully-connected layer are the deep neural network parameters, the information to be processed, and the historical output data of the preceding fully-connected layer corresponding to that layer, and the input data and output data of each fully-connected layer satisfy: the output data equal the product of the deep neural network parameters and the information to be processed, summed with the output data of the preceding fully-connected layer.
7. The electronic device of claim 6, wherein the fully-connected layers comprise at least two hidden layers, and the deep neural network parameters, the information to be processed, and the information identification result satisfy:

G = A · B · M_1 · M_2 ⋯ M_K · F;

wherein G represents the information identification result; A represents the information to be processed; B represents the deep neural network parameters; M_i represents the hidden-layer data corresponding to the i-th hidden layer; F represents the output parameters corresponding to the output data; and i and K represent the hidden-layer index and the number of hidden layers.
8. A deep learning hardware acceleration method, applied to an electronic device comprising a system on chip that includes a hardware acceleration unit and an off-chip memory in communication with the system on chip, the method comprising:
the off-chip memory stores deep neural network parameters;
the system on chip acquires information to be processed and the deep neural network parameters;
the hardware acceleration unit in the system on chip adopts a deep neural network to process the information to be processed based on the deep neural network parameters to obtain an information identification result and generate an identification end signal;
and the system on chip responds to the identification ending signal and reads the information identification result.
9. The method of claim 8, wherein after the system on chip obtains the information to be processed, and before the information to be processed is processed using the deep neural network based on the deep neural network parameters, the method further comprises:
and the off-chip controller performs feature extraction on the information to be processed to obtain the information to be processed after feature extraction.
10. The method according to claim 8, wherein the information to be processed consists of multidimensional feature vectors with m rows and n columns, and the first-column feature values of at least two consecutive multidimensional feature vectors are consecutive.
CN202110407570.2A 2021-04-15 2021-04-15 Electronic equipment and deep learning hardware acceleration method Active CN113012689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110407570.2A CN113012689B (en) 2021-04-15 2021-04-15 Electronic equipment and deep learning hardware acceleration method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110407570.2A CN113012689B (en) 2021-04-15 2021-04-15 Electronic equipment and deep learning hardware acceleration method

Publications (2)

Publication Number Publication Date
CN113012689A true CN113012689A (en) 2021-06-22
CN113012689B CN113012689B (en) 2023-04-07

Family

ID=76389383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110407570.2A Active CN113012689B (en) 2021-04-15 2021-04-15 Electronic equipment and deep learning hardware acceleration method

Country Status (1)

Country Link
CN (1) CN113012689B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860049A (en) * 2023-03-02 2023-03-28 瀚博半导体(上海)有限公司 Data scheduling method and equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Deep convolutional neural network implementation method based on FPGA
US20180121796A1 (en) * 2016-11-03 2018-05-03 Intel Corporation Flexible neural network accelerator and methods therefor
CN207440765U (en) * 2017-01-04 2018-06-01 意法半导体股份有限公司 System on chip and mobile computing device
US20190043496A1 (en) * 2017-09-28 2019-02-07 Intel Corporation Distributed speech processing
CN110352434A (en) * 2017-02-28 2019-10-18 微软技术许可有限责任公司 Neural network processing with model pinning
CN111199276A (en) * 2020-01-02 2020-05-26 上海寒武纪信息科技有限公司 Data processing method and related product
US20200193274A1 (en) * 2018-12-18 2020-06-18 Microsoft Technology Licensing, Llc Training neural network accelerators using mixed precision data formats
CN112016669A (en) * 2019-05-31 2020-12-01 辉达公司 Training neural networks using selective weight updates
EP3790000A1 (en) * 2019-09-05 2021-03-10 SoundHound, Inc. System and method for detection and correction of a speech query

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228240A (en) * 2016-07-30 2016-12-14 复旦大学 Deep convolutional neural network implementation method based on FPGA
US20180121796A1 (en) * 2016-11-03 2018-05-03 Intel Corporation Flexible neural network accelerator and methods therefor
CN207440765U (en) * 2017-01-04 2018-06-01 意法半导体股份有限公司 System on chip and mobile computing device
CN108268941A (en) * 2017-01-04 2018-07-10 意法半导体股份有限公司 Deep convolutional network heterogeneous architecture
CN110352434A (en) * 2017-02-28 2019-10-18 微软技术许可有限责任公司 Neural network processing with model pinning
US20190043496A1 (en) * 2017-09-28 2019-02-07 Intel Corporation Distributed speech processing
US20200193274A1 (en) * 2018-12-18 2020-06-18 Microsoft Technology Licensing, Llc Training neural network accelerators using mixed precision data formats
CN112016669A (en) * 2019-05-31 2020-12-01 辉达公司 Training neural networks using selective weight updates
EP3790000A1 (en) * 2019-09-05 2021-03-10 SoundHound, Inc. System and method for detection and correction of a speech query
CN111199276A (en) * 2020-01-02 2020-05-26 上海寒武纪信息科技有限公司 Data processing method and related product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AHMET ERDEM et al.: "Design Space Exploration for Orlando Ultra Low-Power Convolutional Neural Network SoC", 2018 IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors *
郑茜: "Design of a convolutional neural network algorithm chip based on a reconfigurable computing platform", China Master's Theses Full-text Database (Information Science and Technology) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860049A (en) * 2023-03-02 2023-03-28 瀚博半导体(上海)有限公司 Data scheduling method and equipment

Also Published As

Publication number Publication date
CN113012689B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
US11551068B2 (en) Processing system and method for binary weight convolutional neural network
US11137981B2 (en) Operation processing device, information processing device, and information processing method
WO2017219991A1 (en) Optimization method and apparatus suitable for model of pattern recognition, and terminal device
CN112559800B (en) Method, apparatus, electronic device, medium and product for processing video
CN111488985A (en) Deep neural network model compression training method, device, equipment and medium
US20200285859A1 (en) Video summary generation method and apparatus, electronic device, and computer storage medium
WO2023174098A1 (en) Real-time gesture detection method and apparatus
CN108875519B (en) Object detection method, device and system and storage medium
WO2022152104A1 (en) Action recognition model training method and device, and action recognition method and device
CN113012689B (en) Electronic equipment and deep learning hardware acceleration method
CN111444807A (en) Target detection method, device, electronic equipment and computer readable medium
CN112771546A (en) Operation accelerator and compression method
CN112397086A (en) Voice keyword detection method and device, terminal equipment and storage medium
CN112784572A (en) Marketing scene conversational analysis method and system
CN108764206B (en) Target image identification method and system and computer equipment
US10915794B2 (en) Neural network classification through decomposition
WO2021238289A1 (en) Sequence processing method and apparatus
CN113361621B (en) Method and device for training model
CN111353428B (en) Action information identification method and device, electronic equipment and storage medium
CN111027682A (en) Neural network processor, electronic device and data processing method
CN113642510A (en) Target detection method, device, equipment and computer readable medium
CN112348121B (en) Target detection method, target detection equipment and computer storage medium
CN113554042A (en) Neural network and training method thereof
CN111582444A (en) Matrix data processing device, electronic equipment and storage medium
CN113538205B (en) SURF algorithm-based feature point detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant