CN110265002A - Speech recognition method, device, computer equipment and computer-readable storage medium - Google Patents

Speech recognition method, device, computer equipment and computer-readable storage medium

Info

Publication number: CN110265002A (application); CN110265002B (grant)
Application number: CN201910480466.9A
Authority: CN (China)
Prior art keywords: weight, neural network, carry, convolutional neural network, layer
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN110265002B (en)
Inventors: 刘玲, 欧阳鹏, 尹首一, 李秀东, 王博
Current assignee: Beijing Qingwei Intelligent Technology Co Ltd
Original assignee: Beijing Qingwei Intelligent Technology Co Ltd
Priority and filing date: 2019-06-04
Publication date: 2019-09-20 (CN110265002A); grant date: 2021-07-23 (CN110265002B)


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/06: Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063: Training
    • G10L15/08: Speech classification or search
    • G10L15/16: Speech classification or search using artificial neural networks
    • G10L15/26: Speech to text systems
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination

Abstract

The present invention provides a speech recognition method, device, computer equipment and computer-readable storage medium. The method comprises: performing downsampling on acquired audio data to obtain downsampled audio data; dividing the downsampled audio data into training audio data and test audio data; performing sparsification on the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network; training the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network; and performing speech recognition with the test audio data based on the trained binarized convolutional neural network. Because this scheme sparsifies the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network, considerable computation space and time can be saved.

Description

Speech recognition method, device, computer equipment and computer-readable storage medium
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method, device, computer equipment and computer-readable storage medium.
Background art
Speech recognition has formally entered every aspect of daily life as a new way for humans to interact with computers: mobile phones, game consoles and smart-home devices all require it. Speech recognition has been developed for decades and has recently surged in capability, which is largely attributable to deep learning. However, as the prediction accuracy of neural networks keeps improving, the required storage space and amount of computation also keep growing, and the demand on hardware resources increases continuously. The large memory footprint and computational intensity of neural networks seriously hinder their application in devices such as mobile phones, watches and mobile robots, so reducing storage and computation is imperative.
Many compression approaches now exist, such as SVD (Singular Value Decomposition), quantization and binarization. Binary neural networks are one such method: by turning single-precision floating-point coefficients into +1 or -1, they reduce network size and computation by tens of times. For example, binarizing the coefficients shrinks storage to 1/32 of the original, i.e. about 3%. On CPUs and GPUs supporting 64-bit operations, this implies a theoretical speedup of 64x. A binary network can therefore move a neural network that previously could only run on a server onto a smart watch.
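To make the storage arithmetic concrete, the following Python sketch binarizes single-precision weights by sign and packs them into bits; the function names and the 1000-weight example are illustrative assumptions, not part of the patented method.

```python
import numpy as np

def binarize(w: np.ndarray) -> np.ndarray:
    """Map single-precision weights to +1 / -1 by sign."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def pack_bits(wb: np.ndarray) -> np.ndarray:
    """Pack +1/-1 weights one bit each (+1 -> 1, -1 -> 0), so 32
    float32 weights fit in one 32-bit word: 1/32 of the original storage."""
    return np.packbits((wb.ravel() > 0).astype(np.uint8))

w = np.random.randn(1000).astype(np.float32)  # 4000 bytes as float32
packed = pack_bits(binarize(w))               # 125 bytes when bit-packed
```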
Because each element of the binary weight matrix W occupies one bit, the memory needed to save the model after training is very small, and ordinary multiplication operations are eliminated; the performance of the neural network can be maintained while the memory occupied by the model parameters and the amount of computation are reduced, which brings great promise for applying deep learning on mobile devices. Even such a neural network, however, still faces some resistance in achieving accurate, fast, low-latency, small-model, low-power speech recognition, because known binarization methods leave no sparsity in the parameters: without sparsification, the parameters cannot save space and time.
Summary of the invention
Embodiments of the present invention provide a speech recognition method, device, computer equipment and computer-readable storage medium. By sparsifying the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network, the network structure can be made sparser, thereby reducing the amount of computation.
The speech recognition method provided in an embodiment of the present invention comprises:

performing downsampling on acquired audio data to obtain downsampled audio data;

dividing the downsampled audio data into training audio data and test audio data;

performing sparsification on the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

training the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

performing speech recognition with the test audio data based on the trained binarized convolutional neural network.

The speech recognition device provided in an embodiment of the present invention comprises:
a downsampling module, configured to perform downsampling on acquired audio data to obtain downsampled audio data;

a data division module, configured to divide the downsampled audio data into training audio data and test audio data;

a sparsification module, configured to sparsify the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

a training module, configured to train the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

a speech recognition module, configured to perform speech recognition with the test audio data based on the trained binarized convolutional neural network.
The computer equipment provided in an embodiment of the present invention includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the speech recognition method described above.

The computer-readable storage medium provided in an embodiment of the present invention stores a computer program for executing the speech recognition method described above.

In one embodiment, by sparsifying the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network, the network structure can be made sparser; the sparsified binarized convolutional neural network is trained with the training audio data to obtain a trained binarized convolutional neural network, and speech recognition is then performed with the test audio data based on the trained network, thereby reducing the amount of computation.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart (I) of a speech recognition method provided in an embodiment of the present invention;

Fig. 2 is a schematic diagram of a prior-art implementation of multiplying data by weights;

Fig. 3 is a schematic diagram of an implementation of multiplying data by weights provided in an embodiment of the present invention;

Fig. 4 is a flowchart (II) of a speech recognition method provided in an embodiment of the present invention;

Fig. 5 is a circuit diagram of an approximate adder structure provided in an embodiment of the present invention;

Fig. 6 is a working principle diagram of a selector provided in an embodiment of the present invention;

Fig. 7 is a structural block diagram (I) of a speech recognition device provided in an embodiment of the present invention;

Fig. 8 is a structural block diagram (II) of a speech recognition device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In an embodiment of the present invention, a speech recognition method is provided. As shown in Fig. 1, the method comprises:

Step 101: performing downsampling on acquired audio data to obtain downsampled audio data;

Step 102: dividing the downsampled audio data into training audio data and test audio data;

Step 103: performing sparsification on the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

Step 104: training the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

Step 105: performing speech recognition with the test audio data based on the trained binarized convolutional neural network.
In an embodiment of the present invention, step 101 can be implemented as follows: downsampled feature extraction can be performed on each utterance in the training set, and the features can be reduced to 100 dimensions.
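The patent does not fix the feature pipeline, so the block-averaging reduction in the following Python sketch is only one plausible reading of "downsampled feature extraction reduced to 100 dimensions"; the function and its parameters are illustrative assumptions.

```python
import numpy as np

def downsampled_features(utterance: np.ndarray, target_dim: int = 100) -> np.ndarray:
    """Reduce one utterance (a 1-D array of samples) to a fixed
    target_dim-dimensional feature vector by averaging equal blocks."""
    usable = len(utterance) // target_dim * target_dim
    return utterance[:usable].reshape(target_dim, -1).mean(axis=1)

train_features = np.stack(
    [downsampled_features(np.random.randn(16000)) for _ in range(4)]
)  # shape (4, 100): one 100-dimensional feature vector per utterance
```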
In an embodiment of the present invention, a second compression is applied on top of the binarized convolutional neural network to achieve sparsification. Because the cost of the weights after binarization is still large relative to low-power requirements, the present invention applies sparsity compression to the weights in the convolutional layers and fully connected layers while accuracy is guaranteed (step 103). The sparsity compression works as follows:
(1) For the weights in a convolutional layer: for each convolution kernel, the high-order part of the kernel's weights is set to -1 (represented by 0 in hardware) according to a predetermined ratio, chosen at random; the predetermined ratio refers to the compression ratio specified at the start.

(2) For the weights in a fully connected layer: the value of the first weight of each layer determines whether the consecutive weights that follow are set to -1. If the first weight of a layer is greater than 0, the subsequent weights of that layer are unchanged; if the first weight of a layer is less than 0, a random number of consecutive weights following the first in that layer are set to -1 according to the predetermined ratio.

Making as many values as possible equal to -1 raises the sparsity of the inputs to the convolutional layers and fully connected layers. With as many -1 values as possible, the matrix multiplications used in the convolutional layers shrink greatly, and if the high-order weights in the fully connected layers are consecutive -1s, considerable computation space and time can be saved.
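As an illustration only, the following Python sketch applies the two rules above to binarized weight tensors; the ratio handling, flattening order and random run length are assumptions, since the patent leaves them to the chosen compression ratio.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sparsify_conv(kernels: np.ndarray, ratio: float) -> np.ndarray:
    """Rule (1): for each convolution kernel (one row after flattening),
    clamp the high-order share of its weights to -1 (stored as 0 in
    hardware) according to the predetermined compression ratio."""
    out = kernels.reshape(kernels.shape[0], -1).copy()
    n_high = int(out.shape[1] * ratio)  # how many high-order weights to clamp
    out[:, :n_high] = -1
    return out.reshape(kernels.shape)

def sparsify_fc(layer_w: np.ndarray, ratio: float) -> np.ndarray:
    """Rule (2): if the layer's first weight is positive, leave the layer
    unchanged; if it is negative, set a random-length run of consecutive
    weights after the first to -1."""
    flat = layer_w.ravel().copy()
    if flat[0] < 0:
        max_run = max(2, int(flat.size * ratio))
        run = int(rng.integers(1, max_run))  # random run length
        flat[1:1 + run] = -1
    return flat.reshape(layer_w.shape)
```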
Here is a simple example. The conventional implementation of multiplying data by weights is shown in Fig. 2: the result equals the data multiplied by the weight. The implementation proposed by the present invention is as follows:
The high-order bits of many weights are all 0, so most weight data is compressible. Example: 16'b0000_0000_XXXX_XXXX can be compressed to 8'bXXXX_XXXX + 1'b flag, where flag = 1 indicates that the data is compressed.
After the above compression, the new computation flow shown in Fig. 3 is used: the result equals the data multiplied by {the high-order non-zero weights and the compressed weights, selected through a MUX (data selector, multiplexer)}.
For example, with 1000 16-bit weights of which 500 are compressible, the above compression saves (1000x16 - 500x17 - 500x9) = 3000 bits of storage.
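A small Python sketch of this flag-based scheme is given below; the packing into (value, flag) tuples is an illustrative assumption, whereas real hardware would concatenate the bits as in the example above.

```python
def compress_weight(w16: int) -> tuple[int, int]:
    """If the high byte of a 16-bit weight is zero, keep only the low
    byte and set the flag to 1 (9 bits total); otherwise keep all 16
    bits with flag 0 (17 bits total)."""
    if (w16 >> 8) == 0:
        return w16 & 0xFF, 1
    return w16, 0

def decompress_weight(value: int, flag: int) -> int:
    """Inverse mapping: a flagged value simply has its high byte restored to zero."""
    return value & 0xFF if flag else value

# With 1000 weights of which 500 are compressible, storage drops from
# 1000*16 bits to 500*17 + 500*9 bits, saving 3000 bits, as stated above.
```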
In the prior art, the parameters in a binary network are all binarized, so in a convolutional neural network the convolutional layers and fully connected layers are all based on multiply-accumulate operations, which are easy to parallelize. In hardware architectures for convolutional neural networks, highly parallel computation modes, both in time and in space, are therefore the most common way to obtain high performance. In the design of some circuits, such as sensors and analog circuits, schemes that tolerate a certain error rate in order to save resources are fairly common. On this basis, approximate computing has recently become a popular research direction: by accepting some unreliability in the results, it overcomes the limitations of traditional design in order to improve performance, reduce power consumption and extend the reach of the technology. The application prospects of approximate computing are vast. As the data volumes handled by the cloud and by mobile devices grow, most applications can tolerate small errors without affecting function or user experience. For example, searching for a keyword returns thousands of results, but not every one matches the desired result exactly; in image or video processing, a small fraction of errors is unimportant or even unnoticeable. In applications based on statistical algorithms such as data mining, recognition, search and machine learning, what is required is not a single golden result but a sufficiently good match, so approximate computing can show great potential. Therefore, in addition to the sparsity compression above, the present invention also introduces an approximate adder into the hardware architecture to further improve hardware performance.
Specifically, as shown in Fig. 4, the speech recognition method can further include:

Step 106: replacing the matrix adders in the convolution operations of the binarized convolutional neural network with approximate adders based on the carry-chain cutting principle, obtaining a replaced binarized convolutional neural network.

In this case, step 104 is specifically: training the replaced binarized convolutional neural network with the training audio data to obtain the trained binarized convolutional neural network.
In an embodiment of the present invention, since the multiply-accumulate operations in the binarized convolutional neural network all reduce to additions, the approximate computation of the present invention considers only addition; that is, the computing units of the architecture are all built from approximate adders, which replace conventional exact adders to provide further acceleration. The present invention uses an approximate adder based on the carry-chain cutting principle, which works as follows:
First, three functions (also called signals) are defined: the carry-generate function g_i, the carry-propagate function p_i and the carry-kill function k_i:

g_i = a_i · b_i, p_i = a_i ⊕ b_i, k_i = (¬a_i) · (¬b_i)    (1-1)

where ⊕ denotes XOR (addition without carry); a_i and b_i denote the two input bits at position i; and ¬a_i, ¬b_i denote the negations of a_i and b_i. These functions determine the carry signal c_i at each position, from which the sum bit s_i is computed. The expressions for c_i and s_i are:

c_i = g_i + p_i · c_{i-1}, s_i = p_i ⊕ c_{i-1}    (1-2)
From expression (1-2) it can be seen that only when the carry-propagate signal p_i is true (equal to 1) does the carry signal c_i depend on the carry signal c_{i-1} of the previous position; otherwise c_i depends only on the carry-generate signal g_i or the carry-kill signal k_i (i.e. it is independent of earlier input bits). Similarly, c_{i-1} depends on c_{i-2} only when p_{i-1} is true, which also means that c_i depends on c_{i-2} only when p_i and p_{i-1} are both true. The general rule is therefore: only when the carry-propagate signals from bit i down to bit i-k+1 are all true does the carry signal c_i of bit i depend on the carry signal c_{i-k} of bit i-k.
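The following Python sketch makes the dependence rule of formulas (1-1) and (1-2) executable on LSB-first bit lists; it illustrates the standard generate/propagate/kill algebra, not circuitry from the patent.

```python
def exact_add_bits(a_bits, b_bits):
    """Bit-level exact addition using the carry generate/propagate/kill
    signals of formulas (1-1) and (1-2); bits are LSB first."""
    c_prev = 0
    s_bits = []
    for a_i, b_i in zip(a_bits, b_bits):
        g_i = a_i & b_i                    # carry generate
        p_i = a_i ^ b_i                    # carry propagate
        k_i = (1 - a_i) & (1 - b_i)        # carry kill
        s_bits.append(p_i ^ c_prev)        # s_i = p_i XOR c_{i-1}
        c_prev = g_i | (p_i & c_prev)      # c_i = g_i + p_i * c_{i-1}
        assert not (k_i and c_prev)        # a killed position never carries
    return s_bits, c_prev
```

For example, exact_add_bits([1, 0, 1, 0], [1, 1, 0, 0]) adds 5 and 3 (LSB first) and returns ([0, 0, 0, 1], 0), i.e. 8.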
The approximate adder circuit is composed of m circuit blocks. Each block has a k-bit adder, a k-bit carry generator and a selector; each selector cascades two adjacent carry generators, and k = n/m, where n denotes the data bit width of the addition, as shown in Fig. 5. The inputs of the j-th circuit block are denoted a^j and b^j, and its output s^j. When the input signals arrive, each carry generator first produces a carry-out signal c_out^j from the inputs of its own block (a^j and b^j). The selector then selects, according to the judgment condition P^j defined below, the carry-out of one of the two preceding carry generators as the carry-in c_in^{j+1} of the next block's adder. Finally, the adder of each block produces the sum output s^j. The critical path delay of the whole circuit is thus the sum of three circuit parts (carry generation, selector and adder), as indicated by the dashed box in Fig. 5, where the black part denotes the selector.
If the block-propagate signal of part j is true, the correct carry-out of part j is determined by the inputs that precede part j; if that carry cannot be delivered accurately to part j+1, the sum output will be wrong. The approximate structure therefore judges whether the condition P^j (all carry-propagate signals of part j true) holds, and controls the selector to choose the carry-out of part j-1 or of part j as the carry-in of the adder of part j+1: if the condition is true, the carry-out of part j-1 is selected; otherwise, the carry-out of part j is selected. The result is then more accurate. Analyzing this circuit, adding the selector effectively lengthens the carry chain to 2k, obtained by cascading two adjacent carry-generating circuits; moreover, the delay of a k-bit carry-generation chain is significantly greater than that of a selector, especially when k is large. The working-principle expression of the selector is:

c_in^{j+1} = P^j · c_out^{j-1} + (¬P^j) · c_out^j    (1-3)

where

P^j = p^j_{k-1} · p^j_{k-2} · ... · p^j_0    (1-4)
In formulas (1-3) and (1-4), c_out^{j-1} and c_out^j are the carry-out signals of the (j-1)-th and j-th circuit parts, and p^j_i is the carry-propagate signal of the i-th bit of the j-th part. A concrete example of the selector's operation is shown in Fig. 6: from the inputs A and B, both carry-out signals are known and are fed into the selector simultaneously; since the carry-propagate condition of part j is judged true, the selector outputs the carry-out of part j-1 as the carry-in of the adder of part j+1, so that, thanks to the selector, the carry signal is delivered correctly.
For example, in the 16-bit adder of the present invention the parameter k can be set to 4. The first 4 carries of the adder are consistent with the carry principle of an exact adder; after correction by the selector, the carry-chain length of c[7] (the 7th carry of the 16-bit addition) becomes 7 (counting from bit 0), so the carry-chain lengths of c[8], c[9] and c[10] are 8, 9 and 10 respectively; after correction by the selector, the carry-chain length of c[11] becomes 7 (counting from bit 4 of the input), and the carry-chain lengths of c[12], c[13] and c[14] are likewise 8, 9 and 10 respectively.
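As an illustration of the scheme just described, here is a Python sketch of an n-bit approximate adder with carry-chain cutting: each k-bit block has a carry generator that computes its carry-out with carry-in fixed at 0, and a selector driven by the block-propagate condition P^j chooses between the carry-outs of blocks j and j-1. All names and the bit-slicing are assumptions for illustration, not the patented circuit itself.

```python
def approximate_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Approximate n-bit addition by carry-chain cutting: m = n/k blocks,
    each with a k-bit adder, a k-bit carry generator and a selector."""
    m = n // k
    mask = (1 << k) - 1
    a_blk = [(a >> (j * k)) & mask for j in range(m)]
    b_blk = [(b >> (j * k)) & mask for j in range(m)]

    # Carry generator of block j: carry-out computed with carry-in fixed to 0.
    cout = [(x + y) >> k for x, y in zip(a_blk, b_blk)]
    # Block-propagate P^j: every bit of the block has p_i = a_i XOR b_i = 1,
    # so the block's true carry-out would equal its carry-in.
    prop = [(x ^ y) == mask for x, y in zip(a_blk, b_blk)]

    cin = [0] * m
    for j in range(1, m):
        if j >= 2 and prop[j - 1]:
            # Selector: block j-1 only propagates, so forward block j-2's
            # carry-out, extending the effective carry chain to 2k bits.
            cin[j] = cout[j - 2]
        else:
            cin[j] = cout[j - 1]

    result = 0
    for j in range(m):
        result |= ((a_blk[j] + b_blk[j] + cin[j]) & mask) << (j * k)
    return result

# Exact whenever no carry must ripple through more than two consecutive blocks:
assert approximate_add(0x1234, 0x4321) == (0x1234 + 0x4321) & 0xFFFF
```

Errors appear only when a carry would have to ripple across more than two consecutive all-propagate blocks (for example 0x0FFF + 0x0001 with k = 4), which is the inaccuracy the structure trades for a critical path of one carry generator, one selector and one k-bit adder.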
Experimental results: the accuracy of the compressed model is compared with that of the model without compression, as shown in Table 1.

Table 1 (accuracy comparison of the compressed and uncompressed models; the table body is not reproduced in this text)
Based on the same inventive concept, an embodiment of the present invention also provides a speech recognition device, as described in the following embodiment. Since the principle by which the speech recognition device solves the problem is similar to that of the speech recognition method, the implementation of the device can refer to that of the method, and repeated parts are not described again. As used below, the term "unit" or "module" can be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiment is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 7 is a structural block diagram (I) of the speech recognition device of an embodiment of the present invention; as shown in Fig. 7, the device includes:
a downsampling module 701, configured to perform downsampling on acquired audio data to obtain downsampled audio data;

a data division module 702, configured to divide the downsampled audio data into training audio data and test audio data;

a sparsification module 703, configured to sparsify the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

a training module 704, configured to train the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

a speech recognition module 705, configured to perform speech recognition with the test audio data based on the trained binarized convolutional neural network.
In an embodiment of the present invention, the sparsification module 703 is specifically configured to sparsify the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network as follows:

for the weights in a convolutional layer: for each convolution kernel, setting the high-order part of the kernel's weights to -1 according to the predetermined ratio;

for the weights in a fully connected layer: determining, from the value of the first weight of each layer, whether the consecutive weights that follow in that layer are set to -1.

In an embodiment of the present invention, the sparsification module 703 is specifically configured to determine, from the value of the first weight of each layer, whether the consecutive weights that follow are set to -1, as follows:

if the first weight of a layer is greater than 0, the subsequent weights of that layer are unchanged;

if the first weight of a layer is less than 0, multiple consecutive weights following the first in that layer are set to -1.
In an embodiment of the present invention, as shown in Fig. 8, the speech recognition device further includes an addition replacement module 706, configured to replace, after the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network have been sparsified, the matrix adders in the convolution operations of the binarized convolutional neural network with approximate adders based on the carry-chain cutting principle, obtaining the replaced binarized convolutional neural network.

The training module 704 is then specifically configured to train the replaced binarized convolutional neural network with the training audio data to obtain the trained binarized convolutional neural network.
In an embodiment of the present invention, the addition replacement module 706 specifically uses the approximate adder based on the carry-chain cutting principle described by formulas (1-1) to (1-4) and the corresponding description above.
An embodiment of the present invention also provides computer equipment, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the speech recognition method described above.

An embodiment of the present invention also provides a computer-readable storage medium storing a computer program for executing the speech recognition method described above.
In summary, the speech recognition method, device, computer equipment and computer-readable storage medium proposed by the present invention have the following advantages: by sparsifying the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network, the network structure is made sparser; the sparsified network is trained with training audio data to obtain a trained binarized convolutional neural network, and speech recognition is then performed with test audio data based on the trained network, reducing the amount of computation. In addition, replacing the matrix adders in the convolution operations of the binarized convolutional neural network with approximate adders based on the carry-chain cutting principle provides further acceleration and reduces operation time.
Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, a system or a computer program product. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data-processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data-processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be stored in a computer-readable memory capable of directing a computer or another programmable data-processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions can also be loaded onto a computer or another programmable data-processing device, such that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

The above is only a preferred embodiment of the present invention and is not intended to limit the invention; for those skilled in the art, the embodiments of the present invention can have various modifications and variations. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A speech recognition method, characterized by comprising:

performing downsampling on acquired audio data to obtain downsampled audio data;

dividing the downsampled audio data into training audio data and test audio data;

performing sparsification on the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

training the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

performing speech recognition with the test audio data based on the trained binarized convolutional neural network.
2. The speech recognition method according to claim 1, characterized in that performing sparsification on the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network comprises:

for the weights in a convolutional layer: for each convolution kernel, setting the high-order part of the kernel's weights to -1 according to a predetermined ratio;

for the weights in a fully connected layer: determining, from the value of the first weight of each layer, whether the consecutive weights following it are set to -1.
3. The speech recognition method according to claim 2, characterized in that determining, from the value of the first weight of each layer, whether the consecutive weights following it in that layer are set to -1 comprises:

if the first weight of a layer is greater than 0, leaving the subsequent weights of that layer unchanged;

if the first weight of a layer is less than 0, setting multiple consecutive weights following the first in that layer to -1.
4. The speech recognition method according to claim 1, characterized by further comprising, after performing sparsification on the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network:

replacing the matrix adders in the convolution operations of the binarized convolutional neural network with approximate adders based on the carry-chain cutting principle.
5. The speech recognition method according to claim 4, characterized in that the approximate adder based on the carry-chain cutting principle is specified as follows:

three functions g_i, p_i and k_i are defined:

g_i = a_i · b_i, p_i = a_i ⊕ b_i, k_i = (¬a_i) · (¬b_i)    (1-1)

where g_i denotes the carry-generate function, p_i the carry-propagate function and k_i the carry-kill function; a_i and b_i denote the two input bits at position i; ¬a_i and ¬b_i denote the negations of a_i and b_i; and ⊕ denotes XOR (addition without carry);

the carry signal c_i and the sum bit s_i at each position are determined from the three functions:

c_i = g_i + p_i · c_{i-1}, s_i = p_i ⊕ c_{i-1}    (1-2)

where only when the carry-propagate signals from bit i down to bit i-k+1 are all true does the carry signal c_i of bit i depend on the carry signal c_{i-k} of bit i-k;

the approximate adder circuit is composed of m circuit blocks, each having a k-bit adder, a k-bit carry generator and a selector; each selector cascades two adjacent carry generators, and k = n/m, where n denotes the data bit width of the addition;

when the inputs arrive, each carry generator produces a carry-out signal c_out^j from the inputs a^j and b^j of the j-th circuit block; the selector selects, according to the judgment condition P^j, the carry-out of the j-th or the (j-1)-th circuit block as the carry-in of the adder of the (j+1)-th block: if P^j is true, the carry-out of the (j-1)-th circuit block is selected; otherwise, the carry-out of the j-th circuit block is selected as the carry-in c_in^{j+1}; the adder of each block then produces the sum output s^j, where a^j and b^j denote the inputs and s^j the output of the j-th circuit block;

the working-principle expression of the selector is:

c_in^{j+1} = P^j · c_out^{j-1} + (¬P^j) · c_out^j    (1-3)

P^j = p^j_{k-1} · p^j_{k-2} · ... · p^j_0    (1-4)

where c_out^{j-1} and c_out^j are the carry-out signals of the (j-1)-th and the j-th circuit blocks, and p^j_i is the carry-propagate signal of the i-th bit of the j-th circuit block.
6. A speech recognition device, characterized by comprising:

a downsampling module, configured to perform downsampling on acquired audio data to obtain downsampled audio data;

a data division module, configured to divide the downsampled audio data into training audio data and test audio data;

a sparsification module, configured to sparsify the weights in the convolutional layers and fully connected layers of a binarized convolutional neural network to obtain a sparsified binarized convolutional neural network;

a training module, configured to train the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network;

a speech recognition module, configured to perform speech recognition with the test audio data based on the trained binarized convolutional neural network.
7. The speech recognition device according to claim 6, characterized in that the sparsification module is specifically configured to sparsify the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network as follows:

for the weights in a convolutional layer: for each convolution kernel, setting the high-order part of the kernel's weights to -1 according to a predetermined ratio;

for the weights in a fully connected layer: determining, from the value of the first weight of each layer, whether the consecutive weights following it are set to -1.
8. The speech recognition device according to claim 6, characterized by further comprising an addition replacement module, configured to replace, after the weights in the convolutional layers and fully connected layers of the binarized convolutional neural network have been sparsified, the matrix adders in the convolution operations of the binarized convolutional neural network with approximate adders based on the carry-chain cutting principle.
9. Computer equipment comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the computer program, implements the speech recognition method of any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the speech recognition method of any one of claims 1 to 5.
CN201910480466.9A 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium Active CN110265002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480466.9A CN110265002B (en) 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110265002A 2019-09-20
CN110265002B 2021-07-23

Family

ID=67916581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480466.9A Active CN110265002B (en) 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110265002B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113409773A (en) * 2021-08-18 2021-09-17 中科南京智能技术研究院 Binaryzation neural network voice awakening method and system


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097509A (en) * 2006-06-26 2008-01-02 英特尔公司 Sparse tree adder
CN103259529A (en) * 2012-02-17 2013-08-21 京微雅格(北京)科技有限公司 Integrated circuit using carry skip chains
WO2018048907A1 (en) * 2016-09-06 2018-03-15 Neosensory, Inc. C/O Tmc+260 Method and system for providing adjunct sensory information to a user
CN109643228A (en) * 2016-10-01 2019-04-16 英特尔公司 Low energy consumption mantissa multiplication for floating point multiplication addition operation
WO2018102240A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
CN107153873A (en) * 2017-05-08 2017-09-12 中国科学院计算技术研究所 A kind of two-value convolutional neural networks processor and its application method
CN107203808A (en) * 2017-05-08 2017-09-26 中国科学院计算技术研究所 A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor
CN109214502A (en) * 2017-07-03 2019-01-15 清华大学 Neural network weight discretization method and system
CN108010515A (en) * 2017-11-21 2018-05-08 清华大学 A kind of speech terminals detection and awakening method and device
CN109100142A (en) * 2018-06-26 2018-12-28 北京交通大学 A kind of semi-supervised method for diagnosing faults of bearing based on graph theory
CN109787929A (en) * 2019-02-20 2019-05-21 深圳市宝链人工智能科技有限公司 Signal modulate method, electronic device and computer readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BENJAMIN GRAHAM: "Spatially-sparse convolutional neural networks", Computer Vision and Pattern Recognition *
DANDAN SONG: "Low Bits: Binary Neural Network for VAD and Wakeup", 2018 5th International Conference on Information Science and Control Engineering *
MATTHIEU COURBARIAUX: "Binarized Neural Networks: Training Neural Networks with Weights and Activations Constrained to +1 or -1", arXiv *
SHOUYI YIN: "A 141 uW, 2.46 pJ/Neuron Binarized Convolutional Neural Network based Self-learning", IEEE *
SONG HAN: "Deep Compression: Compressing Deep Neural Networks with Pruning", ICLR 2016 *
YAN-MIN QIAN: "Binary neural networks for speech recognition", Frontiers of Information Technology & Electronic Engineering *
YU PAN: "A Multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for Binary Convolutional Neural Networks", IEEE Transactions on Magnetics *


Also Published As

Publication number Publication date
CN110265002B (en) 2021-07-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant