CN110265002B - Speech recognition method, speech recognition device, computer equipment and computer readable storage medium - Google Patents

Speech recognition method, speech recognition device, computer equipment and computer readable storage medium Download PDF

Info

Publication number
CN110265002B
CN110265002B CN201910480466.9A CN201910480466A CN110265002B CN 110265002 B CN110265002 B CN 110265002B CN 201910480466 A CN201910480466 A CN 201910480466A CN 110265002 B CN110265002 B CN 110265002B
Authority
CN
China
Prior art keywords
neural network
layer
carry
bit
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910480466.9A
Other languages
Chinese (zh)
Other versions
CN110265002A (en
Inventor
Liu Ling
OuYang Peng
Yin Shouyi
Li Xiudong
Wang Bo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingwei Intelligent Technology Co ltd
Original Assignee
Beijing Qingwei Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingwei Intelligent Technology Co ltd filed Critical Beijing Qingwei Intelligent Technology Co ltd
Priority to CN201910480466.9A priority Critical patent/CN110265002B/en
Publication of CN110265002A publication Critical patent/CN110265002A/en
Application granted granted Critical
Publication of CN110265002B publication Critical patent/CN110265002B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/16Speech classification or search using artificial neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Abstract

The invention provides a speech recognition method, a speech recognition device, computer equipment and a computer-readable storage medium, wherein the speech recognition method comprises the following steps: performing down-sampling processing on the acquired audio data to obtain audio down-sampled data; dividing the audio down-sampled data into training audio data and test audio data; sparsifying the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network to obtain a sparsified binarized convolutional neural network; training the sparsified binarized convolutional neural network with the training audio data to obtain a trained binarized convolutional neural network; and performing speech recognition based on the trained binarized convolutional neural network using the test audio data. According to this scheme, sparsifying the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network saves considerable operation space and time.

Description

Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech recognition method, an apparatus, a computer device, and a computer-readable storage medium.
Background
Speech recognition has formally entered many aspects of our daily life and has become a new way for humans to interact with computers: smartphones, game consoles and smart homes all need speech recognition. After decades of development, speech recognition has rather suddenly become practical, and this is attributable to deep learning. However, as the prediction accuracy of neural networks keeps improving, the required storage space and amount of computation keep increasing as well, placing ever higher demands on hardware resources. The large storage space and large amount of computation required by neural networks seriously hinder their application to devices such as mobile phones, watches and mobile robots, so reducing the storage space and the amount of computation is imperative.
At present there are many compression methods, such as SVD (Singular Value Decomposition), quantization and binarization. The binary neural network is one of them: by changing the single-precision floating-point coefficients to +1 or -1, it reduces the model size and the amount of computation by tens of times; for example, binarizing the coefficients reduces the storage to 1/32 of the original, i.e., about 3%. On CPUs and GPUs that support 64-bit operations, this implies a theoretical speed-up of 64 times. A binary network can thus run, on a smart watch, a neural network that previously could only run on a server.
Because each element of the binary network weight W occupies only one bit, the memory required to store the trained model is very small, and the ordinary multiplication operations are removed, so the performance of the neural network can be maintained while the memory occupied by the model parameters and the amount of computation are reduced, which gives deep learning a very promising future on mobile terminals. However, even such a neural network still struggles to achieve accurate, fast, low-latency, small-model and low-power speech recognition, because the parameters produced by the binarization method have no sparsity, and without sparsifying the parameters no space or time can be saved.
Disclosure of Invention
The embodiment of the invention provides a speech recognition method, a speech recognition device, computer equipment and a computer-readable storage medium, wherein the network structure can be made sparser by sparsifying the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network, so that the amount of computation is reduced.
The voice recognition method provided by the embodiment of the invention comprises the following steps:
performing down-sampling processing on the acquired audio data to acquire audio down-sampled data;
dividing the audio downsampling data into training audio data and testing audio data;
carrying out sparsification processing on weights in the convolution layer and the full-connection layer of the binary convolutional neural network to obtain a sparse binary convolutional neural network;
training the thinned binary convolution neural network by using the training audio data to obtain a trained binary convolution neural network;
and performing voice recognition based on the trained binary convolution neural network by using the test audio data.
The speech recognition device provided by the embodiment of the invention comprises:
the down-sampling processing module is used for performing down-sampling processing on the acquired audio data to acquire audio down-sampling data;
a data classification module for classifying the audio downsampling data into training audio data and testing audio data;
the sparse processing module is used for carrying out sparse processing on the weights in the convolution layer and the full connection layer of the binary convolutional neural network to obtain a sparse binary convolutional neural network;
the training module is used for training the sparse binarization convolution neural network by utilizing the training audio data to obtain a trained binarization convolution neural network;
and the voice recognition module is used for carrying out voice recognition on the basis of the trained binary convolution neural network by utilizing the test audio data.
The computer device provided by the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the voice recognition method when executing the computer program.
The computer-readable storage medium stores a computer program for executing the voice recognition method.
In one embodiment, the network structure can be made sparser by sparsifying the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network; the sparsified binarized convolutional neural network is trained with the training audio data to obtain a trained binarized convolutional neural network, and speech recognition is then performed with the test audio data based on the trained network, so that the amount of computation can be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flow chart (one) of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a prior art implementation of multiplication of data and weights;
FIG. 3 is a schematic diagram of a data and weight multiplication implementation provided by an embodiment of the present invention;
fig. 4 is a flowchart of a speech recognition method according to an embodiment of the present invention (two);
FIG. 5 is a circuit diagram of an approximate adder architecture according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a selector according to an embodiment of the present invention;
fig. 7 is a block diagram (one) of a speech recognition apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram (two) of a speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In an embodiment of the present invention, a speech recognition method is provided, as shown in fig. 1, the method including:
step 101: performing down-sampling processing on the acquired audio data to acquire audio down-sampled data;
step 102: dividing the audio downsampling data into training audio data and testing audio data;
step 103: carrying out sparsification processing on weights in the convolution layer and the full-connection layer of the binary convolutional neural network to obtain a sparse binary convolutional neural network;
step 104: training the thinned binary convolution neural network by using the training audio data to obtain a trained binary convolution neural network;
step 105: and performing voice recognition based on the trained binary convolution neural network by using the test audio data.
In the embodiment of the present invention, step 101 may be implemented as follows: downsampling feature extraction is performed on each sentence (e.g., of the training set), and the features are reduced to 100 dimensions.
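As an illustration only (the patent does not spell out the exact feature pipeline), a minimal sketch of step 101 might look as follows; the decimation factor, the pooling scheme and the function name are assumptions made for the example:

```python
import numpy as np
from scipy.signal import decimate

def downsample_features(waveform: np.ndarray, factor: int = 4, dims: int = 100) -> np.ndarray:
    """Hypothetical sketch of step 101: decimate the audio, then pool it into `dims` values."""
    x = decimate(waveform, factor)                 # low-pass filter + down-sample
    chunks = np.array_split(x, dims)               # fixed number of chunks per sentence
    return np.array([c.mean() for c in chunks])    # one 100-dimensional feature vector

# Example: a 1-second sentence sampled at 16 kHz -> a 100-dimensional vector.
feat = downsample_features(np.random.randn(16000))
print(feat.shape)                                  # (100,)
```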
In the embodiment of the invention, the invention performs a secondary compression of the binarized convolutional neural network to make the network sparse. Because the storage and computation requirements of the weights are still large for low-power scenarios even after binarization, the invention performs sparse compression on the weights in the convolutional layer and the fully connected layer while preserving accuracy (step 103). The sparse compression method is as follows:
(1) For the weights in the convolutional layer: for each convolution kernel, the high-order weights of the kernel are randomly set to -1 at a preset proportion (in hardware, -1 is represented by 0); the preset proportion is the compression ratio specified in advance.
(2) For the weights in the fully connected layer: whether the subsequent consecutive weights of each layer are set to -1 is determined by the value of the first weight of that layer. If the first weight of a layer is greater than 0, the subsequent weights of that layer are not changed; if the first weight of a layer is less than 0, a number of weights (a random number, chosen at the preset proportion) following the first one are randomly set to -1.
Setting as many values as possible to -1 in this way improves the input sparsity of the convolution layer and the fully connected layer. The matrix-multiplication workload of the convolution layer is thereby greatly reduced, and when the high-order weights in the fully connected layer are consecutive -1s, operation space and time are saved; a minimal sketch of this sparsification is given below.
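The following NumPy sketch illustrates the two sparsification rules. It assumes the weights have already been binarized to ±1, and the reading of "set the high-order weights to -1 at a preset proportion" (here: a randomly chosen preset fraction of each kernel's positions) is an interpretation made for the example, not the patent's exact procedure:

```python
import numpy as np

def sparsify_conv(kernels: np.ndarray, ratio: float = 0.5) -> np.ndarray:
    """Convolution-layer rule: for each kernel, force a preset proportion of its
    weights to -1 at random positions (assumed reading of "high-order weights")."""
    out = kernels.copy()
    n = out.shape[1]
    n_set = int(n * ratio)                     # preset proportion per kernel
    for kern in out:                           # kern: one flattened convolution kernel
        idx = np.random.choice(n, size=n_set, replace=False)
        kern[idx] = -1                         # in hardware -1 is stored as 0
    return out

def sparsify_fc(weights: np.ndarray, ratio: float = 0.5) -> np.ndarray:
    """Fully-connected rule: the first weight of each row decides whether the
    following consecutive weights are (partly) forced to -1."""
    out = weights.copy()
    for row in out:
        if row[0] < 0:                         # first weight < 0 -> sparsify the rest
            rest = len(row) - 1
            idx = 1 + np.random.choice(rest, size=int(rest * ratio), replace=False)
            row[idx] = -1
        # first weight > 0 -> the row is left unchanged
    return out

conv_w = np.sign(np.random.randn(8, 3 * 3))    # 8 binarized 3x3 kernels, values in {-1, +1}
fc_w = np.sign(np.random.randn(4, 16))         # 4 output neurons, 16 inputs each
conv_s, fc_s = sparsify_conv(conv_w), sparsify_fc(fc_w)
```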
The following is a simple example:
A general implementation of multiplying data by a weight is shown in Fig. 2, i.e., result = data × weight. The implementation scheme provided by the invention is as follows:
The high-order bits of the weight data are mostly 0, so most of the weight data is compressible. Example:
16'b0000_0000_XXXX_XXXX may be compressed as: 8'bXXXX_XXXX + 1'b flag
where flag = 1 indicates that the data is compressed.
After this compression, the data and weights are used in the new computation flow shown in Fig. 3, i.e., the result equals the data multiplied by the weight selected through the MUX (the weight with non-zero high-order bits, or the compressed weight restored through the MUX) (MUX: data selector / multiplexer).
With this compression, the following network space can be saved: (1000×16 - 500×17 - 500×9) bits.
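A small sketch of this compression and of the MUX-style restoration of Fig. 3 is shown below, under the assumption that a weight whose upper 8 bits are all zero is stored as 8 bits plus a 1-bit flag; the helper names are hypothetical:

```python
def compress_weight(w16: int):
    """Return (payload, flag): flag = 1 means the upper 8 bits were all zero,
    so only 8 bits (plus the flag) need to be stored."""
    assert 0 <= w16 < (1 << 16)
    if (w16 >> 8) == 0:
        return w16 & 0xFF, 1        # 8 stored bits + 1-bit flag = 9 bits
    return w16, 0                   # 16 stored bits + 1-bit flag = 17 bits

def restore_weight(payload: int, flag: int) -> int:
    """MUX-like selection (cf. Fig. 3): a compressed weight is zero-extended to 16 bits."""
    return (payload & 0xFF) if flag else payload

def multiply(data: int, payload: int, flag: int) -> int:
    """result = data x weight, with the weight restored through the MUX."""
    return data * restore_weight(payload, flag)

# Example: a weight with an all-zero upper byte is stored in 9 bits instead of 17;
# for 1000 weights of which 500 are compressible the saving is
# 1000*16 - 500*17 - 500*9 bits, as stated in the text.
payload, flag = compress_weight(0b0000_0000_1011_0110)
print(flag, multiply(3, payload, flag))        # 1 546
```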
In the prior art, the parameters in a binary network are all binarized. In a convolutional neural network, both the convolutional layer and the fully connected layer are based on multiply-accumulate operations and are easy to parallelize, so in hardware architectures for convolutional neural networks the most common way to obtain high performance is highly parallel computation, both in time and in space. Moreover, in the design of some circuits, such as sensors and analog circuits, it is common to allow a certain error rate in order to save resources. On this basis, approximate computing has recently become a popular research direction: by accepting some unreliability in the result, the limitations of traditional design are overcome, which improves performance, reduces power consumption and sustains technology scaling. The application prospects of approximate computing are very broad. As the amount of data processed by the cloud and by mobile devices grows, most applications can tolerate small errors without affecting functionality or user experience. For example, a Baidu search keyword may return thousands of results, but not every one matches the desired result. This is especially true in image or video processing applications, where a small fraction of errors is insignificant or even imperceptible. In applications based on statistical algorithms such as data mining, recognition, search and machine learning, what is needed is not a single golden result but a class of sufficiently well matched results, and approximate computing can show great potential in these applications. Therefore, in addition to the sparse compression, the invention also introduces an approximate adder into the hardware architecture to further improve hardware performance.
Specifically, as shown in fig. 4, the speech recognition method may further include:
step 106: replacing a matrix adder in the convolution operation of the binary convolution neural network by adopting an approximate adder based on a carry chain cutting principle to obtain a replaced binary convolution neural network;
at this time, step 104 specifically includes: and training the replaced binary convolutional neural network by using the training audio data to obtain the trained binary convolutional neural network.
In the embodiment of the invention, because in the binarized convolutional neural network the multiply-accumulate operations in the convolutional layer are all converted into additions, the approximate computation of the invention only considers addition; that is, the computing units are all built from approximate adders, so replacing the traditional exact adders with approximate adders accelerates the computing units in the architecture. The approximate adder of the invention is based on the carry-chain cutting principle, which works as follows:
First, three functions (or signals) are defined: the carry generate function g_i, the carry propagate function p_i and the carry kill function k_i, with the specific expressions:
g_i = a_i · b_i,    p_i = a_i ⊕ b_i,    k_i = ¬a_i · ¬b_i        (1-1)
where ⊕ denotes the exclusive-OR (addition without carry); a_i and b_i denote the two input signals at bit i; and ¬a_i, ¬b_i denote the inverses of a_i and b_i respectively. From these functions the carry signal c_i on each bit can be determined, and from it the sum bit s_i. The expressions of the carry signal c_i and the sum bit s_i are:
c_i = 1 if g_i = 1;  c_i = 0 if k_i = 1;  c_i = c_{i-1} if p_i = 1        (1-2)
s_i = p_i ⊕ c_{i-1}        (1-3)
From expression (1-2) it can be seen that only when the carry propagate signal p_i is true does the carry signal c_i depend on the carry signal c_{i-1} of the previous bit; otherwise c_i depends only on the carry generate signal g_i or the carry kill signal k_i (i.e., it is independent of the earlier input bits). Likewise, only when p_{i-1} is true does c_{i-1} depend on c_{i-2}. This means that only when both p_i and p_{i-1} are true does c_i depend on c_{i-2}. The general rule follows: only when the carry propagate signals from bit i down to bit i-k+1 are all true does the carry signal c_i of bit i depend on the carry signal c_{i-k} of bit i-k.
The approximate adder circuit is composed of m circuit blocks, each containing a k-bit adder, a k-bit carry generator and a selector, where each selector cascades two adjacent carry generators, k = n/m, and n denotes the data bit width of the addition, as shown in Fig. 5. Denote the inputs of the j-th circuit block by a^j_[k-1:0] and b^j_[k-1:0] and its output by s^j_[k-1:0]. After the input signals are applied, each carry generator first generates a carry output signal c_out^j from the inputs a^j_[k-1:0] and b^j_[k-1:0] of its own block; the selector then, according to the judgment condition P^j = p^j_{k-1} · … · p^j_0, selects one of the carry outputs of the two preceding carry generators as the carry input signal c_in^{j+1} of the sum-output generator; finally, the adder of each block generates the output s^j_[k-1:0]. The critical path delay of the whole circuit is therefore the sum of the delays of the three parts (carry generator, selector and adder), as shown by the dashed box in Fig. 5, where the black part represents the selector.
If the carry propagate signals of the j-th block are all true, the correct carry output of the j-th block is determined by the inputs before the j-th block; a simply truncated carry chain cannot pass this carry accurately to the (j+1)-th block, so the sum output would be in error. The approximate structure therefore uses the judgment condition P^j = p^j_{k-1} · … · p^j_0 to control the selector to choose the carry output of either the j-th or the (j-1)-th block as the carry input of the (j+1)-th block adder: if P^j is true, the carry output of the (j-1)-th block is selected; otherwise, the carry output of the j-th block is selected. The result is thus considerably more accurate. Analyzing the circuit, adding the selector is equivalent to lengthening the carry chain by k bits; the same effect could be obtained by cascading two adjacent carry generation circuits, but the delay of a k-bit carry generation chain is clearly larger than that of one selector, especially when k is large. The working principle of the selector is expressed as:
c_in^{j+1} = P^j · c_out^{j-1} + ¬P^j · c_out^j        (1-4)
where P^j = p^j_{k-1} · p^j_{k-2} · … · p^j_0. In formula (1-4), c_out^{j-1} and c_out^j are the carry output signals of the (j-1)-th and j-th block circuits, and p^j_i is the carry propagate signal of the i-th bit of the j-th block. A concrete example of the operating principle of the selector is shown in Fig. 6: from the inputs A and B, the two carry signals c_out^{j-1} and c_out^j enter the selector simultaneously; since the carry propagate signals of the j-th block are all true, i.e. P^j = 1, the selector outputs c_out^{j-1} as the carry input c_in^{j+1} of the (j+1)-th block adder. Thanks to the selector, the carry signal is passed on correctly.
For example, in the 16-bit adder of the invention, the parameter k may be set to 4. The first 4 carry bits of the adder are consistent with the carry principle of an exact adder; c[7] (the 7th carry bit of the 16-bit addition) is modified by the selector so that its carry chain length becomes 7 (i.e., it starts from bit 0), and the carry chain lengths of c[8], c[9] and c[10] are 8, 9 and 10 respectively; c[11] is modified by the selector so that its carry chain length becomes 7 (i.e., it starts from bit 4 of the input), and the carry chain lengths of c[12], c[13] and c[14] are likewise 8, 9 and 10 respectively. A behavioural sketch of this block-wise approximate adder is given below.
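Putting the pieces together, the following behavioural sketch models the carry-chain-cutting approximate adder: m = n/k blocks, each block's carry generator working only from its own inputs, and the selector of formula (1-4) choosing the carry-in of the next block. The usage lines mirror the 16-bit, k = 4 configuration above; this is a software model for illustration under those assumptions, not the patent's hardware implementation:

```python
def approx_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Behavioural model of the carry-chain-cutting approximate adder.

    The n-bit operands are split into m = n // k blocks. Each block's carry
    generator computes its carry-out assuming a zero carry-in; the selector
    (formula (1-4)) then feeds block j+1 with the carry-out of block j-1 when
    every propagate bit of block j is true, and with that of block j otherwise.
    """
    m = n // k
    mask = (1 << k) - 1
    blk_a = [(a >> (j * k)) & mask for j in range(m)]
    blk_b = [(b >> (j * k)) & mask for j in range(m)]

    c_gen = [((blk_a[j] + blk_b[j]) >> k) & 1 for j in range(m)]   # per-block carry generator
    p_all = [(blk_a[j] ^ blk_b[j]) == mask for j in range(m)]      # P^j: all propagate bits true

    c_in = [0] * m                         # carry-in selected for each block's adder
    if m > 1:
        c_in[1] = c_gen[0]                 # block 0 has no "block -1" to fall back on
    for j in range(1, m - 1):
        c_in[j + 1] = c_gen[j - 1] if p_all[j] else c_gen[j]       # formula (1-4)

    result = 0
    for j in range(m):                     # each block adds its k bits plus the selected carry-in
        result |= ((blk_a[j] + blk_b[j] + c_in[j]) & mask) << (j * k)
    return result

# 16-bit adder with k = 4, as in the example above.
print(hex(approx_add(0x1234, 0x0F0F)))     # 0x2143  (exact)
print(hex(approx_add(0x00FF, 0x0001)))     # 0x100   (kept exact thanks to the selector)
```

In the second call the carry generated in the lowest block must travel through a block whose bits all propagate; a plainly truncated adder would drop it, whereas the selector forwards the carry-out of the lower block, matching the behaviour described for Fig. 6. In this model errors only remain for inputs whose carries must cross more than one fully propagating block.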
The experimental results are as follows: the accuracy of the compressed model and the uncompressed model are compared, as shown in table 1:
TABLE 1
(The accuracy figures of Table 1 are given as an image in the original publication.)
Based on the same inventive concept, the embodiment of the present invention further provides a speech recognition apparatus, as described in the following embodiments. Because the principle of the speech recognition device for solving the problem is similar to that of the speech recognition method, the implementation of the speech recognition device can refer to the implementation of the speech recognition method, and repeated details are not repeated. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 7 is a block diagram (a) of a speech recognition apparatus according to an embodiment of the present invention, as shown in fig. 7, including:
a down-sampling processing module 701, configured to perform down-sampling processing on the acquired audio data to obtain audio down-sampled data;
a data classification module 702 for dividing the audio downsampling data into training audio data and testing audio data;
a sparsification processing module 703, configured to perform sparsification processing on the weights in the convolution layer and the full-link layer of the binarization convolutional neural network to obtain a sparsified binarization convolutional neural network;
a training module 704, configured to train the sparse binarization convolutional neural network by using the training audio data, so as to obtain a trained binarization convolutional neural network;
a speech recognition module 705, configured to perform speech recognition based on the trained binarized convolutional neural network by using the test audio data.
In this embodiment of the present invention, the sparsification processing module 703 is specifically configured to:
the weights in the convolution layer and the full-connection layer of the binary convolution neural network are thinned as follows:
for the weights in the convolutional layer: for each convolution kernel, setting the high-order weights of the convolution kernel to -1 according to a preset proportion;
for the weights in the fully connected layer: determining, from the value of the first weight of each layer, whether the subsequent consecutive weights of that layer are set to -1.
In this embodiment of the present invention, the sparsification processing module 703 is specifically configured to:
whether the subsequent consecutive weights of each layer are set to -1 is determined by the value of the first weight of that layer, as follows:
if the first weight of a layer is greater than 0, the subsequent weights of that layer are not changed;
if the first weight of a layer is less than 0, a number of weights following the first one in that layer are set to -1.
In the embodiment of the present invention, as shown in fig. 8, the speech recognition apparatus further includes: an addition operation replacement module 706, configured to replace, after the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network have been sparsified, the matrix adder in the convolution operation of the binarized convolutional neural network with an approximate adder based on the carry-chain cutting principle, so as to obtain a replaced binarized convolutional neural network.
Wherein training module 704 is specifically configured to: and training the replaced binary convolutional neural network by using the training audio data to obtain the trained binary convolutional neural network.
In the embodiment of the present invention, the addition operation replacement module 706 specifically adopts the approximate adder based on the carry-chain cutting principle described by the above formulas (1-1) to (1-4) and the corresponding description.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the voice recognition method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program for executing the speech recognition method is stored in the computer-readable storage medium.
In summary, the speech recognition method, the speech recognition apparatus, the computer device and the computer-readable storage medium provided by the present invention have the following advantages:
the network structure can be more sparse by performing sparsification treatment on the weights in the convolution layer and the full-connection layer of the binarization convolutional neural network, the training audio data is utilized to train the sparse binarization convolutional neural network to obtain a trained binarization convolutional neural network, then the test audio data is utilized, and voice recognition is performed on the basis of the trained binarization convolutional neural network, so that the calculated amount can be reduced. In addition, an approximate adder based on the carry chain cutting principle is adopted to replace a matrix adder in the convolution operation of the binary convolution neural network, so that the acceleration effect can be further played, and the operation time is reduced.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A speech recognition method, comprising:
performing down-sampling processing on the acquired audio data to acquire audio down-sampled data;
dividing the audio downsampling data into training audio data and testing audio data;
carrying out sparsification processing on weights in the convolution layer and the full-connection layer of the binary convolutional neural network to obtain a sparse binary convolutional neural network;
training the thinned binary convolution neural network by using the training audio data to obtain a trained binary convolution neural network;
carrying out voice recognition based on the trained binary convolution neural network by utilizing the test audio data;
after the weight in the convolution layer and the full-link layer of the binary convolution neural network is thinned, the method further comprises the following steps:
and replacing a matrix adder in the convolution operation of the binary convolution neural network by adopting an approximate adder based on a carry chain cutting principle.
2. The speech recognition method of claim 1, wherein the thinning of the weights in the convolution layer and the full-connected layer of the binarized convolutional neural network comprises:
for weights in convolutional layers: for each convolution kernel, setting the high order of the weight of the convolution kernel to-1 according to a preset proportion;
for weights in the fully connected layer: whether to set the weight of the following succession of each layer to-1 is determined by the value of the first bit weight of each layer.
3. The speech recognition method of claim 2, wherein determining, from the value of the first weight of each layer, whether the subsequent consecutive weights of that layer are set to -1 comprises:
if the first weight of a layer is greater than 0, the subsequent weights of that layer are not changed;
if the first weight of a layer is less than 0, a number of weights following the first one in that layer are set to -1.
4. The speech recognition method of claim 1, wherein the approximate adder based on the carry-chain cutting principle is specifically as follows:
three functions g_i, p_i and k_i are defined:
g_i = a_i · b_i,    p_i = a_i ⊕ b_i,    k_i = ¬a_i · ¬b_i
wherein g_i denotes the carry generate function; a_i and b_i denote the two input signals at bit i; ¬a_i and ¬b_i denote the inverses of a_i and b_i respectively; k_i denotes the carry kill function; p_i denotes the carry propagate function; and ⊕ denotes the exclusive-OR (addition without carry);
the carry signal c_i of each bit, and from it the sum bit s_i, are determined from the three functions, the expressions of the carry signal c_i and the sum bit s_i being:
c_i = 1 if g_i = 1;  c_i = 0 if k_i = 1;  c_i = c_{i-1} if p_i = 1
s_i = p_i ⊕ c_{i-1}
wherein only when the carry propagate signals from bit i down to bit i-k+1 are all true does the carry signal c_i of bit i depend on the carry signal c_{i-k} of bit i-k;
the approximate adder circuit is composed of m circuit blocks, each circuit block having a k-bit adder, a k-bit carry generator and a selector, each selector cascading two adjacent carry generators, k = n/m, and n denoting the data bit width of the addition operation;
after the input signals are applied, each carry generator generates a carry output signal c_out^j from the inputs a^j_[k-1:0] and b^j_[k-1:0] of the j-th circuit block; the selector selects, according to the judgment condition P^j = p^j_{k-1} · … · p^j_0, the carry output signal of the j-th or the (j-1)-th circuit block as the carry input signal c_in^{j+1} of the (j+1)-th block adder: if P^j is true, the carry output signal of the (j-1)-th block is selected; otherwise, the carry output signal of the j-th block is selected; the adder of each circuit block then generates the output s^j_[k-1:0];
wherein a^j_[k-1:0] and b^j_[k-1:0] denote the inputs of the j-th circuit block, and s^j_[k-1:0] denotes the output of the j-th circuit block;
the working principle of the selector is expressed as:
c_in^{j+1} = P^j · c_out^{j-1} + ¬P^j · c_out^j
wherein P^j = p^j_{k-1} · p^j_{k-2} · … · p^j_0; c_out^{j-1} and c_out^j are the carry output signals of the (j-1)-th and j-th block circuits, and p^j_i is the carry propagate signal of the i-th bit of the j-th circuit block.
5. A speech recognition apparatus, comprising:
the down-sampling processing module is used for performing down-sampling processing on the acquired audio data to acquire audio down-sampling data;
a data classification module for classifying the audio downsampling data into training audio data and testing audio data;
the sparse processing module is used for carrying out sparse processing on the weights in the convolution layer and the full connection layer of the binary convolutional neural network to obtain a sparse binary convolutional neural network;
the addition operation replacement module is used for replacing, after the weights in the convolution layer and the fully connected layer of the binarized convolutional neural network have been sparsified, the matrix adder in the convolution operation of the binarized convolutional neural network with an approximate adder based on the carry-chain cutting principle;
the training module is used for training the sparse binarization convolution neural network by utilizing the training audio data to obtain a trained binarization convolution neural network;
and the voice recognition module is used for carrying out voice recognition on the basis of the trained binary convolution neural network by utilizing the test audio data.
6. The speech recognition apparatus of claim 5, wherein the sparsification processing module is specifically configured to:
the weights in the convolution layer and the full-connection layer of the binary convolution neural network are thinned as follows:
for the weights in the convolutional layer: for each convolution kernel, setting the high-order weights of the convolution kernel to -1 according to a preset proportion;
for the weights in the fully connected layer: determining, from the value of the first weight of each layer, whether the subsequent consecutive weights of that layer are set to -1.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the speech recognition method of any one of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the speech recognition method according to any one of claims 1 to 4.
CN201910480466.9A 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium Active CN110265002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910480466.9A CN110265002B (en) 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910480466.9A CN110265002B (en) 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110265002A CN110265002A (en) 2019-09-20
CN110265002B true CN110265002B (en) 2021-07-23

Family

ID=67916581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910480466.9A Active CN110265002B (en) 2019-06-04 2019-06-04 Speech recognition method, speech recognition device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110265002B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852361B (en) * 2019-10-30 2022-10-25 清华大学 Image classification method and device based on improved deep neural network and electronic equipment
CN111583940A (en) * 2020-04-20 2020-08-25 东南大学 Very low power consumption keyword awakening neural network circuit
CN112863520B (en) * 2021-01-18 2023-10-24 东南大学 Binary weight convolutional neural network module and method for identifying voiceprint by using same
CN113409773B (en) * 2021-08-18 2022-01-18 中科南京智能技术研究院 Binaryzation neural network voice awakening method and system
CN114822510B (en) * 2022-06-28 2022-10-04 中科南京智能技术研究院 Voice awakening method and system based on binary convolutional neural network

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097509A (en) * 2006-06-26 2008-01-02 英特尔公司 Sparse tree adder
CN103259529A (en) * 2012-02-17 2013-08-21 京微雅格(北京)科技有限公司 Integrated circuit using carry skip chains
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN107153873A (en) * 2017-05-08 2017-09-12 中国科学院计算技术研究所 A kind of two-value convolutional neural networks processor and its application method
CN107203808A (en) * 2017-05-08 2017-09-26 中国科学院计算技术研究所 A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor
WO2018048907A1 (en) * 2016-09-06 2018-03-15 Neosensory, Inc. C/O Tmc+260 Method and system for providing adjunct sensory information to a user
CN108010515A (en) * 2017-11-21 2018-05-08 清华大学 A kind of speech terminals detection and awakening method and device
WO2018102240A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN109100142A (en) * 2018-06-26 2018-12-28 北京交通大学 A kind of semi-supervised method for diagnosing faults of bearing based on graph theory
CN109214502A (en) * 2017-07-03 2019-01-15 清华大学 Neural network weight discretization method and system
CN109643228A (en) * 2016-10-01 2019-04-16 英特尔公司 Low energy consumption mantissa multiplication for floating point multiplication addition operation
CN109787929A (en) * 2019-02-20 2019-05-21 深圳市宝链人工智能科技有限公司 Signal modulate method, electronic device and computer readable storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101097509A (en) * 2006-06-26 2008-01-02 英特尔公司 Sparse tree adder
CN103259529A (en) * 2012-02-17 2013-08-21 京微雅格(北京)科技有限公司 Integrated circuit using carry skip chains
WO2018048907A1 (en) * 2016-09-06 2018-03-15 Neosensory, Inc. C/O Tmc+260 Method and system for providing adjunct sensory information to a user
CN109643228A (en) * 2016-10-01 2019-04-16 英特尔公司 Low energy consumption mantissa multiplication for floating point multiplication addition operation
WO2018102240A1 (en) * 2016-12-02 2018-06-07 Microsoft Technology Licensing, Llc Joint language understanding and dialogue management
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
CN106816147A (en) * 2017-01-25 2017-06-09 上海交通大学 Speech recognition system based on binary neural network acoustic model
CN107203808A (en) * 2017-05-08 2017-09-26 中国科学院计算技术研究所 A kind of two-value Convole Unit and corresponding two-value convolutional neural networks processor
CN107153873A (en) * 2017-05-08 2017-09-12 中国科学院计算技术研究所 A kind of two-value convolutional neural networks processor and its application method
CN109214502A (en) * 2017-07-03 2019-01-15 清华大学 Neural network weight discretization method and system
CN108010515A (en) * 2017-11-21 2018-05-08 清华大学 A kind of speech terminals detection and awakening method and device
CN109100142A (en) * 2018-06-26 2018-12-28 北京交通大学 A kind of semi-supervised method for diagnosing faults of bearing based on graph theory
CN109787929A (en) * 2019-02-20 2019-05-21 深圳市宝链人工智能科技有限公司 Signal modulate method, electronic device and computer readable storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
"A 141 uW, 2.46 pJ/Neuron Binarized Convolutional Neural Network based Self-Selflearning";Shouyi Yin;《IEEE》;20181231;全文 *
"A multilevel Cell STT-MRAM-Based Computing In-Memory Accelerator for binary Convolutioanl Neural Networks";Yu Pan;《IEEE transaction on Magnetics》;20181231;第54卷(第11期);全文 *
"Binarized Neural Networks:Training Neural networks with Weights and Activations Constrained to +1 or -1";Matthieu Courbariaux;《arXiv》;20160317;全文 *
"Binary neural networks for speech recognition";Yan-min QIAN;《Frontiers of Information Technology & Electronic Engineering》;20190513;全文 *
"Deep compression: compressing deep neural networks with pruing";Song Han;《ICLR 2016》;20161231;全文 *
"Low Bits: Binary Neural Network For Vad and Wakeup";Dandan Song;《2018 5th International Conference on Information Science and Control Engineering》;20181231;全文 *
"Spatially-sparse convolutioanl neural networks";Benjamin Graham;《Computer Vision and Pattern Recognition》;20140922;全文 *

Also Published As

Publication number Publication date
CN110265002A (en) 2019-09-20

Similar Documents

Publication Publication Date Title
CN110265002B (en) Speech recognition method, speech recognition device, computer equipment and computer readable storage medium
Gysel et al. Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks
Mellempudi et al. Mixed precision training with 8-bit floating point
Kim DeepX: Deep learning accelerator for restricted boltzmann machine artificial neural networks
CN109840589B (en) Method and device for operating convolutional neural network on FPGA
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN107340993B (en) Arithmetic device and method
US20180018555A1 (en) System and method for building artificial neural network architectures
Li et al. Quantized neural networks with new stochastic multipliers
CN109543029B (en) Text classification method, device, medium and equipment based on convolutional neural network
CN115238893B (en) Neural network model quantification method and device for natural language processing
CN107402905B (en) Neural network-based computing method and device
CN107967132A (en) A kind of adder and multiplier for neural network processor
TWI738048B (en) Arithmetic framework system and method for operating floating-to-fixed arithmetic framework
Wu et al. GBC: An energy-efficient LSTM accelerator with gating units level balanced compression strategy
Fuketa et al. Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format
Jiang et al. A low-latency LSTM accelerator using balanced sparsity based on FPGA
Sun et al. HSIM-DNN: Hardware simulator for computation-, storage-and power-efficient deep neural networks
Rajagopal et al. Accurate and efficient fixed point inference for deep neural networks
Temenos et al. A stochastic computing sigma-delta adder architecture for efficient neural network design
Kang et al. Weight partitioning for dynamic fixed-point neuromorphic computing systems
Hsieh et al. A multiplier-less convolutional neural network inference accelerator for intelligent edge devices
Devnath et al. A mathematical approach towards quantization of floating point weights in low power neural networks
CN110990776B (en) Coding distributed computing method, device, computer equipment and storage medium
Li et al. E-Sparse: Boosting the Large Language Model Inference through Entropy-based N: M Sparsity

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230717

Address after: 610, Floor 6, Block A, No. 2, Lize Middle Second Road, Chaoyang District, Beijing 100102

Patentee after: Zhongguancun Technology Leasing Co.,Ltd.

Address before: 100056 2212, 22 / F, No.9, North Fourth Ring Road West, Haidian District, Beijing

Patentee before: Beijing Qingwei Intelligent Technology Co.,Ltd.

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20231114

Address after: 100192 201, 2nd floor, building 26, yard 1, Baosheng South Road, Haidian District, Beijing

Patentee after: Beijing Qingwei Intelligent Technology Co.,Ltd.

Address before: 610, Floor 6, Block A, No. 2, Lize Middle Second Road, Chaoyang District, Beijing 100102

Patentee before: Zhongguancun Technology Leasing Co.,Ltd.

TR01 Transfer of patent right
CB03 Change of inventor or designer information

Inventor after: Liu Ling

Inventor after: OuYang Peng

Inventor after: Li Xiudong

Inventor after: Wang Bo

Inventor before: Liu Ling

Inventor before: OuYang Peng

Inventor before: Yin Shouyi

Inventor before: Li Xiudong

Inventor before: Wang Bo

CB03 Change of inventor or designer information