CN115249479A - BRNN-based power grid dispatching complex speech recognition method, system and terminal - Google Patents
- Publication number
- CN115249479A CN115249479A CN202210078771.7A CN202210078771A CN115249479A CN 115249479 A CN115249479 A CN 115249479A CN 202210078771 A CN202210078771 A CN 202210078771A CN 115249479 A CN115249479 A CN 115249479A
- Authority
- CN
- China
- Prior art keywords
- brnn
- acoustic model
- speech recognition
- recognition method
- power grid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L15/26 — Speech recognition; speech to text systems
- G10L15/14 — Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L25/87 — Detection of discrete points within a voice signal
- Y04S10/50 — Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a BRNN-based power grid dispatching complex speech recognition method, system and terminal, relating to the technical field of speech recognition. The technical scheme is as follows: acquire original speech information and preprocess it to obtain speech data; extract key feature parameters from the speech data to obtain a key feature sequence; construct an initial acoustic model and train it on a data set to obtain a final acoustic model; input the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information. The initial acoustic model is an end-to-end BRNN-CTC model constructed by combining a BRNN neural network with a CTC target loss function, where the BRNN neural network is built with initials and finals as basic units. Because the BRNN network structure makes better use of context information, the BRNN-CTC model achieves higher recognition accuracy in complex environments.
Description
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a BRNN-based power grid dispatching complex speech recognition method, system and terminal.
Background
Automatic Speech Recognition (ASR) is a technology that converts speech signals into text. Acoustic models based on GMM-HMM long served as the dominant framework in the ASR field: the GMM classifies frames into states, mapping the speech input onto HMM states, while the HMM combines states into phonemes and phonemes into words, capturing how the speech signal changes over a time series. With the rise of deep learning, DNNs were introduced into ASR acoustic modeling. Unlike GMM-HMM-based acoustic models, the DNN models the observation state probability in place of the GMM, which makes better use of context information, realizes nonlinear feature transformation, and improves recognition accuracy. However, the DNN-HMM framework still requires forced alignment of the training data, which complicates optimization, and the HMM is a generative model whose conditional independence assumption does not hold in practice. Against this background, RNNs and LSTMs, with their strong sequence-modeling capability for sequence labeling tasks, can further improve the accuracy of speech recognition.
The dispatching team of a power dispatching control center is one of the operating mechanisms of power production, and the existing manual dispatching mode is inefficient and costly and cannot keep up with the growing workload. For this reason, the prior art applies speech recognition technology to power dispatching operations, converting speech information into a text signal and then into commands that the equipment can recognize. However, power grid services cover a wide area: not only are the operating environments complex and varied, but the accuracy of existing speech recognition technology is also relatively low, being affected by the operators' wording and language ability.
Therefore, how to design a BRNN-based power grid dispatching complex speech recognition method, system and terminal that overcomes the above defects is a problem to be solved.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a BRNN-based power grid dispatching complex speech recognition method, system and terminal. A BRNN neural network is constructed with initials and finals as basic units, and an end-to-end BRNN-CTC model is built by combining it with a CTC target loss function; the BRNN network structure makes better use of context information and achieves a higher recognition accuracy rate in complex environments.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, a BRNN-based power grid dispatching complex speech recognition method is provided, which includes the following steps:
acquiring original voice information, and preprocessing the original voice information to obtain voice data;
extracting key feature parameters in the voice data to obtain a key feature sequence;
constructing an initial acoustic model, and training the initial acoustic model by using a data set to obtain a final acoustic model;
inputting the key characteristic sequence into a final acoustic model to obtain a text recognized by original voice information;
the method is characterized in that the initial acoustic model is an end-to-end BRNN-CTC model constructed by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
Further, the preprocessing of the original voice information comprises:
pre-emphasis, namely performing high-pass filtering on the original voice information to enhance high-frequency signals and weaken low-frequency signals to obtain voice signals;
framing by adopting an overlapped sampling method, wherein the frame length is 10 ms to 30 ms and the ratio of frame shift to frame length is between 0 and 0.5;
windowing, namely performing transformation calculation on elements of each frame after framing and elements corresponding to a window sequence by adopting a Hamming window function;
and end point detection, namely performing effective voice detection based on short-time energy and short-time average zero crossing rate.
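For illustration, the pre-emphasis, framing and windowing steps above can be sketched in Python as follows. The 0.97 pre-emphasis coefficient and the 25 ms frame length with 10 ms frame shift are common illustrative choices that fall within the stated ranges, not values fixed by the patent:

```python
import numpy as np

def preprocess(signal, sample_rate=16000):
    # Pre-emphasis: high-pass filter y[n] = x[n] - a * x[n-1]
    # (a = 0.97 is a common choice, assumed here).
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Overlapped framing: 25 ms frames, 10 ms shift,
    # so shift/length = 0.4, inside the stated 0-0.5 range.
    frame_len = int(0.025 * sample_rate)
    frame_shift = int(0.010 * sample_rate)
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)
    frames = np.stack([
        emphasized[i * frame_shift : i * frame_shift + frame_len]
        for i in range(num_frames)
    ])

    # Windowing: multiply each frame element-wise by a Hamming window.
    return frames * np.hamming(frame_len)
```

Endpoint detection would then operate on the windowed frames returned here.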
Further, the initial acoustic model specifically includes:
let the training data be S = { (x) 1 ,z 1 ),(x 2 ,z 2 ),...(x N ,z N ) Where the number of samples is N, the samples are x, x = (x) 1 ,x 2 ,x 3 ,...,x T ),x i ∈R m Denotes a division of a sample into T frames, x i Representing the characteristic parameters of the ith frame, and labeled as z, z = (z) 1 ,z 2 ,z 3 ,...z U ) Indicating the correct phoneme corresponding to the sample x;
after the characteristics are processed by two RNNs, the posterior probability y of the phoneme is calculated by softmax,representing the probability that the phoneme is k at time t, the probabilities of all phonemes over a frame add up to 1, i.e.
Further, the data set contains multiple types of samples, including speech information from male and female speakers, Mandarin and dialect speech, noisy and quiet environments, single and multiple speakers, and different age groups.
Further, the recognition method comprises the following steps:
calculating a character error rate by character comparison between the recognized text and the standard result obtained from feedback;
and adjusting the proportion of the corresponding sample class in the data set in positive correlation with the character error rate.
Further, the calculation formula of the word error rate is specifically: P = (S + D + I) / N × 100%, wherein P represents the word error rate; N represents the total number of characters in the standard result; S represents the number of characters to be substituted; D represents the number of characters to be deleted; and I represents the number of characters to be inserted.
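The error rate above can be obtained from a Levenshtein alignment between the recognized text and the standard result. A self-contained sketch, working at the character level as is appropriate for Chinese text (the function name is illustrative):

```python
def char_error_rate(ref, hyp):
    """P = (S + D + I) / N via Levenshtein alignment, where N = len(ref) and
    S/D/I count substituted/deleted/inserted characters."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, S, D, I) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)      # delete every character of ref[:i]
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)      # insert every character of hyp[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            c, s, d, ins = dp[i - 1][j - 1]
            best = (c + sub, s + sub, d, ins)          # match / substitute
            c, s, d, ins = dp[i - 1][j]
            if c + 1 < best[0]:
                best = (c + 1, s, d + 1, ins)          # delete
            c, s, d, ins = dp[i][j - 1]
            if c + 1 < best[0]:
                best = (c + 1, s, d, ins + 1)          # insert
            dp[i][j] = best
    _, S, D, I = dp[n][m]
    return (S + D + I) / max(n, 1)
```

For example, a single substitution in a four-character reference gives an error rate of 0.25.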
Further, the occupation ratio of the samples in the data set is synchronously adjusted according to the word error rate of the multiple types of samples.
In a second aspect, a BRNN-based power grid dispatching complex speech recognition system is provided, including:
the preprocessing module is used for acquiring original voice information and preprocessing the original voice information to obtain voice data;
the feature extraction module is used for extracting key feature parameters in the voice data to obtain a key feature sequence;
the model building module is used for building an initial acoustic model and training the initial acoustic model by using a data set to obtain a final acoustic model;
the text recognition module is used for inputting the key characteristic sequence into the final acoustic model to obtain a text recognized by the original voice information;
the initial acoustic model in the model construction module is a BRNN-CTC model which is constructed in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
In a third aspect, a computer terminal is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the BRNN-based power grid dispatching complex speech recognition method according to any one of the first aspect.
In a fourth aspect, a computer-readable medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the BRNN-based power grid dispatching complex speech recognition method according to any one of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
1. according to the BRNN-based power grid dispatching complex speech recognition method, the BRNN neural network is constructed by taking initial consonants and final consonants as basic units, and a BRNN-CTC model in an end-to-end mode is constructed by combining with a CTC target loss function, so that context information can be better utilized by utilizing a BRNN network structure, and higher recognition accuracy rate is achieved in a complex environment;
2. the method dynamically adjusts the proportions of the training data set for the BRNN-CTC model according to the word error rates of different sample types, so that high accuracy is maintained throughout long-term recognition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart in an embodiment of the invention;
FIG. 2 is a CTC-based end-to-end framework diagram in an embodiment of the present invention;
fig. 3 is a block diagram of a system in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1: the BRNN-based power grid dispatching complex speech recognition method, as shown in FIG. 1, comprises the following steps:
S1: acquiring original voice information, and preprocessing the original voice information to obtain voice data;
S2: extracting key feature parameters in the voice data to obtain a key feature sequence;
S3: constructing an initial acoustic model, and training the initial acoustic model by using a data set to obtain a final acoustic model; the initial acoustic model is a BRNN-CTC model which is constructed in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units;
S4: inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information; the text is then converted into a Linux command and transmitted to the designated dispatcher console, where the console workstation executes the command to open the corresponding display.
In this embodiment, the preprocessing of the original speech information includes pre-emphasis, framing, windowing and endpoint detection. Pre-emphasis applies high-pass filtering to the original speech information to enhance high-frequency components and attenuate low-frequency components, yielding the speech signal. Framing uses an overlapped sampling method with a frame length of 10 ms to 30 ms and a frame shift to frame length ratio between 0 and 0.5. Windowing applies a Hamming window function, multiplying the elements of each frame by the corresponding elements of the window sequence. Endpoint detection performs effective speech detection based on short-time energy and the short-time average zero-crossing rate.
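A minimal sketch of endpoint detection from short-time energy and zero-crossing rate, assuming framed input of the kind produced by the earlier preprocessing steps; the decision rule and threshold here are deliberately simplified stand-ins for the combined energy/zero-crossing logic, not the patent's exact method:

```python
import numpy as np

def short_time_energy(frames):
    # Sum of squared samples per frame.
    return (frames ** 2).sum(axis=1)

def zero_crossing_rate(frames):
    # Fraction of adjacent sample pairs whose sign differs, per frame.
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return (np.abs(np.diff(signs, axis=1)) / 2).mean(axis=1)

def detect_speech(frames, energy_ratio=0.1):
    """Mark a frame as speech when its energy exceeds a fraction of the
    maximum frame energy (an illustrative single-threshold rule)."""
    energy = short_time_energy(frames)
    return energy > energy_ratio * energy.max()
```

On a toy signal of low-level noise followed by a pure tone, the tone frames have high energy and a low zero-crossing rate, while the noise frames show the opposite, which is exactly what the two features are meant to separate.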
As shown in fig. 2, the initial acoustic model is specifically: let the training data be S = {(x_1, z_1), (x_2, z_2), ..., (x_N, z_N)}, where N is the number of samples. Each sample x = (x_1, x_2, x_3, ..., x_T), with x_i ∈ R^m, denotes a division of the sample into T frames, where x_i is the feature parameter vector of the i-th frame, and the label z = (z_1, z_2, z_3, ..., z_U) is the correct phoneme sequence corresponding to sample x. After the features pass through the two RNN layers (forward and backward), the phoneme posterior probability y is computed by softmax: y_k^t denotes the probability that the phoneme at time t is k, and the probabilities of all phonemes within a frame sum to 1, i.e. Σ_k y_k^t = 1.
In this embodiment, the data set is randomly divided in a 7:3 ratio into a training set and a test set, and the training set is further divided in a certain ratio into a training subset and a validation set, where the validation set is used for parameter tuning and the test set is used to verify the effectiveness of the model.
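The random 7:3 split can be sketched as follows; the helper name and fixed seed are illustrative:

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Randomly split samples into a training set and a test set.
    The same idea applies when carving a validation set out of the
    training set afterwards."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```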
The data set contains multiple types of samples, including speech information from male and female speakers, Mandarin and dialect speech, noisy and quiet environments, single and multiple speakers, and different age groups.
The recognition method further comprises: calculating a character error rate by character comparison between the recognized text and the standard result obtained from feedback, and adjusting the proportion of the corresponding sample class in the data set in positive correlation with the character error rate.
The calculation formula of the word error rate is specifically: P = (S + D + I) / N × 100%, wherein P represents the word error rate; N represents the total number of characters in the standard result; S represents the number of characters to be substituted; D represents the number of characters to be deleted; and I represents the number of characters to be inserted.
As an alternative embodiment, the occupation ratio of samples in the data set is synchronously adjusted according to the word error rate of multiple types of samples.
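One simple realization of the positively correlated adjustment is to give each sample class a share of the training data proportional to its word error rate, so that poorly recognized classes receive more data. This is a sketch of one possible rule; the patent does not specify the exact update:

```python
def adjust_dataset_proportions(error_rates):
    """Map each sample class's word error rate to its share of the
    training data, normalized so the shares sum to 1."""
    total = sum(error_rates.values())
    return {cls: wer / total for cls, wer in error_rates.items()}
```

With hypothetical error rates of 0.2 for dialect, 0.05 for Mandarin and 0.15 for noisy-environment samples, dialect speech would receive half of the adjusted data set.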
Experimental verification: the 13652 speech recordings provided in this embodiment were processed into 14 h 50 min of effective data. The speech quality is uneven, and the collected data exhibits problems such as mixed male and female voices, dialects and accents. The constructed speech uses Mandarin, each utterance lasts about 10 seconds, and there are 36115 distinct words. The data were then divided into a training set and a test set in a 7:3 ratio, and feature extraction was performed on the data set.
The training set is input into the BRNN-CTC model for training; with the hidden-layer configuration otherwise kept consistent, each hidden layer has twice the number of neurons because the BRNN is bidirectional.
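The neuron doubling can be seen in a minimal NumPy forward pass of a single bidirectional layer: the forward and backward hidden states are concatenated, so the layer's output dimension is twice the per-direction hidden size. All weights here are random placeholders, not a trained model:

```python
import numpy as np

def brnn_layer(x, Wf, Uf, Wb, Ub):
    """One bidirectional RNN layer: h_t = tanh(W x_t + U h_prev), run both
    forward and backward over time, with the two state sequences concatenated."""
    T, _ = x.shape
    H = Wf.shape[0]
    h_fwd = np.zeros((T, H))
    h_bwd = np.zeros((T, H))
    h = np.zeros(H)
    for t in range(T):                      # forward pass in time
        h = np.tanh(Wf @ x[t] + Uf @ h)
        h_fwd[t] = h
    h = np.zeros(H)
    for t in reversed(range(T)):            # backward pass in time
        h = np.tanh(Wb @ x[t] + Ub @ h)
        h_bwd[t] = h
    return np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2H)

rng = np.random.default_rng(3)
T, D, H = 6, 13, 32                         # frames, feature dim, hidden units
x = rng.standard_normal((T, D))
out = brnn_layer(x, rng.standard_normal((H, D)), rng.standard_normal((H, H)),
                 rng.standard_normal((H, D)), rng.standard_normal((H, H)))
```

At every time step the output thus combines context from both the past (forward states) and the future (backward states), which is what lets the BRNN exploit context better than a unidirectional RNN.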
The training results are shown in Table 1 and the average accuracy comparison in Table 2: speech recognition with the deep neural network outperforms the shallow GMM-HMM-based network, and the BRNN structure makes better use of context information to achieve higher recognition accuracy.
TABLE 1 BRNN-CTC-based training results
TABLE 2 comparison of average accuracy
From the above, BRNN-based speech recognition for complex environments indeed outperforms recognition with traditional models.
Example 2: the BRNN-based power grid dispatching complex speech recognition system comprises a preprocessing module, a feature extraction module, a model construction module and a text recognition module, as shown in FIG. 3.
The preprocessing module is used for acquiring original voice information and preprocessing the original voice information to obtain voice data. And the feature extraction module is used for extracting key feature parameters in the voice data to obtain a key feature sequence. The model building module is used for building an initial acoustic model and training the initial acoustic model by using a data set to obtain a final acoustic model; the initial acoustic model in the model construction module is a BRNN-CTC model which is constructed in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units. And the text recognition module is used for inputting the key characteristic sequence into the final acoustic model to obtain a text recognized by the original voice information.
The working principle is as follows: the invention constructs the BRNN neural network with initials and finals as basic units and builds an end-to-end BRNN-CTC model by combining it with a CTC target loss function, so that the BRNN network structure makes better use of context information and achieves higher recognition accuracy in complex environments. In addition, the proportions of the training data set used by the BRNN-CTC model are dynamically adjusted according to the word error rates of the different sample types, so that high accuracy is maintained throughout long-term recognition.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. The BRNN-based power grid dispatching complex speech recognition method comprises the following steps:
acquiring original voice information, and preprocessing the original voice information to obtain voice data;
extracting key feature parameters in the voice data to obtain a key feature sequence;
constructing an initial acoustic model, and training the initial acoustic model by using a data set to obtain a final acoustic model;
inputting the key characteristic sequence into a final acoustic model to obtain a text recognized by original voice information;
the method is characterized in that the initial acoustic model is an end-to-end BRNN-CTC model constructed by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
2. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the preprocessing of the raw speech information comprises:
pre-emphasis, namely performing high-pass filtering on the original voice information to enhance high-frequency signals and weaken low-frequency signals to obtain voice signals;
framing by adopting an overlapped sampling method, wherein the frame length is 10 ms to 30 ms and the ratio of frame shift to frame length is between 0 and 0.5;
windowing, namely performing transformation calculation on elements of each frame after framing and elements corresponding to a window sequence by adopting a Hamming window function;
and end point detection, namely performing effective voice detection based on short-time energy and short-time average zero crossing rate.
3. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the initial acoustic model is specifically:
let the training data be S = { (x) 1 ,z 1 ),(x 2 ,z 2 ),...(x N ,z N ) Where the number of samples is N, the samples are x, x = (x) 1 ,x 2 ,x 3 ,...,x T ),x i ∈R m Denotes a division of a sample into T frames, x i Representing the characteristic parameters of the ith frame, and labeled as z, z = (z) 1 ,z 2 ,z 3 ,...z U ) Indicating the correct phoneme corresponding to the sample x;
4. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the data set contains multiple types of samples, the multiple types of samples comprising speech information from male and female speakers, Mandarin and dialect speech, noisy and quiet environments, single and multiple speakers, and different age groups.
5. The BRNN-based power grid dispatching complex speech recognition method of claim 4, wherein the recognition method further comprises:
calculating by character comparison according to the recognized text and a standard result obtained by feedback to obtain a character error rate;
and adjusting the proportion of the corresponding class sample in the data set in a positive correlation mode according to the word error rate.
6. The BRNN-based power grid dispatching complex speech recognition method of claim 5, wherein the calculation formula of the word error rate is specifically: P = (S + D + I) / N × 100%, wherein P represents the word error rate; N represents the total number of characters in the standard result; S represents the number of characters to be substituted; D represents the number of characters to be deleted; and I represents the number of characters to be inserted.
7. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the fraction of samples in the data set is synchronously adjusted according to the word error rate of multiple types of samples.
8. A BRNN-based power grid dispatching complex speech recognition system is characterized by comprising:
the preprocessing module is used for acquiring original voice information and preprocessing the original voice information to obtain voice data;
the feature extraction module is used for extracting key feature parameters in the voice data to obtain a key feature sequence;
the model building module is used for building an initial acoustic model and training the initial acoustic model by using a data set to obtain a final acoustic model;
the text recognition module is used for inputting the key characteristic sequence into the final acoustic model to obtain a text recognized by the original voice information;
the method is characterized in that an initial acoustic model in the model building module is a BRNN-CTC model which is built in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is built by taking initials and finals as basic units.
9. A computer terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the program implements the BRNN-based grid scheduling complex speech recognition method according to any of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, the computer program being executable by a processor to implement the BRNN-based grid dispatch complex speech recognition method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210078771.7A CN115249479A (en) | 2022-01-24 | 2022-01-24 | BRNN-based power grid dispatching complex speech recognition method, system and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115249479A true CN115249479A (en) | 2022-10-28 |
Family
ID=83697990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210078771.7A Pending CN115249479A (en) | 2022-01-24 | 2022-01-24 | BRNN-based power grid dispatching complex speech recognition method, system and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115249479A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116631379A (en) * | 2023-07-20 | 2023-08-22 | 中邮消费金融有限公司 | Speech recognition method, device, equipment and storage medium |
CN116631379B (en) * | 2023-07-20 | 2023-09-26 | 中邮消费金融有限公司 | Speech recognition method, device, equipment and storage medium |
CN116825109A (en) * | 2023-08-30 | 2023-09-29 | 深圳市友杰智新科技有限公司 | Processing method, device, equipment and medium for voice command misrecognition |
CN116825109B (en) * | 2023-08-30 | 2023-12-08 | 深圳市友杰智新科技有限公司 | Processing method, device, equipment and medium for voice command misrecognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11062699B2 (en) | Speech recognition with trained GMM-HMM and LSTM models | |
CN111429889B (en) | Method, apparatus, device and computer readable storage medium for real-time speech recognition based on truncated attention | |
WO2022083083A1 (en) | Sound conversion system and training method for same | |
WO2018227781A1 (en) | Voice recognition method, apparatus, computer device, and storage medium | |
CN110782872A (en) | Language identification method and device based on deep convolutional recurrent neural network | |
CN110246488B (en) | Voice conversion method and device of semi-optimized cycleGAN model | |
CN107408384A (en) | The end-to-end speech recognition of deployment | |
CN109147774B (en) | Improved time-delay neural network acoustic model | |
CN112802448A (en) | Speech synthesis method and system for generating new tone | |
JP2023542685A (en) | Speech recognition method, speech recognition device, computer equipment, and computer program | |
CN115249479A (en) | BRNN-based power grid dispatching complex speech recognition method, system and terminal | |
CN115019776A (en) | Voice recognition model, training method thereof, voice recognition method and device | |
Bhosale et al. | End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios. | |
Ismail et al. | Mfcc-vq approach for qalqalahtajweed rule checking | |
CN111862934A (en) | Method for improving speech synthesis model and speech synthesis method and device | |
Wang et al. | Research on speech emotion recognition technology based on deep and shallow neural network | |
CN110910891A (en) | Speaker segmentation labeling method and device based on long-time memory neural network | |
CN114944150A (en) | Dual-task-based Conformer land-air communication acoustic model construction method | |
CN111862956A (en) | Data processing method, device, equipment and storage medium | |
CN113571095B (en) | Speech emotion recognition method and system based on nested deep neural network | |
CN111090726A (en) | NLP-based electric power industry character customer service interaction method | |
CN112133292A (en) | End-to-end automatic voice recognition method for civil aviation land-air communication field | |
CN115836300A (en) | Self-training WaveNet for text-to-speech | |
Radha et al. | Speech and speaker recognition using raw waveform modeling for adult and children’s speech: a comprehensive review | |
CN111933121B (en) | Acoustic model training method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||