CN115249479A - BRNN-based power grid dispatching complex speech recognition method, system and terminal - Google Patents

BRNN-based power grid dispatching complex speech recognition method, system and terminal

Info

Publication number
CN115249479A
Authority
CN
China
Prior art keywords
brnn
acoustic model
speech recognition
recognition method
power grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210078771.7A
Other languages
Chinese (zh)
Inventor
童浩
陈筱
李含宇
梁少华
封雨欣
熊玉仙
付裕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze University
Original Assignee
Yangtze University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze University filed Critical Yangtze University
Priority to CN202210078771.7A
Publication of CN115249479A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/14 - Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 - Detection of presence or absence of voice signals
    • G10L25/87 - Detection of discrete points within a voice signal
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 - INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S - SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 - Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 - Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a BRNN-based power grid dispatching complex speech recognition method, system and terminal, relating to the technical field of speech recognition. The technical scheme is as follows: acquire original speech information and preprocess it to obtain speech data; extract key feature parameters from the speech data to obtain a key feature sequence; construct an initial acoustic model and train it on a data set to obtain the final acoustic model; input the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information. The initial acoustic model is an end-to-end BRNN-CTC model combining a BRNN neural network with a CTC target loss function, and the BRNN neural network is constructed with initials and finals as its basic units. Because the BRNN neural network takes initials and finals as basic units, the BRNN-CTC model achieves higher recognition accuracy in complex environments.

Description

BRNN-based power grid dispatching complex speech recognition method, system and terminal
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a BRNN-based power grid dispatching complex speech recognition method, system and terminal.
Background
Automatic Speech Recognition (ASR) is a technology that converts speech signals into text. Acoustic models based on GMM-HMM were long the dominant framework in the ASR field: the GMM recognizes frames as states, mapping the speech input to HMM states, while the HMM combines states into phonemes and phonemes into words, capturing how the speech signal changes over the time series. With the rise of deep learning, DNNs were introduced into ASR acoustic modeling; unlike GMM-HMM acoustic models, the DNN models the observation-state probability in place of the GMM, which makes better use of context information, realizes nonlinear feature transformations and improves recognition accuracy. However, DNN-HMM frameworks still require forced alignment of the training data, which makes optimization difficult, and the HMM is a generative model whose conditional-independence assumption does not hold in reality. Against this background, RNNs and LSTMs, with their powerful sequence-output capability for sequence-labeling tasks, can further improve speech recognition accuracy.
The dispatching team of a power dispatching control center is one of the power production operation organizations, and the existing manual dispatching mode is inefficient and costly and cannot keep up with the growing workload. For this reason, the prior art applies speech recognition to power dispatching operations, converting speech information into text and then into commands the equipment can recognize. However, power grid services cover a wide area: not only are the operating environments complex and varied, but the accuracy of existing speech recognition technology is relatively low because it is affected by the operators' enunciation and language ability.
Therefore, how to research and design a BRNN-based power grid dispatching complex speech recognition method, system and terminal that can overcome the above defects is an urgent problem to be solved.
Disclosure of Invention
In order to remedy the deficiencies in the prior art, the invention aims to provide a BRNN-based power grid dispatching complex speech recognition method, system and terminal. A BRNN neural network is constructed with initials and finals as basic units, and an end-to-end BRNN-CTC model is built by combining it with a CTC target loss function; the BRNN network structure makes better use of context information and achieves higher recognition accuracy in complex environments.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, a BRNN-based power grid dispatching complex speech recognition method is provided, which includes the following steps:
acquiring original speech information, and preprocessing the original speech information to obtain speech data;
extracting key feature parameters in the speech data to obtain a key feature sequence;
constructing an initial acoustic model, and training the initial acoustic model with a data set to obtain a final acoustic model;
inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information;
the method is characterized in that the initial acoustic model is an end-to-end BRNN-CTC model constructed by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
Further, the preprocessing of the original speech information comprises the following steps (a code sketch follows this list):
pre-emphasis, namely high-pass filtering the original speech information to strengthen high-frequency components and weaken low-frequency components, obtaining the speech signal;
framing by adopting an overlapped sampling method, wherein the frame length is 10 ms-30 ms, and the ratio of frame shift to frame length is 0-0.5;
windowing, namely multiplying the elements of each frame after framing by the corresponding elements of a Hamming window sequence;
and endpoint detection, namely detecting effective speech based on short-time energy and the short-time average zero-crossing rate.
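For illustration only, the following minimal Python sketch (using numpy) realizes the four preprocessing steps above; the pre-emphasis coefficient 0.97, the 25 ms frame length with 10 ms shift, and the energy and zero-crossing thresholds are assumptions chosen within the stated ranges, not values fixed by the invention:

```python
import numpy as np

def preprocess(signal, fs, pre_emph=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasis, overlapped framing, Hamming windowing and a simple
    energy/zero-crossing endpoint detector (assumes signal >= one frame)."""
    # Pre-emphasis: first-order high-pass filter strengthens high frequencies.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])

    # Overlapped framing: 25 ms frames, 10 ms shift (shift/length = 0.4 < 0.5).
    frame_len = int(fs * frame_ms / 1000)
    frame_shift = int(fs * shift_ms / 1000)
    n_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    frames = np.stack([emphasized[i * frame_shift:i * frame_shift + frame_len]
                       for i in range(n_frames)])

    # Windowing: element-wise product of each frame with a Hamming window.
    frames *= np.hamming(frame_len)

    # Endpoint detection: keep frames whose short-time energy or short-time
    # average zero-crossing rate suggests active speech (thresholds assumed).
    energy = (frames ** 2).sum(axis=1)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    active = (energy > 0.1 * energy.mean()) | (zcr > 0.5 * zcr.mean())
    return frames[active]
```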
Further, the initial acoustic model specifically includes:
let the training data be S = { (x) 1 ,z 1 ),(x 2 ,z 2 ),...(x N ,z N ) Where the number of samples is N, the samples are x, x = (x) 1 ,x 2 ,x 3 ,...,x T ),x i ∈R m Denotes a division of a sample into T frames, x i Representing the characteristic parameters of the ith frame, and labeled as z, z = (z) 1 ,z 2 ,z 3 ,...z U ) Indicating the correct phoneme corresponding to the sample x;
after the characteristics are processed by two RNNs, the posterior probability y of the phoneme is calculated by softmax,
Figure BDA0003485205790000021
representing the probability that the phoneme is k at time t, the probabilities of all phonemes over a frame add up to 1, i.e.
Figure BDA0003485205790000022
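To make the model concrete, the following PyTorch sketch stacks two bidirectional RNN layers over the feature frames and applies a per-frame softmax, as described above; the feature dimension, hidden width, unit-inventory size and the use of torch.nn are assumptions for illustration, not part of the claimed method:

```python
import torch
import torch.nn as nn

class BRNNCTC(nn.Module):
    """Bidirectional RNN acoustic model with a per-frame softmax output,
    trained end-to-end against a CTC objective (blank label at index 0)."""
    def __init__(self, feat_dim=39, hidden=256, n_units=100):
        super().__init__()
        # Two stacked bidirectional RNN layers: each frame's output sees
        # both left and right context, as the BRNN structure requires.
        self.brnn = nn.RNN(feat_dim, hidden, num_layers=2,
                           bidirectional=True, batch_first=False)
        # Project to initials/finals plus the CTC blank symbol.
        self.proj = nn.Linear(2 * hidden, n_units + 1)

    def forward(self, x):                 # x: (T, batch, feat_dim)
        h, _ = self.brnn(x)
        # Per frame, the (log-)softmax probabilities over units sum to 1.
        return self.proj(h).log_softmax(dim=-1)

model = BRNNCTC()
ctc = nn.CTCLoss(blank=0)                 # the CTC target loss function
```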
Further, the data set is composed of multiple types of samples, including speech information from males, females, Mandarin speakers, dialect speakers, noisy environments, quiet environments, single speakers, multiple speakers and different age groups.
Further, the recognition method further comprises:
comparing the recognized text character by character with the standard result obtained by feedback to calculate a word error rate;
and adjusting the proportion of the corresponding sample class in the data set in positive correlation with the word error rate.
Further, the word error rate is calculated as:
$$P = \frac{S + D + I}{N},$$
where $P$ denotes the word error rate; $N$ denotes the total number of characters in the standard result; $S$ the number of characters to be substituted; $D$ the number of characters to be deleted; and $I$ the number of characters to be inserted.
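The counts $S$, $D$ and $I$ are the standard Levenshtein edit operations. A small self-contained sketch of the computation follows; the function name and the example strings are illustrative only:

```python
def word_error_rate(reference, hypothesis):
    """P = (S + D + I) / N: substitutions, deletions and insertions needed
    to turn the recognized text into the standard result, over N characters."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j] is the edit distance between reference[:i] and hypothesis[:j].
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i                       # delete all i reference chars
    for j in range(m + 1):
        dist[0][j] = j                       # insert all j hypothesis chars
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion D
                             dist[i][j - 1] + 1,         # insertion I
                             dist[i - 1][j - 1] + cost)  # substitution S
    return dist[n][m] / n if n else 0.0

# One substituted character out of four gives P = 0.25.
print(word_error_rate("合上开关", "合上开光"))
```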
Further, the proportions of the sample classes in the data set are adjusted synchronously according to the word error rates of the multiple sample types.
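One plausible realization of this synchronous, positively correlated adjustment is to renormalize each class's share of the data set by its measured word error rate, as in the sketch below (the class names and the proportional rule are assumptions; the invention only requires positive correlation):

```python
def adjust_proportions(class_wer):
    """Positive correlation: classes with a higher word error rate receive a
    larger share of the training data; shares are normalized to sum to 1."""
    total = sum(class_wer.values())
    return {cls: wer / total for cls, wer in class_wer.items()}

# The dialect class is recognized worst here, so its share grows the most.
print(adjust_proportions({"mandarin": 0.05, "dialect": 0.20, "noisy": 0.15}))
```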
In a second aspect, a BRNN-based power grid dispatching complex speech recognition system is provided, including:
the preprocessing module is used for acquiring original voice information and preprocessing the original voice information to obtain voice data;
the feature extraction module is used for extracting key feature parameters in the voice data to obtain a key feature sequence;
the model building module is used for building an initial acoustic model and training the initial acoustic model by using a data set to obtain a final acoustic model;
the text recognition module is used for inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information;
the initial acoustic model in the model construction module is a BRNN-CTC model which is constructed in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
In a third aspect, a computer terminal is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the BRNN-based power grid dispatching complex speech recognition method according to any one of the first aspect is implemented.
In a fourth aspect, a computer-readable medium is provided, on which a computer program is stored, where the computer program is executed by a processor, and the BRNN-based power grid dispatching complex speech recognition method according to any one of the first aspect may be implemented.
Compared with the prior art, the invention has the following beneficial effects:
1. According to the BRNN-based power grid dispatching complex speech recognition method, the BRNN neural network is constructed with initials and finals as basic units, and an end-to-end BRNN-CTC model is built by combining it with the CTC target loss function; the BRNN network structure makes better use of context information and achieves higher recognition accuracy in complex environments;
2. The method dynamically adjusts the proportions of the training data set for the BRNN-CTC model according to the word error rates of the different sample types, so that a long-running recognition process can always maintain high accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart in an embodiment of the invention;
FIG. 2 is a CTC-based end-to-end framework diagram in an embodiment of the present invention;
fig. 3 is a block diagram of a system in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1: the BRNN-based power grid dispatching complex speech recognition method, as shown in FIG. 1, comprises the following steps:
S1: acquiring original speech information, and preprocessing it to obtain speech data;
S2: extracting key feature parameters in the speech data to obtain a key feature sequence;
S3: constructing an initial acoustic model, and training it with a data set to obtain a final acoustic model; the initial acoustic model is an end-to-end BRNN-CTC model combining a BRNN neural network with a CTC target loss function, and the BRNN neural network is constructed with initials and finals as basic units;
S4: inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information; the text is then converted into a Linux instruction and transmitted to the designated dispatching desk, whose workstation executes the instruction to open the corresponding screen.
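The embodiment does not detail how the recognized text becomes a Linux instruction; the sketch below shows one plausible realization, in which the phrase-to-command table, the command strings, the host name and the ssh transport to the dispatching desk workstation are all hypothetical:

```python
import subprocess

# Hypothetical mapping from recognized dispatch phrases to Linux commands;
# both the phrase and the command-line tool are illustrative assumptions.
COMMAND_TABLE = {
    "打开一号变电站接线图": "open-diagram --station 1",
}

def dispatch(recognized_text, desk_host="dispatch-desk-01"):
    """Convert recognized text to a Linux instruction and send it to the
    designated dispatching desk workstation (host name assumed)."""
    cmd = COMMAND_TABLE.get(recognized_text)
    if cmd is None:
        raise ValueError("no command mapped for: " + recognized_text)
    # The workstation executes the instruction, opening the requested screen.
    subprocess.run(["ssh", desk_host, cmd], check=True)
```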
In this embodiment, the preprocessing of the original speech information includes pre-emphasis, framing, windowing and endpoint detection. Pre-emphasis high-pass filters the original speech information to strengthen high-frequency components and weaken low-frequency components, obtaining the speech signal. Framing uses the overlapped sampling method, with a frame length of 10 ms-30 ms and a frame-shift-to-frame-length ratio of 0-0.5. Windowing multiplies the elements of each frame by the corresponding elements of a Hamming window sequence. Endpoint detection detects effective speech based on short-time energy and the short-time average zero-crossing rate.
As shown in fig. 2, the initial acoustic model is specifically as follows. Let the training data be $S=\{(x^1,z^1),(x^2,z^2),\ldots,(x^N,z^N)\}$, where $N$ is the number of samples. Each sample $x=(x_1,x_2,x_3,\ldots,x_T)$, $x_i\in\mathbb{R}^m$, denotes a division of the sample into $T$ frames, with $x_i$ the feature parameters of the $i$-th frame; the label $z=(z_1,z_2,z_3,\ldots,z_U)$ denotes the correct phoneme sequence corresponding to the sample $x$. After the features are processed by the two RNNs, the posterior probability of each phoneme is calculated by softmax:
$$y_k^t=\frac{\exp(a_k^t)}{\sum_{k'}\exp(a_{k'}^t)},$$
representing the probability that the phoneme at time $t$ is $k$, with $a_k^t$ the network output for phoneme $k$ at time $t$; the probabilities of all phonemes within a frame sum to 1, i.e.
$$\sum_k y_k^t=1.$$
In this embodiment, the data set is randomly divided in a 7:3 ratio into a training set and a test set, and the training set is further divided in a certain proportion into a training set and a validation set; the validation set is used to tune parameters, and the test set to verify the effect of the model.
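A minimal sketch of this splitting procedure follows; the 10% validation share is an assumption, since the embodiment only specifies "a certain proportion":

```python
import random

def split_dataset(samples, test_ratio=0.3, val_ratio=0.1, seed=42):
    """Random 7:3 train/test split, then a validation share carved out of
    the training portion (validation ratio assumed)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_val = int(len(rest) * val_ratio)
    return rest[n_val:], rest[:n_val], test   # train, validation, test
```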
The data set is composed of multiple types of samples, including speech information from males, females, Mandarin speakers, dialect speakers, noisy environments, quiet environments, single speakers, multiple speakers and different age groups.
The recognition method further comprises: comparing the recognized text character by character with the standard result obtained by feedback to calculate a word error rate, and adjusting the proportion of the corresponding sample class in the data set in positive correlation with the word error rate.
The word error rate is calculated as:
$$P = \frac{S + D + I}{N},$$
where $P$ denotes the word error rate; $N$ the total number of characters in the standard result; $S$ the number of characters to be substituted; $D$ the number of characters to be deleted; and $I$ the number of characters to be inserted.
As an alternative embodiment, the proportions of the sample classes in the data set are adjusted synchronously according to the word error rates of the multiple sample types.
Experimental verification: the 13652 speech recordings provided in this embodiment were processed to obtain 14 h 50 min of effective data. The speech quality is uneven, and the collected data mixes male and female voices, dialects and accents. The constructed speech is in Mandarin, each utterance lasts about 10 seconds, and there are 36115 non-repeated words. The data were then divided into a training set and a test set in a 7:3 ratio, and feature extraction was performed on the data set.
The training set is then fed into the BRNN-CTC model for training; to keep the hidden-layer configuration consistent, and because the BRNN is bidirectional, the number of neurons in each hidden layer is doubled.
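A single training step against the CTC objective might then look as follows, reusing the `model` and `ctc` objects from the earlier sketch; the Adam optimizer and learning rate are assumptions:

```python
import torch

# `model` and `ctc` are the BRNNCTC network and nn.CTCLoss defined earlier.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(feats, targets, feat_lens, target_lens):
    """One CTC training step. feats: (T, batch, feat_dim) float tensor;
    targets: concatenated unit ids; *_lens: per-utterance lengths."""
    optimizer.zero_grad()
    log_probs = model(feats)              # (T, batch, n_units + 1)
    loss = ctc(log_probs, targets, feat_lens, target_lens)
    loss.backward()
    optimizer.step()
    return loss.item()
```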
The training results are shown in Table 1 and the comparison of average accuracy in Table 2: speech recognition with a deep neural network outperforms a shallow GMM-HMM-based network, and the BRNN structure makes better use of context information to achieve higher recognition accuracy.
TABLE 1 BRNN-CTC-based training results (table reproduced as an image in the original document)
TABLE 2 Comparison of average accuracy (table reproduced as an image in the original document)
From the above, BRNN-based speech recognition for complex environments is indeed better than recognition with traditional models.
Example 2: the BRNN-based power grid dispatching complex speech recognition system comprises a preprocessing module, a feature extraction module, a model construction module and a text recognition module, as shown in FIG. 3.
The preprocessing module is used for acquiring original speech information and preprocessing it to obtain speech data. The feature extraction module is used for extracting key feature parameters in the speech data to obtain a key feature sequence. The model building module is used for building an initial acoustic model and training it with a data set to obtain the final acoustic model; the initial acoustic model in the model building module is an end-to-end BRNN-CTC model combining a BRNN neural network with a CTC target loss function, and the BRNN neural network is constructed with initials and finals as basic units. The text recognition module is used for inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information.
The working principle is as follows: the invention constructs the BRNN neural network with initials and finals as basic units and builds an end-to-end BRNN-CTC model by combining it with the CTC target loss function, so the BRNN network structure makes better use of context information and achieves higher recognition accuracy in complex environments. In addition, the proportions of the training data set for the BRNN-CTC model are dynamically adjusted according to the word error rates of the different sample types, so that a long-running recognition process can always maintain high accuracy.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. The BRNN-based power grid dispatching complex speech recognition method comprises the following steps:
acquiring original speech information, and preprocessing the original speech information to obtain speech data;
extracting key feature parameters in the speech data to obtain a key feature sequence;
constructing an initial acoustic model, and training the initial acoustic model with a data set to obtain a final acoustic model;
inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information;
the method is characterized in that the initial acoustic model is an end-to-end BRNN-CTC model constructed by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is constructed by taking initials and finals as basic units.
2. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the preprocessing of the raw speech information comprises:
pre-emphasis, namely high-pass filtering the original speech information to strengthen high-frequency components and weaken low-frequency components, obtaining the speech signal;
framing by adopting an overlapped sampling method, wherein the frame length is 10 ms-30 ms, and the ratio of frame shift to frame length is 0-0.5;
windowing, namely multiplying the elements of each frame after framing by the corresponding elements of a Hamming window sequence;
and endpoint detection, namely detecting effective speech based on short-time energy and the short-time average zero-crossing rate.
3. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the initial acoustic model is specifically:
let the training data be $S=\{(x^1,z^1),(x^2,z^2),\ldots,(x^N,z^N)\}$, where $N$ is the number of samples; each sample $x=(x_1,x_2,x_3,\ldots,x_T)$, $x_i\in\mathbb{R}^m$, denotes a division of the sample into $T$ frames, with $x_i$ the feature parameters of the $i$-th frame; the label $z=(z_1,z_2,z_3,\ldots,z_U)$ denotes the correct phoneme sequence corresponding to the sample $x$;
after the features are processed by the two RNNs, the posterior probability of each phoneme is calculated by softmax:
$$y_k^t=\frac{\exp(a_k^t)}{\sum_{k'}\exp(a_{k'}^t)},$$
representing the probability that the phoneme at time $t$ is $k$, with $a_k^t$ the network output for phoneme $k$ at time $t$; the probabilities of all phonemes within a frame sum to 1, i.e.
$$\sum_k y_k^t=1.$$
4. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the data set is composed of multiple types of samples, including speech information from males, females, Mandarin speakers, dialect speakers, noisy environments, quiet environments, single speakers, multiple speakers and different age groups.
5. The BRNN-based power grid dispatching complex speech recognition method of claim 4, wherein the recognition method further comprises:
comparing the recognized text character by character with the standard result obtained by feedback to calculate a word error rate;
and adjusting the proportion of the corresponding sample class in the data set in positive correlation with the word error rate.
6. The BRNN-based power grid dispatching complex speech recognition method of claim 5, wherein the calculation formula of the word error rate is specifically as follows:
$$P = \frac{S + D + I}{N},$$
where $P$ denotes the word error rate; $N$ the total number of characters in the standard result; $S$ the number of characters to be substituted; $D$ the number of characters to be deleted; and $I$ the number of characters to be inserted.
7. The BRNN-based power grid dispatching complex speech recognition method of claim 1, wherein the fraction of samples in the data set is synchronously adjusted according to the word error rate of multiple types of samples.
8. A BRNN-based power grid dispatching complex speech recognition system is characterized by comprising:
the preprocessing module is used for acquiring original voice information and preprocessing the original voice information to obtain voice data;
the feature extraction module is used for extracting key feature parameters in the voice data to obtain a key feature sequence;
the model building module is used for building an initial acoustic model and training the initial acoustic model by using a data set to obtain a final acoustic model;
the text recognition module is used for inputting the key feature sequence into the final acoustic model to obtain the text recognized from the original speech information;
the method is characterized in that an initial acoustic model in the model building module is a BRNN-CTC model which is built in an end-to-end mode by combining a BRNN neural network and a CTC target loss function, and the BRNN neural network is built by taking initials and finals as basic units.
9. A computer terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor when executing the program implements the BRNN-based grid scheduling complex speech recognition method according to any of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, the computer program being executable by a processor to implement the BRNN-based grid dispatch complex speech recognition method according to any of claims 1-7.
CN202210078771.7A 2022-01-24 2022-01-24 BRNN-based power grid dispatching complex speech recognition method, system and terminal Pending CN115249479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210078771.7A CN115249479A (en) 2022-01-24 2022-01-24 BRNN-based power grid dispatching complex speech recognition method, system and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210078771.7A CN115249479A (en) 2022-01-24 2022-01-24 BRNN-based power grid dispatching complex speech recognition method, system and terminal

Publications (1)

Publication Number Publication Date
CN115249479A (en) 2022-10-28

Family

ID=83697990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210078771.7A Pending CN115249479A (en) 2022-01-24 2022-01-24 BRNN-based power grid dispatching complex speech recognition method, system and terminal

Country Status (1)

Country Link
CN (1) CN115249479A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116631379A (en) * 2023-07-20 2023-08-22 中邮消费金融有限公司 Speech recognition method, device, equipment and storage medium
CN116631379B (en) * 2023-07-20 2023-09-26 中邮消费金融有限公司 Speech recognition method, device, equipment and storage medium
CN116825109A (en) * 2023-08-30 2023-09-29 深圳市友杰智新科技有限公司 Processing method, device, equipment and medium for voice command misrecognition
CN116825109B (en) * 2023-08-30 2023-12-08 深圳市友杰智新科技有限公司 Processing method, device, equipment and medium for voice command misrecognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination