CN113643695A - Dialect accent mandarin voice recognition optimization method and system - Google Patents


Info

Publication number
CN113643695A
CN113643695A (application CN202111048340.8A)
Authority
CN
China
Prior art keywords
mandarin
dialect
features
audio
convolution
Prior art date
Legal status
Granted
Application number
CN202111048340.8A
Other languages
Chinese (zh)
Other versions
CN113643695B (en)
Inventor
杨逸舟
陈海江
Current Assignee
Zhejiang Lishi Technology Co Ltd
Original Assignee
Zhejiang Lishi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Lishi Technology Co Ltd filed Critical Zhejiang Lishi Technology Co Ltd
Priority to CN202111048340.8A priority Critical patent/CN113643695B/en
Publication of CN113643695A publication Critical patent/CN113643695A/en
Application granted granted Critical
Publication of CN113643695B publication Critical patent/CN113643695B/en
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G10L 2015/0631 Creating reference templates; Clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of speech recognition, and in particular to a speech recognition optimization method and system for dialect-accented Mandarin. A convolutional neural network performs convolution and feature extraction on the audio content, and the network learns the features of standard Mandarin. After standard Mandarin audio is generated, it is convolved again to extract features, which are added to the intermediate features as offsets. The Mandarin features and the convolution result of each step are then taken as features and added as offsets at the end of the corresponding convolutional layer of the dialect processing module. After the convolutional layers, the obtained parameters are deconvolved to amplify the original information and generate the target audio, which is finally input into the speech recognition function for recognition. The method reduces the cost of customizing a dedicated model for each dialect, and by superimposing Mandarin and dialect-accent features it amplifies the required features of standard Mandarin, generalizing the model while further improving the accuracy of speech recognition.

Description

Dialect accent mandarin voice recognition optimization method and system
Technical Field
The invention relates to the technical field of voice recognition, in particular to a method and a system for optimizing voice recognition of dialect accent Mandarin.
Background
At present, speech recognition targets only standard Mandarin; recognizing a dialect requires building a dedicated model for that dialect, and no relatively universal solution exists for removing dialect accents. The current approach is to build dedicated speech recognition model modules for regions where dialect accents are particularly heavy, so as to reduce the interference of accents with recognition accuracy.
In the prior art, each dialect-accent recognition model requires a large investment: accented Mandarin audio training data must be collected and labeled, and the model then trained in a targeted manner. Moreover, the number of regions with dialect accents is large, and the time and labor cost of building a dedicated model for each region is too high for practical application. The present dialect-accent Mandarin speech recognition optimization method and system are proposed to solve this problem.
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a speech recognition optimization method and system for dialect-accented Mandarin. It addresses the problems that each dialect-accent recognition model requires a large investment, that accented Mandarin audio training data must be collected and labeled before the model can be trained in a targeted manner, and that, because the number of regions with dialect accents is large, the time and labor cost of building a dedicated model for each region is too high for practical application.
The invention is realized by the following technical scheme:
in a first aspect, the invention discloses a speech recognition optimization method for dialect-accented Mandarin, comprising the following steps:
S1, inputting standard Mandarin audio into a Mandarin enhancement module as model input, and performing convolution on the audio content with a convolutional neural network to extract features;
S2, deconvolving the extracted features to regenerate the original audio, so that the neural network learns the features of standard Mandarin;
S3, once standard Mandarin audio is generated, convolving it again to extract features and adding them to the intermediate features as offsets, strengthening the Mandarin content and the intonation-related features;
S4, taking the Mandarin features and the convolution result of each step as features, adding them at the end of the corresponding convolutional layer of the dialect processing module as offsets, and performing convolution processing;
S5, after the convolutional layers, deconvolving the obtained parameters to amplify the original information and generate the target audio, which is finally input into the speech recognition function for recognition.
Further, in the method, the Mandarin enhancement module uses a self-coding model structure that includes a convolution part and a deconvolution part.
Further, in the method, the feature extraction of the self-coding model part of the Mandarin enhancement module is trained separately, so that the base input is not pure white noise.
Further, in the method, the dialect processing module is based on a self-coding model framework and adds convolution result parameters from the standard Mandarin module at the end of each convolutional layer.
Further, in the method, the convolution parameters carried by the dialect processing module include information related to Mandarin semantics, intonation and meaning.
Further, the method is based on a convolutional neural network; the training samples are readings of the same text, and each dialect-accent audio segment corresponds to one standard Mandarin audio segment.
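To make the paired-sample requirement concrete, the sketch below builds hypothetical training pairs: each example is a dialect-accent clip and a standard Mandarin clip reading the same text. Random noise stands in for real recordings; the 3-text / 5-second / 44.1 kHz figures follow the embodiments described later, and everything else (names, seeds) is a placeholder.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, per the embodiments
CLIP_SECONDS = 5      # seconds per audio segment

def make_pair(seed):
    """One training pair: (dialect-accent clip, standard Mandarin clip)
    reading the same 15-character text. Dummy noise stands in for audio."""
    rng = np.random.default_rng(seed)
    n = SAMPLE_RATE * CLIP_SECONDS
    return rng.standard_normal(n), rng.standard_normal(n)

pairs = [make_pair(i) for i in range(3)]   # 3 randomly chosen texts
dialect_clip, mandarin_clip = pairs[0]
print(len(pairs), dialect_clip.shape, mandarin_clip.shape)
```

The one-to-one pairing is what lets the dialect module be supervised against a standard Mandarin target for the same spoken content.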
In a second aspect, the present invention discloses a speech recognition optimization system for dialect accent mandarin, which is used for implementing the speech recognition optimization method for dialect accent mandarin according to the first aspect, and comprises a dialect accent processing module and a standard mandarin speech enhancement module.
Further, the dialect module is used to extract features from the dialect-accent audio, obtain the enhanced features of standard Mandarin, and regenerate the dialect accent as a segment of standard Mandarin audio.
Further, the standard Mandarin enhancement module is used to extract the features of standard Mandarin and the text content features, enhancing the content of the speech and improving the speech recognition capability.
The invention has the following beneficial effects:
unlike traditional speech recognition algorithms, which recognize only the Mandarin content in isolation and do not process the accent through any enhancement modules, the universal dialect-accent processing model reduces the cost of customizing a dedicated model for each dialect. At the same time, by superimposing Mandarin and dialect-accent features it amplifies the required features of standard Mandarin, generalizing the model while further improving the accuracy of speech recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of the speech recognition optimization method for dialect accent Mandarin;
FIG. 2 is a schematic diagram of a method for optimizing speech recognition of dialect accent Mandarin.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Example 1
This embodiment discloses a speech recognition optimization method for dialect-accented Mandarin, as shown in FIG. 1, comprising the following steps:
S1, inputting standard Mandarin audio into a Mandarin enhancement module as model input, and performing convolution on the audio content with a convolutional neural network to extract features;
S2, deconvolving the extracted features to regenerate the original audio, so that the neural network learns the features of standard Mandarin;
S3, once standard Mandarin audio is generated, convolving it again to extract features and adding them to the intermediate features as offsets, strengthening the Mandarin content and the intonation-related features;
S4, taking the Mandarin features and the convolution result of each step as features, adding them at the end of the corresponding convolutional layer of the dialect processing module as offsets, and performing convolution processing;
S5, after the convolutional layers, deconvolving the obtained parameters to amplify the original information and generate the target audio, which is finally input into the speech recognition function for recognition.
In this embodiment, the neural network learns the features of standard Mandarin through the self-coding model structure of the Mandarin enhancement module. The feature extraction of the self-coding model part is trained separately, so that the base input for training is not pure white noise.
The dialect processing module of this embodiment is based on a self-coding model framework and adds convolution result parameters from the standard Mandarin module at the end of each convolutional layer. By carrying these convolution parameters, the dialect processing module strengthens the information related to Mandarin semantics, intonation and meaning within the dialect.
This embodiment is based on a convolutional neural network; the training samples are readings of the same text, and each dialect-accent audio segment corresponds to one standard Mandarin audio segment.
In this embodiment, the input speech is processed by the optimization model to generate audio without a dialect accent, and the result is used as the subsequent input for speech recognition.
This embodiment builds the relevant speech recognition model module for regions where the dialect accent is particularly heavy, thereby reducing the interference of the accent with recognition accuracy.
Example 2
This embodiment discloses a specific implementation of the speech recognition optimization method for dialect-accented Mandarin, shown in FIG. 2, which proceeds as follows:
the Mandarin enhancement module is trained separately, with standard Mandarin audio as the model input. Convolutional layer 1: filter size 4410x2, offset: 441 sample points. Convolutional layer 2: filter size 441x2, offset: 40 sample points. Convolutional layer 3: filter size 441x2, offset: 40 sample points. A convolutional neural network thus performs convolution and feature extraction on the audio content.
The training samples are readings of the same text: each text is 15 characters long, 3 texts are selected at random, each audio segment is 5 seconds long, and each dialect-accent segment corresponds to one standard Mandarin segment.
The original audio is then regenerated by deconvolving the features, using deconvolution layer 2 (filter size 4410x2, offset: 441 sample points) and deconvolution layer 1 (filter size 441x2, offset: 40 sample points). Through this self-coding model structure, the neural network learns the features of standard Mandarin more effectively. Once the features can fully regenerate standard Mandarin audio, the generated audio is convolved again to extract features, which are added to the intermediate features as offsets, strengthening the Mandarin content and the intonation-related features.
The training set comprises 50 audio segments each of Guangdong-, Sichuan-, Hunan-, Fujian-, and Beijing-accented Mandarin, with male and female speakers each accounting for 50% of the recordings, plus 100 segments of standard Mandarin, likewise split evenly between male and female speakers.
During training, the feature extraction of the self-coding model part of the Mandarin enhancement module is trained separately, so that the base input of the model is not pure white noise; this further improves the later training of the model.
After the Mandarin enhancement module is trained (learning rate: 0.005, reduced by 0.0001 whenever the loss falls below 0.01; training iterations: 5000), the obtained Mandarin features and the convolution result of each step are taken as features and added as offsets at the end of the corresponding convolutional layer of the dialect processing module, embedding the Mandarin features into the dialect audio.
The dialect processing module in this embodiment is likewise based on a self-coding model framework: a convolution result parameter from the standard Mandarin module is added at the end of each convolutional layer, and carrying these convolution parameters strengthens the information related to Mandarin semantics, intonation and meaning within the dialect.
Its layers are: convolutional layer 1: filter size 4410x2, offset: 441 sample points; convolutional layer 2: filter size 441x2, offset: 40 sample points.
After the convolutional layers, the obtained parameters are deconvolved (deconvolution layer 2: filter size 4410x2, offset: 441 sample points; deconvolution layer 1: filter size 441x2, offset: 40 sample points), with learning rate 0.005, reduced by 0.0001 when the loss falls below 0.01, and 50000 training iterations. The original information is amplified to generate the target audio, which is finally input into the speech recognition function for recognition.
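The learning-rate rule used for both modules (start at 0.005, subtract 0.0001 whenever the loss falls below 0.01) can be sketched as a small helper. The `floor` argument is an added assumption to keep the rate positive; the patent text does not specify a lower bound.

```python
def step_lr(lr, loss, threshold=0.01, decrement=0.0001, floor=1e-6):
    """Reduce the learning rate by `decrement` once the loss drops below
    `threshold`; `floor` (an assumption) keeps the rate positive."""
    return max(lr - decrement, floor) if loss < threshold else lr

lr = 0.005
for loss in [0.5, 0.05, 0.009, 0.008]:   # last two are below the threshold
    lr = step_lr(lr, loss)
print(round(lr, 6))                      # 0.0048
```

Whether the reduction is applied once or at every sub-threshold step is not stated; the helper above applies it at every step, which matches a simple per-iteration schedule.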
Unlike traditional speech recognition algorithms, which recognize only the Mandarin content in isolation and do not process the accent through any enhancement modules, this embodiment reduces the cost of customizing a dedicated model for each dialect. At the same time, by superimposing Mandarin and dialect-accent features it amplifies the required features of standard Mandarin, generalizing the model while further improving the accuracy of speech recognition.
Example 3
The embodiment discloses a dialect accent mandarin voice recognition optimization system, which comprises a dialect accent processing module and a standard mandarin voice enhancement module.
The dialect module of this embodiment mainly extracts features from the dialect-accent audio, obtains the enhanced features of standard Mandarin, and regenerates the dialect accent as a segment of standard Mandarin audio.
The standard Mandarin enhancement module of this embodiment extracts the features of standard Mandarin and the text content features, enhancing the content of the speech and improving the speech recognition capability.
The whole system of this embodiment is built on a convolutional neural network. The training samples are readings of the same text: each text is 15 characters long, 3 texts are selected at random, each audio segment is 5 seconds long, and each dialect-accent segment corresponds to one standard Mandarin segment.
The training set comprises 50 audio segments each of Guangdong-, Sichuan-, Hunan-, Fujian-, and Beijing-accented Mandarin, with male and female speakers each accounting for 50% of the recordings, plus 100 segments of standard Mandarin, likewise split evenly between male and female speakers.
The standard Mandarin module parameters of this embodiment are:
input audio vector: 44100x5x2;
convolutional layer 1: filter size 4410x2, offset: 441 sample points;
convolutional layer 2: filter size 441x2, offset: 40 sample points;
convolutional layer 3: filter size 441x2, offset: 40 sample points;
deconvolution layer 2: filter size 4410x2, offset: 441 sample points;
deconvolution layer 1: filter size 441x2, offset: 40 sample points;
learning rate: 0.005, reduced by 0.0001 when the loss falls below 0.01;
training iterations: 5000.
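Reading each layer's "offset" as its stride in sample points (an interpretation; the text does not define the term), the output length of a strided layer can be checked with a small helper. The 44100x5x2 input corresponds to 5 seconds of two-channel 44.1 kHz audio; the calculation below treats a single channel.

```python
def conv_out_len(n_in, kernel, stride):
    """Length of a 'valid' strided 1-D convolution output."""
    return (n_in - kernel) // stride + 1

n = 44100 * 5                       # 5 s of 44.1 kHz audio, one channel
n1 = conv_out_len(n, 4410, 441)     # convolutional layer 1 of the module
print(n1)                           # 491
```

Under this reading, layer 1 turns 220500 input samples into 491 feature values; padding conventions could shift these counts slightly, so treat the figures as indicative.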
The dialect processing module parameters of this embodiment are:
input audio vector: 44100x5x2;
convolutional layer 1: filter size 4410x2, offset: 441 sample points;
convolutional layer 2: filter size 441x2, offset: 40 sample points;
deconvolution layer 2: filter size 4410x2, offset: 441 sample points;
deconvolution layer 1: filter size 441x2, offset: 40 sample points;
learning rate: 0.005, reduced by 0.0001 when the loss falls below 0.01;
training iterations: 50000.
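The defining step of the dialect processing module, adding the standard Mandarin module's convolution result at the end of each convolutional layer, can be sketched as follows. This is a toy with made-up sizes, not the patented parameters; `conv1d` and the helper names are stand-ins introduced for illustration.

```python
import numpy as np

def conv1d(x, kernel, stride):
    """'Valid' strided 1-D convolution (toy stand-in for a conv layer)."""
    n = (len(x) - len(kernel)) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + len(kernel)], kernel)
                     for i in range(n)])

def dialect_forward(dialect_audio, mandarin_layer_feats, kernels, strides):
    """Pass dialect audio through the conv stack, adding the Mandarin
    module's features for the matching layer as an offset at the end of
    each layer (toy sizes; not the patented parameters)."""
    h = dialect_audio
    for k, s, m in zip(kernels, strides, mandarin_layer_feats):
        h = conv1d(h, k, s) + m    # offset added at the end of the layer
    return h

rng = np.random.default_rng(1)
audio = rng.standard_normal(100)
kernels = [np.ones(10), np.ones(2)]
strides = [10, 2]
m_feats = [np.zeros(10), np.zeros(5)]      # offsets from the Mandarin module
out = dialect_forward(audio, m_feats, kernels, strides)
print(out.shape)                           # (5,)
```

Each Mandarin feature vector must match the shape of the corresponding layer output, which is why the two modules share the same layer geometry in the parameter lists above.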
The system solves the problems that each dialect-accent recognition model requires a large investment, that accented Mandarin audio training data must be collected and labeled before the model can be trained in a targeted manner, and that, because the number of regions with dialect accents is large, the time and labor cost of building a dedicated model for each region is too high for practical application.
Unlike traditional speech recognition algorithms, which recognize only the Mandarin content in isolation and do not process the accent through any enhancement modules, the universal dialect-accent processing model reduces the cost of customizing a dedicated model for each dialect. At the same time, by superimposing Mandarin and dialect-accent features it amplifies the required features of standard Mandarin, generalizing the model while further improving the accuracy of speech recognition.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A speech recognition optimization method for dialect-accented Mandarin, the method comprising the following steps:
S1, inputting standard Mandarin audio into a Mandarin enhancement module as model input, and performing convolution on the audio content with a convolutional neural network to extract features;
S2, deconvolving the extracted features to regenerate the original audio, so that the neural network learns the features of standard Mandarin;
S3, once standard Mandarin audio is generated, convolving it again to extract features and adding them to the intermediate features as offsets, strengthening the Mandarin content and the intonation-related features;
S4, taking the Mandarin features and the convolution result of each step as features, adding them at the end of the corresponding convolutional layer of the dialect processing module as offsets, and performing convolution processing;
S5, after the convolutional layers, deconvolving the obtained parameters to amplify the original information and generate the target audio, which is finally input into the speech recognition function for recognition.
2. The method of claim 1, wherein the Mandarin enhancement module uses a self-coding model structure that includes a convolution portion and a deconvolution portion.
3. The method of claim 1, wherein the feature extraction of the self-coding model part of the Mandarin enhancement module is trained separately, so that the base input is not pure white noise.
4. The method of claim 1, wherein the dialect processing module is based on a self-coding model framework, and convolution result parameters from a standard Mandarin module are added at the end of each layer of convolution.
5. The method as claimed in claim 4, wherein the convolution parameters carried by the dialect processing module include information related to Mandarin semantics, intonation and meaning.
6. The method of claim 1, wherein the method is based on a convolutional neural network, the training samples are readings of the same text, and each dialect-accent audio segment corresponds to one standard Mandarin audio segment.
7. A speech recognition optimization system for dialect accent Mandarin, the system being adapted to implement a method for speech recognition optimization of dialect accent Mandarin as claimed in any of claims 1-6, comprising a dialect accent processing module and a standard Mandarin speech enhancement module.
8. The system of claim 7, wherein the dialect module is configured to perform feature extraction on the dialect-accent audio, obtain the enhanced features of standard Mandarin, and regenerate the dialect accent as a segment of standard Mandarin audio.
9. The system of claim 7, wherein the standard Mandarin enhancement module is configured to extract features of standard Mandarin and text content features, so as to enhance the content of the speech process and improve the speech recognition capability.
CN202111048340.8A 2021-09-08 2021-09-08 Method and system for optimizing voice recognition of dialect accent mandarin Active CN113643695B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111048340.8A CN113643695B (en) 2021-09-08 2021-09-08 Method and system for optimizing voice recognition of dialect accent mandarin


Publications (2)

Publication Number Publication Date
CN113643695A 2021-11-12
CN113643695B CN113643695B (en) 2024-03-08

Family

ID=78425319

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111048340.8A Active CN113643695B (en) 2021-09-08 2021-09-08 Method and system for optimizing voice recognition of dialect accent mandarin

Country Status (1)

Country Link
CN (1) CN113643695B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160314782A1 (en) * 2015-04-21 2016-10-27 Google Inc. Customizing speech-recognition dictionaries in a smart-home environment
CN109065021A (en) * 2018-10-18 2018-12-21 江苏师范大学 The end-to-end dialect identification method of confrontation network is generated based on condition depth convolution
CN115512722A (en) * 2022-10-10 2022-12-23 浙江力石科技股份有限公司 Multi-mode emotion recognition method, equipment and storage medium


Non-Patent Citations (3)

Title
MALYSHA, N., "Analysis of Nonmodal Phonation using Minimum Entropy Deconvolution", 9th International Conference on Spoken Language Processing / Interspeech *
更藏措毛, "Amdo Tibetan Speech Recognition Based on Deep Neural Networks", China Master's Theses Full-text Database (Information Science and Technology) *
潘嘉, "Research on Adaptation Methods in Deep Learning Speech Recognition Systems", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113643695B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN111382580B (en) Encoder-decoder framework pre-training method for neural machine translation
CN109065021B (en) End-to-end dialect identification method for generating countermeasure network based on conditional deep convolution
CN111862953B (en) Training method of voice recognition model, voice recognition method and device
CN111477216A (en) Training method and system for pronunciation understanding model of conversation robot
CN105261356A (en) Voice recognition system and method
CN109993169A (en) One kind is based on character type method for recognizing verification code end to end
CN114495904B (en) Speech recognition method and device
CN112989008A (en) Multi-turn dialog rewriting method and device and electronic equipment
CN111933113B (en) Voice recognition method, device, equipment and medium
CN110728154A (en) Construction method of semi-supervised general neural machine translation model
CN111933120A (en) Voice data automatic labeling method and system for voice recognition
CN113643695A (en) Dialect accent mandarin voice recognition optimization method and system
CN106682642A (en) Multi-language-oriented behavior identification method and multi-language-oriented behavior identification system
CN111079528A (en) Primitive drawing checking method and system based on deep learning
CN113160796B (en) Language identification method, device and equipment for broadcast audio and storage medium
CN115331703A (en) Song voice detection method and device
CN112241467A (en) Audio duplicate checking method and device
CN114550693A (en) Multilingual voice translation method and system
CN110858268B (en) Method and system for detecting unsmooth phenomenon in voice translation system
CN113658587B (en) Intelligent voice recognition method and system with high recognition rate based on deep learning
CN110399456B (en) Question dialogue completion method and device
CN115905500B (en) Question-answer pair data generation method and device
CN113035247B (en) Audio text alignment method and device, electronic equipment and storage medium
CN113792723B (en) Optimization method and system for identifying stone carving characters
CN111613208B (en) Language identification method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant