CN106710587A - Speech recognition data pre-processing method - Google Patents
- Publication number
- CN106710587A CN106710587A CN201611184565.5A CN201611184565A CN106710587A CN 106710587 A CN106710587 A CN 106710587A CN 201611184565 A CN201611184565 A CN 201611184565A CN 106710587 A CN106710587 A CN 106710587A
- Authority
- CN
- China
- Prior art keywords
- pronunciation
- model
- standard
- dictionary
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/01—Assessment or evaluation of speech recognition systems
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1807—Speech classification or search using natural language modelling using prosody or stress
- G10L15/26—Speech to text systems
- G10L2015/0631—Creating reference templates; Clustering
- G10L2015/0633—Creating reference templates; Clustering using lexical or orthographic knowledge sources
- G10L2015/086—Recognition of spelled words
Abstract
The invention provides a speech recognition data pre-processing method. The system comprises a standard audio file organization module, a standard text editing module, a pronunciation dictionary configuration module, a speech model generation module, and a standard pronunciation feature data recognition processing module. The method saves the finally generated standard pronunciation data model to the file system, and the application product directly loads the pre-generated data model to recognize and score user recordings. This solves the inefficiency of existing methods, which must recognize both the standard pronunciation and the user recording at scoring time.
Description
Technical field
The present invention relates to the field of speech recognition, and more particularly to a speech recognition data pre-processing method.
Background technology
Speech recognition technology is widely used in user-terminal products such as mobile phones and PCs, for example input methods, language learning systems, and search systems. In most speech recognition applications, the user terminal collects the user's recording and sends it to a background server for recognition. For example, Publication No. CN103137129 A describes a speech recognition method and electronic device that collects user-specific information through the device, records the user's speech, and has a remote server produce a remote recognition result for the recorded speech. The shortcoming of this pattern is that the background recognition system requires powerful hardware support, which is costly to build: a large user base can easily generate more than one hundred million access requests. It also requires the user terminal to be connected to the network before the speech recognition application can be used at all. If the computing power of the user terminal itself could be used, hardware cost would drop significantly, but user terminals generally do not have the strong computing capability of a server system. A method is therefore needed that optimizes the processing so that speech recognition can be performed on the user terminal, improving recognition efficiency.
Summary of the invention
It is an object of the invention to provide a method for processing speech recognition data that makes optimal use of the user terminal's computing capability to perform speech recognition scoring efficiently.
The concrete technical scheme comprises the following steps:
Step 1) Organize the standard audio files: arrange the audio files from which the data model is to be generated;
Step 2) Edit the standard text: collate the texts, such as passages, sentences, and words, that need to be recognized and scored;
Step 3) Configure the pronunciation dictionary: configure the general or special pronunciation dictionary that the passage requires;
Step 4) Generate the corresponding speech model from the output files of the previous steps, and save the speech model file;
Step 5) Using the generated speech model, call the speech recognition engine to perform recognition processing on the standard pronunciation feature data, then generate and save the standard pronunciation data model;
Step 6) In the application product, directly load the pre-generated data model and use it to recognize and score user recordings.
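The six steps above amount to a one-time offline pipeline whose output is a reusable standard pronunciation model. The sketch below illustrates the overall flow only; the function name `preprocess`, the JSON model layout, and the stand-in feature extraction are hypothetical, not taken from the patent, which does not specify an implementation.

```python
import json
from pathlib import Path

def preprocess(audio_dir: str, texts: list, out_path: str) -> dict:
    """Offline pre-processing sketch: pair each standard audio file with
    its text and persist the resulting data model so an application can
    load it directly instead of re-recognizing the standard audio."""
    model = {"units": []}
    for wav, text in zip(sorted(Path(audio_dir).glob("*.wav")), texts):
        # In the real system a speech recognition engine would extract
        # pronunciation, rhythm, stress and intonation features here;
        # this placeholder just records the pairing.
        model["units"].append({"audio": wav.name, "text": text})
    Path(out_path).write_text(json.dumps(model, ensure_ascii=False))
    return model
```

At run time the application reads the saved file once and scores every user recording against it, which is the efficiency gain the method claims.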
Further, step 1) is specifically as follows:
11) Because the computing capability of the user terminal's CPU is limited, speech recognition scoring requires the recognition target to be restricted to a certain scope, for example the text content of a single lesson.
Further, the text-collation step of step 2) is as follows: create an XML configuration file, and create a node in it for each sentence or word; each node contains a reference to the audio file path and the corresponding text.
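The configuration file of step 2) can be sketched as follows. The patent does not give a schema, so the element names (`section`, `node`, `audio`, `text`) are illustrative assumptions; only the structure — one node per sentence or word, holding an audio reference and its text — comes from the description.

```python
import xml.etree.ElementTree as ET

def build_config(items):
    """Build an XML configuration with one node per sentence or word,
    each holding the audio file path and the corresponding text.
    Element names are illustrative, not specified by the patent."""
    root = ET.Element("section")
    for audio_path, text in items:
        node = ET.SubElement(root, "node")
        ET.SubElement(node, "audio").text = audio_path
        ET.SubElement(node, "text").text = text
    return ET.tostring(root, encoding="unicode")
```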
Further, the pronunciation dictionary configuration of step 3) is as follows: 31) after the word and sentence configuration is complete, configure the pronunciation-dictionary entry for each word's node and associate them; 32) the pronunciation dictionary is divided into a common pronunciation dictionary and a special pronunciation dictionary. If all the words in a passage are in the common pronunciation dictionary, no special pronunciation dictionary needs to be configured; otherwise, a pronunciation annotation must be created for each word absent from the common dictionary and added to the special pronunciation dictionary.
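The split between common and special dictionaries in step 3) reduces per-passage configuration work to only the out-of-vocabulary words. A minimal sketch, assuming some pronunciation-annotation tool `g2p` is available (the patent does not name one):

```python
def build_special_dict(words, common_dict, g2p):
    """Return a special pronunciation dictionary covering only the words
    missing from the common dictionary; `g2p` is a placeholder for
    whatever pronunciation-annotation tool the system actually uses."""
    return {w: g2p(w) for w in words if w not in common_dict}
```

If every word of the passage is already in `common_dict`, the result is empty and no special dictionary needs to be configured, matching substep 32).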
Further, the standard pronunciation feature data of step 4) is generated as follows: using the standard audio and the standard-text configuration file edited in step 2) and the pronunciation dictionary edited in step 3), generate the passage speech model with a speech recognition engine tool. The passage speech model describes the user's pronunciation space; when the user's pronunciation is recognized, it lets the speech recognition engine prune quickly and efficiently under the constrained vocabulary and rapidly recognize the content of the user's pronunciation.
Further, the speech model generation of step 5) proceeds as follows: call the speech recognition engine, passing in the acoustic model and the passage speech model generated in step 4); for the word or sentence configured at each node of the configuration file generated in step 2), perform speech recognition in turn, and save the audio-file recognition data the engine returns for each node to a local text file. At this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of each word's or sentence's standard pronunciation has been obtained. When a user's pronunciation is recognized and scored, only this data model needs to be passed to the recognition engine; after recognizing the user's pronunciation, the engine scores it by direct comparison with the standard pronunciation data model, without having to recognize the standard pronunciation again to obtain the standard pronunciation data model.
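Step 5) can be sketched as a single pass over the configured nodes, persisting whatever the engine returns. The `engine.recognize(audio, text)` interface below is a hypothetical stand-in — the patent names no engine or API — so only the shape of the loop reflects the description.

```python
import json
from pathlib import Path

def generate_standard_model(engine, nodes, out_file):
    """Run the recognition engine once over each node's standard audio
    and persist the returned feature data as the standard pronunciation
    data model; `engine.recognize` is a hypothetical interface."""
    model = {}
    for node in nodes:
        # The returned data stands in for the text, pronunciation,
        # rhythm, stress and intonation features described in step 5.
        model[node["text"]] = engine.recognize(node["audio"], node["text"])
    Path(out_file).write_text(json.dumps(model, ensure_ascii=False))
    return model
```

Because this runs once, offline, the expensive recognition of the standard audio never happens on the user's device.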
The beneficial effects of the present invention are: by implementing the steps of the invention, the speech recognition time on the user terminal for applications such as spoken-language practice is cut in half. The improved recognition efficiency allows recognition to be performed with the user terminal's own computing capability, without building a server system and without network access, so the user gets a better experience from a standalone speech recognition application.
Brief description of the drawings
The present invention is described in further detail with reference to the accompanying drawing:
Fig. 1 is the flow block diagram of the application.
Specific embodiment
The present invention is explained in detail below through the preferred embodiment shown in the accompanying drawing, but the invention is not restricted to this embodiment.
The steps shown in Fig. 1 are as follows; the first three steps prepare the resources:
1. Organize the standard audio files: arrange the audio files from which the data model is to be generated. Because the computing capability of the user terminal's CPU is limited, speech recognition scoring requires the recognition target to be restricted to a certain scope, for example the text content of a single lesson.
2. Edit the standard text: collate the texts, such as passages, sentences, and words, that need to be recognized and scored. Create an XML configuration file, and create a node in it for each sentence or word; each node contains a reference to the audio file path and the corresponding text.
3. Configure the pronunciation dictionary: configure the general or special pronunciation dictionary the passage requires. After the word and sentence configuration is complete, configure the pronunciation-dictionary entry for each word's node and associate them. The pronunciation dictionary is divided into a common pronunciation dictionary and a special pronunciation dictionary: if all the words in a passage are in the common pronunciation dictionary, no special pronunciation dictionary needs to be configured; otherwise, a pronunciation annotation must be created for each word absent from the common dictionary and added to the special pronunciation dictionary.
After the resources are ready, speech model processing is carried out:
4. Generate the corresponding speech model: using the standard audio and standard-text configuration file edited in step 2 and the pronunciation dictionary edited in step 3, generate the passage speech model with a speech recognition engine tool. The passage speech model describes the user's pronunciation space; when the user's pronunciation is recognized, it lets the speech recognition engine prune quickly and efficiently under the constrained vocabulary and rapidly recognize the content of the user's pronunciation.
5. Using the generated speech model, call the speech recognition engine to perform recognition processing on the standard pronunciation feature data, then generate and save the standard pronunciation data model. Call the speech recognition engine, passing in the acoustic model and the passage speech model generated in step 4; for the word or sentence configured at each node of the configuration file generated in step 2, perform speech recognition in turn, and save the audio-file recognition data the engine returns for each node to a local text file. At this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of each word's or sentence's standard pronunciation has been obtained. When a user's pronunciation is recognized and scored, only this data model needs to be passed to the recognition engine; after recognizing the user's pronunciation, the engine scores it by direct comparison with the standard pronunciation data model, without having to recognize the standard pronunciation again.
6. In the application product, directly load the pre-generated data model and use it to recognize and score user recordings.
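At run time, scoring reduces to comparing the recognized user pronunciation against the pre-generated standard model. The patent does not specify a similarity measure, so the edit-distance-based ratio below (over phone sequences) is purely an illustrative assumption:

```python
import difflib

def score_recording(standard_phones, user_phones):
    """Score a user's pronunciation against the pre-generated standard
    model by sequence comparison. The similarity measure (a difflib
    edit-distance ratio) is an assumption; the patent only states that
    scoring is done by direct comparison with the standard model."""
    ratio = difflib.SequenceMatcher(None, standard_phones, user_phones).ratio()
    return round(100 * ratio)
```

Because the standard side is pre-computed, only the user recording needs recognition on the device, which is the source of the claimed halving of recognition time.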
The speech recognition data method of the invention comprises a standard audio file organization module, a standard text editing module, a pronunciation dictionary configuration module, a speech model generation module, and a standard pronunciation feature data recognition processing module. It saves the finally generated standard pronunciation data model to the file system, and the application product directly loads the pre-generated data model to recognize and score user recordings, solving the inefficiency of existing methods that must recognize both the standard pronunciation and the user recording in actual use.
The specific embodiment above merely illustrates the technical solution of the present invention and does not restrict it. Although the invention has been described in detail with reference to the example, those skilled in the art will understand that the technical scheme can be modified or equivalently substituted without departing from the spirit and scope of the technical solution of the invention, and such modifications should all be covered within the scope of the claims of the invention.
Claims (6)
1. A speech recognition data pre-processing method, characterized by comprising the following steps:
Step 1) organizing the standard audio files: arranging the audio files from which the data model is to be generated;
Step 2) editing the standard text: collating the texts, such as passages, sentences, and words, that need to be recognized and scored;
Step 3) configuring the pronunciation dictionary: configuring the general or special pronunciation dictionary that the passage requires;
Step 4) generating the corresponding speech model from the output files of the previous steps, and saving the speech model file;
Step 5) using the generated speech model, calling the speech recognition engine to perform recognition processing on the standard pronunciation feature data, then generating and saving the standard pronunciation data model;
Step 6) in the application product, directly loading the pre-generated data model and using it to recognize and score user recordings.
2. The speech recognition data pre-processing method according to claim 1, characterized in that step 1) is specifically as follows:
11) because the computing capability of the user terminal's CPU is limited, speech recognition scoring requires the recognition target to be restricted to a certain scope, for example the text content of a single lesson.
3. The speech recognition data pre-processing method according to claim 1, characterized in that the text-collation step of step 2) is as follows: an XML configuration file is created, a node is created in it for each sentence or word, and each node contains a reference to the audio file path and the corresponding text.
4. The speech recognition data pre-processing method according to claim 1, characterized in that the pronunciation dictionary configuration of step 3) is as follows:
31) after the word and sentence configuration is complete, the pronunciation-dictionary entry for each word's node is configured and associated;
32) the pronunciation dictionary is divided into a common pronunciation dictionary and a special pronunciation dictionary; if all the words in a passage are in the common pronunciation dictionary, no special pronunciation dictionary needs to be configured; otherwise, a pronunciation annotation is created for each word absent from the common dictionary and added to the special pronunciation dictionary.
5. The speech recognition data pre-processing method according to claim 1, characterized in that the standard pronunciation feature data of step 4) is generated as follows: using the standard audio and standard-text configuration file edited in step 2) and the pronunciation dictionary edited in step 3), the passage speech model is generated with a speech recognition engine tool; the passage speech model describes the user's pronunciation space and, when the user's pronunciation is recognized, lets the speech recognition engine prune quickly and efficiently under the constrained vocabulary and rapidly recognize the content of the user's pronunciation.
6. The speech recognition data pre-processing method according to claim 1, characterized in that the speech model generation of step 5) is as follows: the speech recognition engine is called with the acoustic model and the passage speech model generated in step 4); speech recognition is performed in turn for the word or sentence configured at each node of the configuration file generated in step 2), and the audio-file recognition data the engine returns for each node is saved to a local text file; at this point, a feature data model of the text, pronunciation, rhythm, stress, and intonation of each word's or sentence's standard pronunciation has been obtained; when a user's pronunciation is recognized and scored, only this data model needs to be passed to the recognition engine, which scores the recognized user pronunciation by direct comparison with the standard pronunciation data model, without the standard pronunciation having to be recognized again.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611184565.5A CN106710587A (en) | 2016-12-20 | 2016-12-20 | Speech recognition data pre-processing method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106710587A true CN106710587A (en) | 2017-05-24 |
Family
ID=58939302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611184565.5A Pending CN106710587A (en) | 2016-12-20 | 2016-12-20 | Speech recognition data pre-processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106710587A (en) |
- 2016-12-20: CN201611184565.5A filed; patent CN106710587A (en), status: Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101432801A (en) * | 2006-02-23 | 2009-05-13 | 日本电气株式会社 | Speech recognition dictionary making supporting system, speech recognition dictionary making supporting method, and speech recognition dictionary making supporting program |
US20160133251A1 (en) * | 2013-05-31 | 2016-05-12 | Longsand Limited | Processing of audio data |
CN103985392A (en) * | 2014-04-16 | 2014-08-13 | 柳超 | Phoneme-level low-power consumption spoken language assessment and defect diagnosis method |
WO2016053531A1 (en) * | 2014-09-30 | 2016-04-07 | Apple Inc. | A caching apparatus for serving phonetic pronunciations |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578778A (en) * | 2017-08-16 | 2018-01-12 | 南京高讯信息科技有限公司 | A kind of method of spoken scoring |
CN109246214A (en) * | 2018-09-10 | 2019-01-18 | 北京奇艺世纪科技有限公司 | A kind of prompt tone acquisition methods, device, terminal and server |
CN109246214B (en) * | 2018-09-10 | 2022-03-04 | 北京奇艺世纪科技有限公司 | Prompt tone obtaining method and device, terminal and server |
US20220301561A1 (en) * | 2019-12-10 | 2022-09-22 | Rovi Guides, Inc. | Systems and methods for local automated speech-to-text processing |
CN112837679A (en) * | 2020-12-31 | 2021-05-25 | 北京策腾教育科技集团有限公司 | Language learning method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108364632B (en) | Emotional Chinese text voice synthesis method | |
CN109686361B (en) | Speech synthesis method, device, computing equipment and computer storage medium | |
CN104239459B (en) | voice search method, device and system | |
US11475897B2 (en) | Method and apparatus for response using voice matching user category | |
JP5149737B2 (en) | Automatic conversation system and conversation scenario editing device | |
WO2020253509A1 (en) | Situation- and emotion-oriented chinese speech synthesis method, device, and storage medium | |
CN108428446A (en) | Audio recognition method and device | |
CN106710587A (en) | Speech recognition data pre-processing method | |
JP2018146715A (en) | Voice interactive device, processing method of the same and program | |
CN103632663B (en) | A kind of method of Mongol phonetic synthesis front-end processing based on HMM | |
EP3489951B1 (en) | Voice dialogue apparatus, voice dialogue method, and program | |
US9805740B2 (en) | Language analysis based on word-selection, and language analysis apparatus | |
CN106710591A (en) | Voice customer service system for power terminal | |
JP2015049254A (en) | Voice data recognition system and voice data recognition method | |
CN111508466A (en) | Text processing method, device and equipment and computer readable storage medium | |
CN114120985A (en) | Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium | |
US9218807B2 (en) | Calibration of a speech recognition engine using validated text | |
Tsiakoulis et al. | Dialogue context sensitive HMM-based speech synthesis | |
CN110852075B (en) | Voice transcription method and device capable of automatically adding punctuation marks and readable storage medium | |
CN112185341A (en) | Dubbing method, apparatus, device and storage medium based on speech synthesis | |
CN109859746B (en) | TTS-based voice recognition corpus generation method and system | |
CN112242134A (en) | Speech synthesis method and device | |
CN115019787A (en) | Interactive homophonic and heteronym word disambiguation method, system, electronic equipment and storage medium | |
KR102376552B1 (en) | Voice synthetic apparatus and voice synthetic method | |
CN112329484A (en) | Translation method and device for natural language |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170524 |