CN106875949A - Speech recognition correction method and device - Google Patents

Speech recognition correction method and device

Info

Publication number
CN106875949A
CN106875949A (application CN201710291330.4A, granted as CN106875949B)
Authority
CN
China
Prior art keywords
speech recognition
current application
application scene
result
language material
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710291330.4A
Other languages
Chinese (zh)
Other versions
CN106875949B (en)
Inventor
石日俭
贺磊
刘旭
吕晓霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZBROAD TECHNOLOGY Co Ltd
SSK Corp
Original Assignee
SZBROAD TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZBROAD TECHNOLOGY Co Ltd
Priority to CN201710291330.4A
Publication of CN106875949A
Application granted
Publication of CN106875949B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/04: Training, enrolment or model building
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques characterised by the analysis technique
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a speech recognition correction method and device. The method includes: determining the current application scene of the user according to detection data from a configured detection device; performing speech recognition on sound detected under the current application scene; performing deep learning, based on the deep learning model corresponding to the current application scene, on the corpus material obtained by speech recognition, to obtain a learning result; and correcting the result of speech recognition according to the learning result. The embodiment of the invention can meet the requirements of speech recognition in specific application scenes, performs speech recognition in a targeted way for each application scene, greatly improves the accuracy of speech recognition, thereby promotes human-computer interaction, and has a wide range of applications.

Description

Speech recognition correction method and device
Technical field
The present invention relates to speech processing technology, and in particular to a speech recognition correction method and device.
Background technology
With the development of science and technology, mankind has entered the era of artificial intelligence. Artificial intelligence extends human wisdom and abilities by simulating human thought processes and intelligent behaviour, enabling machines to perform complex work that normally requires human intelligence. Important branches of artificial intelligence include speech recognition, text translation and speech synthesis. Speech recognition technology transforms an input voice signal into the corresponding text through machine recognition and understanding, realizing human-computer communication; text translation technology arranges the words obtained by speech recognition into sentences according to correct syntax; speech synthesis technology (Text to Speech, TTS) converts text information generated by a machine or input from outside into speech resembling human expression and outputs it.
At present, the speech recognition technologies developed by companies such as iFlytek, Microsoft and Google are based on big-data platforms with huge cloud data processing capabilities. Their data are large in volume and broad in coverage, and they can basically realize human-machine language interaction; however, the recognition and translation of specific sentences under a specific application scene are often not accurate enough.
Correction methods in the prior art generally use statistical or machine-learning methods to progressively filter out a correction set. However, this approach lacks targeting: the correction process is substantially identical for every user's input, so the accuracy of the correction is not high. For example, when the voice "lihua" of different users is received and the corresponding text obtained by initial recognition is "Li Hua", it may in every case be corrected to "pear blossom", "physics and chemistry" or "fireworks display" without regard to the different application scenes; that is, the correction result is not obtained in a targeted way.
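The ambiguity described above can be made concrete with a small sketch. This is not part of the patent; the homophone mapping and scene names are invented for illustration only. It shows why the same syllables have no single correct reading until a scene label makes the choice principled.

```python
# Toy illustration of the background problem: without an application scene,
# the homophone "lihua" has no single correct reading. This mapping is
# invented for illustration only.
HOMOPHONES = {
    "lihua": {
        "garden": "pear blossom",            # lihua read as 梨花
        "school": "physics and chemistry",   # lihua read as 理化
        "festival": "fireworks display",     # lihua read as 礼花
    },
}

def disambiguate(pinyin: str, scene: str) -> str:
    """Pick the reading of a homophone that matches the current scene."""
    readings = HOMOPHONES.get(pinyin, {})
    # Fall back to an arbitrary reading (or the raw input) when the scene is
    # unknown -- exactly the untargeted behaviour the prior art exhibits.
    return readings.get(scene, next(iter(readings.values()), pinyin))

print(disambiguate("lihua", "school"))  # physics and chemistry
```

With a scene label the lookup is direct; without one, any fixed fallback is wrong for most users, which is the prior-art weakness the embodiments below address.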
Summary of the invention
The embodiment of the invention provides a speech recognition correction method and device, to solve the prior-art problem that the correction result of speech recognition output is inaccurate.
In a first aspect, an embodiment of the invention provides a speech recognition correction method, including:
determining the current application scene of the user according to detection data from a configured detection device;
performing speech recognition on sound detected under the current application scene;
performing deep learning, based on the deep learning model corresponding to the current application scene, on the corpus material obtained by speech recognition, to obtain a learning result;
correcting the result of speech recognition according to the learning result.
Further, determining the current application scene of the user according to the detection data of the configured detection device includes at least one of the following:
performing speech recognition on the detected sound, and judging the application scene corresponding to the corpus to which the recognized material belongs;
detecting the position of the mobile terminal through a locating module, to obtain the current application scene of the user;
detecting features of the application scene through a Bluetooth digital signal processing device, and determining the current application scene according to the features.
Further, before determining the current application scene of the user according to the detection data of the configured detection device, the method also includes:
grouping the corpus under each application scene using a clustering algorithm, and extracting corpus features according to the result of the grouping;
training on the corpus features to create a deep learning model corresponding to each application scene.
Further, correcting the result of speech recognition according to the learning result includes:
if the learning result is that the result of the speech recognition does not match the current application scene, correcting the speech recognition result to the corresponding result under the current application scene.
Further, the corpus includes: stored material input by users, screened material, and/or material obtained by correcting the results of speech recognition.
In a second aspect, an embodiment of the invention additionally provides a speech recognition correction device, including:
a scene determining module, for determining the current application scene of the user according to detection data from a configured detection device;
a speech recognition module, for performing speech recognition on sound detected under the current application scene;
a deep learning module, for performing deep learning, based on the deep learning model corresponding to the current application scene, on the corpus material obtained by speech recognition, to obtain a learning result;
a correction module, for correcting the result of speech recognition according to the learning result.
Further, the scene determining module includes:
a first determining unit, for performing speech recognition on the detected sound and judging the application scene corresponding to the corpus to which the recognized material belongs;
a second determining unit, for detecting the position of the mobile terminal through a locating module, to obtain the current application scene of the user;
a third determining unit, for detecting features of the application scene through a Bluetooth digital signal processing device and determining the current application scene according to the features.
Further, the device also includes:
a feature extraction unit, for grouping the corpus under each application scene using a clustering algorithm and extracting corpus features according to the result of the grouping;
a model creating unit, for training on the corpus features to create a deep learning model corresponding to each application scene.
Further, the correction module includes:
a correction unit, for correcting the speech recognition result to the corresponding result under the current application scene if the learning result is that the result of the speech recognition does not match the current application scene.
Further, the corpus includes:
stored material input by users, screened material, and/or material obtained by correcting the results of speech recognition.
The embodiment of the invention provides a speech recognition correction method and device: the current application scene is determined from the obtained detection data; the corpus material obtained by speech recognition undergoes deep learning in the deep learning model corresponding to the current application scene; speech recognition results that do not match the current application scene are corrected and replaced with the correct text translation. This meets the requirements of speech recognition in specific application scenes, performs speech recognition in a targeted way for each application scene, greatly improves the accuracy of speech recognition, thereby promotes human-computer interaction so that people and machines can communicate effectively, improves the user experience, and has a wide range of applications.
Brief description of the drawings
Fig. 1 is a flow chart of the speech recognition correction method in embodiment one of the present invention;
Fig. 2 is a flow chart of the speech recognition correction method in embodiment two of the present invention;
Fig. 3a is a flow chart of the speech recognition correction method in embodiment three of the present invention;
Fig. 3b is a schematic diagram of the speech recognition correction method in embodiment three of the present invention;
Fig. 4 is a flow chart of the speech recognition correction method in embodiment four of the present invention;
Fig. 5 is a structural schematic diagram of the speech recognition correction device in embodiment five of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the accompanying drawings illustrate only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flow chart of a speech recognition correction method provided by embodiment one of the present invention. This embodiment is applicable to the situation where the result of speech recognition is corrected according to the current application scene. The method can be performed by a speech recognition correction device, which can be realized in software and/or hardware and is typically integrated in equipment with a speech recognition function.
The method of embodiment one of the present invention specifically includes:
S101: determining the current application scene of the user according to detection data from a configured detection device.
The Chinese language is extensive and profound, and speech recognition of Chinese is rather difficult: even a single difference of tone, or the same tone spoken with a different intonation, can make the intended meaning completely different. It is therefore necessary to detect the current application scene of the user and, according to the different application scenes, to recognize and judge using the corpus material under the specific application scene the user is in, so that the final result of speech recognition is more accurate. A configured detection device can detect the current application environment and thereby determine the current application scene of the user.
S102: performing speech recognition on the sound detected under the current application scene.
Specifically, after the current application scene of the user is determined, speech recognition is performed on the detected sound to obtain the speech recognition result, i.e. the corpus material obtained by speech recognition.
S103: performing deep learning, based on the deep learning model corresponding to the current application scene, on the material obtained by speech recognition, to obtain a learning result.
Specifically, the deep learning model corresponding to each application scene is created first, establishing a neural network that simulates how the human brain analyses and learns. Deep learning and analysis are performed on the material obtained by speech recognition, covering semantics, speech, intonation, context and grammar, to judge whether the preliminary result of speech recognition matches the current application scene, i.e. whether the material obtained by speech recognition is accurate.
S104: correcting the result of speech recognition according to the learning result.
Specifically, after the deep learning, if the material obtained by speech recognition is inaccurate, the result of speech recognition is corrected: it is translated into the correct text, which replaces the previous speech recognition result.
In this embodiment, the current application scene of the user is determined first; combined with the current application scene, deep learning is performed on the material obtained by speech recognition; if that material is inaccurate, the result of speech recognition is corrected according to the result of the deep learning and the current application scene. For example: the material input by the user is "the programmer writes code in front of the computer", but, perhaps because the user's accent is non-standard or the speaking rate is too fast, the recognition result of the big-data speech engine is "the programmer writes aunt in front of the computer". From vocabulary such as "programmer" and "computer", the current application scene can be determined to be a programmer's working scene; deep learning is performed on the recognition result of the big-data speech engine in the deep learning model, "writes aunt" is corrected to "writes code", and the correct speech recognition result is obtained.
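The four steps S101-S104 can be sketched as follows. This is a minimal stand-in, not the patent's implementation: the scene vocabularies, the correction table and the overlap-counting scene detector are all invented for illustration, and a trivial string replacement stands in for the deep learning model.

```python
# Hypothetical scene vocabularies and per-scene correction tables.
SCENE_VOCAB = {
    "programming": {"programmer", "computer", "code"},
    "restaurant": {"menu", "dish", "waiter"},
}
SCENE_CORRECTIONS = {
    "programming": {"writes aunt": "writes code"},
}

def detect_scene(words):
    # S101: choose the scene whose vocabulary overlaps the input the most.
    return max(SCENE_VOCAB, key=lambda s: len(SCENE_VOCAB[s] & set(words)))

def correct(transcript: str) -> str:
    words = transcript.split()             # S102: the raw recognition result
    scene = detect_scene(words)            # S101: current application scene
    # S103/S104: judge and fix fragments that mismatch the scene.
    for wrong, right in SCENE_CORRECTIONS.get(scene, {}).items():
        transcript = transcript.replace(wrong, right)
    return transcript

print(correct("programmer writes aunt before computer"))
# programmer writes code before computer
```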
The speech recognition correction method provided by embodiment one of the present invention can meet the requirements of speech recognition in specific application scenes, performs speech recognition in a targeted way for each application scene, greatly improves the accuracy of speech recognition, thereby promotes human-computer interaction so that people and machines can communicate effectively, improves the user experience, and has a wide range of applications.
Embodiment two
Fig. 2 is a flow chart of a speech recognition correction method provided by embodiment two of the present invention. Embodiment two is optimized on the basis of embodiment one, further refining the operation of determining the current application scene of the user according to the detection data of the configured detection device. As shown in Fig. 2, embodiment two of the present invention specifically includes:
S201: performing speech recognition on the detected sound, and judging the application scene corresponding to the corpus to which the recognized material belongs.
Specifically, corpora having mapping relations with each application scene are collected and stored; a corpus is the set of all the collected material. According to the material input by the user, speech recognition is performed on the detected sound and the result is compared with the contents of the corpora, to search for and judge the current application scene corresponding to the corpus to which the recognized material belongs. The keywords of a specific application scene can be collected and mapping relations established between those keywords and their application scene. For example, material such as all the common expressions and menu names of the restaurant scene is collected, and a mapping relation between this material and the restaurant application scene is established.
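A minimal sketch of such a keyword-to-scene mapping, assuming hand-collected per-scene corpora (the phrases below are invented; the restaurant entries echo the common-expressions-and-menu-names example): an inverted index from words to scenes lets the recognized material vote for its scene.

```python
# Hypothetical per-scene corpora, standing in for collected scene material.
SCENE_CORPORA = {
    "restaurant": ["the menu please", "one kung pao chicken"],
    "hospital": ["refill my prescription", "which ward"],
}

# Build an inverted index: word -> set of scenes whose corpus contains it.
INDEX = {}
for scene, phrases in SCENE_CORPORA.items():
    for phrase in phrases:
        for word in phrase.split():
            INDEX.setdefault(word, set()).add(scene)

def scene_of(utterance: str):
    """Vote for the scene sharing the most words with the utterance."""
    votes = {}
    for w in utterance.split():
        for s in INDEX.get(w, ()):
            votes[s] = votes.get(s, 0) + 1
    return max(votes, key=votes.get) if votes else None

print(scene_of("show me the menu"))  # restaurant
```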
S202: detecting the position of the mobile terminal through a locating module, to obtain the current application scene of the user.
Specifically, the position of the user can be detected by a module with a positioning function in the mobile terminal the user is using, and the current application scene of the user is determined according to the detection result. The module with a positioning function can locate the current application scene through positioning methods such as the Global Positioning System (GPS), Bluetooth positioning technology, or map-software positioning over mobile data or a wireless local area network.
S203: detecting features of the application scene through a Bluetooth digital signal processing device, and determining the current application scene according to the features.
Specifically, signals of the current application scene are collected using the sensors in the Bluetooth digital signal processing device, and the features of the application scene are detected from the collected signals. For example, whether the environment is indoor or outdoor can be judged from the temperature detected by a temperature sensor, and the current application scene of the user is determined on that basis.
In this embodiment, the Global Positioning System can be used to locate the position of the user. For example: if the user is located in a restaurant, the current application scene can be judged to be a restaurant, and the result of speech recognition should then be related to the restaurant scene.
It is worth explaining that the above three methods all serve to determine the current application scene; any one, any two, or all of them can be selected according to the actual situation.
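A sketch of how the three routes might be combined, under the assumption that each detector either names a scene or abstains; a simple majority vote stands in for whatever fusion the actual implementation uses.

```python
def combine_detectors(*detectors):
    """Majority vote over detector outputs, ignoring abstentions (None)."""
    votes = {}
    for detect in detectors:
        scene = detect()
        if scene is not None:
            votes[scene] = votes.get(scene, 0) + 1
    return max(votes, key=votes.get) if votes else None

# Stand-ins for the corpus-match, locating-module and sensor detectors.
scene = combine_detectors(
    lambda: "restaurant",  # corpus keyword match
    lambda: "restaurant",  # GPS / map-software position
    lambda: None,          # Bluetooth sensor detector abstains
)
print(scene)  # restaurant
```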
S204: performing speech recognition on the sound detected under the current application scene.
S205: performing deep learning, based on the deep learning model corresponding to the current application scene, on the material obtained by speech recognition, to obtain a learning result.
S206: correcting the result of speech recognition according to the learning result.
The speech recognition correction method provided by embodiment two of the present invention can accurately obtain the current application scene of the user and performs speech recognition in a targeted way according to that scene, improving the accuracy of speech recognition and the user's actual interactive experience with the product.
Embodiment three
Fig. 3 a are a kind of flow chart of the bearing calibration of speech recognition that the embodiment of the present invention three is provided, the embodiment of the present invention Three are optimized improvement based on the various embodiments described above, residing for determining user according to the detection data of setting testing equipment Current application scene before operation further illustrated, as shown in Figure 3 a, the method for the embodiment of the present invention three is specific Including:
S301, the corpus under each application scenarios is grouped using clustering algorithm, according to the result of the packet Extract language material feature.
Preferably, the corpus includes:The language material of the user input for having stored, the language material by screening and/or correction The language material that the result of speech recognition is obtained.
Specifically, corpus is used as the basic data in deep learning model, can be the user input for having stored Language material, and/or specialty voice technology business according to the language material screened by all kinds of topics, and/or to voice identification result Carry out phonetic synthesis, the language material that analysis and correction phonetic synthesis result are obtained.Use the clustering algorithms pair such as partitioning or stratification Corpus is grouped, and extracts every group of feature of language material.
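A toy stand-in for the grouping step: the text names partitioning and hierarchical clustering, but even a single-pass grouping by word overlap (Jaccard similarity, with an invented threshold) shows the shape of S301, with each group's shared words serving as its extracted feature. All phrases are illustrative.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two phrases."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def group_corpus(phrases, threshold=0.2):
    """Greedy single-pass grouping: join the first group similar enough."""
    groups = []
    for p in phrases:
        for g in groups:
            if jaccard(p, g[0]) >= threshold:
                g.append(p)
                break
        else:
            groups.append([p])
    return groups

def group_features(group):
    """Feature of a group: the words common to every phrase in it."""
    common = set(group[0].split())
    for p in group[1:]:
        common &= set(p.split())
    return common

groups = group_corpus([
    "the menu please", "bring the menu",      # restaurant-flavoured phrases
    "write some code", "review this code",    # programming-flavoured phrases
])
print(len(groups), sorted(group_features(groups[1])))  # 2 ['code']
```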
S302: training on the corpus features to create a deep learning model corresponding to each application scene.
Specifically, the corpus is input into a model and the features of the material are trained through a neural network, simulating the thinking mode of the human brain, to create a deep learning model for each application scene. For each piece of material, the accuracy of its speech recognition result is judged in combination with its application scene.
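The training step can be miniaturized as follows. Counting unigram frequencies per scene is of course nothing like the neural-network training described here, but it is enough to score how well a recognition result fits each scene, which is the judgment S302's model is built for. The corpora are invented.

```python
from collections import Counter

def train(scene_corpora):
    """'Train' one unigram-frequency model per application scene."""
    return {scene: Counter(w for phrase in phrases for w in phrase.split())
            for scene, phrases in scene_corpora.items()}

def best_scene(models, utterance):
    """Scene whose model gives the utterance's words the highest total count."""
    words = utterance.split()
    return max(models, key=lambda s: sum(models[s][w] for w in words))

models = train({
    "restaurant": ["the menu please", "one more dish"],
    "programming": ["write the code", "debug the code"],
})
print(best_scene(models, "show the code"))  # programming
```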
S303: determining the current application scene of the user according to detection data from the configured detection device.
S304: performing speech recognition on the sound detected under the current application scene.
S305: performing deep learning, based on the deep learning model corresponding to the current application scene, on the material obtained by speech recognition, to obtain a learning result.
S306: correcting the result of speech recognition according to the learning result.
In the present embodiment, Fig. 3 b are a kind of schematic diagram of the bearing calibration of speech recognition that the embodiment of the present invention three is provided, With reference to Fig. 3 b, the positioning function of the mobile terminal that can be used by user, bluetooth digital signal processing appts and search defeated The matching application scenarios for entering language material obtain the current geographic position of user jointly, determine the current application scene at user. Classification language material that the user's language material that to store, voice technology business provide and the language material after being corrected to phonetic synthesis result Input to model is trained, and creates the deep learning model of each application scenarios of correspondence.By the voice of big data speech engine The result of identification is input into deep learning model, and according to current application scene, the result to speech recognition carries out error correction, and right Fallibility point is predicted, and the result to the speech recognition of mistake is corrected, and inherited error is replaced with correct translation result Translation result.
The speech recognition correction method provided by embodiment three of the present invention makes the identification of the current application scene more accurate by creating the deep learning model, so that the accuracy of speech recognition results can be judged, inaccurate results are corrected, and the accuracy of speech recognition is improved.
Example IV
Fig. 4 is a flow chart of a speech recognition correction method provided by embodiment four of the present invention. Embodiment four is an optimized improvement on the basis of the above embodiments, further explaining the operation of correcting the result of speech recognition according to the learning result. As shown in Fig. 4, the method of embodiment four of the present invention specifically includes:
S401: determining the current application scene of the user according to detection data from the configured detection device.
S402: performing speech recognition on the sound detected under the current application scene.
S403: performing deep learning, based on the deep learning model corresponding to the current application scene, on the material obtained by speech recognition, to obtain a learning result.
S404: if the learning result is that the result of the speech recognition does not match the current application scene, correcting the speech recognition result to the corresponding result under the current application scene.
Specifically, whether the speech recognition result output by the big-data speech engine matches the current application scene is verified. If they do not match, the result of speech recognition is corrected to the result matching the current application scene and translated into the correct text, replacing the originally erroneous result.
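The mismatch-then-replace logic of S404 can be sketched with the standard library; `difflib` fuzzy matching against an invented per-scene phrase list stands in for the learned model's choice of the "corresponding result under the current application scene".

```python
import difflib

# Hypothetical corpus of phrases valid under the programming scene.
SCENE_PHRASES = {
    "programming": ["write code", "review code", "debug the build"],
}

def correct_for_scene(result: str, scene: str, cutoff: float = 0.6) -> str:
    """Replace a scene-mismatched result with the closest in-scene phrase."""
    candidates = SCENE_PHRASES.get(scene, [])
    match = difflib.get_close_matches(result, candidates, n=1, cutoff=cutoff)
    return match[0] if match else result  # keep the original if nothing is close

print(correct_for_scene("write coke", "programming"))  # write code
```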
The speech recognition correction method provided by embodiment four of the present invention corrects speech recognition results that do not match the application scene, improving the accuracy of speech recognition and translation under specific application scenes and optimizing the system logic.
Embodiment five
Fig. 5 is a structural schematic diagram of a speech recognition correction device in embodiment five of the present invention. The device is applied to correcting speech recognition results that do not match the application scene. As shown in Fig. 5, the device includes: a scene determining module 501, a speech recognition module 502, a deep learning module 503 and a correction module 504.
The scene determining module 501 is used for determining the current application scene of the user according to detection data from a configured detection device;
the speech recognition module 502 is used for performing speech recognition on sound detected under the current application scene;
the deep learning module 503 is used for performing deep learning, based on the deep learning model corresponding to the current application scene, on the material obtained by speech recognition, to obtain a learning result;
the correction module 504 is used for correcting the result of speech recognition according to the learning result.
Embodiment five of the present invention determines the current application scene by obtaining detection data; the material obtained by speech recognition undergoes deep learning in the deep learning model corresponding to the current application scene; speech recognition results that do not match the current application scene are corrected and replaced with the correct text translation. This can meet the requirements of speech recognition in specific application scenes, performs speech recognition in a targeted way for each application scene, greatly improves the accuracy of speech recognition, thereby promotes human-computer interaction so that people and machines can communicate effectively, improves the user experience, and has a wide range of applications.
On the basis of the various embodiments described above, the scene determining module 501 can include:
First determining unit, for carrying out speech recognition to the sound for detecting, judges that speech recognition is obtained belonging to language material The corresponding application scenarios of corpus;
Second determining unit, the position where for detecting mobile terminal by locating module obtains working as residing for user Preceding application scenarios;
3rd determining unit, the feature for detecting application scenarios by bluetooth digital signal processing appts, according to described Feature determines current application scene.
On the basis of the various embodiments described above, described device can also include:
Feature extraction unit, for being grouped to the corpus under each application scenarios using clustering algorithm, according to institute The result for stating packet extracts language material feature;
Model creating unit, for being trained to the language material feature, creates the depth of each application scenarios of correspondence Practise model.
On the basis of the above embodiments, the correction module 504 may include:
a correction unit, configured to, if the learning result indicates that the result of the speech recognition does not match the current application scenario, correct the output result of the speech recognition to the corresponding result under the current application scenario.
On the basis of the above embodiments, the corpus may include:
stored user-input language material, screened language material, and/or language material obtained by correcting speech recognition results.
In this embodiment, the scene determining module determines the current application scenario of the user through the first determining unit, which searches for the application scenario matching the input language material; the second determining unit, which locates the geographical position of the user; and the third determining unit, which detects application scenario features. The speech recognition module recognizes the sound detected in the current application scenario and obtains a recognition result. The stored user-input language material, and/or the language material screened from various topics by professional speech technology providers, and/or the language material obtained by synthesizing, analyzing, and correcting speech recognition results, are input into the model as the basic data of the corpus for training, creating a deep learning model corresponding to each application scenario. In the deep learning module, deep learning is performed on the language material obtained by speech recognition, based on the deep learning model corresponding to the current application scenario. If the learning result indicates that the recognition result does not match the current application scenario, the correction unit of the correction module corrects the recognition result, translates it into the correct text, and replaces the original transcription.
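The overall flow just described (recognize, check the result against the current scenario, replace a mismatched transcription) can be sketched end to end. All tables and function names below are hypothetical: a vocabulary check stands in for the deep learning model, and a per-scenario replacement table stands in for the corrected transcription it would produce.

```python
# End-to-end sketch of the correction flow. The vocabulary check and the
# replacement table are illustrative stand-ins for the scenario-specific
# deep learning model described in the patent.

# Per-scenario replacement table: recognized text -> corrected text.
SCENE_CORRECTIONS = {
    "station": {"by a tick it": "buy a ticket"},
}

SCENE_VOCAB = {
    "station": {"buy", "a", "ticket", "platform"},
}

def matches_scene(text, scene):
    # Stand-in for the deep-learning check: the recognition result matches
    # the scenario if every word belongs to the scenario vocabulary.
    vocab = SCENE_VOCAB.get(scene, set())
    return all(w in vocab for w in text.split())

def correct(text, scene):
    # If the recognition result does not match the current scenario,
    # replace it with the corresponding result under that scenario;
    # otherwise keep the original transcription.
    if matches_scene(text, scene):
        return text
    return SCENE_CORRECTIONS.get(scene, {}).get(text, text)
```

A matching result passes through unchanged, while a mismatched homophone string is replaced by the scenario-appropriate text, mirroring the "replace the original translation result" step above.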
The speech recognition correction device provided in Embodiment 5 of the present invention improves the accuracy of speech recognition and promotes effective human-machine communication; at the same time, by improving the logic of the speech recognition system, it can be applied in a wide range of scenarios.
The speech recognition correction device provided in the embodiments of the present invention can execute the speech recognition correction method provided in any embodiment of the present invention, and has the corresponding functional modules and beneficial effects for executing the method.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A correction method for speech recognition, characterized by comprising:
determining a current application scenario of a user according to detection data of a preset detection device;
performing speech recognition on a sound detected in the current application scenario;
performing deep learning, based on a deep learning model corresponding to the current application scenario, on language material obtained by the speech recognition, to obtain a learning result;
correcting a result of the speech recognition according to the learning result.
2. The method according to claim 1, characterized in that determining the current application scenario of the user according to the detection data of the preset detection device comprises at least one of the following:
performing speech recognition on the detected sound, and determining the application scenario corresponding to the corpus to which the recognized language material belongs;
detecting, through a positioning module, a position of a mobile terminal, to obtain the current application scenario of the user;
detecting features of an application scenario through a Bluetooth digital signal processing device, and determining the current application scenario according to the features.
3. The method according to claim 1, characterized in that before determining the current application scenario of the user according to the detection data of the preset detection device, the method further comprises:
grouping the corpus under each application scenario using a clustering algorithm, and extracting language-material features according to a result of the grouping;
training the language-material features, and creating a deep learning model corresponding to each application scenario.
4. The method according to claim 1, characterized in that correcting the result of the speech recognition according to the learning result comprises:
if the learning result indicates that the result of the speech recognition does not match the current application scenario, correcting an output result of the speech recognition to a corresponding result under the current application scenario.
5. The method according to claim 3, characterized in that the corpus comprises: stored user-input language material, screened language material, and/or language material obtained by correcting speech recognition results.
6. A correction device for speech recognition, characterized by comprising:
a scene determining module, configured to determine a current application scenario of a user according to detection data of a preset detection device;
a speech recognition module, configured to perform speech recognition on a sound detected in the current application scenario;
a deep learning module, configured to perform deep learning, based on a deep learning model corresponding to the current application scenario, on language material obtained by the speech recognition, to obtain a learning result;
a correction module, configured to correct a result of the speech recognition according to the learning result.
7. The device according to claim 6, characterized in that the scene determining module comprises:
a first determining unit, configured to perform speech recognition on the detected sound, and determine the application scenario corresponding to the corpus to which the recognized language material belongs;
a second determining unit, configured to detect a position of a mobile terminal through a positioning module, to obtain the current application scenario of the user;
a third determining unit, configured to detect features of an application scenario through a Bluetooth digital signal processing device, and determine the current application scenario according to the features.
8. The device according to claim 6, characterized in that the device further comprises:
a feature extraction unit, configured to group the corpus under each application scenario using a clustering algorithm, and extract language-material features according to a result of the grouping;
a model creation unit, configured to train the language-material features, and create a deep learning model corresponding to each application scenario.
9. The device according to claim 6, characterized in that the correction module comprises:
a correction unit, configured to, if the learning result indicates that the result of the speech recognition does not match the current application scenario, correct an output result of the speech recognition to a corresponding result under the current application scenario.
10. The device according to claim 8, characterized in that the corpus comprises:
stored user-input language material, screened language material, and/or language material obtained by correcting speech recognition results.
CN201710291330.4A 2017-04-28 2017-04-28 Correction method and device for voice recognition Active CN106875949B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710291330.4A CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710291330.4A CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Publications (2)

Publication Number Publication Date
CN106875949A true CN106875949A (en) 2017-06-20
CN106875949B CN106875949B (en) 2020-09-22

Family

ID=59161656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710291330.4A Active CN106875949B (en) 2017-04-28 2017-04-28 Correction method and device for voice recognition

Country Status (1)

Country Link
CN (1) CN106875949B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293296A (en) * 2017-06-28 2017-10-24 百度在线网络技术(北京)有限公司 Speech recognition result correction method, device, equipment and storage medium
CN107680600A (en) * 2017-09-11 2018-02-09 平安科技(深圳)有限公司 Voiceprint model training method, speech recognition method, device, equipment and medium
CN108831505A (en) * 2018-05-30 2018-11-16 百度在线网络技术(北京)有限公司 Method and apparatus for identifying a usage scenario of an application
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A system for improving the intent detection accuracy and recall rate of outbound-call robots
CN109410913A (en) * 2018-12-13 2019-03-01 百度在线网络技术(北京)有限公司 A speech synthesis method, device, equipment and storage medium
CN110544234A (en) * 2019-07-30 2019-12-06 北京达佳互联信息技术有限公司 Image noise detection method, image noise detection device, electronic equipment and storage medium
CN110556127A (en) * 2019-09-24 2019-12-10 北京声智科技有限公司 method, device, equipment and medium for detecting voice recognition result
CN111104546A (en) * 2019-12-03 2020-05-05 珠海格力电器股份有限公司 Method and device for constructing corpus, computing equipment and storage medium
CN111368145A (en) * 2018-12-26 2020-07-03 沈阳新松机器人自动化股份有限公司 Knowledge graph creating method and system and terminal equipment
CN111951626A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Language learning apparatus, method, medium, and computing device
CN113660501A (en) * 2021-08-11 2021-11-16 云知声(上海)智能科技有限公司 Method and device for matching subtitles
CN114155841A (en) * 2021-11-15 2022-03-08 安徽听见科技有限公司 Voice recognition method, device, equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0661688A2 (en) * 1993-12-30 1995-07-05 International Business Machines Corporation System and method for location specific speech recognition
CN1282072A (en) * 1999-07-27 2001-01-31 国际商业机器公司 Error correction method for speech recognition results and speech recognition system
CN1356628A (en) * 2000-07-05 2002-07-03 国际商业机器公司 Speech recognition correction for devices with limited or no display
CN1555553A (en) * 2001-09-17 2004-12-15 Koninklijke Philips Electronics N.V. Correcting a text recognized by speech recognition through comparison of phonetic sequences in the recognized text with a phonetic transcription of a manually input correction word
CN102324233A (en) * 2011-08-03 2012-01-18 中国科学院计算技术研究所 Method for automatically correcting identification error of repeated words in Chinese pronunciation identification
CN103645876A (en) * 2013-12-06 2014-03-19 百度在线网络技术(北京)有限公司 Voice inputting method and device
CN103903619A (en) * 2012-12-28 2014-07-02 安徽科大讯飞信息科技股份有限公司 Method and system for improving accuracy of speech recognition
CN105448292A (en) * 2014-08-19 2016-03-30 北京羽扇智信息科技有限公司 Scene-based real-time voice recognition system and method
CN105447019A (en) * 2014-08-20 2016-03-30 北京羽扇智信息科技有限公司 User usage scene based input identification result calibration method and system
CN105786880A (en) * 2014-12-24 2016-07-20 中兴通讯股份有限公司 Voice recognition method, client and terminal device


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107293296A (en) * 2017-06-28 2017-10-24 百度在线网络技术(北京)有限公司 Speech recognition result correction method, device, equipment and storage medium
CN107293296B (en) * 2017-06-28 2020-11-20 百度在线网络技术(北京)有限公司 Speech recognition result correction method, device, equipment and storage medium
CN107680600A (en) * 2017-09-11 2018-02-09 平安科技(深圳)有限公司 Voiceprint model training method, speech recognition method, device, equipment and medium
WO2019047343A1 (en) * 2017-09-11 2019-03-14 平安科技(深圳)有限公司 Voiceprint model training method, voice recognition method, device and equipment and medium
CN108831505A (en) * 2018-05-30 2018-11-16 百度在线网络技术(北京)有限公司 Method and apparatus for identifying a usage scenario of an application
CN109104534A (en) * 2018-10-22 2018-12-28 北京智合大方科技有限公司 A system for improving the intent detection accuracy and recall rate of outbound-call robots
US11264006B2 (en) 2018-12-13 2022-03-01 Baidu Online Network Technology (Beijing) Co., Ltd. Voice synthesis method, device and apparatus, as well as non-volatile storage medium
CN109410913A (en) * 2018-12-13 2019-03-01 百度在线网络技术(北京)有限公司 A speech synthesis method, device, equipment and storage medium
CN111368145A (en) * 2018-12-26 2020-07-03 沈阳新松机器人自动化股份有限公司 Knowledge graph creating method and system and terminal equipment
CN111951626A (en) * 2019-05-16 2020-11-17 上海流利说信息技术有限公司 Language learning apparatus, method, medium, and computing device
CN110544234A (en) * 2019-07-30 2019-12-06 北京达佳互联信息技术有限公司 Image noise detection method, image noise detection device, electronic equipment and storage medium
CN110556127A (en) * 2019-09-24 2019-12-10 北京声智科技有限公司 method, device, equipment and medium for detecting voice recognition result
CN111104546A (en) * 2019-12-03 2020-05-05 珠海格力电器股份有限公司 Method and device for constructing corpus, computing equipment and storage medium
CN111104546B (en) * 2019-12-03 2021-08-27 珠海格力电器股份有限公司 Method and device for constructing corpus, computing equipment and storage medium
CN113660501A (en) * 2021-08-11 2021-11-16 云知声(上海)智能科技有限公司 Method and device for matching subtitles
CN114155841A (en) * 2021-11-15 2022-03-08 安徽听见科技有限公司 Voice recognition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN106875949B (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN106875949A (en) A kind of bearing calibration of speech recognition and device
CN107330011B (en) The recognition methods of the name entity of more strategy fusions and device
CN102243871B (en) Methods and system for grammar fitness evaluation as speech recognition error predictor
CN112329467B (en) Address recognition method and device, electronic equipment and storage medium
CN111488468B (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN109920414A (en) Nan-machine interrogation's method, apparatus, equipment and storage medium
US20100299138A1 (en) Apparatus and method for language expression using context and intent awareness
US20090228277A1 (en) Search Aided Voice Recognition
CN106503231B (en) Search method and device based on artificial intelligence
CN104143331B (en) A kind of method and system adding punctuate
CN104360994A (en) Natural language understanding method and natural language understanding system
CN104376065B (en) The determination method and apparatus of term importance
CN106570180A (en) Artificial intelligence based voice searching method and device
JP2005084681A (en) Method and system for semantic language modeling and reliability measurement
CN103956169A (en) Speech input method, device and system
CN103853738A (en) Identification method for webpage information related region
CN106649253B (en) Auxiliary control method and system based on rear verifying
CN109213856A (en) Semantic recognition method and system
CN110674423A (en) Address positioning method and device, readable storage medium and electronic equipment
CN103914455B (en) A kind of interest point search method and device
Lefevre et al. Cross-lingual spoken language understanding from unaligned data using discriminative classification models and machine translation.
CN109710949A (en) A kind of interpretation method and translator
CN110287405A (en) The method, apparatus and storage medium of sentiment analysis
CN106297765A (en) Phoneme synthesizing method and system
CN103246648A (en) Voice input control method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant