CN109686365A - Speech recognition method and speech recognition system - Google Patents

Speech recognition method and speech recognition system

Info

Publication number
CN109686365A
Authority
CN
China
Prior art keywords
information
voice
error correction
recognition result
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811599441.2A
Other languages
Chinese (zh)
Other versions
CN109686365B (en)
Inventor
张云翔
饶竹一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Bureau Co Ltd
Priority to CN201811599441.2A
Publication of CN109686365A
Application granted
Publication of CN109686365B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/221 Announcement of recognition results
    • G10L2015/225 Feedback of the input speech
    • G10L15/26 Speech to text systems
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a speech recognition method and system. The method comprises the following steps: S1, obtaining voice information containing an error that is input by a user, together with voice error-correction information for correcting that voice information, and storing the voice information and the voice error-correction information separately; S2, performing preliminary processing on the voice information and the voice error-correction information, and encoding the processed voice information and voice error-correction information; S3, deriving the corresponding text information from the voice information code and from the voice error-correction code respectively, and comparing the text information derived from the voice information code with the text information derived from the voice error-correction code to obtain a first recognition result; S4, obtaining environmental information about where the user input the voice information, and obtaining a second recognition result according to the environmental information; S5, comparing the second recognition result with dictionary information to obtain a final recognition result. The present invention can recognize voice information quickly and improves working efficiency.

Description

Speech recognition method and speech recognition system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and a speech recognition system.
Background art
A speech recognition system requires that its recognition units be precisely defined, that enough data be available to train them, and that they be general. English systems generally use context-dependent phoneme modeling; coarticulation in Chinese is less severe than in English, so syllable modeling can be used. The amount of training data a system needs depends on model complexity. If a model is designed to be so complex that it exceeds what the available training data can support, performance drops sharply.
In the prior art, voice information is input through a microphone, and if the input is wrong the user can only delete it and input it again. This hinders fast recognition of voice information and reduces working efficiency.
Summary of the invention
The technical problem to be solved by the present invention is to provide a speech recognition method that, when the input voice information contains an error, can recognize the voice information quickly without deleting the voice information already input.
To solve the above technical problem, the present invention provides a speech recognition method comprising the following steps:
S1, obtaining voice information containing an error that is input by a user, and voice error-correction information for correcting the voice information, and storing the voice information and the voice error-correction information separately;
S2, performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction code;
S3, deriving the corresponding text information from the voice information code and from the voice error-correction code respectively, and comparing the text information derived from the voice information code with the text information derived from the voice error-correction code, to obtain a first recognition result;
S4, obtaining environmental information about the environment in which the user input the voice information, and obtaining a second recognition result according to the environmental information;
S5, comparing the second recognition result with dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
The preliminary processing of the voice information and the voice error-correction information in step S2 specifically comprises:
filtering the voice information and the voice error-correction information separately, and sampling the filtered voice information and the filtered voice error-correction information separately;
encoding the sampled voice information and the sampled voice error-correction information separately, to obtain the voice information code and the voice error-correction code.
Step S3 specifically comprises:
comparing the voice information code with an existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice information code, and deriving from the similar codes the first text information corresponding to the voice information code;
comparing the voice error-correction code with the existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice error-correction code, and deriving from the similar codes the second text information corresponding to the voice error-correction code;
comparing the first text information with the second text information to find the first text information and second text information with the highest similarity, and replacing the part of the first text information that is similar to the second text information with the second text information, to form the first recognition result.
The acoustic model is a hidden Markov model.
Step S4 specifically comprises:
capturing an image of the environment in which the user input the voice information, and recognizing the environmental information in the image;
determining the user's likely needs from the environmental information, and screening out the second recognition result according to those likely needs.
Step S5 specifically comprises:
comparing the second recognition result with the dictionary information and rejecting any second recognition result that does not conform to the language format, to obtain a third recognition result;
comparing the third recognition result with the recognition results stored by the user for similarity, arranging the results in descending order of similarity, and presenting them to the user.
The present invention further provides a speech recognition system, the system comprising:
an acquiring unit, configured to obtain voice information containing an error that is input by a user and voice error-correction information for correcting the voice information, and to store the voice information and the voice error-correction information separately;
a processing unit, configured to perform preliminary processing on the voice information and the voice error-correction information, and to extract feature information from the processed voice information and voice error-correction information and encode it, to obtain a voice information code and a voice error-correction code;
a reverse-derivation recognition unit, configured to derive the corresponding text information from the voice information code and from the voice error-correction code respectively, and to compare the text information derived from the voice information code with the text information derived from the voice error-correction code, to obtain a first recognition result;
an environment recognition unit, configured to obtain environmental information about where the user input the voice information, and to obtain a second recognition result according to the environmental information;
a comparison recognition unit, configured to compare the second recognition result with the recognition results stored by the user to obtain a final recognition result, and to present the final recognition result to the user.
The reverse-derivation recognition unit comprises:
a first comparison and derivation unit, configured to compare the voice information code with an existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice information code, and to derive from the similar codes the first text information corresponding to the voice information code;
a second comparison and derivation unit, configured to compare the voice error-correction code with the existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice error-correction code, and to derive from the similar codes the second text information corresponding to the voice error-correction code;
a comparison and replacement unit, configured to compare the first text information with the second text information to find the first text information and second text information with the highest similarity, and to replace the part of the first text information that is similar to the second text information with the second text information, to form the first recognition result.
The beneficial effects of the embodiments of the present invention are as follows. The acquired voice information and voice error-correction information are encoded, and text information is derived from the voice information code and from the voice error-correction code respectively. The two sets of derived text information are compared, and the text information derived from the voice error-correction code with the highest similarity replaces the corresponding text information derived from the voice information code, giving the first recognition result. The environmental information about where the user input the voice information is obtained, and the first recognition result is screened according to that environmental information to give the second recognition result. The second recognition result is compared with dictionary information to give the final recognition result. With the speech recognition method of the embodiments of the present invention, when voice input contains an error the user does not need to delete it and input it again, which helps voice information to be recognized quickly and improves working efficiency.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a speech recognition system according to an embodiment of the present invention.
Detailed description of the embodiments
The following description of the embodiments refers to the accompanying drawings and illustrates specific embodiments in which the present invention can be implemented.
Referring to Fig. 1, Embodiment 1 of the present invention provides a speech recognition method comprising the following steps:
S1, obtaining voice information containing an error that is input by a user, and voice error-correction information for correcting the voice information, and storing the voice information and the voice error-correction information separately.
Specifically, voice input is performed through a voice input option, and the recorded voice information contains an error. If a minor mistake occurs during recording, the user selects a voice error-correction recording option to record a correction. The correction recording only needs to contain the part that was recorded incorrectly; this recording is the voice error-correction information. The voice information and the voice error-correction information are stored separately.
For example, suppose the voice information the user intends to record is "finding nearest gas station", but for some reason the voice information actually recorded is "finding close gas station". The user then selects the voice error-correction recording option and records the voice error-correction information "nearest".
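The separate storage described in step S1 could be sketched as follows. This is only an illustration: the class name `VoiceEntry`, its fields and the sample values are hypothetical and do not come from the patent, which does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceEntry:
    """One utterance plus its spoken corrections, kept in separate buffers (hypothetical layout)."""
    voice_samples: List[float]  # the original recording, which may contain an error
    correction_samples: List[List[float]] = field(default_factory=list)

    def add_correction(self, samples: List[float]) -> None:
        # A correction holds only the re-spoken fragment, not the whole utterance.
        self.correction_samples.append(list(samples))


# Made-up amplitudes standing in for recorded audio.
entry = VoiceEntry(voice_samples=[0.0, 0.1, -0.2, 0.05])
entry.add_correction([0.3, -0.1])
```

Keeping the two recordings apart means the original utterance is never overwritten, which is what lets the later steps compare the two.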
S2, performing preliminary processing on the voice information and the voice error-correction information, extracting feature information from the processed voice information and voice error-correction information and encoding it, to obtain a voice information code and a voice error-correction code.
Specifically, the voice information and the voice error-correction information are filtered to remove noise and echo, which improves their quality. The filtered voice information and voice error-correction information are sampled, and an A/D converter converts the analog signals into digital signals. The digital signal converted from the voice information and the digital signal converted from the voice error-correction information are then each encoded and have feature information extracted, giving the voice information code and the voice error-correction code.
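A minimal software sketch of the filter-then-digitize idea is given below. It makes several assumptions that are not in the patent: the signal is already a list of amplitudes in [-1.0, 1.0], a simple moving average stands in for the unspecified noise filter, and uniform quantization stands in for the A/D converter. The function names are hypothetical.

```python
from typing import List


def moving_average(samples: List[float], width: int = 3) -> List[float]:
    """Smooth the signal with a causal moving average (stand-in for the noise filter)."""
    out = []
    for n in range(len(samples)):
        window = samples[max(0, n - width + 1): n + 1]
        out.append(sum(window) / len(window))
    return out


def quantize(samples: List[float], bits: int = 16) -> List[int]:
    """Map amplitudes in [-1.0, 1.0] to signed integer codes, as an A/D converter would."""
    max_code = 2 ** (bits - 1) - 1
    codes = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))  # clip out-of-range amplitudes
        codes.append(int(round(clipped * max_code)))
    return codes


# The same two calls would be applied to the voice information and to the correction.
digital = quantize(moving_average([0.0, 0.5, 1.0, 0.5]))
```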
The feature information is Mel-frequency cepstral coefficient (MFCC) features. MFCC features are based on a linear transform of the log energy spectrum on the nonlinear Mel scale of sound frequency. The time-domain signal is first converted to the frequency domain with an FFT; its log energy spectrum is then convolved with a bank of triangular filters distributed along the Mel scale; finally, a discrete cosine transform (DCT) is applied to the vector formed by the filter outputs, and the first N coefficients are kept. Perceptual linear prediction (PLP) still uses the Durbin method to compute the LPC parameters, but it likewise computes the autocorrelation parameters by applying a DCT to the log energy spectrum of the auditory excitation. Preliminary processing of the voice information improves the quality of the voice information and of the error-correction voice information, which helps the quality of subsequent recognition.
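The MFCC steps just described (FFT, Mel-spaced triangular filters, logarithm, DCT, keep the first N coefficients) can be sketched in plain Python. This is a simplified illustration, not the patent's implementation: it uses a naive DFT instead of a fast FFT, applies the filters to the power spectrum before taking the logarithm (the common ordering), omits windowing, and the parameter values (8000 Hz, 12 filters, 6 coefficients) are assumptions chosen for the example.

```python
import cmath
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive O(n^2) DFT; returns |X[k]|^2 for k = 0 .. n/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(num_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose edges are evenly spaced on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = [[0.0] * (n_fft // 2 + 1) for _ in range(num_filters)]
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):   # rising edge
            bank[m - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            bank[m - 1][k] = (right - k) / (right - centre)
    return bank


def mfcc(frame: List[float], sample_rate: int = 8000,
         num_filters: int = 12, num_coeffs: int = 6) -> List[float]:
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small constant avoids log(0) when a filter captures no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(f, spec)) + 1e-10) for f in bank]
    # DCT-II of the log filterbank energies; keep the first num_coeffs values.
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
            for j, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]


# A 64-sample frame of a 440 Hz tone sampled at 8000 Hz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(64)]
coeffs = mfcc(frame)
```

A production system would normally use a windowed FFT from a numerical library; the naive DFT here only keeps the sketch dependency-free.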
S3, deriving the corresponding text information from the voice information code and from the voice error-correction code respectively, and comparing the text information derived from the voice information code with the text information derived from the voice error-correction code, to obtain a first recognition result.
Specifically, the voice information code is compared with an existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice information code, and the first text information corresponding to the voice information code is derived from the similar codes. The voice error-correction code is compared with the existing acoustic model and language model to obtain the codes in the acoustic model and language model that are similar to the voice error-correction code, and the second text information corresponding to the voice error-correction code is derived from the similar codes. The first text information is compared with the second text information to find the first text information and second text information with the highest similarity, and the part of the first text information that is similar to the second text information is replaced with the second text information, forming the first recognition result.
The acoustic model is one of the most important parts of a speech recognition system, and most current mainstream systems model it with a hidden Markov model (HMM). A hidden Markov model is a discrete-time finite-state automaton; "hidden" means that the internal states of the Markov model are not visible from outside, and the outside can only see the output value at each moment. A language model is a simple, unified, abstract formal system. Once objective facts about a language are described by a language model, they are well suited to automatic processing by a computer, so language models are of great significance for natural language information processing. Candidate options are assembled by comparison and analysis: each sound has a corresponding code, which is compared with the stored acoustic model and language model, and all similar codes are selected. This completes preliminary recognition and helps improve its efficiency and quality. The output values of the acoustic model are usually the acoustic features computed from each frame, and these features are the acoustic codes. The language model is an abstract mathematical model built from objective facts about the language, and its features are the language codes. These can conveniently be cross-validated against the collected voice codes; the results obtained are compared, the one with the highest similarity is selected, and the text information can then be derived from the code.
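To make the "hidden state, visible output" idea concrete, here is a small Viterbi decoder that recovers the most likely hidden state sequence from a sequence of observed outputs. The patent does not say which decoding algorithm is used, so Viterbi is an assumption, and the two-state model, its probabilities and the observation labels are invented purely for illustration.

```python
from typing import Dict, List, Tuple


def viterbi(observations: List[str],
            states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> Tuple[float, List[str]]:
    """Return (probability, path) of the most likely hidden state sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t, that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs], best[-1][prev][1])
                for prev in states
            )
            layer[s] = (prob, path + [s])
        best.append(layer)
    return max(best[-1].values())


# Toy model: hidden states are "sil" (silence) and "speech";
# the visible outputs are coarse per-frame energy labels.
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.6, "speech": 0.4},
           "speech": {"sil": 0.3, "speech": 0.7}}
emit_p = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}

prob, path = viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p)
# path is ["sil", "speech", "speech"]
```

In a real recognizer the hidden states would be sub-phoneme units and the observations would be feature vectors such as MFCCs, with emission probabilities given by a trained model rather than a lookup table.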
For example, taking "finding nearest gas station" again, the codes with the highest similarity are obtained through the acoustic model and the language model, from which several groups of text information can be derived. The voice error-correction information is then compared with the voice information to select the most similar pair. For instance, "close" in "finding close gas station" and "nearest" in the error-correction voice information are the most similar, so the replacement can be made and the result kept as one alternative. The voice information might of course also be recognized as "instruction, which is covered into moral frame, swims exhibition" and the error-correction voice information as "mouth into moral"; replacing the matching part gives "instruction cover mouth swims exhibition into moral frame" as another alternative. An alternative such as "instruction covers tight-lipped gas station" may also occur. The first recognition result is therefore "finding close gas station", "instruction covers tight-lipped gas station" or "instruction cover mouth swims exhibition into moral frame".
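The replace-the-most-similar-part step can be sketched at the text level as below. The patent does not define the similarity measure, so this sketch assumes character-level similarity from Python's `difflib.SequenceMatcher` over word spans; the function names and the misspelled example sentence are hypothetical and chosen so that the effect is visible in English.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def best_matching_span(words: List[str], correction: str) -> Tuple[int, int, float]:
    """Find the word span [start, end) of the first text most similar to the correction."""
    best = (0, 0, -1.0)
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            candidate = " ".join(words[start:end])
            score = SequenceMatcher(None, candidate, correction).ratio()
            if score > best[2]:
                best = (start, end, score)
    return best


def apply_correction(first_text: str, second_text: str) -> str:
    """Replace the span of first_text that best matches second_text with second_text."""
    words = first_text.split()
    start, end, _ = best_matching_span(words, second_text)
    return " ".join(words[:start] + second_text.split() + words[end:])


corrected = apply_correction("find the nearst gas station", "nearest")
# corrected == "find the nearest gas station"
```

Checking every span is quadratic in the number of words, which is acceptable for short spoken commands; comparing acoustic codes rather than text, as the patent describes, would need a different similarity function but the same search structure.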
S4, obtaining environmental information about the environment the user is in when inputting the voice information, and obtaining a second recognition result according to the environmental information.
Specifically, a camera device photographs the user's surroundings at the time the voice information is input. A high-definition camera is used so that the environment at that moment can be recognized. By identifying the user's location, the user's needs can be roughly judged. For example, the user may be in an urban area, on a highway or in a suburb. Nouns strongly associated with an urban area may be office buildings, residential communities or hotels in that area; nouns strongly associated with a highway may be gas stations, parking lots, repair shops and the like along the highway; nouns strongly associated with a suburb may be the names of suburban villages. Identifying the user's location thus yields the nouns strongly associated with that location, and these nouns can be used to reject recognition results in the first recognition result that clearly do not fit.
For example, again taking "finding nearest gas station", the photo taken when the user input the voice information shows that the user was on a highway at the time. Since the nouns most strongly associated with a highway may be gas stations, parking lots, repair shops and the like along the highway, "instruction cover mouth swims exhibition into moral frame" can be rejected from the first recognition result, giving the second recognition result "finding close gas station" or "instruction covers tight-lipped gas station".
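The environment-based screening in this example can be sketched as a lookup from an environment label to its associated nouns, followed by a filter. The table contents follow the description above, while the label strings, the function name and the substring-matching rule are assumptions; image recognition itself is outside the scope of the sketch and is represented only by the resulting label.

```python
from typing import Dict, List

# Nouns associated with each environment, following the description above.
ASSOCIATED_NOUNS: Dict[str, List[str]] = {
    "urban": ["office building", "residential community", "hotel"],
    "highway": ["gas station", "parking lot", "repair shop"],
    "suburb": ["village"],
}


def filter_by_environment(candidates: List[str], environment: str) -> List[str]:
    """Keep only candidates that mention a noun associated with the environment."""
    nouns = ASSOCIATED_NOUNS.get(environment, [])
    if not nouns:
        return list(candidates)  # unknown environment: reject nothing
    return [c for c in candidates if any(noun in c for noun in nouns)]


first_result = [
    "finding close gas station",
    "instruction covers tight-lipped gas station",
    "instruction cover mouth swims exhibition into moral frame",
]
second_result = filter_by_environment(first_result, "highway")
# second_result keeps the two candidates that mention "gas station"
```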
S5, comparing the second recognition result with stored dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
The second recognition result is compared with the stored dictionary information, and recognition results that clearly do not conform to the rules of the language are deleted to obtain the final recognition results. The final recognition results are then compared with the past recognition information stored by the user to obtain a similarity for each, and are shown to the user in descending order of similarity. This makes it easy for the user to look through the final recognition results and select the one intended, improving the efficiency and quality of recognition.
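Step S5 can be sketched as a dictionary filter followed by a ranking against the user's stored history. The patent does not specify how conformity to language rules or similarity is measured, so this sketch assumes two stand-ins: a candidate passes if every word is in a word list, and similarity is the best `difflib.SequenceMatcher` ratio against any stored result. All names and sample strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Set


def filter_by_dictionary(candidates: List[str], dictionary: Set[str]) -> List[str]:
    """Reject candidates containing any word that is not in the dictionary."""
    return [c for c in candidates if all(w in dictionary for w in c.split())]


def rank_by_history(candidates: List[str], history: List[str]) -> List[str]:
    """Order candidates by their best similarity to any stored past result, highest first."""
    def score(candidate: str) -> float:
        return max((SequenceMatcher(None, candidate, h).ratio() for h in history),
                   default=0.0)
    return sorted(candidates, key=score, reverse=True)


dictionary = {"find", "the", "nearest", "gas", "station", "hotel"}
history = ["find the nearest gas station"]

second_result = ["find the nearest hotel",
                 "find the nearest gas station",
                 "fnid teh gas station"]
third_result = filter_by_dictionary(second_result, dictionary)
ranked = rank_by_history(third_result, history)
# ranked[0] is the candidate identical to the stored history entry
```

Because `sorted` is stable, candidates with equal similarity keep their original relative order.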
After the user has selected a final recognition result, it is played back through a loudspeaker, which helps alert other people so that the recognition result can be confirmed again. The correct recognition result is stored so that the stored results can be expanded and are convenient for the user's next use.
In the speech recognition method of this embodiment of the present invention, the acquired voice information and voice error-correction information are encoded, and text information is derived from the voice information code and from the voice error-correction code respectively. The two sets of derived text information are compared, and the text information derived from the voice error-correction code with the highest similarity replaces the corresponding text information derived from the voice information code, giving the first recognition result. The environmental information about where the user input the voice information is obtained, and the first recognition result is screened according to that environmental information to give the second recognition result. The second recognition result is compared with dictionary information to give the final recognition result. With this method, when voice input contains an error the user does not need to delete it and input it again, which helps voice information to be recognized quickly and improves working efficiency.
Based on Embodiment 1 of the present invention, Embodiment 2 provides a speech recognition system. As shown in Fig. 2, the system 1 comprises:
an acquiring unit 11, configured to obtain voice information input by a user and voice error-correction information for correcting the input voice information, and to store the voice information and the voice error-correction information separately;
a processing unit 12, configured to perform preliminary processing on the voice information and the voice error-correction information, and to extract feature information from the processed voice information and voice error-correction information and encode it, to obtain a voice information code and a voice error-correction code;
a reverse-derivation recognition unit 13, configured to derive the corresponding text information from the voice information code and from the voice error-correction code respectively, and to compare the text information derived from the voice information code with the text information derived from the voice error-correction code, to obtain a first recognition result;
an environment recognition unit 14, configured to obtain environmental information about where the user input the voice information, and to reject from the preliminary recognition result any recognition result unrelated to the environmental information, to obtain a second recognition result;
a comparison recognition unit 15, configured to compare the second recognition result with the recognition results stored by the user to obtain a final recognition result, and to present the final recognition result to the user.
Wherein, the reverse-inference recognition unit 13 includes:
A first comparison and reverse-inference unit, configured to compare the voice information code with an existing acoustic model and speech model, obtain the codes in the acoustic model and speech model that are similar to the voice information code, and derive, by reverse inference from the similar codes, the first text information corresponding to the voice information code;
A second comparison and reverse-inference unit, configured to compare the voice error correction information code with the existing acoustic model and speech model, obtain the codes in the acoustic model and speech model that are similar to the voice error correction information code, and derive, by reverse inference from the similar codes, the second text information corresponding to the voice error correction information code;
A comparison and replacement unit, configured to compare the first text information with the second text information, obtain the first text information and second text information with the highest similarity, and replace the part of the first text information that is similar to the second text information with the second text information, forming the first recognition result.
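The comparison and replacement step can be sketched as follows. The patent does not state how similarity is measured, so this sketch assumes character-level similarity from Python's `difflib.SequenceMatcher` and a sliding window the same length as the correction; the function name and the `min_ratio` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def replace_most_similar_span(first_text, second_text, min_ratio=0.5):
    """Replace the window of first_text most similar to second_text.

    Assumption: similarity is the SequenceMatcher ratio between the
    correction and each same-length window of the recognized text.
    """
    n = len(second_text)
    if n == 0 or n > len(first_text):
        return first_text
    best_ratio, best_start = 0.0, -1
    for start in range(len(first_text) - n + 1):
        window = first_text[start:start + n]
        ratio = SequenceMatcher(None, window, second_text).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, start
    # Leave the text unchanged when nothing is similar enough
    if best_start < 0 or best_ratio < min_ratio:
        return first_text
    return first_text[:best_start] + second_text + first_text[best_start + n:]


# The misrecognized word is replaced by the spoken correction.
print(replace_most_similar_span("turn on the kitchen fight", "light"))
```

A fixed-length window is a simplification; a correction that is longer or shorter than the erroneous part would need an alignment that allows insertions and deletions.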
The above discloses only preferred embodiments of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope of the present invention.

Claims (8)

1. A speech recognition method, characterized by comprising the following steps:
S1: obtaining voice information containing an error, input by a user, and voice error correction information for correcting the voice information, and storing the voice information and the voice error correction information separately;
S2: performing preliminary processing on the voice information and the voice error correction information, extracting feature information from the processed voice information and voice error correction information and encoding it, and obtaining a voice information code and a voice error correction information code;
S3: deriving the corresponding text information by reverse inference from the voice information code and from the voice error correction information code respectively, and comparing the text information derived from the voice information code with the text information derived from the voice error correction information code, obtaining a first recognition result;
S4: acquiring environmental information about the environment the user is in when inputting the voice information, and obtaining a second recognition result according to the environmental information;
S5: comparing the second recognition result with dictionary information to obtain a final recognition result, and presenting the final recognition result to the user.
2. The method according to claim 1, characterized in that the preliminary processing of the voice information and the voice error correction information in step S2 specifically comprises:
filtering the voice information and the voice error correction information respectively, and sampling the filtered voice information and the filtered voice error correction information respectively;
encoding the sampled voice information and the sampled voice error correction information respectively, obtaining the voice information code and the voice error correction information code.
3. The method according to claim 2, characterized in that step S3 specifically comprises:
comparing the voice information code with an existing acoustic model and speech model, obtaining the codes in the acoustic model and speech model that are similar to the voice information code, and deriving, by reverse inference from the similar codes, the first text information corresponding to the voice information code;
comparing the voice error correction information code with the existing acoustic model and speech model, obtaining the codes in the acoustic model and speech model that are similar to the voice error correction information code, and deriving, by reverse inference from the similar codes, the second text information corresponding to the voice error correction information code;
comparing the first text information with the second text information, obtaining the first text information and second text information with the highest similarity, and replacing the part of the first text information that is similar to the second text information with the second text information, forming the first recognition result.
4. The method according to claim 3, characterized in that:
the acoustic model is a hidden Markov model.
5. The method according to claim 4, characterized in that step S4 specifically comprises:
acquiring an image of the environment in which the user inputs the voice information, and recognizing the environmental information in the image;
obtaining the possible needs of the user according to the environmental information, and selecting the second recognition result according to the possible needs.
6. The method according to claim 5, characterized in that step S5 specifically comprises:
comparing the second recognition result with dictionary information, and rejecting any second recognition result that does not conform to the language format, obtaining a third recognition result;
comparing the third recognition result with the recognition results stored by the user for similarity, arranging the results in descending order of similarity, and presenting them to the user.
7. A speech recognition system, characterized in that the system comprises:
an acquiring unit, configured to obtain voice information containing an error, input by a user, and voice error correction information for correcting the voice information, and to store the voice information and the voice error correction information separately;
a processing unit, configured to perform preliminary processing on the voice information and the voice error correction information, extract feature information from the processed voice information and voice error correction information and encode it, and obtain a voice information code and a voice error correction information code;
a reverse-inference recognition unit, configured to derive the corresponding text information by reverse inference from the voice information code and from the voice error correction information code respectively, and to compare the text information derived from the voice information code with the text information derived from the voice error correction information code, obtaining a first recognition result;
an environment recognition unit, configured to obtain environmental information about the environment in which the user inputs the voice information, and to obtain a second recognition result according to the environmental information;
a comparison recognition unit, configured to compare the second recognition result with the recognition results stored by the user, obtain a final recognition result, and present the final recognition result to the user.
8. The system according to claim 7, characterized in that the reverse-inference recognition unit comprises:
a first comparison and reverse-inference unit, configured to compare the voice information code with an existing acoustic model and speech model, obtain the codes in the acoustic model and speech model that are similar to the voice information code, and derive, by reverse inference from the similar codes, the first text information corresponding to the voice information code;
a second comparison and reverse-inference unit, configured to compare the voice error correction information code with the existing acoustic model and speech model, obtain the codes in the acoustic model and speech model that are similar to the voice error correction information code, and derive, by reverse inference from the similar codes, the second text information corresponding to the voice error correction information code;
a comparison and replacement unit, configured to compare the first text information with the second text information, obtain the first text information and second text information with the highest similarity, and replace the part of the first text information that is similar to the second text information with the second text information, forming the first recognition result.
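The dictionary check and similarity ranking recited in claim 6 can be sketched as follows. The claims do not define "conforming to the language format" or the similarity measure, so this sketch assumes that a candidate conforms when every whitespace-separated word appears in the dictionary, and that similarity is the best `difflib.SequenceMatcher` ratio against any stored result. The function name and the sample data are hypothetical.

```python
from difflib import SequenceMatcher


def rank_candidates(candidates, dictionary, stored_results):
    """Filter candidates by dictionary, then rank by similarity.

    Assumptions: a candidate is valid only if all of its words are in
    `dictionary`; similarity is the highest SequenceMatcher ratio
    against the user's stored recognition results.
    """
    # Third recognition result: candidates that pass the dictionary check
    valid = [
        c for c in candidates
        if all(word in dictionary for word in c.split())
    ]

    def best_similarity(candidate):
        return max(
            (SequenceMatcher(None, candidate, s).ratio() for s in stored_results),
            default=0.0,
        )

    # Descending order of similarity, as in claim 6
    return sorted(valid, key=best_similarity, reverse=True)


dictionary = {"open", "close", "the", "door"}
stored = ["open the door please"]
second_results = ["close the door", "open the dor", "open the door"]
print(rank_candidates(second_results, dictionary, stored))
```

In this example the misspelled candidate is rejected by the dictionary check, and the remaining candidates are ordered so that the one closest to the user's stored result appears first.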
CN201811599441.2A 2018-12-26 2018-12-26 Voice recognition method and voice recognition system Active CN109686365B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811599441.2A CN109686365B (en) 2018-12-26 2018-12-26 Voice recognition method and voice recognition system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811599441.2A CN109686365B (en) 2018-12-26 2018-12-26 Voice recognition method and voice recognition system

Publications (2)

Publication Number Publication Date
CN109686365A true CN109686365A (en) 2019-04-26
CN109686365B CN109686365B (en) 2021-07-13

Family

ID=66188586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811599441.2A Active CN109686365B (en) 2018-12-26 2018-12-26 Voice recognition method and voice recognition system

Country Status (1)

Country Link
CN (1) CN109686365B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334271A (en) * 2019-05-21 2019-10-15 北京奇艺世纪科技有限公司 A kind of search result optimization method, system, electronic equipment and storage medium
CN111356022A (en) * 2020-04-18 2020-06-30 徐琼琼 Video file processing method based on voice recognition
CN111524511A (en) * 2020-04-01 2020-08-11 黑龙江省农业科学院农业遥感与信息研究所 Agricultural technology consultation man-machine conversation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951077A (en) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 Man-machine interaction method and device based on artificial intelligence and terminal equipment
CN105206260A (en) * 2015-08-31 2015-12-30 努比亚技术有限公司 Terminal voice broadcasting method, device and terminal voice operation method
CN105374356A (en) * 2014-08-29 2016-03-02 株式会社理光 Speech recognition method, speech assessment method, speech recognition system, and speech assessment system
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN107993653A (en) * 2017-11-30 2018-05-04 南京云游智能科技有限公司 The incorrect pronunciations of speech recognition apparatus correct update method and more new system automatically
CN108595412A (en) * 2018-03-19 2018-09-28 百度在线网络技术(北京)有限公司 Correction processing method and device, computer equipment and readable medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105374356A (en) * 2014-08-29 2016-03-02 株式会社理光 Speech recognition method, speech assessment method, speech recognition system, and speech assessment system
CN104951077A (en) * 2015-06-24 2015-09-30 百度在线网络技术(北京)有限公司 Man-machine interaction method and device based on artificial intelligence and terminal equipment
CN105206260A (en) * 2015-08-31 2015-12-30 努比亚技术有限公司 Terminal voice broadcasting method, device and terminal voice operation method
CN107818781A (en) * 2017-09-11 2018-03-20 远光软件股份有限公司 Intelligent interactive method, equipment and storage medium
CN107993653A (en) * 2017-11-30 2018-05-04 南京云游智能科技有限公司 The incorrect pronunciations of speech recognition apparatus correct update method and more new system automatically
CN108595412A (en) * 2018-03-19 2018-09-28 百度在线网络技术(北京)有限公司 Correction processing method and device, computer equipment and readable medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334271A (en) * 2019-05-21 2019-10-15 北京奇艺世纪科技有限公司 A kind of search result optimization method, system, electronic equipment and storage medium
CN110334271B (en) * 2019-05-21 2022-01-11 北京奇艺世纪科技有限公司 Search result optimization method and system, electronic device and storage medium
CN111524511A (en) * 2020-04-01 2020-08-11 黑龙江省农业科学院农业遥感与信息研究所 Agricultural technology consultation man-machine conversation method and system
CN111356022A (en) * 2020-04-18 2020-06-30 徐琼琼 Video file processing method based on voice recognition

Also Published As

Publication number Publication date
CN109686365B (en) 2021-07-13

Similar Documents

Publication Publication Date Title
CN110097894B (en) End-to-end speech emotion recognition method and system
WO2021208287A1 (en) Voice activity detection method and apparatus for emotion recognition, electronic device, and storage medium
US11514891B2 (en) Named entity recognition method, named entity recognition equipment and medium
CN107945805B (en) A kind of across language voice identification method for transformation of intelligence
CN106057206B (en) Sound-groove model training method, method for recognizing sound-groove and device
CN107369439B (en) Voice awakening method and device
US8949125B1 (en) Annotating maps with user-contributed pronunciations
CN110827801A (en) Automatic voice recognition method and system based on artificial intelligence
US20140365221A1 (en) Method and apparatus for speech recognition
CN113488058B (en) Voiceprint recognition method based on short voice
WO2009075990A1 (en) Grapheme-to-phoneme conversion using acoustic data
CN109686365A (en) A kind of audio recognition method and speech recognition system
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
CN111862952B (en) Dereverberation model training method and device
WO2023030235A1 (en) Target audio output method and system, readable storage medium, and electronic apparatus
CN111667834A (en) Hearing-aid device and hearing-aid method
CN108364655A (en) Method of speech processing, medium, device and computing device
CN113782026A (en) Information processing method, device, medium and equipment
KR20170086233A (en) Method for incremental training of acoustic and language model using life speech and image logs
CN113516987B (en) Speaker recognition method, speaker recognition device, storage medium and equipment
CN109346104A (en) A kind of audio frequency characteristics dimension reduction method based on spectral clustering
CN114724589A (en) Voice quality inspection method and device, electronic equipment and storage medium
CN113409774A (en) Voice recognition method and device and electronic equipment
CN115700880A (en) Behavior monitoring method and device, electronic equipment and storage medium
CN112951237A (en) Automatic voice recognition method and system based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant