CN109389970A - A speech analysis and recognition method - Google Patents
A speech analysis and recognition method
- Publication number
- CN109389970A (application number CN201811616352.4A)
- Authority
- CN
- China
- Prior art keywords
- signal
- voice
- speech
- sound
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a speech analysis and recognition method comprising the following steps. S1: voice training: on first use, a speech signal is input and recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters. S2: the user's voice information is acquired and recorded as the original speech signal. S3: the original speech signal is processed for extraction, its sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted. S4: the extracted sound information is recognized. The invention captures the ambient sound signal in the period closest to the voice input and uses it as the ambient noise for that input, then removes this ambient noise signal during subsequent speech analysis, which improves the accuracy of the target signal. Storing the target speech signal in segments facilitates deferred processing of large amounts of target speech and avoids the problem that an overly long single input cannot be correctly analyzed and recognized.
Description
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech analysis and recognition method.
Background art
Speech recognition is an interdisciplinary field. Over the past 20 years, speech recognition technology has made marked progress and has begun to move from the laboratory to the market. It is expected that within the next 10 years speech recognition technology will enter fields such as industry, household appliances, communications, automotive electronics, healthcare, home services, and consumer electronics. When processing speech signals in complex environments, existing speech recognition methods acquire the target speech signal inaccurately. In particular, when a third party speaks at the same time, they cannot distinguish the target speech signal and extract it with poor accuracy. In addition, the ambient noise varies between time periods from one voice input to the next, which also interferes with acquisition of the target speech signal.
Summary of the invention
The purpose of the present invention is to provide a speech analysis and recognition method that solves the problems raised in the background art above.
To achieve the above purpose, the invention provides the following technical solution: a speech analysis and recognition method comprising the following steps:
S1: voice training: on first use, a speech signal is input and recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
Preferably, the original speech signal in S2 comprises a target speech signal, an ambient noise signal, and a third-party speech signal, wherein the feature parameters of the ambient noise signal are extracted from the ambient noise in the 0.3-0.5 seconds before the target speech signal.
Preferably, processing the original speech signal in S3 comprises analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including the ambient noise signal features, the computed original speech signal features, the identified target speech signal features, and the identified third-party speech features. The influence of background noise is filtered out and the target speech signal is enhanced, which improves the recognizability and accuracy of the target speech signal.
Preferably, when the target speech signal is recognized, the speech signal is divided into segments and the resulting speech segments are saved in order; the segments are then recognized in the saved order and the recognition results are output in order.
Technical effects and advantages of the invention:
Before each voice input by the user, the invention captures the ambient sound signal in the period closest to the voice input time and uses it as the ambient noise for that input, then removes this ambient noise signal during subsequent speech analysis. This effectively improves the accuracy of acquiring the target signal and reduces interference from environmental noise. The target speech signal is obtained by analyzing the sound-parameter features of the sound signals in the original signal. Storing the target speech signal in segments facilitates deferred processing of large amounts of target speech, increases speech recognition capacity, and avoids the problem that an overly long single input cannot be correctly analyzed and recognized.
Specific embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the invention without creative effort fall within the scope of protection of the invention.
A speech analysis and recognition method comprises the following steps:
S1: voice training: on first use, a speech signal is input and recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
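As an illustration of the WAV storage mentioned in step S2, the following is a minimal sketch using Python's standard `wave` module. The patent does not specify a sample rate, bit depth, or channel count; 16-bit mono PCM and the function names `save_wav` and `load_wav` are assumptions made only for this example.

```python
import io
import sys
import wave
from array import array


def save_wav(samples, sample_rate, target):
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) as WAV.

    `target` may be a file path or a writable, seekable binary file object.
    """
    pcm = array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(target, "wb") as wf:
        wf.setnchannels(1)        # assumption: mono
        wf.setsampwidth(2)        # assumption: 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())


def load_wav(source):
    """Read a 16-bit mono WAV and return (samples, sample_rate)."""
    with wave.open(source, "rb") as wf:
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()
    return list(pcm), rate
```

Writing to an in-memory `io.BytesIO` buffer and reading it back returns the same samples and sample rate, which is a quick way to check the round trip before writing to disk.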
In S2, the original speech signal comprises a target speech signal, an ambient noise signal, and a third-party speech signal, wherein the feature parameters of the ambient noise signal are extracted from the ambient noise in the 0.3-0.5 seconds before the target speech signal. When the user uses the speech recognition system outdoors, environmental noise present in the surroundings is input together with the user's voice information. Therefore, before each voice input by the user, the ambient sound signal in the 0.3-0.5 seconds closest to the voice input time is captured and used as the ambient noise for that input, and this ambient noise signal is removed during subsequent speech analysis. This effectively improves the accuracy of acquiring the target signal and reduces interference from environmental noise.
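The noise handling described above can be sketched as follows. The patent states only that the ambient noise from the 0.3-0.5 seconds before the speech is captured and later removed; it does not name a removal algorithm. The sketch below assumes magnitude spectral subtraction as one plausible technique, assumes the speech onset index is already known, and uses a naive DFT so that it needs only the standard library. All function names are hypothetical.

```python
import cmath
import math


def noise_window(signal, sample_rate, onset_index, seconds=0.4):
    """Return the samples in the `seconds` immediately before speech onset."""
    n = int(round(seconds * sample_rate))
    start = max(0, onset_index - n)
    return signal[start:onset_index]


def dft(frame):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def noise_magnitude(noise, frame_len):
    """Average magnitude spectrum over non-overlapping frames of the noise."""
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("noise window is shorter than one frame")
    mags = [[abs(c) for c in dft(f)] for f in frames]
    return [sum(col) / len(mags) for col in zip(*mags)]


def spectral_subtract(frame, noise_mag):
    """Subtract the noise magnitude per bin, keeping the original phase."""
    cleaned = []
    for c, nm in zip(dft(frame), noise_mag):
        mag = max(abs(c) - nm, 0.0)  # floor at zero to avoid negative magnitude
        cleaned.append(cmath.rect(mag, cmath.phase(c)))
    return idft(cleaned)
```

A production system would typically use an FFT with overlapping windowed frames; the naive transform here is chosen only to keep the sketch self-contained.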
In S3, processing the original speech signal comprises analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including the ambient noise signal features, the computed original speech signal features, the identified target speech signal features, and the identified third-party speech features. The influence of background noise is filtered out and the target speech signal is enhanced, which improves the recognizability and accuracy of the target speech signal. Each original speech signal is analyzed and its components are extracted; the speech signal whose sound-parameter features are close to those of the pre-stored target speech signal is taken as the target speech signal of the current voice input and is then analyzed and processed. Because a third party's speech signal differs to some degree from the pre-stored target speech signal, the third-party speech signal is ignored automatically, which improves the precision of the target speech signal.
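The matching step above, keeping the speech whose features are close to the pre-stored target and ignoring third-party speech, can be sketched as a similarity test. The patent does not define the feature representation or the closeness measure; the fixed-length feature vectors, cosine similarity, the `0.9` threshold, and the function names below are all assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_target(candidates, enrolled, threshold=0.9):
    """Return the labels of candidates close to the enrolled feature vector.

    `candidates` is a list of (label, feature_vector) pairs; anything below
    the threshold is treated as third-party speech and dropped.
    """
    return [label for label, feats in candidates
            if cosine_similarity(feats, enrolled) >= threshold]
```

For example, with an enrolled vector `[1.0, 0.0, 0.5]`, a candidate `[0.9, 0.1, 0.5]` passes the threshold while `[0.0, 1.0, 0.1]` does not. The threshold would need tuning on real data.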
Preferably, when the target speech signal is recognized, the speech signal is divided into segments and the resulting speech segments are saved in order; the segments are then recognized in the saved order and the recognition results are output in order. Storing the target speech signal in segments facilitates deferred processing of large amounts of target speech, increases speech recognition capacity, and avoids the problem that an overly long single input cannot be correctly analyzed and recognized.
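The segment-then-recognize flow described above can be sketched as follows. The patent does not say how segment boundaries are chosen; fixed-length segments, the `segment_seconds` parameter, and the `recognize` callable (a stand-in for whatever recognizer is actually used) are assumptions for this example.

```python
def split_segments(samples, sample_rate, segment_seconds=5.0):
    """Split samples into consecutive fixed-length segments, in order."""
    size = int(segment_seconds * sample_rate)
    if size <= 0:
        raise ValueError("segment length must be at least one sample")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def recognize_in_order(segments, recognize):
    """Apply `recognize` to each saved segment and keep the results in order."""
    return [recognize(segment) for segment in segments]
```

Because the segments are stored as an ordered list, recognition can be deferred and run later without losing the original sequence of the utterance.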
Before each voice input by the user, the invention captures the ambient sound signal in the period closest to the voice input time and uses it as the ambient noise for that input, then removes this ambient noise signal during subsequent speech analysis. This effectively improves the accuracy of acquiring the target signal and reduces interference from environmental noise. The target speech signal is obtained by analyzing the sound-parameter features of the sound signals in the original signal. Storing the target speech signal in segments facilitates deferred processing of large amounts of target speech, increases speech recognition capacity, and avoids the problem that an overly long single input cannot be correctly analyzed and recognized.
Finally, it should be noted that the above is only a preferred embodiment of the invention and is not intended to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in those embodiments or replace some of their technical features with equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (4)
1. A speech analysis and recognition method, characterized in that it comprises the following steps:
S1: voice training: on first use, a speech signal is input and recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
2. The speech analysis and recognition method according to claim 1, characterized in that the original speech signal in S2 comprises a target speech signal, an ambient noise signal, and a third-party speech signal, wherein the feature parameters of the ambient noise signal are extracted from the ambient noise in the 0.3-0.5 seconds before the target speech signal.
3. The speech analysis and recognition method according to claim 1, characterized in that processing the original speech signal in S3 comprises analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including the ambient noise signal features, the computed original speech signal features, the identified target speech signal features, and the identified third-party speech features, filtering out the influence of background noise, and enhancing the target speech signal.
4. The speech analysis and recognition method according to claim 3, characterized in that, when the target speech signal is recognized, the speech signal is preferably divided into segments and the resulting speech segments are saved in order, the segments are recognized in the saved order, and the recognition results are output in order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811616352.4A CN109389970A (en) | 2018-12-28 | 2018-12-28 | A kind of speech analysis recognition methods |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811616352.4A CN109389970A (en) | 2018-12-28 | 2018-12-28 | A kind of speech analysis recognition methods |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109389970A (en) | 2019-02-26 |
Family
ID=65430770
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811616352.4A Pending CN109389970A (en) | 2018-12-28 | 2018-12-28 | A kind of speech analysis recognition methods |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109389970A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110021292A (en) * | 2019-04-23 | 2019-07-16 | 四川长虹空调有限公司 | Method of speech processing, device and smart home device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020023020A1 (en) * | 1999-09-21 | 2002-02-21 | Kenyon Stephen C. | Audio identification system and method |
CN102592595A (en) * | 2012-03-19 | 2012-07-18 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition method and system |
CN103198829A (en) * | 2013-02-25 | 2013-07-10 | 惠州市车仆电子科技有限公司 | Method, device and equipment of reducing interior noise and improving voice recognition rate |
CN103594092A (en) * | 2013-11-25 | 2014-02-19 | 广东欧珀移动通信有限公司 | Single microphone voice noise reduction method and device |
CN105225665A (en) * | 2015-10-15 | 2016-01-06 | 桂林电子科技大学 | A kind of audio recognition method and speech recognition equipment |
CN106548781A (en) * | 2015-09-21 | 2017-03-29 | 上海日趋信息技术有限公司 | A kind of method for eliminating background noise for speech recognition system |
CN107799115A (en) * | 2016-08-29 | 2018-03-13 | 法乐第(北京)网络科技有限公司 | A kind of audio recognition method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108630193B (en) | Voice recognition method and device | |
US10878824B2 (en) | Speech-to-text generation using video-speech matching from a primary speaker | |
US9412371B2 (en) | Visualization interface of continuous waveform multi-speaker identification | |
CN104538043A (en) | Real-time emotion reminder for call | |
CN108877823B (en) | Speech enhancement method and device | |
CN106847305B (en) | Method and device for processing recording data of customer service telephone | |
CN106887231A (en) | A kind of identification model update method and system and intelligent terminal | |
CN111797632A (en) | Information processing method and device and electronic equipment | |
CN108735200A (en) | A kind of speaker's automatic marking method | |
US11462219B2 (en) | Voice filtering other speakers from calls and audio messages | |
CN106531195B (en) | A kind of dialogue collision detection method and device | |
CN107705791A (en) | Caller identity confirmation method, device and Voiceprint Recognition System based on Application on Voiceprint Recognition | |
CN106372653A (en) | Stack type automatic coder-based advertisement identification method | |
Sapra et al. | Emotion recognition from speech | |
CN110428835A (en) | A kind of adjusting method of speech ciphering equipment, device, storage medium and speech ciphering equipment | |
CN109389970A (en) | A kind of speech analysis recognition methods | |
CN101460994A (en) | Speech differentiation | |
CN112802498B (en) | Voice detection method, device, computer equipment and storage medium | |
CN111933120A (en) | Voice data automatic labeling method and system for voice recognition | |
CN110459206A (en) | A kind of speech recognition system and method based on track planning of dual robots identification | |
CN114461842A (en) | Method, device, equipment and storage medium for generating discouraging call | |
CN114220430A (en) | Multi-sound-zone voice interaction method, device, equipment and storage medium | |
CN106971734A (en) | It is a kind of that the method and system of identification model can be trained according to the extraction frequency of model | |
CN113077784A (en) | Intelligent voice equipment for role recognition | |
CN111009258A (en) | Single sound channel speaker separation model, training method and separation method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190226 |