CN109389970A - A speech analysis and recognition method - Google Patents

A speech analysis and recognition method

Info

Publication number
CN109389970A
Authority
CN
China
Prior art keywords
signal
voice
speech
sound
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811616352.4A
Other languages
Chinese (zh)
Inventor
孙昊
程庚
陈雅芹
黄健
殷国龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Kaijie Technology Co Ltd
Original Assignee
Hefei Kaijie Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Kaijie Technology Co Ltd
Priority to CN201811616352.4A
Publication of CN109389970A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a speech analysis and recognition method comprising the following steps. S1: voice training, in which, on first use, an input speech signal is recognized in a quiet environment and its sound feature parameters are analyzed and recorded as the target sound parameters. S2: the user's voice information is acquired and recorded as the original speech signal. S3: the original speech signal is processed for extraction, its sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted. S4: the extracted sound information is recognized. The invention captures the ambient sound signal in the period immediately preceding each voice input and treats it as the background noise for that input, then removes this background noise signal in the subsequent speech analysis, which improves the accuracy of the target signal. Storing the target speech signal in segments allows a large amount of target speech to be processed with a delay and avoids the failure to analyze and recognize correctly that occurs when an overly long utterance is input at once.

Description

A speech analysis and recognition method
Technical field
The invention belongs to the technical field of speech recognition, and specifically relates to a speech analysis and recognition method.
Background art
Speech recognition is an interdisciplinary field. Over the past 20 years, speech recognition technology has made marked progress and has begun to move from the laboratory to the market. Within the next 10 years, it is expected to enter fields such as industry, household appliances, communications, automotive electronics, medical care, home services and consumer electronics. When existing speech recognition methods process speech signals in complex environments, the target speech signal is acquired inaccurately. In particular, when a third party speaks at the same time, the target speech signal cannot be distinguished and is extracted with poor accuracy. In addition, the ambient noise differs from one period to another, so at each voice input it also interferes with acquisition of the target speech signal.
Summary of the invention
The purpose of the invention is to provide a speech analysis and recognition method that solves the problems raised in the background art above.
To achieve the above purpose, the invention provides the following technical solution: a speech analysis and recognition method comprising the following steps:
S1: voice training: on first use, an input speech signal is recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
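As an illustration of step S1 only, the following Python sketch derives a small set of "target sound parameters" from a recording made in a quiet environment. The patent does not say which feature parameters are recorded, so the two features used here (mean short-time energy and mean zero-crossing rate), the function names, and the frame sizes are all assumptions.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of equal length."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def enroll(samples, frame_len=400, hop=160):
    """Return a hypothetical target-parameter record for step S1."""
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("recording is shorter than one frame")
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    zcrs = [zero_crossing_rate(f) for f in frames]
    return {
        "mean_energy": sum(energies) / len(energies),
        "mean_zcr": sum(zcrs) / len(zcrs),
    }

# Synthetic stand-in for a quiet-environment recording: a 200 Hz tone at 16 kHz.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate)]
profile = enroll(tone)
```

A practical system would more likely record spectral features such as cepstral coefficients; the simple features above only show where the enrolled parameters come from and how they could be stored for later comparison in S3.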
Preferably, the original speech signal in S2 comprises a target speech signal, a background noise signal and a third-party speech signal, wherein the feature parameters of the background noise signal are extracted from the background noise in the 0.3-0.5 seconds preceding the target speech signal.
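The 0.3-0.5 second noise window can be sketched as simple index arithmetic on the sampled signal. The patent does not describe how the start of the target speech is located, so the energy-threshold detector `find_onset` below, along with every name and threshold value, is a hypothetical stand-in.

```python
import math

def find_onset(samples, frame_len, threshold):
    """Start index of the first frame whose mean energy exceeds `threshold`.

    Hypothetical detector: the patent does not specify how onset is found.
    """
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if sum(x * x for x in frame) / frame_len > threshold:
            return start
    return None

def noise_window(samples, onset_index, sample_rate, window_s=0.4):
    """Return the samples in the `window_s` seconds just before speech onset."""
    if not 0.3 <= window_s <= 0.5:
        raise ValueError("the method specifies a 0.3-0.5 s window")
    length = int(round(window_s * sample_rate))
    start = max(0, onset_index - length)
    return samples[start:onset_index]

# Synthetic input at 8 kHz: 1 s of low-level noise, then 0.5 s of a louder tone.
rate = 8000
quiet = [0.01 * (-1) ** n for n in range(rate)]
speech = [0.5 * math.sin(2 * math.pi * 250 * n / rate) for n in range(rate // 2)]
samples = quiet + speech

onset = find_onset(samples, frame_len=80, threshold=0.01)
noise = noise_window(samples, onset, rate, window_s=0.4)
```

Taking the window immediately before onset means the noise estimate reflects the environment of this particular input, which is the point of the preferred feature.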
Preferably, the processing of the original speech signal in S3 includes analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including extracting the background noise signal features, computing the original speech signal features, identifying the target speech signal features and identifying the third-party speech features; the influence of background noise is filtered out, and the target speech signal is enhanced to improve its recognizability and accuracy.
Preferably, when the target speech signal is recognized, the speech signal is first divided into segments, the resulting speech segments are saved in order, the segments are recognized in the order in which they were saved, and the recognition results are output in that order.
Technical effects and advantages of the invention:
Before each voice input by the user, the invention captures the ambient sound signal in the period closest to the voice input time and treats it as the background noise for that input, then removes this background noise signal in the subsequent speech analysis. This effectively improves the accuracy with which the target signal is acquired and reduces interference from ambient noise. The target speech signal is obtained by analyzing the sound parameter features of the sound signals in the original signal. Storing the target speech signal in segments allows a large amount of target speech to be processed with a delay, increases the capacity of speech recognition, and avoids the failure to analyze and recognize correctly that occurs when an overly long utterance is input at once.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.
A speech analysis and recognition method comprises the following steps:
S1: voice training: on first use, an input speech signal is recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
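The WAV storage in step S2 might look like the following sketch, which uses Python's standard `wave` module. The sample rate, the 16-bit mono layout, and the helper names `write_wav` and `read_wav` are assumptions; the method only states that the original speech signal is stored as a WAV file.

```python
import io
import math
import struct
import wave

def write_wav(target, samples, sample_rate=16000):
    """Store float samples in [-1, 1] as 16-bit mono PCM in a WAV container."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)            # mono (assumed)
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))

def read_wav(source):
    """Read a 16-bit mono WAV back into float samples and its sample rate."""
    with wave.open(source, "rb") as wav:
        count = wav.getnframes()
        raw = wav.readframes(count)
        sample_rate = wav.getframerate()
    pcm = struct.unpack("<%dh" % count, raw)
    return [v / 32767 for v in pcm], sample_rate

# Round trip through an in-memory buffer; a file path works the same way.
original = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(1600)]
buffer = io.BytesIO()
write_wav(buffer, original, 16000)
buffer.seek(0)
restored, restored_rate = read_wav(buffer)
```

Storing uncompressed PCM keeps the original signal available for the noise-window extraction and feature analysis that follow in S3.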
The original speech signal in S2 comprises a target speech signal, a background noise signal and a third-party speech signal, wherein the feature parameters of the background noise signal are extracted from the background noise in the 0.3-0.5 seconds preceding the target speech signal. When the user operates the speech recognition system outdoors, ambient noise present in the environment is input together with the user's voice information. Therefore, before each voice input by the user, the ambient sound signal in the 0.3-0.5 seconds closest to the voice input time is captured and treated as the background noise for that input, and this background noise signal is removed in the subsequent speech analysis, which effectively improves the accuracy with which the target signal is acquired and reduces interference from ambient noise.
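The patent does not name the technique used to remove the captured background noise. One common choice is magnitude spectral subtraction, sketched below purely as an assumption: the average magnitude spectrum of the noise window is subtracted from each bin of a noisy frame, negative results are floored at zero, and the noisy phase is kept. The naive DFT and all function names are illustrative.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of `dft`, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def noise_magnitude(noise, frame_len):
    """Average magnitude spectrum over non-overlapping frames of the noise window."""
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    spectra = [[abs(c) for c in dft(f)] for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]

def spectral_subtract(frame, noise_mag):
    """Subtract the noise magnitude per bin, floor at zero, keep the noisy phase."""
    cleaned = []
    for coeff, noise_level in zip(dft(frame), noise_mag):
        magnitude = max(abs(coeff) - noise_level, 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(coeff)))
    return idft(cleaned)

# Synthetic example: "speech" in bin 4, stationary "noise" in bin 20.
frame_len = 64
speech = [0.5 * math.sin(2 * math.pi * 4 * n / frame_len) for n in range(frame_len)]
noise = [0.1 * math.sin(2 * math.pi * 20 * n / frame_len) for n in range(3 * frame_len)]
noisy = [s + v for s, v in zip(speech, noise[:frame_len])]

noise_mag = noise_magnitude(noise, frame_len)
cleaned = spectral_subtract(noisy, noise_mag)
residual = max(abs(c - s) for c, s in zip(cleaned, speech))
```

In this idealized case the noise is perfectly stationary and occupies bins the speech does not use, so the residual is negligible. Real ambient noise overlaps the speech spectrum and varies over time, which is why the method refreshes the estimate before every input.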
The processing of the original speech signal in S3 includes analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including extracting the background noise signal features, computing the original speech signal features, identifying the target speech signal features and identifying the third-party speech features; the influence of background noise is filtered out, and the target speech signal is enhanced to improve its recognizability and accuracy. Each original speech signal is analyzed and extracted; the speech signal whose sound parameter features are close to those of the pre-stored target speech signal is taken as the target speech signal of this input and is then analyzed and processed. Because a third party's speech signal differs to some degree from the pre-stored target speech signal, third-party speech signals are ignored automatically, which improves the precision of the target speech signal.
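Selecting the signal whose features are "close to" the pre-stored target can be sketched as a similarity test against the enrolled parameters. The patent gives neither a distance measure nor a threshold, so the cosine similarity, the 0.9 threshold, and the three-element feature vectors below are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_target(candidates, target_profile, threshold=0.9):
    """Keep candidates close to the enrolled profile; ignore all others."""
    return [name for name, features in candidates
            if cosine_similarity(features, target_profile) >= threshold]

# Hypothetical feature vectors; real ones would come from the S1 and S3 analysis.
target_profile = [0.8, 0.1, 0.3]              # parameters recorded during S1
candidates = [
    ("user", [0.75, 0.12, 0.33]),             # close to the enrolled profile
    ("third_party", [0.1, 0.9, 0.2]),         # a different speaker
]
kept = select_target(candidates, target_profile)
```

Any candidate that falls below the threshold is simply dropped, which corresponds to third-party speech being ignored automatically.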
Preferably, when the target speech signal is recognized, the speech signal is first divided into segments, the resulting speech segments are saved in order, the segments are recognized in the order in which they were saved, and the recognition results are output in that order. Storing the target speech signal in segments allows a large amount of target speech to be processed with a delay, increases the capacity of speech recognition, and avoids the failure to analyze and recognize correctly that occurs when an overly long utterance is input at once.
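The segment-and-recognize-in-order behavior can be sketched with a first-in, first-out queue. The fixed segment length and the placeholder recognizer are assumptions; the patent does not state how segment boundaries are chosen or which recognizer is used.

```python
from collections import deque

def split_segments(samples, segment_len):
    """Cut the target speech into consecutive segments, preserving order."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

def recognize_in_order(samples, segment_len, recognize):
    """Save segments in a FIFO queue, then recognize them in saved order."""
    queue = deque(split_segments(samples, segment_len))   # ordered storage
    results = []
    while queue:
        results.append(recognize(queue.popleft()))        # oldest segment first
    return results

def placeholder_recognizer(segment):
    """Stand-in for a real recognizer: reports the first value and the length."""
    return (segment[0], len(segment))

samples = list(range(10))
results = recognize_in_order(samples, 4, placeholder_recognizer)
```

Because the queue decouples storage from recognition, segments can accumulate while earlier ones are still being processed, which is the delayed processing the method describes.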
Before each voice input by the user, the invention captures the ambient sound signal in the period closest to the voice input time and treats it as the background noise for that input, then removes this background noise signal in the subsequent speech analysis. This effectively improves the accuracy with which the target signal is acquired and reduces interference from ambient noise. The target speech signal is obtained by analyzing the sound parameter features of the sound signals in the original signal. Storing the target speech signal in segments allows a large amount of target speech to be processed with a delay, increases the capacity of speech recognition, and avoids the failure to analyze and recognize correctly that occurs when an overly long utterance is input at once.
Finally, it should be noted that the foregoing is only a preferred embodiment of the invention and is not intended to limit the invention. Although the invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions described in it or make equivalent replacements of some of its technical features. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (4)

1. A speech analysis and recognition method, characterized by comprising the following steps:
S1: voice training: on first use, an input speech signal is recognized in a quiet environment, and its sound feature parameters are analyzed and recorded as the target sound parameters;
S2: the user's voice information is acquired, recorded as the original speech signal, and stored as a WAV file;
S3: the original speech signal is processed for extraction: the sound frequency is analyzed, the sound feature parameters are identified, and the sound information matching the target sound feature parameters is extracted;
S4: the extracted sound information is recognized, and the recognition result is compared against a speech database and a grammar database to obtain the speech recognition result.
2. The speech analysis and recognition method according to claim 1, characterized in that: the original speech signal in S2 comprises a target speech signal, a background noise signal and a third-party speech signal, wherein the feature parameters of the background noise signal are extracted from the background noise in the 0.3-0.5 seconds preceding the target speech signal.
3. The speech analysis and recognition method according to claim 1, characterized in that: the processing of the original speech signal in S3 includes analyzing the frequency waveform of the original speech signal and extracting the acoustic parameters of the sound, including extracting the background noise signal features, computing the original speech signal features, identifying the target speech signal features and identifying the third-party speech features; the influence of background noise is filtered out, and the target speech signal is enhanced.
4. The speech analysis and recognition method according to claim 3, characterized in that: when the target speech signal is recognized, the speech signal is first divided into segments, the resulting speech segments are saved in order, the segments are recognized in the order in which they were saved, and the recognition results are output in that order.
CN201811616352.4A 2018-12-28 2018-12-28 A speech analysis and recognition method Pending CN109389970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811616352.4A CN109389970A (en) 2018-12-28 2018-12-28 A kind of speech analysis recognition methods

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811616352.4A CN109389970A (en) 2018-12-28 2018-12-28 A kind of speech analysis recognition methods

Publications (1)

Publication Number Publication Date
CN109389970A (en) 2019-02-26

Family

ID=65430770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811616352.4A Pending CN109389970A (en) 2018-12-28 2018-12-28 A speech analysis and recognition method

Country Status (1)

Country Link
CN (1) CN109389970A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110021292A (en) * 2019-04-23 2019-07-16 四川长虹空调有限公司 Method of speech processing, device and smart home device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020023020A1 (en) * 1999-09-21 2002-02-21 Kenyon Stephen C. Audio identification system and method
CN102592595A (en) * 2012-03-19 2012-07-18 安徽科大讯飞信息科技股份有限公司 Voice recognition method and system
CN103198829A (en) * 2013-02-25 2013-07-10 惠州市车仆电子科技有限公司 Method, device and equipment of reducing interior noise and improving voice recognition rate
CN103594092A (en) * 2013-11-25 2014-02-19 广东欧珀移动通信有限公司 Single microphone voice noise reduction method and device
CN105225665A (en) * 2015-10-15 2016-01-06 桂林电子科技大学 A kind of audio recognition method and speech recognition equipment
CN106548781A (en) * 2015-09-21 2017-03-29 上海日趋信息技术有限公司 A kind of method for eliminating background noise for speech recognition system
CN107799115A (en) * 2016-08-29 2018-03-13 法乐第(北京)网络科技有限公司 A kind of audio recognition method and device

Similar Documents

Publication Publication Date Title
CN108630193B (en) Voice recognition method and device
US10878824B2 (en) Speech-to-text generation using video-speech matching from a primary speaker
US9412371B2 (en) Visualization interface of continuous waveform multi-speaker identification
CN104538043A (en) Real-time emotion reminder for call
CN108877823B (en) Speech enhancement method and device
CN106847305B (en) Method and device for processing recording data of customer service telephone
CN106887231A (en) A kind of identification model update method and system and intelligent terminal
CN111797632A (en) Information processing method and device and electronic equipment
CN108735200A (en) A kind of speaker's automatic marking method
US11462219B2 (en) Voice filtering other speakers from calls and audio messages
CN106531195B (en) A kind of dialogue collision detection method and device
CN107705791A (en) Caller identity confirmation method, device and Voiceprint Recognition System based on Application on Voiceprint Recognition
CN106372653A (en) Stack type automatic coder-based advertisement identification method
Sapra et al. Emotion recognition from speech
CN110428835A (en) A kind of adjusting method of speech ciphering equipment, device, storage medium and speech ciphering equipment
CN109389970A (en) A kind of speech analysis recognition methods
CN101460994A (en) Speech differentiation
CN112802498B (en) Voice detection method, device, computer equipment and storage medium
CN111933120A (en) Voice data automatic labeling method and system for voice recognition
CN110459206A (en) A kind of speech recognition system and method based on track planning of dual robots identification
CN114461842A (en) Method, device, equipment and storage medium for generating discouraging call
CN114220430A (en) Multi-sound-zone voice interaction method, device, equipment and storage medium
CN106971734A (en) It is a kind of that the method and system of identification model can be trained according to the extraction frequency of model
CN113077784A (en) Intelligent voice equipment for role recognition
CN111009258A (en) Single sound channel speaker separation model, training method and separation method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190226

RJ01 Rejection of invention patent application after publication