CN110299135A - Intelligent sound signal mode automatic recognition system device - Google Patents
- Publication number
- CN110299135A (application number CN201810561739.8A)
- Authority
- CN
- China
- Prior art keywords
- voice
- signal
- unit
- present
- acquisition device
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0635—Training updating or merging of old and new templates; Mean values; Weighting
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Telephonic Communication Services (AREA)
Abstract
An intelligent sound signal pattern automatic recognition system device comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 comprises a microphone 11, a wireless microphone 12 and a fixed recorder 13. The speech recognition device 2 comprises a voice input unit 20, a voice preprocessing unit 21, a speech feature extraction unit 22 and a feature-matching recognition and classification unit 23. A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, a card slot 14 is provided on the left side of the wireless signal transceiver 4, the wireless microphone 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged on the left side of the card slot 14. The device makes it more convenient for people to recognize speech signals.
Description
Technical field
The invention discloses an intelligent sound signal pattern automatic recognition system device. It belongs to the technical field of smart electronic products and specifically relates to an intelligent sound signal pattern automatic recognition system device that integrates a voice acquisition module, a speech recognition module, a control system and a loudspeaker.
Background art
Daily life is full of sound signals: the voices of people talking, the sound of running machines, music being played, vehicle horns and so on. Sound signals fill almost the entire living environment, and sometimes people want to know exactly which source produced each signal in a group of sounds. For common sounds, people can usually tell which object made them, but when several sources sound at the same time, especially several sources of the same kind, or when the playback environment is noisy, it is difficult to tell which sound came from which source. For example, in a recording of an argument among many people, the more people speak, the harder it is for a listener to tell on playback which debater said what. People therefore need a device that can identify voices.
Before the present invention, some voice recognition products were already on the market, such as voice input software. Most of them, however, recognize the text or letters in speech, or perform pairwise matching on a simple single voice. Others, such as mobile phones, complete certain tasks after recognizing the meaning of what the user says, for example simple tasks such as searching or making a phone call. None of them can distinguish voice characteristics, that is, accurately determine which person or object uttered similar speech or the same words. They are therefore not convenient for flexible use.
Summary of the invention
To overcome the above technical drawbacks, the object of the present invention is to provide an intelligent sound signal pattern automatic recognition system device that makes voice recognition more convenient and intelligent for people.
To achieve the above object, the technical solution adopted by the present invention comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 comprises a microphone 11, a wireless microphone 12 and a fixed recorder 13. The speech recognition device 2 comprises a voice input unit 20, a voice preprocessing unit 21, a speech feature extraction unit 22 and a feature-matching recognition and classification unit 23.
A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, and a card slot 14 is provided on the left side of the wireless signal transceiver 4; the wireless microphone 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged on the left side of the card slot 14. The display screen 8 is arranged directly below the wireless signal transceiver 4, and the speech recognition device 2 is arranged at the lower left of the display screen 8. Inside the speech recognition device 2, the voice input unit 20, the voice preprocessing unit 21, the speech feature extraction unit 22 and the feature-matching recognition and classification unit 23 are arranged in sequence from top to bottom. The power supply 9 is arranged at the bottom right of the speech recognition device 2, and the memory 33 is arranged above the power supply 9, on its left. The central processing unit 3 is arranged directly above the memory 33, and the loudspeaker 35 is arranged on the right side of the central processing unit 3. The network module 31 is arranged on the right side of the memory 33, and the memory card 32 is arranged directly below the network module 31. All electronic components are connected by wires 7 to form a circuit.
In the present invention, the outer dimensions of the wireless microphone 12 are 1 to 3 mm smaller than those of the card slot 14.
In the present invention, the voice acquisition device 1 has a built-in voice capture card for collecting and processing the acquired voice signals.
In the present invention, the fixed recorder 13 uses a windproof microphone.
In the present invention, the display screen 8 is a touch screen or a backlit LED display.
In the present invention, 2 to 10 fixed recorders 13 are provided on the housing to strengthen voice pickup.
Voice signals are acquired by the voice acquisition device 1 and processed by the speech recognition device 2, and the data are saved in the memory 33. The display screen 8 visualizes the human-machine interaction process and the output results. The loudspeaker 35 gives voice prompts for each operating step and announces the recognition results. The network module 31 connects the invention to an Internet cloud platform. The central processing unit 3 performs process control and data computation for the whole system. The wireless signal transceiver 4 receives and transmits the radio signals of the wireless microphone 12, a smartphone and the network module 31, and connects the invention wirelessly to the Internet. The memory card 32 is used to read externally recorded voice data into the database of the invention.
In the present invention, the voice input unit 20 provides two modes, "voice input mode" and "voice test mode". Voice can be input through any one of the microphone 11, the wireless microphone 12 and the fixed recorder 13 of the voice acquisition device 1, or through a smartphone. In "voice input mode", the voice input unit 20 accepts input from only one person or one object at a time, and each recording is an audio segment of 5 to 30 seconds. The invention uses a multi-state voice input strategy: a recording may combine several states, such as normal speech, singing, or high, medium and low pitch. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the data must be labeled manually; for example, after collecting Zhang San's voice, the user enters the note "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recording is stored in the memory 33. In "voice test mode", the invention collects test speech through one or more of the microphone 11, the wireless microphone 12 and the fixed recorder 13 of the voice acquisition device 1 and a smartphone. Test speech is collected in real time, with no limit on the number of people or objects or on the duration.
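The enrollment rules above (one labeled person or object per clip, 5 to 30 seconds, optionally several vocal states) can be sketched as a small validation step before saving. This is a minimal illustration only: `EnrollmentClip`, `validate_clip` and `EnrollmentStore` are hypothetical names, and the in-memory dictionary merely stands in for storage in the memory 33; the patent does not describe a software interface.

```python
from dataclasses import dataclass, field

# Clip-length limits stated for "voice input mode" (5 to 30 seconds).
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0

@dataclass
class EnrollmentClip:
    """One enrollment recording of a single person or object (hypothetical type)."""
    label: str                                  # manual annotation, e.g. "Zhang San's voice"
    samples: list                               # mono audio samples
    sample_rate: int                            # samples per second
    states: set = field(default_factory=set)    # e.g. {"speech", "singing", "high"}

    @property
    def duration(self):
        return len(self.samples) / self.sample_rate

def validate_clip(clip):
    """Return a list of problems; an empty list means the clip may be saved."""
    problems = []
    if not clip.label.strip():
        problems.append("missing manual label")
    if not MIN_SECONDS <= clip.duration <= MAX_SECONDS:
        problems.append(f"duration {clip.duration:.1f} s is outside 5-30 s")
    return problems

class EnrollmentStore:
    """Hypothetical in-memory stand-in for the labeled recordings kept in memory 33."""

    def __init__(self):
        self._clips = {}                        # label -> list of EnrollmentClip

    def save(self, clip):
        problems = validate_clip(clip)
        if problems:
            # The user would be asked to delete the clip and record again.
            raise ValueError("; ".join(problems))
        self._clips.setdefault(clip.label, []).append(clip)

    def labels(self):
        return sorted(self._clips)
```

Rejecting a clip with an exception mirrors the "delete and record again" behavior described for unsatisfactory recordings.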
In the present invention, the voice input unit 20 is connected to the voice acquisition device 1. The microphone 11 is connected to the voice acquisition device 1 through an audio cable, and the wireless microphone 12 is connected to it by radio signal.
In the present invention, the voice acquisition device 1 can also accept voice input from a smartphone paired with it via Bluetooth, infrared, Wi-Fi or by scanning a QR code. The phone then acts as a wireless microphone, which makes voice interaction with many people more convenient.
In the present invention, the voice preprocessing unit 21 converts the voice signal acquired by the voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then applies conventional signal processing, including ambient background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
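A minimal sketch of the conventional preprocessing chain named above, assuming a mono digitized signal in a NumPy array. The 25 ms frame, 10 ms hop, pre-emphasis coefficient 0.97, Hamming window and energy-ratio endpoint detector are common textbook choices, not values given in the patent, and background-noise removal and band filtering are omitted.

```python
import numpy as np

def preprocess(signal, sample_rate, frame_ms=25.0, hop_ms=10.0,
               alpha=0.97, energy_ratio=0.1):
    """Pre-emphasis, framing, Hamming windowing and energy-based endpoint detection.

    Returns (frames, voiced): windowed frames of shape (num_frames, frame_len)
    and a boolean mask marking frames judged to contain sound.
    """
    x = np.asarray(signal, dtype=float)
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1] boosts high frequencies.
    x = np.append(x[0], x[1:] - alpha * x[:-1])
    frame_len = int(round(sample_rate * frame_ms / 1000))
    hop = int(round(sample_rate * hop_ms / 1000))
    # Zero-pad very short inputs so that at least one full frame exists.
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    num_frames = 1 + (len(x) - frame_len) // hop
    # Row k of idx selects samples k*hop ... k*hop + frame_len - 1.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = x[idx] * np.hamming(frame_len)
    # Endpoint detection: a frame counts as sound if its short-time energy
    # exceeds a fixed fraction of the loudest frame's energy.
    energy = np.sum(frames ** 2, axis=1)
    peak = energy.max()
    if peak > 0:
        voiced = energy > energy_ratio * peak
    else:
        voiced = np.zeros(num_frames, dtype=bool)
    return frames, voiced
```

The energy threshold is relative to the loudest frame, so it adapts to recording level; a real device would more likely estimate the noise floor from leading silence.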
In the present invention, the speech feature extraction unit 22 extracts from the original speech signal the main feature parameters that reflect the essence of the speech, forming the feature vector $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{in})^{\mathsf{T}}$, where $x_{ij}$ is the $j$-th speech feature value of the $i$-th object or person. The preferred feature extraction method is Mel-frequency cepstral coefficients (MFCC); spectral envelope, LPC interpolation, LPC root extraction, Hilbert transform and similar methods can also be used to obtain acoustic features. After extraction, the feature vectors are automatically saved in the pattern class database, with all sound features of one object or person forming one pattern class. If the voices of $N$ people or objects are enrolled, $N$ pattern classes are obtained, and if each pattern class has $n$ feature parameters, they span an $n$-dimensional feature space. The labeled feature set can then be written as
$D = \{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_i, y_i), \ldots, (\mathbf{x}_N, y_N)\}$,
where $\mathbf{x}_i \in \mathcal{X} = \mathbb{R}^n$ is the voice feature signal of the $i$-th enrolled object or person, $y_i \in \mathcal{Y} = \{1, 2, \ldots, N\}$ identifies the $i$-th person or object, and $N$ is the number of enrolled people or objects. The labeled voice feature data form the pattern class database, which is stored in the memory 33 of the invention.
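To illustrate the preferred MFCC method and the labeled set $D$, the sketch below computes MFCCs from already windowed frames with NumPy and averages them into one vector $\mathbf{x}_i$ per recording. The averaging step, the filter-bank size (26) and the number of coefficients ($n = 13$) are assumptions, since the patent does not say how frame-level features are pooled; all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):            # rising edge of the triangle
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):           # falling edge of the triangle
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def mfcc(frames, sample_rate, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs for already windowed frames of shape (num_frames, frame_len)."""
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Orthonormal DCT-II across the filter axis, keeping the first n_ceps terms.
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    basis = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))
    basis[0] /= np.sqrt(2.0)
    return log_energies @ basis.T                # shape (num_frames, n_ceps)

def utterance_vector(frames, sample_rate):
    """Pool frame-level MFCCs into one vector x_i by averaging (an assumption)."""
    return mfcc(frames, sample_rate).mean(axis=0)

def build_dataset(enrolled):
    """enrolled: list of (label, frames, sample_rate) tuples.

    Returns D = [(x_i, y_i), ...] with y_i in {1, ..., N}, plus the label names.
    """
    names = sorted({label for label, _, _ in enrolled})
    class_id = {name: i + 1 for i, name in enumerate(names)}
    D = [(utterance_vector(frames, sr), class_id[label]) for label, frames, sr in enrolled]
    return D, names
```

If a person is enrolled with several clips, this version simply adds several pairs with the same $y_i$; the text's notation has one pair per person.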
In the present invention, the feature-matching recognition and classification unit 23 uses an intelligent multi-class classifier. Its learning algorithm is an improved neural-network classification algorithm: the enrolled and labeled voice feature set is used as training data, from which the network model learns the classification rules, which completes the training of the classifier. The trained classifier then classifies and identifies unknown test speech. After features are extracted from a test signal, the invention automatically performs feature matching: it compares the feature parameters of the test speech in real time with the enrolled, labeled sample feature parameters in the memory 33, computes the similarity between the test signal and every enrolled sample signal, and assigns the test signal to the pattern class of the most similar sample. Finally, the invention outputs a report such as "This is the voice of XXX". For example, if the invention has stored Zhang San's voice features, then when Zhang San speaks or sings to it, the invention automatically computes that his test features are most similar to the enrolled, labeled recording of Zhang San and outputs "This is the voice of Zhang San".
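The matching step (compute the similarity to every enrolled sample, then take the most similar class) can be sketched as a nearest-neighbour search. The patent does not name a similarity measure; cosine similarity is assumed here, and `match` and `report` are illustrative names.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def match(test_vec, enrolled):
    """Assign test_vec to the label of the most similar enrolled sample.

    enrolled: list of (feature_vector, label) pairs.
    Returns (best_label, best_score, scores) where scores lists every comparison.
    """
    scores = [(cosine_similarity(test_vec, x), label) for x, label in enrolled]
    best_score, best_label = max(scores, key=lambda pair: pair[0])
    return best_label, best_score, scores

def report(label):
    """Result text, following the example wording in the description."""
    return f"This is the voice of {label}"
```

Keeping every score, not just the best one, makes it easy to add a rejection threshold for voices that match nobody well.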
In the present invention, the multi-class classifier uses a multi-layer artificial neural network. One end of the network is the input layer, the other end is the output layer, and the layers between them are hidden layers. The input layer receives the external input signal and passes it to all neurons of the hidden layer; the result computed by the hidden layer is passed to the output layer, which computes and outputs the classification result, that is, the recognition result. The number of hidden layers is preferably 1 to 200.
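The layered structure above corresponds to a standard fully connected feed-forward network. The sketch below is a plain multilayer perceptron with ReLU hidden layers, a softmax output layer and full-batch gradient descent on cross-entropy; it is not the patent's unspecified "improved" algorithm, and the class name, learning rate and initialization are assumptions. `layer_sizes` accepts any number of hidden layers, but plain gradient descent like this is only practical for shallow networks, far below the 200-layer upper bound mentioned.

```python
import numpy as np

class MLPClassifier:
    """Hypothetical sketch: input layer -> ReLU hidden layers -> softmax output layer."""

    def __init__(self, layer_sizes, seed=0):
        # layer_sizes = [n_inputs, hidden_1, ..., hidden_k, n_classes]
        rng = np.random.default_rng(seed)
        self.W = [rng.standard_normal((a, b)) * np.sqrt(2.0 / a)   # He initialization
                  for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.b = [np.zeros(b) for b in layer_sizes[1:]]

    def _forward(self, X):
        """Return the activations of every layer, input first, class probabilities last."""
        acts = [X]
        for i, (W, b) in enumerate(zip(self.W, self.b)):
            z = acts[-1] @ W + b
            if i < len(self.W) - 1:
                acts.append(np.maximum(z, 0.0))           # hidden layer: ReLU
            else:
                z = z - z.max(axis=1, keepdims=True)      # numerically stable softmax
                e = np.exp(z)
                acts.append(e / e.sum(axis=1, keepdims=True))
        return acts

    def fit(self, X, y, epochs=1000, lr=0.1):
        """X: (m, n) feature vectors; y: integer class indices 0 .. C-1."""
        m = X.shape[0]
        onehot = np.eye(self.W[-1].shape[1])[y]
        for _ in range(epochs):
            acts = self._forward(X)
            delta = (acts[-1] - onehot) / m               # gradient w.r.t. output logits
            for i in reversed(range(len(self.W))):
                grad_W = acts[i].T @ delta
                grad_b = delta.sum(axis=0)
                if i > 0:
                    # Back-propagate through W[i] (before updating it) and the ReLU.
                    delta = (delta @ self.W[i].T) * (acts[i] > 0)
                self.W[i] -= lr * grad_W
                self.b[i] -= lr * grad_b
        return self

    def predict_proba(self, X):
        return self._forward(X)[-1]

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)
```

In the device's setting, each row of `X` would be an utterance feature vector $\mathbf{x}_i$ and `y` the enrolled label shifted to start at 0; standardizing the features first helps gradient descent converge.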
In the present invention, a plurality of fixed recorders 13 may be provided on the housing to strengthen voice pickup.
The invention stores enrolled and labeled voice signals long-term. All voice signals stored in the voice pattern class database can be retrieved at any time and compared with unknown test speech for recognition.
The invention is used as follows. First turn on the power switch 5; the system then runs automatically, and the display screen 8 lights up and shows the operation interface, where the user can choose between two functions, "voice input mode" and "voice test mode".
(1) When voice input is selected, the central processing unit 3 switches the voice input unit 20 into "voice input mode", and the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "Voice input mode: please speak". The user can input voice through any one of the microphone 11, the wireless microphone 12 and the fixed recorder 13 of the voice acquisition device 1, or through a smartphone. To ensure that the invention can accurately identify and quantify the voice features of the enrolled object, only one person or one object can be recorded at a time in the "voice input mode" stage. Because the same person's voice signal differs somewhat between speaking and singing, the invention uses a multi-state voice input strategy to improve recognition accuracy: a recording may combine normal speech, singing, high, medium or low pitch and other states. Each recording lasts 5 to 30 seconds, and the display screen 8 shows the real-time waveform and a progress bar; an unsatisfactory recording can be deleted and recorded again. After recording, the data must be labeled manually; for example, after collecting Zhang San's voice, the user enters the note "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recording is stored in the memory 33 of the invention.
(2) After a voice signal has been recorded, the control system of the invention automatically sends the labeled signal to the voice preprocessing unit 21. The voice preprocessing unit 21 converts the voice signal acquired by the voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then applies conventional signal processing, including ambient background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system of the invention automatically sends the preprocessed voice signal to the speech feature extraction unit 22, which extracts the feature parameters reflecting the essence of the speech and obtains the feature vector $\mathbf{x}_i$. Mel-frequency cepstral coefficients (MFCC) are preferred; spectral envelope, LPC interpolation, LPC root extraction, Hilbert transform and similar methods can also be used to obtain acoustic features. The extracted feature vectors are automatically saved to the pattern class library, with all sound features of one person forming one pattern class. After the voices of $N$ people have been enrolled, $N$ pattern classes are obtained, each with $n$ feature parameters, giving a database with one voice signal pattern class per person. All data are stored in the memory 33 of the invention, which completes voice enrollment.
(4) After enrollment, voice testing can be carried out. The user only needs to select "voice test mode" in the operation interface of the display screen 8; the central processing unit 3 then switches the voice input unit 20 into "voice test mode", and the display screen 8 and the loudspeaker 35 simultaneously give a prompt such as "Voice test in progress...". No further operation is needed: the invention collects test speech through one or more of the microphone 11, the wireless microphone 12 and the fixed recorder 13 of the voice acquisition device 1 and a smartphone. Test speech is collected in real time, with no limit on duration or on the number of people.
(5) For voice data collected in "voice test mode", the system automatically preprocesses the test signal and extracts its features: it converts the collected test signal into an electrical signal, applies conventional filtering, noise removal, windowing and endpoint detection, and then extracts the signal features.
(6) After features have been extracted from the test signal, the invention automatically performs feature matching: it compares the feature parameters of the test speech in real time with the labeled sample feature parameters enrolled in the memory 33, computes the similarity between the test signal and every enrolled voice signal, and assigns the test signal to the pattern class with the highest similarity. Finally, the invention outputs a report such as "This is the voice of XXX". For example, if the invention has stored Zhang San's voice features, then when Zhang San speaks or sings to it, the invention identifies him and automatically outputs "This is the voice of Zhang San".
When the invention is used for testing in public, several people may speak at the same time in the test environment, so the collected voice signal is a wideband mixture. To prevent errors when extracting features from such a signal, the invention uses an intelligent algorithm: it first matches and identifies the voice feature parameters of each person speaking alone, labels them and stores them, and then automatically screens and separates the signal recorded while several people speak together. Finally, it outputs the recognition result with a prompt such as "This is the joint voice of Zhang San, Li Si, Wang Wu...", and indicates that XX voices could not be identified. Press the power-off key 6 to shut down the system.
The system can also output to the user a list of recognition results for multi-person conversation environments, stating how many people or objects spoke in the test environment. From a recording in which several people spoke at once, it can isolate and play back what each person said, filtering out other people's voices and ambient sound.
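Producing the recognition list described above can be sketched as a summary over per-segment results. The sketch assumes the mixed recording has already been split into segments and each segment identified (or left as `None` when unidentified); the separation itself is not shown, and `summarize_session` and `format_report` are hypothetical helpers.

```python
from collections import Counter

def summarize_session(segment_labels):
    """segment_labels: per-segment recognition results; None means not identified.

    Returns the identified speakers (in order of first appearance), how many
    distinct speakers were found, segments per speaker, and unidentified segments.
    """
    identified = [label for label in segment_labels if label is not None]
    speakers = list(dict.fromkeys(identified))    # de-duplicate, keep order
    return {
        "speakers": speakers,
        "num_speakers": len(speakers),
        "segments_per_speaker": dict(Counter(identified)),
        "unidentified_segments": len(segment_labels) - len(identified),
    }

def format_report(summary):
    """Build a prompt similar to the announcement described in the text."""
    text = "Now speaking: " + ", ".join(summary["speakers"])
    if summary["unidentified_segments"]:
        text += f"; {summary['unidentified_segments']} voice segment(s) not identified"
    return text
```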
When the test speech contains voice signal features that are not stored as samples in the invention, the invention automatically records these unknown features and asks the user whether to label and store the voice signal of that object.
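One simple way to decide that a test voice is "not stored" is to reject matches whose best similarity falls below a threshold and to keep a log of the rejected feature vectors, merging repeated occurrences of the same unknown voice. This is an assumption-laden sketch: the cosine measure, the 0.85 threshold and `UnknownVoiceLog` are illustrative, not taken from the patent.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class UnknownVoiceLog:
    """Hypothetical helper: records feature vectors that match no enrolled class."""

    def __init__(self, enrolled, threshold=0.85):
        self.enrolled = enrolled        # list of (feature_vector, label)
        self.threshold = threshold      # assumed acceptance threshold
        self.unknown = []               # list of [first_vector_seen, times_heard]

    def identify(self, vec):
        """Return the enrolled label, or None after logging an unknown voice."""
        best_label, best = None, -1.0
        for x, label in self.enrolled:
            s = cosine(vec, x)
            if s > best:
                best_label, best = label, s
        if best >= self.threshold:
            return best_label
        for entry in self.unknown:
            if cosine(vec, entry[0]) >= self.threshold:
                entry[1] += 1           # the same unknown voice heard again
                break
        else:
            self.unknown.append([list(vec), 1])
        return None

    def pending_prompts(self):
        """Questions to show the user for each unknown voice."""
        return [f"Unknown voice #{i + 1} heard {n} time(s): label and store it?"
                for i, (_, n) in enumerate(self.unknown)]
```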
Description of the drawings
Fig. 1 is a structural diagram of the present invention.
Fig. 2 is a system framework diagram of the invention.
Fig. 3 is a schematic diagram of the multi-layer artificial neural network of the invention.
Fig. 4 is a flow chart of the improved neural-network classification algorithm for voice signals of the invention.
Specific embodiment
Fig. 1 shows one embodiment of the invention, which is described here with reference to Figs. 1 to 4. It comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 comprises a microphone 11, a wireless microphone 12 and a fixed recorder 13. The speech recognition device 2 comprises a voice input unit 20, a voice preprocessing unit 21, a speech feature extraction unit 22 and a feature-matching recognition and classification unit 23. A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, and a card slot 14 is provided on the left side of the wireless signal transceiver 4; the wireless microphone 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged on the left side of the card slot 14. The display screen 8 is arranged directly below the wireless signal transceiver 4, and the speech recognition device 2 is arranged at the lower left of the display screen 8. Inside the speech recognition device 2, the voice input unit 20, the voice preprocessing unit 21, the speech feature extraction unit 22 and the feature-matching recognition and classification unit 23 are arranged in sequence from top to bottom. The power supply 9 is arranged at the bottom right of the speech recognition device 2, and the memory 33 is arranged above the power supply 9, on its left. The central processing unit 3 is arranged directly above the memory 33, and the loudspeaker 35 is arranged on the right side of the central processing unit 3. The network module 31 is arranged on the right side of the memory 33, and the memory card 32 is arranged directly below the network module 31. All electronic components are connected by wires 7 to form a circuit.
In this embodiment, the outer dimensions of the wireless microphone 12 are 1 to 3 mm smaller than those of the card slot 14.
In this embodiment, the voice acquisition device 1 has a built-in voice capture card for collecting and processing the acquired voice signals.
In this embodiment, the fixed recorder 13 uses a windproof microphone.
In this embodiment, the display screen 8 is a touch screen or a backlit LED display.
In this embodiment, a plurality of fixed recorders 13 may be provided on the housing to strengthen voice pickup.
In this embodiment, voice signals are acquired by the voice acquisition device 1 and processed by the speech recognition device 2, and the data are saved in the memory 33. The display screen 8 visualizes the human-machine interaction process and the output results. The loudspeaker 35 gives voice prompts for each operating step and announces the recognition results. The network module 31 connects the invention to an Internet cloud platform. The central processing unit 3 performs process control and data computation for the whole system. The wireless signal transceiver 4 receives and transmits the radio signals of the wireless microphone 12, a smartphone and the network module 31, and connects the invention wirelessly to the Internet. The memory card 32 is used to read externally recorded voice data into the database of the invention.
In this embodiment, the voice input unit 20 provides two modes, a "voice input mode" and a "voice test mode". Voice can be input through any one of the microphone 11, the wireless intercom 12 and the fixed phonographic recorder 13 of the voice acquisition device 1, or through a smartphone. In "voice input mode", the voice input unit 20 records only one person or object at a time, and each recording is an audio signal 5 to 30 seconds long. The invention uses a multi-state voice input strategy: a recording may combine several states, such as normal speech, singing, or high/middle/low pitch. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the voice must be labeled manually; for example, after recording Zhang San's voice, the user enters the remark "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recording is stored in the memory 33. In "voice test mode", the invention collects test speech through one or more of the microphone 11, the wireless intercom 12, the fixed phonographic recorder 13 and the smartphone at the same time. Test speech is collected in real time, with no limit on the number of people or objects or on the time.
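The enrollment rules above (one labeled person or object per recording, 5 to 30 seconds of audio, a manual label such as "Zhang San's voice") can be sketched as a small in-memory pattern class database. This is a minimal sketch under stated assumptions: the class `PatternClassDB`, its methods, and storing one feature vector per recording are hypothetical illustrations, not the patent's implementation. `as_dataset` emits the labeled set D with integer labels y_i in {1, ..., N}, as defined in the feature-extraction paragraph.

```python
from dataclasses import dataclass, field

# Recording-length limits stated in the description (seconds).
MIN_SECONDS = 5.0
MAX_SECONDS = 30.0


@dataclass
class PatternClassDB:
    """Hypothetical pattern class database: label -> list of feature vectors."""
    classes: dict = field(default_factory=dict)

    def enroll(self, label, features, duration_s):
        """Store one manually labeled recording of a single person or object."""
        if not label.strip():
            raise ValueError("every recording must be labeled manually")
        if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
            raise ValueError(
                f"recording is {duration_s:.1f} s; expected "
                f"{MIN_SECONDS:.0f} to {MAX_SECONDS:.0f} s, please re-record")
        self.classes.setdefault(label, []).append(list(features))
        return len(self.classes)  # number of pattern classes N so far

    def as_dataset(self):
        """Return the labeled set D as (x_i, y_i) pairs with y_i in {1, ..., N}."""
        pairs = []
        for y, label in enumerate(self.classes, start=1):
            for x in self.classes[label]:
                pairs.append((x, y))
        return pairs


db = PatternClassDB()
db.enroll("Zhang San's voice", [0.12, -0.40, 0.88], duration_s=12.0)
db.enroll("Li Si's voice", [0.75, 0.10, -0.30], duration_s=8.5)
print(db.as_dataset())
```

Rejecting out-of-range recordings with an exception mirrors the "delete and record again" step in the operating procedure.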
In this embodiment, the voice input unit 20 is connected to the voice acquisition device 1. The microphone 11 is connected to the voice acquisition device 1 by an audio cable, and the wireless intercom 12 is connected to it by radio signal.
In this embodiment, a smartphone can also be used to input voice signals. The phone is paired with the voice acquisition device 1 of the invention via Bluetooth, infrared, Wi-Fi or by scanning a QR code. This effectively turns the phone into a wireless microphone and makes voice interaction with larger groups more convenient.
In this embodiment, the voice preprocessing unit 21 converts the voice signal collected by the voice acquisition device 1 into an electrical signal, converts the analog signal into a digital signal, and then applies conventional signal processing, including ambient background-noise removal, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
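Several of the conventional preprocessing steps listed above can be sketched in plain Python: pre-emphasis, framing, Hamming windowing and a simple short-time-energy endpoint detector. This is a minimal sketch under assumptions not stated in the source: a 16 kHz sample rate, 25 ms frames with a 10 ms hop, a pre-emphasis coefficient of 0.97, and a speech/silence threshold of 10% of the peak frame energy. Noise removal and band filtering are omitted, and all function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def detect_endpoints(frames, ratio=0.1):
    """Energy-based endpoint detection.

    Frames whose short-time energy exceeds `ratio` times the maximum are
    treated as speech; returns (first, last) speech-frame indices, or None.
    """
    energies = [sum(x * x for x in f) for f in frames]
    if not energies or max(energies) == 0.0:
        return None
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]


def preprocess(signal, frame_len=400, hop=160):
    """Pre-emphasis -> framing -> windowing -> endpoint trimming.

    At the assumed 16 kHz rate, 400/160 samples give 25 ms frames, 10 ms hop.
    """
    emphasized = pre_emphasis(signal)
    frames = [hamming(f) for f in frame_signal(emphasized, frame_len, hop)]
    span = detect_endpoints(frames)
    if span is None:
        return []
    first, last = span
    return frames[first:last + 1]


# 0.1 s of silence, 0.2 s of a 440 Hz tone, 0.1 s of silence at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(3200)]
signal = [0.0] * 1600 + tone + [0.0] * 1600
speech_frames = preprocess(signal)
print(len(speech_frames))
```

An energy threshold is the simplest endpoint detector; practical systems often add zero-crossing rate or a noise-floor estimate, which this sketch does not attempt.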
In this embodiment, the speech recognition unit 22 extracts from the original speech signal the main characteristic parameters that reflect the essence of the voice and forms a feature vector $x_i = (x_{i1}, x_{i2}, \dots, x_{ij}, \dots, x_{in})^T$, where $x_{ij}$ is the $j$-th speech feature value of the $i$-th object or person. The preferred feature extraction method is Mel-frequency cepstral coefficients (MFCC); the spectral envelope method, LPC interpolation, LPC root extraction, the Hilbert transform method and similar techniques may also be used to obtain acoustic features. After extraction, the feature vectors are automatically saved in the pattern class database, where all sound features of one object or person correspond to one pattern class. After the voices of $N$ people or objects have been recorded, $N$ pattern classes are obtained; if each pattern class has $n$ characteristic parameters, they span an $n$-dimensional feature space. The labeled feature signal set can then be written as $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_i, y_i), \dots, (x_N, y_N)\}$, where $x_i \in \mathcal{X} = \mathbb{R}^n$ is the speech feature signal of the $i$-th recorded object or person, $y_i \in \mathcal{Y} = \{1, 2, \dots, N\}$ identifies the $i$-th person or object, and $N$ is the number of recorded people or objects. The labeled speech feature data form the pattern class database, which is stored in the memory 33 of the invention.
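Because MFCC is named as the preferred feature, here is a minimal pure-Python sketch of MFCC for one windowed frame, plus an utterance-level vector that could serve as $x_i$. Everything beyond "MFCC" is an assumption: 16 kHz audio, a 512-point naive DFT (slow, for illustration only), 26 triangular mel filters, 13 retained coefficients, and averaging frame MFCCs to get one vector per utterance. The function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame, n_fft):
    """|X[k]|^2 / n_fft for k = 0..n_fft/2 via a naive O(n^2) DFT."""
    x = list(frame[:n_fft]) + [0.0] * max(0, n_fft - len(frame))
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = -sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def mfcc(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs of one windowed frame: power spectrum -> mel energies -> log -> DCT-II."""
    spectrum = power_spectrum(frame, n_fft)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(filt, spectrum)), 1e-10))
        for filt in mel_filterbank(n_filters, n_fft, sample_rate)
    ]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_ceps)
    ]


def utterance_vector(frames, **kwargs):
    """Average frame-level MFCCs into one utterance-level feature vector."""
    per_frame = [mfcc(f, **kwargs) for f in frames]
    return [sum(column) / len(per_frame) for column in zip(*per_frame)]


tone = [math.sin(2 * math.pi * 1000 * t / 16000) for t in range(400)]
x_i = utterance_vector([tone, tone])
print(len(x_i))  # 13
```

A real system would use an FFT and keep the per-frame sequence rather than only its mean; averaging is chosen here only to match the single vector $x_i$ per object in the text.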
In this embodiment, the feature matching, identification and classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm. The recorded and labeled speech feature signal set serves as training data; the network model learns from it to obtain classification rules, which completes the training of the classifier. The trained classifier then classifies and identifies unknown test speech signals. Once features have been extracted from a test signal, the invention automatically performs feature matching: the extracted characteristic parameters of the test speech signal are matched in real time against the recorded and labeled sample feature parameters in the memory 33, the similarity between the test signal and every recorded sample signal is computed, and the test signal is assigned to the signal pattern class of the sample with the highest similarity. Finally, the invention outputs a report such as "This is the voice of XXX". For example, if the invention has stored Zhang San's speech features, then when Zhang San speaks or sings to it, the invention automatically determines that his test features are most similar to his recorded and labeled voice and outputs "This is the voice of Zhang San".
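The matching step (compute the similarity to every enrolled sample, then pick the most similar class) can be sketched as nearest-template matching. The source does not name its similarity measure; cosine similarity is an assumption here, and `identify` and `report` are hypothetical names. This illustrates only the similarity-matching description, not the neural-network classifier.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(test_vec, database):
    """Return (label, score) of the enrolled sample most similar to test_vec.

    `database` maps each label to a list of enrolled feature vectors.
    """
    best_label, best_score = None, -math.inf
    for label, samples in database.items():
        for sample in samples:
            score = cosine_similarity(test_vec, sample)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


def report(label):
    return f"This is the voice of {label}"


database = {
    "Zhang San": [[1.0, 0.2, 0.0], [0.9, 0.3, 0.1]],
    "Li Si": [[0.0, 1.0, 0.4]],
}
label, score = identify([0.95, 0.25, 0.05], database)
print(report(label))  # This is the voice of Zhang San
```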
In this embodiment, the multi-class classifier uses a multi-layer artificial neural network. One end of the network is the input layer, the other end is the output layer, and the layers between them are hidden layers. The input layer receives the external input signal and passes it to all neurons of the hidden layer; the result computed by the hidden layer is passed to the output layer, which computes and outputs the classification result, i.e., the recognition result. The number of hidden layers is preferably 1 to 200.
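The layered structure described above (input layer, one or more hidden layers, output layer) can be sketched as a forward pass through stacked fully connected layers. This is a minimal sketch under assumptions: the sigmoid hidden activation follows the preferred $f(x) = (1 + e^{-x})^{-1}$ given in the training steps, while the linear output layer, the argmax decision rule and the names `dense`, `forward` and `classify` are illustrative choices, not taken from the source.

```python
import math


def sigmoid(x):
    """Logistic activation f(x) = 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer; weights[j][i] links input i to neuron j."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(x, hidden_layers, output_layer):
    """Input layer -> every hidden layer in turn -> output layer."""
    h = x
    for weights, biases in hidden_layers:
        h = dense(h, weights, biases, sigmoid)
    weights, biases = output_layer
    return dense(h, weights, biases)


def classify(x, hidden_layers, output_layer):
    """Index of the output node with the largest score (the recognized class)."""
    scores = forward(x, hidden_layers, output_layer)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy network: 2 inputs, one hidden layer of 2 neurons, 2 output classes.
hidden = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])]
output = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(classify([5.0, -5.0], hidden, output))   # 0
print(classify([-5.0, 5.0], hidden, output))   # 1
```

Passing a longer `hidden_layers` list stacks more hidden layers, which is how the 1 to 200 layer range would map onto this sketch.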
In this embodiment, the improved artificial neural network classification algorithm is trained as follows:
Step 1: Network initialization. The algorithm database is updated continuously as voice signals are recorded. When the voice signals of $N$ objects have been recorded, $N$ pattern classes are formed, giving the sample space $(X, Y)$ whose $i$-th sample is $(X_i, Y_i)$, where $X_i$ is the set of feature vectors extracted from the $i$-th object and $Y_i$ is the label of the $i$-th object. From the system's input/output sequence $(X, Y)$, the numbers of input-layer nodes $n$, hidden-layer nodes $l$ and output-layer nodes $m$ are determined: $n$ equals the number of feature values extracted from the input signal, $m$ equals the number of stored speech pattern classes, and the reference value of $l$ is given by an empirical formula in which the constant $a$ ranges from 0 to 10 and is determined automatically by the model. The connection weights $\omega_{ij}$ between input-layer and hidden-layer neurons and $\omega_{jk}$ between hidden-layer and output-layer neurons are initialized, the hidden-layer threshold $a$ and output-layer threshold $b$ are initialized, and the learning rate $\eta$ and the neuron activation function are specified.
Step 2: Compute the hidden-layer output. From the input vector $X$, the input-to-hidden connection weights $\omega_{ij}$ and the hidden-layer threshold $a$, compute the hidden-layer output $H$. The output $H_j$ of the $j$-th hidden node ($j = 1, 2, \dots, l$, where $l$ is the number of hidden nodes) is obtained by applying the hidden-layer activation function $f$ to the weighted input sum $\sum_{i=1}^{n} \omega_{ij} x_i$ together with the threshold $a_j$. Many activation functions exist; the invention preferably uses $f(x) = (1 + e^{-x})^{-1}$.
Step 3: Compute the output-layer output. From the hidden-layer output $H$, the hidden-to-output connection weights $\omega_{jk}$ and the output-layer threshold $b$, compute the output-layer output $O$. The output of the $k$-th output node is $O_k$, $k = 1, 2, \dots, m$, where $m$ is the number of output nodes, $b_k$ is the threshold of the $k$-th output node and $H_j$ is the output of the $j$-th hidden node.
Step 4: Compute the prediction error. From the network's predicted output $O$ and the desired output $Y$ (the true value), compute the total prediction error $e$ of the network, where $e_k$ is the error produced at the $k$-th output node.
Step 5: Update the weights. Update the connection weights $\omega_{jk}$ and $\omega_{ij}$ according to the total prediction error $e$: $\omega_{jk} \leftarrow \omega_{jk} + \eta H_j E_k$ for $j = 1, 2, \dots, l$ and $k = 1, 2, \dots, m$, where $\eta$ is the learning rate and $E_k$ is the sensitivity of the network's total error to output node $k$; $\omega_{ij}$ is updated correspondingly for $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, l$.
Step 6: Update the thresholds. Update the hidden-layer thresholds $a_j$ ($j = 1, 2, \dots, l$) and the output-layer thresholds $b$ according to the total prediction error $e$, with $b_k \leftarrow b_k + \eta E_k$ for $k = 1, 2, \dots, m$.
Step 7: Check whether the iteration has converged; if not, return to Step 2. Iteration preferably terminates once the error falls below the minimum error of 0.001.
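Steps 1 to 7 can be sketched as plain per-sample back-propagation. This is a minimal sketch, not the patent's "improved" algorithm, whose improvements are not specified in this text. Assumptions: a single sigmoid hidden layer; a linear output layer, so the output sensitivity $E_k$ reduces to $e_k = Y_k - O_k$ under a half-squared-error loss; thresholds added as biases, so the sign convention may differ from the original formulas; mean half-squared error as the convergence measure; and the rule of thumb $l = \mathrm{round}(\sqrt{n + m}) + a$ for the hidden-node count, because the source's own formula for $l$ is not reproduced here. The function names are hypothetical.

```python
import math
import random


def hidden_node_count(n, m, a=0):
    """ASSUMED rule of thumb l = round(sqrt(n + m)) + a, with 0 <= a <= 10."""
    if not 0 <= a <= 10:
        raise ValueError("a must lie in [0, 10]")
    return round(math.sqrt(n + m)) + a


def sigmoid(x):
    """f(x) = 1 / (1 + e^-x), computed without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(net, x):
    """Steps 2-3: sigmoid hidden layer, then linear output layer."""
    w_ih, w_ho, a_thr, b_thr = net
    n, l, m = len(w_ih), len(a_thr), len(b_thr)
    H = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(n)) + a_thr[j]) for j in range(l)]
    O = [sum(H[j] * w_ho[j][k] for j in range(l)) + b_thr[k] for k in range(m)]
    return H, O


def train_bp(samples, a_const=2, eta=0.1, tol=1e-3, max_epochs=5000, seed=0):
    """Train on (feature vector, one-hot label) pairs; returns (net, epochs_run)."""
    rng = random.Random(seed)
    n, m = len(samples[0][0]), len(samples[0][1])
    l = hidden_node_count(n, m, a_const)                 # Step 1: layer sizes
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(l)] for _ in range(n)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(l)]
    a_thr = [rng.uniform(-0.5, 0.5) for _ in range(l)]
    b_thr = [rng.uniform(-0.5, 0.5) for _ in range(m)]
    net = (w_ih, w_ho, a_thr, b_thr)                     # lists are updated in place
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        total = 0.0
        for x, y in samples:
            H, O = forward(net, x)
            e = [y[k] - O[k] for k in range(m)]          # Step 4: e_k = Y_k - O_k
            total += 0.5 * sum(ek * ek for ek in e)
            # Back-propagated hidden term, computed before w_ho is changed.
            delta = [H[j] * (1 - H[j]) * sum(w_ho[j][k] * e[k] for k in range(m))
                     for j in range(l)]
            for j in range(l):                           # Step 5: weights
                for k in range(m):
                    w_ho[j][k] += eta * H[j] * e[k]
            for i in range(n):
                for j in range(l):
                    w_ih[i][j] += eta * delta[j] * x[i]
            for j in range(l):                           # Step 6: thresholds
                a_thr[j] += eta * delta[j]
            for k in range(m):
                b_thr[k] += eta * e[k]
        if total / len(samples) < tol:                   # Step 7: convergence test
            break
    return net, epoch


def predict(net, x):
    """Index of the output node with the largest value."""
    _, O = forward(net, x)
    return max(range(len(O)), key=O.__getitem__)


samples = [([0.0, 0.0], [1, 0]), ([0.1, 0.2], [1, 0]),
           ([1.0, 1.0], [0, 1]), ([0.9, 0.8], [0, 1])]
net, epochs = train_bp(samples)
print(epochs, [predict(net, x) for x, _ in samples])
```

The loop stops either at the 0.001 error threshold from Step 7 or after `max_epochs`, since a fixed threshold is not guaranteed to be reached.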
In this embodiment, the voice acquisition device 1 has a built-in voice capture card for collecting and processing the captured voice signals.
In this embodiment, the fixed phonographic recorder 13 is a windproof microphone.
The invention stores recorded and labeled voice signals long-term. Any voice signal stored in its voice pattern class database can be retrieved at any time and compared with unknown test speech for identification.
To use the invention, first turn on the power switch 5. The system then starts automatically, and the display screen 8 lights up and shows the operation interface, where the user can choose between two functions: "voice input mode" and "voice test mode".
(1) When voice input is selected, the central processing unit 3 switches the voice input unit 20 into "voice input mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice input mode: please speak". The user can input voice through any one of the microphone 11, the wireless intercom 12 and the fixed phonographic recorder 13 of the voice acquisition device 1, or through a smartphone. To ensure that the invention can accurately identify and quantify the speech features of the enrolled object, only one person or object can be recorded at a time in the "voice input mode" stage. Because the sound the same person produces when speaking differs somewhat in its features from the sound produced when singing, the invention uses a multi-state voice input strategy to improve recognition accuracy: a recording may combine several states, such as normal speech, singing, or high/middle/low pitch. The recording lasts 5 to 30 seconds, and the display screen 8 shows the real-time waveform and a progress bar; an unsatisfactory recording can be deleted and recorded again. After recording, the voice must be labeled manually; for example, after recording Zhang San's voice, the user enters the remark "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recording is stored in the memory 33 of the invention.
(2) After the voice signal is recorded, the control system of the invention automatically sends the labeled voice signal to the voice preprocessing unit 21. The voice preprocessing unit 21 converts the signal collected by the voice acquisition device 1 into an electrical signal, converts the analog signal into a digital signal, and then applies conventional signal processing, including ambient background-noise removal, signal framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system then automatically sends the preprocessed voice signal to the speech signal feature extraction unit 22, which extracts characteristic parameters reflecting the essence of the voice from the preprocessed signal to obtain the feature vector $x_i$. Mel-frequency cepstral coefficients (MFCC) are the preferred features; the spectral envelope method, LPC interpolation, LPC root extraction, the Hilbert transform method and similar techniques may also be used to obtain acoustic features. The resulting feature vectors are automatically saved in the pattern class library, with all sound features of one person corresponding to one pattern class. After the voices of N people have been recorded, N pattern classes are obtained, each with n characteristic parameters, yielding a database of voice signal pattern classes, one per person. All data are stored in the memory 33 of the invention, which completes the voice input mode.
(4) After voice input, voice testing can be carried out. The user simply selects "voice test mode" in the operation interface of the display screen 8; the central processing unit 3 switches the voice input unit 20 into "voice test mode", and the display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice test in progress...". No further action is required from the user: the invention collects test speech through one or more of the microphone 11, the wireless intercom 12, the fixed phonographic recorder 13 and the smartphone in the voice acquisition device 1. Test speech is collected in real time, with no limit on time or on the number of people.
(5) For voice data collected in "voice test mode", the system automatically preprocesses the test speech signal and extracts its features: the collected test signal is converted into an electrical signal and, after conventional filtering, noise removal, windowing and endpoint detection, its features are extracted.
(6) Once features have been extracted from the test signal, the invention automatically performs feature matching: the extracted characteristic parameters of the test speech signal are matched in real time against the recorded and labeled sample feature parameters in the memory 33, the similarity between the test speech signal and every recorded speech signal is computed, and the test signal is assigned to the pattern class with the highest similarity. Finally, the invention outputs a report such as "This is the voice of XXX". For example, if the invention has stored Zhang San's speech features, then when Zhang San speaks or sings to it, the invention identifies him and automatically outputs "This is the voice of Zhang San".
When the invention is used in public, several people or objects in the test environment may speak at the same time, so the collected voice signal is a broadband mixture. To avoid errors when extracting features from such a signal, the invention uses an intelligent algorithm: it first matches, identifies and stores the speech characteristic parameters of each person speaking alone, and then automatically screens and separates the voice signal recorded while several people speak together. Finally, it outputs the recognition result with a prompt such as "Zhang San, Li Si, Wang Wu ... are now speaking together", and also reports that XX voices could not be identified. Press the power-off key 6 to shut the system down.
In this embodiment, the system can also output a list of recognition results for multi-speaker environments, including how many people or objects are speaking in the test environment. From a recording in which several people speak at once, it can separate out and play back what each person said, filtering out other voices and ambient sound.
When the test speech contains voice features that do not match any stored sample, the invention automatically records these unknown features and asks the user whether to label and store the voice signal of that object.
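Detecting voices that match no stored sample amounts to open-set identification: accept the best match only if it is similar enough, otherwise log the features for later labeling. This is a minimal sketch under assumptions: cosine similarity and the acceptance threshold of 0.9 are hypothetical choices, not values from the source, and `identify_or_flag` is an illustrative name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_or_flag(test_vec, database, unknown_log, threshold=0.9):
    """Return the best-matching label, or None if no enrolled class is similar enough.

    Unmatched vectors are appended to `unknown_log` so the user can later be
    asked whether to label and store them.
    """
    best_label, best_score = None, -1.0
    for label, samples in database.items():
        for sample in samples:
            score = cosine_similarity(test_vec, sample)
            if score > best_score:
                best_label, best_score = label, score
    if best_label is None or best_score < threshold:
        unknown_log.append(list(test_vec))
        return None
    return best_label


database = {"Zhang San": [[1.0, 0.2, 0.0]]}
unknown = []
print(identify_or_flag([0.98, 0.22, 0.01], database, unknown))  # Zhang San
print(identify_or_flag([0.0, 0.1, 1.0], database, unknown))     # None
print(len(unknown))                                             # 1
```

The threshold trades false acceptances against false "unknown" prompts and would need tuning on real enrolled data.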
In the technical field of intelligent sound signal mode automatic recognition system devices, all technical content comprising the following falls within the protection scope of the invention: a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a RAM card 32 and a loudspeaker 35, where the voice acquisition device 1 includes a microphone 11, a wireless intercom 12 and a fixed phonographic recorder 13, and the speech recognition device 2 includes a voice input unit 20, a voice preprocessing unit 21, a speech recognition unit 22 and a feature matching, identification and classification unit 23; a frame 10 having an inner cavity, with the wireless signal transceiver 4 at the middle of its upper end; a card slot 14 on the left of the wireless signal transceiver 4, into which the wireless intercom 12 is embedded; the voice acquisition device 1 on the left of the card slot 14; the display screen 8 directly below the wireless signal transceiver 4; the speech recognition device 2 at the lower left of the display screen 8, containing, from top to bottom, the voice input unit 20, the voice preprocessing unit 21, the speech recognition unit 22 and the feature matching, identification and classification unit 23; the power supply 9 at the bottom right of the speech recognition device 2; the memory 33 above the left side of the power supply 9; the central processing unit 3 directly above the memory 33; the loudspeaker 35 on the right of the central processing unit 3; the network module 31 on the right of the memory 33; the RAM card 32 directly below the network module 31; and all electronic components connected by wires 7 to form a circuit.
It should be noted that the scope of the invention is not limited to its external appearance. The frame 10 of the invention may be rectangular, cylindrical or polygonal-prismatic, or shaped like other objects such as a Chinese cabbage, a watermelon or a stone; any embodiment whose shape differs but whose substantive technical content is the same as that of the invention also falls within its protection scope. Likewise, conventional and obvious minor improvements or minor combinations made by those skilled in the art on the basis of the invention also fall within its scope, as long as their technical content includes the technical content described herein.
Claims (5)
1. An intelligent sound signal mode automatic recognition system device, characterized in that it comprises a voice acquisition device (1), a speech recognition device (2), a central processing unit (3), a wireless signal transceiver (4), a display screen (8), a power supply (9), a memory (33), a network module (31), a RAM card (32) and a loudspeaker (35); the voice acquisition device (1) comprises a microphone (11), a wireless intercom (12) and a fixed phonographic recorder (13); the speech recognition device (2) comprises a voice input unit (20), a voice preprocessing unit (21), a speech recognition unit (22) and a feature matching, identification and classification unit (23); a frame (10) is provided with an inner cavity; the wireless signal transceiver (4) is arranged at the middle of the upper end of the frame (10); a card slot (14) is arranged on the left of the wireless signal transceiver (4), and the wireless intercom (12) is embedded in the card slot (14); the voice acquisition device (1) is arranged on the left of the card slot (14); the display screen (8) is arranged directly below the wireless signal transceiver (4); the speech recognition device (2) is arranged at the lower left of the display screen (8) and contains, from top to bottom, the voice input unit (20), the voice preprocessing unit (21), the speech recognition unit (22) and the feature matching, identification and classification unit (23); the power supply (9) is arranged at the bottom right of the speech recognition device (2); the memory (33) is arranged above the left side of the power supply (9); the central processing unit (3) is arranged directly above the memory (33); the loudspeaker (35) is arranged on the right of the central processing unit (3); the network module (31) is arranged on the right of the memory (33); the RAM card (32) is arranged directly below the network module (31); and all electronic components are connected by wires (7) to form a circuit.
2. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that the overall dimensions of the wireless intercom (12) are 1 to 3 mm smaller than those of the card slot (14).
3. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that the display screen (8) is a touch screen or a backlit LED display.
4. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that 2 to 10 fixed phonographic recorders (13) are provided on the housing of the invention to increase voice recording intensity.
5. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that the voice acquisition device (1) can also use a smartphone for voice signal input, the smartphone being paired with the voice acquisition device (1) via Bluetooth, infrared, Wi-Fi or by scanning a QR code to realize voice input.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810561739.8A CN110299135A (en) | 2018-06-04 | 2018-06-04 | Intelligent sound signal mode automatic recognition system device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810561739.8A CN110299135A (en) | 2018-06-04 | 2018-06-04 | Intelligent sound signal mode automatic recognition system device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110299135A true CN110299135A (en) | 2019-10-01 |
Family
ID=68026322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810561739.8A Withdrawn CN110299135A (en) | 2018-06-04 | 2018-06-04 | Intelligent sound signal mode automatic recognition system device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110299135A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111176607A (en) * | 2019-12-27 | 2020-05-19 | 国网山东省电力公司临沂供电公司 | Voice interaction system and method based on power business |
CN113572492A (en) * | 2021-06-23 | 2021-10-29 | 力声通信股份有限公司 | Novel communication equipment prevents falling digital intercom |
CN113726705A (en) * | 2021-11-03 | 2021-11-30 | 天津七一二移动通信有限公司 | PDT interphone with integrated AIS coding and decoding capability |
CN113726705B (en) * | 2021-11-03 | 2022-01-07 | 天津七一二移动通信有限公司 | PDT interphone with integrated AIS coding and decoding capability |
CN115662423A (en) * | 2022-10-19 | 2023-01-31 | 博泰车联网(南京)有限公司 | Voice control method, device, equipment and storage medium |
CN115662423B (en) * | 2022-10-19 | 2023-11-03 | 博泰车联网(南京)有限公司 | Voice control method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20191001 |
|
WW01 | Invention patent application withdrawn after publication |