CN110299135A - Intelligent sound signal mode automatic recognition system device - Google Patents


Info

Publication number
CN110299135A
Authority
CN
China
Prior art keywords
voice
signal
unit
present
acquisition device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810561739.8A
Other languages
Chinese (zh)
Inventor
宫文峰
张美玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201810561739.8A
Publication of CN110299135A
Legal status: Withdrawn


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

An intelligent sound signal mode automatic recognition system device comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 includes a microphone 11, a wireless intercom 12 and a fixed recorder 13. The speech recognition device 2 includes a voice input unit 20, a voice preprocessing unit 21, a speech signal feature extraction unit 22 and a feature matching discrimination and classification unit 23. A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, a card slot 14 is arranged to the left of the wireless signal transceiver 4, the wireless intercom 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged to the left of the card slot 14. The device thereby makes it more convenient for people to recognize speech signals.

Description

Intelligent sound signal mode automatic recognition system device
Technical field
The invention discloses an intelligent sound signal mode automatic recognition system device. It belongs to the technical field of smart electronic products and specifically concerns a device that integrates a voice acquisition module, a speech recognition module, a control system and a loudspeaker.
Background
Daily life is full of sound signals: the speech people produce when they talk, the noise of running machines, music being played, vehicle horns and so on. Sound fills almost the entire living environment, and people sometimes want to know exactly which object produced which sound in a group of signals. With ordinary sounds, people can usually tell which object produced each one. But when several objects sound at once, especially several objects of the same kind, or when the playback environment is noisy, it is hard to tell which sound came from which object. For example, in a recording of a group argument with many speakers, it is hard to tell from playback which debater said which sentence. A device capable of recognizing voices is therefore needed.
Before the present invention, some voice recognition products were already on the market, such as voice input software. Most of them recognize the words or letters in speech, or perform matching recognition on a simple single voice. Some let a user speak to a product such as a mobile phone, which recognizes the meaning of the speech and then completes simple tasks such as searching or making a phone call. None of them, however, can distinguish voice characteristics, so they cannot accurately tell which person or object produced similar speech or the same words. They are therefore inconvenient for flexible use.
Summary of the invention
To overcome the above technical shortcomings, the object of the present invention is to provide an intelligent sound signal mode automatic recognition system device that makes recognizing voices more convenient and intelligent.
To achieve this object, the present invention adopts the following technical solution. The device comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 includes a microphone 11, a wireless intercom 12 and a fixed recorder 13. The speech recognition device 2 includes a voice input unit 20, a voice preprocessing unit 21, a speech signal feature extraction unit 22 and a feature matching discrimination and classification unit 23. A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, a card slot 14 is arranged to the left of the wireless signal transceiver 4, the wireless intercom 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged to the left of the card slot 14. The display screen 8 is arranged directly below the wireless signal transceiver 4, and the speech recognition device 2 is arranged at the lower left of the display screen 8. Inside the speech recognition device 2, the voice input unit 20, the voice preprocessing unit 21, the speech signal feature extraction unit 22 and the feature matching discrimination and classification unit 23 are arranged in order from top to bottom. The power supply 9 is arranged at the bottom, to the right of the speech recognition device 2. The memory 33 is arranged above the power supply 9 on the left, the central processing unit 3 directly above the memory 33, the loudspeaker 35 to the right of the central processing unit 3, the network module 31 to the right of the memory 33, and the memory card 32 directly below the network module 31. All electronic components are connected by wires 7 to form a circuit.
In the present invention, the outer dimensions of the wireless intercom 12 are 1 to 3 mm smaller than those of the card slot 14.
In the present invention, the voice acquisition device 1 has a built-in audio capture card for collecting and processing the captured speech signal.
In the present invention, the fixed recorder 13 uses a windproof microphone.
In the present invention, the display screen 8 is a touch screen or a backlit LED display.
In the present invention, 2 to 10 fixed recorders 13 are provided around the housing of the device to increase voice recording strength.
Speech signals are captured by the voice acquisition device 1 and processed by the speech recognition device 2, and the data are saved in the memory 33. The display screen 8 shows the human-computer interaction process and the output results. The loudspeaker 35 gives voice prompts for the operating steps and announces recognition results. The network module 31 connects the device to an internet cloud platform. The central processing unit 3 handles program control and data computation for the whole device. The wireless signal transceiver 4 receives and transmits the radio signals produced by the wireless intercom 12, a smartphone and the network module 31, and connects the device to the internet wirelessly. The memory card 32 is used to read previously recorded external voice data into the device's database.
In the present invention, the voice input unit 20 provides two modes, a "voice input mode" and a "voice test mode". Voice can be input through any one of the microphone 11, the wireless intercom 12 and the fixed recorder 13 provided by the voice acquisition device 1, or through a smartphone. In "voice input mode", the voice input unit 20 accepts voice from only one person or one object at a time, and the recorded voice is an audio signal 5 to 30 seconds long. The invention uses a multi-state voice input strategy: the recorded voice may combine several states, such as normal speech, singing, or high, middle and low pitch. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the data must be labeled. Labeling is manual: for example, after Zhang San's voice has been captured, the user enters the note "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recorded voice is stored in the memory 33. In "voice test mode", the device collects test speech through one or more of the microphone 11, the wireless intercom 12, the fixed recorder 13 and a smartphone at the same time. Test speech is collected in real time, with no limit on the number of people, the objects or the duration.
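The enrollment constraints of "voice input mode" can be sketched in a few lines of Python. This is only an illustrative sketch: the names `EnrollmentRecord` and `make_enrollment` are hypothetical and do not come from the patent, and the only rules encoded are the ones stated above (one labeled source per recording, 5 to 30 seconds long).

```python
from dataclasses import dataclass

MIN_SECONDS = 5.0   # lower bound on recording length stated in the text
MAX_SECONDS = 30.0  # upper bound on recording length stated in the text


@dataclass
class EnrollmentRecord:
    label: str        # manual annotation, e.g. "Zhang San"
    samples: list     # raw audio samples from one person or object
    sample_rate: int  # samples per second


def make_enrollment(label, samples, sample_rate):
    """Validate one enrollment recording and attach its manual label."""
    if not label.strip():
        raise ValueError("an enrollment recording needs a manual label")
    duration = len(samples) / sample_rate
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(f"recording is {duration:.1f} s; expected 5 to 30 s")
    return EnrollmentRecord(label.strip(), list(samples), sample_rate)
```

A recording that fails validation would simply be deleted and recorded again, which matches the re-recording option the description offers.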
In the present invention, the voice input unit 20 is connected to the voice acquisition device 1. The microphone 11 is connected to the voice acquisition device 1 by an audio cable, and the wireless intercom 12 is connected to it by radio signal.
In the present invention, the voice acquisition device 1 can also take speech input from a smartphone. The smartphone is paired with the voice acquisition device 1 by Bluetooth, infrared, Wi-Fi or by scanning a QR code, so that the phone serves as a wireless microphone, which makes voice interaction among many people more convenient.
In the present invention, the voice preprocessing unit 21 converts the speech signal captured by the voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
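Several of the conventional preprocessing steps named above can be sketched as follows. The patent does not give parameters, so the pre-emphasis coefficient 0.97, the Hamming window and the energy-ratio rule for endpoint detection are common textbook choices assumed here for illustration, and all function names are hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (alpha is an assumed value)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame):
    """Apply a Hamming window to one frame (the frame needs at least 2 samples)."""
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def detect_endpoints(frames, ratio=0.1):
    """Return (first, last) indices of frames whose energy exceeds ratio * max energy."""
    energies = [sum(x * x for x in f) for f in frames]
    if not energies or max(energies) == 0:
        return None  # silence only
    threshold = ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    return active[0], active[-1]
```

A typical call order would be `pre_emphasis`, then `frame_signal`, then `detect_endpoints` to trim silence, then `hamming` on each retained frame before feature extraction. Noise removal and filtering are left out of this sketch.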
In the present invention, the speech signal feature extraction unit 22 extracts from the original speech signal the main characteristic parameters that reflect the essence of the voice and forms a feature vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{in})^T$, where $x_{ij}$ denotes the j-th speech feature value of the i-th object or person. The preferred feature extraction method is Mel-frequency cepstral coefficients (MFCC); the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method and similar methods can also be used to obtain acoustic features. The extracted feature vectors are saved automatically in the pattern class database, and all the sound features of one object or person correspond to one pattern class. After the voices of N people or objects have been recorded, there are N pattern classes; if each pattern class has n characteristic parameters, they span an n-dimensional feature space. The labeled feature set can be written $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_N, y_N)\}$, where $x_i \in \chi = \mathbb{R}^n$ is the speech feature signal of the i-th recorded object or person, $y_i \in Y = \{1, 2, \ldots, N\}$ identifies the i-th person or object, and N is the number assigned to the N-th person or object. The labeled speech feature data constitute the pattern class database and are stored in the memory 33.
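The labeled feature set D can be pictured as a small in-memory store that maps each person or object to a class number from 1 to N. The class below is a hypothetical illustration, not the patent's data format; feature extraction itself (for example MFCC) is assumed to have happened already, so the store only receives finished n-dimensional vectors.

```python
from collections import defaultdict


class PatternClassDB:
    """Labeled feature set D = {(x_i, y_i)}: one pattern class per person or object."""

    def __init__(self, n_features):
        self.n_features = n_features      # n, the dimension of the feature space
        self.labels = []                  # position k holds the name of class k + 1
        self.vectors = defaultdict(list)  # class number y -> stored feature vectors

    def add(self, name, vector):
        """Store one feature vector under a name and return its class number."""
        if len(vector) != self.n_features:
            raise ValueError("feature vector must have n_features entries")
        if name not in self.labels:
            self.labels.append(name)
        y = self.labels.index(name) + 1   # classes are numbered 1..N as in the text
        self.vectors[y].append(list(vector))
        return y

    def training_pairs(self):
        """Return all (x, y) pairs, ordered by class number."""
        return [(x, y) for y in sorted(self.vectors) for x in self.vectors[y]]
```

Keeping several vectors per class allows one person's speaking and singing recordings to sit in the same pattern class, in line with the multi-state input strategy.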
In the present invention, the feature matching discrimination and classification unit 23 uses an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm. The recorded and labeled speech feature set serves as training data; the network model learns from the training data, derives classification rules and thereby completes the training of the classifier. The trained classifier then classifies and identifies unknown test speech signals. After features have been extracted from a test signal, the device performs feature matching automatically: it matches the characteristic parameters of the test speech signal in real time against the recorded and labeled sample speech parameters in the memory 33, computes the similarity between the test speech signal and every recorded sample speech signal, and assigns the test signal to the pattern class of the sample with the highest similarity. Finally the device outputs the recognition result as a report such as "This is the voice of XXX". For example, if the device has stored Zhang San's speech features, then when Zhang San speaks or sings to the device, it finds that his test speech parameters are most similar to his recorded and labeled voice signal and automatically outputs "This is the voice of Zhang San".
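The "highest similarity wins" step can be illustrated with a deliberately simple stand-in. The patent specifies an improved neural network classifier and does not name a similarity measure; the sketch below swaps in plain cosine similarity against one stored vector per label, purely to make the assignment rule concrete. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(test_vector, samples):
    """samples maps label -> stored feature vector; returns (best_label, similarity)."""
    scores = {label: cosine_similarity(test_vector, vec) for label, vec in samples.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def report(label):
    """Format the recognition result in the style quoted in the text."""
    return f"This is the voice of {label}"
```

For instance, `classify([0.9, 0.1, 0.2], {"Zhang San": [1.0, 0.0, 0.2], "Li Si": [0.0, 1.0, 0.1]})` selects "Zhang San", because that stored vector points in nearly the same direction as the test vector.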
In the present invention, the multi-class classifier uses a multi-layer artificial neural network. One end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layers. The input layer receives external input signals and sends them to all neurons of the hidden layer. The hidden layer passes its computed results to the output layer. The output layer receives the signals from the hidden layer and, after computation, outputs the classification result, which is the recognition result. The number of hidden layers is preferably 1 to 200.
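The input, hidden and output layers described above correspond to the forward pass of an ordinary fully connected network. The sketch below shows only that forward pass with untrained random weights. The tanh activation, the softmax output and every function name are assumptions made for illustration, and the patent's "improved" training algorithm is not reproduced.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One fully connected layer with small random weights and zero biases."""
    return {"w": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out}


def dense(layer, x):
    """Weighted sum plus bias for every neuron in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(layer["w"], layer["b"])]


def softmax(z):
    """Turn output-layer scores into class probabilities."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def build_network(n_features, hidden_sizes, n_classes, seed=0):
    """Input layer size n_features, one entry per hidden layer, N output classes."""
    rng = random.Random(seed)
    sizes = [n_features] + list(hidden_sizes) + [n_classes]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(layers, x):
    """Pass a feature vector through the hidden layers, then the output layer."""
    for layer in layers[:-1]:
        x = [math.tanh(v) for v in dense(layer, x)]
    return softmax(dense(layers[-1], x))
```

The index of the largest probability returned by `forward` would be the predicted class number. The length of `hidden_sizes` plays the role of the hidden layer count, which the text prefers to keep between 1 and 200.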
In the present invention, several fixed recorders 13 can be provided around the housing of the device to increase voice recording strength.
The present invention stores recorded and labeled speech signals long term. All speech signals stored in the speech pattern class database can be retrieved at any time and compared with unknown test speech for recognition.
To use the device, first turn on the power switch 5. The system starts automatically, the display screen 8 lights up and shows the operation interface, and the user can choose between two functions, "voice input mode" and "voice test mode".
(1) When voice input is selected, the central processing unit 3 puts the voice input unit 20 into "voice input mode", and the display screen 8 and the loudspeaker 35 both give a prompt such as "Voice input mode is on, please speak". The user can input voice through any one of the microphone 11, the wireless intercom 12 and the fixed recorder 13 provided by the voice acquisition device 1, or through a smartphone. To ensure that the device can accurately recognize and quantify the speech features of the object to be recognized, voice can be input for only one person or one object at a time in this stage. Because the sound signal data of the same person differ somewhat between speaking and singing, the device uses a multi-state voice input strategy to improve recognition accuracy: the recorded voice may combine normal speech, singing, high, middle or low pitch and other states. The recording lasts 5 to 30 seconds. The display screen 8 shows the real-time speech waveform and a progress bar. An unsatisfactory recording can be deleted and recorded again. After recording, the data must be labeled manually: for example, after Zhang San's voice has been captured, the user enters the note "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recorded voice is stored in the memory 33.
(2) After the speech signal has been recorded, the control system automatically sends the labeled speech signal to the voice preprocessing unit 21. The voice preprocessing unit 21 converts the speech signal captured by the voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
(3) The control system automatically sends the preprocessed speech signal to the speech signal feature extraction unit 22, which extracts from it the characteristic parameters that reflect the essence of the voice and obtains the feature vector $x_i$. The preferred feature extraction method is Mel-frequency cepstral coefficients (MFCC); the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method and similar methods can also be used to obtain acoustic features. The extracted feature vectors are saved automatically in the pattern class database, and all the sound features of one person correspond to one pattern class. After the voices of N people have been recorded there are N pattern classes, each with n characteristic parameters, which yields a database in which each person corresponds to one speech signal pattern class. All data are stored in the memory 33. This completes the voice input mode.
(4) After voice input, a voice test can be carried out. The user only needs to select "voice test mode" in the operation interface of the display screen 8. The central processing unit 3 puts the voice input unit 20 into "voice test mode", and the display screen 8 and the loudspeaker 35 both give a prompt such as "Voice test in progress...". The user does not need to do anything further. The device collects test speech through one or more of the microphone 11, the wireless intercom 12, the fixed recorder 13 and a smartphone at the same time. Test speech is collected in real time, with no limit on duration or number of people.
(5) For speech data collected in "voice test mode", the device automatically preprocesses the test speech signal and extracts its features: it converts the collected test signal into an electrical signal, performs conventional filtering, noise removal, windowing and endpoint detection, and then extracts the signal features.
(6) After features have been extracted from the test signal, the device performs feature matching automatically. It matches the characteristic parameters of the test speech signal in real time against the recorded and labeled sample speech parameters in the memory 33, computes the similarity between the test speech signal and every recorded speech signal, and assigns the test speech signal to the pattern class with the highest similarity. Finally the device outputs a report such as "This is the voice of XXX". For example, if the device has stored Zhang San's speech features, then when Zhang San speaks or sings to the device, it recognizes him and automatically outputs "This is the voice of Zhang San".
When the device is tested in a public place, several objects may speak at the same time in the test environment, so the captured speech signal is a wideband aliased signal. To prevent errors when extracting features from such a signal, the device uses an intelligent algorithm that first matches, identifies and stores the speech characteristic parameters captured while a single person is speaking, and then automatically screens and separates the speech signal captured while several people speak together. Finally it outputs the recognition result with a prompt such as "These are the voices of Zhang San, Li Si, Wang Wu ... together", and indicates that XX voices could not be identified. To shut the system down, press the power-off key 6.
The present invention is also designed so that the device can output to the user a list of recognition results for a multi-person conversation. The list includes how many people or objects are speaking on site in the test environment. From a recording of several people speaking at once, the device can also identify and separate what each person said, play it back, and filter out other people's voices and ambient sound.
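The recognition result list for a multi-person scene can be pictured as a simple summary over per-segment labels. The sketch assumes that screening and separation have already produced one label per separated speech segment, with `None` standing for a voice that could not be identified. That input format and the name `summarize` are assumptions made for illustration, not details from the patent.

```python
def summarize(segment_labels):
    """Build a result list from per-segment labels (None = unidentified voice)."""
    speakers = []
    unidentified = 0
    for label in segment_labels:
        if label is None:
            unidentified += 1
        elif label not in speakers:
            speakers.append(label)  # keep first-seen order
    prompt = "These are the voices of " + ", ".join(speakers) + " together" if speakers else ""
    return {
        "speaker_count": len(speakers),
        "speakers": speakers,
        "unidentified_segments": unidentified,
        "prompt": prompt,
    }
```

The separation step itself, which splits overlapping voices, is the hard part and is not attempted here.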
When a test speech signal contains speech features for which the device has no stored sample, the device automatically records the unknown speech features and asks the user whether to label and store the voice signal of that object.
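One way to picture the handling of unknown voices is a distance threshold: a test vector that is not close enough to any stored sample is queued for manual labeling. The patent does not say how "not stored" is decided, so the Euclidean distance, the `max_distance` parameter and the name `identify_or_flag` are all assumptions of this sketch.

```python
import math


def identify_or_flag(test_vector, samples, max_distance, pending):
    """Return the matching label, or None after queuing the vector for manual labeling.

    samples maps label -> stored feature vector; pending is a list that collects
    unknown feature vectors so the user can be asked whether to label them.
    """
    if samples:
        label, dist = min(
            ((name, math.dist(test_vector, vec)) for name, vec in samples.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_distance:
            return label
    pending.append(list(test_vector))  # record the unknown features
    return None
```

Each entry left in `pending` would correspond to one reminder asking the user whether to label and store that voice.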
Brief description of the drawings
Fig. 1 is a structural diagram of the present invention.
Fig. 2 is a system framework diagram of the present invention.
Fig. 3 is a schematic diagram of the multi-layer artificial neural network of the present invention.
Fig. 4 is a flow chart of the improved neural network classification algorithm for speech signals of the present invention.
Detailed description of the embodiments
Fig. 1 shows one embodiment of the present invention, which is described here with reference to Figs. 1 to 4. The embodiment comprises a voice acquisition device 1, a speech recognition device 2, a central processing unit 3, a wireless signal transceiver 4, a display screen 8, a power supply 9, a memory 33, a network module 31, a memory card 32 and a loudspeaker 35. The voice acquisition device 1 includes a microphone 11, a wireless intercom 12 and a fixed recorder 13. The speech recognition device 2 includes a voice input unit 20, a voice preprocessing unit 21, a speech signal feature extraction unit 22 and a feature matching discrimination and classification unit 23. A housing 10 has an inner cavity. The wireless signal transceiver 4 is arranged at the middle of the upper end of the housing 10, a card slot 14 is arranged to the left of the wireless signal transceiver 4, the wireless intercom 12 is embedded in the card slot 14, and the voice acquisition device 1 is arranged to the left of the card slot 14. The display screen 8 is arranged directly below the wireless signal transceiver 4, and the speech recognition device 2 is arranged at the lower left of the display screen 8. Inside the speech recognition device 2, the voice input unit 20, the voice preprocessing unit 21, the speech signal feature extraction unit 22 and the feature matching discrimination and classification unit 23 are arranged in order from top to bottom. The power supply 9 is arranged at the bottom, to the right of the speech recognition device 2. The memory 33 is arranged above the power supply 9 on the left, the central processing unit 3 directly above the memory 33, the loudspeaker 35 to the right of the central processing unit 3, the network module 31 to the right of the memory 33, and the memory card 32 directly below the network module 31. All electronic components are connected by wires 7 to form a circuit.
In this embodiment, the outer dimensions of the wireless intercom 12 are 1 to 3 mm smaller than those of the card slot 14.
In this embodiment, the voice acquisition device 1 has a built-in audio capture card for collecting and processing the captured speech signal.
In this embodiment, the fixed recorder 13 uses a windproof microphone.
In this embodiment, the display screen 8 is a touch screen or a backlit LED display.
In this embodiment, several fixed recorders 13 can be provided around the housing of the device to increase voice recording strength.
In this embodiment, speech signals are captured by the voice acquisition device 1 and processed by the speech recognition device 2, and the data are saved in the memory 33. The display screen 8 shows the human-computer interaction process and the output results. The loudspeaker 35 gives voice prompts for the operating steps and announces recognition results. The network module 31 connects the device to an internet cloud platform. The central processing unit 3 handles program control and data computation for the whole device. The wireless signal transceiver 4 receives and transmits the radio signals produced by the wireless intercom 12, a smartphone and the network module 31, and connects the device to the internet wirelessly. The memory card 32 is used to read previously recorded external voice data into the device's database.
In this embodiment, the voice input unit 20 provides two modes, a "voice input mode" and a "voice test mode". Voice can be input through any one of the microphone 11, the wireless intercom 12 and the fixed recorder 13 provided by the voice acquisition device 1, or through a smartphone. In "voice input mode", the voice input unit 20 accepts voice from only one person or one object at a time, and the recorded voice is an audio signal 5 to 30 seconds long. A multi-state voice input strategy is used: the recorded voice may combine several states, such as normal speech, singing, or high, middle and low pitch. The display screen 8 shows the speech waveform and a progress bar in real time. After recording, the data must be labeled. Labeling is manual: for example, after Zhang San's voice has been captured, the user enters the note "Zhang San's voice" in the dialog box shown on the display screen 8 and saves it, and the recorded voice is stored in the memory 33. In "voice test mode", the device collects test speech through one or more of the microphone 11, the wireless intercom 12, the fixed recorder 13 and a smartphone at the same time. Test speech is collected in real time, with no limit on the number of people, the objects or the duration.
In this embodiment, the voice input unit 20 is connected to the voice acquisition device 1. The microphone 11 is connected to the voice acquisition device 1 by an audio cable, and the wireless intercom 12 is connected to it by radio signal.
In this embodiment, the voice acquisition device 1 can also take speech input from a smartphone. The phone is paired with the voice acquisition device 1 by Bluetooth, infrared, Wi-Fi or by scanning a QR code, so that the phone serves as a wireless microphone, which makes voice interaction among many people more convenient.
In this embodiment, the voice preprocessing unit 21 converts the speech signal captured by the voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, framing, filtering, pre-emphasis, windowing and endpoint detection.
In this embodiment, the speech signal feature extraction unit 22 extracts from the original speech signal the main characteristic parameters that reflect the essence of the voice and forms a feature vector $x_i = (x_{i1}, x_{i2}, \ldots, x_{ij}, \ldots, x_{in})^T$, where $x_{ij}$ denotes the j-th speech feature value of the i-th object or person. The preferred feature extraction method is Mel-frequency cepstral coefficients (MFCC); the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method and similar methods can also be used to obtain acoustic features. The extracted feature vectors are saved automatically in the pattern class database, and all the sound features of one object or person correspond to one pattern class. After the voices of N people or objects have been recorded, there are N pattern classes; if each pattern class has n characteristic parameters, they span an n-dimensional feature space. The labeled feature set can be written $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_i, y_i), \ldots, (x_N, y_N)\}$, where $x_i \in \chi = \mathbb{R}^n$ is the speech feature signal of the i-th recorded object or person, $y_i \in Y = \{1, 2, \ldots, N\}$ identifies the i-th person or object, and N is the number assigned to the N-th person or object. The labeled speech feature data constitute the pattern class database and are stored in the memory 33.
In the present embodiment, feature matching and classification unit 23 is configured as an intelligent multi-class classifier whose learning algorithm is an improved neural network classification algorithm. The enrolled and labelled speech feature signal set serves as training data; the network model learns from it to obtain classification rules, which completes the training of the classifier. The trained classifier is then used to classify and identify unknown test speech signals. Once features have been extracted from a test signal, the present invention performs feature matching automatically: the characteristic parameters of the test speech signal are matched in real time against the enrolled and labelled sample speech characteristic parameters in memory 33, the similarity between the test speech signal and every enrolled sample speech signal is computed, and the test speech signal is assigned to the pattern class of the sample with the highest similarity. Finally, the present invention outputs the recognition result as a report such as "This is the voice of XXX". For example, if the speech feature signal of Zhang San has been stored, then when Zhang San speaks or sings toward the device, the present invention automatically determines that the test speech characteristic parameters are most similar to the enrolled and labelled speech signal of Zhang San and outputs "This is the voice of Zhang San".
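The highest-similarity assignment described above can be sketched in Python. The text does not say which similarity measure is used, so cosine similarity is an assumption here, as are the function names and the three-element feature vectors. The pattern (mode) class database is modelled as a plain dictionary from label to stored feature vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify(test_vector, pattern_classes):
    """Return (label, similarity) of the stored sample most similar to test_vector.

    pattern_classes maps each label to a list of enrolled feature vectors.
    """
    best_label, best_score = None, float("-inf")
    for label, samples in pattern_classes.items():
        for sample in samples:
            score = cosine_similarity(test_vector, sample)
            if score > best_score:
                best_label, best_score = label, score
    return best_label, best_score


# Hypothetical enrolled data: two recordings for one speaker, one for another.
database = {
    "Zhang San": [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]],
    "Li Si": [[0.1, 0.9, 0.5]],
}
label, score = identify([0.85, 0.15, 0.35], database)
print(f"This is the voice of {label}")
```

Comparing against every stored sample, rather than a per-class average, keeps several recordings of the same person (speaking, singing) usable as separate references.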
In the present embodiment, the multi-class classifier uses a multi-layer artificial neural network structure. One end of the network is defined as the input layer, the other end as the output layer, and the part between them as the hidden layer. The input layer receives external input signals and passes them to all neurons of the hidden layer; the hidden layer performs its computation and passes the result to the output layer; the output layer receives the signal from the hidden layer and, after computation, outputs the classification result, which is the recognition result. The preferred number of hidden layers in the present invention is 1 to 200.
In the present embodiment, the training process of the improved artificial neural network classification algorithm is as follows:
Step 1: Network initialization. The algorithm database is updated continuously as voice signals are enrolled. When the voice signals of N objects have been enrolled, N pattern classes are formed and a sample space (X, Y) is obtained; the i-th sample is $(X_i, Y_i)$, where $X_i$ is the set of feature vectors extracted for the i-th object and $Y_i$ is the label of the i-th object. From the system input-output sequence (X, Y), determine the number of input layer nodes n, the number of hidden layer nodes l, and the number of output layer nodes m: n is determined by the number of feature values extracted from the input signal, m by the number of stored speech pattern classes, and a reference value for l is computed automatically by the model using a constant a whose value ranges from 0 to 10. Initialize the connection weights $\omega_{ij}$ between the input layer and hidden layer neurons and $\omega_{jk}$ between the hidden layer and output layer neurons, initialize the hidden layer threshold a and the output layer threshold b, and set the learning rate $\eta$ and the neuron excitation function.
Step 2: Compute the hidden layer output. From the input vector X, the connection weights $\omega_{ij}$ between the input layer and hidden layer neurons, and the hidden layer threshold a, compute the hidden layer output H. The output of the j-th hidden layer node is denoted $H_j$, $j = 1, 2, \ldots, l$, where l is the number of hidden layer nodes and f is the hidden layer excitation function. Many excitation functions are available; the present invention preferably uses $f(x) = (1 + e^{-x})^{-1}$.
Step 3: Compute the output layer output. From the hidden layer output H, the connection weights $\omega_{jk}$ between the hidden layer and output layer neurons, and the output layer threshold b, compute the output layer output O. The output of the k-th output layer node is denoted $O_k$, $k = 1, 2, \ldots, m$, where m is the number of output layer nodes, $b_k$ is the threshold of the k-th output layer node, and $H_j$ is the output value of the j-th hidden layer node.
Step 4: Compute the prediction error. From the network's predicted output O and the desired output Y (the true value), compute the total network prediction error e, where $e_k$ is the error produced by the k-th output layer node.
Step 5: Update the weights. Update the network connection weights $\omega_{jk}$ and $\omega_{ij}$ according to the total network prediction error e: $\omega_{jk}^{+} = \omega_{jk} + \eta \cdot H_j \cdot E_k$, where $j = 1, 2, \ldots, l$, $k = 1, 2, \ldots, m$, $\eta$ is the learning rate, and $E_k$ is the sensitivity of the total network error to output layer node k. The update of $\omega_{ij}$ runs over $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, l$.
Step 6: Update the thresholds. Update the hidden layer threshold a and the output layer threshold b according to the total network prediction error e: the hidden layer threshold $a_j$ is updated for $j = 1, 2, \ldots, l$, and $b_k^{+} = b_k + \eta \cdot E_k$ for $k = 1, 2, \ldots, m$.
Step 7: Check whether the iteration has converged. If it has not, return to Step 2. The preferred criterion of the present invention is to terminate the iteration when the error falls to the minimum error of 0.001.
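The seven steps can be sketched in Python as follows. Where the text gives no explicit formula, the sketch assumes the standard single-hidden-layer backpropagation form, which is not necessarily the "improved" algorithm of the invention: thresholds a and b are treated as additive biases, the output layer is linear, E_k is taken as Y_k - O_k, the total error is half the sum of squared node errors, and the hidden-layer size rule round(sqrt(n + m)) + a is a commonly used heuristic. The two update rules that are stated in Steps 5 and 6 are used as given. The names `hidden_size`, `forward`, `train` and `predict` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Preferred excitation function f(x) = (1 + e^-x)^-1."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_size(n, m, a):
    """Assumed heuristic for the number of hidden nodes l, with a in 0..10."""
    return int(round(math.sqrt(n + m))) + a


def forward(model, x):
    """Steps 2 and 3: hidden output H and (assumed linear) output O."""
    w_ih, w_ho, a, b = model
    H = [sigmoid(sum(w_ih[i][j] * x[i] for i in range(len(x))) + a[j])
         for j in range(len(a))]
    O = [sum(w_ho[j][k] * H[j] for j in range(len(a))) + b[k]
         for k in range(len(b))]
    return H, O


def train(samples, n, l, m, eta=0.1, tol=0.001, max_epochs=5000, seed=0):
    """Train on (x, y) pairs; y is a one-hot list of length m."""
    rng = random.Random(seed)
    # Step 1: initialise weights, thresholds and learning rate.
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(l)] for _ in range(n)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(m)] for _ in range(l)]
    a, b = [0.0] * l, [0.0] * m
    model = (w_ih, w_ho, a, b)
    mean_error = float("inf")
    for _ in range(max_epochs):
        total = 0.0
        for x, y in samples:
            H, O = forward(model, x)
            # Step 4: per-node error and total error e = 0.5 * sum(e_k^2).
            E = [y[k] - O[k] for k in range(m)]
            total += 0.5 * sum(e * e for e in E)
            # Error propagated back to each hidden node, using the old weights.
            back = [H[j] * (1.0 - H[j]) * sum(w_ho[j][k] * E[k] for k in range(m))
                    for j in range(l)]
            # Step 5: weight updates.
            for j in range(l):
                for k in range(m):
                    w_ho[j][k] += eta * H[j] * E[k]
                for i in range(n):
                    w_ih[i][j] += eta * back[j] * x[i]
            # Step 6: threshold updates.
            for j in range(l):
                a[j] += eta * back[j]
            for k in range(m):
                b[k] += eta * E[k]
        # Step 7: stop once the mean error per sample falls below tol.
        mean_error = total / len(samples)
        if mean_error < tol:
            break
    return model, mean_error


def predict(model, x):
    """Index of the output node with the largest value, i.e. the predicted class."""
    _, O = forward(model, x)
    return max(range(len(O)), key=O.__getitem__)
```

The learning rate is kept small because, with a linear output layer, a per-sample step that is too large can make the output weights oscillate instead of converging.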
In the present embodiment, voice acquisition device 1 has a built-in voice capture card for collecting and processing the captured voice signals.
In the present embodiment, fixed phonographic recorder 13 uses a windproof microphone.
In the present embodiment, display screen 8 uses a touch screen or a backlit LED display.
In the present embodiment, multiple fixed phonographic recorders 13 may be provided, arranged around the housing of the present invention, to increase voice recording strength.
The present invention stores enrolled and labelled voice signals long-term. Any voice signal stored in the speech pattern class database of the present invention can be retrieved at any time and compared with unknown test speech for identification.
To use the present invention, first turn on power switch 5. The system then starts automatically, display screen 8 lights up and shows the operation interface, and the user can choose between two functions: "voice input mode" and "voice test mode".
(1) When voice input is selected, central processing unit 3 puts voice input unit 20 into "voice input mode", and display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice input mode is now active, please speak". The user can input voice through any one of microphone 11, wireless intercom 12 or fixed phonographic recorder 13 provided by voice acquisition device 1, or through a smartphone. To ensure that the present invention can accurately identify and quantify the speech features of the object to be recognized, voice input can be performed for only one person or one object at a time in "voice input mode". Because the sound signal data produced by the same person when speaking and when singing show some deviation in their features, the present invention adopts a multi-state voice input strategy to improve recognition accuracy: the enrolled voice may combine sounds in several states, such as normal speech, singing, and high, medium or low pitch. The recording length is 5 to 30 seconds, and display screen 8 can show the real-time voice waveform and a progress bar. An unsatisfactory recording can be deleted and recorded again. After a voice has been enrolled, the data must be labelled, and labelling is manual. For example, once Zhang San's voice has been collected, enter the note "Zhang San's voice" in the dialog box shown on display screen 8 and save. The enrolled voice is stored in memory 33 of the present invention.
(2) After the voice signal has been enrolled, the control system of the present invention automatically sends the labelled voice signal to voice preprocessing unit 21. Voice preprocessing unit 21 converts the voice signal collected by voice acquisition device 1 into an electrical signal, that is, converts the analog signal into a digital signal, and then performs conventional signal processing, including ambient background noise removal, signal framing, filtering, pre-emphasis, windowing, and endpoint detection.
(3) The control system of the present invention automatically sends the preprocessed voice signal to speech signal feature extraction unit 22, which extracts from it the characteristic parameters that reflect the essence of the speech and obtains the feature vector $x_i$. The preferred feature extraction method is the Mel-frequency cepstral coefficient (MFCC) method; the spectral envelope method, LPC interpolation, LPC root-finding, the Hilbert transform method and others can also be used to obtain acoustic features. The system automatically saves the extracted feature vectors in the pattern class library, with all sound features of one person corresponding to one pattern class. After the voices of N people have been enrolled, N pattern classes are obtained, each with n characteristic parameters, giving a database in which each person corresponds to one voice signal pattern class. All data are stored in memory 33 of the present invention. This completes the voice signal input mode.
(4) After voice input, voice testing can be carried out. To do so, simply select "voice test mode" in the operation interface of display screen 8. Central processing unit 3 then puts voice input unit 20 into "voice test mode", and display screen 8 and loudspeaker 35 simultaneously give a prompt such as "Voice test in progress...". The user need not perform any further operation: the present invention collects test speech through one or more of microphone 11, wireless intercom 12, fixed phonographic recorder 13 and the smartphone of voice acquisition device 1, used together. Test speech is collected in real time, with no limit on duration or on the number of speakers.
(5) For voice data collected in "voice test mode", the system device of the present invention automatically preprocesses the test speech signal and extracts its features: the collected test signal is converted into an electrical signal, conventional filtering, noise removal, windowing and endpoint detection are performed, and the signal features are then extracted.
(6) Once features have been extracted from the test signal, the present invention performs feature matching automatically: the characteristic parameters of the test speech signal are matched in real time against the enrolled and labelled sample speech characteristic parameters in memory 33, the similarity between the test speech signal and every enrolled speech signal is computed, and the test speech signal is assigned to the pattern class with the highest similarity. Finally, the present invention outputs a report such as "This is the voice of XXX". For example, if the speech feature signal of Zhang San has been stored, then when Zhang San speaks or sings toward the device, the present invention identifies the voice and automatically outputs "This is the voice of Zhang San".
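The two operating modes in steps (1) to (6) can be summarised in a small Python sketch. It is an illustration under stated assumptions rather than the device's control software: the class name `VoiceSession`, its method names, the use of squared Euclidean distance as the match criterion, and the two-element feature vectors are all hypothetical. Only the 5 to 30 second recording length, the manual label, and the form of the report come from the text.

```python
from dataclasses import dataclass, field

MIN_SECONDS, MAX_SECONDS = 5.0, 30.0  # recording length given in step (1)


@dataclass
class VoiceSession:
    # Pattern (mode) class database: label -> list of enrolled feature vectors.
    classes: dict = field(default_factory=dict)

    def enroll(self, label, features, duration_s):
        """Voice input mode: one person or object per call, manual label required."""
        if not label:
            raise ValueError("an enrolled voice must be labelled")
        if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
            raise ValueError("recording must last 5 to 30 seconds")
        # Several recordings (speaking, singing, ...) may share one label.
        self.classes.setdefault(label, []).append(list(features))

    def test(self, features):
        """Voice test mode: report the label of the nearest stored vector."""
        best_label, best_dist = None, float("inf")
        for label, vectors in self.classes.items():
            for vector in vectors:
                dist = sum((p - q) ** 2 for p, q in zip(features, vector))
                if dist < best_dist:
                    best_label, best_dist = label, dist
        if best_label is None:
            return "No enrolled voice to compare with"
        return f"This is the voice of {best_label}"
```

A session would call `enroll` once per recording in input mode and `test` on each feature vector extracted in test mode; preprocessing and feature extraction are assumed to happen before either call.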
When the present invention is tested in a public place, several objects may speak at the same time in the test environment, so that the collected voice signal is a wideband aliased signal. To prevent errors when extracting features from such a signal, the present invention uses an intelligent algorithm: it first matches, identifies, marks and stores the speech characteristic parameters obtained while a single person is speaking; the system then automatically screens and separates the voice signal recorded while several people speak together; finally it outputs the recognition result with a prompt such as "This is now the combined voice of Zhang San, Li Si, Wang Wu ..." and indicates that XX voices could not be identified. To shut the system down, press power-off key 6.
In the present embodiment, the system device can also output to the user an inventory of recognition results for a multi-person conversation environment, including the number of people or objects speaking on site in the test environment. It can also screen a recording of several people speaking simultaneously, identify and separate out what each person said for playback, and filter out other people's voices and ambient sound.
When a test speech signal contains speech signal features for which no sample is stored in the present invention, the present invention automatically records the unknown speech signal features and reminds the user to decide whether to label and store the voice signal of that object.
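The text does not state how an unstored voice is detected. One simple possibility, shown here purely as an assumption, is to reject a match whose best similarity falls below a threshold and to queue the unmatched feature vector for later labelling. The function name, the threshold value 0.8 and the `pending` list are hypothetical.

```python
def match_or_record_unknown(test_vector, scores, pending, threshold=0.8):
    """Return the best label, or None after queuing test_vector as unknown.

    scores maps each enrolled label to its similarity with test_vector.
    pending collects unknown feature vectors for the user to label later.
    """
    if scores:
        label = max(scores, key=scores.get)
        if scores[label] >= threshold:
            return label
    pending.append(list(test_vector))
    return None
```

A caller would inspect `pending` after a test session and prompt the user to label and store each queued vector, or to discard it.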
In the technical field of intelligent sound signal mode automatic recognition system devices, any technical content of the following form falls within the scope of protection of the present invention: a device comprising voice acquisition device 1, speech recognition device 2, central processing unit 3, wireless signal transceiver 4, display screen 8, power supply 9, memory 33, network module 31, RAM card 32 and loudspeaker 35, in which voice acquisition device 1 includes microphone 11, wireless intercom 12 and fixed phonographic recorder 13; speech recognition device 2 includes voice input unit 20, voice preprocessing unit 21, speech signal feature extraction unit 22 and feature matching and classification unit 23; framework 10 is provided with an inner cavity; wireless signal transceiver 4 is arranged at the middle of the upper end of framework 10; card slot 14 is arranged to the left of wireless signal transceiver 4, and wireless intercom 12 is embedded in card slot 14; voice acquisition device 1 is arranged to the left of card slot 14; display screen 8 is arranged directly below wireless signal transceiver 4; speech recognition device 2 is arranged at the lower left of display screen 8, and inside it, from top to bottom, are voice input unit 20, voice preprocessing unit 21, speech signal feature extraction unit 22 and feature matching and classification unit 23; power supply 9 is arranged at the bottom to the right of speech recognition device 2; memory 33 is arranged on the left side directly above power supply 9; central processing unit 3 is arranged directly above memory 33; loudspeaker 35 is arranged to the right of central processing unit 3; network module 31 is arranged to the right of memory 33; RAM card 32 is arranged directly below network module 31; and all electronic components are connected by conducting wire 7 to form a circuit.
It should be noted that the scope of protection of the present invention is not limited by external appearance. Framework 10 of the present invention may be shaped as a cuboid, a cylinder, a polygonal prism, or other forms resembling, for example, a Chinese cabbage, a watermelon or a stone; any design that differs in shape but whose substantive technical content is the same as that of the present invention also falls within the scope of protection of the present invention. Likewise, conventional, obvious minor improvements or minor combinations made by a person skilled in the art on the basis of the present disclosure also fall within the scope of protection of the present invention, provided their technical content is covered by what is recorded herein.

Claims (5)

1. An intelligent sound signal mode automatic recognition system device, characterized in that it comprises voice acquisition device (1), speech recognition device (2), central processing unit (3), wireless signal transceiver (4), display screen (8), power supply (9), memory (33), network module (31), RAM card (32) and loudspeaker (35); the voice acquisition device (1) includes microphone (11), wireless intercom (12) and fixed phonographic recorder (13); the speech recognition device (2) includes voice input unit (20), voice preprocessing unit (21), speech signal feature extraction unit (22) and feature matching and classification unit (23); framework (10) is provided with an inner cavity; wireless signal transceiver (4) is arranged at the middle of the upper end of framework (10); card slot (14) is arranged to the left of wireless signal transceiver (4), and wireless intercom (12) is embedded in card slot (14); voice acquisition device (1) is arranged to the left of card slot (14); display screen (8) is arranged directly below wireless signal transceiver (4); speech recognition device (2) is arranged at the lower left of display screen (8), and inside speech recognition device (2), from top to bottom, are voice input unit (20), voice preprocessing unit (21), speech signal feature extraction unit (22) and feature matching and classification unit (23); power supply (9) is arranged at the bottom to the right of speech recognition device (2); memory (33) is arranged on the left side directly above power supply (9); central processing unit (3) is arranged directly above memory (33); loudspeaker (35) is arranged to the right of central processing unit (3); network module (31) is arranged to the right of memory (33); RAM card (32) is arranged directly below network module (31); and all electronic components are connected by conducting wire (7) to form a circuit.
2. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that the overall dimensions of wireless intercom (12) are 1 to 3 mm smaller than the overall dimensions of card slot (14).
3. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that display screen (8) uses a touch screen or a backlit LED display.
4. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that 2 to 10 fixed phonographic recorders (13) are provided, arranged around the housing of the present invention, to increase voice recording strength.
5. The intelligent sound signal mode automatic recognition system device according to claim 1, characterized in that voice acquisition device (1) can also accept voice signal input from a smartphone; the smartphone is paired with voice acquisition device (1) via Bluetooth, infrared, Wi-Fi, or by scanning a QR code, thereby realizing voice input.
CN201810561739.8A 2018-06-04 2018-06-04 Intelligent sound signal mode automatic recognition system device Withdrawn CN110299135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810561739.8A CN110299135A (en) 2018-06-04 2018-06-04 Intelligent sound signal mode automatic recognition system device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810561739.8A CN110299135A (en) 2018-06-04 2018-06-04 Intelligent sound signal mode automatic recognition system device

Publications (1)

Publication Number Publication Date
CN110299135A true CN110299135A (en) 2019-10-01

Family

ID=68026322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810561739.8A Withdrawn CN110299135A (en) 2018-06-04 2018-06-04 Intelligent sound signal mode automatic recognition system device

Country Status (1)

Country Link
CN (1) CN110299135A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176607A (en) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 Voice interaction system and method based on power business
CN113572492A (en) * 2021-06-23 2021-10-29 力声通信股份有限公司 Novel communication equipment prevents falling digital intercom
CN113726705A (en) * 2021-11-03 2021-11-30 天津七一二移动通信有限公司 PDT interphone with integrated AIS coding and decoding capability
CN115662423A (en) * 2022-10-19 2023-01-31 博泰车联网(南京)有限公司 Voice control method, device, equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111176607A (en) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 Voice interaction system and method based on power business
CN113572492A (en) * 2021-06-23 2021-10-29 力声通信股份有限公司 Novel communication equipment prevents falling digital intercom
CN113726705A (en) * 2021-11-03 2021-11-30 天津七一二移动通信有限公司 PDT interphone with integrated AIS coding and decoding capability
CN113726705B (en) * 2021-11-03 2022-01-07 天津七一二移动通信有限公司 PDT interphone with integrated AIS coding and decoding capability
CN115662423A (en) * 2022-10-19 2023-01-31 博泰车联网(南京)有限公司 Voice control method, device, equipment and storage medium
CN115662423B (en) * 2022-10-19 2023-11-03 博泰车联网(南京)有限公司 Voice control method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107808659A (en) Intelligent sound signal type recognition system device
CN110838286B (en) Model training method, language identification method, device and equipment
CN108701453B (en) Modular deep learning model
CN110299135A (en) Intelligent sound signal mode automatic recognition system device
CN105940407B (en) System and method for assessing the intensity of audio password
CN101261832B (en) Extraction and modeling method for Chinese speech sensibility information
US9454958B2 (en) Exploiting heterogeneous data in deep neural network-based speech recognition systems
CN107680582A (en) Acoustic training model method, audio recognition method, device, equipment and medium
CN107767869A (en) Method and apparatus for providing voice service
CN110136690A (en) Phoneme synthesizing method, device and computer readable storage medium
CN107221320A (en) Train method, device, equipment and the computer-readable storage medium of acoustic feature extraction model
CN111418009A (en) Personalized speaker verification system and method
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN110853618A (en) Language identification method, model training method, device and equipment
CN110853617B (en) Model training method, language identification method, device and equipment
CN108281137A (en) A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN112259106A (en) Voiceprint recognition method and device, storage medium and computer equipment
CN108269133A (en) A kind of combination human bioequivalence and the intelligent advertisement push method and terminal of speech recognition
CN108364662B (en) Voice emotion recognition method and system based on paired identification tasks
CN106295717B (en) A kind of western musical instrument classification method based on rarefaction representation and machine learning
CN113066499B (en) Method and device for identifying identity of land-air conversation speaker
CN110161480A (en) Radar target identification method based on semi-supervised depth probabilistic model
CN110299132A (en) A kind of speech digit recognition methods and device
Sun et al. A novel convolutional neural network voiceprint recognition method based on improved pooling method and dropout idea
CN102141812A (en) Robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191001
