CN108711429A - Electronic equipment and apparatus control method - Google Patents


Info

Publication number
CN108711429A
CN108711429A (application CN201810589643.2A); granted publication CN108711429B
Authority
CN
China
Prior art keywords
audio signal
chip
dedicated
voiceprint
electronic equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810589643.2A
Other languages
Chinese (zh)
Other versions
CN108711429B (English)
Inventor
陈岩 (Chen Yan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201810589643.2A
Publication of CN108711429A
Priority to PCT/CN2019/085554 (WO2019233228A1)
Application granted
Publication of CN108711429B
Legal status: Active


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification
    • G10L17/06 — Decision making techniques; Pattern matching strategies
    • G10L17/08 — Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 — Speaker identification or verification
    • G10L17/22 — Interactive procedures; Man-machine interfaces
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/16 — Speech classification or search using artificial neural networks
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Abstract

Embodiments of the present application disclose an electronic device and a device control method. The electronic device includes a central processing unit (CPU) and an application-specific integrated circuit (ASIC) chip whose power consumption is lower than that of the CPU. The ASIC chip first acquires an external audio signal, performs a recognition operation on it to obtain a recognition result, and sends indication information to the CPU indicating that the recognition operation is complete. The CPU then, according to the indication information, retrieves the recognition result from the ASIC chip and executes the target operation corresponding to that result. The audio recognition task is thus offloaded from the CPU to the lower-power ASIC chip, while the CPU executes the corresponding target operation based on the chip's recognition result. By having the ASIC chip and the CPU cooperate in this way to carry out voice control of the electronic device, the power consumption of realizing voice control can be reduced.

Description

Electronic equipment and apparatus control method
Technical field
This application relates to the technical field of electronic devices, and in particular to an electronic device and a device control method.
Background technology
At present, speech recognition technology is increasingly widely applied in electronic devices. Using speech recognition, voice control of an electronic device can be realized; for example, a user can speak a specific voice instruction to make the device take a photo, play music, and so on. In the related art, however, voice control of an electronic device must be completed by the device's processor, which results in relatively high power consumption.
Invention content
Embodiments of the present application provide an electronic device and a device control method that can reduce the power consumption of realizing voice control on an electronic device.
In a first aspect, an embodiment of the present application provides an electronic device. The electronic device includes a central processing unit and an application-specific integrated circuit (ASIC) chip, the power consumption of the ASIC chip being lower than that of the central processing unit, wherein:
the ASIC chip is configured to acquire an external audio signal;
the ASIC chip is further configured to perform a recognition operation on the audio signal to obtain a recognition result;
the ASIC chip is further configured to send indication information indicating that the recognition operation is complete to the central processing unit; and
the central processing unit is configured to retrieve, according to the indication information, the recognition result from the ASIC chip and to execute a target operation corresponding to the recognition result.
In a second aspect, an embodiment of the present application provides a device control method applied to an electronic device, the electronic device including a central processing unit and an ASIC chip whose power consumption is lower than that of the central processing unit. The device control method includes:
the ASIC chip acquires an external audio signal;
the ASIC chip performs a recognition operation on the audio signal to obtain a recognition result;
the ASIC chip sends indication information that recognition is complete to the central processing unit; and
the central processing unit retrieves, according to the indication information, the recognition result from the ASIC chip and executes a target operation corresponding to the recognition result.
The electronic device of the embodiments of the present application includes a central processing unit and an ASIC chip. The lower-power ASIC chip first acquires an external audio signal, performs a recognition operation on the acquired signal to obtain a recognition result, and sends indication information that the recognition operation is complete to the central processing unit; the central processing unit then retrieves, according to the indication information, the recognition result from the ASIC chip and executes the target operation corresponding to that result. The audio recognition task of the central processing unit is thereby offloaded to the lower-power ASIC chip, and the central processing unit executes the corresponding target operation according to the ASIC chip's recognition result. Through this cooperation between the ASIC chip and the central processing unit in voice-controlling the electronic device, the power consumption of realizing voice control can be reduced.
Description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a first schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 2 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 3 is a third schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a fourth schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 5 is a schematic flowchart of a device control method according to an embodiment of the present application.
Fig. 6 is a detailed flowchart of the ASIC chip performing recognition on an audio signal in an embodiment of the present application.
Fig. 7 is a detailed flowchart of the central processing unit executing a target operation in an embodiment of the present application.
Detailed description of the embodiments
It should be understood that reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
An embodiment of the present application provides an electronic device. Referring to Fig. 1, the electronic device 100 includes an ASIC chip 101 and a central processing unit 102, the power consumption of the ASIC chip 101 being lower than that of the central processing unit 102, wherein:
the ASIC chip 101 is configured to acquire an external audio signal, perform a recognition operation on the acquired audio signal to obtain a recognition result, and send indication information indicating that the recognition operation is complete to the central processing unit 102.
It should be noted that the ASIC chip 101 in this embodiment is an application-specific integrated circuit designed for the purpose of audio recognition; compared with the general-purpose central processing unit 102, it has higher audio recognition efficiency and lower power consumption. The ASIC chip 101 establishes a data communication connection with the central processing unit 102 through a communication bus.
The ASIC chip 101 can acquire the external audio signal in a number of different ways. For example, when no external microphone is connected to the electronic device, the ASIC chip 101 can collect the sound uttered by an external speaker through the microphone built into the electronic device (not shown in Fig. 1) to obtain the external audio signal. As another example, when a microphone is externally connected to the electronic device, the ASIC chip 101 can collect external sound through that external microphone to obtain the external audio signal.
When the ASIC chip 101 collects the external audio signal through a microphone, and the microphone is an analog microphone, it will collect an analog audio signal; the ASIC chip 101 then needs to sample the analog signal to convert it into a digitized audio signal, for example at a sampling frequency of 16 kHz. If the microphone is a digital microphone, the ASIC chip 101 will collect a digitized audio signal directly, with no conversion required.
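The 16 kHz sampling mentioned above can be illustrated with a minimal sketch. This is not the patent's implementation — it simply discretizes a continuous tone at that rate, so that 10 ms of audio yields 160 samples:

```python
import math

def sample_tone(freq_hz, duration_s, sample_rate=16000):
    """Discretize a pure tone at the 16 kHz rate used in the text's example."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

samples = sample_tone(440.0, 0.01)  # 10 ms of a 440 Hz tone -> 160 samples
```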
After acquiring the external audio signal, the ASIC chip 101 performs a recognition operation on it according to a preconfigured recognition mode to obtain a recognition result.
For example, when the recognition mode of the ASIC chip 101 is configured for gender recognition, the chip, when recognizing the acquired audio signal, extracts feature information capable of characterizing gender from the audio signal and, based on the extracted features, identifies the gender of the speaker, obtaining a recognition result that the speaker is male or female.
As another example, when the recognition mode of the ASIC chip 101 is configured for environment-type recognition (subway-carriage scene, bus-carriage scene, office scene, and so on), the chip, when recognizing the acquired audio signal, extracts feature information capable of characterizing the environment scene and, based on it, identifies the current environment scene, obtaining a recognition result describing the current environment scene type.
After completing the recognition operation on the audio signal and obtaining the recognition result, the ASIC chip 101 sends indication information indicating that the recognition operation is complete to the central processing unit 102. The effect of this indication information is to inform the central processing unit 102 that the ASIC chip 101 has completed the recognition operation on the audio signal, so that the recognition result can be retrieved from the ASIC chip 101. The indication information may be sent in the form of an interrupt signal.
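The interrupt-style handshake described above can be sketched as follows. This is a toy model only — the class and result names are hypothetical, an event flag stands in for the hardware interrupt line, and the "recognition" is a dummy rule rather than a real model:

```python
import threading

class AsicChip:
    """Toy stand-in for the low-power ASIC chip (names are hypothetical)."""
    def __init__(self):
        self.recognition_result = None
        self.done = threading.Event()   # plays the role of the interrupt line

    def recognize(self, audio):
        # Placeholder "recognition operation": a dummy parity rule, not a real model.
        self.recognition_result = "male" if sum(audio) % 2 == 0 else "female"
        self.done.set()                 # indication information: recognition complete

def cpu_wait_and_fetch(chip):
    chip.done.wait()                    # the CPU idles until the indication arrives
    return chip.recognition_result      # then retrieves the result from the chip

chip = AsicChip()
worker = threading.Thread(target=chip.recognize, args=([1, 2, 3],))
worker.start()
result = cpu_wait_and_fetch(chip)
worker.join()
```

The point of the design is that the CPU does no polling or signal processing; it sleeps until the chip raises the completion flag.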
The central processing unit 102 is configured to retrieve, according to the received indication information, the foregoing recognition result from the ASIC chip 101 and to execute a target operation corresponding to that recognition result.
Accordingly, after receiving the indication information from the ASIC chip 101, the central processing unit 102 retrieves from the ASIC chip 101, according to that indication information, the recognition result the chip obtained by recognizing the audio signal.
After retrieving the recognition result of the audio signal, the central processing unit 102 further executes the target operation corresponding to that recognition result.
For example, when the ASIC chip 101 is configured for gender recognition, if the recognition result "the speaker is male" is retrieved, the theme mode of the operating system is switched to a masculine theme; if the recognition result "the speaker is female" is retrieved, the theme mode of the operating system is switched to a feminine theme.
As another example, when the ASIC chip 101 is configured for environment-type recognition, if the recognition result "office scene" is retrieved, the alert mode of the operating system is switched to silent mode; if the recognition result "bus-carriage scene" is retrieved, the alert mode of the operating system is switched to a vibrate-plus-ring mode, and so on.
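The result-to-operation mapping in the examples above amounts to a lookup table. A minimal sketch, with hypothetical result strings and operation names (the patent does not prescribe any particular encoding):

```python
# Hypothetical dispatch table from recognition results to target operations,
# mirroring the theme-switch and alert-mode examples in the text.
TARGET_OPERATIONS = {
    "speaker_is_male":   "switch theme: masculine",
    "speaker_is_female": "switch theme: feminine",
    "office_scene":      "alert mode: silent",
    "bus_scene":         "alert mode: vibrate+ring",
}

def execute_target_operation(recognition_result):
    # The CPU looks up the operation for the retrieved result; unknown results are ignored.
    return TARGET_OPERATIONS.get(recognition_result, "no-op")

action = execute_target_operation("office_scene")
```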
As can be seen from the foregoing, the electronic device of this embodiment includes a central processing unit 102 and an ASIC chip 101. The lower-power ASIC chip 101 first acquires an external audio signal, performs a recognition operation on the acquired signal to obtain a recognition result, and sends indication information that the recognition operation is complete to the central processing unit 102; the central processing unit 102 then retrieves, according to the indication information, the recognition result from the ASIC chip 101 and executes the target operation corresponding to that result. The audio recognition task of the central processing unit 102 is thereby offloaded to the lower-power ASIC chip 101, and the central processing unit 102 executes the corresponding target operation according to the recognition result of the ASIC chip 101. Through this cooperation between the ASIC chip and the central processing unit 102 in voice-controlling the electronic device, the power consumption of realizing voice control can be reduced.
In an embodiment, referring to Fig. 2, the ASIC chip 101 includes a micro-control unit 1011, a preprocessing unit 1012, and an algorithm unit 1013, wherein:
the preprocessing unit 1012 is configured to extract, under the control of the micro-control unit 1011, the mel-frequency cepstral coefficients (MFCCs) of the audio signal using an MFCC algorithm; and
the algorithm unit 1013 is configured to perform, under the control of the micro-control unit 1011, keyword recognition on the mel-frequency cepstral coefficients using a deep neural network algorithm, obtaining candidate keywords and the confidence of each candidate keyword.
The micro-control unit 1011 first acquires the external audio signal through a microphone. For example, when no external microphone is connected to the electronic device, the micro-control unit 1011 can collect external sound through the microphone built into the electronic device (not shown in Fig. 2) to obtain the external audio signal; as another example, when a microphone is externally connected, the micro-control unit 1011 can collect external sound through that external microphone to obtain the external audio signal.
When collecting the external audio signal through a microphone, if the microphone is an analog microphone, the micro-control unit 1011 will collect an analog audio signal and needs to sample it to convert it into a digitized audio signal, for example at a sampling frequency of 16 kHz; if the microphone is a digital microphone, the micro-control unit 1011 will collect a digitized audio signal directly, with no conversion required.
After acquiring the external audio signal, the micro-control unit 1011 generates first control information and sends it to the preprocessing unit 1012.
After receiving the first control information from the micro-control unit 1011, the preprocessing unit 1012 extracts, according to that control information, the mel-frequency cepstral coefficients of the audio signal using the MFCC algorithm. After extracting the coefficients, the preprocessing unit 1012 sends first feedback information to the micro-control unit 1011.
After receiving the first feedback information from the preprocessing unit 1012, the micro-control unit 1011 determines that the preprocessing unit 1012 has extracted the mel-frequency cepstral coefficients of the audio signal; it then generates second control information and sends it to the algorithm unit 1013.
After receiving the second control information from the micro-control unit 1011, the algorithm unit 1013 uses its built-in deep neural network algorithm to perform keyword recognition on the aforementioned mel-frequency cepstral coefficients (keyword recognition means detecting whether predefined words occur in the speech corresponding to the audio signal), obtaining candidate keywords and the confidence of each candidate keyword. After completing keyword recognition and obtaining the candidate keywords and their confidences, the algorithm unit 1013 sends second feedback information to the micro-control unit 1011.
After receiving the second feedback information from the algorithm unit 1013, the micro-control unit 1011 determines that the algorithm unit 1013 has completed keyword recognition, and takes the candidate keywords and confidences obtained by the algorithm unit 1013 as the recognition result of the recognition operation on the audio signal.
In an embodiment, referring to Fig. 3, the ASIC chip 101 further includes a memory 1014 for storing the acquired audio signal, the recognized candidate keywords and confidences, and the intermediate data generated by the preprocessing unit 1012 and the algorithm unit 1013 during execution.
For example, the micro-control unit 1011 stores the audio signal acquired through the microphone in the memory 1014; the preprocessing unit 1012, under the control of the micro-control unit 1011, extracts the mel-frequency cepstral coefficients of the audio signal stored in the memory 1014 using the MFCC algorithm and stores the extracted coefficients in the memory 1014; and the algorithm unit 1013, under the control of the micro-control unit 1011, performs keyword recognition on the mel-frequency cepstral coefficients stored in the memory 1014 using its built-in deep neural network algorithm, obtaining candidate keywords and confidences, and stores them in the memory 1014.
In an embodiment, referring to Fig. 4, the ASIC chip 101 further includes a cache memory 1015 for caching data to be stored into, or fetched from, the memory 1014.
Compared with the memory 1014, the cache memory 1015 has a smaller storage space but a higher speed; through the cache memory 1015, the processing efficiency of the preprocessing unit 1012 and the algorithm unit 1013 can be improved.
For example, when the preprocessing unit 1012 extracts mel-frequency cepstral coefficients from the audio signal, accessing data directly from the memory 1014 forces it to wait for a certain period. The cache memory 1015 can hold a portion of the data that the preprocessing unit 1012 has just used or uses repeatedly; if the preprocessing unit 1012 needs that data again, it can fetch it directly from the cache memory 1015. Avoiding repeated memory accesses in this way reduces the waiting time of the preprocessing unit 1012 and thereby improves its processing efficiency.
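The "keep recently used data close" behavior described above is essentially a least-recently-used (LRU) cache. A minimal sketch under that assumption — the patent does not specify the chip's actual replacement policy, so the LRU policy and the capacity here are illustrative:

```python
from collections import OrderedDict

class TinyCache:
    """Toy LRU cache standing in for the cache memory 1015 (policy assumed, not specified)."""
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)       # recently used -> keep it resident
            return self.data[key]
        return None                          # cache miss: would fall back to memory 1014

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the least recently used entry

cache = TinyCache(capacity=2)
cache.put("frame0", [0.1])
cache.put("frame1", [0.2])
cache.get("frame0")                          # touch frame0 so it is "recently used"
cache.put("frame2", [0.3])                   # capacity exceeded -> frame1 is evicted
```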
In an embodiment, before extracting the mel-frequency cepstral coefficients of the audio signal with the MFCC algorithm, the preprocessing unit 1012 also preprocesses the audio signal; after completing the preprocessing, it extracts the coefficients with the MFCC algorithm.
Specifically, after receiving the first control information from the micro-control unit 1011, the preprocessing unit 1012 first performs preprocessing such as pre-emphasis and windowing on the audio signal.
Pre-emphasis means boosting the energy of the high-frequency part of the audio signal. In the spectrum of an audio signal, the energy of the low-frequency part is usually higher than that of the high-frequency part: for every tenfold increase in frequency, the spectral energy falls by roughly 20 dB. Moreover, background noise in the microphone circuit during acquisition further raises the low-frequency energy. To give the high-frequency and low-frequency parts comparable amplitudes, the high-frequency energy of the collected audio signal needs to be pre-emphasized.
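Pre-emphasis is conventionally implemented as a first-order high-pass filter. A minimal sketch (the coefficient 0.97 is a common textbook choice, not a value given in this document):

```python
def preemphasis(x, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].
    Boosts high frequencies to offset the low-frequency-heavy spectrum described above."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

y = preemphasis([1.0, 1.0, 1.0, 1.0])  # a constant (pure-DC) signal is almost cancelled
```

Note how a constant signal — pure low frequency — is attenuated to roughly 0.03 per sample, while rapid sample-to-sample changes would pass through largely intact.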
Because an audio signal is generally a non-stationary signal, its statistical properties are not fixed; over a sufficiently short interval, however, the signal can be regarded as stationary — this is the purpose of windowing. A window is described by three parameters: window length (in milliseconds), offset, and shape. Each windowed segment of the audio signal is called a frame; the duration of each frame in milliseconds is the frame length, and the distance between the left boundaries of two adjacent frames is the frame shift. In this embodiment, a Hamming window, whose edges taper smoothly toward zero, may be used for the windowing process.
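The framing-plus-Hamming-window step above can be sketched in a few lines. The frame length and shift here are small illustrative values in samples (real systems typically use e.g. 25 ms frames with a 10 ms shift):

```python
import math

def hamming(N):
    # w[n] = 0.54 - 0.46 * cos(2*pi*n / (N-1)); edges taper smoothly toward zero
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_signal(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames and apply the Hamming window to each."""
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        chunk = signal[start:start + frame_len]
        frames.append([s * wi for s, wi in zip(chunk, w)])
    return frames

frames = frame_signal([1.0] * 10, frame_len=4, frame_shift=2)  # 4 overlapping frames
```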
After preprocessing the audio signal, the preprocessing unit 1012 can extract its mel-frequency cepstral coefficients with the MFCC algorithm. In essence, the extraction process is: exploiting the nonlinear characteristics of human hearing, the spectrum of the audio signal is converted to a nonlinear spectrum based on the mel frequency scale and then transformed to the cepstral domain, thereby obtaining the mel-frequency cepstral coefficients.
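The "nonlinear spectrum based on the mel frequency" rests on the standard mel-scale warping, which compresses high frequencies the way human hearing does. A minimal sketch of the conversion (this is the conventional 2595·log10 formula, not one stated in this document):

```python
import math

def hz_to_mel(f):
    """Standard mel-scale warping used when building an MFCC filterbank."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse warping, used to place filterbank edges back on the Hz axis."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

m = hz_to_mel(1000.0)  # by construction, 1000 Hz maps to roughly 1000 mel
```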
In an embodiment, before preprocessing the audio signal, the preprocessing unit 1012 is further configured to extract the voiceprint features of the audio signal, determine whether those voiceprint features match preset voiceprint features, and preprocess the audio signal only when they match.
It should be noted that, in everyday life, each person's voice has its own characteristics, by which acquaintances can recognize one another by ear alone. These characteristics are the voiceprint features, which are mainly determined by two factors. The first is the dimensions of the vocal cavities — including the throat, nasal cavity, and oral cavity — whose shape, size, and position determine the tension of the vocal cords and the range of voice frequencies. Thus, even when different people say the same words, the frequency distributions of their voices differ: some voices sound deep, others resonant.
The second factor determining voiceprint features is the manner in which the vocal organs are manipulated. The vocal organs include the lips, teeth, tongue, soft palate, and palatal muscles, whose interaction produces intelligible speech. The way they cooperate is learned incidentally through daily interaction with the people around oneself. In the course of learning to speak, a person gradually forms his or her own voiceprint features by imitating the speech of the various people nearby.
Specifically, after receiving the first control information from the micro-control unit 1011, the preprocessing unit 1012 first extracts the voiceprint features of the audio signal.
After obtaining the voiceprint features of the voice information, the preprocessing unit 1012 further compares them with the preset voiceprint features to determine whether they match. The preset voiceprint features may be voiceprint features enrolled in advance by the device owner; determining whether the voiceprint features of the acquired audio signal match the preset ones therefore amounts to determining whether the speaker of the audio signal is the owner.
When the extracted voiceprint features match the preset voiceprint features, the preprocessing unit 1012 determines that the speaker of the audio signal is the owner, and proceeds to preprocess the audio signal and extract its mel-frequency cepstral coefficients; reference may be made to the related description above, and details are not repeated here.
In an embodiment, the preprocessing unit 1012 is further configured to obtain the similarity between the aforementioned voiceprint features and the preset voiceprint features, determine whether the obtained similarity is greater than or equal to a first preset similarity, and, when it is, determine that the extracted voiceprint features match the preset voiceprint features.
Specifically, when judging whether the extracted voiceprint features match the preset ones, the preprocessing unit 1012 can obtain the similarity between the voiceprint features obtained from the audio signal and the preset voiceprint features, and determine whether that similarity is greater than or equal to the first preset similarity (configured according to actual needs; for example, it may be set to 95%). If the obtained similarity is greater than or equal to the first preset similarity, the extracted voiceprint features are determined to match the preset voiceprint features; if it is less than the first preset similarity, they are determined not to match.
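The threshold test above can be sketched with any similarity measure between feature vectors; cosine similarity is used here purely as an illustrative stand-in, since the document does not name the measure:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (an assumed measure, not the patent's)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def voiceprint_matches(probe, enrolled, threshold=0.95):
    """Match when similarity >= the first preset similarity (95% in the text's example)."""
    return cosine_similarity(probe, enrolled) >= threshold

ok = voiceprint_matches([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # identical vectors match
```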
In addition, when the extracted voiceprint features do not match the preset voiceprint features, the preprocessing unit 1012 determines that the speaker of the current audio signal is not the owner and sends third feedback information to the micro-control unit 1011.
After receiving the third feedback information from the preprocessing unit 1012, the micro-control unit 1011 deletes the acquired audio signal and continues to acquire external audio signals; only when an audio signal from the owner is acquired does it proceed with the preprocessing and the extraction of mel-frequency cepstral coefficients. For how the preprocessing and MFCC extraction are carried out, reference may be made to the related description of the foregoing embodiments; details are not repeated here.
In this way, by authenticating the speaker on the basis of voiceprint features, only the audio signals uttered by the owner are responded to, operations contrary to the owner's wishes can be avoided, and the owner's experience of use can be improved.
In an embodiment, the preprocessing unit 1012 is further configured to, when the obtained similarity is less than the first preset similarity but greater than or equal to a second preset similarity, obtain current location information, determine from that location information whether the device is currently within a preset location range, and, when it is, determine that the aforementioned voiceprint features match the preset voiceprint features.
It should be noted that, because voiceprint features are closely related to human physiology, in daily life a user who catches a cold or has an inflamed throat will sound hoarse, and the voiceprint features will change accordingly. In this case, even if the acquired audio signal was spoken by the owner, the preprocessing unit 1012 will fail to recognize it. There are also various other situations in which the preprocessing unit 1012 may fail to recognize the owner, which are not enumerated here.
To handle such cases in which the owner cannot be recognized, after completing the judgment of the voiceprint similarity, if the similarity between the extracted voiceprint features and the preset ones is less than the first preset similarity, the preprocessing unit 1012 further determines whether it is greater than or equal to a second preset similarity (configured to be less than the first preset similarity; the specific values may be chosen by those skilled in the art according to actual needs — for example, when the first preset similarity is set to 95%, the second preset similarity may be set to 75%).
If the result of this judgment is yes — that is, the similarity between the extracted and preset voiceprint features is less than the first preset similarity but greater than or equal to the second — the preprocessing unit 1012 further obtains the current location information. For example, the preprocessing unit 1012 can send a location acquisition request to the positioning module of the electronic device (which may obtain the current location information by satellite positioning, base-station positioning, or another positioning technology), instructing the positioning module to return the current location information.
After obtaining the current location information, the preprocessing unit 1012 determines from it whether the device is currently within a preset location range. The preset location range may be configured as the locations the owner commonly occupies, such as home and the workplace.
When the device is currently within the preset location range, the preprocessing unit 1012 determines that the extracted voiceprint features match the preset voiceprint features and identifies the speaker of the audio signal as the owner.
In one embodiment, the central processing unit 102 is further configured to, when the confidence of the candidate keyword reaches a preset confidence, take the candidate keyword as the target keyword of the audio signal, determine, according to a preset correspondence between keywords and preset operations, the preset operation corresponding to the target keyword as the target operation, and execute the target operation.
Specifically, after the central processing unit 102, according to the indication information from the dedicated integrated circuit chip 101, extracts from the chip the recognized candidate keyword and its confidence, it first judges whether the confidence of the candidate keyword reaches a preset confidence (the value can be chosen by those skilled in the art according to actual needs, for example 90%).
Having completed the judgment on the confidence, when the confidence of the candidate keyword reaches the preset confidence, the central processing unit 102 takes the candidate keyword as the target keyword of the audio signal.
Afterwards, according to the preset correspondence between keywords and preset operations, the central processing unit 102 determines the preset operation corresponding to the target keyword as the target operation. The correspondence between keywords and preset operations can be set according to actual needs; for example, the keyword "Xiao Ou, Xiao Ou" can be made to correspond to the preset operation "wake the operating system", so that when the target keyword is "Xiao Ou, Xiao Ou" and the operating system is currently in a dormant state, the central processing unit 102 wakes the operating system.
Further, an embodiment of the present application also provides an apparatus control method. The apparatus control method is executed by the electronic equipment provided by the embodiments of the present application, which includes a dedicated integrated circuit chip 101 and a central processing unit 102, the power consumption of the dedicated integrated circuit chip 101 being less than that of the central processing unit 102. Referring to Fig. 5, the apparatus control method includes:
101: The dedicated integrated circuit chip 101 acquires an external audio signal.
It should be noted that the dedicated integrated circuit chip 101 in the embodiments of the present application is an application-specific integrated circuit designed for the purpose of audio recognition; compared with the general-purpose central processing unit 102, it has higher audio recognition efficiency and lower power consumption. The dedicated integrated circuit chip 101 establishes a data communication connection with the central processing unit 102 through a communication bus.
The dedicated integrated circuit chip 101 can acquire external audio signals in a number of different ways. For example, when no external microphone is connected to the electronic equipment, the dedicated integrated circuit chip 101 can collect the sound uttered by an external speaker through the microphone built into the electronic equipment (not shown in Fig. 1) to obtain the external audio signal; as another example, when an external microphone is connected to the electronic equipment, the chip can collect the external sound through that external microphone to obtain the external audio signal.
When the dedicated integrated circuit chip 101 collects an external audio signal through a microphone, if the microphone is an analog microphone it will collect an analog audio signal, which the chip needs to sample in order to convert it into a digitized audio signal — for example, sampling at a frequency of 16 kHz. If the microphone is a digital microphone, the chip will collect a digitized audio signal directly through the digital microphone, with no conversion required.
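As a minimal sketch of this sampling step, an "analog" signal — modeled here as a Python function of continuous time, which is an illustrative assumption; only the 16 kHz rate comes from the text — can be digitized as follows:

```python
import math

SAMPLE_RATE = 16000  # sampling frequency from the embodiment above (16 kHz)

def sample(analog, duration_s):
    """Digitize a continuous-time signal by sampling it at SAMPLE_RATE."""
    n = int(duration_s * SAMPLE_RATE)
    return [analog(i / SAMPLE_RATE) for i in range(n)]

# A 440 Hz tone standing in for the analog microphone output.
digital = sample(lambda t: math.sin(2 * math.pi * 440 * t), 0.01)
print(len(digital))  # 160 samples for 10 ms of audio
```

A digital microphone would hand over such a sample list directly, which is why no conversion step is needed in that case.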
102: The dedicated integrated circuit chip 101 performs a recognition operation on the acquired audio signal to obtain a recognition result.
After acquiring the external audio signal, the dedicated integrated circuit chip 101 performs the recognition operation on the acquired audio signal according to a pre-configured recognition mode and obtains a recognition result.
For example, when the recognition mode of the dedicated integrated circuit chip 101 is configured as gender recognition, the chip, when recognizing the acquired audio signal, extracts from the audio signal feature information capable of characterizing gender, identifies the gender of the speaker of the audio signal according to the extracted feature information, and obtains a recognition result indicating whether the speaker is male or female.
As another example, when the recognition mode of the dedicated integrated circuit chip 101 is configured as environment-type recognition (subway-carriage scene, bus-carriage scene, office scene, and so on), the chip, when recognizing the acquired audio signal, extracts from the audio signal feature information capable of characterizing the environment scene, identifies the current environment scene according to the extracted feature information, and obtains a recognition result describing the current environment scene type.
103: The dedicated integrated circuit chip 101 sends indication information indicating that the recognition operation is complete to the central processing unit 102.
After completing the recognition operation on the audio signal and obtaining the recognition result, the dedicated integrated circuit chip 101 sends indication information indicating that the recognition operation is complete to the central processing unit 102. Figuratively speaking, the role of this indication information is to inform the central processing unit 102 that the dedicated integrated circuit chip 101 has completed the recognition operation on the audio signal and that the recognition result can now be extracted from the chip. The aforementioned indication information can be sent in the form of an interrupt signal.
104: According to the received indication information, the central processing unit 102 extracts the aforementioned recognition result from the dedicated integrated circuit chip 101 and executes the target operation corresponding to the aforementioned recognition result.
Correspondingly, after receiving the indication information from the dedicated integrated circuit chip 101, the central processing unit 102 extracts from the chip, according to that indication information, the recognition result obtained by the chip's recognition of the audio signal.
After extracting the recognition result of the audio signal, the central processing unit 102 further executes the target operation corresponding to that recognition result.
For example, when the dedicated integrated circuit chip 101 is configured for gender recognition, if the recognition result "the speaker is male" is extracted, the theme mode of the operating system is switched to a masculine theme mode; if the recognition result "the speaker is female" is extracted, the theme mode of the operating system is switched to a feminine theme mode.
As another example, when the dedicated integrated circuit chip 101 is configured for environment-type recognition, if the recognition result "office scene" is extracted, the alert mode of the operating system is switched to silent mode; if the recognition result "bus-carriage scene" is extracted, the alert mode of the operating system is switched to vibration plus ringtone, and so on.
As can be seen from the above, in the electronic equipment of the embodiments of the present application, the lower-power dedicated integrated circuit chip 101 first acquires the external audio signal, performs the recognition operation on the acquired audio signal, obtains the recognition result, and sends indication information indicating that the recognition operation is complete to the central processing unit 102; the central processing unit 102 then, according to the indication information, extracts the recognition result from the dedicated integrated circuit chip 101 and executes the target operation corresponding to the recognition result. The audio recognition task of the central processing unit 102 is thereby offloaded to the lower-power dedicated integrated circuit chip 101, with the central processing unit 102 executing the corresponding target operation according to the chip's recognition result. Through this cooperation between the application-specific integrated circuit and the central processing unit in voice-controlling the electronic equipment, the power consumption with which the electronic equipment realizes voice control can be reduced.
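The division of labor summarized above can be sketched with two plain Python objects standing in for the chip and the processor. The interrupt line and shared registers of real hardware are only mimicked here; all names and the recognition payload are illustrative assumptions, not from the patent:

```python
class DedicatedChip:
    """Stands in for the low-power dedicated integrated circuit chip 101."""
    def __init__(self):
        self.result = None          # recognition result, readable by the CPU

    def recognize(self, audio):
        # Always-on recognition stage: store a result, then signal completion.
        self.result = f"recognized {len(audio)}-sample signal"
        return "IRQ"                # indication information (interrupt signal)

class CentralProcessor:
    """Stands in for the central processing unit 102."""
    def on_indication(self, chip):
        # Extract the result from the chip only after being notified,
        # then execute the corresponding target operation.
        return f"target operation for: {chip.result}"

chip, cpu = DedicatedChip(), CentralProcessor()
indication = chip.recognize([0.0] * 16000)
action = cpu.on_indication(chip) if indication == "IRQ" else None
```

The point of the arrangement is that the CPU stays idle (or asleep) until the interrupt arrives, which is where the power saving comes from.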
In one embodiment, referring to Fig. 2, the dedicated integrated circuit chip 101 includes a micro-control unit 1011, a pre-processing unit 1012, and an algorithm unit 1013. Referring to Fig. 6, the step in which the dedicated integrated circuit chip 101 performs the recognition operation on the acquired audio signal and obtains the recognition result includes:
1021: The pre-processing unit 1012, under the control of the micro-control unit 1011, extracts the mel-frequency cepstral coefficients of the audio signal using a mel-frequency cepstral coefficient algorithm;
1022: The algorithm unit 1013, under the control of the micro-control unit 1011, performs keyword recognition on the mel-frequency cepstral coefficients using a deep neural network algorithm, obtaining a candidate keyword and the confidence of the candidate keyword.
The micro-control unit 1011 first acquires the external audio signal through a microphone. For example, when no external microphone is connected to the electronic equipment, the micro-control unit 1011 can collect external sound through the microphone built into the electronic equipment (not shown in Fig. 2) to obtain the external audio signal; as another example, when an external microphone is connected to the electronic equipment, the micro-control unit 1011 can collect external sound through that external microphone to obtain the external audio signal.
When the micro-control unit 1011 collects an external audio signal through a microphone, if the microphone is an analog microphone it will collect an analog audio signal, which the micro-control unit 1011 needs to sample in order to convert it into a digitized audio signal — for example, sampling at a frequency of 16 kHz. If the microphone is a digital microphone, the micro-control unit 1011 will collect a digitized audio signal directly through the digital microphone, with no conversion required.
After acquiring the external audio signal, the micro-control unit 1011 generates first control information and sends the first control information to the pre-processing unit 1012.
After receiving the first control information from the micro-control unit 1011, the pre-processing unit 1012 extracts, according to that control information, the mel-frequency cepstral coefficients of the audio signal using the mel-frequency cepstral coefficient algorithm. After extracting the mel-frequency cepstral coefficients of the audio signal, the pre-processing unit 1012 sends first feedback information to the micro-control unit 1011.
After receiving the first feedback information from the pre-processing unit 1012, the micro-control unit 1011 determines that the pre-processing unit 1012 has now extracted the mel-frequency cepstral coefficients of the audio signal, at which point it generates second control information and sends it to the algorithm unit 1013.
After receiving the second control information from the micro-control unit 1011, the algorithm unit 1013 performs keyword recognition on the aforementioned mel-frequency cepstral coefficients using a built-in deep neural network algorithm (keyword recognition, that is to say, detecting whether pre-defined words occur in the speech corresponding to the audio signal), obtaining a candidate keyword and the confidence of the candidate keyword. After completing the keyword recognition and obtaining the candidate keyword and its confidence, the algorithm unit 1013 sends second feedback information to the micro-control unit 1011.
After receiving the second feedback information from the algorithm unit 1013, the micro-control unit 1011 determines that the algorithm unit 1013 has completed the keyword recognition, and takes the candidate keyword and the confidence obtained by the algorithm unit 1013 as the recognition result of this recognition operation on the audio signal.
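The patent does not describe the deep neural network itself. A common keyword-spotting arrangement — assumed here purely for illustration — has the network emit a per-frame posterior probability for each keyword; the posteriors are then smoothed over a short window and the maximum smoothed value is taken as the candidate keyword's confidence:

```python
def smooth(posteriors, window=3):
    """Moving average of per-frame keyword posteriors over `window` frames."""
    out = []
    for i in range(len(posteriors)):
        lo = max(0, i - window + 1)
        chunk = posteriors[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def keyword_confidence(frame_posteriors):
    """Confidence of one candidate keyword over a whole utterance."""
    return max(smooth(frame_posteriors))

# Hypothetical DNN outputs for six MFCC frames of one utterance.
conf = keyword_confidence([0.1, 0.2, 0.9, 0.95, 0.9, 0.3])
```

Smoothing keeps a single noisy frame from pushing the confidence past the preset threshold, which matters because the threshold comparison downstream is what gates the target operation.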
In addition, referring to Fig. 3, the dedicated integrated circuit chip 101 further includes a memory 1014, which can be used to store the acquired audio signal, the recognized candidate keyword and confidence, and the intermediate data generated by the pre-processing unit 1012 and the algorithm unit 1013 during execution.
For example, the micro-control unit 1011 stores the audio signal acquired through the microphone in the memory 1014; the pre-processing unit 1012, under the control of the micro-control unit 1011, extracts the mel-frequency cepstral coefficients of the audio signal stored in the memory 1014 using the mel-frequency cepstral coefficient algorithm and stores the extracted coefficients in the memory 1014; the algorithm unit 1013, under the control of the micro-control unit 1011, performs keyword recognition on the mel-frequency cepstral coefficients stored in the memory 1014 using the built-in deep neural network algorithm, obtains the candidate keyword and its confidence, and stores the candidate keyword and the confidence in the memory 1014.
Referring to Fig. 4, the dedicated integrated circuit chip 101 further includes a cache memory 1015, which can be used to cache data to be stored into the memory 1014 and data fetched from the memory 1014.
Compared with the memory 1014, the cache memory 1015 has a smaller storage space but a higher speed; through the cache memory 1015, the processing efficiency of the pre-processing unit 1012 and the algorithm unit 1013 can be improved.
For example, when the pre-processing unit 1012 extracts mel-frequency cepstral coefficients from the audio signal, accessing data directly from the memory 1014 forces it to wait for a certain period each time. The cache memory 1015 can hold a portion of the data that the pre-processing unit 1012 has just used or is reusing; if the pre-processing unit 1012 needs that portion of data again, it can be fetched directly from the cache memory 1015. This avoids repeated memory accesses and reduces the waiting time of the pre-processing unit 1012, thereby improving its processing efficiency.
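The speed-up that the embodiment attributes to the cache can be illustrated with a toy software model. The real cache memory 1015 is hardware; the size, eviction policy, and hit counters below are arbitrary assumptions made only to show the reuse pattern:

```python
class CachedMemory:
    """Toy model: a small, fast cache in front of a larger, slower memory."""
    def __init__(self, data, cache_size=4):
        self.data = data              # contents of the (slow) memory 1014
        self.cache = {}               # address -> cached value
        self.cache_size = cache_size
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.cache:
            self.hits += 1            # fast path: no wait on memory
        else:
            self.misses += 1          # slow path: fetch from memory
            if len(self.cache) >= self.cache_size:
                self.cache.pop(next(iter(self.cache)))  # evict oldest entry
            self.cache[addr] = self.data[addr]
        return self.cache[addr]

mem = CachedMemory(list(range(100)))
for addr in [0, 1, 0, 1, 0]:          # data that is reused hits the cache
    mem.read(addr)
print(mem.hits, mem.misses)           # 3 2
```

Three of the five reads are served from the cache: exactly the "just used or reused" data the paragraph above describes.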
In one embodiment, referring to Fig. 7, the step in which the central processing unit 102 executes the target operation corresponding to the aforementioned recognition result includes:
1041: When the confidence of the candidate keyword reaches a preset confidence, the central processing unit 102 takes the candidate keyword as the target keyword of the audio signal;
1042: According to a preset correspondence between keywords and preset operations, the central processing unit 102 determines the preset operation corresponding to the target keyword as the target operation, and executes the target operation.
Specifically, after the central processing unit 102, according to the indication information from the dedicated integrated circuit chip 101, extracts from the chip the recognized candidate keyword and its confidence, it first judges whether the confidence of the candidate keyword reaches a preset confidence (the value can be chosen by those skilled in the art according to actual needs, for example 90%).
Having completed the judgment on the confidence, when the confidence of the candidate keyword reaches the preset confidence, the central processing unit 102 takes the candidate keyword as the target keyword of the audio signal.
Afterwards, according to the preset correspondence between keywords and preset operations, the central processing unit 102 determines the preset operation corresponding to the target keyword as the target operation. The correspondence between keywords and preset operations can be set according to actual needs; for example, the keyword "Xiao Ou, Xiao Ou" can be made to correspond to the preset operation "wake the operating system", so that when the target keyword is "Xiao Ou, Xiao Ou" and the operating system is currently in a dormant state, the central processing unit 102 wakes the operating system.
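The two steps above can be sketched as follows. The 90% threshold and the "Xiao Ou, Xiao Ou" keyword come from the examples in the text; the function name, the table name, and the operation string are illustrative assumptions:

```python
PRESET_CONFIDENCE = 0.90

# Preset correspondence between keywords and preset operations.
KEYWORD_OPERATIONS = {
    "xiao ou, xiao ou": "wake_operating_system",
}

def target_operation(candidate_keyword, confidence):
    """Return the target operation, or None if the confidence is too low."""
    if confidence < PRESET_CONFIDENCE:
        return None                   # candidate keyword is discarded
    # The candidate keyword becomes the target keyword; look up its operation.
    return KEYWORD_OPERATIONS.get(candidate_keyword)

print(target_operation("xiao ou, xiao ou", 0.95))  # wake_operating_system
print(target_operation("xiao ou, xiao ou", 0.60))  # None
```

Unknown keywords fall through to `None` as well, so only entries explicitly placed in the correspondence table can ever trigger an operation.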
In one embodiment, before the step of extracting the mel-frequency cepstral coefficients of the audio signal using the mel-frequency cepstral coefficient algorithm, the pre-processing unit 1012 further performs the following:
(1) The pre-processing unit 1012 pre-processes the audio signal;
(2) After completing the pre-processing of the audio signal, the pre-processing unit 1012 extracts the mel-frequency cepstral coefficients of the audio signal using the mel-frequency cepstral coefficient algorithm.
Specifically, after receiving the first control information from the micro-control unit 1011, the pre-processing unit 1012 first applies pre-processing such as pre-emphasis and windowing to the audio signal.
Pre-emphasis, that is to say, increasing the energy of the high-frequency part of the audio signal. In the spectrum of an audio signal, the energy of the low-frequency part is often higher than that of the high-frequency part: for every tenfold increase in frequency, the spectral energy decays by about 20 dB. Moreover, the background noise of the microphone circuit during acquisition further increases the energy of the low-frequency part. To give the high-frequency and low-frequency parts of the audio signal similar amplitudes, the high-frequency energy of the collected audio signal needs to be pre-emphasized.
Because an audio signal is usually a non-stationary signal, its statistical properties are not fixed; within a sufficiently short period of time, however, the signal can be regarded as stationary — this is the motivation for windowing. A window is described by three parameters: window length (in milliseconds), offset, and shape. Each windowed segment of the audio signal is called a frame, the duration of each frame in milliseconds is called the frame length, and the distance between the left boundaries of two adjacent frames is called the frame shift. In the embodiments of the present application, a Hamming window, whose edges taper smoothly, can be used for the windowing process.
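Under the usual assumption of a first-order pre-emphasis filter y[n] = x[n] − a·x[n−1] (the patent names pre-emphasis but fixes no formula; a = 0.97 is a conventional choice, as are the 25 ms / 10 ms frame parameters below), the two pre-processing steps can be sketched as:

```python
import math

def preemphasis(signal, a=0.97):
    """Boost high-frequency energy: y[n] = x[n] - a * x[n-1]."""
    return [signal[0]] + [signal[n] - a * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window coefficients; the edges taper smoothly."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_and_window(signal, frame_len, frame_shift):
    """Split the signal into overlapping frames and apply the window."""
    win = hamming(frame_len)
    return [[s * w for s, w in zip(signal[start:start + frame_len], win)]
            for start in range(0, len(signal) - frame_len + 1, frame_shift)]

# 100 ms of 16 kHz audio, 25 ms frames (400 samples), 10 ms shift (160).
frames = frame_and_window([1.0] * 1600, frame_len=400, frame_shift=160)
```

The frame shift being smaller than the frame length is what makes adjacent frames overlap, so no part of the signal falls between frames.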
After completing the pre-processing of the audio signal, the pre-processing unit 1012 can extract the mel-frequency cepstral coefficients of the audio signal using the mel-frequency cepstral coefficient algorithm. In essence, the extraction process exploits the nonlinear characteristics of human hearing: the spectrum of the audio signal is converted into a nonlinear spectrum based on the mel frequency scale and then transformed into the cepstral domain, thereby obtaining the mel-frequency cepstral coefficients.
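The "nonlinear spectrum based on the mel frequency" rests on a frequency warping. The common formula m = 2595·log10(1 + f/700) is used below; this is a standard choice, but the patent itself does not fix the exact mapping:

```python
import math

def hz_to_mel(f):
    """Warp a frequency in Hz onto the perceptual mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse mapping, used when placing mel filterbank edges."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# The scale is roughly linear below 1 kHz and compressive above it,
# mirroring how human hearing resolves pitch.
print(round(hz_to_mel(1000.0)))  # 1000
print(round(hz_to_mel(8000.0)))  # 2840
```

In a full MFCC front end, triangular filters are spaced evenly in mel units (hence `mel_to_hz` for the band edges), their log energies are taken, and a discrete cosine transform yields the cepstral coefficients.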
In one embodiment, before the step of pre-processing the audio signal, the pre-processing unit 1012 further performs the following:
(1) The pre-processing unit 1012 extracts the voiceprint feature of the audio signal;
(2) The pre-processing unit 1012 judges whether the extracted voiceprint feature matches a preset voiceprint feature;
(3) When the extracted voiceprint feature matches the preset voiceprint feature, the pre-processing unit 1012 pre-processes the aforementioned audio signal.
It should be noted that in real life, everyone's voice has its own characteristics; people who know each other can recognize one another by voice alone. These characteristics of the voice constitute the voiceprint feature, which is determined mainly by two factors. The first is the dimensions of the vocal tract, specifically including the throat, nasal cavity, and oral cavity; the shape, size, and position of these organs determine the tension of the vocal cords and the range of voice frequencies. Therefore, even when different people say the same words, the frequency distributions of their voices differ — some voices sound deep, others loud and clear.
The second factor determining the voiceprint feature is the manner in which the vocal organs are manipulated. The vocal organs include the lips, teeth, tongue, soft palate, and palatal muscles, whose interaction produces clear speech. The way they cooperate is learned incidentally through a person's daily communication with the people around them; in the course of learning to speak, by imitating the speech of the different people nearby, a person gradually forms his or her own voiceprint feature.
Specifically, after receiving the first control information from the micro-control unit 1011, the pre-processing unit 1012 first extracts the voiceprint feature of the audio signal.
After acquiring the voiceprint feature of the voice information, the pre-processing unit 1012 further compares the acquired voiceprint feature with a preset voiceprint feature to judge whether the voiceprint feature matches the preset voiceprint feature. The preset voiceprint feature can be a voiceprint feature recorded in advance by the owner; judging whether the voiceprint feature of the acquired audio signal matches the preset voiceprint feature is, that is to say, judging whether the speaker of the audio signal is the owner.
When the acquired voiceprint feature matches the preset voiceprint feature, the pre-processing unit 1012 determines that the speaker of the audio signal is the owner, at which point it further pre-processes the audio signal and extracts the mel-frequency cepstral coefficients; for details, refer to the related description provided above, which is not repeated here.
In one embodiment, the step in which the pre-processing unit 1012 judges whether the extracted voiceprint feature matches the preset voiceprint feature includes:
(1) The pre-processing unit 1012 acquires the similarity between the aforementioned voiceprint feature and the preset voiceprint feature;
(2) The pre-processing unit 1012 judges whether the acquired similarity is greater than or equal to a first preset similarity;
(3) When the acquired similarity is greater than or equal to the first preset similarity, the pre-processing unit 1012 determines that the acquired voiceprint feature matches the preset voiceprint feature.
Specifically, when judging whether the acquired voiceprint feature matches the preset voiceprint feature, the pre-processing unit 1012 can acquire the similarity between the voiceprint feature (that is, the one extracted from the aforementioned audio signal) and the preset voiceprint feature, and judge whether the acquired similarity is greater than or equal to a first preset similarity (configured according to actual needs, for example 95%). If the acquired similarity is greater than or equal to the first preset similarity, it is determined that the acquired voiceprint feature matches the preset voiceprint feature; if the acquired similarity is less than the first preset similarity, it is determined that the acquired voiceprint feature does not match the preset voiceprint feature.
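The patent leaves the similarity computation itself unspecified. One common, purely illustrative choice is cosine similarity over fixed-length voiceprint feature vectors; the vectors below are made-up placeholders:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

preset = [0.2, 0.5, 0.8]        # preset voiceprint feature (owner's recording)
extracted = [0.21, 0.48, 0.82]  # feature extracted from the current signal
similarity = cosine_similarity(preset, extracted)
print(similarity >= 0.95)       # True: would pass the first preset similarity
```

Any similarity measure bounded in a known range would serve the same role; the thresholds just need to be configured against whichever measure is used.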
In addition, when the acquired voiceprint feature does not match the preset voiceprint feature, the pre-processing unit 1012 determines that the speaker of the current audio signal is not the owner and sends third feedback information to the micro-control unit 1011.
After receiving the third feedback information from the pre-processing unit 1012, the micro-control unit 1011 deletes the acquired audio signal and continues to acquire external audio signals; only when an audio signal from the owner is acquired does it proceed with pre-processing and extraction of the mel-frequency cepstral coefficients. For how the pre-processing and the extraction of mel-frequency cepstral coefficients are performed, refer to the related description in the embodiments above; details are not repeated here.
In this way, by authenticating the speaker on the basis of the voiceprint feature, only audio signals uttered by the owner are responded to, which avoids executing operations contrary to the owner's wishes and improves the owner's experience.
In one embodiment, after the step in which the pre-processing unit 1012 judges whether the acquired similarity is greater than or equal to the first preset similarity, the following is further included:
(1) When the aforementioned similarity is less than the first preset similarity but greater than or equal to a second preset similarity, the pre-processing unit 1012 acquires current location information;
(2) The pre-processing unit 1012 judges, according to the acquired location information, whether the device is currently within a preset location range;
(3) When the device is currently within the preset location range, the pre-processing unit 1012 determines that the aforementioned voiceprint feature matches the preset voiceprint feature.
It should be noted that, because the voiceprint feature is closely related to the physiology of the human body, in daily life a user's voice becomes hoarse when the user catches a cold or suffers inflammation, and the voiceprint feature changes accordingly. In such cases, even if the acquired audio signal was spoken by the owner, the pre-processing unit 1012 will fail to recognize it. There are various other situations that likewise prevent the pre-processing unit 1012 from recognizing the owner; they are not enumerated here.
To handle the cases in which the owner cannot be recognized, after completing the judgment on voiceprint similarity, if the similarity between the acquired voiceprint feature and the preset voiceprint feature is less than the first preset similarity, the pre-processing unit 1012 further judges whether the similarity is greater than or equal to a second preset similarity (the second preset similarity is configured to be less than the first preset similarity; the specific values can be chosen by those skilled in the art according to actual needs — for example, when the first preset similarity is set to 95%, the second preset similarity can be set to 75%).
If the judgment result is yes, that is to say, the similarity between the acquired voiceprint feature and the preset voiceprint feature is less than the first preset similarity but greater than or equal to the second preset similarity, the pre-processing unit 1012 further acquires current location information. For example, the pre-processing unit 1012 can send a location acquisition request to a positioning module of the electronic equipment (which may use satellite positioning, base-station positioning, or other positioning technologies to obtain the current location), instructing the positioning module to return the current location information.
After acquiring the current location information, the pre-processing unit 1012 judges from it whether the device is currently within a preset location range. The preset location range can be configured as locations the owner frequents, such as home and the workplace.
When the device is currently within the preset location range, the pre-processing unit 1012 determines that the acquired voiceprint feature matches the preset voiceprint feature and identifies the speaker of the audio signal as the owner.
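The whole two-threshold decision, including the location fallback, can be condensed into one function. The 95% and 75% values come from the examples above; the location check is reduced to a boolean for illustration, and the function name is an assumption:

```python
FIRST_PRESET_SIMILARITY = 0.95   # direct-match threshold
SECOND_PRESET_SIMILARITY = 0.75  # must be configured below the first

def voiceprint_matches(similarity, in_preset_location):
    """Decide whether the extracted voiceprint matches the preset one."""
    if similarity >= FIRST_PRESET_SIMILARITY:
        return True               # confident match: speaker is the owner
    if similarity >= SECOND_PRESET_SIMILARITY and in_preset_location:
        return True               # borderline match (e.g. a hoarse voice)
                                  # rescued by a location the owner frequents
    return False

print(voiceprint_matches(0.96, in_preset_location=False))  # True
print(voiceprint_matches(0.80, in_preset_location=True))   # True
print(voiceprint_matches(0.80, in_preset_location=False))  # False
```

The location check only ever relaxes the decision, never tightens it: a similarity above the first threshold matches regardless of where the device is.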
The electronic equipment and apparatus control method provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the methods of the present application and their core ideas. Meanwhile, those skilled in the art, following the ideas of the present application, may make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. Electronic equipment, characterized in that the electronic equipment includes a dedicated integrated circuit chip and a central processing unit, the power consumption of the dedicated integrated circuit chip being less than the power consumption of the central processing unit, wherein:
the dedicated integrated circuit chip is configured to acquire an external audio signal, perform a recognition operation on the audio signal to obtain a recognition result, and send indication information indicating that the recognition operation is complete to the central processing unit;
the central processing unit is configured to extract, according to the indication information, the recognition result from the dedicated integrated circuit chip, and to execute a target operation corresponding to the recognition result.
2. The electronic equipment of claim 1, characterized in that the dedicated integrated circuit chip includes a micro-control unit, a pre-processing unit, and an algorithm unit, wherein:
the pre-processing unit is configured to extract, under the control of the micro-control unit, mel-frequency cepstral coefficients of the audio signal using a mel-frequency cepstral coefficient algorithm;
the algorithm unit is configured to perform, under the control of the micro-control unit, keyword recognition on the mel-frequency cepstral coefficients using a deep neural network algorithm, obtaining a candidate keyword and a confidence of the candidate keyword.
3. electronic equipment as claimed in claim 2, which is characterized in that the central processing unit is additionally operable to reach in the confidence level When to default confidence level, using the candidate keywords as the target keyword of the audio signal, according to preset keyword With the correspondence of predetermined registration operation, the predetermined registration operation of the correspondence target keyword is determined as the object run, and execute The object run.
4. The electronic device of claim 2, characterized in that the dedicated integrated circuit chip further comprises a memory for storing the audio signal, the candidate keyword, the confidence, and intermediate data generated by the preprocessing unit and the algorithm unit during execution.
5. The electronic device of claim 4, characterized in that the dedicated integrated circuit chip further comprises a cache memory for caching data to be stored in the memory and data retrieved from the memory.
6. The electronic device of any one of claims 2-5, characterized in that the preprocessing unit is further configured to preprocess the audio signal and, after the preprocessing of the audio signal is completed, extract the mel-frequency cepstral coefficients of the audio signal using the mel-frequency cepstral coefficient algorithm.
7. The electronic device of claim 6, characterized in that the preprocessing unit is further configured to extract a voiceprint feature of the audio signal, determine whether the voiceprint feature matches a preset voiceprint feature, and preprocess the audio signal when the voiceprint feature matches the preset voiceprint feature.
8. The electronic device of claim 7, characterized in that the preprocessing unit is further configured to obtain a similarity between the voiceprint feature and the preset voiceprint feature, determine whether the similarity is greater than or equal to a first preset similarity, and, when the similarity is greater than or equal to the first preset similarity, determine that the voiceprint feature matches the preset voiceprint feature.
9. The electronic device of claim 8, characterized in that the preprocessing unit is further configured to, when the similarity is less than the first preset similarity and greater than or equal to a second preset similarity, obtain current location information, determine from the location information whether the device is currently within a preset location range, and, when the device is currently within the preset location range, determine that the voiceprint feature matches the preset voiceprint feature.
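Claims 8 and 9 together describe a two-tier voiceprint match: accept on high similarity alone, or on moderate similarity combined with a trusted location. A sketch, assuming cosine similarity over feature vectors and illustrative thresholds (the patent does not fix a similarity measure or threshold values):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def voiceprint_matches(feature, preset_feature, in_preset_location,
                       first_threshold=0.9, second_threshold=0.75):
    """Two-tier match: high similarity matches unconditionally;
    moderate similarity matches only within the preset location range."""
    sim = cosine_similarity(feature, preset_feature)
    if sim >= first_threshold:
        return True
    if second_threshold <= sim < first_threshold and in_preset_location:
        return True
    return False
```

The location check acts as a fallback credential: a borderline voiceprint is accepted at home but rejected elsewhere, which is the behavior claim 9 adds on top of claim 8.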
10. A device control method applied to an electronic device, characterized in that the electronic device comprises a central processing unit and a dedicated integrated circuit chip, the power consumption of the dedicated integrated circuit chip being lower than that of the central processing unit, the device control method comprising:
acquiring, by the dedicated integrated circuit chip, an external audio signal;
recognizing, by the dedicated integrated circuit chip, the audio signal to obtain a recognition result;
sending, by the dedicated integrated circuit chip, indication information that the recognition is completed to the central processing unit;
retrieving, by the central processing unit, the recognition result from the dedicated integrated circuit chip according to the indication information, and executing a target operation corresponding to the recognition result.
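The four method steps of claim 10 can be simulated in miniature: the chip object holds the recognition result until the CPU, prompted by the indication information, pulls it and maps it to an operation. All class, keyword, and operation names below are invented for illustration; a real ASIC would run the MFCC and neural-network stages instead of returning a canned result.

```python
from dataclasses import dataclass

@dataclass
class RecognitionResult:
    keyword: str
    confidence: float

class AsicChip:
    """Stands in for the low-power dedicated IC: it recognizes the audio
    and holds the result until the CPU retrieves it."""
    def __init__(self):
        self._result = None

    def recognize(self, audio_signal):
        # A real chip would run MFCC + DNN here; we fake a result.
        self._result = RecognitionResult(keyword="play music", confidence=0.92)
        return "recognition_done"  # the indication information

    def read_result(self):
        return self._result

class Cpu:
    """The CPU stays idle until the chip signals completion, then pulls
    the result rather than having it pushed."""
    def on_indication(self, indication, chip):
        if indication != "recognition_done":
            return None
        result = chip.read_result()
        return f"execute:{result.keyword}"  # stand-in for the target operation

chip, cpu = AsicChip(), Cpu()
indication = chip.recognize(audio_signal=b"\x00" * 320)
action = cpu.on_indication(indication, chip)
```

The pull-based handoff is the point of the power-saving design: the CPU performs no audio work at all until the cheaper chip has something for it.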
CN201810589643.2A 2018-06-08 2018-06-08 Electronic device and device control method Active CN108711429B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810589643.2A CN108711429B (en) 2018-06-08 2018-06-08 Electronic device and device control method
PCT/CN2019/085554 WO2019233228A1 (en) 2018-06-08 2019-05-05 Electronic device and device control method

Publications (2)

Publication Number Publication Date
CN108711429A true CN108711429A (en) 2018-10-26
CN108711429B CN108711429B (en) 2021-04-02

Family

ID=63871448

Country Status (2)

Country Link
CN (1) CN108711429B (en)
WO (1) WO2019233228A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005181510A (en) * 2003-12-17 2005-07-07 Toshiba Corp Ic voice repeater
CN102905029A (en) * 2012-10-17 2013-01-30 广东欧珀移动通信有限公司 Mobile phone and method for looking for mobile phone through intelligent voice
CN103474071A (en) * 2013-09-16 2013-12-25 重庆邮电大学 Embedded portable voice controller and intelligent housing system with voice recognition
CN103700368A (en) * 2014-01-13 2014-04-02 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
CN106250751A (en) * 2016-07-18 2016-12-21 青岛海信移动通信技术股份有限公司 A kind of mobile device and the method adjusting sign information detection threshold value
CN106940998A (en) * 2015-12-31 2017-07-11 阿里巴巴集团控股有限公司 A kind of execution method and device of setting operation

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9728184B2 (en) * 2013-06-18 2017-08-08 Microsoft Technology Licensing, Llc Restructuring deep neural network acoustic models
CN104143327B (en) * 2013-07-10 2015-12-09 腾讯科技(深圳)有限公司 A kind of acoustic training model method and apparatus
KR101844932B1 (en) * 2014-09-16 2018-04-03 한국전자통신연구원 Signal process algorithm integrated deep neural network based speech recognition apparatus and optimization learning method thereof
US10140572B2 (en) * 2015-06-25 2018-11-27 Microsoft Technology Licensing, Llc Memory bandwidth management for deep learning applications
KR102423302B1 (en) * 2015-10-06 2022-07-19 삼성전자주식회사 Apparatus and method for calculating acoustic score in speech recognition, apparatus and method for learning acoustic model
CN105488227B (en) * 2015-12-29 2019-09-20 惠州Tcl移动通信有限公司 A kind of electronic equipment and its method that audio file is handled based on vocal print feature
CN106228240B (en) * 2016-07-30 2020-09-01 复旦大学 Deep convolution neural network implementation method based on FPGA
CN108711429B (en) * 2018-06-08 2021-04-02 Oppo广东移动通信有限公司 Electronic device and device control method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233228A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Electronic device and device control method
CN109636937A (en) * 2018-12-18 2019-04-16 深圳市沃特沃德股份有限公司 Voice Work attendance method, device and terminal device
CN110223687A (en) * 2019-06-03 2019-09-10 Oppo广东移动通信有限公司 Instruction executing method, device, storage medium and electronic equipment
CN110310645A (en) * 2019-07-02 2019-10-08 上海迥灵信息技术有限公司 Sound control method, device and the storage medium of intelligence control system
CN111508475A (en) * 2020-04-16 2020-08-07 五邑大学 Robot awakening voice keyword recognition method and device and storage medium
CN111508475B (en) * 2020-04-16 2022-08-09 五邑大学 Robot awakening voice keyword recognition method and device and storage medium
WO2021238506A1 (en) * 2020-05-29 2021-12-02 Oppo广东移动通信有限公司 Multimedia processing chip, electronic device, and dynamic image processing method
CN113352987A (en) * 2021-05-31 2021-09-07 湖北亿咖通科技有限公司 Method and system for controlling warning tone of vehicle machine
CN115527373A (en) * 2022-01-05 2022-12-27 荣耀终端有限公司 Riding tool identification method and device
WO2023130934A1 (en) * 2022-01-05 2023-07-13 荣耀终端有限公司 Transportation vehicle type identification method and apparatus

Similar Documents

Publication Publication Date Title
CN108711429A (en) Electronic equipment and apparatus control method
WO2021093449A1 (en) Wakeup word detection method and apparatus employing artificial intelligence, device, and medium
CN110534099B (en) Voice wake-up processing method and device, storage medium and electronic equipment
CN109036384B (en) Audio recognition method and device
CN106847292B (en) Method for recognizing sound-groove and device
CN110265040A (en) Training method, device, storage medium and the electronic equipment of sound-groove model
CN109817246A (en) Training method, emotion identification method, device, equipment and the storage medium of emotion recognition model
CN108428446A (en) Audio recognition method and device
US20200075024A1 (en) Response method and apparatus thereof
CN114186563A (en) Electronic equipment and semantic analysis method and medium thereof and man-machine conversation system
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
CN108922525B (en) Voice processing method, device, storage medium and electronic equipment
CN103680497A (en) Voice recognition system and voice recognition method based on video
CN108806684B (en) Position prompting method and device, storage medium and electronic equipment
CN102404278A (en) Song request system based on voiceprint recognition and application method thereof
CN111710337B (en) Voice data processing method and device, computer readable medium and electronic equipment
CN108900965A (en) Position indicating method, device, storage medium and electronic equipment
Mian Qaisar Isolated speech recognition and its transformation in visual signs
CN111798846A (en) Voice command word recognition method and device, conference terminal and conference terminal system
CN112489628B (en) Voice data selection method and device, electronic equipment and storage medium
CN113851136A (en) Clustering-based speaker recognition method, device, equipment and storage medium
CN110992940B (en) Voice interaction method, device, equipment and computer-readable storage medium
CN111048068B (en) Voice wake-up method, device and system and electronic equipment
CN113436617B (en) Voice sentence breaking method, device, computer equipment and storage medium
CN114913859A (en) Voiceprint recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant