WO2019015435A1 - Speech recognition method, apparatus, and storage medium - Google Patents
Speech recognition method, apparatus, and storage medium
- Publication number
- WO2019015435A1 (PCT/CN2018/091926)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio data
- wake
- fuzzy
- speech recognition
- word
- Prior art date
Classifications
- G10L15/1822 — Parsing for meaning understanding
- G10L15/14 — Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/083 — Recognition networks
- G06F1/3215 — Power management: monitoring of peripheral devices
- G06F1/3231 — Power management: monitoring the presence, absence or movement of users
- G06F1/3293 — Power saving by switching to a less power-consuming processor, e.g. sub-CPU
- G06F40/30 — Semantic analysis of natural language data
- G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26 — Speech to text systems
- G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
- H04M1/72409 — Mobile telephone user interfaces interfacing with external accessories
- H04M1/72448 — Mobile telephone user interfaces adapting the functionality of the device according to specific conditions
- H04W52/0261 — Power saving arrangements in terminal devices managing power supply demand, e.g. depending on battery level
- G10L2015/088 — Word spotting
- G10L2015/221 — Announcement of recognition results
- G10L2015/223 — Execution procedure of a spoken command
Definitions
- the present invention relates to the field of communication technologies, and in particular, to speech recognition.
- With the development of artificial intelligence, intelligent hardware products have also developed rapidly.
- intelligent hardware products refer to hardware devices integrated with artificial intelligence functions, such as smart mobile terminals (referred to as mobile terminals).
- the core of intelligent hardware products is inseparable from interaction with people, and voice interaction, as a natural interaction with a low learning cost, has become the mainstream technology of intelligent hardware products.
- to support voice wake-up, the recording function of the terminal generally must stay on so that the central processing unit (CPU) can process audio data at any time; the CPU cannot sleep even while the user is not speaking. Since the CPU must encode, decode, play, and otherwise process the audio data, this scheme places high demands on the CPU, and the power consumption of the entire system is very large; for a battery-powered mobile terminal, it greatly shortens the standby time.
- to address this, the prior art has proposed using an external power supply, or waking the device with a physical button. However, an external power supply inevitably affects mobility, and waking by physical button means voice wake-up cannot be realized. That is, in the existing solutions, maintaining both mobility and a voice wake-up function requires consuming a large amount of battery power, which greatly reduces the standby time of the mobile terminal and affects its performance.
- the embodiments of the invention provide a voice recognition method, apparatus, and storage medium that can reduce system power consumption, prolonging the standby time of the mobile terminal and improving its performance while maintaining mobility and the voice wake-up function.
- an embodiment of the present invention provides a voice recognition method, including:
- acquiring audio data, and performing fuzzy speech recognition on the audio data by a digital signal processor (DSP);
- when the fuzzy speech recognition result indicates that a wake-up word exists, waking up, by the DSP, a central processing unit (CPU) in a sleep state, the CPU being used for semantic analysis of the audio data.
- the performing fuzzy speech recognition on the audio data by using a digital signal processor includes:
- the audio data is speech-recognized by fuzzy clustering analysis, and the fuzzy speech recognition result is obtained.
- the digital signal processor performs speech recognition on the audio data by using a fuzzy clustering analysis to obtain a fuzzy speech recognition result, including:
- establishing a fuzzy clustering neural network according to fuzzy clustering analysis, and using the fuzzy clustering neural network as an estimator of a probability density function to predict the probability that the audio data includes a wake-up word.
- the performing fuzzy speech recognition on the audio data by using a digital signal processor includes:
- the audio data is speech-recognized by using a fuzzy matching algorithm to obtain a fuzzy speech recognition result.
- the digital signal processor performs a voice recognition on the audio data by using a fuzzy matching algorithm to obtain a fuzzy speech recognition result, including:
- acquiring a feature map of the wake-up word pronunciation to obtain a standard feature map, and analyzing a feature map of each word pronunciation in the audio data to obtain feature maps to be matched;
- the method further includes:
- the audio data is semantically analyzed by the central processing unit, and the corresponding operation is performed according to the analysis result.
- before the semantic analysis of the audio data by the central processing unit, the method further includes:
- reading data of the wake-up word in the audio data from the digital signal processor to obtain wake-up data, and performing voice recognition on the wake-up data by the central processing unit; when the voice recognition result indicates that there is no wake-up word, setting the central processing unit to sleep and returning to the step of acquiring audio data.
- the performing voice recognition on the wake-up data by the central processing unit includes:
- setting the working state of the central processing unit to a first state of a single core and a low frequency, and performing voice recognition on the wake-up data in the first state.
- the semantic analysis of the audio data by the central processing unit includes:
- setting the working state of the central processing unit to a second state of multiple cores and a high frequency, and semantically analyzing the audio data in the second state.
- alternatively, the semantic analysis of the audio data by the central processing unit includes:
- determining a semantic scenario according to the wake-up word corresponding to the audio data, determining the number of working cores and the main frequency of the central processing unit according to the semantic scenario, and setting the working state of the central processing unit accordingly to obtain a third state in which the audio data is semantically analyzed.
- before performing the fuzzy speech recognition on the audio data by the digital signal processor, the method further includes:
- Noise reduction and/or echo cancellation processing is performed on the audio data.
- the performing the corresponding operation according to the analysis result includes:
- determining an operation object and operation content according to the analysis result, and performing the operation content on the operation object.
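As an illustration only (the claims above do not prescribe an implementation), the overall two-stage flow, a low-power DSP stage that fuzzily detects a wake-up word and a CPU stage that is woken only on a hit, can be sketched in Python as follows; all class and function names here are hypothetical:

```python
# Hypothetical sketch of the two-stage wake-up pipeline described above.
# The DSP stage does a cheap fuzzy check; the expensive CPU stage runs only
# when the DSP reports a probable wake-up word. Names are illustrative.

def dsp_fuzzy_recognize(audio_frame, wake_words, threshold=0.8):
    """Coarse, low-power check: estimate whether a wake-up word is present."""
    # Stand-in for fuzzy clustering / fuzzy matching running on the DSP.
    text = audio_frame.lower()
    score = max((1.0 if w in text else 0.0) for w in wake_words)
    return score >= threshold

class Cpu:
    def __init__(self):
        self.sleeping = True
        self.analyzed = []
    def wake(self):
        self.sleeping = False
    def semantic_analysis(self, audio_frame):
        assert not self.sleeping, "CPU must be woken by the DSP first"
        self.analyzed.append(audio_frame)

def monitor(frames, wake_words):
    cpu = Cpu()
    for frame in frames:
        if dsp_fuzzy_recognize(frame, wake_words):
            cpu.wake()                      # DSP wakes the sleeping CPU
            cpu.semantic_analysis(frame)    # CPU performs semantic analysis
        # otherwise the CPU stays asleep and the DSP keeps listening
    return cpu

cpu = monitor(["background noise", "hello assistant", "calling zhang san"],
              wake_words=["calling", "sending information"])
print(cpu.analyzed)  # only the frame containing a wake-up word
```

The point of the structure is that `dsp_fuzzy_recognize` is the only code that runs on every frame; `semantic_analysis` runs only after a wake-up hit.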
- an embodiment of the present invention provides a voice recognition apparatus, including:
- an acquisition unit configured to acquire audio data;
- a fuzzy identification unit configured to perform fuzzy speech recognition on the audio data by using a DSP;
- a wake-up unit configured to wake up a CPU in a sleep state when the fuzzy speech recognition result indicates that the wake-up word exists, and the CPU is configured to perform semantic analysis on the audio data.
- the fuzzy identification unit is specifically configured to perform voice recognition on the audio data by using a fuzzy clustering analysis by using a DSP to obtain a fuzzy speech recognition result.
- the fuzzy identification unit may be specifically configured to: establish a fuzzy clustering neural network according to fuzzy clustering analysis; use the fuzzy clustering neural network as an estimator of a probability density function to predict the probability that the audio data includes a wake-up word; if the prediction result indicates that the probability is greater than or equal to the set value, generate a fuzzy speech recognition result indicating that the wake-up word exists; and if the prediction result indicates that the probability is less than the set value, generate a fuzzy speech recognition result indicating that the wake-up word does not exist.
- the fuzzy identification unit is specifically configured to perform voice recognition on the audio data by using a fuzzy matching algorithm by using a DSP to obtain a fuzzy speech recognition result.
- the fuzzy identification unit may be specifically configured to: acquire a feature map of the wake-up word pronunciation to obtain a standard feature map; analyze a feature map of each word pronunciation in the audio data to obtain feature maps to be matched; calculate, according to a preset membership function, the degree value with which each feature map to be matched belongs to the standard feature map; if the degree value is greater than or equal to the preset value, generate a fuzzy speech recognition result indicating that the wake-up word exists; and if the degree value is less than the preset value, generate a fuzzy speech recognition result indicating that the wake-up word does not exist.
- the voice recognition apparatus may further include a processing unit, configured to perform semantic analysis on the audio data by using a CPU, and perform a corresponding operation according to the analysis result.
- the speech recognition apparatus may further include a precise identification unit as follows:
- the precise identification unit is configured to read data of the wake-up word in the audio data from the DSP to obtain wake-up data, and to perform voice recognition on the wake-up data by the CPU; when the voice recognition result indicates that the wake-up word exists, the processing unit is triggered to perform semantic analysis on the audio data by the CPU; when the voice recognition result indicates that there is no wake-up word, the CPU is set to sleep, and the acquisition unit is triggered to perform the operation of acquiring audio data.
- the precise identification unit may be configured to set the operating state of the CPU to a first state, where the first state is a single core and a low frequency, and to perform voice recognition on the wake-up data in the first state.
- the processing unit may be configured to set the operating state of the CPU to a second state, where the second state is multi-core and high frequency, and to semantically analyze the audio data in the second state.
- the processing unit may be specifically configured to determine a semantic scenario according to the wake-up word corresponding to the audio data, determine the number of working cores and the main frequency of the CPU according to the semantic scenario, and set the working state of the CPU according to the number of working cores and the main frequency to obtain a third state in which the audio data is semantically analyzed.
- the voice recognition device may further include a filtering unit, as follows:
- the filtering unit is configured to perform noise reduction and/or echo cancellation processing on the audio data.
- an embodiment of the present invention further provides a mobile terminal, where the mobile terminal includes a storage medium and a processor, where the storage medium stores a plurality of instructions, the processor is configured to load and execute the instruction,
- the instructions are used to implement the steps in any of the speech recognition methods provided by the embodiments of the present invention.
- the embodiment of the present invention further provides a storage medium, where the storage medium stores a plurality of instructions, and the instructions are adapted to be loaded by a processor to perform the steps in any of the voice recognition methods provided by the embodiments of the present invention.
- in the embodiments of the present invention, fuzzy speech recognition may be performed on the audio data by the DSP; when the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state, and the CPU then performs semantic analysis on the audio data. Because the scheme uses a DSP with lower running power, rather than the higher-power CPU, to monitor the audio data, the CPU does not need to stay awake all the time; it can remain in a dormant state and be woken only when needed. The solution can therefore greatly reduce system power consumption while preserving mobility and the voice wake-up function, thus extending the standby time of the mobile terminal and improving its performance.
- FIG. 1a is a structural diagram of a mobile terminal according to an embodiment of the present invention.
- FIG. 1b is a schematic diagram of a scenario of a voice recognition method according to an embodiment of the present invention.
- FIG. 1c is a flowchart of a voice recognition method according to an embodiment of the present invention.
- FIG. 1d is a block diagram of a voice recognition method according to an embodiment of the present invention.
- FIG. 2a is another flowchart of a voice recognition method according to an embodiment of the present invention.
- FIG. 2b is another block diagram of a voice recognition method according to an embodiment of the present invention.
- FIG. 3a is a schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
- FIG. 3b is another schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
- FIG. 3c is another schematic structural diagram of a voice recognition apparatus according to an embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of a mobile terminal according to an embodiment of the present invention.
- Embodiments of the present invention provide a voice recognition method, apparatus, and storage medium.
- the voice recognition device may be specifically integrated in a mobile terminal, such as a mobile phone, a wearable smart device, a tablet computer, and/or a laptop computer.
- a DSP may be set in the mobile terminal.
- for example, the DSP may be set in a codec (coder-decoder), i.e., a DSP-capable codec, such that when the mobile terminal acquires audio data, for example by receiving a user's sound through a microphone (MIC), the audio data can be subjected to fuzzy speech recognition through the DSP.
- if the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the sleeping CPU, and the CPU can then perform semantic analysis on the audio data, for example, see FIG. 1b; otherwise, if the fuzzy speech recognition result indicates that there is no wake-up word, the CPU is not woken, and the DSP continues to listen to the audio data.
- a DSP is a kind of microprocessor that is especially suited to digital signal processing operations and can implement various digital signal processing algorithms in real time. Because its hardware supports features such as low-overhead or zero-overhead loops and jumps, its power consumption is lower than that of other processors; in addition, the DSP also provides a noise reduction function.
- a voice recognition device which may be integrated in a device such as a mobile terminal, which may include a mobile phone, a wearable smart device, a tablet computer, and/or a laptop computer. And other equipment.
- this embodiment provides a voice recognition method, including: acquiring audio data, and performing fuzzy speech recognition on the audio data by a DSP; when the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in a sleep state, and the CPU performs semantic analysis on the audio data.
- the specific process of the voice recognition method can be as follows:
- the audio data can be collected by a MIC, such as a MIC module built into the mobile terminal.
- the audio data may include data converted from various forms of sound, and the type of the sound is not limited; for example, it may be a voice, an animal sound, the sound of an object, and/or music, and so on.
- there may be multiple ways of performing fuzzy speech recognition; for example, fuzzy cluster analysis may be used to perform speech recognition on the audio data, or a fuzzy matching algorithm may be used to perform speech recognition on the audio data, and so on.
- the step "fuzzy speech recognition of the audio data by the DSP" may be as follows:
- the fuzzy data is used to perform speech recognition on the audio data, and the fuzzy speech recognition result is obtained.
- a fuzzy clustering neural network may be established according to the fuzzy clustering analysis, and the fuzzy clustering neural network is used as an estimator of the probability density function to predict the probability that the audio data includes the wake-up word, if the prediction result indicates a probability greater than or equal to The set value generates a fuzzy speech recognition result indicating that the wake-up word exists, and if the prediction result indicates that the probability is less than the set value, a fuzzy speech recognition result indicating that the wake-up word does not exist is generated.
- fuzzy clustering analysis generally refers to constructing a fuzzy matrix according to the attributes of the research objects themselves and determining the clustering relationship according to a certain degree of membership, that is, using fuzzy relations to quantitatively determine the fuzzy relationships between samples so as to perform clustering objectively and accurately. Clustering divides the data set into multiple classes or clusters, so that the difference between data in different classes is as large as possible and the difference between data within the same class is as small as possible.
- the set value can be set according to the requirements of the actual application, and details are not described herein again.
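For illustration, one simplified way to realize a fuzzy-clustering probability estimate for the wake-up word is the fuzzy c-means membership formula. This is a minimal sketch, assuming the two cluster centroids were learned offline and substituting plain fuzzy c-means membership for the patent's fuzzy clustering neural network; all numbers are illustrative:

```python
import math

# Simplified stand-in for the fuzzy-clustering probability estimate: given
# centroids for a "wake word" cluster and an "other speech" cluster (assumed
# learned offline), compute the fuzzy c-means membership of a feature vector
# in the wake-word cluster and compare it with the set value.

def fcm_membership(x, centroids, m=2.0):
    """Fuzzy c-means membership of x in each cluster (memberships sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    if any(d == 0 for d in dists):                # exactly on a centroid
        return [1.0 if d == 0 else 0.0 for d in dists]
    memberships = []
    for di in dists:
        s = sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
        memberships.append(1.0 / s)
    return memberships

WAKE, OTHER = 0, 1
centroids = [(0.9, 0.8), (0.1, 0.2)]   # illustrative 2-D feature centroids
SET_VALUE = 0.6                         # the "set value" threshold

features = (0.85, 0.75)                 # features extracted from audio data
p_wake = fcm_membership(features, centroids)[WAKE]
result = "wake-up word present" if p_wake >= SET_VALUE else "no wake-up word"
print(result)
```

The membership value plays the role of the predicted probability: at or above the set value, a "wake-up word exists" result is generated; below it, a "no wake-up word" result is generated.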
- alternatively, the DSP may use a fuzzy matching algorithm to perform speech recognition on the audio data to obtain the fuzzy speech recognition result.
- for example, a feature map of the wake-up word pronunciation can be acquired as the standard feature map, and a feature map of each word pronunciation in the audio data can be analyzed to obtain feature maps to be matched; then, according to a preset membership function, the degree value with which each feature map to be matched belongs to the standard feature map is calculated. If the degree value is greater than or equal to the preset value, a fuzzy speech recognition result indicating that the wake-up word exists is generated; otherwise, if the degree value is less than the preset value, a fuzzy speech recognition result indicating that the wake-up word does not exist is generated.
- the membership function and the preset value may be set according to the requirements of the actual application, and are not described herein again.
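A minimal sketch of this matching step, assuming a Gaussian membership function and tiny illustrative feature vectors (neither the function form nor the values are specified by the patent):

```python
import math

# Illustrative sketch of the fuzzy-matching step: each word's pronunciation is
# reduced to a feature vector ("feature map"), and a preset membership function
# scores how strongly a candidate belongs to the standard (wake-word) feature
# map. The Gaussian form and all numbers here are assumptions.

def membership(candidate, standard, sigma=1.0):
    """Gaussian membership of a candidate feature map in the standard one."""
    d2 = sum((a - b) ** 2 for a, b in zip(candidate, standard))
    return math.exp(-d2 / (2.0 * sigma ** 2))

standard_map = [0.2, 0.7, 0.5]          # feature map of wake-word pronunciation
PRESET_VALUE = 0.8

candidates = {                           # feature maps of words in the audio
    "calling": [0.25, 0.68, 0.52],
    "weather": [0.9, 0.1, 0.3],
}
for word, fmap in candidates.items():
    degree = membership(fmap, standard_map)
    verdict = "matches wake-up word" if degree >= PRESET_VALUE else "no match"
    print(f"{word}: degree={degree:.3f} -> {verdict}")
```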
- optionally, to improve the accuracy of recognition, before the fuzzy speech recognition the audio data may also be subjected to filtering processing such as noise reduction and/or echo cancellation, as shown in FIG. 1d.
- the voice recognition method may further include:
- the audio data is subjected to noise reduction and/or echo cancellation processing to obtain processed audio data.
- in this case, the step of performing fuzzy speech recognition on the audio data by the DSP may be: performing fuzzy speech recognition on the processed audio data by the DSP.
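As a placeholder for this preprocessing stage, the following sketch applies a simple moving-average smoother. Real noise reduction and echo cancellation on a DSP (spectral subtraction, adaptive filters) are far more involved; this is only a minimal stand-in to show where such filtering fits before the fuzzy recognition step:

```python
# Toy illustration of the optional preprocessing step; the window size and
# sample values are illustrative assumptions.

def moving_average_denoise(samples, window=3):
    """Smooth out short noise spikes with a simple moving average."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out

raw = [0.0, 0.1, 3.0, 0.1, 0.0, 0.1]   # impulsive noise spike at index 2
processed = moving_average_denoise(raw)
print(processed)
```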
- when the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state; that is, the operating program of the CPU is activated by the DSP, for example, the running programs in the CPU related to recording and audio data may be activated.
- there may be one or more wake-up words, and the wake-up words may be preset according to actual application requirements. For example, taking wake-up words including "calling" and "sending information" as an example, when the fuzzy speech recognition result indicates that the words "calling" or "sending information" exist in the audio data, the CPU can be woken up by the DSP, and so on.
- the voice recognition method may further include:
- the audio data is semantically analyzed by the CPU, and corresponding operations are performed according to the analysis result.
- for example, the operation object and the operation content may be determined according to the analysis result, and the operation content then performed on the operation object, and the like.
- optionally, to improve the accuracy of recognition, the CPU may also further verify the audio data; that is, before the step of "semantic analysis of the audio data by the CPU", the voice recognition method may further include:
- reading data of the wake-up word in the audio data from the DSP to obtain wake-up data, and performing voice recognition on the wake-up data by the CPU; when the voice recognition result indicates that the wake-up word exists, the step of semantic analysis of the audio data by the CPU is performed; when the voice recognition result indicates that there is no wake-up word, the CPU is set to sleep and the process returns to the step of acquiring audio data (i.e., step 101).
- to save energy, the CPU may not open all cores, but instead use a single core at a low frequency for this processing; that is, the step of "performing voice recognition on the wake-up data by the CPU" may include:
- the working state of the CPU is set to a single core and a low frequency, so that the CPU performs voice recognition on the wakeup data in the working state.
- the "single core and low frequency" working state is referred to as a first state, that is, the CPU can perform voice recognition on the wake data in the first state.
- subsequently, when semantic analysis is needed, the number of working cores may be increased and the frequency raised to perform semantic analysis on the audio data; that is, the step of "semantic analysis of the audio data by the CPU" may include:
- the working state of the CPU is set to multi-core and high frequency, and in the working state, the audio data is semantically analyzed by the CPU.
- for convenience of description, the "multi-core and high-frequency" working state is referred to as a second state; that is, the working state of the CPU may be set to the second state, and in the second state the semantic analysis of the audio data is performed.
- multi-core refers to two or more complete computation engines (cores) integrated in the processor; low frequency refers to a main frequency lower than a preset frequency, and high frequency refers to a main frequency higher than or equal to the preset frequency.
- the preset frequency may be determined according to the requirements of the actual application, and details are not described herein again.
- optionally, the semantic analysis of the audio data may include:
- determining a semantic scenario according to the wake-up word corresponding to the audio data, determining the number of working cores and the main frequency of the CPU according to the semantic scenario, and setting the working state of the CPU according to the number of working cores and the main frequency to obtain a third state, in which the audio data is semantically analyzed.
- for example, in some semantic scenarios the audio data can be semantically analyzed with fewer working cores and a lower main frequency, while in the semantic scenario of "searching", more working cores and a higher main frequency can be used.
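The three working states can be pictured with a small lookup table; the scenario names and core/frequency numbers below are purely illustrative assumptions, and on a real device they would map to platform power-management calls:

```python
# Hypothetical sketch of the three CPU working states described above.

FIRST_STATE = {"cores": 1, "freq_mhz": 600}     # verify the wake-up word
SECOND_STATE = {"cores": 4, "freq_mhz": 2000}   # generic semantic analysis

# Third state: chosen per semantic scenario derived from the wake-up word.
SCENARIO_STATES = {
    "calling": {"cores": 2, "freq_mhz": 1200},   # lighter scenario
    "searching": {"cores": 4, "freq_mhz": 2400}, # demanding scenario
}

def state_for(phase, scenario=None):
    if phase == "verify_wake_word":
        return FIRST_STATE                       # single core, low frequency
    if scenario in SCENARIO_STATES:
        return SCENARIO_STATES[scenario]         # third state
    return SECOND_STATE                          # multi-core, high frequency

print(state_for("verify_wake_word"))
print(state_for("semantic_analysis", scenario="searching"))
```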
- from the above, in this embodiment the audio data can be subjected to fuzzy speech recognition by the DSP; when the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state, and the CPU can then perform semantic analysis on the audio data. Because the scheme uses a DSP with lower running power, rather than the higher-power CPU, to monitor the audio data, the CPU does not need to stay awake all the time; it can remain in a dormant state and be woken only when needed. The solution can therefore greatly reduce system power consumption while preserving mobility and the voice wake-up function, thus extending the standby time of the mobile terminal and improving its performance.
- In this embodiment, the voice recognition device being integrated into the mobile terminal is taken as an example for description.
- As shown in FIG. 2a, a voice recognition method may proceed as follows:
- the mobile terminal collects the audio data by using a MIC.
- the MIC may be independent of the mobile terminal or may be built in the mobile terminal.
- the audio data may include data converted from various forms of sound, and the type of the sound is not limited; for example, it may be a voice, an animal sound, a sound made by an object, and/or music, etc.
- If the fuzzy speech recognition result indicates that a wake-up word exists, step 203 is performed; otherwise, if the fuzzy speech recognition result indicates that there is no wake-up word, the process returns to step 201.
- The wake-up words may be one or more, and may be set in advance according to actual application requirements, for example, "calling", "sending information", "who is *", and/or "what is *", etc., where "*" can be any noun, such as "Who is Zhang San", "Who is Li Si", or "What is Java", and so on.
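The wildcard wake-word patterns above can be checked mechanically. The patent does not specify a matching mechanism, so the sketch below is only an illustration: it compiles each "*" pattern into a regular expression (with "*" standing for one or more words) and tests recognized text against the set.

```python
import re

# Hypothetical wake-word patterns; "*" stands for any noun (here: any word run).
WAKE_PATTERNS = ["calling", "sending information", "who is *", "what is *"]

def compile_pattern(pattern):
    """Turn a wake-word pattern into an anchored regex; "*" matches any words."""
    parts = [r".+" if p == "*" else re.escape(p) for p in pattern.split()]
    return re.compile(r"^" + r"\s+".join(parts) + r"$", re.IGNORECASE)

COMPILED = [compile_pattern(p) for p in WAKE_PATTERNS]

def contains_wake_word(text):
    """Return True if the recognized text matches any wake-word pattern."""
    return any(rx.match(text.strip()) for rx in COMPILED)

print(contains_wake_word("Who is Zhang San"))  # True
print(contains_wake_word("good morning"))      # False
```

In a real terminal the comparison would run on acoustic features inside the DSP rather than on decoded text; the regex only illustrates how a small fixed pattern set with a wildcard slot can be evaluated cheaply.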
- the DSP may be disposed in a codec of the mobile terminal (ie, Codec), for example, as shown in FIG. 1a.
- the codec can compress and decompress (ie, encode and decode) the audio data; when the MIC collects the audio data, the audio data is transmitted to the codec for processing, such as compression and/or decompression, and is then transmitted to the DSP for fuzzy speech recognition.
- There may be multiple ways of performing fuzzy speech recognition. For example, fuzzy cluster analysis may be used to perform speech recognition on the audio data, or a fuzzy matching algorithm may be used to perform speech recognition on the audio data, etc. Specifically, the process can be as follows:
- the mobile terminal uses the fuzzy clustering analysis to perform speech recognition on the audio data through the DSP, and obtains the fuzzy speech recognition result.
- the DSP may specifically establish a fuzzy clustering neural network according to fuzzy clustering analysis, and then use the fuzzy clustering neural network as an estimator of the probability density function to predict the probability that the audio data includes a wake-up word. If the predicted probability is greater than or equal to a set value, a fuzzy speech recognition result indicating that the wake-up word exists is generated; otherwise, if the predicted probability is less than the set value, a fuzzy speech recognition result indicating that the wake-up word does not exist is generated.
- the set value can be set according to the requirements of the actual application, and details are not described herein again.
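The decision rule above reduces to comparing an estimated probability against the set value. The fuzzy clustering neural network itself is not specified in the source, so the sketch below uses a trivial stand-in estimator purely to show the thresholding logic:

```python
SET_VALUE = 0.8  # hypothetical threshold (the "set value" in the text)

def estimate_wake_probability(audio_features):
    """Stand-in for the fuzzy clustering neural network acting as a
    probability-density estimator; here, just a clipped mean score."""
    score = sum(audio_features) / max(len(audio_features), 1)
    return min(max(score, 0.0), 1.0)

def fuzzy_recognition_result(audio_features, set_value=SET_VALUE):
    """Return a result indicating whether a wake-up word is deemed present."""
    prob = estimate_wake_probability(audio_features)
    return {"wake_word_present": prob >= set_value, "probability": prob}

print(fuzzy_recognition_result([0.9, 0.85, 0.95]))  # wake_word_present: True
```

The point of the design is that this comparison is cheap enough to run continuously on the DSP; only a positive result triggers the CPU wake-up path.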
- the mobile terminal uses the fuzzy matching algorithm to perform speech recognition on the audio data through the DSP, and obtains the fuzzy speech recognition result.
- the DSP may obtain a feature map of the wake-up word pronunciation as a standard feature map, analyze the feature map of each word pronunciation in the audio data to obtain feature maps to be matched, and then calculate, according to a preset membership function, the degree value to which each to-be-matched feature map belongs to the standard feature map. If the degree value is greater than or equal to a preset value, a fuzzy speech recognition result indicating that the wake-up word exists is generated; otherwise, if the degree value is less than the preset value, a fuzzy speech recognition result indicating that no wake-up word exists is generated.
- the membership function and the preset value may be set according to the requirements of the actual application.
- the degree to which the to-be-matched feature map belongs to the standard feature map may also be represented by the membership degree: the closer the membership degree is to 1, the higher the degree to which the to-be-matched feature map belongs to the standard feature map; the closer the membership degree is to 0, the lower that degree. Details are not described herein again.
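The membership computation above can be sketched concretely. The patent only requires *some* preset membership function mapping into [0, 1]; a Gaussian-style function over feature-vector distance is an assumed choice here, used to show how the degree value and the preset threshold interact:

```python
import math

PRESET_VALUE = 0.7  # hypothetical preset degree threshold

def membership_degree(candidate, standard, width=1.0):
    """Gaussian-style membership function (an assumed choice): returns a value
    in (0, 1], closer to 1 the nearer the candidate feature vector is to the
    standard feature map."""
    dist2 = sum((c - s) ** 2 for c, s in zip(candidate, standard))
    return math.exp(-dist2 / (2 * width ** 2))

def wake_word_present(candidates, standard, preset=PRESET_VALUE):
    """A wake-up word is deemed present if any per-word feature vector's
    membership degree w.r.t. the standard feature map reaches the preset."""
    return any(membership_degree(c, standard) >= preset for c in candidates)

standard = [0.5, 0.8, 0.3]
print(wake_word_present([[0.1, 0.1, 0.9], [0.5, 0.79, 0.31]], standard))  # True
```

Here the second candidate is nearly identical to the standard, so its membership degree approaches 1 and clears the preset, matching the "closer to 1, higher degree" behavior described in the text.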
- Optionally, the audio data may be subjected to filtering processing such as noise reduction and/or echo cancellation; that is, as shown in FIG. 2b, the speech recognition method may further include:
- the mobile terminal performs noise reduction and/or echo cancellation processing on the audio data to obtain processed audio data.
- the step “the mobile terminal performs fuzzy speech recognition on the audio data through the DSP” may specifically be: the mobile terminal performs fuzzy speech recognition on the processed audio data through the DSP.
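The source does not specify how the noise reduction or echo cancellation is performed (real implementations are DSP- or hardware-specific). As a minimal, heavily simplified illustration of pre-filtering before recognition, the sketch below applies an energy-based noise gate that zeroes samples below an assumed noise floor:

```python
def noise_gate(samples, threshold=0.05):
    """Minimal pre-filtering sketch (not the patent's method): zero out
    samples whose magnitude falls below a noise-floor threshold, passing
    louder speech content through unchanged."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]

raw = [0.01, -0.02, 0.6, -0.4, 0.03]
print(noise_gate(raw))  # [0.0, 0.0, 0.6, -0.4, 0.0]
```

Production systems would instead use spectral noise suppression and adaptive-filter echo cancellation, but the shape of the pipeline is the same: filter first, then hand the processed audio to the DSP for fuzzy recognition.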
- When the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state. For example, the DSP may activate the CPU's running programs, such as those related to recording and audio data processing, so that the CPU is woken up by the DSP, and so on.
- the mobile terminal reads data of the wake-up word in the audio data through the DSP, and obtains wake-up data.
- For example, if the wake-up word is located in segment A of the audio data, the mobile terminal can read segment A and use it as the wake-up data; if the wake-up word is located in segment B, the mobile terminal can read segment B and use it as the wake-up data, and so on.
- the mobile terminal performs voice recognition on the wakeup data by using the CPU.
- If the voice recognition result indicates that a wake-up word exists, step 206 is performed; otherwise, when the voice recognition result indicates that there is no wake-up word, the CPU is set to sleep, and the process returns to the step of acquiring audio data (ie, step 201).
- the DSP can be specifically notified to perform an operation of performing speech recognition on the audio data, see FIG. 2b.
- the CPU may not enable all cores, but instead use a single core and a low frequency for processing; that is, the step "speech recognition of the wakeup data by the CPU" may include:
- the working state of the CPU is set to a first state, that is, set to a single core and low frequency, so that the CPU performs voice recognition on the wakeup data in the first state.
- Steps 204 and 205 are optional steps.
- the mobile terminal performs semantic analysis on the audio data by using a CPU.
- the working state of the CPU may be set to the second state, that is, set to multi-core and high frequency, and in the second state, the audio data is semantically analyzed by the CPU.
- Optionally, in order to better balance power consumption and processing efficiency, the working core count and primary frequency of the CPU can also be adjusted according to the specific semantic scenario. For example, the mobile terminal can determine the semantic scenario according to the wake-up word corresponding to the audio data, then determine the working core count and primary frequency of the CPU according to the semantic scenario, set the working state of the CPU according to that working core count and primary frequency (ie, a third state), and, in this working state, perform semantic analysis on the audio data.
- For example, suppose that in the semantic scenario corresponding to "calling", the working core of the CPU needs to be a single core and the primary frequency is X MHz; in the semantic scenario corresponding to "sending information", the working core of the CPU needs to be a single core and the primary frequency is Y MHz; and in the semantic scenario corresponding to "search", the working cores of the CPU need to be dual-core and the primary frequency is Z MHz. The specifics can then be as follows:
- If the semantic scenario is "calling", the working core count of the CPU can be set to a single core and the primary frequency to X MHz; then, in this working state, the audio data is semantically analyzed by the CPU.
- If the semantic scenario is "sending information", the working core count of the CPU can be set to a single core and the primary frequency to Y MHz; then, in this working state, the audio data is semantically analyzed by the CPU.
- If the semantic scenario is "search", the working core count of the CPU can be set to dual-core and the primary frequency to Z MHz; then, in this working state, the audio data is semantically analyzed by the CPU.
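The scenario-to-state mapping above is a small lookup. The sketch below is only illustrative: X/Y/Z are placeholder frequencies in the text, so the numeric values here are assumptions, and the scenario detection from the wake-up word is a naive substring match:

```python
# Assumed example values for the X/Y/Z MHz placeholders in the text.
X_MHZ, Y_MHZ, Z_MHZ = 600, 800, 1200

SCENARIO_STATES = {
    "calling":             {"cores": 1, "freq_mhz": X_MHZ},
    "sending information": {"cores": 1, "freq_mhz": Y_MHZ},
    "search":              {"cores": 2, "freq_mhz": Z_MHZ},
}

def working_state_for(wake_word):
    """Derive the semantic scenario from the wake-up word and return the CPU
    working state (the 'third state') to use for semantic analysis."""
    for scenario, state in SCENARIO_STATES.items():
        if scenario in wake_word.lower():
            return state
    # Unknown scenario: conservatively assume the heavier state.
    return {"cores": 2, "freq_mhz": Z_MHZ}

print(working_state_for("calling"))  # {'cores': 1, 'freq_mhz': 600}
```

The design point is that light intents ("calling") run on one low-frequency core, while heavier intents ("search") justify more cores at a higher frequency, balancing power consumption against processing efficiency as the text describes.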
- Subsequently, the mobile terminal can continue to collect other audio data through the MIC, perform semantic analysis with the awake CPU, and perform corresponding operations according to the analysis result. For the manner of semantic analysis and of performing the corresponding operations, refer to steps 206 and 207; details are not described herein again.
- the mobile terminal performs a corresponding operation according to the analysis result.
- the operation object and the operation content may be determined according to the analysis result, and then the operation content is performed on the operation object by the CPU, and the like.
- For example, if the analysis result is "calling Zhang San", the mobile terminal can determine that the operation object is "the telephone number of Zhang San in the address book" and that the operation content is "dialing the telephone number", so that the CPU can dial Zhang San's number from the address book, completing the task of "calling Zhang San".
- For another example, if the analysis result is "search poetry", the mobile terminal can determine that the operation object is the "search engine application" and that the operation content is "searching for the keyword 'poetry' through the search engine application", so that the mobile terminal can launch the search engine application and search for the keyword "poetry", completing the task of "searching poetry", and so on.
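The two examples above share one shape: the analysis result is mapped to an operation object and operation content, which the CPU then executes. A hedged sketch of that dispatch step (the intent strings and return format are illustrative, not from the source):

```python
def plan_operation(analysis_result):
    """Map a semantic analysis result to (operation object, operation content).
    The recognized intents here are only the two examples from the text."""
    if analysis_result.startswith("calling "):
        contact = analysis_result[len("calling "):]
        return (f"phone number of {contact} in the address book",
                "dial the phone number")
    if analysis_result.startswith("search "):
        keyword = analysis_result[len("search "):]
        return ("search engine application",
                f'search for the keyword "{keyword}"')
    return (None, None)  # unrecognized result: no operation planned

obj, content = plan_operation("calling Zhang San")
print(obj)      # phone number of Zhang San in the address book
print(content)  # dial the phone number
```

Separating "what to operate on" from "what to do with it" keeps the execution step generic: the same dispatcher can drive dialing, messaging, or search without the semantic analyzer knowing how each application is invoked.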
- As can be seen from the above, after the mobile terminal of this embodiment acquires the audio data, the audio data can be subjected to fuzzy speech recognition by the DSP. When the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state, and the CPU, in a single-core and low-frequency working state, confirms again whether the wake-up word is present. If the CPU determines that there is no wake-up word, the CPU switches back to the sleep state and the DSP continues to listen; only when the CPU determines that a wake-up word is present does the CPU perform semantic analysis on the audio data.
- Because the scheme uses a DSP with lower running power, instead of keeping the higher-power CPU running, to monitor the audio data, the CPU does not need to be awake all the time but can remain in a dormant state and be woken up when needed. Therefore, compared with existing solutions that can only be woken by an external power supply or through physical buttons, this solution can greatly reduce system power consumption while retaining mobility and the voice wake-up function, thereby prolonging the standby time of the mobile terminal and improving its performance.
- In addition, because the wake-up words recognized by the DSP can be recognized again by the CPU, the recognition accuracy is high. Moreover, because the CPU recognizes the wake-up words in a low-power working state (such as single-core and low-frequency) and only switches to a higher-power working state for semantic analysis once the wake-up word is confirmed, resources are utilized more reasonably and effectively, which is beneficial to further improving the performance of the mobile terminal.
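The two-stage flow just summarized can be sketched end to end as a small decision function. All component checks below are stand-ins (trivial substring tests) — the point is the control flow: the DSP's coarse check gates the CPU wake-up, and the CPU's precise check gates the expensive semantic analysis:

```python
def dsp_fuzzy_check(audio):
    # Stand-in for DSP fuzzy recognition: deliberately permissive.
    return "call" in audio

def cpu_precise_check(audio):
    # Stand-in for CPU single-core, low-frequency verification: stricter.
    return "calling" in audio

def handle_audio(audio):
    """Return the terminal's action for one chunk of monitored audio."""
    if not dsp_fuzzy_check(audio):
        return "dsp_keeps_listening"   # CPU stays asleep
    if not cpu_precise_check(audio):
        return "cpu_back_to_sleep"     # false alarm caught by the CPU
    return "cpu_semantic_analysis"     # confirmed: full-power analysis

print(handle_audio("background chatter"))  # dsp_keeps_listening
print(handle_audio("recall that song"))    # cpu_back_to_sleep
print(handle_audio("calling Zhang San"))   # cpu_semantic_analysis
```

The second case shows why the two stages complement each other: a permissive low-power detector occasionally fires on near-misses, and the stricter CPU pass rejects them before any high-power state is entered.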
- the embodiment of the present invention further provides a voice recognition device, which may be integrated in a mobile terminal, such as a mobile phone, a wearable smart device, a tablet computer, and/or a laptop computer.
- the voice recognition apparatus may include an acquisition unit 301, a blur recognition unit 302, and a wakeup unit 303, as follows:
- the obtaining unit 301 is configured to acquire audio data.
- the obtaining unit 301 can be specifically configured to collect the audio data by using a MIC, such as a MIC module built in the mobile terminal.
- the fuzzy identification unit 302 is configured to perform fuzzy speech recognition on the audio data by using a DSP.
- There may be multiple ways of performing fuzzy speech recognition. For example, fuzzy cluster analysis may be used to perform speech recognition on the audio data, or a fuzzy matching algorithm may be used to perform speech recognition on the audio data, etc.:
- the fuzzy identification unit 302 can be specifically configured to perform voice recognition on the audio data by using a fuzzy cluster analysis to obtain a fuzzy speech recognition result.
- for example, the fuzzy identification unit 302 may be specifically configured to establish a fuzzy clustering neural network according to fuzzy clustering analysis, use the fuzzy clustering neural network as an estimator of the probability density function to predict the probability that the audio data includes a wake-up word, generate a fuzzy speech recognition result indicating that the wake-up word exists if the predicted probability is greater than or equal to the set value, and generate a fuzzy speech recognition result indicating that the wake-up word does not exist if the predicted probability is less than the set value.
- the set value can be set according to the requirements of the actual application, and details are not described herein again.
- the fuzzy identification unit 302 is specifically configured to perform voice recognition on the audio data by using a fuzzy matching algorithm, and obtain a fuzzy speech recognition result.
- for example, the fuzzy identification unit 302 may be specifically configured to obtain a feature map of the wake-up word pronunciation as a standard feature map, analyze the feature map of each word pronunciation in the audio data to obtain feature maps to be matched, calculate, according to a preset membership function, the degree value to which each to-be-matched feature map belongs to the standard feature map, generate a fuzzy speech recognition result indicating that the wake-up word exists if the degree value is greater than or equal to the preset value, and generate a fuzzy speech recognition result indicating that no wake-up word exists if the degree value is less than the preset value.
- the membership function and the preset value may be set according to the requirements of the actual application, and are not described herein again.
- the speech recognition apparatus can further include a processing unit 304, as shown in FIG. 3b:
- the processing unit 304 is configured to perform semantic analysis on the audio data by using a CPU, and perform a corresponding operation according to the analysis result.
- the processing unit 304 may be specifically configured to perform semantic analysis on the audio data by using a CPU, and determine an operation object and an operation content according to the analysis result, and then execute the operation content on the operation object, and the like.
- Optionally, the audio data may be subjected to filtering processing such as noise reduction and/or echo cancellation; that is, as shown in FIG. 3c, the voice recognition device may further include a filtering unit 305, as follows:
- the filtering unit 305 can be configured to perform noise reduction and/or echo cancellation processing on the audio data.
- the fuzzy identification unit 302 can be used to perform fuzzy speech recognition on the audio data processed by the filtering unit 305.
- the waking unit 303 can be configured to wake up the CPU in the sleep state when the fuzzy speech recognition result indicates that the wake-up word exists.
- the wake-up word may be one or more, and the wake-up word may be set in advance according to the requirements of the actual application, and details are not described herein again.
- Optionally, to improve accuracy, the audio data may be further recognized before the processing unit 304 performs semantic analysis on the audio data by the CPU; that is, as shown in FIG. 3c, the speech recognition device may further include a precise identification unit 306, as follows:
- the precise identification unit 306 can be configured to read the data of the wake-up word in the audio data from the DSP to obtain wake-up data, perform voice recognition on the wake-up data by the CPU, trigger the processing unit 304 to perform the operation of semantic analysis on the audio data by the CPU when the voice recognition result indicates that a wake-up word exists, and set the CPU to sleep and trigger the acquisition unit to perform the operation of acquiring audio data when the voice recognition result indicates that there is no wake-up word.
- Optionally, when the CPU is woken up, not all cores of the CPU need to be enabled; a single core and a low frequency may be used for processing, that is:
- the precise identification unit 306 can be specifically configured to set the working state of the CPU to a first state, in which the wake-up data is voice-recognized, wherein the first state is a single core and a low frequency.
- Optionally, for semantic analysis, the number of cores may be increased and the frequency raised, namely:
- the processing unit 304 is specifically configured to set an operating state of the CPU to a second state, where the audio data is semantically analyzed, wherein the second state is a multi-core and a high frequency.
- the processing unit 304 is specifically configured to determine a semantic scenario according to the wake-up word corresponding to the audio data, determine a working core number and a primary frequency of the CPU according to the semantic scenario, and perform an operating state of the CPU according to the working core number and the primary frequency. Setting, a third state is obtained, in which the audio data is semantically analyzed.
- In specific implementation, the foregoing units may each be implemented as a separate entity, or may be combined arbitrarily and implemented as one or more entities. For the specific implementation of the foregoing units, refer to the foregoing method embodiments; details are not described herein again.
- As can be seen from the above, the voice recognition device of this embodiment can, after the acquisition unit 301 acquires the audio data, perform fuzzy voice recognition on the audio data through the fuzzy recognition unit 302 and, when it is determined that a wake-up word exists, wake up the CPU in the sleep state through the wake-up unit 303, the CPU then being used for semantic analysis of the audio data. Because the scheme uses a DSP with lower running power, instead of the CPU with higher power consumption, to monitor the audio data, the CPU does not need to be awake all the time but can remain in a dormant state and be woken up when needed. The solution can therefore greatly reduce system power consumption while preserving mobility and the voice wake-up function, thus extending the standby time of the mobile terminal and improving its performance.
- the embodiment of the present invention further provides a mobile terminal.
- As shown in FIG. 4, the mobile terminal may include a radio frequency (RF) circuit 401, a memory 402 including one or more computer-readable storage media, an input unit 403, a display unit 404, a sensor 405, an audio circuit 406, a Wireless Fidelity (WiFi) module 407, a processor 408 including one or more processing cores, a power source 409, and the like.
- It will be understood by those skilled in the art that the mobile terminal structure shown in FIG. 4 does not constitute a limitation of the mobile terminal, which may include more or fewer components than those illustrated, combine certain components, or use a different component arrangement. The components are described as follows:
- the RF circuit 401 can be used for receiving and transmitting signals while sending and receiving information or during a call. Specifically, after receiving downlink information from a base station, the RF circuit hands the information to one or more processors 408 for processing; in addition, it sends uplink-related data to the base station.
- Generally, the RF circuit 401 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 401 can also communicate with the network and other devices through wireless communication.
- the wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and the like.
- the memory 402 can be used to store software programs and modules, and the processor 408 executes various functional applications and data processing by running software programs and modules stored in the memory 402.
- the memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the storage data area may store data created according to the use of the mobile terminal (such as audio data, a phone book, etc.).
- memory 402 can include high-speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, memory 402 may also include a memory controller to provide access to memory 402 by the processor 408 and the input unit 403.
- Input unit 403 can be used to receive input numeric or character information, as well as to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function controls.
- input unit 403 can include a touch-sensitive surface as well as other input devices.
- A touch-sensitive surface, also known as a touch screen or trackpad, collects the user's touch operations on or near it (such as operations performed by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or accessory), and drives the corresponding connecting device according to a preset program.
- the touch sensitive surface may include two parts of a touch detection device and a touch controller.
- the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 408, and can also receive commands from the processor 408 and execute them.
- touch-sensitive surfaces can be implemented in a variety of types, including resistive, capacitive, infrared, and surface acoustic waves.
- the input unit 403 can also include other input devices. Specifically, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackballs, mice, joysticks, and the like.
- Display unit 404 can be used to display information entered by the user or information provided to the user as well as various graphical user interfaces of the mobile terminal, which can be composed of graphics, text, icons, video, and any combination thereof.
- the display unit 404 can include a display panel.
- the display panel can be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
- the touch-sensitive surface can cover the display panel; when the touch-sensitive surface detects a touch operation on or near it, the operation is transmitted to the processor 408 to determine the type of the touch event, after which the processor 408 provides a corresponding visual output on the display panel according to the type of the touch event.
- Although in FIG. 4 the touch-sensitive surface and the display panel are implemented as two separate components to perform input and output functions, in some embodiments the touch-sensitive surface can be integrated with the display panel to implement the input and output functions.
- the mobile terminal may also include at least one type of sensor 405, such as a light sensor, motion sensor, and other sensors.
- the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the mobile terminal moves close to the ear.
- As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (usually three axes) and, when stationary, can detect the magnitude and direction of gravity. It can be used for applications that recognize the attitude of the mobile phone (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition-related functions (such as a pedometer or tapping). As for the gyroscope, barometer, hygrometer, thermometer, infrared sensor, and other sensors that can also be configured in the mobile terminal, details are not described herein again.
- the audio circuit 406, the speaker, and the microphone provide an audio interface between the user and the mobile terminal.
- the audio circuit 406 can transmit the electrical signal converted from received audio data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which is received by the audio circuit 406 and converted into audio data. After the audio data is processed by the processor 408, it is transmitted via the RF circuit 401 to, for example, another mobile terminal, or output to the memory 402 for further processing.
- the audio circuit 406 may also include an earbud jack to provide communication between the peripheral earphone and the mobile terminal.
- WiFi is a short-range wireless transmission technology.
- the mobile terminal can help users to send and receive emails, browse web pages, and access streaming media through the WiFi module 407, which provides wireless broadband Internet access for users.
- Although FIG. 4 shows the WiFi module 407, it can be understood that it is not an essential part of the mobile terminal and may be omitted as needed without changing the essence of the invention.
- the processor 408 is the control center of the mobile terminal, connecting the various parts of the entire terminal using various interfaces and lines. By running or executing the software programs and/or modules stored in the memory 402 and recalling the data stored in the memory 402, it performs the various functions of the mobile terminal and processes data, thereby monitoring the mobile terminal as a whole.
- the processor 408 may include one or more processing cores; preferably, the processor 408 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
- the modem processor primarily handles wireless communications. It will be appreciated that the above described modem processor may also not be integrated into the processor 408.
- the mobile terminal also includes a power source 409 (such as a battery) for powering various components.
- the power source can be logically coupled to the processor 408 through a power management system to manage functions such as charging, discharging, and power management through the power management system.
- the power supply 409 may also include any one or more of a DC or AC power source, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
- the mobile terminal may further include a camera, a Bluetooth module, and the like, and details are not described herein again.
- In this embodiment, the processor 408 in the mobile terminal loads the executable files corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and the processor 408 runs the applications stored in the memory 402, thereby implementing various functions:
- the audio data is obtained, and the audio data is subjected to fuzzy speech recognition by the DSP.
- the DSP wakes up the CPU in the sleep state, and the CPU performs semantic analysis on the audio data.
- After the CPU is woken up, the CPU can perform semantic analysis on the audio data and perform corresponding operations according to the analysis result.
- the fuzzy clustering analysis or the fuzzy matching algorithm may be used to perform voice recognition on the audio data, and so on.
- Optionally, before the fuzzy speech recognition, the audio data may be subjected to filtering processing such as noise reduction and/or echo cancellation; that is, the processor 408 may also run the applications stored in the memory 402 to implement the following function:
- the audio data is subjected to noise reduction and/or echo cancellation processing to obtain processed audio data.
- Optionally, to improve accuracy, the audio data may be further recognized by the CPU before the semantic analysis of the audio data by the CPU; that is, the processor 408 may also run the applications stored in the memory 402 to implement the following functions: reading the data of the wake-up word in the audio data to obtain wake-up data; performing voice recognition on the wake-up data by the CPU; when the voice recognition result indicates that a wake-up word exists, performing the operation of semantic analysis on the audio data by the CPU; otherwise, when the voice recognition result indicates that there is no wake-up word, setting the CPU to sleep and returning to the operation of acquiring audio data.
- As can be seen from the above, after acquiring the audio data, the mobile terminal of this embodiment can perform fuzzy speech recognition on the audio data through the DSP; when the fuzzy speech recognition result indicates that a wake-up word exists, the DSP wakes up the CPU in the sleep state, and the CPU can then be used for semantic analysis of the audio data. Because the scheme uses a DSP with lower running power, instead of the CPU with higher power consumption, to monitor the audio data, the CPU does not need to be awake all the time but can remain in a dormant state and be woken up when needed. The solution can therefore greatly reduce system power consumption while preserving mobility and the voice wake-up function, thus extending the standby time of the mobile terminal and improving its performance.
- an embodiment of the present invention provides a storage medium in which a plurality of instructions are stored, which can be loaded by a processor to perform the steps in any of the voice recognition methods provided by the embodiments of the present invention.
- the instruction can perform the following steps:
- the audio data is obtained, and the audio data is subjected to fuzzy speech recognition by the DSP.
- the DSP wakes up the CPU in the sleep state, and the CPU performs semantic analysis on the audio data.
- After the CPU is woken up, the CPU can perform semantic analysis on the audio data and perform corresponding operations according to the analysis result.
- when performing fuzzy speech recognition, fuzzy clustering analysis or a fuzzy matching algorithm may be used to recognize the audio data, and so on.
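A minimal sketch of the fuzzy-matching variant follows. It assumes the feature maps are reduced to plain feature vectors and that the preset membership function is Gaussian over Euclidean distance; both are assumptions for illustration, since the document does not fix a particular membership function.

```python
import math

# Assumed preset value for the degree threshold (illustrative).
PRESET_DEGREE = 0.6

def membership(candidate, standard, sigma=1.0):
    """Degree to which `candidate` belongs to the standard feature map,
    via a Gaussian membership function over Euclidean distance."""
    dist = math.sqrt(sum((c - s) ** 2 for c, s in zip(candidate, standard)))
    return math.exp(-(dist ** 2) / (2 * sigma ** 2))

def fuzzy_match(word_features, standard_features):
    """True if any word's degree value reaches the preset value, i.e. a
    fuzzy speech recognition result indicating a wake-up word is present."""
    return any(membership(f, standard_features) >= PRESET_DEGREE
               for f in word_features)
```

An exact match yields a degree value of 1.0; the threshold trades false wake-ups against missed wake-words, which is acceptable here because the CPU re-verifies after waking.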
- the audio data may be subjected to filtering processing such as noise reduction and/or echo cancellation; that is, the instructions may also perform the following step:
- the audio data is subjected to noise reduction and/or echo cancellation processing to obtain processed audio data.
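As one illustration of this filtering step, a minimal energy-gate sketch is shown below. The patent does not specify an algorithm; a real implementation would more likely use spectral subtraction or an adaptive echo-cancellation filter, so the gate and its noise floor here are assumptions.

```python
def noise_gate(samples, noise_floor=0.05):
    """Zeroes samples whose magnitude falls below an assumed noise floor,
    a crude stand-in for the noise-reduction processing."""
    return [s if abs(s) >= noise_floor else 0.0 for s in samples]
```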
- in addition, the audio data may be further verified by the CPU before the CPU performs semantic analysis on it; that is, the instructions may also perform the following steps:
- otherwise, when the speech recognition result indicates that there is no wake-up word, the CPU is set to sleep, and the operation returns to acquiring audio data.
- the storage medium may include: a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Quality & Reliability (AREA)
- Probability & Statistics with Applications (AREA)
- Telephone Function (AREA)
Abstract
Description
Claims (17)
- A speech recognition method, comprising: acquiring audio data; performing fuzzy speech recognition on the audio data by a digital signal processor; and, when the fuzzy speech recognition result indicates that a wake-up word is present, waking up, by the digital signal processor, a central processing unit in a sleep state, the central processing unit being configured to perform semantic analysis on the audio data.
- The method according to claim 1, wherein performing fuzzy speech recognition on the audio data by the digital signal processor comprises: performing, by the digital signal processor, speech recognition on the audio data using fuzzy clustering analysis to obtain a fuzzy speech recognition result.
- The method according to claim 2, wherein performing, by the digital signal processor, speech recognition on the audio data using fuzzy clustering analysis to obtain a fuzzy speech recognition result comprises: building a fuzzy clustering neural network based on fuzzy clustering analysis; using the fuzzy clustering neural network as an estimator of a probability density function to predict the probability that the audio data contains a wake-up word; if the prediction indicates that the probability is greater than or equal to a set value, generating a fuzzy speech recognition result indicating that a wake-up word is present; and if the prediction indicates that the probability is less than the set value, generating a fuzzy speech recognition result indicating that no wake-up word is present.
- The method according to claim 1, wherein performing fuzzy speech recognition on the audio data by the digital signal processor comprises: performing, by the digital signal processor, speech recognition on the audio data using a fuzzy matching algorithm to obtain a fuzzy speech recognition result.
- The method according to claim 4, wherein performing, by the digital signal processor, speech recognition on the audio data using a fuzzy matching algorithm to obtain a fuzzy speech recognition result comprises: acquiring a feature map of the pronunciation of the wake-up word to obtain a standard feature map; analyzing the feature map of the pronunciation of each word in the audio data to obtain feature maps to be matched; calculating, according to a preset membership function, a degree value of each feature map to be matched belonging to the standard feature map; if the degree value is greater than or equal to a preset value, generating a fuzzy speech recognition result indicating that a wake-up word is present; and if the degree value is less than the preset value, generating a fuzzy speech recognition result indicating that no wake-up word is present.
- The method according to claim 1, further comprising, after waking up, by the digital signal processor, the central processing unit in the sleep state: performing, by the central processing unit, semantic analysis on the audio data, and executing an operation corresponding to the analysis result.
- The method according to claim 6, further comprising, before performing, by the central processing unit, semantic analysis on the audio data: reading, from the digital signal processor, the data containing the wake-up word in the audio data to obtain wake-up data; performing speech recognition on the wake-up data by the central processing unit; when the speech recognition result indicates that the wake-up word is present, performing the step of semantic analysis on the audio data by the central processing unit; and when the speech recognition result indicates that no wake-up word is present, setting the central processing unit to sleep and returning to the step of acquiring audio data.
- The method according to claim 7, wherein performing speech recognition on the wake-up data by the central processing unit comprises: setting the working state of the central processing unit to a first state, the first state being single-core and low-frequency; and performing speech recognition on the wake-up data in the first state.
- The method according to any one of claims 6 to 8, wherein performing, by the central processing unit, semantic analysis on the audio data comprises: setting the working state of the central processing unit to a second state, the second state being multi-core and high-frequency; and performing semantic analysis on the audio data in the second state.
- The method according to any one of claims 6 to 8, wherein performing, by the central processing unit, semantic analysis on the audio data comprises: determining a semantic scene according to the wake-up word corresponding to the audio data; determining the number of working cores and the clock frequency of the central processing unit according to the semantic scene; setting the working state of the central processing unit according to the number of working cores and the clock frequency to obtain a third state; and performing semantic analysis on the audio data in the third state.
- The method according to any one of claims 1 to 8, further comprising, before performing fuzzy speech recognition on the audio data by the digital signal processor: performing noise reduction and/or echo cancellation processing on the audio data.
- The method according to any one of claims 6 to 8, wherein executing the corresponding operation according to the analysis result comprises: determining an operation object and operation content according to the analysis result; and executing the operation content on the operation object.
- A speech recognition apparatus, comprising: an acquiring unit configured to acquire audio data; a fuzzy recognition unit configured to perform fuzzy speech recognition on the audio data by a digital signal processor; and a wake-up unit configured to, when the fuzzy speech recognition result indicates that a wake-up word is present, wake up a central processing unit in a sleep state, the central processing unit being configured to perform semantic analysis on the audio data.
- The apparatus according to claim 13, further comprising a processing unit configured to perform, by the central processing unit, semantic analysis on the audio data and to execute a corresponding operation according to the analysis result.
- The apparatus according to claim 13, further comprising a precise recognition unit configured to: read, from the digital signal processor, the data containing the wake-up word in the audio data to obtain wake-up data; perform speech recognition on the wake-up data by the central processing unit; when the speech recognition result indicates that the wake-up word is present, trigger the processing unit to perform semantic analysis on the audio data by the central processing unit; and when the speech recognition result indicates that no wake-up word is present, set the central processing unit to sleep and trigger the acquiring unit to acquire audio data.
- The apparatus according to any one of claims 13 to 15, wherein the processing unit is specifically configured to determine a semantic scene according to the wake-up word corresponding to the audio data, determine the number of working cores and the clock frequency of the central processing unit according to the semantic scene, set the working state of the central processing unit according to the number of working cores and the clock frequency to obtain a third state, and perform semantic analysis on the audio data in the third state.
- A storage medium storing a plurality of instructions, the instructions being suitable for loading by a processor to perform the steps in the speech recognition method according to any one of claims 1 to 12.
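Claims 8 through 10 describe choosing the CPU's working state (number of cores and clock frequency) from the semantic scene implied by the wake-up word. A minimal sketch of that selection follows; the wake words, scene names, and the core/frequency table are purely illustrative assumptions.

```python
# Hypothetical wake word -> (semantic scene, working cores, clock tier) table.
SCENES = {
    "navigate": ("navigation", 4, "high"),
    "play":     ("media",      2, "mid"),
    "call":     ("telephony",  1, "low"),
}

def working_state(wake_word):
    """Claim 10's idea: derive the semantic scene from the wake-up word,
    then set core count and frequency accordingly (the 'third state')."""
    scene, cores, freq = SCENES.get(wake_word, ("default", 1, "low"))
    return {"scene": scene, "cores": cores, "frequency": freq}
```

The design point is that a lightweight scene (for example telephony) need not pay for the multi-core, high-frequency second state of claim 9; the third state is tailored per scene.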
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020502569A JP6949195B2 (ja) | 2017-07-19 | 2018-06-20 | 音声認識方法及び装置、並びに記憶媒体 |
KR1020207004025A KR102354275B1 (ko) | 2017-07-19 | 2018-06-20 | 음성 인식 방법 및 장치, 그리고 저장 매체 |
US16/743,150 US11244672B2 (en) | 2017-07-19 | 2020-01-15 | Speech recognition method and apparatus, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710588382.8 | 2017-07-19 | ||
CN201710588382.8A CN107360327B (zh) | 2017-07-19 | 2017-07-19 | 语音识别方法、装置和存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/743,150 Continuation US11244672B2 (en) | 2017-07-19 | 2020-01-15 | Speech recognition method and apparatus, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019015435A1 true WO2019015435A1 (zh) | 2019-01-24 |
Family
ID=60285244
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/091926 WO2019015435A1 (zh) | 2017-07-19 | 2018-06-20 | 语音识别方法、装置和存储介质 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11244672B2 (zh) |
JP (1) | JP6949195B2 (zh) |
KR (1) | KR102354275B1 (zh) |
CN (1) | CN107360327B (zh) |
WO (1) | WO2019015435A1 (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175016A (zh) * | 2019-05-29 | 2019-08-27 | 英业达科技有限公司 | 启动语音助理的方法及具有语音助理的电子装置 |
EP3846162A1 (en) * | 2020-01-03 | 2021-07-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Smart audio device, calling method for audio device, electronic device and computer readable medium |
CN113223510A (zh) * | 2020-01-21 | 2021-08-06 | 青岛海尔电冰箱有限公司 | 冰箱及其设备语音交互方法、计算机可读存储介质 |
EP3851952A3 (en) * | 2020-03-12 | 2021-08-25 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Signal processing method, signal processing device, and electronic device |
CN117672200A (zh) * | 2024-02-02 | 2024-03-08 | 天津市爱德科技发展有限公司 | 一种物联网设备的控制方法、设备及系统 |
Families Citing this family (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107360327B (zh) * | 2017-07-19 | 2021-05-07 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置和存储介质 |
CN108337362A (zh) | 2017-12-26 | 2018-07-27 | 百度在线网络技术(北京)有限公司 | 语音交互方法、装置、设备和存储介质 |
CN110164426B (zh) * | 2018-02-10 | 2021-10-26 | 佛山市顺德区美的电热电器制造有限公司 | 语音控制方法和计算机存储介质 |
CN108831477B (zh) * | 2018-06-14 | 2021-07-09 | 出门问问信息科技有限公司 | 一种语音识别方法、装置、设备及存储介质 |
CN109003604A (zh) * | 2018-06-20 | 2018-12-14 | 恒玄科技(上海)有限公司 | 一种实现低功耗待机的语音识别方法及系统 |
CN108986822A (zh) * | 2018-08-31 | 2018-12-11 | 出门问问信息科技有限公司 | 语音识别方法、装置、电子设备及非暂态计算机存储介质 |
CN109686370A (zh) * | 2018-12-24 | 2019-04-26 | 苏州思必驰信息科技有限公司 | 基于语音控制进行斗地主游戏的方法及装置 |
CN111383632B (zh) * | 2018-12-28 | 2023-10-31 | 北京小米移动软件有限公司 | 电子设备 |
CN109886386B (zh) * | 2019-01-30 | 2020-10-27 | 北京声智科技有限公司 | 唤醒模型的确定方法及装置 |
CN109922397B (zh) * | 2019-03-20 | 2020-06-16 | 深圳趣唱科技有限公司 | 音频智能处理方法、存储介质、智能终端及智能蓝牙耳机 |
CN109979438A (zh) * | 2019-04-04 | 2019-07-05 | Oppo广东移动通信有限公司 | 语音唤醒方法及电子设备 |
CN112015258B (zh) * | 2019-05-31 | 2022-07-15 | 瑞昱半导体股份有限公司 | 处理系统与控制方法 |
CN110265029A (zh) * | 2019-06-21 | 2019-09-20 | 百度在线网络技术(北京)有限公司 | 语音芯片和电子设备 |
CN112207811B (zh) * | 2019-07-11 | 2022-05-17 | 杭州海康威视数字技术股份有限公司 | 一种机器人控制方法、装置、机器人及存储介质 |
WO2021016931A1 (zh) * | 2019-07-31 | 2021-02-04 | 华为技术有限公司 | 一种集成芯片以及处理传感器数据的方法 |
CN110968353A (zh) * | 2019-12-06 | 2020-04-07 | 惠州Tcl移动通信有限公司 | 中央处理器的唤醒方法、装置、语音处理器以及用户设备 |
CN111071879A (zh) * | 2020-01-01 | 2020-04-28 | 门鑫 | 电梯楼层登记方法、装置及存储介质 |
CN113628616A (zh) * | 2020-05-06 | 2021-11-09 | 阿里巴巴集团控股有限公司 | 音频采集设备、无线耳机以及电子设备系统 |
CN111679861A (zh) * | 2020-05-09 | 2020-09-18 | 浙江大华技术股份有限公司 | 电子设备的唤醒装置、方法和计算机设备和存储介质 |
CN113760218A (zh) * | 2020-06-01 | 2021-12-07 | 阿里巴巴集团控股有限公司 | 数据处理方法、装置、电子设备及计算机存储介质 |
CN111696553B (zh) * | 2020-06-05 | 2023-08-22 | 北京搜狗科技发展有限公司 | 一种语音处理方法、装置及可读介质 |
US11877237B2 (en) * | 2020-06-15 | 2024-01-16 | TriSpace Technologies (OPC) Pvt. Ltd. | System and method for optimizing power consumption in multimedia signal processing in mobile devices |
CN111755002B (zh) * | 2020-06-19 | 2021-08-10 | 北京百度网讯科技有限公司 | 语音识别装置、电子设备和语音识别方法 |
CN111833870A (zh) * | 2020-07-01 | 2020-10-27 | 中国第一汽车股份有限公司 | 车载语音系统的唤醒方法、装置、车辆和介质 |
CN112133302B (zh) * | 2020-08-26 | 2024-05-07 | 北京小米松果电子有限公司 | 预唤醒终端的方法、装置及存储介质 |
CN111986671B (zh) * | 2020-08-28 | 2024-04-05 | 京东科技信息技术有限公司 | 服务机器人及其语音开关机方法和装置 |
CN112216283B (zh) * | 2020-09-24 | 2024-02-23 | 建信金融科技有限责任公司 | 一种语音识别方法、装置、设备及存储介质 |
CN112698872A (zh) * | 2020-12-21 | 2021-04-23 | 北京百度网讯科技有限公司 | 语音数据处理的方法、装置、设备及存储介质 |
CN216145422U (zh) * | 2021-01-13 | 2022-03-29 | 神盾股份有限公司 | 语音助理系统 |
CN113053360A (zh) * | 2021-03-09 | 2021-06-29 | 南京师范大学 | 一种精准度高的基于语音软件识别方法 |
CN113297363A (zh) * | 2021-05-28 | 2021-08-24 | 安徽领云物联科技有限公司 | 智能语义交互机器人系统 |
CN113393838A (zh) * | 2021-06-30 | 2021-09-14 | 北京探境科技有限公司 | 语音处理方法、装置、计算机可读存储介质及计算机设备 |
CN117253488A (zh) * | 2022-06-10 | 2023-12-19 | Oppo广东移动通信有限公司 | 语音识别方法、装置、设备及存储介质 |
CN118506774A (zh) * | 2023-02-15 | 2024-08-16 | Oppo广东移动通信有限公司 | 语音唤醒方法、装置、电子设备、存储介质及产品 |
CN116822529B (zh) * | 2023-08-29 | 2023-12-29 | 国网信息通信产业集团有限公司 | 基于语义泛化的知识要素抽取方法 |
CN117524228A (zh) * | 2024-01-08 | 2024-02-06 | 腾讯科技(深圳)有限公司 | 语音数据处理方法、装置、设备及介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104866274A (zh) * | 2014-12-01 | 2015-08-26 | 联想(北京)有限公司 | 信息处理方法及电子设备 |
CN105723451A (zh) * | 2013-12-20 | 2016-06-29 | 英特尔公司 | 从低功率始终侦听模式到高功率语音识别模式的转换 |
CN106356059A (zh) * | 2015-07-17 | 2017-01-25 | 中兴通讯股份有限公司 | 语音控制方法、装置及投影仪设备 |
CN107360327A (zh) * | 2017-07-19 | 2017-11-17 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置和存储介质 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2906605B2 (ja) * | 1990-07-12 | 1999-06-21 | 松下電器産業株式会社 | パターン認識装置 |
JPH06149286A (ja) * | 1992-11-10 | 1994-05-27 | Clarion Co Ltd | 不特定話者音声認識装置 |
JP2004045900A (ja) * | 2002-07-12 | 2004-02-12 | Toyota Central Res & Dev Lab Inc | 音声対話装置及びプログラム |
US9117449B2 (en) * | 2012-04-26 | 2015-08-25 | Nuance Communications, Inc. | Embedded system for construction of small footprint speech recognition with user-definable constraints |
CN102866921B (zh) | 2012-08-29 | 2016-05-11 | 惠州Tcl移动通信有限公司 | 一种多核cpu的调控方法及系统 |
US10304465B2 (en) * | 2012-10-30 | 2019-05-28 | Google Technology Holdings LLC | Voice control user interface for low power mode |
US9704486B2 (en) | 2012-12-11 | 2017-07-11 | Amazon Technologies, Inc. | Speech recognition power management |
CN105575395A (zh) * | 2014-10-14 | 2016-05-11 | 中兴通讯股份有限公司 | 语音唤醒方法及装置、终端及其处理方法 |
KR102299330B1 (ko) * | 2014-11-26 | 2021-09-08 | 삼성전자주식회사 | 음성 인식 방법 및 그 전자 장치 |
JP6501217B2 (ja) | 2015-02-16 | 2019-04-17 | アルパイン株式会社 | 情報端末システム |
GB2535766B (en) * | 2015-02-27 | 2019-06-12 | Imagination Tech Ltd | Low power detection of an activation phrase |
CN105976808B (zh) * | 2016-04-18 | 2023-07-25 | 成都启英泰伦科技有限公司 | 一种智能语音识别系统及方法 |
CN106020987A (zh) * | 2016-05-31 | 2016-10-12 | 广东欧珀移动通信有限公司 | 处理器中内核运行配置的确定方法以及装置 |
US20180293974A1 (en) * | 2017-04-10 | 2018-10-11 | Intel IP Corporation | Spoken language understanding based on buffered keyword spotting and speech recognition |
US10311870B2 (en) * | 2017-05-10 | 2019-06-04 | Ecobee Inc. | Computerized device with voice command input capability |
-
2017
- 2017-07-19 CN CN201710588382.8A patent/CN107360327B/zh active Active
-
2018
- 2018-06-20 KR KR1020207004025A patent/KR102354275B1/ko active IP Right Grant
- 2018-06-20 WO PCT/CN2018/091926 patent/WO2019015435A1/zh active Application Filing
- 2018-06-20 JP JP2020502569A patent/JP6949195B2/ja active Active
-
2020
- 2020-01-15 US US16/743,150 patent/US11244672B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105723451A (zh) * | 2013-12-20 | 2016-06-29 | 英特尔公司 | 从低功率始终侦听模式到高功率语音识别模式的转换 |
CN104866274A (zh) * | 2014-12-01 | 2015-08-26 | 联想(北京)有限公司 | 信息处理方法及电子设备 |
CN106356059A (zh) * | 2015-07-17 | 2017-01-25 | 中兴通讯股份有限公司 | 语音控制方法、装置及投影仪设备 |
CN107360327A (zh) * | 2017-07-19 | 2017-11-17 | 腾讯科技(深圳)有限公司 | 语音识别方法、装置和存储介质 |
Non-Patent Citations (1)
Title |
---|
LIU, YUHONG ET AL.: "Speech Recognition Based on Fuzzy Clustering Neural Network", CHINESE JOURNAL OF COMPUTERS, vol. 29, no. 10, 30 October 2006 (2006-10-30), pages 1894 - 1900, ISSN: 0254-4164 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110175016A (zh) * | 2019-05-29 | 2019-08-27 | 英业达科技有限公司 | 启动语音助理的方法及具有语音助理的电子装置 |
EP3846162A1 (en) * | 2020-01-03 | 2021-07-07 | Baidu Online Network Technology (Beijing) Co., Ltd. | Smart audio device, calling method for audio device, electronic device and computer readable medium |
JP2021110945A (ja) * | 2020-01-03 | 2021-08-02 | バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド | スマートオーディオ装置、方法、電子デバイスおよびコンピュータ可読媒体 |
CN113223510A (zh) * | 2020-01-21 | 2021-08-06 | 青岛海尔电冰箱有限公司 | 冰箱及其设备语音交互方法、计算机可读存储介质 |
CN113223510B (zh) * | 2020-01-21 | 2022-09-20 | 青岛海尔电冰箱有限公司 | 冰箱及其设备语音交互方法、计算机可读存储介质 |
EP3851952A3 (en) * | 2020-03-12 | 2021-08-25 | Beijing Baidu Netcom Science And Technology Co. Ltd. | Signal processing method, signal processing device, and electronic device |
CN117672200A (zh) * | 2024-02-02 | 2024-03-08 | 天津市爱德科技发展有限公司 | 一种物联网设备的控制方法、设备及系统 |
CN117672200B (zh) * | 2024-02-02 | 2024-04-16 | 天津市爱德科技发展有限公司 | 一种物联网设备的控制方法、设备及系统 |
Also Published As
Publication number | Publication date |
---|---|
US11244672B2 (en) | 2022-02-08 |
CN107360327B (zh) | 2021-05-07 |
JP6949195B2 (ja) | 2021-10-13 |
KR102354275B1 (ko) | 2022-01-21 |
KR20200027554A (ko) | 2020-03-12 |
US20200152177A1 (en) | 2020-05-14 |
CN107360327A (zh) | 2017-11-17 |
JP2020527754A (ja) | 2020-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11244672B2 (en) | Speech recognition method and apparatus, and storage medium | |
WO2017206916A1 (zh) | 处理器中内核运行配置的确定方法以及相关产品 | |
WO2018032581A1 (zh) | 一种应用程序控制方法及装置 | |
WO2017063604A1 (zh) | 消息推送方法及移动终端和消息推送服务器 | |
WO2017206915A1 (zh) | 处理器中内核运行配置的确定方法以及相关产品 | |
CN105630846B (zh) | 头像更新方法及装置 | |
WO2015081664A1 (zh) | 控制无线网络开关方法、装置、设备及系统 | |
CN103699409A (zh) | 一种电子设备切入唤醒状态的方法、装置和系统 | |
CN104375886A (zh) | 信息处理方法、装置和电子设备 | |
CN109389977B (zh) | 一种语音交互方法及装置 | |
WO2017206860A1 (zh) | 移动终端的处理方法及移动终端 | |
WO2017206918A1 (zh) | 终端加速唤醒方法以及相关产品 | |
CN111443803A (zh) | 模式切换方法、装置、存储介质及移动终端 | |
CN110543333B (zh) | 针对处理器的休眠处理方法、装置、移动终端和存储介质 | |
CN115985323B (zh) | 语音唤醒方法、装置、电子设备及可读存储介质 | |
CN111027406B (zh) | 图片识别方法、装置、存储介质及电子设备 | |
WO2018214745A1 (zh) | 应用控制方法及相关产品 | |
CN110277097B (zh) | 数据处理方法及相关设备 | |
CN109062643A (zh) | 一种显示界面调整方法、装置及终端 | |
CN116486833B (zh) | 音频增益调整方法、装置、存储介质及电子设备 | |
CN113254088A (zh) | 功能程序唤醒方法、终端及存储介质 | |
CN111897916A (zh) | 语音指令识别方法、装置、终端设备及存储介质 | |
CN111580911A (zh) | 一种终端的操作提示方法、装置、存储介质及终端 | |
WO2015067206A1 (zh) | 一种文件查找的方法及终端 | |
CN112433694B (zh) | 光强度调整方法及装置、存储介质和动终端 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18835410 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020502569 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 20207004025 Country of ref document: KR Kind code of ref document: A |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18835410 Country of ref document: EP Kind code of ref document: A1 |