WO2020102943A1 - Method and apparatus for generating gesture recognition model, storage medium, and electronic device - Google Patents

Method and apparatus for generating gesture recognition model, storage medium, and electronic device

Info

Publication number
WO2020102943A1
Authority
WO
WIPO (PCT)
Prior art keywords
trained
electronic
ultrasonic signal
gesture recognition
signal
Prior art date
Application number
PCT/CN2018/116221
Other languages
French (fr)
Chinese (zh)
Inventor
陈岩
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2018/116221 priority Critical patent/WO2020102943A1/en
Publication of WO2020102943A1 publication Critical patent/WO2020102943A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints

Abstract

Disclosed are a method and apparatus for generating a gesture recognition model, a storage medium, and an electronic device. The method comprises: acquiring multiple segments of ultrasonic signals for training; generating a spectrogram for training corresponding to each segment of the ultrasonic signals, so as to obtain multiple spectrograms for training; constructing a spectrogram database according to the multiple spectrograms for training; and training the spectrogram database to obtain a gesture recognition model.

Description

Gesture recognition model generation method and device, storage medium, and electronic device

Technical field

The present application belongs to the technical field of terminals, and particularly relates to a method, device, storage medium, and electronic device for generating a gesture recognition model.

Background technique

With the rapid development of terminal technology, the functions on the terminal are becoming more and more abundant. For example, human-computer interaction is realized through gesture recognition. Gesture recognition generally refers to the recognition of face and hand movements. Users can use simple gestures to control or interact with the terminal, allowing the terminal to understand the user's behavior. In the related art, ultrasonic signals can be used to realize gesture recognition. A gesture recognition model is required when performing ultrasonic gesture recognition. However, in the related art, the recognition accuracy of the gesture recognition model used for ultrasonic gesture recognition is low.

Summary of the invention

Embodiments of the present application provide a method, device, storage medium, and electronic device for generating a gesture recognition model, which can obtain a gesture recognition model with higher accuracy, thereby improving the accuracy of gesture recognition.

In a first aspect, an embodiment of the present application provides a method for generating a gesture recognition model, including:

Obtain multiple segments of ultrasonic signals to be trained;

Generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, to obtain multiple to-be-trained spectrograms;

Construct a spectrogram database according to the multiple to-be-trained spectrograms;

Train the spectrogram database to obtain a gesture recognition model.

In a second aspect, an embodiment of the present application provides a device for generating a gesture recognition model, including:

The acquisition module is used to acquire multiple segments of ultrasonic signals to be trained;

The generating module is used to generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, and obtain multiple to-be-trained spectrograms;

The construction module is used to construct a spectrogram database according to the multiple to-be-trained spectrograms;

The training module is used to train the spectrogram database to obtain a gesture recognition model.

In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the method for generating a gesture recognition model provided in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is configured to perform the following by calling the computer program stored in the memory:

Obtain multiple segments of ultrasonic signals to be trained;

Generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, to obtain multiple to-be-trained spectrograms;

Construct a spectrogram database according to the multiple to-be-trained spectrograms;

Train the spectrogram database to obtain a gesture recognition model.

In the embodiments of the present application, because a spectrogram has more feature points, and more prominent feature points, than the raw ultrasonic signal, converting the ultrasonic signal into a spectrogram and then training on the spectrogram as sample data produces a gesture recognition model with higher accuracy than a model generated by directly training on the ultrasonic signal as sample data, thereby improving the accuracy of gesture recognition.

Brief description of the drawings

The technical solutions and beneficial effects of the present application will be apparent through the detailed description of the specific implementation of the present application in conjunction with the accompanying drawings.

FIG. 1 is a first schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.

FIG. 2 is a second schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.

FIG. 3 is a third flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.

FIG. 4 is a schematic structural diagram of a device for generating a gesture recognition model provided by an embodiment of the present application.

FIG. 5 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.

FIG. 6 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

Please refer to the drawings, in which the same reference numerals represent the same components. The principles of the present application are illustrated by implementation in an appropriate computing environment. The following description is based on the illustrated specific embodiments of the present application, which should not be considered as limiting other specific embodiments not detailed herein.

Please refer to FIG. 1. FIG. 1 is a first schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application. The flow of the method for generating the gesture recognition model may include:

In 101, a plurality of ultrasonic signals to be trained are acquired.

Among them, the ultrasonic signal to be trained is an audio signal whose vibration frequency is greater than 20,000 Hz; this exceeds the general upper limit of human hearing (20,000 Hz), so no audible noise is produced that would cause an uncomfortable experience for users.

For example, electronic devices can emit ultrasonic signals. Therefore, when the user's face or hand moves within the preset range of the electronic device, the electronic device acquires the ultrasonic signal reflected by the user's face or hand. Among them, the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal. The ultrasonic signal can be expressed as:

x(n) = A1·sin(2π(f1·n/fs + (f2 − f1)·n²/(2T·fs²))), 0 ≤ n ≤ T·fs
Among them, T represents the duration, f1 represents the start frequency, f2 represents the end frequency, A1 represents the amplitude, and fs represents the sampling frequency; preferably, and without limitation, fs = 96 kHz.
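As a purely illustrative sketch (not part of the original disclosure, and assuming the sweep is the standard linear chirp written above), such a transmit signal could be generated as follows in Python; the start frequency, end frequency, and duration are made-up example values:

```python
import numpy as np

def linear_sweep(f1, f2, duration, fs=96000, amplitude=1.0):
    """Generate a continuous linear frequency sweep from f1 Hz to f2 Hz over `duration` seconds."""
    n = np.arange(int(duration * fs))
    t = n / fs
    # Phase of a linear chirp: 2*pi*(f1*t + (f2 - f1)*t^2 / (2*T))
    phase = 2 * np.pi * (f1 * t + (f2 - f1) * t ** 2 / (2 * duration))
    return amplitude * np.sin(phase)

# Example: a 20 kHz -> 21 kHz sweep lasting 10 ms, sampled at fs = 96 kHz (assumed values).
signal = linear_sweep(f1=20000, f2=21000, duration=0.01)
```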

Wherein, the ultrasonic signal to be trained may be the ultrasonic signal reflected by the user's face while the user's face approaches or moves away from the electronic device. Alternatively, the ultrasonic signal to be trained may be the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user waves the hand to the left, to the right, upward, downward, left and right, or back and forth within the preset range of the electronic device.

In this embodiment, taking user A as an example, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain the ultrasonic signal to be trained corresponding to user A. Similarly, the electronic device can obtain the ultrasonic signals to be trained corresponding to multiple users, such as user B, user C, user D, and user E, in the same manner, thereby obtaining multiple segments of ultrasonic signals to be trained.

Among them, the preset range may be determined automatically by the electronic device, or may be set by the user. For example, the preset range can be set to the area within 3 cm of the electronic device, or to the area within 5 cm of the electronic device, and so on, which is not specifically limited here.

It can be understood that the above are only a few examples of ultrasonic signals to be trained, and are not intended to limit the present application.

In 102, a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained is generated, and multiple to-be-trained spectrograms are obtained.

In this embodiment, after the electronic device acquires multiple segments of ultrasonic signals to be trained, the electronic device may generate the to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, to obtain multiple to-be-trained spectrograms.

In 103, a spectrogram database is constructed based on a plurality of spectrograms to be trained.

It can be understood that the more sample data is collected, the higher the accuracy of the model obtained after training. In this embodiment, processes 101 to 103 constitute the sample data collection stage of the gesture recognition model, that is, the stage of collecting the spectrograms in the spectrogram database. In order to obtain a gesture recognition model with relatively high accuracy, the electronic device can obtain more spectrograms for training.

For example, the spectrogram database may be constructed in the following ways:

Method 1. Build a spectrogram database for a certain scene.

Taking user A as an example, in the scene where user A brings the electronic device close to the face, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain user A's ultrasonic signal to be trained. Similarly, the electronic device can acquire the ultrasonic signals to be trained of user B, user C, user D, user E, and so on in the same manner. For example, suppose the electronic device obtains the ultrasonic signals to be trained of 500 users in total. The electronic device then generates the to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained, obtaining the to-be-trained spectrograms of the 500 users, and can aggregate these 500 users' to-be-trained spectrograms into a spectrogram database.

Method 2. Build a spectrogram database for multiple scenarios.

Taking user A as an example, in the scene where user A brings the electronic device close to the face, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain user A's ultrasonic signal to be trained in this scene. Similarly, the electronic device can acquire, in the same manner, the ultrasonic signals to be trained in this scene of user B, user C, user D, user E, and so on. For example, suppose the electronic device obtains the ultrasonic signals to be trained in this scene of 500 users in total. The electronic device then generates the to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained in this scene, obtaining the 500 users' to-be-trained spectrograms for this scene.

In the same way, the electronic device can obtain the 500 users' to-be-trained spectrograms in each of multiple scenes. The electronic device can aggregate the 500 users' to-be-trained spectrograms for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases and can combine these 6 spectrogram sub-databases into a spectrogram database.

In 104, training is performed on the spectrogram database to obtain a gesture recognition model.

For example, if the spectrogram database is built for only a single scene, the electronic device can directly train on the spectrogram database to obtain a gesture recognition model. That gesture recognition model can only recognize gestures in that scene.

If the spectrogram database is constructed for multiple scenarios, the electronic device can train each sub-database of the spectrogram database to obtain a gesture recognition model. The gesture recognition model can recognize gestures in multiple scenes.

In summary, the electronic device can construct the database of the spectrogram according to actual needs. For example, in some embodiments, the electronic device needs a gesture recognition model that can recognize gestures in 8 scenarios. Then, the electronic device can construct a spectrogram database for the eight scenes, and then train the spectrogram database to obtain a gesture recognition model, and the gesture recognition model can recognize the gestures in the eight scenes.

It should be noted that other methods can also be used to train the spectrogram database to obtain a gesture recognition model, which is not limited to the above methods.

It can be understood that, because a spectrogram has more feature points, and more prominent feature points, than the raw ultrasonic signal, converting the ultrasonic signal into a spectrogram and then using the spectrogram as sample data for training produces a gesture recognition model with higher accuracy than a model generated by directly training on the ultrasonic signal as sample data, thereby improving the accuracy of gesture recognition.

Please refer to FIG. 2, which is a second schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application. The method for generating the gesture recognition model may include:

In 201, the electronic device acquires multiple ultrasonic signals to be trained.

Among them, the ultrasonic signal to be trained is an audio signal whose vibration frequency is greater than 20,000 Hz; this exceeds the general upper limit of human hearing (20,000 Hz), so no audible noise is produced that would cause an uncomfortable experience for users.

For example, electronic devices can use earpieces or speakers to emit ultrasonic signals. Therefore, when a foreign object is moving within a preset range of the electronic device, the electronic device may use a microphone to receive the ultrasonic signal reflected by the foreign object. Among them, the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal. The ultrasonic signal can be expressed as:

x(n) = A1·sin(2π(f1·n/fs + (f2 − f1)·n²/(2T·fs²))), 0 ≤ n ≤ T·fs
Among them, T represents the duration, f1 represents the start frequency, f2 represents the end frequency, A1 represents the amplitude, and fs represents the sampling frequency; preferably, and without limitation, fs = 96 kHz.

Wherein, the ultrasonic signal to be trained may be the ultrasonic signal reflected by the user's face while the user's face approaches or moves away from the electronic device. Alternatively, the ultrasonic signal to be trained may be the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user waves the hand to the left, to the right, upward, downward, left and right, or back and forth within the preset range of the electronic device.

In this embodiment, taking user A as an example, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain the ultrasonic signal to be trained corresponding to user A. Similarly, the electronic device can obtain the ultrasonic signals to be trained corresponding to multiple users, such as user B, user C, user D, and user E, in the same manner, thereby obtaining multiple segments of ultrasonic signals to be trained.

Among them, the preset range may be determined automatically by the electronic device, or may be set by the user. For example, the preset range can be set to the area within 3 cm of the electronic device, or to the area within 5 cm of the electronic device, and so on, which is not specifically limited here.

In 202, the electronic device performs framing and windowing processing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained.

For example, when the electronic device receives multiple segments of ultrasonic signals to be trained, the electronic device may perform framing and windowing on each segment of the ultrasonic signal to be trained to obtain the multiple frames of windowed signals corresponding to each segment. Typically, each segment of the ultrasonic signal to be trained is framed with a frame length of 20 ms and a frame shift of 10 ms. When windowing each frame of the ultrasonic signal to be trained, preferably and without limitation, the window function may be a rectangular window, that is, w(n) = 1.

Among them, the ultrasonic signal is a time domain signal. The windowed signal is a time-domain signal.

In 203, the electronic device performs Fourier transform on the windowed signal of each frame to obtain a multi-frame frequency domain signal.

It can be understood that performing a Fourier transform on the time domain signal can convert the time domain signal into a frequency domain signal. Since the windowed signal is a time-domain signal, the electronic device performs Fourier transform on the windowed signal of each frame to obtain a multi-frame frequency domain signal.

In 204, the electronic device calculates the energy density of the signal in the frequency domain of each frame to obtain the energy density of all the frames corresponding to each ultrasonic signal to be trained.

In 205, the electronic device generates, according to the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained, a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, to obtain multiple to-be-trained spectrograms.

For example, the electronic device calculates the energy density of the frequency-domain signal of each frame to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained. The electronic device then generates, according to the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained, the to-be-trained spectrogram corresponding to that segment, and obtains multiple to-be-trained spectrograms.

Among them, the spectrogram is a spectrum-analysis view that represents how the spectrum of the ultrasonic signal changes over time. The spectrogram uses a two-dimensional plane to express three-dimensional information: the abscissa is time, the ordinate is frequency, and the energy density of each frame's frequency-domain signal is plotted at the corresponding coordinate point. The energy density of each frame's frequency-domain signal is expressed by color depth; for example, a dark color at a coordinate point indicates a high energy density, and a light color at a coordinate point indicates a low energy density.
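As an illustrative sketch only (not from the original disclosure), the framing, windowing, Fourier transform, and energy-density steps of processes 202 to 205 might look as follows in Python, using the 20 ms frame length, 10 ms frame shift, rectangular window, and 96 kHz sampling rate mentioned above; the function and variable names are assumptions:

```python
import numpy as np

def spectrogram(sig, fs=96000, frame_ms=20, shift_ms=10):
    """Frame the signal, apply a rectangular window, FFT each frame, and keep the energy density |X|^2."""
    frame_len = int(fs * frame_ms / 1000)
    frame_shift = int(fs * shift_ms / 1000)
    window = np.ones(frame_len)  # rectangular window, w(n) = 1
    energies = []
    for start in range(0, len(sig) - frame_len + 1, frame_shift):
        frame = sig[start:start + frame_len] * window   # one windowed frame (time domain)
        spectrum = np.fft.rfft(frame)                    # Fourier transform -> frequency-domain signal
        energies.append(np.abs(spectrum) ** 2)           # energy density of this frame
    # Columns are frames (time, abscissa) and rows are frequency bins (ordinate),
    # matching the time/frequency layout of the spectrogram described above.
    return np.array(energies).T
```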

In 206, the electronic device constructs a spectrogram database based on the multiple spectrograms to be trained.

It can be understood that the more sample data is collected, the higher the accuracy of the model obtained after training. In this embodiment, processes 201 to 206 constitute the sample data collection stage of the gesture recognition model, that is, the stage of collecting the spectrograms in the spectrogram database. In order to obtain a gesture recognition model with relatively high accuracy, the electronic device can obtain more spectrograms for training.

For example, the spectrogram database may be constructed in the following ways:

Method 1. Build a spectrogram database for a certain scene.

Taking user A as an example, in the scene where user A brings the electronic device close to the face, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain user A's ultrasonic signal to be trained. Similarly, the electronic device can acquire the ultrasonic signals to be trained of user B, user C, user D, user E, and so on in the same manner. For example, suppose the electronic device obtains the ultrasonic signals to be trained of 500 users in total. The electronic device then generates the to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained, obtaining the to-be-trained spectrograms of the 500 users, and can aggregate these 500 users' to-be-trained spectrograms into a spectrogram database.

Method 2. Build a spectrogram database for multiple scenarios.

Take user A as an example. In the scene where user A brings the electronic device close to the face, the electronic device emits an ultrasonic signal. While the electronic device is emitting the ultrasonic signal, user A can bring the electronic device close to the face. The electronic device acquires the ultrasonic signal reflected by user A's face to obtain user A's ultrasonic signal to be trained in this scene. Similarly, the electronic device can acquire, in the same manner, the ultrasonic signals to be trained in this scene of user B, user C, user D, user E, and so on. For example, suppose the electronic device obtains the ultrasonic signals to be trained in this scene of 500 users in total. The electronic device then generates the to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained in this scene, obtaining the 500 users' to-be-trained spectrograms for this scene.

In the same way, the electronic device can obtain the 500 users' to-be-trained spectrograms in each of multiple scenes. Among them, the multiple scenes may include: the user bringing the electronic device close to the face, the user's hand waving once to the left, once to the right, once upward, or once downward within the preset range of the electronic device, and so on. The electronic device can aggregate the 500 users' to-be-trained spectrograms for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases and can combine these 6 spectrogram sub-databases into a spectrogram database.

In 207, the electronic device trains the spectrogram database to obtain a gesture recognition model.

For example, if the spectrogram database is built for only a single scene, the electronic device can use a convolutional neural network (CNN) to train directly on the spectrogram database to obtain a gesture recognition model. That gesture recognition model can only recognize gestures in that scene.

If the spectrogram database is constructed for multiple scenes, the electronic device can use a convolutional neural network (CNN) to train on each sub-database of the spectrogram database to obtain a gesture recognition model. That gesture recognition model can recognize gestures in multiple scenes.

In summary, the electronic device can construct the database of the spectrogram according to actual needs. For example, in some embodiments, the electronic device needs a gesture recognition model that can recognize gestures in 8 scenarios. Then, the electronic device can construct a spectrogram database for the eight scenes, and then train the spectrogram database to obtain a gesture recognition model, and the gesture recognition model can recognize the gestures in the eight scenes.
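A minimal training sketch, assuming a Keras-style convolutional neural network, spectrograms resized to a fixed 128x128 shape, and integer scene labels (none of these specifics come from the original disclosure):

```python
import tensorflow as tf

def build_gesture_cnn(input_shape=(128, 128, 1), num_scenes=6):
    """A small CNN that maps a to-be-trained spectrogram to one of `num_scenes` gesture classes."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=input_shape),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_scenes, activation="softmax"),
    ])

model = build_gesture_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# x_train: spectrograms shaped (N, 128, 128, 1); y_train: integer scene labels shaped (N,)
# model.fit(x_train, y_train, epochs=10, validation_split=0.1)
```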

In 208, the electronic device obtains the respective model output results of the gesture recognition model.

For example, after obtaining the gesture recognition model, the electronic device can obtain the respective model output results of the gesture recognition model.

For example, suppose a certain gesture recognition model M can recognize gestures in 6 scenes. It can be understood that for each of the six scenes, the gesture recognition model can output the corresponding result. That is, the gesture recognition model M has a total of 6 model output results. Assume that the six model output results are model output result a, model output result b, model output result c, model output result d, model output result e and model output result f.

In 209, the electronic device receives an operation association instruction.

For example, the electronic device can receive an operation association instruction. The operation association instruction carries operation information, and the operation information includes multiple operations. The operation association instruction instructs the electronic device to associate each model output result of the gesture recognition model with one of the operations. Assume that the multiple operations are: turning off the screen, turning on the screen, turning off the alarm, answering the call, hanging up the call, and taking a photo.

In 210, the electronic device associates the output results of each model with one of a plurality of operations according to the operation association instruction to obtain a preset association library.

The electronic device can associate model output result a with turning off the screen, model output result b with turning on the screen, model output result c with turning off the alarm, model output result d with answering the call, model output result e with hanging up the call, and model output result f with taking a photo, thereby obtaining the preset association library S.
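A minimal sketch of such a preset association library, here simply a Python dictionary keyed by model output result (the labels a-f and the operation names are the illustrative ones used above, not a prescribed data format):

```python
# Preset association library S: model output result -> operation to perform.
PRESET_ASSOCIATION_LIBRARY = {
    "a": "turn_off_screen",
    "b": "turn_on_screen",
    "c": "turn_off_alarm",
    "d": "answer_call",
    "e": "hang_up_call",
    "f": "take_photo",
}
```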

In 211, the electronic device acquires an ultrasonic signal to be recognized.

Among them, the ultrasonic signal to be recognized is an audio signal whose vibration frequency is greater than 20,000 Hz; this exceeds the general upper limit of human hearing (20,000 Hz), so no audible noise is produced that would cause an uncomfortable experience for users.

For example, electronic devices can emit ultrasonic signals. Therefore, when the user's face or hand moves within the preset range of the electronic device, the electronic device acquires the ultrasonic signal reflected by the user's face or hand, that is, the ultrasonic signal to be recognized. Among them, the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal. The ultrasonic signal can be expressed as:

x(n) = A1·sin(2π(f1·n/fs + (f2 − f1)·n²/(2T·fs²))), 0 ≤ n ≤ T·fs
Among them, T represents the duration, f1 represents the start frequency, f2 represents the end frequency, A1 represents the amplitude, and fs represents the sampling frequency; preferably, and without limitation, fs = 96 kHz.

For example, suppose user A waves a hand once to the left within the preset range of the electronic device. During the process of user A waving the hand once to the left within the preset range of the electronic device, the electronic device acquires the ultrasonic signal reflected by user A's hand to obtain the ultrasonic signal A to be recognized.

In 212, the electronic device generates a spectrogram to be recognized corresponding to the ultrasonic signal to be recognized.

When the electronic device receives the ultrasonic signal A to be recognized, the electronic device can generate the spectrogram A to be recognized according to the ultrasonic signal A to be recognized.

In 213, the electronic device inputs the spectrogram to be recognized into the gesture recognition model to obtain an output result corresponding to the spectrogram to be recognized.

For example, the electronic device inputs the to-be-recognized spectrogram A into the gesture recognition model to obtain an output result corresponding to the to-be-recognized spectrogram A. Assume that the output result corresponding to the to-be-recognized spectrogram A is e.

In 214, the electronic device performs a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.

Referring to FIG. 3, in some embodiments, the process 214 may include a process 2141, a process 2142, and a process 2143, which may be:

2141. The electronic device detects whether there is a model output result in the preset association library that matches the output result corresponding to the to-be-recognized spectrogram.

When the electronic device obtains the output result e corresponding to the to-be-recognized spectrogram A, the electronic device detects whether the preset association library S contains a model output result matching the output result e.

2142. If there is a model output result in the preset association library that matches the output result corresponding to the to-be-recognized spectrogram, the electronic device obtains an operation associated with the model output result.

It can be understood that the preset association library S contains model output result e, which matches the output result e corresponding to the to-be-recognized spectrogram A, so the electronic device can obtain the operation associated with model output result e, namely hanging up the call.

2143, the electronic device performs the operation.

The electronic device performs the operation of hanging up the phone.
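Pulling processes 211 to 214 together, a sketch of the recognition flow is shown below; `spectrogram`, `model`, and `PRESET_ASSOCIATION_LIBRARY` refer to the illustrative sketches above, and `execute_operation` is a hypothetical dispatcher standing in for the device actions:

```python
import numpy as np

def execute_operation(operation):
    # Hypothetical dispatcher; a real device would invoke the corresponding system action here.
    print(f"Performing operation: {operation}")

def recognize_and_execute(ultrasonic_signal, model, association_library):
    """Generate the to-be-recognized spectrogram, run the gesture recognition model, and act on the result."""
    spec = spectrogram(ultrasonic_signal)        # to-be-recognized spectrogram (see earlier sketch)
    # In practice the spectrogram would also be resized to the model's expected input shape.
    spec = spec[np.newaxis, ..., np.newaxis]     # add batch and channel dimensions for the CNN
    output = model.predict(spec)                 # model output for this spectrogram
    label = "abcdef"[int(np.argmax(output))]     # map the class index to model output result a..f
    operation = association_library.get(label)   # look up the associated operation, if any
    if operation is not None:
        execute_operation(operation)
```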

In some embodiments, the process 211 may be:

The electronic device detects whether the electronic device is in a communication state;

If the electronic device is in a communication state, the electronic device acquires the ultrasonic signal to be recognized.

In other embodiments, the process 211 may be:

The electronic device detects whether a reminder event of the alarm clock application in the electronic device is triggered;

If the reminder event of the alarm clock application in the electronic device is triggered, the electronic device obtains the ultrasonic signal to be recognized.

For example, in some cases, the user may unconsciously make some gestures. For example, the user may unconsciously wave a hand to the left once within the preset range of the electronic device. Suppose a wave to the left corresponds to the operation of hanging up the call. When there is no incoming call on the electronic device, even if the user waves a hand to the left once, the electronic device cannot perform the operation of hanging up the call. Therefore, the electronic device can limit the scenes in which the ultrasonic signal to be recognized is acquired, thereby avoiding unnecessary gesture recognition processes and reducing the processing load of the processor in the electronic device. For example, it may be specified that the ultrasonic signal to be recognized is acquired only when the electronic device is in a communication state. Among them, the communication state includes the call state and the incoming call state. In other words, when the electronic device has an incoming call, the user can conveniently answer or hang up the call with a gesture. Alternatively, during a call, when the user's face is close to the electronic device, the electronic device can turn the screen off, and when the user's face moves away from the electronic device, the electronic device can turn the screen back on.

For example, it may be limited that the electronic device acquires the ultrasonic signal to be recognized only when the alarm event of the alarm clock application is triggered. That is, when the alarm sounds, the user can easily turn off the alarm using gestures.
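A sketch of this gating logic, with `is_in_communication_state` and `alarm_reminder_triggered` as hypothetical device-state queries (the original only describes the behavior, not an API):

```python
def should_acquire_signal_to_recognize(device):
    """Only acquire the ultrasonic signal to be recognized in scenes where a gesture could trigger an operation."""
    # The communication state covers both an ongoing call and an incoming call.
    if device.is_in_communication_state():
        return True
    # Also listen while an alarm clock reminder is ringing, so a gesture can turn off the alarm.
    if device.alarm_reminder_triggered():
        return True
    return False
```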

Please refer to FIG. 4, which is a schematic structural diagram of a device for generating a gesture recognition model according to an embodiment of the present application. The apparatus 300 for generating a gesture recognition model may include an acquisition module 301, a generation module 302, a construction module 303, and a training module 304.

The acquisition module 301 is used to acquire multiple segments of ultrasonic signals to be trained.

The generating module 302 is configured to generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, to obtain multiple to-be-trained spectrograms.

The construction module 303 is configured to construct a spectrogram database based on the generated multiple to-be-trained spectrograms.

The training module 304 is used to train the spectrogram database to obtain a gesture recognition model.

In some embodiments, the generating module 302 may be used to: perform framing and windowing processing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained; perform a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signals; calculate the energy density of each frame of frequency-domain signal to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained; and generate, according to the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained, the to-be-trained spectrogram corresponding to that segment, to obtain multiple to-be-trained spectrograms.

In some embodiments, the training module 304 may be used to: obtain the respective model output results of the gesture recognition model; receive an operation association instruction, where the operation association instruction carries operation information and the operation information includes multiple operations; and associate, according to the operation association instruction, each model output result with one of the operations to obtain a preset association library.

In some embodiments, the training module 304 may be used to: acquire the ultrasonic signal to be recognized reflected by a foreign object; generate the to-be-recognized spectrogram corresponding to the ultrasonic signal to be recognized; input the to-be-recognized spectrogram into the gesture recognition model to obtain an output result corresponding to the to-be-recognized spectrogram; and perform the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library.

In some embodiments, the training module 304 may be used to: detect whether the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram; if the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram, obtain the operation associated with that model output result; and perform the operation.

In some embodiments, the training module 304 may be used to: detect whether the electronic device is in a communication state, where the communication state includes a call state and an incoming call state; and if the electronic device is in a communication state, receive the ultrasonic signal to be recognized reflected by a foreign object.

In some embodiments, the training module 304 may be used to: detect whether a reminder event of the alarm clock application in the electronic device is triggered; and if the reminder event of the alarm clock application in the electronic device is triggered, receive the ultrasonic signal to be recognized reflected by a foreign object.

In some embodiments, the acquisition module 301 may be used to acquire multiple segments of ultrasonic signals to be trained, where the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face while the user's face approaches or moves away from the electronic device; or, the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user's hand waves once to the left, once to the right, once upward, or once downward within the preset range of the electronic device.

An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the process of the method for generating a gesture recognition model provided in the embodiments of the present application.

An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is used to execute the process of the method for generating a gesture recognition model provided in the embodiments of the present application by calling a computer program stored in the memory.

For example, the aforementioned electronic device may be a mobile terminal such as a tablet computer or a smart phone. Please refer to FIG. 5, which is a first structural schematic diagram of an electronic device provided by an embodiment of the present application.

The mobile terminal 400 may include components such as a microphone 401, a memory 402, a processor 403, an earpiece 404, and a speaker 405. A person skilled in the art may understand that the structure of the mobile terminal shown in FIG. 5 does not constitute a limitation on the mobile terminal, and the mobile terminal may include more or fewer components than those illustrated, combine certain components, or have a different arrangement of components.

The microphone 401 can be used to receive ultrasonic signals, pick up voices uttered by the user, and so on.

The memory 402 may be used to store application programs and data. The application program stored in the memory 402 contains executable code. The application program can form various functional modules. The processor 403 executes application programs stored in the memory 402 to execute various functional applications and data processing.

The processor 403 is the control center of the mobile terminal. It uses various interfaces and lines to connect the various parts of the entire mobile terminal, and executes the various functions of the mobile terminal and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the mobile terminal as a whole.

The earpiece 404 and the speaker 405 may be used to emit ultrasonic signals.

In this embodiment, the processor 403 in the mobile terminal loads the executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby implementing the following process:

Obtain multiple segments of ultrasonic signals to be trained;

Generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, to obtain multiple to-be-trained spectrograms;

Construct a spectrogram database according to the multiple to-be-trained spectrograms;

Train the spectrogram database to obtain a gesture recognition model.

Please refer to FIG. 6, which is a second structural schematic diagram of an electronic device provided by an embodiment of the present application.

The mobile terminal 500 may include components such as a microphone 501, a memory 502, a processor 503, an earpiece 504, a speaker 505, an input unit 506, and an output unit 507.

The microphone 501 can be used to receive ultrasonic signals, pick up voices uttered by the user, and so on.

The memory 502 may be used to store application programs and data. The application program stored in the memory 502 contains executable code. The application program can form various functional modules. The processor 503 executes application programs stored in the memory 502 to execute various functional applications and data processing.

The processor 503 is the control center of the mobile terminal. It uses various interfaces and lines to connect the various parts of the entire mobile terminal, and executes the various functions of the mobile terminal and processes data by running or executing the application programs stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the mobile terminal as a whole.

The earpiece 504 and the speaker 505 may be used to transmit ultrasonic signals.

The input unit 506 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.

The output unit 507 can be used to display information input by the user or provided to the user and various graphical user interfaces of the mobile terminal. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof. The output unit may include a display panel.

In this embodiment, the processor 503 in the mobile terminal loads the executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 503 runs the application programs stored in the memory 502, thereby implementing the following process:

Obtain multiple segments of ultrasonic signals to be trained;

Generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, to obtain multiple to-be-trained spectrograms;

Construct a spectrogram database according to the multiple to-be-trained spectrograms;

Train the spectrogram database to obtain a gesture recognition model.

In some embodiments, when the processor 503 executes the process of generating the to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained to obtain multiple to-be-trained spectrograms, it may perform: framing and windowing each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained; performing a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signals; calculating the energy density of each frame of frequency-domain signal to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained; and generating, according to the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained, the to-be-trained spectrogram corresponding to that segment, to obtain multiple to-be-trained spectrograms.

In some embodiments, after the processor 503 executes the process of training the spectrogram database to obtain a gesture recognition model, it may further perform: obtaining the respective model output results of the gesture recognition model; receiving an operation association instruction, where the operation association instruction carries operation information and the operation information includes multiple operations; and associating, according to the operation association instruction, each model output result with one of the operations to obtain a preset association library.

In some implementations, after the processor 503 executes the process of associating, according to the operation association instruction, each model output result with one of the multiple operations to obtain the preset association library, it may further perform: acquiring the ultrasonic signal to be recognized reflected by a foreign object; generating the to-be-recognized spectrogram corresponding to the ultrasonic signal to be recognized; inputting the to-be-recognized spectrogram into the gesture recognition model to obtain the output result corresponding to the to-be-recognized spectrogram; and performing the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library.

In some implementations, when the processor 503 executes the process of performing the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library, it may perform: detecting whether the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram; if the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram, obtaining the operation associated with that model output result; and performing the operation.

In some embodiments, when the processor 503 executes the process of acquiring the ultrasonic signal to be recognized, it may perform: detecting whether the electronic device is in a communication state, where the communication state includes a call state and an incoming call state; and if the electronic device is in a communication state, receiving the ultrasonic signal to be recognized reflected by a foreign object.

In some embodiments, when the processor 503 executes the process of acquiring the ultrasonic signal to be recognized, it may perform: detecting whether a reminder event of the alarm clock application in the electronic device is triggered; and if the reminder event of the alarm clock application in the electronic device is triggered, receiving the ultrasonic signal to be recognized reflected by a foreign object.

In some embodiments, when the processor 503 executes the process of acquiring multiple segments of ultrasonic signals to be trained, it may perform: acquiring multiple segments of ultrasonic signals to be trained, where the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face while the user's face approaches or moves away from the electronic device; or, the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user's hand waves once to the left, once to the right, once upward, or once downward within the preset range of the electronic device.

In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not detailed in an embodiment, you can refer to the detailed description of the method for generating a gesture recognition model above, which will not be repeated here.

The device for generating a gesture recognition model provided by an embodiment of the present application and the method for generating a gesture recognition model in the above embodiments belong to the same concept. Any method provided in the method embodiments can be run on the device for generating a gesture recognition model; its specific implementation process is described in detail in the method embodiments and is not repeated here.

It should be noted that, for the method for generating a gesture recognition model described in the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the method may be completed by a computer program controlling the related hardware. The computer program may be stored in a computer-readable storage medium, for example in a memory, and executed by at least one processor; during execution, it may include the flow of an embodiment of the method for generating a gesture recognition model. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), and so on.

For the device for generating a gesture recognition model according to an embodiment of the present application, each functional module may be integrated into one processing chip, each module may exist alone physically, or two or more modules may be integrated into one module. The above integrated modules may be implemented in the form of hardware or in the form of software functional modules. If an integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

The method, device, storage medium, and electronic device for generating a gesture recognition model provided in the embodiments of the present application are described above in detail. Specific examples are used herein to explain the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those skilled in the art, changes may be made to the specific implementations and the scope of application according to the ideas of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (20)

  1. A method for generating a gesture recognition model, which includes:
    Acquiring multiple segments of ultrasonic signals to be trained;
    Generating a to-be-trained spectrogram corresponding to each segment of the ultrasonic signals to be trained, to obtain multiple to-be-trained spectrograms;
    Constructing a spectrogram database according to the multiple to-be-trained spectrograms;
    Training the spectrogram database to obtain a gesture recognition model.
  2. The method for generating a gesture recognition model according to claim 1, wherein the generating a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained to obtain multiple to-be-trained spectrograms includes:
    Performing framing and windowing processing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained;
    Performing a Fourier transform on the windowed signal of each frame to obtain multiple frames of frequency-domain signals;
    Calculating the energy density of the frequency-domain signal of each frame to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained;
    Generating, according to the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained, a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, to obtain multiple to-be-trained spectrograms.
  3. The method for generating a gesture recognition model according to claim 1, wherein after the training of the spectrogram database to obtain a gesture recognition model, the method further comprises:
    Obtaining the respective model output results of the gesture recognition model;
    Receiving an operation association instruction, where the operation association instruction carries operation information, and the operation information includes multiple operations;
    According to the operation association instruction, the output results of the respective models are respectively associated with one of the operations to obtain a preset association library.
  4. The method for generating a gesture recognition model according to claim 3, wherein after the associating, according to the operation association instruction, each of the model output results with one of the multiple operations to obtain a preset association library, the method further includes:
    Acquiring the ultrasonic signal to be recognized;
    Generating a to-be-recognized spectrogram corresponding to the ultrasonic signal to be recognized;
    Inputting the to-be-recognized spectrogram into the gesture recognition model to obtain an output result corresponding to the to-be-recognized spectrogram;
    Performing the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library.
  5. The method for generating a gesture recognition model according to claim 4, wherein the performing corresponding operations according to the output result corresponding to the to-be-recognized spectrogram and the preset association library includes:
    Detect whether there is a model output result in the preset association library that matches the output result corresponding to the spectrogram to be recognized;
    If there is a model output result matching the output result corresponding to the to-be-recognized spectrogram in the preset association library, an operation associated with the model output result is obtained;
    Perform the operation.
  6. The method of generating a gesture recognition model according to claim 4, wherein the acquiring the ultrasonic signal to be recognized includes:
    Detect whether the electronic device is in a communication state, the communication state includes a call state and an incoming call state;
    If the electronic device is in a communication state, the ultrasonic signal to be recognized is acquired.
  7. The method of generating a gesture recognition model according to claim 4, wherein the acquiring the ultrasonic signal to be recognized includes:
    Detect whether a reminder event of the alarm clock application in the electronic device is triggered;
    If the reminder event of the alarm clock application in the electronic device is triggered, the ultrasonic signal to be recognized is acquired.
  8. The method for generating a gesture recognition model according to claim 1, wherein the acquiring multiple segments of ultrasonic signals to be trained includes:
    Acquiring multiple segments of ultrasonic signals to be trained, where the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face while the user's face approaches or moves away from the electronic device;
    Alternatively, the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user's hand waves once to the left, once to the right, once upward, or once downward within the preset range of the electronic device.
  9. A gesture recognition model generating device, which includes:
    The acquisition module is used to acquire multiple ultrasonic signals to be trained;
    The generating module is used to generate a spectrogram to be trained corresponding to each segment of the ultrasonic signal to be trained, so as to obtain multiple spectrograms to be trained;
    The building module is used to construct a spectrogram database according to the multiple spectrograms to be trained;
    The training module is used to train the spectrogram database to obtain a gesture recognition model.
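The claims do not specify the learning algorithm used to train the spectrogram database into a gesture recognition model; as one hedged possibility, a small convolutional network over fixed-size spectrograms could be trained as below (Keras is an assumption, not named in the application).

    import numpy as np
    from tensorflow import keras

    def fit_gesture_model(spectrograms, labels, n_gestures):
        # Stack the spectrogram database into a tensor: samples x frames x bins x 1.
        x = np.stack(spectrograms)[..., np.newaxis]
        y = np.asarray(labels)
        model = keras.Sequential([
            keras.layers.Conv2D(16, 3, activation="relu", input_shape=x.shape[1:]),
            keras.layers.MaxPooling2D(),
            keras.layers.Conv2D(32, 3, activation="relu"),
            keras.layers.GlobalAveragePooling2D(),
            keras.layers.Dense(n_gestures, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x, y, epochs=10, validation_split=0.2)
        return model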
  10. The apparatus for generating a gesture recognition model according to claim 9, wherein the generating module is configured to:
    Perform framing and windowing processing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained;
    Perform Fourier transform on each frame of the windowed signals to obtain multiple frames of frequency domain signals;
    Calculate the energy density of each frame of the frequency domain signals to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained;
    Generate, according to the energy density of all the frames corresponding to each segment of the ultrasonic signal to be trained, a spectrogram to be trained corresponding to each segment of the ultrasonic signal to be trained, so as to obtain multiple spectrograms to be trained.
  11. The apparatus for generating a gesture recognition model according to claim 9, wherein the training module is configured to:
    Obtain the respective model output results of the gesture recognition model;
    Receive an operation association instruction, where the operation association instruction carries operation information, and the operation information includes multiple operations;
    According to the operation association instruction, associate each of the model output results with one of the operations to obtain a preset association library.
  12. A storage medium, wherein a computer program is stored in the storage medium, and when the computer program runs on a computer, the computer is caused to execute the method for generating a gesture recognition model according to any one of claims 1 to 8.
  13. An electronic device, wherein the electronic device includes a processor and a memory, a computer program is stored in the memory, and the processor is configured to execute the following by calling the computer program stored in the memory:
    Obtain multiple ultrasonic signals to be trained;
    Generate a spectrogram to be trained corresponding to each segment of the ultrasonic signal to be trained, so as to obtain multiple spectrograms to be trained;
    Construct a spectrogram database according to the multiple spectrograms to be trained;
    Train the spectrogram database to obtain a gesture recognition model.
  14. The electronic device according to claim 13, wherein the processor is configured to execute:
    Perform framing and windowing processing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of the ultrasonic signal to be trained;
    Perform Fourier transform on each frame of the windowed signals to obtain multiple frames of frequency domain signals;
    Calculate the energy density of each frame of the frequency domain signals to obtain the energy density of all frames corresponding to each segment of the ultrasonic signal to be trained;
    Generate, according to the energy density of all the frames corresponding to each segment of the ultrasonic signal to be trained, a spectrogram to be trained corresponding to each segment of the ultrasonic signal to be trained, so as to obtain multiple spectrograms to be trained.
  15. The electronic device according to claim 13, wherein the processor is configured to execute:
    Obtain the respective model output results of the gesture recognition model;
    Receive an operation association instruction, where the operation association instruction carries operation information, and the operation information includes multiple operations;
    According to the operation association instruction, associate each of the model output results with one of the operations to obtain a preset association library.
  16. The electronic device according to claim 15, wherein the processor is configured to execute:
    Acquire an ultrasonic signal to be recognized;
    Generate a spectrogram to be recognized corresponding to the ultrasonic signal to be recognized;
    Input the spectrogram to be recognized into the gesture recognition model to obtain an output result corresponding to the spectrogram to be recognized;
    Perform a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.
  17. The electronic device according to claim 16, wherein the processor is configured to execute:
    Detect whether there is a model output result in the preset association library that matches the output result corresponding to the spectrogram to be recognized;
    If there is a model output result in the preset association library that matches the output result corresponding to the spectrogram to be recognized, obtain an operation associated with that model output result;
    Perform the operation.
  18. The electronic device according to claim 16, wherein the processor is configured to execute:
    Detect whether the electronic device is in a communication state, wherein the communication state includes a call state and an incoming call state;
    If the electronic device is in the communication state, acquire the ultrasonic signal to be recognized.
  19. The electronic device according to claim 16, wherein the processor is configured to execute:
    Detect whether a reminder event of an alarm clock application in the electronic device is triggered;
    If the reminder event of the alarm clock application in the electronic device is triggered, acquire the ultrasonic signal to be recognized.
  20. The electronic device according to claim 13, wherein the processor is configured to execute:
    Acquire a plurality of ultrasonic signals to be trained, wherein the ultrasonic signal to be trained is an ultrasonic signal reflected by the user's face during a process in which the user's face approaches or moves away from the electronic device;
    Alternatively, the ultrasonic signal to be trained is an ultrasonic signal reflected by the user's hand during a process in which the user's hand is swung once to the left, once to the right, once upward, or once downward within a preset range of the electronic device.
PCT/CN2018/116221 2018-11-19 2018-11-19 Method and apparatus for generating gesture recognition model, storage medium, and electronic device WO2020102943A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/116221 WO2020102943A1 (en) 2018-11-19 2018-11-19 Method and apparatus for generating gesture recognition model, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/116221 WO2020102943A1 (en) 2018-11-19 2018-11-19 Method and apparatus for generating gesture recognition model, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2020102943A1 true WO2020102943A1 (en) 2020-05-28

Family

ID=70774192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116221 WO2020102943A1 (en) 2018-11-19 2018-11-19 Method and apparatus for generating gesture recognition model, storage medium, and electronic device

Country Status (1)

Country Link
WO (1) WO2020102943A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9569006B2 (en) * 2014-04-10 2017-02-14 Mediatek Inc. Ultrasound-based methods for touchless gesture recognition, and apparatuses using the same
CN105760825A (en) * 2016-02-02 2016-07-13 深圳市广懋创新科技有限公司 Gesture identification system and method based on Chebyshev feed forward neural network
CN106203380A (en) * 2016-07-20 2016-12-07 中国科学院计算技术研究所 Ultrasound wave gesture identification method and system
CN107450724A (en) * 2017-07-31 2017-12-08 武汉大学 A kind of gesture identification method and system based on dual-channel audio Doppler effect
CN107526437A (en) * 2017-07-31 2017-12-29 武汉大学 A kind of gesture identification method based on Audio Doppler characteristic quantification

Similar Documents

Publication Publication Date Title
EP3353677B1 (en) Device selection for providing a response
US10540140B2 (en) System and method for continuous multimodal speech and gesture interaction
US10192552B2 (en) Digital assistant providing whispered speech
AU2015202943B2 (en) Reducing the need for manual start/end-pointing and trigger phrases
US10453443B2 (en) Providing an indication of the suitability of speech recognition
US10684683B2 (en) Natural human-computer interaction for virtual personal assistant systems
TWI619114B (en) Method and system of environment-sensitive automatic speech recognition
US9685161B2 (en) Method for updating voiceprint feature model and terminal
AU2017210578B2 (en) Voice trigger for a digital assistant
US9640194B1 (en) Noise suppression for speech processing based on machine-learning mask estimation
CN106030440B (en) Intelligent circulation audio buffer
US9536540B2 (en) Speech signal separation and synthesis based on auditory scene analysis and speech modeling
DE102015100900A1 (en) Set speech recognition using context information
JP5456832B2 (en) Apparatus and method for determining relevance of an input utterance
TWI590228B (en) Voice control system, electronic device having the same, and voice control method
KR20140132246A (en) Object selection method and object selection apparatus
US9818431B2 (en) Multi-speaker speech separation
Sehgal et al. A convolutional neural network smartphone app for real-time voice activity detection
JP2017079051A (en) Zero Latency Digital Assistant
CN105793923A (en) Local and remote speech processing
US20160019886A1 (en) Method and apparatus for recognizing whisper
CN108475502B (en) For providing the method and system and computer readable storage medium of environment sensing
CN107799126B (en) Voice endpoint detection method and device based on supervised machine learning
WO2014160327A1 (en) Providing content on multiple devices
US20190066670A1 (en) Context-based device arbitration