CN112740219A - Method and device for generating gesture recognition model, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112740219A
Authority
CN
China
Prior art keywords
trained
spectrogram
ultrasonic signal
gesture recognition
recognition model
Prior art date
Legal status
Pending
Application number
CN201880097776.6A
Other languages
Chinese (zh)
Inventor
陈岩 (Chen Yan)
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd and Shenzhen Huantai Technology Co Ltd
Publication of CN112740219A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Abstract

The application discloses a method, an apparatus, a storage medium, and an electronic device for generating a gesture recognition model. The method includes: acquiring multiple segments of ultrasonic signals to be trained; generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained; constructing a spectrogram database from the plurality of spectrograms to be trained; and training on the spectrogram database to obtain a gesture recognition model. The method and the apparatus can improve the accuracy of gesture recognition.

Description

Method and device for generating gesture recognition model, storage medium and electronic equipment

Technical Field
The application belongs to the technical field of terminals, and particularly relates to a method and a device for generating a gesture recognition model, a storage medium and an electronic device.
Background
With the rapid development of terminal technology, the functions of terminals have become more and more abundant. For example, human-computer interaction can be achieved through gesture recognition, which generally refers to recognizing movements of the face and hands. A user may use simple gestures to control or interact with a terminal, letting the terminal understand the user's behavior. In the related art, gesture recognition may be implemented using ultrasonic signals, which requires a gesture recognition model. However, in the related art, the recognition accuracy of the gesture recognition models used for ultrasonic gesture recognition is low.
Disclosure of Invention
The embodiments of the application provide a method and a device for generating a gesture recognition model, a storage medium, and an electronic device, which can produce a gesture recognition model with higher accuracy, thereby improving the accuracy of gesture recognition.
In a first aspect, an embodiment of the present application provides a method for generating a gesture recognition model, including:
acquiring multiple segments of ultrasonic signals to be trained;
generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
constructing a spectrogram database from the plurality of spectrograms to be trained;
and training on the spectrogram database to obtain a gesture recognition model.
In a second aspect, an embodiment of the present application provides an apparatus for generating a gesture recognition model, including:
an acquisition module, used for acquiring multiple segments of ultrasonic signals to be trained;
a generating module, used for generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
a construction module, used for constructing a spectrogram database from the plurality of spectrograms to be trained;
and a training module, used for training on the spectrogram database to obtain a gesture recognition model.
In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the method for generating a gesture recognition model provided in this embodiment.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to, by calling the computer program stored in the memory, execute:
acquiring multiple segments of ultrasonic signals to be trained;
generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
constructing a spectrogram database from the plurality of spectrograms to be trained;
and training on the spectrogram database to obtain a gesture recognition model.
In the embodiments of the application, the feature points of a spectrogram are more numerous and more significant than those of the raw ultrasonic signal. Therefore, compared with a gesture recognition model generated by directly training on ultrasonic signals as sample data, converting the ultrasonic signals into spectrograms and training on the spectrograms as sample data yields a gesture recognition model with higher accuracy, thereby improving the accuracy of gesture recognition.
Drawings
The technical solutions and advantages of the present application will become apparent from the following detailed description of specific embodiments of the present application when taken in conjunction with the accompanying drawings.
Fig. 1 is a first flowchart illustrating a method for generating a gesture recognition model according to an embodiment of the present application.
Fig. 2 is a second flowchart of a method for generating a gesture recognition model according to an embodiment of the present application.
Fig. 3 is a third flowchart illustrating a method for generating a gesture recognition model according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of a device for generating a gesture recognition model according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, the principles of the present application are illustrated as being implemented in a suitable computing environment. The following description is based on illustrated embodiments of the application and should not be taken as limiting the application with respect to other embodiments that are not detailed herein.
Referring to fig. 1, fig. 1 is a first flowchart illustrating a method for generating a gesture recognition model according to an embodiment of the present application. The flow of the method may include the following steps:
in 101, a plurality of segments of ultrasound signals to be trained are acquired.
The ultrasonic signal to be trained is an audio signal whose vibration frequency exceeds 20,000 Hz. Because this is above the general upper limit of human hearing (20,000 Hz), it produces no audible noise and does not cause the user any discomfort.
For example, the electronic device may emit an ultrasonic signal. When the user's face or hand moves within a preset range of the electronic device, the device acquires the ultrasonic signal reflected by the face or hand. The ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal, which may be expressed as:
x(t) = A1 · sin(2π · (f1 · t + ((f2 − f1) / (2T)) · t²)), 0 ≤ t ≤ T

where T denotes the time length of the sweep, f1 the starting frequency, f2 the termination frequency, and A1 the amplitude; the signal is sampled at a sampling frequency fs, preferably but not limited to fs = 96 kHz.
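For illustration, a minimal Python sketch of generating such a linear sweep (the duration, frequency band, and amplitude below are example values, not ones fixed by the description):

```python
import numpy as np

fs = 96_000                   # sampling frequency; the description's preferred 96 kHz
T = 0.01                      # sweep duration in seconds (example value)
f1, f2 = 20_000.0, 24_000.0   # start/termination frequencies above audibility (examples)
A1 = 0.5                      # amplitude (example value)

t = np.arange(int(T * fs)) / fs
# Linear sweep: the instantaneous frequency rises from f1 to f2 over duration T.
x = A1 * np.sin(2 * np.pi * (f1 * t + (f2 - f1) * t**2 / (2 * T)))
```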
The ultrasonic signal to be trained may be the emitted ultrasonic signal reflected by the user's face as the face approaches or moves away from the electronic device. Alternatively, it may be the emitted signal reflected by the user's hand as the hand waves leftwards, rightwards, upwards, downwards, left and right, or back and forth within the preset range of the electronic device.
In this embodiment, taking user A as an example, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the electronic device close to the face, and the device acquires the ultrasonic signal reflected by the face, obtaining the ultrasonic signal to be trained corresponding to user A. Similarly, the electronic device may obtain the ultrasonic signals to be trained corresponding to users B, C, D, E, and so on, thereby obtaining multiple segments of ultrasonic signals to be trained.
The preset range may be determined automatically by the electronic device or set by the user. For example, it may be set to the area within 3 cm of the electronic device, or within 5 cm, and is not specifically limited here.
It is to be understood that the above are only a few examples of ultrasound signals to be trained, and are not intended to limit the present application.
In 102, a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained is generated, so as to obtain a plurality of spectrograms to be trained.
In this embodiment, after the electronic device obtains multiple segments of ultrasonic signals to be trained, it may generate the spectrogram to be trained corresponding to each segment, thereby obtaining a plurality of spectrograms to be trained.
In 103, a spectrogram database is constructed from the plurality of spectrograms to be trained.
It will be appreciated that the more sample data is collected, the higher the accuracy of the model obtained after training. In this embodiment, steps 101 to 103 are the sample-data acquisition process of the gesture recognition model, that is, the acquisition of the spectrograms in the spectrogram database. To obtain a gesture recognition model with relatively high accuracy, the electronic device may acquire a large number of spectrograms to be trained.
For example, the spectrogram database can be constructed in the following manners:
Mode 1: construct a spectrogram database for a single scene.
Taking user A as an example, in the scene where the electronic device is brought close to the face, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the device close to the face, and the device acquires the signal reflected by the face, obtaining user A's ultrasonic signal to be trained. Likewise, the electronic device may acquire the ultrasonic signals to be trained of users B, C, D, E, and so on. Suppose the electronic device acquires the ultrasonic signals to be trained of 500 users in total. The device then generates the spectrogram to be trained corresponding to each user's ultrasonic signal, obtaining 500 users' spectrograms to be trained, and gathers them into a spectrogram database.
Mode 2: construct a spectrogram database for multiple scenes.
Taking user A as an example, in the scene where the electronic device is brought close to the face, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the device close to the face, and the device acquires the signal reflected by the face, obtaining user A's ultrasonic signal to be trained in this scene. Similarly, the electronic device may obtain the ultrasonic signals to be trained of users B, C, D, E, and so on, in this scene. Suppose the electronic device acquires the ultrasonic signals to be trained of 500 users in this scene. The device then generates the spectrogram to be trained corresponding to each user's ultrasonic signal in this scene, obtaining 500 users' spectrograms to be trained for the scene.
Similarly, the electronic device may obtain the 500 users' spectrograms to be trained in each of multiple scenes in the above manner, and gather the 500 users' spectrograms for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases, which it may combine into one spectrogram database.
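As a minimal sketch, one way such a database of sub-databases might be organized (the dict layout and scene names are assumptions; the description fixes neither):

```python
# One sub-database per scene; each list holds that scene's spectrograms to be trained
# (e.g. 500 users' spectrograms). Combining the sub-databases yields the database.
spectrogram_database = {
    "face_approach": [],
    "wave_left": [],
    "wave_right": [],
    "wave_up": [],
    "wave_down": [],
    "wave_back_and_forth": [],
}
```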
At 104, a model is trained on the spectrogram database to obtain a gesture recognition model.
For example, if the spectrogram database is constructed for only a single scene, the electronic device may train directly on the database to obtain a gesture recognition model. Such a model can only recognize gestures in that scene.
If the spectrogram database is constructed for multiple scenes, the electronic device may train on each spectrogram sub-database in the database to obtain a gesture recognition model that can recognize gestures in multiple scenes.
In conclusion, the electronic device can construct the spectrogram database according to actual requirements. For example, in some embodiments, the electronic device needs a gesture recognition model capable of recognizing gestures in 8 scenes. It may construct a spectrogram database for those 8 scenes and train on it, obtaining a gesture recognition model that can recognize gestures in all 8 scenes.
It should be noted that other manners may also be adopted to train the spectrogram database to obtain the gesture recognition model, which is not limited to the above manners.
It can be understood that, because the feature points of a spectrogram are more numerous and more significant than those of the raw ultrasonic signal, converting the ultrasonic signals into spectrograms and training on the spectrograms as sample data yields a gesture recognition model with higher accuracy than directly training on the ultrasonic signals as sample data, thereby improving the accuracy of gesture recognition.
Referring to fig. 2, fig. 2 is a second flowchart illustrating a method for generating a gesture recognition model according to an embodiment of the present application. The method may include the following steps:
in 201, the electronic device acquires a plurality of segments of ultrasound signals to be trained.
The ultrasonic signal to be trained is an audio signal whose vibration frequency exceeds 20,000 Hz. Because this is above the general upper limit of human hearing (20,000 Hz), it produces no audible noise and does not cause the user any discomfort.
For example, the electronic device may emit an ultrasonic signal through an earpiece or a speaker. When a foreign object moves within the preset range of the electronic device, the device can receive the ultrasonic signal reflected by the object through a microphone. The ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal, which may be expressed as:
x(t) = A1 · sin(2π · (f1 · t + ((f2 − f1) / (2T)) · t²)), 0 ≤ t ≤ T

where T denotes the time length of the sweep, f1 the starting frequency, f2 the termination frequency, and A1 the amplitude; the signal is sampled at a sampling frequency fs, preferably but not limited to fs = 96 kHz.
The ultrasonic signal to be trained may be the emitted ultrasonic signal reflected by the user's face as the face approaches or moves away from the electronic device. Alternatively, it may be the emitted signal reflected by the user's hand as the hand waves leftwards, rightwards, upwards, downwards, left and right, or back and forth within the preset range of the electronic device.
In this embodiment, taking user A as an example, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the electronic device close to the face, and the device acquires the ultrasonic signal reflected by the face, obtaining the ultrasonic signal to be trained corresponding to user A. Similarly, the electronic device may obtain the ultrasonic signals to be trained corresponding to users B, C, D, E, and so on, thereby obtaining multiple segments of ultrasonic signals to be trained.
The preset range may be determined automatically by the electronic device or set by the user. For example, it may be set to the area within 3 cm of the electronic device, or within 5 cm, and is not specifically limited here.
In 202, the electronic device performs framing and windowing on each segment of ultrasonic signal to be trained to obtain the multi-frame windowed signal corresponding to each segment.
For example, when the electronic device receives multiple segments of ultrasonic signals to be trained, it may perform framing and windowing on each segment to obtain the corresponding multi-frame windowed signal. Each segment of ultrasonic signal to be trained is framed with a frame length of 20 ms and a frame shift of 10 ms. Preferably, but not limited to, the window function used when windowing each segment may be a rectangular window, i.e., w(n) = 1.
Both the ultrasonic signal and the windowed signal are time-domain signals.
At 203, the electronic device performs a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signal.
It will be appreciated that a Fourier transform converts a time-domain signal into a frequency-domain signal. Because the windowed signal is a time-domain signal, performing a Fourier transform on each frame of windowed signal yields the multi-frame frequency-domain signal.
In 204, the electronic device calculates the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained.
In 205, the electronic device generates the spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained from the energy densities of all frames corresponding to that segment, so as to obtain a plurality of spectrograms to be trained.
For example, the electronic device calculates the energy density of each frame of frequency-domain signal, obtaining the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained, and then generates the corresponding spectrogram to be trained for each segment from those energy densities, obtaining a plurality of spectrograms to be trained.
A spectrogram is a spectrum-analysis view: an image of how the spectrum of the ultrasonic signal changes over time. It expresses three-dimensional information on a two-dimensional plane: the abscissa is time, the ordinate is frequency, and the energy density of each frame of frequency-domain signal is rendered as the shade of the corresponding coordinate point. A dark coordinate point indicates high energy density; a light coordinate point indicates low energy density.
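A minimal sketch of steps 202 to 205 under the parameters above (20 ms frames, 10 ms shift, rectangular window); the function name and the log scaling used to render light and dark shades are illustrative assumptions:

```python
import numpy as np

def spectrogram_to_be_trained(signal, fs=96_000, frame_ms=20, shift_ms=10):
    """One segment of ultrasonic signal -> spectrogram: framing and windowing
    (step 202), per-frame Fourier transform (203), energy density (204),
    time-frequency image (205)."""
    frame_len = int(fs * frame_ms / 1000)    # 20 ms frame length
    frame_shift = int(fs * shift_ms / 1000)  # 10 ms frame shift
    window = np.ones(frame_len)              # rectangular window, w(n) = 1
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    frames = np.stack([signal[i * frame_shift : i * frame_shift + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)   # step 203: frequency-domain frames
    energy = np.abs(spectrum) ** 2           # step 204: energy density per frame
    # Step 205: log scale so dark/light shading reflects energy density;
    # rows are frequency, columns are time.
    return 10 * np.log10(energy + 1e-12).T
```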
At 206, the electronic device constructs a spectrogram database from the plurality of spectrograms to be trained.
It will be appreciated that the more sample data is collected, the higher the accuracy of the model obtained after training. In this embodiment, steps 201 to 206 are the sample-data acquisition process of the gesture recognition model, that is, the acquisition of the spectrograms in the spectrogram database. To obtain a gesture recognition model with relatively high accuracy, the electronic device may acquire a large number of spectrograms to be trained.
For example, the spectrogram database can be constructed in the following manners:
Mode 1: construct a spectrogram database for a single scene.
Taking user A as an example, in the scene where the electronic device is brought close to the face, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the device close to the face, and the device acquires the signal reflected by the face, obtaining user A's ultrasonic signal to be trained. Likewise, the electronic device may acquire the ultrasonic signals to be trained of users B, C, D, E, and so on. Suppose the electronic device acquires the ultrasonic signals to be trained of 500 users in total. The device then generates the spectrogram to be trained corresponding to each user's ultrasonic signal, obtaining 500 users' spectrograms to be trained, and gathers them into a spectrogram database.
Mode 2: construct a spectrogram database for multiple scenes.
Taking user A as an example, in the scene where the electronic device is brought close to the face, the electronic device emits an ultrasonic signal. While the signal is being emitted, user A brings the device close to the face, and the device acquires the signal reflected by the face, obtaining user A's ultrasonic signal to be trained in this scene. Similarly, the electronic device may obtain the ultrasonic signals to be trained of users B, C, D, E, and so on, in this scene. Suppose the electronic device acquires the ultrasonic signals to be trained of 500 users in this scene. The device then generates the spectrogram to be trained corresponding to each user's ultrasonic signal in this scene, obtaining 500 users' spectrograms to be trained for the scene.
Similarly, the electronic device may obtain the 500 users' spectrograms to be trained in each of multiple scenes in the above manner. The scenes may include: the user bringing the electronic device close to the face; the user's hand waving leftwards, rightwards, upwards, or downwards once within the preset range of the electronic device; and so on. The electronic device can gather the 500 users' spectrograms to be trained for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases, which it may combine into one spectrogram database.
At 207, the electronic device trains on the spectrogram database to obtain a gesture recognition model.
For example, if the spectrogram database is constructed for only a single scene, the electronic device may train directly on the database using a convolutional neural network (CNN) to obtain a gesture recognition model. Such a model can only recognize gestures in that scene.
If the spectrogram database is constructed for multiple scenes, the electronic device may train on each spectrogram sub-database in the database using a convolutional neural network (CNN) to obtain a gesture recognition model that can recognize gestures in multiple scenes.
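The description names CNN training but fixes no architecture, loss, or optimizer, so the following PyTorch sketch is only one plausible realization, with all layer sizes assumed:

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Illustrative CNN classifier over spectrogram images; one output per
    gesture/scene (6 here, matching the 6-scene example above)."""
    def __init__(self, n_gestures=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),   # fixed-size features for any spectrogram
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_gestures)

    def forward(self, x):  # x: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(x).flatten(1))
```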
In conclusion, the electronic device can construct the spectrogram database according to actual requirements. For example, in some embodiments, the electronic device needs a gesture recognition model capable of recognizing gestures in 8 scenes. It may construct a spectrogram database for those 8 scenes and train on it, obtaining a gesture recognition model that can recognize gestures in all 8 scenes.
At 208, the electronic device obtains the model output results of the gesture recognition model.
For example, after obtaining the gesture recognition model, the electronic device may obtain each of the model's output results.
For example, assume a gesture recognition model M can recognize gestures in 6 scenes. For the gesture of each of these 6 scenes, the model outputs a corresponding result; that is, model M has 6 model output results. Assume these are model output results a, b, c, d, e, and f.
At 209, the electronic device receives an operation association instruction.
For example, the electronic device may receive an operation association instruction. The operation association instruction carries operation information, and the operation information includes a plurality of operations. The instruction directs the electronic device to associate each model output result of the gesture recognition model with one of the plurality of operations. Assume the operations are: turning off the screen, lighting the screen, turning off the alarm clock, answering the phone, hanging up the phone, and taking a photo.
In 210, the electronic device associates each model output result with one of the operations according to the operation association instruction, obtaining a preset association library.
For example, the electronic device may associate model output result a with turning off the screen, b with lighting the screen, c with turning off the alarm clock, d with answering the phone, e with hanging up the phone, and f with taking a photo, obtaining a preset association library S.
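A sketch of the preset association library S from this step (the operation names are illustrative placeholders, not identifiers from the description):

```python
# Model output results a..f mapped to their associated operations.
preset_association_library = {
    "a": "turn_off_screen",
    "b": "light_screen",
    "c": "turn_off_alarm",
    "d": "answer_call",
    "e": "hang_up_call",
    "f": "take_photo",
}
```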
In 211, the electronic device acquires an ultrasonic signal to be recognized.
The ultrasonic signal to be recognized is an audio signal whose vibration frequency exceeds 20,000 Hz. Because this is above the general upper limit of human hearing (20,000 Hz), it produces no audible noise and does not cause the user any discomfort.
For example, the electronic device may emit an ultrasonic signal. When the user's face or hand moves within the preset range of the electronic device, the device acquires the ultrasonic signal reflected by the face or hand, namely the ultrasonic signal to be recognized. The ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal, which may be expressed as:
x(t) = A1 · sin(2π · (f1 · t + ((f2 − f1) / (2T)) · t²)), 0 ≤ t ≤ T

where T denotes the time length of the sweep, f1 the starting frequency, f2 the termination frequency, and A1 the amplitude; the signal is sampled at a sampling frequency fs, preferably but not limited to fs = 96 kHz.
For example, assume user A waves a hand to the left once within the preset range of the electronic device. During this motion, the electronic device acquires the ultrasonic signal reflected by user A's hand, obtaining ultrasonic signal A to be recognized.
At 212, the electronic device generates the spectrogram to be recognized corresponding to the ultrasonic signal to be recognized.
When the electronic device receives ultrasonic signal A to be recognized, it may generate spectrogram A to be recognized from it.
In 213, the electronic device inputs the spectrogram to be recognized into the gesture recognition model to obtain the output result corresponding to the spectrogram to be recognized.
For example, the electronic device inputs spectrogram A to be recognized into the gesture recognition model and obtains the corresponding output result. Assume the output result corresponding to spectrogram A is e.
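Tying the earlier sketches together, a hedged end-to-end sketch of steps 211 to 213 (the stand-in signal and the untrained model weights are assumptions; in practice the model would be the one trained in step 207):

```python
import numpy as np
import torch

# Stand-in for the acquired ultrasonic signal A to be recognized (0.1 s of noise
# here; in practice this is the microphone capture described in step 211).
reflected_signal = np.random.randn(9600)

spec = spectrogram_to_be_trained(reflected_signal)        # steps 202-205 pipeline above
x = torch.tensor(spec, dtype=torch.float32)[None, None]   # shape (1, 1, freq, time)
model = GestureCNN(n_gestures=6)                          # trained weights assumed loaded
output = model(x).argmax(dim=1).item()                    # class index of model output
label = "abcdef"[output]                                  # map index to results a..f
```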
At 214, the electronic device executes a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.
Referring to fig. 3, in some embodiments, the process 214 may include a process 2141, a process 2142, and a process 2143, which may be:
2141, the electronic device detects whether the preset association library contains a model output result matching the output result corresponding to the spectrogram to be recognized.
For example, when the electronic device obtains output result e corresponding to spectrogram A to be recognized, it detects whether the preset association library S contains a matching model output result.
2142, if the preset association library contains a model output result matching the output result corresponding to the spectrogram to be recognized, the electronic device obtains the operation associated with that model output result.
It can be understood that the preset association library S contains model output result e, which matches the recognized output result e, so the electronic device obtains the operation associated with it, namely hanging up the phone.
2143, the electronic device performs the operation.
The electronic device performs an operation of hanging up the phone.
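A sketch of steps 2141 to 2143, reusing the association-library sketch above (the handler callables are hypothetical):

```python
def execute_for_output(label, library, handlers):
    """Steps 2141-2143: look up the recognized model output result in the
    preset association library and perform the associated operation, if any."""
    operation = library.get(label)     # 2141: detect a matching output result
    if operation is not None:          # 2142: obtain the associated operation
        handlers[operation]()          # 2143: perform the operation

# Usage (hang_up would be a hypothetical callable that ends the call):
# execute_for_output("e", preset_association_library, {"hang_up_call": hang_up})
```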
In some embodiments, process 211 (the acquisition of the ultrasonic signal to be recognized) may be:
the electronic device detects whether it is in a communication state;
and if the electronic device is in a communication state, it acquires the ultrasonic signal to be recognized.
In other embodiments, process 211 may be:
the method comprises the steps that the electronic equipment detects whether a reminding event of an alarm clock application in the electronic equipment is triggered;
and if the reminding event of the alarm clock application in the electronic equipment is triggered, the electronic equipment acquires the ultrasonic signal to be identified.
For example, in some cases a user may make gestures unconsciously, such as waving a hand to the left once within the preset range of the electronic device. Assume that waving a hand to the left corresponds to the operation of hanging up the phone. When there is no incoming call, the electronic device should not perform that operation even if the user waves a hand to the left. Therefore, the electronic device can limit the scenarios in which it acquires the ultrasonic signal to be recognized, avoiding unnecessary gesture recognition and reducing the processing load on the processor in the electronic device. For example, acquisition of the ultrasonic signal to be recognized may be limited to when the electronic device is in a communication state, which includes a call state and an incoming-call state. That is, when the phone is ringing, the user can conveniently answer or hang up with a gesture; during a call, the user can turn the screen off by bringing the face close to the device and light the screen by moving the face away.
For example, the electronic device may be limited to acquiring the ultrasonic signal to be recognized only when a reminder event of an alarm clock application is triggered. That is, when the alarm clock rings, the user can conveniently turn off the alarm clock using a gesture.
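A sketch of the acquisition gating described in these embodiments (the device object and its attributes are hypothetical):

```python
def should_acquire_signal_to_recognize(device):
    """Only acquire the ultrasonic signal to be recognized when a gesture
    could matter: during a call, on an incoming call, or while an alarm
    clock reminder event is ringing."""
    in_communication = device.in_call or device.incoming_call
    return in_communication or device.alarm_ringing
```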
Referring to fig. 4, fig. 4 is a schematic structural diagram of a device for generating a gesture recognition model according to an embodiment of the present application. The device 300 for generating a gesture recognition model may include: an acquisition module 301, a generating module 302, a construction module 303, and a training module 304.
An obtaining module 301, configured to obtain multiple segments of ultrasonic signals to be trained.
The generating module 302 is configured to generate the spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained.
The construction module 303 is configured to construct a spectrogram database from the generated plurality of spectrograms to be trained.
The training module 304 is configured to train on the spectrogram database to obtain a gesture recognition model.
In some embodiments, the generating module 302 may be configured to: perform framing and windowing on each segment of ultrasonic signal to be trained to obtain the corresponding multi-frame windowed signal; perform a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signal; calculate the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained; and generate the spectrogram to be trained corresponding to each segment from those energy densities, so as to obtain a plurality of spectrograms to be trained.
In some embodiments, the training module 304 may be configured to: obtain each model output result of the gesture recognition model; receive an operation association instruction, where the operation association instruction carries operation information and the operation information includes a plurality of operations; and associate each model output result with one of the plurality of operations according to the operation association instruction, obtaining a preset association library.
In some embodiments, the training module 304 may be configured to: acquire the ultrasonic signal to be recognized reflected by a foreign object; generate the spectrogram to be recognized corresponding to the ultrasonic signal to be recognized; input the spectrogram to be recognized into the gesture recognition model to obtain the corresponding output result; and execute the corresponding operation according to that output result and the preset association library.
In some embodiments, the training module 304 may be configured to: detect whether the preset association library contains a model output result matching the output result corresponding to the spectrogram to be recognized; if so, obtain the operation associated with that model output result; and perform the operation.
In some embodiments, the training module 304 may be configured to: detect whether the electronic device is in a communication state, where the communication state includes a call state and an incoming-call state; and if the electronic device is in a communication state, acquire the ultrasonic signal to be recognized reflected by the foreign object.
In some embodiments, the training module 304 may be configured to: detect whether a reminder event of an alarm clock application in the electronic device is triggered; and if the reminder event is triggered, acquire the ultrasonic signal to be recognized reflected by the foreign object.
In some embodiments, the acquisition module 301 may be configured to acquire multiple segments of ultrasonic signals to be trained, where an ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face as the face approaches or moves away from the device; or the emitted signal reflected by the user's hand as the hand waves leftwards, rightwards, upwards, or downwards within the preset range of the device.
The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the flow in the method for generating a gesture recognition model provided in this embodiment.
The embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is configured to execute the flow in the method for generating a gesture recognition model provided in this embodiment by calling the computer program stored in the memory.
For example, the electronic device may be a mobile terminal such as a tablet computer or a smartphone. Referring to fig. 5, fig. 5 is a schematic structural diagram of a first electronic device according to an embodiment of the present application.
The mobile terminal 400 may include a microphone 401, memory 402, processor 403, earpiece 404, speaker 405, and the like. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 5 is not intended to be limiting of mobile terminals and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The microphone 401 may be used to receive ultrasonic signals, pick up speech uttered by the user, and the like.
The memory 402 may be used to store applications and data. The memory 402 stores applications containing executable code. The application programs may constitute various functional modules. The processor 403 executes various functional applications and data processing by running an application program stored in the memory 402.
The processor 403 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by running or executing an application program stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the mobile terminal.
The earpiece 404 and speaker 405 may be used to emit ultrasonic signals.
In this embodiment, the processor 403 in the mobile terminal loads the executable code corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby implementing the flow:
acquiring multiple segments of ultrasonic signals to be trained;
generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
constructing a spectrogram database from the plurality of spectrograms to be trained;
and training on the spectrogram database to obtain a gesture recognition model.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a second electronic device according to an embodiment of the present application.
The mobile terminal 500 may include a microphone 501, a memory 502, a processor 503, an earpiece 504, a speaker 505, an input unit 506, an output unit 507, and the like.
The microphone 501 may be used to receive ultrasonic signals, pick up speech uttered by a user, and the like.
The memory 502 may be used to store applications and data. Memory 502 stores applications containing executable code. The application programs may constitute various functional modules. The processor 503 executes various functional applications and data processing by running an application program stored in the memory 502.
The processor 503 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by running or executing an application program stored in the memory 502 and calling data stored in the memory 502, thereby performing overall monitoring of the mobile terminal.
The earpiece 504 and speaker 505 may be used to emit ultrasonic signals.
The input unit 506 may be used to receive input numbers, character information, or user characteristic information such as a fingerprint, and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The output unit 507 may be used to display information input by or provided to the user, as well as various graphical user interfaces of the mobile terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The output unit may include a display panel.
In this embodiment, the processor 503 in the mobile terminal loads the executable code corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 503 runs the application programs stored in the memory 502, thereby implementing the flow:
acquiring multiple segments of ultrasonic signals to be trained;
generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
constructing a spectrogram database from the plurality of spectrograms to be trained;
and training on the spectrogram database to obtain a gesture recognition model.
In some embodiments, when executing the flow of generating the spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained to obtain a plurality of spectrograms to be trained, the processor 503 may execute: performing framing and windowing on each segment of ultrasonic signal to be trained to obtain the corresponding multi-frame windowed signal; performing a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signal; calculating the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained; and generating the spectrogram to be trained corresponding to each segment from those energy densities, so as to obtain a plurality of spectrograms to be trained.
In some embodiments, after executing the flow of training on the spectrogram database to obtain the gesture recognition model, the processor 503 may further execute: obtaining each model output result of the gesture recognition model; receiving an operation association instruction, where the operation association instruction carries operation information and the operation information includes a plurality of operations; and associating each model output result with one of the plurality of operations according to the operation association instruction, obtaining a preset association library.
In some embodiments, after executing the flow of associating each model output result with one of the operations according to the operation association instruction to obtain the preset association library, the processor 503 may further execute: acquiring the ultrasonic signal to be recognized reflected by a foreign object; generating the spectrogram to be recognized corresponding to the ultrasonic signal to be recognized; inputting the spectrogram to be recognized into the gesture recognition model to obtain the corresponding output result; and executing the corresponding operation according to that output result and the preset association library.
In some embodiments, when executing the flow of performing the corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library, the processor 503 may execute: detecting whether the preset association library contains a model output result matching the output result corresponding to the spectrogram to be recognized; if so, obtaining the operation associated with that model output result; and performing the operation.
In some embodiments, when executing the flow of acquiring the ultrasonic signal to be recognized, the processor 503 may execute: detecting whether the electronic device is in a communication state, where the communication state includes a call state and an incoming-call state; and if the electronic device is in a communication state, acquiring the ultrasonic signal to be recognized reflected by the foreign object.
In some embodiments, when executing the flow of acquiring the ultrasonic signal to be recognized, the processor 503 may execute: detecting whether a reminder event of an alarm clock application in the electronic device is triggered; and if the reminder event is triggered, acquiring the ultrasonic signal to be recognized reflected by the foreign object.
In some embodiments, when executing the flow of acquiring multiple segments of ultrasonic signals to be trained, the processor 503 may execute: acquiring multiple segments of ultrasonic signals to be trained, where an ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face as the face approaches or moves away from the device; or the emitted signal reflected by the user's hand as the hand waves leftwards, rightwards, upwards, or downwards within the preset range of the device.
In the above embodiments, the descriptions of the embodiments have respective emphasis; for parts not described in detail in a certain embodiment, refer to the detailed description of the method for generating a gesture recognition model above, which is not repeated here.
The device for generating a gesture recognition model provided in the embodiment of the present application and the method for generating a gesture recognition model in the above embodiments belong to the same concept; any of the methods provided in the method embodiments may be run on the device for generating a gesture recognition model.
It should be noted that, for the method for generating a gesture recognition model described in the embodiments of the present application, those skilled in the art can understand that all or part of the flow of implementing the method may be completed by controlling the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor; during execution, the flow of the embodiments of the method may be included. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
For the device for generating a gesture recognition model according to the embodiments of the present application, each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
The method, the apparatus, the storage medium, and the electronic device for generating a gesture recognition model provided in the embodiments of the present application are described in detail above, and a specific example is applied in the description to explain the principles and embodiments of the present application, and the description of the embodiments is only used to help understand the method and the core ideas of the present application; meanwhile, for those skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (20)

  1. A method for generating a gesture recognition model, comprising:
    acquiring multiple segments of ultrasonic signals to be trained;
    generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, so as to obtain a plurality of spectrograms to be trained;
    constructing a spectrogram database from the plurality of spectrograms to be trained;
    and training on the spectrogram database to obtain a gesture recognition model.
  2. The method for generating a gesture recognition model according to claim 1, wherein the generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained to obtain a plurality of spectrograms to be trained comprises:
    performing framing and windowing on each segment of ultrasonic signal to be trained to obtain the multi-frame windowed signal corresponding to each segment;
    performing a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signal;
    calculating the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained;
    and generating the spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained from the energy densities of all frames corresponding to that segment, so as to obtain a plurality of spectrograms to be trained.
  3. The method for generating a gesture recognition model according to claim 1, wherein after the training on the spectrogram database to obtain a gesture recognition model, the method further comprises:
    obtaining each model output result of the gesture recognition model;
    receiving an operation association instruction, wherein the operation association instruction carries operation information, and the operation information comprises a plurality of operations;
    and associating each model output result with one of the plurality of operations according to the operation association instruction, so as to obtain a preset association library.
  4. The method for generating a gesture recognition model according to claim 3, wherein after the associating each model output result with one of the plurality of operations according to the operation association instruction to obtain a preset association library, the method further comprises:
    acquiring an ultrasonic signal to be recognized;
    generating a spectrogram to be recognized corresponding to the ultrasonic signal to be recognized;
    inputting the spectrogram to be recognized into the gesture recognition model to obtain an output result corresponding to the spectrogram to be recognized;
    and executing a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.
  5. The method for generating a gesture recognition model according to claim 4, wherein the executing a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library comprises:
    detecting whether a model output result matching the output result corresponding to the spectrogram to be recognized exists in the preset association library;
    if such a model output result exists in the preset association library, acquiring the operation associated with that model output result; and
    executing the operation.
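Putting claims 4 and 5 together, the recognition-time flow might be sketched as below, reusing the generate_spectrogram helper and association_library from the earlier sketches. The classifier's predict() interface (here, a model trained on flattened spectrograms) is our assumption.

```python
def recognize_and_execute(ultrasonic_signal, model, association_library):
    """Claims 4-5: signal -> spectrogram -> model output -> matched operation."""
    spectrogram = generate_spectrogram(ultrasonic_signal)  # spectrogram to be recognized
    output = model.predict(spectrogram.reshape(1, -1))[0]  # model output result
    operation = association_library.get(output)            # match in the preset library
    if operation is not None:                              # claim 5: execute only on a match
        operation()
```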
  6. The method for generating a gesture recognition model according to claim 4, wherein the acquiring an ultrasonic signal to be recognized comprises:
    detecting whether the electronic device is in a communication state, wherein the communication state comprises an in-call state and an incoming-call state; and
    if the electronic device is in a communication state, acquiring the ultrasonic signal to be recognized.
  7. The method for generating a gesture recognition model according to claim 4, wherein the acquiring an ultrasonic signal to be recognized comprises:
    detecting whether a reminder event of an alarm clock application in the electronic device is triggered; and
    if the reminder event of the alarm clock application in the electronic device is triggered, acquiring the ultrasonic signal to be recognized.
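Claims 6 and 7 merely gate when ultrasonic listening begins. A toy gate might read as follows; the device attributes are hypothetical stand-ins for real telephony and alarm-service queries.

```python
def should_acquire_signal(device) -> bool:
    """Begin acquiring the ultrasonic signal to be recognized only when the
    device is in a communication state (claim 6) or an alarm clock reminder
    event has been triggered (claim 7)."""
    in_communication = device.in_call or device.incoming_call  # claim 6
    alarm_triggered = device.alarm_reminder_triggered          # claim 7
    return in_communication or alarm_triggered
```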
  8. The method for generating a gesture recognition model according to claim 1, wherein the acquiring a plurality of segments of ultrasonic signals to be trained comprises:
    acquiring a plurality of segments of ultrasonic signals to be trained, wherein each ultrasonic signal to be trained is a portion of an ultrasonic signal emitted by an electronic device that is reflected by the face of a user while the face moves toward or away from the electronic device;
    or, each ultrasonic signal to be trained is a portion of an ultrasonic signal emitted by the electronic device that is reflected by the hand of a user while the hand waves leftward, rightward, upward, or downward within a preset range of the electronic device.
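The training gestures of claim 8 matter because a moving face or hand Doppler-shifts the emitted tone, and that shift is what the spectrograms capture. The toy model below illustrates the effect; the emitted frequency, speeds, sampling rate, and duration are illustrative assumptions, not values from the patent.

```python
import numpy as np

def simulate_reflection(velocity_mps, f_emit=20000.0, fs=48000, duration_s=0.5):
    """Toy received tone for a reflector (face or hand) moving at velocity_mps
    toward the device (negative = away), per the gestures of claim 8."""
    c = 343.0                                                  # speed of sound, m/s
    f_recv = f_emit * (c + velocity_mps) / (c - velocity_mps)  # two-way Doppler shift
    t = np.arange(int(duration_s * fs)) / fs
    return np.sin(2 * np.pi * f_recv * t)                      # reflected tone

approaching_face = simulate_reflection(+0.3)  # face moving toward the device
receding_hand = simulate_reflection(-0.5)     # hand waving away from the device
```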
  9. An apparatus for generating a gesture recognition model, comprising:
    an acquisition module, configured to acquire a plurality of segments of ultrasonic signals to be trained;
    a generation module, configured to generate a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, to obtain a plurality of spectrograms to be trained;
    a construction module, configured to construct a spectrogram database from the plurality of spectrograms to be trained; and
    a training module, configured to train on the spectrogram database to obtain a gesture recognition model.
  10. The apparatus for generating a gesture recognition model according to claim 9, wherein the generation module is configured to:
    perform framing and windowing on each segment of ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of ultrasonic signal to be trained;
    perform a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signals;
    calculate the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained; and
    generate, from the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained, the spectrogram to be trained corresponding to that segment, to obtain the plurality of spectrograms to be trained.
  11. The apparatus for generating a gesture recognition model according to claim 9, wherein the training module is configured to:
    obtain each model output result of the gesture recognition model;
    receive an operation association instruction, wherein the operation association instruction carries operation information, and the operation information comprises a plurality of operations; and
    associate, according to the operation association instruction, each model output result with one of the plurality of operations to obtain a preset association library.
  12. A storage medium having a computer program stored therein, wherein the computer program, when run on a computer, causes the computer to execute the method for generating a gesture recognition model according to any one of claims 1 to 8.
  13. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute, by calling the computer program stored in the memory:
    acquiring a plurality of segments of ultrasonic signals to be trained;
    generating a spectrogram to be trained corresponding to each segment of ultrasonic signal to be trained, to obtain a plurality of spectrograms to be trained;
    constructing a spectrogram database from the plurality of spectrograms to be trained; and
    training on the spectrogram database to obtain a gesture recognition model.
  14. The electronic device of claim 13, wherein the processor is configured to perform:
    performing framing and windowing on each segment of ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment of ultrasonic signal to be trained;
    performing a Fourier transform on each frame of windowed signal to obtain multiple frames of frequency-domain signals;
    calculating the energy density of each frame of frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained; and
    generating, from the energy densities of all frames corresponding to each segment of ultrasonic signal to be trained, the spectrogram to be trained corresponding to that segment, to obtain the plurality of spectrograms to be trained.
  15. The electronic device of claim 13, wherein the processor is configured to perform:
    obtaining each model output result of the gesture recognition model;
    receiving an operation association instruction, wherein the operation association instruction carries operation information, and the operation information comprises a plurality of operations; and
    associating, according to the operation association instruction, each model output result with one of the plurality of operations to obtain a preset association library.
  16. The electronic device of claim 15, wherein the processor is configured to perform:
    acquiring an ultrasonic signal to be recognized;
    generating a spectrogram to be recognized corresponding to the ultrasonic signal to be recognized;
    inputting the spectrogram to be recognized into the gesture recognition model to obtain an output result corresponding to the spectrogram to be recognized; and
    executing a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.
  17. The electronic device of claim 16, wherein the processor is configured to perform:
    detecting whether a model output result matching the output result corresponding to the spectrogram to be recognized exists in the preset association library;
    if such a model output result exists in the preset association library, acquiring the operation associated with that model output result; and
    executing the operation.
  18. The electronic device of claim 16, wherein the processor is configured to perform:
    detecting whether the electronic device is in a communication state, wherein the communication state comprises an in-call state and an incoming-call state; and
    if the electronic device is in a communication state, acquiring the ultrasonic signal to be recognized.
  19. The electronic device of claim 16, wherein the processor is configured to perform:
    detecting whether a reminder event of an alarm clock application in the electronic device is triggered; and
    if the reminder event of the alarm clock application in the electronic device is triggered, acquiring the ultrasonic signal to be recognized.
  20. The electronic device of claim 13, wherein the processor is configured to perform:
    acquiring a plurality of segments of ultrasonic signals to be trained, wherein each ultrasonic signal to be trained is a portion of an ultrasonic signal emitted by the electronic device that is reflected by the face of a user while the face moves toward or away from the electronic device;
    or, each ultrasonic signal to be trained is a portion of an ultrasonic signal emitted by the electronic device that is reflected by the hand of a user while the hand waves leftward, rightward, upward, or downward within a preset range of the electronic device.
CN201880097776.6A (priority date 2018-11-19, filing date 2018-11-19) — Method and device for generating gesture recognition model, storage medium and electronic equipment — Pending — published as CN112740219A (en)

Applications Claiming Priority (1)

PCT/CN2018/116221 (WO2020102943A1) — priority date 2018-11-19, filing date 2018-11-19 — Method and apparatus for generating gesture recognition model, storage medium, and electronic device

Publications (1)

CN112740219A (en) — published 2021-04-30

Family

ID=70774192

Family Applications (1)

CN201880097776.6A — Method and device for generating gesture recognition model, storage medium and electronic equipment — Pending — CN112740219A (en)

Country Status (2)

Country Link
CN (1) CN112740219A (en)
WO (1) WO2020102943A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
CN112949447A * — priority 2021-02-25, published 2021-06-11 — 北京京东方技术开发有限公司 — Gesture recognition method, system, apparatus, device, and medium

Family Cites Families (4)

US9569006B2 * — priority 2014-04-10, published 2017-02-14 — Mediatek Inc. — Ultrasound-based methods for touchless gesture recognition, and apparatuses using the same
CN105760825A * — priority 2016-02-02, published 2016-07-13 — 深圳市广懋创新科技有限公司 — Gesture recognition system and method based on a Chebyshev feedforward neural network
CN106203380B * — priority 2016-07-20, published 2019-11-29 — 中国科学院计算技术研究所 — Ultrasonic gesture recognition method and system
CN107450724A * — priority 2017-07-31, published 2017-12-08 — 武汉大学 — Gesture recognition method and system based on the dual-channel audio Doppler effect

Patent Citations (3)

CN104898844A * — priority 2015-01-23, published 2015-09-09 — 瑞声光电科技(常州)有限公司 — Gesture recognition and control device and method based on ultrasonic positioning
CN105807923A * — priority 2016-03-07, published 2016-07-27 — 中国科学院计算技术研究所 — Ultrasound-based in-air gesture recognition method and system
CN107526437A * — priority 2017-07-31, published 2017-12-29 — 武汉大学 — Gesture recognition method based on audio Doppler feature quantification

Non-Patent Citations (1)

Youngwook Kim et al., "Hand Gesture Recognition Using Micro-Doppler Signatures With Convolutional Neural Network", IEEE Access, vol. 4, p. 7125. *

Also Published As

WO2020102943A1 (en) — published 2020-05-28


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination