WO2020102943A1 - Method and apparatus for generating a gesture recognition model, storage medium, and electronic device - Google Patents

Method and apparatus for generating a gesture recognition model, storage medium, and electronic device

Info

Publication number
WO2020102943A1
Authority
WO
WIPO (PCT)
Prior art keywords
trained
electronic device
ultrasonic signal
gesture recognition
recognition model
Prior art date
Application number
PCT/CN2018/116221
Other languages
English (en)
Chinese (zh)
Inventor
陈岩
Original Assignee
深圳市欢太科技有限公司
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市欢太科技有限公司 and Oppo广东移动通信有限公司
Priority to PCT/CN2018/116221
Priority to CN201880097776.6A (CN112740219A)
Publication of WO2020102943A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition

Definitions

  • the present application belongs to the technical field of terminals, and particularly relates to a method, device, storage medium, and electronic device for generating a gesture recognition model.
  • gesture recognition generally refers to the recognition of face and hand movements. Users can use simple gestures to control or interact with the terminal, allowing the terminal to understand the user's behavior.
  • ultrasonic signals can be used to realize gesture recognition.
  • a gesture recognition model is required when performing ultrasonic gesture recognition.
  • the recognition accuracy of the gesture recognition model used for ultrasonic gesture recognition is low.
  • Embodiments of the present application provide a method, device, storage medium, and electronic device for generating a gesture recognition model, which can obtain a gesture recognition model with higher accuracy, thereby improving the accuracy of gesture recognition.
  • an embodiment of the present application provides a method for generating a gesture recognition model, including: acquiring multiple segments of ultrasonic signals to be trained; generating a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained to obtain multiple to-be-trained spectrograms; constructing a spectrogram database from the multiple to-be-trained spectrograms; and training on the spectrogram database to obtain a gesture recognition model.
  • an embodiment of the present application provides a device for generating a gesture recognition model, including:
  • the acquisition module is used to acquire multiple segments of ultrasonic signals to be trained;
  • the generation module is used to generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, obtaining multiple to-be-trained spectrograms;
  • the construction module is used to construct a spectrogram database from the multiple to-be-trained spectrograms;
  • the training module is used to train on the spectrogram database to obtain a gesture recognition model.
  • an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the method for generating a gesture recognition model provided in this embodiment.
  • an embodiment of the present application provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor is used to execute the method for generating a gesture recognition model provided in this embodiment by calling the computer program stored in the memory.
  • because the feature points of a spectrogram are more prominent than those of the raw ultrasonic signal, converting the ultrasonic signal into a spectrogram and then training on the spectrogram as sample data yields a gesture recognition model with higher accuracy than a model generated by directly training on the ultrasonic signal as sample data, thereby improving the accuracy of gesture recognition.
  • FIG. 1 is a first schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.
  • FIG. 2 is a second schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.
  • FIG. 3 is a third schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a device for generating a gesture recognition model provided by an embodiment of the present application.
  • FIG. 5 is a first schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 6 is a second schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a first schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.
  • the flow of the method for generating the gesture recognition model may include:
  • the ultrasonic signal to be trained is a type of audio signal whose vibration frequency is greater than 20000 Hz, which exceeds the general upper limit of human hearing (20000 Hz), so it produces no audible noise and does not cause an uncomfortable experience for users.
  • electronic devices can emit ultrasonic signals. Therefore, when the user's face or hand moves within the preset range of the electronic device, the electronic device acquires the ultrasonic signal reflected by the user's face or hand.
  • the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal.
  • the ultrasonic signal to be trained may be the ultrasonic signal reflected by the user's face as the user's face approaches or moves away from the electronic device.
  • alternatively, the ultrasonic signal to be trained may be the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the hand waves left, right, upward, downward, left-and-right, or back-and-forth within the preset range of the electronic device.
  • for example, the electronic device emits an ultrasonic signal, and user A brings the electronic device closer to the face; the electronic device then acquires the ultrasonic signal reflected by the user's face to obtain the ultrasonic signal to be trained corresponding to user A.
  • similarly, the electronic device can obtain ultrasonic signals to be trained corresponding to multiple users, such as user B, user C, user D, and user E, in the foregoing manner, thereby obtaining multiple segments of ultrasonic signals to be trained.
  • the preset range may be automatically determined by the electronic device, or may be set by the user.
  • for example, the preset range can be set to the area within 3 cm of the electronic device, or the area within 5 cm of the electronic device, and so on, without specific limitation here.
  • a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained is generated, and multiple to-be-trained spectrograms are obtained.
  • that is, the electronic device may generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, obtaining multiple to-be-trained spectrograms.
  • a spectrogram database is constructed based on a plurality of spectrograms to be trained.
  • processes 101 to 103 constitute the sample-data collection stage of the gesture recognition model, that is, the stage of collecting the spectrograms in the spectrogram database.
  • in general, there are two methods by which the electronic device can obtain more spectrograms for training.
  • Method 1: build a spectrogram database for a single scene.
  • for example, for the scene in which the user brings the electronic device close to the face, the electronic device emits an ultrasonic signal; user A brings the electronic device closer to the face, and the electronic device acquires the ultrasonic signal reflected by the user's face to obtain user A's ultrasonic signal to be trained.
  • similarly, the electronic device can acquire user B's, user C's, user D's, user E's, and other users' ultrasonic signals to be trained in the above manner. For example, suppose the electronic device obtains a total of 500 users' ultrasonic signals to be trained.
  • the electronic device then generates a to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained, obtaining the 500 users' to-be-trained spectrograms.
  • the electronic device can aggregate the 500 users' to-be-trained spectrograms into a spectrogram database.
  • Method 2: build a spectrogram sub-database for each of multiple scenes.
  • for example, for the scene in which the user brings the electronic device close to the face, the electronic device emits an ultrasonic signal; user A brings the electronic device closer to the face, and the electronic device acquires the ultrasonic signal reflected by the user's face to obtain user A's ultrasonic signal to be trained in this scene.
  • similarly, the electronic device can obtain user B's, user C's, user D's, user E's, and other users' ultrasonic signals to be trained in this scene in the above manner.
  • for example, suppose the electronic device obtains a total of 500 users' ultrasonic signals to be trained in this scene.
  • the electronic device then generates a to-be-trained spectrogram corresponding to each user's ultrasonic signal in this scene, obtaining the 500 users' to-be-trained spectrograms for this scene.
  • in the above manner, the electronic device can obtain the 500 users' to-be-trained spectrograms in multiple scenes.
  • the electronic device can aggregate the 500 users' to-be-trained spectrograms for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases and can combine these 6 sub-databases into a spectrogram database.
  • for a database built by Method 1, the electronic device can train directly on the spectrogram database to obtain a gesture recognition model. Such a gesture recognition model can only recognize gestures in a single scene.
  • for a database built by Method 2, the electronic device can train on each sub-database of the spectrogram database to obtain a gesture recognition model. Such a gesture recognition model can recognize gestures in multiple scenes.
  • in practice, the electronic device can construct the spectrogram database according to actual needs. For example, in some embodiments, the electronic device needs a gesture recognition model that can recognize gestures in 8 scenes; the electronic device can then construct a spectrogram database for the 8 scenes and train on it to obtain a gesture recognition model that recognizes gestures in those 8 scenes.
  • in this embodiment, by converting the ultrasonic signal into a spectrogram and then training on the spectrogram as sample data, the generated gesture recognition model has higher accuracy, thereby improving the accuracy of gesture recognition.
  • FIG. 2 is a second schematic flowchart of a method for generating a gesture recognition model provided by an embodiment of the present application.
  • the method for generating the gesture recognition model may include:
  • the electronic device acquires multiple ultrasonic signals to be trained.
  • the ultrasonic signal to be trained is a type of audio signal whose vibration frequency is greater than 20000 Hz, which exceeds the general upper limit of human hearing (20000 Hz), so it produces no audible noise and does not cause an uncomfortable experience for users.
  • electronic devices can use earpieces or speakers to emit ultrasonic signals. Therefore, when a foreign object is moving within a preset range of the electronic device, the electronic device may use a microphone to receive the ultrasonic signal reflected by the foreign object.
  • the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal.
  • the ultrasonic signal to be trained may be the ultrasonic signal reflected by the user's face as the user's face approaches or moves away from the electronic device.
  • alternatively, the ultrasonic signal to be trained may be the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the hand waves left, right, upward, downward, left-and-right, or back-and-forth within the preset range of the electronic device.
  • for example, the electronic device emits an ultrasonic signal, and user A brings the electronic device closer to the face; the electronic device then acquires the ultrasonic signal reflected by the user's face to obtain the ultrasonic signal to be trained corresponding to user A.
  • similarly, the electronic device can obtain ultrasonic signals to be trained corresponding to multiple users, such as user B, user C, user D, and user E, in the foregoing manner, thereby obtaining multiple segments of ultrasonic signals to be trained.
  • the preset range may be automatically determined by the electronic device, or may be set by the user.
  • for example, the preset range can be set to the area within 3 cm of the electronic device, or the area within 5 cm of the electronic device, and so on, without specific limitation here.
  • the electronic device performs framing and windowing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment.
  • that is, after acquiring the multiple segments of ultrasonic signals to be trained, the electronic device performs framing and windowing on each segment of the ultrasonic signals to be trained to obtain the multiple frames of windowed signals corresponding to each segment.
  • usually, a frame length of 20 ms and a frame shift of 10 ms are used to frame each segment of the ultrasonic signal to be trained.
  • the ultrasonic signal is a time domain signal.
  • the windowed signal is a time-domain signal.
  • the electronic device performs Fourier transform on the windowed signal of each frame to obtain a multi-frame frequency domain signal.
  • performing a Fourier transform on the time domain signal can convert the time domain signal into a frequency domain signal. Since the windowed signal is a time-domain signal, the electronic device performs Fourier transform on the windowed signal of each frame to obtain a multi-frame frequency domain signal.
  • the electronic device calculates the energy density of each frame of the frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of the ultrasonic signal to be trained.
  • the electronic device then generates, from the energy densities of all frames corresponding to each segment of the ultrasonic signal to be trained, the to-be-trained spectrogram corresponding to that segment, obtaining multiple to-be-trained spectrograms.
  • a spectrogram is a spectral-analysis view: an image representing how the spectrum of the ultrasonic signal changes over time.
  • the spectrogram uses a two-dimensional plane to express three-dimensional information: the abscissa is time, the ordinate is frequency, and the value at each coordinate point is the energy density of the corresponding frame of the frequency-domain signal.
  • the energy density at each coordinate point is expressed by color depth: a dark color at a coordinate point indicates a large energy density, while a light color indicates a small energy density.
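  • as a rough illustration of the framing, windowing, Fourier transform, and energy-density steps above, the following Python sketch builds such a spectrogram from a received signal; the Hamming window, the dB scaling, and the 96 kHz default sampling rate are assumptions, since the embodiment does not fix them:

    import numpy as np

    def spectrogram(signal, fs=96000, frame_ms=20, shift_ms=10):
        """Build a spectrogram (frequency x time energy-density map) from a
        received ultrasonic signal, following the steps described above."""
        frame_len = int(fs * frame_ms / 1000)    # 20 ms frame length
        frame_shift = int(fs * shift_ms / 1000)  # 10 ms frame shift
        window = np.hamming(frame_len)           # window choice is an assumption
        n_frames = 1 + (len(signal) - frame_len) // frame_shift
        frames = np.stack([signal[i * frame_shift : i * frame_shift + frame_len] * window
                           for i in range(n_frames)])
        spectra = np.fft.rfft(frames, axis=1)     # per-frame Fourier transform
        energy = np.abs(spectra) ** 2             # energy density per frame and bin
        return 10.0 * np.log10(energy + 1e-12).T  # rows: frequency; columns: time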
  • the electronic device constructs a spectrogram database from the multiple to-be-trained spectrograms.
  • processes 201 to 206 constitute the sample-data collection stage of the gesture recognition model, that is, the stage of collecting the spectrograms in the spectrogram database.
  • in general, there are two methods by which the electronic device can obtain more spectrograms for training.
  • Method 1: build a spectrogram database for a single scene.
  • for example, for the scene in which the user brings the electronic device close to the face, the electronic device emits an ultrasonic signal; user A brings the electronic device closer to the face, and the electronic device acquires the ultrasonic signal reflected by the user's face to obtain user A's ultrasonic signal to be trained.
  • similarly, the electronic device can acquire user B's, user C's, user D's, user E's, and other users' ultrasonic signals to be trained in the above manner. For example, suppose the electronic device obtains a total of 500 users' ultrasonic signals to be trained.
  • the electronic device then generates a to-be-trained spectrogram corresponding to each user's ultrasonic signal to be trained, obtaining 500 to-be-trained spectrograms.
  • the electronic device can aggregate the 500 users' to-be-trained spectrograms into a spectrogram database.
  • Method 2: build a spectrogram sub-database for each of multiple scenes.
  • for example, for the scene in which the user brings the electronic device close to the face, the electronic device emits an ultrasonic signal; user A brings the electronic device closer to the face, and the electronic device acquires the ultrasonic signal reflected by the user's face to obtain user A's ultrasonic signal to be trained in this scene.
  • similarly, the electronic device can obtain user B's, user C's, user D's, user E's, and other users' ultrasonic signals to be trained in this scene in the above manner.
  • for example, suppose the electronic device obtains a total of 500 users' ultrasonic signals to be trained in this scene. Then, the electronic device generates a to-be-trained spectrogram corresponding to each user's ultrasonic signal in this scene, obtaining the 500 users' to-be-trained spectrograms for this scene.
  • in the above manner, the electronic device can obtain the 500 users' to-be-trained spectrograms in multiple scenes.
  • the multiple scenes may include: the user bringing the electronic device close to the face; the user's hand waving once to the left, once to the right, once upward, or once downward within the preset range of the electronic device; and so on.
  • the electronic device can aggregate the 500 users' to-be-trained spectrograms for the same scene into a spectrogram sub-database. For example, assuming there are 6 scenes, the electronic device obtains 6 spectrogram sub-databases and can combine these 6 sub-databases into a spectrogram database.
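  • conceptually, the database built by Method 2 can be pictured as one sub-database per scene, as in the following minimal Python sketch; the six scene labels are illustrative assumptions, not taken from the embodiment:

    # One spectrogram sub-database (a list of spectrogram arrays) per scene.
    spectrogram_database = {
        "face_approach": [],
        "wave_left": [],
        "wave_right": [],
        "wave_up": [],
        "wave_down": [],
        "wave_back_and_forth": [],
    }
    # Each sub-database collects the 500 users' spectrograms for its scene, e.g.:
    # spectrogram_database["wave_left"].append(spectrogram(received_signal))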
  • the electronic device trains on the spectrogram database to obtain a gesture recognition model.
  • for a database built by Method 1, the electronic device can use a convolutional neural network (CNN) to train directly on the spectrogram database to obtain a gesture recognition model. Such a gesture recognition model can only recognize gestures in a single scene.
  • for a database built by Method 2, the electronic device can use a convolutional neural network (CNN) to train on each sub-database of the spectrogram database to obtain a gesture recognition model. Such a gesture recognition model can recognize gestures in multiple scenes.
  • in practice, the electronic device can construct the spectrogram database according to actual needs. For example, in some embodiments, the electronic device needs a gesture recognition model that can recognize gestures in 8 scenes; the electronic device can then construct a spectrogram database for the 8 scenes and train on it to obtain a gesture recognition model that recognizes gestures in those 8 scenes.
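  • the embodiment only states that a CNN is used; the following PyTorch sketch shows one plausible shape for such a classifier, taking a spectrogram as input and producing one output class per scene. The layer sizes and the six-class output are assumptions:

    import torch
    import torch.nn as nn

    class GestureCNN(nn.Module):
        """Minimal CNN sketch: spectrogram in, gesture/scene class out.
        Architecture details are assumptions; the patent only says 'CNN'."""
        def __init__(self, n_classes=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, n_classes),
            )

        def forward(self, x):  # x: (batch, 1, freq_bins, time_frames)
            return self.classifier(self.features(x))

    # Training-loop ingredients: one (spectrogram, scene label) pair per sample.
    model = GestureCNN()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()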
  • the electronic device obtains each model output result of the gesture recognition model.
  • the gesture recognition model M can recognize gestures in 6 scenes. It can be understood that for each of the six scenes, the gesture recognition model can output the corresponding result. That is, the gesture recognition model M has a total of 6 model output results. Assume that the six model output results are model output result a, model output result b, model output result c, model output result d, model output result e and model output result f.
  • the electronic device receives an operation association instruction.
  • the operation association instruction carries operation information, and the operation information includes multiple operations.
  • the operation association instruction instructs the electronic device to associate each model output result of the gesture recognition model with one of the operations. Assume the multiple operations are: turning off the screen, turning on the screen, turning off the alarm, answering the call, hanging up the call, and taking a photo.
  • according to the operation association instruction, the electronic device associates each model output result with one of the multiple operations to obtain a preset association library.
  • for example, the electronic device can associate model output result a with turning off the screen, model output result b with turning on the screen, model output result c with turning off the alarm, model output result d with answering the call, model output result e with hanging up the call, and model output result f with taking a photo, thereby obtaining the preset association library S.
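  • a minimal sketch of such a preset association library is a simple mapping from model output results to operations, following the example above:

    # Preset association library S: model output result -> operation.
    preset_association_library = {
        "a": "turn_off_screen",
        "b": "turn_on_screen",
        "c": "turn_off_alarm",
        "d": "answer_call",
        "e": "hang_up_call",
        "f": "take_photo",
    }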
  • the electronic device acquires an ultrasonic signal to be recognized.
  • the ultrasonic signal to be recognized is a type of audio signal whose vibration frequency is greater than 20000 Hz, which exceeds the general upper limit of human hearing (20000 Hz), so it produces no audible noise and does not cause an uncomfortable experience for users.
  • electronic devices can emit ultrasonic signals. Therefore, when the user's face or hand moves within the preset range of the electronic device, the electronic device acquires the ultrasonic signal reflected by the user's face or hand, that is, the ultrasonic signal to be recognized.
  • the ultrasonic signal emitted by the electronic device may be a continuous frequency sweep signal.
  • in some embodiments, the emitted ultrasonic signal can be expressed as a linear frequency-sweep (chirp) signal; in the standard linear-chirp form consistent with the parameters listed in this embodiment,

    x(t) = a1 · sin(2π · (f1·t + ((f2 − f1) / (2T)) · t²)), 0 ≤ t ≤ T,

  • where T represents the duration, f1 represents the start frequency, f2 represents the end frequency, a1 represents the amplitude, and fs represents the sampling frequency. The sampling frequency is not specifically limited here; for example, fs may be 96 kHz.
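  • under the chirp form above, the emitted signal can be generated as in the following Python sketch; the numeric defaults (duration, 21 to 23 kHz sweep) are illustrative assumptions, with only fs = 96 kHz named in the embodiment:

    import numpy as np

    def emit_chirp(T=0.01, f1=21000.0, f2=23000.0, a1=1.0, fs=96000.0):
        """Linear frequency-sweep (chirp) signal with the parameters named
        above: duration T, start frequency f1, end frequency f2, amplitude a1,
        sampling frequency fs. Numeric defaults are illustrative."""
        t = np.arange(0.0, T, 1.0 / fs)
        # Instantaneous phase of a linear sweep from f1 to f2 over duration T.
        phase = 2.0 * np.pi * (f1 * t + (f2 - f1) * t**2 / (2.0 * T))
        return a1 * np.sin(phase)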
  • for example, the electronic device acquires the ultrasonic signal reflected by user A's hand to obtain the ultrasonic signal A to be recognized.
  • the electronic device generates a spectrogram to be recognized corresponding to the ultrasonic signal to be recognized.
  • when the electronic device receives the ultrasonic signal A to be recognized, it can generate the to-be-recognized spectrogram A from the ultrasonic signal A to be recognized.
  • the electronic device inputs the spectrogram to be recognized into the gesture recognition model to obtain an output result corresponding to the spectrogram to be recognized.
  • the electronic device inputs the to-be-recognized spectrogram A into the gesture recognition model to obtain an output result corresponding to the to-be-recognized spectrogram A.
  • assume the output result corresponding to the to-be-recognized spectrogram A is e.
  • the electronic device performs a corresponding operation according to the output result corresponding to the spectrogram to be recognized and the preset association library.
  • the process 214 may include a process 2141, a process 2142, and a process 2143, which may be:
  • the electronic device detects whether there is a model output result in the preset association library that matches the output result corresponding to the to-be-recognized spectrogram.
  • when the electronic device obtains the output result e corresponding to the to-be-recognized spectrogram A, it detects whether the preset association library S contains a model output result matching e.
  • if such a model output result exists, the electronic device obtains the operation associated with that model output result.
  • the electronic device then performs the operation. In this example, model output result e in the preset association library S is associated with hanging up the call, so the electronic device performs the operation of hanging up the call.
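  • the recognition flow just described (signal, spectrogram, model output, library lookup, operation) can be sketched as follows, reusing the spectrogram and preset_association_library sketches above; classify and perform_operation are hypothetical stand-ins for the model's inference call and the device's operation dispatcher:

    def classify(model, spec):
        # Hypothetical inference wrapper; returns a model output result label.
        return "e"

    def perform_operation(operation):
        # Hypothetical dispatcher; a real device would carry out the operation.
        print("performing:", operation)

    def handle_ultrasonic(signal, model, library=preset_association_library):
        spec = spectrogram(signal)        # generate the to-be-recognized spectrogram
        result = classify(model, spec)    # model output result, e.g. "e"
        if result in library:             # detect a matching model output result
            perform_operation(library[result])  # perform the associated operation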
  • in some embodiments, the process 208 may be: the electronic device detects whether it is in a communication state; if the electronic device is in a communication state, it acquires the ultrasonic signal to be recognized.
  • in other embodiments, the process 208 may be: the electronic device detects whether a reminder event of the alarm clock application in the electronic device is triggered; if it is triggered, the electronic device acquires the ultrasonic signal to be recognized.
  • in practice, the user may unconsciously make some gestures. For example, the user may unconsciously wave a hand to the left once within the preset range of the electronic device; if a wave to the left corresponds to the operation of hanging up the call, acquiring ultrasonic signals at all times could trigger an unintended operation.
  • therefore, the electronic device can limit the scenes in which the ultrasonic signal to be recognized is acquired, thereby avoiding unnecessary gesture recognition and reducing the processing load on the processor of the electronic device.
  • the ultrasonic signal to be recognized is acquired only when the electronic device is in the communication state.
  • the communication state includes the call state and the incoming call state.
  • in this way, the user can easily answer or hang up a call with gestures.
  • moreover, during a call, when the user's face is close to the electronic device, the electronic device can turn off the screen; when the user's face moves away from the electronic device, the electronic device can turn the screen on.
  • the electronic device acquires the ultrasonic signal to be recognized only when the alarm event of the alarm clock application is triggered. That is, when the alarm sounds, the user can easily turn off the alarm using gestures.
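  • the scene gating described above reduces to a simple condition, as in this sketch; the boolean flags are hypothetical device state:

    def should_acquire_signal(in_call, incoming_call, alarm_triggered):
        """Acquire the ultrasonic signal to be recognized only in a
        communication state (call or incoming call) or when an alarm
        reminder event has been triggered."""
        return in_call or incoming_call or alarm_triggered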
  • FIG. 4 is a schematic structural diagram of a device for generating a gesture recognition model according to an embodiment of the present application.
  • the apparatus 300 for generating a gesture recognition model may include an acquisition module 301, a generation module 302, a construction module 303, and a training module 304.
  • the obtaining module 301 is used to obtain multiple ultrasonic signals to be trained.
  • the generation module 302 is configured to generate a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained, obtaining multiple to-be-trained spectrograms.
  • the construction module 303 is configured to construct a spectrogram database from the generated multiple to-be-trained spectrograms.
  • the training module 304 is used to train on the spectrogram database to obtain a gesture recognition model.
  • in some embodiments, the generation module 302 may be used to: perform framing and windowing on each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment; perform a Fourier transform on each frame of the windowed signal to obtain multiple frames of frequency-domain signals; calculate the energy density of each frame of the frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of the ultrasonic signal to be trained; and generate, from the energy densities of all frames corresponding to each segment, the to-be-trained spectrogram corresponding to that segment, obtaining multiple to-be-trained spectrograms.
  • in some embodiments, the training module 304 may be used to: obtain each model output result of the gesture recognition model; receive an operation association instruction, where the operation association instruction carries operation information and the operation information includes multiple operations; and associate each model output result with one of the operations according to the operation association instruction to obtain a preset association library.
  • in some embodiments, the training module 304 may be used to: acquire the ultrasonic signal to be recognized reflected by a foreign object; generate the to-be-recognized spectrogram corresponding to the ultrasonic signal to be recognized; input the to-be-recognized spectrogram into the gesture recognition model to obtain the output result corresponding to the to-be-recognized spectrogram; and perform the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library.
  • in some embodiments, the training module 304 may be used to: detect whether the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram; if it does, obtain the operation associated with that model output result; and perform the operation.
  • in some embodiments, the training module 304 may be used to: detect whether the electronic device is in a communication state, where the communication state includes a call state and an incoming call state; and if the electronic device is in a communication state, receive the ultrasonic signal to be recognized reflected by a foreign object.
  • in some embodiments, the training module 304 may be used to: detect whether a reminder event of the alarm clock application in the electronic device is triggered; and if it is triggered, receive the ultrasonic signal to be recognized reflected by a foreign object.
  • in some embodiments, the acquisition module 301 may be used to acquire multiple segments of ultrasonic signals to be trained, where the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face as the user's face approaches or moves away from the electronic device; or the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user's hand waves once to the left, once to the right, once upward, or once downward within the preset range of the electronic device.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed on a computer, the computer is caused to execute the flow of the method for generating a gesture recognition model provided in this embodiment.
  • An embodiment of the present application further provides an electronic device, including a memory and a processor, where the processor is used to execute the flow of the method for generating a gesture recognition model provided in this embodiment by calling the computer program stored in the memory.
  • the aforementioned electronic device may be a mobile terminal such as a tablet computer or a smart phone.
  • FIG. 5 is a first structural schematic diagram of an electronic device provided by an embodiment of the present application.
  • the mobile terminal 400 may include components such as a microphone 401, a memory 402, a processor 403, an earpiece 404, and a speaker 405.
  • the structure shown in FIG. 5 does not constitute a limitation on the mobile terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
  • the microphone 401 can be used to receive ultrasonic signals, pick up voices uttered by the user, and so on.
  • the memory 402 may be used to store application programs and data.
  • the application program stored in the memory 402 contains executable code.
  • the application program can form various functional modules.
  • the processor 403 executes application programs stored in the memory 402 to execute various functional applications and data processing.
  • the processor 403 is the control center of the mobile terminal; it uses various interfaces and lines to connect the various parts of the entire mobile terminal, and performs the various functions of the mobile terminal and processes data by running or executing the application programs stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the mobile terminal as a whole.
  • the earpiece 404 and the speaker 405 may be used to emit ultrasonic signals.
  • in this embodiment, the processor 403 in the mobile terminal loads executable code corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 403 runs the application programs stored in the memory 402, thereby implementing the flow: acquiring multiple segments of ultrasonic signals to be trained; generating a to-be-trained spectrogram corresponding to each segment to obtain multiple to-be-trained spectrograms; constructing a spectrogram database from the multiple to-be-trained spectrograms; and training on the spectrogram database to obtain a gesture recognition model.
  • FIG. 6 is a second structural schematic diagram of an electronic device provided by an embodiment of the present application.
  • the mobile terminal 500 may include components such as a microphone 501, a memory 502, a processor 503, an earpiece 504, a speaker 505, an input unit 506, and an output unit 507.
  • the microphone 501 can be used to receive ultrasonic signals, pick up voices uttered by the user, and so on.
  • the memory 502 may be used to store application programs and data.
  • the application program stored in the memory 502 contains executable code.
  • the application program can form various functional modules.
  • the processor 503 executes application programs stored in the memory 502 to execute various functional applications and data processing.
  • the processor 503 is the control center of the mobile terminal; it uses various interfaces and lines to connect the various parts of the entire mobile terminal, and performs the various functions of the mobile terminal and processes data by running or executing the application programs stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the mobile terminal as a whole.
  • the earpiece 504 and the speaker 505 may be used to transmit ultrasonic signals.
  • the input unit 506 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
  • the output unit 507 can be used to display information input by the user or provided to the user and various graphical user interfaces of the mobile terminal. These graphical user interfaces can be composed of graphics, text, icons, videos, and any combination thereof.
  • the output unit may include a display panel.
  • in this embodiment, the processor 503 in the mobile terminal loads executable code corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 503 runs the application programs stored in the memory 502, thereby implementing the flow: acquiring multiple segments of ultrasonic signals to be trained; generating a to-be-trained spectrogram corresponding to each segment to obtain multiple to-be-trained spectrograms; constructing a spectrogram database from the multiple to-be-trained spectrograms; and training on the spectrogram database to obtain a gesture recognition model.
  • when the processor 503 executes the process of generating the to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to be trained and obtaining multiple to-be-trained spectrograms, it may perform: framing and windowing each segment of the ultrasonic signal to be trained to obtain multiple frames of windowed signals corresponding to each segment; performing a Fourier transform on each frame of the windowed signal to obtain multiple frames of frequency-domain signals; calculating the energy density of each frame of the frequency-domain signal to obtain the energy densities of all frames corresponding to each segment of the ultrasonic signal to be trained; and generating, from those energy densities, the to-be-trained spectrogram corresponding to each segment, obtaining multiple to-be-trained spectrograms.
  • in some embodiments, the processor 503 may also execute: obtaining each model output result of the gesture recognition model; receiving an operation association instruction, where the operation association instruction carries operation information and the operation information includes multiple operations; and associating each model output result with one of the operations according to the operation association instruction to obtain a preset association library.
  • after the processor 503 associates each model output result with one of the multiple operations according to the operation association instruction to obtain the preset association library, it may further execute: acquiring the ultrasonic signal to be recognized reflected by a foreign object; generating the to-be-recognized spectrogram corresponding to the ultrasonic signal to be recognized; inputting the to-be-recognized spectrogram into the gesture recognition model to obtain the output result corresponding to the to-be-recognized spectrogram; and performing the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library.
  • when the processor 503 executes the process of performing the corresponding operation according to the output result corresponding to the to-be-recognized spectrogram and the preset association library, it may perform: detecting whether the preset association library contains a model output result matching the output result corresponding to the to-be-recognized spectrogram; and if it does, obtaining the operation associated with that model output result and performing the operation.
  • when the processor 503 executes the process of acquiring the ultrasonic signal to be recognized, it may perform: detecting whether the electronic device is in a communication state, where the communication state includes a call state and an incoming call state; and if the electronic device is in a communication state, receiving the ultrasonic signal to be recognized reflected by a foreign object.
  • when the processor 503 executes the process of acquiring the ultrasonic signal to be recognized, it may also perform: detecting whether a reminder event of the alarm clock application in the electronic device is triggered; and if the reminder event of the alarm clock application in the electronic device is triggered, receiving the ultrasonic signal to be recognized reflected by a foreign object.
  • when the processor 503 executes the process of acquiring multiple ultrasonic signals to be trained, it may perform: acquiring multiple segments of ultrasonic signals to be trained, where the ultrasonic signal to be trained is the ultrasonic signal emitted by the electronic device and reflected by the user's face as the user's face approaches or moves away from the electronic device; or the ultrasonic signal emitted by the electronic device and reflected by the user's hand while the user's hand waves once to the left, once to the right, once upward, or once downward within the preset range of the electronic device.
  • the device for generating a gesture recognition model provided by the embodiments of the present application and the method for generating a gesture recognition model in the above embodiments belong to the same concept; any method for generating a gesture recognition model can be run on the device for generating a gesture recognition model.
  • the specific implementation process is described in detail in the method embodiments and is not repeated here.
  • a person of ordinary skill in the art can understand that all or part of the flow of the method for generating a gesture recognition model described in the embodiments of the present application can be completed by a computer program controlling related hardware. The computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor; its execution may include the flow of an embodiment of the method for generating a gesture recognition model.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), and so on.
  • for the device for generating a gesture recognition model of the embodiments of the present application, each functional module may be integrated into one processing chip, each module may exist alone physically, or two or more modules may be integrated into one module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules. If an integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present application provides a method and apparatus for generating a gesture recognition model, a storage medium, and an electronic device. The method includes: acquiring multiple segments of ultrasonic signals to be trained; generating a to-be-trained spectrogram corresponding to each segment of the ultrasonic signal to obtain multiple to-be-trained spectrograms; constructing a spectrogram database from the multiple to-be-trained spectrograms; and training on the spectrogram database to obtain a gesture recognition model.
PCT/CN2018/116221 2018-11-19 2018-11-19 Method and apparatus for generating a gesture recognition model, storage medium, and electronic device WO2020102943A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/116221 WO2020102943A1 (fr) 2018-11-19 2018-11-19 Method and apparatus for generating a gesture recognition model, storage medium, and electronic device
CN201880097776.6A CN112740219A (zh) 2018-11-19 2018-11-19 Method, apparatus, storage medium, and electronic device for generating a gesture recognition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/116221 WO2020102943A1 (fr) 2018-11-19 2018-11-19 Method and apparatus for generating a gesture recognition model, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2020102943A1 true WO2020102943A1 (fr) 2020-05-28

Family

ID=70774192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/116221 WO2020102943A1 (fr) 2018-11-19 2018-11-19 Method and apparatus for generating a gesture recognition model, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN112740219A (fr)
WO (1) WO2020102943A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949447A (zh) * 2021-02-25 2021-06-11 北京京东方技术开发有限公司 Gesture recognition method, system, apparatus, device, and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760825A (zh) * 2016-02-02 2016-07-13 深圳市广懋创新科技有限公司 Gesture recognition system and method based on a Chebyshev feedforward neural network
CN106203380A (zh) * 2016-07-20 2016-12-07 中国科学院计算技术研究所 Ultrasonic gesture recognition method and system
US9569006B2 * 2014-04-10 2017-02-14 Mediatek Inc. Ultrasound-based methods for touchless gesture recognition, and apparatuses using the same
CN107450724A (zh) * 2017-07-31 2017-12-08 武汉大学 Gesture recognition method and system based on the two-channel audio Doppler effect
CN107526437A (zh) * 2017-07-31 2017-12-29 武汉大学 Gesture recognition method based on audio Doppler feature quantization

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104898844B (zh) * 2015-01-23 2019-07-09 瑞声光电科技(常州)有限公司 Gesture recognition and control device and recognition and control method based on ultrasonic positioning
CN105807923A (zh) * 2016-03-07 2016-07-27 中国科学院计算技术研究所 Ultrasonic-based mid-air gesture recognition method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9569006B2 * 2014-04-10 2017-02-14 Mediatek Inc. Ultrasound-based methods for touchless gesture recognition, and apparatuses using the same
CN105760825A (zh) * 2016-02-02 2016-07-13 深圳市广懋创新科技有限公司 Gesture recognition system and method based on a Chebyshev feedforward neural network
CN106203380A (zh) * 2016-07-20 2016-12-07 中国科学院计算技术研究所 Ultrasonic gesture recognition method and system
CN107450724A (zh) * 2017-07-31 2017-12-08 武汉大学 Gesture recognition method and system based on the two-channel audio Doppler effect
CN107526437A (zh) * 2017-07-31 2017-12-29 武汉大学 Gesture recognition method based on audio Doppler feature quantization

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949447A (zh) * 2021-02-25 2021-06-11 北京京东方技术开发有限公司 Gesture recognition method, system, apparatus, device, and medium

Also Published As

Publication number Publication date
CN112740219A (zh) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2021036644A1 Voice-driven animation method and apparatus based on artificial intelligence
WO2019214361A1 Method for detecting a key term in a speech signal, device, terminal, and storage medium
WO2020006935A1 Method and device for extracting animal voiceprint features and computer-readable storage medium
WO2021135628A1 Speech signal processing method and speech separation method
US20140129207A1 Augmented Reality Language Translation
CN111124108B (zh) Model training method, gesture control method, apparatus, medium, and electronic device
US20130211826A1 Audio Signals as Buffered Streams of Audio Signals and Metadata
WO2021114847A1 Internet calling method and apparatus, computer device, and storage medium
CN110322760B (zh) Voice data generation method, apparatus, terminal, and storage medium
CN108922525B (zh) Voice processing method, apparatus, storage medium, and electronic device
CN110265011B (zh) Interaction method for an electronic device and electronic device
CN110364156A (zh) Voice interaction method, system, terminal, and readable storage medium
US20180054688A1 Personal Audio Lifestyle Analytics and Behavior Modification Feedback
CN109361995B (zh) Volume adjustment method and apparatus for an electrical appliance, electrical appliance, and medium
CN111986691B (zh) Audio processing method and apparatus, computer device, and storage medium
CN111863020B (zh) Speech signal processing method, apparatus, device, and storage medium
CN110390953B (zh) Method, apparatus, terminal, and storage medium for detecting a howling speech signal
WO2020173211A1 Method and apparatus for triggering special image effects and hardware device
WO2020020375A1 Voice processing method and apparatus, electronic device, and readable storage medium
WO2016206647A1 System for controlling a machine apparatus to generate an action
CN110176242A (zh) Timbre recognition method and apparatus, computer device, and storage medium
US20210082405A1 Method for Location Reminder and Electronic Device
CN113220590A (zh) Automated testing method, apparatus, device, and medium for a voice interaction application
WO2020102943A1 Method and apparatus for generating a gesture recognition model, storage medium, and electronic device
JP7400364B2 Speech recognition system and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18940833

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/01/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 18940833

Country of ref document: EP

Kind code of ref document: A1