CN115393676A - Gesture control optimization method and device, terminal and storage medium - Google Patents

Gesture control optimization method and device, terminal and storage medium

Info

Publication number
CN115393676A
Authority
CN
China
Prior art keywords
gesture
data
target
scene
terminal
Prior art date
Legal status
Pending
Application number
CN202110525205.1A
Other languages
Chinese (zh)
Inventor
朱海平
郭宏伟
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110525205.1A priority Critical patent/CN115393676A/en
Publication of CN115393676A publication Critical patent/CN115393676A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Telephone Function (AREA)

Abstract

The embodiment of the application provides a gesture control optimization method, a gesture control optimization device, a terminal and a storage medium, wherein the method comprises the following steps: acquiring original gesture data and target scene data of a user; and generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, wherein the target gesture scene fusion data is used for optimizing a gesture control model on a terminal side. The technical scheme provided by the embodiment of the application has the following advantages: 1) The gesture control model is directly trained and optimized on the terminal side, and user data does not need to be uploaded to the cloud, so that the privacy of a user can be better protected; 2) The target gesture scene fusion data are provided with labels, multiple potential gesture use scenes are formed, and a gesture control model can be optimized more accurately; 3) The target gesture scene fusion data is usually from the same user, and the gesture control model can be accurately optimized for the user.

Description

Gesture control optimization method, device, terminal and storage medium
Technical Field
The present application relates to the technical field of Artificial Intelligence (AI), and in particular, to a gesture control optimization method, apparatus, terminal and storage medium.
Background
Gesture control is a human-computer interaction technology. Compared with traditional mouse and keyboard input, gesture control does not require the user to hold a specific input device: the device can be controlled, or specific information can be input into it, through specific hand motions alone. Because of the convenience and appeal of contactless gestures, they are being widely used in the industry to control computer terminals, mobile terminals, television terminals, and the like.
While a user controls a device through gestures, the gesture control model needs to be optimized to improve the accuracy of gesture control and thereby the user experience. In the prior art, user data is generally collected on the terminal side and then uploaded to the cloud; the cloud optimizes the gesture control model according to the uploaded data and redeploys the optimized model to the terminal, thereby optimizing the gesture control model on the terminal side.
However, the user data usually contains user privacy information, and the above method needs to upload the user data to the cloud, so that there is a risk of revealing the user privacy.
Disclosure of Invention
In view of this, the present application provides a gesture control optimization method, apparatus, terminal and storage medium, to solve the prior-art problem that gesture control optimization requires uploading user data to the cloud, which carries a risk of leaking user privacy.
In a first aspect, an embodiment of the present application provides a gesture control optimization method, which is applied to a terminal, and the method includes: acquiring original gesture data and target scene data of a user, wherein the target scene data is used for representing background information associated with the original gesture data; generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, wherein the target gesture scene fusion data is used for optimizing a gesture control model; the target gesture scene fusion data comprises a gesture category label and a background category label, the gesture category label is matched with the target gesture key point data, and the background category label is matched with the target gesture scene fusion data.
Preferably, the generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data includes: and inputting the original gesture data, the target scene data and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.
Preferably, the generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data includes: inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.
Preferably, before generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, the method further includes: and calling a target gesture key point generation model to generate target gesture key point data.
Preferably, after generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, the method further includes: training a gesture control model through the target gesture scene fusion data, and optimizing the gesture control model, wherein the gesture control model is used for recognizing gesture control operation of a user.
Preferably, the acquiring raw gesture data of the user and target scene data includes: when a user executes gesture control operation, original gesture data of the user and target scene data are collected.
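For illustration only, the following is a minimal Python sketch of the terminal-side flow described in the first aspect. Every name in it (collect_gesture_and_scene, keypoint_gen, fusion_model, gesture_model, fine_tune) is a hypothetical placeholder, not an interface defined by this application.

```python
def terminal_side_optimization(camera, keypoint_gen, fusion_model, gesture_model):
    # 1. Acquire original gesture data and target scene data (the background
    #    information associated with the gesture) on the terminal.
    raw_gesture, target_scene = camera.collect_gesture_and_scene()

    # 2. Generate target gesture key point data locally.
    target_keypoints = keypoint_gen.generate()

    # 3. Produce target gesture scene fusion data; its gesture category label
    #    comes from the key points, its background category label from the scene.
    fused_image = fusion_model.generate(raw_gesture, target_scene, target_keypoints)
    labeled_sample = (fused_image, target_keypoints.category, target_scene.category)

    # 4. Optimize the gesture control model on the terminal with the labeled data;
    #    the user data never leaves the device.
    gesture_model.fine_tune([labeled_sample])
```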
In a second aspect, an embodiment of the present application provides a gesture control optimization apparatus, including: an acquisition module, configured to acquire original gesture data and target scene data of a user, where the target scene data is used for representing background information associated with the original gesture data; and a gesture data generation module, configured to generate target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, where the target gesture scene fusion data is used for optimizing a gesture control model; the target gesture scene fusion data comprises a gesture category label and a background category label, the gesture category label is matched with the target gesture key point data, and the background category label is matched with the target gesture scene fusion data.
Preferably, the gesture data generation module is specifically configured to: and inputting the original gesture data, the target scene data and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.
Preferably, the gesture data generating module is specifically configured to: inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.
Preferably, the apparatus further includes: a target gesture key point data generation module, configured to call a target gesture key point generation model to generate target gesture key point data.
Preferably, the apparatus further includes: a training module, configured to train the gesture control model with the target gesture scene fusion data and optimize the gesture control model, where the gesture control model is used for recognizing a gesture control operation of the user.
Preferably, the acquisition module is specifically configured to: when a user executes gesture control operation, original gesture data of the user and target scene data are collected.
In a third aspect, an embodiment of the present application provides a terminal, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the terminal, cause the terminal to perform the method of any of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, where the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method in any one of the first aspects.
The gesture control optimization scheme provided by the embodiment of the application has the following advantages:
1) The gesture control model is directly trained and optimized on the terminal side, and user data does not need to be uploaded to the cloud, so that the privacy of a user can be better protected;
2) The size and the shape of the gesture are guided through the target gesture key point data, the background of the gesture is replaced through the target scene data, target gesture scene fusion data with rich background and different categories are generated, the target gesture scene fusion data are provided with labels, various potential gesture using scenes are formed, and the gesture control model can be optimized more accurately;
3) The target gesture scene fusion data is usually from the same user, and the gesture control model can be accurately optimized for the user.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. Evidently, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 3 is a schematic view of a gesture control scene according to an embodiment of the present disclosure;
fig. 4 is a schematic view of another gesture control scenario provided in the embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a gesture control optimization scheme according to the related art;
FIG. 6 is a schematic diagram of a gesture control optimization scheme provided in an embodiment of the present application;
fig. 7 is a schematic diagram of a data fusion scenario provided in an embodiment of the present application;
fig. 8 is a schematic flowchart of a gesture control optimization method according to an embodiment of the present disclosure;
fig. 9 is a schematic view of a feature fusion scene based on an integration model according to an embodiment of the present application;
fig. 10 is a schematic view of a feature fusion scene based on a cascading model according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of a gesture control optimization apparatus according to an embodiment of the present application.
Detailed Description
For better understanding of the technical solutions of the present application, the following detailed descriptions of the embodiments of the present application are provided with reference to the accompanying drawings.
It should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Referring to fig. 1, a schematic view of an application scenario provided in the embodiment of the present application is shown. In fig. 1, a mobile phone 100 is taken as an example to illustrate a terminal. It can be understood that the terminal according to the embodiment of the present application may be a tablet computer, a Personal Computer (PC), a Personal Digital Assistant (PDA), a smart watch, a netbook, a wearable electronic device, an Augmented Reality (AR) device, a Virtual Reality (VR) device, an in-vehicle device, a smart car, a smart speaker, a robot, smart glasses, a smart television, and the like, in addition to the mobile phone 100.
Referring to fig. 2, a schematic structural diagram of a terminal provided in an embodiment of the present application is shown. The terminal 200 may be the server apparatus 101 in fig. 1 or the terminal apparatus 102 in fig. 1.
The terminal 200 may include a processor 210, an external memory interface 220, an internal memory 221, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 270A, a receiver 270B, a microphone 270C, an earphone interface 270D, a sensor module 280, a key 290, a motor 291, an indicator 292, a camera 293, a display screen 294, and a Subscriber Identity Module (SIM) card interface 295 and the like. The sensor module 280 may include a pressure sensor 280A, a gyroscope sensor 280B, an air pressure sensor 280C, a magnetic sensor 280D, an acceleration sensor 280E, a distance sensor 280F, a proximity light sensor 280G, a fingerprint sensor 280H, a temperature sensor 280J, a touch sensor 280K, an ambient light sensor 280L, a bone conduction sensor 280M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the terminal 200. In other embodiments of the present application, terminal 200 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, such as: the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. Wherein, the different processing units may be independent devices or may be integrated in one or more processors.
The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 210 for storing instructions and data. In some embodiments, the memory in processor 210 is a cache memory. The memory may hold instructions or data that have just been used or recycled by processor 210. If the processor 210 needs to use the instruction or data again, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 210, thereby increasing the efficiency of the system.
In some embodiments, processor 210 may include one or more interfaces. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bidirectional synchronous serial bus comprising a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 210 may include multiple sets of I2C buses. The processor 210 may be coupled to the touch sensor 280K, the charger, the flash, the camera 293, and the like through different I2C bus interfaces. For example: the processor 210 may be coupled to the touch sensor 280K through an I2C interface, so that the processor 210 and the touch sensor 280K communicate through an I2C bus interface to implement the touch function of the terminal 200.
The I2S interface may be used for audio communication. In some embodiments, processor 210 may include multiple sets of I2S buses. Processor 210 may be coupled to audio module 270 via an I2S bus, enabling communication between processor 210 and audio module 270. In some embodiments, the audio module 270 may transmit an audio signal to the wireless communication module 260 through an I2S interface, so as to implement a function of answering a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, audio module 270 and wireless communication module 260 may be coupled by a PCM bus interface. In some embodiments, the audio module 270 may also transmit audio signals to the wireless communication module 260 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 210 and the wireless communication module 260. For example: the processor 210 communicates with the bluetooth module in the wireless communication module 260 through the UART interface to implement the bluetooth function. In some embodiments, the audio module 270 may transmit the audio signal to the wireless communication module 260 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
The MIPI interface may be used to connect the processor 210 with peripheral devices such as the display screen 294, the camera 293, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 210 and camera 293 communicate via a CSI interface to implement the capture functionality of terminal 200. The processor 210 and the display screen 294 communicate through the DSI interface to implement a display function of the terminal 200.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect processor 210 with camera 293, display 294, wireless communication module 260, audio module 270, sensor module 280, and the like. The GPIO interface may also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, and the like.
The USB interface 230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 230 may be used to connect a charger to charge the terminal 200, and may also be used to transmit data between the terminal 200 and peripheral devices. And the method can also be used for connecting a headset and playing audio through the headset. The interface may also be used to connect other terminals, such as AR devices, etc.
It should be understood that the connection relationship between the modules according to the embodiment of the present invention is only illustrative, and is not limited to the structure of the terminal 200. In other embodiments of the present application, the terminal 200 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The charge management module 240 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 240 may receive charging input from a wired charger via the USB interface 230. In some wireless charging embodiments, the charging management module 240 may receive a wireless charging input through a wireless charging coil of the terminal 200. The charging management module 240 may also supply power to the terminal through the power management module 241 while charging the battery 242.
The power management module 241 is used to connect the battery 242, the charging management module 240 and the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240, and provides power to the processor 210, the internal memory 221, the display 294, the camera 293, and the wireless communication module 260. The power management module 241 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 241 may also be disposed in the processor 210. In other embodiments, the power management module 241 and the charging management module 240 may be disposed in the same device.
The wireless communication function of the terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal 200 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication and the like applied on the terminal 200. The mobile communication module 250 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 250 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 250 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 250 may be disposed in the processor 210. In some embodiments, at least some of the functional modules of the mobile communication module 250 may be disposed in the same device as at least some of the modules of the processor 210.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 270A, the receiver 270B, etc.) or displays an image or video through the display screen 294. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be separate from the processor 210, and may be disposed in the same device as the mobile communication module 250 or other functional modules.
The wireless communication module 260 may provide a solution for wireless communication applied to the terminal 200, including wireless local area networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 260 may be one or more devices integrating at least one communication processing module. The wireless communication module 260 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on the electromagnetic wave signals, and transmits the processed signals to the processor 210. The wireless communication module 260 may also receive a signal to be transmitted from the processor 210, frequency-modulate and amplify the signal, and convert the signal into electromagnetic waves via the antenna 2 for radiation.
In some embodiments, antenna 1 of terminal 200 is coupled to mobile communication module 250 and antenna 2 is coupled to wireless communication module 260, such that terminal 200 may communicate with networks and other devices via wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The terminal 200 implements display functions through the GPU, the display screen 294, and the application processor, etc. The GPU is a microprocessor for image processing, coupled to a display screen 294 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 294 is used to display images, video, and the like. The display screen 294 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 200 may include 1 or N display screens 294, N being a positive integer greater than 1.
The terminal 200 may implement a photographing function through the ISP, the camera 293, the video codec, the GPU, the display screen 294, and the application processor.
The ISP is used to process the data fed back by the camera 293. For example, when a user takes a picture, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, an optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and converting into an image visible to the naked eye. The ISP can also carry out algorithm optimization on noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 293.
The camera 293 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV and other formats. In some embodiments, terminal 200 may include 1 or N cameras 293, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the terminal 200 selects a frequency point, the digital signal processor is used to perform fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The terminal 200 may support one or more video codecs. In this way, the terminal 200 can play or record video in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal 200, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 220 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 200. The external memory card communicates with the processor 210 through the external memory interface 220 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 200, and the like. In addition, the internal memory 221 may include a high speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a Universal Flash Storage (UFS), and the like. The processor 210 executes various functional applications of the terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The terminal 200 may implement an audio function through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the earphone interface 270D, and the application processor. Such as music playing, recording, etc.
Audio module 270 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. Audio module 270 may also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be disposed in the processor 210, or some functional modules of the audio module 270 may be disposed in the processor 210.
The speaker 270A, also called a "horn", is used to convert an audio electrical signal into an acoustic signal. The terminal 200 can listen to music through the speaker 270A or listen to a handsfree call.
The receiver 270B, also called "earpiece", is used to convert the electrical audio signal into a sound signal. When the terminal 200 receives a call or voice information, it is possible to receive voice by bringing the receiver 270B close to the human ear.
The microphone 270C, also referred to as a "microphone," is used to convert acoustic signals into electrical signals. When making a call or sending voice information, the user can input a voice signal into the microphone 270C by uttering a voice signal near the microphone 270C through the mouth of the user. The terminal 200 may be provided with at least one microphone 270C. In other embodiments, the terminal 200 may be provided with two microphones 270C to achieve a noise reduction function in addition to collecting sound signals. In other embodiments, the terminal 200 may further include three, four or more microphones 270C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, and the like.
The headphone interface 270D is used to connect wired headphones. The headset interface 270D may be the USB interface 230, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 280A is used to sense a pressure signal, which can be converted into an electrical signal. In some embodiments, the pressure sensor 280A may be disposed on the display screen 294. The pressure sensor 280A can be of a wide variety, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like. A capacitive pressure sensor may comprise at least two parallel plates of electrically conductive material. When a force acts on the pressure sensor 280A, the capacitance between the electrodes changes, and the terminal 200 determines the intensity of the pressure according to the change in capacitance. When a touch operation is applied to the display screen 294, the terminal 200 detects the intensity of the touch operation based on the pressure sensor 280A, and may also calculate the touched position based on the detection signal of the pressure sensor 280A. In some embodiments, touch operations applied to the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is smaller than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
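As a minimal illustration of the threshold logic in the short message example above (the threshold value and the instruction names are hypothetical):

```python
FIRST_PRESSURE_THRESHOLD = 0.5  # hypothetical normalized pressure value

def handle_touch_on_sms_icon(touch_intensity: float) -> str:
    # Same touch position, different touch intensity -> different instruction.
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_short_message"    # lighter press: view the short message
    return "new_short_message"         # firmer press: create a new short message
```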
The gyro sensor 280B may be used to determine the motion attitude of the terminal 200. In some embodiments, the angular velocity of terminal 200 about three axes (i.e., x, y, and z axes) may be determined by gyroscope sensor 280B. The gyro sensor 280B may be used for photographing anti-shake. Illustratively, when the shutter is pressed, the gyro sensor 280B detects the shake angle of the terminal 200, calculates the distance to be compensated for the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal 200 by a reverse movement, thereby achieving anti-shake. The gyro sensor 280B may also be used for navigation, somatosensory gaming scenes.
The air pressure sensor 280C is used to measure air pressure. In some embodiments, the terminal 200 calculates altitude, aiding positioning and navigation, from the barometric pressure value measured by the barometric pressure sensor 280C.
The magnetic sensor 280D includes a Hall sensor. The terminal 200 may detect the opening and closing of a flip holster using the magnetic sensor 280D. In some embodiments, when the terminal 200 is a flip phone, the terminal 200 may detect the opening and closing of the flip cover according to the magnetic sensor 280D, and then set features such as automatic unlocking of the flip cover according to the detected opening or closing state of the holster or flip cover.
The acceleration sensor 280E may detect the magnitude of acceleration of the terminal 200 in various directions (typically three axes). The magnitude and direction of gravity can be detected when the terminal 200 is stationary. The method can also be used for identifying the terminal posture, and is applied to transverse and vertical screen switching, pedometers and other applications.
The distance sensor 280F is used for measuring distance. The terminal 200 may measure distance by infrared or laser. In some shooting scenarios, the terminal 200 may use the distance sensor 280F to measure distance for fast focusing.
The proximity light sensor 280G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The terminal 200 emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from a nearby object. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 200; when insufficient reflected light is detected, the terminal 200 may determine that there is no object nearby. The terminal 200 can use the proximity light sensor 280G to detect that the user is holding the terminal 200 close to the ear during a call, so as to automatically turn off the screen and save power. The proximity light sensor 280G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 280L is used to sense the ambient light level. The terminal 200 may adaptively adjust the brightness of the display 294 according to the perceived ambient light level. The ambient light sensor 280L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 280L may also cooperate with the proximity light sensor 280G to detect whether the terminal 200 is in a pocket to prevent inadvertent contact.
The fingerprint sensor 280H is used to collect a fingerprint. The terminal 200 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 280J is used to detect temperature. In some embodiments, the terminal 200 implements a temperature processing strategy using the temperature detected by the temperature sensor 280J. For example, when the temperature reported by the temperature sensor 280J exceeds a threshold, the terminal 200 reduces the performance of a processor located near the temperature sensor 280J, so as to reduce power consumption and implement thermal protection. In other embodiments, the terminal 200 heats the battery 242 when the temperature is below another threshold, to avoid an abnormal shutdown caused by low temperature. In still other embodiments, the terminal 200 boosts the output voltage of the battery 242 when the temperature is below a further threshold, also to avoid an abnormal shutdown due to low temperature.
The touch sensor 280K is also referred to as a "touch device". The touch sensor 280K may be disposed on the display screen 294, and the touch sensor 280K and the display screen 294 form a touch screen, which is also called a "touch screen". The touch sensor 280K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display screen 294. In other embodiments, the touch sensor 280K can be disposed on a surface of the terminal 200 at a different location than the display screen 294.
The bone conduction sensor 280M may acquire a vibration signal. In some embodiments, the bone conduction sensor 280M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 280M may also be in contact with the pulse of the human body to receive the blood pressure pulsation signal. In some embodiments, bone conduction sensor 280M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 270 may analyze a voice signal based on the vibration signal of the bone mass vibrated by the sound part acquired by the bone conduction sensor 280M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 280M, so as to realize a heart rate detection function.
The keys 290 include a power-on key, a volume key, and the like. The keys 290 may be mechanical keys. Or may be touch keys. The terminal 200 may receive a key input, and generate a key signal input related to user setting and function control of the terminal 200.
The motor 291 may generate a vibration cue. The motor 291 can be used for both incoming call vibration prompting and touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 291 may also respond to different vibration feedback effects for touch operations on different areas of the display 294. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 292 may be an indicator light that may be used to indicate a state of charge, a change in charge, or may be used to indicate a message, missed call, notification, etc.
The SIM card interface 295 is used to connect a SIM card. The SIM card can be connected to and disconnected from the terminal 200 by being inserted into the SIM card interface 295 or being extracted from the SIM card interface 295. The terminal 200 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 295 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. Multiple cards can be inserted into the same SIM card interface 295 at the same time. The types of the plurality of cards can be the same or different. The SIM card interface 295 may also be compatible with different types of SIM cards. The SIM card interface 295 may also be compatible with external memory cards. The terminal 200 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the terminal 200 employs eSIM, namely: an embedded SIM card. The eSIM card may be embedded in the terminal 200 and cannot be separated from the terminal 200.
With the development of computer vision and the increase of end-to-end computing power, gesture control has gradually become a way for users to interact with terminals.
Referring to fig. 3, a schematic view of a gesture control scene provided in the embodiment of the present application is shown. Fig. 3 shows a television 301 and a user 302. The user 302 can input a corresponding control instruction to the television 301 through an operation of "extending both arms", so that the television 301 performs a corresponding action, such as powering on or zooming in on the displayed content.
Referring to fig. 4, a schematic view of another gesture control scenario provided in the embodiment of the present application is shown. Fig. 4 shows a mobile phone 401 and a palm 402 of a user. In the current state, the album on the mobile phone 401 is open and a plurality of images are shown in the display interface. The user can input a corresponding control instruction to the mobile phone 401 through the operation of "swinging the palm down", so that the mobile phone 401 executes a corresponding action, for example sliding the images in the display interface downward to switch the displayed images.
As the above gesture control scenarios show, gesture control allows a user to control the terminal, or input specific information into it, through specific hand motions alone, without holding a dedicated input device. Specifically, the controlled terminal generally includes an image acquisition module and a gesture control model: the image acquisition module collects gesture data, and the gesture control model recognizes the user's hand action from the gesture data and then generates a corresponding control instruction. The image acquisition module may be a camera, and the gesture control model may be a neural network model; neither is specifically limited in the embodiments of the present application.
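A minimal sketch of this acquire-recognize-execute loop follows; the camera interface, the model's predict method, and the gesture-to-command mapping are assumptions made for illustration, not APIs defined by this application.

```python
# Hypothetical terminal-side recognition loop: the image acquisition module
# (camera) feeds frames to the gesture control model, and the predicted
# gesture category is mapped to a control instruction.
GESTURE_TO_COMMAND = {
    "arms_spread": "power_on",               # e.g., the television scenario of fig. 3
    "palm_swipe_down": "slide_images_down",  # e.g., the album scenario of fig. 4
}

def recognition_loop(camera, gesture_model, execute_command):
    while True:
        frame = camera.capture()                 # image acquisition module
        gesture = gesture_model.predict(frame)   # gesture control model
        command = GESTURE_TO_COMMAND.get(gesture)
        if command is not None:
            execute_command(command)             # corresponding control instruction
```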
In an actual application scenario, in order to improve user experience, the gesture control model needs to be optimized while the user controls the terminal through gestures, so as to improve the accuracy of gesture control. In other words, the gesture control function should become "better the more it is used".
Referring to fig. 5, a schematic diagram of a gesture control optimization scheme in the related art is shown. In fig. 5, a cloud 501 and a terminal 502 are shown, and it is understood that the cloud 501 and the terminal 502 are connected in communication for information transmission. In some possible embodiments, the cloud 501 may also be referred to as a server.
In this scheme, the cloud 501 initially obtains the gesture control model by training on a public data set. When a terminal 502 needs to use the gesture control function, the cloud 501 deploys the gesture control model to the terminal 502. During use, the terminal 502 collects gesture data of the user, and the gesture control model performs model prediction according to the gesture data, thereby realizing the corresponding gesture control. Meanwhile, the terminal 502 stores the gesture data (as part of the user data).
When the gesture control model needs to be optimized, the terminal 502 uploads the stored gesture data to the cloud 501, and the cloud 501 performs gesture control model training, gesture control model evaluation and gesture control model optimization based on the gesture data uploaded by the user. After the cloud 501 completes the optimization of the gesture control model, the optimized gesture control model is redeployed to the terminal 502, so that the gesture control model on the terminal 502 side is optimized. That is to say, the gesture control model is optimized on the cloud 501 side, and then the optimized gesture control model is deployed on the terminal 502 side.
However, the gesture control optimization method mainly has the following problems:
1) The gesture data usually comprises user privacy, and the method needs to upload the user data to a cloud end, so that the risk of user privacy leakage exists;
2) Gesture data acquired by the terminal does not contain a label, and after the gesture data are uploaded to the cloud, the category of the gesture data needs to be manually marked, so that the cost is high;
3) Data of the cloud for optimizing the gesture control model generally come from a plurality of users, and the gesture control model cannot be optimized for specific users.
In order to solve the above problem, an embodiment of the present application provides a gesture control optimization method, where target gesture scene fusion data with a label and a background is generated at a terminal side, and training and upgrading of a gesture control model are completed at the terminal side through the target gesture scene fusion data.
Fig. 6 is a schematic diagram of a gesture control optimization scheme provided in an embodiment of the present application. In this embodiment, to distinguish the gesture data collected by the terminal 602 from the gesture data generated by the gesture data generation model, the gesture data collected by the terminal 602 is referred to as "original gesture data", the gesture data generated by the gesture data generation model is referred to as "target gesture data", and the target gesture data fused with the target scene data is referred to as "target gesture scene fusion data". The details are described below.
When the terminal 602 uses the gesture control function for the first time, the cloud 601 deploys a gesture control model to the terminal 602. During use, the terminal 602 may, from time to time, collect the original gesture data and the target scene data through the camera, where the target scene data is used to represent background information associated with the original gesture data. After the original gesture data and the target scene data are collected, they are stored in the user data for subsequent use. In addition, the terminal 602 invokes the target gesture key point generation model to generate a large amount of target gesture key point data. The original gesture data, the target scene data and the target gesture key point data are then input into a gesture data generation model to obtain target gesture scene fusion data.
It can be understood that, after the original gesture data is fused with the target scene data and the target gesture key point data, a large amount of target gesture scene fusion data is obtained, and the gesture control model is trained directly on the terminal 602 side based on this fused data, thereby optimizing the gesture control model.
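A minimal sketch of this terminal-side training step is shown below, assuming a PyTorch-style classifier and a dataset that yields (fused image, gesture label) pairs; the loss function, optimizer and hyperparameters are assumptions and are not specified by this application.

```python
import torch
import torch.nn.functional as F

def optimize_gesture_model(gesture_model, fused_dataset, epochs=3, lr=1e-4):
    # Fine-tune the deployed gesture control model directly on the terminal
    # using the labeled target gesture scene fusion data; nothing is uploaded.
    optimizer = torch.optim.Adam(gesture_model.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(fused_dataset, batch_size=16, shuffle=True)
    gesture_model.train()
    for _ in range(epochs):
        for images, gesture_labels in loader:
            logits = gesture_model(images)
            loss = F.cross_entropy(logits, gesture_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return gesture_model
```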
Fig. 7 is a schematic diagram of a data fusion scene provided in the embodiment of the present application. Fig. 7 shows the original gesture data, the target scene data, the target gesture key point data, and the target gesture scene fusion data after feature fusion.
The original gesture data is an image of a "clenched fist" collected by the terminal; the target gesture key point data consists of the key points of an open palm generated by the target gesture key point generation model; and the target scene data is an image of "the face of the user". Feature fusion of the original gesture data, the target scene data and the target gesture key point data yields target gesture scene fusion data whose background is the face of the user and whose gesture is the open palm.
The target gesture key point data is used for guiding the size and the shape of the gesture in the target gesture scene fusion data, so that the target gesture key point data can represent the gesture category of the target gesture scene fusion data; the target scene data is used to guide the background of the gestures in the target gesture scene fusion data, and thus the target scene data may characterize the background category of the target gesture scene fusion data. In other words, the target gesture scene fusion data generated through feature fusion includes a gesture category tag and a background category tag, wherein the target gesture key point data is used for marking the gesture category tag, and the target scene data is used for marking the background category tag. In addition, the raw gesture data is used to provide information to other aspects when the features are fused, such as the user's skin tone, etc.
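One possible in-memory representation of such a labeled fused sample is sketched below; the field names are illustrative and are not defined by this application.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FusedGestureSample:
    image: np.ndarray        # target gesture scene fusion data (H x W x 3)
    gesture_label: int       # category implied by the target gesture key point data
    background_label: int    # category implied by the target scene data
    # The original gesture data contributes appearance cues such as skin tone,
    # so both labels come "for free" and no manual annotation step is needed.
```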
The gesture control optimization scheme provided by the embodiment of the application has the following advantages:
1) The gesture control model is directly trained and optimized on the terminal side, and user data does not need to be uploaded to the cloud, so that the privacy of a user can be better protected;
2) The size and the shape of the gesture are guided through the target gesture key point data, the background of the gesture is replaced through the target scene data, target gesture scene fusion data with rich background and different categories are generated, the target gesture scene fusion data are provided with labels, various potential gesture using scenes are formed, and the gesture control model can be optimized more accurately;
3) The target gesture scene fusion data is usually from the same user, and the gesture control model can be accurately optimized for the user.
Referring to fig. 8, a schematic flow chart of a gesture control optimization method provided in the embodiment of the present application is shown. As shown in fig. 8, it mainly includes the following steps.
Step S801: raw gesture data and target scene data of a user are collected, the target scene data being used to characterize context information associated with the raw gesture data.
In the embodiment of the application, to distinguish the gesture data collected by the terminal from the gesture data generated by the gesture data generation model, the gesture data collected by the terminal is referred to as "original gesture data", the gesture data generated by the gesture data generation model is referred to as "target gesture data", and the target gesture data fused with the target scene data is referred to as "target gesture scene fusion data".
It should be noted that the terminal may collect the original gesture data and the target scene data when the user performs the gesture control operation, or may collect the original gesture data and the target scene data at other time periods according to a preset data collection rule. In addition, the original gesture data and the target scene data may be collected separately or simultaneously, which is not limited in this embodiment of the application.
It can be understood that the usage scenes of the same terminal usually contain a lot of similar information, and using the target scene data as the background of the gesture can improve the accuracy of gesture recognition. For example, the same terminal usually corresponds to one user, so a facial image of that user can be collected as target scene data; alternatively, if the user often uses the terminal while sitting on a sofa in the living room, the wall behind the sofa can be collected as target scene data.
Step S802: and generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data.
In an optional embodiment, the terminal may invoke a target gesture key point generation model to generate the target gesture key point data. The original gesture data, the target scene data and the target gesture key point data are then input into the gesture data generation model for feature fusion, generating the target gesture scene fusion data. That is, the target gesture scene fusion data simultaneously fuses information from the original gesture data, the target scene data and the target gesture key point data. The target gesture key point data guides the size and shape of the gesture in the target gesture scene fusion data; the target scene data guides the background of the gesture; and the original gesture data supplies the remaining information, such as the user's skin tone, during feature fusion.
In the embodiment of the application, the original gesture data is expanded by combining it with the target gesture key point data and the target scene data, so a large amount of target gesture scene fusion data can be generated and the terminal has sufficient data for gesture control model training. In addition, the target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the gesture background, producing target gesture scene fusion data with rich backgrounds and different categories; because the fusion data carries labels and covers a variety of potential gesture usage scenarios, the gesture control model can be optimized more accurately.
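As a rough illustration of this expansion, the sketch below shows how a few captured images could be combined with generated key points and collected scenes to yield many labeled samples on the terminal; `keypoint_model`, `fusion_model` and the toy arrays are hypothetical stand-ins, not components defined by the patent.

```python
# Hypothetical augmentation loop: a small number of captured images are expanded
# into many labeled training samples by varying gesture class and scene.
import itertools
import numpy as np

def expand_training_set(raw_gestures, scenes, gesture_classes, keypoint_model, fusion_model):
    """Generate one fused, labeled sample per (raw gesture, scene, gesture class) triple."""
    samples = []
    for raw, (scene, bg_label), g_label in itertools.product(
            raw_gestures, scenes, gesture_classes):
        keypoints = keypoint_model(g_label)          # target gesture key point data
        fused = fusion_model(raw, scene, keypoints)  # target gesture scene fusion data
        samples.append((fused, g_label, bg_label))   # labels come from the inputs
    return samples

# Toy stand-ins so the sketch runs end to end.
raw_gestures = [np.zeros((64, 64, 3))]
scenes = [(np.ones((64, 64, 3)), 0), (np.full((64, 64, 3), 2.0), 1)]
gesture_classes = [0, 1, 2]
dataset = expand_training_set(raw_gestures, scenes, gesture_classes,
                              keypoint_model=lambda c: np.zeros((21, 2)),
                              fusion_model=lambda r, s, k: (r + s) / 2)
print(len(dataset))  # 1 raw image x 2 scenes x 3 gesture classes = 6 samples
```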
In an alternative embodiment, the gesture data generation model may be a Generative Adversarial Network (GAN) model. In a specific implementation, the gesture data generation model may be further divided into an integration model and a cascade model, which are described below.
Referring to fig. 9, a feature fusion scene diagram based on an integration model is provided in the embodiment of the present application. The integration model includes a gesture data generation model, i.e., a first gesture data generation model. The first gesture data generation model includes a first generator and a first discriminator.
The original gesture data, the target scene data and the target gesture key point data are input into the first generator, which generates a new gesture image, namely the target gesture scene fusion data, through convolution and deconvolution operations. As can be seen, the gesture in the new gesture image corresponds to the gesture in the target gesture key point data, and the background corresponds to the target scene data. That is, through feature fusion, the generator replaces both the category and the background of the original gesture.
Further, the first discriminator judges the authenticity, the gesture category and the scene category of the target gesture scene fusion data. Through the adversarial game between the first generator and the first discriminator, target gesture scene fusion data with a gesture category label and a scene category label is obtained.
In a specific implementation, the first gesture data generation model may be a GAN model. The embodiments of the present application do not specifically limit this.
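Purely as an illustration of such an integration-model GAN, the following PyTorch-style sketch combines the three inputs in a single generator and uses an auxiliary-classifier-style discriminator that outputs realism, gesture category and scene category; all layer sizes, channel counts and class counts are assumptions, not the patent's architecture.

```python
# Minimal sketch of the integration (single-GAN) variant with assumed shapes.
import torch
import torch.nn as nn

class FirstGenerator(nn.Module):
    """Fuses raw gesture, target scene and key-point heatmap via conv/deconv."""
    def __init__(self, in_ch=3 + 3 + 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(),          # downsample
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # upsample
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, raw, scene, keypoint_heatmap):
        x = torch.cat([raw, scene, keypoint_heatmap], dim=1)
        return self.net(x)  # target gesture scene fusion image

class FirstDiscriminator(nn.Module):
    """Judges realism plus gesture class plus scene class (auxiliary-classifier style)."""
    def __init__(self, n_gestures=5, n_scenes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.real_fake = nn.Linear(128, 1)
        self.gesture_head = nn.Linear(128, n_gestures)
        self.scene_head = nn.Linear(128, n_scenes)

    def forward(self, img):
        h = self.features(img)
        return self.real_fake(h), self.gesture_head(h), self.scene_head(h)

# One forward pass with dummy 64x64 inputs.
g, d = FirstGenerator(), FirstDiscriminator()
fused = g(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
realness, gesture_logits, scene_logits = d(fused)
```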
Referring to fig. 10, a schematic diagram of a feature fusion scene based on a cascade model is provided in the embodiment of the present application. The cascade model comprises two gesture data generation models, namely a second gesture data generation model and a third gesture data generation model. The second gesture data generation model comprises a second generator and a second discriminator; the third gesture data generation model includes a third generator and a third discriminator.
First, the original gesture data and the target gesture key point data are input into the second generator, which generates the target gesture data, a new gesture image (without background information at this stage), through convolution and deconvolution operations. As can be seen, the gesture in the new gesture image corresponds to the gesture in the target gesture key point data; that is, through this feature fusion, the generator replaces the category of the original gesture. Further, the second discriminator judges the authenticity and the gesture category of the target gesture data. Through the adversarial game between the second generator and the second discriminator, target gesture data with a gesture category label is obtained.
Second, the target gesture data obtained in the previous step and the target scene data are input into the third generator for another round of feature fusion, and the target gesture scene fusion data is generated through convolution and deconvolution operations. As can be seen from the figure, the target gesture scene fusion data adds the corresponding background information from the target scene data; that is, through this feature fusion, the third generator replaces the background of the original gesture. Further, the third discriminator judges the authenticity and the scene category of the target gesture scene fusion data. Through the adversarial game between the third generator and the third discriminator, target gesture scene fusion data with a gesture category label and a scene category label is obtained.
It should be noted that generating the target gesture scene fusion data through the cascade model is simple to train and easy to implement; however, because this scheme uses a two-stage gesture data generation model, errors tend to accumulate across the stages. Those skilled in the art can choose the integration model or the cascade model to generate the target gesture scene fusion data according to actual needs.
In a specific implementation, the second gesture data generation model and/or the third gesture data generation model may be a GAN model. This is not particularly limited by the examples of the present application.
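For comparison, a similarly hedged sketch of the cascade variant is given below: the second generator fuses the original gesture with the key points, and the third generator then fuses the result with the target scene; the tiny conv/deconv blocks and all shapes are illustrative assumptions only.

```python
# Rough sketch of the two-stage cascade variant; discriminators omitted for brevity.
import torch
import torch.nn as nn

def conv_deconv(in_ch, out_ch=3):
    """Tiny conv/deconv stack standing in for each stage's generator."""
    return nn.Sequential(
        nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, out_ch, 4, stride=2, padding=1), nn.Tanh(),
    )

second_generator = conv_deconv(in_ch=3 + 1)   # raw gesture + key-point heatmap
third_generator = conv_deconv(in_ch=3 + 3)    # target gesture + target scene

raw = torch.randn(1, 3, 64, 64)
keypoints = torch.randn(1, 1, 64, 64)
scene = torch.randn(1, 3, 64, 64)

# Stage 1: replace the gesture category (no background information yet).
target_gesture = second_generator(torch.cat([raw, keypoints], dim=1))
# Stage 2: add the background from the target scene data.
fused = third_generator(torch.cat([target_gesture, scene], dim=1))
```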
Step S803: training a gesture control model through the target gesture scene fusion data, and optimizing the gesture control model, wherein the gesture control model is used for recognizing gesture control operation of a user.
Specifically, after the target gesture scene fusion data is obtained, the gesture control model can be trained on the terminal side based on the target gesture scene fusion data so as to optimize it; through this self-learning on the terminal, the user's experience with the terminal keeps improving.
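The on-device optimization step could, under these assumptions, look like the following sketch, which fine-tunes a placeholder gesture classifier on the generated, already-labeled fusion data; the model, optimizer settings and epoch count are not specified by the patent and are chosen for illustration.

```python
# Hedged sketch of terminal-side fine-tuning on generated, labeled fusion data.
import torch
import torch.nn as nn

gesture_model = nn.Sequential(            # stand-in for the terminal's gesture control model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 5),                     # 5 assumed gesture classes
)
optimizer = torch.optim.SGD(gesture_model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# The batch would come from the data generation step; here a dummy labeled batch.
fused_images = torch.randn(8, 3, 64, 64)
gesture_labels = torch.randint(0, 5, (8,))

for _ in range(3):                        # a few local epochs, all on the terminal
    optimizer.zero_grad()
    loss = loss_fn(gesture_model(fused_images), gesture_labels)
    loss.backward()
    optimizer.step()
```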
The gesture control optimization scheme provided by the embodiment of the application has the following advantages:
1) The gesture control model is directly trained and optimized on the terminal side, and user data does not need to be uploaded to the cloud, so that the privacy of a user can be better protected;
2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the gesture background, generating target gesture scene fusion data with rich backgrounds and different categories; because the fusion data carries labels and covers a variety of potential gesture usage scenarios, the gesture control model can be optimized more accurately;
3) The target gesture scene fusion data usually comes from the same user, so the gesture control model can be optimized precisely for that user.
Corresponding to the method embodiment, the embodiment of the application also provides a gesture control optimization device.
Referring to fig. 11, a schematic structural diagram of a gesture control optimization apparatus provided in the embodiment of the present application is shown. As shown in fig. 11, the gesture control optimization apparatus includes an acquisition module 1101 and a gesture data generation module 1102.
Specifically, the collection module 1101 is configured to collect original gesture data of a user and target scene data, where the target scene data is used to represent background information associated with the original gesture data; a gesture data generating module 1102, configured to generate target gesture scene fusion data according to the original gesture data, the target scene data, and target gesture key point data, where the target gesture scene fusion data is used to optimize a gesture control model; the target gesture key point data is used for representing the gesture category of the target gesture scene fusion data, and the target scene data is used for representing the background category of the target gesture scene fusion data.
In a specific implementation, the acquisition module 1101 may be a camera or other type of sensor on the terminal, which is not specifically limited in this embodiment of the application.
In an optional embodiment, the gesture data generating module 1102 is specifically configured to: and inputting the original gesture data, the target scene data and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.
In an optional embodiment, the gesture data generating module 1102 is specifically configured to: inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data; and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.
In an optional embodiment, the gesture control optimization apparatus further comprises: and the target gesture key point data generating module is used for calling a target gesture key point generating model and generating target gesture key point data.
In an optional embodiment, the gesture control optimization apparatus further comprises: and the training module is used for training the gesture control model through the target gesture scene fusion data and optimizing the gesture control model, wherein the gesture control model is used for identifying gesture control operation of a user.
In an optional embodiment, the acquisition module 1101 is specifically configured to: when a user executes gesture control operation, original gesture data of the user and target scene data are collected.
The gesture control optimization scheme provided by the embodiment of the application has the following advantages:
1) The gesture control model is directly trained and optimized on the terminal side, and user data does not need to be uploaded to the cloud, so that the privacy of a user can be better protected;
2) The target gesture key point data guides the size and shape of the gesture, and the target scene data replaces the gesture background, generating target gesture scene fusion data with rich backgrounds and different categories; because the fusion data carries labels and covers a variety of potential gesture usage scenarios, the gesture control model can be optimized more accurately;
3) The target gesture scene fusion data usually comes from the same user, so the gesture control model can be optimized precisely for that user.
Specific contents of the implementation of the above apparatus may be described in the method embodiments, and for brevity, are not described herein again.
In a specific implementation, an embodiment of the present application further provides a terminal, where the terminal includes one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the terminal, cause the terminal to perform some or all of the steps of the embodiments described above.
In a specific implementation manner, the present application further provides a computer storage medium that can store a program; when the program runs, it controls the device on which the computer-readable storage medium resides to perform some or all of the steps in the foregoing embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
In a specific implementation, an embodiment of the present application further provides a computer program product, where the computer program product includes executable instructions, and when the executable instructions are executed on a computer, the computer is caused to perform some or all of the steps in the foregoing method embodiments.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, A and/or B may mean that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of singular or plural items. For example, at least one of a, b and c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
Those of ordinary skill in the art will appreciate that the various elements and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of the two. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In several embodiments provided by the present invention, any function, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the protection scope of the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A gesture control optimization method is applied to a terminal and comprises the following steps:
acquiring original gesture data and target scene data of a user, wherein the target scene data is used for representing background information associated with the original gesture data;
generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, wherein the target gesture scene fusion data is used for optimizing a gesture control model;
the target gesture scene fusion data comprises a gesture category label and a background category label, the gesture category label is matched with the target gesture key point data, and the background category label is matched with the target scene data.
2. The method of claim 1, wherein generating target gesture scene fusion data from the original gesture data, the target scene data, and target gesture keypoint data comprises:
and inputting the original gesture data, the target scene data and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.
3. The method of claim 1, wherein generating target gesture scene fusion data from the original gesture data, the target scene data, and target gesture keypoint data comprises:
inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data;
and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.
4. The method of claim 1, further comprising, prior to said generating target gesture scene fusion data from said original gesture data, said target scene data, and target gesture keypoint data:
and calling a target gesture key point generation model to generate target gesture key point data.
5. The method of claim 1, further comprising, after said generating target gesture scene fusion data from said original gesture data, said target scene data, and target gesture keypoint data:
training a gesture control model through the target gesture scene fusion data, and optimizing the gesture control model, wherein the gesture control model is used for recognizing gesture control operation of a user.
6. The method of claim 1, wherein the collecting raw gesture data of a user and target scene data comprises:
when a user executes gesture control operation, original gesture data of the user and target scene data are collected.
7. A gesture control optimization apparatus, comprising:
the system comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring original gesture data and target scene data of a user, and the target scene data is used for representing background information associated with the original gesture data;
the gesture data generation module is used for generating target gesture scene fusion data according to the original gesture data, the target scene data and the target gesture key point data, and the target gesture scene fusion data is used for optimizing a gesture control model;
the target gesture scene fusion data comprises a gesture category label and a background category label, the gesture category label is matched with the target gesture key point data, and the background category label is matched with the target scene data.
8. The apparatus of claim 7, wherein the gesture data generation module is specifically configured to:
and inputting the original gesture data, the target scene data and the target gesture key point data into a first gesture data generation model to generate target gesture scene fusion data.
9. The apparatus of claim 7, wherein the gesture data generation module is specifically configured to:
inputting the original gesture data and the target gesture key point data into a second gesture data generation model to generate target gesture data;
and inputting the target gesture data and the target scene data into a third gesture data generation model to generate target gesture scene fusion data.
10. The apparatus of claim 7, further comprising:
and the target gesture key point data generating module is used for calling a target gesture key point generating model and generating target gesture key point data.
11. The apparatus of claim 7, further comprising:
and the training module is used for training the gesture control model through the target gesture scene fusion data and optimizing the gesture control model, wherein the gesture control model is used for identifying gesture control operation of a user.
12. The apparatus according to claim 7, wherein the acquisition module is specifically configured to:
when a user executes gesture control operation, original gesture data of the user and target scene data are collected.
13. A terminal, comprising:
one or more processors;
a memory;
and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the terminal, cause the terminal to perform the method of any of claims 1-6.
14. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium resides to perform the method of any one of claims 1-6.
CN202110525205.1A 2021-05-07 2021-05-07 Gesture control optimization method and device, terminal and storage medium Pending CN115393676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110525205.1A CN115393676A (en) 2021-05-07 2021-05-07 Gesture control optimization method and device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110525205.1A CN115393676A (en) 2021-05-07 2021-05-07 Gesture control optimization method and device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN115393676A true CN115393676A (en) 2022-11-25

Family

ID=84114006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110525205.1A Pending CN115393676A (en) 2021-05-07 2021-05-07 Gesture control optimization method and device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115393676A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116360603A (en) * 2023-05-29 2023-06-30 中数元宇数字科技(上海)有限公司 Interaction method, device, medium and program product based on time sequence signal matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination