WO2023077886A1 - Display device and control method thereof - Google Patents

Display device and control method thereof

Info

Publication number
WO2023077886A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
target
user behavior
information
display
Prior art date
Application number
PCT/CN2022/109185
Other languages
English (en)
French (fr)
Inventor
高伟
姜俊厚
贾亚洲
岳国华
祝欣培
李佳琳
修建竹
周晓磊
李保成
付廷杰
刘胤伯
Original Assignee
海信视像科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202111302345.9A external-priority patent/CN116069280A/zh
Priority claimed from CN202111302336.XA external-priority patent/CN116069229A/zh
Priority claimed from CN202210266245.3A external-priority patent/CN114610153A/zh
Priority claimed from CN202210303452.1A external-priority patent/CN114637439A/zh
Application filed by 海信视像科技股份有限公司
Publication of WO2023077886A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04815Interaction with a metaphor-based environment or interaction object displayed as three-dimensional, e.g. changing the user viewpoint with respect to the environment or object

Definitions

  • the present application relates to the field of gesture control, in particular to a display device and a control method thereof.
  • a display device can capture images of users through its video capture device, and the processor can analyze the gesture information of users in the images. After the recognition is performed, the command corresponding to the gesture information is executed.
  • the control command determined by the display device through gesture information is generally obtained by recognizing a single collected user behavior image to determine the target gesture information and then the corresponding control command, resulting in a low degree of intelligence of the display device and a poor user experience.
  • the present application provides a display device, including: a display configured to display images; an image input interface configured to acquire user behavior images; and a controller configured to: acquire several frames of user behavior images; perform gesture recognition processing on the user behavior images to obtain target gesture information; and, based on the target gesture information, control the display to display corresponding content.
  • the present application provides a method for controlling a display device, the method comprising: acquiring several frames of user behavior images; performing gesture recognition processing on each frame of the user behavior images to obtain target gesture information; and, based on the target gesture information, controlling the display to display corresponding content.
  • FIG. 1 is a usage scenario of a display device provided by an embodiment of the present application
  • Fig. 2 is a hardware configuration block diagram of the control device 100 provided by the embodiment of the present application.
  • FIG. 3 is a block diagram of a hardware configuration of a display device 200 provided in an embodiment of the present application.
  • FIG. 4 is a software configuration diagram in the display device 200 provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of a display device provided by an embodiment of the present application.
  • FIG. 6a is a schematic diagram of a built-in camera of a display device provided by an embodiment of the present application.
  • FIG. 6b is a schematic diagram of an external camera provided by a display device provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a user interface provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a cursor displayed on a display provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of cursor control mode confirmation information displayed on a display provided in an embodiment of the present application.
  • FIG. 10 is an interaction flowchart of components of the display device provided by the embodiment of the present application.
  • FIG. 11 is a schematic diagram of user gestures provided by the embodiment of the present application.
  • Fig. 12 is a schematic flowchart of determining the cursor position according to the target gesture information provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of a camera area displayed on a monitor provided in an embodiment of the present application.
  • FIG. 14 is a schematic diagram of the cursor moving along a straight line provided by the embodiment of the present application.
  • FIG. 15 is a schematic diagram of the cursor moving along a curve provided by the embodiment of the present application.
  • FIG. 16 is a schematic diagram of the distance relationship between the cursor and the control provided by the embodiment of the present application.
  • Figure 17 shows the positional relationship between the cursor and controls provided by the embodiment of the present application.
  • FIG. 18 is a schematic diagram of a dynamic gesture interaction process provided by an embodiment of the present application.
  • Figure 19 is a schematic diagram of hand orientation provided by the embodiment of the present application.
  • FIG. 20 is a schematic diagram of a tree structure of a detection model provided by an embodiment of the present application.
  • FIG. 21 is an action path diagram when the pseudo-jump is successful provided by the embodiment of the present application.
  • FIG. 22 is an action path diagram when the pseudo-jump fails provided by the embodiment of the present application.
  • Fig. 23 is a schematic diagram of the data flow relationship of the dynamic gesture interaction provided by the embodiment of the present application.
  • FIG. 24 is a timing diagram of dynamic gesture interaction provided by the embodiment of the present application.
  • FIG. 25 is a schematic diagram of another usage scenario of the display device provided by the embodiment of the present application.
  • FIG. 26 is a schematic diagram of the hardware structure of another hardware system in the display device provided by the embodiment of the present application.
  • FIG. 27 is a schematic diagram of a method for controlling a display device provided by an embodiment of the present application.
  • FIG. 28 is a schematic diagram of another embodiment of a method for controlling a display device provided by an embodiment of the present application.
  • Fig. 29 is a schematic diagram of the coordinates of the key points of the hand provided by the embodiment of the present application.
  • Fig. 30 is a schematic diagram of different telescopic states of the key points of the hand provided by the embodiment of the present application.
  • FIG. 31 is a schematic diagram of an application scenario of a method for controlling a display device provided by an embodiment of the present application.
  • Fig. 32 is a schematic diagram of using gesture information and body information to jointly determine a control command provided by the embodiment of the present application;
  • FIG. 33 is a schematic flowchart of a method for controlling a display device provided by an embodiment of the present application.
  • FIG. 34 is a schematic diagram of the mapping relationship provided by the embodiment of the present application.
  • FIG. 35 is another schematic diagram of the mapping relationship provided by the embodiment of the present application.
  • Fig. 36 is a schematic diagram of target gesture information and body information in an image provided by an embodiment of the present application.
  • FIG. 37 is a schematic diagram of the moving position of the target control provided by the embodiment of the present application.
  • FIG. 38 is another schematic diagram of the moving position of the target control provided by the embodiment of the present application.
  • FIG. 39 is a schematic flowchart of a method for controlling a display device provided by an embodiment of the present application.
  • FIG. 40 is a schematic flowchart of another method for controlling a display device provided by an embodiment of the present application.
  • FIG. 41 is a schematic diagram of a virtual frame provided by the embodiment of the present application.
  • FIG. 42 is a schematic diagram of the corresponding relationship between the virtual frame and the display provided by the embodiment of the present application.
  • Fig. 43 is a schematic diagram of the movement of the target control provided by the embodiment of the present application.
  • Figure 44 is a schematic diagram of the area of the virtual frame provided by the embodiment of the present application.
  • FIG. 45 is a schematic diagram of an edge area provided by an embodiment of the present application.
  • Fig. 46 is a schematic diagram of the state of gesture information provided by the embodiment of the present application.
  • FIG. 47 is a schematic diagram of the re-established virtual frame provided by the embodiment of the present application.
  • Fig. 48 is another schematic diagram of the re-established virtual frame provided by the embodiment of the present application.
  • FIG. 49 is a schematic diagram of the movement of the target control provided by the embodiment of the present application.
  • FIG. 50 is another schematic diagram of the movement of the target control provided by the embodiment of the present application.
  • FIG. 51 is a schematic diagram of a display device control process provided by an embodiment of the present application.
  • FIG. 52 is a schematic flowchart of another display device control method provided by the embodiment of the present application.
  • FIG. 53 is a schematic flowchart of an embodiment of a method for controlling a display device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an operation scene between a display device and a control device according to an embodiment of the present application.
  • a user can operate a display device 200 through a mobile terminal 300 and a control device 100 .
  • the control device 100 may be a remote controller, which communicates with the display device through infrared protocol communication, Bluetooth protocol communication, or other wireless or wired methods to control the display device 200.
  • the user can control the display device 200 by inputting user commands through buttons on the remote controller, voice input, control panel input, and the like.
  • mobile terminals, tablet computers, computers, laptops, and other smart devices can also be used to control the display device 200.
  • the mobile terminal 300 and the display device 200 can each install software applications, so that connection and communication can be realized between them through a network communication protocol, achieving one-to-one control operation and data communication.
  • the audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 to realize the synchronous display function.
  • the display device 200 can also perform data communication with the server 400 through various communication methods.
  • the display device 200 may be allowed to communicate via a local area network (LAN), a wireless local area network (WLAN), and other networks.
  • the server 400 may provide various contents and interactions to the display device 200 .
  • the display device 200 may be a liquid crystal display, an OLED display, or a projection display device; alternatively, the display device may be a smart TV or a display system composed of a display and a set-top box. In addition to providing the broadcast-receiving TV function, the display device 200 may also provide an intelligent network TV function with computer support functions, for example, Internet Protocol Television (IPTV), SmartTV, and the like. In some embodiments, the display device may not have the broadcast-receiving television function.
  • Fig. 2 is a configuration block diagram of the control device 100 provided by the embodiment of the present application.
  • the control device 100 includes a controller 110 , a communication interface 130 , a user input/output interface 140 , a memory, and a power supply.
  • the control device 100 can receive the user's input operation instructions, and convert the operation instructions into instructions that the display device 200 can recognize and respond to, and play an intermediary role between the user and the display device 200 .
  • the communication interface 130 is used for communicating with the outside, and includes at least one of a WIFI chip, a Bluetooth module, NFC or an alternative module.
  • the user input/output interface 140 includes at least one of a microphone, a touch pad, a sensor, a button or an alternative module.
  • FIG. 3 is a block diagram of a hardware configuration of a display device 200 provided in an embodiment of the present application.
  • the display device 200 includes at least one of a tuner and demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, and a user interface 280.
  • the controller includes a central processing unit, a video processor, an audio processor, a graphics processor, RAM, ROM, the first interface to the nth interface for input/output.
  • the display 260 may be at least one of a liquid crystal display, an OLED display, a touch display, and a projection display, and may also be a projection device and a projection screen.
  • the communicator 220 is a component for communicating with external devices or servers according to various communication protocol types.
  • the communicator may include at least one of a Wifi module, a Bluetooth module, a wired Ethernet module and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.
  • the display device 200 may establish transmission and reception of control signals and data signals with the external control device 100 or the server 400 through the communicator 220 .
  • the user input interface can be used to receive a control signal from the control device 100 (such as an infrared remote controller, etc.).
  • the tuner demodulator 210 receives broadcast TV signals through wired or wireless reception, and demodulates audio and video signals, such as EPG data signals, from multiple wireless or cable broadcast TV signals.
  • the detector 230 is used to collect signals of the external environment or interaction with the outside.
  • the detector 230 includes a light receiver, which is a sensor for collecting ambient light intensity; or, the detector 230 includes an image collector 231, such as a camera, which can be used to collect external environmental scenes, user attributes or user interaction gestures.
  • the external device interface 240 may include but not limited to the following: high-definition multimedia interface (HDMI), analog or data high-definition component input interface (component), composite video input interface (CVBS), USB input interface (USB), RGB port, etc. multiple interfaces. It may also be a composite input/output interface formed by the above-mentioned multiple interfaces.
  • the controller 250 and the tuner-demodulator 210 may be located in different split devices, that is, the tuner-demodulator 210 may also be located in an external device of the main device where the controller 250 is located, such as an external set-top box.
  • the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in the memory.
  • the controller 250 controls the overall operations of the display device 200 .
  • the user can input a user command through a graphical user interface (GUI) displayed on the display 260, and the user input interface receives the user input command through the graphical user interface (GUI).
  • the user may input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through a sensor to receive the user input command.
  • FIG. 4 is a schematic diagram of the software configuration in the display device 200 provided by the embodiment of the present application.
  • the system is divided into four layers, which from top to bottom are the application (Applications) layer (abbreviated as "application layer"), the Application Framework layer (abbreviated as "framework layer"), the Android runtime and system library layer (abbreviated as "system runtime layer"), and the kernel layer.
  • the kernel layer contains at least one of the following drivers: audio driver, display driver, bluetooth driver, camera driver, WIFI driver, USB driver, HDMI driver, sensor driver (such as fingerprint sensor, temperature sensor, pressure sensor, etc.), and power supply drive etc.
  • the user can control the display device through gesture interaction.
  • the gesture interaction manners that can be adopted by the display device may include static gestures and dynamic gestures.
  • when static gestures are used for interaction, the display device can detect the gesture type according to a gesture type recognition algorithm, and execute the corresponding control action according to the gesture type.
  • FIG. 5 is a schematic diagram of a display device provided in the embodiment of the present application.
  • the display device includes a display 260, an image input interface 501 and a controller 110,
  • the display 260 is configured to display images
  • the image input interface 501 is configured to acquire user behavior images
  • the controller 110 is configured to:
  • the controller 110 may obtain several frames of user behavior images through the image input interface 501. A user behavior image may include only a partial image of the user, such as a gesture image of the gesture made by the user, or it may include a collected global image of the user, for example, a collected whole-body image of the user.
  • the acquired several frames of user behavior images may be a video including several frames of user behavior images, or an image set including several frames of user behavior images.
  • the controller 110 may perform gesture recognition processing on each frame of user behavior images to obtain target gesture information.
  • the gestures contained in the user behavior images can be recognized based on image recognition technology, and the gestures recognized in each frame of the user behavior images can be combined to obtain the target gesture information; that is, each recognized gesture is included in the target gesture information.
  • the recognized gestures may also be classified according to the gesture types preset on the device, and the gesture type that occurs most frequently is determined as the target gesture information.
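  • A minimal sketch of the "most frequent gesture type" selection described above, in Python; the gesture labels and function name are illustrative assumptions, not taken from the application:

```python
from collections import Counter
from typing import List, Optional

def select_target_gesture(frame_gestures: List[Optional[str]]) -> Optional[str]:
    """Return the gesture type that occurs most often across the recognized frames.

    frame_gestures holds one recognized gesture label per user behavior image
    (None for frames in which no gesture was recognized).
    """
    recognized = [g for g in frame_gestures if g is not None]
    if not recognized:
        return None  # no gesture detected in any of the frames
    return Counter(recognized).most_common(1)[0][0]

# e.g. select_target_gesture(["palm", "palm", None, "fist", "palm"]) -> "palm"
```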
  • the controller 110 may control the display 260 to display corresponding content.
  • the controller 110 acquires several frames of user behavior images, determines the target gesture information based on these acquired frames, and performs the corresponding control based on the target gesture information, rather than acquiring a single user behavior image and determining the target gesture information from it for control. This improves the accuracy of gesture-recognition-based display control on the display device, thereby improving the intelligence of the display device and the user experience.
  • a display device refers to a terminal device capable of outputting a specific display screen.
  • the functions of display devices are becoming more and more abundant and their performance more and more powerful; they can realize two-way human-computer interaction, integrating audio-visual, entertainment, data and other functions, to meet users' diverse and individual needs.
  • Gesture interaction is a new type of human-computer interaction mode.
  • the purpose of gesture interaction is to control the display device to execute corresponding control instructions by detecting specific gesture actions made by the user.
  • the gesture interaction manners that can be adopted by the display device may include static gestures and dynamic gestures.
  • for static gestures, the display device can detect the gesture type according to a gesture type recognition algorithm, and execute the corresponding control action according to the gesture type.
  • for dynamic gestures, the user can manipulate the cursor on the display to move.
  • the display device can establish a mapping relationship between the user's gesture and the cursor on the display; at the same time, by continuously detecting user images, the user's dynamic gesture can be determined, and then the trajectory of the gesture mapped onto the display can be determined, so as to control the cursor to move along the gesture track.
  • the display device needs to continuously detect user images.
  • the user's gesture may not be detected in some images, so the gesture movement track corresponding to the user images cannot be obtained accurately and the cursor cannot be controlled to move; the cursor may then freeze or its track may be interrupted, resulting in a poor experience.
  • the display device can detect the user's dynamic gesture, and then determine a gesture movement track mapped to the display, so as to control the cursor to move along the gesture movement track.
  • when the user uses dynamic gestures to control the movement of the cursor, the display device needs to continuously detect user images. By recognizing each frame of the user image, the user gesture in the image is obtained, and then the coordinates to which each frame's user gesture maps on the display are determined, so as to control the cursor to move along these coordinates.
  • the display device may not be able to recognize the gestures in some user images and thus cannot determine the corresponding coordinates, so the corresponding gesture movement track cannot be obtained accurately. Under normal circumstances, the cursor needs to move according to the position corresponding to each frame of the image to form a continuous motion track.
  • when a position is missing, the cursor will not move and the movement will freeze until the position corresponding to the next frame image is recognized, after which the cursor continues to move; if the recognized positions are too far apart, the cursor will jump suddenly, and such situations seriously affect the user's viewing experience.
  • in order to enable the display device to realize the function of gesture interaction with the user, the display device further includes an image input interface for connecting to the image collector 231.
  • the image collector 231 may be a camera for collecting some image data. It should be noted that the camera can be used as an external device connected to the display device through an image input interface, or can be built in the display device as a detector. For a camera externally connected to a display device, the camera may be connected to an external device interface of the display device to be connected to the display device. The user can use the camera to complete the function of photographing or shooting on the display device, thereby collecting image data.
  • the camera can further include a lens assembly, in which a photosensitive element and a lens are arranged.
  • the lens can refract the light through multiple mirrors, so that the light of the image of the scene can be irradiated on the photosensitive element.
  • the photosensitive element can be selected based on the detection principle of CCD (Charge-coupled Device, charge-coupled device) or CMOS (Complementary Metal Oxide Semiconductor, Complementary Metal Oxide Semiconductor) according to the specifications of the camera, and the optical signal is converted into an electrical signal through the photosensitive material. And output the converted electrical signal into image data.
  • the camera can also acquire image data frame by frame according to the set sampling frequency, so as to form video stream data according to the image data.
  • the built-in camera of the display device may also support lifting. That is, the camera can be set on the lifting mechanism.
  • the lifting mechanism is controlled to move through specific lifting instructions, thereby driving the camera to rise for image acquisition.
  • the lifting mechanism can also be controlled to move through a specific lifting command, thereby driving the camera to lower to hide the camera.
  • Fig. 6a is a schematic diagram of a built-in camera of a display device provided by an embodiment of the present application.
  • the image acquisition device 231 externally connected to the display device can be an independent peripheral device, and is connected to the display device through a specific data interface.
  • the image collector 231 can be an independent camera device, and the display device can be provided with a Universal Serial Bus interface (Universal Serial Bus, USB) or a High Definition Multimedia Interface (High Definition Multimedia Interface, HDMI ), the image collector 231 is connected to a display device through a USB interface or an HDMI interface.
  • the image collector 231 externally connected to the display device can be set at a position close to the display device, for example clamped on the top of the display device by a clamping device, or placed on a table near the display device.
  • the image collector 231 may also support connection in other ways according to the specific hardware configuration of the display device.
  • the image collector 231 can also establish a connection relationship with the display device through a communicator of the display device, and send the collected image data to the display device according to the data transmission protocol corresponding to the communicator.
  • the display device can be connected to the image collector 231 through a local area network or the Internet, and after the network connection is established, the image collector 231 can send the collected data to the display device through a network transmission protocol.
  • the image collector 231 can also be connected to an external display device through a wireless network connection.
  • for example, for a display device supporting a WiFi wireless network, its communicator is provided with a WiFi module; therefore, the display device and the image collector 231 can establish a wireless connection by connecting both to the same wireless network. After the image collector 231 collects image data, the image data can first be sent to the router of the wireless network and then forwarded to the display device by the router.
  • the image collector 231 can also access the display device through other wireless connection methods.
  • wireless connection methods include but are not limited to WiFi direct connection, cellular network, analog microwave, bluetooth, infrared, etc.
  • FIG. 7 is a schematic diagram of a user interface provided by an embodiment of the present application.
  • the user interface includes a first navigation bar 700, a second navigation bar 710, a function bar 720 and a content display area 730, and the function bar 720 includes a plurality of function controls such as "watch history", "my favorites" and "my applications".
  • the content displayed in the content display area 730 will change with the selected controls in the first navigation bar 700 and the second navigation bar 710 .
  • the user can control the display device to display the display panel corresponding to the control by touching a certain control. It should be noted that the user may also input the operation of selecting a control in other ways, for example, using a voice control function or a search function to select a certain control.
  • the user can start the image collector 231 to collect image data through specific interactive instructions or application control during the process of using the display device , and process the collected image data according to different needs.
  • camera applications may be installed in the display device, and these camera applications may call the camera to implement their respective related functions.
  • a camera application refers to a camera application that needs to access a camera, and may process image data collected by the camera to implement related functions, such as video chat. Users can view all the applications installed in the display device by touching the "My Applications" control. A list of applications may be displayed on the display.
  • the display device can run the corresponding camera application, and the camera application can wake up the image collector 231, and the image collector 231 can further detect image data in real time and send it to the display device.
  • the display device can further process the image data, such as controlling the display to display images and so on.
  • the display device can perform gesture interaction with the user, so as to recognize the user's control instruction.
  • the user can use static gestures to interact with the display device to input control commands.
  • the user can pose a specific gesture within the shooting range of the image collector 231, and the image collector 231 can collect the user's gesture image and send the collected gesture image to the display device.
  • the display device can further recognize the gesture image, and detect the type of the gesture in the image.
  • Gesture interaction strategies can be pre-stored in the display device, and each type of gesture is defined to correspond to the control command.
  • a gesture type can correspond to a control command.
  • according to different purposes, the display device can set gestures for specific control commands. By comparing the type of the gesture in the image with the correspondences in the interaction strategy one by one, the control instruction corresponding to the gesture can be determined and executed, as illustrated by the sketch below.
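  • As a rough illustration of such a pre-stored interaction strategy, the correspondence between gesture types and control commands can be kept as a simple lookup table; the gesture names and command identifiers below are hypothetical:

```python
from typing import Optional

# Hypothetical gesture-interaction strategy: one static gesture type per control command.
GESTURE_COMMANDS = {
    "palm": "PAUSE_OR_START_PLAY",
    "thumb_up": "VOLUME_UP",
    "thumb_down": "VOLUME_DOWN",
    "fist": "CONFIRM",
}

def command_for_gesture(gesture_type: str) -> Optional[str]:
    """Compare the recognized gesture type against the strategy and return its command."""
    return GESTURE_COMMANDS.get(gesture_type)
```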
  • the display device can recognize the gesture in the gesture image collected by the image collector 231 and, in response to the gesture, determine that the control command is "Pause/Start Play". Finally, by running the control command, the current playback interface is controlled to pause or start playback.
  • It should be noted that this gesture recognition adopts a static gesture recognition method: static gesture recognition recognizes a gesture type and then determines a corresponding control instruction. Every time the user presents a static gesture, it means that the user has input one independent control command, such as increasing the volume by one step. Note that even when the user keeps a static gesture for a long time, the display device may still determine that the user has input a single control command. Therefore, for control commands that require coherent operations, static gesture interaction is too cumbersome.
  • the focus may be moved down, right, and down in sequence.
  • the user needs to constantly change the static gesture to control the focus to move, resulting in poor user experience.
  • when the focus needs to be moved continuously in one direction multiple times, the user needs to make static gestures repeatedly. Even a static gesture maintained for a long time is judged as inputting a single control command, so the user needs to put down his hand after making a static gesture and then make the gesture again, thereby affecting the user experience.
  • the display device can also support dynamic gesture interaction.
  • the dynamic gesture means that during an interaction process, the user can input control instructions to the display device in a dynamic gesture input manner.
  • a control command can be input to the display device through a series of dynamic gestures; multiple different types of control commands can be input to the display device in sequence through different types of gestures; or multiple identical control commands of one type can be input through continuous gestures of the same type, thereby expanding the gesture interaction types of the display device and enriching the forms of gesture interaction.
  • the display device can continuously acquire gesture images within a 2 s detection period and recognize the gesture type in the images frame by frame, so as to recognize a grasping action from the gesture changes across multiple frames. Finally, the control instruction corresponding to the grabbing action, namely "play in full screen/window", is determined and executed to adjust the size of the playback window.
  • the user when a user interface is displayed on the display device, the user can control the focus on the display to select a certain control and trigger it. As shown in Figure 7, the current focus has selected the "My Application" control. Considering that it may be cumbersome for the user to use the control device to control the movement of the focus, in order to increase the experience of the user, the user can also use dynamic gestures to select the control.
  • the display device may be provided with a cursor control mode.
  • the original focus on the display can be changed to the cursor, as shown in FIG. 8 , the cursor selects the "My Application" control.
  • the user can use gestures to control the movement of the cursor to select a control instead of the original focus movement.
  • the user may send a cursor control mode command to the display device by operating a designated key on the remote controller.
  • a cursor control mode button is set on the remote controller.
  • the remote controller sends a cursor control mode command to the controller.
  • the controller controls the display device to enter the cursor control mode.
  • the controller can control the display device to exit the cursor control mode.
  • the correspondence between the cursor control mode command and multiple remote control keys can be bound in advance, and when the user touches multiple keys bound to the cursor control mode command, the remote control sends the cursor control mode command .
  • the user may use a sound collector of the display device, such as a microphone, to send a cursor control mode command to the display device through voice input, so that the display device enters the cursor control mode.
  • the user may also send a cursor control mode instruction to the display device through a preset gesture or action.
  • the display device can detect the behavior of the user in real time through the image collector 231 .
  • when the user makes a preset gesture or action, it can be considered that the user has sent a cursor control mode instruction to the display device.
  • a cursor control mode instruction may also be sent to the display device.
  • a control can be set in the mobile phone, and whether to enter the cursor control mode can be selected through the control, so as to send the cursor control mode command to the display device.
  • a cursor control mode option can be set in the UI interface of the display device, and when the user clicks on this option, the display device can be controlled to enter or exit the cursor control mode.
  • FIG. 9 is a schematic diagram of displaying cursor control mode confirmation information on a display provided by an embodiment of the present application.
  • the user can control the movement of the cursor by gestures, so as to select the control to be triggered.
  • Fig. 10 is an interaction flowchart of various components of the display device provided by the embodiment of the present application, including the following steps:
  • the controller when it is detected that the display device enters the cursor control mode, the controller can wake up the image collector 231 and send a start instruction to the image collector 231, thereby starting the image collector 231 to take images.
  • the user can make dynamic gestures within the shooting range of the image collector 231, and the image collector 231 can continuously capture multiple frames of user images following the user's dynamic gesture actions.
  • the user behavior image is used to refer to the image of the user collected by the image collector 231.
  • the image collector 231 may capture user behavior images at a preset frame rate, for example, 30 frames per second (30FPS) of user behavior images. At the same time, the image collector 231 can also send each captured frame of the user behavior image to the display device in real time. It should be noted that since the image collector 231 sends the captured user behavior images to the display device in real time, the rate at which the display device acquires the user behavior images may be the same as the frame rate of the image collector 231 .
  • the controller may also acquire user behavior images at a frame rate of 30 frames per second.
  • the image collector 231 collects several frames of user behavior images, which may be sent to the display device in sequence.
  • the display device can recognize each frame of the user behavior image one by one, so as to recognize the user gesture contained in the user behavior image, so as to determine the control instruction input by the user.
  • for the collected user behavior images, the controller performs gesture recognition processing on them.
  • for example, a preset dynamic gesture recognition model may be used to sequentially process each frame of the user behavior images.
  • the controller can input user behavior images into the dynamic gesture recognition model, and the dynamic gesture recognition model can further recognize the user gestures contained in the images; for example, it can recognize key points such as fingers, joints, and wrists contained in the user behavior images.
  • the position information of the key point refers to the position coordinates of the key point in the user behavior image.
  • the target gesture information of each frame of the user behavior image can be sequentially output.
  • S1004 Determine the gesture movement track according to the cursor position.
  • S1005 the controller controls the movement of the cursor, so that the display shows that the cursor moves along the movement track of the gesture.
  • FIG. 11 is a schematic diagram of user gestures provided by the embodiment of the present application. It may be set as follows: the key points used to characterize the gesture of the user include 21 finger key points.
  • the dynamic gesture recognition model can confirm the user gestures in the user behavior image, and recognize the position information of the 21 finger key points of the user's hand, that is, the position coordinates in the user behavior image.
  • the position information of each key point can be represented by the coordinates of the corresponding point.
  • when the dynamic gesture recognition model recognizes the user behavior image, it may recognize the user's gesture and obtain the position information of each finger key point.
  • the output target gesture information may include the position information of all finger key points. However, depending on the user's gesture, some finger key points may be covered by the user and therefore absent from the user behavior image. In that case the dynamic gesture recognition model cannot obtain the position information of these finger key points, and their position information can only be null. That is, the target gesture information includes the position information of the finger key points recognized by the dynamic gesture recognition model, while the position information of unrecognized finger key points is a null value.
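  • A possible in-memory representation of the per-frame target gesture information described above (21 finger key points, with null values for key points that are not recognized); the structure is an assumption for illustration:

```python
from typing import Dict, Optional, Tuple

# Target gesture information for one frame: key point index (0..20) -> (x, y)
# coordinate in the user behavior image, or None when the key point is occluded
# and its position could not be recognized.
TargetGestureInfo = Dict[int, Optional[Tuple[float, float]]]

def empty_gesture_info() -> TargetGestureInfo:
    return {kp: None for kp in range(21)}

frame_info = empty_gesture_info()
frame_info[9] = (412.0, 267.5)  # key point No. 9 recognized at these image coordinates
# occluded key points simply keep their null (None) value
```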
  • after the dynamic gesture recognition model obtains the target gesture information of each frame, it can output it to the controller.
  • the controller can further determine the control instruction indicated by the user according to the target gesture information of each frame. Since the user wants to control the cursor to move, the control instruction indicated by the user can be regarded as a position instruction indicating that the cursor needs to be moved by the user. At this time, the controller can acquire the cursor position of each frame according to the target gesture information of each frame.
  • considering that the computing capability of the display device may be limited, if the display device is currently performing some other functions, such as far-field voice or 4K video playback, it will be under a relatively high load. At this time, if the frame rate of the user behavior images input into the dynamic gesture recognition model is high, the amount of data to process in real time is too large and the model may process the user behavior images slowly, so the cursor position is obtained slowly and the cursor on the display stutters as it moves.
  • the controller can first detect the current load rate of the display device.
  • when the load rate is higher than a preset threshold, for example, higher than 60%,
  • the controller can make the dynamic gesture recognition model process user behavior images at regular intervals; for example, a fixed cycle can be set to process 15 frames of images per second. This enables the dynamic gesture recognition model to process images stably.
  • otherwise, the dynamic gesture recognition model can be made to process each frame of user behavior images in real time.
  • the controller may input the user behavior image sent by the image collector 231 into the dynamic gesture recognition model in real time, and control the model to perform recognition. It is also possible to make the dynamic gesture recognition model process at regular intervals.
  • the rate at which the dynamic gesture recognition model outputs target gesture information and the rate at which user behavior images are processed may be the same.
  • if the dynamic gesture recognition model processes images at regular intervals, it will output target gesture information at regular intervals.
  • if the model processes images in real time, it also outputs target gesture information in real time.
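  • A small sketch of the load-based throttling described above, assuming the 60% threshold, 15 FPS throttled rate and 30 FPS capture rate mentioned earlier; the function name and interface are illustrative:

```python
def recognition_interval(load_rate: float,
                         load_threshold: float = 0.60,
                         throttled_fps: float = 15.0,
                         capture_fps: float = 30.0) -> float:
    """Return the interval in seconds between two recognition passes.

    When the device load exceeds the threshold, frames are processed at a fixed
    lower rate; otherwise recognition keeps up with the capture frame rate.
    """
    fps = throttled_fps if load_rate > load_threshold else capture_fps
    return 1.0 / fps

# e.g. recognition_interval(0.75) -> ~0.067 s (15 FPS); recognition_interval(0.30) -> ~0.033 s (30 FPS)
```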
  • in order to enable the cursor displayed on the display to generate a real-time motion track according to the user's dynamic gesture, so that the cursor smoothly follows the dynamic gesture movement, the controller can determine the cursor position of each frame.
  • for some frames, the dynamic gesture recognition model cannot recognize the gesture; as a result, the relevant information of the target gesture cannot be obtained, for example, the target gesture information is null.
  • in this case, the information indicated by the user cannot be obtained from the target gesture information, that is, the cursor position cannot be obtained. The display device can therefore predict the cursor position corresponding to that frame image, so that the missing cursor position does not stop the cursor from moving, which would otherwise make the cursor appear stuck, interrupt its track, or cause it to lose the user's gesture while following it.
  • the display device may determine whether the information indicated by the user can be obtained according to the target gesture information acquired by the dynamic gesture recognition model, such as the position information of key points of the finger shown in FIG. 11 .
  • the target gesture may be that the user shows a preset finger key point.
  • key point No. 9 can be set as the control point through which the user instructs the cursor to move; that is, when the position information of this preset finger key point is detected, it is determined that the user has indicated a cursor movement.
  • the display device may further determine the position information of the cursor movement according to the preset position information of the key points of the finger.
  • virtual position information is used to refer to preset position information of finger key points, that is, position information of a target gesture in a user behavior image.
  • the display device may detect whether each frame of target gesture information includes virtual position information. If a frame of target gesture information includes virtual position information, that is, the position information of the preset key point of the finger is recognized, it is considered that the target gesture is detected in the frame of user behavior image, that is, the user specifically instructed how to move the cursor. At this time, the display device can determine the position information where the cursor needs to move according to the virtual position information.
  • if a frame of target gesture information does not include virtual position information, that is, the position information of the preset finger key point is null, it is considered that no target gesture is detected in that frame of the user behavior image and the user has not specifically indicated how the cursor should move; the display device then needs to predict and supplement the position to which the cursor needs to move.
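  • The per-frame branching this implies can be sketched as follows; the key point index and the mapping/prediction callables are placeholders for the steps detailed later in this description:

```python
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[float, float]

def cursor_position_for_frame(target_gesture_info: Dict[int, Optional[Point]],
                              map_fn: Callable[[Point], Point],
                              predict_fn: Callable[[], Point],
                              control_key_point: int = 9) -> Point:
    """Use the virtual position when the target gesture was detected, otherwise predict."""
    virtual_position = target_gesture_info.get(control_key_point)
    if virtual_position is not None:
        return map_fn(virtual_position)   # target gesture detected: map it onto the display
    return predict_fn()                   # frame loss: supplement the position by prediction
```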
  • FIG. 12 is a schematic flowchart of determining the cursor position according to the target gesture information provided by the embodiment of the present application, including the following steps:
  • S1201 Determine whether the gesture information of the target user includes virtual position information; if yes, execute S1202; otherwise, execute S1204.
  • the controller can obtain the position information of the cursor to be moved respectively.
  • the position information to which the cursor needs to move, that is, the cursor position corresponding to the user behavior image, can be obtained according to the virtual position information.
  • the virtual position information represents the position information of the preset finger key points identified in the user behavior image, and is used to represent the position information of the user's target gesture.
  • the position information is the position of the finger key point in the user behavior image; therefore, the display device can map the user's target gesture onto the display to obtain the cursor position. It should be noted that, when the user's target gesture is mapped to the display, the initial position of the cursor can be used as a reference: when the user's target gesture is detected for the first time, the position of the finger key point in that frame image is associated with the initial position of the cursor, forming a mapping relationship. In subsequent mappings, the user's subsequent target gestures may be mapped onto the display in sequence according to a preset mapping method, so as to obtain the cursor position corresponding to each frame of the image.
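  • The application only states that a preset mapping method is used; a minimal sketch, assuming a relative mapping anchored at the first detected key point and scaled from the image size to the display size:

```python
from typing import Tuple

Point = Tuple[float, float]

def map_gesture_to_display(key_point: Point,
                           initial_key_point: Point,
                           initial_cursor: Point,
                           image_size: Tuple[int, int],
                           display_size: Tuple[int, int]) -> Point:
    """Map a finger key point in the user behavior image to a cursor position on the display.

    The first detected key point is bound to the cursor's initial position; later
    key-point offsets in the image are scaled into display coordinates.
    """
    scale_x = display_size[0] / image_size[0]
    scale_y = display_size[1] / image_size[1]
    dx = (key_point[0] - initial_key_point[0]) * scale_x
    dy = (key_point[1] - initial_key_point[1]) * scale_y
    return (initial_cursor[0] + dx, initial_cursor[1] + dy)
```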
  • the movement direction is not only up, down, left, right, but also forward and backward.
  • the display device can also adjust and optimize the cursor position, so that the cursor has dynamic anti-shake and its moving track is smooth.
  • the display device can map the target gesture in the target user behavior image to the display according to the virtual position information to obtain the original cursor position F c .
  • the original cursor position refers to the coordinates recognized by the dynamic gesture recognition model directly mapped to the coordinates in the display.
  • the target cursor position can be obtained.
  • the target cursor position refers to the actual coordinate position of the cursor displayed on the display after adjustment and optimization.
  • the display device can adjust the original cursor position according to the following method:
  • the display device can obtain the first position value according to the cursor position F p corresponding to the last frame of the user behavior image of the target user behavior image and the preset adjustment threshold, and can obtain the second position value according to the original cursor position and the preset adjustment threshold .
  • the target cursor position F c1 corresponding to the target user behavior image can be obtained according to the first position value and the second position value. It can be expressed by Equation 1:
  • F c1 represents the adjusted target cursor position
  • E 1 represents a preset adjustment threshold
  • F c represents the original cursor position before adjustment
  • F p represents the cursor position corresponding to the last frame of user behavior image.
  • the original cursor position can be adjusted according to the cursor position corresponding to the previous frame image, so as to reduce the possible jitter offset of the target gesture in this frame and optimize the movement of the cursor.
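  • Equation 1 itself is not reproduced in this text; a minimal sketch of the adjustment, assuming the weighted blend implied by the first and second position values (a larger E 1 pulls the cursor toward its previous position):

```python
from typing import Tuple

Point = Tuple[float, float]

def adjust_cursor_position(f_c: Point, f_p: Point, e_1: float) -> Point:
    """Blend the original cursor position F_c with the previous cursor position F_p.

    Assumed form: F_c1 = E_1 * F_p + (1 - E_1) * F_c, which damps frame-to-frame
    jitter of the target gesture.
    """
    return (e_1 * f_p[0] + (1.0 - e_1) * f_c[0],
            e_1 * f_p[1] + (1.0 - e_1) * f_c[1])
```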
  • the adjustment threshold can be preset according to the following method:
  • E 1 represents the preset adjustment threshold.
  • k represents the first adjustment parameter
  • g represents the second adjustment parameter
  • both the first adjustment parameter and the second adjustment parameter are numbers between 0-1, and can be set by relevant technical personnel.
  • S g represents the size of the target user behavior image.
  • the size of the user behavior image refers to the size of the user behavior image relative to the display.
  • the display device may display captured user behavior images on the display, so that the user can intuitively determine the current gesture situation.
  • FIG. 13 is a schematic diagram of a display showing a camera area provided by an embodiment of the present application. Wherein, the image captured by the camera is displayed in the camera area, and the size of the entire camera area can be set by the display device. The user can choose to open or close the camera area, but when the camera area is closed, its size is set to be the same as when it is opened.
  • S c represents the size of the control at the cursor position corresponding to the frame of the user behavior image preceding the target user behavior image. After each cursor movement, the cursor can be considered to have selected a certain control; therefore, the adjustment threshold can be set according to the control selected by the cursor in the previous frame.
  • S tv represents the size of the display.
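  • The combining equation for the adjustment threshold is likewise not reproduced here; the sketch below only illustrates one plausible way the listed quantities could be combined and clamped to [0, 1], and is an assumption rather than the application's formula:

```python
def adjustment_threshold(k: float, g: float,
                         s_g: float, s_c: float, s_tv: float) -> float:
    """Illustrative combination of the quantities defined above into E_1.

    k, g: adjustment parameters in [0, 1]; s_g: size of the user behavior image
    relative to the display; s_c: size of the control under the previous cursor
    position; s_tv: size of the display.
    Assumed form: E_1 = k * (S_g / S_tv) + g * (S_c / S_tv), clamped to [0, 1].
    """
    e_1 = k * (s_g / s_tv) + g * (s_c / s_tv)
    return max(0.0, min(1.0, e_1))
```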
  • the target cursor position corresponding to the target user behavior image can be determined, that is, the position to which the cursor needs to be moved.
  • the display device may predict the cursor position corresponding to the target user behavior image , so that the cursor can move normally.
  • the display device may first determine the type of cursor movement.
  • the types of cursor movement can be divided into two categories: linear movement and curved movement.
  • when the cursor moves along a straight line, it means that the user's gesture is also moving along a straight line, which is relatively stable, and generally no frames are lost when capturing images.
  • when the cursor moves along a curve, it means that the user's gesture is also moving along a curve.
  • a threshold for detecting frame loss may be preset to determine whether the cursor is moving in a straight line or in a curve.
  • the display device can check whether, among a number of frames before the target user behavior image (a preset detection number of user behavior images, for example the previous 20 frames), the number of images with frame loss, that is, images in which the user's target gesture is not detected, exceeds a preset detection threshold; the detection threshold can be set to 0.
  • that is, it can be detected whether the number of images with frame loss in the previous 20 frames is greater than 0, in other words whether any frame loss occurred in the previous 20 frames. If there is no frame loss, the cursor is considered to be moving in a straight line, which is set as the first type of motion in the embodiment of the present application; otherwise, it is classified as the second type of motion.
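  • A small sketch of this frame-loss check, assuming the 20-frame window and detection threshold of 0 mentioned above:

```python
from typing import List

def classify_cursor_motion(gesture_detected: List[bool],
                           window: int = 20,
                           detection_threshold: int = 0) -> str:
    """Classify the current cursor motion from recent per-frame detection results.

    gesture_detected[i] is True when the target gesture was found in that frame.
    If the number of dropped frames within the window exceeds the threshold, the
    motion is treated as the second type (curved); otherwise as the first type (linear).
    """
    recent = gesture_detected[-window:]
    dropped = sum(1 for detected in recent if not detected)
    return "linear" if dropped <= detection_threshold else "curved"
```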
  • the display device may perform the first processing on the target user behavior image, so as to predict the position of the target cursor.
  • FIG. 14 is a schematic diagram of a cursor moving along a straight line provided by an embodiment of the present application.
  • the initial position of the cursor is A1
  • the obtained cursor positions are A2, A3 and A4 in sequence.
  • the cursor moves along a straight line
  • A5 is the predicted target cursor position in this frame.
  • the controller may obtain the historical cursor position offset according to the cursor positions corresponding to the two frames of user behavior images preceding the target user behavior image; this offset is used to represent the last movement of the cursor.
  • the controller can obtain the moving speed of the cursor according to the historical cursor position offset and the first time.
  • the first time refers to the time interval between the preset dynamic gesture recognition model processing the two frames of user behavior images preceding the target user behavior image.
  • the time consumed by the dynamic gesture recognition model to process one frame of image is fixed; therefore, the first time can also be considered as the interval between the moments at which the dynamic gesture recognition model outputs the target gesture information corresponding to those two preceding frames of user behavior images.
  • the first time is a fixed value that does not need to be acquired every time.
• Alternatively, if the dynamic gesture recognition model processes images in real time, it is necessary to obtain in real time the time difference between the recognition results output by the model for the two preceding frames of images.
  • the controller can acquire the target cursor position offset of the cursor according to the moving speed of the cursor, the second time and the preset first prediction threshold.
• The second time is the time interval between the preset dynamic gesture recognition model processing the target user behavior image and the previous frame of user behavior image, that is, the interval from the moment the model outputs the recognition result of the previous frame of image to the moment it outputs the recognition result of the current frame of image.
  • the controller can predict the movement of the cursor this time.
  • the controller may sum the coordinate position corresponding to the user behavior image in the previous frame and the offset of the target cursor position, and obtain the target cursor position by performing this offset movement at the position of the cursor in the previous frame.
  • the prediction method can be expressed by formulas 3 and 4:
  • F 0 represents the position of the target cursor
  • v represents the speed of the current movement of the cursor
  • ⁇ t 0 represents the second time
  • S f represents a preset first prediction threshold
  • F 0-1 indicates the coordinate position corresponding to the user behavior image of the previous frame
  • F 0-2 represents the coordinate position corresponding to the user behavior image of the second previous frame; ⁇ t represents the first time.
  • the first prediction threshold can be preset according to the following method:
  • E 2 represents the first prediction threshold, which may be 0.6.
  • a1 represents the first prediction parameter;
  • a2 represents the second prediction parameter. Both the first prediction parameter and the second prediction parameter are numbers between 0 and 1, and can be set by relevant technical personnel.
  • D f represents the processing rate of the user behavior image by the preset dynamic gesture recognition model within a preset time.
  • C f represents the rate at which the image collector 231 collects user behavior images within a preset time.
  • P f represents the frame rate of cursor movement within the preset time.
  • the frame rate of the cursor movement refers to the frequency of the number of times the cursor moves, which can also be considered as how many times the cursor moves per unit time, and the cursor moves from one cursor position to the next cursor position as one movement.
  • the preset time may be 1s. Therefore, the rate at which the model processes images, the rate at which the image collector 231 captures images, and the frame rate at which the cursor moves can be obtained within one second before the target user behavior image is acquired. Furthermore, a first prediction threshold may be set.
  • the position coordinates of the cursor under linear motion can be predicted.
  • the display device may perform a second process on the target user behavior image, so as to predict the position of the target cursor.
  • FIG. 15 is a schematic diagram of a cursor moving along a curve provided by an embodiment of the present application.
  • the initial position of the cursor is A1
  • the obtained cursor positions are A2-A9 in sequence.
  • the image corresponding to the cursor position A4 has a frame loss phenomenon for the first time. Since it is the first frame loss, the current movement of the cursor (the movement between A1 and A4) is considered to be a straight line movement.
  • the positions A5 and A6 are coordinates mapped according to the user's target gesture.
  • the second frame loss phenomenon occurs in the image corresponding to the cursor position A7, so the current movement of the cursor (the movement between A5 and A7) is considered to be moving along the curve, and the cursor position A7 is obtained according to the prediction.
  • the positions A8 and A9 are coordinates mapped according to the user's target gesture.
• Frame loss also occurs in the target user behavior image, which is the third frame loss within the preset detection number of frames.
• Therefore, the current movement of the cursor (the movement between A8 and A10) is considered to be along the curve, and the cursor position A10 can be predicted.
  • the predicted cursor position of the target user behavior image may be A8.
• When the cursor performs curved motion, the method of predicting its position is similar to that of straight-line motion: both first obtain the last movement of the cursor, that is, the historical cursor position offset.
  • the target cursor position offset of the cursor is acquired according to the moving speed of the cursor, the second time and the preset second prediction threshold.
  • the controller can calculate the difference between the coordinate position corresponding to the user behavior image in the previous frame and the offset of the target cursor position, and obtain the target cursor position by performing this offset movement at the position of the cursor in the previous frame.
  • S b represents the second prediction threshold, which may be 0.3.
  • the second prediction threshold can be preset according to the following method:
  • b represents the third prediction parameter.
  • the third predictive parameter is a number between 0-1, which can be set by relevant technical personnel, and can be 0.5.
  • the position coordinates of the cursor under the curve movement can be predicted.
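• A minimal sketch of the prediction step for both motion types, assuming the formulas take the shape implied by the surrounding description (speed = historical offset divided by the first time; target offset = speed × second time × prediction threshold; the offset is added for linear motion and subtracted for curved motion, per the text above). The published formulas are not reproduced in this text, so the formula shape and all names here are assumptions.

```python
# Minimal sketch of cursor position prediction when a frame is dropped.
# Positions are (x, y) tuples; times are in seconds. Threshold values follow the text (0.6 / 0.3).

FIRST_PREDICTION_THRESHOLD = 0.6   # S_f, used for linear (straight-line) motion
SECOND_PREDICTION_THRESHOLD = 0.3  # S_b, used for curved motion

def predict_cursor(prev_pos, prev_prev_pos, first_time, second_time, motion_type):
    """Predict the target cursor position from the two previous cursor positions."""
    # Historical cursor position offset of the last movement and the resulting speed (per axis).
    vx = (prev_pos[0] - prev_prev_pos[0]) / first_time
    vy = (prev_pos[1] - prev_prev_pos[1]) / first_time

    if motion_type == "straight":
        threshold = FIRST_PREDICTION_THRESHOLD
        sign = 1.0    # linear motion: previous position plus the target offset
    else:
        threshold = SECOND_PREDICTION_THRESHOLD
        sign = -1.0   # curved motion: previous position minus the target offset (per the text)

    dx = vx * second_time * threshold
    dy = vy * second_time * threshold
    return (prev_pos[0] + sign * dx, prev_pos[1] + sign * dy)

if __name__ == "__main__":
    # Cursor moved from (100, 100) to (120, 100) in 33 ms; predict the position 33 ms later.
    print(predict_cursor((120, 100), (100, 100), 0.033, 0.033, "straight"))
```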
  • a preset threshold for continuous frame loss may be set, which may be 4. Within this threshold, if the user behavior image continues to lose frames, the display device can continue to predict the position of the cursor.
• Before performing gesture recognition on the target user behavior image of this frame, it is possible to detect whether the target gesture is not detected in any of a preset threshold number of user behavior images before the current frame image, which may be 4 frames of user behavior images; that is, whether the 4 frames before the target user behavior image are all dropped frames.
  • the user no longer uses gestures to indicate the position of the cursor.
  • the user may have put down his hand and determined the control that the cursor should select.
  • the cursor can be controlled not to move, and it is considered that the current round of user gesture movement is over. Until the camera captures the user's gesture again, the next round of gesture recognition can be performed.
  • the controller may continue to perform gesture recognition on the target user behavior image, and determine the cursor position corresponding to the current frame image.
  • the situation of predicting the position of the cursor will only occur after the cursor has started to move, that is, the first position of the cursor will not be predicted, but will only be obtained according to the user's instruction.
• When the display device enters the cursor control mode, it can be set so that the cursor is only allowed to start moving when the user's target gesture is detected for the first time, so as to avoid frame loss in the first frame of image.
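• The end-of-round check described above can be sketched as follows, assuming the same per-frame detection flags as in the earlier sketch (the threshold of 4 consecutive dropped frames follows the description; names are illustrative).

```python
# Minimal sketch: stop moving the cursor once the gesture has been absent for N consecutive frames.

CONTINUOUS_LOSS_THRESHOLD = 4  # preset threshold for continuous frame loss

def gesture_round_finished(gesture_detected_history):
    """True when the last CONTINUOUS_LOSS_THRESHOLD frames all failed to detect the target gesture."""
    tail = gesture_detected_history[-CONTINUOUS_LOSS_THRESHOLD:]
    return len(tail) == CONTINUOUS_LOSS_THRESHOLD and not any(tail)

if __name__ == "__main__":
    print(gesture_round_finished([True, False, False, False, False]))  # True: end the current round
    print(gesture_round_finished([True, True, False, False, False]))   # False: keep predicting
```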
  • the user's gesture movement track may be determined according to the cursor position. Considering that the distance between the cursor positions of every two frames is relatively short, it can be considered that the cursor moves in a straight line between the two frames of cursor positions.
• The cursor can be made to reach the target cursor position along a straight line from the cursor position of the previous frame; that is, the target cursor position is connected with the cursor position of the previous frame to obtain the gesture movement track.
  • the controller can then make the cursor move along the trajectory of the gesture.
  • the user may no longer control the cursor to move.
• At this time, the cursor may be located within the range of a certain control, or at the edge of a certain control.
• If the cursor is located within the range of a control, the display device can allow the user to confirm whether to trigger that control.
• If the cursor is only at the edge of a control, the display device cannot directly allow the user to confirm triggering a control.
• In this case, an area of a preset size can be determined according to the position of the cursor.
  • the preset size may be 500*500.
  • an area with a size of 500*500 can be determined with the coordinates as the center.
  • the controller can determine all controls in the area, and obtain the distance from all controls to the cursor.
  • the distance from the control to the cursor is set as: the average distance from the midpoint of the four sides of the control to the cursor. As shown in FIG. 16, the position of the cursor is point O.
  • the midpoints of its four sides are B1, B2, B3, and B4 in sequence.
  • the distances from the four midpoints to the cursor are X1, X2, X3, and X4 in turn. Therefore, the distance from the control to the cursor is: (X1+X2+X3+X4)/4.
• For some controls, the distance from the midpoints of the four sides to the cursor may be shorter, thus affecting the judgment result. Therefore, the distance from each control to the cursor can also be determined according to the following method.
  • the cursor and the control have two positional relationships. One is that the cursor and the control are located in the same horizontal direction or the same vertical direction, and the other is that the cursor and the control are neither located in the same horizontal direction nor in the same vertical direction.
  • FIG. 17 is a schematic diagram of the positional relationship between the cursor and the control provided by the embodiment of the present application.
  • the cursor position is (a, b).
• For a control, set its size as width w and height h, and denote its coordinates as (x, y).
  • the coordinates of the four vertices are: (x-w, y-h), (x+w, y-h), (x+w, y+h), (x-w, y+h).
  • the vertical lines corresponding to the two vertical sides of the control are L1 and L2 respectively, and the horizontal lines corresponding to the two horizontal sides are L3 and L4 respectively.
  • the cursor is located in the area between the vertical lines, it is considered that the cursor and the control are located in the same vertical direction; if the cursor is located in the area between the horizontal lines, it is considered that the cursor and the control are located at the same level direction. If the cursor is not located within these two areas, the cursor and the control are considered to be neither in the same horizontal direction nor in the same vertical direction. As shown in Figure 17, the cursor O1 and the control A are located in the same vertical direction, the cursor O2 and the control A are located in the same horizontal direction, and the cursor O3 and the control A are neither located in the same horizontal direction nor in the same vertical direction.
  • the relationship between the cursor position and the control position can be judged.
• For each of these two positional relationships, the distance between the cursor and the control can be calculated according to a corresponding method.
  • the controller can set the control with the shortest distance as the control selected by the cursor.
  • the display device may trigger the control selected by the cursor.
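• A minimal sketch of the midpoint-average distance and the closest-control selection described above (controls are modelled as axis-aligned rectangles; the rectangle representation, the way the search area is tested and all names are illustrative assumptions).

```python
# Minimal sketch: pick the control closest to the cursor inside a preset search area.
import math

SEARCH_SIZE = 500  # preset area size (500 x 500) centred on the cursor

def midpoint_avg_distance(cursor, rect):
    """Average distance from the cursor to the midpoints of the four sides of a control.

    rect = (left, top, right, bottom) in screen coordinates.
    """
    left, top, right, bottom = rect
    cx, cy = (left + right) / 2, (top + bottom) / 2
    midpoints = [(cx, top), (right, cy), (cx, bottom), (left, cy)]
    return sum(math.dist(cursor, m) for m in midpoints) / 4

def in_search_area(cursor, rect):
    """Keep only controls whose centre lies inside the 500x500 area around the cursor."""
    left, top, right, bottom = rect
    cx, cy = (left + right) / 2, (top + bottom) / 2
    return abs(cx - cursor[0]) <= SEARCH_SIZE / 2 and abs(cy - cursor[1]) <= SEARCH_SIZE / 2

def select_control(cursor, controls):
    """Return the name of the closest control, or None if no control lies in the area."""
    candidates = {name: rect for name, rect in controls.items() if in_search_area(cursor, rect)}
    if not candidates:
        return None
    return min(candidates, key=lambda name: midpoint_avg_distance(cursor, candidates[name]))

if __name__ == "__main__":
    controls = {"play": (100, 100, 200, 160), "stop": (400, 100, 500, 160)}
    print(select_control((180, 140), controls))  # "play"
```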
  • gesture interaction is to control the display device to execute corresponding control instructions by detecting specific gesture actions made by the user.
  • the user can control the display device to perform rewind or fast-forward playback operations by waving his hand left or right instead of the left and right arrow keys on a control device such as a remote control.
  • the gesture interaction mode supported by the display device is based on static gestures, that is, when the user makes a specific gesture, the shape of the hand remains unchanged. For example, when performing an action of waving to the left or right, the user needs to keep five fingers together and move the palm in parallel to perform the swaying action.
  • the display device can first detect static gestures according to the gesture type recognition algorithm, and then perform corresponding control actions according to the gesture type.
  • this static gesture-based interaction method supports a small number of gestures and is only applicable to simple interaction scenarios.
  • some display devices also support dynamic gesture interaction, that is, to achieve specific gesture interaction through continuous actions within a period of time.
• However, due to the limitation of the model used in the dynamic gesture detection process, the above dynamic gesture interaction process does not support user-defined gestures, which cannot meet the needs of users.
• Dynamic gesture recognition can adopt training methods such as deep learning to train a dynamic gesture recognition model; multiple consecutive frames of gesture image data are then input into the trained dynamic gesture recognition model, and the classification algorithm inside the model calculates the target gesture information corresponding to the current multi-frame gesture images.
  • the target gesture information can generally be associated with a specific control instruction, and the display device 200 can realize dynamic gesture interaction by executing the control instruction.
  • training data may be generated based on gesture image data, and each frame of user behavior image in the training data is provided with a classification label, which indicates the gesture type corresponding to the current frame of user behavior image.
  • multiple consecutive frames of user behavior images are uniformly set with dynamic gesture tags, which represent dynamic gestures corresponding to multiple frames of user behavior images.
  • the training data including multiple consecutive frames of gesture images can be input into the initial dynamic gesture recognition model to obtain the classification probability output by the recognition model.
  • the classification probability output by the model and the classification label in the training data are subjected to a loss function operation to calculate the classification loss.
• The model parameters in the recognition model are then adjusted by backpropagation according to the calculated classification loss.
  • the display device 200 can input multiple consecutive frames of user behavior images detected in real time into the recognition model, thereby obtaining the classification results output by the recognition model, determining the dynamic gestures corresponding to the multiple consecutive frames of user behavior images, and then Match the control commands corresponding to dynamic gestures to realize dynamic gesture interaction.
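• The training step described above (classification output vs. classification label, loss calculation, backpropagation) can be sketched with PyTorch; this is a minimal sketch in which a toy classifier, random data and all tensor shapes stand in for the actual dynamic gesture recognition model of this embodiment.

```python
# Minimal sketch of the training step: classification output vs. labels -> loss -> backpropagation.
import torch
import torch.nn as nn

NUM_FRAMES, H, W, NUM_GESTURES = 16, 64, 64, 5  # toy sizes for illustration only

# Toy stand-in for the dynamic gesture recognition model: consumes a clip of consecutive frames.
model = nn.Sequential(
    nn.Flatten(),                                  # (batch, frames*H*W)
    nn.Linear(NUM_FRAMES * H * W, NUM_GESTURES),   # classification logits per dynamic gesture
)
criterion = nn.CrossEntropyLoss()                  # classification loss against the labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random placeholder training data: 8 clips of NUM_FRAMES grayscale frames with dynamic gesture labels.
clips = torch.randn(8, NUM_FRAMES, H, W)
labels = torch.randint(0, NUM_GESTURES, (8,))

for epoch in range(3):
    optimizer.zero_grad()
    logits = model(clips)              # classification output of the recognition model
    loss = criterion(logits, labels)   # classification loss between output and dynamic gesture labels
    loss.backward()                    # backpropagate the classification loss
    optimizer.step()                   # adjust the model parameters
    print(f"epoch {epoch}: loss={loss.item():.4f}")
```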
• So that dynamic gesture interaction can also support user-defined operations, a display device control method is provided, and the method can be applied to the display device 200.
  • the display device 200 should at least include a display 260 and a controller 250 .
  • at least one image collector 231 is built in or connected externally.
  • the display 260 is used to display a user interface to assist the user's interactive operation;
  • the image collector 231 is used to collect user behavior images input by the user.
  • Fig. 18 is a schematic diagram of a dynamic gesture interaction process provided by the embodiment of the present application.
  • the controller 250 is configured to execute the application program corresponding to the display device control method, including the following content:
  • the gesture information stream is video data generated by the image collector 231 through continuous image capture, so the gesture information stream includes continuous multiple frames of user behavior images.
• After the display device 200 starts gesture interaction, it can send a start instruction to the image collector 231 and start the image collector 231 to capture images.
• After image capture starts, the user can make a dynamic gesture within the shooting range of the image collector 231, and the image collector 231 can continuously capture multiple frames of user behavior images following the user's dynamic gesture and send them to the controller 250 in real time to form a gesture information stream.
• The frame rate of the user behavior images contained in the gesture information stream can be the same as the image capture frame rate of the image collector 231.
  • the controller 250 may also acquire gesture information streams at a frame rate of 30 frames per second.
  • the display device 200 can also obtain a gesture information stream with a lower frame rate.
  • the display device 200 may extract multiple frames of user behavior images at equal intervals from the images captured by the image collector 231 .
  • the display device 200 may extract a frame of the user behavior image every other frame from the gesture images captured by the image collector 231 , so as to obtain a gesture information stream with a frame rate of 15.
  • the display device 200 can also send a control instruction for frame rate adjustment to the image collector 231 to control the image collector 231 to only capture 15 frames of gesture image data per second, thereby forming a gesture information stream with a frame rate of 15.
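• The equal-interval extraction described above (taking every other captured frame so that a 30-frame-per-second capture yields a 15-frame-per-second gesture information stream) can be sketched as follows; a generator stands in for the image collector, and all names are illustrative.

```python
# Minimal sketch: build a lower-frame-rate gesture information stream by equal-interval extraction.

def downsample_stream(captured_frames, keep_every=2):
    """Yield every `keep_every`-th frame, e.g. 30 FPS capture -> 15 FPS gesture information stream."""
    for index, frame in enumerate(captured_frames):
        if index % keep_every == 0:
            yield frame

if __name__ == "__main__":
    captured = [f"frame_{i}" for i in range(30)]           # one second of capture at 30 FPS
    stream = list(downsample_stream(captured, keep_every=2))
    print(len(stream))                                      # 15 frames -> frame rate of 15
```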
  • the dynamic gesture input process will be affected by different user action input speeds, that is, some users' gesture input actions are faster, and some users' gesture input actions are slower.
  • the gesture difference between adjacent frames is small, and the gesture information flow at a low frame rate can also characterize the complete gesture input process.
• Therefore, the display device 200 should maintain as high a frame rate as possible when acquiring user behavior images.
• For example, the user behavior images may be user gesture interaction images, and the frame rate of the gesture information stream can be maintained in the range of 15-30 FPS.
  • the display device 200 can also dynamically adjust the frame rate of the gesture information stream in a specific interval according to the current operating load, so as to improve gesture performance by obtaining a high frame rate gesture information stream when the computing power is sufficient. recognition accuracy; and when the computing power is insufficient, excessive consumption of the computing power of the controller 250 can be reduced by acquiring low frame rate gesture information streams.
  • the display device 200 may perform gesture recognition processing on each frame of the user behavior image in the gesture information stream, so as to extract key gesture information from the gesture information stream.
  • the gesture recognition processing may be based on an image recognition algorithm to identify the positions of key points such as fingers, joints, and wrists in user behavior images. That is, the key point coordinates are used to characterize the imaging position of the hand joint in the user behavior image.
  • the display device 200 may identify the position coordinates of each key point in the current user behavior image in the user behavior image by means of feature shape matching. Then the coordinates of each key point are composed into an information vector according to the set order. That is, as shown in FIG. 11 , the key points used to characterize gesture actions may include 21 finger key points, and the position information of each key point may be represented by the coordinates of the corresponding points.
• The same coordinate representation is adopted for the remaining key points; for example, the coordinates of the middle joint of the thumb are denoted P M1 .
• The coordinates of the above fingertips, middle joints and finger roots can be combined to form vectors representing the fingertip information, finger middle information and finger root information; that is, the fingertip information F T is:
• F T = [P T1 , P T2 , P T3 , P T4 , P T5 ]
• the middle information F M is:
• F M = [P M1 , P M2 , P M3 , P M4 , P M5 ]
• the finger root information F B is:
• F B = [P B1 , P B2 , P B3 , P B4 , P B5 ]
  • the display device 200 may also extract palm coordinates P Palm and wrist coordinates P Wrist from the user behavior image. These coordinate information are then combined to form the gesture key coordinate set H Info . That is, the gesture key coordinate set H Info is:
• H Info = [P Palm , P Wrist , F T , F M , F B ]
  • the above gesture key coordinate set is a coordinate set composed of multiple key point coordinates. Therefore, based on the relationship between key point positions in the gesture key coordinate set, the display device 200 can determine the key gesture type according to the gesture key coordinate set.
  • the display device 200 may first identify key point coordinates in the user behavior image, and then extract preset standard key point coordinates from the database.
  • the key point standard coordinates are template coordinate sets determined by the operator of the display device 200 through statistical analysis of crowd gestures, and each gesture may have corresponding key point standard coordinates.
  • the display device 200 can calculate the difference between the key point coordinates and the key point standard coordinates. If the calculated difference is less than or equal to the preset recognition threshold, it is determined that the user gesture in the current user behavior image is similar to the gesture type in the standard gesture template, so it can be determined that the gesture type corresponding to the standard coordinates of the key point is the target gesture type.
• For example, for a five-fingers-together gesture, the gesture key coordinate set H Info1 can be obtained, and then a standard gesture similar to the five-fingers-together gesture can be matched from the database to extract the key point standard coordinates H'.
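• A minimal sketch of the template match described above: assemble the gesture key coordinate set and compare it with stored key point standard coordinates against a recognition threshold (the coordinate layout, the distance measure and the threshold value are illustrative assumptions).

```python
# Minimal sketch: match recognised key point coordinates against standard gesture templates.
import math

RECOGNITION_THRESHOLD = 40.0  # illustrative threshold on the accumulated coordinate difference

def build_key_coordinate_set(palm, wrist, tips, middles, roots):
    """H_Info = [P_Palm, P_Wrist, F_T, F_M, F_B] flattened into one list of (x, y) points."""
    return [palm, wrist, *tips, *middles, *roots]

def coordinate_difference(key_points, standard_points):
    """Sum of point-wise distances between the recognised set and a standard coordinate set."""
    return sum(math.dist(p, q) for p, q in zip(key_points, standard_points))

def match_gesture(key_points, templates):
    """Return the gesture type whose standard coordinates are within the recognition threshold."""
    best_type, best_diff = None, float("inf")
    for gesture_type, standard_points in templates.items():
        diff = coordinate_difference(key_points, standard_points)
        if diff < best_diff:
            best_type, best_diff = gesture_type, diff
    return best_type if best_diff <= RECOGNITION_THRESHOLD else None
```

• In practice, the templates dictionary above would hold one key point standard coordinate set per supported gesture, as described in the preceding paragraphs.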
  • the key gesture information may also include a confidence parameter, which is used to characterize the difference between each gesture type and a standard gesture.
• The key gesture information can also include the following parameter items that represent the key gesture type; that is, the gesture posture information includes but is not limited to: hand facing information H F (Hand Face), hand orientation information H O (Hand Orientation), hand orientation declination information H OB , left and right hand information H S (Hand Side), gesture stretch state information H T (Hand Stretched), etc.
  • each parameter item can be obtained through the calculation of the gesture key coordinate set above.
• The hand orientation information can be used to indicate the orientation of the fingertips in the screen; that is, as shown in Figure 19, fingertips pointing up correspond to Up, down to Down, left to Left, right to Right, and forward (center) to Center, with the default being Unknown. Therefore, the hand orientation information can be expressed as one of these values.
  • the hand orientation declination information can also be determined according to the positional relationship between the coordinates of specific key points, which is equivalent to the confidence of the hand orientation information. For example, although the hand orientation is detected as Left, there will still be a declination angle, and it may not be completely oriented to the left. At this time, some follow-up processing needs to be performed according to the declination angle information, which can also prevent false triggering. That is, the hand orientation deflection angle can be expressed as:
  • the display device 200 can preferentially extract the hand orientation information, that is, generate hand orientation information based on the key point information of the left and right hands and the index finger.
• The display device 200 can use the index finger root information P B2 , the little finger root information P B5 , the wrist information P Wrist and the left and right hand information H S to generate the hand orientation declination information H OB , the hand horizontal/vertical information H XY and the hand posture declination information H XB , H YB , and finally the hand orientation information H O .
• The generation logic is as follows: calculate the deflection angle f(ΔX, ΔY) between the vector formed by the index finger root P B2 and the little finger root P B5 and the x-axis direction; the value range of the deflection angle is (0°, 90°).
  • the hand orientation information can be obtained, and then the threshold value of the deflection angle can be set to determine whether the orientation information is valid.
• The deflection angle threshold θ can be set to 5, that is, the orientation information is considered invalid within the range of 45±5 degrees; the hand horizontal/vertical information H XY is then generated, and the generation formula is as follows:
• ΔX is the horizontal coordinate difference between the root of the index finger and the root of the little finger.
• ΔY is the vertical coordinate difference between the root of the index finger and the root of the little finger.
• f(ΔX, ΔY) is the deviation angle.
• θ is the deviation angle threshold.
• H YB is the pitch angle of the hand.
• ΔX is the horizontal coordinate difference between the base of the index finger and the base of the little finger.
• ΔY is the vertical coordinate difference between the base of the index finger and the base of the little finger.
• H O is the hand orientation information, including Center and two other states.
• The corresponding threshold is the hand orientation pitch angle threshold.
• The display device 200 can model the user's hand and preset hand attribute information for different distances in order to obtain more accurate hand posture declination information. That is, the user can pre-input hand size information at different distances, and then the hand posture declination information H XB and H YB can be generated according to the current frame distance information, the index finger root information P B2 , the little finger root information P B5 , the wrist information P Wrist , and the left and right hand information H S .
  • Corresponding orientation information can be generated according to the middle point P M information, wrist information P Wrist , hand horizontal and vertical information H XY , and left and right hand information H s .
• For example, in the vertical case of the right hand, it is necessary to compare the Y-axis information of the wrist and the finger middle point P M : if the y-value of the middle point is smaller than the y-value of the wrist, the hand is proved to be vertical. Therefore:
• H O = l(P M , P Wrist , H XY , H S )
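• A minimal sketch of the orientation logic described above: the deflection angle between the index-root/little-root vector and the x-axis decides whether the hand is horizontal or vertical (with the 45±5 degree band treated as invalid), and the wrist/middle-point comparison then picks the concrete direction. The exact decision rules per hand and direction are assumptions here, illustrating the vertical comparison mentioned in the text plus an analogous horizontal one; image coordinates with y increasing downward are assumed.

```python
# Minimal sketch: derive hand horizontal/vertical information and a coarse orientation H_O.
import math

DEFLECTION_BAND = 5  # orientation considered invalid within 45 +/- 5 degrees

def deflection_angle(index_root, little_root):
    """Angle (0-90 degrees) between the index-root -> little-root vector and the x-axis."""
    dx = abs(little_root[0] - index_root[0])
    dy = abs(little_root[1] - index_root[1])
    return math.degrees(math.atan2(dy, dx))

def hand_xy(index_root, little_root):
    """'horizontal', 'vertical' or 'unknown' (inside the 45 +/- 5 degree band)."""
    angle = deflection_angle(index_root, little_root)
    if abs(angle - 45) <= DEFLECTION_BAND:
        return "unknown"
    # Finger roots lying roughly along the x-axis means the fingers point up or down (vertical hand).
    return "vertical" if angle < 45 else "horizontal"

def hand_orientation(middle_point, wrist, index_root, little_root):
    """Coarse orientation for the vertical case, with an analogous horizontal rule."""
    xy = hand_xy(index_root, little_root)
    if xy == "unknown":
        return "Unknown"
    if xy == "vertical":
        return "Up" if middle_point[1] < wrist[1] else "Down"
    return "Left" if middle_point[0] < wrist[0] else "Right"

if __name__ == "__main__":
    # Upright right hand: finger roots roughly level, middle joints above the wrist.
    print(hand_orientation(middle_point=(110, 80), wrist=(112, 160),
                           index_root=(90, 100), little_root=(130, 104)))  # Up
```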
• The hand facing information H F indicates whether the hand in the screen faces forward or backward, and may include a specific value indicating the facing, that is, facing forward is Front and facing backward is Back. The hand facing information H F defaults to Unknown.
  • the declination information of the hand-facing can also be determined, which is used to characterize the degree of the hand-facing, which is equivalent to the confidence of the hand-facing information. For example, although the user's hand-facing information is detected as Front, it still has a deflection angle, which may not be completely facing forward. At this time, some follow-up processing needs to be performed according to the deflection angle information to prevent false triggering of gestures.
• H Fb = a (0 ≤ a ≤ 90)
• The generation logic is as follows: taking the right hand oriented upward as an example, if the x-coordinate of the index finger root is smaller than the x-coordinate of the little finger root, the hand is proved to be facing Front. Further details are not repeated here, and the general formula is used instead:
• H F = g(P B2 , P B5 , H S , θ, H O )
• The left and right hand information can be used to indicate whether the hand image in the screen belongs to the user's left hand or right hand, where the left hand is Left and the right hand is Right, so the left and right hand information can be expressed as one of these two values.
• The gesture stretch state can be used to indicate the stretching state of each finger, that is, a finger in the stretched state can be represented as 1 and a finger in the contracted state can be represented as 0.
  • the stretching state of the finger includes not only stretching and shrinking states, so different values can also be set to represent the stretching state, for example, the values representing the stretching state can be set to 0, 1, and 2. Among them, fully contracted is 0, half-expanded is 1, and fully extended is 2, which can be flexibly changed according to specific application scenarios. Therefore, the gesture stretching state can be expressed as:
  • F 1 to F 5 respectively represent the stretching states of the five fingers.
• The curled state of each finger is extracted mainly based on information such as the hand facing, the hand orientation, the left and right hands, and the key points of the gesture.
• The finally extracted curl state attribute is 0 or 1 (this embodiment takes a state attribute of 0 or 1 as an example), where 0 is the curled state and 1 is the extended state.
• For example, when H O is Up and H S is Right, if the coordinate of the index fingertip is 50 and the coordinate of the index finger middle joint is 70, the fingertip is above the middle joint, which means the finger is stretched out, and its state is 1; if the index fingertip is 30 and the finger middle joint is 50, the finger is in a curled state.
• The comparison method for the thumb is different from that of the other four fingers: depending on the orientation, the thumb compares either the abscissa or the ordinate. When the hand orientation is Up or Down, the thumb compares the x coordinates while the other four fingers compare the y coordinates. In addition, the thumb compares the state of the finger root and the fingertip, while the other four fingers compare the state of the finger middle joint and the fingertip.
  • the comparison points can also be adjusted according to the specific scene, and finally the curled state information of the five fingers can be obtained.
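• A minimal sketch of the curl-state extraction for a right hand oriented Up, following the comparison rules above (thumb: root vs. tip on the x-axis; other fingers: middle joint vs. tip on the y-axis). The concrete sign conventions depend on handedness and the image coordinate system and are assumptions here; image coordinates with y increasing downward are assumed.

```python
# Minimal sketch: per-finger curl state (1 = extended, 0 = curled) for a right hand oriented Up.

def finger_states_right_hand_up(tips, middles, roots):
    """tips/middles/roots: lists of five (x, y) points ordered thumb, index, middle, ring, little."""
    states = []
    # Thumb: compare the x coordinates of the finger root and the fingertip.
    thumb_extended = tips[0][0] < roots[0][0]          # assumed convention for this hand pose
    states.append(1 if thumb_extended else 0)
    # Other four fingers: compare the y coordinates of the middle joint and the fingertip.
    for tip, middle in zip(tips[1:], middles[1:]):
        states.append(1 if tip[1] < middle[1] else 0)  # fingertip above the middle joint -> extended
    return states

if __name__ == "__main__":
    tips    = [(60, 120), (100, 50), (110, 45), (120, 48), (130, 55)]
    middles = [(75, 130), (100, 70), (110, 68), (120, 70), (130, 75)]
    roots   = [(85, 140), (100, 95), (110, 95), (120, 95), (130, 95)]
    print(finger_states_right_hand_up(tips, middles, roots))  # [1, 1, 1, 1, 1]
```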
• In this way, the key gesture information of the current frame can be obtained, including the hand facing information H F , the hand orientation information H O , the hand orientation declination information H OB , the left and right hand information H S , and the gesture stretch state information H T .
  • the hand orientation angle information can be used to judge the accuracy of the gesture orientation.
  • a threshold can be set to filter some fuzzy gestures and gestures to improve the accuracy of gesture recognition. Taking the right hand, the back of the hand facing the camera, and the gesture facing downward (the deflection angle is 86 degrees), compared to gesture 1 as an example, the final key gesture information G Info can be expressed as:
  • the key gesture information includes key gesture types in multiple stages.
• The display device 200 may traverse the target gesture types corresponding to multiple consecutive frames of user behavior images and determine the intersection of the key gesture types corresponding to the multiple frames of user behavior images; that is, the dynamic gesture is divided into multiple stages according to the multiple consecutive frames of user behavior images, where the user behavior images in each stage belong to the same target gesture type.
• For example, the display device 200 may determine the key gesture types type1 to typen in each frame of user behavior images by analyzing the key gesture coordinate sets in multiple frames of user behavior images photo1 to photon. The key gesture types type1 to typen of the multiple frames are then compared, so that multiple frames of user behavior images with the same key gesture type, such as photo1 to photo30 and photo31 to photon, are determined as two stages respectively, so as to determine the key gesture types of these two stages.
  • the confidence parameters include the key gesture deflection angle
  • the display device 200 can calculate the gesture deflection angle according to the key point coordinates and the key point standard coordinates; and then traverse each stage Gesture deflection angles corresponding to multiple consecutive frames of user behavior images in order to obtain the union of deflection angles in each stage; extract the extreme value of the union of deflection angles in each stage as the key gesture information in the current stage The key gesture declination of .
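• A minimal sketch of the stage segmentation described above: consecutive frames with the same key gesture type are grouped into one stage, and an extreme value of the per-frame deflection angles within each stage is kept as that stage's key gesture declination (taking the maximum as the extreme value is an assumption; names are illustrative).

```python
# Minimal sketch: split per-frame key gesture types into stages and keep a per-stage declination.
from itertools import groupby

def segment_stages(frame_types, frame_deflections):
    """frame_types: key gesture type per frame; frame_deflections: deflection angle per frame."""
    stages = []
    index = 0
    for gesture_type, group in groupby(frame_types):
        count = len(list(group))
        deflections = frame_deflections[index:index + count]
        stages.append({
            "type": gesture_type,
            "frames": count,                 # number of frames the key gesture was maintained
            "deflection": max(deflections),  # extreme value over the union of deflection angles
        })
        index += count
    return stages

if __name__ == "__main__":
    types = ["spread"] * 3 + ["curl"] * 4 + ["spread"] * 3
    angles = [10, 12, 11, 30, 35, 33, 31, 9, 8, 12]
    for stage in segment_stages(types, angles):
        print(stage)
```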
  • the display device 200 may call the detection model to perform dynamic gesture matching.
• The detection model is a matching model that includes a plurality of nodes stored in a tree structure, with a gesture template set in each node. The nodes can be at different levels; except for the root node and the leaf nodes, each node has a superior node and designated subordinate nodes.
  • multiple gesture gesture templates may be pre-stored, and each gesture gesture template is used to represent a static gesture action.
  • the display device 200 also builds a gesture detection model according to the stored gesture templates. In the detection model, node attributes and subordinate nodes corresponding to each gesture template can be assigned. Therefore, in the display device 200 , the gesture template can still maintain the original storage quantity, and the detection model can be constituted only by assigning attributes to the nodes.
  • each gesture template can be assigned multiple node attributes.
  • a "grab-release" dynamic gesture includes three stages, namely five-finger spread gesture, five-finger curl gesture, and five-finger spread gesture.
  • the corresponding nodes and gesture templates in the detection model are: root node - "five-finger spread gesture”; first-level node - "five-finger curl gesture”; second-level node - "five-finger spread gesture”.
  • the root node is used for initial matching, and can include multiple gesture templates, which can be used to match the initial gesture input by the user.
  • the root node may insert gesture gesture templates that characterize triggering gesture interactions.
• The leaf nodes in the detection model usually do not store specific gesture templates, but instead store control instructions expressing specific response actions. Therefore, in the embodiments of this application, unless otherwise specified, the nodes of the detection model do not include leaf nodes.
• The display device 200 can use the detection model to match the key gesture information to obtain the target gesture information, where the target gesture information is a combination of nodes whose gesture templates are the same as the key gesture type at each stage and whose confidence parameters are within the confidence intervals. Therefore, the target gesture information can be represented by an action path.
  • the display device 200 may match the key gesture types at each stage in the key gesture information with the gesture templates at each level node in the detection model.
  • the display device 200 may first match gesture templates of the same type in corresponding layers based on key gesture types at each stage. And when a gesture template is matched, the node corresponding to the gesture template is recorded. At the same time, the display device 200 also judges whether the confidence parameter of the node is within a preset reasonable confidence interval. If the key gesture type in the current stage is the same as the gesture template, and the confidence parameter is within the confidence interval, start the next stage of matching.
  • the display device 200 may first match the first-stage "five-finger spread gesture” with the gesture template in the root node.
  • matching determines that the "five-finger spread gesture” is the same or similar to the five-finger spread gesture template in a root node, it can be judged whether the confidence parameter of the first stage is within the preset confidence interval, that is, whether the gesture orientation angle is within within the preset declination range. If the declination angle of the gesture is within the preset declination range, start the second phase of the key gesture "five-finger curling gesture" to perform the above-mentioned matching with the subordinate nodes of the root node.
  • the display device 200 After matching the key gestures of each stage with the nodes of the corresponding level, the display device 200 can obtain an action path composed of multiple matching hit nodes, and the action path will eventually point to a leaf node, which corresponds to a target gesture information. Therefore, the display device 200 can obtain the target gesture information after the matching is completed, and execute the control instruction associated with the target gesture information.
• For example, the dynamic gesture of "grab-release" can be used to delete the currently selected file. Therefore, after obtaining the action path "root node - five fingers spread; first-level node - five fingers curled; second-level node - five fingers spread", the display device 200 obtains a delete command, and the currently selected file is deleted by executing the delete command.
  • the display device 200 extracts the gesture information of each stage in the gesture information flow, and uses a detection model in the form of a tree structure node to match the gesture information, which can be layer by layer according to the gesture input stage. Determine the motion path to obtain target gesture information. Since the detection model adopts the node form of tree structure, it can avoid reading the dynamic gesture template every time and repeating the detection during the process of matching key gesture information. In addition, the tree-structured detection model also supports users to insert nodes at any time to realize gesture input. And by adjusting the confidence interval of each node, you can customize the hit rate of the node matching process, so that the detection model can use the gesture habits of different users to realize custom gesture operations.
  • the display device 200 may first extract the first-stage key gesture type from the multi-stage key gesture information when using the detection model to match the key gesture information. . Then match the first node according to the key gesture type in the first stage, wherein the first node is a node whose stored gesture template is the same as the key gesture type in the first stage. After matching and obtaining the first node, the display device 200 may extract the key gesture type of the second stage from the key gesture information, where the second stage is a follow-up action stage of the first stage. Then match the second node according to the key gesture type in the second stage.
  • the second node is a node whose stored gesture template is the same as the key gesture type in the second stage, that is, the subordinate nodes specified by the first node include the second node. Finally record the first node and the second node to obtain the action branch.
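• A minimal sketch of the tree-structured matching described above: each node stores a gesture template and its designated subordinate nodes, leaf nodes carry the control instruction, and the per-stage key gesture types are matched level by level. The confidence check is reduced to a simple interval test on the deflection angle, and the example tree and all names are illustrative.

```python
# Minimal sketch: match multi-stage key gesture information against a tree of gesture-template nodes.

class Node:
    def __init__(self, template=None, interval=(0, 90), command=None, children=None):
        self.template = template      # gesture posture template stored in the node
        self.interval = interval      # confidence interval for the key gesture deflection angle
        self.command = command        # control instruction; set on leaf nodes only
        self.children = children or []

def match(root_level, stages):
    """stages: list of (key_gesture_type, deflection_angle). Returns the hit command or None."""
    level = root_level
    for gesture_type, deflection in stages:
        hit = next((n for n in level
                    if n.template == gesture_type
                    and n.interval[0] <= deflection <= n.interval[1]), None)
        if hit is None:
            return None               # no node matched: restart matching from the root
        level = hit.children          # continue with the designated subordinate nodes
    leaf = next((n for n in level if n.command is not None), None)
    return leaf.command if leaf else None

if __name__ == "__main__":
    # "grab-release": five-finger spread -> five-finger curl -> five-finger spread -> delete command.
    tree = [Node("spread", children=[
                Node("curl", children=[
                    Node("spread", children=[Node(command="delete")])])])]
    stages = [("spread", 10), ("curl", 20), ("spread", 12)]
    print(match(tree, stages))        # delete
```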
• For example, suppose the corresponding key gesture information is G info1 - G info4 , which can be combined to form five dynamic gestures AM 1 - AM 5 .
• The key gesture types of the first stage of AM 1 - AM 4 are the same, and the gesture types of the second stage of AM 3 - AM 4 are also the same; as shown in Figure 20, the corresponding tree-structure detection model can be obtained, and the corresponding dynamic gestures are expressed as follows:
  • the display device 200 may preferentially match the key gesture information of G info1 and G info2 according to the node storage levels of the detection model tree structure. If the matching key gesture information is G info1 , the detection will be continued according to the designated subordinate nodes corresponding to the root node of G info1 , that is, the matching key gesture templates are subordinate nodes of G info2 , G info3 and G info4 . Similarly, if during the matching process of the second-level nodes, the key gesture information is matched as G info4 , it will continue to detect the lower-level nodes, that is, the nodes corresponding to G info2 and G info3 in the third level. The node matching of subsequent levels is performed sequentially until a leaf node is detected.
  • the action AM 3 will be returned. If during the matching of a level node, other actions not stored in the current level node of the detection model are detected, it will return to the root node of the tree and re-detect G info1 and G info2 .
• The first stage, the second stage, the first node and the second node are only used to characterize the sequence relationship of different stages in the dynamic gesture and the upper-lower hierarchical relationship of different nodes in the detection model, and do not have corresponding numerical meanings.
  • the gesture posture at the same stage can be used as the first stage or the second stage.
• the same node can also be used as the first node or the second node.
• For example, in the initial round of matching, the initial stage is the first stage, and the stage following the initial stage is the second stage; the root node hit by matching is the first node, and the node hit at the level below the root node is the second node.
  • the display device 200 will continue to use the detection model to match key gesture information.
  • next stage of the start stage is the first stage, and the next stage of the first stage is the second stage; and the node that matches and hits in the node of the next level of the root node is the first node, and the next level of the first node is The node that matches the hit is the second node. Therefore, during the matching process using the detection model, the above process may be repeated until the final leaf node is matched.
• The detection model with a tree structure also supports the user's gesture entry process. That is, in some embodiments, when the display device 200 matches the second node according to the key gesture type of the second stage, it can traverse the gesture templates stored in the subordinate nodes of the first node; if the gesture templates stored in all subordinate nodes are different from the key gesture type of the second stage, that is, the dynamic gesture input by the user is a new gesture, the display device 200 can be triggered to enter the gesture, that is, the display 260 is controlled to display the entry interface.
  • the input interface can prompt the user to perform gesture input.
  • the input interface can prompt the user to repeatedly perform the dynamic gestures that need to be input through prompt messages. That is, the user performs multiple cyclic entry of the same behavior.
  • the user can also specify the control instructions associated with the recorded dynamic gestures through the input interface.
• Each time the user performs an entry, the display device 200 extracts the key gesture information according to the above example and matches it with the nodes of the detection model; if no node at the current level contains the corresponding gesture type, a new node is added at that level.
  • the display device 200 may ask the user whether to start the input through a prompt message or window before displaying the input interface, and receive an instruction input by the user based on the window. If the user has input the input gesture information, the input gesture information input by the user based on the input interface may be received, and a new node is set for the detection model in response to the input gesture information, and the new node is a subordinate node of the first node. Finally, the gesture type of the corresponding stage is stored in the new node as the gesture template of the new node.
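• The node-insertion step during gesture entry can be illustrated with a minimal dict-based sketch (the repeated-entry and confirmation flow is omitted; the data layout and all names are illustrative assumptions).

```python
# Minimal sketch: enter a new dynamic gesture by inserting a node under the last matched node.

def insert_gesture(parent_node, new_template, command=None):
    """Add a subordinate node storing the newly recorded gesture template; a leaf carries the command."""
    new_node = {"template": new_template, "children": [], "command": command}
    parent_node["children"].append(new_node)
    return new_node

if __name__ == "__main__":
    # Existing first node: five-finger spread; the user records a new second stage "fist".
    first_node = {"template": "spread", "children": [], "command": None}
    second = insert_gesture(first_node, "fist")
    insert_gesture(second, None, command="mute")   # leaf node holding the associated control instruction
    print(first_node)
```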
  • the display device 200 can perform dynamic gesture registration in real time based on the tree structure detection model, and detect whether there is a corresponding Action branch in the behavior tree structure by determining the Action to be recorded and recording the user behavior. If there is no corresponding Action branch, the gesture key posture is extracted, and then the corresponding behavior template is obtained, and the corresponding node is inserted into the behavior tree to complete the dynamic gesture entry. Obviously, in the process of dynamic gesture input, if the dynamic gesture input by the user has a corresponding Action branch in the detection model, the user behavior is detected according to the branch template. If the detection is successful, there is no need to change the node status of the detection model .
  • the display device 200 when the display device 200 uses the detection model to match the key gesture information, it may also judge the corresponding confidence level, where the confidence level may include the gesture deflection angle and the key gesture maintenance frame number. For the gesture deflection angle, the display device 200 may obtain a preset confidence interval of the corresponding node in the detection model after matching a node; and then compare the key gesture deflection angle at the current stage with the confidence interval of the corresponding node. If the key gesture deflection angle is within the confidence interval, record the corresponding current node and start matching the subordinate nodes of the current node; if the key gesture deflection angle is not within the confidence interval, it is determined that the gesture deviation is large, so further judgment or Adaptive adjustment.
  • the display device 200 may also adjust the detection model parameters according to user habits. Therefore, in some embodiments, if in the process of using the detection model to match the key gesture information, the key gesture type at a stage is the same as the gesture pose template in the node, but the key gesture deflection angle is not within the confidence interval, display The device 200 may also modify the confidence interval according to the gesture deflection angle.
• During matching, the display device 200 can match the hand facing, hand orientation and finger stretch information; if the matching is successful, it then checks whether the confidence threshold is also satisfied, and if so, the gesture matching is considered successful.
• In some embodiments, the display device 200 only needs to match the hand facing, hand orientation and finger stretch information; if the matching is successful, the template matching is successful. If all the gestures in the dynamic gesture are successfully matched, the dynamic gesture matching is considered successful, and finally the template confidence is optimized according to the best confidence.
  • the best confidence can be obtained by calculating some key frames when inputting user behavior images multiple times. For example, in the gesture detection process, there is an upward movement of five fingers in the dynamic gesture, and this movement occurs 10 times in a specific sequence, and the gesture is considered detected as long as it is detected three times during detection. Then there will be 8 consecutive gestures meeting the standard (10-3+1) in these 10 times, and it is necessary to select the one with the lowest average confidence, because at the beginning and end stages of the gesture, due to the position where the gesture is connected to other gestures There may be a large declination angle, resulting in an excessively large declination angle value. If this part of the declination angle is used as the confidence value, many false detections will occur.
  • the display device 200 may also obtain the maintenance frame number before matching the second node according to the key gesture type in the second stage; if the maintenance frame number of the key gesture type in the first stage is greater than or equal to the frame number threshold, that is, the user Maintaining a gesture action for a long time is not a case of misinput, so the second node can be matched according to the key gesture type in the second stage.
  • the current input may be different from the predetermined dynamic gesture, so the gesture entry can be started according to the above embodiment, that is, the display 260 is controlled to display the entry interface to update the confidence level interval.
• The core gesture posture features are the hand orientation and the finger stretch state. Therefore, the display device 200 can perform gesture key point recognition and key gesture information extraction on the action frames; if the hand facing, the hand orientation, the left and right hands, and the stretching state of the fingers are the same, the gestures are judged to be the same. Every time a similar gesture is detected, the declination information and the number of similar gestures are updated; the declination information takes the maximum range, and the number of similar gestures needs to be greater than the threshold.
  • the threshold will be determined according to the frame rate, or it can be set to a fixed value, such as 3. Process the action frames, select the gestures that meet the conditions, and when processing multiple action frames, take the intersection of actions, take the union of the parameters of each action and gesture, and finally obtain the corresponding key gesture template.
  • the display device 200 may also adopt a pseudo-jump method when performing dynamic gesture detection. That is, the display device 200 may acquire the confidence parameter of an intermediate stage, and the intermediate stage is a stage between the start stage and the end stage among the multiple stages of the key gesture information. Then compare the confidence parameter of the intermediate stage with the confidence interval of the corresponding node. If the confidence parameter of the intermediate stage is not within the confidence interval of the corresponding node, mark the node corresponding to the intermediate stage as a pre-jump node. Then perform matching on the subordinate nodes of the pre-jump node according to the detection model, so as to determine the target gesture information according to the matching result of the subordinate nodes of the pre-jump node.
  • the display device 200 can obtain the matching result of the subordinate nodes of the pre-jump node; if the matching result hits any subordinate node, record the pre-jump node and the hit subordinate node The node is used as the node of the target gesture information; if the matching result is that the lower-level node is not hit, the pre-jump node is discarded, and the matching is performed from the upper-level node again.
  • the display device 200 can set a false jump threshold, such as a specific confidence parameter value that is not in the confidence interval, and only perform a false jump when the confidence parameter is less than the false jump threshold change. Moreover, there will be a prompt every time a pseudo-jump is performed, and the user can delete this pseudo-jump through a specific key or a specific gesture. After a certain number of false jumps, the display device 200 optimizes the Action nodes involved in the false jumps, and increases the specified threshold to adapt to the user's action style.
• The display device 200 can update the false jump threshold in various ways. For example, every time a false jump is performed, a prompt will pop up and the Action node information will be updated by default; if the user considers this detection to be a false detection, the user only needs to delete this identification.
  • the display device 200 may also update the false jump threshold after multiple false jumps, so as to obtain better user experience.
  • a number threshold can also be set, that is, during the detection process, there are multiple false jumps, and after a certain number of times, the previous false jumps are considered invalid.
  • the display device 200 includes: a display 260 , an image acquisition interface and a controller 250 .
  • the display 260 is configured to display a user interface;
  • the image collection interface is configured to collect user behavior images input by the user; as shown in Figures 23 and 24,
  • the controller 250 is configured to execute the following program steps:
  • the gesture information flow including multiple consecutive frames of user behavior images
• use a detection model to match the key gesture information to obtain target gesture information, where the detection model includes a plurality of nodes stored in a tree structure; each node is provided with a gesture template and designated subordinate nodes;
  • the target gesture information is a node combination in which the key gesture type is the same as the gesture gesture template at each stage, and the confidence parameter is within the confidence interval;
  • FIG. 24 is a timing diagram of the dynamic gesture interaction provided by the embodiment of the present application. As shown in FIG. 24, the dynamic gesture interaction may include the following steps:
  • S2401 The image collector collects gestures made by the user.
• S2402 The image collector sends the collected gestures of the user to the image collection interface as a gesture information stream.
  • S2403 The image acquisition interface sends the received gesture information stream to the controller.
  • S2404 The controller detects key gesture types at each stage based on the acquired gesture information flow.
  • S2405 Use the detection model to match key gesture information to obtain target gesture information.
  • S2406 Execute the control instruction associated with the target gesture information, and make the display display corresponding content through response interaction.
  • the display device 200 can obtain a gesture information stream after the user inputs a dynamic gesture, and extract key gesture information from the gesture information stream. Then use the detection model to match the key gesture types in each stage of key gesture information to obtain node combinations with the same key gesture type and confidence parameters within the set confidence interval, as the determined target gesture information, and finally execute the target Control instructions associated with gesture information to realize dynamic gesture interaction.
  • the display device 200 detects dynamic gestures based on gesture key points, and then dynamically matches key gesture types based on a detection model stored in tree structure nodes, which can enrich dynamic gesture interaction forms and support user-defined dynamic gestures.
  • FIG. 25 is a schematic diagram of another usage scenario of a display device provided by an embodiment of the present application.
• The user can operate the display device 200 through the control device 100; alternatively, a video collection device 201 such as a camera installed on the display device 200 can collect video data including the user's body, respond to the user's gesture information, body information, etc. according to the images in the video data, and then execute the corresponding control command according to the user's action information.
• This enables the user to control the display device 200 without needing the control device 100 such as a remote control, so as to enrich the functions of the display device 200 and improve user experience.
  • the display device 200 can also perform data communication with the server through various communication methods.
  • the display device 200 may interact with an electronic program guide (EPG, Electronic Program Guide) by sending and receiving information, receive software program updates, or access a digital media library stored remotely.
  • The servers may be one group or multiple groups, and may be of one or more types. The server provides other network service content such as video-on-demand and advertising services.
  • the display device 200 may further add more functions or reduce the functions mentioned in the foregoing embodiments.
  • the present application does not specifically limit the specific implementation of the display device 200 , for example, the display device 200 may be any electronic device such as a television.
  • FIG. 26 is a schematic diagram of a hardware structure of another hardware system in a display device provided in an embodiment of the present application.
  • the display device 200 in FIG. 25 may specifically include: a panel 1, a backlight assembly 2, a main board 3, a power board 4, a rear case 5 and a base 6.
  • the panel 1 is used to present images to the user;
  • the backlight assembly 2 is located under the panel 1, and usually consists of optical components that supply a sufficiently bright and evenly distributed light source so that the panel 1 can display images normally.
  • A back board 20 is also included; the main board 3 and the power board 4 are arranged on the back board 20, and some convex hull structures are usually stamped and formed on the back board 20, to which the main board 3 and the power board 4 are fixed by screws or hooks.
  • the rear case 5 covers the panel 1 to hide components of the display device such as the backlight assembly 2, the main board 3 and the power board 4, achieving an aesthetic effect;
  • the base 6 is used to support the display device.
  • a keypad is also included in FIG. 26 , and the keypad may be disposed on the back panel of the display device, which is not limited in the present application.
  • the display device 200 may also include a sound reproduction device (not shown in the figure) such as an audio component, such as an I2S interface including a power amplifier (Amplifier, AMP) and a speaker (Speaker), for realizing sound reproduction.
  • the sound components can realize sound output of at least two channels; to achieve the effect of surround sound, multiple sound components need to be installed to output sound of multiple channels, which will not be described in detail here.
  • the display device 200 may adopt specific implementation forms such as an OLED display screen.
  • In that case, the components contained in the display device 200 shown in FIG. 26 change accordingly, and no further description is given here.
  • the present application does not limit the specific structure inside the display device 200 .
  • a display device can capture images of users through its video capture device, and the processor can analyze the gesture information of users in the images. After the recognition is performed, the command corresponding to the gesture information is executed.
  • the control commands that the display device can determine from gesture information are relatively limited, resulting in a low degree of intelligence of the display device and a poor user experience.
  • the execution subject of the method for controlling a display device may be a display device, and specifically may be a controller such as a CPU, MCU, or SOC in the display device, or a control unit, a processor, a processing unit, etc.
  • In the following, the controller is used as the execution subject by way of example. After the controller obtains video data through the video acquisition device of the display device, gesture recognition is performed according to multiple continuous frames of the video data, and then corresponding actions are executed according to the recognized gesture information.
  • FIG. 27 is a schematic diagram of an embodiment of a method for controlling a display device provided in an embodiment of the present application. When the controller obtains the image to be detected shown on the right side of FIG. 27 from the video data of the video acquisition device, the gesture A in the image to be detected is recognized: the gesture information included in the image to be detected can be recognized through a gesture recognition algorithm, including the "OK" gesture contained in the gesture information as well as the position and size of the gesture. Subsequently, the controller can determine, according to what is currently displayed on the display of the display device and the fact that the cursor is located on the control "OK", that the control command corresponding to the "OK" gesture information is "click the control to confirm", and finally the controller can execute the command.
  • FIG. 28 is a schematic diagram of another embodiment of a method for controlling a display device provided in an embodiment of the present application. After the controller performs gesture recognition on each frame of the video data from the video acquisition device, it can conclude, by comparing two successive images to be detected, that the user's gesture B in the image to be detected has moved from the left side of the image in the previous frame to the right side of the image in the next frame, indicating that gesture B has moved.
  • the controller can then determine, according to the content displayed on the current display being the movable cursor C, that the control command corresponding to the gesture information is "move the cursor to the right", and the distance moved can be related to the moving distance corresponding to the gesture information in the image to be detected.
  • embodiments of the present application will provide a correlation method for calculating the moving distance of the gesture in the image to be detected and the moving distance of the cursor on the display.
  • the present application does not limit the specific method for the controller to determine the gesture information in the image based on a frame of the image to be detected.
  • the gesture information in the image to be detected can be recognized by using a machine learning model based on image recognition.
  • the present application also provides a display device control method which, by defining the key point coordinates of the human hand in the image to be detected and then determining the gesture information of the hand, can be better applied to display device scenarios.
  • FIG. 29 is a schematic diagram of the key point coordinates of the hand provided in the embodiment of the present application.
  • the human hand is marked with 21 key points, numbered 1 to 21 in sequence according to the positions of the fingers, joints, and palm.
  • Fig. 30 is a schematic diagram of different telescopic states of the key points of the hand provided by the embodiment of the present application. When the controller recognizes the gesture information in the image to be detected, it first determines the orientation of the hand in the image to be detected by algorithms such as image recognition, and when the key points on the palm side are included in the image, it continues to identify all the key points and judge the position of each key point. For example, in the leftmost image in Figure 30, the key points 9-12 corresponding to the middle finger of the hand are relatively sparse and spread out, indicating that the middle finger is in a stretched state.
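  • The following Python sketch illustrates one way such a telescopic (stretched or curled) judgment could be made from keypoint spacing: a stretched finger has its joints lying almost on a straight line. The 0.9 threshold and the exact indexing of points 9-12 are assumptions for illustration only.

```python
# Illustrative check of whether a finger is in a stretched state, based on
# how its key points are distributed. Threshold and indexing are assumptions.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def finger_is_stretched(keypoints, joint_indices=(9, 10, 11, 12), thresh=0.9):
    """keypoints: dict mapping key point index -> (x, y) image coordinates."""
    pts = [keypoints[i] for i in joint_indices]
    # Length along the finger, joint by joint, versus the straight line from
    # base joint to tip: a stretched finger is nearly straight (ratio -> 1),
    # while a curled finger gives a much smaller ratio.
    chain = sum(dist(a, b) for a, b in zip(pts, pts[1:]))
    straight = dist(pts[0], pts[-1])
    return chain > 0 and straight / chain >= thresh

demo = {9: (0, 0), 10: (0, 10), 11: (0, 20), 12: (0, 30)}
print(finger_is_stretched(demo))  # True: the joints lie on one line
```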
  • the present application also provides a method for controlling a display device.
  • the controller can identify gesture information and body information in the image to be detected, and jointly determine and execute a control command based on the two kinds of information.
  • FIG. 31 is a schematic diagram of an application scenario of a display device control method provided by an embodiment of the present application.
  • the specific structure of the display device 200 is the same as that shown in FIGS. 25-26 .
  • the user of the display device 200 can express the control command through gestures and limbs, and then the display device 200 collects video data through its video acquisition device, and the controller in the display device 200 recognizes the image to be detected in the multi-frame image , while identifying the gesture information and body information of the user in the image to be detected.
  • Fig. 32 is a schematic diagram of using gesture information and body information to jointly determine a control command provided by the embodiment of the present application, where it is assumed that the gesture information F on the left in Fig. 32 is an "OK" gesture, and the body information G is that the elbow points to the upper left corner, Then, the control command that can be determined according to the gesture information F and the body information G is to click on the control displayed on the left side of the display; the gesture information H on the right side in Figure 32 is the "OK" gesture, and the body information I is the elbow pointing to the upper right corner, then The control command that can be determined according to the gesture information H and the body information I is to click on the control displayed on the right side of the display.
  • the controller can jointly determine different control commands according to the gesture information and body information in the image to be detected, which enriches the interaction methods available to the user and increases the number of control commands that can be sent to the display device, further improving the intelligence of the display device and the user experience.
  • the controller can perform gesture and limb information recognition on each frame of the image to be detected extracted from the video data.
  • the display device provided by this application is equipped with at least two detection models, denoted as the first detection model and the second detection model, wherein the second detection model is used to identify gesture information and body information in the image to be detected, the amount of calculation and data of the first detection model is smaller than that of the second detection model, and the first detection model can be used to identify whether gesture information is included in the image to be detected.
  • the method for controlling the display device provided by the embodiment of the present application will be specifically described below with reference to FIG. 33 .
  • FIG. 33 is a schematic flowchart of a control method for a display device provided in an embodiment of the present application.
  • the control method shown in FIG. 33 includes:
  • S3301 According to a preset time interval, extract a frame of an image to be detected from the continuous multi-frame images of video data collected by the video capture device of the display device.
  • this application can be applied in the scene shown in Figure 31, and it is executed by the controller in the display device.
  • When the display device is in the working state, its video acquisition device collects video data of the scene it faces. After the controller, as the execution subject, acquires the video data, a frame of an image to be detected is extracted from the video data according to a preset time interval. For example, when the frame rate of the video data collected by the video acquisition device is 60 frames per second, the controller can perform sampling at a frame rate of 30 frames per second, so as to extract one image to be detected every other frame for subsequent processing. In this case, the preset time interval is 1/30 second.
  • S3302 Use the first detection model to determine whether the image to be detected includes gesture information of a human body.
  • When the user needs to control the display device, the image captured by the video capture device includes target gesture information and body information; when the user does not need to control the display device, the video image captured by the video capture device within its capture range does not include target gesture information and body information.
  • Therefore, in S3302 the controller uses the first detection model, which requires a small amount of calculation, to process the image to be detected, and judges through the first detection model whether the image to be detected includes gesture information.
  • the controller uses the gesture category detection model as the first detection model to implement the global perception algorithm, and thereby achieve the purpose of judging whether the image to be detected includes gesture information.
  • the global perception algorithm refers to an algorithm that the controller can enable by default after booting and keep running. It has the characteristics of a small calculation amount and simple detection types, and is only used to obtain specific information and to trigger non-global functions such as enabling the second detection model to perform detection.
  • the first detection model is obtained by training on a plurality of training images, each of which includes different gesture information to be trained; the controller then uses the first detection model to compare the learned gesture information with the image to be detected to determine whether the image to be detected includes gesture information. The first detection model may not be used to specifically identify the gesture information, while the second detection model may determine the gesture information through specific recognition algorithms such as joint point recognition.
  • S3303 If it is determined in S3302 that the image to be detected includes gesture information of the human body, it is determined that the user wants to control the display device; the controller then continues to acquire images to be detected, and uses the second detection model to identify the target gesture information and body information in the images to be detected.
  • the controller may continue to extract the image to be detected from multiple frames of images collected by the video acquisition device at preset time intervals, and use the second detection model Instead of the first detection model, the subsequently extracted images to be detected are processed, so as to identify the target gesture information and body information of each frame of the image to be detected.
  • the controller can also reduce the preset time interval, and extract images to be detected with fewer time intervals.
  • the controller may also use the second detection model to process the image to be detected that is determined to include gesture information of the human body in S3302, and then continue to use the second detection model to process the subsequent images to be detected, that is, the user Behavioral image processing.
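  • As a minimal illustration of the two-stage flow described in S3301-S3303 (a lightweight first detection model gating a heavier second detection model), the following Python sketch assumes hypothetical model objects `first_model` and `second_model` with the methods shown; it is not an implementation from the embodiment.

```python
# A minimal sketch of the two-stage detection flow, assuming hypothetical
# model interfaces: first_model.has_gesture(frame) -> bool and
# second_model.recognize(frame) -> (gesture_info, body_info).

def sample_frames(video_frames, step):
    """Yield one image to be detected every `step` frames (preset interval)."""
    for i, frame in enumerate(video_frames):
        if i % step == 0:
            yield frame

def detect_gesture_and_body(video_frames, first_model, second_model,
                            step=2, preset_number=3):
    results = []
    gate_open = False
    for frame in sample_frames(video_frames, step):
        if not gate_open:
            # Cheap global perception: only checks whether a gesture exists.
            gate_open = first_model.has_gesture(frame)
            continue
        # Heavier model: identifies target gesture and body information.
        results.append(second_model.recognize(frame))
        if len(results) == preset_number:
            return results  # hand off to command mapping / consistency check
    return results
```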
  • S3304 Determine a corresponding control command according to the target gesture information and body information in the preset number of frames of user behavior images determined in S3303, and execute the control command.
  • the controller can continuously collect multiple frames of images for processing. For example, when it is judged in S3302 that the image to be detected includes gesture information of the human body, then in S3303, after a preset number (for example, 3) of user behavior images are collected according to the preset time interval, the target gesture information and body information are identified for each of these 3 user behavior images respectively. Only when the target gesture information and body information in these 3 user behavior images are the same is it determined to perform subsequent calculations based on that target gesture information and body information, which can prevent inaccurate recognition caused by occasional errors due to other factors.
  • FIG. 34 is a schematic diagram of an embodiment of the mapping relationship provided by the embodiment of the present application, wherein the mapping relationship includes multiple control commands (control command 1, control command 2...), and each control command and the corresponding target gesture Correspondence between information and body information, for example: control command 1 corresponds to gesture information 1 and body information 1, control command 2 corresponds to gesture information 2 and body information 2 . . . Its specific implementation can refer to FIG. 32 , and different combinations of target gesture information and body information can correspond to different control commands.
  • the above-mentioned mapping relationship can be preset or specified by the user of the display device, and can be stored in the controller in advance, so that the controller can, according to the determined target gesture information and body information, The corresponding control command can be determined from the mapping relationship and continue to be executed.
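  • The mapping relationship of FIG. 34 can be pictured as a simple lookup table keyed by the pair of target gesture information and body information. The entries in the sketch below are placeholders, not the mapping defined by the embodiment.

```python
# Illustrative mapping table in the spirit of FIG. 34: each control command
# corresponds to a pair of target gesture information and body information.
COMMAND_MAP = {
    ("ok", "elbow_upper_left"):  "click_left_control",
    ("ok", "elbow_upper_right"): "click_right_control",
    ("open_palm", "elbow_down"): None,   # "do not execute any command"
}

def resolve_command(gesture_info, body_info):
    # Returns the command, or None when the combination maps to "no action"
    # or is not present in the mapping relationship at all.
    return COMMAND_MAP.get((gesture_info, body_info))

print(resolve_command("ok", "elbow_upper_left"))  # click_left_control
```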
  • FIG. 35 is another schematic diagram of the mapping relationship provided by the embodiment of the present application, in which each item of target gesture information and each item of body information corresponds to a control command on its own.
  • In this way, the display device can use one kind of information to verify the control command determined by the other, thereby improving the accuracy of the obtained control command.
  • If the control commands determined by the two kinds of information are different, which indicates that the identification is wrong, the control command may not be executed, or processing measures such as re-identification may be taken, to prevent a wrong control command from being executed.
  • the mapping relationship provided in this application may also include a control command corresponding to "do not execute any command".
  • For example, FIG. 36 is a schematic diagram of target gesture information and body information in an image, in which the user in the image has his back to the display device while his hand happens to face the display device.
  • In this case, the controller can determine, according to the mapping relationship, that the current target gesture information and body information correspond to not executing any command.
  • the mapping relationship at this time may include, for example, gesture information indicating that the palm is spread out, body information indicating that the elbow points diagonally downward, and the like.
  • In this way, the controller can jointly determine different control commands according to the target gesture information and body information in the user behavior image, which enriches the interaction methods available to the user, increases the number of control commands that can be sent to the display device, and further improves the intelligence of the display device and the user experience.
  • In addition, the first detection model, which requires a small amount of calculation, is used to identify whether the image to be detected contains gesture information, and only after the first detection model determines that gesture information is included is the second detection model, which requires a large amount of calculation, used to recognize the target gesture information and body information, thereby reducing the amount of calculation and the power consumption caused by invalid recognition and improving the calculation efficiency of the controller.
  • In some embodiments, when the control command is a movement command for controlling a target control, such as the mouse on the display, to move to the position corresponding to the gesture information, after the movement command is executed in S3304 the process of S3303-S3304 is repeatedly executed, so as to continuously move the target control on the display by detecting the user's continuous movement actions.
  • When the movement interaction ends (for example, when the user's gesture is no longer detected or a different control command is executed), the process can be ended: the controller stops using the second detection model to identify the target gesture information and limb information, and returns to S3301 to continue extracting images to be detected, again using the first detection model to recognize gesture information, so as to re-execute the entire process shown in FIG. 33.
  • the controller when the control command is a movement command for controlling a target control such as a mouse on the display to move to a position corresponding to the gesture information, and the controller repeatedly executes S3303-S3304, it can be understood that this At this time, the user's gesture should be in a state of continuous movement.
  • the controller may fail to detect the target gesture information and body information in multiple frames of user behavior images during a certain detection process. In this case, the controller does not immediately stop the execution of the process, but can predict the target gesture information and body information that may appear at present according to the previous or multiple detection results, and based on the predicted target gesture information and body information Execute subsequent move commands.
  • FIG. 37 is a schematic diagram of the moving position of the target control provided by the embodiment of the present application.
  • the controller executes S3303 for the first time to detect the target gesture information K and body information L in the user behavior image, in S3304, executes moving the target control to A move command to move the control to position 1 on the display.
  • the controller executes S3303 for the second time and detects the target gesture information K and body information L in the user behavior image, in S3304, executes the movement command to move the target control to the position 2 on the display.
  • Suppose the controller fails to recognize the target gesture information and body information in the user behavior image when it executes S3303 for the third time, and therefore fails to move the target control on the display. If, after the controller then executes S3303 for the fourth time and detects the target gesture information K and body information L in the user behavior image, it executes in S3304 the movement command to move the target control to position 4 on the display, the target control moves directly from position 2 to position 4 on the display. This change is relatively large, which gives the user a paused, frozen viewing effect and greatly affects the user experience.
  • Therefore, when the controller executes S3303 for the third time and the target gesture information and body information cannot be recognized in the user behavior image, since the display is still controlling the movement of the target control, the controller can predict the target gesture information K and body information L that may appear in the third user behavior image based on the movement speed and direction of the target gesture information K and body information L identified in the first and second executions, determine the predicted position corresponding to the predicted target gesture information and body information, and then, according to the predicted target gesture information and body information, execute the movement command to move the target control to position 3 on the display.
  • Figure 38 is another schematic diagram of the moving position of the target control provided by the embodiment of the present application.
  • According to the method provided by the embodiment of the present application, the target gesture information and body information change according to positions 1-2-3-4. Although the target gesture information and body information cannot be recognized in the user behavior image when S3303 is executed for the third time, position 3 on the display is still determined based on the predicted target gesture information and body information, so that during the whole process the target control on the display changes evenly according to positions 1-2-3-4. This avoids the pause and freeze caused by the target control moving directly from position 2 to position 4 in FIG. 37, greatly improves the display effect, makes the operation feel smoother when the user controls the display device through gestures and body, and further improves the user experience.
  • the controller after the controller executes S3303 every time, it will store and record the target gesture information and body information obtained by executing S3303 this time, so as to provide for the subsequent time when no target gesture information and body information are detected. Predict when body information is used. In some embodiments, when the target gesture information and body information are not detected when the process in S3303 is executed multiple times (for example, 3 times) in a row, no prediction is made, but the execution of this process is stopped, and the process is restarted from S3301 implement.
  • the controller can maintain a gesture movement speed and a movement direction according to the recognition results of the second detection model; the movement speed and direction can be obtained from the frame rate and the movement distance across multiple frames (generally three frames).
  • Based on the maintained speed and direction, the predicted gesture position in the next frame can be obtained.
  • In addition, a speed threshold is needed: if the gesture movement speed exceeds the threshold, it is clamped at the threshold, which prevents an overly fast gesture from producing a speed so high that it affects the experience.
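  • A possible sketch of this prediction step is shown below: the speed and direction maintained over the last few recognized positions are used to extrapolate the next focus position, with the speed clamped at a threshold. The function name and the 800-pixels-per-second cap are assumptions for illustration.

```python
# Sketch of predicting the next gesture position when one detection fails,
# from the speed and direction maintained over the last few recognized
# positions (generally three frames). The speed cap is an assumed value.
import math

def predict_next_position(recent_positions, frame_interval, speed_cap=800.0):
    """recent_positions: last few recognized (x, y) gesture positions,
    oldest first (at least two); frame_interval in seconds."""
    (x0, y0), (xn, yn) = recent_positions[0], recent_positions[-1]
    elapsed = frame_interval * (len(recent_positions) - 1)
    speed = math.hypot(xn - x0, yn - y0) / elapsed        # pixels per second
    theta = math.atan2(yn - y0, xn - x0)                  # movement direction
    speed = min(speed, speed_cap)   # clamp overly fast gestures at the threshold
    step = speed * frame_interval   # advance one more frame along the same line
    return (xn + step * math.cos(theta), yn + step * math.sin(theta))
```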
  • the controller of the display device may dynamically adjust the above-mentioned preset time interval according to the working parameters of the display device. For example, when the current load is light, the controller determines that the preset time interval is 100 ms, that is, a frame of the user behavior images is extracted every 100 ms.
  • In this case, the preset number of user behavior images corresponds to a time range of 800 ms (that is, 8 frames at a 100 ms interval). If the same target gesture information and body information are detected throughout this range, it means that the target gesture information and body information are real and valid, that is, the control command corresponding to that target gesture information and body information is feasible.
  • When the controller determines that the load is heavy because the current load is greater than a threshold, the preset time interval is set to 200 ms, that is, a frame of the user behavior images is extracted every 200 ms. At this time, the controller can adjust the preset number to 4, so that the 4 frames of user behavior images still correspond to the 800 ms time range used to determine the authenticity and validity of the target gesture information and body information.
  • That is, the controller can dynamically adjust the preset number according to the preset time interval, the two being in an inverse proportional relationship. This not only reduces the calculation amount of the controller under heavy load, but also avoids the recognition time being extended by a large preset number when the preset time interval is long, and finally achieves a certain recognition efficiency on the basis of ensuring recognition accuracy.
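  • The inverse relationship can be captured by keeping the consistency-check window fixed and deriving the preset number from the preset time interval, as in the sketch below; the 800 ms window follows the example above, and the helper name is illustrative.

```python
# Keep the consistency-check window fixed (e.g. 800 ms) while the sampling
# interval changes with controller load, so the preset number and the preset
# time interval stay inversely proportional.

VALIDATION_WINDOW_MS = 800

def preset_number_for(interval_ms):
    return max(1, VALIDATION_WINDOW_MS // interval_ms)

print(preset_number_for(100))  # light load: 8 frames of user behavior images
print(preset_number_for(200))  # heavy load: 4 frames of user behavior images
```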
  • FIG. 39 is a schematic flowchart of a method for controlling a display device provided in an embodiment of the present application, which can be used as a specific implementation of the control method shown in FIG. 33 , as shown in FIG. 39 , including the following steps :
  • S3901, S3903 Perform gesture detection on the image to be detected, if target gesture information is detected, execute S3904, otherwise execute S3901, S3903.
  • S3904-S3906 Turn on the gesture body control mode, and continue body recognition to determine body information.
  • S3907-S3908 Perform user behavior detection, determine whether the user's click gesture is detected, if yes, execute S3910, otherwise, execute S3909.
  • S3910 Execute click-related control instructions, reset the detection mode, stop body recognition, only enable gesture recognition, and execute S3901-S3902.
  • S3901-S3902 Perform gesture detection on the image to be detected, acquire target gesture information of the user, and execute S3907-S3908.
  • FIG. 39 The specific implementation and principles of FIG. 39 are the same as those shown in FIG. 33 , and will not be repeated in this embodiment of the present application.
  • the controller can identify the target gesture information of the human body in the user behavior image by using the second detection model, and the first detection model is also obtained through training on images including gesture information. Therefore, after the controller completes the entire process shown in Figure 33, the target gesture information identified by the second detection model during this execution can be used to train and update the first detection model. Updating the first detection model based on the currently detected target gesture information makes the update more effective, so the real-time performance and applicability of the first detection model can be improved.
  • In the images collected by the video acquisition device, the human body may occupy only a small part of the area, so that when the user performs a long-distance movement of a control on the display, the gesture of the human body has to move across a relatively long distance, which is inconvenient for the user.
  • Therefore, the embodiment of the present application also provides a display device control method that establishes a mapping relationship between a "virtual frame" in the image to be detected and the display, so that when the user controls the display device, moving the gesture only within the virtual frame is enough to move the indicated target control on the display, which greatly reduces the user's range of motion and can improve the user experience.
  • the following describes the "virtual frame" provided by this application and related applications in combination with specific embodiments. The virtual frame is only an exemplary name and can also be called a mapping frame, an identification area, a mapping area, etc.; the present application does not limit its name.
  • FIG. 40 is a schematic flowchart of a method for controlling a display device provided in an embodiment of the present application.
  • the method shown in FIG. 40 can be applied to the scenario shown in FIG. 31 above, in which the display device displays controls such as a mouse.
  • the method includes:
  • the specific implementation of S4001 can refer to S3301-S3303: for example, the controller can use the first detection model to judge whether gesture information is included in each extracted image to be detected, and use the second detection model to identify the target gesture information and body information in the user behavior image; the specific implementation and principle are not repeated here.
  • In S4001, it is also possible that the display device is directly displaying the target control or running an application program that needs to display the target control, indicating that the target control may need to be moved. In that case, every time an image to be detected is obtained, the second detection model is used directly to recognize the target gesture information and/or body information in the user behavior image, and the recognized target gesture information and/or body information can be used subsequently to determine a movement command.
  • S4002 After identifying the first user behavior image extracted in S4001, the controller determines that the first user behavior image includes target gesture information, then the controller establishes a virtual frame according to the target gesture information in the first user behavior image , and establish a mapping relationship between the virtual frame and the display of the display device, and display the target control at a preset first display position, where the first display position may be the center of the display.
  • FIG. 41 is a schematic diagram of a virtual frame provided by an embodiment of the present application. When the first user behavior image includes target gesture information K and body information L, and the target gesture information and body information of an extended palm correspond to the command to move the target control displayed on the display, the controller creates a virtual frame centered on the first focus position P where the target gesture information K is located, and displays the target control at the center of the display.
  • the shape of the virtual frame may be a rectangle, and the aspect ratio of the rectangle is the same as that of the display, but the area of the virtual frame may be different from the area of the display.
  • the mapping relationship between the virtual frame and the display is indicated by the dotted lines in the figure: the midpoint P of the virtual frame corresponds to the midpoint Q of the display, and the four vertices of the rectangular virtual frame correspond to the four vertices of the rectangular display, so that each focus position in the rectangular virtual frame corresponds to a display position on the display, and the display position on the display can change correspondingly following the focus position.
  • the above mapping relationship can be expressed by the relative distance between the focus position and a target position in the virtual frame, and the relative distance between the display position and the same target position on the display. For example, establish a coordinate system with the vertex P0 at the lower left corner of the virtual frame as the origin, so that the coordinates of point P can be expressed as (x, y); and establish a coordinate system with the vertex Q0 at the lower left corner of the display as the origin, so that the coordinates of point Q can be expressed as (X, Y). The mapping relationship can then be expressed as the ratio X/x in the direction of the long side of the rectangle and the ratio Y/y in the width direction.
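  • A minimal sketch of this coordinate mapping is given below, assuming the lower-left-corner coordinate systems described above; the function name and example sizes are illustrative.

```python
# Virtual-frame-to-display mapping: coordinates are taken relative to the
# lower-left corners P0 (virtual frame) and Q0 (display), and scaled by the
# ratio of display size to virtual frame size.

def map_focus_to_display(focus, frame_origin, frame_size, display_size):
    """focus: (x, y) of the gesture focus in image coordinates;
    frame_origin: lower-left corner P0 of the virtual frame;
    frame_size: (width, height) of the virtual frame;
    display_size: (width, height) of the display in pixels."""
    x = focus[0] - frame_origin[0]
    y = focus[1] - frame_origin[1]
    scale_x = display_size[0] / frame_size[0]   # the X/x ratio (long side)
    scale_y = display_size[1] / frame_size[1]   # the Y/y ratio (width side)
    return (x * scale_x, y * scale_y)           # display position (X, Y)

# The center of the virtual frame then maps to the center of the display:
print(map_focus_to_display((160, 90), (0, 0), (320, 180), (1920, 1080)))
```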
  • After the controller completes the establishment of the rectangular virtual frame and the mapping relationship, the virtual frame and the mapping relationship can be applied in S4003-S4004, so that the movement of the focus position corresponding to the gesture information corresponds to the movement of the position of the target control on the display.
  • S4004 Control the target control on the display to move to the second display position determined in S4003.
  • FIG. 42 is a schematic diagram of the corresponding relationship between the virtual frame and the display provided by the embodiment of the present application. Assume that, in the first user behavior image, the virtual frame is established around the first focus position P in the target gesture information; at the same time, the target control "mouse" can be displayed at the first display position Q in the center of the display.
  • the controller can calculate the actual position of the second display position Q' on the display according to the second relative distance and the coordinates of the target position in the lower left corner, and display the target control at the second display position Q'.
  • Fig. 43 is a schematic diagram of the movement of the target control provided by the embodiment of the present application, which shows that, in the process shown in Fig. 42, between the first user behavior image and the second user behavior image the target gesture information has moved from the first focus position P to the second focus position P'.
  • the controller can display the target control at the first display position Q and the second display position Q' on the display according to the change of the focus position in the virtual frame.
  • the impression presented to the user is that the target control displayed on the display moves correspondingly following the movement of its target gesture information.
  • the position where the target gesture information is located is used as the focus position, for example, a key point in the target gesture information is used as the focus position.
  • the key point of the body information can also be used as the focus position, etc.
  • the implementation methods are the same, and will not be repeated here.
  • Although the first user behavior image and the second user behavior image are taken as single-frame images as an example in Figure 40, the method can also be combined with the method shown in Figure 33 so that each user behavior image includes multiple frames, and the corresponding focus position is determined according to the target gesture information recognized in the multiple frames of user behavior images.
  • the display device control method provided by this embodiment can establish a mapping relationship between the "virtual frame" in the user behavior image and the display, so that when the user controls the display device, he can move within the virtual frame only through his gestures. By moving, the movement of the indicated target control on the display can be realized, which greatly reduces the user's range of motion and can improve user experience.
  • the size of the established virtual frame may be related to the distance between the human body and the video acquisition device.
  • FIG. 44 is a schematic diagram of the area of the virtual frame provided by the embodiment of the present application. When the distance between the human body and the video capture device is relatively long, the area corresponding to the gesture information in the user behavior image is relatively small, so a relatively small virtual frame can be set; when the distance between the human body and the video capture device is short, the area corresponding to the gesture information in the user behavior image is larger, so a larger virtual frame can be set.
  • the area of the virtual frame can establish a linear multiple relationship proportional to the distance, or it can be divided into a multi-level mapping relationship according to the distance (that is, a certain distance corresponds to a certain frame size), and the specific mapping relationship can be adjusted according to the actual situation.
  • the controller can determine the distance between the human body and the display device (the video acquisition device is set on the display device) according to the infrared form or any other form of distance measuring unit set by the display device, or control The controller can also determine the corresponding distance according to the area corresponding to the gesture information in the user behavior image, and then determine the area of the virtual frame according to the area of the gesture information.
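  • The sketch below illustrates both options in simplified form: a multi-level mapping from the measured distance to a frame size, and deriving the frame size from the bounding box that the gesture occupies in the user behavior image. All constants are placeholder values, not parameters from the embodiment.

```python
# Illustrative sizing of the virtual frame, either from a measured distance
# via a multi-level mapping or from the gesture's bounding box in the image.

DISTANCE_LEVELS = [   # (max distance in meters, virtual frame size in pixels)
    (1.5, (480, 270)),
    (3.0, (320, 180)),
    (float("inf"), (240, 135)),   # far away: gesture appears small, small frame
]

def frame_size_from_distance(distance_m):
    for max_d, size in DISTANCE_LEVELS:
        if distance_m <= max_d:
            return size

def frame_size_from_gesture_box(gesture_w, gesture_h, multiple=4.0,
                                aspect=(16, 9)):
    # Scale the gesture bounding box by a fixed multiple, then force the
    # frame to the display's aspect ratio so the mapping stays uniform.
    width = max(gesture_w, gesture_h * aspect[0] / aspect[1]) * multiple
    return (int(width), int(width * aspect[1] / aspect[0]))

print(frame_size_from_distance(2.0))          # (320, 180)
print(frame_size_from_gesture_box(80, 100))   # gesture box -> frame size
```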
  • the controller can also establish an edge area around the edge of the user behavior image to establish an optimal control range.
  • FIG. 45 is a schematic diagram of the edge area provided by the embodiment of the present application. The edge area refers to the area of the user behavior image that lies outside the optimal control range and whose distance from a boundary of the user behavior image is less than a preset distance. In the user behavior image at the top of FIG. 45, the movement distance of the focus position corresponding to the target gesture information corresponds to a larger change in the display position on the display; although the target control appears to the user to move faster, this avoids the controller having to identify the target gesture information in the edge area of the user behavior image, which can improve the recognition accuracy of the target gesture information and the accuracy of the entire control process.
  • The foregoing embodiments provide a virtual frame in the user behavior image so that the user can control the movement of the target control on the display through the movement of the gesture information within the virtual frame. In some cases, however, due to a large movement of the user, overall movement of the body, or other reasons, the gesture information may move out of the virtual frame, resulting in situations where it cannot be recognized and affecting the control effect.
  • For example, FIG. 46 is a schematic diagram of states of the gesture information: when the second user behavior image includes target gesture information and the second focus position corresponding to the target gesture information is inside the established virtual frame K1, the control method in the foregoing embodiments can be executed normally, and the display position of the target control is determined by the focus position of the target gesture information within the virtual frame.
  • However, when the second user behavior image includes target gesture information but the second focus position corresponding to the target gesture information is outside the virtual frame K1 in the user behavior image, the display position of the target control cannot be determined normally from the focus position of the target gesture information within the virtual frame.
  • FIG. 47 is a schematic diagram of the re-established virtual frame provided by the embodiment of the present application, showing the re-established virtual frame K2.
  • Fig. 48 is another schematic diagram of the re-established virtual frame provided by the embodiment of the present application. In this approach, when the gesture information shown in state S2 in Fig. 46 appears outside the virtual frame K1 in the image to be detected, the controller resets the virtual frame.
  • Specifically, the controller has displayed the target control at the first relative position Q1 on the display according to the position information of the target gesture information within the virtual frame K1 in the previous user behavior image. The controller then re-establishes the virtual frame K2 according to the first relative position Q1 on the display, so that the relative position of the second focus position P2 within the virtual frame K2 is the same as the relative position of the first relative position Q1 within the display. Therefore, the controller can continue to display the target control at the first relative position Q1, and the reset of the virtual frame K2 is completed without the target control jumping to the center of the display.
  • Subsequently, the controller can determine the display position of the target control according to the focus position of the target gesture information within the virtual frame K2, so that the focus reset is completed without the user noticing. This not only avoids the problem of being unable to control the target control after the target gesture information moves out of the virtual frame, but also makes the whole process smoother and further improves the user experience.
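  • A minimal sketch of this reset, assuming the lower-left-corner conventions used earlier, is shown below; it places the new frame K2 so that the current focus keeps the same relative position inside K2 that the target control currently has inside the display. The function name is illustrative.

```python
# Resetting the virtual frame (K2) when the gesture has drifted outside the
# old frame: place the new frame so that the current focus position keeps the
# same relative position inside K2 that the target control has inside the
# display, so the control does not jump when control resumes.

def rebuild_virtual_frame(focus, display_pos, display_size, frame_size):
    """focus: current gesture focus (x, y) in image coordinates;
    display_pos: current target-control position (X, Y) on the display;
    display_size / frame_size: (width, height) of display and new frame."""
    rel_x = display_pos[0] / display_size[0]     # relative position of Q1
    rel_y = display_pos[1] / display_size[1]
    origin_x = focus[0] - rel_x * frame_size[0]  # lower-left corner of K2
    origin_y = focus[1] - rel_y * frame_size[1]
    return (origin_x, origin_y)

# Mapping the focus through the rebuilt frame then returns the unchanged
# position Q1, after which movement continues normally inside K2.
```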
  • the controller when the controller executes the above process and re-establishes the virtual frame, it may display relevant prompt information on the display to remind the user that the virtual frame has been re-established on the display and to prompt the re-established virtual frame.
  • the controller may display information in the form of text, image, etc. at the edge of the display to prompt the user that the virtual frame has been rebuilt.
  • the controller after the controller determines to re-establish the virtual frame in the above process, it can also display information prompting to update the virtual frame on the display, and after receiving the confirmation message from the user, execute the process of rebuilding the virtual frame, so that the entire The process is user-controllable, and reconstruction is performed according to the user's intention to prevent invalid reconstruction due to the user's voluntary departure.
  • the controller when the controller fails to recognize the target gesture information in a preset number of consecutive user behavior images during the control process, it may stop displaying the target gesture information on the display. control, thus ending the process shown in Figure 40. Or, when the target gesture information is not included in the user behavior images processed by the controller within a predetermined period of time, it may also stop displaying the target control on the display and end the process. Alternatively, when the controller recognizes that the target gesture information included in the user behavior image corresponds to a stop command during the control process, it may also stop displaying the target control on the display and end the process.
  • FIG. 49 is a schematic diagram of the movement of the target control provided in the embodiment of the present application. As can be seen from FIG. 49, assume that the controller determines that the target gesture information in user behavior image 1 is located at focus position P1 in the virtual frame and accordingly controls the display to show the target control at display position Q1; the target gesture information in user behavior image 2 is located at focus position P2 in the virtual frame, so the target control is displayed at display position Q2; and the target gesture information in user behavior image 3 is located at focus position P3 in the virtual frame, so the target control is displayed at display position Q3.
  • When the target gesture information moves too quickly between P1 and P2, the target control displayed on the display jumps a large distance between Q1 and Q2, so the user does not experience the look and feel of a slow, even transition of the target control.
  • FIG. 50 is another schematic diagram of the movement of the target control provided by the embodiment of the present application.
  • In this embodiment, after determining the first focus position P1 and the second focus position P2 in the virtual frame, the controller also compares the distance between the second focus position and the first focus position with the preset time interval. If the ratio of the distance between P1 and P2 to the preset time interval (that is, the interval between extracting the user behavior images corresponding to the first focus position and the second focus position) is greater than a preset threshold, it means that the moving speed of the target gesture information is too fast. In that case, if the second display position of the target control were determined directly from the second focus position and the target control displayed there, the display effect shown in FIG. 49 would occur.
  • the controller determines the third focus position P2' between the first focus position and the second focus position, wherein the ratio of the distance between the third focus position P2' and the first focus position P1 to the preset time interval is not equal to greater than the preset threshold, and the third focus position P2' may be a point on the connection line between P1-P2, and P1, P2' and P2 are in a linear connection relationship.
  • the controller can determine the second display position Q2' on the display according to the third focus position P2' and the mapping relationship, and control the target control to move from the first display position Q1 to the second display position Q2'.
  • In this case, the target control displayed on the display does not move to the display position Q2 corresponding to the second focus position, but to the second display position Q2' corresponding to the third focus position P2'. Therefore, when the controller processes the third user behavior image after the second user behavior image, if the third user behavior image includes target gesture information, the fourth focus position P3 corresponding to that target gesture information is located in the rectangular virtual frame, and the ratio of the distance between the fourth focus position P3 and the third focus position P2' to the preset time interval is not greater than the preset threshold, the display position corresponding to the fourth focus position can be determined according to the mapping relationship.
  • In this way, when the target gesture information moves too fast between P1 and P2, the movement length of the target control displayed on the display is reduced, and when the target gesture information moves more slowly between P2 and P3, the distance "reduced" during the P1-P2 movement is made up.
  • Thus, even though the gesture moves quickly between P1 and P2, the target control on the display still moves from the Q1 position on the left side of the display to the Q3 position on the right side, so that over the whole P1-P3 movement the speed of the target control does not change too much, giving the user the impression of a uniform movement speed and a continuously changing target control.
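  • The clamping described above can be sketched as follows: when the focus travels farther in one preset time interval than a speed threshold allows, an intermediate third focus position on the straight line P1-P2 is used instead. The threshold value below is an assumption for illustration.

```python
# Smoothing an overly fast gesture move: if the focus travelled farther than
# the speed threshold allows within one preset time interval, substitute an
# intermediate focus position on the straight line P1-P2.
import math

def limit_focus_step(p1, p2, interval_s, max_speed=600.0):
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    distance = math.hypot(dx, dy)
    if distance / interval_s <= max_speed:
        return p2                       # slow enough: use P2 directly
    allowed = max_speed * interval_s    # largest permitted step this interval
    t = allowed / distance
    return (p1[0] + t * dx, p1[1] + t * dy)   # third focus position P2'
```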
  • FIG. 51 is a schematic diagram of the display device control process provided by the embodiment of the present application. As shown in FIG. 51 , the method includes the following steps:
  • S5101 Acquire several frames of user behavior images.
  • S5102 Perform gesture recognition processing on each frame of the user behavior image to obtain target gesture information.
  • In some embodiments, controlling the display to display corresponding content based on the target gesture information includes: obtaining, according to the target gesture information, the cursor position corresponding to each frame of the user behavior image, where the cursor position is the display position to which the user's target gesture in the user behavior image is mapped on the display; and determining the user's gesture movement track according to the cursor positions, and controlling the cursor on the display to move along the gesture movement track.
  • FIG. 52 is a schematic diagram of another display device control process provided by the embodiment of the present application. As shown in FIG. 52 , the method includes the following steps:
  • Step 5201. Control the image collector to collect several frames of user behavior images of the user.
  • Step 5202 Perform gesture recognition processing on the user behavior image to obtain target gesture information of each frame of the user behavior image.
  • Step 5203 Obtain the cursor position corresponding to each frame of the user behavior image according to the target gesture information; the cursor position is the display position in the user behavior image where the user's gesture is mapped to the display.
  • Step 5204 Determine the user's gesture movement track according to the cursor position, and control the cursor in the display to move along the gesture movement track.
  • the method further includes: acquiring a gesture information flow, the gesture information flow including multiple consecutive frames of the user behavior image; extracting key gesture information from the gesture information flow, the key gesture information Key gesture types including a plurality of stages and confidence parameters of each stage; using a detection model to match the key gesture information to obtain target gesture information, the detection model including a plurality of nodes stored in a tree structure; each Gesture gesture templates and designated subordinate nodes are set in the node; the target gesture information is a node combination in which the key gesture type is the same as the gesture gesture template at each stage, and the confidence parameter is within the confidence interval; execute A control instruction associated with the target gesture information.
  • In some embodiments, the method further includes: extracting a frame of an image to be detected from the continuous multi-frame images of video data collected by the video acquisition device of the display device at a preset time interval; using the first detection model to judge whether the image to be detected includes gesture information of the human body; if so, continuing, according to the preset time interval and a preset number, to extract the preset number of user behavior images to be detected from the video data, and using the second detection model to identify the target gesture information and limb information of the human body in the preset number of user behavior images to be detected, wherein the amount of data calculated by the first detection model is smaller than the amount of data calculated by the second detection model; and executing the control command corresponding to the target gesture information and the limb information in the preset number of user behavior images to be detected.
  • the method further includes: identifying target gesture information in the first user behavior image; centering on the first focus position corresponding to the target gesture information in the first user behavior image, establishing A rectangular virtual frame, displaying the target control at the first display position of the display screen, and determining a mapping relationship between the rectangular virtual frame and the display of the display device; when the second user behavior image after the first user behavior image When the user behavior image includes the target gesture information, and the second focus position corresponding to the target gesture information is located in the rectangular virtual frame, according to the second focus position and the mapping relationship, determine the the second display position; controlling the target control on the display to move to the second display position.
  • Fig. 53 is a schematic flowchart of an embodiment of a method for controlling a display device provided in an embodiment of the present application. In a specific implementation manner shown in Fig. 53, the process includes the following steps:
  • S5301 The controller of the display device first performs gesture detection, and if the gesture state is normal, execute S5302-S5306, otherwise, execute S5307.
  • S5302-S5306 Map the cursor position of the TV interface according to the position of the hand in the virtual frame, perform gesture movement control, gesture speed and gesture direction update, gesture click detection, gesture return detection, etc.
  • S5307 Perform multi-frame (generally three) action prediction.
  • S5309-S5310 Clear the mouse on the TV interface, if no gesture is detected for a long time, execute S5311.
  • S5311 Exit gesture body recognition, enter the global gesture detection scheme, until the focus gesture is detected.
  • S5312 Reset the focus, if the distance is short, continue to move, if the distance is long, reset the focus to the center of the TV. Wherein, when the focus is reset, the virtual frame needs to be regenerated. Also, if the gesture is not detected multiple times.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A display device and a control method thereof. The display device includes a display (260), an image input interface (501) and a controller (110). The controller (110) acquires several frames of user behavior images, determines target gesture information according to the acquired frames of user behavior images, and performs the corresponding control based on the target gesture information, instead of determining the target gesture information and performing control based on a single acquired user behavior image. This improves the accuracy of the display control performed by the display device based on gesture recognition, thereby increasing the degree of intelligence of the display device and improving the user experience.

Description

A display device and a control method thereof
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to the Chinese patent applications with application number 202111302345.9 filed on November 04, 2021, application number 202111302336.X filed on November 04, 2021, application number 202210266245.3 filed on March 17, 2022, and application number 202210303452.1 filed on March 24, 2022, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present application relates to the field of gesture control, and in particular to a display device and a control method thereof.
BACKGROUND
With the continuous development of electronic technology, display devices such as televisions can implement more and more functions. For example, a display device can capture images of the user through its video acquisition device, and a processor can recognize the gesture information of the user in the images and then execute the command corresponding to the gesture information.
However, at present the control command determined by a display device through gesture information is generally obtained by recognizing a single collected user behavior image to determine the target gesture information and then determining the corresponding control instruction, which results in a low degree of intelligence of the display device and a poor user experience.
SUMMARY
The present application provides a display device, including: a display configured to display images; an image input interface configured to acquire user behavior images; and a controller configured to: acquire several frames of user behavior images; perform gesture recognition processing on each frame of the user behavior images to obtain target gesture information; and control the display to display corresponding content based on the target gesture information.
The present application provides a method for controlling a display device, the method including: acquiring several frames of user behavior images; performing gesture recognition processing on each frame of the user behavior images to obtain target gesture information; and controlling the display to display corresponding content based on the target gesture information.
附图说明
图1为本申请实施例提供的显示设备的使用场景;
图2为本申请实施例提供的控制装置100的硬件配置框图;
图3为本申请实施例提供的显示设备200的硬件配置框图;
图4为本申请实施例提供的显示设备200中软件配置图;
图5为本申请实施例提供的一种显示设备示意图;
图6a为本申请实施例提供的显示设备内置摄像头的示意图;
图6b为本申请实施例提供的显示设备外接摄像头的示意图;
图7为本申请实施例提供的用户界面的示意图;
图8为本申请实施例提供的显示器显示光标的示意图;
图9为本申请实施例提供的显示器中显示光标控制模式确认信息的示意图;
图10为本申请实施例提供的显示设备各部件的交互流程图;
图11为本申请实施例提供的用户手势的示意图;
图12为本申请实施例提供的根据目标手势信息确定光标位置的流程示意图;
图13为本申请实施例提供的示器显示摄像头区域的示意图;
图14为本申请实施例提供的光标沿直线运动的示意图;
图15为本申请实施例提供的光标沿曲线运动的示意图;
图16为本申请实施例提供的光标和控件距离关系的示意图;
图17为本申请实施例提供的光标和控件的位置关系;
图18为本申请实施例提供的动态手势交互流程示意图;
图19为本申请实施例提供的手朝向示意图;
图20为本申请实施例提供的一种检测模型的树结构示意图;
图21为本申请实施例提供的伪跳转成功时的动作路径图;
图22为本申请实施例提供的伪跳转失败时的动作路径图;
图23为本申请实施例提供的动态手势交互的数据流转关系示意图;
图24为本申请实施例提供的动态手势交互时序关系图;
图25为本申请实施例提供的显示设备的另一使用场景的示意图;
图26为本申请实施例提供的显示设备中另一硬件系统的硬件结构示意图;
图27为本申请实施例提供的显示设备的控制方法的示意图;
图28为本申请实施例提供的显示设备的控制方法另一实施例的示意图;
图29为本申请实施例提供的手部关键点坐标的示意图;
图30为本申请实施例提供的手部关键点的不同伸缩状态示意图;
图31为本申请实施例提供的显示设备的控制方法一应用场景的示意图;
图32为本申请实施例提供的使用手势信息和肢体信息共同确定控制命令的示意图;
图33为本申请实施例提供的显示设备的控制方法的流程示意图;
图34为本申请实施例提供的映射关系的示意图;
图35为本申请实施例提供的映射关系另一示意图;
图36为本申请实施例提供的一种图像中目标手势信息和肢体信息的示意图;
图37为本申请实施例提供的目标控件的移动位置的示意图;
图38为本申请实施例提供的目标控件的移动位置另一示意图;
图39为本申请实施例提供的显示设备的控制方法的流程示意图;
图40为本申请实施例提供的显示设备的控制方法的另一流程示意图；
图41为本申请实施例提供的虚拟框的示意图;
图42为本申请实施例提供的虚拟框和显示器的对应关系示意图;
图43为本申请实施例提供的目标控件移动的示意图;
图44为本申请实施例提供的虚拟框的面积示意图;
图45为本申请实施例提供的边缘区域的示意图;
图46为本申请实施例提供的手势信息的状态示意图;
图47为本申请实施例提供的重新建立的虚拟框的示意图;
图48为本申请实施例提供的重新建立的虚拟框的另一示意图;
图49为本申请实施例提供的目标控件的移动时的示意图;
图50为本申请实施例提供的目标控件的移动时的另一示意图;
图51为本申请实施例提供的显示设备控制过程示意图;
图52为本申请实施例提供的另一显示设备控制方法的流程示意图;
图53为本申请实施例提供的显示设备的控制方法一实施例的流程示意图。
具体实施方式
为使本申请的目的、实施方式和优点更加清楚,下面将结合本申请示例性实施例中的附图,对本申请示例性实施方式进行清楚、完整地描述,显然,所描述的示例性实施例仅是本申请一部分实施例,而不是全部的实施例。
图1为本申请实施例提供的显示设备与控制装置之间操作场景的示意图,如图1所示,用户可通过移动终端300和控制装置100操作显示设备200。控制装置100可以是遥控器,遥控器和显示设备的通信包括红外协议通信、蓝牙协议通信,无线或其他有线方式来控制显示设备200。用户可以通过遥控器上按键,语音输入、控制面板输入等输入用户指令,来控制显示设备200。在一些实施例中,也可以使用移动终端、平板电脑、计算 机、笔记本电脑、和其他智能设备以控制显示设备200。
在一些实施例中，移动终端300可与显示设备200安装软件应用，通过网络通信协议实现连接通信，实现一对一控制操作和数据通信的目的。也可以将移动终端300上显示的音视频内容传输到显示设备200上，实现同步显示功能。显示设备200还与服务器400通过多种通信方式进行数据通信。可允许显示设备200通过局域网（LAN）、无线局域网（WLAN）和其他网络进行通信连接。服务器400可以向显示设备200提供各种内容和互动。
显示设备200，一方面讲，可以是液晶显示器、OLED显示器、投影显示设备；另一方面讲，显示设备可以是智能电视或由显示器和机顶盒组成的显示系统。显示设备200除了提供广播接收电视功能之外，还可以附加提供计算机支持功能的智能网络电视功能。示例的包括，网络电视、智能电视、互联网协议电视（IPTV）等。在一些实施例中，显示设备可以不具备广播接收电视功能。
图2为本申请实施例提供的控制装置100的配置框图。如图2所示，控制装置100包括控制器110、通信接口130、用户输入/输出接口140、存储器、供电电源。控制装置100可接收用户的输入操作指令，且将操作指令转换为显示设备200可识别和响应的指令，起到用户与显示设备200之间交互中介作用。通信接口130用于和外部通信，包含WIFI芯片，蓝牙模块，NFC或可替代模块中的至少一种。用户输入/输出接口140包含麦克风，触摸板，传感器，按键或可替代模块中的至少一种。
图3为本申请实施例提供的显示设备200的硬件配置框图。如图3所示显示设备200包括调谐解调器210、通信器220、检测器230、外部装置接口240、控制器250、显示器260、音频输出接口270、存储器、供电电源、用户接口280中的至少一种。控制器包括中央处理器,视频处理器,音频处理器,图形处理器,RAM,ROM,用于输入/输出的第一接口至第n接口。
显示器260可为液晶显示器、OLED显示器、触控显示器以及投影显示器中的至少一种,还可以为一种投影装置和投影屏幕。
通信器220是用于根据各种通信协议类型与外部设备或服务器进行通信的组件。例如:通信器可以包括Wifi模块,蓝牙模块,有线以太网模块等其他网络通信协议芯片或近场通信协议芯片,以及红外接收器中的至少一种。显示设备200可以通过通信器220与外部控制设备100或服务器400建立控制信号和数据信号的发送和接收。
用户输入接口,可用于接收控制装置100(如:红外遥控器等)的控制信号。
调谐解调器210通过有线或无线接收方式接收广播电视信号，以及从多个无线或有线广播电视信号中解调出音视频信号，以及EPG数据信号。
检测器230用于采集外部环境或与外部交互的信号。例如,检测器230包括光接收器,用于采集环境光线强度的传感器;或者,检测器230包括图像采集器231,如摄像头,可以用于采集外部环境场景、用户的属性或用户交互手势。
外部装置接口240可以包括但不限于如下:高清多媒体接口(HDMI)、模拟或数据高清分量输入接口(分量)、复合视频输入接口(CVBS)、USB输入接口(USB)、RGB端口等任一个或多个接口。也可以是上述多个接口形成的复合性的输入/输出接口。
控制器250和调谐解调器210可以位于不同的分体设备中,即调谐解调器210也可在控制器250所在的主体设备的外置设备中,如外置机顶盒等。
在一些实施例中，控制器250通过存储在存储器中的各种软件控制程序，来控制显示设备的工作和响应用户的操作。控制器250控制显示设备200的整体操作。用户可在显示器260上显示的图形用户界面（GUI）输入用户命令，则用户输入接口通过图形用户界面（GUI）接收用户输入命令。或者，用户可通过输入特定的声音或手势输入用户命令，则用户输入接口通过传感器识别出声音或手势，来接收用户输入命令。
图4为本申请实施例提供的显示设备200中软件配置示意图,如图4所示,将系统分为四层,从上至下分别为应用程序(Applications)层(简称“应用层”),应用程序框架 (Application Framework)层(简称“框架层”),安卓运行时(Android runtime)和系统库层(简称“系统运行库层”),以及内核层。内核层至少包含以下驱动中的至少一种:音频驱动、显示驱动、蓝牙驱动、摄像头驱动、WIFI驱动、USB驱动、HDMI驱动、传感器驱动(如指纹传感器,温度传感器,压力传感器等)、以及电源驱动等。
随着显示设备的快速发展,人们不仅仅局限于利用控制装置对显示设备进行控制,而是想要更加便利地仅仅利用肢体行动或者语音来控制显示设备。用户可以利用手势交互的方式控制显示设备。显示设备能够采用的手势交互方式可以包括静态手势和动态手势。在使用静态手势的交互时,显示设备可以根据手势类型识别算法检测手势类型,根据手势类型执行相应的控制动作。
为了提高显示设备的智能化程度以及用户体验感,在本申请实施例中本申请提供了一种显示设备,图5为本申请实施例提供的一种显示设备示意图,如图5所示,该显示设备包括显示器260、图像输入接口501和控制器110,
其中,显示器260,被配置为显示图像;
图像输入接口501,被配置为获取用户行为图像;
控制器110,被配置为:
获取若干帧用户行为图像;对每一帧所述用户行为图像进行手势识别处理,获得目标手势信息;基于所述目标手势信息,控制所述显示器显示对应的内容。
为了提高显示设备的智能化程度并且提升用户的体验感,在本申请实施例中,控制器110可以通过图像输入接口501获取若干帧用户行为图像,该用户行为图像中可以只包括用户局部图像,例如,用户所做出的手势的手势图像,也可以包括采集到的用户全局图像,例如采集到的用户的全身图像。获取到的若干帧用户行为图像,可以是包含若干帧用户行为图像的视频,也可以是包含若干帧用户行为图像的图像集。
获取到若干帧用户行为图像之后，控制器110可以对每一帧用户行为图像进行手势识别处理，获得目标手势信息。在对用户行为图像进行手势识别处理时，可以基于图像识别技术识别用户行为图像中包含的手势，可以将识别到的每一帧用户行为图像中的手势合并，得到目标手势信息，也就是说，目标手势信息中包括识别到的每个手势。还可以将识别到的手势，根据设备预先设置的手势类型进行分类，将出现次数最多的手势类型确定为目标手势信息。
在确定了目标手势信息之后,控制器110可以控制显示器260显示对应的内容。
由于在本申请实施例中,控制器110获取若干帧用户行为图像,并根据获取到的该若干帧用户行为图像,确定目标手势信息,并基于该目标手势信息进行相应的控制,而不是基于获取到的一张用户行为图像,确定目标手势信息进行控制的,提高了显示设备基于手势识别进行显示控制的准确率,从而提高了显示设备的智能化程度,提升了用户的体验感。
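作为对上述多帧手势识别处理的一个补充说明，下面给出一段示意性的Python草图（并非本申请的规范实现），演示“逐帧识别手势、并将出现次数最多的手势类型确定为目标手势信息”的处理思路；其中recognize为假设存在的单帧手势识别函数。

```python
from collections import Counter

def get_target_gesture(frames, recognize):
    """frames: 若干帧用户行为图像; recognize: 假设的单帧手势识别函数,
    返回该帧的手势类型, 未识别到手势时返回None。
    将出现次数最多的手势类型确定为目标手势信息。"""
    labels = [recognize(frame) for frame in frames]
    labels = [g for g in labels if g is not None]   # 丢弃未识别到手势的帧
    if not labels:
        return None                                 # 所有帧均未识别到手势
    return Counter(labels).most_common(1)[0][0]     # 出现次数最多的手势类型
```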
显示设备是指能够输出具体显示画面的终端设备。随着显示设备的快速发展,显示设备的功能将越来越丰富,性能也越来越强大,可实现双向人机交互功能,集影音、娱乐、数据等多种功能于一体,用于满足用户多样化和个性化需求。
手势交互是一种新型的人机交互模式。手势交互的目的在于通过检测用户做出的特定手势动作,控制显示设备执行相对应的控制指令。显示设备能够采用的手势交互方式可以包括静态手势和动态手势。在使用静态手势的交互时,显示设备可以根据手势类型识别算法检测手势类型,根据手势类型执行相应的控制动作。在使用动态手势的交互时,用户可以操控显示器中的光标进行移动。显示设备可以建立出用户的手势和显示器中光标的映射关系,同时通过不断检测用户图像可以确定出用户的动态手势,进而确定出映射到显示器中的手势移动轨迹,从而控制光标沿着该手势移动轨迹进行移动。
对于动态手势的交互过程,显示设备需要不断检测用户图像。然而,有的图像中可能没有检测到用户的手势,导致无法准确获取到该用户图像对应的手势移动轨迹,从而无法控制光标进行移动,出现光标卡顿、中断的情况,给用户的体验性较差。
在使用动态手势的交互时,显示设备可以检测用户的动态手势,进而确定出映射到显示器中的手势移动轨迹,从而控制光标沿着该手势移动轨迹进行移动。
用户在使用动态手势控制光标移动时,显示设备需要不断检测用户图像。通过对每一帧用户图像进行识别,得到图像中的用户手势,进而确定出每一帧用户手势映射到显示器中的坐标,从而控制光标沿着这些坐标进行移动。然而,考虑到摄像头的拍摄存在误差、用户的手势不标准以及对手势进行识别时出现错误等众多因素,显示设备可能无法识别出部分用户图像的手势从而无法确定出相应的坐标,导致无法准确获取到对应的手势移动轨迹。正常情况下,光标需要按照每一帧图像对应的位置移动,形成连续的运动轨迹。如果缺少中间帧图像对应的位置,光标则不会移动,从而出现移动卡顿的情况,直到识别出下一帧图像对应的位置,光标才会继续移动,但如果位置相差太远,光标会出现突然跳跃等情况,严重影响用户的观看体验。
在一些实施例中,为了使得显示设备能够实现和用户进行手势交互的功能,显示设备还包括图像输入接口,用于连接图像采集器231。图像采集器231可以是摄像头,用于采集一些图像数据。需要说明的是,摄像头可以作为一种外部装置,通过图像输入接口外接在显示设备上,也可以作为一种检测器内置于显示设备中。对于外接在显示设备的摄像头,可以将摄像头连接至显示设备的外部装置接口,接入显示设备。用户可以利用摄像头在显示设备上完成拍照或拍摄功能,从而采集图像数据。
摄像头可以进一步包括镜头组件,镜头组件中设有感光元件和透镜。透镜可以通过多个镜片对光线的折射作用,使景物的图像的光能够照射在感光元件上。感光元件可以根据摄像头的规格选用基于CCD(Charge-coupled Device,电荷耦合器件)或CMOS(Complementary Metal Oxide Semiconductor,互补金属氧化物半导体)的检测原理,通过光感材料将光信号转化为电信号,并将转化后的电信号输出成图像数据。摄像头还可以按照设定的采样频率逐帧获取图像数据,以根据图像数据形成视频流数据。
在一些实施例中,显示设备内置的摄像头还可以支持升降。即摄像头可以设置在升降机构上,当需要进行图像采集时,通过特定的升降指令,控制升降机构进行运动,从而带动摄像头升高,以进行图像采集。而在不需要进行图像采集时,同样可以通过特定的升降指令,控制升降机构进行运动,从而带动摄像头降低,以隐藏摄像头。图6a为本申请实施例提供的显示设备内置摄像头的示意图。
对于外接于显示设备的图像采集器231,其本身可以是一个独立的外设,并通过特定的数据接口连接显示设备。例如,如图6b所示,图像采集器231可以为独立的摄像头设备,显示设备上可以设有通用串行总线接口(Universal Serial Bus,USB)或高清晰度多媒体接口(High Definition Multimedia Interface,HDMI),图像采集器231则通过USB接口或HDMI接口连接显示设备。为了便于对用户的手势交互动作进行检测,在一些实施例中,外接于显示设备上的图像采集器231可以设置在靠近显示设备的位置,如图像采集器231通过夹持装置夹在显示设备的顶部,或者图像采集器231放置在显示设备附近的桌面上。
显然,对于外接于显示设备的图像采集器231,还可以根据显示设备的具体硬件配置,支持其他方式连接。在一些实施例中,图像采集器231还可以通过显示设备的通信器与显示设备建立连接关系,并按照通信器对应的数据传输协议将采集的图像数据发送给显示设备。例如,显示设备可以通过局域网或互联网连接图像采集器231,则在建立网络连接后,图像采集器231可以将采集的数据通过网络传输协议发送给显示设备。
在一些实施例中，图像采集器231还可以通过无线网络连接的方式外接显示设备。例如，对于支持WiFi无线网络的显示设备，其通信器中设有WiFi模块，因此，可以通过将图像采集器231和显示设备连接同一个无线网络，使显示设备和图像采集器231建立无线连接。在图像采集器231采集到图像数据后，可以先将图像数据发送给无线网络的路由器设备，再由路由器设备转发给显示设备。显然，图像采集器231还可以通过其他无线连接方式接入显示设备。其中，无线连接方式包括但不限于WiFi直连、蜂窝网络、模拟微波、蓝牙、红外等。
在一些实施例中,用户控制显示设备开机后,显示设备可以显示用户界面。图7为本申请实施例提供的用户界面的示意图。用户界面包括第一导航栏700、第二导航栏710、功能栏720和内容显示区730,功能栏720包括多个功能控件如“观看记录”、“我的收藏”和“我的应用”等。其中,内容显示区730中显示的内容会随第一导航栏700和第二导航栏710中被选中控件的变化而变化。用户可以通过触控某个控件,以控制显示设备显示该控件对应的显示面板。需要说明的是,用户也可以通过其他方式来输入对控件的选中操作,例如,利用语音控制功能或者搜索功能等,选中某个控件。
无论是内置于显示设备的图像采集器231还是外接于显示设备的图像采集器231,用户均可以在使用显示设备的过程中,通过特定的交互指令或者应用程序控制启动图像采集器231采集图像数据,并根据不同的需要对采集的图像数据进行相应的处理。例如,显示设备中可以安装有摄像头应用,这些摄像头应用可以调用摄像头,以实现各自的相关功能。摄像头应用,是指需要访问摄像头的摄像头应用,可以对摄像头采集的图像数据进行处理,从而实现相关功能,例如视频聊天。用户可以通过触控“我的应用”控件,查看显示设备中已安装的所有应用。显示器中可以显示出应用列表。当用户选择打开某个摄像头应用时,显示设备可以运行相应的摄像头应用,该摄像头应用可以唤醒图像采集器231,图像采集器231进一步可以实时检测图像数据并发送给显示设备。显示设备可以进一步对这些图像数据进行处理,例如控制显示器显示图像等等。
在一些实施例中,显示设备可以和用户进行手势交互,从而识别出用户的控制指令。用户可以使用静态手势和显示设备进行交互,从而输入控制指令。具体的,在手势交互过程中,用户可以在图像采集器231的拍摄范围内摆出特定的手势,图像采集器231可以采集到用户的手势图像,并将采集到的手势图像发送给显示设备。显示设备进一步可以对手势图像进行识别,检测出该图像中的手势的类型。显示设备中可以预先存储有手势交互策略,限定出每种类型的手势分别对应那种控制指令,一个手势类型可以对应一种控制指令,显示设备可以根据用途不同,针对不同的用途设置用于触发特定控制指令的手势。通过将该图像中的手势的类型和交互策略中的对应关系逐次比对,可以确定出该手势对应的控制指令,并实施该控制指令。
例如，当用户在图像采集器231的拍摄范围内摆出五指并拢且手掌面向图像采集器231的手势时，显示设备可以在图像采集器231采集的手势图像中识别出该手势，并针对该手势确定控制指令为“暂停/开始播放”。最后通过运行该控制指令，对当前播放界面执行暂停或开始播放控制。
需要说明的是，上述实施例中，手势识别是采用静态手势识别方式，静态手势识别可以识别出手势类型进而确定出相应的控制指令。用户每呈现出一个静态手势，代表用户输入了一个独立的控制指令，例如控制音量加一。需要说明的是，当用户长时间保持一个静态手势时，显示设备可能依旧判定为用户输入了一个控制指令。因此，对于一些需要连贯操作的控制指令，如果采用静态手势交互的方式，则太过繁琐。
例如,当用户想要控制显示器中的焦点选中某个控件时,可能会让焦点依次进行下、右、下的移动。此时,用户需要不断变换静态手势从而控制焦点进行移动,导致用户的体验性较差。或者,如果需要焦点连续向着一个方向多次移动时,用户需要连续做出静态手势。由于用户即使长时间保持一个静态手势,也会被判定为输入一个控制指令,因此用户在做出一个静态手势后需要放下手,然后再次做出静态手势,从而影响使用体验。
在一些实施例中,显示设备还可以支持动态手势交互。其中,所述动态手势是指在一次交互过程中,用户可以使用动态手势输入的方式,向显示设备输入控制指令。其中,可以设为:可以是通过一系列动态手势向显示设备输入一个控制指令,可以是通过不同类型的手势向显示设备依次输入多种类型的不同控制指令,也可以是通过相同类型的手势连续向显示设备输入一种类型的多个相同控制指令,从而扩展显示设备的手势交互类型,提高 手势交互形式的丰富程度。
例如,用户在2s时间内将手势从五指张开调整至五指并拢,即输入持续2s的抓取动作,则显示设备可以在2s的检测周期内持续获取手势图像,并逐帧识别手势图像中的手势类型,从而按照多帧图像中的手势变化识别出抓取动作。最后确定抓取动作对应的控制指令,即“全屏/窗口播放”,并执行该控制指令,对播放窗口的大小进行调节。
在一些实施例中,当显示设备中显示有用户界面时,用户可以控制显示器中的焦点选取某个控件并触发。如图7所示,当前焦点选中了“我的应用”控件。考虑到用户利用控制装置控制焦点的移动时,可能会比较繁琐,为了增加用户的体验性,用户还可以利用动态手势选取控件。
显示设备可以设置有光标控制模式。当显示设备处于光标控制模式下,显示器中的原本的焦点可以变更为光标,如图8所示,光标选中了“我的应用”控件。用户可以利用手势控制光标进行移动,从而选中某个控件,以代替原来的焦点移动。
在一些实施例中,用户可以通过操作遥控器的指定按键,向显示设备发送光标控制模式指令。在实际应用的过程中预先绑定光标控制模式指令与遥控器按键之间的对应关系。例如,在遥控器上设置一个光标控制模式按键,当用户触控该按键时,遥控器发送光标控制模式指令至控制器,此时控制器控制显示设备进入光标控制模式。当用户再次触控该按键时,控制器可以控制显示设备退出光标控制模式。
在一些实施例中,可以预先绑定光标控制模式指令与多个遥控器按键之间的对应关系,当用户触控与光标控制模式指令绑定的多个按键时,遥控器发出光标控制模式指令。
在一些实施例中,用户可以使用显示设备的声音采集器,例如麦克风,通过语音输入的方式,向显示设备发送光标控制模式指令,使得显示设备进入光标控制模式。
在一些实施例中,用户还可以通过预设的手势或动作向显示设备发送光标控制模式指令。显示设备可以通过图像采集器231实时检测用户的行为。当用户做出预设的手势或动作时,可以认为用户向显示设备发送了光标控制模式指令。
在一些实施例中,当用户使用智能设备控制显示设备时,例如使用手机时,也可以向显示设备发送光标控制模式指令。在实际应用的过程中可以在手机中设置一个控件,可以通过该控件选择是否进入光标控制模式,从而发送光标控制模式指令至显示设备。
在一些实施例中,可以在显示设备的UI界面中设置光标控制模式选项,当用户点击该选项时,可以控制显示设备进入或退出光标控制模式。
在一些实施例中,为防止用户误触发光标控制模式,当控制器接收到光标控制模式指令时,可以控制显示器显示光标控制模式确认信息,从而使得用户进行二次确认,是否要控制显示设备进入光标控制模式。图9为本申请实施例提供的显示器中显示光标控制模式确认信息的示意图。
当显示设备进入光标控制模式后,用户可以利用手势控制光标进行移动,从而选中想要触发的控件。
图10为本申请实施例提供的显示设备各部件的交互流程图,包括以下步骤:
S1001:获取用户行为图像。
在一些实施例中,当检测到显示设备进入光标控制模式时,控制器可以唤醒图像采集器231,向图像采集器231发送开启指令,从而启动图像采集器231进行图像拍摄。此时,用户可以在图像采集器231的拍摄范围内做出动态手势,图像采集器231可以随着用户的动态手势动作,连续拍摄多帧用户图像,本申请实施例中利用用户行为图像指代图像采集器231采集到的用户图像。
具体的,图像采集器231可以按照预设的帧率拍摄用户行为图像,例如每秒拍摄30帧(30FPS)用户行为图像。同时,图像采集器231还可以实时将拍摄得到的每一帧用户行为图像发送至显示设备。需要说明的是,由于图像采集器231将拍摄的用户行为图像实时发送至显示设备,因此显示设备获取到用户行为图像的速率可以和图像采集器231的拍摄 帧率相同。
例如,当图像采集器231以每秒30帧的帧率进行图像拍摄时,控制器也可以按照每秒30帧的帧率获取到用户行为图像。
在一些实施例中,图像采集器231采集到若干帧用户行为图像,可以依次发送给显示设备。显示设备可以对每一帧用户行为图像逐次进行识别,从而识别出用户行为图像中所包含的用户手势,以确定用户输入的控制指令。
S1002:对于采集到的用户行为图像,控制器对用户行为图像进行手势识别处理,例如可以使用预设的动态手势识别模型对每一帧用户行为图像逐次进行处理。
控制器可以将用户行为图像输入到动态手势识别模型中,动态手势识别模型进一步可以识别图像中所包含的用户手势,例如,可以识别出用户行为图像中所包含的手指、关节、手腕等关键点的位置信息,关键点位置指的是关键点在用户行为图像中的位置坐标。在识别之后,可以依次输出每一帧用户行为图像的目标手势信息。
S1003:根据用户手势信息获取光标位置。
S1004:根据光标位置确定手势移动轨迹。
S1005:控制器控制光标移动,使显示器显示光标沿着手势移动轨迹移动。
图11为本申请实施例提供的用户手势的示意图。可以设定为:用于表征用户手势的关键点包括21个手指关键点。动态手势识别模型可以对用户行为图像中的用户手势进行确认,并识别出用户手部这21个手指关键点的位置信息,即位于用户行为图像中的位置坐标,每个关键点的位置信息都可以通过对应点的坐标进行表示。
需要说明的是,动态手势识别模型在识别用户行为图像时,可能识别出用户手势,获取到了每个手指关键点的位置信息。此时,输出的目标手势信息中可以包括所有手指关键点的位置信息。但受到用户不同手势的影响,有的手指关键点可能被用户掩盖住,导致用户行为图像中并未出现这些手指关键点,此时,动态手势识别模型则无法获取到这些手指关键点的位置信息,这些手指关键点的位置信息只能是空值。即,在目标手势信息中,包括动态手势识别模型识别到的手指关键点的位置信息,没有识别到的手指关键点的位置信息则为空值。
在一些实施例中,动态手势识别模型得到每一帧的目标手势信息后,可以输出至控制器。控制器进一步可以根据每一帧的目标手势信息,确定出用户指示的控制指令。由于用户想要控制光标进行移动,因此用户指示的控制指令可以认为是用户指示光标需要移动的位置指令。此时,控制器可以根据每一帧目标手势信息获取每一帧的光标位置。
在一些实施例中,考虑到显示设备的计算能力可能较弱,如果显示设备当前处在实现一些其他的功能,例如远场语音、4K视频播放等,显示设备会处于一个较高负载的状态。此时,如果向动态手势识别模型中输入的用户行为图像的帧率较高时,实时数据处理量过大,模型处理用户行为图像时的速率便可能较慢,从而使得获取光标位置的速率较慢,导致显示器中光标移动时会较为卡顿。
因此,控制器可以先检测显示设备当前的负载率的情况。当负载率高于预设阈值,例如高于60%时,控制器可以令动态手势识别模型以固定周期等间隔处理每一帧用户行为图像。例如,可以设定固定周期为一秒处理15帧图像。使得动态手势识别模型可以稳定处理图像。当检测到显示设备的负载率没有高于预设阈值时,则可以令动态手势识别模型实时处理每一帧用户行为图像。此时,控制器可以实时将图像采集器231发送过来的用户行为图像输入到动态手势识别模型中,并控制模型进行识别。也可以令动态手势识别模型以固定周期等间隔处理。
需要说明的是,动态手势识别模型输出目标手势信息的速率和处理用户行为图像的速率可以是相同的。当动态手势识别模型以固定周期等间隔处理图像时,其会以固定周期等间隔地输出目标手势信息。当模型实时处理图像时,其也会实时输出目标手势信息。
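下面给出一段示意性的Python草图，说明上述“根据显示设备当前负载率决定动态手势识别模型实时处理或按固定周期等间隔处理”的逻辑；其中60%的负载率阈值与每秒15帧的固定周期沿用正文示例值，函数名与参数均为示意性假设。

```python
def choose_process_interval(load_ratio, capture_fps=30,
                            load_threshold=0.6, fixed_fps=15):
    """根据当前负载率决定动态手势识别模型处理用户行为图像的间隔(以帧为单位)。
    load_ratio: 当前负载率(0~1); capture_fps: 图像采集器的拍摄帧率;
    load_threshold、fixed_fps: 沿用正文示例的60%阈值与每秒15帧固定周期。
    负载高于阈值时按固定周期等间隔处理, 否则逐帧实时处理。"""
    if load_ratio > load_threshold:
        return max(1, capture_fps // fixed_fps)   # 例如30FPS采集时每2帧处理1帧
    return 1                                      # 负载不高, 实时处理每一帧
```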
在一些实施例中,为了使显示器中显示的光标能根据用户的动态手势生成实时的运动 轨迹,使得光标流畅的跟随动态手势运动,控制器可以根据用户利用手势所指示的信息,确定出每一帧的光标位置。
考虑到用户利用手势控制光标时,在一段时间内连续拍摄到的用户运动的手势图像中,有些帧拍到的图像可能较为模糊或者出现手势被遮挡的情况,此时动态手势识别模型无法识别出结果,未能得到目标手势的相关信息,例如目标手势信息为空值。此时,则无法根据目标手势信息获取用户所指示的信息,即无法获取到光标位置,因此显示设备可以预测该帧图像对应的光标位置,避免由于缺少光标位置使得光标不移动,而导致光标出现卡顿、轨迹中断、跟随用户手势时丢失的情况。
显示设备可以根据动态手势识别模型获取到的目标手势信息,例如图11所示的手指关键点的位置信息,确定是否能够获取到用户指示的信息。当动态手势识别模型的结果为空,即目标手势信息为空值时,可以进行光标位置预测。
在本申请实施例中,可以设定为:当检测到预先设定的目标手势时,认为用户指示了光标移动的位置信息。其中,目标手势可以是用户展示预设的手指关键点。对于如图11所示的用户手势示意图,可以设定为9号关键点为用户指示光标进行移动的控制点,即当检测到预设的手指关键点的位置信息时,确定用户指示了光标的移动。显示设备可以根据该预设的手指关键点的位置信息进一步确定出光标移动的位置信息。
因此,当在目标手势信息中检测到了该预设的手指关键点的位置信息,则可以获取到光标移动的位置信息。本申请实施例中使用虚拟位置信息指代预设的手指关键点的位置信息,即目标手势在用户行为图像中的位置信息。
在一些实施例中,显示设备可以检测每一帧目标手势信息中是否包括虚拟位置信息。如果某一帧目标手势信息中包括虚拟位置信息,即识别出了预设的手指关键点的位置信息,则认为该帧用户行为图像中检测到了目标手势,即用户具体指示了光标如何进行移动。此时,显示设备可以根据虚拟位置信息确定出光标需要移动的位置信息。
如果某一帧目标手势信息中不包括虚拟位置信息,即预设的手指关键点的位置信息为空值,则认为该帧用户行为图像中没有检测到目标手势,此时用户没有具体指示出光标应该如何进行移动,显示设备需要自行预测补充光标需要移动的位置信息。
下面结合一个具体的实施例进行说明,图12为本申请实施例提供的根据目标手势信息确定光标位置的流程示意图,包括以下步骤:
S1201：判断目标手势信息是否包括虚拟位置信息；若是，则执行S1202，否则，执行S1204。
S1202:根据虚拟位置信息获取初始光标位置。
S1203:对初始光标位置进行调节。
S1204:预测光标位置。
在一些实施例中,对于某一帧的用户行为图像,对于其中是否检测到了目标手势的两种情况,控制器均可以分别获取到光标需要移动的位置信息。
如果检测到了目标手势,即该帧目标手势信息中包含虚拟位置信息,此时,可以根据该虚拟位置信息获取到光标需要移动的位置信息,即用户行为图像所对应的光标位置。
具体的,虚拟位置信息表征的为用户行为图像中识别到的,预设的手指关键点的位置信息,用于表示用户的目标手势的位置信息。但该位置信息为手指关键点位于用户行为图像中的位置,因此,显示设备可以将用户的目标手势映射到显示器中,从而得到光标的位置。需要说明的是,在将用户的目标手势映射到显示器中时,可以根据光标的初始位置进行参照,当首次检测到用户的目标手势时,将该帧图像中手指关键点的位置确定为光标初始的位置,形成一个映射关系。在后续的映射中,可以按照预设的映射方法,将后续用户的目标手势依次映射到显示器中,从而得到各帧图像所对应的光标位置。
在一些实施例中,在获取到光标的位置信息后,考虑到用户的手势运动是立体的,在空中运动的,运动方向不仅有上下左右,还有前后,在光标的映射过程中,如果手势频繁 运动,手势状态不稳定,那么光标会出现抖动等问题,为了使光标运动更平滑,用户体验更好,显示设备还可以对光标位置进行调节优化,使得光标能够动态防抖,移动轨迹平滑、平稳。
显示设备可以根据虚拟位置信息，将目标用户行为图像中的目标手势映射到显示器中，得到原始光标位置F_c。本申请实施例中原始光标位置指的是：动态手势识别模型识别的坐标直接映射到显示器中的坐标。通过对原始光标位置进行调节优化，可以得到目标光标位置，本申请实施例中目标光标位置指的是：经过调节优化后，光标真正在显示器中显示的坐标位置。
具体的,显示设备对原始光标位置可以按照以下方法进行调节:
显示设备可以根据目标用户行为图像的上一帧用户行为图像对应的光标位置F_p以及预设的调节阈值获取第一位置数值，同时可以根据原始光标位置和预设的调节阈值获取第二位置数值。根据第一位置数值和第二位置数值可以获取目标用户行为图像对应的目标光标位置F_c1。可以用公式1表示：
F_c1 = E_1*F_p + (1-E_1)*F_c         (1)
其中：
F_c1表示调节后的目标光标位置；
E_1表示预设的调节阈值；
F_c表示调节前的原始光标位置，F_p表示上一帧用户行为图像对应的光标位置。
通过预设的调节阈值,可以根据上一帧图像对应的光标位置对原始光标位置进行调节,从而减小该帧目标手势可能出现的抖动偏移,以优化光标的移动。
其中,调节阈值可以根据以下方法预先设定:
（公式(2)：预设调节阈值E_1的计算公式，原文以附图形式给出）
其中：
E_1表示预设的调节阈值。
k表示第一调节参数；g表示第二调节参数；第一调节参数和第二调节参数均为0-1之间的数，可以由相关技术人员自行进行设定。
S_g表示目标用户行为图像的尺寸。用户行为图像的尺寸指的是用户行为图像相对于显示器的尺寸。
具体的，显示设备可以将拍摄到的用户行为图像展示在显示器中，使得用户能够直观的确定当前的手势情况。图13为本申请实施例提供的显示器显示摄像头区域的示意图。其中，摄像头区域中显示摄像头拍摄的画面情况，整个摄像头区域的尺寸可以由显示设备进行设置。用户可以选择开启或者关闭摄像头区域，但在摄像头区域关闭时，其尺寸大小和开启时设定为相同。
S_c表示目标用户行为图像的前一帧用户行为图像对应的光标位置处的控件的尺寸。对于每一次光标移动后，均可认为光标选中了某个控件。因此，可以根据上一帧光标选中的控件设定调节阈值。
S_tv表示显示器的尺寸。
在对原始光标位置进行调节后,可以确定出目标用户行为图像对应的目标光标位置,即光标需要移动到的位置。
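下面用一段示意性的Python草图对公式(1)的光标位置调节过程加以说明；调节阈值E_1在此作为入参给出，其具体取值按上文由k、g、S_g、S_c、S_tv预先设定，函数名为示意性假设。

```python
def adjust_cursor_position(raw_pos, prev_pos, e1):
    """按公式(1)对原始光标位置进行调节: F_c1 = E_1*F_p + (1-E_1)*F_c。
    raw_pos:  本帧映射得到的原始光标位置 F_c, 形如 (x, y)
    prev_pos: 上一帧用户行为图像对应的光标位置 F_p
    e1:       预设的调节阈值 E_1
    返回调节后的目标光标位置 F_c1。"""
    x = e1 * prev_pos[0] + (1 - e1) * raw_pos[0]
    y = e1 * prev_pos[1] + (1 - e1) * raw_pos[1]
    return (x, y)
```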
在一些实施例中,在目标用户行为图像中,如果没有检测到用户的目标手势,即该帧目标手势信息中不包含虚拟位置信息,此时,显示设备可以预测目标用户行为图像对应的光标位置,从而令光标能够正常移动。
具体的,为了更好地预测出光标的位置,显示设备可以先确定出光标移动的类型。需要说明的是,光标移动的类型可以分为两类:直线运动和曲线运动。当光标沿着直线进行移动时,表示用户的手势动作也是沿着直线进行移动的,相对来说会比较稳定,拍摄图像 时一般不会出现丢帧现象。但当光标沿着曲线进行移动时,表示用户的手势动作也是沿着曲线进行移动的,此时,相比于直线来说,稳定性较差,导致丢帧率会略高。因此,可以预先设定一个用于检测丢帧的阈值,来判断光标时直线运动还是曲线运动。
显示设备可以检测目标用户行为图像之前的若干帧图像中,可以是预设的检测数量的用户行为图像中,例如20帧图像内,出现丢帧情况,即没有检测到用户的目标手势的用户行为图像的数量,是否超过了预设的检测阈值,可以将检测阈值设定为0。
因此,可以检测前20帧图像中,出现丢帧情况的图像的数量是否大于0,也即检测前20帧图像中是否存在图像出现丢帧情况。如果没有出现过丢帧情况,则认为光标正在做直线运动,本申请实施例中设定为第一类运动;如果出现过丢帧情况,则认为光标正在做曲线运动,本申请实施例中设定为第二类运动。
在一些实施例中,当检测到光标正在做直线运动时,显示设备可以对目标用户行为图像进行第一处理,从而预测得到目标光标位置。
图14为本申请实施例提供的光标沿直线运动的示意图。其中,光标的初始位置为A1,已经获取到的光标位置依次为A2、A3和A4。光标沿着直线运动,A5为预测得到的本帧目标光标位置。
具体的,控制器可以根据目标用户行为图像的前两帧用户行为图像对应的光标位置获取历史光标位置偏移量,用于表征上次光标的移动情况。
控制器可以根据历史光标位置偏移量和第一时间获取光标移动速度。其中,第一时间指的是:预设的动态手势识别模型处理该目标用户行为图像的前两帧用户行为图像所间隔的时间。一般来说,动态手势识别模型处理一帧图像所消耗的时间是固定的,因此,第一时间也可以认为是:动态手势识别模型输出前两帧用户行为图像对应的目标手势信息,所间隔的时间。
需要说明的是,当动态手势识别模型以固定周期等间隔处理图像时,第一时间是一个固定值不变,不需要每次都获取。当动态手势识别模型实时处理图像时,则需要实时获取模型输出前两帧图像的识别结果相差的时间。
控制器可以根据光标移动速度、第二时间和预设的第一预测阈值获取光标的目标光标位置偏移量。其中,第二时间为:预设的动态手势识别模型处理目标用户行为图像和前一帧用户行为图像所间隔的时间,也即模型输出前一帧图像的识别结果的时刻,到模型输出本帧图像的识别结果的时刻,所间隔的时间。控制器可以预测出本次光标的移动情况。
最后,控制器可以对前一帧用户行为图像对应的坐标位置和目标光标位置偏移量求和,通过在前一帧光标的位置处进行本次的偏移移动,可以得到目标光标位置。
预测方法可以用公式3、4表示:
F_0 = v*Δt_0*E_2 + F_{0-1}           (3)
v = (F_{0-1} - F_{0-2})/Δt            (4)
其中：
F_0表示目标光标位置；v表示光标本次移动的速度；Δt_0表示第二时间；
E_2表示预设的第一预测阈值；
F_{0-1}表示前一帧用户行为图像对应的坐标位置；
F_{0-2}表示前第二帧用户行为图像对应的坐标位置；Δt表示第一时间。
其中,第一预测阈值可以根据以下方法预先设定:
（公式(5)：第一预测阈值E_2的计算公式，原文以附图形式给出）
其中：
E_2表示第一预测阈值，例如可以取值为0.6。a1表示第一预测参数；a2表示第二预测参数。第一预测参数和第二预测参数均为0-1之间的数，可以由相关技术人员自行进行设定。
D_f表示预设时间内预设的动态手势识别模型对用户行为图像的处理速率。
C_f表示预设时间内所述图像采集器231采集用户行为图像的速率。
P_f表示预设时间内光标移动的帧率。其中，光标移动的帧率指的是光标移动次数的频率，也可以认为是单位时间内光标移动了多少次，光标从一个光标位置移动到下一个光标位置为移动一次。
具体的,预设时间可以是1s。因此可以获取目标用户行为图像前一秒内,模型处理图像的速率、图像采集器231拍摄图像的速率以及光标移动的帧率。进而,可以设定出第一预测阈值。
根据上述公式,可以预测直线运动下光标的位置坐标。
在一些实施例中,当检测到光标正在做曲线运动时,显示设备可以对目标用户行为图像进行第二处理,从而预测得到目标光标位置。
图15为本申请实施例提供的光标沿曲线运动的示意图。其中,光标的初始位置为A1,已经获取到的光标位置依次为A2-A9。光标位置A4对应图像出现了第一次丢帧现象,由于是第一次丢帧,认定光标当前运动(A1到A4之间的运动)为直线运动情况。A5、A6位置为根据用户的目标手势映射得到的坐标。光标位置A7对应图像出现了第二次丢帧现象,因此认为光标当前运动(A5到A7之间的运动)沿着曲线运动,并根据预测得到光标位置A7。A8、A9位置为根据用户的目标手势映射得到的坐标。此时,目标用户行为图像出现丢帧现象,为整体(预设的检测数量)第三次丢帧,此时,认为光标沿着曲线运动(A8到A10之间的运动),可以预测得到光标位置A10。
对于目标用户行为图像出现了第二次丢帧,因此认为光标沿着曲线运动,目标用户行为图像经预测得到的光标位置可以为A8。
需要说明的是,在进行曲线运动时,预测光标位置的方法和直线运动类似。均可以先获取上次光标的移动情况,即历史光标位置偏移量。
再根据历史光标位置偏移量和第一时间获取光标移动速度。并根据光标移动速度、第二时间和预设的第二预测阈值获取光标的目标光标位置偏移量。
最后,控制器可以对前一帧用户行为图像对应的坐标位置和目标光标位置偏移量求差,通过在前一帧光标的位置处进行本次的偏移移动,可以得到目标光标位置。
具体的预测方法可以用公式6、7表示:
F_0 = F_{0-1} - v*Δt_0*E_3            (6)
v = (F_{0-1} - F_{0-2})/Δt            (7)
其中：
E_3表示第二预测阈值，例如可以取值为0.3。
具体的，第二预测阈值可以根据以下方法预先设定：
E_3 = b*E_2             (8)
其中，b表示第三预测参数。第三预测参数为0-1之间的数，可以由相关技术人员自行进行设定，例如可以取0.5。
根据上述公式,可以预测曲线运动下光标的位置坐标。
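结合前文的丢帧数量判断以及公式(3)、(4)、(6)、(7)，下面给出一段示意性的Python草图：先根据此前预设检测数量的图像中是否出现过丢帧判断光标应进行直线运动还是曲线运动，再按对应公式预测本帧的目标光标位置；E_2、E_3等阈值均作为入参给出，函数名为示意性假设。

```python
def predict_cursor_position(prev1, prev2, dt1, dt2, recent_lost_count, e2, e3):
    """prev1, prev2: 前一帧、前第二帧用户行为图像对应的光标位置 F_{0-1}, F_{0-2}
    dt1: 第一时间(模型处理前两帧图像所间隔的时间)
    dt2: 第二时间(模型处理本帧与前一帧图像所间隔的时间)
    recent_lost_count: 本帧之前预设检测数量(如20帧)内出现丢帧的图像数量
    e2, e3: 预设的第一、第二预测阈值
    返回预测得到的目标光标位置 F_0。"""
    # 历史光标位置偏移量与光标移动速度: v = (F_{0-1} - F_{0-2}) / Δt
    vx = (prev1[0] - prev2[0]) / dt1
    vy = (prev1[1] - prev2[1]) / dt1
    if recent_lost_count == 0:
        # 未出现过丢帧, 视为第一类(直线)运动: F_0 = v*Δt_0*E_2 + F_{0-1}
        return (prev1[0] + vx * dt2 * e2, prev1[1] + vy * dt2 * e2)
    # 出现过丢帧, 视为第二类(曲线)运动: F_0 = F_{0-1} - v*Δt_0*E_3
    return (prev1[0] - vx * dt2 * e3, prev1[1] - vy * dt2 * e3)
```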
在一些实施例中,考虑到连续多帧用户行为图像可能会出现丢帧情况,可以设定一个连续丢帧的预设阈值,可以是4。在这个阈值内,如果用户行为图像持续出现丢帧情况,显示设备可以持续预测光标的位置。
具体的,在对本帧的目标用户行为图像进行手势识别之前,可先检测本帧图像之前预设阈值的用户行为图像中,可以是4帧用户行为图像中,这些图像是否全部都没有检测到目标手势,也即目标用户行为图像的前4帧图像是否全部丢帧。
如果是,则可以认为用户不再利用手势指示光标位置,此时用户可能已经放下手,已经确定了光标应该选中的控件。此时,可以控制光标不进行移动,认为本轮用户手势运动结束。直到摄像头再次拍摄到用户手势时,可以进行下一轮手势识别。
如果不是,则认为用户还在利用手势指示光标位置,只是前几帧因为一些情况全部丢 帧。此时,控制器可以继续对目标用户行为图像进行手势识别,并确定本帧图像对应的光标位置。
在一些实施例中,对于预测光标位置的情况,只会出现在光标已经开始移动之后的过程中,也即光标的第一个位置不会是预测得到的,只会是根据用户指示得到的,具体的,当显示设备进入光标控制模式后,可以设定为:当首次检测到用户的目标手势后,允许光标开始移动,以避免首帧图像出现丢帧的情况。
在一些实施例中,在确定了目标用户行为图像对应的目标光标位置后,可以根据光标位置确定用户的手势移动轨迹。考虑到每两帧光标位置之间的距离会比较短,因此可以认为两帧光标位置之间光标进行直线运动。可以使目标光标位置从上一帧的光标位置沿直线到达目标光标位置。即将目标光标位置和上一帧的光标位置相连接,得到手势移动轨迹。
控制器可以再令光标沿着手势移动轨迹进行移动。
在一些实施例中,当光标沿着手势移动轨迹进行移动后,用户可能不再控制光标进行移动。此时,光标可能会位于某个控件的区域范围内,也可能位于某个控件的边缘处。当光标位于某个控件的区域内,可以确定用户选择了该控件,显示设备可以让用户确认是否触发该控件。然而如果光标位于控件的边缘及以外的区域,导致未能选中某个控件,显示设备便无法令用户确认触发控件。
因此,当光标没有明确落入到某个控件的区域内时,需要确定出光标停止不动时对应的控件,即确定用户最终选择的控件。
具体的，可以根据光标的位置确定出预设尺寸的区域。例如，预设尺寸可以是500*500。对于光标位置(a,b)，可以以该坐标为中心，确定出尺寸为500*500的一块区域。
控制器可以确定出该区域内所有的控件,并获取所有的控件到光标的距离。控件到光标的距离设定为:控件的四条边的中点到光标的平均距离。如图16所示,光标的位置为点O。对于一个控件A,其四条边的中点依次为B1、B2、B3、B4。四个中点到光标的距离依次为X1、X2、X3、X4。因此,控件到光标的距离为:(X1+X2+X3+X4)/4。
在一些实施例中,考虑到当控件尺寸较小时,其四边中点到光标距离可能会较短,从而影响判断结果。因此,还可以按照下述方法确定每个控件到光标的距离。
具体的,本申请实施例中设定为光标和控件有两种位置关系。一种是光标和控件位于同一水平方向或同一竖直方向,一种是光标和控件既不位于同一水平方向也不位于同一竖直方向。
图17为本申请实施例提供的光标和控件的位置关系的示意图。
光标位置为(a,b)。对于一个控件来说,设定其尺寸为宽w、高h。四个顶点的坐标依次为:(x-w,y-h)、(x+w,y-h)、(x+w,y+h)、(x-w,y+h)。控件的两条竖直边对应的竖直直线分别为L1和L2,两条水平边对应的水平直线分别为L3和L4。本申请实施例中设定,如果光标位于竖直直线之间的区域内,则认为光标和控件位于同一竖直方向;如果光标位于水平直线之间的区域内,则认为光标和控件位于同一水平方向。如果光标没有位于这两个区域内,则认为光标和控件既不位于同一水平方向也不位于同一竖直方向。如图17中,光标O1和控件A位于同一竖直方向,光标O2和控件A位于同一水平方向,光标O3和控件A既不位于同一水平方向也不位于同一竖直方向。
具体的,对区域内所有的控件来说,可以判断光标位置和控件位置的关系。
如果x<a<x+w,且y<b<y+h。说明光标位于该控件区域内,此时无需考虑其他控件,可以确定该控件为用户选择的控件。
如果满足x<a<x+w,但不满足y<b<y+h,则光标和控件位于同一竖直方向。
如果不满足x<a<x+w,但满足y<b<y+h,则光标和控件位于同一水平方向。
如果不满足x<a<x+w,也不满足y<b<y+h,则光标和控件既不位于同一水平方向也不位于同一竖直方向。
如果光标和控件位于同一竖直方向或同一水平方向,可以按照下述方法计算光标和控件之间的距离。
分别获取控件A四条边到光标O的距离:T1、T2、T3、T4。并将四个距离中数值最小的结果作为光标和控件之间的距离,距离为:MIN(T1、T2、T3、T4)。
如果光标和控件既不位于同一水平方向也不位于同一竖直方向,可以按照下述方法计算光标和控件之间的距离。
分别获取控件A四个顶点到光标O的距离:P1、P2、P3、P4。并将四个距离中数值最小的结果作为光标和控件之间的距离,距离为:MIN(P1、P2、P3、P4)。
控制器可以将距离最短的控件设定为光标选中的控件。
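下面给出一段示意性的Python草图，说明上述“按光标与控件的位置关系计算距离、并将距离最短的控件设定为光标选中的控件”的过程；草图中以(x, y, w, h)描述控件（假设(x, y)为左下角坐标、w和h为宽高，与正文x<a<x+w、y<b<y+h的判断条件对应），同向情形下的边距离此处简化为到四条边所在直线距离的最小值，均为示意性假设。

```python
import math

def control_distance(cursor, ctrl):
    """cursor: 光标位置 (a, b); ctrl: 控件 (x, y, w, h), 此处假设(x, y)为左下角坐标。
    光标位于控件区域内时距离记为0; 同一竖直/水平方向时取到四条边所在直线距离的最小值(简化);
    否则取到四个顶点距离的最小值。"""
    a, b = cursor
    x, y, w, h = ctrl
    in_x = x < a < x + w                      # 与控件位于同一竖直方向
    in_y = y < b < y + h                      # 与控件位于同一水平方向
    if in_x and in_y:
        return 0.0                            # 光标位于控件区域内
    if in_x or in_y:
        return min(abs(a - x), abs(a - (x + w)), abs(b - y), abs(b - (y + h)))
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return min(math.hypot(a - cx, b - cy) for cx, cy in corners)

def pick_control(cursor, controls):
    """在预设尺寸区域内的所有控件中, 选取与光标距离最短的控件作为光标选中的控件。"""
    return min(controls, key=lambda c: control_distance(cursor, c))
```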
当用户触控确认键时,显示设备可以触发光标选中的控件。
随着人工智能(Artificial Intelligence,AI)技术的发展,越来越多的手势交互方式可被应用于显示设备的交互过程中。手势交互的目的在于通过检测用户做出的特定手势动作,控制显示设备执行相对应的控制指令。例如,用户可以通过向左或向右挥手的动作,代替遥控器等控制装置上的左右方向键,控制显示设备进行快退或快进播放操作。
通常,显示设备所支持手势交互方式基于静态手势,即用户在做出特定手势动作时,手型是保持不变的。例如,在进行向左或向右挥手的动作时,用户需要保持五指并拢,且手掌平行移动进行挥摆动作。在进行交互时,显示设备可以先根据手势类型识别算法检测静态手势,再根据手势类型执行相应的控制动作。
可见,这种基于静态手势的交互方式所支持的手势数量较少,只适用于简单的交互场景。为了增加支持的手势数量,部分显示设备还支持动态手势交互,即通过一个时间段内的连续动作,实现特定的手势交互。但是,由于动态手势检测过程中所使用的模型限制,使得上述动态手势交互过程不支持用户自定义手势,无法满足用户的需求。
在一些实施例中,动态手势识别可以采用深度学习等训练方法进行模型训练获得动态手势识别模型,再将多个连续帧手势图像数据输入训练获得的动态手势识别模型,经过模型内部的分类算法计算得到当前多帧手势图像对应的目标手势信息。目标手势信息通常可以关联一个特定的控制指令,显示设备200可以通过执行该控制指令,实现动态手势交互。
例如,可以基于手势图像数据生成训练数据,训练数据中每一帧用户行为图像都被设置有分类标签,即表示当前帧用户行为图像对应的手势类型。同时,多个连续帧用户行为图像还被统一设置动态手势标签,即表示多帧用户行为图像对应的动态手势。在生成训练数据后,可以将包含多个连续帧手势图像的训练数据输入初始动态手势识别模型,以获得识别模型输出的分类概率。再将模型输出的分类概率与训练数据中的分类标签进行损失函数运算,计算分类损失。最后根据计算获得的分类损失反向传播调整识别模型中的模型参数。重复上述“分类计算-损失计算-反向传播”的模型训练过程,通过大量训练数据即可获得能够输出准确分类概率的识别模型。利用训练获得的识别模型,显示设备200可以将实时检测的多个连续帧用户行为图像输入该识别模型,从而获得识别模型输出的分类结果,确定多个连续帧用户行为图像对应的动态手势,再匹配动态手势对应的控制指令,实现动态手势交互。
在一些实施例中,动态手势交互还可以支持用户的自定义操作,即提供一种显示设备控制方法,所述方法可应用于显示设备200。为了满足用户与显示设备的手势交互,显示设备200应至少包括显示器260和控制器250。并内置或外接至少一个图像采集器231。其中,显示器260用于显示用户界面,辅助用户的交互操作;图像采集器231用于采集用户输入的用户行为图像。图18为本申请实施例提供的动态手势交互流程示意图,如图18所示,控制器250则被配置为执行所述显示设备控制方法对应的应用程序,包括如下内容:
获取手势信息流。其中,所述手势信息流是由图像采集器231通过连续的图像拍摄而生成的视频数据,因此所述手势信息流包括连续多帧用户行为图像。显示设备200在启动 手势交互后,可以向图像采集器231发送开启指令,启动图像采集器231进行图像拍摄。在启动图像拍摄后,用户可以在图像采集器231的拍摄范围内做出动态手势,则图像采集器231可以随着用户的动态手势动作,连续拍摄多帧用户行为图像。并实时将拍摄获得的多帧用户行为图像发送给控制器250形成手势信息流。
由于手势信息流中包括多帧用户行为图像,而用户行为图像是由图像采集器231进行拍摄获得,因此手势信息流中所包含的用户行为图像帧率可以和图像采集器231的图像拍摄帧率相同。例如,当图像采集器231以每秒30帧(30FPS)的帧率进行图像拍摄时,控制器250也可以按照每秒30帧的帧率获取的手势信息流。
但是在一些计算能力较弱显示设备200,过高的帧率将导致控制器250的实时数据处理量过大,影响手势识别的响应速度。因此,在一些实施例中,显示设备200还可以获得较低帧率的手势信息流。为了降低手势信息流的帧率,显示设备200可以在图像采集器231拍摄获得的图像中,等间隔地提取多帧用户行为图像。例如,显示设备200可以在图像采集器231拍摄获得的手势图像中,每间隔一帧提取一帧用户行为图像,从而获得帧率为15的手势信息流。显示设备200还可以向图像采集器231发送用于帧率调节的控制指令,控制图像采集器231每秒只拍摄15帧手势图像数据,从而形成帧率为15的手势信息流。
需要说明的是,由于动态手势的输入过程会受到不同用户动作输入速度的影响,即部分用户的手势输入动作较快,部分用户的手势输入动作较慢。显然,对于动作较慢时输入的手势,相邻帧之间的手势差异较小,则低帧率的手势信息流也能够表征完整的手势输入过程。而对于动作较快时输入的手势,相邻帧之间的手势差异较大,则低帧率的手势信息流有可能丢失部分关键手势,影响手势识别的准确率。因此,为了提高手势识别的准确率,显示设备200应尽可能保持较高的帧率获取用户行为图像,例如,用户行为图像可以是用户的手势交互图像,手势信息流的帧率可维持在15-30FPS区间内。
并且,在一些实施例中,显示设备200还可以根据当前运行负荷,在特定的区间内动态调整手势信息流的帧率,以实现在运算能力充足时,通过获取高帧率手势信息流提高手势识别的准确率;而在运算能力不足时,通过获取低帧率手势信息流减少对控制器250运算能力的过度消耗。
在获取手势信息流后,显示设备200可以对手势信息流中的每帧用户行为图像进行手势识别处理,以便从手势信息流中提取出关键手势信息。其中,手势识别处理可以基于图像识别算法,在用户行为图像中识别手指、关节、手腕等关键点的位置。即关键点坐标用于表征手关节在用户行为图像中的成像位置。
例如，显示设备200可以通过特征形状匹配的方式，在用户行为图像中识别各关键点在当前用户行为图像中的位置坐标。再将各关键点坐标按照设定的顺序组成信息向量。即如图11所示，用于表征手势动作的关键点可以包括21个手指关键点，每个关键点的位置信息都可以通过对应点的坐标进行表示。如对于指尖关键点，拇指指尖坐标为P_T1=(x_t1,y_t1)，食指指尖坐标为P_T2=(x_t2,y_t2)，中指指尖坐标为P_T3=(x_t3,y_t3)……；同理，对于指中关键点，也同样采用上述坐标表示方式，即拇指指中坐标为P_M1=(x_m1,y_m1)……；而指根关键点为P_B1=(x_b1,y_b1)。
上述指尖、指中以及指根坐标可以组合形成用于表示指尖信息、指中信息以及指根信息的向量，即指尖信息F_T为：
F_T=[P_T1, P_T2, P_T3, P_T4, P_T5]
指中信息F_M为：
F_M=[P_M1, P_M2, P_M3, P_M4, P_M5]
指根信息F_B为：
F_B=[P_B1, P_B2, P_B3, P_B4, P_B5]
除上述指尖F_T、指中F_M、指根F_B坐标信息外，显示设备200还可以在用户行为图像中提取掌心坐标P_Palm和手腕坐标P_Wrist。再将这些坐标信息组合形成手势关键坐标集H_Info。即手势关键坐标集H_Info为：
H_Info=[P_Palm, P_Wrist, F_T, F_M, F_B]
可见,上述手势关键坐标集为多个关键点坐标组合成的坐标集。因此基于对上述手势关键坐标集中关键点位置的相互关系,显示设备200可以从根据手势关键坐标集确定关键手势类型。为了确定关键手势类型,在一些实施中,显示设备200可以从手势信息流中提取关键手势信息时,先识别用户行为图像中的关键点坐标,再从数据库中提取预设的关键点标准坐标。其中,关键点标准坐标为显示设备200的运营商通过对人群手势进行统计分析所确定的模板坐标集,每种手势可以设有对应的关键点标准坐标。
在提取关键点坐标和关键点标准坐标后,显示设备200可以计算关键点坐标与关键点标准坐标的差值。如果计算获得的差值小于或等于预设识别阈值,即确定当前用户行为图像中的用户手势与标准手势模板中的手势类型相似,因此可以确定关键点标准坐标对应的手势类型为目标手势类型。
例如，用户对图像采集器231摆出五指并拢手势，则通过对该手势对应的一帧用户行为图像进行识别，可以获得手势关键坐标集H_Info1，再从数据库中匹配与五指并拢手势相近的标准手势，以提取关键点标准坐标H'。通过计算两个坐标集之间的差值，即H=H_Info1-H'，如果差值小于或等于预设识别阈值H"，即H≤H"，则匹配命中该目标坐标集，因此可以确定当前用户行为图像中的目标手势类型为五指并拢手势。
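下面给出一段示意性的Python草图，说明“计算关键点坐标与关键点标准坐标的差值，差值小于或等于预设识别阈值时判定为对应手势类型”的匹配思路；其中以各关键点的平均欧氏距离作为差值的一种可能度量，函数名与数据结构均为示意性假设。

```python
import math

def keypoint_difference(keypoints, template):
    """keypoints、template: 等长的关键点坐标列表 [(x, y), ...] (如21个手指关键点)。
    返回两组关键点的平均欧氏距离, 作为"差值"的一种示意性度量。"""
    dists = [math.hypot(px - qx, py - qy)
             for (px, py), (qx, qy) in zip(keypoints, template)]
    return sum(dists) / len(dists)

def match_gesture_type(keypoints, templates, threshold):
    """templates: {手势类型: 关键点标准坐标} 的字典; threshold: 预设识别阈值。
    差值小于或等于阈值时返回对应的目标手势类型, 否则返回None。"""
    best_type, best_diff = None, None
    for gesture_type, standard in templates.items():
        diff = keypoint_difference(keypoints, standard)
        if diff <= threshold and (best_diff is None or diff < best_diff):
            best_type, best_diff = gesture_type, diff
    return best_type
```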
在一些实施例中,所述关键手势信息还可以包括置信度参数,用于表征各手势类型与标准手势之间的差异。此时,关键手势信息还可以包括以下能够表示关键手势类型的参数项,即手势姿态信息包括但不限于:手面向信息H F(Hand Face),手朝向信息H O(HandOrientation),手朝向偏角信息H OB,左右手信息H S(Hand Side),手势伸缩状态信息H T(Handstretched)等。其中,每个参数项均可以通过上述手势关键坐标集计算获得。
其中,手朝向信息,可用于表示画面中手指指尖的朝向,即如图19所示,指尖朝上为Up,朝下为Down,朝左为Left,朝右为Right,朝前(中)为Center,默认为Unknown,因此,手朝向信息可以表示为:
H O={Up,Down,Left,Right,Center,Unknown}
同理,在识别手朝向信息的同时,还可以根据具体关键点坐标之间的位置关系,确定手朝向偏角信息,等同于手朝向信息的置信度。例如,手朝向虽然检测为Left,但是依然会有偏角,可能不是完全朝向左方,这时就需要根据偏角信息进行一些后续处理,也可以防止误触发。即手朝向偏角可以表示为:
H Ob=a(0<a<90)
显示设备200可以优先提取手朝向信息,即根据左右手和食指关键点信息生成手朝向信息,显示设备200可以使用食指指根信息P B2、小拇指指根信息P B5、手腕信息P Wrist,左右手信息H s生成,手朝向偏角信息H OB,手横向纵向信息H XY,手姿态偏角信息H XB,H YB,最终得到手朝向信息H O。即:
H O=g(H OB,H XY,H XB,H YB)=f(P B2,P B5,P Wrist,H S,α)
生成逻辑如下,计算食指指根P B2和小拇指指根P B5所在向量与x轴方向的偏角f(ΔX,ΔY),该偏角的取值范围为(0°,90°)。根据偏角可得到手朝向信息,再通过设置偏角阈值,用于判断朝向信息是否有效。例如,可以设定偏角阈值β为5,即45±5范围内认为朝向信息无效,手横向纵向信息H XY,即生成公式如下:
（手横向纵向信息H_XY的生成公式，原文以附图形式给出）
式中,ΔX为食指指根和小拇指指根的水平坐标差;ΔY为食指指根和小拇指指根的竖直坐标差;f(ΔX,ΔY)为偏角;β为偏角阈值。
再计算食指指根和小拇指指根的中间点P M,以及计算食指到小拇指之间的四个手指指根连线的中点,然后计算P M和手腕坐标P Wrist的差值ΔY和食指指根和小拇指指根的差值ΔX,进而可得到手朝向俯仰角度信息:
（手朝向俯仰角度H_YB的计算公式，原文以附图形式给出）
式中,H YB为手朝向俯仰角度;ΔX为食指指根和小拇指指根的水平坐标差;ΔY为食指指根和小拇指指根的竖直坐标差。
若俯仰角度过大,则认为是手朝向为Center,具体阈值为α。由于Center朝向的姿态判定误差较大,不能作为动作的判定标准,因此在一些精细度要求不高的场景下,可以直接等同于Unknown。即判断公式如下:
（根据俯仰角度阈值α判定手朝向H_O是否为Center的判断公式，原文以附图形式给出）
式中,H O为手朝向信息,包括Center和其他两种状态,α为手朝向俯仰角度阈值。
显然,对于某些要求动作精细的场景下,需要更为精准的手姿态偏角信息H XB,H YB,因此显示设备200可以对用户的手进行建模,对不同距离预设手属性信息,得到更为精准的手姿态偏角信息。即用户可以预先输入不同距离下的手型(size)信息,后根据当前帧距离信息,食指指根信息P B2、小拇指指根信息P B5、手腕信息P Wrist,左右手信息H s可生成手姿态偏角信息H XB,H YB
根据中间点P M信息,手腕信息P Wrist,手横向纵向信息H XY,左右手信息H s可生成对应的朝向信息。例如,右手纵向情况下,需要对比手腕和中间点的Y轴信息,若中间点y值小于手腕y值,证明为纵向。因此:
H O=l(P M,P Wrist,H XY,H S)
手面向信息H F表示画面中手面向的信息,可以包括表示面向的具体值,即前向为Front,背向为Back。手面向信息H F默认为Unknown。即:
H F={Front,Back,Unknown}
在进行手面向信息的识别过程中,还可以确定手面向偏角信息,用于表征手面向的程度,等同于手面向信息的置信度。例如,用户的手面向信息虽然检测为Front,但是依然会有偏角,可能不是完全朝向前方,这时就需要根据偏角信息进行一些后续处理,以防止误触发手势。即:
H Fb=a(0<a<90)
通过提取手面向信息,以及根据食指指根信息P B2、小拇指指根信息P B5、左右手信息H s、手朝向信息H O生成手面向信息H F,生成逻辑为,以右手朝上为例,若食指指根的x小于小拇指指根的x,证明为Front,更多细节不再赘述,以通用公式代替:
H F=g(P B2,P B5,H S,α,H O)
对于左右手信息,可用于表示画面中的手影像归属于用户的左手还是右手的成像,其中,左手为Left,右手为Right,因此左右手信息可以表示为:
H S={Right,Left,Unknown}
对于手势伸缩状态,可用于表示手指的伸缩状态,即处于伸开状态的手指状态可以表示为1,处于收缩状态的手指状态可以表示为0。显然,对于手指的伸缩状态不仅包括伸开和收缩两种状态,因此也可通过设置不同的值表示伸缩状态,例如,可以设置表示伸缩状态的值为0,1,2。其中,完全收缩为0,半伸开为1,全伸开为2,可根据具体应用场景灵活变换。因此手势伸缩状态可以表示为:
H T=[F 1,F 2,F 3,F 4,F 5](F=0 or 1or 2)
式中,F 1~F 5分别代表五个手指的伸缩状态。
提取手势伸缩状态,在该部分,主要提取每根手指的蜷缩状态,依据为手朝向、手面向、左右手、手势关键点等信息,最终提取得到的蜷缩状态属性为0或1(本实施例以状态属性0或1为例),其中,0为蜷缩状态,1为伸开状态。以H o=Up,H S=Right,H F=Front为例,即用户摆出右手面向摄像头,手朝上的情况,假设食指指尖坐标为50,食指指中坐标为70,食指指尖在指中上方,则表示手指伸开,为1,若食指指尖为30,指中为50,则为蜷缩状态。拇指和其余四指的对比方式不同,在其余四指对比横坐标的时候,拇指需要对比纵坐标。在手朝向为Up和Down的情况下,拇指需要对比x坐标,其余四指需要对比y坐标;而在手朝向为Right和Left的情况下,拇指需要对比y坐标,其余四指需要对比x坐标。其中,拇指需要对比指根和指尖的状态,其余四指需要对比指中和指尖的状态,也可根据具体场景调整对比点位,最终得到5根手指的蜷缩状态信息。
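下面给出一段示意性的Python草图，仅覆盖正文示例的情形（右手、手面向摄像头、手朝向为Up，图像坐标系y轴向下），演示按指尖与指中（拇指为指根）坐标对比提取5根手指蜷缩状态的思路；其中拇指的对比方向为便于演示所作的假设。

```python
def finger_states_up_right_front(tips, mids, roots):
    """五指蜷缩状态提取的简化草图, 仅覆盖正文示例的情形:
    右手面向摄像头且手朝向为Up(图像坐标系y轴向下)。
    tips/mids/roots: 按[拇指, 食指, 中指, 无名指, 小拇指]排列的指尖/指中/指根坐标(x, y)。
    返回[F1..F5], 1为伸开, 0为蜷缩。"""
    states = []
    for i in range(5):
        if i == 0:
            # 拇指: 手朝向为Up/Down时对比x坐标, 且对比指根与指尖;
            # 此处假设指尖x小于指根x视为伸开, 实际方向取决于左右手与手面向
            extended = tips[0][0] < roots[0][0]
        else:
            # 其余四指: 手朝向为Up时对比y坐标, 指尖位于指中上方(y更小)视为伸开
            extended = tips[i][1] < mids[i][1]
        states.append(1 if extended else 0)
    return states
```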
通过上述手势识别过程，可以得到当前帧关键手势信息，包括手面向信息H_F，手朝向信息H_O，手朝向偏角信息H_OB，左右手信息H_S，手势伸缩状态信息H_T。其中，手朝向偏角信息可用于判断手势朝向的准确性，在特定场景可以设置阈值，过滤一些模糊姿态手势，提高手势识别准确率。以右手，手背面向摄像头，手势朝下(偏角86度)，比手势1为例，其最终的关键手势信息G_Info可以表示为：
G_Info={H_F=Back, H_O=Down, H_S=Right, H_T={0,1,0,0,0}, H_OB=86}
由于用户动态手势为一个持续输入过程,即手势交互动作可以划分为多个阶段,因此关键手势信息包括多个阶段的关键手势类型。在一些实施例中,显示设备200可以通过遍历多个连续帧用户行为图像对应的目标手势类型,并确定多帧用户行为图像对应关键手势类型的交集,即根据多个连续帧用户行为图像划分动态手势的多个阶段,每个阶段中的用户行为图像归属于相同的目标手势类型。
例如,显示设备200可以通过对多帧用户行为图像photo1~photon中的手势关键坐标集进行分析,确定出每帧用户行为图像中的关键手势类型type1~typen。再对比多帧用户行为图像的关键手势类型type1~typen,从而将关键手势类型相同的多帧用户行为图像,如photo1~photo30和photo31~photon,分别确定为两个阶段,从而确定这两个阶段的关键手势类型,即type1=type2=…=type30和type31=type32=…=typen。
对于多个阶段对应的置信度参数,在一些实施例中,置信度参数包括关键手势偏角,则显示设备200可以根据关键点坐标与关键点标准坐标,计算手势偏角;再遍历每个阶段中多个连续帧用户行为图像对应的手势偏角,以获得每个阶段中的偏角并集;提取每个阶段中的所述偏角并集中的极值,以作为当前阶段关键手势信息中的关键手势偏角。
在提取出关键手势信息后,显示设备200可以调用检测模型进行动态手势匹配。其中,所述检测模型是一种匹配模型,包含多个以树形结构存储的节点,每个节点中设有手势姿态模板。多个节点可以分别处于不同的层级,除根节点和叶子节点外,每个层级的节点中均设有上级节点,且每个层级的节点均被指定下级节点。例如,在显示设备200的存储器中,可以预先存储多个手势姿态模板,每个手势姿态模板用于表征一种静态手势动作。同时,显示设备200还根据存储的手势姿态模板构建手势检测模型,在所述检测模型中,可以赋予每个手势姿态模板对应的节点属性和下级节点。因此,在显示设备200中,手势姿态模板可以仍然保持原本的存储数量,仅通过赋予节点属性即可构成检测模型。
显然,对于检测模型,每个节点中仅插入一个手势姿态模板,而每个手势姿态模板可 以赋予多个节点属性。例如,一个“抓取-松开”的动态手势包括三个阶段,即五指张开手势、五指蜷缩手势、五指张开手势。其对应在检测模型中的节点和手势姿态模板为:根节点-“五指张开手势”;一级节点-“五指蜷缩手势”;二级节点-“五指张开手势”。可见,对于各节点,仅插入一个手势姿态模板,而对于各手势姿态模板,则对应赋予不同层级的节点属性,即“五指张开手势”目标被赋予了根节点和二级节点两个节点属性。
在检测模型中,根节点用于初始化匹配,可以包括多个手势姿态模板,可用于匹配用户输入的初始手势。例如,根节点可以插入用于表征触发手势交互的手势姿态模板。检测模型中的叶子节点中通常不插入特定的手势姿态模板,而是插入用于表示特定响应动作的控制指令,因此在本申请实施例中,除另有说明外,所述检测模型的节点不包括叶子节点。
在调用检测模型后，显示设备200可以使用检测模型匹配关键手势信息，以获得目标手势信息，其中目标手势信息为在每个阶段关键手势类型与手势姿态模板相同、且置信度参数在置信度区间内的节点组合。因此，目标手势信息可以通过一个动作(action)路径进行表示。为了确定目标手势信息，显示设备200可以将关键手势信息中各阶段的关键手势类型与检测模型中的各层级节点上的手势姿态模板进行匹配。
在使用检测模型进行关键手势匹配的过程中,显示设备200可以先基于各阶段的关键手势类型,在对应层级中匹配类型相同的手势姿态模板。并在匹配命中一个手势姿态模板时,记录该手势姿态模板对应的节点。同时,显示设备200还判断该节点的置信度参数是否在预设的合理置信度区间范围内。如果当前阶段关键手势类型与手势姿态模板相同,且置信度参数在置信度区间内,则开始下一阶段的匹配。
例如,对于“抓取-松开”的动态手势,在用户输入该动态手势以后,显示设备200可以先对第一阶段的“五指张开手势”与根节点中的手势姿态模板进行匹配,当匹配确定“五指张开手势”与一个根节点中的五指张开手势模板相同或相近时,可以判断第一阶段的置信度参数是否在预设的置信度区间内,即手势朝向偏角是否在预设偏角区间内。如果手势朝向偏角在预设偏角区间内,则开始第二阶段关键手势“五指蜷缩手势”与根节点的下级节点进行上述匹配。
经过对每个阶段的关键手势与对应层级的节点进行匹配后,显示设备200可以获得由多个匹配命中节点组成的动作路径,动作路径最终会指向一个叶子节点,叶子节点对应一个目标手势信息,因此,显示设备200可以在匹配完成后得到目标手势信息,并执行目标手势信息关联的控制指令。
例如,根据显示设备200的手势交互策略的设定,抓取-松开”的动态手势可用于删除当前选中的文件,因此,显示设备200可以在匹配获得“根节点-五指张开;一级节点-五指蜷缩;二级节点-五指张开”的动作路径后,获得删除指令,并通过执行删除指令,对当前选中的文件进行删除。
可见,在上述实施例中,显示设备200通过对手势信息流中各阶段的手势姿态信息进行提取,并使用具有树结构节点形式的检测模型对手势姿态信息进行匹配,可以按照手势输入阶段逐层确定动作路径,从而获得目标手势信息。由于检测模型采用树结构的节点形式,因此在进行手势关键信息匹配的过程中,可以避免每次读取动态手势模板,重复检测。此外,树结构的检测模型还支持用户随时插入节点,实现手势录入。并且通过调整每个节点的置信度区间,可以自定义节点匹配过程的命中率,使检测模型能够使用不同用户的手势习惯,实现自定义手势操作。
在一些实施例中,为了使显示设备200可以针对关键手势信息进行手势类型匹配,显示设备200可以在使用检测模型匹配关键手势信息时,先从多阶段关键手势信息中提取第一阶段关键手势类型。再根据第一阶段关键手势类型匹配第一节点,其中,所述第一节点为存储的手势姿态模板与第一阶段关键手势类型相同的节点。匹配获得第一节点以后,显示设备200可以再从关键手势信息中提取第二阶段关键手势类型,其中,第二阶段为第一阶段的后续动作阶段。再根据第二阶段关键手势类型匹配第二节点。同理,第二节点为 存储的手势姿态模板与第二阶段关键手势类型相同的节点,即第一节点指定的下级节点包括第二节点。最后记录第一节点和第二节点,以获得动作分支。
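下面给出一段示意性的Python草图，演示以树形结构节点存储手势姿态模板、并按阶段逐层匹配关键手势信息以得到动作路径的基本思路；其中节点的置信度区间以关键手势偏角区间表示，类名与属性名均为示意性假设。

```python
class GestureNode:
    """检测模型中的一个节点: 存有手势姿态模板、置信度(偏角)区间和指定的下级节点。"""
    def __init__(self, template, angle_range=(0, 90), action=None):
        self.template = template        # 手势姿态模板(如关键手势类型标识), 叶子节点可为None
        self.angle_range = angle_range  # 置信度区间, 此处以关键手势偏角区间示意
        self.action = action            # 叶子节点关联的控制指令, 其余节点为None
        self.children = []              # 指定的下级节点

    def add_child(self, node):
        self.children.append(node)
        return node

def match_action(root_nodes, stages):
    """stages: 关键手势信息, 形如[(关键手势类型, 关键手势偏角), ...]的多阶段序列。
    从根节点开始逐阶段匹配: 类型相同且偏角在置信度区间内时记入动作路径;
    全部阶段匹配完成后返回(动作路径, 叶子节点关联的控制指令)。"""
    path, candidates = [], root_nodes
    for gesture_type, angle in stages:
        hit = None
        for node in candidates:
            low, high = node.angle_range
            if node.template == gesture_type and low <= angle <= high:
                hit = node
                break
        if hit is None:
            return None, None           # 未命中, 可重回根节点重新检测
        path.append(hit)
        candidates = hit.children
    leaf = next((n for n in path[-1].children if n.action), None) if path else None
    return path, (leaf.action if leaf else None)
```

例如，可按“五指张开-五指蜷缩-五指张开”三个阶段构建“抓取-松开”动作对应的节点链，并在末端挂接携带控制指令的叶子节点。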
例如，显示设备200中可以预先注册4种关键手势模板，分别对应的关键手势信息为G_info1-G_info4，对应能够组合出AM_1-AM_5五种动态手势。其中，AM_1-AM_4的第一阶段关键手势类型相同，AM_3-AM_4的第二阶段手势类型也相同，如图20所示，可以得到对应的树形结构检测模型，对应的动态手势表示如下：
（动态手势AM_1~AM_5各自对应的关键手势序列定义，原文以附图形式给出）
在进行关键手势信息匹配时，显示设备200可以按照检测模型树结构的节点存储层级，优先对G_info1和G_info2的关键手势信息进行匹配。若匹配到关键手势信息为G_info1则会根据G_info1对应根节点被指定的下级节点进行继续检测，即匹配关键手势模板为G_info2、G_info3以及G_info4的下级节点。同理，如果在第二层级节点的匹配过程中，匹配到关键手势信息为G_info4，则会继续检测下级节点，即第三层级中的G_info2和G_info3对应的节点。依次进行后续层级的节点匹配，直至检测到叶子节点，如在第三层级中匹配命中G_info3的节点，则会返回动作AM_3。若在一个层级节点的匹配期间，检测到检测模型当前层级节点中未存储的其他动作，则会重回树根节点，重新检测G_info1和G_info2。
需要说明的是,上述实施例中,第一阶段、第二阶段以及第一节点和第二节点仅仅用于表征动态手势中不同阶段的先后关系以及检测模型中不同节点的上下层级关系,并不具有相应的数字含义。在使用检测模型进行关键手势信息的匹配过程中,同一阶段的手势姿态既可以作为第一阶段也可以作为第二阶段,同理,同一个节点也既可以作为第一节点也可以作为第二节点。
例如,在使用检测模型进行关键手势信息匹配的开始阶段,需要对开始阶段的关键手势信息与检测模型中的根节点进行匹配,此时,开始阶段为第一阶段,开始阶段的下一个阶段为第二阶段;匹配命中的根节点为第一节点,根节点的下一层级匹配命中的节点为第二节点。而在开始阶段完成匹配后,显示设备200则会继续使用检测模型对关键手势信息进行匹配。此时,开始阶段的下一阶段为第一阶段,第一阶段的下一个阶段为第二阶段;而在根节点下一层级节点中匹配命中的节点为第一节点,第一节点下一层级匹配命中的节点为第二节点。因此,在使用检测模型进行匹配的过程中,可以重复上述过程,直至匹配到最终的叶子节点。
具有树结构的检测模型还支持用户的手势录入过程,即在一些实施例中,显示设备200可以在根据第二阶段关键手势类型匹配第二节点时,遍历第一节点的下级节点存储的手势姿态模板;如果所有下级节点存储的手势姿态模板均与第二阶段关键手势类型不同,即用户输入的动态手势为一种新的手势,此时可以触发显示设备200进行手势录入,即控制显示器260显示录入界面。
录入界面可以提示用户进行手势录入,为了获得准确的动态手势,在进行手势录入的过程中,录入界面可以通过提示消息,提示用户重复多次摆出需要录入的动态手势。即用户对同一行为进行多次循环录入。同时,用户还可以通过录入界面指定录入的动态手势所关联的控制指令。显示设备200则在用户每次进行录入时,按照上述示例提取关键手势信 息,并与检测模型的节点进行匹配,当在其中一个层级的节点中未匹配到关键手势模板时,根据对应阶段的关键手势类型,在当前层级添加新节点。
为了减少手势录入过程对用户手势交互操作的影响,在一些实施例中,显示设备200可以在显示录入界面前,通过提示消息或窗口询问用户是否启动录入,并接收用户基于该窗口输入的指令。如果用户输入了录入手势信息,则可以接收用户基于录入界面输入的录入手势信息,并响应于录入手势信息,为检测模型设置新节点,新节点为第一节点的下级节点。最后在新节点存储对应阶段的手势类型,以作为新节点的手势姿态模板。
可见,在上述实施例中,显示设备200可以基于树结构的检测模型实时进行动态手势录入,通过确定待录入Action并录入用户行为,检测行为树结构中是否有对应Action分支。若没有对应Action分支,则进行手势关键姿态提取,然后得到对应的行为模板,将对应节点插入行为树,完成动态手势录入。显然,在进行动态手势录入的过程中,如果用户输入的动态手势在检测模型中有对应Action分支,则根据分支模板对用户行为进行检测,若检测成功,则无需对检测模型的节点状态进行改变。
在一些实施例中,显示设备200在使用检测模型对关键手势信息进行匹配时,还可以对相应的置信度进行判断,其中,置信度可以包括手势偏角和关键手势维持帧数。对于手势偏角,显示设备200可以在匹配命中一个节点后,获取检测模型中对应节点预设的置信度区间;再对比当前阶段关键手势偏角与对应节点的置信度区间。如果关键手势偏角在置信度区间内,则记录对应的当前节点并开始当前节点的下级节点匹配;如果关键手势偏角不在置信度区间内,则确定手势偏差较大,因此需要进一步判断或者进行适应性调整。
由于置信度参数不在置信度区间内可能是用户输入习惯造成的,显示设备200还可以针对用户习惯调整检测模型参数。因此,在一些实施例中,如果在使用检测模型对关键手势信息进行匹配的过程中,一个阶段的关键手势类型与节点中的手势姿态模板相同,但关键手势偏角不在置信度区间内,显示设备200还可以按照手势偏角修改置信度区间。
需要说明的是,在进行模板匹配时,显示设备200可以对手朝向、手面向、手指伸缩信息进行匹配,若匹配成功,再检测置信度阈值是否成功匹配,若成功匹配则认为手势匹配成功。而在进行手势录入时,显示设备200只需要对手朝向、手面向、手指伸缩信息进行匹配。若匹配成功即算模板匹配成功,若动态手势中的所有手势都匹配成功,则认为动态手势匹配成功,最后根据其中最佳置信度进行模板置信度优化。
其中,最佳置信度可以通过多次输入用户行为图像时的部分关键帧进行计算获得。例如,在手势检测过程中,动态手势中有个五指向上的动作,这个动作在特定顺序中出现了10次,而检测时只要检测到三次就认为检测到该手势。则在这10次中会有8个连续手势符合标准(10-3+1),需要选取其中置信度平均最低的那一次,因为在手势开始和结束的阶段,由于手势和其他手势连接动作处可能会有较大偏角,导致偏角值过大,若采用该部分偏角值为置信度值,会出现很多误检测情况。
对于关键手势维持帧数这一置信度参数,其为用户行为图像中与第一阶段关键手势类型相同的连续帧数。在一些实施例中,显示设备200还可以在根据第二阶段关键手势类型匹配第二节点前,获取维持帧数;如果第一阶段关键手势类型的维持帧数大于或等于帧数阈值,即用户较长时间的保持了一个手势动作,不属于误输入的情况,因此可以根据第二阶段关键手势类型匹配第二节点。而如果第一阶段关键手势类型的维持帧数小于帧数阈值,当前输入与预定的动态手势可能存在不同,因此可以按照上述实施例启动手势录入,即控制显示器260显示录入界面,以更新置信度区间。
例如,在一个手势交互动作过程中,会出现多种手势类型,因此,需要提取其中较为明显的特征手势来作为该动作的特征姿态。其中,核心的手势姿态特征为手朝向和手指伸缩状态,因此,显示设备200可以对动作帧进行手势关键点识别和关键手势信息提取;再对关键手势信息进行循环匹配,若手势面向、手朝向、左右手、手指伸缩状态相同,则判断为同类手势。每检测到一次同类手势,就更新偏角信息和同类手势数量信息,偏角信息 取最大范围,同类手势数量信息需要大于阈值。该阈值会根据帧率确定,也可以设置为固定值,如设置为3。对动作帧进行处理,选取其中符合条件的手势姿态,在对多个动作帧进行处理时,取动作交集,每个动作姿态的参数取并集,最终得到对应的关键手势模板。
由于用户在录入某个手势时,做的动作比较标准,但在使用手势交互时,则可能比较随意,不太在意姿势是否标准。尤其在用户比较着急的时候,可能做的手势很不标准。导致显示设备200在进行动态手势检测时识别不准确,降低用户体验。
为了改善上述问题,提高用户体验,在一些实施例中,显示设备200还可以在进行动态手势检测时,采取伪跳转的方式。即显示设备200可以获取中间阶段置信度参数,所述中间阶段为关键手势信息的多阶段中,位于开始阶段和结束阶段之间一个阶段。再对比中间阶段置信度参数与对应节点的置信度区间,如果中间阶段置信度参数不在对应节点的置信度区间内,标记中间阶段对应的节点为预跳转节点。再按照检测模型对预跳转节点的下级节点执行匹配,以根据预跳转节点的下级节点匹配结果确定目标手势信息。
在按照检测模型对预跳转节点的下级节点执行匹配时,显示设备200可以获取预跳转节点的下级节点匹配结果;如果匹配结果为命中任一下级节点,记录预跳转节点和命中的下级节点,以作为目标手势信息的节点;如果匹配结果为未命中下级节点,舍弃预跳转节点,重新从上级节点进行匹配。
例如,如图21所示,在检测到动作G1后,会进入后续动作G2的检测。此时,如果出现一个动作G2,但是置信度参数超出置信度区间,显示设备200则会进行一次伪跳转,即同时进行动作G1的后续检测和动作G2的后续动作检测。若进行伪跳转后检测到动作G3,则认为之前的伪跳转成立,直接进入动作G3。如图22所示,若进行伪跳转后未检测到动作G3,但是出现动作G4,而动作G1和动作G4刚好组成另一个Action路径,则认为此次伪跳转不成立,继续进行动作G4后续动作检测。
为了更好的实施伪跳转的方式,显示设备200可以设置一个伪跳转阈值,如不在置信度区间的一个特定置信度参数值,则在置信度参数小于伪跳转阈值时才进行伪跳转。并且,每进行一次伪跳转都会有提示,用户可以通过特定按键或特定手势删除此次伪跳转。在伪跳转一定次数后,显示设备200会对伪跳转涉及的Action节点进行优化,增大指定阈值以适应用户动作风格。
其中,显示设备200可以通过多种方式更新伪跳转阈值,例如,每进行一次伪跳转,就弹出提示,默认会更新Action节点信息,若用户认为此次检测为误检测,则只需删除此次识别即可。显示设备200也可以在多次伪跳转后更新伪跳转阈值,以获得更好的用户体验。此外,对于伪跳转过程,还可以设定一个次数阈值,即在检测过程中,有多次伪跳转,那么超过一定次数后,则认为前面的伪跳转无效。
基于上述显示设备控制方法,本申请的部分实施例中还提供一种显示设备200。所述显示设备200包括:显示器260、图像采集接口以及控制器250。其中,显示器260被配置为显示用户界面;图像采集接口被配置为采集用户输入的用户行为图像;如图23、图24所示,控制器250被配置为执行以下程序步骤:
获取手势信息流,所述手势信息流包括连续多帧用户行为图像;
从所述手势信息流中提取关键手势信息,所述关键手势信息包括多个阶段的关键手势类型和每个阶段的置信度参数;
使用检测模型匹配所述关键手势信息,以获得目标手势信息,所述检测模型包括多个以树形结构存储的节点;每个所述节点中设有手势姿态模板和指定的下级节点;所述目标手势信息为在每个阶段关键手势类型与手势姿态模板相同,且所述置信度参数在置信度区间内的节点组合;
执行所述目标手势信息关联的控制指令。
具体的,图24为本申请实施例提供的动态手势交互时序关系图,如图24所示,动态手势交互可以包括如下步骤:
S2401:图像采集器采集用户摆出的手势。
S2402:图像采集器将采集到的用户摆出的手势作为手势信息流发送给图像采集接口。
S2403:图像采集接口将接收到的手势信息流发送给控制器。
S2404:控制器基于获取到的手势信息流,检测各个阶段的关键手势类型。
S2405:使用检测模型匹配关键手势信息,以获得目标手势信息。
S2406:执行该目标手势信息关联的控制指令,并通过响应交互使显示器显示相应的内容。
由以上内容可知,上述实施例提供的显示设备200可以在用户输入动态手势后,获取手势信息流,并从手势信息流中提取关键手势信息。再使用检测模型对关键手势信息中各阶段的关键手势类型进行匹配,以获得关键手势类型相同且置信度参数在设定的置信度区间内的节点组合,作为确定的目标手势信息,最后执行目标手势信息关联的控制指令,实现动态手势交互。所述显示设备200基于手势关键点检测动态手势,再基于树结构节点存储形式的检测模型,对关键手势类型进行动态匹配,能够丰富动态手势交互形式,并且支持用户自定义动态手势。
图25为本申请实施例提供的显示设备的另一使用场景的示意图。如图25所示,用户可通过控制装置100来操作显示设备200,或者,设置在显示设备200上的摄像头等视频采集装置201还可以采集包括用户人体的视频数据,并根据视频数据中的图像对用户的手势信息、肢体信息等进行响应,进而根据用户的动作信息执行对应的控制命令。使得用户在不需要遥控器100的情况下,就可以实现对显示设备200进行控制,来丰富显示设备200的功能,提高用户体验。
显示设备200还可与服务器通过多种通信方式进行数据通信。示例的,显示设备200可以通过发送和接收信息,以及电子节目指南(EPG,Electronic Program Guide)互动,接收软件程序更新,或访问远程储存的数字媒体库。服务器可以是一组,也可以是多组,可以是一类或多类服务器。通过服务器提供视频点播和广告服务等其他网络服务内容。
在另一些示例中,显示设备200还可以再增加更多功能或减少上述各实施例中所提到的功能。本申请对该显示设备200的具体实现不作具体限定,例如显示设备200可以是任意的电视机等电子设备。
示例性地,图26为本申请实施例提供的显示设备中另一硬件系统的硬件结构示意图。如图26中示出了图25中显示设备200中的显示设备可以具体包括:面板1、背光组件2、主板3、电源板4、后壳5和基座6。其中,面板1用于给用户呈现画面;背光组件2位于面板1的下方,通常是一些光学组件,用于供应充足的亮度与分布均匀的光源,使面板1能正常显示影像,背光组件2还包括背板20,主板3和电源板4设置于背板20上,通常在背板20上冲压形成一些凸包结构,主板3和电源板4通过螺钉或者挂钩固定在凸包上;后壳5盖设在面板1上,以隐藏背光组件2、主板3以及电源板4等显示设备的零部件,起到美观的效果;底座6,用于支撑显示设备。可选地,图26中还包括按键板,按键板可以设置在显示设备的背板上,本申请对此不做限定。
另外,显示设备200还可以包括声音再现装置(图中未示出)例如音响组件,如包括功率放大器(Amplifier,AMP)及扬声器(Speaker)的I2S接口等,用于实现声音的再现。通常音响组件至少能够实现两个声道的声音输出;当要实现全景声环绕的效果,则需要设置多个音响组件,输出多个声道的声音,这里不再具体展开说明。
需要说明的是,显示设备200可以采用OLED显示屏等具体的实现形式,这样,如图26所示的显示设备200所包含的模板发生相应的改变,此处不做过多说明。本申请对显示设备200内部的具体结构不作限定。
随着电子技术的不断发展,电视机等显示设备能够实现的功能越来越多,例如,显示设备可以通过其设置的视频采集装置拍摄用户的图像,并由处理器对图像中用户的手势信息进行识别后,执行手势信息对应的命令。
然而,目前显示设备通过手势信息确定的控制命令较为单一,造成了显示设备的智能化程度较低、用户体验较差。
为了提高显示的智能化程度,提高用户的体验,下面以具体地实施例,对本申请提供的显示设备的控制方法进行说明,下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例不再赘述。
在一些实施例中,本申请实施例提供的显示设备的控制方法的执行主体可以是显示设备,具体可以是显示设备中的CPU、MCU、SOC等控制器或者控制单元、处理器、处理单元等,本申请后续实施例中,以控制器为执行主体作为示例。则当控制器通过显示设备的视频采集装置获取到视频数据后,根据视频数据的连续多帧图像进行手势识别,进而根据识别到的手势信息执行对应的动作。
在一些实施例中,图27为本申请实施例提供的一种显示设备的控制方法一实施例的示意图,其中,当控制器通过视频采集装置的视频数据中获取到图27右侧的待检测图像、对该待检测图像中的手势A进行识别,能够通过手势识别算法识别出该待检测图像中包括手势信息,该手势信息中包括的“OK”形手势、以及手势的位置、大小等。随后,控制器可以根据当前显示设备的显示器上所显示的,光标位于控件“确定”上,确定该手势信息的“OK”对应的控制命令为“点击确定控件”,最终控制器可以执行该命令。
在另一些实施例中,图28为本申请实施例提供的一种显示设备的控制方法另一实施例的示意图,其中,当控制器通过视频采集装置的视频数据中每一帧图像中的手势进行识别后,根据前后两帧待检测图像相比较得出,待检测图像中用户的手势B从前一帧图像中左侧移动至了后一帧图像中的右侧,说明待检测图像中用户的手势B发生了移动。随后,控制器可以根据当前显示器上所显示的内容为,正在移动的光标C,可以确定手势信息对应的控制命令为“向右侧移动光标”,并且所移动的距离可以与待检测图像中手势信息对应的移动距离相关,本申请后续实施例将提供计算待检测图像中手势移动距离和显示器上光标移动距离的关联方式。
从上述图27和图28所示实施例可以看出,当显示设备中的控制器能够通过视频采集装置采集的视频数据,确定用户的手势信息,进而执行用户通过手势表示出的控制命令,使得用户不用依赖于遥控器、手机等控制装置即可控制显示设备,丰富了显示设备的功能、增添了控制显示设备时的趣味性,能够极大地提高显示设备的用户体验。
本申请对控制器根据一帧待检测图像确定该图像中手势信息的具体方式不做限定,例如可以采用机器学习模型基于图像识别的方式识别出待检测图像中的手势信息等。
在一些实施例中,本申请还提供一种显示设备的控制方式,可以通过定义待检测图像中人体手部的关键点坐标,进而确定手部的手势信息,能够更好地应用于显示设备的场景中。例如,图29为本申请实施例提供的手部关键点坐标的示意图,在如图29所示的示例中,将人体手部按照手指、关节、手掌的位置依次标记1-21的共21个关键点。
图30为本申请实施例提供的手部关键点的不同伸缩状态示意图,其中,控制器在对待检测图像中的手势信息进行识别时,首先通过图像识别等算法确定待检测图像中手的朝向,并在图像中包括手心一侧的关键点时,继续对所有的关键点进行标识,并判断每个关键点的位置。例如,图30中最左侧的图像中,对应于手部中指的9-12号关键点之间的距离较为稀疏且分散,说明中指处于伸展状态,图30中部的图像中,对应于手部中指的9-12号关键点之间,上部较为集中、下部较为分散,说明中指处于半弯折状态;图30右侧的图像中,对应于手部中指的9-12号关键点之间的距离较近且集中,说明中指处于完全的蜷缩状态。因此可以定义不同的关键点之间的距离、分布比例等,对图30中不同的状态进行区分,则根据图30中相同的方式,可以对图29中5个指头各自对应的每个关键点进行识别后,得到待检测图像中的手势信息。
在一些实施例中,本申请还提供一种显示设备的控制方法,控制器可以识别待检测图像中的手势信息和肢体信息,并根据这两种信息共同确定控制命令并执行。例如,图31 为本申请实施例提供的显示设备的控制方法一应用场景的示意图,在图31所示的场景中,显示设备200的具体结构与图25-图26中所示相同,此时,显示设备200的用户可以通过手势和肢体共同来表示控制命令,随后显示设备200通过其视频采集装置采集到视频数据后,显示设备200中的控制器对多帧图像中的待检测图像进行识别,同时识别出待检测图像中用户的手势信息和肢体信息。
图32为本申请实施例提供的使用手势信息和肢体信息共同确定控制命令的示意图,其中,假设图32中左侧的手势信息F为“OK”手势,肢体信息G为手肘指向左上角,则根据手势信息F和肢体信息G可以确定的控制命令为点击显示器左侧所显示的控件;图32中右侧的手势信息H为“OK”手势,肢体信息I为手肘指向右上角,则根据手势信息H和肢体信息I可以确定的控制命令为点击显示器右侧所显示的控件。
结合上述实施例可以看出,本申请实施例提供的显示设备的控制方法,控制器能够根据待检测图像中的手势信息和肢体信息共同确定不同的控制命令,丰富了用户可以使用这种交互方式向显示设备发出的控制命令的数量,进一步提高了显示设备的智能化程度以及用户体验。
在一些实施例中,若显示设备的控制器的计算能力支持,控制器可以对其从视频数据中抽取的每一帧待检测图像进行手势和肢体的信息识别,但是,由于常见的手势和肢体识别所需要的计算量较大,极大地增加了控制器所需的计算量,并且用户大多数时间内也并不是一直在控制显示设备,因此,本申请提供的显示设备设置有至少两个检测模型,记为第一检测模型和第二检测模型,其中,第二检测模型用于对待检测图像中的手势信息和肢体信息进行识别,而第一检测模型的计算量和数据量小于第二检测模型,可以用于对待检测图像中是否包括手势信息进行识别。下面通过图33对本申请实施例提供的显示设备的控制方法进行具体说明。
图33为本申请实施例提供的显示设备的控制方法的流程示意图,如图33所示的控制方法包括:
S3301:按照预设时间间隔,从显示设备的视频采集装置所采集的视频数据的连续多帧图像中,抽取一帧待检测图像。
其中,本申请可应用在如图31所示的场景中,由显示设备中的控制器执行,当显示设备处于工作状态时,其视频采集装置将采集其朝向方向的视频数据,则作为执行主体的控制器获取视频数据后,从视频数据中按照预设时间间隔抽取一帧待检测图像。例如,当视频采集装置所采集的视频数据的帧率为60帧/秒,则控制器可以按照30帧/秒的帧率进行采样,实现每间隔一帧抽取一帧待检测图像进行后续处理,此时预设时间间隔为1/30秒。
S3302:使用第一检测模型,判断待检测图像中是否包括人体的手势信息。
具体地，针对图31中的应用场景，当用户需要控制显示设备时，即可站在视频采集装置朝向的方向上，根据其希望显示设备执行的控制命令，做出相应的手势和肢体的动作，此时视频采集装置采集到的图像中包括目标手势信息和肢体信息；当用户不需要控制显示设备时，视频采集装置在其采集范围内采集的视频图像中不包括目标手势信息和肢体信息。
因此,如果在S3302之前的待检测图像中不包括手势信息,且没有使用第二检测模型对待检测图像进行处理时,控制器在S3302中将使用计算量较小的第一检测模型对待检测图像进行处理,通过第一检测模型判断待检测图像中是否包括手势信息。
在一些实施例中,控制器使用手势类别检测模型作为上述第一检测模型,来实现全局感知算法,进而达到对待检测图像中是否包括手势信息进行判断的目的。其中,全局感知算法是指控制器可以在开机后默认开启并保持运行状态的算法,具有计算量较小、检测类型简单的特点,可仅用于获取特定的信息,并用于开启第二检测模型进行检测等其他非全局功能。
在一些实施例中,第一检测模型是通过多个训练图像训练得到的,每个训练图像中包括不同的待训练手势信息,则控制器使用第一检测模型将学习得到的手势信息与待检测图 像进行比对,从而判断待检测图像中是否包括手势信息,但第一检测模型可不用于具体识别手势信息,而第二检测模型可用于通过具体的关节等识别算法确定出手势信息。
S3303:若S3302中确定待检测图像中包括人体的手势信息,则确定用户希望对显示设备进行控制,控制器随后继续获取待检测图像,并使用第二检测模型对待检测图像中的目标手势信息和肢体信息进行识别。
在一些实施例中,当检测到待检测图像中包括人体的手势信息后,控制器可以继续按照预设时间间隔从视频采集装置采集的多帧图像中抽取待检测图像,并使用第二检测模型代替第一检测模型,对后续抽取的待检测图像进行处理,从而识别出每一帧待检测图像的目标手势信息和肢体信息。或者,控制器还可以减少预设时间间隔,以更少的时间间隔抽取待检测图像。
在一些实施例中,控制器也可以将S3302中确定包括人体的手势信息的待检测图像使用第二检测模型进行处理后,继续使用第二检测模型对后续的待检测图像进行处理,即对用户行为图像进行处理。
S3304:根据S3303中确定的预设数量帧的用户行为图像中的目标手势信息和肢体信息确定对应的控制命令,并执行该控制命令。
在一些实施例中,为了提高识别的准确性,控制器可以连续采集多帧图像进行处理,例如,当S3302中判断待检测图像中包括人体的手势信息,则在S3303中,按照预设时间间隔采集预设数量个(例如3个)用户行为图像后,分别对这3个用户行为图像进行目标手势信息识别和肢体信息的识别,最终在这3个用户行为图像中的目标手势信息和肢体信息相同时,确定根据这些相同的目标手势信息和肢体信息进行后续计算,能够防止因其他因素导致的偶发错误导致的识别不准确。
则当上述预设数量个用户行为图像中的目标手势信息和肢体信息均相同(或者部分相同,且部分相同的比例与预设数量比例大于阈值,例如阈值可以取80%等)时,控制器再根据映射关系,确定该目标手势信息和肢体信息所对应的控制命令。例如,图34为本申请实施例提供的映射关系一实施例的示意图,其中,该映射关系包括多个控制命令(控制命令1、控制命令2…),以及每个控制命令与对应的目标手势信息和肢体信息之间的对应关系,例如:控制命令1对应于手势信息1和肢体信息1,控制命令2对应于手势信息2和肢体信息2……。其具体的实现方式可以参照图32,不同的目标手势信息和肢体信息的组合可以对应于不同的控制命令。
在一些实施例中,上述映射关系可以是预设的、也可以是显示设备的用户所指定的,并可以提前存储在控制器中,使得控制器根据其所确定的目标手势信息和肢体信息,即可从映射关系中确定对应的控制命令并继续执行。
在另一些实施例中,图35为本申请实施例提供的映射关系另一的示意图,在图35所示的映射关系中,目标手势信息和肢体信息分别与一个控制命令对应,此时,控制器可以根据目标手势信息或者肢体信息确定一个控制命令后,使用另一个信息对所确定的控制命令进行验证,从而提高所得到的控制命令的准确性,当两个信息确定的控制命令不同时,说明识别有误,则可以不执行该控制命令或者进行重新识别等处理措施,防止执行错误的控制命令。
在又一些实施例中,本申请提供的映射关系还可以包括对应于“不执行任何命令”的控制命令,例如,图36为本申请实施例提供的一种图像中目标手势信息和肢体信息的示意图,其中,图像中的用户是背部朝向显示设备,此时手部刚好朝向显示设备。虽然用户没有想控制显示设备,但通过图33所示的流程第一检测模型确定当前待检测图像中包括手势信息,随后又通过第二检测模型识别出该目标手势信息和肢体信息后,控制器可以根据映射关系确定当前目标手势信息和肢体信息不执行任何命令。此时的映射关系可以包括例如手势信息为手掌展开、肢体信息为手肘指向斜下方等。
综上,本实施例提供的显示设备的控制方法,控制器能够根据用户行为图像中的目标 手势信息和肢体信息共同确定不同的控制命令,丰富了用户可以使用这种交互方式向显示设备发出的控制命令的数量,进一步提高了显示设备的智能化程度以及用户体验。进一步地,本实施例使用计算量较小的第一检测模型对待检测图像中是否包括手势信息进行识别,只有在第一检测模型确定包括手势信息后,再使用计算量较大的第二检测模型识别目标手势信息和肢体信息,从而能够减少无效的识别所带来的计算量和功耗、提高控制器的计算效率。
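下面给出一段示意性的Python草图，概括上述两级检测流程：先用计算量较小的第一检测模型判断是否包括手势信息，再用第二检测模型识别预设数量帧中的目标手势信息和肢体信息，并在多帧结果一致比例不低于阈值（正文示例为80%）时给出对应控制命令；其中first_model、second_model等接口均为示意性假设。

```python
from collections import Counter
from itertools import islice

def run_two_stage_detection(frame_source, first_model, second_model,
                            command_map, preset_count=3, agree_ratio=0.8):
    """frame_source: 按预设时间间隔产生待检测图像的可迭代对象
    first_model(frame)  -> bool, 判断图像中是否包括人体的手势信息(计算量较小)
    second_model(frame) -> (目标手势信息, 肢体信息)或None(计算量较大)
    command_map: {(目标手势信息, 肢体信息): 控制命令}, 假设两种信息为可哈希的标识
    多帧识别结果一致比例不低于agree_ratio时返回对应控制命令, 否则返回None。"""
    frames = iter(frame_source)
    for frame in frames:
        if not first_model(frame):
            continue                          # 未检测到手势, 继续仅运行第一检测模型
        results = [second_model(f) for f in islice(frames, preset_count)]
        results = [r for r in results if r is not None]
        if not results:
            continue                          # 预设数量帧中均未识别到目标手势信息和肢体信息
        top, count = Counter(results).most_common(1)[0]
        if count / preset_count >= agree_ratio:
            return command_map.get(top)       # 全部或部分相同的识别结果对应的控制命令
    return None
```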
结合上述图33中S3301-S3304,在具体的实现方式中,当控制命令为点击显示器上显示的控件、返回主页、修改音量等一次性的控制操作时,如图S3304执行该控制命令后,即可结束该流程,停止使用第二检测模型识别目标手势信息和肢体信息,并返回S3301中继续抽取待检测图像,再次利用第一检测模型识别手势信息,从而重新执行如图33所示的整个流程。
在另一种具体的实现方式中,当控制命令为控制显示器上的鼠标等目标控件移动至手势信息对应的位置的移动命令时,则在S3304中执行完该移动命令后,应返回S3303中,重复执行S3303-S3304的过程,从而对用户连续的移动动作的检测,来实现对显示器上目标控件的持续移动。
在一些实施例中,上述重复执行S3303-S3304的过程中,如果识别到当前获取的预设数量的用户行为图像中的人体的目标手势信息和肢体信息对应于停止命令、或者通过第二检测模型确定预设数量的用户行为图像中不包括人体的目标手势信息和肢体信息时,均可以结束该流程,停止使用第二检测模型识别目标手势信息和肢体信息,并返回S3301中继续抽取待检测图像,再次利用第一检测模型识别手势信息,从而重新执行如图33所示的整个流程。
在一些实施例中,当控制命令为控制显示器上的鼠标等目标控件移动至手势信息对应的位置的移动命令时,并且控制器在不断重复执行S3303-S3304的过程中,可以理解的是,此时用户的手势应该处于连续移动的状态,一旦移动的过快,控制器在某一次检测的过程中,可能出现无法检测到多帧用户行为图像中目标手势信息和肢体信息的情况,在这个情况下,控制器可以不立即停止执行该流程,而是可以根据前一次或者多次的检测结果,对当前可能出现的目标手势信息和肢体信息进行预测,并根据预测得到的目标手势信息和肢体信息执行后续的移动命令。
例如,图37为本申请实施例提供的目标控件的移动位置的示意图,控制器第①次执行S3303检测到用户行为图像中的目标手势信息K和肢体信息L后,在S3304中,执行将目标控件移动到显示器上①位置的移动命令。控制器第②次执行S3303检测到用户行为图像中的目标手势信息K和肢体信息L后,在S3304中,执行将目标控件移动到显示器上②位置的移动命令。然而,假设用户在第②次检测之后移动的速度过快,导致控制器第③次执行S3303时,未能在用户行为图像中识别出目标手势信息和肢体信息,未能移动显示器上的目标控件,而后续控制器第④次执行S3303又能够检测到用户行为图像中的目标手势信息K和肢体信息L后,在S3304中,执行将目标控件移动到显示器上④位置的移动命令时,将目标控件从显示器上②位置直接移动至④位置的变化较大,给用户带来暂停、卡顿的观看效果,极大地影响用户体验。
因此,本实施例中,当控制器第③次执行S3303时,未能在用户行为图像中识别出目标手势信息和肢体信息时,由于显示器上仍然在控制目标控件的移动,控制器可以根据第①次和第②次所识别的目标手势信息K和肢体信息L的移动速度和移动方向,对第③次的用户行为图像中可能出现的目标手势信息K和肢体信息L进行预测,进而根据预测得到的目标手势信息和肢体信息所对应的预测位置,进而根据所预测得到的目标手势信息和肢体信息,执行将目标控件移动到显示器上③位置的移动命令。
最终,图38为本申请实施例提供的目标控件的移动位置另一示意图,当使用上述预测方法后,对于相同时间间隔所采集的用户行为图像中,按照①-②-③-④变化的目标手势 信息和肢体信息,虽然第③次执行S3303时,未能在用户行为图像中识别出目标手势信息和肢体信息,但还是基于预测的目标手势信息和肢体信息对显示器上的③位置进行了预测,使得整个过程中,显示器上的目标控件将按照①-②-③-④的位置均匀变化,避免了图37中目标控件从位置②直接移动到位置④的暂停与卡顿,极大地提高了显示效果,使得用户通过手势和肢体控制显示设备时的操作效果更为流畅和顺滑,进一步提高了用户体验。
而为了实现上述过程,在一些实施例中,控制器每一次执行S3303之后,都将存储从而记录本次执行S3303所得到的目标手势信息和肢体信息,以供后续一次没有检测到目标手势信息和肢体信息时进行预测。在一些实施例中,当连续多次(例如3次)执行S3303中的过程时都没有检测到目标手势信息和肢体信息,则不再进行预测,而是停止执行本次流程,重新从S3301开始执行。
基于上述实施例,在具体的实现过程中,控制器可以根据第二检测模型的识别结果,维护一个手势移动速度v及移动方向α,根据帧率和多帧间移动距离(一般为三帧)可得到手势移动速度v和移动方向α。当出现手势检测不到的情况但是肢体可检测到的时候,会增加多帧的行动预测(一般为三帧),防止因手势忽然检测不到而出现焦点重置、鼠标卡顿等影响用户体验的状况。根据手势移动速度v和移动方向α可得到下一帧的预测手势位置,当然,需要有一个速度阈值β,若手势移动速度超过阈值β,会固定为速度β,这是防止手势快速导致的速度过快影响体验。
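以下为一段示意性的Python草图，说明根据所维护的手势移动速度v与移动方向α预测下一帧手势位置、并以速度阈值β限幅的做法；帧间隔与方向的表示方式均为示意性假设。

```python
import math

def predict_gesture_position(last_pos, speed, direction, dt, speed_limit):
    """last_pos: 上一次检测到的手势位置(x, y); speed: 手势移动速度v;
    direction: 移动方向α(此处以弧度表示); dt: 两帧之间的时间间隔; speed_limit: 速度阈值β。
    返回预测的下一帧手势位置; 速度超过阈值时按β限幅, 防止速度过快影响体验。"""
    v = min(speed, speed_limit)
    return (last_pos[0] + v * dt * math.cos(direction),
            last_pos[1] + v * dt * math.sin(direction))
```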
在一些实施例中,上述示例中在使用第二检测模型识别用户行为图像的目标手势信息和肢体信息时,并不以一帧用户行为图像的识别结果为准,而是当将预设时间,抽取预设数量个用户行为图像,并在这些用户行为图像中均检测到目标手势信息和肢体信息后,再执行这些相同的目标手势信息和肢体信息对应的控制命令。而在具体的实现过程中,显示设备的控制器可以根据显示设备的工作参数,动态地调整上述预设时间间隔,例如,控制器根据当前负载较轻时,确定预设时间为100ms,即,每间隔100ms抽取一帧用户行为图像,假设预设数量为8,则该预设数量个用户行为图像对应于800ms的时间范围,如果在这个时间范围内,控制器在8帧用户行为图像中均检测到目标手势信息和肢体信息后,说明目标手势信息和肢体信息真实有效,即可行这些相同的目标手势信息和肢体信息对应的控制命令。而当控制器根据当前负载大于阈值时,确定负载较重,确定预设时间为200ms,即每间隔200ms抽取一帧用户行为图像,此时,控制器可以调整预设数量为4,从而同样在4帧用户行为图像对应的800ms的时间范围,确定目标手势信息和肢体信息的真实有效。因此,本实施例提供的控制方法中,控制器可以动态地根据预设时间间隔调整预设数量,并且二者呈反比例对应关系,从而既能够减少控制器在重负载时的计算量、也能够防止预设时间间隔较长时,由于预设数量较大导致识别时间的延长,最终在保证识别的准确的基础上,满足一定的识别效率。
在一些实施例中,图39为本申请实施例提供的显示设备的控制方法的流程示意图,可以作为如图33所示的控制方法一种具体的实现方式,如图39所示,包括以下步骤:
S3901、S3903:对待检测图像进行手势检测,若检测到目标手势信息,则执行S3904,否则执行S3901、S3903。
S3904-S3906:开启手势肢体控制模式,并继续进行肢体识别,确定肢体信息。
S3907-S3908:进行用户行为检测,确定是否检测到用户的点击手势,若是,则执行S3910,否则,执行S3909。
S3909:执行移动相关的控制指令。
S3910:执行点击相关的控制指令,并重置检测模式,停止肢体识别,只开启手势识别,并执行S3901-S3902。
S3901-S3902:对待检测图像进行手势检测,获取用户的目标手势信息,并执行S3907-S3908。
关于图39的具体实现方式及原理与图33所示相同,在本申请实施例中不再赘述。
在一些实施例中,控制器使用第二检测模型能够识别出用户行为图像中人体的目标手势信息,而第一检测模型也是通过包括手势信息的图像训练得到的,因此,控制器在每次执行完如图33所示的整个流程之后,都可以将本次执行时通过第二检测模型识别出的目标手势信息,用于对第一检测模型的训练和更新,从而实现根据当前检测的目标手势信息,对第一检测模型更加有效的更新,因此能够提高第一检测模型的实时性和适用性。
在本申请前述各实施例的具体实现过程中,虽然可以根据用户行为图像中的目标手势信息和肢体信息对显示器进行控制,但是由于显示设备的视频采集装置所采集的待检测图像中,人体可能只位于其中的一小部分区域内,使得当用户在完成控制显示器上控件较长距离的移动操作时,人体的手势信息移动的位置较长,给用户的使用带来不便。因此,本申请实施例还提供一种显示设备的控制方法,通过建立待检测图像中的“虚拟框”与显示器的映射关系,使得用户在控制显示设备时,可以仅通过其手势在虚拟框内的移动,即可实现指示目标控件在显示器上的移动,极大地减少了用户的动作幅度,能够提高用户体验。下面结合具体的实施例,对本申请提供的“虚拟框”以及相关的应用进行说明,其中,虚拟框仅为示例性的称呼,也可以被称为映射框、识别区域、映射区域等,本申请对其名称不做限定。
例如,图40为本申请实施例提供的显示设备的控制方法的流程示意图,如图40所示的方法可以应用于图31所示的场景中,由显示设备中的控制器执行,并用于在显示设备显示鼠标等控件的情况下,识别用户通过手势信息发出的移动该控件的移动命令,具体地,该方法包括:
S4001:当显示设备处于工作状态时,其视频采集装置将采集其朝向方向的视频数据,则作为执行主体的控制器获取视频数据后,从视频数据中按照预设时间间隔抽取一帧待检测图像。并识别待检测图像中的人体的手势信息。
其中,S4001的具体实现方式可以参照S3301-S3303,例如,控制器可以使用第一检测模型对每一次抽取的待检测图像中是否包括手势信息进行判断,并使用第二检测模型对包括手势信息的用户行为图像中的目标手势信息和肢体信息进行识别,具体实现及原理不再赘述。或者,S4001中还可以直接在显示设备显示目标控件、或者运行需要显示目标控件的应用程序时,说明此时可能需要对目标控件进行移动,因此每一次获取待检测图像后,都直接使用第二检测模型对用户行为图像中的目标手势信息和/或肢体信息进行识别,识别出的目标手势信息和/或肢体信息可用于后续确定移动命令。
S4002:当S4001中抽取的第一用户行为图像并进行识别后,控制器确定该第一用户行为图像中包括目标手势信息,则控制器根据第一用户行为图像中的该目标手势信息建立虚拟框,以及建立虚拟框与显示设备的显示器之间的映射关系,并可以在预设的第一显示位置显示目标控件,其中,第一显示位置可以是显示器的中心位置。
示例性地,图41为本申请实施例提供的虚拟框的示意图,其中,当第一用户行为图像中包括目标手势信息K和肢体信息L,且该目标手势信息和肢体信息为展开的手掌、对应于移动显示器上显示的目标控件的命令,则此时,控制器根据目标手势信息K所在的第一焦点位置P为中心,建立虚拟框,并在显示器的中心位置显示目标控件。在一些实施例中,虚拟框的形状可以是矩形,且该矩形的长宽之比与显示器的长宽之比相同,但虚拟框的面积与显示器的面积可以不同。如图41所示,虚拟框与显示器之间的映射关系通过图中虚线表示,在该映射关系中,虚拟框的中点P对应于显示器的中点Q,矩形虚拟框的四个顶点分别对应于矩形显示器的四个顶点,且由于虚拟框的长宽之比与显示器的长宽之比相同,使得矩形虚拟框内的一个焦点位置可以与显示器上的一个显示位置相对应,使得矩形虚拟框内焦点位置变化时,显示器上显示位置能够跟随焦点位置相应地变化。
在一些实施例中,上述映射关系可以通过虚拟框中的焦点位置与虚拟框内的一个目标位置之间的相对距离,与显示器上的显示位置与显示器上的同样的目标位置之间的相对距离表示。例如,以假设虚拟框左下角的顶点P0点为原点建立坐标系,P点的坐标可以表示 为(x,y);以显示器左下角的顶点Q0点为原点建立坐标系,Q点的坐标可以表示为(X,Y)。则该映射关系可以表示为:矩形长边方向的X/x和矩形宽边方向的Y/y。
上述S4001-S4002,控制器完成了矩形虚拟框及映射关系的建立,随后,可以在S4003-S4004中对虚拟框及映射而关系进行应用,使得手势信息对应的焦点位置的移动可以对应于显示器上目标控件的位置移动。
S4003:当第二用户行为图像中包括目标手势信息,且目标手势信息对应的第二焦点位置在矩形虚拟框中时,根据第二焦点位置和映射关系,确定显示器上的第二显示位置。
S4004:控制显示器上的目标控件移动到S4003中确定的第二显示位置。
具体地,图42为本申请实施例提供的虚拟框和显示器的对应关系示意图,其中,假设在第一用户行为图像中,以目标手势信息中的第一焦点位置P建立了虚拟框,此时,同时可以在显示器上的中心的第一显示位置Q点显示的目标控件“鼠标”。随后,当对于第一用户行为图像之后的第二用户行为图像中的虚拟框内,目标手势信息的第二焦点位置P’相对于第一检测图像向右上角方向了移动,此时,控制器可以根据第二焦点位置与虚拟框内左下角目标位置之间的第一相对距离,结合映射关系中的比例,确定出显示器上对应的第二显示位置Q’与显示器上左下角目标位置之间的第二相对距离。最终,控制器可以根据第二相对距离和左下角目标位置的坐标,计算出显示器上第二显示位置Q’的实际位置,并在第二显示位置Q’显示目标控件。
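下面给出一段示意性的Python草图，说明“以焦点位置为中心建立与显示器长宽比相同的矩形虚拟框，并按X/x、Y/y的比例把虚拟框内的焦点位置映射为显示器上的显示位置”的计算方式；虚拟框宽度在此作为入参给出（实际可按人体与摄像头的距离确定），函数名均为示意性假设。

```python
def build_virtual_frame(focus, frame_width, display_width, display_height):
    """以焦点位置focus为中心建立矩形虚拟框, 长宽比与显示器相同。
    返回虚拟框(left, bottom, width, height), 坐标系以左下角为原点。"""
    frame_height = frame_width * display_height / display_width
    return (focus[0] - frame_width / 2, focus[1] - frame_height / 2,
            frame_width, frame_height)

def map_to_display(focus, frame, display_width, display_height):
    """按映射关系X/x、Y/y把虚拟框内的焦点位置映射为显示器上的显示位置。
    焦点位于虚拟框之外时返回None(此时可重新建立虚拟框)。"""
    left, bottom, fw, fh = frame
    x, y = focus[0] - left, focus[1] - bottom   # 焦点相对虚拟框左下角目标位置的距离
    if not (0 <= x <= fw and 0 <= y <= fh):
        return None
    return (x / fw * display_width, y / fh * display_height)
```

例如，虚拟框中心的焦点位置将映射到显示器中心，虚拟框左下角映射到显示器左下角，与图41、图42所示的对应关系一致。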
图43为本申请实施例提供的目标控件移动的示意图,其中,示出了如图42所示的过程中,当第一用户行为图像到第二用户行为图像之间的目标手势信息从第一焦点位置P移动到了第二焦点位置P’,控制器根据焦点位置在虚拟框内的变化,可以分别在显示器上第一显示位置Q和第二显示位置Q’显示目标控件,在这个过程中,给用户呈现出来的观感是,显示器上所显示的目标控件跟随其目标手势信息的移动而相应地移动。
可以理解的是,上述S4003-S4004的过程可以循环重复执行,可以对每一次识别的用户行为图像中,目标手势信息对应的焦点位置确定显示位置,并重复持续控制目标控件在显示器上移动。
在本实施例中,以目标手势信息所在的位置为焦点位置,例如目标手势信息中的一个关键点作为焦点位置,在其他实施例中,还可以以肢体信息的关键点作为焦点位置等,其实现方式相同,不再赘述。
此外需要说明的是,上述示例中,以第一用户行为图像和第二用户行为图像为单帧图像作为示例,如图40也可以与如图33所示的方法相结合,用户行为图像包括多帧用户行为图像,从而根据多帧用户行为图像中识别出来的目标手势信息确定对应的焦点位置。
综上,本实施例提供的显示设备的控制方法,能够通过建立用户行为图像中的“虚拟框”与显示器的映射关系,使得用户在控制显示设备时,可以仅通过其手势在虚拟框内的移动,即可实现指示目标控件在显示器上的移动,极大地减少了用户的动作幅度,能够提高用户体验。
在上述实施例的具体实现过程中,控制器在建立虚拟框时,所建立的虚拟框的大小可以与人体与视频采集装置之间的距离有关。例如,图44为本申请实施例提供的虚拟框的面积示意图,其中,当人体与视频采集装置之间的距离较远时,用户行为图像中手势信息所对应的面积较小,因此可以设置较小的虚拟框;当人体与视频采集装置之间的距离较近时,用户行为图像中手势信息所对应的面积较大,因此可以设置较大的虚拟框。虚拟框的面积可以与距离建立正比例变化的线性倍数关系,或者根据距离分为多级映射关系(即某段距离内对应某个框大小),具体映射关系可以根据实际情况调整。在一些实施例中,控制器可以根据显示设备所设置的红外形式或者其他任意形式的测距单元等方式确定人体与显示设备(视频采集装置设置在显示设备上)之间的距离,或者,控制器还可以根据用户行为图像中手势信息对应的面积确定对应的距离,进而根据手势信息的面积确定虚拟框的面积等。
在一些实施例中,当建立的虚拟框比较靠近用户行为图像的边缘时,由于图像识别处理算法等条件的限制,会降低识别目标手势信息的准确性。因此,控制器还可以为用户行为图像建立围绕其边缘的边缘区域建立控制最佳范围。例如,图45为本申请实施例提供的边缘区域的示意图,可以看出,边缘区域指用户行为图像内、控制最佳范围之外的,与用户行为图像的一个边界之间的距离小于预设距离的区域。在图45上方的用户行为图像中,假设根据第一用户行为图像中的目标手势信息建立的虚拟框完全位于边缘区域之外、位于控制最佳范围之内,则可以继续后续计算。而当控制器根据第一用户行为图像中的目标手势信息建立的虚拟框有部分区域位于边缘区域内,在图45下方的用户行为图像中,虚拟框左侧位于边缘区域内,则控制器可以对虚拟框在横向方向上进行压缩处理,得到了横向压缩后的虚拟框。可以理解,随后根据压缩后的虚拟框也能够与显示器建立映射关系,此时,目标手势信息对应的焦点位置的移动距离,在显示器上将对应于更大的显示位置的变化距离,虽然对于用户而言在横向的体验为目标控件移动较快,但是避免了控制器从用户行为图像的边缘区域识别目标手势信息,能够提高目标手势信息的识别精度,提高整个控制过程的准确。
在上述实施例中,提供了用户行为图像中的虚拟框,使得用户可以通过手势信息在虚拟框内的移动,控制显示器上目标控件的移动,但是在一些情况下,用户由于动作较大、身体整体移动等原因,其手势信息可能移动至虚拟框之外,导致无法识别的情况,影响控制效果。例如,图46为本申请实施例提供的手势信息的状态示意图,其中,在状态S1中,第二用户行为图像中包括目标手势信息,且目标手势信息对应的第二焦点位置可以在建立好的虚拟框K1内部,此时可以正常执行前述实施例中的控制方法,通过目标手势信息在虚拟框中的焦点位置确定目标控件的显示位置。而在图46的状态S2中,第二用户行为图像中包括目标手势信息,且目标手势信息对应的第二焦点位置可能出现在用户行为图像中虚拟框K1之外,此时将无法正常通过目标手势信息在虚拟框中的焦点位置确定目标控件的显示位置。
因此,当控制器识别到第二用户行为图像中手势信息对应的第二焦点位于虚拟框之外的P2点之后,可以重新以此时第二焦点所在的P2点位置中心,重新建立虚拟框K2,并建立虚拟框K2与显示器之间的映射关系。图47为本申请实施例提供的重新建立的虚拟框的示意图,可以看出,图47中重新建立的虚拟框K2内,第二焦点位置P2位于虚拟框K2的中心,因此,控制器此时还需根据第二焦点位置P2,在显示器上控制目标控件在中心位置显示,给用户也带来重置目标控件的观看效果,从而避免了由于手势信息移除虚拟框后导致的无法控制目标控件的问题。
图48为本申请实施例提供的重新建立的虚拟框的另一示意图,其中,在这种方式中,当出现图46中S2状态所示的手势信息出现在待检测图像中虚拟框K1之外时,控制器重置虚拟框。并且此时,假设控制器根据前一个用户行为图像中目标手势信息在虚拟框K1中的位置信息,在显示器上的第一相对位置Q1显示目标控件,此时,根据显示器上第一相对位置Q1在整个显示器内的相对位置关系,重新建立虚拟框K2,使得第二焦点位置P2在虚拟框K2内的相对位置关系,与第一相对位置Q1在显示器内的相对位置关系相同。因此,控制器可以继续在第一相对位置Q1显示目标控件,在不会出现目标控件跳变到显示器位置中心位置的情况下,完成了虚拟框K2的重置。后续的用户行为图像中,目标手势信息在虚拟框K2内变化时,控制器即可根据目标手势信息在虚拟框K2中的焦点位置确定目标控件的显示位置,从而实现在用户不可知的情况下完成焦点重置,既能够避免了由于目标手势信息移除虚拟框后导致的无法控制目标控件的问题,又能够使整个过程更加流畅,进一步提高用户的使用体验。
在一些实施例中,当控制器执行上述过程,重新建立虚拟框之后,可以在显示器上显示相关的提示信息,来提示用户当前在显示器上已经重新建立了虚拟框,并提示重新建立后的虚拟框的相关信息,例如,控制器可以在显示器的边缘位置显示文字、图像等形式的 信息,提示用户虚拟框已经重建。或者,当控制器在上述过程中确定要重新建立虚拟框之后,还可以在显示器上显示提示更新虚拟框的信息,并在接收到用户的确认信息之后,再执行重建虚拟框的过程,使得整个过程用户可控,并且根据用户的意图进行重建,防止因用户主动离开等情况下无效的重建。
在一些实施例中,上述控制目标控件的移动过程中,当控制器在控制过程中,在连续预设数量个用户行为图像中,都没有识别到目标手势信息,则可以停止在显示器上显示目标控件,从而结束图40所示的流程。或者,当控制器在一定预设时间段内处理的用户行为图像中都不包括目标手势信息,也可以停止显示器上显示目标控件,结束流程。又或者,当控制器在控制过程中,识别到用户行为图像中包括的目标手势信息对应于停止命令时,同样可以停止显示器上显示目标控件,结束流程。
在一些实施例中,在如图40所示的方法执行过程中,控制器将根据每一帧用户行为图像内,目标手势信息位于虚拟框内的焦点位置,确定显示器上的显示位置,并在显示位置上显示目标控件。在一种具体的实现方式中,图49为本申请实施例提供的目标控件的移动时的示意图,从图49中可以看出,假设当控制器确定用户行为图像1中目标手势信息位于虚拟框内的焦点位置P1、从而控制显示器上显示位置Q1显示目标控件,用户行为图像2中目标手势信息位于虚拟框内的焦点位置P2、从而控制显示器上显示位置Q2显示目标控件,用户行为图像3中目标手势信息位于虚拟框内的焦点位置P3、从而控制显示器上显示位置Q3显示目标控件。然而,上述过程中,由于用户在做出上述手势时,可能在P1-P2的过程中移动过快,使得显示器上所显示的目标控件在Q1-Q2之间移动过程,给用户带来移动速度不均匀、目标控件跳变的观感。
因此,当控制器确定第二焦点位置后,控制器所进行的处理可以参照图50中的状态变化,其中,图50为本申请实施例提供的目标控件的移动时的另一示意图。如图50所示,当控制器确定虚拟框中的第一焦点位置P1和第二焦点位置P2后,还对第二焦点位置与第一焦点位置之间的距离与预设时间间隔进行比较,如果P1-P2之间的距离与预设时间间隔(即抽取第一焦点位置和第二焦点位置所在用户行为图像的间隔时间)之比大于预设阈值,说明目标手势信息的移动速度过快,此时如果继续根据第二焦点位置确定目标控件的第二显示位置并显示目标控件,可能带来如图49所示的显示效果。因此,控制器在第一焦点位置和第二焦点位置之间确定第三焦点位置P2’,其中,第三焦点位置P2’与第一焦点位置P1之间的距离与预设时间间隔之比不大于预设阈值,以及,第三焦点位置P2’可以是位于P1-P2之间连接线上的一点,P1、P2’和P2呈线性连接关系。随后,控制器可以根据第三焦点位置P2’以及映射关系,确定显示器上的第二显示位置Q2’,并控制目标控件从第一显示位置Q1移动到第二显示位置Q2’。
而在上述移动过程中,由于手势信息移动至第二焦点位置P2,而显示器上所显示的目标控件并没有移动到第二焦点位置对应的显示位置Q2,而是移动到第三焦点位置P2’对应的第二显示位置Q2’,因此,在控制器在处理第二用户行为图像之后的第三用户行为图像时,若第三用户行为图像中包括目标手势信息、且目标手势信息对应的第四焦点位置P3位于矩形虚拟框中,同时,第四焦点位置P3与第三焦点位置P2’之间的距离与预设时间间隔之比不大于预设阈值时,可以根据映射关系确定第四焦点对应的第三显示位置Q3,并控制显示器上的目标控件从第二显示位置Q2’移动到第三显示位置Q3。
最终,在上述整个过程中,在P1-P2位置的目标手势信息移动过快时,在显示器上所显示的目标控件能够减少移动的长度,在P2-P3位置的目标手势信息移动速度减少时,能够将P1-P2位置过程中“减少”的距离进行补齐,从用户的感受来看,当其目标手势信息从虚拟框左侧的P1位置移动到右侧P3位置时,显示器上的目标控件也将从显示器左侧的Q1位置移动到右侧的Q3位置,从而在用户的目标手势信息在P1-P2之间移动过快时,还能够保持显示器上所显示的目标控件在P1-P3整体的移动速度变化不会太大,给用户带来移动速度均匀、目标控件连续变化的观感。
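下面给出一段示意性的Python草图，说明当第二焦点位置与第一焦点位置之间的距离与预设时间间隔之比大于预设阈值时，在两者连线上确定第三焦点位置以进行限幅的计算方法；函数名与参数名均为示意性假设。

```python
import math

def clamp_focus_step(first_focus, second_focus, interval, max_ratio):
    """first_focus/second_focus: 第一、第二焦点位置(x, y);
    interval: 抽取两帧用户行为图像的预设时间间隔; max_ratio: 预设阈值(距离/时间)。
    距离与时间间隔之比大于阈值时, 在两焦点连线上取第三焦点位置,
    使其与第一焦点位置的距离与时间间隔之比恰好等于阈值; 否则直接返回第二焦点位置。"""
    dx = second_focus[0] - first_focus[0]
    dy = second_focus[1] - first_focus[1]
    dist = math.hypot(dx, dy)
    if dist / interval <= max_ratio:
        return second_focus
    scale = (max_ratio * interval) / dist       # 按比例缩短到允许的最大步长
    return (first_focus[0] + dx * scale, first_focus[1] + dy * scale)
```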
在上述各实施例的基础上,本申请实施例还提供了一种显示设备控制方法,图51为本申请实施例提供的显示设备控制过程示意图,如图51所示,该方法包括以下步骤:
S5101:获取若干帧用户行为图像。
S5102:对每一帧所述用户行为图像进行手势识别处理,获得目标手势信息。
S5103:基于所述目标手势信息,控制所述显示器显示对应的内容。
在一种实施方式中,所述基于所述目标手势信息,控制所述显示器显示对应的内容包括:
根据所述目标手势信息获取每一帧所述用户行为图像对应的光标位置;所述光标位置为所述用户行为图像中,用户的目标手势映射到所述显示器中的显示位置;
根据所述光标位置确定用户的手势移动轨迹,控制所述显示器中的光标沿着所述手势移动轨迹进行移动。
下面结合一个具体的实施例,对本申请实施例提供的显示设备控制过程进行说明,图52为本申请实施例提供的另一显示设备控制过程示意图,如图52所示,该方法包括以下步骤:
步骤5201、控制所述图像采集器采集用户的若干帧用户行为图像。
步骤5202、对所述用户行为图像进行手势识别处理,得到每一帧所述用户行为图像的目标手势信息。
步骤5203、根据所述目标手势信息获取每一帧所述用户行为图像对应的光标位置;所述光标位置为所述用户行为图像中,用户的手势映射到所述显示器中的显示位置。
步骤5204、根据所述光标位置确定用户的手势移动轨迹,控制所述显示器中的光标沿着所述手势移动轨迹进行移动。
在一种实施方式中,所述方法还包括:获取手势信息流,所述手势信息流包括连续多帧所述用户行为图像;从所述手势信息流中提取关键手势信息,所述关键手势信息包括多个阶段的关键手势类型和每个阶段的置信度参数;使用检测模型匹配所述关键手势信息,以获得目标手势信息,所述检测模型包括多个以树形结构存储的节点;每个所述节点中设有手势姿态模板和指定的下级节点;所述目标手势信息为在每个阶段关键手势类型与手势姿态模板相同,且所述置信度参数在置信度区间内的节点组合;执行所述目标手势信息关联的控制指令。
在一种实施方式中，所述方法还包括：按照预设时间间隔，从所述显示设备的视频采集装置所采集的视频数据的连续多帧图像中，抽取一帧待检测图像；使用第一检测模型判断所述待检测图像中是否包括人体的手势信息；若是，按照所述预设时间间隔和预设数量，从所述视频数据中继续抽取预设数量的用户行为图像，并使用第二检测模型分别识别所述预设数量的用户行为图像中人体的目标手势信息和肢体信息；其中，所述第一检测模型计算时的数据量小于所述第二检测模型计算时的数据量；执行所述预设数量的用户行为图像中的所述目标手势信息和所述肢体信息对应的控制命令。
在一种实施方式中,所述方法还包括:识别第一用户行为图像中的目标手势信息;在所述第一用户行为图像中以所述目标手势信息对应的第一焦点位置为中心,建立矩形虚拟框,在所述显示屏幕的第一显示位置显示目标控件,并确定所述矩形虚拟框与所述显示设备的显示器之间的映射关系;当所述第一用户行为图像之后的第二用户行为图像中包括所述目标手势信息,且所述目标手势信息对应的第二焦点位置位于所述矩形虚拟框中时,根据所述第二焦点位置和所述映射关系,确定所述显示器上的第二显示位置;控制所述显示器上目标控件移动到所述第二显示位置。
图53为本申请实施例提供的显示设备的控制方法一实施例的流程示意图,如图53所示的一种具体的实现方式中,该过程包括以下步骤:
S5301:显示设备的控制器首先进行手势检测,若手势状态正常,则执行S5302-S5306, 否则,执行S5307。
S5302-S5306:根据手在虚拟框中的位置进行电视界面光标位置映射,进行手势移动控制,手势速度手势方向更新,手势点击检测,手势返回检测等。
S5307:进行多帧(一般为三帧)的行动预测。
S5308:期间若重新检测到手势,则执行S5312,否则,执行S5309。
S5309-S5310:清除电视界面中的鼠标,若长时间检测不到手势,执行S5311。
S5311:退出手势肢体识别,进入全局手势检测方案,直到检测到焦点手势。
S5312:进行焦点重置,若距离较近则继续移动,若距离较远,则重置焦点为电视中心位置。其中,在进行焦点重置的时候,需要进行虚拟框重新生成。此外,若多次检测不到手势。
为了方便解释,已经结合具体的实施方式进行了上述说明。但是,上述示例性的讨论不是意图穷尽或者将实施方式限定到上述公开的具体形式。根据上述的教导,可以得到多种修改和变形。上述实施方式的选择和描述是为了更好的解释原理以及实际的应用,从而使得本领域技术人员更好的使用实施方式以及适于具体使用考虑的各种不同的变形的实施方式。

Claims (43)

  1. 一种显示设备,包括:
    显示器,被配置为显示图像;
    图像输入接口,被配置为获取用户行为图像;
    控制器,被配置为:
    获取若干帧用户行为图像;对每一帧所述用户行为图像进行手势识别处理,获得目标手势信息;基于所述目标手势信息,控制所述显示器显示对应的内容。
  2. 根据权利要求1所述的显示设备,所述图像输入接口,具体被配置为连接图像采集器;
    控制器,被配置为:
    控制所述图像采集器采集用户的若干帧用户行为图像;
    对所述用户行为图像进行手势识别处理,得到每一帧所述用户行为图像的所述目标手势信息;
    根据所述目标手势信息获取每一帧所述用户行为图像对应的光标位置;所述光标位置为所述用户行为图像中,用户的目标手势映射到所述显示器中的显示位置;
    根据所述光标位置确定用户的手势移动轨迹,控制所述显示器中的光标沿着所述手势移动轨迹进行移动。
  3. 根据权利要求2所述的显示设备,所述控制器还被进一步配置为通过下述执行对所述用户行为图像进行手势识别处理,得到每一帧所述用户行为图像的所述目标手势信息:
    基于预设的动态手势识别模型对所述用户行为图像进行处理,得到每一帧所述用户行为图像的目标手势信息;
    检测所述目标手势信息中是否包括虚拟位置信息，所述虚拟位置信息为预设的目标手势在所述用户行为图像中的位置信息；
    如果所述目标手势信息包括虚拟位置信息,则确定为所述用户行为图像中检测到了目标手势;
    如果所述目标手势信息不包括虚拟位置信息,则确定为所述用户行为图像中没有检测到目标手势。
  4. 根据权利要求3所述的显示设备,所述控制器进一步被配置为:
    在执行根据所述目标手势信息获取每一帧所述用户行为图像对应的光标位置的步骤中,
    对于某一帧的目标用户行为图像,判断所述目标用户行为图像中是否检测到目标手势;
    如果检测到了目标手势,则根据所述虚拟位置信息获取所述目标用户行为图像对应的光标位置;
    如果没有检测到目标手势,则预测所述目标用户行为图像对应的光标位置。
  5. 根据权利要求4所述的显示设备,所述控制器进一步被配置为:
    在执行根据所述虚拟位置信息获取所述目标用户行为图像对应的光标位置的步骤中,
    根据所述虚拟位置信息,将所述用户行为图像的用户手势映射到所述显示器中,得到原始光标位置;
    根据所述用户行为图像的上一帧用户行为图像对应的光标位置以及预设的调节阈值获取第一位置数值;根据所述原始光标位置和预设的调节阈值获取第二位置数值;
    根据所述第一位置数值和所述第二位置数值获取所述目标用户行为图像对应的目标光标位置。
  6. 根据权利要求5所述的显示设备,所述预设的调节阈值的设定方法包括:
    （预设调节阈值的计算公式，原文以附图形式给出）
    其中：
    E_1表示预设的调节阈值；k表示第一调节参数；g表示第二调节参数；
    S_g表示所述目标用户行为图像的尺寸；
    S_c表示所述目标用户行为图像的前一帧用户行为图像对应的光标位置处的控件的尺寸；
    S_tv表示所述显示器的尺寸。
  7. 根据权利要求4所述的显示设备,所述控制器进一步被配置为:
    在执行预测所述目标用户行为图像对应的光标位置的步骤中,
    判断在所述目标用户行为图像之前,预设的检测数量的用户行为图像中,没有检测到目标手势的用户行为图像的数量是否超过了预设的检测阈值;
    若否,则确定为所述光标应进行第一类运动,并对所述目标用户行为图像进行第一处理,得到所述目标用户行为图像对应的目标光标位置;
    若是,则确定为所述光标应进行第二类运动,并对所述目标用户行为图像进行第二处理,得到所述目标用户行为图像对应的目标光标位置。
  8. 根据权利要求7所述的显示设备,所述第一类运动为直线运动,所述控制器进一步被配置为:
    在执行对所述目标用户行为图像进行第一处理的步骤中,
    根据所述目标用户行为图像的前两帧用户行为图像对应的光标位置获取历史光标位置偏移量;
    根据所述历史光标位置偏移量和第一时间获取光标移动速度;所述第一时间为:预设的动态手势识别模型处理所述前两帧用户行为图像所间隔的时间;
    根据所述光标移动速度、第二时间和预设的第一预测阈值获取所述光标的目标光标位置偏移量;所述第二时间为:预设的动态手势识别模型处理所述目标用户行为图像和前一帧用户行为图像所间隔的时间;
    对所述前一帧用户行为图像对应的坐标位置和所述目标光标位置偏移量求和,得到目标光标位置。
  9. 根据权利要求8所述的显示设备,所述第一预测阈值的设定方法包括:
    （第一预测阈值的计算公式，原文以附图形式给出）
    其中：
    E_2表示第一预测阈值；a1表示第一预测参数；a2表示第二预测参数；
    D_f表示预设时间内预设的动态手势识别模型对用户行为图像的处理速率；
    C_f表示预设时间内所述图像采集器231采集用户行为图像的速率；
    P_f表示预设时间内光标移动的帧率。
  10. 根据权利要求7所述的显示设备，所述第二类运动为曲线运动，所述控制器进一步被配置为：
    在执行对所述目标用户行为图像进行第二处理的步骤中,
    根据所述目标用户行为图像的前两帧用户行为图像对应的光标位置获取历史光标位置偏移量;
    根据所述历史光标位置偏移量和第一时间获取光标移动速度;所述第一时间为:预设的动态手势识别模型处理所述前两帧用户行为图像所间隔的时间;
    根据所述光标移动速度、第二时间和预设的第二预测阈值获取所述光标的目标光标位置偏移量;所述第二时间为:预设的动态手势识别模型处理所述目标用户行为图像和前一帧用户行为图像所间隔的时间;
    对所述前一帧用户行为图像对应的坐标位置和所述目标光标位置偏移量求差值,得到目标光标位置。
  11. 根据权利要求4所述的显示设备,所述控制器进一步被配置为:
    在执行预测所述目标用户行为图像对应的光标位置的步骤前,
    检测所述目标用户行为图像之前的预设阈值的用户行为图像中,是否全部没有检测到目标手势;
    若是,则控制所述光标不进行移动;
    若否,则执行预测所述目标用户行为图像对应的光标位置的步骤。
  12. 根据权利要求1所述的显示设备,所述控制器,具体被配置为:
    获取手势信息流,所述手势信息流包括连续多帧所述用户行为图像;
    从所述手势信息流中提取关键手势信息,所述关键手势信息包括多个阶段的关键手势类型和每个阶段的置信度参数;
    使用检测模型匹配所述关键手势信息,以获得目标手势信息,所述检测模型包括多个以树形结构存储的节点;每个所述节点中设有手势姿态模板和指定的下级节点;所述目标手势信息为在每个阶段关键手势类型与手势姿态模板相同,且所述置信度参数在置信度区间内的节点组合;
    执行所述目标手势信息关联的控制指令。
  13. 根据权利要求12所述的显示设备,所述控制器被进一步配置为通过下述执行从所述手势信息流中提取关键手势信息:
    识别所述用户行为图像中的关键点坐标,所述关键点坐标用于表征手关节在所述用户行为图像中的成像位置;
    提取预设关键点标准坐标;
    计算所述关键点坐标与所述关键点标准坐标的差值;
    如果所述差值小于或等于预设识别阈值,确定所述关键点标准坐标对应的手势类型为目标手势类型;
    根据多个连续帧用户行为图像,划分动态手势的多个阶段,每个阶段中的用户行为图像归属于相同的所述目标手势类型。
  14. 根据权利要求13所述的显示设备,所述置信度参数包括关键手势偏角,所述控制器被配置为:
    根据所述关键点坐标与所述关键点标准坐标,计算手势偏角;
    遍历每个阶段中多个连续帧用户行为图像对应的手势偏角,以获得每个阶段中的偏角并集;
    提取每个阶段中的所述偏角并集中的极值,以作为当前阶段关键手势信息中的关键手势偏角。
  15. 根据权利要求12所述的显示设备,所述控制器被配置为:
    使用检测模型匹配所述关键手势信息的步骤中,从所述多阶段关键手势信息中提取第一阶段关键手势类型;
    根据第一阶段关键手势类型匹配第一节点,所述第一节点为存储的手势姿态模板与第一阶段关键手势类型相同的节点;
    从所述关键手势信息中提取第二阶段关键手势类型,所述第二阶段为第一阶段的后续动作阶段;
    根据第二阶段关键手势类型匹配第二节点,所述第二节点为存储的手势姿态模板与第二阶段关键手势类型相同的节点;所述第一节点指定的下级节点包括第二节点;
    记录所述第一节点和所述第二节点,以获得动作分支。
  16. 根据权利要求15所述的显示设备,所述控制器被配置为:
    根据第二阶段关键手势类型匹配第二节点的步骤中,遍历所述第一节点指定下级节点 存储的手势姿态模板;
    如果所有下级节点存储的手势姿态模板均与所述第二阶段关键手势类型不同,控制所述显示器显示录入界面;
    接收用户基于所述录入界面输入的录入手势信息;
    响应于所述录入手势信息,为所述检测模型设置新节点,所述新节点为所述第一节点的下级节点;
    在所述新节点存储所述第二阶段手势类型,以作为所述新节点的手势姿态模板。
  17. 根据权利要求15所述的显示设备,所述控制器被配置为:
    获取所述检测模型中各节点预设的置信度区间;
    对比各阶段关键手势偏角与对应节点的置信度区间;
    如果所述关键手势偏角不在所述置信度区间内,按照所述手势偏角修改所述置信度区间。
  18. 根据权利要求15所述的显示设备,所述置信度参数还包括关键手势维持帧数;所述控制器被配置为:
    根据第二阶段关键手势类型匹配第二节点的步骤前,获取维持帧数,所述维持帧数为所述用户行为图像中与第一阶段关键手势类型相同的连续帧数;
    如果第一阶段关键手势类型的维持帧数大于或等于帧数阈值,根据第二阶段关键手势类型匹配第二节点;
    如果第一阶段关键手势类型的维持帧数小于所述帧数阈值,控制所述显示器显示录入界面。
  19. 根据权利要求12所述的显示设备,所述控制器被配置为:
    获取中间阶段置信度参数,所述中间阶段为关键手势信息的多阶段中,位于开始阶段和结束阶段之间一个阶段;
    对比所述中间阶段置信度参数与对应节点的置信度区间;
    如果所述中间阶段置信度参数不在对应节点的置信度区间内,标记所述中间阶段对应的节点为预跳转节点;
    按照所述检测模型对所述预跳转节点的下级节点执行匹配,以根据所述预跳转节点的下级节点匹配结果确定目标手势信息。
  20. 根据权利要求19所述的显示设备,所述控制器被配置为:
    按照所述检测模型对所述预跳转节点的下级节点执行匹配的步骤中,获取所述预跳转节点的下级节点匹配结果;
    如果所述匹配结果为命中任一下级节点,记录所述预跳转节点和命中的下级节点,以作为所述目标手势信息的节点;
    如果所述匹配结果为未命中下级节点,舍弃所述预跳转节点。
  21. 根据权利要求1所述的显示设备,所述显示设备还包括:
    视频采集装置,被配置为采集视频数据;
    所述控制器,被配置为按照预设时间间隔,从所述视频采集装置所采集的视频数据的连续多帧图像中,抽取一帧待检测图像;使用第一检测模型判断所述待检测图像中是否包括人体的手势信息;若是,按照所述预设时间间隔和预设数量,从所述视频数据中继续抽取预设数量的用户行为图像,并使用第二检测模型分别识别所述预设数量的用户行为图像中人体的目标手势信息和肢体信息;其中,所述第一检测模型计算时的数据量小于所述第二检测模型计算时的数据量;执行所述预设数量的用户行为图像中的所述目标手势信息和所述肢体信息对应的控制命令。
  22. 根据权利要求21所述的显示设备,所述控制器具体被配置为:当所述预设数量的用户行为图像中的所述目标手势信息和所述肢体信息全部相同或者部分相同时,通过映射关系确定全部相同或者部分相同的所述目标手势信息和所述肢体信息对应的控制命令; 其中,所述映射关系包括:多个控制命令,以及每个控制命令与目标手势信息、肢体信息之间的对应关系;执行所述控制命令。
  23. 根据权利要求21所述的显示设备,所述控制命令为控制所述显示器上目标控件移动至所述手势信息对应的位置的移动命令;
    所述控制器还被配置为:重复按照所述预设时间间隔和预设数量,从所述视频数据中继续抽取预设数量的待检测图像,并使用第二检测模型分别识别所述预设数量的用户行为图像中人体的目标手势信息和肢体信息,执行将目标控件移动至每所述预设数量的用户行为图像中的所述目标手势信息对应的位置的控制命令。
  24. 根据权利要求23所述的显示设备,所述控制器还被配置为:当所述预设数量的用户行为图像中不包括所述目标手势信息,根据上一次所执行的控制命令对应的多帧用户行为图像中目标手势信息对应的移动速度和移动方向,确定所述目标手势信息在所述预设数量的用户行为图像中对应的预测位置;执行控制所述目标控件移动至所述预测位置的移动命令。
  25. 根据权利要求24所述的显示设备,所述控制器还被配置为:存储所述预设数量的用户行为图像中的所述目标手势信息和所述肢体信息。
  26. 根据权利要求21-25任一项所述的显示设备,所述控制器还被配置为:根据所述预设时间间隔,确定所述预设数量;其中,所述预设时间间隔的长度数值与所述预设数量的数值成反比例对应关系。
  27. The display device according to any one of claims 21-25, wherein the controller is further configured to: after the control command is executed, stop using the second detection model to identify the target gesture information and limb information of the human body in the preset number of user behavior images; or,
    when the identified target gesture information and limb information of the human body in the preset number of user behavior images correspond to a stop command, stop using the second detection model to identify the target gesture information and limb information of the human body in the preset number of user behavior images; or,
    when the preset number of user behavior images do not include target gesture information and limb information of the human body, stop using the second detection model to identify the target gesture information and limb information of the human body in the preset number of user behavior images.
  28. The display device according to any one of claims 21-25, wherein the controller is further configured to: update the first detection model using the target gesture information of the human body in the preset number of user behavior images obtained by the second detection model.
  29. The display device according to any one of claims 21-25, wherein the controller is further configured to: determine, according to an operating parameter of the display device, the preset time interval corresponding to the operating parameter.
  30. The display device according to claim 1, wherein the display device further comprises:
    a video capture device configured to capture video data;
    the controller is specifically configured to: identify target gesture information in a first user behavior image; establish a rectangular virtual box in the first user behavior image centered on a first focus position corresponding to the target gesture information, control a first display position of the display to display a target control, and determine a mapping relationship between the rectangular virtual box and the display; when a second user behavior image subsequent to the first user behavior image includes the target gesture information and a second focus position corresponding to the target gesture information is located within the rectangular virtual box, determine a second display position on the display according to the second focus position and the mapping relationship; and control the target control on the display to move to the second display position.
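Claims 30 and 31 describe an absolute mapping between a rectangular virtual box built around the hand in camera space and the full display: the focus position's offset from a reference point in the box is scaled into the corresponding offset on the screen. The sketch below shows one straightforward way such a mapping could work; the choice of the box's top-left corner as the reference target position and the concrete pixel sizes are assumptions.

```python
# Assumed mapping from a rectangular virtual box (camera space) to display coordinates.
def build_mapping(box, display_size):
    """box: (x, y, w, h) of the virtual box in the user behavior image.
    display_size: (width, height) of the display in pixels.
    Returns a function mapping a focus position inside the box to a display position."""
    bx, by, bw, bh = box
    dw, dh = display_size

    def to_display(focus):
        fx, fy = focus
        # Relative distance of the focus position from the box's top-left corner,
        # scaled by the ratio between display size and box size.
        return ((fx - bx) * dw / bw, (fy - by) * dh / bh)

    return to_display

# Virtual box centered on the first focus position (400, 300), 320 x 180 px.
mapping = build_mapping((240, 210, 320, 180), (1920, 1080))
print(mapping((400, 300)))   # first focus position -> display centre (960, 540)
print(mapping((480, 345)))   # hand moved right/down -> (1440, 810)
```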
  31. The display device according to claim 30, wherein the controller is specifically configured to: determine a second relative distance between the second display position and a target position on the display according to the mapping relationship and a first relative distance between the second focus position and a target position within the virtual box; and determine the second display position on the display according to the position of the target position on the display and the second relative distance.
  32. The display device according to claim 31, wherein the controller is further configured to: determine the distance between the human body and the video capture device; and determine the area of the rectangular virtual box according to the distance, wherein the value of the distance is directly proportional to the value of the area.
  33. The display device according to claim 32, wherein the controller is further configured to: when the rectangular virtual box overlaps an edge region of the user behavior image, compress the boundary of the rectangular virtual box in the direction of overlap with the edge region, so that the rectangular virtual box stays outside the edge region, wherein the edge region is a region whose distance from one boundary of the user behavior image is less than a preset distance.
  34. The display device according to any one of claims 30-33, wherein the controller is further configured to: when a second user behavior image subsequent to the first user behavior image includes the target gesture information and the second focus position corresponding to the target gesture information is located outside the rectangular virtual box, re-establish a rectangular virtual box and a mapping relationship centered on the second focus position; or, when a second user behavior image subsequent to the first user behavior image includes the target gesture information and the second focus position corresponding to the target gesture information is located outside the rectangular virtual box, re-establish a rectangular virtual box and a mapping relationship according to a first relative position, between the display position of the target control and the display, in the user behavior image preceding the second user behavior image, so that a second relative position, between the second focus position in the re-established virtual box and the user behavior image, is the same as the first relative position.
  35. The display device according to any one of claims 30-33, wherein the controller is further configured to: when a preset number of consecutive user behavior images do not include the target gesture information, stop displaying the target control on the display; or, when the user behavior images within a preset time period do not include the target gesture information, stop displaying the target control on the display; or, when the target gesture information identified in a user behavior image corresponds to a stop command, stop displaying the target control on the display.
  36. The display device according to claim 30, wherein the controller is further configured to: when a second user behavior image subsequent to the first user behavior image includes the target gesture information, the second focus position corresponding to the target gesture information is located within the rectangular virtual box, and the ratio of the distance between the second focus position and the first focus position to a preset time interval is greater than a preset threshold, determine a third focus position between the first focus position and the second focus position, wherein the ratio of the distance between the third focus position and the first focus position to the preset time interval is not greater than the preset threshold; determine a second relative distance between the second display position and a target position on the display according to the mapping relationship and a first relative distance between the third focus position and a target position within the virtual box, and determine the second display position on the display according to the position of the target position on the display and the second relative distance; and control the target control on the display to move to the second display position.
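Claim 36 effectively rate-limits the cursor: if the focus position jumps faster than a preset threshold between two frames, an intermediate (third) focus position is used so that the apparent speed stays at or below the threshold. A minimal sketch of that clamping, with 2-D Euclidean distance and a linear interpolation along the movement segment assumed:

```python
# Assumed speed clamping of the focus position.
import math

def clamp_focus(first_focus, second_focus, interval_s, max_speed):
    """Return second_focus unchanged if the movement speed is within the threshold,
    otherwise a third focus position on the segment between the two positions whose
    speed relative to first_focus does not exceed max_speed."""
    dx, dy = second_focus[0] - first_focus[0], second_focus[1] - first_focus[1]
    dist = math.hypot(dx, dy)
    if dist / interval_s <= max_speed:
        return second_focus
    scale = (max_speed * interval_s) / dist
    return (first_focus[0] + dx * scale, first_focus[1] + dy * scale)

print(clamp_focus((100, 100), (400, 100), 0.1, 1000))  # 3000 px/s -> clamped to (200.0, 100.0)
```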
  37. The display device according to claim 36, wherein the controller is further configured to: when a third user behavior image extracted after the second user behavior image includes the target gesture information, a fourth focus position corresponding to the target gesture information is located within the rectangular virtual box, and the ratio of the distance between the fourth focus position corresponding to the target gesture information in the third user behavior image and the third focus position to the preset time interval is not greater than the preset threshold, determine a second relative distance between a third display position and a target position on the display according to the mapping relationship and a first relative distance between the fourth focus position and a target position within the virtual box, and determine the third display position on the display according to the position of the target position on the display and the second relative distance; and control the target control on the display to move to the third display position.
  38. The display device according to claim 34, wherein the controller is further configured to: when it is determined that a rectangular virtual box needs to be re-established, display establishment request information on the display; and after the rectangular virtual box is re-established, display information about the re-established rectangular virtual box on the display.
  39. A method for controlling a display device, the method comprising:
    acquiring several frames of user behavior images;
    performing gesture recognition processing on each frame of the user behavior images to obtain target gesture information;
    based on the target gesture information, controlling the display to display corresponding content.
  40. The method according to claim 39, wherein the controlling the display to display corresponding content based on the target gesture information comprises:
    acquiring the cursor position corresponding to each frame of the user behavior images according to the target gesture information, the cursor position being the display position on the display to which the user's target gesture in the user behavior image is mapped;
    determining the user's gesture movement track according to the cursor positions, and controlling the cursor in the display to move along the gesture movement track.
  41. The method according to claim 39, further comprising:
    acquiring a gesture information stream, the gesture information stream comprising multiple consecutive frames of the user behavior images;
    extracting key gesture information from the gesture information stream, the key gesture information comprising key gesture types of multiple stages and a confidence parameter of each stage;
    matching the key gesture information using a detection model to obtain target gesture information, wherein the detection model comprises multiple nodes stored in a tree structure; each node is provided with a gesture posture template and designated lower-level nodes; the target gesture information is a combination of nodes in which, at each stage, the key gesture type is identical to the gesture posture template and the confidence parameter is within a confidence interval;
    executing a control instruction associated with the target gesture information.
  42. The method according to claim 39, further comprising:
    extracting, at a preset time interval, one frame of image to be detected from the consecutive frames of the video data captured by the video capture device of the display device;
    determining, using a first detection model, whether the image to be detected includes gesture information of a human body;
    if so, continuing to extract a preset number of user behavior images from the video data according to the preset time interval and the preset number, and identifying, using a second detection model, the target gesture information and limb information of the human body in each of the preset number of user behavior images, wherein the amount of data computed by the first detection model is smaller than the amount of data computed by the second detection model;
    executing the control command corresponding to the target gesture information and the limb information in the preset number of user behavior images.
  43. The method according to claim 39, further comprising:
    identifying target gesture information in a first user behavior image;
    establishing a rectangular virtual box in the first user behavior image centered on a first focus position corresponding to the target gesture information, displaying a target control at a first display position of the display, and determining a mapping relationship between the rectangular virtual box and the display of the display device;
    when a second user behavior image subsequent to the first user behavior image includes the target gesture information and a second focus position corresponding to the target gesture information is located within the rectangular virtual box, determining a second display position on the display according to the second focus position and the mapping relationship;
    controlling the target control on the display to move to the second display position.
PCT/CN2022/109185 2021-11-04 2022-07-29 Display device and control method thereof WO2023077886A1 (zh)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
CN202111302345.9A CN116069280A (zh) 2021-11-04 2021-11-04 Display apparatus and control method thereof
CN202111302336.XA CN116069229A (zh) 2021-11-04 2021-11-04 Display apparatus and control method thereof
CN202111302345.9 2021-11-04
CN202111302336.X 2021-11-04
CN202210266245.3A CN114610153A (zh) 2022-03-17 2022-03-17 Display device and dynamic gesture interaction method
CN202210266245.3 2022-03-17
CN202210303452.1A CN114637439A (zh) 2022-03-24 2022-03-24 Display device and gesture track recognition method
CN202210303452.1 2022-03-24

Publications (1)

Publication Number Publication Date
WO2023077886A1 true WO2023077886A1 (zh) 2023-05-11

Family

ID=86240638

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/109185 WO2023077886A1 (zh) 2021-11-04 2022-07-29 一种显示设备及其控制方法

Country Status (1)

Country Link
WO (1) WO2023077886A1 (zh)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103576848A (zh) * 2012-08-09 2014-02-12 腾讯科技(深圳)有限公司 Gesture operation method and gesture operation apparatus
CN108921101A (zh) * 2018-07-04 2018-11-30 百度在线网络技术(北京)有限公司 Processing method and device for control instruction based on gesture recognition, and readable storage medium
CN110458095A (zh) * 2019-08-09 2019-11-15 厦门瑞为信息技术有限公司 Recognition method and control method for valid gestures, apparatus and electronic device
US20210191611A1 (en) * 2020-02-14 2021-06-24 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for controlling electronic device based on gesture
CN112668506A (zh) * 2020-12-31 2021-04-16 咪咕动漫有限公司 Gesture tracking method, device and computer-readable storage medium
CN113282168A (zh) * 2021-05-08 2021-08-20 青岛小鸟看看科技有限公司 Information input method and apparatus for head-mounted display device, and head-mounted display device
CN114610153A (zh) * 2022-03-17 2022-06-10 海信视像科技股份有限公司 Display device and dynamic gesture interaction method
CN114637439A (zh) * 2022-03-24 2022-06-17 海信视像科技股份有限公司 Display device and gesture track recognition method

Similar Documents

Publication Publication Date Title
CN113596537B Display device and playback speed method
CN108712603B Image processing method and mobile terminal
CN115315679A Method and system for controlling a device using gestures in a multi-user environment
WO2018000519A1 Projection-based interaction control method and system for a user interaction icon
CN107643828A Method and system for recognizing and responding to user behavior in a vehicle
CN104428732A Multimodal interaction with near-eye display
CN114637439A Display device and gesture track recognition method
WO2022037535A1 Display device and camera tracking method
CN112862859A Facial feature value creation method, person locking and tracking method, and display device
WO2022100262A1 Display device, human body posture detection method and application
TWI646526B Sub-picture layout control method and device
WO2022078172A1 Display device and content presentation method
CN114257824A Live streaming display method and apparatus, storage medium and computer device
CN107622300B Cognitive decision-making method and system for a multimodal virtual robot
CN113778217A Display device and display device control method
WO2023077886A1 Display device and control method thereof
WO2023169282A1 Method and apparatus for determining an interaction gesture, and electronic device
CN114610153A Display device and dynamic gesture interaction method
CN112473121A Display device and dodgeball display method based on limb recognition
WO2021238733A1 Display device and image recognition result display method
CN112261289B Display device and AI algorithm result acquisition method
CN117980873A Display device and control method thereof
CN111880422B Device control method and apparatus, device, and storage medium
CN112817557A Volume adjustment method based on multi-person gesture recognition, and display device
WO2021190336A1 Device control method, apparatus and system

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22888921

Country of ref document: EP

Kind code of ref document: A1