CN116804892A - Universal device control method, device, and system based on a camera assembly - Google Patents

Universal device control method, device, and system based on a camera assembly

Info

Publication number
CN116804892A
Authority
CN
China
Prior art keywords
application
event
interface
electronic device
equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210259539.3A
Other languages
Chinese (zh)
Inventor
杨定宇
凌泽志
余智平
陈佳兴
刘贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202210259539.3A priority Critical patent/CN116804892A/en
Priority to PCT/CN2023/081101 priority patent/WO2023174214A1/en
Publication of CN116804892A publication Critical patent/CN116804892A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F 3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F 3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a universal device control method, device, and system based on a camera assembly, relating to the field of terminal technologies, which can conveniently realize device control in multi-scene, cross-device, cross-system, and similar situations. In this solution, when the electronic device detects a preset user gesture, the control event corresponding to the user gesture is abstracted into a generic event (or standard event) that is independent of any specific scene, system, or device, so that the method is applicable to device control in multiple scenarios. The method removes the inconvenience of conventionally relying on a third-party physical device for device control and greatly improves the convenience of human-computer interaction. In addition, the method is unaffected by differences between platforms and, through a more comprehensive and universal interaction scheme, provides more accurate and friendly device control while reducing the cost for users of learning different control methods on different devices.

Description

Universal device control method, device, and system based on a camera assembly
Technical Field
Embodiments of this application relate to the field of terminal technologies, and in particular to a universal device control method, device, and system based on a camera assembly.
Background
With the development of intelligent terminal technologies, electronic devices can be controlled by users in a variety of ways. For example, a user may control an electronic device through a physical control device (e.g., a remote control or a mouse).
However, controlling an electronic device in the above manner has certain drawbacks. For one, controlling an electronic device through a control device requires the user to carry the control device, creating a strong dependence on it. For another, this manner of control is inconvenient in cross-device and/or cross-user scenarios. For example, in a cross-device scenario, if device A and device B are matched to different control devices, the user has to switch control devices. As another example, in a cross-user scenario, the control device has to be handed back and forth between user A and user B.
Disclosure of Invention
The application provides a universal device control method, device, and system based on a camera assembly, which can conveniently realize device control in multi-scene, cross-device, cross-system, and similar situations.
To achieve the above objective, embodiments of this application adopt the following technical solutions:
In a first aspect, an electronic device, such as a first device, is provided. The first device includes: a camera assembly configured to capture an image frame, where the image frame includes a hand feature of a user; and a processor configured to run a first application, identify, while running the first application, a user gesture corresponding to the hand feature of the user in the image frame, determine a generic event according to the user gesture, and instruct a second application to respond to the generic event.
According to the solution provided in the first aspect, when the electronic device detects a preset user gesture, the control event corresponding to the user gesture is abstracted into a generic event (or standard event) that is independent of any specific scene, system, or device, so that the method is applicable to device control in multiple scenarios. The method removes the inconvenience of conventionally relying on a third-party physical device for device control and greatly improves the convenience of human-computer interaction. In addition, the method is unaffected by differences between platforms and, through a more comprehensive and universal interaction scheme, provides more accurate and friendly device control while reducing the cost for users of learning different control methods on different devices.
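As an illustration only, the first-aspect flow can be sketched in a few lines of Python; the function names and stages below are assumptions for this sketch and are not taken from the patent.

```python
from typing import Callable, Optional

# Hypothetical sketch of the first-aspect pipeline: recognize a gesture in an image frame,
# abstract it into a scene/system/device-independent generic event, and instruct the
# second application (local or remote) to respond. All names are illustrative assumptions.
def process_image_frame(
    frame,
    recognize_gesture: Callable[[object], Optional[str]],
    to_generic_event: Callable[[str], Optional[str]],
    dispatch: Callable[[str], None],
) -> None:
    gesture = recognize_gesture(frame)            # identify the user gesture from the hand feature
    if gesture is None:
        return                                    # no preset gesture detected in this frame
    generic_event = to_generic_event(gesture)     # abstract into a generic (standard) event
    if generic_event is not None:
        dispatch(generic_event)                   # instruct the second application to respond
```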
In one possible implementation, the second application runs in the first device. That the processor instructs the second application to respond to the generic event includes: the processor instructs, according to the generic event and in combination with device information of the first device, the second application to respond to the generic event. By responding to the generic event (or standard event) according to specific device information, the first device can match the actual usage scenario more closely.
In one possible implementation, the first device further includes a communication interface configured to send a control instruction corresponding to the generic event to a target device, such as a second device, to instruct a second application running in the second device to respond to the generic event. The solution provided by the application is thus applicable to device control in multi-scene, cross-device, and similar situations, greatly improving the convenience of human-computer interaction. In addition, the method is unaffected by differences between platforms, provides more accurate and friendly device control through a more comprehensive and universal interaction scheme, and reduces the cost for users of learning different control methods on different devices.
In one possible implementation, that the communication interface sends the control instruction corresponding to the generic event to the second device specifically includes: the communication interface sends the control instruction corresponding to the generic event to the second device according to the generic event and in combination with device information of the second device. In this way, the first device can concretize the generic event into a control instruction matching the actual situation based on the specific device information of the second device, so that the second device responds to the generic event accurately.
In one possible implementation, the device information includes one or more of the following: application information, interface information, device type, operating system.
In one possible implementation, the device type includes one or more of the following: video playback device, screen-casting device, storage device, gaming device, audio playback device.
In one possible implementation, the first device and the second device have different operating systems. The solution provided by the application is thus applicable to device control in multi-scene, cross-device, cross-system, and similar situations, greatly improving the convenience of human-computer interaction. In addition, the method is unaffected by differences between platforms, provides more accurate and friendly device control through a more comprehensive and universal interaction scheme, and reduces the cost for users of learning different control methods on different devices.
In one possible implementation, the first device further includes a memory configured to store a correspondence between a plurality of user gestures and a plurality of generic events. That the processor determines the generic event according to the user gesture specifically includes: the processor determines the generic event corresponding to the user gesture according to the correspondence, stored in the memory, between the plurality of user gestures and the plurality of generic events. In other words, the first device can look up the generic event corresponding to the user gesture based on a pre-stored correspondence.
In one possible implementation, the generic event includes one or more of the following: open main menu, confirm, select, return, volume up, volume down, move cursor, voice input. Such generic events are applicable to multi-scene, cross-device, cross-system, and similar situations.
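A pre-stored correspondence of the kind described above could, for instance, be a simple lookup table. In the sketch below the gesture labels are invented for illustration, while the generic events mirror the enumeration above.

```python
from typing import Optional

# Gesture labels are illustrative assumptions; the generic events follow the list above.
GESTURE_TO_GENERIC_EVENT = {
    "open_palm":      "OPEN_MAIN_MENU",
    "pinch":          "CONFIRM",
    "point":          "SELECT",
    "swipe_left":     "RETURN",
    "thumb_up":       "VOLUME_UP",
    "thumb_down":     "VOLUME_DOWN",
    "point_and_move": "MOVE_CURSOR",
    "hand_to_ear":    "VOICE_INPUT",
}

def to_generic_event(gesture: str) -> Optional[str]:
    """Look up the generic event corresponding to a recognized user gesture."""
    return GESTURE_TO_GENERIC_EVENT.get(gesture)
```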
In one possible implementation, the generic event is confirm or select; the interface information of the first device indicates that the second application interface includes a first file, or the device type of the first device is a storage device. That the processor instructs the second application to respond to the generic event includes: the processor instructs the second application to perform one or more of the following: displaying an interface after the first file is opened; or displaying an interface of a next-level path under the path corresponding to the first file. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
In one possible implementation, the generic event is confirm or select; the application information of the first device indicates that the second application is an audio/video application, and the second application interface includes a first file. That the processor instructs the second application to respond to the generic event includes: the processor instructs the second application to display an interface for playing the first file. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
In one possible implementation, the generic event is confirm or select, and the interface information of the first device indicates that the second application interface includes a first function key. That the processor instructs the second application to respond to the generic event includes: the processor instructs the second application to display an interface corresponding to the first function key. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
In one possible implementation, the first function key includes any one of the following: selection box, marking box, deletion box, play, pause, cast screen, mute, speed up, slow down, bullet comments (danmaku), comment, like, favorite, maximize, minimize, close.
In one possible implementation, the generic event is return. That the processor instructs the second application to respond to the generic event includes: the processor instructs the second application to return to the previous-level interface. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
In one possible implementation, the generic event is voice input, and the interface information of the first device indicates that the second application interface includes a voice input key. That the processor instructs the second application to respond to the generic event includes: the processor instructs the second application to perform one or more of the following: detecting a sound signal from the surrounding environment; or enabling a voice input function corresponding to the voice input key; or enabling a voice input function of the second application; or enabling a voice input function of the first device. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
In one possible implementation, the generic event is move, and the interface information of the first device indicates that the second application interface includes a cursor. That the processor instructs the second application to respond to the generic event includes: the processor instructs moving the cursor on the second application interface. The first device can concretize the generic event into an instruction matching the actual situation based on its specific device information, so as to respond to the generic event accurately. The solution is applicable to multiple scenarios.
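The possible implementations above can be read as a dispatch from a generic event to a device-specific action. The sketch below illustrates how device information might be combined with a generic event; the field names and action strings are assumptions for this sketch, not the patent's API.

```python
# Illustrative only: concretize a generic event into a device-specific action using
# device information (application info, interface info, device type).
def concretize(generic_event: str, device_info: dict) -> str:
    interface = device_info.get("interface", {})
    if generic_event in ("CONFIRM", "SELECT"):
        if interface.get("focused_file") and device_info.get("app_category") == "audio_video":
            return "PLAY_FOCUSED_FILE"
        if interface.get("focused_file") or device_info.get("device_type") == "storage":
            return "OPEN_FILE_OR_ENTER_NEXT_LEVEL_PATH"
        if interface.get("focused_function_key"):
            return "ACTIVATE:" + interface["focused_function_key"]
    if generic_event == "RETURN":
        return "RETURN_TO_PREVIOUS_INTERFACE"
    if generic_event == "VOICE_INPUT":
        return "START_VOICE_INPUT"
    if generic_event == "MOVE_CURSOR" and interface.get("has_cursor"):
        return "MOVE_CURSOR"
    return "IGNORE"
```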
In one possible implementation, that the first device captures an image frame includes: the first device captures a plurality of image frames. The user gesture can then be determined comprehensively based on the plurality of image frames, which improves the accuracy of the generic instruction and broadens the scenarios to which the solution is applicable.
In one possible implementation, the user gestures corresponding to several consecutive image frames among the plurality of image frames are the same. Determining the user gesture comprehensively based on a plurality of image frames improves the accuracy of the generic instruction.
In one possible implementation, the user gestures corresponding to several consecutive image frames among the plurality of image frames satisfy a preset condition. The solution is applicable to a wide range of scenarios, and a user can achieve various control effects based on various gestures.
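One simple way to realize the "same gesture over several consecutive image frames" condition is a sliding-window check. The sketch below is an assumption about how such a check could look, not the patent's algorithm.

```python
from collections import deque
from typing import Optional

class GestureStabilizer:
    """Report a static gesture only when it is recognized in N consecutive image frames."""

    def __init__(self, window: int = 5):
        self.window = window
        self.history: deque = deque(maxlen=window)

    def update(self, gesture_label: Optional[str]) -> Optional[str]:
        self.history.append(gesture_label)
        if len(self.history) == self.window and len(set(self.history)) == 1 and gesture_label:
            return gesture_label   # same gesture across the whole window
        return None
```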
In one possible implementation, that the first device identifies the user gesture corresponding to the hand feature of the user in the image frame includes: the first device detects a plurality of key points of the user's hand in a plurality of image frames according to the hand feature of the user; the first device determines position information of the plurality of key points of the user's hand in the plurality of image frames; and the first device determines the user gesture corresponding to the plurality of image frames according to the position information of the plurality of key points of the user's hand in the plurality of image frames. The user gesture is determined by comprehensively analyzing a plurality of image frames, improving the accuracy of the generic instruction.
In one possible implementation, that the first device determines the position information of the plurality of key points of the user's hand in the plurality of image frames includes: the first device performs hand detection on an image i among the plurality of image frames to obtain first position information of the plurality of key points of the user's hand in the image i; the first device performs hand detection on an image j among the plurality of image frames to obtain second position information of the plurality of key points of the user's hand in the image j; and the first device corrects the second position information according to the first position information to obtain the position information of the plurality of key points of the user's hand in the image j, where the capture time of the image j is later than the capture time of the image i. This ensures the accuracy of the determined position information of the key points of the user's hand.
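One possible reading of "correcting the second position information according to the first position information" is inter-frame smoothing of the key-point coordinates; the sketch below assumes that reading and is not the patent's method.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def correct_keypoints(prev_points: List[Point], curr_points: List[Point],
                      alpha: float = 0.7) -> List[Point]:
    """Blend each key point of image j toward its position in the earlier image i
    to suppress detection jitter (illustrative assumption)."""
    return [
        (alpha * cx + (1 - alpha) * px, alpha * cy + (1 - alpha) * py)
        for (px, py), (cx, cy) in zip(prev_points, curr_points)
    ]
```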
In one possible implementation, the processor is further configured to determine a location on the second device display indicated by the generic event based on the user gesture.
In one possible implementation, that the processor determines, according to the user gesture, the position on the display screen of the second device indicated by the generic event specifically includes: the first device determines the coordinates (x2, y2) on the display screen of the second device indicated by the generic event according to the following formulas: x2 = (1 + x1) × d / 2; y2 = (1 + y1) × h / 2; where x1 and y1 are the coordinates of the user gesture, d is the width of the display screen of the second device, and h is its height. With this calculation, the position on the display screen of the second device indicated by the generic event can be determined.
In one possible implementation, the coordinates (x2, y2) on the display screen of the second device indicated by the generic event are coordinates in a preset coordinate system, whose origin is the center point of the display screen of the second device, whose x axis is parallel to the horizontal edge of the display screen, and whose y axis is parallel to the vertical edge of the display screen. The application does not limit the specific reference frame.
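Written out as code, the mapping above is a direct transcription of the two formulas; the only added assumption in the example comment is that x1 and y1 are normalized gesture coordinates in [-1, 1].

```python
from typing import Tuple

def gesture_to_screen(x1: float, y1: float, d: float, h: float) -> Tuple[float, float]:
    """Map gesture coordinates (x1, y1) to coordinates (x2, y2) on the second device's
    display of width d and height h, per the formulas in the implementation above."""
    x2 = (1 + x1) * d / 2
    y2 = (1 + y1) * h / 2
    return x2, y2

# Example: assuming x1, y1 are normalized to [-1, 1], a gesture at (0, 0)
# maps to (d / 2, h / 2).
```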
In a second aspect, an electronic device, such as a second device, is provided. The second device includes: a communication interface configured to receive a control instruction from a first application, where the control instruction corresponds to a generic event; a memory configured to store device information of the second device; and a processor configured to instruct, in combination with the device information of the second device, a second application to respond to the control instruction.
According to the solution provided in the second aspect, when the electronic device receives a generic event (or standard event) that is independent of any specific scene, system, or device, it responds to the generic event (or standard event) according to its specific device information and can thus match the actual usage scenario more closely.
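On the receiving side, the second aspect could be sketched roughly as follows; the message format and the second_app interface are assumptions, and concretize() refers to the earlier illustrative sketch.

```python
import json

# Illustrative receiver sketch: the second device receives a control instruction carrying a
# generic event and resolves it against its own device information before instructing the
# second application. All names here are assumptions for this sketch.
def on_control_instruction(message: bytes, device_info: dict, second_app) -> None:
    instruction = json.loads(message)                 # e.g. b'{"generic_event": "CONFIRM"}'
    generic_event = instruction["generic_event"]
    action = concretize(generic_event, device_info)   # reuse the concretize() sketch above
    second_app.perform(action)                        # instruct the second application to respond
```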
In one possible implementation, the device information includes one or more of the following: application information, interface information, device type, operating system.
In one possible implementation, the generic event includes one or more of the following: opening the main menu, determining, selecting, returning, increasing volume, decreasing volume, moving cursor, voice input. The universal event is suitable for multiple scenes, cross-equipment, cross-system and other situations.
In one possible implementation, the generic event is a determination or selection; the interface information of the second device represents that the second application interface comprises a first file, or the device type of the second device is a storage device; the processor instructs the second application to respond to the generic event comprising: the processor instructs the second application to perform one or more of: displaying an interface after the first file is opened; or displaying the interface of the next path of the path corresponding to the first file. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection; the application information of the second device characterizes that the second application is an audio-video application, and the second application interface comprises a first file; the processor instructs the second application to respond to the generic event comprising: the processor instructs the second application to display an interface for playing the first file. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection, and the interface information of the second device characterizes that the second application interface includes a first function key thereon; the processor instructs the second application to respond to the generic event comprising: the processor instructs the second application to display an interface corresponding to the first function key. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the first function key includes any one of the following: selection frame, marking frame, deletion frame, play, pause, throw screen, mute, accelerate, decelerate, bullet screen, comment, praise, collection, maximize, minimize, close.
In one possible implementation, the generic event is a return; the processor instructs the second application to respond to the generic event comprising: the processor instructs the second application to return to the previous level interface. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is a voice input, and the interface information of the second device characterizes that a second application interface includes a voice input key; the processor instructs the second application to respond to the generic event comprising: the processor instructs the second application to perform one or more of: detecting a sound signal of the surrounding environment; or, opening a voice input function corresponding to the voice input key; or opening a voice input function of the second application; alternatively, the voice input function of the second device is turned on. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is movement, and the interface information of the second device characterizes that a cursor is included on the second application interface; the processor instructs the second application to respond to the generic event comprising: the processor instructs to move a cursor on the second application interface. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In a third aspect, a universal device control method based on a camera assembly is provided. The method is applied to a first device and includes: capturing an image frame, where the image frame includes a hand feature of a user; identifying, while running a first application, a user gesture corresponding to the hand feature of the user in the image frame; determining a generic event according to the user gesture; and instructing a second application to respond to the generic event.
According to the solution provided in the third aspect, when the electronic device detects a preset user gesture, the control event corresponding to the user gesture is abstracted into a generic event (or standard event) that is independent of any specific scene, system, or device, so that the method is applicable to device control in multiple scenarios. The method removes the inconvenience of conventionally relying on a third-party physical device for device control and greatly improves the convenience of human-computer interaction. In addition, the method is unaffected by differences between platforms and, through a more comprehensive and universal interaction scheme, provides more accurate and friendly device control while reducing the cost for users of learning different control methods on different devices.
In one possible implementation, the second application runs in the first device; the indicating the second application to respond to the generic event includes: and indicating the second application to respond to the universal event according to the universal event by combining the device information of the first device. The first device may be more closely related to the actual usage scenario by responding to the general event (or the standard event) according to specific device information.
In one possible implementation, the indicating the second application to respond to the generic event includes: and sending a control instruction corresponding to the general event to target equipment, such as second equipment, so as to instruct a second application running in the second equipment to respond to the general event. The scheme provided by the application can be suitable for equipment control under the conditions of multiple scenes, equipment crossing and the like. The convenience of man-machine interaction is greatly improved. In addition, the method is not affected by the difference between the platforms, and more accurate and friendly equipment control can be provided through a more comprehensive and universal interaction scheme. And reducing the learning cost of the control method between different devices for the user.
In a possible implementation manner, the sending, to the second device, a control instruction corresponding to the general event specifically includes: and according to the universal event, combining the equipment information of the second equipment, and sending a control instruction corresponding to the universal event to the second equipment. As a way, the first device can embody the universal event into a control instruction matched with the actual situation according to specific device information of the second device, so that the second device can respond to the universal event accurately.
In one possible implementation, the device information includes one or more of the following: application information, interface information, device type, operating system.
In one possible implementation, the above device types include one or more of the following: video playing equipment, screen throwing equipment, storage equipment, game equipment and audio playing equipment.
In one possible implementation, the first device and the second device have different operating systems. The scheme provided by the application can be suitable for equipment control under the conditions of multiple scenes, equipment crossing, system crossing and the like. The convenience of man-machine interaction is greatly improved. In addition, the method is not affected by the difference between the platforms, and more accurate and friendly equipment control can be provided through a more comprehensive and universal interaction scheme. And reducing the learning cost of the control method between different devices for the user.
In one possible implementation manner, correspondence between a plurality of user gestures and a plurality of general events is stored in the first device; the determining a general event according to the gesture of the user specifically includes: and determining the universal event corresponding to the user gesture according to the stored correspondence between the plurality of user gestures and the plurality of universal events. As an example, the first device may find a generic event corresponding to the user gesture based on a pre-stored correspondence.
In one possible implementation, the generic event includes one or more of the following: opening the main menu, determining, selecting, returning, increasing volume, decreasing volume, moving cursor, voice input. The universal event is suitable for multiple scenes, cross-equipment, cross-system and other situations.
In one possible implementation, the generic event is a determination or selection; the interface information of the first device represents that the second application interface comprises a first file, or the device type of the first device is a storage device; the indicating the second application to respond to the generic event includes: instruct the second application to perform one or more of the following: displaying an interface after the first file is opened; or displaying the interface of the next path of the path corresponding to the first file. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection; the application information of the first device characterizes that the second application is an audio-video application, and the second application interface comprises a first file; the indicating the second application to respond to the general event includes: the second application is instructed to display an interface for playing the first file. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection, and the interface information of the first device characterizes that the second application interface includes a first function key thereon; the indicating the second application to respond to the general event includes: and indicating the second application to display the interface corresponding to the first function key. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the first function key includes any one of the following: selection frame, marking frame, deletion frame, play, pause, throw screen, mute, accelerate, decelerate, bullet screen, comment, praise, collection, maximize, minimize, close.
In one possible implementation, the generic event is a return; the indicating the second application to respond to the generic event includes: the second application is instructed to return to the previous level interface. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is a voice input, and the interface information of the first device characterizes that a second application interface includes a voice input key; the indicating the second application to respond to the general event includes: instruct the second application to perform one or more of the following: detecting a sound signal of the surrounding environment; or, opening a voice input function corresponding to the voice input key; or opening a voice input function of the second application; alternatively, the voice input function of the first device is turned on. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is movement, and the interface information of the first device characterizes that the second application interface includes a cursor; the indicating the second application to respond to the general event includes: indicating movement of a cursor on the second application interface. The first device can embody the general event as an instruction matched with the actual situation according to the specific device information of the first device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the acquiring an image frame includes: a plurality of image frames is acquired. The method and the device can comprehensively determine the gestures of the user based on a plurality of image frames so as to improve the accuracy of the general instruction and the universality of the scene to which the scheme can be applied.
In one possible implementation manner, the user gestures corresponding to a plurality of consecutive image frames in the plurality of image frames are the same. The method and the device can comprehensively determine the gestures of the user based on a plurality of image frames so as to improve the accuracy of the general instruction.
In one possible implementation manner, the user gesture corresponding to a plurality of consecutive image frames in the plurality of image frames satisfies a preset condition. The scheme has high universality of applicable scenes, and a user can realize various control effects based on various gestures.
In one possible implementation manner, the identifying the user gesture corresponding to the user hand feature in the image frame includes: detecting a plurality of key points of the user hand in a plurality of image frames according to the hand characteristics of the user; determining position information of a plurality of key points of a user hand in a plurality of image frames; and determining user gestures corresponding to the plurality of image frames according to the position information of the plurality of key points of the user hands in the plurality of image frames. And determining the gesture of the user by comprehensively analyzing a plurality of image frames so as to improve the accuracy of the general instruction.
In one possible implementation manner, the determining the location information of the plurality of key points of the user's hand in the plurality of image frames includes: performing human hand detection on an image i in a plurality of image frames to obtain first position information of a plurality of key points of a user hand in the image i; performing human hand detection on an image j in a plurality of image frames to obtain second position information of a plurality of key points of a user hand in the image j; correcting the second position information according to the first position information to obtain position information of a plurality of key points of the user's hand in the image j; wherein, the acquisition time corresponding to the image j is located after the acquisition time of the image i. By the method, the accuracy of the determined position information of the hand key points of the user can be ensured.
In one possible implementation, the method further includes determining a location on the second device display indicated by the generic event based on the user gesture.
In one possible implementation, the determining, according to the user gesture, the position on the display screen of the second device indicated by the generic event specifically includes: determining the coordinates (x2, y2) on the display screen of the second device indicated by the generic event according to the following formulas: x2 = (1 + x1) × d / 2; y2 = (1 + y1) × h / 2; where x1 and y1 are the coordinates of the user gesture, d is the width of the display screen of the second device, and h is its height. With this calculation, the position on the display screen of the second device indicated by the generic event can be determined.
In one possible implementation, the coordinates (x2, y2) on the display screen of the second device indicated by the generic event are coordinates in a preset coordinate system, whose origin is the center point of the display screen of the second device, whose x axis is parallel to the horizontal edge of the display screen, and whose y axis is parallel to the vertical edge of the display screen. The application does not limit the specific reference frame.
In a fourth aspect, a universal device control method based on a camera assembly is provided. The method is applied to a second device and includes: receiving a control instruction from a first application, where the control instruction corresponds to a generic event; and instructing, in combination with device information of the second device, a second application to respond to the control instruction.
According to the solution provided in the fourth aspect, when the electronic device receives a generic event (or standard event) that is independent of any specific scene, system, or device, it responds to the generic event (or standard event) according to its specific device information and can thus match the actual usage scenario more closely.
In one possible implementation, the device information includes one or more of the following: application information, interface information, device type, operating system.
In one possible implementation, the generic event includes one or more of the following: opening the main menu, determining, selecting, returning, increasing volume, decreasing volume, moving cursor, voice input. The universal event is suitable for multiple scenes, cross-equipment, cross-system and other situations.
In one possible implementation, the generic event is a determination or selection; the interface information of the second device represents that the second application interface comprises a first file, or the device type of the second device is a storage device; the indicating the second application to respond to the generic event includes: instruct the second application to perform one or more of the following: displaying an interface after the first file is opened; or displaying the interface of the next path of the path corresponding to the first file. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection; the application information of the second device characterizes that the second application is an audio-video application, and the second application interface comprises a first file; the indicating the second application to respond to the general event includes: the second application is instructed to display an interface for playing the first file. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the generic event is a determination or selection, and the interface information of the second device characterizes that the second application interface includes a first function key thereon; the indicating the second application to respond to the general event includes: and indicating the second application to display the interface corresponding to the first function key. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation, the first function key includes any one of the following: selection frame, marking frame, deletion frame, play, pause, throw screen, mute, accelerate, decelerate, bullet screen, comment, praise, collection, maximize, minimize, close.
In one possible implementation, the generic event is a return; the indicating the second application to respond to the generic event includes: the second application is instructed to return to the previous level interface. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is a voice input, and the interface information of the second device characterizes that a second application interface includes a voice input key; the indicating the second application to respond to the general event includes: instruct the second application to perform one or more of the following: detecting a sound signal of the surrounding environment; or, opening a voice input function corresponding to the voice input key; or opening a voice input function of the second application; alternatively, the voice input function of the second device is turned on. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In one possible implementation manner, the general event is movement, and the interface information of the second device characterizes that a cursor is included on the second application interface; the indicating the second application to respond to the general event includes: indicating movement of a cursor on the second application interface. The second device can embody the general event as an instruction matched with the actual situation according to the specific device information of the second device so as to accurately respond to the general event. The scheme is applicable to multiple scene situations.
In a fifth aspect, a communication system is provided, the communication system comprising: an electronic device as in any one of the possible implementations of the first aspect (e.g. the first device) and an electronic device as in any one of the possible implementations of the second aspect (e.g. the second device).
In a sixth aspect, there is provided a computer readable storage medium having stored thereon computer program code which, when executed by a processor, implements a method as in any one of the possible implementations of the third or fourth aspects.
In a seventh aspect, a chip system is provided, the chip system comprising a processor, a memory, the memory having computer program code stored therein; the computer program code, when executed by the processor, implements a method as in any one of the possible implementations of the third or fourth aspect. The chip system may be formed of a chip or may include a chip and other discrete devices.
In an eighth aspect, a computer program product is provided which, when run on a computer, causes the method as in any one of the possible implementations of the third or fourth aspect to be carried out.
Drawings
FIG. 1 is a flowchart of a method for inputting characters into a television based on human hand key points according to an embodiment of the present application;
FIG. 2 is a flowchart of a gesture-based mouse driving method according to an embodiment of the present application;
fig. 3 is a schematic hardware structure of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic diagram of a general device control process based on a camera assembly according to an embodiment of the present application;
fig. 5 is a schematic diagram of another general device control process based on a camera module according to an embodiment of the present application;
fig. 6A is a flowchart of a general device control method based on a camera module according to an embodiment of the present application;
FIG. 6B is a flowchart of a method for obtaining key points of a user's hand according to an embodiment of the present application;
FIG. 7 is an exemplary diagram of two human hand keypoints provided in an embodiment of the application;
FIG. 8 is a schematic diagram of six static gestures according to an embodiment of the present application;
FIG. 9 is a schematic diagram of three dynamic gestures according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a process for determining a gesture of a dynamic user according to an embodiment of the present application;
FIG. 11 is an exemplary diagram of a cross-device control scenario provided by an embodiment of the present application;
FIG. 12 is a diagram illustrating a correspondence between a gesture position of a user and a position on a display screen of a device according to an embodiment of the present application;
fig. 13 is a schematic diagram of a method for determining a position of a cursor on a display screen of a device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: only A exists, both A and B exist, or only B exists. In addition, in the description of the embodiments of the present application, "a plurality of" means two or more.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present embodiment, unless otherwise specified, the meaning of "plurality" is two or more.
The embodiments of the present application provide a universal device control method based on a camera assembly, applied to the process of controlling an electronic device.
To avoid relying on a third-party physical device when performing device control, as one possible implementation, the electronic device may determine a control instruction based on computer vision techniques and respond to the control instruction. Computer vision techniques can replace human eyes, using the camera assembly for recognition, tracking, measurement, and the like, so that the computer can perceive the environment.
Since gestures can express rich information in a contactless manner, gesture recognition is a very important human-computer interaction mode in the field of computer vision. Based on this, as a possible implementation, when performing device control, the electronic device may determine a control instruction based on a human body action captured in real time and then respond to the control instruction. The electronic device is usually preset with a correspondence between human body actions and control instructions. In this way, the electronic device can directly respond to the instruction corresponding to the user's body action without relying on a third-party physical device such as a mouse, a control ring, a remote control, or a game controller, which is not limited in the embodiments of the application.
Illustratively, in embodiments of the present application, the human actions may include, but are not limited to: body movements, limb movements, user gestures, facial expressions, etc.
As an example, referring to fig. 1, fig. 1 illustrates a flowchart of a method for implementing character input to a television based on human hand keypoints, taking control of an electronic device based on user gestures as an example.
As shown in fig. 1, the television (i.e., the electronic device) detects the positions of the key points of a human hand in an image when it detects that the image captured in real time includes a preset human hand image. The television then derives the approximate position of the hand from the positions of the key points and uses it as a detection area for hand tracking. Further, the television detects the key points of the human hand within the detection area and determines the relative positional relationship between the key points in order to determine the gesture. Finally, the television enables the gesture operation function, determines the character the user wants to input according to the change in position of the key points, executes the corresponding action instruction, and displays the corresponding character.
With the human-computer interaction method shown in fig. 1, interaction with the television can be realized through user gestures without relying on a physical device, implementing character input on the television side or a mouse-like function.
As another example, referring to fig. 2, fig. 2 illustrates a gesture-based mouse driving method flowchart taking as an example the control of an electronic device based on a gesture of a user.
As shown in fig. 2, the electronic device tracks the gesture in subsequently captured images when it detects that an image captured in real time includes a preset hand image. Then, based on the motion trajectory of the tracked gesture, the electronic device determines a movement trajectory and drives its mouse component to move accordingly.
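As a rough illustration of this background approach (an assumption for this sketch, not the cited method's code), the tracked trajectory can be converted into a cursor displacement, for example:

```python
from typing import List, Tuple

def trajectory_to_cursor_delta(trajectory: List[Tuple[float, float]],
                               sensitivity: float = 1.0) -> Tuple[float, float]:
    """Derive a cursor displacement from the last two tracked hand positions
    (illustrative sketch of the FIG. 2 background approach)."""
    if len(trajectory) < 2:
        return (0.0, 0.0)
    (x0, y0), (x1, y1) = trajectory[-2], trajectory[-1]
    return (sensitivity * (x1 - x0), sensitivity * (y1 - y0))
```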
It can be understood that the method for controlling the electronic equipment based on the human body actions acquired in real time controls the electronic equipment through the hand of the user in the detected image, so that the dependence on the third-party physical equipment is overcome, and the human-computer interaction experience is natural and efficient.
However, conventional methods for performing device control based on human body actions acquired in real time still have some drawbacks. For example, most of them are specific interaction schemes for devices of a specific form, or specific interaction schemes for realizing a specific function. The method in the example shown in fig. 1 is a specific interaction scheme for inputting characters to a television (i.e., a device of a specific form) through user gestures. The method in the example shown in fig. 2 is likewise a specific interaction scheme for realizing a mouse function (i.e., a specific function) through user gestures. Such methods cannot be applied to devices other than the specific-form device, nor can they be used to realize functions other than the specific function.
In order to improve the universality of the method for controlling a device based on human body actions acquired in real time across multiple scenarios, and its compatibility with multiple devices, the embodiment of the application provides a universal device control method based on a camera component.
In some embodiments, the universal device control method based on a camera component provided by the embodiment of the present application may also be applicable to device control in cross-device or cross-application cases. Device control in a cross-device scenario is, for example, controlling a second device through a first device.
Further, in some embodiments, the universal device control method based on a camera component provided by the embodiments of the present application may also be applicable to device control in a cross-system case. Device control in the cross-system case is, for example, controlling a second device through a first device, where the operating system of the first device is different from that of the second device.
The electronic device provided by the embodiment of the application may include, but is not limited to, a camera device, a personal computer (personal computer, PC), a smart phone, a netbook, a tablet computer, a smart wearable device (such as a smart watch, a smart band, smart glasses, a phone watch, etc.), an intelligent camera, a handheld computer, a smart television, a personal digital assistant (personal digital assistant, PDA), a portable multimedia player (portable multimedia player, PMP), a projection device, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a television, a motion-sensing game console in a human-computer interaction scenario, and the like. The application is not limited to the specific function and structure of the electronic device.
As an example, in a cross-device control scenario, for example a scenario in which a second device is controlled by a first device, the first device comprises a camera. The second device may or may not include a camera, and the present application is not limited.
Referring to fig. 3, fig. 3 is a schematic diagram of a hardware structure of an electronic device (such as a first device or a second device) provided in an embodiment of the present application, taking a smart phone including a camera as an example.
As shown in fig. 3, the electronic device may include a processor 310, a memory (including an external memory interface 320 and an internal memory 321), a universal serial bus (universal serial bus, USB) interface 330, a charge management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a mobile communication module 350, a wireless communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, an earphone interface 370D, a sensor module 380, keys 390, a motor 391, an indicator 392, a camera 393, a display screen 394, and a subscriber identity module (subscriber identification module, SIM) card interface 395, among others. The sensor module 380 may include, among other things, pressure sensors, gyroscopic sensors, barometric pressure sensors, magnetic sensors, acceleration sensors, distance sensors, proximity sensors, fingerprint sensors, temperature sensors, touch sensors, ambient light sensors, bone conduction sensors, etc.
It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device. In other embodiments of the application, the electronic device may include more or fewer components than illustrated, or combine certain components, or split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 310 may include one or more processing units. For example: the processor 310 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a flight controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
A memory may also be provided in the processor 310 for storing instructions and data. In some embodiments, the memory in the processor 310 is a cache memory. The memory may store instructions or data just used or cyclically used by the processor 310. If the processor 310 needs to use the instructions or data again, it may call them directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 310, thereby improving the efficiency of the system.
In some embodiments, processor 310 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The charge management module 340 is configured to receive a charge input from a charger. The power management module 341 is configured to connect the battery 342, the charge management module 340 and the processor 310. The power management module 341 receives input from the battery 342 and/or the charge management module 340 to power the processor 310, the internal memory 321, the display screen 394, the camera 393, the wireless communication module 360, and the like.
The wireless communication function of the electronic device may be implemented by the antenna 1, the antenna 2, the mobile communication module 350, the wireless communication module 360, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device may be used to cover a single or multiple communications bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 350 may provide a solution for wireless communication including 2G/3G/4G/5G/6G, etc. applied on an electronic device. The mobile communication module 350 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 350 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 350 may amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate the electromagnetic waves. In some embodiments, at least some of the functional modules of the mobile communication module 350 may be disposed in the processor 310. In some embodiments, at least some of the functional modules of the mobile communication module 350 may be provided in the same device as at least some of the modules of the processor 310.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through audio means (not limited to speakers 370A, receivers 370B, etc.) or displays images or video through a display screen 394. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 350 or other functional module, independent of the processor 310.
The wireless communication module 360 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wiFi network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc. applied on an electronic device. The wireless communication module 360 may be one or more devices that integrate at least one communication processing module. The wireless communication module 360 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 310. The wireless communication module 360 may also receive a signal to be transmitted from the processor 310, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In an embodiment of the present application, the electronic device may communicate with other electronic devices, for example, send control instructions to other electronic devices, based on the wireless communication module 360, the antenna 1, and/or the antenna 2.
In some embodiments, the antenna 1 and the mobile communication module 350 of the electronic device are coupled, and the antenna 2 and the wireless communication module 360 are coupled, so that the electronic device can communicate with the network and other devices through wireless communication technology. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), new Radio (NR), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS), etc.
The electronic device implements display functions through the GPU, display screen 394, and application processor, etc. The GPU is a microprocessor for image processing, connected to the display screen 394 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 394 is used for displaying images, videos, and the like. The display screen 394 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 394, N being a positive integer greater than 1.
In the embodiment of the application, the electronic device can perform application interface rendering through the GPU, and perform application interface display through the display screen 394.
The electronic device may implement shooting functions through the ISP, the camera 393, the video codec, the GPU, the display screen 394, the application processor, and the like.
The external memory interface 320 may be used to connect an external memory card, such as a Micro SD memory card, to extend the storage capability of the electronic device. The external memory card communicates with the processor 310 through the external memory interface 320 to implement data storage functions. For example, files such as music and video are stored in the external memory card.
The internal memory 321 may be used to store executable program code of a computer program. By way of example, computer programs may include operating system programs and application programs. The operating system may include, but is not limited to, an Apple operating system and the like. The executable program code includes instructions. The internal memory 321 may include a storage program area and a storage data area. The storage program area may store an operating system, an application program required for at least one function, and the like. The storage data area may store data created during use of the electronic device (e.g., application data, user data, etc.), and the like. In addition, the internal memory 321 may include a high-speed random access memory, and may also include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (universal flash storage, UFS), and the like. The processor 310 performs various functional applications of the electronic device and data processing by executing instructions stored in the internal memory 321, and/or instructions stored in a memory provided in the processor.
The sensor module 380 may sense a user's operation, such as a touch operation, a click operation, a slide operation, a user approaching a screen, etc.
The electronic device may implement audio functionality through an audio module 370, speaker 370A, receiver 370B, microphone 370C, an application processor, and the like. Such as music playing, recording, etc. Regarding the specific operation and function of the audio module 370, speaker 370A, receiver 370B and microphone 370C, reference may be made to the description in the conventional art.
The keys 390 include a power on key, a volume key, etc. Key 390 may be a mechanical key. Or may be a touch key. The electronic device may receive key inputs, generating key signal inputs related to user settings and function controls of the electronic device.
It should be noted that, the hardware modules included in the electronic device shown in fig. 3 are only described by way of example, and the specific structure of the electronic device is not limited. For example, if the electronic device is a PC, the electronic device may also include a keyboard, a mouse, and the like. If the electronic device is a television, the electronic device may further include a remote control or the like.
The following will specifically describe a general device control method based on a camera module provided by the embodiment of the application by taking a human body action as an example of a user gesture and combining with the accompanying drawings. As shown in fig. 4, a general device control method based on an image capturing component according to an embodiment of the present application may include the following 4 stages:
Stage 1: and an image acquisition stage.
The image acquisition stage is used for capturing an image under the current scene so as to respond to an instruction indicated by a preset user gesture in time when the user gesture is detected.
In some embodiments, the first device may support image capture based on a native application, a device capability, a third-party application, an applet, a quick application (light application), or the like, which is not limited by the present application. For example, the first device may capture an image of the current scene when the user enables a function, application, applet, or the like for controlling the device by gestures.
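Purely as an illustration of stage 1, the sketch below captures frames with OpenCV; the camera index is an assumption, and the actual first device may instead acquire images through a native application, a device capability, an applet, or a quick application as described above.

```python
import cv2

def acquire_frames(camera_index=0):
    """Stage 1: yield image frames captured in real time by the camera component."""
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame   # each frame is handed to the gesture recognition stage
    finally:
        capture.release()
```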
Stage 2: and a gesture recognition stage.
The gesture recognition stage is for recognizing a user gesture in an image of a current scene captured by the first device. As an example, the first device may perform user gesture recognition, gesture coordinate determination, and the like through an algorithm module integrated in the first device.
In some embodiments, the first device may perform gesture recognition based on a captured single frame image of the current scene.
In other embodiments, in order to improve accuracy of gesture recognition and avoid triggering a wrong service due to a single frame image recognition result, the first device may determine a gesture category of the user according to overall consideration of the gesture recognition result of the multiple frame images.
For example, as shown in fig. 5, after the first device (such as an algorithm module integrated in the first device) performs gesture recognition on the multiple image frames, the first device (such as a post-processing module integrated in the first device) may comprehensively analyze the algorithm module's per-frame gesture class results to determine the gesture class of the user.
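A minimal sketch of this post-processing is given below, assuming the algorithm module has already produced a gesture class label per frame; the 5-frame window and the vote threshold are assumptions chosen only for illustration.

```python
from collections import Counter, deque

class GesturePostProcessor:
    """Aggregates per-frame gesture classes over a sliding window (stage 2 post-processing)."""

    def __init__(self, window_size=5, min_votes=4):
        self.window = deque(maxlen=window_size)   # most recent per-frame results
        self.min_votes = min_votes                # agreeing frames required for a decision

    def update(self, frame_gesture):
        """Add one frame's recognition result; return a gesture class once it is stable."""
        self.window.append(frame_gesture)
        label, votes = Counter(self.window).most_common(1)[0]
        return label if votes >= self.min_votes else None
```

Aggregating over several frames reduces the chance that a single misrecognized frame triggers the wrong service, which is exactly the motivation stated above.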
Stage 3: a generic event determination phase.
The universal event determining stage is used for abstracting the control event corresponding to the gesture of the user into a universal event (or a standard event). Wherein the generic event is a generic event (or standard event) that is independent of the specific scenario, system, device, etc. By the method, the method can be suitable for equipment control under the conditions of multiple scenes, equipment crossing, system crossing and the like.
For example, the first device may abstract the control instruction corresponding to the user gesture into a general event through a general event abstraction module (or a standard event abstraction module) integrated in the first device.
Stage 4: and a response stage.
The response phase is used for responding to the general events according to the actual use situation. Illustratively, the actual usage scenario is used to characterize one or more of, for example, application information, device information, system information, and the like.
In some embodiments, the generic event is directed to the first device. For this case, the first device may respond to the above-mentioned generic event directly according to the actual usage scenario of the first device.
In other embodiments, the generic event is directed to the second device. For this case, the first device may instruct the second device to respond to the above-described generic event according to the actual usage scenario of the second device. Alternatively, the first device may itself respond to the generic event correspondingly according to the actual usage scenario of the second device.
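As an illustration only, the two cases of the response stage can be viewed as a simple dispatch, as sketched below; handle(), usage_scenario(), and send() are hypothetical placeholders for whatever local response logic and communication channel the devices actually use.

```python
def respond_to_generic_event(generic_event, first_device, second_device=None, send=None):
    """Stage 4: respond to a generic event according to the actual usage scenario."""
    if second_device is None:
        # The generic event is directed to the first device: respond locally using
        # the first device's usage scenario (application, device, system information).
        first_device.handle(generic_event, first_device.usage_scenario())
    else:
        # The generic event is directed to the second device: send it a control
        # instruction and let it respond according to its own usage scenario.
        send(second_device, {"generic_event": generic_event})
```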
It can be understood that, according to the universal device control method based on the camera component provided by the embodiment of the application, device control can be performed without depending on a third-party physical device. In addition, the method is not affected by differences between platforms in cross-platform scenarios, such as cross-application, cross-device, and cross-system scenarios, and more accurate and friendly device control can be provided through a more comprehensive and universal interaction scheme. It also reduces the user's cost of learning control methods across different devices.
The universal device control method based on the camera component provided by the embodiment of the application is specifically introduced below, taking as an example the case where the image acquired by the first device includes a user gesture, in combination with specific scenarios.
Referring to fig. 6A, fig. 6A is a flowchart of a universal device control method based on a camera component according to an embodiment of the present application. As shown in fig. 6A, the method may include S601-S605 and S606-1, or S601-S605, S606-2, and S607.
In fig. 6A, S601 is the image acquisition stage (stage 1), S602-S604 are the gesture recognition stage (stage 2), S605 is the generic event determination stage (stage 3), and S606-1, or S606-2 and S607, constitute the response stage (stage 4).
S601: the first device acquires an image frame. The image frames include user hand features therein.
In some embodiments, the image frame acquired by the first device is, for example, an image captured by the first device via a camera.
In other embodiments, the image frames acquired by the first device are one or more image frames in real-time video captured by the first device via the camera. Wherein the real-time video is composed of a plurality of image frames. For this case, S601 may specifically include: the first device acquires a plurality of image frames, wherein one or more of the plurality of image frames includes a user hand feature.
For example, if the image frame acquired by the first device includes a plurality of image frames in the real-time video captured by the first device through the camera, the plurality of image frames may be continuous image frames or discontinuous image frames, which is not limited by the present application.
As a possible implementation, the first device may first detect whether the user is included in the image frame. When the user is included in the image frame, the first device may further detect whether the user hand feature is included in the image frame.
Illustratively, as shown in fig. 6B, the first device may determine whether the user is included in the image through face detection. If the user is included in the image, further exemplary, the first device may determine the approximate range of the user and perform a human hand detection within that range to determine whether the user hand features are included in the image frame.
As one example, the first device may detect whether user hand features are included in the image frames based on a single shot multi box detector (single shot multibox detector, SSD) detection network, neural Network (NN), such as deep neural network (deep neural network, DNN), convolutional neural network (convolutional neuron network, CNN), or the like.
As for the method and specific procedure of determining whether the image frame includes the hand feature of the user by the first device, the present application is not limited and reference may be made to conventional techniques.
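One possible realization of the coarse-to-fine detection in fig. 6B is sketched below using OpenCV's Haar face detector; the hand detector call detect_hands_in_roi() is a hypothetical placeholder for an SSD- or neural-network-based model, and the margins used to approximate the user's range are assumptions.

```python
import cv2

# Pre-trained frontal face model shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def find_hand_regions(frame, detect_hands_in_roi):
    """Detect the user via the face, then search for hand features only near the user."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    hands = []
    for (x, y, w, h) in faces:
        # Approximate the user's range around the detected face (assumed margins).
        x0, y0 = max(0, x - 2 * w), max(0, y - h)
        roi = frame[y0:y + 6 * h, x0:x + 3 * w]
        hands.extend(detect_hands_in_roi(roi))     # hypothetical hand detector
    return hands
```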
S602: the first device detects a plurality of keypoints of a user's hand in an image frame.
As one possible implementation, the first device may detect a plurality of keypoints of the user's hand upon detecting that the user's hand feature is included in the image frame.
As a possible implementation manner, as shown in fig. 6B, the first device may acquire coordinates of a hand feature of the user when detecting that the hand feature of the user is included in the image frame, and perform hand keypoint detection within a hand coordinate range of the user, so as to acquire a plurality of keypoints of the hand of the user.
As one example, the first device may detect multiple keypoints of a user's hand in an image frame based on an SSD detection network, a neural network (e.g., DNN, CNN, etc.), or the like. The specific method for detecting the key points of the human hand by the first device is not limited, and can refer to the conventional technology.
In the embodiment of the present application, the key points of the human hand may include, but are not limited to, joints of the human hand (such as the nodes A1-A15 shown in fig. 7 (a)), the contour of the human hand (such as the contour points B1-B33 shown in fig. 7 (b)), and the like, which are not limited by the present application.
In the embodiment of the present application, if the first device acquires a plurality of image frames in S601, the first device detects key points of a user' S hand in each image frame by using the same method.
S603: the first device determines location information for a plurality of keypoints of a user's hand in an image frame.
As an example, the position information of a plurality of key points of the user's hand in the image frame may be represented by coordinates of the plurality of key points.
As an example, the coordinates of the plurality of key points (for example, m key points) are defined in a coordinate system (hereinafter referred to as the first coordinate system) whose XOY plane coincides with the plane in which n of the m key points are located. Here, m and n are positive integers, n is less than or equal to m, and the ratio of n to m satisfies a preset value. Illustratively, taking the gesture shown in fig. 7 (a) or fig. 7 (b) as an example, the XOY plane of the first coordinate system coincides with the palm of the user.
It should be noted that the present application is not limited to the specific arrangement principle of the first coordinate system, and the present application is applicable to any case.
If the first device acquires multiple image frames in S601, as a possible implementation manner, the first device may determine location information of multiple key points of the user' S hand in each image frame by using the same method.
In some embodiments, to ensure accuracy of the location information of the determined user hand keypoints, as another possible implementation, the first device may further correct the location information of the user hand keypoints in the current image frame based on the determined location information of the user hand keypoints.
Take as an example the case where the first device determines the position information of the plurality of key points of the user's hand in an image j among the acquired m image frames. After performing hand detection on image j and obtaining the position information (such as second position information) of the plurality of key points of the user's hand in image j, the first device may correct the second position information according to previously determined position information (such as first position information) of the plurality of key points of the user's hand in an image i. Image j and image i are any image frames among the m image frames acquired by the first device, and the acquisition time of image j is later than that of image i.
As an example, image i is the previous frame image of image j.
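As a minimal sketch of this correction, the second position information (image j) can be blended with the first position information (image i, e.g. the previous frame) using an exponential moving average; the smoothing factor is an assumption, and the embodiment does not prescribe any particular correction method.

```python
import numpy as np

def correct_keypoints(first_position, second_position, alpha=0.6):
    """Correct the current frame's key-point coordinates with the previous frame's.

    first_position, second_position: arrays of shape (num_keypoints, 2) holding the
    (x, y) coordinates of the hand key points in the first coordinate system.
    """
    prev = np.asarray(first_position, dtype=float)
    curr = np.asarray(second_position, dtype=float)
    # Weight the current detection by alpha and the previous frame by (1 - alpha).
    return alpha * curr + (1 - alpha) * prev
```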
S604: the first device determines a user gesture according to position information of a plurality of key points of a user hand in an image frame.
In the embodiment of the present application, the first device may determine the gesture of the user based on the relative positional relationship between the keypoints reflected by the positional information of the plurality of keypoints.
As one example, the first device may determine user gestures in an image frame acquired by the first device from a training set comprising a large amount of user gesture data based on a neural network. The user gesture in the image frame acquired by the first device is, for example, a user gesture in the training set that has a highest degree of matching with the plurality of keypoints.
In embodiments of the present application, user gestures may include, but are not limited to, static gestures and dynamic gestures.
The static gesture refers to a specific gesture (such as fist making, scissors hand, palm opening, etc.). The specific gesture has a corresponding relationship with the control event. Referring to fig. 8, fig. 8 shows six static gesture examples.
Dynamic gestures may refer to a combined gesture (e.g., palm opening and making a fist) composed of a plurality of specific gestures in a certain order, where the combined gesture has a correspondence to a control event. Alternatively, a dynamic gesture may refer to a motion trajectory of a specific gesture (e.g., keep a specific gesture drawing a circle, keep a specific gesture waving, etc.) that conforms to a certain motion trajectory, where a combined gesture of the motion trajectory and the specific gesture has a correspondence to a control event. Alternatively, a dynamic gesture may refer to a particular gesture that lasts for a certain length of time (e.g., a palm that opens for a certain length of time), where the combination of the length of time and the particular gesture has a correspondence to a control event.
Referring to fig. 9, fig. 9 shows three dynamic gesture examples. The dynamic gesture "gesture 1+gesture 2" shown in fig. 9 is a combined gesture formed by combining specific gestures according to a certain sequence, the dynamic gesture "gesture 3 lasts for 2 seconds+gesture 4" is a combined gesture of a specific duration and a specific gesture, and the dynamic gesture "keep gesture 4 circles" is a combined gesture of a specific motion track and a specific gesture.
As an example, if the first device acquires an image frame in S601, the user gesture in the image frame is a static gesture. For this case, the first device may determine the user gesture directly from the monitoring of the image frame.
As another example, if the first device acquires a plurality of image frames in S601, the user gesture in the image frames may be a static gesture or a dynamic gesture.
For example, if the user gestures corresponding to the plurality of image frames are the same, the user gesture is a static gesture. If the user gestures corresponding to the plurality of image frames satisfy a preset combination condition, for example a combination of a plurality of specific gestures and specific conditions, the user gesture is a dynamic gesture.
If the first device acquires multiple image frames in S601, as a possible implementation manner, when detecting a first preset gesture based on one image frame, the first device may directly determine a user gesture according to a monitoring result of the image frame, that is, use the first preset gesture as the user gesture of the image frame. Wherein the user gesture is a static gesture.
If the first device acquires multiple image frames in S601, in order to improve the accuracy of gesture recognition and avoid triggering a wrong service due to an erroneous single-frame recognition result, the first device may determine the user gesture by comprehensively considering the gesture recognition results of multiple image frames. In this case the user gesture is a static gesture. For example, when a first preset gesture is continuously detected in a preset number of image frames (e.g., 5 frames), the first device may take the first preset gesture as the user gesture in those image frames.
As another possible implementation, as shown in fig. 10, when a second preset gesture is detected based on one image frame, the first device may continuously track the user gesture in subsequent image frames. The second preset gesture is the initial gesture of a preset dynamic gesture. Further, if the first device detects a third preset gesture during the subsequent gesture tracking, the first device determines the user gesture according to the gesture change process from the second preset gesture to the third preset gesture. In this case the user gesture is a dynamic gesture. The third preset gesture is the ending gesture of the preset dynamic gesture. The gesture change process from the second preset gesture to the third preset gesture corresponds to a certain control event.
In some embodiments, in the gesture change process from the second preset gesture to the third preset gesture, the first device may also detect one or more other preset gestures, and the embodiments of the present application do not limit the number of gestures, specific gesture types, sequence of gestures, and the like included in the dynamic gesture.
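The following sketch shows one way to recognize a dynamic gesture that starts with a second preset gesture (initial gesture) and ends with a third preset gesture (ending gesture); a single intermediate state and the gesture labels are assumptions for illustration, since the embodiment does not limit the number or order of gestures in a dynamic gesture.

```python
class DynamicGestureTracker:
    """Tracks a preset dynamic gesture: initial gesture ... ending gesture."""

    def __init__(self, initial_gesture, ending_gesture, control_event):
        self.initial_gesture = initial_gesture   # second preset gesture
        self.ending_gesture = ending_gesture     # third preset gesture
        self.control_event = control_event       # event mapped to this gesture change
        self.tracking = False

    def update(self, frame_gesture):
        """Feed the gesture recognized in one frame; return the control event when complete."""
        if not self.tracking:
            if frame_gesture == self.initial_gesture:
                self.tracking = True             # start tracking subsequent frames
            return None
        if frame_gesture == self.ending_gesture:
            self.tracking = False                # gesture change process completed
            return self.control_event
        return None
```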
S605: the first device determines a generic event from the user gesture. Wherein generic events apply to cross-device and/or cross-application device control scenarios.
In the embodiment of the present application, the first device may be preset with a correspondence between a plurality of user gestures and a plurality of generic events (hereinafter referred to as the mapping relationship between gestures and generic events). Based on this, the first device may determine, according to the mapping relationship between gestures and generic events, the generic event corresponding to the user gesture determined by the first device in S604.
The general event may correspond to a static gesture or a dynamic gesture. Referring to table 1 below, table 1 shows an example of mapping relationship between gestures and general events.
TABLE 1
Universal event | User gesture
Open main menu | Gesture 1 + Gesture 2
Determine | Gesture 3
Select | Gesture 4
Return | Gesture 5
Increase volume | Gesture 6 lasts 2 seconds + Gesture 7
Decrease volume | Gesture 6 lasts 2 seconds + Gesture 8
Move cursor | Hold Gesture 9 and wave
Voice input | Hold Gesture 10 and draw a circle
Wherein the generic events "determine", "select" and "return" in table 1 correspond to static gestures. The general events "open main menu", "increase volume", "decrease volume", "move cursor" and "voice input" in table 1 correspond to dynamic gestures.
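In code, a mapping relationship such as Table 1 can be held in a simple lookup table, for example as below; the event names and gesture identifiers are placeholders derived from Table 1 rather than identifiers defined by the embodiment.

```python
# Static gestures map from a single gesture; dynamic gestures from a gesture combination.
GESTURE_TO_GENERIC_EVENT = {
    ("gesture_1", "gesture_2"):        "OPEN_MAIN_MENU",
    ("gesture_3",):                    "DETERMINE",
    ("gesture_4",):                    "SELECT",
    ("gesture_5",):                    "RETURN",
    ("gesture_6_2s", "gesture_7"):     "INCREASE_VOLUME",
    ("gesture_6_2s", "gesture_8"):     "DECREASE_VOLUME",
    ("gesture_9_wave",):               "MOVE_CURSOR",
    ("gesture_10_circle",):            "VOICE_INPUT",
}

def to_generic_event(gesture_sequence):
    """S605: abstract the recognized user gesture into a generic (standard) event."""
    return GESTURE_TO_GENERIC_EVENT.get(tuple(gesture_sequence))
```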
It should be noted that, in the embodiment of the present application, the general instruction determined by the first device may be adapted to multiple scenarios.
As one example, the first device may directly respond to the respective control event according to the determined generic instruction.
Further, the general instruction determined by the first device may also be adapted to device control in a multi-application scenario. Wherein, the application scene is such as game scene, video scene, office scene, study scene, etc. That is, based on the general instruction determined by the first device, control over the game application interface can be achieved, and control over application interfaces such as a video application interface, an office application interface, a learning application interface and the like can also be achieved.
For the case that the first device directly responds to the corresponding control event according to the determined general instruction, for example, as shown in fig. 6A, the first device may perform S606-1:
s606-1: the first device responds to the generic event in conjunction with device information of the first device.
Wherein the device information of the first device may include, but is not limited to, one or more of the following: application information, interface information, device type, operating system.
Wherein the application information includes, but is not limited to, one or more of active thread information, function information, attribute information, etc. of an application currently running (e.g., running in the foreground) by the first device. Interface information such as interface information of an application currently running (e.g., running in the foreground) by the first device. The device types include, but are not limited to, one or more of the following: video playing equipment, screen throwing equipment, storage equipment, game equipment and audio playing equipment. In some embodiments, the device type is also used to characterize the hardware configuration and/or software configuration of the device.
For example, assuming that the general event determined by the first device is determination or selection, the interface information of the first device characterizes that the interface of the first device includes a first file, and for this case, S606-1 may specifically include: the first device opens a first file; or the first device opens a next path of the path corresponding to the first file.
For example, assuming that the general event determined by the first device is determination or selection, the interface information of the first device characterizes that the interface of the first device includes the first file, and the device type of the first device is a storage device, for this case, S606-1 may specifically include: the first device opens a first file; or the first device opens a next path of the path corresponding to the first file.
For example, assuming that the general event determined by the first device is determination or selection, the interface information of the first device indicates that the interface of the first device includes a first file, and the application information of the first device indicates that the first device currently runs an audio/video application, for this case, S606-1 may specifically include: the first device plays the first file.
For example, assuming that the general event determined by the first device is determination or selection, and the interface information of the first device indicates that the interface of the first device includes a first function key, S606-1 may specifically include: the first device performs the function task of the first function key. The first function key is, for example, a select key, a mark key, a delete key, a play key, a pause key, a screen-casting key, a mute key, a speed-up key, a slow-down key, a bullet-comment key, a comment key, a like key, a favorite key, a maximize key, a minimize key, a close key, or the like, which is not limited by the application and depends on the specific interface display.
For example, assuming that the general event determined by the first device is return, for this case, S606-1 may specifically include: the first device returns to the previous interface of the current interface characterized by the interface information of the first device. For example, assuming that the current interface of the first device is an interface corresponding to the path where the first file is located, the first device responds to the general event and displays an interface corresponding to the path of the previous stage of the path where the first file is located.
For example, assuming that the general event determined by the first device is a voice input, the interface information of the first device indicates that the current interface of the first device includes a voice input key, and for this case, S606-1 may specifically include: the first device detects sound signals of the surrounding environment; or the first equipment opens the voice input function corresponding to the voice input key; or the first device opens a voice input function of an application currently operated by the first device, which is characterized by application information of the first device; or the first device opens the voice input function of the first device; alternatively, the first device turns on the voice assistant.
It may be understood that, in the embodiment of the present application, the general event for voice input may be a voice detection function of the first device itself, or may be a voice input function of an application running on the first device, or may be a voice input key on a current interface of the first device (such as a native interface of the first device or a third party application interface), which is not limited by the present application, and is specific to a specific manner or scenario of voice input.
For example, assuming that the general event determined by the first device is moving a cursor, the interface information of the first device indicates that the current interface of the first device includes the cursor, and the device type indicates that the first device includes a mouse, for this case, S606-1 may specifically include: the first device activates the mouse to move a cursor on a current interface of the first device according to the track indicated by the general event.
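Taken together, the examples of S606-1 above amount to a dispatch over the generic event and the first device's device information. The sketch below illustrates that idea only; the dictionary fields and returned action names are assumptions, not a format defined by the embodiment.

```python
def respond_locally(generic_event, device_info):
    """S606-1: the first device responds to the generic event using its own device information."""
    interface = device_info.get("interface", {})
    application = device_info.get("application", {})

    if generic_event in ("DETERMINE", "SELECT"):
        if application.get("type") == "audio_video" and interface.get("file"):
            return ("play_file", interface["file"])            # play the first file
        if interface.get("file"):
            return ("open_file", interface["file"])            # open the file or its next path
        if interface.get("function_key"):
            return ("run_function_key", interface["function_key"])
    elif generic_event == "RETURN":
        return ("show_previous_interface", interface.get("current"))
    elif generic_event == "VOICE_INPUT":
        return ("enable_voice_input", application.get("name"))
    elif generic_event == "MOVE_CURSOR":
        return ("move_cursor", interface.get("cursor"))
    return ("ignore", None)
```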
As an example, the generic instructions determined by the first device may also adapt device control across devices. For example, a user may instruct the first device to control the second device via a user gesture. Wherein a communication connection is established between the first device and the second device.
Exemplarily, a cross-device control scenario is shown in fig. 11, where the first device is the projection device shown in fig. 11 and the second device is the notebook computer shown in fig. 11. The notebook computer is casting its screen to the projection device, and the projection device is provided with a camera. In the device control scenario shown in fig. 11, while the notebook computer casts its screen to the projection device, the projection device may acquire images including the user's hand in real time through its camera, and trigger the generic event corresponding to a specific user gesture when an acquired image includes that gesture.
Fig. 11 is only an example of a cross-device control scenario, and the present application is not limited to specific device types and functions of the first device and the second device, and a relationship (including a connection relationship) between the first device and the second device in the cross-device control scenario. The first device may also be a portable device such as a smart phone, for example. It can be understood that the portable device carried by the user is used as the first device to control the device of the cross-device, which is greatly convenient for the user to control the device.
For example, assuming that the general event determined by the first device is moving a cursor, and the interface information of the first device indicates that the current interface of the first device includes a cursor, S606-1 may specifically include: the first device moves the cursor on the current display interface. For example, the first device may move the cursor based on the control instruction corresponding to the generic event.
In some embodiments, if the general event determined by the first device is moving a cursor, the step S606-1 may further include: the first device determines a cursor movement track corresponding to the universal event. For example, the cursor movement trajectory may be represented by a plurality of coordinates having a sequential relationship.
As one possible implementation, the first device may dynamically calculate the position of the cursor on the display screen of the first device based on the position information of the user gesture in each of the plurality of image frames.
As an example, the first device may determine the position coordinates (x 2, y 2) of the cursor on the first device display screen according to the following formula:
x2=(1+x1)×d1/2;
y2=(1+y1)×h1/2;
where x1 and y1 in the above formula are the coordinates of the user gesture, d1 is the width of the display screen of the first device, and h1 is the height of the display screen of the first device.
As an example, the coordinates (x 2, y 2) of the position of the cursor on the display screen of the first device are coordinates under a preset coordinate system (such as a second preset coordinate system). The coordinate origin of the second preset coordinate system is the center point of the first equipment display screen, the x-axis of the second preset coordinate system is parallel to the transverse side of the first equipment display screen, and the y-axis of the second preset coordinate system is parallel to the vertical side of the first equipment display screen.
As an example, please refer to fig. 12, which shows an exemplary correspondence between the position of a user gesture and a position on the device's display screen. As shown in fig. 12, assume that the point O on the plane in which the user gesture lies corresponds to the point O' on the first device's display screen, and that the points A1, A2, A3, and A4 on the plane in which the user gesture lies correspond to the points A1', A2', A3', and A4' on the first device's display screen, respectively. In fig. 12, by way of example only, the point O is (0, 0), the point A1 is (-1, 0), the point A2 is (1, 0), the point A3 is (0, 1), the point A4 is (0, -1); the point A1' is the midpoint of the left edge of the first device's display screen, the point A2' is the midpoint of the right edge, the point A3' is the midpoint of the upper edge, and the point A4' is the midpoint of the lower edge.
As an example, assume that the user gesture moves from the position of point O shown in fig. 12 to the position of point P shown in fig. 13; correspondingly, the cursor moves from the position of point O' shown in fig. 12 to the position of point P' shown in fig. 13 on the device's display screen. As shown in fig. 13, OP/OA2 = O'P'/O'A2'.
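The position formula above translates directly into code. In the sketch below it is assumed, as in fig. 12, that the gesture coordinates (x1, y1) are normalized to the range [-1, 1] on the plane of the user's palm, and that d1 and h1 are the width and height of the first device's display screen in pixels.

```python
def gesture_to_cursor(x1, y1, d1, h1):
    """Map normalized gesture coordinates to a cursor position on the display screen.

    Under this formula, x2 ranges from 0 to d1 and y2 from 0 to h1, i.e. the result
    is expressed relative to one corner of the screen (an assumption of this sketch).
    """
    x2 = (1 + x1) * d1 / 2
    y2 = (1 + y1) * h1 / 2
    return x2, y2

# Example: the centre of the gesture plane (0, 0) maps to the centre of a
# 1920 x 1080 screen, i.e. (960.0, 540.0).
print(gesture_to_cursor(0, 0, 1920, 1080))
```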
In some embodiments, the operating systems of the first device and the second device may be the same in a cross-device control scenario. For example, in the cross-device control scenario shown in fig. 11, the operating systems of the notebook computer and the projection device may be the same.
In other embodiments, the operating systems of the first device and the second device may also be different in a cross-device control scenario. For example, in the cross-device control scenario shown in fig. 11, the operating systems of the notebook computer and the projection device may be different.
For the case where the operating systems of the first device and the second device are different, the generic instructions determined by the first device still apply. That is, the general device control method based on the camera assembly provided by the embodiment of the application is also suitable for a cross-system device control scene.
For the above case where the first device performs device control across devices according to the determined general instruction, exemplarily, as shown in fig. 6A, the first device may perform S606-2:
S606-2: the first equipment sends a control instruction corresponding to the general event to the second equipment, and the control instruction is used for the second equipment to respond to the control instruction by combining equipment information of the second equipment.
As a possible implementation manner, the control instruction corresponding to the general event sent by the first device to the second device cannot be directly adapted to the second device, and the second device is required to convert the control instruction into an instruction adapted to the second device in combination with the device information of the second device.
As another possible implementation manner, the first device may determine, according to device information of the second device obtained from the second device, a control instruction corresponding to a general event adapted to the second device, and then send the control instruction corresponding to the general event to the second device. For this case, the control instructions sent by the first device to the second device are directly adapted to the second device.
Correspondingly, the second device performs the following S607:
s607: the second device responds to control instructions from the first device.
In some embodiments, assuming that the control instruction from the first device is determined by the first device according to the device information of the second device, the step S607 may specifically include: the second device directly responds to the control instruction from the first device.
In other embodiments, assuming that the control instruction from the first device cannot be directly adapted to the second device, S607 may specifically include: the second device converts the control instruction into an instruction (e.g., a target instruction) adapted to the second device in combination with device information of the second device, and the second device responds to the instruction (e.g., the target instruction).
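A sketch of S607 covering both implementations above is given below; the message format, the field names, and the adapt_to_device() rules are assumptions used only to make the flow concrete.

```python
def handle_control_instruction(instruction, device_info, respond):
    """S607: the second device responds to the control instruction from the first device."""
    if instruction.get("adapted", False):
        # The first device already adapted the instruction to the second device's
        # device information: respond to it directly.
        target = instruction["command"]
    else:
        # Otherwise convert the generic event into a target instruction that fits
        # the second device (its interface, application, device type, and so on).
        target = adapt_to_device(instruction["generic_event"], device_info)
    respond(target)

def adapt_to_device(generic_event, device_info):
    """Convert a generic event into an instruction adapted to this device (illustrative)."""
    interface = device_info.get("interface", {})
    if generic_event == "MOVE_CURSOR" and device_info.get("has_mouse"):
        return ("activate_mouse_and_move_cursor", interface.get("cursor"))
    if generic_event in ("DETERMINE", "SELECT") and interface.get("file"):
        return ("open_file", interface["file"])
    return ("generic", generic_event)
```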
For example, assuming that the general event corresponding to the control instruction from the first device is determination or selection, the interface information of the second device characterizes that the interface of the second device includes the first file, and the device type of the second device is a storage device, for this case, S607 may specifically include: the second device opens the first file; or the second device opens a next path of the path corresponding to the first file.
For example, assuming that the general event corresponding to the control instruction from the first device is determination or selection, the interface information of the second device indicates that the interface of the second device includes the first file, and the application information of the second device indicates that the second device currently runs the audio/video application, for this case, S607 may specifically include: the second device plays the first file.
For example, assuming that the general event corresponding to the control instruction from the first device is determination or selection, and the interface information of the second device indicates that the interface of the second device includes a first function key, S607 may specifically include: the second device performs the function task of the first function key. The first function key is, for example, a select key, a mark key, a delete key, a play key, a pause key, a screen-casting key, a mute key, a speed-up key, a slow-down key, a bullet-comment key, a comment key, a like key, a favorite key, a maximize key, a minimize key, a close key, or the like, which is not limited by the application and depends on the specific interface display.
For example, assuming that the general event corresponding to the control instruction from the first device is return, the step S607 may specifically include: the second device returns to the previous interface of the current interface characterized by the interface information of the second device. For example, assuming that the current interface of the second device is an interface corresponding to the path where the first file is located, the second device responds to the general event to display an interface corresponding to the path of the previous stage of the path where the first file is located.
For example, assuming that the general event corresponding to the control instruction from the first device is voice input, the interface information of the second device indicates that the current interface of the second device includes a voice input key, and for this case, S607 may specifically include: the second device detects sound signals of the surrounding environment; or the second equipment opens the voice input function corresponding to the voice input key; or the second device opens the voice input function of the application currently operated by the second device, which is characterized by the application information of the second device; or the second device opens the voice input function of the second device; alternatively, the second device turns on the voice assistant.
It may be understood that, in the embodiment of the present application, the general event for voice input may be a voice detection function of the second device itself, or may be a voice input function of an application running on the second device, or may be a voice input key on a current interface of the second device (such as a native interface of the second device or a third party application interface), which is not limited by the present application, and is specific to a specific manner or scenario of voice input.
For example, assuming that the general event corresponding to the control instruction from the first device is moving a cursor, the interface information of the second device indicates that the current interface of the second device includes the cursor, and the device type indicates that the second device includes a mouse, for this case, S607 may specifically include: the second device activates the mouse to move a cursor on the current interface of the second device according to the track indicated by the general event.
For example, assuming that the general event corresponding to the control instruction from the first device is moving a cursor, the interface information of the second device indicates that the current interface of the second device includes the cursor, and for this case, S607 may specifically include: the second device moves a cursor on the current display interface.
In some embodiments, if the general event determined by the first device is moving the cursor, S607 may further include: the second device determines a cursor movement track corresponding to the universal event. For example, the cursor movement trajectory may be represented by a plurality of coordinates having a sequential relationship.
It can be understood that, based on the method provided by the embodiment of the present application, when the electronic device detects a preset user gesture, the control event corresponding to the user gesture is abstracted into a generic event (or standard event) unrelated to any specific scenario, system, or device, so that the method can be applied to device control in multiple scenarios and in cross-device and cross-system cases. The method eliminates the inconvenience of conventionally relying on a third-party physical device for device control, and greatly improves the convenience of human-computer interaction. In addition, the method is not affected by differences between platforms, and more accurate and friendly device control can be provided through a more comprehensive and universal interaction scheme. It also reduces the user's cost of learning control methods across different devices.
It is to be understood that the various aspects of the embodiments of the application may be used in any reasonable combination, and that the explanation or illustration of the various terms presented in the embodiments may be referred to or explained in the various embodiments without limitation.
It should also be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It is to be understood that, in order to implement the functions of any of the above embodiments, an electronic device (such as a first device or a second device) includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the functional modules of the electronic device (such as the first device or the second device) may be divided; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. It should be noted that the division of modules in the embodiment of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.
It should be understood that each module in an electronic device (such as the first device or the second device) may be implemented in software and/or hardware, which is not particularly limited. In other words, the electronic device is presented in the form of functional modules. A "module" herein may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the described functionality.
In an alternative, when data transmission is implemented using software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are fully or partially implemented. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by the computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available media may be magnetic media (e.g., a floppy disk, a hard disk, or a magnetic tape), optical media (e.g., a digital versatile disc (DVD)), or semiconductor media (e.g., a solid state disk (SSD)).
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in an electronic device. Alternatively, the processor and the storage medium may reside as discrete components in an electronic device, such as the first device or the second device.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional modules is illustrated as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above.

Claims (35)

1. An electronic device, the electronic device comprising:
the camera shooting assembly is used for acquiring image frames, and the image frames comprise hand features of a user;
a processor for running a first application; identifying a user gesture corresponding to the hand features of the user in the image frame while running the first application; determining a generic event according to the user gesture; and instructing a second application to respond to the generic event.
2. The electronic device of claim 1, wherein the second application is running in the electronic device;
the processor instructs the second application to respond to the generic event, including:
the processor instructs the second application to respond to the generic event in accordance with the generic event in combination with device information of the electronic device.
3. The electronic device of claim 1, wherein the electronic device further comprises:
and the communication interface is used for sending a control instruction corresponding to the universal event to target equipment so as to instruct the second application running in the target equipment to respond to the universal event.
4. The electronic device according to claim 3, wherein the communication interface sends the control instruction corresponding to the generic event to the target device, specifically including:
the communication interface sends the control instruction corresponding to the generic event to the target device according to the generic event and the device information of the target device.
5. The electronic device of claim 2 or 4, wherein the device information comprises one or more of: application information, interface information, device type, operating system.
6. The electronic device of any of claims 3-5, wherein an operating system of the electronic device is different from an operating system of the target device.
7. The electronic device of any one of claims 1-6, wherein the electronic device further comprises:
a memory for storing correspondences between a plurality of user gestures and a plurality of generic events;
the processor determines a generic event according to the user gesture, which specifically comprises:
the processor determines a general event corresponding to the user gesture according to the correspondence between the plurality of user gestures and the plurality of general events stored in the memory.
8. The electronic device of any one of claims 1-7, wherein the generic event comprises one or more of: opening the main menu, determining, selecting, returning, increasing volume, decreasing volume, moving cursor, voice input.
9. The electronic device of claim 8, wherein the generic event is a determination or selection; the interface information of the electronic device characterizes that the second application interface comprises a first file, or the device type of the electronic device is a storage device;
the processor instructs the second application to respond to the generic event, including:
the processor instructs the second application to perform one or more of:
displaying an interface after the first file is opened; or
displaying an interface of a next-level path of the path corresponding to the first file.
10. The electronic device of claim 8, wherein the generic event is a determination or selection; the application information of the electronic device characterizes that the second application is an audio-video application, and the second application interface comprises a first file;
the processor instructs the second application to respond to the generic event, including:
the processor instructs the second application to display an interface to play the first file.
11. The electronic device of claim 8, wherein the generic event is a determination or selection, and wherein interface information of the electronic device characterizes a first function key included on the second application interface;
The processor instructs the second application to respond to the generic event, including:
and the processor instructs the second application to display an interface corresponding to the first function key.
12. The electronic device of claim 8, wherein the generic event is a return;
the processor instructs the second application to respond to the generic event, including:
the processor instructs the second application to return to a previous level interface.
13. The electronic device of claim 8, wherein the generic event is a voice input and the interface information of the electronic device characterizes the second application interface as including a voice input key thereon;
the processor instructs the second application to respond to the generic event, including:
the processor instructs the second application to perform one or more of:
detecting a sound signal of the surrounding environment; or
opening a voice input function corresponding to the voice input key; or
opening a voice input function of the second application; or
opening a voice input function of the electronic device.
14. The electronic device of claim 8, wherein the generic event is a movement, and wherein interface information of the electronic device characterizes a cursor included on the second application interface;
The processor instructs the second application to respond to the generic event, including:
the processor instructs to move the cursor on the second application interface.
15. The electronic device of any one of claims 3-14, wherein
the processor is further configured to determine a location on the target device display screen indicated by the generic event from the user gesture.
16. The electronic device of claim 15, wherein the processor determines the location on the target device display indicated by the generic event from the user gesture, specifically comprising:
the electronic device determines coordinates (x2, y2) on the target device display screen indicated by the generic event according to the following formula:
x2=(1+x1)×d/2;
y2=(1+y1)×h/2;
wherein x1 and y1 are coordinates of the user gesture, d is a width of the target device display screen, and h is a height of the target device display screen.
17. The electronic device of claim 15, wherein the coordinates (x2, y2) on the target device display indicated by the generic event are coordinates in a preset coordinate system;
the coordinate origin of the preset coordinate system is the center point of the display screen of the target device, the x-axis of the preset coordinate system is parallel to the transverse edge of the display screen of the target device, and the y-axis of the preset coordinate system is parallel to the vertical edge of the display screen of the target device.
18. An electronic device, the electronic device comprising:
a communication interface for receiving control instructions from a first application, the control instructions corresponding to a generic event;
a memory for storing device information of the electronic device;
and a processor for instructing, in combination with the device information of the electronic device, the second application to respond to the control instruction.
19. The electronic device of claim 18, wherein the device information includes one or more of: application information, interface information, device type, operating system.
20. The electronic device of claim 18 or 19, wherein the generic event comprises one or more of: opening the main menu, determining, selecting, returning, increasing volume, decreasing volume, moving cursor, voice input.
21. The electronic device of claim 20, wherein the generic event is a determination or selection; the interface information of the electronic device characterizes that the second application interface comprises a first file, or the device type of the electronic device is a storage device;
the processor instructs the second application to respond to the control instruction, including:
The processor instructs the second application to perform one or more of:
displaying an interface after the first file is opened; or
displaying an interface of a next-level path of the path corresponding to the first file.
22. The electronic device of claim 20, wherein the generic event is a determination or selection; the application information of the electronic device characterizes that the second application is an audio-video application, and the second application interface comprises a first file;
the processor instructs the second application to respond to the control instruction, including:
the processor instructs the second application to display an interface to play the first file.
23. The electronic device of claim 20, wherein
the generic event is a determination or selection, and the interface information of the electronic device characterizes that the second application interface comprises a first function key;
the processor instructs the second application to respond to the control instruction, including:
and the processor instructs the second application to display an interface corresponding to the first function key.
24. The electronic device of claim 20, wherein the generic event is a return;
The processor instructs the second application to respond to the control instruction, including:
the processor instructs the second application to return to a previous level interface.
25. The electronic device of claim 20, wherein the generic event is a voice input and the interface information of the electronic device characterizes the second application interface as including a voice input key thereon;
the processor instructs the second application to respond to the control instruction, including:
the processor instructs the second application to perform one or more of:
detecting a sound signal of the surrounding environment; or
opening a voice input function corresponding to the voice input key; or
opening a voice input function of the second application; or
opening a voice input function of the electronic device.
26. The electronic device of claim 20, wherein the generic event is a movement and the interface information of the electronic device characterizes a cursor included on the second application interface;
the processor instructs the second application to respond to the control instruction, including:
the processor instructs to move the cursor on the second application interface.
27. A method for controlling a universal device based on a camera assembly, the method comprising:
a first device acquires an image frame, wherein the image frame comprises hand features of a user;
when the first device runs a first application, the first device identifies a user gesture corresponding to the hand features of the user in the image frame;
the first device determines a generic event according to the user gesture;
the first device instructs a second application to respond to the generic event.
28. The method of claim 27, wherein the second application is running in the first device; and the first device instructs the second application to respond to the generic event, including:
the first device instructs the second application to respond to the generic event according to the generic event in combination with the device information of the first device.
29. The method of claim 27, wherein the first device directing the second application to respond to the generic event comprises:
the first device sends a control instruction corresponding to the generic event to the second device, so as to instruct the second application running in the second device to respond to the generic event.
30. A method for controlling a universal device based on a camera assembly, the method comprising:
the second device receives, from the first device, a control instruction based on the first application, wherein the control instruction corresponds to a generic event;
the second device instructs, in combination with device information of the second device, a second application to respond to the control instruction.
31. The method of claim 30, wherein the device information comprises one or more of: application information, interface information, device type, operating system.
32. A communication system, the communication system comprising:
the electronic device of any of claims 1-17, and,
the electronic device of any of claims 18-26.
33. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon computer program code which, when executed by a processing circuit, implements the method according to any of claims 27-29 or 30-31.
34. A chip system, comprising a processing circuit and a storage medium having computer program code stored therein; the computer program code, when executed by the processing circuit, implements the method of any of claims 27-29 or 30-31.
35. A computer program product for running on a computer to implement the method of any one of claims 27-29 or 30-31.
CN202210259539.3A 2022-03-16 2022-03-16 Universal equipment control method, equipment and system based on camera shooting assembly Pending CN116804892A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210259539.3A CN116804892A (en) 2022-03-16 2022-03-16 Universal equipment control method, equipment and system based on camera shooting assembly
PCT/CN2023/081101 WO2023174214A1 (en) 2022-03-16 2023-03-13 Universal device control method based on camera assembly, and device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210259539.3A CN116804892A (en) 2022-03-16 2022-03-16 Universal equipment control method, equipment and system based on camera shooting assembly

Publications (1)

Publication Number Publication Date
CN116804892A true CN116804892A (en) 2023-09-26

Family

ID=88022261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210259539.3A Pending CN116804892A (en) 2022-03-16 2022-03-16 Universal equipment control method, equipment and system based on camera shooting assembly

Country Status (2)

Country Link
CN (1) CN116804892A (en)
WO (1) WO2023174214A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101019335B1 (en) * 2008-11-11 2011-03-07 주식회사 팬택 Method and system for controlling application of mobile terminal using gesture
CN103699223A (en) * 2013-12-11 2014-04-02 北京智谷睿拓技术服务有限公司 Control method and equipment based on gestures
US20190346929A1 (en) * 2018-05-11 2019-11-14 Piccolo Labs Inc. Attention Levels in a Gesture Control System
CN113867520A (en) * 2020-06-30 2021-12-31 华为技术有限公司 Device control method, electronic device, and computer-readable storage medium
CN113778310A (en) * 2021-08-05 2021-12-10 阿里巴巴新加坡控股有限公司 Cross-device control method and computer program product

Also Published As

Publication number Publication date
WO2023174214A1 (en) 2023-09-21

Similar Documents

Publication Publication Date Title
CN108615247B (en) Method, device and equipment for relocating camera attitude tracking process and storage medium
US20220121413A1 (en) Screen Control Method, Electronic Device, and Storage Medium
CN108596976B (en) Method, device and equipment for relocating camera attitude tracking process and storage medium
CN110495819B (en) Robot control method, robot, terminal, server and control system
CN110147805B (en) Image processing method, device, terminal and storage medium
US11474664B2 (en) Application icon moving method and electronic device
CN113261271A (en) Shooting method and electronic equipment
WO2023051411A1 (en) Method for recognizing touch operation, and electronic device
CN110559645B (en) Application operation method and electronic equipment
CN113325948B (en) Air-isolated gesture adjusting method and terminal
CN112087649A (en) Equipment searching method and electronic equipment
CN113728295A (en) Screen control method, device, equipment and storage medium
CN110650294A (en) Video shooting method, mobile terminal and readable storage medium
CN115150542B (en) Video anti-shake method and related equipment
CN113391775A (en) Man-machine interaction method and equipment
CN116804892A (en) Universal equipment control method, equipment and system based on camera shooting assembly
WO2022095983A1 (en) Gesture misrecognition prevention method, and electronic device
CN114812381B (en) Positioning method of electronic equipment and electronic equipment
CN114064571A (en) Method, device and terminal for determining file storage position
CN115730091A (en) Comment display method and device, terminal device and readable storage medium
CN112463086A (en) Display control method and electronic equipment
CN116719398B (en) Temperature control method, electronic equipment and storage medium
CN118113142A (en) Man-machine interaction method, communication device and electronic equipment
CN116048236B (en) Communication method and related device
WO2022222705A1 (en) Device control method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination