CN113071438A - Control instruction generation method and device, storage medium and electronic equipment

Info

Publication number: CN113071438A
Application number: CN202010009023.4A
Authority: CN
Inventors: 徐军; 王朝; 刘琦; 王全
Original assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Current assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date: 2020-01-06
Filing date: 2020-01-06
Publication date: 2021-07-06
Anticipated expiration: 2040-01-06
Also published as: CN113071438B

Abstract

The embodiments of the disclosure disclose a method and a device for generating a control instruction, a storage medium and an electronic device. The method comprises the following steps: determining a start frame image of a hand of a user to be detected; recognizing, frame by frame starting from the start frame image, the gesture feature corresponding to each image to obtain a gesture feature set; determining a gesture operation event of the user based on the gesture feature set; and generating a control instruction for instructing execution of a target operation event corresponding to the gesture operation event. The embodiments of the disclosure can solve the technical problem that the existing control mode requires the user to shift the line of sight to the control device, which easily leads to dangerous driving.

Description

Control instruction generation method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to gesture recognition technologies, and in particular, to a method and an apparatus for generating a control instruction, a storage medium, and an electronic device.

Background

Currently, a user may control some functions related to the automobile, such as adjusting the volume of the sound box of the vehicle, adjusting the temperature of the air conditioner of the vehicle, etc., by operating a button, a knob, a touch screen, etc. on the control device. However, when the user operates the control device, the line of sight is shifted to the control device, and dangerous driving is easily caused once the user shifts the line of sight to a place other than the road ahead while driving the automobile.

Disclosure of Invention

The disclosure is provided to solve the technical problem that dangerous driving of a user is easily caused because the user needs to transfer the sight to a control device in the existing control mode. The embodiment of the disclosure provides a method and a device for generating a control instruction, a storage medium and an electronic device.

According to an aspect of the embodiments of the present disclosure, there is provided a method for generating a control instruction, including:

determining a starting frame image of a hand of a user to be detected;

recognizing, frame by frame starting from the start frame image, the gesture feature corresponding to each image to obtain a gesture feature set;

determining a gesture operation event of the user based on the gesture feature set;

and generating a control instruction for indicating to execute a target operation event corresponding to the gesture operation event.

According to another aspect of the embodiments of the present disclosure, there is provided a control instruction generation apparatus, including:

a first determining module, configured to determine a start frame image of a hand of a user to be detected;

a recognition module, configured to recognize, frame by frame starting from the start frame image determined by the first determining module, the gesture feature corresponding to each image to obtain a gesture feature set;

a second determining module, configured to determine a gesture operation event of the user based on the gesture feature set recognized by the recognition module;

and a generating module, configured to generate a control instruction for instructing execution of a target operation event corresponding to the gesture operation event determined by the second determining module.

According to still another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the method for generating a control instruction according to any one of the embodiments.

According to still another aspect of the embodiments of the present disclosure, there is provided an electronic device including:

a processor;

a memory for storing the processor-executable instructions;

the processor is configured to execute the method for generating a control instruction according to any one of the embodiments.

According to the method for generating the control instruction provided by the embodiments of the disclosure, the start frame image of the hand of the user to be detected is determined, the gesture feature corresponding to each image is recognized frame by frame starting from the start frame image to obtain the gesture feature set, the gesture operation event of the user is determined based on the gesture feature set, and the control instruction for instructing execution of the target operation event corresponding to the gesture operation event is generated.

The technical solution of the present disclosure is further described in detail by the accompanying drawings and examples.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is an exemplary scene diagram of a method for generating a control instruction in an application according to the present disclosure.

Fig. 2 is a flowchart illustrating a method for generating a control instruction according to an exemplary embodiment of the present disclosure.

Fig. 3A is a schematic diagram of a gesture operation event provided by an exemplary embodiment of the present disclosure.

Fig. 3B is a schematic diagram of a gesture operation event provided by another exemplary embodiment of the present disclosure.

Fig. 3C is a schematic diagram of a gesture operation event provided by yet another exemplary embodiment of the present disclosure.

Fig. 4 is a flowchart illustrating a method for generating a control instruction according to another exemplary embodiment of the present disclosure.

Fig. 5 is a flowchart illustrating a method for generating a control instruction according to still another exemplary embodiment of the present disclosure.

Fig. 6 is a flowchart illustrating a method for generating a control instruction according to still another exemplary embodiment of the present disclosure.

Fig. 7 is a schematic structural diagram of a device for generating a control instruction according to an exemplary embodiment of the present disclosure.

Fig. 8 is a schematic structural diagram of a device for generating a control instruction according to another exemplary embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of a control instruction generation apparatus according to still another exemplary embodiment of the present disclosure.

Fig. 10 is a block diagram of an electronic device provided in an exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present disclosure are used merely to distinguish one element from another, and are not intended to imply any particular technical meaning or any necessary logical order between them.

It is also understood that in embodiments of the present disclosure, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.

It is also to be understood that any reference to any component, data, or structure in the embodiments of the disclosure, may be generally understood as one or more, unless explicitly defined otherwise or stated otherwise.

In addition, the term "and/or" in the present disclosure merely describes an association relationship between associated objects and means that three kinds of relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in the present disclosure generally indicates that the former and latter associated objects are in an "or" relationship.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

The disclosed embodiments may be applied to electronic devices such as terminal devices, computer systems, servers, etc., which are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set top boxes, programmable consumer electronics, network pcs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

Summary of the application

In the course of implementing the present disclosure, the inventor finds that, at present, a user controls some functions related to the automobile by operating buttons, knobs, touch screens and other components on the control device. However, the user's sight line is shifted to the control device when operating the control device, and dangerous driving is easily caused once the user shifts the sight line to a place other than the road ahead while driving the automobile.

Exemplary System

As shown in fig. 1, this exemplary scenario includes an automobile 110, a user 120, a camera device 130, and a control device 140. The camera device 130 and the control device 140 are communicatively connected.

It should be noted that the scenario illustrated in fig. 1 only takes as an example the case in which the control device 140 and the camera device 130 are two independent devices. In an application, the two devices may also be integrated; for example, the control device 140 may have a camera (not shown in fig. 1).

In the present application, the user 120 may control some functions related to the automobile 110, such as adjusting the volume of a sound box (not shown in fig. 1), controlling the opening or closing of a window, controlling the temperature of an air conditioner (not shown in fig. 1) on the automobile, and the like, by operating a button, a knob, a touch screen (not shown in fig. 1), and the like on the control device 140.

However, the user 120 needs to shift his/her line of sight to the control device 140 when the above-described function is implemented by operating a component on the control device 140, which may easily lead to dangerous driving in such a case.

Based on this, the present disclosure proposes a method for generating a control instruction, in which the user 120 may control some of the functions related to the automobile 110 by making a gesture operation event, such as waving a hand, making a fist, and the like.

Exemplary method

Fig. 2 is a flowchart illustrating a method for generating a control instruction according to an exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device, such as the control device 140 illustrated in fig. 1. As shown in fig. 2, the method includes the following steps:

Step 201, determining a start frame image of a hand of a user to be detected.

In an embodiment, based on the application scenario illustrated in fig. 1, the camera device 130 may be in an image capturing state at all times, and thus, when the user 120 makes a gesture operation event, the camera device 130 may capture an image of a process in which the user 120 makes the gesture operation event.

In an embodiment, the camera device 130 may further send the captured image to the control device 140 in real time, and after receiving the image sent by the camera device 130, the control device 140 may detect whether the image includes the hand of the user 120. When the control device 140 detects the hand of the user 120 in the received image, the image may be determined as a start frame image for which detection of the hand of the user is required.

In an embodiment, the control device 140 may detect whether the hand of the user 120 is included in the image by detecting whether a hand-related biometric, such as a fingerprint, is included in the image. It should be noted that the above description is only an exemplary description of how the control device 140 detects whether the image includes the hand of the user, and other implementations may exist in practical applications, and the disclosure is not limited thereto.

Step 202, starting from the start frame image, recognizing the gesture feature corresponding to each image frame by frame to obtain a gesture feature set.

In an embodiment, the control device 140 may, starting from the received start frame image, recognize the gesture feature corresponding to each image every time one frame image is received, and put the recognized gesture feature corresponding to each image into the same set, so as to obtain a gesture feature set.

In an embodiment, when the hand of the user 120 is not detected in an image after the starting frame image, the control device 140 may stop recognizing the gesture feature corresponding to the subsequently received image until the next time the hand of the user 120 is detected again from the received image, and return to performing step 201.
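The interplay of steps 201 and 202 described above can be sketched as a simple loop over incoming frames. This is only an illustrative sketch under stated assumptions: the `Frame` type, the function name `collect_gesture_features`, and the two callbacks `contains_hand` and `extract_feature` are hypothetical placeholders and are not defined by the disclosure, which leaves hand detection and feature recognition open.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]  # hypothetical stand-in for one camera image


def collect_gesture_features(
    frames: Iterable[Frame],
    contains_hand: Callable[[Frame], bool],
    extract_feature: Callable[[Frame], float],
) -> List[List[float]]:
    """Return one feature set per contiguous run of frames that show a hand."""
    feature_sets: List[List[float]] = []
    current: List[float] = []
    for frame in frames:
        if contains_hand(frame):
            # The first hand frame of a run plays the role of the start frame image.
            current.append(extract_feature(frame))
        elif current:
            # The hand left the view: close the set and wait for the next start frame.
            feature_sets.append(current)
            current = []
    if current:
        feature_sets.append(current)
    return feature_sets


# Toy stream (assumption): a frame "contains a hand" when its first value is positive.
stream = [[0.0], [1.0], [2.0], [0.0], [3.0]]
sets = collect_gesture_features(stream, lambda f: f[0] > 0, lambda f: f[0])
print(sets)
```

In this toy run the second and third frames form one gesture feature set, and the last frame starts a new one after the hand disappears and reappears.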

Step 203, determining a gesture operation event of the user based on the gesture feature set.

In one embodiment, the gesture operation event is a dynamic gesture operation event, for example, a gesture operation event that a forefinger of a single hand circles clockwise/counterclockwise as shown in fig. 3A, a gesture operation event that a palm of the single hand changes to a fist and changes from the fist to the palm as shown in fig. 3B, and a gesture operation event that a palm of the single hand swings to the left/right as shown in fig. 3C.

And step 204, generating a control instruction for indicating to execute a target operation event corresponding to the gesture operation event.

In an embodiment, each gesture operation event corresponds to a control operation event. For example, the control operation event corresponding to the gesture operation event illustrated in fig. 3A is to increase/decrease the volume of the car sound box; the control operation event corresponding to the gesture operation event illustrated in fig. 3B is to select; and the control operation event corresponding to the gesture operation event illustrated in fig. 3C is to switch to the previous/next one.

For example, assuming that the user 120 wants to switch songs, the user 120 may perform the gesture operation event illustrated in fig. 3C, and then perform the gesture operation event illustrated in fig. 3B, thereby switching songs.

In this disclosure, for convenience of description, the control operation event corresponding to the gesture operation event determined in step 203 is referred to as a target operation event. In step 204, a control command instructing to execute the target operation event may be generated, and then the control device 140 may implement a part of the functions related to controlling the automobile 110 by executing the control command.

Based on the embodiment, by determining the initial frame image of the hand of the user to be detected, the gesture features corresponding to each image are identified frame by frame from the initial frame image to obtain the gesture feature set, the gesture operation event of the user is determined based on the gesture feature set, and the control instruction for instructing the execution of the target operation event corresponding to the gesture operation event is generated.

As shown in fig. 4, based on the embodiment shown in fig. 2, step 202 may include the following steps:

Step 2021, starting from the start frame image, grouping every N consecutive images into one group frame by frame to obtain M image groups.

Step 2022, for each image group, inputting each image in the image group to the trained spatial feature extraction network to obtain a spatial feature corresponding to each image in the image group.

Step 2023, inputting the spatial features corresponding to each image in the image group to the trained temporal feature extraction network to obtain the temporal features corresponding to the image group.

Step 2024, obtaining a gesture feature set according to the temporal features respectively corresponding to the M image groups.

The above steps 2021 to 2024 are collectively described as follows:

First, it should be noted that, since a dynamic gesture operation event has both spatial and temporal characteristics, in the present disclosure the gesture feature corresponding to an image includes a spatial feature and a temporal feature.

Based on the above, in an embodiment, the gesture feature corresponding to the image can be recognized in the unit of the image group. Specifically, each consecutive N images may be grouped frame by frame starting from the start frame image, resulting in M image groups. For example, assuming that 16 images are collected by the camera device 130 during the gesture operation event of the user 120, for convenience of description, the 16 images are respectively numbered as 0-15, and assuming that 4 consecutive images are preset to be grouped into one group, 13 image groups can be obtained, and the image numbers in the 13 image groups are respectively 0-3, 1-4, 2-5, 3-6, 4-7, 5-8, 6-9, 7-10, 8-11, 9-12, 10-13, 11-14 and 12-15.
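The grouping in this example is a sliding window that advances one frame at a time, so the number of groups is M = (number of images) - N + 1, here 16 - 4 + 1 = 13. A minimal sketch follows; the function name `sliding_groups` is a hypothetical choice made for illustration, and integers stand in for images.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_groups(images: Sequence[T], n: int) -> List[List[T]]:
    """Group every n consecutive images, advancing one frame at a time."""
    if n <= 0:
        raise ValueError("n must be positive")
    # With fewer than n images the range is empty and no group is produced.
    return [list(images[i:i + n]) for i in range(len(images) - n + 1)]


# 16 images numbered 0-15, grouped 4 at a time, as in the example above.
groups = sliding_groups(list(range(16)), 4)
print(len(groups))            # 13
print(groups[0], groups[-1])  # [0, 1, 2, 3] [12, 13, 14, 15]
```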

Then, in an embodiment, for each image group, each image in the image group is input to the trained spatial feature extraction network, so as to obtain a spatial feature corresponding to each image in the image group.

In one embodiment, the structure of the spatial feature extraction network may include a MobileNetV2 structure and a fully connected layer.

Then, in an embodiment, for each image group, the spatial features corresponding to the images in the image group are rearranged into one spatial feature by using a re-arrangement (reshape) function, and the spatial feature is input to the trained temporal feature extraction network to obtain the temporal features corresponding to the image group.

In one embodiment, the temporal feature extraction network adopts a two-dimensional convolution (Conv2D) structure, and the number of convolution kernels of the two-dimensional convolution structure is 1.

Finally, in an embodiment, a max pooling (Max Pooling) operation may be used to perform feature fusion on the temporal features corresponding to the image groups, so as to obtain a gesture feature set.
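The fusion step can be illustrated with a small sketch. The disclosure does not state along which dimension the pooling is applied; the sketch assumes, purely for illustration, that each image group yields a temporal feature vector of equal length and that pooling takes the element-wise maximum across the M groups. The function name `fuse_by_max_pooling` and the numeric values are hypothetical.

```python
from typing import List, Sequence


def fuse_by_max_pooling(temporal_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum over the M per-group temporal feature vectors."""
    if not temporal_features:
        raise ValueError("at least one image group is required")
    length = len(temporal_features[0])
    if any(len(v) != length for v in temporal_features):
        raise ValueError("all temporal feature vectors must have the same length")
    # zip(*...) walks the vectors position by position (one column per feature).
    return [max(column) for column in zip(*temporal_features)]


# Three image groups (M = 3), each with a made-up 3-dimensional temporal feature.
fused = fuse_by_max_pooling([[0.1, 0.9, 0.3], [0.4, 0.2, 0.8], [0.0, 0.5, 0.6]])
print(fused)  # [0.4, 0.9, 0.8]
```

Under this assumption the fused vector has a fixed length regardless of M, which would let a gesture of any duration feed a classifier with a fixed input size.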

Based on the embodiment, the gesture feature set of the gesture operation event made by the user is obtained, and in the gesture feature recognition process, the spatial feature of the gesture operation event is recognized and the time feature of the gesture operation event is recognized, so that the finally obtained gesture feature set can accurately reflect the spatial feature and the time feature of the gesture operation event made by the user, and the accuracy of the subsequently determined gesture operation event is improved.

In addition, in an embodiment, before the step 2021 is performed, image preprocessing, such as normalization processing, image resizing, and the like, may be performed on each image frame by frame from the start frame image, and then the steps 2021 to 2022 are performed on each preprocessed image to identify the gesture feature corresponding to each preprocessed image, so as to obtain a gesture feature set.

In this embodiment, the accuracy of subsequent gesture feature recognition may be improved by image preprocessing each image.

In an embodiment, before the step 2021 is executed, the collected images may be sampled according to a set time interval or a set number of frame intervals from a start frame image, and then the steps 2021 to 2022 are executed for each sampled image, so as to recognize a gesture feature corresponding to each sampled image, and obtain a gesture feature set.

In this embodiment, sampling the acquired images and performing gesture feature recognition only on the sampled images reduces the amount of data to be processed, so that the efficiency of subsequent gesture feature recognition can be improved.
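Sampling at a set frame interval can be sketched as follows. The function name `sample_by_frame_interval` and the interval value are hypothetical; the disclosure does not fix a particular interval, and integers stand in for images.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_by_frame_interval(frames: Sequence[T], interval: int) -> List[T]:
    """Keep the start frame and then every `interval`-th frame after it."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(frames[::interval])


# 10 collected images numbered 0-9, keeping every third image.
sampled = sample_by_frame_interval(list(range(10)), 3)
print(sampled)  # [0, 3, 6, 9]
```

With an interval of 3, only 4 of the 10 images go through feature recognition, which is where the reduction in processing comes from.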

As shown in fig. 5, based on the embodiment shown in fig. 2, step 203 may include the following steps:

Step 2031, inputting the gesture feature set to the trained gesture classification network to obtain the gesture recognition parameters of the user.

The gesture recognition parameters comprise at least one preset gesture operation event and the probability of the preset gesture operation event made by the user.

In one embodiment, the trained gesture classification network adopts a convolutional neural network structure, which includes two fully connected layers and a softmax function.

Step 2032, determining the gesture operation event of the user according to the probability.

In an embodiment, the gesture operation event with the highest probability may be determined as the gesture operation event of the user.
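The selection in step 2032 can be sketched as follows. This is a minimal illustration and not the disclosed network: the raw scores stand in for the output of the last fully connected layer, and the event names, the score values, and the function names `softmax` and `pick_gesture_event` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_gesture_event(
    events: Sequence[str], scores: Sequence[float]
) -> Tuple[str, Dict[str, float]]:
    """Return the most probable preset event and the event-to-probability map."""
    if len(events) != len(scores) or not events:
        raise ValueError("events and scores must be non-empty and equally long")
    recognition_params = dict(zip(events, softmax(scores)))
    best_event = max(recognition_params, key=recognition_params.get)
    return best_event, recognition_params


# Hypothetical preset events and made-up scores, for illustration only.
PRESET_EVENTS = ["circle_clockwise", "palm_fist_palm", "wave_left"]
event, params = pick_gesture_event(PRESET_EVENTS, [0.2, 2.5, 0.1])
print(event)
```

Here `params` plays the role of the gesture recognition parameters (preset events with their probabilities), and `event` is the preset event with the highest probability.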

Based on the embodiment, the gesture feature set is input to the trained gesture classification network to obtain the gesture recognition parameters of the user, and the gesture operation event of the user is determined according to the probability in the gesture recognition parameters, so that the gesture operation event of the user is determined.

As shown in fig. 6, based on the embodiment shown in fig. 2, step 204 may include the following steps:

Step 2041, searching a preset mapping set by using the gesture operation event of the user as a keyword, wherein the preset mapping set comprises at least one group of correspondences between gesture operation events and control operation events.

In an embodiment, a mapping set including at least one group of correspondence between gesture operation events and control operation events may be preset, for example, as shown in table 1 below, which is an example of the preset mapping set:

TABLE 1

Gesture operation event	Control operation event
Index finger of one hand circles clockwise	Increase the volume of the vehicle-mounted sound box
Index finger of one hand circles counterclockwise	Decrease the volume of the vehicle-mounted sound box
Palm of one hand changes to a fist and back to a palm	Select
Palm of one hand waves to the left	Switch to the previous one
Palm of one hand waves to the right	Switch to the next one

Step 2042, determine the control operation event in the correspondence containing the keywords as a target operation event.

Step 2043, generate control instructions for instructing execution of the target operational event.

Based on the above embodiment, by presetting a mapping set including at least one group of corresponding relations between gesture operation events and control operation events, in an application, the preset mapping set is searched for by using the gesture operation events of a user as a keyword, the control operation events in the corresponding relations including the keyword are determined as target operation events, a control instruction for instructing execution of the target operation events is generated, and accordingly, a corresponding control function is realized based on the gesture operation of the user.
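Steps 2041 to 2043 amount to a keyed lookup followed by wrapping the result in an instruction. The sketch below mirrors the five rows of table 1; the string identifiers, the dictionary-based instruction format, and the choice to return `None` for an unmapped gesture are assumptions made for illustration and are not specified by the disclosure.

```python
from typing import Dict, Optional

# Preset mapping set: gesture operation event -> control operation event.
# The identifiers are hypothetical names for the rows of table 1.
PRESET_MAPPING: Dict[str, str] = {
    "index_circle_clockwise": "increase_speaker_volume",
    "index_circle_counterclockwise": "decrease_speaker_volume",
    "palm_fist_palm": "select",
    "palm_wave_left": "switch_to_previous",
    "palm_wave_right": "switch_to_next",
}


def generate_control_instruction(gesture_event: str) -> Optional[Dict[str, str]]:
    """Look up the target operation event and wrap it in a control instruction."""
    target_event = PRESET_MAPPING.get(gesture_event)  # gesture event is the keyword
    if target_event is None:
        return None  # assumption: an unmapped gesture produces no instruction
    return {"action": "execute", "target_operation_event": target_event}


print(generate_control_instruction("palm_wave_right"))
```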

Any of the methods for generating control instructions provided by the embodiments of the present disclosure may be executed by any suitable device with data processing capability, including but not limited to: terminal equipment, a server and the like. Alternatively, any control instruction generation method provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor may execute any control instruction generation method mentioned in the embodiments of the present disclosure by calling a corresponding instruction stored in a memory. Details are not repeated below.

Exemplary devices

Fig. 7 is a schematic structural diagram of a device for generating a control instruction according to an exemplary embodiment of the present disclosure. The present embodiment can be applied to an electronic device. As shown in fig. 7, the device includes: a first determination module 71, a recognition module 72, a second determination module 73, and a generation module 74.

The first determining module 71 is configured to determine a start frame image of a hand of a user to be detected;

the recognition module 72 is configured to recognize, frame by frame, gesture features corresponding to each image from the start frame image determined by the first determination module 71, so as to obtain a gesture feature set;

a second determining module 73, configured to determine a gesture operation event of the user based on the gesture feature set recognized by the recognition module 72;

a generating module 74, configured to generate a control instruction for instructing to execute a target operation event corresponding to the gesture operation event determined by the second determining module 73.
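The way the four modules hand data to one another can be sketched as a small pipeline. This is only a structural illustration: the class name `ControlInstructionGenerator`, its field names, the string-based frames, and the toy callables are hypothetical, and each callable merely stands in for the corresponding module.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class ControlInstructionGenerator:
    find_start: Callable[[Sequence[str]], Optional[int]]  # first determining module 71
    recognize: Callable[[Sequence[str]], List[str]]       # recognition module 72
    classify: Callable[[List[str]], str]                  # second determining module 73
    generate: Callable[[str], str]                        # generating module 74

    def run(self, frames: Sequence[str]) -> Optional[str]:
        start = self.find_start(frames)
        if start is None:
            return None  # no hand detected, so there is no start frame image
        features = self.recognize(frames[start:])
        event = self.classify(features)
        return self.generate(event)


# Toy stand-ins: frames are labels, and a frame "shows a hand" unless it is "empty".
device = ControlInstructionGenerator(
    find_start=lambda fs: next((i for i, f in enumerate(fs) if f != "empty"), None),
    recognize=lambda fs: list(fs),
    classify=lambda feats: "palm_fist_palm" if feats == ["palm", "fist", "palm"] else "unknown",
    generate=lambda ev: f"EXECUTE:{ev}",
)
result = device.run(["empty", "palm", "fist", "palm"])
print(result)
```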

As shown in fig. 8, on the basis of the embodiment shown in fig. 7, the recognition module 72 may include:

a processing sub-module 721, configured to perform image preprocessing on each image frame by frame, starting from the start frame image determined by the first determining module 71;

the recognition submodule 722 is configured to recognize a gesture feature corresponding to each preprocessed image obtained by the processing submodule 721, so as to obtain a gesture feature set.

As shown in fig. 9, on the basis of the embodiment shown in fig. 7, the recognition module 72 may include:

the grouping submodule 723 is configured to group every consecutive N images into one group frame by frame starting from the start frame image determined by the first determining module 71, so as to obtain M image groups, where N and M are both natural numbers greater than 1;

a first input submodule 724, configured to, for each image group obtained by the grouping submodule 723, input each image in the image group to a trained spatial feature extraction network, so as to obtain a spatial feature corresponding to each image in the image group;

a second input submodule 725, configured to input the spatial features corresponding to each image in the M image groups obtained by the first input submodule 724 to a trained temporal feature extraction network, so as to obtain temporal features corresponding to the image groups;

the feature determining submodule 726 is configured to obtain a gesture feature set according to the time features corresponding to the M image groups obtained by the second input submodule 725.

The second determination module 73 may include:

a third input submodule 731, configured to input the gesture feature set recognized by the recognition module 72 to a trained gesture classification network, so as to obtain a gesture recognition parameter of the user, where the gesture recognition parameter includes at least one preset gesture operation event and a probability that the user makes the preset gesture operation event;

the event determining submodule 732 is configured to determine a gesture operation event of the user according to the probability obtained by the third input submodule 731.

The event determination sub-module 732 is specifically configured to:

and determining the corresponding preset gesture operation event with the maximum probability as the gesture operation event of the user.

The generating module 74 may include:

the searching submodule 741, configured to search a preset mapping set by using the gesture operation event of the user determined by the second determining module 73 as a keyword, where the preset mapping set includes a corresponding relationship between at least one group of gesture operation events and control operation events;

a target determining submodule 742, configured to determine, as a target operation event, a control operation event in the correspondence including the keyword obtained by the searching submodule 741;

a generating submodule 743, configured to generate a control instruction for instructing to execute the target operation event determined by the target determining submodule 742.

Exemplary electronic device

Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 10. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device separate from them that may communicate with the first device and the second device to receive the collected input signals therefrom.

FIG. 10 illustrates a block diagram of an electronic device in accordance with an embodiment of the disclosure.

As shown in fig. 10, the electronic device 10 includes one or more processors 11 and memory 12.

The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.

Memory 12 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer readable storage medium and executed by the processor 11 to implement the above-described generation method of the control instructions of the various embodiments of the present disclosure and/or other desired functions. Various contents such as an input signal, a signal component, a noise component, etc. may also be stored in the computer-readable storage medium.

In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).

For example, when the electronic device is the first device 100 or the second device 200, the input device 13 may be a microphone or a microphone array as described above for capturing an input signal of a sound source. When the electronic device is a stand-alone device, the input device 13 may be a communication network connector for receiving the acquired input signals from the first device 100 and the second device 200.

The input device 13 may also include, for example, a keyboard, a mouse, and the like.

The output device 14 may output various information including the determined distance information, direction information, and the like to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.

Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present disclosure are shown in fig. 10, omitting components such as buses, input/output interfaces, and the like. In addition, the electronic device 10 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the method of generating control instructions according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.

Program code of the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in a method of generating control instructions according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments. It should be noted, however, that the advantages, effects, and the like mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the specific details disclosed above are provided for the purpose of illustration and description only and are not intended to be limiting; the disclosure is not limited to being implemented with those specific details.

In the present specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. Because the system embodiments basically correspond to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding description of the method embodiments.

The block diagrams of devices, apparatuses, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the phrase "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

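1. A control instruction generation method, comprising:

determining a starting frame image of a hand of a user to be detected;

identifying, from the starting frame image, the gesture feature corresponding to each image frame by frame to obtain a gesture feature set;

determining a gesture operation event of the user based on the gesture feature set;

and generating a control instruction for instructing execution of a target operation event corresponding to the gesture operation event.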

2. The method according to claim 1, wherein the identifying, from the starting frame image, the gesture feature corresponding to each image frame by frame to obtain a gesture feature set comprises:

starting from the starting frame image, performing image preprocessing on each image frame by frame;

and identifying the gesture feature corresponding to each preprocessed image to obtain the gesture feature set.

3. The method according to claim 1, wherein the identifying, from the starting frame image, the gesture feature corresponding to each image frame by frame to obtain a gesture feature set comprises:

starting from the starting frame image, dividing every N consecutive images into a group, frame by frame, to obtain M image groups, wherein N and M are natural numbers larger than 1;

inputting each image in the image group into a trained spatial feature extraction network to obtain a spatial feature corresponding to each image in the image group;

inputting the spatial features corresponding to the images in the image group into a trained time feature extraction network to obtain the time features corresponding to the image group;

and obtaining a gesture feature set according to the time features corresponding to the M image groups respectively.

4. The method of claim 1, wherein the determining the gesture operation event of the user based on the set of gesture features comprises:

inputting the gesture feature set to a trained gesture classification network to obtain gesture recognition parameters of the user, wherein the gesture recognition parameters comprise at least one preset gesture operation event and the probability of the user making the preset gesture operation event;

and determining the gesture operation event of the user according to the probability.

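5. The method of claim 4, wherein the determining the gesture operation event of the user according to the probability comprises:

determining the preset gesture operation event with the maximum probability as the gesture operation event of the user.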

6. The method of claim 1, wherein the generating of the control instruction for instructing execution of the target operation event corresponding to the gesture operation event comprises:

searching a preset mapping set by taking the gesture operation event of the user as a keyword, wherein the preset mapping set comprises at least one correspondence between a gesture operation event and a control operation event;

determining the control operation event in the correspondence containing the keyword as the target operation event;

and generating a control instruction for instructing the target operation event to be executed.

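7. A control instruction generation apparatus, comprising:

a first determining module, configured to determine a starting frame image of a hand of a user to be detected;

an identification module, configured to identify, frame by frame from the starting frame image determined by the first determining module, the gesture feature corresponding to each image to obtain a gesture feature set;

a second determining module, configured to determine a gesture operation event of the user based on the gesture feature set identified by the identification module;

and a generating module, configured to generate a control instruction for instructing execution of a target operation event corresponding to the gesture operation event determined by the second determining module.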

8. The apparatus of claim 7, wherein the identification module comprises:

a processing submodule, configured to perform image preprocessing on each image frame by frame, starting from the starting frame image determined by the first determining module;

and a recognition submodule, configured to recognize the gesture feature corresponding to each preprocessed image obtained by the processing submodule to obtain the gesture feature set.

9. A computer-readable storage medium storing a computer program for executing the method for generating control instructions according to any one of claims 1 to 6.

10. An electronic device, the electronic device comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the method for generating the control instruction according to any one of claims 1 to 6.