CN113936233A

CN113936233A - Method and device for identifying finger-designated target

Info

Publication number: CN113936233A
Application number: CN202111537543.3A
Authority: CN
Inventors: 张立; 吴斐; 杨华龙; 张冰洋; 刘天一
Original assignee: Beijing LLvision Technology Co ltd
Current assignee: Beijing LLvision Technology Co ltd
Priority date: 2021-12-16
Filing date: 2021-12-16
Publication date: 2022-01-14

Abstract

In the method and device for recognizing a finger-designated target, a first hand is recognized in a first image or first video, and if the first hand is recognized, a voice recognition wake-up word is received. The gesture of the first hand is then recognized; if it is a preset gesture, a voice recognition instruction is received and a first fingertip coordinate of the gesture is extracted. Alternatively, the voice recognition instruction is received first, the gesture of the first hand is then recognized, and the first fingertip coordinate is extracted if the gesture is the preset gesture. Based on a multi-target recognition algorithm, the first fingertip coordinate is compared with the coordinates of the targets to be recognized in the acquired first image or first video, the finger-designated target in the first image or first video is determined and recognized, and a first recognition result is output. Because the finger-designated target is selected automatically through hand recognition and gesture recognition, the method can be applied to scenes in which large numbers of finger-designated targets must be recognized repeatedly.

Description

Method and device for identifying finger-designated target

Technical Field

The invention relates to the technical field of detecting and identifying designated targets, and in particular to a method and device for identifying a finger-designated target.

Background

Target identification technology is widely used in everyday life, in small-goods warehousing in the logistics industry, in large-scale space technology, in national defense, and in other fields.

Traditional target identification methods require objects to be calibrated manually, which makes them ill-suited to large volumes of repetitive work.

Disclosure of Invention

The invention provides a method and a device for identifying a finger-designated target, which overcome the shortcomings of prior-art methods that are difficult to apply to large volumes of repetitive work and cannot handle objects in a video stream. The invention can be applied to scenes involving large numbers of repetitive objects and can identify objects in a video stream.

In a first aspect, the invention provides a method for identifying a finger-designated target, including: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is identified; recognizing the gesture of the first hand and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, then recognizing the gesture of the first hand, and extracting the first fingertip coordinate of the gesture of the first hand if the gesture is the preset gesture; and, based on a multi-target recognition algorithm, comparing the first fingertip coordinate with the coordinates of the targets to be recognized in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.

In the method for identifying a finger-designated target provided by the invention, comparing the first fingertip coordinate with the coordinates of the targets to be recognized in the first image or first video based on the multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result includes: acquiring the coordinates of the recognition box of each target to be recognized in the first image or first video based on the multi-target recognition algorithm; comparing the first fingertip coordinate with the coordinates of each recognition box, and taking the target to be recognized whose recognition box contains the first fingertip coordinate as the finger-designated target in the first image or first video; and recognizing the finger-designated target in the first image or first video and outputting the first recognition result.

In the method for identifying a finger-designated target provided by the invention, comparing the first fingertip coordinate with the coordinates of the targets to be recognized in the acquired first image or first video, determining and recognizing the finger-designated target, and outputting a first recognition result may include: comparing the first fingertip coordinate with the coordinates of the targets to be recognized that lie within a preset range in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting the first recognition result.

In a second aspect, the invention further provides a method for identifying a finger-designated target, including: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is identified; recognizing the gesture of the first hand and, if the gesture is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, then recognizing the gesture of the first hand, and extracting the first fingertip coordinate if the gesture is the preset gesture; recognizing the targets to be recognized in the first image or first video based on a multi-target recognition algorithm, storing them as a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result; if the gesture of a second hand recognized in a second image or second video is the preset gesture, receiving the voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, where the first image is the same as the second image, the first video is the same as the second video, and the first hand and the second hand are viewed from the same angle; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or second video.

In the method for identifying a finger-designated target provided by the invention, comparing the second fingertip coordinate with the coordinates of the second recognition result and outputting a third recognition result of the finger-designated target in the second image or second video includes: comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting, as the third recognition result, the entry of the second recognition result that contains the second fingertip coordinate.

In a third aspect, the invention further provides a device for recognizing a finger-designated target, including: a first recognition module, configured to recognize a first hand in a first image or first video and receive a voice recognition wake-up word if the first hand is recognized; a first recognition and extraction module, configured to recognize the gesture of the first hand and, if the gesture is a preset gesture, receive a voice recognition instruction and extract a first fingertip coordinate of the gesture of the first hand; or to receive the voice recognition instruction, then recognize the gesture of the first hand, and extract the first fingertip coordinate if the gesture is the preset gesture; and a first target recognition module, configured to compare, based on a multi-target recognition algorithm, the first fingertip coordinate with the coordinates of the targets to be recognized in the acquired first image or first video, determine and recognize the finger-designated target in the first image or first video, and output a first recognition result.

In a fourth aspect, the invention further provides a device for recognizing a finger-designated target, including: a second recognition module, configured to recognize a first hand in a first image or first video and receive a voice recognition wake-up word if the first hand is recognized; a second recognition and extraction module, configured to recognize the gesture of the first hand and, if the gesture is a preset gesture, receive a voice recognition instruction and extract a first fingertip coordinate of the gesture of the first hand; or to receive the voice recognition instruction, then recognize the gesture of the first hand, and extract the first fingertip coordinate if the gesture is the preset gesture; and a second target recognition module, configured to recognize the targets to be recognized in the first image or first video based on a multi-target recognition algorithm, store them as a second recognition result, compare the first fingertip coordinate with the coordinates of the second recognition result, determine and recognize the finger-designated target in the first image or first video, and output a first recognition result. The second recognition and extraction module is further configured to receive the voice recognition instruction and extract a second fingertip coordinate of the gesture of a second hand if the gesture of the second hand recognized in a second image or second video is the preset gesture, where the first image is the same as the second image, the first video is the same as the second video, and the first hand and the second hand are viewed from the same angle. The second target recognition module is further configured to compare the second fingertip coordinate with the coordinates of the second recognition result and output a third recognition result of the finger-designated target in the second image or second video.

In a fifth aspect, the invention further provides a pair of smart glasses, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for identifying a finger-designated target according to the first or second aspect.

In a sixth aspect, the invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying a finger-designated target according to the first or second aspect.

In a seventh aspect, the invention further provides a computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the steps of the method for identifying a finger-designated target according to the first or second aspect.

In the method and device for recognizing a finger-designated target, once a first hand is recognized in the first image or first video, a voice recognition wake-up word is received; the gesture of the first hand is recognized and a voice recognition instruction is received, in either order, and if the gesture is the preset gesture, the first fingertip coordinate is extracted; finally, based on a multi-target recognition algorithm, the first fingertip coordinate is compared with the coordinates of the targets to be recognized to determine and recognize the finger-designated target in the first image or first video and output a first recognition result. Because the finger-designated target is selected automatically through hand recognition and gesture recognition, the recognition process is more intelligent, and the method can be applied to scenes in which large numbers of finger-designated targets must be recognized repeatedly.

Drawings

To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of an embodiment of the method for identifying a finger-designated target according to the invention;

FIG. 2 is a schematic flowchart of an embodiment of the method for obtaining a first recognition result according to the invention;

FIG. 3 is a schematic flowchart of another embodiment of the method for identifying a finger-designated target according to the invention;

FIG. 4 is a schematic diagram of an application scenario according to an embodiment of the invention;

FIG. 5 is a schematic diagram of another application scenario according to an embodiment of the invention;

FIG. 6 is a schematic structural diagram of an embodiment of the device for recognizing a finger-designated target according to the invention;

FIG. 7 is a schematic structural diagram of another embodiment of the device for recognizing a finger-designated target according to the invention;

FIG. 8 is a schematic diagram of the physical structure of a pair of smart glasses according to the invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the invention.

In the invention, a finger-designated target is a target selected from the first-person viewpoint: the direction in which the finger points is the first-person viewing direction.

FIG. 1 is a schematic flowchart of an embodiment of the method for identifying a finger-designated target according to the invention. As shown in FIG. 1, the method may include the following steps:

S101, recognizing a first hand in the first image or first video, and receiving a voice recognition wake-up word if the first hand is recognized.

In step S101, before the first hand is recognized, it must first be determined whether a hand is present in the first image or first video; if one is present, the first hand is recognized. The first hand may be recognized using a preset hand recognition algorithm, for example an algorithm based on the SSD (Single Shot MultiBox Detector) framework. After the first hand is recognized, a voice recognition wake-up word spoken by the user can be received. The wake-up word acts as the key that starts voice recognition: once the voice recognition module installed in the smart device receives the wake-up word, it can receive a voice recognition instruction. Different devices with voice recognition functions have different preset wake-up words, and the embodiment of the invention does not limit the wake-up word. On some smart devices, such as smartphones, the power consumption of the voice recognition thread need not be considered, so voice recognition is always on and a new round of recognition starts whenever the wake-up word is received. On other smart devices, such as smart glasses, the voice recognition function must be turned on before the wake-up word can be received, because the voice recognition thread affects the power consumption of the glasses.
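The gating in step S101 can be sketched as follows, assuming the hand detector (for example an SSD-based model) has already produced labeled, scored detections and that the speech engine delivers text transcripts. The names `Detection` and `WakeWordGate`, the 0.5 score threshold, and the wake word used below are hypothetical; the sketch only illustrates the always-on (smartphone) versus started-on-demand (smart glasses) behavior described above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    label: str
    score: float
    box: Box


def hand_present(detections: Iterable[Detection], min_score: float = 0.5) -> bool:
    """True if any detection is a hand with confidence >= min_score."""
    return any(d.label == "hand" and d.score >= min_score for d in detections)


class WakeWordGate:
    """Hypothetical gate for step S101.

    always_on=True models a smartphone whose voice thread always runs;
    always_on=False models smart glasses, which start the voice thread
    only after a hand has been recognized, to save power.
    """

    def __init__(self, wake_word: str, always_on: bool) -> None:
        self.wake_word = wake_word.lower()
        self.voice_thread_running = always_on
        self.hand_seen = False
        self.awake = False

    def on_frame(self, detections: Iterable[Detection]) -> None:
        if hand_present(detections):
            self.hand_seen = True
            self.voice_thread_running = True  # already True on always-on devices

    def on_transcript(self, text: str) -> bool:
        # The wake word counts only while the voice thread runs and a hand has been seen.
        if self.voice_thread_running and self.hand_seen and self.wake_word in text.lower():
            self.awake = True
        return self.awake
```

On the glasses profile, a wake word heard before any hand is ignored, because the voice thread has not been started yet.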

S102, recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture.

In step S102, after the voice recognition wake-up word is received, the gesture of the first hand may be recognized first, after which the voice recognition instruction is received and the first fingertip coordinate is extracted. Alternatively, the voice recognition instruction may be received first, after which the gesture of the first hand is recognized and, once the gesture is determined to be the preset gesture, the first fingertip coordinate is extracted. The gesture of the first hand may be recognized with a 21-point detection algorithm or with another gesture determination algorithm; the embodiment of the invention does not limit the method. The 21-point detection algorithm determines whether each finger is bent or straight from 21 hand keypoints and thereby determines the gesture. The voice recognition instruction may ask, for example, for the name or the source of the finger-designated target, such as "what is this" or "where can I buy this". The first fingertip coordinate of the gesture of the first hand may be extracted by starting an object calibration and recognition procedure. There may be one or more first fingertip coordinates, and the embodiment of the invention does not limit their number. The number depends on the gesture of the first hand: if only one finger points at a target in the first image or first video, there is one first fingertip coordinate; if several fingers point at targets, there are several first fingertip coordinates.
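A minimal sketch of the bent/straight test and fingertip extraction follows. It assumes the 21 keypoints follow the MediaPipe Hands numbering (0 is the wrist; the index, middle, ring and pinky tips are 8, 12, 16 and 20, with their PIP joints at 6, 10, 14 and 18), which the text does not specify, and uses a simple heuristic that is an assumption of this sketch: a finger counts as straight when its tip is farther from the wrist than its PIP joint. The thumb is omitted because this heuristic is unreliable for it. The preset gesture is modeled as the exact set of straight fingers, so one or several fingertip coordinates can be returned.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]

# Assumed MediaPipe-style layout: finger name -> (PIP joint index, fingertip index).
FINGERS: Dict[str, Tuple[int, int]] = {
    "index": (6, 8),
    "middle": (10, 12),
    "ring": (14, 16),
    "pinky": (18, 20),
}


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def extended_fingers(keypoints: Sequence[Point]) -> Set[str]:
    """Fingers whose tip is farther from the wrist (keypoint 0) than their PIP joint."""
    if len(keypoints) != 21:
        raise ValueError("expected 21 hand keypoints")
    wrist = keypoints[0]
    return {
        name
        for name, (pip, tip) in FINGERS.items()
        if _dist(wrist, keypoints[tip]) > _dist(wrist, keypoints[pip])
    }


def fingertip_coordinates(keypoints: Sequence[Point], preset: Set[str]) -> List[Point]:
    """Fingertip coordinates of the preset fingers if the gesture matches exactly, else []."""
    if extended_fingers(keypoints) != preset:
        return []
    return [keypoints[FINGERS[name][1]] for name in sorted(preset)]
```

With `preset={"index"}` this returns a single fingertip for a pointing hand; with `preset={"index", "middle"}` it returns two, matching the one-or-several case described above.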

S103, comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the target designated by the finger in the first image or first video, and outputting a first recognition result.

In step S103, the coordinates of the targets to be recognized in the first image or first video are acquired using a multi-target recognition algorithm, which may be built on the open-source YOLOX framework. The first fingertip coordinate is compared with the coordinates of the targets to be recognized to determine whether it falls within the coordinates of any of them. The target to be recognized that contains the first fingertip coordinate is taken as the finger-designated target in the first image or first video; this target is recognized, and the recognition result is output as the first recognition result. The first recognition result may be output by voice, as a picture, or by voice and picture simultaneously. There may be one or more first recognition results, and the embodiment of the invention does not limit their number. If there are multiple first recognition results, each is output together with its corresponding probability value.
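The comparison in step S103 can be sketched as a point-in-box test over the detector's output. The sketch assumes each detection carries a label, a confidence score (used here as the probability value) and an axis-aligned box `(x_min, y_min, x_max, y_max)`; it does not call YOLOX itself, and `Detection` and `match_fingertips` are hypothetical names. When more than one box contains a fingertip, every match is returned with its probability, highest first.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    label: str
    score: float  # detector confidence, reported as the probability value
    box: Box


def contains(box: Box, p: Point) -> bool:
    x_min, y_min, x_max, y_max = box
    return x_min <= p[0] <= x_max and y_min <= p[1] <= y_max


def match_fingertips(fingertips: Sequence[Point],
                     detections: Sequence[Detection]) -> List[Tuple[str, float]]:
    """(label, probability) for every detection whose box contains any fingertip,
    highest probability first; an empty list means nothing was pointed at."""
    hits = [d for d in detections if any(contains(d.box, p) for p in fingertips)]
    hits.sort(key=lambda d: d.score, reverse=True)
    return [(d.label, d.score) for d in hits]
```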

In the method for recognizing a finger-designated target provided by this embodiment, the first hand is recognized in the first image or first video and a voice recognition wake-up word is received; the gesture of the first hand is recognized and a voice recognition instruction is received, in either order, and the first fingertip coordinate is extracted if the gesture is the preset gesture; the first fingertip coordinate is then compared with the coordinates of the targets to be recognized, obtained with a multi-target recognition algorithm, to determine and recognize the finger-designated target and output a first recognition result. Because the finger-designated target is selected automatically through hand recognition and gesture recognition, the recognition process is more intelligent, and the method can be applied to scenes in which large numbers of finger-designated targets must be recognized repeatedly.

FIG. 2 is a schematic flowchart of an embodiment of the method for obtaining a first recognition result according to the invention. As shown in FIG. 2, the method for obtaining the first recognition result may include the following steps:

S201, obtaining the coordinates of the recognition box of each target to be recognized in the first image or first video based on the multi-target recognition algorithm.

In step S201, the recognition box of each target to be recognized in the first image or first video is obtained using the multi-target recognition algorithm; the target to be recognized lies within its recognition box. The recognition box may be rectangular, in which case its coordinates may be the coordinates of its four vertices.
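If the recognition box is given by its four vertices, as described above, a containment test that also works for rotated boxes can use the signs of cross products. This is a sketch under the assumptions that the four vertices are listed in order around the box (in either direction) and that the box is convex; for an axis-aligned box a simple min/max comparison would suffice.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_box(vertices: Sequence[Point], p: Point) -> bool:
    """True if p lies inside or on the edge of a convex quadrilateral whose
    four vertices are given in order (clockwise or counter-clockwise)."""
    if len(vertices) != 4:
        raise ValueError("expected four vertices")
    # p is inside when it lies on the same side of every edge.
    signs = [_cross(vertices[i], vertices[(i + 1) % 4], p) for i in range(4)]
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)
```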

S203, identifying the finger designated target in the first image or the first video, and outputting a first identification result.

For steps S202 and S203, refer to the description of step S103; details are not repeated here.

In some optional embodiments, comparing the first fingertip coordinate with the coordinates of the targets to be recognized in the acquired first image or first video, determining and recognizing the finger-designated target, and outputting the first recognition result may include: comparing the first fingertip coordinate with the coordinates of the targets to be recognized that lie within a preset range in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting the first recognition result. The preset range can be set according to the actual situation, and the embodiment of the invention does not limit it.

In this method of determining the first recognition result, the finger-designated target is selected only from the targets to be recognized within the preset range of the first image or first video, which avoids the heavy computation of searching the whole image and saves computing cost.
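One way to realize the preset range is a square search window around the fingertip, so that the detector runs only on that region. The window shape, the `radius` parameter and the `detect_in_region` callable are assumptions of this sketch; the callable is taken to return boxes in window-local coordinates, which are shifted back to full-image coordinates before matching.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Labeled = Tuple[str, Box]


def search_window(fingertip: Point, radius: float, width: int, height: int) -> Box:
    """Square window of half-size `radius` around the fingertip, clipped to the image."""
    x, y = fingertip
    return (max(0.0, x - radius), max(0.0, y - radius),
            min(float(width), x + radius), min(float(height), y + radius))


def detect_near_fingertip(fingertip: Point, radius: float, width: int, height: int,
                          detect_in_region: Callable[[Box], List[Labeled]]) -> List[Labeled]:
    """Run the (hypothetical) detector on the window only and map its
    window-local boxes back to full-image coordinates."""
    x0, y0, x1, y1 = search_window(fingertip, radius, width, height)
    local = detect_in_region((x0, y0, x1, y1))
    return [(label, (bx0 + x0, by0 + y0, bx1 + x0, by1 + y0))
            for label, (bx0, by0, bx1, by1) in local]
```

The returned boxes can then be fed to the same fingertip containment test used for the whole image.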

FIG. 3 is a schematic flowchart of another embodiment of the method for identifying a finger-designated target according to the invention. As shown in FIG. 3, the method may include the following steps:

S301, recognizing a first hand in the first image or first video, and receiving a voice recognition wake-up word if the first hand is recognized.

For step S301, refer to the description of step S101; details are not repeated here.

S302, recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture.

For step S302, refer to the description of step S102; details are not repeated here.

S303, recognizing the targets to be recognized in the first image or first video based on the multi-target recognition algorithm, storing them as the second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or first video, and outputting the first recognition result.

In step S303, all targets that can be recognized in the first image or first video are recognized, and the recognition results are saved as the second recognition result. For the remainder of step S303, refer to the description of step S103; details are not repeated here.

S304, if the gesture of a second hand recognized in the second image or second video is the preset gesture, receiving a voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, where the first image is the same as the second image, the first video is the same as the second video, and the first hand and the second hand are viewed from the same angle.

For step S304, refer to the description of step S102; details are not repeated here.

S305, comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or second video.

In step S305, the second fingertip coordinate is compared with the coordinates of the second recognition result to determine whether any entry of the second recognition result contains the second fingertip coordinate. The entry that contains the second fingertip coordinate is taken as the third recognition result of the finger-designated target in the second image or second video.
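Steps S303 to S305 amount to caching: the targets are recognized once, and every later fingertip is matched against the stored second recognition result. The sketch below assumes axis-aligned boxes and a hypothetical `detect_all` callable standing in for the multi-target recognizer; the cache is only valid while the image and viewing angle stay the same, as step S304 requires, so it exposes an `invalidate` method for when they change.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Result = Tuple[str, Box]


class SceneCache:
    """Hypothetical store for the second recognition result of one fixed scene."""

    def __init__(self, detect_all: Callable[[], List[Result]]) -> None:
        self._detect_all = detect_all
        self._results: Optional[List[Result]] = None
        self.detector_calls = 0

    def results(self) -> List[Result]:
        # S303: recognize every target once and keep the result.
        if self._results is None:
            self._results = self._detect_all()
            self.detector_calls += 1
        return self._results

    def query(self, fingertip: Point) -> List[str]:
        # S305: match a fingertip against the stored results without re-detecting.
        x, y = fingertip
        return [label for label, (x0, y0, x1, y1) in self.results()
                if x0 <= x <= x1 and y0 <= y <= y1]

    def invalidate(self) -> None:
        # Call when the image or the viewing angle changes.
        self._results = None
```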

In the method for recognizing a finger-designated target provided by this embodiment, a first hand is recognized in the first image or first video and a voice recognition wake-up word is received; the gesture of the first hand is recognized and a voice recognition instruction is received, in either order, and the first fingertip coordinate is extracted if the gesture is the preset gesture; all targets to be recognized in the first image or first video are recognized with the multi-target recognition algorithm and stored as the second recognition result, against which the first fingertip coordinate is compared to output the first recognition result. When a second hand in the second image or second video (the same image or video, viewed from the same angle) makes the preset gesture, a voice recognition instruction is received, the second fingertip coordinate is extracted and compared with the stored second recognition result, and a third recognition result of the finger-designated target is output.
Because all recognition results in the first image or first video are stored as the second recognition result, fingers that match the preset gesture in the second image or second video only need to be matched against these stored results to obtain the third recognition result of the finger-designated target. The method is therefore suitable for scenes in which large numbers of finger-designated targets must be recognized repeatedly, saves computation cost, and improves recognition efficiency.

FIG. 4 is a schematic diagram of an application scenario according to an embodiment of the invention. As shown in FIG. 4, the recognition process of the finger-designated target includes the following steps:

Step 1: an image or video containing the finger-designated target is input, and hand recognition is performed on it.

Step 2: if a hand is present in the input image or video, its gesture is recognized with a 21-keypoint gesture recognition method; if no hand is present, the process returns to hand recognition and waits for a hand recognition result.

Step 3: based on the gesture recognition result, it is determined whether the 'golden finger' gesture, that is, the preset gesture, is present. If it is, the voice recognition function is started; otherwise, the process waits for a 'golden finger' signal.

Step 4: it is determined whether a voice instruction is detected. If one is detected, the multi-target recognition algorithm (also called the universal recognition algorithm) is started, the 'golden finger' target is matched, and its recognition result is output; in other words, the finger-designated target in the input image or video is determined from the hand gesture and its recognition result is output. If no voice instruction is detected, the process waits for one.
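The four steps can be sketched as a loop over per-frame observations. Each `Frame` is a hypothetical summary of what the hand detector, the 21-keypoint gesture recognizer and the speech engine reported for that frame, and `recognize` stands in for the multi-target recognition and matching step; none of these names come from the patent. The loop keeps the most recent frame showing the 'golden finger' gesture so that a voice instruction heard slightly later is still matched to that pointing position.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Point = Tuple[float, float]
R = TypeVar("R")


@dataclass
class Frame:
    hand_detected: bool                  # step 1: hand recognition result
    golden_finger: bool                  # step 3: is the preset gesture present?
    fingertip: Optional[Point] = None    # fingertip of the preset gesture, if any
    voice_command: Optional[str] = None  # step 4: instruction heard during this frame


def run_pipeline(frames: Iterable[Frame],
                 recognize: Callable[[Frame, str], R]) -> Optional[R]:
    pointing: Optional[Frame] = None     # last frame showing the 'golden finger'
    for frame in frames:
        if not frame.hand_detected:      # steps 1-2: no hand, back to hand recognition
            pointing = None
            continue
        if frame.golden_finger:          # step 3: preset gesture seen, voice recognition on
            pointing = frame
        if pointing is None:             # step 3: keep waiting for the 'golden finger'
            continue
        if frame.voice_command:          # step 4: instruction detected, recognize and match
            return recognize(pointing, frame.voice_command)
    return None                          # step 4: no instruction detected yet
```

A voice instruction that arrives before any 'golden finger' gesture is ignored, and losing the hand resets the loop to step 1.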

Fig. 5 is a schematic diagram of another application scenario provided in the embodiment of the present invention. As shown in Fig. 5, the process of identifying a finger-designated magic cube (Rubik's cube) may be as follows: the user points at the magic cube with the preset finger gesture and issues the voice instructions 'what' and 'where to buy'; after the voice instructions are received and recognized, a purchase page for the magic cube is output, from which the user learns that the finger-designated target is a magic cube and where it can be bought.
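The Fig. 5 example ties each voice instruction to a different output for the same pointed-at target. A minimal dispatch-table sketch, assuming the finger-designated target has already been recognized and labeled; both handlers and the shop URL are hypothetical placeholders, since the source does not say how the purchase page is obtained.

```python
from typing import Callable, Dict
from urllib.parse import quote_plus


def answer_what(label: str) -> str:
    # 'what': report the recognized label of the finger-designated target.
    return f"This is a {label}."


def answer_where_to_buy(label: str) -> str:
    # 'where to buy': hypothetical placeholder search page (not a real shop API).
    return "https://shop.example.com/search?q=" + quote_plus(label)


HANDLERS: Dict[str, Callable[[str], str]] = {
    "what": answer_what,
    "where to buy": answer_where_to_buy,
}


def handle_instruction(instruction: str, label: str) -> str:
    """Dispatch a recognized voice instruction for an already-identified target."""
    handler = HANDLERS.get(instruction.strip().lower())
    if handler is None:
        return f"Unsupported instruction: {instruction!r}"
    return handler(label)


print(handle_instruction("What", "magic cube"))          # This is a magic cube.
print(handle_instruction("where to buy", "magic cube"))  # https://shop.example.com/search?q=magic+cube
```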

Fig. 6 is a schematic structural diagram of an embodiment of a device for recognizing a finger-specified target according to the present invention. As shown in fig. 6, the finger-specified object recognition apparatus includes:

the first recognition module 601 is configured to recognize a first hand in a first image or a first video, and receive a voice recognition wake-up word if the first hand is recognized;

the first recognition and extraction module 602 is configured to recognize the gesture of the first hand and, if the gesture is the preset gesture, receive a voice recognition instruction and extract a first fingertip coordinate of the gesture; or, alternatively, to receive the voice recognition instruction first, then recognize the gesture of the first hand and, if it is the preset gesture, extract the first fingertip coordinate of the gesture;

the first target recognition module 603 is configured to compare the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determine and recognize the finger-designated target in the first image or first video, and output a first recognition result.
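The first fingertip coordinate extracted by module 602 has to be derived from the 21 hand keypoints. The sketch below rests on assumptions the source does not state: the MediaPipe Hands landmark ordering (0 = wrist; 6/8 = index PIP/tip; 10/12, 14/16 and 18/20 for the middle, ring and little fingers), keypoints normalized to [0, 1], and a hypothetical 'golden finger' rule of index finger extended with the other three fingers folded. The returned pixel coordinate is what module 603 would compare with the detector's boxes.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Assumed landmark layout (MediaPipe Hands convention).
WRIST = 0
INDEX = (8, 6)                       # (tip, PIP)
OTHERS = [(12, 10), (16, 14), (20, 18)]  # middle, ring, little: (tip, PIP)


def _extended(lm: List[Point], tip: int, pip: int) -> bool:
    # Heuristic: a finger counts as extended if its tip is farther from the wrist than its PIP joint.
    return math.dist(lm[tip], lm[WRIST]) > math.dist(lm[pip], lm[WRIST])


def golden_fingertip(landmarks: List[Point], width: int, height: int) -> Optional[Point]:
    """Return the index fingertip in pixels if the hand makes the assumed 'golden finger' pose."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand keypoints")
    if not _extended(landmarks, *INDEX):
        return None
    if any(_extended(landmarks, t, p) for t, p in OTHERS):
        return None
    x, y = landmarks[INDEX[0]]
    return (x * width, y * height)  # normalized [0, 1] -> pixel coordinates


# Synthetic pointing hand: index extended, other fingers folded.
lm = [(0.5, 0.75) for _ in range(21)]
lm[0] = (0.5, 0.875)                      # wrist
lm[6], lm[8] = (0.5, 0.5), (0.5, 0.25)    # index PIP and tip
for pip in (10, 14, 18):
    lm[pip] = (0.5, 0.625)                # tips stay at 0.75, closer to the wrist than the PIP
print(golden_fingertip(lm, 640, 480))     # (320.0, 120.0)
```

Converting to pixels here matters because detector boxes are usually reported in pixels, and the comparison in module 603 only makes sense if both coordinates share one frame of reference.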

Optionally, the first target recognition module 603 includes:

the acquisition unit, configured to acquire, based on a multi-target recognition algorithm, the coordinates of the recognition box of each target to be recognized in the first image or the first video;

the determining unit, configured to compare the first fingertip coordinate with the coordinates of the recognition boxes and to determine the target whose recognition box contains the first fingertip coordinate as the finger-designated target in the first image or the first video;

and the identification unit is used for identifying the target pointed by the finger in the first image or the first video and outputting a first identification result.
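Together, the acquisition, determining and identification units amount to a point-in-box test over the detector's output. A minimal sketch, assuming axis-aligned boxes in pixel coordinates given as (x_min, y_min, x_max, y_max). The source does not say what happens when several boxes contain the fingertip (for example a shelf and an item on it), so the smallest-area box is chosen here as a hypothetical tie-break rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


@dataclass(frozen=True)
class Detection:
    label: str
    box: Box


def contains(box: Box, point: Tuple[float, float]) -> bool:
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def select_pointed_target(detections: List[Detection], fingertip: Tuple[float, float]) -> Optional[Detection]:
    """Return the detection whose box contains the fingertip (smallest box on ties)."""
    hits = [d for d in detections if contains(d.box, fingertip)]
    if not hits:
        return None  # the fingertip is not inside any recognition box
    return min(hits, key=lambda d: area(d.box))


dets = [
    Detection("shelf", (0, 0, 600, 400)),
    Detection("cup", (100, 100, 180, 200)),
    Detection("magic cube", (300, 150, 360, 210)),
]
print(select_pointed_target(dets, (320, 180)).label)  # magic cube
```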

Optionally, the first target recognition module 603 is further configured to compare the first fingertip coordinate with the coordinates of the targets to be recognized that lie within a preset range in the acquired first image or first video, determine and recognize the finger-designated target in the first image or first video, and output the first recognition result.
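The source leaves the 'preset range' undefined. One plausible reading, sketched here purely as an assumption, is a search radius around the fingertip: only targets whose boxes lie within `radius` pixels of the fingertip are compared, and the nearest one is selected (distance 0 when the fingertip is inside a box).

```python
import math
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels
Point = Tuple[float, float]


def point_box_distance(box: Box, p: Point) -> float:
    """Euclidean distance from p to the nearest point of the box (0 if p is inside)."""
    x0, y0, x1, y1 = box
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy)


def select_within_range(targets: Sequence[Tuple[str, Box]], tip: Point, radius: float) -> Optional[str]:
    """Compare the fingertip only with targets within `radius` pixels; return the nearest label."""
    in_range = [(point_box_distance(box, tip), label) for label, box in targets]
    in_range = [c for c in in_range if c[0] <= radius]
    if not in_range:
        return None
    return min(in_range)[1]


targets = [("cup", (100, 100, 180, 200)), ("magic cube", (300, 150, 360, 210))]
print(select_within_range(targets, (370, 180), radius=20))  # magic cube (10 px right of its box)
```

Under this reading the range both limits how many targets are compared and tolerates a fingertip that lands slightly outside a box; a device could equally define the range as a fixed region of the image.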

Fig. 7 is a schematic structural diagram of another embodiment of a device for recognizing a finger-designated target according to the present invention. As shown in Fig. 7, the device includes:

a second recognition module 701, configured to recognize a first hand in the first image or the first video, and receive a voice recognition wake-up word if the first hand is recognized;

a second recognition and extraction module 702, configured to recognize the gesture of the first hand and, if the gesture is the preset gesture, receive a voice recognition instruction and extract a first fingertip coordinate of the gesture; or, alternatively, to receive the voice recognition instruction first, then recognize the gesture of the first hand and, if it is the preset gesture, extract the first fingertip coordinate of the gesture;

the second target recognition module 703 is configured to recognize a target to be recognized in the first image or the first video based on a multi-target recognition algorithm, store a second recognition result, compare the first fingertip coordinate with the coordinate of the second recognition result, determine and recognize a finger-designated target in the first image or the first video, and output the first recognition result;

the second recognition and extraction module 702 is further configured to, if the gesture of a second hand recognized in a second image or a second video is the preset gesture, receive a voice recognition instruction and extract a second fingertip coordinate of the gesture of the second hand, where the second image is the same as the first image, the second video is the same as the first video, and the first and second hands are seen from the same viewing angle;

the second target recognition module 703 is further configured to compare the second fingertip coordinate with the coordinates of the second recognition result, and output a third recognition result of the finger-designated target in the second image or the second video.

Optionally, the second target recognition module 703 is further configured to compare the second fingertip coordinate with a coordinate of the second recognition result, and output the second recognition result including the second fingertip coordinate as a third recognition result.
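The reuse of the stored second recognition result by modules 702 and 703 can be sketched as a small cache: multi-target recognition runs once for a given view, and fingertips from the first hand, a moved finger or the other hand are all matched against the stored results. The `scene_key` argument is a hypothetical stand-in for the condition that the image and viewing angle are unchanged; how a real device decides that the view has changed is not specified in the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels
Detection = Tuple[str, Box]              # (label, box)


@dataclass
class SceneTargetCache:
    """Holds the stored 'second recognition result' for the current view."""

    detect: Callable[[object], List[Detection]]  # hypothetical multi-target detector
    detector_calls: int = field(default=0, init=False)
    _scene_key: Optional[object] = field(default=None, init=False)
    _results: List[Detection] = field(default_factory=list, init=False)

    def _refresh_if_needed(self, scene_key: object, image: object) -> None:
        # Run multi-target recognition only for the first query or when the view changes.
        if self.detector_calls == 0 or scene_key != self._scene_key:
            self._results = self.detect(image)
            self._scene_key = scene_key
            self.detector_calls += 1

    def query(self, scene_key: object, image: object, fingertip: Tuple[float, float]) -> Optional[str]:
        """Return the label of the stored result whose box contains the fingertip."""
        self._refresh_if_needed(scene_key, image)
        x, y = fingertip
        for label, (x0, y0, x1, y1) in self._results:
            if x0 <= x <= x1 and y0 <= y <= y1:
                return label
        return None


cache = SceneTargetCache(
    detect=lambda image: [("cup", (100, 100, 180, 200)), ("magic cube", (300, 150, 360, 210))]
)
first = cache.query("view-1", None, (150, 150))  # first hand -> first recognition result
third = cache.query("view-1", None, (320, 180))  # second hand, same view -> third recognition result
print(first, third, cache.detector_calls)        # cup magic cube 1
```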

In some optional embodiments, the method for recognizing a finger-designated target provided by the present invention may be applied to smart glasses or to other smart devices, which is not limited in this embodiment of the present invention. When a user uses smart glasses to identify items in a storage container, the process may be as follows:

After the user puts on the smart glasses, the glasses scan the items in the storage container within their field of view. The user extends a hand to point at an item; the smart glasses recognize the user's hand and show on the user interface that a hand has been recognized. The user then speaks the wake-up word, which the glasses receive as the voice recognition wake-up word. Next, the glasses recognize the hand gesture and, once it is confirmed to be the preset gesture, the user issues the voice recognition instruction 'what'. After receiving the instruction, the smart glasses extract the fingertip coordinate of the gesture, recognize all items appearing in their field of view and store the recognition results of all items. They then compare the fingertip coordinate with the coordinates of these recognition results and output on the user interface, as the recognition result of the finger-designated target, the recognition result of the item that contains the fingertip coordinate.

Within the same field of view, when the user moves the finger to point at a different item, or points at an item with the other hand, the smart glasses again recognize the user's hand and its gesture. If the gesture is the preset gesture, then in response to the user's voice instruction 'what' the glasses extract the fingertip coordinate of the gesture, compare it with the coordinates of the stored recognition results of all items, and output on the user interface the recognition result of the item containing the fingertip coordinate as the recognition result of the finger-designated target.

The method for identifying the finger-designated target can be implemented purely by visual recognition; that is, both hand recognition and gesture recognition are performed visually. In the prior art, a finger-designated target is usually recognized by combining visual recognition with sensor-based recognition, for example by pairing smart glasses with a handheld device whose built-in sensor recognizes gestures. Compared with that approach, the method provided by the invention does not require multiple devices and is therefore more convenient.

Fig. 8 is a schematic structural diagram of smart glasses according to the present invention. As shown in Fig. 8, the smart glasses (a head-mounted device) may include a processor 801, a communications interface 802, a memory 803 and a communication bus 804, where the processor 801, the communications interface 802 and the memory 803 communicate with one another through the communication bus 804. The processor 801 may call logic instructions in the memory 803 to perform a method for identifying a finger-designated target, the method comprising:

identifying a first hand in a first image or a first video; if the first hand is recognized, receiving a voice recognition wake-up word; recognizing the gesture of the first hand and, if it is the preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture; or, alternatively, receiving the voice recognition instruction first, then recognizing the gesture of the first hand and, if it is the preset gesture, extracting the first fingertip coordinate of the gesture; and comparing the first fingertip coordinate with the coordinates of the targets to be recognized in the first image or first video, obtained with a multi-target recognition algorithm, to determine and recognize the finger-designated target in the first image or first video, and outputting a first recognition result.

The method further comprises the following steps: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is identified; recognizing the gesture of the first hand and, if it is the preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture; or, alternatively, receiving the voice recognition instruction first, then recognizing the gesture of the first hand and, if it is the preset gesture, extracting the first fingertip coordinate of the gesture; recognizing the targets to be recognized in the first image or the first video based on a multi-target recognition algorithm and storing them as a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting the first recognition result; if the gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving a voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, where the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.

In addition, when the logic instructions in the memory 803 are implemented as software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for identifying a finger-designated target provided above, the method comprising the steps described above.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for identifying a finger-designated target provided above, the method comprising the steps described above.

The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for recognizing a finger-designated object, comprising:

identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is identified;

recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;

and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.

2. The method for recognizing the finger-designated object according to claim 1, wherein the comparing the first fingertip coordinate with the coordinate of the object to be recognized in the first image or the first video based on the multi-object recognition algorithm, determining and recognizing the finger-designated object in the first image or the first video, and outputting the first recognition result comprises:

acquiring, based on a multi-target recognition algorithm, the coordinates of the recognition box of the target to be recognized in the first image or the first video;

comparing the first fingertip coordinate with the coordinates of the recognition box of the target to be recognized, and determining the target to be recognized whose recognition box contains the first fingertip coordinate as the finger-designated target in the first image or the first video;

and identifying a target pointed by the finger in the first image or the first video, and outputting the first identification result.

3. The method for identifying the finger-designated object according to claim 1, wherein the comparing the first fingertip coordinate with the coordinate of the object to be identified in the acquired first image or first video, determining and identifying the finger-designated object in the first image or first video, and outputting a first identification result comprises:

and comparing the first fingertip coordinate with the coordinates of the targets to be recognized within a preset range in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting the first recognition result.

4. A method for recognizing a finger-designated object, comprising:

identifying a target to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second identification result, comparing the first fingertip coordinate with the coordinate of the second identification result, determining and identifying a finger-designated target in the first image or the first video, and outputting a first identification result;

if the gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving the voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand; wherein the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same;

and comparing the second fingertip coordinate with the coordinate of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.

5. The method for recognizing a finger-designated object according to claim 4, wherein the comparing the second fingertip coordinate with the coordinate of the second recognition result and outputting a third recognition result of the target pointed at by the finger in the second image or the second video comprises:

and comparing the second fingertip coordinate with the coordinate of the second recognition result, and outputting the second recognition result comprising the second fingertip coordinate as the third recognition result.

6. An apparatus for recognizing a finger-designated object, comprising:

the first recognition module is used for recognizing a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized;

the first recognition and extraction module is used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;

and the first target identification module is used for comparing the first fingertip coordinate with the coordinate of the target to be identified in the acquired first image or first video based on a multi-target identification algorithm, determining and identifying the finger-designated target in the first image or first video, and outputting a first identification result.

7. An apparatus for recognizing a finger-designated object, comprising:

the second recognition module is used for recognizing a first hand in the first image or the first video, and receiving a voice recognition wake-up word if the first hand is recognized;

the second recognition and extraction module is used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;

the second target identification module is used for identifying a target to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second identification result, comparing the first fingertip coordinate with the coordinate of the second identification result, determining and identifying a finger-designated target in the first image or the first video, and outputting a first identification result;

the second recognition and extraction module is further configured to, if the gesture of a second hand recognized in a second image or a second video is the preset gesture, receive the voice recognition instruction and extract a second fingertip coordinate of the gesture of the second hand; the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same;

the second target recognition module is further configured to compare the second fingertip coordinate with the coordinate of the second recognition result, and output a third recognition result of the finger-designated target in the second image or the second video.

8. Smart glasses comprising a memory, a processor and a computer program stored on said memory and executable on said processor, characterized in that said processor, when executing said program, carries out the steps of the method for identification of a finger-specified object according to any one of claims 1 to 5.

9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the method for identifying a finger-specified object according to any one of claims 1 to 5.

10. A computer program product having stored thereon executable instructions, characterized in that the instructions, when executed by a processor, cause the processor to carry out the steps of the method for identification of a finger-specified object as claimed in any one of claims 1 to 5.