CN113936233A - Method and device for identifying finger-designated target

Method and device for identifying finger-designated target

Info

Publication number
CN113936233A
Authority
CN
China
Prior art keywords
hand
gesture
image
video
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111537543.3A
Other languages
Chinese (zh)
Inventor
张立
吴斐
杨华龙
张冰洋
刘天一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing LLvision Technology Co ltd
Original Assignee
Beijing LLvision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing LLvision Technology Co ltd
Priority to CN202111537543.3A
Publication of CN113936233A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The method and device for recognizing a finger-designated target provided by the invention recognize a first hand in a first image or a first video and, if the first hand is recognized, receive a voice recognition wake-up word. The gesture of the first hand is then recognized and, if it is a preset gesture, a voice recognition instruction is received and a first fingertip coordinate of the gesture is extracted; alternatively, the voice recognition instruction is received first, the gesture of the first hand is then recognized, and the first fingertip coordinate is extracted if the gesture is the preset gesture. Based on a multi-target recognition algorithm, the first fingertip coordinate is compared with the coordinates of the targets to be recognized in the acquired first image or first video, the finger-designated target in the first image or first video is determined and recognized, and a first recognition result is output. Because the finger-designated target is selected automatically through hand recognition and gesture recognition, the method can be applied to scenes in which finger-designated targets must be recognized repeatedly and in large numbers.

Description

Method and device for identifying finger-designated target
Technical Field
The invention relates to the technical field of detecting and identifying a designated target, and in particular to a method and a device for identifying a finger-designated target.
Background
Target identification technology is widely used in many fields, from everyday life and the storage of small goods in the logistics industry to space technology and national defense.
The traditional target identification method requires objects to be calibrated manually and is therefore poorly suited to large volumes of repetitive work.
Disclosure of Invention
The invention provides a method and a device for identifying a finger-designated target, which address the shortcomings of prior-art identification methods, namely that they are poorly suited to large volumes of repetitive work and cannot be used for objects in a video stream. The invention can be applied to scenes involving large numbers of repetitive objects and can identify objects in a video stream.
In a first aspect, the present invention provides a method for identifying a finger-designated target, including: identifying a first hand in a first image or a first video, and receiving a voice recognition awakening word if the first hand is identified; recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
According to the method for identifying a finger-designated target provided by the invention, comparing the first fingertip coordinate with the coordinates of the target to be identified in the acquired first image or first video based on the multi-target identification algorithm, determining and identifying the finger-designated target in the first image or first video, and outputting the first identification result includes: acquiring, based on the multi-target identification algorithm, the coordinates of the recognition box of each target to be identified in the first image or the first video; comparing the first fingertip coordinate with the coordinates of the recognition boxes, and determining the target to be identified whose recognition box contains the first fingertip coordinate as the finger-designated target in the first image or the first video; and identifying the finger-designated target in the first image or the first video and outputting the first identification result.
According to the method for identifying a finger-designated target provided by the invention, comparing the first fingertip coordinate with the coordinates of the target to be identified in the acquired first image or first video, determining and identifying the finger-designated target in the first image or first video, and outputting the first identification result includes: comparing, within a preset range, the first fingertip coordinate with the coordinates of the target to be identified in the acquired first image or first video, determining and identifying the finger-designated target in the first image or first video, and outputting the first identification result.
In a second aspect, the present invention further provides a method for identifying a finger-designated target, including: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is identified; recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; identifying the targets to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and identifying the finger-designated target in the first image or the first video, and outputting a first recognition result; if the gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving the voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, where the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
According to the method for identifying a finger-designated target provided by the invention, comparing the second fingertip coordinate with the coordinates of the second recognition result and outputting the third recognition result of the finger-designated target in the second image or the second video includes: comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting, as the third recognition result, the second recognition result that contains the second fingertip coordinate.
In a third aspect, the present invention further provides a device for recognizing a finger-designated target, including: a first recognition module, used for recognizing a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized; a first recognition and extraction module, used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; and a first target recognition module, used for comparing the first fingertip coordinate with the coordinates of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
In a fourth aspect, the present invention further provides a device for recognizing a finger-designated target, including: a second recognition module, used for recognizing a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized; a second recognition and extraction module, used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; and a second target recognition module, used for recognizing the targets to be recognized in the first image or the first video based on a multi-target recognition algorithm, storing a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting a first recognition result. The second recognition and extraction module is further configured to receive the voice recognition instruction and extract a second fingertip coordinate of the gesture of a second hand if the gesture of the second hand recognized in a second image or a second video is the preset gesture, where the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same. The second target recognition module is further configured to compare the second fingertip coordinate with the coordinates of the second recognition result, and to output a third recognition result of the finger-designated target in the second image or the second video.
In a fifth aspect, the present invention further provides a pair of smart glasses, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for identifying a finger-designated target according to the first and second aspects.
In a sixth aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying a finger-designated target according to the first and second aspects.
In a seventh aspect, the present invention further provides a computer program product storing executable instructions which, when executed by a processor, cause the processor to implement the steps of the method for identifying a finger-designated target according to the first and second aspects.
According to the method and the device for recognizing the finger-designated target, the first hand in the first image or the first video is recognized, and if the first hand is recognized, a voice recognition awakening word is received; recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result. The finger designated target can be automatically selected through hand recognition and gesture recognition, the recognition process is more intelligent, and the method can be applied to a large number of repeated types of finger designated target recognition scenes.
Drawings
In order to illustrate the technical solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating an embodiment of a method for identifying a finger-designated target according to the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a method for obtaining a first recognition result according to the present invention;
FIG. 3 is a flow chart illustrating another embodiment of a method for identifying a finger-designated object according to the present invention;
FIG. 4 is a schematic diagram of an application scenario provided in the embodiment of the present invention;
FIG. 5 is a schematic diagram of another application scenario provided in the embodiment of the present invention;
FIG. 6 is a schematic diagram of an exemplary embodiment of a device for recognizing a finger-designated object according to the present invention;
FIG. 7 is a schematic diagram of an exemplary embodiment of a device for recognizing a finger-designated object according to the present invention;
FIG. 8 is a schematic physical structure diagram of a pair of smart glasses according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the present invention, a finger-designated target is a target selected from a first-person viewing angle: the direction in which the finger points is the first-person viewing direction.
Fig. 1 is a schematic flow chart of an embodiment of a method for identifying a finger-designated target according to the present invention. As shown in fig. 1, the method for identifying a finger-designated object may include the steps of:
S101, recognizing a first hand in the first image or the first video, and receiving a voice recognition wake-up word if the first hand is recognized.
In step S101, before the first hand in the first image or the first video is recognized, it must first be determined whether a first hand is present in the first image or the first video; if so, the first hand is recognized. The first hand may be recognized using a preset hand recognition algorithm, which may be an algorithm based on the SSD framework. After the first hand is recognized, a voice recognition wake-up word uttered by the user can be received. The wake-up word can be understood as the key that unlocks voice recognition: only after the voice recognition module installed in the smart device has received the wake-up word can it receive a voice recognition instruction and perform voice recognition. Different devices with voice recognition functions are preset with different wake-up words, and the embodiment of the invention does not limit the wake-up word. In some smart devices, such as smartphones, the power consumed by the voice recognition thread need not be considered, so the voice recognition function is always on and the next round of voice recognition can begin as soon as the wake-up word is received. In other smart devices, such as smart glasses, the effect of the voice recognition thread on power consumption must be considered, so the voice recognition function needs to be turned on before the wake-up word can be received.
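The difference between an always-on device and a power-constrained device in step S101 can be illustrated with a minimal Python sketch. The class name `SpeechFrontEnd`, its fields, and the choice to switch the recognizer on at the first frame containing a hand are illustrative assumptions and are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class SpeechFrontEnd:
    """Hypothetical model of when a wake-up word can be received (step S101)."""

    always_on: bool          # True for e.g. a smartphone, False for e.g. smart glasses
    listening: bool = False  # whether the voice recognition function is switched on

    def __post_init__(self):
        # An always-on device keeps the voice recognition function running.
        if self.always_on:
            self.listening = True

    def on_frame(self, hand_detected: bool) -> bool:
        """Return True if a wake-up word can be received for this frame."""
        if hand_detected and not self.listening:
            # Assumption: a power-constrained device turns the recognizer on
            # only once a hand has been recognized.
            self.listening = True
        return hand_detected and self.listening
```

With this sketch, a glasses-like device stays silent until a hand appears, while a phone-like device is already listening and only waits for the hand.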
S102, recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture.
In step S102, after the voice recognition wake-up word is received, the gesture of the first hand may be recognized first, after which the voice recognition instruction is received and the first fingertip coordinate is extracted. Alternatively, after the wake-up word is received, the voice recognition instruction may be received first; the gesture of the first hand is then recognized, and the first fingertip coordinate is extracted once the gesture is determined to be the preset gesture. The gesture of the first hand may be recognized with a 21-point detection algorithm or with a gesture determination algorithm; the embodiment of the invention does not limit the recognition method. The 21-point detection algorithm judges from 21 key points whether each finger is bent or straight, and determines the gesture accordingly. The voice recognition instruction may ask for the name, the source, or the like of the finger-designated target. For example, the voice instruction may be "what is this" or "where can this be bought". The first fingertip coordinate of the gesture of the first hand may be extracted by starting an object calibration-recognition procedure. There may be one or more first fingertip coordinates, and the embodiment of the invention does not limit their number. The number depends on the gesture of the first hand: if only one finger in the gesture points to a target in the first image or the first video, there is one first fingertip coordinate; if several fingers in the gesture point to targets in the first image or the first video, there are several first fingertip coordinates.
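A minimal Python sketch of the bent-or-straight judgment and the fingertip extraction is given here for illustration. The key-point index layout (0 for the wrist, then four points per finger from base to tip), the distance-to-wrist heuristic, and the function names are assumptions; the embodiment does not specify them.

```python
import math

# Assumed layout (not from the patent): index 0 is the wrist, followed by four
# points per finger ordered base to tip. Each entry is (middle joint, fingertip).
FINGERS = {"index": (6, 8), "middle": (10, 12), "ring": (14, 16), "little": (18, 20)}


def extended_fingers(points):
    """Return the names of fingers judged straight from 21 (x, y) key points."""
    wrist = points[0]
    straight = []
    for name, (joint, tip) in FINGERS.items():
        # Heuristic: a straight finger has its tip farther from the wrist
        # than its middle joint; a bent finger curls the tip back.
        if math.dist(points[tip], wrist) > math.dist(points[joint], wrist):
            straight.append(name)
    return straight


def first_fingertip_coordinates(points, preset=("index",)):
    """Return fingertip coordinates if the gesture matches the preset, else None."""
    straight = extended_fingers(points)
    if tuple(straight) != tuple(preset):
        return None
    return [points[FINGERS[name][1]] for name in straight]
```

Passing `preset=("index", "middle")` would model a gesture with two pointing fingers, in which case two fingertip coordinates are returned.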
S103, comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the target designated by the finger in the first image or first video, and outputting a first recognition result.
In step S103, the coordinates of the targets to be recognized in the first image or the first video are acquired according to a multi-target recognition algorithm, which may use the open-source yolo-x (YOLOX) algorithm as its framework. The first fingertip coordinate is compared with the coordinates of each target to be recognized in the first image or the first video to judge whether the first fingertip coordinate falls within the coordinates of that target. A target to be recognized that contains the first fingertip coordinate is taken as the finger-designated target in the first image or the first video; the finger-designated target is recognized, and the recognition result is output as the first recognition result. The first recognition result may be output as voice, as a picture, or as both at the same time. There may be one or more first recognition results, and the embodiment of the invention does not limit their number. If there are several first recognition results, the probability value corresponding to each first recognition result is output together with it.
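The output rule at the end of step S103 (a single result is output alone, several results are output with their probability values) can be sketched in Python as follows. The function name, the text format, the descending sort order, and the message for the empty case are illustrative assumptions.

```python
def format_first_recognition_result(matches):
    """Format matches given as a list of (label, probability) pairs.

    A single match is reported by label only; several matches are reported
    together with their probability values, as described for step S103.
    """
    if not matches:
        # Assumption: the embodiment does not say what is output in this case.
        return "no finger-designated target found"
    if len(matches) == 1:
        return matches[0][0]
    # Assumption: list the most probable result first.
    ranked = sorted(matches, key=lambda m: m[1], reverse=True)
    return "; ".join(f"{label} ({prob:.2f})" for label, prob in ranked)
```

The returned string could then be passed to a speech synthesizer, drawn on a picture, or both, matching the output modes named in the text.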
According to the method for recognizing the finger-designated target, the first hand in the first image or the first video is recognized, and if the first hand is recognized, a voice recognition awakening word is received; recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result. The finger designated target can be automatically selected through hand recognition and gesture recognition, the recognition process is more intelligent, and the method can be applied to a large number of repeated types of finger designated target recognition scenes.
Fig. 2 is a schematic flowchart of an embodiment of a method for obtaining a first recognition result according to the present invention. As shown in fig. 2, the method for obtaining the first recognition result may include the following steps:
S201, acquiring the coordinates of the recognition box of each target to be recognized in the first image or the first video based on the multi-target recognition algorithm.
In step S201, a recognition box for each target to be recognized in the first image or the first video can be obtained according to the multi-target recognition algorithm; the target to be recognized lies inside its recognition box. The recognition box may be rectangular, and its acquired coordinates may be the coordinates of the four vertices of the rectangle.
S202, comparing the first fingertip coordinate with the coordinates of the recognition boxes of the targets to be recognized, and determining the target to be recognized whose recognition box contains the first fingertip coordinate as the finger-designated target in the first image or the first video.
S203, recognizing the finger-designated target in the first image or the first video, and outputting a first recognition result.
For a description of steps S202 and S203, refer to step S103; details are not repeated here.
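Steps S201 and S202 amount to a point-in-box test, sketched here in Python. The sketch assumes axis-aligned rectangular recognition boxes given by four (x, y) vertices; the function names and the (label, vertices) data shape are hypothetical.

```python
def box_contains(vertices, point):
    """Check whether an axis-aligned box, given by four vertices, contains a point."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    px, py = point
    return min(xs) <= px <= max(xs) and min(ys) <= py <= max(ys)


def finger_designated_targets(fingertips, targets):
    """Return labels of targets whose recognition box contains any fingertip.

    fingertips: list of (x, y) first fingertip coordinates.
    targets: list of (label, vertices) pairs from a multi-target detector.
    """
    return [
        label
        for label, vertices in targets
        if any(box_contains(vertices, tip) for tip in fingertips)
    ]
```

If recognition boxes overlap, one fingertip can fall inside several of them, which is one way several first recognition results can arise.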
In some optional embodiments, comparing the first fingertip coordinate with the coordinates of the target to be recognized in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or the first video, and outputting the first recognition result may include: comparing, within a preset range, the first fingertip coordinate with the coordinates of the target to be recognized in the acquired first image or first video, determining and recognizing the finger-designated target in the first image or the first video, and outputting the first recognition result. The preset range can be set according to the actual situation, and the embodiment of the invention does not limit it.
In this way of determining the first recognition result, the finger-designated target is determined only from the targets to be recognized that lie within the preset range in the first image or the first video. This avoids the heavy computation of searching the whole image and saves computation cost.
Fig. 3 is a flowchart illustrating another embodiment of a method for identifying a finger-designated object according to the present invention. As shown in fig. 3, the method for identifying a finger-designated object may include the steps of:
S301, recognizing a first hand in the first image or the first video, and receiving a voice recognition wake-up word if the first hand is recognized.
For a description of step S301, refer to step S101; details are not repeated here.
S302, recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture.
For a description of step S302, refer to step S102; details are not repeated here.
S303, recognizing the targets to be recognized in the first image or the first video based on the multi-target recognition algorithm, storing a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting the first recognition result.
In step S303, all targets to be recognized that can be recognized in the first image or the first video are recognized, and the recognition result is saved as the second recognition result. For the rest of step S303, refer to step S103; details are not repeated here.
S304, if the recognized gesture of the second hand is a preset gesture in the second image or the second video, receiving a voice recognition instruction, and extracting a second fingertip coordinate of the gesture of the second hand; the first image is the same as the second image, the first video is the same as the second video, and the visual angles of the first hand and the second hand are the same.
For a description of step S304, refer to step S102; details are not repeated here.
S305, comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
In step S305, the second fingertip coordinate is compared with the coordinates of the second recognition result to judge whether the second fingertip coordinate falls within the second recognition result. The second recognition result that contains the second fingertip coordinate is taken as the third recognition result of the finger-designated target in the second image or the second video.
According to the method for recognizing the finger-designated target, provided by the embodiment of the invention, the first hand in the first image or the first video is recognized, and if the first hand is recognized, the voice recognition awakening word is received; recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture; identifying a target to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second identification result, comparing the coordinates of the first fingertip with the coordinates of the second identification result, determining and identifying a finger-designated target in the first image or the first video, and outputting the first identification result; if the recognized gesture of the second hand is a preset gesture in the second image or the second video, receiving a voice recognition instruction, and extracting a second fingertip coordinate of the gesture of the second hand; the first image is the same as the second image, the first video is the same as the second video, and the visual angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinate of the second recognition result, and outputting a third recognition result of the finger designated target in the second image or the second video. 
Because all recognition results in the first image or the first video are saved as the second recognition result, the finger that makes the preset gesture in the second image or the second video can be matched directly against the second recognition result to obtain the third recognition result of the finger-designated target. The method can therefore be applied to scenes in which a large number of finger-designated targets must be recognized repeatedly, saving computation cost and improving recognition efficiency.
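The saving comes from running the multi-target recognition once and reusing the stored second recognition result for later fingertips. A minimal Python sketch of this caching idea follows; `CachedRecognizer`, the `detect` callable, and the (label, box) result format are hypothetical, and the sketch assumes the later image is the same as the first, as the method requires.

```python
class CachedRecognizer:
    """Runs a detector once and reuses its output for later fingertip lookups."""

    def __init__(self, detect):
        # detect: hypothetical callable, frame -> list of (label, (x0, y0, x1, y1))
        self._detect = detect
        self._second_result = None  # stored recognition of all targets

    def designate(self, frame, fingertip):
        """Return labels of stored targets whose box contains the fingertip."""
        if self._second_result is None:
            # First call: recognize every target and store the second recognition result.
            self._second_result = self._detect(frame)
        x, y = fingertip
        return [
            label
            for label, (x0, y0, x1, y1) in self._second_result
            if x0 <= x <= x1 and y0 <= y <= y1
        ]
```

A second call with a different fingertip on the same frame only performs the coordinate comparison and does not invoke the detector again.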
Fig. 4 is a schematic view of an application scenario provided in an embodiment of the present invention. As shown in fig. 4, the recognition process of the finger-specified target includes the following steps:
step 1, inputting the image or video in which the finger-designated target is located, and performing hand recognition on the input image or video;
step 2, if a hand is present in the input image or video, recognizing the hand gesture using a 21-key-point gesture recognition method; if no hand is present in the input image or video, returning to hand recognition and waiting for a hand recognition result;
step 3, judging from the gesture recognition result whether a 'golden finger' gesture, namely the preset gesture, is present; if the 'golden finger' gesture is present, starting the voice recognition function; otherwise, waiting for a 'golden finger' signal;
step 4, judging whether a voice instruction is detected; if a voice instruction is detected, starting the multi-target recognition algorithm (also called a universal recognition algorithm), matching the 'golden finger' target, and outputting the recognition result of the 'golden finger', that is, determining the finger-designated target in the input image or video according to the hand gesture and outputting its recognition result; if no voice instruction is detected, waiting for a voice instruction.
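The flow of fig. 4 can be read as a small state machine that waits in turn for a hand, the preset gesture, and a voice instruction. The following is a minimal sketch of that reading only; the names `Stage` and `next_stage` and the boolean inputs are hypothetical and stand in for the outputs of the hand, gesture and voice recognizers, which the source does not specify.

```python
from enum import Enum, auto


class Stage(Enum):
    WAIT_HAND = auto()     # steps 1-2: no hand recognized yet
    WAIT_GESTURE = auto()  # step 3: hand present, waiting for the preset gesture
    WAIT_VOICE = auto()    # step 4: preset gesture seen, waiting for a voice instruction
    RECOGNIZE = auto()     # run multi-target recognition and match the fingertip


def next_stage(stage, hand_found, preset_gesture, voice_detected):
    """Return the next stage given the current recognizer outputs (all booleans)."""
    if stage is Stage.WAIT_HAND:
        return Stage.WAIT_GESTURE if hand_found else Stage.WAIT_HAND
    if stage is Stage.WAIT_GESTURE:
        if not hand_found:
            # The hand disappeared: go back to hand recognition.
            return Stage.WAIT_HAND
        return Stage.WAIT_VOICE if preset_gesture else Stage.WAIT_GESTURE
    if stage is Stage.WAIT_VOICE:
        return Stage.RECOGNIZE if voice_detected else Stage.WAIT_VOICE
    return stage  # RECOGNIZE is terminal in this sketch
```

Each waiting stage simply repeats until its condition is met, which mirrors the "waiting for ..." branches of the four steps.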
Fig. 5 is a schematic diagram of another application scenario provided in the embodiment of the present invention. As shown in fig. 5, the recognition process for a finger-designated magic cube (Rubik's cube) may be: the user points at the magic cube with the preset gesture and issues the voice instructions 'what' and 'where to buy'; after the voice instructions are received and recognized, a purchase page for the magic cube is output, and from the purchase page it can be seen that the finger-designated target is the magic cube, together with its purchase channel.
Fig. 6 is a schematic structural diagram of an embodiment of a device for recognizing a finger-specified target according to the present invention. As shown in fig. 6, the finger-specified object recognition apparatus includes:
the first recognition module 601 is configured to recognize a first hand in a first image or a first video, and receive a voice recognition wake-up word if the first hand is recognized;
the first recognition and extraction module 602 is configured to recognize a gesture of a first hand, receive a voice recognition instruction if the gesture of the first hand is a preset gesture, and extract a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
the first target recognition module 603 is configured to compare the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determine and recognize the finger-designated target in the first image or first video, and output a first recognition result.
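As an illustration of how a fingertip coordinate might be extracted once a hand has been reduced to 21 key points (the key-point count mentioned for fig. 4), the sketch below assumes the key points are ordered as in the widely used MediaPipe Hands convention (index 0 is the wrist, 6 and 8 are the index finger's PIP joint and tip, and so on). That ordering, the "index finger extended, other three fingers curled" rule for the preset gesture, and the function names are all assumptions; the source does not define the preset gesture geometrically.

```python
import math

WRIST = 0
# (PIP joint index, fingertip index) per finger, assuming MediaPipe-style ordering.
FINGERS = {"index": (6, 8), "middle": (10, 12), "ring": (14, 16), "pinky": (18, 20)}


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def is_extended(keypoints, pip, tip):
    """Treat a finger as extended when its tip is farther from the wrist than its PIP joint."""
    wrist = keypoints[WRIST]
    return _dist(keypoints[tip], wrist) > _dist(keypoints[pip], wrist)


def pointing_fingertip(keypoints):
    """Return the index fingertip (x, y) if the hand shows the assumed pointing gesture, else None."""
    if len(keypoints) != 21:
        raise ValueError("expected 21 hand key points")
    index_out = is_extended(keypoints, *FINGERS["index"])
    others_out = any(is_extended(keypoints, *FINGERS[name])
                     for name in ("middle", "ring", "pinky"))
    if index_out and not others_out:
        return keypoints[FINGERS["index"][1]]
    return None
```

A distance-to-wrist test is only one simple heuristic; a real gesture classifier may use joint angles or a learned model instead.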
Optionally, the first object identifying module 603 includes:
the acquisition unit is used for acquiring the coordinates of an identification frame of a target to be identified in the first image or the first video based on a multi-target identification algorithm;
the determining unit is used for comparing the first fingertip coordinate with the coordinates of the identification frame of each target to be identified, and determining that the target to be identified whose identification frame contains the first fingertip coordinate is the finger-designated target in the first image or the first video;
and the identification unit is used for identifying the target pointed by the finger in the first image or the first video and outputting a first identification result.
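The comparison performed by the determining unit amounts to a point-in-rectangle test between the fingertip and each identification frame. The sketch below shows one way to write it, assuming each frame is an axis-aligned box `(x_min, y_min, x_max, y_max)` in image pixels. The `Detection` type, the function name, and the tie-break (prefer the smallest frame when several contain the fingertip) are illustrative assumptions not taken from the source.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Detection(NamedTuple):
    label: str
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _area(det: Detection) -> float:
    x_min, y_min, x_max, y_max = det.box
    return (x_max - x_min) * (y_max - y_min)


def find_designated(fingertip: Tuple[float, float],
                    detections: Sequence[Detection]) -> Optional[Detection]:
    """Return the detection whose frame contains the fingertip, or None if there is none."""
    x, y = fingertip
    hits = [d for d in detections
            if d.box[0] <= x <= d.box[2] and d.box[1] <= y <= d.box[3]]
    if not hits:
        return None
    # Assumed tie-break: nested frames resolve to the smallest one.
    return min(hits, key=_area)
```

For example, with a large "shelf" frame enclosing a small "cube" frame, a fingertip inside both resolves to the cube under this tie-break.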
Optionally, the first target identifying module 603 is further configured to compare the first fingertip coordinate with a coordinate of a target to be identified in a preset range in the acquired first image or first video, determine and identify a finger-designated target in the first image or first video, and output a first identification result.
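One possible reading of the "preset range" is a fixed radius around the fingertip, so that only nearby targets are compared. The sketch below follows that reading as an assumption: the circular shape of the range, the `radius` parameter, and the function names are hypothetical, and the source does not say how the range is defined.

```python
import math


def distance_to_box(point, box):
    """Shortest distance from a point to an axis-aligned box (0.0 if the point is inside)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    dx = max(x_min - x, 0.0, x - x_max)
    dy = max(y_min - y, 0.0, y - y_max)
    return math.hypot(dx, dy)


def targets_in_range(fingertip, boxes, radius):
    """Return the labels of boxes lying within `radius` of the fingertip, in input order."""
    return [label for label, box in boxes.items()
            if distance_to_box(fingertip, box) <= radius]
```

Restricting the comparison this way reduces the number of candidate frames that must be checked against the fingertip.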
Fig. 7 is a schematic structural diagram of an embodiment of a device for recognizing a finger-specified target according to the present invention. As shown in fig. 7, the finger-specified object recognition apparatus includes:
a second recognition module 701, configured to recognize a first hand in the first image or the first video, and receive a voice recognition wake-up word if the first hand is recognized;
a second recognition and extraction module 702, configured to recognize a gesture of the first hand, receive a voice recognition instruction if the gesture of the first hand is a preset gesture, and extract a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
the second target recognition module 703 is configured to recognize a target to be recognized in the first image or the first video based on a multi-target recognition algorithm, store a second recognition result, compare the first fingertip coordinate with the coordinate of the second recognition result, determine and recognize a finger-designated target in the first image or the first video, and output the first recognition result;
the second recognition and extraction module 702 is further configured to receive a voice recognition instruction and extract a second fingertip coordinate of the gesture of the second hand if the recognized gesture of the second hand is a preset gesture in the second image or the second video; the first image is the same as the second image, the first video is the same as the second video, and the visual angles of the first hand and the second hand are the same;
the second target recognition module 703 is further configured to compare the second fingertip coordinate with the coordinate of the second recognition result, and output a third recognition result of the finger-designated target in the second image or the second video.
Optionally, the second target recognition module 703 is further configured to compare the second fingertip coordinate with a coordinate of the second recognition result, and output the second recognition result including the second fingertip coordinate as a third recognition result.
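The reuse of the stored second recognition result can be sketched as a small cache: the multi-target recognizer runs once for a given view, and later fingertip coordinates are matched against the stored frames without re-running it. The class below is an illustrative sketch only; `PointingSession`, its methods, and the `detector` callable (returning `(label, box)` pairs) are hypothetical names, not the patent's implementation.

```python
class PointingSession:
    """Caches multi-target recognition results for one unchanged view."""

    def __init__(self, detector):
        self._detector = detector  # callable: frame -> iterable of (label, (x0, y0, x1, y1))
        self._cached = None        # the stored "second recognition result"

    def lookup(self, frame, fingertip):
        """Return the label whose frame contains the fingertip, or None."""
        if self._cached is None:
            # First request for this view: run recognition once and store every result.
            self._cached = list(self._detector(frame))
        x, y = fingertip
        for label, (x0, y0, x1, y1) in self._cached:
            if x0 <= x <= x1 and y0 <= y <= y1:
                return label
        return None

    def reset(self):
        """Drop the stored results, for example when the view changes."""
        self._cached = None
```

The first `lookup` corresponds to producing the first recognition result; any later `lookup` on the same view corresponds to producing the third recognition result from the stored data.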
In some optional embodiments, the method for recognizing the finger-designated object provided by the present invention may be applied to smart glasses, or may also be applied to other smart devices, which is not limited in this embodiment of the present invention. The process of the user identifying the items in the storage container using the smart glasses may include:
After the user puts on the smart glasses, the smart glasses scan the articles in the storage container within the field of view. The user reaches out and points at an article; the smart glasses recognize the user's hand and feed back on the user interface that a hand has been recognized. The user then speaks the wake-up word, and the smart glasses receive the voice recognition wake-up word, recognize the gesture of the hand, and confirm that it is the preset gesture. The user issues the voice recognition instruction 'what'. After receiving the voice recognition instruction, the smart glasses extract the fingertip coordinates of the gesture, recognize all articles appearing in the field of view, and store the recognition results of all the articles. The fingertip coordinates are then compared with the coordinates of the recognition results of all the articles, and the recognition result of the article whose coordinates include the fingertip coordinates is output on the user interface as the recognition result of the finger-designated target.
Within the same field of view, when the pointed-at article changes because the finger moves, or when the user points at an article with the other hand, the smart glasses recognize the user's hand and its gesture. When the gesture is the preset gesture and the user issues the voice instruction 'what', the fingertip coordinates of the gesture are extracted and compared with the coordinates of the recognition results of all the articles, and the recognition result of the article whose coordinates include the fingertip coordinates is output on the user interface as the recognition result of the finger-designated target.
The method for identifying the finger-designated target can be implemented purely by visual recognition; that is, both hand recognition and gesture recognition are performed visually. In the prior art, the finger-designated target is usually recognized by combining visual recognition with sensor-based recognition, for example by pairing smart glasses with a handheld device whose sensor recognizes the gesture. Compared with the prior art, the method provided by the invention does not require multiple devices and is therefore more convenient.
Fig. 8 is a schematic structural diagram of smart glasses according to the present invention. As shown in fig. 8, the smart glasses may include: a processor 801, a communication interface 802, a memory 803 and a communication bus 804, wherein the processor 801, the communication interface 802 and the memory 803 communicate with each other through the communication bus 804. The processor 801 may call logic instructions in the memory 803 to perform a method of identifying a finger-designated target, the method comprising:
identifying a first hand in a first image or a first video; if the first hand is recognized, receiving a voice recognition wake-up word; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; and comparing the first fingertip coordinate with the coordinates, acquired based on a multi-target recognition algorithm, of the targets to be recognized in the first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
The method further comprises the following steps: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; recognizing the targets to be recognized in the first image or the first video based on a multi-target recognition algorithm, storing the result as a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting a first recognition result; if a gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving a voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, wherein the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
In addition, the logic instructions in the memory 803 may be implemented in the form of software functional modules and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform a method for identifying a finger-designated object provided by the above methods, the method comprising:
identifying a first hand in a first image or a first video; if the first hand is recognized, receiving a voice recognition wake-up word; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; and comparing the first fingertip coordinate with the coordinates, acquired based on a multi-target recognition algorithm, of the targets to be recognized in the first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
The method further comprises the following steps: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; recognizing the targets to be recognized in the first image or the first video based on a multi-target recognition algorithm, storing the result as a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting a first recognition result; if a gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving a voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, wherein the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for identifying a finger-designated object provided by the above methods, the method comprising:
identifying a first hand in a first image or a first video; if the first hand is recognized, receiving a voice recognition wake-up word; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; and comparing the first fingertip coordinate with the coordinates, acquired based on a multi-target recognition algorithm, of the targets to be recognized in the first image or first video, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
The method further comprises the following steps: identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized; recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, receiving a voice recognition instruction and extracting a first fingertip coordinate of the gesture of the first hand; or receiving a voice recognition instruction, recognizing the gesture of the first hand, and, if the gesture of the first hand is a preset gesture, extracting the first fingertip coordinate of the gesture of the first hand; recognizing the targets to be recognized in the first image or the first video based on a multi-target recognition algorithm, storing the result as a second recognition result, comparing the first fingertip coordinate with the coordinates of the second recognition result, determining and recognizing the finger-designated target in the first image or the first video, and outputting a first recognition result; if a gesture of a second hand recognized in a second image or a second video is the preset gesture, receiving a voice recognition instruction and extracting a second fingertip coordinate of the gesture of the second hand, wherein the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same; and comparing the second fingertip coordinate with the coordinates of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for recognizing a finger-designated object, comprising:
identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized;
recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video based on a multi-target recognition algorithm, determining and recognizing the finger-designated target in the first image or first video, and outputting a first recognition result.
2. The method for recognizing the finger-designated object according to claim 1, wherein the comparing the first fingertip coordinate with the coordinate of the object to be recognized in the first image or the first video based on the multi-object recognition algorithm, determining and recognizing the finger-designated object in the first image or the first video, and outputting the first recognition result comprises:
acquiring the coordinates of the identification frame of the target to be identified in the first image or the first video based on a multi-target identification algorithm;
comparing the first fingertip coordinate with the coordinates of the identification frame of the target to be identified, and determining that the target to be identified whose identification frame contains the first fingertip coordinate is the finger-designated target in the first image or the first video;
and identifying a target pointed by the finger in the first image or the first video, and outputting the first identification result.
3. The method for identifying the finger-designated object according to claim 1, wherein the comparing the first fingertip coordinate with the coordinate of the object to be identified in the acquired first image or first video, determining and identifying the finger-designated object in the first image or first video, and outputting a first identification result comprises:
and comparing the first fingertip coordinate with the coordinate of the target to be recognized in the acquired first image or first video within a preset range, determining and recognizing the finger-designated target in the first image or first video, and outputting the first recognition result.
4. A method for recognizing a finger-designated object, comprising:
identifying a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized;
recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
identifying a target to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second identification result, comparing the first fingertip coordinate with the coordinate of the second identification result, determining and identifying a finger-designated target in the first image or the first video, and outputting a first identification result;
if the recognized gesture of the second hand is the preset gesture in the second image or the second video, receiving the voice recognition instruction, and extracting a second fingertip coordinate of the gesture of the second hand; the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same;
and comparing the second fingertip coordinate with the coordinate of the second recognition result, and outputting a third recognition result of the finger-designated target in the second image or the second video.
5. The method for recognizing a finger-designated object according to claim 4, wherein the comparing the second fingertip coordinate with the coordinate of the second recognition result and outputting a third recognition result of the finger-designated target in the second image or the second video includes:
and comparing the second fingertip coordinate with the coordinate of the second recognition result, and outputting the second recognition result comprising the second fingertip coordinate as the third recognition result.
6. An apparatus for recognizing a finger-designated object, comprising:
the first recognition module is used for recognizing a first hand in a first image or a first video, and receiving a voice recognition wake-up word if the first hand is recognized;
the first recognition and extraction module is used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
and the first target identification module is used for comparing the first fingertip coordinate with the coordinate of the target to be identified in the acquired first image or first video based on a multi-target identification algorithm, determining and identifying the finger-designated target in the first image or first video, and outputting a first identification result.
7. An apparatus for recognizing a finger-designated object, comprising:
the second recognition module is used for recognizing a first hand in the first image or the first video, and receiving a voice recognition wake-up word if the first hand is recognized;
the second recognition and extraction module is used for recognizing the gesture of the first hand, receiving a voice recognition instruction if the gesture of the first hand is a preset gesture, and extracting a first fingertip coordinate of the gesture of the first hand; or receiving the voice recognition instruction, recognizing the gesture of the first hand, and extracting a first fingertip coordinate of the gesture of the first hand if the gesture of the first hand is a preset gesture;
the second target identification module is used for identifying a target to be identified in the first image or the first video based on a multi-target identification algorithm, storing a second identification result, comparing the first fingertip coordinate with the coordinate of the second identification result, determining and identifying a finger-designated target in the first image or the first video, and outputting a first identification result;
the second recognition and extraction module is further configured to receive the voice recognition instruction and extract a second fingertip coordinate of the gesture of the second hand if the recognized gesture of the second hand is the preset gesture in the second image or the second video; the first image is the same as the second image, the first video is the same as the second video, and the viewing angles of the first hand and the second hand are the same;
the second target recognition module is further configured to compare the second fingertip coordinate with the coordinate of the second recognition result, and output a third recognition result of the finger-designated target in the second image or the second video.
8. Smart glasses comprising a memory, a processor and a computer program stored on said memory and executable on said processor, characterized in that said processor, when executing said program, carries out the steps of the method for identification of a finger-specified object according to any one of claims 1 to 5.
9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the method for identifying a finger-specified object according to any one of claims 1 to 5.
10. A computer program product having stored thereon executable instructions, characterized in that the instructions, when executed by a processor, cause the processor to carry out the steps of the method for identification of a finger-specified object as claimed in any one of claims 1 to 5.
CN202111537543.3A 2021-12-16 2021-12-16 Method and device for identifying finger-designated target Pending CN113936233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111537543.3A CN113936233A (en) 2021-12-16 2021-12-16 Method and device for identifying finger-designated target

Publications (1)

Publication Number Publication Date
CN113936233A true CN113936233A (en) 2022-01-14

Family

ID=79289137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111537543.3A Pending CN113936233A (en) 2021-12-16 2021-12-16 Method and device for identifying finger-designated target

Country Status (1)

Country Link
CN (1) CN113936233A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120309532A1 (en) * 2011-06-06 2012-12-06 Microsoft Corporation System for finger recognition and tracking
US20150177846A1 (en) * 2011-08-12 2015-06-25 The Research Foundation For The State University Of New York Hand pointing estimation for human computer interaction
CN105205454A (en) * 2015-08-27 2015-12-30 深圳市国华识别科技开发有限公司 System and method for capturing target object automatically
CN111507246A (en) * 2020-04-15 2020-08-07 上海幂方电子科技有限公司 Method, device, system and storage medium for selecting marked object through gesture
CN111985417A (en) * 2020-08-24 2020-11-24 中国第一汽车股份有限公司 Functional component identification method, device, equipment and storage medium
CN112749646A (en) * 2020-12-30 2021-05-04 北京航空航天大学 Interactive point-reading system based on gesture recognition
CN112863508A (en) * 2020-12-31 2021-05-28 思必驰科技股份有限公司 Wake-up-free interaction method and device
CN113486765A (en) * 2021-06-30 2021-10-08 上海商汤临港智能科技有限公司 Gesture interaction method and device, electronic equipment and storage medium
CN113792651A (en) * 2021-09-13 2021-12-14 广州广电运通金融电子股份有限公司 Gesture interaction method, device and medium integrating gesture recognition and fingertip positioning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIWEI CHAN et al.: "FingerPad: private and subtle interaction using fingertips", UIST '13 *
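Liu Xingting: "Extraction of target environment information based on gesture interaction", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology Series *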

Similar Documents

Publication Publication Date Title
US11120254B2 (en) Methods and apparatuses for determining hand three-dimensional data
CN106951484B (en) Picture retrieval method and device, computer equipment and computer readable medium
US10311115B2 (en) Object search method and apparatus
US20190188916A1 (en) Method and apparatus for augmenting reality
EP3273388A1 (en) Image information recognition processing method and device, and computer storage medium
WO2020029466A1 (en) Image processing method and apparatus
CN108960206B (en) Video frame processing method and device
JP7393472B2 (en) Display scene recognition method, device, electronic device, storage medium and computer program
CN113627411A (en) Super-resolution-based commodity identification and price matching method and system
JP2021534480A (en) Face recognition methods, devices, electronics and computers Non-volatile readable storage media
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN107291238B (en) Data processing method and device
CN110569794B (en) Article information storage method and device and computer readable storage medium
US20170286450A1 (en) Systems, devices, and methods for computing geographical relationships between objects
CN113936233A (en) Method and device for identifying finger-designated target
CN108446693B (en) Marking method, system, equipment and storage medium of target to be identified
CN110765926A (en) Drawing book identification method and device, electronic equipment and storage medium
CN113920306B (en) Target re-identification method and device and electronic equipment
US20170185831A1 (en) Method and device for distinguishing finger and wrist
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium
US20220351233A1 (en) Image processing apparatus, image processing method, and program
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114677566A (en) Deep learning model training method, object recognition method and device
CN111008210B (en) Commodity identification method, commodity identification device, codec and storage device
CN113220125A (en) Finger interaction method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220114)