CN116311526A - Image area determining method and device, electronic equipment and storage medium

Image area determining method and device, electronic equipment and storage medium

Info

Publication number
CN116311526A
Authority
CN
China
Prior art keywords
gesture
operation area
determining
region
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310295271.3A
Other languages
Chinese (zh)
Inventor
刘永康
贺宇
李召
李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xingtong Technology Co ltd
Original Assignee
Shenzhen Xingtong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xingtong Technology Co ltd filed Critical Shenzhen Xingtong Technology Co ltd
Priority to CN202310295271.3A priority Critical patent/CN116311526A/en
Publication of CN116311526A publication Critical patent/CN116311526A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an image area determining method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a target image containing a gesture operation, and extracting valid gesture feature information of the target image; determining at least one candidate operation area, and extracting candidate operation area feature information of each candidate operation area; and determining a target operation area according to the valid gesture feature information and the at least one piece of candidate operation area feature information. By combining the gesture feature information captured when the gesture operation is actually performed, the specific operation area of the gesture operation can be determined, improving the practicability and flexibility of operation area determination while ensuring the accuracy of operation area identification.

Description

Image area determining method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to an image area determining method, an image area determining device, electronic equipment and a storage medium.
Background
Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that react in a manner similar to human intelligence. Since the field's inception, its theory and technology have grown increasingly mature and its application areas have kept expanding, and control interactions based on gesture operations have become more and more common in related scenarios.
In the related art, a man-machine interaction scenario presets an interaction event for each standard gesture action: an acquired gesture action is matched against the preset standard gesture actions, and the preset event corresponding to the successfully matched standard gesture action is taken as the interaction event of the current gesture action. For example, the preset operation area corresponding to the successfully matched standard gesture action is taken as the operation area of the current gesture action.
However, with preset standard gesture actions and corresponding preset operation areas, the gesture actions that can be recognized are limited, and the user's operation gestures need to remain relatively fixed. In actual operation, users' gesture habits vary, which affects the accuracy of operation area determination.
Disclosure of Invention
In order to solve the above technical problems, or at least partially solve them, embodiments of the present disclosure provide an image region determining method and apparatus, an electronic device, and a storage medium, which can determine the specific operation region of a gesture operation by combining the gesture feature information captured when the gesture operation is actually performed, thereby improving the practicability and flexibility of operation region determination while ensuring the accuracy of operation region identification.
According to an aspect of the present disclosure, there is provided an image area determining method, including: acquiring a target image containing a gesture operation, and extracting valid gesture feature information of the target image; determining at least one candidate operation area, and extracting candidate operation area feature information of each candidate operation area; and determining a target operation area according to the valid gesture feature information and at least one piece of candidate operation area feature information.
According to another aspect of the present disclosure, there is provided an image area determining apparatus, including: a first extraction module configured to acquire a target image containing a gesture operation and extract valid gesture feature information of the target image; a second extraction module configured to determine at least one candidate operation area and extract candidate operation area feature information of each candidate operation area; and a determining module configured to determine a target operation area according to the valid gesture feature information and the at least one piece of candidate operation area feature information.
According to another aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory storing a program, wherein the program comprises instructions that, when executed by the processor, cause the processor to perform the above-described image region determination method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the above-described image area determining method.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:
A target image containing a gesture operation is acquired and valid gesture feature information of the target image is extracted; at least one candidate operation area is determined and candidate operation area feature information of each candidate operation area is extracted; and the target operation area is then determined according to the valid gesture feature information and the at least one piece of candidate operation area feature information. The embodiments of the present disclosure can thus determine the specific operation area of a gesture operation by combining the gesture feature information captured when the gesture operation is actually performed, improving the practicability and flexibility of operation area determination while ensuring the accuracy of operation area identification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the solutions in the prior art, the drawings that are required for the description of the embodiments or the prior art will be briefly described below, and it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 illustrates a schematic diagram of an image region recognition scenario according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a schematic diagram of another image region recognition scenario according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram of a method of determining an image region according to an exemplary embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of another method of determining an image region according to an exemplary embodiment of the present disclosure;
FIG. 5 shows a schematic block diagram of an image area determining apparatus according to an exemplary embodiment of the present disclosure;
FIG. 6 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and completely. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., including, but not limited to. The term "based on" is based at least in part on. The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments. Related definitions of other terms will be given in the description below. It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.
It should be noted that the modifiers "a", "an", and "a plurality" in this disclosure are intended to be illustrative rather than limiting; those of ordinary skill in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise. The names of messages or information exchanged between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
In order to solve the above technical problems, the present disclosure proposes an image region determining method in which the operation region is not determined according to a preset fixed rule such as matching against preset standard gestures. Instead, the specific operation region of a gesture operation is determined by combining the gesture feature information captured when the gesture operation is actually performed, so that the practicability and flexibility of operation region determination are improved while the accuracy of operation region identification is ensured.
In addition, in the embodiments of the present disclosure, the operation region may be any interactable region of an operation object. For example, when the operation object is a document, as shown in fig. 1, the operation region may be a text region in the document, so that a text recognition interaction can be performed after the text region is identified; when the operation object is an operation interface, as shown in fig. 2, the operation region may be any interactive control region in the interface, so that the corresponding control function can be executed after the control region is identified.
The image area determining method, apparatus, electronic device, and storage medium of the present disclosure are described below with reference to the accompanying drawings.
Fig. 3 is a flowchart of an image region determining method according to an embodiment of the present disclosure. As shown in fig. 3, the method includes:
step 301, a target image including gesture operation is acquired, and effective gesture feature information of the target image is extracted.
The target image is an image captured while a user performs a gesture operation on an object to be identified, where the object to be identified may be any object containing a plurality of operation regions, including but not limited to the documents and operation interfaces mentioned in the above embodiments.
In one embodiment of the present disclosure, in order to avoid misrecognition, gesture feature information is not extracted directly for every gesture operation; instead, valid gesture feature information is identified. That is, only after a gesture operation is determined to be a valid operation gesture is its feature information extracted. What counts as a valid operation gesture differs across application scenarios. For example, a valid operation gesture may be one performed by a preset valid operating finger: the operating finger is first identified from the target image (e.g., the current operating finger is identified as the "right index finger"), and the valid gesture feature information is identified from the target image only after the operating finger is determined to be the preset valid operating finger. The preset valid operation gesture may be calibrated by the system, or set by the user according to personal habit, and so on.
The valid gesture feature information includes at least one of the following: the finger name, finger number, fingertip position, finger joint positions, finger direction, fingertip distance, and positional relationships among the finger joint positions corresponding to the gesture, that is, the gesture operation information characterized in any feature dimension.
In some possible embodiments, after the operation gesture is determined, the target image may be input into a pre-trained gesture recognition model. The gesture recognition model may include a valid gesture recognition module and a gesture feature extraction module. The valid gesture recognition module is configured to determine whether the current gesture operation is a valid operation gesture; in an actual implementation it may include a valid gesture detection sub-module and a hand classification sub-module, where the valid gesture detection sub-module identifies whether the current operation gesture is a preset valid operation gesture, and the hand classification sub-module detects whether the current operating finger is a preset valid operating finger. The gesture feature extraction module extracts the valid gesture feature information with a deep convolutional network using a deep learning method. For example, the joint positions of four finger joints of the corresponding finger can be output as the valid gesture feature information.
In this embodiment, the valid gesture feature information is recognized on the basis of a two-dimensional image, which is simple to operate and low in cost.
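To make this step concrete, the following is a minimal Python sketch of how the validity check and feature extraction described above could be chained. It is a sketch under stated assumptions rather than the disclosed implementation: the sub-module callables (valid_gesture_detector, hand_classifier, feature_extractor) and the GestureFeatures structure are hypothetical stand-ins for the modules named above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates in the 2D target image


@dataclass
class GestureFeatures:
    """Valid gesture feature information (one possible subset of the
    feature dimensions listed above)."""
    finger_name: str     # e.g. "right_index"
    fingertip: Point     # fingertip position
    joints: List[Point]  # e.g. the positions of four finger joints
    direction: Point     # unit vector of the pointing direction


def extract_valid_gesture_features(
    image,                   # the target image (e.g. an H x W x 3 array)
    valid_gesture_detector,  # hypothetical sub-module: is this a preset valid gesture?
    hand_classifier,         # hypothetical sub-module: which finger is operating?
    feature_extractor,       # hypothetical deep-CNN feature extraction module
    valid_fingers=("right_index",),
) -> Optional[GestureFeatures]:
    """Return gesture features only for valid operation gestures, else None."""
    if not valid_gesture_detector(image):
        return None  # not a preset valid operation gesture
    finger = hand_classifier(image)
    if finger not in valid_fingers:
        return None  # not the preset valid operating finger
    # Only now run the (comparatively expensive) feature extraction network.
    return feature_extractor(image, finger)
```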
Step 302, at least one candidate operation region is determined, and candidate operation region feature information of each candidate operation region is extracted.
In one embodiment of the present disclosure, candidate operation regions, that is, interaction regions that the gesture operation may target, are preliminarily determined from the target image.
It should be noted that the manner of determining the candidate operation regions according to the target image differs across application scenarios. Examples follow:
in some possible examples, at least one first operation region is determined according to the target image, where a first operation region can be understood as any region of the object to be identified, contained in the target image, with which interaction is possible. To this end, at least one second operation region contained in the target image is identified; for example, region segmentation detection is performed on the target image by a pre-trained deep learning model to determine the at least one second operation region. When the target image is a document, text regions, chart regions, and the like contained in the document may be determined as second operation regions based on the region segmentation detection.
Further, in order to improve the accuracy of the subsequent determination of the target operation region, the second operation regions are screened, and the second operation regions meeting a preset screening condition are determined as the at least one first operation region.
In some optional embodiments, a region confidence is determined for each second operation region, where a higher region confidence indicates that the corresponding second operation region is more likely to be the target operation region of the gesture operation. The region confidence may be determined according to the number of times each second operation region was determined to be the target operation region in historical operation data, or according to the degree of matching between the region type of each second operation region and the object type of the object to be identified. For example, when the object to be identified is a document, an icon-type region matches the document type less well than a text-type region, so the region confidence of the icon-type region is lower than that of the text-type region.
After the region confidences are determined, in this embodiment, at least one second operation region whose region confidence is greater than a preset confidence threshold is determined as the at least one first operation region. The preset confidence threshold may be calibrated based on experimental data.
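A minimal sketch of this confidence-based screening is given below; the DetectedRegion structure and the default threshold are illustrative assumptions, not values from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DetectedRegion:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    region_type: str                        # e.g. "text", "chart", "icon"
    confidence: float                       # region confidence in [0, 1]


def select_first_regions(second_regions: List[DetectedRegion],
                         confidence_threshold: float = 0.5) -> List[DetectedRegion]:
    """Keep only the second operation regions whose region confidence
    exceeds the preset threshold; these become the first operation regions."""
    return [r for r in second_regions if r.confidence > confidence_threshold]
```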
After the first operation regions are determined, they are further screened according to the valid gesture feature information to obtain the candidate operation regions.
In some alternative embodiments, region relative position feature information is determined according to the fingertip position and/or the finger direction, and a first operation region whose first operation region feature information matches the region relative position feature information is determined as a candidate operation region. The first operation region feature information corresponds to the region relative position feature information and includes, but is not limited to, region coordinate information and region contour shape information. For example, coordinate information of at least one region key point of each first operation region may be identified as the first operation region feature information. The region key points may be the four outermost coordinate points of the first operation region; after these four points are determined, the minimum rectangular bounding box enclosing them is generated as the first region feature information. Alternatively, the region key points may be a plurality of sampling points randomly sampled in the first operation region, with the coordinate information of these sampling points taken as the first region feature information.
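For instance, generating the minimum rectangular bounding box from region key points, as described above, could look like the following sketch (axis-aligned boxes assumed):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def min_bounding_box(key_points: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum axis-aligned rectangle enclosing the region key points
    (e.g. the four outermost points, or randomly sampled points),
    returned as (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in key_points]
    ys = [p[1] for p in key_points]
    return (min(xs), min(ys), max(xs), max(ys))
```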
In some possible examples, the region relative position feature information is determined according to the fingertip position and/or the finger direction. When it is determined according to the fingertip position alone, the first operation region feature information may be second coordinate information of the region center point of the first operation region, of any point on the region edge of the first operation region, or of the point on the region edge closest to the gesture key point.
In this embodiment, the relative position information between each first operation region and the gesture key point is determined according to the first coordinate information of the fingertip and the second coordinate information described above, where the relative position information includes one or both of distance information and bearing information; the first operation region whose first operation region feature information matches the region relative position feature information is then determined as a candidate operation region. That is, since the gesture key point and the region to be operated have a certain relative positional association, candidate operation regions that may be the target operation region are selected based on the relative position information between each first operation region and the gesture key point.
In some optional embodiments, the first operation region feature information is the second coordinate information of the region center point of each first operation region. When the user performs a gesture operation, the target operation region is generally not located below the fingertip, so the preset relative positional relationship between the first coordinate information of the fingertip and the second coordinate information of each region center point is that the first coordinate information lies below the second coordinate information, that is, the ordinate of the first coordinate information is smaller than the ordinate of the second coordinate information. If this holds, the corresponding first operation region is regarded as a candidate operation region meeting the preset relative position.
In some optional embodiments, the first operation region feature information is the second coordinate information of the point on the region edge of each first operation region closest to the gesture key point. Considering that in an actual gesture operation scenario the fingertip of the operating finger is not far from the target operation region, the preset relative position condition in this embodiment is that the nearest distance between the fingertip and the region edge is smaller than a preset pixel threshold. The nearest distance between the first coordinate information and the second coordinate information is therefore determined, and if it is smaller than the preset pixel threshold, the corresponding first operation region is regarded as a candidate operation region meeting the preset relative position.
In some possible embodiments, the region relative position feature information may be determined according to the finger direction, in which case the first operation region feature information is the bearing of the first operation region relative to the finger direction. The direction angle between each first operation region and the finger direction is determined, and the first operation regions whose direction angle is smaller than a preset angle threshold are determined as candidate operation regions.
Of course, the above approaches of determining candidate operation regions from the fingertip position and from the finger direction may be applied alone or in combination to select the candidate operation regions from the first operation regions.
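The sketch below combines the three relative-position rules just described (the ordinate rule, the nearest-distance rule, and the direction-angle rule) for axis-aligned region boxes. The threshold values max_pixels and max_angle are illustrative assumptions; the disclosure leaves such values to calibration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def nearest_edge_distance(p: Point, box: Box) -> float:
    """Distance from point p to the closest point of the box (0 if inside)."""
    dx = max(box[0] - p[0], 0.0, p[0] - box[2])
    dy = max(box[1] - p[1], 0.0, p[1] - box[3])
    return math.hypot(dx, dy)


def direction_angle(p: Point, direction: Point, box: Box) -> float:
    """Angle in radians between the finger direction and the vector from
    the fingertip to the region center."""
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    to_center = (cx - p[0], cy - p[1])
    norm = math.hypot(*direction) * math.hypot(*to_center)
    if norm == 0.0:
        return 0.0  # degenerate input; treat as aligned
    dot = direction[0] * to_center[0] + direction[1] * to_center[1]
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def select_candidates(fingertip: Point, direction: Point, boxes: List[Box],
                      max_pixels: float = 80.0,
                      max_angle: float = math.radians(30.0)) -> List[Box]:
    """Apply the three relative-position rules in combination."""
    candidates = []
    for box in boxes:
        center_y = (box[1] + box[3]) / 2.0
        # Rule 1: the fingertip ordinate must be smaller than the center ordinate.
        if fingertip[1] >= center_y:
            continue
        # Rule 2: the fingertip must be within the pixel threshold of the region edge.
        if nearest_edge_distance(fingertip, box) >= max_pixels:
            continue
        # Rule 3: the direction angle must be smaller than the angle threshold.
        if direction_angle(fingertip, direction, box) >= max_angle:
            continue
        candidates.append(box)
    return candidates
```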
Step 303, a target operation region is determined according to the valid gesture feature information and the at least one piece of candidate operation region feature information.
In one embodiment of the present disclosure, after determining at least one candidate operation region, region feature information of each candidate operation region is determined to facilitate subsequent determination of a target operation region in a feature dimension.
In one embodiment of the present disclosure, the target operation region corresponding to the gesture operation is determined among the at least one candidate operation region by combining the gesture feature information and the region feature information. This determination process has two advantages. On the one hand, the gesture action no longer needs to match a standard gesture action; recognition is performed on feature dimensions and works for varied gesture actions, improving the flexibility of gesture operation. On the other hand, the target region is determined from both the gesture features and the region features, and this joint consideration further improves the accuracy of target operation region determination.
Further, after the target operation region corresponding to the gesture operation is determined, the interaction event corresponding to the target operation region is executed; for example, in a document recognition scenario, the text information of the target operation region is determined for text recognition.
It should be noted that, in different application scenarios, the manner of determining, among the at least one candidate operation region, the target operation region corresponding to the gesture operation according to the gesture feature information and the region feature information differs. Examples follow:
in some possible examples, as shown in fig. 4, determining the target operation region from the valid gesture feature information and the at least one candidate operation region feature information includes:
step 401, obtaining feature vectors according to the effective gesture feature information and the candidate operation region feature information, wherein,
the feature vector includes at least one of: the method comprises the steps of selecting a plurality of edge point coordinate vectors of a plurality of area edge points of each candidate operation area, an inter-finger coordinate vector corresponding to effective gesture feature information, a plurality of edge distance vectors of a plurality of edge lines formed by the inter-finger coordinate vector and the plurality of area edge points respectively, and a distance ratio vector of every two edge distance information in the plurality of edge distance vectors.
In one embodiment of the present disclosure, the feature vector is obtained from the gesture feature information and the candidate region feature information; for example, a feature extraction model may be generated by pre-training, and the gesture feature information and the region feature information are input into the feature extraction model to obtain the feature vector. The candidate region feature vector may be feature information describing the candidate operation region in any dimension.
In some possible embodiments, a feature vector is obtained according to the valid gesture feature information and the candidate operation region feature information. The feature vector may include at least one of: a gesture feature vector (e.g., which action the gesture is, the finger key point coordinates in the gesture state, etc.); a region feature vector of each candidate operation region (e.g., the coordinate position of the minimum bounding box of the candidate operation region, the center point coordinates of the candidate operation region, etc.); and an associated feature vector between the gesture feature information and each candidate operation region, which may be at least one of, for example, the distances from the fingertip to the top, bottom, left, and right edges of the minimum bounding box of the candidate operation region, or the coordinate positions of the intersection points of the finger direction with the top, bottom, left, and right edges of that minimum bounding box, and so on.
It should be emphasized that, when the feature vector includes the associated feature vector between the gesture feature information and each candidate operation region, the association between the operating finger and the candidate operation region is also considered when determining the target operation region, which further ensures the determination accuracy of the target operation region.
For example, suppose the gesture feature information includes the fingertip coordinate information of the operating finger, and the candidate region feature vector of each candidate operation region includes edge point coordinate information of a plurality of region edge points, for example, the four points closest to the operating finger on the four edge lines of the minimum bounding box corresponding to each candidate operation region, or the point closest to the operating finger on each edge line. The associated feature vector between the gesture feature information and each candidate operation region then includes: a plurality of pieces of edge distance information between the fingertip coordinate information and the edge lines formed by the region edge points, distance ratio information of every two pieces of edge distance information, and the like. For example, the edge distance information may be the direction vectors from the fingertip to the intersection points with the top, bottom, left, and right edges of the candidate operation region's minimum bounding box, and the distance ratio information may be the ratio of the distance to the left edge of the minimum bounding box to the distance to its right edge, and so on.
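One possible composition of such a feature vector, assuming axis-aligned minimum bounding boxes, is sketched below; the exact components and their ordering in the disclosure may differ.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def build_feature_vector(fingertip: Point, box: Box) -> List[float]:
    """Concatenate edge point coordinates, the fingertip coordinate,
    the distances from the fingertip to the four edge lines, and the
    pairwise ratios of those distances (20 values in total)."""
    x0, y0, x1, y1 = box
    corners = [x0, y0, x1, y0, x1, y1, x0, y1]  # edge point coordinates
    fx, fy = fingertip
    # Distances from the fingertip to the left, right, top and bottom edge lines.
    edge_dists = [abs(fx - x0), abs(x1 - fx), abs(fy - y0), abs(y1 - fy)]
    eps = 1e-6  # avoid division by zero for degenerate boxes
    ratios = [edge_dists[i] / (edge_dists[j] + eps)
              for i in range(4) for j in range(4) if i < j]
    return corners + [fx, fy] + edge_dists + ratios
```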
Step 402, the feature vector is input into a pre-trained region selection model, and the target operation region is determined according to a model output result of the region selection model.
In one embodiment of the present disclosure, a region selection model may be trained in advance on sample data using an algorithm such as a regression tree, e.g., the gradient boosting decision tree algorithm (Gradient Boosting Decision Tree, GBDT). The feature vector is input into the pre-trained region selection model, and the target operation region is determined according to the model output result. For example, if the region selection model outputs, for each candidate operation region, the probability that it is the target operation region, the candidate operation region with the largest probability value is taken as the target operation region.
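As a sketch, such a region selection model could be realized with scikit-learn's gradient boosting classifier. The training data, labels, and feature dimensionality below are placeholders for illustration, assuming one 20-dimensional feature vector per (gesture, candidate region) pair as in the sketch above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder training set: one feature vector per (gesture, candidate
# region) pair; the label is 1 if that candidate was the true target region.
rng = np.random.default_rng(0)
X_train = rng.random((200, 20))
y_train = rng.integers(0, 2, size=200)

model = GradientBoostingClassifier()  # GBDT region selection model
model.fit(X_train, y_train)


def pick_target_region(candidate_vectors: np.ndarray) -> int:
    """Score every candidate region and return the index of the one with
    the highest probability of being the target operation region."""
    probs = model.predict_proba(candidate_vectors)[:, 1]
    return int(np.argmax(probs))
```

At inference time, one feature vector is built for each candidate operation region of the current gesture, and the candidate with the largest output probability is taken as the target operation region, matching the readout described above.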
Thus, in the embodiments of the present disclosure, the target operation region is determined from the gesture operation and the various feature dimensions of the corresponding candidate operation regions on the object to be identified that are relevant to the determination. The determination is not limited to recognizing specific gesture actions, and the feature-level association between the gesture operation and the related candidate operation regions is taken into account, which improves both the flexibility and the accuracy of target region determination. Moreover, recognition is performed on a two-dimensional target image, so the recognition cost is low and the operability is high.
In summary, the image region determining method of the embodiments of the present disclosure acquires a target image containing a gesture operation, extracts valid gesture feature information of the target image, determines at least one candidate operation region, extracts candidate operation region feature information of each candidate operation region, and then determines the target operation region according to the valid gesture feature information and the at least one piece of candidate operation region feature information. The embodiments of the present disclosure can thus determine the specific operation region of a gesture operation by combining the gesture feature information captured when the gesture operation is actually performed, improving the practicability and flexibility of operation region determination while ensuring the accuracy of operation region identification.
In order to implement the above embodiment, the present disclosure also proposes a determination apparatus of an image area.
Referring to fig. 5, there is shown a schematic block diagram of an image area determining apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 5, the apparatus includes a first extraction module 510, a second extraction module 520, and a determining module 530, wherein:
the first extraction module 510 is configured to obtain a target image including gesture operations, and extract valid gesture feature information of the target image;
a second extraction module 520, configured to determine at least one candidate operation area, and extract candidate operation area feature information of each candidate operation area;
a determining module 530, configured to determine a target operation area according to the valid gesture feature information and at least one candidate operation area feature information.
In an alternative embodiment, the apparatus further comprises a valid gesture feature information determining module, which is used for:
extracting gesture feature information of the target image, wherein the gesture feature information comprises at least one of the following: finger names, finger numbers, fingertip positions, finger joint positions, finger directions, and fingertip distances corresponding to the gesture;
and determining valid gesture feature information based on the gesture feature information by using a hand recognition model.
In an alternative embodiment, the second extraction module 520 is specifically configured to:
determining at least one first operation area according to the target image;
and obtaining the candidate operation area according to the fingertip position and/or the finger direction in the valid gesture feature information and the first operation area feature information corresponding to the at least one first operation area.
In an alternative embodiment, the second extraction module 520 is specifically configured to:
determining at least one second operation area according to the target image;
and filtering at least one second operation area with the confidence coefficient lower than a first preset value to obtain at least one first operation area.
In an alternative embodiment, the second extraction module 520 is specifically configured to:
determining the region relative position feature information according to the fingertip position and/or the finger direction;
and determining a first operation area corresponding to the first operation area feature information matched with the region relative position feature information as the candidate operation area.
In an alternative embodiment, the region relative position feature information includes: distance information from a gesture key point, and/or bearing information from the gesture key point, wherein the gesture key point comprises at least one of: a fingertip key point corresponding to the fingertip position, and an image key point in the target image along the finger direction.
In an alternative embodiment, the determining module 530 is specifically configured to:
obtaining a feature vector according to the valid gesture feature information and the candidate operation area feature information, wherein,
the feature vector includes at least one of: edge point coordinate vectors of a plurality of region edge points of each candidate operation area; a fingertip coordinate vector corresponding to the valid gesture feature information; a plurality of edge distance vectors between the fingertip coordinate vector and the edge lines formed by the region edge points; and distance ratio vectors of every two pieces of edge distance information among the plurality of edge distance vectors;
and inputting the feature vector into a pre-trained region selection model, and determining the target operation region according to a model output result of the region selection model.
The image area determining apparatus provided by the embodiments of the present disclosure can execute the image area determining method of any embodiment of the present disclosure, is applicable to electronic devices such as computers, smartphones, and servers, and has the functional modules and beneficial effects corresponding to the executed method. For details not described in the apparatus embodiments, refer to the description of any method embodiment of the present disclosure.
The exemplary embodiments of the present disclosure also provide an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor. The memory stores a computer program executable by the at least one processor for causing the electronic device to perform a method according to embodiments of the present disclosure when executed by the at least one processor.
The present disclosure also provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to an embodiment of the present disclosure.
The present disclosure also provides a computer program product comprising a computer program, wherein the computer program, when executed by a processor of a computer, is for causing the computer to perform a method according to embodiments of the disclosure.
Referring to fig. 6, a block diagram of an electronic device 600, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices are intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the electronic device 600 are connected to the I/O interface 605, including: an input unit 606, an output unit 607, a storage unit 608, and a communication unit 609. The input unit 606 may be any type of device capable of inputting information to the electronic device 600; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 607 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 608 may include, but is not limited to, magnetic disks and optical disks. The communication unit 609 allows the electronic device 600 to exchange information/data with other devices through a computer network, such as the Internet, and/or various telecommunication networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above. For example, in some embodiments, the image region determination method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the ROM 602 and/or the communication unit 609. In some embodiments, the computing unit 601 may be configured to perform the image region determination method by any other suitable means (e.g., by means of firmware).
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in this disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

Claims (10)

1. An image area determining method, comprising:
acquiring a target image containing a gesture operation, and extracting valid gesture feature information of the target image;
determining at least one candidate operation area, and extracting candidate operation area feature information of each candidate operation area;
and determining a target operation area according to the valid gesture feature information and at least one piece of candidate operation area feature information.
2. The method of claim 1, wherein the method further comprises:
extracting gesture feature information of the target image, wherein the gesture feature information comprises at least one of the following: finger names, finger numbers, fingertip positions, finger joint positions, finger directions, and fingertip distances corresponding to the gesture;
and determining valid gesture feature information based on the gesture feature information by using a hand recognition model.
3. The method of claim 1, wherein the determining at least one candidate operation area comprises:
determining at least one first operation area according to the target image;
and obtaining the candidate operation area according to the fingertip position and/or the finger direction in the valid gesture feature information and the first operation area feature information corresponding to the at least one first operation area.
4. The method according to claim 3, wherein the determining at least one first operation area according to the target image comprises:
determining at least one second operation area according to the target image;
and filtering at least one second operation area with the confidence coefficient lower than a first preset value to obtain at least one first operation area.
5. The method according to claim 3 or 4, wherein the obtaining the candidate operation region according to the fingertip position and/or finger direction in the valid gesture feature information and the first operation region feature information corresponding to the at least one first operation region includes:
determining the region relative position feature information according to the fingertip position and/or the finger direction;
and determining a first operation area corresponding to the first operation area feature information matched with the region relative position feature information as the candidate operation area.
6. The method of claim 5, wherein,
the region relative position feature information includes: distance information from a gesture key point, and/or bearing information from the gesture key point, wherein the gesture key point comprises at least one of: a fingertip key point corresponding to the fingertip position, and an image key point in the target image along the finger direction.
7. The method of claim 5, wherein determining a target operation region based on the valid gesture feature information and at least one piece of candidate operation region feature information comprises:
obtaining a feature vector according to the valid gesture feature information and the candidate operation region feature information, wherein,
the feature vector includes at least one of: edge point coordinate vectors of a plurality of region edge points of each candidate operation region; a fingertip coordinate vector corresponding to the valid gesture feature information; a plurality of edge distance vectors between the fingertip coordinate vector and the edge lines formed by the region edge points; and distance ratio vectors of every two pieces of edge distance information among the plurality of edge distance vectors;
and inputting the feature vector into a pre-trained region selection model, and determining the target operation region according to a model output result of the region selection model.
8. An image area determining apparatus, comprising:
the first extraction module is used for acquiring a target image containing a gesture operation and extracting valid gesture feature information of the target image;
the second extraction module is used for determining at least one candidate operation area and extracting candidate operation area feature information of each candidate operation area;
and the determining module is used for determining a target operation area according to the valid gesture feature information and the at least one piece of candidate operation area feature information.
9. An electronic device, comprising:
a processor; and
a memory in which a program is stored,
wherein the program comprises instructions which, when executed by the processor, cause the processor to perform the method according to any of claims 1-7.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-7.
CN202310295271.3A 2023-03-22 2023-03-22 Image area determining method and device, electronic equipment and storage medium Pending CN116311526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310295271.3A CN116311526A (en) 2023-03-22 2023-03-22 Image area determining method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310295271.3A CN116311526A (en) 2023-03-22 2023-03-22 Image area determining method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116311526A true CN116311526A (en) 2023-06-23

Family

ID=86825512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310295271.3A Pending CN116311526A (en) 2023-03-22 2023-03-22 Image area determining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116311526A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556781A (en) * 2024-01-12 2024-02-13 杭州行芯科技有限公司 Target pattern determining method and device, electronic equipment and storage medium
CN117556781B (en) * 2024-01-12 2024-05-24 杭州行芯科技有限公司 Target pattern determining method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination