Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a target object matching method, system, device and machine-readable medium for solving the problems in the prior art.
To achieve the above and other related objects, the present invention provides a target object matching method, including:
acquiring one or more matching areas of one or more target objects in different postures;
affine transforming the one or more matching regions to corresponding reference regions in a reference object;
and determining the matching degree of the target object and the reference object.
Optionally, the reference object corresponds to the target object, the target object comprising at least one of: target human body, target animal body.
Optionally, one or more continuous frame images are acquired, and one or more matching regions of one or more target human bodies in different postures are determined according to the continuous frame images.
Optionally, the matching region comprises at least one of: target human skeleton points, a target human skeleton point combination and a target human body part.
Optionally, the target human skeleton point comprises at least one of: the center point of the head, the left hip point, the right hip point, the left shoulder point and the right shoulder point.
Optionally, the target human skeleton point combination includes at least one of: a human body skeleton point combination consisting of the head center point, the left hip point and the right hip point; a human body skeleton point combination consisting of the head center point, the left shoulder point and the right shoulder point; a human body skeleton point combination consisting of the left shoulder point, the left hip point and the right hip point; a human body skeleton point combination consisting of the right shoulder point, the left hip point and the right hip point.
Optionally, the target human body part comprises at least one of: human head, human shoulder, human arm, human buttock, human shank.
Optionally, if the matching region is a target human skeleton point, the reference region includes a reference human skeleton point; the reference human skeleton point includes at least one of: a head center reference point corresponding to a head center point, a left hip reference point corresponding to a left hip point, a right hip reference point corresponding to a right hip point, a left shoulder reference point corresponding to a left shoulder point, a right shoulder reference point corresponding to a right shoulder point.
Optionally, if the matching region is a target human skeleton point combination, the reference region includes a reference human skeleton point combination; the reference human skeleton point combination comprises at least one of the following: a reference human body skeleton point combination consisting of the head center reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the head center reference point, the left shoulder reference point and the right shoulder reference point; a reference human body skeleton point combination consisting of the left shoulder reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the right shoulder reference point, the left hip reference point and the right hip reference point.
Optionally, the affine transformation comprises at least one of: rotation, translation and shearing.
Optionally, determining the matching degree between the target object and the reference object specifically includes:
acquiring a target human body skeleton point sequence and a reference human body skeleton point sequence, and determining a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human body skeleton point sequence and the reference human body skeleton point sequence;
calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point coincidence sequence;
and determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the skeleton point coincidence ratio.
Optionally, determining the matching degree between the target object and the reference object specifically includes:
acquiring a target human body skeleton point combination sequence and a reference human body skeleton point combination sequence, and determining a skeleton point combination coincidence sequence and a skeleton point combination coincidence ratio of the target human body skeleton point combination sequence and the reference human body skeleton point combination sequence;
calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point combination coincidence sequence;
and determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the combined coincidence ratio of the skeleton points.
The invention also provides a target object matching system, which comprises:
the acquisition module is used for acquiring one or more matching areas when one or more target objects are in different postures;
an affine transformation module for affine transforming the one or more matching regions to corresponding reference regions in a reference object;
and the matching module is used for determining the matching degree of the target object and the reference object.
Optionally, the reference object corresponds to the target object, the target object comprising at least one of: target human body, target animal body.
Optionally, the acquiring module includes an image acquiring unit and a matching region unit;
the image acquisition unit is used for acquiring one or more continuous frame images;
the matching region unit determines one or more matching regions when one or more target human bodies are in different postures according to the continuous frame images.
Optionally, the matching region comprises at least one of: target human skeleton points, a target human skeleton point combination and a target human body part.
Optionally, the target human skeleton point comprises at least one of: the center point of the head, the left hip point, the right hip point, the left shoulder point and the right shoulder point.
Optionally, the target human skeleton point combination includes at least one of: a human body skeleton point combination consisting of the head center point, the left hip point and the right hip point; a human body skeleton point combination consisting of the head center point, the left shoulder point and the right shoulder point; a human body skeleton point combination consisting of the left shoulder point, the left hip point and the right hip point; a human body skeleton point combination consisting of the right shoulder point, the left hip point and the right hip point.
Optionally, the target human body part comprises at least one of: human head, human shoulder, human arm, human buttock, human shank.
Optionally, if the matching region is a target human skeleton point, the reference region includes a reference human skeleton point; the reference human skeleton point includes at least one of: a head center reference point corresponding to a head center point, a left hip reference point corresponding to a left hip point, a right hip reference point corresponding to a right hip point, a left shoulder reference point corresponding to a left shoulder point, a right shoulder reference point corresponding to a right shoulder point.
Optionally, if the matching region is a target human skeleton point combination, the reference region includes a reference human skeleton point combination; the reference human skeleton point combination comprises at least one of the following: a reference human body skeleton point combination consisting of the head center reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the head center reference point, the left shoulder reference point and the right shoulder reference point; a reference human body skeleton point combination consisting of the left shoulder reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the right shoulder reference point, the left hip reference point and the right hip reference point.
Optionally, the affine transformation comprises at least one of: rotation, translation and shearing.
Optionally, the matching module includes a first processing unit, a first calculating unit and a first matching unit;
the first processing unit is used for acquiring a target human body skeleton point sequence and a reference human body skeleton point sequence, and determining a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human body skeleton point sequence and the reference human body skeleton point sequence;
the first calculation unit is used for calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point coincidence sequence;
the first matching unit is used for determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle inner angle of the target human body skeleton point, the cosine value sequence of the limb triangle inner angle of the reference human body skeleton point and the skeleton point coincidence ratio.
Optionally, the matching module includes a second processing unit, a second calculating unit, and a second matching unit;
the second processing unit is used for acquiring a target human body skeleton point combination sequence and a reference human body skeleton point combination sequence, and determining a skeleton point combination coincidence sequence and a skeleton point combination coincidence ratio of the target human body skeleton point combination sequence and the reference human body skeleton point combination sequence;
the second calculation unit is used for calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point combination coincidence sequence;
the second matching unit is used for determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle inner angle of the target human body skeleton point, the cosine value sequence of the limb triangle inner angle of the reference human body skeleton point and the combined coincidence ratio of the skeleton points.
The invention also provides a target object matching device, configured to perform:
acquiring one or more matching areas of one or more target objects in different postures;
affine transforming the one or more matching regions to corresponding reference regions in a reference object;
and determining the matching degree of the target object and the reference object.
The present invention also provides an apparatus comprising:
one or more processors; and
one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform a method as described in one or more of the above.
The present invention also provides one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause an apparatus to perform the methods as described in one or more of the above.
As described above, the target object matching method, system, device and machine-readable medium provided by the present invention have the following beneficial effects: one or more matching areas are obtained when one or more target objects are in different postures; the one or more matching regions are affine transformed to corresponding reference regions in a reference object; and the matching degree of the target object and the reference object is determined. The method is based on a bottom-up Deep Pose method and can directly detect the key points of all human bodies in the whole continuous frame image, affine transform the key points of the target human body's posture in the continuous frame image to a reference human body, and calculate the matching degree of the target human body and the reference human body from the result of the affine transformation. The method is not only fast, but can also calculate the matching degree of the target human body and the reference human body when a plurality of target human bodies are present at the same time.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Affine transformation: in geometry, a linear transformation of one vector space followed by a translation, mapping it into another vector space.
Deep Pose: posture detection based on deep learning.
Key point: the best point, or a relatively representative point.
Skeleton key point: the best, or a relatively representative, skeleton point among the skeleton points of the human body.
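As a minimal sketch of the affine transformation defined above, the following Python snippet applies a planar transformation y = A·x + t to a point. The function names `make_affine` and `apply_affine`, and the choice to build A as a rotation applied after a horizontal shear, are illustrative assumptions and are not taken from the described method.

```python
import math

def make_affine(angle_rad=0.0, shear_x=0.0, tx=0.0, ty=0.0):
    """Build (A, t) with A = R(angle) * Sh(shear_x). Names are illustrative."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    # R = [[c, -s], [s, c]] and Sh = [[1, shear_x], [0, 1]], multiplied out:
    a = ((c, c * shear_x - s),
         (s, s * shear_x + c))
    return a, (tx, ty)

def apply_affine(a, t, point):
    """Apply the linear part A first, then the translation t."""
    x, y = point
    return (a[0][0] * x + a[0][1] * y + t[0],
            a[1][0] * x + a[1][1] * y + t[1])
```

With a rotation of 90 degrees and a translation of (2, 3), the point (1, 0) is first rotated to (0, 1) and then shifted to (2, 4).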
Referring to fig. 1 to 3, the present invention provides a target object matching method, including:
S100, acquiring one or more matching areas of one or more target objects in different postures. In the embodiment of the present application, the target object may include, for example, at least one of: target human body, target animal body. If the target object is a human body, the target human body in different postures includes, for example, at least one of the following: a worker, in the postures adopted during operation; a dance teacher, in the postures adopted during dance teaching; a student, in the postures adopted during break-time exercises; a middle-aged woman, in the postures adopted during square dancing; and the like.
In an exemplary embodiment, one or more continuous frame images may be acquired, for example, by a conventional computer and a common camera, and one or more matching regions of one or more target human bodies in different postures are determined from the continuous frame images. Compared with the prior art, no dedicated human body detection sensor equipment needs to be installed, which greatly reduces the cost. The continuous frame images include videos, continuously shot photographs, and the like.
In some exemplary embodiments, the matching region comprises at least one of: target human skeleton points, a target human skeleton point combination and a target human body part.
The target human skeleton points comprise at least one of: the head center point, the left hip point, the right hip point, the left shoulder point and the right shoulder point. The target human skeleton point combination comprises at least one of the following: a human body skeleton point combination consisting of the head center point, the left hip point and the right hip point; a human body skeleton point combination consisting of the head center point, the left shoulder point and the right shoulder point; a human body skeleton point combination consisting of the left shoulder point, the left hip point and the right hip point; a human body skeleton point combination consisting of the right shoulder point, the left hip point and the right hip point. The target human body part includes at least one of: human head, human shoulder, human arm, human buttock, human shank.
For example, taking the human body as the target object and a video as the continuous frame images, one or more videos are obtained through a conventional computer and a common camera when the matching area of the target object is determined. One or more human bodies are obtained from a video picture and, with the human body skeleton points used as matching areas, skeleton key point positioning is performed on the human bodies in the video picture based on a bottom-up Deep Pose method, which can directly detect the skeleton key points of all human bodies in the whole video picture. One or more of these human bodies are selected as target human bodies (human bodies to be matched), so that the skeleton key points of all target human bodies in the whole video picture are detected directly. Compared with the top-down method in the prior art, the bottom-up Deep Pose method detects the human body directly as a whole, rather than first detecting the human body from the head and then proceeding downwards in sequence as in the prior art. Moreover, the embodiment of the application is efficient and accurate: the time consumed is hardly affected by the number of people in the picture, and the frame rate in a multi-person video scene can exceed 20 FPS (frames per second), compared with only 2 FPS for the prior art in a multi-person video scene, so the method is clearly superior to the prior art in time consumption and multi-person scene detection. Multi-person video scenes may include, for example, worker operations, dance teaching, somatosensory games, student break-time exercises, square dancing, and the like.
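The result of a bottom-up detection pass can be pictured as one list of skeleton key points per human body in the picture. The sketch below is only an assumed data layout: the `Body` class, the use of `None` for a key point that was not detected, and the helper `select_targets` are hypothetical names introduced for illustration; the count of 18 key points per body is the maximum stated in this description.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
NUM_KEYPOINTS = 18  # at most 18 skeleton key points per human body

@dataclass
class Body:
    # One entry per key point index; None marks a key point that is not visible.
    keypoints: List[Optional[Point]]

    def visible_indices(self) -> List[int]:
        """Indices of the key points that were actually detected."""
        return [i for i, p in enumerate(self.keypoints) if p is not None]

def select_targets(bodies: List[Body], chosen: List[int]) -> List[Body]:
    """Pick some of the detected bodies as target bodies (bodies to be matched)."""
    return [bodies[i] for i in chosen]
```

Because every body in the picture is already present in the detector output, choosing target bodies is a selection step and needs no further detection.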
S200, affine transforming the one or more matching areas to corresponding reference areas in the reference object. In the embodiment of the application, the reference object corresponds to the target object, and specifically, if the target object is a human body, the reference object is also the human body; if the target object is an animal, the reference object is also an animal.
In some exemplary embodiments, the reference region comprises at least one of: reference human skeleton points, reference human skeleton point combinations and reference human body parts.
If the matching region is a target human body skeleton point, the reference region comprises a reference human body skeleton point; the reference human skeleton point includes at least one of: a head center reference point corresponding to a head center point, a left hip reference point corresponding to a left hip point, a right hip reference point corresponding to a right hip point, a left shoulder reference point corresponding to a left shoulder point, a right shoulder reference point corresponding to a right shoulder point.
If the matching area is the target human body skeleton point combination, the reference area comprises a reference human body skeleton point combination; the reference human skeleton point combination comprises at least one of the following: a reference human body skeleton point combination consisting of the head center reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the head center reference point, the left shoulder reference point and the right shoulder reference point; a reference human body skeleton point combination consisting of the left shoulder reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the right shoulder reference point, the left hip reference point and the right hip reference point.
If the matching region is the target human body part, the reference region comprises a reference human body part, and the reference human body part comprises at least one of the following parts: the human body head reference, the human body shoulder reference, the human body arm reference, the human body buttock reference and the human body leg reference.
For example, in the embodiment of the present application, if the target object is a human body, two human bodies in a given posture are selected from a video picture: one is taken as the target human body (the human body to be matched) and the other as the reference human body (the standard human body). The skeleton points of the target human body are affine transformed to the reference human body, so that the skeleton points of the two human bodies are brought to approximately the same scale and angle. The affine transformation operation comprises at least one of: rotation, translation and shearing. An affine transformation relation can be determined from three human skeleton key points, and each human body has at most 18 human skeleton key points, so when the relation is determined from exactly three points there are C(18, 3) = 816 possible choices. A sequence consisting of the 18 human skeleton key points corresponds to a human skeleton point sequence; at least three human skeleton key points arbitrarily selected from the 18 form a human skeleton key point combination, and a sequence formed by such combinations corresponds to a human skeleton point combination sequence. The application can select the optimal three human skeleton key points as affine transformation reference points, or select a combination of more than three human skeleton key points as affine transformation reference points.
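The two facts used above, that 18 key points give 816 three-point choices and that three point pairs determine an affine transformation, can be checked with a short sketch. The helper names `det3`, `solve3` and `affine_from_three` are hypothetical, and solving the 3x3 systems with Cramer's rule is simply one convenient choice for a self-contained example.

```python
import math

N_TRIPLES = math.comb(18, 3)  # number of ways to pick 3 of 18 key points: 816

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve m * x = rhs for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three source points are collinear")
    out = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        out.append(det3(mc) / d)
    return tuple(out)

def affine_from_three(src, dst):
    """Return (A, t) such that A * src[k] + t == dst[k] for k = 0, 1, 2."""
    m = [(x, y, 1.0) for x, y in src]
    a, b, tx = solve3(m, [u for u, _ in dst])
    c, d, ty = solve3(m, [v for _, v in dst])
    return ((a, b), (c, d)), (tx, ty)
```

Three collinear key points do not determine an affine transformation, which is why the sketch rejects them; the triangular combinations listed in this description (for example head center, left hip, right hip) avoid that case for an ordinary posture.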
For example, in the embodiment of the present application, relatively representative three-point combinations in the commonly visible key point sequence are selected as candidate point combinations: affine transformations are performed in turn using (head center point, left hip point, right hip point), (head center point, left shoulder point, right shoulder point), (left shoulder point, left hip point, right hip point), (right shoulder point, left hip point, right hip point), and the like. The candidate point combination that yields the highest calculated matching degree is taken as the best point combination, i.e. the key point combination.
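The candidate search described above can be sketched as a loop that tries each three-point combination and keeps the one with the highest matching degree. In this sketch the dictionary keys such as `"head_center"`, the function `best_combination`, and the callback `score_fn` (which stands in for the affine transformation plus matching degree calculation) are all hypothetical names introduced for illustration.

```python
CANDIDATES = [
    ("head_center", "left_hip", "right_hip"),
    ("head_center", "left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_hip", "right_hip"),
    ("right_shoulder", "left_hip", "right_hip"),
]

def best_combination(target, reference, score_fn, candidates=CANDIDATES):
    """Return (combination, score) with the highest score, or None.

    target and reference map key point names to (x, y) coordinates and
    contain only the key points visible in each body.
    """
    best = None
    for combo in candidates:
        # A combination is usable only if all three points are commonly visible.
        if not all(name in target and name in reference for name in combo):
            continue
        score = score_fn(target, reference, combo)
        if best is None or score > best[1]:
            best = (combo, score)
    return best
```

Passing the scoring step in as a callback keeps the search independent of how the matching degree itself is computed.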
And S300, determining the matching degree of the target object and the reference object.
In an exemplary embodiment, the determining the matching degree of the target object and the reference object by using the human skeleton point as the matching region specifically includes:
and acquiring a target human body skeleton point sequence and a reference human body skeleton point sequence, and determining a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human body skeleton point sequence and the reference human body skeleton point sequence. Specifically, a target human body skeleton point sequence consisting of 18 human body skeleton key points of a target human body and a reference human body skeleton point sequence consisting of 18 human body skeleton key points of a reference human body are obtained; and then determining a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human body skeleton point sequence and the reference human body skeleton point sequence.
Calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point coincidence sequence;
and determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the skeleton point coincidence ratio.
In an exemplary embodiment, the determining the matching degree of the target object and the reference object by using the human skeleton point combination as the matching region specifically includes:
and acquiring a target human body skeleton point combination sequence and a reference human body skeleton point combination sequence, and determining a skeleton point combination coincidence sequence and a skeleton point combination coincidence ratio of the two sequences. Specifically, three or more human skeleton key points are arbitrarily selected from the 18 human skeleton key points of the target human body as a target human skeleton point combination, and such combinations form the target human skeleton point combination sequence. The corresponding human skeleton key points are selected from the reference human body to form the reference human skeleton point combination corresponding to each target human skeleton point combination, and these form the reference human skeleton point combination sequence. The skeleton point combination coincidence sequence and the skeleton point combination coincidence ratio of the target human body skeleton point combination sequence and the reference human body skeleton point combination sequence are then determined.
Calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point combination coincidence sequence;
and determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the combined coincidence ratio of the skeleton points.
In the embodiment of the present application, as shown in fig. 2 and 3, determining the matching degree between the target human body and the reference human body by using the human skeleton point as the matching region includes:
The target human body skeleton point sequence and the reference human body skeleton point sequence, i.e. the human skeleton key point sequences of the target human body and of the reference human body, are obtained respectively; the human skeleton key points of the target human body are affine transformed to the reference human body; and the human skeleton key point coincidence sequence and the human skeleton key point coincidence ratio r are then determined. After the affine transformation, if the target human body and the reference human body have commonly visible human skeleton key points, all key points visible in both are collected and recorded as the human skeleton key point coincidence sequence. The number of commonly visible key points is then divided by the number of all human skeleton key points, and the result is recorded as the human skeleton key point coincidence ratio r.
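The coincidence sequence and the coincidence ratio r can be sketched as follows. This is one reading of the text and rests on two assumptions: a key point that is not visible is represented by `None`, and the denominator of r is the total number of key points in the sequence. The function name `coincidence` is hypothetical.

```python
def coincidence(target, reference):
    """Return (common_indices, r) for two equally long key point sequences.

    Each sequence holds (x, y) tuples, or None where a key point is not visible.
    """
    if len(target) != len(reference):
        raise ValueError("sequences must have the same length")
    # Coincidence sequence: indices visible in both the target and the reference.
    common = [i for i, (p, q) in enumerate(zip(target, reference))
              if p is not None and q is not None]
    # Coincidence ratio: commonly visible count over the total key point count.
    r = len(common) / len(target) if target else 0.0
    return common, r
```

For two four-point sequences in which only the first and last key points are visible in both bodies, the coincidence sequence is [0, 3] and r is 0.5.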
The limb angles at the commonly visible skeleton points of the target human skeleton and the reference human skeleton are calculated respectively and expressed as cosine values, whose range is (-1.0, 1.0). That is, the cosine value sequences of the limb triangle internal angles of the target human body skeleton points and of the reference human body skeleton points are calculated according to the human skeleton key point coincidence sequence. The cosine value sequence of the target human body skeleton points, calculated from the commonly visible skeleton key point sequence, is $a_0, a_1, \ldots, a_i$; the cosine value sequence of the reference human body skeleton points, calculated from the same sequence, is $b_0, b_1, \ldots, b_i$.
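The cosine of each internal angle of a limb triangle follows from the dot product of the two edges that meet at that vertex. The sketch below computes the three cosines for one triangle and concatenates them over several triangles; which index triples count as limb triangles is not specified here, so the `triangles` argument and both function names are assumptions made for illustration.

```python
import math

def interior_angle_cosines(p0, p1, p2):
    """Cosines of the internal angles at p0, p1 and p2 of the triangle p0-p1-p2."""
    def cos_at(v, a, b):
        ax, ay = a[0] - v[0], a[1] - v[1]
        bx, by = b[0] - v[0], b[1] - v[1]
        na, nb = math.hypot(ax, ay), math.hypot(bx, by)
        if na == 0 or nb == 0:
            raise ValueError("degenerate triangle: two vertices coincide")
        return (ax * bx + ay * by) / (na * nb)
    return (cos_at(p0, p1, p2), cos_at(p1, p2, p0), cos_at(p2, p0, p1))

def cosine_sequence(points, triangles):
    """Concatenate the internal-angle cosines of each (i, j, k) index triple."""
    seq = []
    for i, j, k in triangles:
        seq.extend(interior_angle_cosines(points[i], points[j], points[k]))
    return seq
```

Internal-angle cosines are unchanged by rotation, translation and uniform scaling of the points, so two bodies in the same posture produce the same sequence regardless of where they stand or how large they appear.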
Calculating the similarity S according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the skeleton point coincidence ratio r as follows:
The matching degree of the target human body and the reference human body is then determined according to the similarity S. The matching degree corresponds to the similarity value: for example, if the similarity S between the target human body and the reference human body is 95%, the matching degree between them is 95%.
The method comprises the steps of obtaining one or more matching areas when one or more target objects are in different postures; affine transforming the one or more matching regions to corresponding reference regions in a reference object; and determining the matching degree of the target object and the reference object. The method is based on a bottom-up Deep Pose method, can directly detect the human skeleton points of all human bodies in the whole video, selects one or more human bodies as target human bodies, and is equivalent to directly detecting the human skeleton points of all target human bodies in the whole video. And selecting human skeleton key points in the human skeleton points, carrying out affine transformation on the skeleton key points of the target human body to the reference human body, and finding out a skeleton key point sequence which is visible by the target human body and the reference human body together after the affine transformation. And calculating limb angles of the target human body and the reference human body according to the commonly visible skeleton key point sequence, namely calculating cosine value sequences of the internal angles of the respective limb triangles of the target human body skeleton point and the reference human body skeleton point according to the commonly visible skeleton key point sequence. And finally, calculating similarity according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the coincidence ratio of the skeleton points, and correspondingly determining the matching degree of the target human body and the reference human body according to the similarity. 
The method is efficient and accurate, the time consumption is hardly influenced by the number of people in the picture, and the frame rate in a multi-person video scene can reach more than 20FPS (Frames Per Second, the number of Frames transmitted Per Second). Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.
As shown in fig. 4, the present application further provides a target object matching system, which includes:
an obtaining module M10, configured to obtain one or more matching regions when one or more target objects are in different postures. In the embodiment of the present application, the target object may include, for example, at least one of: a target human body, a target animal body. If the target object is a human body, the target human body in different postures includes at least one of the following: the target human body is a worker, and the posture is that of the worker during operation; the target human body is a dance teacher, and the posture is that of the dance teacher during dance teaching; the target human body is a student, and the posture is that of the student during inter-class exercises; the target human body is a middle-aged woman, and the posture is that of the middle-aged woman during square dancing; and the like.
In an exemplary embodiment, as shown in fig. 5, the acquiring module M10 includes an image acquiring unit D10 and a matching region unit D20;
the image acquisition unit D10 is used for acquiring one or more continuous frame images; the image acquisition unit D10 consists of, for example, a conventional computer and an ordinary camera, through which the one or more continuous frame images are acquired.
The matching region unit D20 is connected to the image acquisition unit D10 and is configured to determine, according to the continuous frame images, one or more matching regions when one or more target human bodies are in different postures. Compared with the prior art, no dedicated human body detection sensor equipment needs to be installed, and the cost is greatly reduced. The continuous frame images include, for example, a video, continuously shot photographs, and the like.
In some exemplary embodiments, the matching region comprises at least one of: target human skeleton points, a target human skeleton point combination and a target human body part.
Wherein the target human skeleton points comprise at least one of: the head center point, the left hip point, the right hip point, the left shoulder point and the right shoulder point. The target human skeleton point combination comprises at least one of the following: a human body skeleton point combination consisting of the head center point, the left hip point and the right hip point; a human body skeleton point combination consisting of the head center point, the left shoulder point and the right shoulder point; a human body skeleton point combination consisting of the left shoulder point, the left hip point and the right hip point; a human body skeleton point combination consisting of the right shoulder point, the left hip point and the right hip point. The target human body part includes at least one of: human head, human shoulder, human arm, human buttock, human shank.
For example, with the human body as the target object and a video as the continuous frame images, one or more videos are obtained through a conventional computer and an ordinary camera when the matching region of the target object is determined. One or more human bodies are obtained from the video picture and, with the human skeleton points as the matching regions, the skeleton key points of the human bodies in the video picture are located based on the bottom-up Deep Pose method, which can directly detect the skeleton key points of all human bodies in the whole video picture. One or more of the human bodies are then selected as target human bodies (human bodies to be matched), which is equivalent to directly detecting the skeleton key points of all target human bodies in the whole video picture. Compared with the top-down method of the prior art, the bottom-up Deep Pose method detects the human body directly as a whole, without first detecting the head of the human body and then detecting downwards in sequence. Moreover, the embodiment of the application is efficient and accurate: the time consumed is hardly affected by the number of people in the picture, and the frame rate in a multi-person video scene can exceed 20 FPS (frames per second), whereas the prior art reaches only 2 FPS in a multi-person video scene. The embodiment is therefore clearly superior to the prior art in time consumption and multi-person scene detection. Multi-person video scenes may include, for example, workers at work, dance teaching, motion-sensing games, students' inter-class exercises, square dancing, and the like.
An affine transformation module M20 for affine transforming the one or more matching regions to corresponding reference regions in the reference object. In the embodiment of the application, the reference object corresponds to the target object, and if the target object is a human body, the reference object is also the human body; if the target object is an animal, the reference object is also an animal.
In some exemplary embodiments, the reference region comprises at least one of: reference human skeleton points, reference human skeleton point combinations and reference human body parts.
If the matching region is a target human body skeleton point, the reference region comprises a reference human body skeleton point; the reference human skeleton point includes at least one of: a head center reference point corresponding to a head center point, a left hip reference point corresponding to a left hip point, a right hip reference point corresponding to a right hip point, a left shoulder reference point corresponding to a left shoulder point, a right shoulder reference point corresponding to a right shoulder point.
If the matching region is the target human body skeleton point combination, the reference region comprises a reference human body skeleton point combination; the reference human skeleton point combination comprises at least one of the following: a reference human body skeleton point combination consisting of the head center reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the head center reference point, the left shoulder reference point and the right shoulder reference point; a reference human body skeleton point combination consisting of the left shoulder reference point, the left hip reference point and the right hip reference point; a reference human body skeleton point combination consisting of the right shoulder reference point, the left hip reference point and the right hip reference point.
If the matching region is the target human body part, the reference region comprises a reference human body part, and the reference human body part comprises at least one of the following parts: the human body head reference, the human body shoulder reference, the human body arm reference, the human body buttock reference and the human body leg reference.
For example, in the embodiment of the present application, if the target object is a human body, two human bodies in a given posture state are selected from a video picture; one of them is selected as the target human body (the human body to be matched) and the other as the reference human body (the standard human body), and the skeleton points of the target human body are affine transformed to the reference human body, so that the skeleton points of the two human bodies are unified to approximately the same scale and angle. The affine transformation operation comprises at least one of: rotation, translation and shearing. Since an affine transformation relation can be determined from three human skeleton key points and each human body has at most 18 human skeleton key points, there are 816 possible choices (the number of ways of choosing 3 points out of 18) when the affine transformation relation is determined from only three human skeleton points. Accordingly, a sequence consisting of the 18 human skeleton key points corresponds to a human skeleton point sequence; when at least three human skeleton key points are arbitrarily selected from the 18 human skeleton key points and taken as a human skeleton key point combination, a sequence formed by such combinations corresponds to a human skeleton point combination sequence. The optimal three human skeleton key points may be selected as the affine transformation reference points, or a combination of more than three human skeleton key points may be selected as the affine transformation reference points.
For example, in the embodiment of the present application, relatively representative three-point combinations in the commonly visible key point sequence are selected as candidate point combinations; that is, affine transformations are performed in sequence using (head center point, left hip point, right hip point), (head center point, left shoulder point, right shoulder point), (left shoulder point, left hip point, right hip point), (right shoulder point, left hip point, right hip point), and the like. The candidate point combination that yields the highest calculated matching degree is taken as the best point combination, or key point combination.
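As a minimal illustrative sketch (not the implementation of the application), the following Python code shows how an affine transformation can be determined from three point correspondences and why 18 key points give 816 three-point choices. The function names `solve_affine` and `apply_affine`, the 2D coordinate representation, and the use of Cramer's rule are assumptions made for illustration only.

```python
from math import comb

def solve_affine(src, dst):
    """Solve x' = a*x + b*y + c, y' = d*x + e*y + f from three point pairs.

    src and dst are lists of three (x, y) tuples; the three source points
    must not be collinear. Returns (a, b, c, d, e, f).
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")

    def solve_row(u0, u1, u2):
        # Cramer's rule for the 3x3 system [x y 1] * [p q r]^T = u
        p = (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det
        q = (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det
        r = (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1)
             + u0 * (x1 * y2 - x2 * y1)) / det
        return p, q, r

    a, b, c = solve_row(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve_row(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, e, f

def apply_affine(params, point):
    """Map one (x, y) point with the parameters returned by solve_affine."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)

# Number of ways to choose 3 reference points out of 18 key points.
NUM_TRIPLES = comb(18, 3)
```

In such a sketch, the three source points would be the chosen key points of the target human body and the three destination points the corresponding reference points; every other target key point would then be mapped with `apply_affine`.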
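The candidate selection described above can be sketched as follows. This is a hedged illustration only: the key point labels, the dictionary representation with `None` for invisible points, and the `score_fn` callable (standing in for whatever procedure computes the matching degree after an affine transformation based on a given combination) are all hypothetical and are not taken from the application.

```python
# Candidate three-point combinations named in the text. The string labels
# are illustrative, not identifiers from the application.
CANDIDATES = [
    ("head", "left_hip", "right_hip"),
    ("head", "left_shoulder", "right_shoulder"),
    ("left_shoulder", "left_hip", "right_hip"),
    ("right_shoulder", "left_hip", "right_hip"),
]

def select_best_combination(target, reference, score_fn, candidates=CANDIDATES):
    """Return (combination, score) with the highest matching degree.

    target and reference map a key point label to an (x, y) tuple, or to
    None when the point is not visible. score_fn is a hypothetical callable
    that returns the matching degree obtained when the affine transformation
    is based on the given combination. Returns (None, None) if no candidate
    is visible in both bodies.
    """
    best, best_score = None, None
    for combo in candidates:
        # Skip combinations that are not commonly visible in both bodies.
        if any(target.get(n) is None or reference.get(n) is None for n in combo):
            continue
        score = score_fn(combo)
        if best_score is None or score > best_score:
            best, best_score = combo, score
    return best, best_score
```

Trying the candidates in sequence and keeping the highest score mirrors the text; an exhaustive search over all three-point choices would follow the same pattern with a longer candidate list.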
And the matching module M30 is used for determining the matching degree of the target object and the reference object.
In an exemplary embodiment, as shown in fig. 6, the matching module M30 includes a first processing unit D30, a first computing unit D40, and a first matching unit D50;
the first processing unit D30 is configured to obtain a target human skeleton point sequence and a reference human skeleton point sequence, and determine a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human skeleton point sequence and the reference human skeleton point sequence. Specifically, a target human body skeleton point sequence consisting of 18 human body skeleton key points of a target human body and a reference human body skeleton point sequence consisting of 18 human body skeleton key points of a reference human body are obtained; and then determining a skeleton point coincidence sequence and a skeleton point coincidence ratio of the target human body skeleton point sequence and the reference human body skeleton point sequence.
The first calculating unit D40 is connected with the first processing unit D30 and is used for calculating cosine value sequences of the internal angles of the limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point coincidence sequence;
the first matching unit D50 is connected to the first calculating unit D40, and is configured to determine a matching degree between the target human body and the reference human body according to the cosine value sequence of the internal angle of the limb triangle of the target human body skeleton point, the cosine value sequence of the internal angle of the limb triangle of the reference human body skeleton point, and the skeleton point coincidence ratio.
In an exemplary embodiment, as shown in fig. 7, the matching module M30 includes a second processing unit D60, a second computing unit D70 and a second matching unit D80;
the second processing unit D60 is configured to obtain a target human body skeleton point combination sequence and a reference human body skeleton point combination sequence, and determine a skeleton point combination coincidence sequence and a skeleton point combination coincidence ratio of the target human body skeleton point combination sequence and the reference human body skeleton point combination sequence. Specifically, three or more human skeleton key points are arbitrarily selected from the 18 human skeleton key points of the target human body as a target human skeleton point combination, so that a target human skeleton point combination sequence is formed. The corresponding human skeleton key points are selected from the reference human body to form a reference human skeleton point combination corresponding to the target human skeleton point combination, so that a reference human skeleton point combination sequence is formed; the skeleton point combination coincidence sequence and the skeleton point combination coincidence ratio of the target human body skeleton point combination sequence and the reference human body skeleton point combination sequence are then determined.
The second calculating unit D70 is connected with the second processing unit D60 and is used for calculating cosine value sequences of the internal angles of the respective limb triangles of the target human body skeleton point and the reference human body skeleton point according to the skeleton point combination coincidence sequence;
the second matching unit D80 is connected with the second calculating unit D70 and is used for determining the matching degree of the target human body and the reference human body according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the combination coincidence ratio of the skeleton points.
In the embodiment of the present application, as shown in fig. 2 and 3, determining the matching degree between the target human body and the reference human body by using the human skeleton point as the matching region includes:
respectively obtaining a target human body skeleton point sequence and a reference human body skeleton point sequence, that is, respectively obtaining the human skeleton key point sequences of the target human body and the reference human body, affine transforming the human skeleton key points of the target human body to the reference human body, and then determining a human skeleton key point coincidence sequence and a human skeleton key point coincidence ratio r. If, after the human skeleton key points of the target human body are affine transformed to the reference human body, there are human skeleton key points that are commonly visible in the target human body and the reference human body, all the commonly visible human skeleton key points are counted and recorded as the human skeleton key point coincidence sequence. The counted number of commonly visible human skeleton key points is then divided by the number of all human skeleton key points, and the result is recorded as the human skeleton key point coincidence ratio r.
The limb angles at the commonly visible skeleton points of the target human body and the reference human body are calculated respectively and expressed as cosine values, whose range is (-1.0, 1.0). That is, the cosine value sequences of the internal angles of the respective limb triangles of the target human body skeleton points and the reference human body skeleton points are calculated according to the human skeleton key point coincidence sequence. The cosine value sequence of the target human body skeleton points calculated according to the commonly visible skeleton key point sequence is: a_0, a_1, ..., a_i; the cosine value sequence of the reference human body skeleton points calculated according to the commonly visible skeleton key point sequence is: b_0, b_1, ..., b_i.
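The counting step above can be illustrated with a short Python sketch. It assumes, for illustration only, that each skeleton is a list of 18 entries in a fixed key point order, that an invisible key point is represented by `None`, and that the denominator of r is the total number of key points; the function name `coincidence` is hypothetical.

```python
NUM_KEYPOINTS = 18  # assumed total number of skeleton key points per body

def coincidence(target, reference, total=NUM_KEYPOINTS):
    """Return (coincidence_sequence, r) for two key point lists.

    target and reference are equal-length lists whose entries are (x, y)
    tuples, or None when the key point is not visible. The coincidence
    sequence holds the indices visible in both bodies; r is the number of
    such indices divided by the total number of key points (an assumption
    about the denominator).
    """
    if len(target) != len(reference):
        raise ValueError("key point lists must have the same length")
    common = [
        i for i, (t, r) in enumerate(zip(target, reference))
        if t is not None and r is not None
    ]
    return common, len(common) / total
```

For instance, if 15 of the 18 key points are visible in both bodies, this sketch yields r = 15/18.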
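To make the cosine calculation concrete, the following Python sketch computes the cosines of the three internal angles of a triangle formed by three key points. Which key points form each limb triangle is not specified here, so the choice of vertices is left to the caller; the function name `triangle_cosines` and the dot-product formulation are assumptions for illustration.

```python
from math import hypot

def triangle_cosines(a, b, c):
    """Return the cosines of the internal angles at vertices a, b and c.

    Each vertex is an (x, y) tuple. The cosine at a vertex is the dot
    product of the two edge vectors leaving it, divided by the product of
    their lengths. Raises ValueError if two vertices coincide.
    """
    def cos_at(p, q, r):
        # Edge vectors from p towards q and from p towards r.
        ux, uy = q[0] - p[0], q[1] - p[1]
        vx, vy = r[0] - p[0], r[1] - p[1]
        norm = hypot(ux, uy) * hypot(vx, vy)
        if norm == 0:
            raise ValueError("degenerate triangle: coincident vertices")
        # Clamp to guard against floating-point drift outside [-1, 1].
        return max(-1.0, min(1.0, (ux * vx + uy * vy) / norm))

    return cos_at(a, b, c), cos_at(b, a, c), cos_at(c, a, b)
```

Applying such a function to the same limb triangles of the target human body and of the reference human body would give two cosine sequences of equal length that can be compared element by element.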
Calculating the similarity S according to the cosine value sequence of the limb triangle internal angle of the target human body skeleton point, the cosine value sequence of the limb triangle internal angle of the reference human body skeleton point and the skeleton point coincidence ratio r as follows:
and determining the matching degree of the target human body and the reference human body according to the similarity S. The matching degree corresponds to the similarity value; for example, if the similarity S between the target human body and the reference human body is 95%, the matching degree between the target human body and the reference human body is also 95%.
The system obtains one or more matching regions when one or more target objects are in different postures; affine transforms the one or more matching regions to corresponding reference regions in a reference object; and determines the matching degree of the target object and the reference object. Being based on a bottom-up Deep Pose method, the system can directly detect the human skeleton points of all human bodies in the whole video; selecting one or more of these human bodies as target human bodies is therefore equivalent to directly detecting the human skeleton points of all target human bodies in the whole video. Human skeleton key points are selected from the human skeleton points, the skeleton key points of the target human body are affine transformed to the reference human body, and the sequence of skeleton key points that are commonly visible in the target human body and the reference human body after the affine transformation is found. The limb angles of the target human body and the reference human body are then calculated according to the commonly visible skeleton key point sequence; that is, the cosine value sequences of the internal angles of the respective limb triangles of the target human body skeleton points and the reference human body skeleton points are calculated according to the commonly visible skeleton key point sequence. Finally, the similarity is calculated according to the cosine value sequence of the limb triangle internal angles of the target human body skeleton points, the cosine value sequence of the limb triangle internal angles of the reference human body skeleton points, and the skeleton point coincidence ratio, and the matching degree of the target human body and the reference human body is determined according to the similarity.
The system is efficient and accurate, its time consumption is hardly affected by the number of people in the picture, and the frame rate in a multi-person video scene can exceed 20 FPS (frames per second). Therefore, the application effectively overcomes various defects in the prior art and has high industrial utilization value.
An embodiment of the present application further provides a target object matching device, including:
acquiring one or more matching areas of one or more target objects in different postures;
affine transforming the one or more matching regions to corresponding reference regions in a reference object;
and determining the matching degree of the target object and the reference object.
In this embodiment, the target object matching device executes the system or the method, and specific functions and technical effects may refer to the above embodiments, which are not described herein again.
An embodiment of the present application further provides a device, which may include: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the method of fig. 1. In practical applications, the device may serve as a terminal device or as a server. Examples of the terminal device may include: a smart phone, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (MPEG-4 Part 14) player, a laptop, a vehicle-mounted computer, a desktop computer, a set-top box, a smart television, a wearable device, and the like.
Embodiments of the present application also provide a non-transitory readable storage medium in which one or more modules (programs) are stored; when the one or more modules are applied to a device, the device is caused to execute the instructions of the steps included in the method of fig. 1 according to the embodiments of the present application.
Fig. 8 is a schematic diagram of a hardware structure of a terminal device according to an embodiment of the present application. As shown, the terminal device may include: an input device 1100, a first processor 1101, an output device 1102, a first memory 1103, and at least one communication bus 1104. The communication bus 1104 is used to implement communication connections between the elements. The first memory 1103 may include a high-speed RAM memory, and may also include a non-volatile storage NVM, such as at least one disk memory, and the first memory 1103 may store various programs for performing various processing functions and implementing the method steps of the present embodiment.
Alternatively, the first processor 1101 may be, for example, a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and the first processor 1101 is coupled to the input device 1100 and the output device 1102 through a wired or wireless connection.
Optionally, the input device 1100 may include a variety of input devices, such as at least one of a user-oriented user interface, a device-oriented device interface, a software programmable interface, a camera, and a sensor. Optionally, the device interface facing the device may be a wired interface for data transmission between devices, or may be a hardware plug-in interface (e.g., a USB interface, a serial port, etc.) for data transmission between devices; optionally, the user-facing user interface may be, for example, a user-facing control key, a voice input device for receiving voice input, and a touch sensing device (e.g., a touch screen with a touch sensing function, a touch pad, etc.) for receiving user touch input; optionally, the programmable interface of the software may be, for example, an entry for a user to edit or modify a program, such as an input pin interface or an input interface of a chip; the output devices 1102 may include output devices such as a display, audio, and the like.
In this embodiment, the processor of the terminal device includes functions for executing each module of the target object matching device; for specific functions and technical effects, reference may be made to the above embodiments, which are not described herein again.
Fig. 9 is a schematic hardware structure diagram of a terminal device according to an embodiment of the present application. FIG. 9 is a specific embodiment of the implementation of FIG. 8. As shown in fig. 9, the terminal device of the present embodiment may include a second processor 1201 and a second memory 1202.
The second processor 1201 executes the computer program code stored in the second memory 1202 to implement the method described in fig. 1 in the above embodiment.
The second memory 1202 is configured to store various types of data to support operations at the terminal device. Examples of such data include instructions for any application or method operating on the terminal device, such as messages, pictures, videos, and so forth. The second memory 1202 may include a Random Access Memory (RAM) and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
Optionally, a second processor 1201 is provided in the processing assembly 1200. The terminal device may further include: communication component 1203, power component 1204, multimedia component 1205, speech component 1206, input/output interfaces 1207, and/or sensor component 1208. The specific components included in the terminal device are set according to actual requirements, which is not limited in this embodiment.
The processing component 1200 generally controls the overall operation of the terminal device. The processing component 1200 may include one or more second processors 1201 to execute instructions to perform all or part of the steps of the target object matching method described above. Further, the processing component 1200 can include one or more modules that facilitate interaction between the processing component 1200 and other components. For example, the processing component 1200 can include a multimedia module to facilitate interaction between the multimedia component 1205 and the processing component 1200.
The power supply component 1204 provides power to the various components of the terminal device. The power components 1204 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the terminal device.
The multimedia components 1205 include a display screen that provides an output interface between the terminal device and the user. In some embodiments, the display screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the display screen includes a touch panel, the display screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The voice component 1206 is configured to output and/or input voice signals. For example, the voice component 1206 includes a Microphone (MIC) configured to receive external voice signals when the terminal device is in an operational mode, such as a voice recognition mode. The received speech signal may further be stored in the second memory 1202 or transmitted via the communication component 1203. In some embodiments, the speech component 1206 further comprises a speaker for outputting speech signals.
The input/output interface 1207 provides an interface between the processing component 1200 and peripheral interface modules, which may be click wheels, buttons, etc. These buttons may include, but are not limited to: a volume button, a start button, and a lock button.
The sensor component 1208 includes one or more sensors for providing various aspects of status assessment for the terminal device. For example, the sensor component 1208 may detect an open/closed state of the terminal device, relative positioning of the components, presence or absence of user contact with the terminal device. The sensor assembly 1208 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, including detecting the distance between the user and the terminal device. In some embodiments, the sensor assembly 1208 may also include a camera or the like.
The communication component 1203 is configured to facilitate communications between the terminal device and other devices in a wired or wireless manner. The terminal device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one embodiment, the terminal device may include a SIM card slot therein for inserting a SIM card therein, so that the terminal device may log onto a GPRS network to establish communication with the server via the internet.
As can be seen from the above, the communication component 1203, the voice component 1206, the input/output interface 1207 and the sensor component 1208 involved in the embodiment of fig. 9 can be implemented as the input device in the embodiment of fig. 8.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.