CN116310010B - Image recognition method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116310010B
Authority
CN
China
Prior art keywords
node
node data
data
image
object model
Prior art date
Legal status
Active
Application number
CN202310571910.4A
Other languages
Chinese (zh)
Other versions
CN116310010A
Inventor
刘耀
陈卫华
土旦强巴
Current Assignee
Beijing Qiwei Visual Media Technology Co., Ltd.
Original Assignee
Beijing Qiwei Visual Media Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Qiwei Visual Media Technology Co., Ltd.
Priority to CN202310571910.4A
Publication of CN116310010A
Application granted
Publication of CN116310010B
Status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00: Animation
    • G06T 13/20: 3D [Three Dimensional] animation
    • G06T 13/40: 3D animation of characters, e.g. humans, animals or virtual beings
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/23: Recognition of whole body movements, e.g. for sport training
    • G06V 40/25: Recognition of walking or running movements, e.g. gait recognition

Abstract

The application relates to an image recognition method, an image recognition device, electronic equipment and a storage medium. The image recognition method comprises the following steps: acquiring first image data and second image data, wherein the first image data and the second image data are whole images of the target object in adjacent frames; performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to acquire first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data; constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data; and acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.

Description

Image recognition method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of human body posture recognition technology, and in particular to an image recognition method, an image recognition device, an electronic device, and a storage medium.
Background
Human body posture recognition has developed alongside computer vision and is widely applied in practice; behavior detection and motion recognition projects based on various algorithms are increasingly deployed and widely studied in related fields. In behavior monitoring, such techniques are used not only to monitor, for example, bee colonies through image, temperature-and-humidity and sound information, but increasingly to monitor human behavior. As an important reference basis for behavior monitoring, human body posture recognition is widely applied in fields such as video capture and computer graphics.
At present, traditional human body posture recognition methods include the RMPE (Regional Multi-Person Pose Estimation) model and the Mask R-CNN (image instance segmentation) model, both of which are top-down detection methods, while OpenPose (a human body posture recognition algorithm) is a classic pose recognition project. These methods are mainly applied in fields such as behavior monitoring, posture correction and action classification, and also have important research significance and application value in smart homes, automatic driving, intelligent surveillance and other fields. Current methods for matching a two-dimensional skeleton character to the floor of a three-dimensional space based on image recognition usually adopt the recognized overall parameter coordinates as the initialization position parameters of the three-dimensional space, and perform no further optimization after the program starts. If the character only moves parallel to the picture plane, the overall deviation of the character is not large and the compositing effect may be acceptable; but if the real character jumps or walks back and forth, the virtual character model driven by the character suffers from floating, drifting, shaking and similar problems, so that the final composited picture looks strange and unrealistic.
Therefore, with conventional image recognition methods, when the real character does not move parallel to the picture during foot-to-ground matching, the compositing effect of the real character and the virtual character model in the virtual three-dimensional space is poor.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an image recognition method, apparatus, electronic device, and storage medium that improve the effect of compositing a real character with a virtual character model in a virtual three-dimensional space.
In a first aspect, the present application provides an image recognition method, the method comprising:
acquiring first image data and second image data, wherein the first image data and the second image data are whole images of a target object in adjacent frames;
performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data;
constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In one embodiment, performing node recognition on the first image data and the second image data through the open-source human body posture recognition algorithm includes:
acquiring an overall image of the target object through a camera;
and performing human body node recognition on the target object through the open-source human body posture recognition algorithm to acquire the overall character node information, the overall character wireframe position and size parameters, and the character foot node position information of the target object.
In one embodiment, acquiring the difference value between the first node data and the second node data, and, when the difference value does not exceed the first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located, includes:
establishing a three-dimensional coordinate system by taking a node corresponding to the lowest point of the target object as a coordinate origin;
and acquiring a first position coordinate and a second position coordinate, wherein the first position coordinate is a three-dimensional coordinate corresponding to the first node data, and the second position coordinate is a three-dimensional coordinate corresponding to the second node data.
In one embodiment, acquiring the difference value between the first node data and the second node data, and, when the difference value does not exceed the first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located, further includes:
acquiring a distance difference value between the first position coordinate and the second position coordinate;
and determining whether the distance difference value exceeds a second threshold value, and, when the distance difference value does not exceed the second threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In one embodiment, acquiring the difference value between the first node data and the second node data, and, when the difference value does not exceed the first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located, further includes:
outputting the first node data and the second node data to a rendering server for rendering;
matching and binding the rendered first node data and second node data to the zero height of the virtual space where the first object model is located, so as to obtain a second object model and a third object model;
wherein the second object model is the first object model matched with the first node data, and the third object model is the first object model matched with the second node data.
In one embodiment, the method further comprises:
when the position of the lowest point of the target object changes between adjacent frames, acquiring the height difference between the lowest points in the adjacent frames;
and determining whether the height difference exceeds a third threshold value, and, when the height difference does not exceed the third threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In one embodiment, the method further comprises:
acquiring the motion-smoothing delay frame count of the target object;
and determining whether the motion-smoothing delay frame count exceeds a fourth threshold value, and, when it does not exceed the fourth threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In a second aspect, the present application provides an image recognition apparatus, the apparatus comprising:
the image acquisition module is used for acquiring first image data and second image data, wherein the first image data and the second image data are whole images of a target object in adjacent frames;
the node recognition module is used for performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data;
the model construction module is used for constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and the matching and binding module is used for acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In a third aspect, the present application provides an electronic device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring first image data and second image data, wherein the first image data and the second image data are whole images of a target object in adjacent frames;
performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data;
constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
In a fourth aspect, the present application provides a computer storage medium storing a computer program which, when executed by a processor, performs the steps of:
acquiring first image data and second image data, wherein the first image data and the second image data are whole images of a target object in adjacent frames;
performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data;
constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
According to the image recognition method, device, electronic equipment and storage medium described above, image data of adjacent frames is acquired while the target object moves, and node recognition is performed on that data through an open-source human body posture recognition algorithm to obtain the node information of the lowest point in each frame. The difference value between the lowest-point nodes of adjacent frames is then obtained and compared with a preset threshold value. When the difference value does not exceed the preset threshold value, the lowest point of the target object, being within the allowed difference range, is matched and bound to the zero height of the virtual space where the pre-built virtual object model corresponding to the target object is located. This corrects the relationship between the lowest point of the target object and the virtual object model, achieves accurate matching between the real object and the virtual-space object model even when the target object moves non-parallel to the picture, and to a certain extent optimizes the compositing effect of the virtual object model driven by the real object in the virtual space.
Drawings
FIG. 1 is one of the flowcharts of the image recognition method of the present application;
FIG. 2 is a second flowchart of an image recognition method according to the present application;
FIG. 3 is a third flowchart of an image recognition method according to the present application;
FIG. 4 is a fourth flowchart of an image recognition method of the present application;
FIG. 5 is a fifth flowchart of an image recognition method of the present application;
FIG. 6 is a schematic diagram of the overall system architecture of an image recognition scheme according to one embodiment of the present application;
FIG. 7 is a schematic diagram of the workflow of the image recognition scheme in the present embodiment;
FIG. 8 is a schematic diagram of an image recognition device according to the present application;
fig. 9 is an internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, the abbreviations and key terms involved in the present application are explained:
SDI: the SDI interface is a serial digital interface for component video, and the HD-SDI interface is the high-definition digital input and output port of broadcast equipment, where HD denotes a high-definition signal. Because the SDI interface cannot directly transmit compressed digital signals, a compressed signal replayed by a digital video recorder, hard disk or other device must be decompressed before being output through the SDI interface into an SDI system. Repeated decompression and recompression inevitably degrades image quality and increases delay; for this reason, digital video recorders and nonlinear editing systems of various formats provide their own interfaces for directly transmitting compressed digital signals.
OpenPose: the OpenPose human body posture recognition project is an open-source library developed by Carnegie Mellon University (CMU) in the United States, based on convolutional neural networks and supervised learning and built on the Caffe framework. It can estimate human body poses, facial expressions, finger movements and so on, works for both single-person and multi-person scenes, and has excellent robustness. It was the world's first real-time multi-person two-dimensional pose estimation application based on deep learning, and similar deep-learning-based real-time pose estimation applications have proliferated since.
UDP: short for User Datagram Protocol, a connectionless transport-layer protocol in the OSI (Open Systems Interconnection) reference model that provides a simple, transaction-oriented, unreliable message transfer service; IETF RFC 768 is the formal specification of UDP.
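To make the glossary entry concrete, the following minimal sketch (illustrative only, not from the patent; it uses Python's standard socket module, and the payload and loopback address are arbitrary) shows the connectionless, per-datagram transfer that UDP provides:

```python
import socket

# Receiver: bind a datagram socket to an ephemeral loopback port.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
port = recv_sock.getsockname()[1]

# Sender: no connection setup, a single datagram is simply sent.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.sendto(b"node data", ("127.0.0.1", port))

# The receiver gets the whole datagram in one call.
data, addr = recv_sock.recvfrom(1024)
send_sock.close()
recv_sock.close()
```

Note that, unlike TCP, no handshake occurs and delivery is not guaranteed; this trade-off is what makes UDP attractive for low-latency streaming of per-frame node data.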
As shown in fig. 1, in one embodiment, an image recognition method includes the steps of:
step S110, acquiring first image data and second image data, wherein the first image data and the second image data are integral images of adjacent frames of a target object.
Specifically, the server acquires integral image data of adjacent frames in the action process of the target object.
Step S120, performing node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, wherein the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data.
Specifically, the server performs node recognition, through the open-source human body posture recognition algorithm, on the adjacent-frame image data acquired in step S110 during the motion of the target object, so as to obtain the node information of the corresponding lowest points before and after the motion of the target object.
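As an illustrative sketch (not from the patent), the lowest-point node of step S120 can be selected from a pose estimator's 2-D keypoints as follows; the (x, y, confidence) keypoint layout and the 0.3 confidence cutoff are assumptions, not specified by the patent:

```python
def lowest_node(keypoints, min_conf=0.3):
    """Return (index, (x, y)) of the lowest reliably detected node.

    Image y grows downward, so the lowest body point has the largest y.
    Keypoints below min_conf are ignored; returns None if none qualify.
    """
    best = None
    for i, (x, y, conf) in enumerate(keypoints):
        if conf < min_conf:
            continue
        if best is None or y > best[1][1]:
            best = (i, (x, y))
    return best

# head, left foot, right foot (right-foot detection too weak to trust)
kps = [(100.0, 50.0, 0.9), (95.0, 400.0, 0.8), (110.0, 402.0, 0.1)]
idx, pos = lowest_node(kps)
```

Running this on the example keypoints selects the left-foot node, since the low-confidence right foot is filtered out.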
Step S130, a first object model is constructed, wherein the first object model is a virtual object model corresponding to a target object and is used for matching first node data and second node data.
Specifically, the server builds a virtual object model corresponding to the target object, which is used for matching the node information of the adjacent-frame image data during the motion of the target object.
Step S140, acquiring a difference value between the first node data and the second node data, and, when the difference value does not exceed a first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
Specifically, the server acquires the movement deviation between the node information of adjacent frames during the motion of the target object, and, when the movement deviation does not exceed the preset allowable range, matches and binds the lowest-point node data of the adjacent frames to the zero height of the virtual space where the virtual object model is located, so that the virtual object model can be accurately matched to the real target object according to the node data, optimizing the compositing effect of the virtual object model driven by the real object in the virtual space.
It should be noted that when the open-source human body posture recognition algorithm performs node recognition on the target object, all node information of the target object can be obtained. After the node corresponding to the lowest point of the target object is matched and bound to the zero height of the virtual space, the other determined nodes of the target object can be correspondingly matched and bound to the virtual object in the virtual space, so that the whole target object is accurately matched to the virtual object model, which optimizes the compositing effect of the virtual object model driven by the target object in the virtual space.
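A minimal sketch of the step-S140 logic, under two assumptions not stated in the patent: the node data of each frame reduces to one (x, y, z) lowest-point position, and the difference value is a Euclidean distance:

```python
import math

def bind_to_zero_height(p1, p2, threshold):
    """Bind both lowest-point nodes to z = 0 when their displacement is small.

    p1, p2: (x, y, z) lowest-node positions in adjacent frames.
    Returns the two bound positions, or None when the inter-frame
    jump exceeds the threshold (binding is then skipped for this pair).
    """
    if math.dist(p1, p2) > threshold:
        return None
    # Matching binding: clamp both nodes to the virtual-space zero height.
    return (p1[0], p1[1], 0.0), (p2[0], p2[1], 0.0)
```

For small inter-frame motion the function returns the two nodes snapped to zero height; for a large jump it returns None, matching the "does not exceed a first threshold" condition.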
According to the image recognition method described above, node recognition is performed on adjacent-frame image data through an open-source human body posture recognition algorithm to obtain the node information of the lowest point in each frame. The difference value between the lowest-point nodes of adjacent frames is then compared with a preset threshold value; when it does not exceed the threshold, the lowest point of the target object is matched and bound to the zero height of the virtual space where the pre-built virtual object model corresponding to the target object is located. This corrects the relationship between the lowest point of the target object and the virtual object model, achieves accurate matching of the real object and the virtual-space object model during non-parallel movement, and to a certain extent optimizes the compositing effect of the virtual object model driven by the real object in the virtual space.
As shown in fig. 2, in one embodiment, performing node recognition on the first image data and the second image data through the open-source human body posture recognition algorithm includes the following steps:
step S121, acquiring an overall image of the target object by the camera.
Specifically, the server acquires an overall image of the target object through the camera.
Step S122, performing human body node recognition on the target object through the open-source human body posture recognition algorithm to acquire the overall character node information, the overall character wireframe position and size parameters, and the character foot node position information of the target object.
Specifically, the server performs human body node recognition on the target object through the open-source human body posture recognition algorithm, thereby obtaining the overall character node information, the overall character wireframe position and size parameters, and the character foot node position information of the target object.
As shown in fig. 3, in one embodiment, acquiring the difference value between the first node data and the second node data, and, when the difference value does not exceed the first threshold value, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located, includes the following steps:
in step S141, a three-dimensional coordinate system is established with the node corresponding to the lowest point of the target object as the origin of coordinates.
Specifically, the server establishes a virtual space three-dimensional coordinate system in the virtual space by taking a node corresponding to the lowest point of the target object as the origin of coordinates.
In step S142, a first position coordinate and a second position coordinate are obtained, where the first position coordinate is a three-dimensional coordinate corresponding to the first node data, and the second position coordinate is a three-dimensional coordinate corresponding to the second node data.
Specifically, the server acquires the virtual-space three-dimensional position coordinates corresponding to the nodes in adjacent frames while the target object moves.
Step S143, a distance difference between the first position coordinate and the second position coordinate is acquired.
Specifically, according to the virtual-space three-dimensional position coordinates of the adjacent-frame nodes obtained in step S142, the distance difference between the three-dimensional position coordinates of the adjacent frames is obtained through translation of the coordinates.
Step S144, determining whether the distance difference value exceeds a second threshold value, and, when it does not, matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located.
Specifically, the server automatically determines whether the distance difference between the three-dimensional position coordinates of the adjacent frames obtained in step S143 exceeds the set maximum allowable distance difference; when the distance difference is within the allowable range, the node data corresponding to the adjacent frames of the target object is matched and bound to the zero height of the virtual space where the virtual object model is located.
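Steps S141 to S144 can be sketched as follows, assuming (x, y, z) tuples for the node positions (a layout the patent does not specify). Note that translating both positions into the lowest-node coordinate system does not change their separation, so the threshold test is equivalent to comparing the raw coordinates directly:

```python
import math

def distance_after_translation(p1, p2, origin):
    """Express both positions in a coordinate system whose origin is the
    lowest node (step S141), then return their separation (step S143).
    Distance is translation-invariant, so this equals math.dist(p1, p2)."""
    q1 = tuple(a - o for a, o in zip(p1, origin))
    q2 = tuple(a - o for a, o in zip(p2, origin))
    return math.dist(q1, q2)

def within_threshold(p1, p2, origin, threshold):
    """Step S144: the binding precondition on the distance difference."""
    return distance_after_translation(p1, p2, origin) <= threshold
```

The names `origin` and `threshold` are illustrative; the patent only requires that the lowest node serve as the coordinate origin and that a second threshold bound the distance difference.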
Step S145, outputting the first node data and the second node data to a rendering server for rendering, and matching and binding the rendered first node data and second node data to the zero height of the virtual space where the first object model is located, so as to obtain a second object model and a third object model.
Specifically, while the target object moves, the server outputs the node data corresponding to adjacent frames to the rendering server for rendering, and the rendered node data is matched and bound to the zero height of the virtual space where the object model is located, thereby obtaining a virtual object model matched with the node data of the previous frame and a virtual object model matched with the node data of the following frame.
As shown in fig. 4, in one embodiment, the image recognition method of the present application further includes the following steps:
in step S410, when the position of the lowest point of the target object in the adjacent frame times changes, the height difference between the lowest points of the adjacent frame times is acquired.
Specifically, when the position of the lowest point of the target object in the adjacent frame changes, the server side obtains the height difference between the corresponding nodes of the lowest point of the target object in the adjacent frame in a virtual space three-dimensional coordinate system mode.
Step S420, judging whether the height difference exceeds a third threshold value, and carrying out matching binding on the first node data and the second node data and zero height of the virtual space where the first object model is located when the height difference does not exceed the third threshold value.
Specifically, the server side automatically determines whether the height difference obtained in step S410 exceeds the set maximum value of the allowed height difference, and when the actual height difference is within the allowed height difference range, matches and binds the node data of the adjacent frames of the target object to the zero height of the virtual space where the virtual object model is located.
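A hedged sketch of steps S410 and S420, assuming the third component of each node tuple is the height (the patent does not specify the data layout); unlike the distance check, only the vertical displacement is compared here:

```python
def bind_if_height_ok(n1, n2, height_threshold):
    """n1, n2: (x, y, z) lowest-point nodes in adjacent frames, z = height.

    Step S410: the height difference is |z1 - z2|.
    Step S420: bind both nodes to zero height only when the difference
    stays within the third threshold; otherwise skip binding (None).
    """
    if abs(n1[2] - n2[2]) > height_threshold:
        return None
    return (n1[0], n1[1], 0.0), (n2[0], n2[1], 0.0)
```

This guards against snapping the model to the floor during a jump, when the lowest point legitimately leaves zero height.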
As shown in fig. 5, in one embodiment, the image recognition method of the present application further includes the following steps:
Step S510, obtaining the motion smoothing delay frame number of the target object.
Specifically, the server acquires the motion smoothing delay frame number of the target object in real time.
Step S520, judging whether the motion smoothing delay frame number exceeds a fourth threshold value, and matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located when the motion smoothing delay frame number does not exceed the fourth threshold value.
Specifically, the server automatically determines whether the motion smoothing delay frame number obtained in step S510 exceeds the set maximum frame number, and matches and binds the node data corresponding to the adjacent frames of the target object to the zero height of the virtual space where the virtual object model is located when the actual motion smoothing delay frame number does not exceed the maximum.
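One way to realize steps S510 and S520 is as a per-frame counter gate. The class and field names below are assumptions, not the patent's implementation; the default of 20 frames matches the motion-smoothing delay used in the embodiment described below.

```python
# Illustrative sketch of steps S510-S520 (class and field names assumed):
# the motion-smoothing delay is counted in frames, and node data are only
# matched and bound while the count stays within the fourth threshold.

class SmoothDelayGate:
    def __init__(self, max_frames=20):
        self.max_frames = max_frames  # fourth threshold, in frames
        self.count = 0                # consecutive delayed/jittery frames

    def update(self, delayed: bool) -> bool:
        """Feed one frame's jitter flag; returns True while the current
        frame's node data may still be matched and bound."""
        self.count = self.count + 1 if delayed else 0
        return self.count <= self.max_frames
```

With `max_frames=2`, two consecutive jittery frames are still bound, the third is rejected, and a clean frame resets the counter.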
As shown in figs. 6 and 7, in one embodiment of an image recognition scheme, the camera is a Sony FS9 with the shutter set to 1/50 second, the image calculation server and the rendering server are both HP Z4 workstations, the network switch is an S5735S, and the image calculation server is equipped with a BMD4K12G video capture card.
First, the image calculation server is turned on and the camera shutter is set to 1/50 second. The image signal is fed from the camera to the image calculation server through an SDI cable, the rendering server is connected to a computer display through a DP cable, and the image calculation server and the rendering server are each connected to the switch with network cables. The image recognition software on the image calculation server is started to acquire the person image; the camera, at 4K (3840×2160) resolution, is placed horizontally and the frame rate is set to 50 frames per second. The real person stands 5 meters from the camera, a whole-body image of the person is shot, and the camera image signal is input to the image recognition computer through the SDI board port via the SDI cable.
The image recognition software built into the computer uses an open-source human body posture recognition algorithm (OpenPose) to perform node recognition of the human body, respectively obtaining the overall character node information, the overall character wire-frame position and size parameters, and the left and right foot node positions of the character in the image, and outputs the recognized character node data over the network to the other server, the rendering server, where the data are bound to a virtual-human three-dimensional model in Unreal Engine. The image recognition computer is paired with a square calibration plate of known parameters, 0.5 × 0.5 meters. The plate is placed horizontally, in turn, at six positions on the real ground along a straight line facing the camera, at 5, 6, 7, 8, 9 and 10 meters, and the person stands behind the calibration plate. Knowing the real size of the calibration plate and its size in pixels in the shot, the person's size in pixels can be used to back-calculate the person's height in real space.
The pixel sizes and positional relation of the calibration plate and the person on the shooting screen are obtained through an image algorithm, and the x, y and z axes of the three-dimensional space are determined according to a right-hand coordinate system. The character stands at 5 meters, and the lowest point of the feet locates the origin of the virtual space, with coordinates (0, 0, 0). The overall threshold of the space is set to 20 cm (when the distance difference of the person in space between every two frames is not more than 20 cm, the shaking state of the overall image is judged through the virtual three-dimensional coordinates of the person before and after the actions of the adjacent frames; if the distance difference exceeds 20 cm, the OpenPose result is not used for image recognition). The overall threshold of the person is set to 20 cm (when the distance difference of the person between every two frames is not more than 20 cm, the moving state of the overall image is judged through the virtual three-dimensional coordinates of the person before and after the actions of the adjacent frames; if it exceeds 20 cm, the OpenPose result is not used for image recognition). The foot ground-clearance threshold is set to 40 cm (when the ground-clearance height of the feet is not more than 40 cm, the off-ground state of the feet in the overall image is judged through the virtual three-dimensional coordinates of the person before and after the actions of the adjacent frames; if it exceeds 40 cm, the OpenPose result is not used for image recognition). The motion smoothing delay is set to 20 frames to eliminate the off-ground state produced by data jitter; if it exceeds 20 frames, the OpenPose result is not used for image recognition.
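The calibration-plate back-calculation described above reduces to a single scale factor. A minimal sketch, under the assumption that the person stands at the same depth as the plate (function names and the sample pixel counts are illustrative, not from the patent):

```python
# Minimal sketch of the calibration-plate back-calculation: the 0.5 x 0.5 m
# plate of known real size yields a metres-per-pixel scale at its depth,
# from which the person's real height is recovered from their pixel height.

def metres_per_pixel(plate_side_m, plate_side_px):
    """Scale at the calibration plate's depth."""
    return plate_side_m / plate_side_px

def person_height_m(person_px, plate_side_m, plate_side_px):
    """Back-calculate the person's real height from their pixel height,
    assuming the person stands at the same depth as the plate."""
    return person_px * metres_per_pixel(plate_side_m, plate_side_px)
```

For example, if the 0.5 m plate spans 250 pixels, the scale is 0.002 m/pixel, and a person 900 pixels tall is back-calculated as 1.8 m.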
With this image recognition scheme, foot-to-ground matching of a two-dimensional skeleton character in three-dimensional space based on image recognition can eliminate problems such as floating, drifting and jitter of the character model, so that the final composited picture is more realistic and stable. The scheme performs matching and binding of the character model to the ground across four kinds of data, space, time, position and data jitter, thereby guaranteeing the compositing effect of the virtual three-dimensional model driven in the three-dimensional virtual space by the nodes recognized from the two-dimensional character image.
As shown in fig. 8, in one embodiment, an image recognition apparatus includes an image acquisition module 810, a node recognition module 820, a model building module 830, and a matching binding module 840.
The image acquisition module 810 is configured to acquire first image data and second image data, where the first image data and the second image data are integral images of adjacent frames of the target object.
The node recognition module 820 is configured to perform node recognition on the first image data and the second image data through an open-source human body posture recognition algorithm to obtain first node data and second node data, where the first node data is the node information corresponding to the lowest point of the target object in the first image data, and the second node data is the node information corresponding to the lowest point of the target object in the second image data.
The model construction module 830 is configured to construct a first object model, where the first object model is a virtual object model corresponding to a target object, and is configured to match first node data and second node data.
The matching binding module 840 is configured to obtain a difference between the first node data and the second node data, and perform matching binding on the first node data and the second node data with zero height of the virtual space where the first object model is located when the difference does not exceed a first threshold.
In one embodiment, the node identification module 820 is specifically configured to:
an overall image of the target object is acquired by a camera.
And carrying out human body node recognition on the target object through an open source human body posture recognition algorithm to acquire the overall character node information, the overall character wire frame position and size parameters and the character foot node position information of the target object.
In one embodiment, the matching binding module 840 is specifically configured to:
and establishing a three-dimensional coordinate system by taking a node corresponding to the lowest point of the target object as a coordinate origin.
And acquiring a first position coordinate and a second position coordinate, wherein the first position coordinate is a three-dimensional coordinate corresponding to the first node data, and the second position coordinate is a three-dimensional coordinate corresponding to the second node data.
And acquiring a distance difference value between the first position coordinate and the second position coordinate.
And judging whether the distance difference value exceeds a second threshold value, and carrying out matching binding on the first node data and the second node data and zero height of the virtual space where the first object model is located when the distance difference value does not exceed the second threshold value.
And outputting the first node data and the second node data to a rendering server for rendering, and matching and binding the rendered first node data and second node data to the zero height of the virtual space where the first object model is located, so as to obtain a second object model and a third object model.
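The second-threshold check performed by the matching and binding module can be sketched as a Euclidean-distance test. A hedged sketch, with assumed names; the 20 cm default mirrors the jitter threshold from the embodiment above, and coordinates are (x, y, z) tuples in the virtual-space coordinate system:

```python
# Hedged sketch of the second-threshold check (names assumed): the distance
# between the three-dimensional coordinates of the lowest point in two
# adjacent frames is compared with the second threshold; binding at zero
# height proceeds only when the distance does not exceed it.

import math

def within_second_threshold(first_coord, second_coord, second_threshold=20.0):
    """first_coord, second_coord: (x, y, z) positions corresponding to the
    first and second node data, in centimetres."""
    return math.dist(first_coord, second_coord) <= second_threshold
```

A 5 cm displacement between frames passes the check; a 30 cm displacement is rejected as jitter and the node data are not bound.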
In one embodiment, the image recognition device of the present application further includes a first acquisition module, a first judgment module, a second acquisition module, and a second judgment module.
The first acquisition module is used for acquiring the height difference between the lowest points in adjacent frames when the position of the lowest point of the target object changes between adjacent frames.
The first judging module is used for judging whether the height difference exceeds a third threshold value, and carrying out matching binding on the first node data and the second node data and zero height of the virtual space where the first object model is located when the height difference does not exceed the third threshold value.
The second acquisition module is used for acquiring the motion smoothing delay frame number of the target object.
The second judging module is used for judging whether the motion smoothing delay frame number exceeds a fourth threshold value, and matching and binding the first node data and the second node data to the zero height of the virtual space where the first object model is located when the motion smoothing delay frame number does not exceed the fourth threshold value.
In one embodiment, a computer device is provided, which may be a smart terminal, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an image recognition method.
It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, an electronic device includes a memory storing a computer program and a processor that implements the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer storage medium stores a computer program which, when executed by a processor, performs the steps of the method embodiments described above.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the steps in the above-described method embodiments.
Those skilled in the art will appreciate that implementing all or part of the methods described above may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, a database or another medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily; for brevity of description, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in the combination of these technical features, such combinations should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention, and while they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of protection of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. An image recognition method, the method comprising:
acquiring first image data and second image data, wherein the first image data and the second image data are integral images of adjacent frames of a target object;
node identification is carried out on the first image data and the second image data through an open source human body posture identification algorithm so as to obtain first node data and second node data, wherein the first node data is node information corresponding to the lowest point of a target object in the first image data, and the second node data is node information corresponding to the lowest point of the target object in the second image data;
constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and acquiring a difference value between the first node data and the second node data, and carrying out matching binding on the first node data and the second node data and zero height of a virtual space where the first object model is located when the difference value does not exceed a first threshold value.
2. The image recognition method according to claim 1, wherein the node recognition of the first image data and the second image data by an open source human body posture recognition algorithm includes:
acquiring an overall image of the target object through a camera;
and carrying out human body node recognition on the target object through the open source human body posture recognition algorithm so as to acquire the overall character node information, the overall character wire frame position and size parameters and the character foot node position information of the target object.
3. The method for identifying an image according to claim 1, wherein the obtaining a difference between the first node data and the second node data, and when the difference does not exceed a first threshold, performing matching binding between the first node data and the second node data and a zero height of a virtual space where the first object model is located, includes:
establishing a three-dimensional coordinate system by taking a node corresponding to the lowest point of the target object as a coordinate origin;
and acquiring a first position coordinate and a second position coordinate, wherein the first position coordinate is a three-dimensional coordinate corresponding to the first node data, and the second position coordinate is a three-dimensional coordinate corresponding to the second node data.
4. The method for identifying an image according to claim 3, wherein the obtaining a difference between the first node data and the second node data, and when the difference does not exceed a first threshold, performing matching binding between the first node data and the second node data and a zero height of a virtual space where the first object model is located, further includes:
acquiring a distance difference value between the first position coordinate and the second position coordinate;
and judging whether the distance difference value exceeds a second threshold value, and carrying out matching binding on the first node data and the second node data and zero height of a virtual space where the first object model is located when the distance difference value does not exceed the second threshold value.
5. The method for identifying an image according to claim 4, wherein the obtaining a difference between the first node data and the second node data, and when the difference does not exceed a first threshold, performing matching binding between the first node data and the second node data and a zero height of a virtual space where the first object model is located, further includes:
outputting the first node data and the second node data to a rendering server for rendering;
binding the rendered first node data and second node data with zero height matching of the virtual space where the first object model is located so as to obtain a second object model and a third object model;
the second object model is a first object model matched with the first node data, and the third object model is a first object model matched with the second node data.
6. The image recognition method according to claim 1, characterized in that the method further comprises:
when the position of the lowest point of the target object changes between adjacent frames, acquiring the height difference between the lowest points in the adjacent frames;
and judging whether the height difference exceeds a third threshold value, and carrying out matching binding on the first node data and the second node data and zero height of a virtual space where the first object model is located when the height difference does not exceed the third threshold value.
7. The image recognition method according to claim 1, characterized in that the method further comprises:
acquiring the motion smooth delay frame number of the target object;
and judging whether the motion smooth delay frame number exceeds a fourth threshold value, and carrying out matching binding on the first node data and the second node data and zero height of a virtual space where the first object model is located when the motion smooth delay frame number does not exceed the fourth threshold value.
8. An image recognition apparatus, the apparatus comprising:
the image acquisition module is used for acquiring first image data and second image data, wherein the first image data is an integral image before the action of a target object, the second image data is an integral image after the action of the target object, and the first image data and the second image data are image data of adjacent frames;
the node identification module is used for carrying out node identification on the first image data and the second image data through an open source human body posture identification algorithm so as to obtain first node data and second node data, wherein the first node data is node information corresponding to the lowest point of a target object in the first image data, and the second node data is node information corresponding to the lowest point of the target object in the second image data;
the model construction module is used for constructing a first object model, wherein the first object model is a virtual object model corresponding to the target object and is used for matching the first node data and the second node data;
and the matching and binding module is used for acquiring the difference value between the first node data and the second node data, and carrying out matching and binding on the first node data and the second node data and the zero height of the virtual space where the first object model is located when the difference value does not exceed a first threshold value.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
10. A computer storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
CN202310571910.4A 2023-05-19 2023-05-19 Image recognition method, device, electronic equipment and storage medium Active CN116310010B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310571910.4A CN116310010B (en) 2023-05-19 2023-05-19 Image recognition method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116310010A CN116310010A (en) 2023-06-23
CN116310010B true CN116310010B (en) 2023-07-21

Family

ID=86781953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310571910.4A Active CN116310010B (en) 2023-05-19 2023-05-19 Image recognition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116310010B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116524217B (en) * 2023-07-03 2023-08-25 北京七维视觉传媒科技有限公司 Human body posture image matching method and device, electronic equipment and storage medium
CN116881515B (en) * 2023-09-07 2023-12-19 杭州行芯科技有限公司 Method and electronic equipment for comparing capacitance results solved by different algorithms

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019236588A1 (en) * 2018-06-04 2019-12-12 The Research Foundation For The State University Of New York System and method associated with expedient determination of location of one or more object(s) within a bounded perimeter of 3d space based on mapping and navigation to a precise poi destination using a smart laser pointer device
CN112711335A (en) * 2021-01-19 2021-04-27 腾讯科技(深圳)有限公司 Virtual environment picture display method, device, equipment and storage medium
CN112870704A (en) * 2021-03-18 2021-06-01 腾讯科技(深圳)有限公司 Game data processing method, device and storage medium

Also Published As

Publication number Publication date
CN116310010A (en) 2023-06-23

Similar Documents

Publication Publication Date Title
CN116310010B (en) Image recognition method, device, electronic equipment and storage medium
US9361510B2 (en) Efficient facial landmark tracking using online shape regression method
CN113196289B (en) Human body action recognition method, human body action recognition system and equipment
CN107831902B (en) Motion control method and device, storage medium and terminal
CN110264509A (en) Determine the method, apparatus and its storage medium of the pose of image-capturing apparatus
Barandiaran et al. Real-time optical markerless tracking for augmented reality applications
CN109920055A (en) Construction method, device and the electronic equipment of 3D vision map
WO2019035155A1 (en) Image processing system, image processing method, and program
WO2023071964A1 (en) Data processing method and apparatus, and electronic device and computer-readable storage medium
Liu et al. Collabar: Edge-assisted collaborative image recognition for mobile augmented reality
CN108958469B (en) Method for adding hyperlinks in virtual world based on augmented reality
WO2021190321A1 (en) Image processing method and device
KR20220008843A (en) Behavior prediction method and apparatus, gait recognition method and apparatus, electronic equipment and computer readable storage medium
CN108510520B (en) A kind of image processing method, device and AR equipment
US9747695B2 (en) System and method of tracking an object
WO2022052782A1 (en) Image processing method and related device
CN112528902B (en) Video monitoring dynamic face recognition method and device based on 3D face model
KR102404867B1 (en) Apparatus and method for providing wrap around view monitoring using 3D distance information
CN110909691B (en) Motion detection method, motion detection device, computer-readable storage medium, and computer device
WO2022201803A1 (en) Information processing device, information processing method, and program
CN115131407B (en) Robot target tracking method, device and equipment oriented to digital simulation environment
CN112069979A (en) Real-time action recognition man-machine interaction system
CN116524217B (en) Human body posture image matching method and device, electronic equipment and storage medium
CN115665361A (en) Video fusion method in virtual environment and online video conference communication method
CN112016508B (en) Face recognition method, device, system, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant