CN111582207A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111582207A
CN111582207A (application number CN202010403620.5A)
Authority
CN
China
Prior art keywords
target object
target
image
depth
reference node
Prior art date
Legal status
Granted
Application number
CN202010403620.5A
Other languages
Chinese (zh)
Other versions
CN111582207B (en)
Inventor
王灿
李杰锋
刘文韬
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority: CN202010403620.5A
Publication of CN111582207A
PCT application: PCT/CN2021/084625 (WO2021227694A1)
Taiwan application: TW110115664A (TWI777538B)
Application granted
Publication of CN111582207B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The present disclosure provides an image processing method, an image processing device, an electronic device, and a storage medium, wherein the method includes: identifying a target region of a target object in a first image; based on a target area corresponding to the target object, determining first two-dimensional position information of a plurality of key points representing the posture of the target object in the first image, the relative depth of each key point relative to a reference node of the target object, and the absolute depth of the reference node of the target object in a camera coordinate system; and determining three-dimensional position information of the plurality of key points of the target object in the camera coordinate system based on the first two-dimensional position information, the relative depth and the absolute depth of the target object. In this way, the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system can be obtained more accurately from the first two-dimensional position information of the target object, the relative depth relative to the reference node and the absolute depth of the reference node.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
Three-dimensional human body posture detection is widely applied in fields such as security, gaming and entertainment. Current three-dimensional human body posture detection methods generally identify first two-dimensional position information of human body key points in an image, and then convert the first two-dimensional position information into three-dimensional position information according to a predetermined position relationship between the human body key points.
The human body posture obtained by current three-dimensional human body posture detection methods therefore has a relatively large error.
Disclosure of Invention
The embodiment of the disclosure at least provides an image processing method and device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including: identifying a target region of a target object in the first image; determining, based on a target area corresponding to the target object, first two-dimensional position information of a plurality of key points representing the posture of the target object in the first image, the relative depth of each key point relative to a reference node of the target object, and the absolute depth of the reference node of the target object in a camera coordinate system; and determining three-dimensional position information of a plurality of key points of the target object in the camera coordinate system based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
In this way, the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system can be obtained more accurately. This three-dimensional position information represents the three-dimensional posture of the target object, so the higher the precision of the three-dimensional position information, the higher the precision of the obtained three-dimensional posture of the target object.
In a possible embodiment, the identifying a target region of a target object in the first image includes: performing feature extraction on the first image to obtain a feature map of the first image; and determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance based on the feature map, and determining a target area corresponding to the target object based on the target bounding boxes.
In this way, the target area of the target object is determined in two steps, so that the position of each target object in the first image can be accurately detected, which improves the completeness of the human body information and the detection accuracy in the subsequent key point detection process.
In a possible implementation, the determining, based on the target bounding box, a target area corresponding to the target object includes: determining a feature subgraph corresponding to each target bounding box based on a plurality of target bounding boxes and the feature map; and performing bounding box regression processing on the basis of the feature subgraphs respectively corresponding to the target bounding boxes to obtain target areas corresponding to the target objects.
In this way, the feature subgraphs corresponding to the target bounding boxes are subjected to bounding box regression processing, and the position of each target object in the first image can be accurately detected from the first image.
In a possible embodiment, determining an absolute depth of a reference node of the target object in a camera coordinate system based on a target area corresponding to the target object includes: determining a target feature map corresponding to the target object based on the target area corresponding to the target object and the first image; performing depth recognition processing based on the target feature map corresponding to the target object to obtain a normalized absolute depth of the reference node of the target object; and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
In this way, the following situation can be avoided as far as possible: if the absolute depth of the reference node were predicted directly based on the target feature map, different first images acquired by different cameras from the same view angle and the same position would yield different absolute depths because the internal parameters of the cameras differ.
In a possible implementation manner, performing depth recognition processing based on a target feature map corresponding to the target object to obtain a normalized absolute depth of a reference node of the target object includes: acquiring an initial depth image based on the first image; the pixel value of any first pixel point in the initial depth image represents an initial depth value of a second pixel point corresponding to the first pixel point in the first image in the camera coordinate system; determining second two-dimensional position information of a reference node corresponding to the target object in the first image based on a target feature map corresponding to the target object, and determining an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image; determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object.
In this way, the normalized absolute depth of the reference node obtained by the process can be made more accurate.
In one possible embodiment, the determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object includes: performing at least one stage of first convolution processing on the target feature map corresponding to the target object to obtain a feature vector of the target object; splicing the feature vector and the initial depth value to form a spliced vector, and performing at least one stage of second convolution processing on the spliced vector to obtain a corrected value of the initial depth value; and obtaining the normalized absolute depth based on the corrected value of the initial depth value and the initial depth value.
In one possible implementation, the image processing method is applied to a pre-trained neural network, and the neural network comprises three branch networks, namely a target detection network, a key point detection network and a depth prediction network, which are respectively used for obtaining the target area of the target object; the first two-dimensional position information and the relative depth of the target object; and the absolute depth.
Therefore, an end-to-end target object posture detection framework is formed by the three branch networks, namely the target detection network, the key point detection network and the depth prediction network; the first image is processed based on this framework to obtain the three-dimensional position information of the plurality of key points of each target object in the first image in the camera coordinate system, so the processing speed is higher and the recognition precision is higher.
In a second aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: an identification module for identifying a target region of a target object in the first image; a first detection module, configured to determine, based on a target area corresponding to the target object, first two-dimensional position information of a plurality of key points respectively representing a posture of the target object in the first image, a relative depth of each key point with respect to a reference node of the target object, and an absolute depth of the reference node of the target object in a camera coordinate system; a second detection module, configured to determine three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
In a possible embodiment, the identification module, when identifying a target region of a target object in the first image, is configured to: performing feature extraction on the first image to obtain a feature map of the first image; and determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance based on the feature map, and determining a target area corresponding to the target object based on the target bounding boxes.
In a possible implementation, the identification module, when determining the target area corresponding to the target object based on the target bounding box, is configured to: determining a feature subgraph corresponding to each target bounding box based on a plurality of target bounding boxes and the feature map; and performing bounding box regression processing on the basis of the feature subgraphs respectively corresponding to the target bounding boxes to obtain target areas corresponding to the target objects.
In a possible implementation, the first detection module, when determining an absolute depth of a reference node of the target object in a camera coordinate system based on a target area corresponding to the target object, is configured to: determining a target feature map corresponding to the target object based on the target area corresponding to the target object and the first image; performing depth recognition processing based on the target feature map corresponding to the target object to obtain a normalized absolute depth of the reference node of the target object; and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
In a possible implementation manner, when performing depth recognition processing based on a target feature map corresponding to the target object to obtain a normalized absolute depth of a reference node of the target object, the first detection module is configured to: acquiring an initial depth image based on the first image; the pixel value of any first pixel point in the initial depth image represents an initial depth value of a second pixel point corresponding to the first pixel point in the first image in the camera coordinate system; determining second two-dimensional position information of a reference node corresponding to the target object in the first image based on a target feature map corresponding to the target object, and determining an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image; determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object.
In one possible embodiment, the first detection module, when determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object, is configured to: performing at least one stage of first convolution processing on the target feature map corresponding to the target object to obtain a feature vector of the target object; splicing the feature vector and the initial depth value to form a spliced vector, and performing at least one stage of second convolution processing on the spliced vector to obtain a corrected value of the initial depth value; and obtaining the normalized absolute depth based on the corrected value of the initial depth value and the initial depth value.
In one possible implementation, a pre-trained neural network is deployed in the image processing apparatus, and the neural network comprises three branch networks, namely a target detection network, a key point detection network and a depth prediction network, which are respectively used for obtaining the target area of the target object; the first two-dimensional position information and the relative depth of the target object; and the absolute depth.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including: a processor and a memory coupled to each other, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions being executable by the processor when a computer device is run to implement the steps of the image processing method of the first aspect described above, or any one of the possible implementations of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the foregoing first aspect, or the image processing method in any one of the possible implementation manners of the first aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments are briefly described below. The drawings, which are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive additional related drawings from them without inventive effort.
Fig. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
Fig. 2 shows a flowchart of a specific method of identifying a target region of a target object in a first image provided by an embodiment of the present disclosure;
Fig. 3 shows a specific example of determining a target area corresponding to a target object based on a target bounding box provided by an embodiment of the present disclosure;
Fig. 4 shows a flowchart of a specific method of determining an absolute depth of a reference node of a target object in a camera coordinate system provided by an embodiment of the present disclosure;
Fig. 5 shows a flowchart of another specific method for obtaining a normalized absolute depth of a reference node provided by an embodiment of the present disclosure;
Fig. 6 shows a specific example of a target object pose detection framework provided by an embodiment of the present disclosure;
Fig. 7 shows another specific example of a target object pose detection framework provided by an embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Current three-dimensional human body posture detection methods generally identify first two-dimensional position information of human body key points in an image to be recognized through a neural network, and then convert the first two-dimensional position information of each human body key point into three-dimensional position information according to predetermined mutual position relationships among the key points (such as connection relationships between different key points, distance ranges between adjacent key points, and the like). However, human body shapes are complex and changeable, and the position relationships of the key points differ from one human body to another, so the three-dimensional human body posture obtained by this method has a relatively large error.
In addition, the accuracy of current three-dimensional human body posture detection methods depends on accurate estimation of the human body key points; however, due to occlusion by clothes, limbs and the like, the human body key points often cannot be accurately identified from the image, which further enlarges the error of the three-dimensional human body posture obtained by such methods.
The above drawbacks are the result of the inventors' practical and careful study; therefore, the discovery of the above problems and the solutions proposed below for them should both be regarded as contributions of the inventors made in the course of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Based on the above research, the present disclosure provides an image processing method and apparatus, by identifying a target area of a target object in a first image, and determining first two-dimensional position information of a plurality of key points representing a posture of the target object in the first image, a relative depth of each key point with respect to a reference node of the target object, and an absolute depth of the reference node of the target object in a camera coordinate system based on the target area, thereby more accurately obtaining three-dimensional position information of the plurality of key points of the target object in the camera coordinate system based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
To facilitate understanding of the present embodiment, first, an image processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the image processing method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the image processing method may be implemented by a processor calling computer readable instructions stored in a memory.
The following describes an image processing method provided by an embodiment of the present disclosure, taking an execution subject as a terminal device as an example.
Referring to fig. 1, a flowchart of an image processing method provided by the embodiment of the present disclosure is shown, where the method includes steps S101 to S103, where:
S101: identifying a target region of a target object in the first image;
S102: determining first two-dimensional position information of a plurality of key points representing the target object posture in the first image respectively, the relative depth of each key point relative to a reference node of the target object and the absolute depth of the reference node of the target object in a camera coordinate system based on a target area corresponding to the target object;
S103: determining three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
The following describes each of the above-mentioned steps S101 to S103 in detail.
I: in the above S101, at least one target object is included in the first image. The target object includes, for example, a person, an animal, a robot, a vehicle, or the like, which needs to be posed.
In a possible embodiment, when there is more than one target object included in the first image, the categories of the different target objects may be the same or different; for example, the plurality of target objects are all humans; or the plurality of target objects are all vehicles. As another example, the target object in the first image includes: humans and animals; or the target objects in the first image comprise people and vehicles, and the target object category is determined according to the actual application scene requirements.
The target region of the target object is a region in the first image including the target object.
Illustratively, referring to fig. 2, an embodiment of the present disclosure provides a specific method for identifying a target area of a target object in a first image, including:
S201: And performing feature extraction on the first image to obtain a feature map of the first image.
Here, feature extraction may be performed on the first image using, for example, a neural network to obtain a feature map of the first image.
S202: and determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance based on the feature map, and determining a target area corresponding to the target object based on the target bounding boxes.
In a specific implementation, a bounding box prediction algorithm such as RoIAlign or RoI Pooling may be used, for example, to obtain the plurality of target bounding boxes. For example, RoIAlign may traverse the plurality of candidate bounding boxes generated in advance and determine, for each candidate bounding box, a region-of-interest (ROI) value indicating whether the sub-image corresponding to that box belongs to any target object in the first image; the higher the ROI value, the higher the probability that the sub-image corresponding to the candidate bounding box belongs to a certain target object. After the ROI value corresponding to each candidate bounding box is determined, the plurality of target bounding boxes are selected from the candidate bounding boxes in descending order of their ROI values.
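A minimal sketch of this selection step in Python is given below; the function name, the score threshold and the top-k limit are illustrative assumptions and not part of the disclosure. It simply ranks the candidate bounding boxes by their ROI values and keeps the highest-scoring ones.

```python
import numpy as np

def select_target_boxes(candidate_boxes, roi_scores, top_k=100, score_thresh=0.5):
    """Keep the highest-scoring candidate bounding boxes as target bounding boxes.

    candidate_boxes: (M, 4) array of [x, y, w, h] candidates generated in advance.
    roi_scores:      (M,) ROI value per candidate; a higher value means the sub-image
                     inside the candidate is more likely to belong to a target object.
    """
    order = np.argsort(-roi_scores)                  # sort candidates from large to small ROI value
    keep = order[:top_k]                             # take the top_k candidates
    keep = keep[roi_scores[keep] >= score_thresh]    # optionally drop low-confidence candidates
    return candidate_boxes[keep], roi_scores[keep]
```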
The target bounding box is, for example, rectangular; the information of the target bounding box includes, for example: coordinates of any vertex in the target bounding box in the first image, and a height value and a width value of the target bounding box. Alternatively, the information of the target bounding box includes, for example: coordinates of any vertex in the target bounding box in the feature map of the first image, and a height value and a width value of the target bounding box.
After the plurality of target bounding boxes are obtained, target areas corresponding to all target objects in the first image are determined based on the plurality of target bounding boxes.
Referring to fig. 3, an embodiment of the present disclosure provides a specific example of determining a target area corresponding to a target object based on a target bounding box, where the specific example includes:
s301: and determining a characteristic subgraph corresponding to each target bounding box based on the plurality of target bounding boxes and the characteristic graph.
In a specific implementation, when the information of the target bounding box includes the coordinates, in the first image, of any vertex of the target bounding box, together with the height value and the width value of the target bounding box, the feature points in the feature map and the pixel points in the first image have a certain position mapping relationship; the feature subgraph corresponding to each target bounding box is then determined from the feature map of the first image according to the information of the target bounding box and the mapping relationship between the feature map and the first image.
In the case that the information of the target bounding box includes the coordinates of any vertex in the target bounding box in the feature map of the first image, and the height value and the width value of the target bounding box, the feature subgraphs respectively corresponding to the target bounding boxes can be determined from the feature map of the first image directly based on the target bounding box.
S302: and performing border frame regression processing on the basis of the feature subgraphs respectively corresponding to the target border frames to obtain target areas corresponding to the target objects.
Here, for example, a bounding box regression algorithm may be used to perform bounding box regression processing on the target bounding box based on the feature subgraph corresponding to each target bounding box, so as to obtain multiple bounding boxes including the complete target object. Each of the plurality of bounding boxes corresponds to a target object, and the region determined based on the bounding box corresponding to the target object is the target region of the corresponding target object.
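As a hedged illustration of this regression step (the patent does not fix the regression parameterization; the common centre-offset and log-scale form is assumed here), the predicted deltas can be applied to each target bounding box as follows:

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """Refine [x, y, w, h] boxes with regressed deltas (dx, dy, dw, dh).

    (dx, dy) shift the box centre in units of the box size; (dw, dh) rescale
    the width and height exponentially, a common bounding box regression form.
    """
    cx = boxes[:, 0] + 0.5 * boxes[:, 2] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + 0.5 * boxes[:, 3] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, w, h], axis=1)
```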
At this time, the number of the obtained target areas is consistent with the number of the target objects in the first image, and each target object corresponds to one target area; if the different target objects have a mutual occlusion positional relationship, the target areas corresponding to the target objects having the mutual occlusion relationship have a certain overlapping degree.
In another embodiment of the present disclosure, other target detection algorithms may also be employed to detect a target region of a target object in the first image. For example, a semantic segmentation algorithm is adopted to determine a semantic segmentation result of each pixel point in the first image, and then the positions of the pixel points belonging to different target objects in the first image are determined according to the semantic segmentation result; and then, solving a minimum bounding box according to pixel points belonging to the same target object, and determining an area corresponding to the minimum bounding box as a target area of the target object.
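The segmentation-based alternative can be sketched as follows; this is a simple illustration that assumes an instance-id mask is available from the segmentation step.

```python
import numpy as np

def target_region_from_mask(instance_mask, instance_id):
    """Minimum bounding box [x, y, w, h] around the pixels of one target object.

    instance_mask: (H, W) integer map in which each pixel stores the id of the
                   target object it belongs to (0 for background).
    """
    ys, xs = np.where(instance_mask == instance_id)
    if ys.size == 0:
        return None  # this target object does not appear in the first image
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    return np.array([x0, y0, x1 - x0 + 1, y1 - y0 + 1])
```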
II: in the above S102, the image coordinate system refers to a two-dimensional coordinate system established in both the length and width directions of the first image; the camera coordinate system is a three-dimensional coordinate system established in the direction of the optical axis of the camera and in two directions parallel to the optical axis and in the plane of the optical center of the camera.
The key points of the target object are position points that are located on the target object and that, when connected in a certain order, can represent the posture of the target object; for example, when the target object is a human body, the key points include the position points where the respective joints of the human body are located. In the image coordinate system, a position point is represented by a two-dimensional coordinate value; in the camera coordinate system, it is represented by a three-dimensional coordinate value.
In a specific implementation, a key point detection network may be used, for example, to perform key point detection processing based on the target feature map of the target object, so as to obtain the first two-dimensional position information of a plurality of key points of the target object in the first image and the relative depth of each key point with respect to the reference node of the target object. Here, the manner of obtaining the target feature map may refer to the description of S401 below, which is not repeated here.
The reference node is, for example, a position point of a predetermined part of the target object, and may be predetermined according to actual needs. For example, when the target object is a human body, the position point where the pelvis of the human body is located may be determined as the reference node, or any key point on the human body may be determined as the reference node, or the position point where the center of the chest and abdomen of the human body is located may be determined as the reference node; it can be set as required for the specific situation.
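As an illustrative sketch only (the patent does not specify how the key point detection network encodes its outputs; a heatmap-style head is assumed here), the first two-dimensional position information and the relative depths could be decoded as follows:

```python
import torch

def decode_keypoints(heatmaps, rel_depth_maps):
    """Decode 2D key point positions and per-key-point relative depths.

    heatmaps:       (J, H, W) one response map per key point (assumed output format).
    rel_depth_maps: (J, H, W) predicted depth of each key point relative to the reference node.
    """
    J, H, W = heatmaps.shape
    flat = heatmaps.view(J, -1)
    idx = flat.argmax(dim=1)                                   # peak location of each heatmap
    ys = torch.div(idx, W, rounding_mode="floor")
    xs = idx % W
    rel_depth = rel_depth_maps.view(J, -1)[torch.arange(J), idx]
    keypoints_2d = torch.stack([xs, ys], dim=1).float()        # first two-dimensional position information
    return keypoints_2d, rel_depth
```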
Referring to fig. 4, an embodiment of the present disclosure provides a specific method for determining an absolute depth of a reference node of a target object in a camera coordinate system based on a target area corresponding to the target object, including:
s401: and determining a target feature map corresponding to the target image based on the target area corresponding to the target object and the first image.
Here, the target feature map may be determined from the feature map based on the feature map of the first image obtained by feature extraction of the first image and the target region, for example.
Here, the feature points in the feature map extracted from the first image and the pixel points in the first image have a certain position mapping relationship; after the target area of each target object is obtained, the position of each target object in the feature map of the first image can be determined according to this position mapping relationship, and the target feature map of each target object can then be cropped from the feature map of the first image.
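A minimal sketch of this cropping step is given below; the down-sampling stride is an assumed parameter, since the patent only states that a position mapping relationship exists between the feature map and the first image.

```python
def crop_target_feature_map(feature_map, target_region, stride=4):
    """Cut the target feature map of one target object out of the feature map of the first image.

    feature_map:   (C, H/stride, W/stride) features extracted from the first image.
    target_region: [x, y, w, h] of the target area in first-image pixel coordinates.
    stride:        assumed down-sampling factor between the first image and its feature map.
    """
    x, y, w, h = [int(round(v / stride)) for v in target_region]
    return feature_map[:, y:y + max(h, 1), x:x + max(w, 1)]
```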
S402: and executing depth recognition processing based on the target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object.
Here, in a possible implementation, for example, a depth prediction network trained in advance may be used to perform a depth detection process on the target feature map, so as to obtain a normalized absolute depth of the reference node of the target object.
In another embodiment of the present disclosure, referring to fig. 5, another specific method for obtaining a normalized absolute depth of a reference node is further provided, including:
S501: acquiring an initial depth image based on the first image; the pixel value of any first pixel point in the initial depth image represents an initial depth value of a second pixel point corresponding to the first pixel point in the first image in the camera coordinate system.
Here, a depth prediction network may be employed to determine an initial depth value for each pixel point (second pixel point) in the first image; these initial depth values form the initial depth image of the first image. The pixel value of any pixel point (first pixel point) in the initial depth image is the initial depth value of the pixel point (second pixel point) at the corresponding position in the first image.
S502: and determining second two-dimensional position information of the reference node corresponding to the target object in the first image based on the target feature map corresponding to the target object, and determining an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image.
Here, the target feature map corresponding to the target object may be a target feature map determined for each target object from the feature map of the first image, for example, based on a target region corresponding to each target object.
After the target feature maps corresponding to the target objects are obtained, for example, a pre-trained reference node detection network may be used to determine second two-dimensional position information of the reference nodes of the target objects in the first image based on the target feature maps. Then, a pixel point corresponding to the reference node is determined from the initial depth image by using the second two-dimensional position information, and a pixel value of the pixel point determined from the initial depth image is determined as an initial depth value of the reference node.
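The look-up of the initial depth value can be sketched as follows (nearest-pixel sampling is assumed; bilinear sampling would also work):

```python
import numpy as np

def reference_node_initial_depth(initial_depth_image, ref_node_xy):
    """Initial depth value of a reference node, read from the initial depth image.

    initial_depth_image: (H, W) map whose pixel values are the initial depth values, in the
                         camera coordinate system, of the corresponding pixels of the first image.
    ref_node_xy:         (x, y) second two-dimensional position information of the reference node.
    """
    h, w = initial_depth_image.shape
    x = int(np.clip(round(ref_node_xy[0]), 0, w - 1))
    y = int(np.clip(round(ref_node_xy[1]), 0, h - 1))
    return float(initial_depth_image[y, x])
```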
S503: determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object.
For example, at least one stage of first convolution processing may be performed on the target feature map corresponding to the target object to obtain a feature vector of the target object; the feature vector and the initial depth value are spliced to form a spliced vector, and at least one stage of second convolution processing is performed on the spliced vector to obtain a corrected value of the initial depth value; and the normalized absolute depth is obtained based on the corrected value of the initial depth value and the initial depth value.
Here, for example, a neural network for adjusting the initial depth value may be adopted, the neural network including a plurality of convolutional layers; some of the convolutional layers are used to carry out the at least one stage of first convolution processing on the target feature map, and the other convolutional layers are used to perform the at least one stage of second convolution processing on the spliced vector so as to obtain the correction value; then, the initial depth value is adjusted according to the correction value to obtain the normalized absolute depth of the reference node of the target object.
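A hedged PyTorch-style sketch of such a correction network is given below; the channel sizes, layer counts and pooling choice are illustrative assumptions, and only the overall structure (first convolution processing, splicing, second convolution processing, addition) follows the description above.

```python
import torch
import torch.nn as nn

class DepthRefiner(nn.Module):
    """Regress a correction value for the initial depth value of the reference node."""

    def __init__(self, in_channels=256, feat_dim=128):
        super().__init__()
        # at least one stage of first convolution processing on the target feature map
        self.first_conv = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # at least one stage of second convolution processing on the spliced vector
        self.second_conv = nn.Sequential(
            nn.Conv1d(feat_dim + 1, feat_dim, 1), nn.ReLU(),
            nn.Conv1d(feat_dim, 1, 1),
        )

    def forward(self, target_feature_map, initial_depth):
        # target_feature_map: (B, C, H, W); initial_depth: (B, 1)
        feat = self.first_conv(target_feature_map).flatten(1)            # feature vector of the target object
        spliced = torch.cat([feat, initial_depth], dim=1).unsqueeze(-1)  # spliced vector
        correction = self.second_conv(spliced).squeeze(-1)               # corrected value of the initial depth value
        return initial_depth + correction                                # normalized absolute depth
```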
With reference to the foregoing S402, the specific method for determining the absolute depth of the reference node of the target object in the camera coordinate system provided by the embodiment of the present disclosure further includes:
s403: and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
In a specific implementation, different first images may be captured by different cameras during image processing; for different cameras, the corresponding camera internal parameters may differ. Here, the camera internal parameters include, for example: the focal length of the camera on the x-axis, the focal length of the camera on the y-axis, and the coordinates of the optical center of the camera on the x-axis and the y-axis in the camera coordinate system.
Because the camera internal parameters differ, even first images acquired from the same view angle and the same position can differ; if the absolute depth of the reference node were predicted directly based on the target feature map, the absolute depths obtained for different first images acquired by different cameras from the same view angle and the same position would be different.
To avoid the above situation, embodiments of the present disclosure directly predict the normalized absolute depth of the reference node, which is obtained without considering the camera internal parameters, and then recover the absolute depth of the reference node according to the camera internal parameters and the normalized absolute depth.
Illustratively, the normalized absolute depth and the absolute depth of the reference node of any target object satisfy the following formula (1) (the published equation appears only as an image; its form is reconstructed here from the variable definitions below):

$Z_{ref}^{norm} = Z_{ref} \cdot \sqrt{A_{Box} / (f_x \, f_y \, A_{RoI})}$    (1)

wherein $Z_{ref}^{norm}$ represents the normalized absolute depth of the reference node; $Z_{ref}$ represents the absolute depth of the reference node; $A_{Box}$ represents the area of the target region; $A_{RoI}$ represents the area of the target bounding box; and $(f_x, f_y)$ represents the focal lengths of the camera.

Illustratively, the camera coordinate system is a three-dimensional coordinate system with three coordinate axes x, y and z; the origin of the camera coordinate system is the optical center of the camera; the optical axis of the camera is the z-axis of the camera coordinate system; the plane that passes through the optical center and is perpendicular to the z-axis is the plane in which the x-axis and y-axis lie; $f_x$ is the focal length of the camera along the x-axis, and $f_y$ is the focal length of the camera along the y-axis.
It should be noted here that, in the above S202, the plurality of target bounding boxes are determined by RoIAlign, and the areas of the target bounding boxes are all equal.
Since the focal length of the camera is already determined when the camera acquires the first image, and the target area and the target bounding box are also already determined when the target area is determined, after the normalized absolute depth of the reference node is obtained, the absolute depth of the reference node of the target object is obtained according to the above formula (1).
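Assuming the area- and focal-length-scaled relation reconstructed above for formula (1) (the published equation is only available as an image, so this exact form is an assumption), the recovery step can be sketched as:

```python
import math

def absolute_depth_from_normalized(z_norm, fx, fy, area_box, area_roi):
    """Recover the absolute depth of the reference node from its normalized absolute depth.

    Assumes formula (1) has the form Z_norm = Z * sqrt(A_Box / (fx * fy * A_RoI)),
    i.e. the normalization removes the dependence on the camera internal parameters.
    """
    return z_norm * math.sqrt(fx * fy * area_roi / area_box)
```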
III: in the above S103, it is assumed that each target object includes J key points, and there are N target objects in the first image; wherein the three-dimensional poses of the N target objects are represented as:
Figure BDA0002490419640000104
wherein the three-dimensional posture of the mth target object
Figure BDA0002490419640000105
Can be expressed as:
Figure BDA0002490419640000106
wherein,
Figure BDA0002490419640000107
a coordinate value of a jth key point representing the mth target object in the x-axis direction in the camera coordinate system;
Figure BDA0002490419640000108
a coordinate value of a jth key point representing the mth target object in the y-axis direction in the camera coordinate system;
Figure BDA0002490419640000109
and a coordinate value of the jth key point representing the mth target object in the z-axis direction in the camera coordinate system.
The target areas of the N target objects are represented as:

$\{B_m\}_{m=1}^{N}$

wherein the target area $B_m$ of the m-th target object is expressed as:

$B_m = (x_{tl}^m, y_{tl}^m, w^m, h^m)$

Here, $x_{tl}^m$ and $y_{tl}^m$ are the coordinate values of the vertex at the upper left corner of the target area, and $w^m$ and $h^m$ respectively represent the width and height values of the target area.
The three-dimensional postures of the N target objects relative to their reference nodes are represented as:

$\{\tilde{P}_m\}_{m=1}^{N}$

wherein the three-dimensional posture $\tilde{P}_m$ of the m-th target object relative to its reference node is expressed as:

$\tilde{P}_m = \{(u_j^m, v_j^m, d_j^m)\}_{j=1}^{J}$

wherein $u_j^m$ represents the coordinate value of the j-th key point of the m-th target object on the x-axis of the image coordinate system, and $v_j^m$ represents its coordinate value on the y-axis of the image coordinate system; that is, $(u_j^m, v_j^m)$ is the two-dimensional coordinate value of the j-th key point of the m-th target object in the image coordinate system. $d_j^m$ represents the relative depth of the j-th key point of the m-th target object with respect to the reference node of the m-th target object.
The three-dimensional posture of the m-th target object is obtained by back projection using the internal reference matrix K of the camera; the three-dimensional coordinate information of the j-th key point of the m-th target object satisfies the following formula (2):

$(X_j^m, Y_j^m, Z_j^m)^T = (d_j^m + Z_{ref}^m) \cdot K^{-1} \, (u_j^m, v_j^m, 1)^T$    (2)

wherein $Z_{ref}^m$ represents the absolute depth value of the reference node of the m-th target object in the camera coordinate system. It should be noted that $Z_{ref}^m$ is obtained based on the example corresponding to formula (1) above.

The internal reference matrix K is determined, for example, by the parameters $(f_x, f_y, c_x, c_y)$, wherein $f_x$ is the focal length of the camera on the x-axis in the camera coordinate system; $f_y$ is the focal length of the camera on the y-axis in the camera coordinate system; $c_x$ is the coordinate value of the optical center of the camera on the x-axis in the camera coordinate system; and $c_y$ is the coordinate value of the optical center of the camera on the y-axis in the camera coordinate system.
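A short sketch of the back projection in formula (2), using the standard pinhole relation between image coordinates and camera coordinates (the variable names are illustrative):

```python
import numpy as np

def back_project_keypoints(keypoints_2d, rel_depth, ref_abs_depth, K):
    """Back-project key points of one target object into the camera coordinate system.

    keypoints_2d:  (J, 2) first two-dimensional positions (u, v) in the image coordinate system.
    rel_depth:     (J,) relative depth of each key point with respect to the reference node.
    ref_abs_depth: absolute depth of the reference node in the camera coordinate system.
    K:             (3, 3) internal reference matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    z = rel_depth + ref_abs_depth                                         # absolute depth of every key point
    uv1 = np.concatenate([keypoints_2d, np.ones((keypoints_2d.shape[0], 1))], axis=1)
    rays = uv1 @ np.linalg.inv(K).T                                       # K^-1 [u, v, 1]^T per key point
    return rays * z[:, None]                                              # (J, 3): (X, Y, Z) per key point
```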
Through the above process, the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system can be obtained; for the m-th target object, its three-dimensional posture is represented by the three-dimensional position information corresponding to its J key points.
The embodiment of the disclosure identifies the target area of the target object in the first image, and determines first two-dimensional position information of a plurality of key points representing the posture of the target object in the first image, the relative depth of each key point relative to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system based on the target area, so as to more accurately obtain the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
In another embodiment of the present disclosure, another image processing method is further provided, where the image processing method is applied to a pre-trained neural network.
The neural network comprises three branch networks, namely a target detection network, a key point detection network and a depth prediction network, which are respectively used for obtaining the target area of the target object; the first two-dimensional position information and the relative depth of the target object; and the absolute depth.
The specific working processes of the three branch networks can be shown in the above embodiments, and are not described herein again.
According to the method and the device, an end-to-end target object posture detection framework is formed by the three branch networks, namely the target detection network, the key point detection network and the depth prediction network; the first image is processed based on this framework to obtain the three-dimensional position information of the plurality of key points of each target object in the first image in the camera coordinate system, so the processing speed is higher and the recognition precision is higher.
Referring to fig. 6, an embodiment of the present disclosure further provides a specific example of a target object posture detection framework, including:
the method comprises three network branches of a target detection network, a key point detection network and a depth prediction network;
the target detection network extracts the features of the first image to obtain a feature map of the first image; then, according to the first feature map, determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance by adopting RoIAlign; and executing the bounding box regression processing on the plurality of target bounding boxes to obtain a target area corresponding to each target object. And transmitting the target characteristic graph corresponding to the target area to a key point detection network and a depth prediction network.
The key point detection network determines, based on each target feature map, the first two-dimensional position information of the plurality of key points representing the posture of the target object in the first image and the relative depth of each key point relative to the reference node of the target object. For each target feature map, the first two-dimensional position information and the relative depth of each key point together form the three-dimensional posture of the target object in that target feature map. The three-dimensional posture at this point is a three-dimensional posture relative to the object itself.
And the depth prediction network determines the absolute depth of the reference node of the target object in the camera coordinate system based on the target feature map.
And finally, determining three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively according to the first two-dimensional position information, the relative depth and the absolute depth of the reference node of the target object. For each target object, the three-dimensional position information of the plurality of key points on the target object in the camera coordinate system respectively constitutes the three-dimensional posture of the target object in the camera coordinate system. The three-dimensional posture at this time is a three-dimensional posture with reference to the camera.
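Putting the three branches together, the flow of fig. 6 can be sketched at a high level as follows; the network interfaces are illustrative assumptions, and back_project_keypoints is the helper sketched above.

```python
def detect_3d_poses(first_image, K, backbone, target_det_net, keypoint_net, depth_net):
    """End-to-end flow: target detection, key point detection and depth prediction branches."""
    feature_map = backbone(first_image)                          # feature extraction on the first image
    target_regions, target_feats = target_det_net(feature_map)   # RoIAlign + bounding box regression
    poses_3d = []
    for feat, region in zip(target_feats, target_regions):
        kps_2d, rel_depth = keypoint_net(feat)                   # posture relative to the object itself
        ref_abs_depth = depth_net(feat, region, K)               # absolute depth of the reference node
        poses_3d.append(back_project_keypoints(kps_2d, rel_depth, ref_abs_depth, K))
    return poses_3d                                              # one (J, 3) pose per target object in camera coordinates
```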
Referring to fig. 7, a specific example of another target object posture detection framework is further provided in the embodiments of the present disclosure, including:
a target detection network, a key point detection network and a depth prediction network;
the target detection network extracts the features of the first image to obtain a feature map of the first image; then, according to the first feature map, determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance by adopting RoIAlign; and executing the bounding box regression processing on the plurality of target bounding boxes to obtain a target area corresponding to each target object. And transmitting the target characteristic graph corresponding to the target area to a key point detection network and a depth prediction network.
The key point detection network determines, based on each target feature map, the first two-dimensional position information of the plurality of key points representing the posture of the target object in the first image and the relative depth of each key point relative to the reference node of the target object. For each target feature map, the first two-dimensional position information and the relative depth of each key point together form the three-dimensional posture of the target object in that target feature map. The three-dimensional posture at this point is a three-dimensional posture relative to the object itself.
The depth prediction network acquires an initial depth image based on the first image; determines second two-dimensional position information of the reference node corresponding to the target object in the first image based on the target feature map corresponding to the target object, and determines an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image; performs at least one stage of first convolution processing on the target feature map corresponding to the target object to obtain a feature vector of the target object; splices the feature vector and the initial depth value of the reference node to form a spliced vector, and performs at least one stage of second convolution processing on the spliced vector to obtain a corrected value of the initial depth value; and adds the corrected value and the initial depth value of the reference node to obtain the normalized absolute depth value of the reference node.
Then, the absolute depth value of the reference node is recovered through the formula (1), and finally, the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system is determined according to the first two-dimensional position information, the relative depth of the target object and the absolute depth of the reference node. For each target object, the three-dimensional position information of the plurality of key points on the target object in the camera coordinate system respectively constitutes the three-dimensional posture of the target object in the camera coordinate system. The three-dimensional posture at this time is a three-dimensional posture with reference to the camera.
Through any one of the two target object posture detection frameworks, the three-dimensional position information of the plurality of key points of each target object in the first image in the camera coordinate system can be obtained, the processing speed is higher, and the identification precision is higher.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, the embodiments of the present disclosure further provide an image processing apparatus corresponding to the image processing method. Since the principle by which the apparatus solves the problem is similar to that of the image processing method described above, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 8, a schematic diagram of an image processing apparatus provided in an embodiment of the present disclosure is shown, where the apparatus includes: an identification module 81, a first detection module 82, a second detection module 83; wherein,
an identification module 81 for identifying a target area of a target object in the first image;
a first detection module 82, configured to determine, based on a target area corresponding to the target object, first two-dimensional position information of a plurality of key points respectively representing a posture of the target object in the first image, a relative depth of each key point with respect to a reference node of the target object, and an absolute depth of the reference node of the target object in a camera coordinate system;
a second detection module 83, configured to determine three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
In one possible embodiment, the identification module 81, when identifying a target region of a target object in the first image, is configured to:
performing feature extraction on the first image to obtain a feature map of the first image;
and determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance based on the feature map, and determining a target area corresponding to the target object based on the target bounding boxes.
In a possible implementation manner, the identifying module 81, when determining the target area corresponding to the target object based on the target bounding box, is configured to:
determining a feature sub-map corresponding to each target bounding box based on the plurality of target bounding boxes and the feature map;
and performing bounding box regression processing based on the feature sub-maps respectively corresponding to the target bounding boxes, to obtain the target areas corresponding to the target objects.
In a possible implementation, the first detection module 82, when determining the absolute depth of the reference node of the target object in the camera coordinate system based on the target area corresponding to the target object, is configured to:
determining a target feature map corresponding to the target object based on a target area corresponding to the target object and the first image;
performing depth recognition processing based on a target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object;
and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
In a possible implementation manner, when performing depth recognition processing based on a target feature map corresponding to the target object to obtain a normalized absolute depth of a reference node of the target object, the first detection module 82 is configured to:
acquiring an initial depth image based on the first image; the pixel value of any first pixel point in the initial depth image represents an initial depth value of a second pixel point corresponding to the first pixel point in the first image in the camera coordinate system;
determining second two-dimensional position information of a reference node corresponding to the target object in the first image based on a target feature map corresponding to the target object, and determining an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image;
determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object.
In one possible implementation, the first detection module 82, when determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object, is configured to:
performing at least one stage of first convolution processing on a target feature map corresponding to the target object to obtain a feature vector of the target object;
splicing the feature vector and the initial depth value to form a spliced vector, and performing at least one stage of second convolution processing on the spliced vector to obtain a corrected value of the initial depth value;
and obtaining the normalized absolute depth based on the corrected value of the initial depth value and the initial depth value.
In one possible implementation, a pre-trained neural network is deployed in the image processing apparatus, and the neural network includes three branch networks, namely a target detection network, a key point detection network and a depth prediction network, which are respectively used for obtaining the target area of the target object, the first two-dimensional position information and the relative depth of the target object, and the absolute depth.
In the embodiments of the present disclosure, the target area of the target object in the first image is identified, and based on the target area, the first two-dimensional position information of a plurality of key points representing the posture of the target object in the first image, the relative depth of each key point with respect to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system are determined, so that the three-dimensional position information of the plurality of key points of the target object in the camera coordinate system can be obtained more accurately based on the first two-dimensional position information, the relative depth, and the absolute depth.
In addition, in the embodiments of the present disclosure, an end-to-end target object posture detection framework is formed by the three branch networks, namely the target detection network, the key point detection network and the depth prediction network; the first image is processed based on this framework to obtain the three-dimensional position information of the plurality of key points of each target object in the first image in the camera coordinate system, with a faster processing speed and higher recognition accuracy.
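Putting the three branches together, an end-to-end pipeline could, in outline, be wired as in the sketch below, which reuses the illustrative classes defined earlier; the decoding of the regressed boxes and the mapping of key point coordinates back to first-image pixels are simplified away and are assumptions of the example.

def detect_poses(first_image, candidate_boxes, initial_depth_image, camera_K,
                 detector, keypoint_branch, depth_branch):
    # Target detection branch: feature map, pooled target feature maps, box offsets.
    feature_map, target_rois, _ = detector(first_image, candidate_boxes)
    # Key point detection branch: 2D positions and relative depths per target object.
    kp_xy, rel_depth = keypoint_branch(target_rois)
    # For brevity, assume kp_xy has already been mapped back to first-image pixel
    # coordinates, and take each target's reference node to be its first key point.
    ref_xy = kp_xy[:, 0, :].round().long()
    # Depth prediction branch: normalized absolute depth of each reference node.
    norm_abs_depth = depth_branch(target_rois, initial_depth_image, ref_xy)
    poses = []
    for i in range(kp_xy.shape[0]):  # one camera-coordinate 3D posture per target object
        poses.append(keypoints_to_camera_coords(kp_xy[i].detach().numpy(),
                                                rel_depth[i].detach().numpy(),
                                                float(norm_abs_depth[i]),
                                                camera_K))
    return poses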
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
The embodiments of the present disclosure further provide a computer device 10. As shown in fig. 9, which is a schematic structural diagram of the computer device 10 provided in an embodiment of the present disclosure, the computer device 10 includes:
a processor 11 and a memory 12; the memory 12 stores machine-readable instructions executable by the processor 11, and when the computer device runs, the machine-readable instructions are executed by the processor 11 to perform the following steps:
identifying a target region of a target object in the first image;
determining first two-dimensional position information of a plurality of key points representing the target object posture in the first image respectively, the relative depth of each key point relative to a reference node of the target object and the absolute depth of the reference node of the target object in a camera coordinate system based on a target area corresponding to the target object;
determining three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
For the specific execution process of the instruction, reference may be made to the steps of the image processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, performs the steps of the image processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the steps of the image processing method described in the above method embodiments, to which reference may be made for details that are not repeated here.
The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The corresponding computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific implementations of the present disclosure, used to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may still, within the technical scope of the present disclosure, modify the technical solutions described in the foregoing embodiments or easily conceive of changes or equivalent substitutions of some technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be covered by its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
identifying a target region of a target object in the first image;
determining first two-dimensional position information of a plurality of key points representing the target object posture in the first image respectively, the relative depth of each key point relative to a reference node of the target object and the absolute depth of the reference node of the target object in a camera coordinate system based on a target area corresponding to the target object;
determining three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
2. The image processing method of claim 1, wherein the identifying a target region of a target object in the first image comprises:
performing feature extraction on the first image to obtain a feature map of the first image;
and determining a plurality of target bounding boxes from a plurality of candidate bounding boxes generated in advance based on the feature map, and determining a target area corresponding to the target object based on the target bounding boxes.
3. The image processing method according to claim 2, wherein the determining a target region corresponding to the target object based on the target bounding box comprises:
determining a feature sub-map corresponding to each target bounding box based on the plurality of target bounding boxes and the feature map;
and performing bounding box regression processing based on the feature sub-maps respectively corresponding to the target bounding boxes, to obtain target areas corresponding to the target objects.
4. The image processing method according to any one of claims 1 to 3, wherein determining the absolute depth of the reference node of the target object in the camera coordinate system based on the target region corresponding to the target object comprises:
determining a target feature map corresponding to the target object based on a target area corresponding to the target object and the first image;
performing depth recognition processing based on a target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object;
and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
5. The image processing method according to claim 4, wherein the performing depth recognition processing based on the target feature map corresponding to the target object to obtain a normalized absolute depth of the reference node of the target object includes:
acquiring an initial depth image based on the first image; the pixel value of any first pixel point in the initial depth image represents an initial depth value of a second pixel point corresponding to the first pixel point in the first image in the camera coordinate system;
determining second two-dimensional position information of a reference node corresponding to the target object in the first image based on a target feature map corresponding to the target object, and determining an initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image;
determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object.
6. The method according to claim 5, wherein determining a normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object comprises:
performing at least one stage of first convolution processing on a target feature map corresponding to the target object to obtain a feature vector of the target object;
splicing the feature vector and the initial depth value to form a spliced vector, and performing at least one stage of second convolution processing on the spliced vector to obtain a corrected value of the initial depth value;
and obtaining the normalized absolute depth based on the corrected value of the initial depth value and the initial depth value.
7. The image processing method according to any one of claims 1 to 6, wherein the image processing method is applied to a pre-trained neural network, and the neural network comprises three branch networks, namely a target detection network, a key point detection network and a depth prediction network, which are respectively used for obtaining the target area of the target object, the first two-dimensional position information and the relative depth of the target object, and the absolute depth.
8. An image processing apparatus characterized by comprising:
an identification module for identifying a target region of a target object in the first image;
a first detection module, configured to determine, based on a target area corresponding to the target object, first two-dimensional position information of a plurality of key points respectively representing a posture of the target object in the first image, a relative depth of each key point with respect to a reference node of the target object, and an absolute depth of the reference node of the target object in a camera coordinate system;
a second detection module, configured to determine three-dimensional position information of a plurality of key points of the target object in the camera coordinate system respectively based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
9. A computer device, comprising: a processor and a memory connected to each other, the memory storing machine-readable instructions executable by the processor, wherein, when the computer device runs, the machine-readable instructions are executed by the processor to implement the steps of the image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the image processing method according to any one of claims 1 to 7.
CN202010403620.5A 2020-05-13 2020-05-13 Image processing method, device, electronic equipment and storage medium Active CN111582207B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202010403620.5A CN111582207B (en) 2020-05-13 2020-05-13 Image processing method, device, electronic equipment and storage medium
PCT/CN2021/084625 WO2021227694A1 (en) 2020-05-13 2021-03-31 Image processing method and apparatus, electronic device, and storage medium
TW110115664A TWI777538B (en) 2020-05-13 2021-04-29 Image processing method, electronic device and computer-readable storage media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010403620.5A CN111582207B (en) 2020-05-13 2020-05-13 Image processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111582207A true CN111582207A (en) 2020-08-25
CN111582207B CN111582207B (en) 2023-08-15

Family

ID=72110786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010403620.5A Active CN111582207B (en) 2020-05-13 2020-05-13 Image processing method, device, electronic equipment and storage medium

Country Status (3)

Country Link
CN (1) CN111582207B (en)
TW (1) TWI777538B (en)
WO (1) WO2021227694A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163480A (en) * 2020-09-16 2021-01-01 北京邮电大学 Behavior identification method and device
CN112528831A (en) * 2020-12-07 2021-03-19 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN112907517A (en) * 2021-01-28 2021-06-04 上海商汤智能科技有限公司 Image processing method and device, computer equipment and storage medium
CN113344998A (en) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, computer equipment and storage medium
CN113470112A (en) * 2021-06-30 2021-10-01 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and terminal
CN113610967A (en) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 Three-dimensional point detection method and device, electronic equipment and storage medium
CN113610966A (en) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium
WO2021227694A1 (en) * 2020-05-13 2021-11-18 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device, and storage medium
CN113743234A (en) * 2021-08-11 2021-12-03 浙江大华技术股份有限公司 Target action determining method, target action counting method and electronic device
US20220076448A1 (en) * 2020-09-08 2022-03-10 Samsung Electronics Co., Ltd. Method and apparatus for pose identification
CN114764860A (en) * 2021-01-14 2022-07-19 北京图森智途科技有限公司 Feature extraction method and device, computer equipment and storage medium
CN116386016A (en) * 2023-05-22 2023-07-04 杭州睿影科技有限公司 Foreign matter treatment method and device, electronic equipment and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB202009515D0 (en) 2020-06-22 2020-08-05 Ariel Ai Ltd 3D object model reconstruction from 2D images
GB2598452B (en) * 2020-06-22 2024-01-10 Snap Inc 3D object model reconstruction from 2D images
CN114354618A (en) * 2021-12-16 2022-04-15 浙江大华技术股份有限公司 Method and device for detecting welding seam
CN114782547B (en) * 2022-04-13 2024-08-20 北京爱笔科技有限公司 Three-dimensional coordinate determination method and device
CN115063789B (en) * 2022-05-24 2023-08-04 中国科学院自动化研究所 3D target detection method and device based on key point matching
WO2023236008A1 (en) * 2022-06-06 2023-12-14 Intel Corporation Methods and apparatus for small object detection in images and videos
CN114972958B (en) * 2022-07-27 2022-10-04 北京百度网讯科技有限公司 Key point detection method, neural network training method, device and equipment
CN115018918B (en) * 2022-08-04 2022-11-04 南昌虚拟现实研究院股份有限公司 Three-dimensional coordinate determination method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120099782A1 (en) * 2010-10-20 2012-04-26 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20180018503A1 (en) * 2015-12-11 2018-01-18 Tencent Technology (Shenzhen) Company Limited Method, terminal, and storage medium for tracking facial critical area
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
CN110378308A (en) * 2019-07-25 2019-10-25 电子科技大学 The improved harbour SAR image offshore Ship Detection based on Faster R-CNN

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472753B (en) * 2018-10-30 2021-09-07 北京市商汤科技开发有限公司 Image processing method and device, computer equipment and computer storage medium
CN111582207B (en) * 2020-05-13 2023-08-15 北京市商汤科技开发有限公司 Image processing method, device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120099782A1 (en) * 2010-10-20 2012-04-26 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20180018503A1 (en) * 2015-12-11 2018-01-18 Tencent Technology (Shenzhen) Company Limited Method, terminal, and storage medium for tracking facial critical area
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN108460338A (en) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 Estimation method of human posture and device, electronic equipment, storage medium, program
CN110378308A (en) * 2019-07-25 2019-10-25 电子科技大学 The improved harbour SAR image offshore Ship Detection based on Faster R-CNN

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021227694A1 (en) * 2020-05-13 2021-11-18 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device, and storage medium
US12051221B2 (en) * 2020-09-08 2024-07-30 Samsung Electronics Co., Ltd. Method and apparatus for pose identification
US20220076448A1 (en) * 2020-09-08 2022-03-10 Samsung Electronics Co., Ltd. Method and apparatus for pose identification
CN112163480A (en) * 2020-09-16 2021-01-01 北京邮电大学 Behavior identification method and device
CN112528831A (en) * 2020-12-07 2021-03-19 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN112528831B (en) * 2020-12-07 2023-11-24 深圳市优必选科技股份有限公司 Multi-target attitude estimation method, multi-target attitude estimation device and terminal equipment
CN114764860A (en) * 2021-01-14 2022-07-19 北京图森智途科技有限公司 Feature extraction method and device, computer equipment and storage medium
CN112907517A (en) * 2021-01-28 2021-06-04 上海商汤智能科技有限公司 Image processing method and device, computer equipment and storage medium
CN113344998A (en) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 Depth detection method and device, computer equipment and storage medium
WO2022267275A1 (en) * 2021-06-25 2022-12-29 北京市商汤科技开发有限公司 Depth detection method, apparatus and device, storage medium, computer program and product
CN113344998B (en) * 2021-06-25 2022-04-29 北京市商汤科技开发有限公司 Depth detection method and device, computer equipment and storage medium
CN113470112A (en) * 2021-06-30 2021-10-01 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and terminal
CN113743234A (en) * 2021-08-11 2021-12-03 浙江大华技术股份有限公司 Target action determining method, target action counting method and electronic device
WO2023015903A1 (en) * 2021-08-13 2023-02-16 上海商汤智能科技有限公司 Three-dimensional pose adjustment method and apparatus, electronic device, and storage medium
CN113610966A (en) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 Three-dimensional attitude adjustment method and device, electronic equipment and storage medium
CN113610967B (en) * 2021-08-13 2024-03-26 北京市商汤科技开发有限公司 Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN113610967A (en) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 Three-dimensional point detection method and device, electronic equipment and storage medium
CN116386016A (en) * 2023-05-22 2023-07-04 杭州睿影科技有限公司 Foreign matter treatment method and device, electronic equipment and storage medium
CN116386016B (en) * 2023-05-22 2023-10-10 杭州睿影科技有限公司 Foreign matter treatment method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021227694A1 (en) 2021-11-18
TWI777538B (en) 2022-09-11
CN111582207B (en) 2023-08-15
TW202143100A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN111582207B (en) Image processing method, device, electronic equipment and storage medium
CN110135455B (en) Image matching method, device and computer readable storage medium
CN110648397B (en) Scene map generation method and device, storage medium and electronic equipment
WO2021004416A1 (en) Method and apparatus for establishing beacon map on basis of visual beacons
JP5631086B2 (en) Information processing apparatus, control method therefor, and program
CN111582204A (en) Attitude detection method and apparatus, computer device and storage medium
US8995714B2 (en) Information creation device for estimating object position and information creation method and program for estimating object position
JP5480667B2 (en) Position / orientation measuring apparatus, position / orientation measuring method, program
JP2017059207A (en) Image recognition method
CN108428224B (en) Animal body surface temperature detection method and device based on convolutional neural network
CN110782483A (en) Multi-view multi-target tracking method and system based on distributed camera network
JP5833507B2 (en) Image processing device
Liang et al. Image-based positioning of mobile devices in indoor environments
CN113052907B (en) Positioning method of mobile robot in dynamic environment
Tamjidi et al. 6-DOF pose estimation of a portable navigation aid for the visually impaired
JP2017091377A (en) Attitude estimation device, attitude estimation method, and attitude estimation program
CN111353325A (en) Key point detection model training method and device
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
WO2022107548A1 (en) Three-dimensional skeleton detection method and three-dimensional skeleton detection device
JP2023008030A (en) Image processing system, image processing method, and image processing program
CN113591785A (en) Human body part matching method, device, equipment and storage medium
CN114474035B (en) Robot position determining method, device and system
CN110567728B (en) Method, device and equipment for identifying shooting intention of user
JP6675584B2 (en) Image processing apparatus, image processing method, and program
CN112927291A (en) Pose determination method and device of three-dimensional object, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40026189; Country of ref document: HK)
GR01 Patent grant