WO2021227694A1 - Image processing method and apparatus, electronic device and storage medium - Google Patents

Image processing method and apparatus, electronic device and storage medium

Info

Publication number
WO2021227694A1
WO2021227694A1 PCT/CN2021/084625 CN2021084625W WO2021227694A1 WO 2021227694 A1 WO2021227694 A1 WO 2021227694A1 CN 2021084625 W CN2021084625 W CN 2021084625W WO 2021227694 A1 WO2021227694 A1 WO 2021227694A1
Authority
WO
WIPO (PCT)
Prior art keywords
target object
target
depth
image
reference node
Prior art date
Application number
PCT/CN2021/084625
Other languages
English (en)
French (fr)
Inventor
王灿
李杰锋
刘文韬
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2021227694A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular, to an image processing method, device, electronic equipment, and storage medium.
  • Three-dimensional human pose detection is widely used in security, games, entertainment, and other fields.
  • current three-dimensional human pose detection methods usually identify the first two-dimensional position information of the key points of the human body in the image, and then convert the first two-dimensional position information into three-dimensional position information according to a predetermined positional relationship between the key points of the human body.
  • the embodiments of the present disclosure provide at least one image processing method, device, electronic equipment, and storage medium.
  • an embodiment of the present disclosure provides an image processing method, including: identifying a target area where a target object in a first image is located; determining, based on the target area where the target object is located, the first two-dimensional position information of multiple key points of the target object in the first image, the relative depth of each key point with respect to a reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system; and determining, based on the first two-dimensional position information and the relative depth respectively corresponding to the multiple key points of the target object and the absolute depth corresponding to the reference node, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • in this way, the embodiments of the present disclosure can more accurately obtain the three-dimensional position information of the multiple key points of the target object in the camera coordinate system; this three-dimensional position information can represent the three-dimensional posture of the target object, and the higher the accuracy of the three-dimensional position information, the higher the accuracy of the obtained three-dimensional posture of the target object.
  • the method further includes: obtaining the posture of the target object based on the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • since the three-dimensional position information obtained in this way has higher accuracy, the posture of the target object determined based on the three-dimensional position information is also more precise.
  • recognizing the target area where the target object in the first image is located includes: performing feature extraction on the first image to obtain a feature map of the first image; determining, based on the feature map, multiple target bounding boxes from multiple pre-generated candidate bounding boxes; and determining, based on the multiple target bounding boxes, the target area where the target object is located.
  • determining the target area where the target object is located based on the multiple target bounding boxes includes: determining a feature sub-map of each target bounding box based on the multiple target bounding boxes and the feature map; and performing bounding box regression processing on the feature sub-maps respectively corresponding to the multiple target bounding boxes to obtain the target area where the target object is located.
  • in this way, by performing bounding box regression processing on the feature sub-maps respectively corresponding to the multiple target bounding boxes, the position of each target object in the first image can be accurately detected from the first image.
  • determining the absolute depth of the reference node of the target object in the camera coordinate system includes: determining a target feature map of the target object based on the target area where the target object is located and the first image; performing depth recognition processing on the target feature map corresponding to the target object to obtain a normalized absolute depth of the reference node of the target object; and obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
  • performing depth recognition processing on the target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object includes: determining an initial depth image based on the first image, where the pixel value of any first pixel in the initial depth image is the initial depth value, in the camera coordinate system, of the second pixel in the first image corresponding to the position of the first pixel; determining, based on the target feature map corresponding to the target object, the second two-dimensional position information of the reference node corresponding to the target object in the first image; determining the initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image; and determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node and the target feature map.
  • determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node and the target feature map includes: performing at least one level of first convolution processing on the target feature map corresponding to the target object to obtain a feature vector of the target object; splicing the feature vector and the initial depth value to obtain a spliced vector, and performing at least one level of second convolution processing on the spliced vector to obtain a correction value of the initial depth value; and obtaining the normalized absolute depth based on the correction value of the initial depth value and the initial depth value.
  • the parameter matrix includes: the focal length of the camera;
  • obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera includes:
  • obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth, the focal length, the area of the target area, and the area of the target bounding box.
  • the image processing method is applied to a pre-trained neural network
  • the neural network includes three branch networks: a target detection network, a key point detection network, and a depth prediction network;
  • the target detection network is used to obtain the target area where the target object is located;
  • the key point detection network is used to obtain the first two-dimensional position information of the multiple key points of the target object in the first image, and the relative depth of each key point with respect to the reference node of the target object;
  • the depth prediction network is used to obtain the absolute depth of the reference node in the camera coordinate system.
  • in this way, an end-to-end target object pose detection framework is formed; based on this framework, the first image is processed to obtain the three-dimensional position information of the multiple key points of each target object in the first image in the camera coordinate system, with a faster processing speed and higher recognition accuracy.
  • an embodiment of the present disclosure further provides an image processing device, including: a recognition module, configured to recognize a target area where a target object in a first image is located; a first detection module, configured to determine, based on the target area where the target object is located, the first two-dimensional position information of multiple key points of the target object in the first image, the relative depth of each key point with respect to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system; and a second detection module, configured to determine, based on the first two-dimensional position information and the relative depth respectively corresponding to the multiple key points of the target object and the absolute depth corresponding to the reference node, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • the second detection module is further configured to obtain the posture of the target object based on the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • when recognizing the target area where the target object in the first image is located, the recognition module is configured to: perform feature extraction on the first image to obtain a feature map of the first image; determine, based on the feature map, multiple target bounding boxes from multiple pre-generated candidate bounding boxes; and determine, based on the multiple target bounding boxes, the target area where the target object is located.
  • when determining the target area where the target object is located based on the multiple target bounding boxes, the recognition module is configured to: determine a feature sub-map of each target bounding box based on the multiple target bounding boxes and the feature map; and perform bounding box regression processing on the feature sub-maps respectively corresponding to the multiple target bounding boxes to obtain the target area where the target object is located.
  • when determining the absolute depth of the reference node of the target object in the camera coordinate system based on the target area where the target object is located, the first detection module is configured to: determine a target feature map of the target object based on the target area where the target object is located and the first image; perform depth recognition processing on the target feature map corresponding to the target object to obtain a normalized absolute depth of the reference node of the target object; and obtain the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
  • when performing depth recognition processing on the target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object, the first detection module is configured to: determine an initial depth image based on the first image, where the pixel value of any first pixel in the initial depth image is the initial depth value, in the camera coordinate system, of the second pixel in the first image corresponding to the position of the first pixel; determine, based on the target feature map corresponding to the target object, the second two-dimensional position information of the reference node corresponding to the target object in the first image; determine the initial depth value of the reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image; and determine the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node and the target feature map.
  • when determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node and the target feature map, the first detection module is configured to: perform at least one level of first convolution processing on the target feature map corresponding to the target object to obtain a feature vector of the target object; splice the feature vector and the initial depth value to obtain a spliced vector, and perform at least one level of second convolution processing on the spliced vector to obtain a correction value of the initial depth value; and obtain the normalized absolute depth based on the correction value of the initial depth value and the initial depth value.
  • the parameter matrix includes the focal length of the camera; when obtaining the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera, the first detection module is configured to:
  • obtain the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth, the focal length, the area of the target area, and the area of the target bounding box.
  • the image processing device uses a pre-trained neural network to implement image processing, and the neural network includes three branch networks: a target detection network, a key point detection network, and a depth prediction network; the target detection network is used to obtain the target area where the target object is located; the key point detection network is used to obtain the first two-dimensional position information of the multiple key points of the target object in the first image, and the relative depth of each key point with respect to the reference node of the target object; the depth prediction network is used to obtain the absolute depth of the reference node in the camera coordinate system.
  • embodiments of the present disclosure also provide a computer device, including: a processor and a memory connected to each other, the memory storing machine-readable instructions executable by the processor; when the computer device is running, the machine-readable instructions are executed by the processor to implement the steps of the image processing method in the above-mentioned first aspect or in any possible implementation of the first aspect.
  • the embodiments of the present disclosure also provide a computer-readable storage medium with a computer program stored on the computer-readable storage medium.
  • when the computer program is run by a processor, it executes the steps of the image processing method in the above-mentioned first aspect or in any possible implementation of the first aspect.
  • Fig. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a specific method for identifying a target area where a target object is located in a first image according to an embodiment of the present disclosure
  • FIG. 3 shows a specific example of determining a target area corresponding to a target object based on a target bounding box provided by an embodiment of the present disclosure
  • FIG. 4 shows a flowchart of a specific method for determining the absolute depth of a reference node of a target object in the camera coordinate system provided by an embodiment of the present disclosure
  • FIG. 5 shows a flowchart of another specific method for obtaining the normalized absolute depth of a reference node provided by an embodiment of the present disclosure
  • FIG. 6 shows a specific example of a target object pose detection framework provided by an embodiment of the present disclosure
  • FIG. 7 shows a specific example of another target object pose detection framework provided by an embodiment of the present disclosure.
  • FIG. 8 shows a schematic diagram of an image processing device provided by an embodiment of the present disclosure
  • Fig. 9 shows a schematic diagram of a computer device provided by an embodiment of the present disclosure.
  • the three-dimensional human pose detection method usually recognizes the first two-dimensional position information of the key points of the human body in the image to be recognized through a neural network, and then converts the first two-dimensional position information of each key point of the human body into three-dimensional position information according to the mutual positional relationship between the key points of the human body (such as the connection relationship between different key points, the distance range between adjacent key points, etc.); however, the human body is complex and changeable, and the positional relationships between the key points of different human bodies are also different, leading to large errors in the three-dimensional human posture obtained by this method.
  • in addition, the accuracy of the current 3D human pose detection method depends on accurate estimation of the key points of the human body; when the key points of the human body cannot be accurately identified from the image due to occlusion by clothes and limbs, the error of the three-dimensional human posture obtained by the above method is further enlarged.
  • the present disclosure provides an image processing method and device, which identify the target area where the target object is located in the first image and, based on the target area, determine the first two-dimensional position information in the first image of multiple key points representing the posture of the target object, the relative depth of each key point with respect to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system; based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system can be obtained more accurately.
  • the execution subject of the image processing method provided in the embodiment of the present disclosure is generally a computer device with a certain computing capability.
  • the equipment includes, for example, terminal equipment or servers or other processing equipment.
  • the terminal equipment may be User Equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • the image processing method may be implemented by a processor invoking computer-readable instructions stored in the memory.
  • referring to FIG. 1, a flowchart of an image processing method provided by an embodiment of the present disclosure is shown.
  • the method includes steps S101 to S103, wherein:
  • S101 Identify the target area where the target object in the first image is located.
  • S102 Based on the target area where the target object is located, determine the first two-dimensional position information of multiple key points of the target object in the first image, the relative depth of each key point with respect to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system.
  • S103 Based on the first two-dimensional position information and the relative depth respectively corresponding to the multiple key points of the target object, and the absolute depth corresponding to the reference node, determine the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • the first image includes at least one target object.
  • the target object is, for example, a person, an animal, a robot, a vehicle, or another object whose posture is to be determined.
  • when there are multiple target objects, the categories of different target objects may be the same or different; for example, the multiple target objects may all be persons, or may all be vehicles;
  • for another example, the target objects in the first image may include a person and an animal, or a person and a vehicle; the category of the target object is determined according to the actual application scenario.
  • the target area where the target object is located refers to the area in the first image that includes the target object.
  • an embodiment of the present disclosure provides a specific method for identifying a target area in a first image where a target object is located, including:
  • S201 Perform feature extraction on the first image to obtain a feature map of the first image.
  • a neural network may be used to perform feature extraction on the first image to obtain a feature map of the first image.
  • S202 Based on the feature map, determine multiple target bounding boxes from multiple pre-generated candidate bounding boxes;
  • S203 Determine the target area where the target object is located based on the multiple target bounding boxes.
  • a bounding box prediction algorithm may be used to obtain multiple target bounding boxes.
  • Bounding box prediction algorithms include, for example, RoIAlign and ROI-Pooling. Taking RoIAlign as an example, RoIAlign can traverse multiple candidate bounding boxes generated in advance and determine, for each candidate bounding box, a region of interest (ROI) value indicating whether the sub-image corresponding to that candidate bounding box belongs to any target object in the first image; the higher the ROI value, the greater the probability that the sub-image corresponding to the candidate bounding box belongs to a certain target object. After the ROI value corresponding to each candidate bounding box is determined, multiple target bounding boxes are determined from the candidate bounding boxes according to the ROI values in descending order.
  • the target bounding box is, for example, a rectangle; the information of the target bounding box includes, for example, the coordinates of any vertex in the target bounding box in the first image, and the height and width values of the target bounding box. Alternatively, the information of the target bounding box includes, for example, the coordinates of any vertex in the target bounding box in the feature map of the first image, and the height and width values of the target bounding box.
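  • As a rough, illustrative sketch of the selection step described above (not the exact implementation of this disclosure; the scoring values, box format, and number of retained boxes are assumptions), keeping the candidate bounding boxes with the highest ROI values in descending order could look like the following:

```python
import numpy as np

def select_target_boxes(candidate_boxes, roi_values, top_k=100):
    """Keep the top_k pre-generated candidate boxes with the highest ROI values.

    candidate_boxes: (N, 4) array of [x, y, width, height] in first-image coordinates.
    roi_values:      (N,) array; a higher value means the sub-image of the box is more
                     likely to belong to some target object in the first image.
    """
    order = np.argsort(-roi_values)   # sort ROI values in descending order
    keep = order[:top_k]              # retain the highest-scoring candidates
    return candidate_boxes[keep]

# Example: five candidate boxes, keep the two most likely target bounding boxes.
boxes = np.array([[10, 20, 50, 100], [12, 22, 48, 96], [200, 40, 60, 120],
                  [5, 5, 30, 60], [198, 42, 58, 118]], dtype=float)
scores = np.array([0.91, 0.40, 0.85, 0.10, 0.30])
target_boxes = select_target_boxes(boxes, scores, top_k=2)
```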
  • target regions corresponding to all target objects in the first image are determined.
  • the embodiment of the present disclosure provides a specific example of determining the target area corresponding to the target object based on the target bounding box, including:
  • S301 Determine a feature sub-image of each target bounding box based on a plurality of the target bounding boxes and the feature map.
  • when the information of the target bounding box includes the coordinates of any vertex of the target bounding box in the first image, and the height and width values of the target bounding box, the feature points in the feature map and the pixels in the first image have a certain positional mapping relationship; according to the relevant information of the target bounding box and the mapping relationship between the feature map and the first image, the feature sub-map corresponding to each target bounding box is determined from the feature map of the first image.
  • when the information of the target bounding box includes the coordinates of any vertex of the target bounding box in the feature map of the first image, as well as the height and width values of the target bounding box, the feature sub-map corresponding to each target bounding box can be determined directly from the feature map of the first image based on the target bounding box.
  • S302 Perform bounding box regression processing on the feature sub-images respectively corresponding to the multiple target bounding boxes to obtain the target area where the target object is located.
  • a bounding box regression (Bounding-Box Regression) algorithm may be used to perform bounding box regression processing on the feature sub-images corresponding to each target bounding box to obtain multiple bounding boxes including the complete target object.
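  • For illustration only, the two steps above can be sketched as follows, assuming the feature map is down-sampled from the first image by a fixed stride and assuming the common R-CNN-style delta parameterisation for the bounding box regression (neither assumption is stated in this disclosure):

```python
import numpy as np

def crop_feature_submap(feature_map, box, stride=16):
    """Cut out the feature sub-map of one target bounding box.

    feature_map: (C, H, W) feature map of the first image.
    box:         [x, y, w, h] of the target bounding box in first-image pixels.
    stride:      assumed down-sampling factor between the first image and the feature map.
    """
    x, y, w, h = [int(round(v / stride)) for v in box]
    return feature_map[:, y:y + max(h, 1), x:x + max(w, 1)]

def apply_box_regression(box, deltas):
    """Refine a target bounding box [x, y, w, h] with predicted deltas (dx, dy, dw, dh)."""
    x, y, w, h = box
    dx, dy, dw, dh = deltas
    cx, cy = x + 0.5 * w, y + 0.5 * h          # box centre
    cx, cy = cx + dx * w, cy + dy * h          # shift the centre
    w, h = w * np.exp(dw), h * np.exp(dh)      # rescale width and height
    return np.array([cx - 0.5 * w, cy - 0.5 * h, w, h])
```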
  • the target object can be accurately determined from the corresponding target area to distinguish the target object from the image background, thereby reducing the influence of the image background on the subsequent image processing process.
  • Each bounding box in the multiple bounding boxes corresponds to a target object, and the area determined based on the bounding box corresponding to the target object is the target area where the corresponding target object is located.
  • the number of target areas obtained is consistent with the number of target objects in the first image, and each target object corresponds to a target area; if different target objects occlude each other, the target areas corresponding to those target objects have a certain degree of overlap.
  • other target detection algorithms can also be used to identify the target area where the target object in the first image is located.
  • for example, a semantic segmentation algorithm is used to determine the semantic segmentation result of each pixel in the first image, and then, according to the semantic segmentation result, the positions of the pixels belonging to different target objects in the first image are determined; for the pixels belonging to the same target object, the smallest bounding box is found, and the area corresponding to the smallest bounding box is determined as the target area where that target object is located.
  • the image coordinate system refers to the two-dimensional coordinate system established by the length and width of the first image; the camera coordinate system refers to the three-dimensional coordinate system established by the direction of the optical axis of the camera and two directions, perpendicular to the optical axis, in the plane where the optical center of the camera is located.
  • the key points of the target object are located on the target object and have a mutual relationship with each other, and can represent the posture of the target object after being connected according to that mutual relationship; for example, when the target object is a human body, the key points include, for example, the key points of each joint of the human body. In the image coordinate system, a key point is expressed as a two-dimensional coordinate value; in the camera coordinate system, it is expressed as a three-dimensional coordinate value.
  • the key point detection network can be used to perform key point detection processing based on the target feature map of the target object to obtain the two-dimensional position information of multiple key points of the target object in the first image, and the relative depth of each key point relative to the reference node of the target object.
  • the method of obtaining the target feature map can refer to the description of S401 below, which will not be repeated here.
  • the reference node is, for example, any pixel on a certain part predetermined on the target object.
  • the reference node may be predetermined according to actual needs; for example, when the target object is a human body, a pixel on the human pelvis may be determined as the reference node, or any pixel on the human body may be determined as the reference node, or a pixel at the center of the chest and abdomen of the human body may be determined as the reference node; the specific choice can be set as required.
  • the relative depth of each key point relative to the reference node of the target object is, for example, the difference between the coordinate value of the key point in the depth direction of the camera coordinate system and the coordinate value of the reference node in the depth direction of the camera coordinate system.
  • the absolute depth of the key point is, for example, the coordinate value of the key point in the depth direction of the camera coordinate system.
  • the embodiment of the present disclosure provides a specific method for determining the absolute depth of the reference node of the target object in the camera coordinate system based on the target area corresponding to the target object, including:
  • S401 Determine a target feature map of the target object based on the target area where the target object is located and the first image.
  • the target feature map of the target object may be determined from the feature map based on the feature map of the first image obtained by performing feature extraction on the first image and the target region.
  • the feature points in the feature map extracted from the first image and the pixels in the first image have a certain positional mapping relationship; after obtaining the target area where each target object is located, the position of each target object in the feature map of the first image can be determined according to this positional mapping relationship, and then the target feature map of each target object is cut out from the feature map of the first image.
  • S402 Perform depth recognition processing on the target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object.
  • in the embodiments of the present disclosure, in order to reduce the influence on the absolute depth of image differences caused by different camera intrinsic parameters, the normalized absolute depth of the reference node of the target object can first be obtained based on the target feature map, and then the absolute depth of the reference node can be obtained using the normalized absolute depth and the camera intrinsic parameters.
  • the normalized absolute depth is the absolute depth obtained by normalizing the absolute depth of the reference node using the parameter matrix of the camera; after the normalized absolute depth is obtained, the parameter matrix of the camera can be used to restore the absolute depth of the reference node.
  • a pre-trained depth prediction network may be used to perform depth detection processing on the target feature map to obtain the normalized absolute depth of the reference node of the target object.
  • referring to FIG. 5, another specific method for obtaining the normalized absolute depth of the reference node is also provided, including:
  • S501 Determine an initial depth image based on the first image; wherein the pixel value of any first pixel in the initial depth image is the initial depth value, in the camera coordinate system, of the second pixel in the first image corresponding to the position of the first pixel.
  • here, the first pixels in the initial depth image have a one-to-one correspondence with the second pixels in the first image, that is, the coordinate value of a first pixel in the initial depth image is the same as the coordinate value of the second pixel at the corresponding position in the first image.
  • specifically, a depth prediction network can be used to determine the initial depth value of each pixel (second pixel) in the first image; these initial depth values constitute the initial depth image of the first image, and the pixel value of any pixel (first pixel) in the initial depth image is the initial depth value of the pixel (second pixel) at the corresponding position in the first image.
  • S502 Determine the second two-dimensional position information of the reference node corresponding to the target object in the first image based on the target feature map corresponding to the target object;
  • S503 Determine an initial depth value of a reference node corresponding to the target object based on the second two-dimensional position information and the initial depth image.
  • here, the target feature map corresponding to the target object may be, for example, the target feature map determined for each target object from the feature map of the first image based on the target area corresponding to that target object.
  • a pre-trained reference node detection network can be used to determine the second two-dimensional position information of the reference node of the target object in the first image based on the target feature map. Then, using the second two-dimensional position information, the pixel corresponding to the reference node is determined from the initial depth image, and the pixel value of the pixel determined from the initial depth image is determined as the initial depth value of the reference node.
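  • A minimal sketch of this lookup step (the array layout and rounding behaviour are assumptions made for illustration):

```python
import numpy as np

def reference_node_initial_depth(initial_depth_image, ref_node_xy):
    """Read the initial depth value of a reference node from the initial depth image.

    initial_depth_image: (H, W) array, one initial depth value per first-image pixel.
    ref_node_xy:         (x, y) second two-dimensional position of the reference node.
    """
    x, y = int(round(ref_node_xy[0])), int(round(ref_node_xy[1]))
    h, w = initial_depth_image.shape
    x, y = min(max(x, 0), w - 1), min(max(y, 0), h - 1)   # clamp to the image bounds
    return float(initial_depth_image[y, x])

# Example: a 4x4 depth image and a reference node at pixel (2, 1).
depth = np.arange(16, dtype=float).reshape(4, 4)
print(reference_node_initial_depth(depth, (2, 1)))   # -> 6.0
```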
  • S504 Determine the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node and the target feature map.
  • At least one level of first convolution processing may be performed on the target feature map corresponding to the target object to obtain the feature vector of the target object; the feature vector and the initial depth value are spliced to obtain a spliced vector, and at least one level of second convolution processing is performed on the spliced vector to obtain the correction value of the initial depth value; based on the correction value of the initial depth value and the initial depth value, the normalized absolute depth is obtained.
  • in specific implementation, a neural network for adjusting the initial depth value can be used, and the neural network includes multiple convolutional layers; some of the convolutional layers are used to perform the at least one level of first convolution processing on the target feature map, and the others are used to perform the at least one level of second convolution processing on the spliced vector.
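  • A minimal PyTorch-style sketch of such a correction network is shown below; the layer sizes are assumptions, and fully-connected layers stand in for the second-stage convolutions applied to the spliced vector:

```python
import torch
import torch.nn as nn

class DepthCorrection(nn.Module):
    """Predict a correction to the reference node's initial depth value."""

    def __init__(self, in_channels=256, feat_dim=128):
        super().__init__()
        # at least one level of first convolution processing on the target feature map
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                      # collapse to a feature vector
        )
        # stand-in for the second convolution processing on the spliced vector
        self.head = nn.Sequential(nn.Linear(feat_dim + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, target_feature_map, initial_depth):
        # target_feature_map: (B, C, H, W); initial_depth: (B, 1)
        feat = self.conv(target_feature_map).flatten(1)   # feature vector of the target object
        spliced = torch.cat([feat, initial_depth], dim=1) # splice with the initial depth value
        correction = self.head(spliced)                   # correction value of the initial depth
        return initial_depth + correction                 # normalized absolute depth
```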
  • the specific method for determining the absolute depth of the reference node of the target object in the camera coordinate system further includes:
  • S403 Obtain the absolute depth of the reference node of the target object in the camera coordinate system based on the normalized absolute depth and the parameter matrix of the camera.
  • for different cameras, the corresponding camera intrinsic parameters may be different.
  • the camera internal parameters include, for example, the focal length of the camera on the x-axis, the focal length of the camera on the y-axis, and the coordinates of the x-axis and the y-axis of the optical center of the camera in the camera coordinate system.
  • when the intrinsic parameters of the cameras are different, the first images obtained at the same viewing angle and the same position will still differ; if the absolute depth of the reference node is predicted directly based on the target feature map, different absolute depths will be obtained for first images acquired by different cameras from the same viewing angle and the same position.
  • therefore, the embodiment of the present disclosure directly predicts the normalized depth of the reference node, so that the normalized absolute depth is obtained without considering the camera intrinsic parameters; the absolute depth of the reference node is then restored according to the camera intrinsic parameters and the normalized absolute depth.
  • when restoring the absolute depth of the reference node based on the normalized absolute depth, for example, the absolute depth of the reference node of the target object in the camera coordinate system may be obtained based on the normalized absolute depth, the focal length, the area of the target area, and the area of the target bounding box, as sketched below.
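  • The exact recovery formula is not reproduced in this text; one plausible form, written purely as an illustrative assumption, scales the normalized absolute depth by a factor built from the focal lengths and the ratio of the two areas:

```python
import math

def recover_absolute_depth(norm_depth, fx, fy, area_target_region, area_target_box):
    """Restore the reference node's absolute depth from its normalized absolute depth.

    The scale factor below (focal lengths combined with the ratio of the target-region
    area to the target-bounding-box area) is an assumed, illustrative choice.
    """
    scale = math.sqrt(fx * fy * area_target_region / area_target_box)
    return norm_depth * scale
```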
  • the camera coordinate system is a three-dimensional coordinate system including three coordinate axes x, y, and z; the origin of the camera coordinate system is the optical center of the camera; the optical axis of the camera is the z-axis of the camera coordinate system, and the plane perpendicular to the z-axis is the plane where the x-axis and the y-axis are located; f_x is the focal length of the camera on the x-axis, and f_y is the focal length of the camera on the y-axis.
  • suppose each target object includes J key points and there are N target objects in the first image; the three-dimensional poses of the N target objects, the target areas where the N target objects are located, and the three-dimensional poses of the N target objects relative to the reference node can each be written as a set of N per-object quantities, with the m-th element corresponding to the m-th target object; the three-dimensional posture of the m-th target object is obtained through back-projection, where the three-dimensional coordinate information of the j-th key point of the m-th target object satisfies formula (2).
  • the intrinsic parameter matrix K is, for example, (f_x, f_y, c_x, c_y);
  • f_x is the focal length of the camera on the x-axis in the camera coordinate system;
  • f_y is the focal length of the camera on the y-axis in the camera coordinate system;
  • c_x is the coordinate value of the optical center of the camera on the x-axis in the camera coordinate system;
  • c_y is the coordinate value of the optical center of the camera on the y-axis in the camera coordinate system.
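  • Formula (2) itself is not reproduced in this text; the sketch below shows the standard pinhole back-projection that the surrounding description suggests, and should be read as an assumption about its exact form:

```python
import numpy as np

def backproject_keypoints(kp_2d, rel_depth, root_abs_depth, K):
    """Back-project 2D key points into the camera coordinate system.

    kp_2d:          (J, 2) first two-dimensional positions (u, v) in pixels.
    rel_depth:      (J,) depth of each key point relative to the reference node.
    root_abs_depth: absolute depth of the reference node in the camera coordinate system.
    K:              intrinsics (fx, fy, cx, cy).
    """
    fx, fy, cx, cy = K
    z = root_abs_depth + rel_depth                 # absolute depth of every key point
    x = (kp_2d[:, 0] - cx) * z / fx                # standard pinhole back-projection
    y = (kp_2d[:, 1] - cy) * z / fy
    return np.stack([x, y, z], axis=1)             # (J, 3) camera-coordinate positions

# Example with J = 2 key points and an assumed intrinsics tuple.
kp = np.array([[320.0, 240.0], [350.0, 260.0]])
rel = np.array([0.0, -0.12])
print(backproject_keypoints(kp, rel, root_abs_depth=2.5, K=(1000.0, 1000.0, 320.0, 240.0)))
```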
  • in this way, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system can be obtained; for the m-th target object, the three-dimensional position information corresponding to its J key points represents the three-dimensional posture of the m-th target object.
  • the embodiments of the present disclosure identify the target area where the target object is located in the first image and, based on the target area, determine the first two-dimensional position information in the first image of multiple key points representing the posture of the target object, the relative depth of each key point relative to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system, so as to obtain more accurately, based on the first two-dimensional position information, relative depth, and absolute depth of the target object, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • another image processing method is also provided, wherein the image processing method is applied to a pre-trained neural network.
  • the neural network includes three branch networks: a target detection network, a key point detection network, and a deep prediction network; the target detection network is used to obtain the target area where the target object is located; the key point detection network is used to obtain The first two-dimensional position information of the multiple key points of the target object in the first image, and the relative depth of each key point with respect to the reference node of the target object; the depth prediction network uses To obtain the absolute depth of the reference node in the camera coordinate system.
  • the embodiments of the present disclosure form an end-to-end target object pose detection framework through the three branch networks of a target detection network, a key point detection network, and a depth prediction network; based on this framework, the first image is processed to obtain the three-dimensional position information of the multiple key points of each target object in the first image in the camera coordinate system, with a faster processing speed and higher recognition accuracy.
  • the embodiment of the present disclosure also provides a specific example of a target object pose detection framework, including:
  • the framework includes three network branches: a target detection network, a key point detection network, and a depth prediction network.
  • the target detection network performs feature extraction on the first image to obtain a feature map of the first image; then, according to the feature map, RoIAlign is used to determine multiple target bounding boxes from multiple candidate bounding boxes generated in advance; bounding box regression processing is performed on the multiple target bounding boxes, and the target area corresponding to each target object is obtained.
  • the target feature map corresponding to the target area is transmitted to the key point detection network and the depth prediction network.
  • based on the target feature map, the key point detection network determines the first two-dimensional position information, in the first image, of multiple key points that characterize the target object's posture, and the relative depth of each key point with respect to the reference node of the target object.
  • the first two-dimensional position information and relative depth of each key point in each target feature map constitute the three-dimensional posture of the target object in the target feature map.
  • the three-dimensional posture at this time is the three-dimensional posture with reference to itself.
  • the depth prediction network determines the absolute depth of the reference node of the target object in the camera coordinate system based on the target feature map.
  • then, based on the first two-dimensional position information, the relative depth, and the absolute depth of the reference node, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system is determined.
  • the three-dimensional position information of multiple key points on the target object in the camera coordinate system respectively constitute the three-dimensional posture of the target object in the camera coordinate system.
  • the three-dimensional posture at this time is the three-dimensional posture referenced by the camera.
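  • The flow through the three branches can be summarised by the following structural sketch; the callables passed in (detector, key point network, depth network, crop and back-projection helpers) are placeholders for the branch networks described above, not names used in this disclosure:

```python
def detect_3d_poses(first_image, detector, keypoint_net, depth_net, crop_fn, backproject_fn, K):
    """Compose the three branch networks into one end-to-end pose detection pass."""
    feature_map, target_areas = detector(first_image)            # target detection branch
    poses = []
    for area in target_areas:
        target_feat = crop_fn(feature_map, area)                 # target feature map of one object
        kp_2d, rel_depth = keypoint_net(target_feat)             # key point detection branch
        root_abs_depth = depth_net(first_image, target_feat, K)  # depth prediction branch
        poses.append(backproject_fn(kp_2d, rel_depth, root_abs_depth, K))
    return poses
```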
  • the embodiment of the present disclosure also provides another specific example of a target object pose detection framework, including:
  • this framework likewise includes three network branches: a target detection network, a key point detection network, and a depth prediction network.
  • the target detection network performs feature extraction on the first image to obtain a feature map of the first image; then, according to the feature map, RoIAlign is used to determine multiple target bounding boxes from multiple candidate bounding boxes generated in advance; bounding box regression processing is performed on the multiple target bounding boxes, and the target area corresponding to each target object is obtained.
  • the target feature map corresponding to the target area is transmitted to the key point detection network and the depth prediction network.
  • based on the target feature map, the key point detection network determines the first two-dimensional position information, in the first image, of multiple key points that characterize the target object's posture, and the relative depth of each key point with respect to the reference node of the target object.
  • the first two-dimensional position information and relative depth of each key point in each target feature map constitute the three-dimensional posture of the target object in the target feature map.
  • the three-dimensional posture at this time is the three-dimensional posture with reference to itself.
  • the depth prediction network obtains the initial depth image based on the first image; and based on the target feature map corresponding to the target object, determines the second two-dimensional position information of the reference node corresponding to the target object in the first image, and based on all The second two-dimensional position information and the initial depth image determine the initial depth value of the reference node corresponding to the target object; and perform at least one level of first convolution processing on the target feature map corresponding to the target object to obtain The feature vector of the target object; the feature vector and the initial depth value of the reference node are spliced to form a spliced vector, and at least one level of second convolution processing is performed on the spliced vector to obtain the correction of the initial depth value Value; add the correction value to the initial depth value of the reference node to get the normalized absolute depth value of the reference node.
  • then, based on the normalized absolute depth and the parameter matrix of the camera, the absolute depth value of the reference node is restored; and according to the first two-dimensional position information of the target object, the relative depth, and the absolute depth of the reference node, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system is determined. For each target object, the three-dimensional position information of its multiple key points in the camera coordinate system constitutes the three-dimensional posture of the target object in the camera coordinate system.
  • the three-dimensional posture at this time is the three-dimensional posture referenced by the camera.
  • the three-dimensional position information of the multiple key points of each target object in the first image in the camera coordinate system can be obtained.
  • the processing speed is faster and the recognition accuracy is higher.
  • the writing order of the steps does not imply a strict execution order and does not constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the embodiment of the present disclosure also provides an image processing device corresponding to the image processing method; since the principle by which the device solves the problem is similar to that of the above-mentioned image processing method of the embodiment of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
  • referring to FIG. 8, a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure is shown.
  • the apparatus includes: an identification module 81, a first detection module 82, and a second detection module 83; wherein,
  • the recognition module 81 is configured to recognize the target area where the target object in the first image is located;
  • the first detection module 82 is configured to determine, based on the target area corresponding to the target object, the first two-dimensional position information in the first image of the multiple key points that characterize the target object's posture, the relative depth of each key point with respect to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system;
  • the second detection module 83 is configured to determine the three-dimensional position information of the multiple key points of the target object in the camera coordinate system based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object.
  • when recognizing the target area where the target object in the first image is located, the recognition module 81 is configured to:
  • when determining the target area corresponding to the target object based on the target bounding boxes, the recognition module 81 is configured to:
  • Bounding box regression processing is performed based on the feature sub-maps respectively corresponding to the multiple target bounding boxes to obtain the target area corresponding to the target object.
  • when determining the absolute depth of the reference node of the target object in the camera coordinate system based on the target area corresponding to the target object, the first detection module 82 is configured to:
  • the absolute depth of the reference node of the target object in the camera coordinate system is obtained.
  • when performing depth recognition processing based on the target feature map corresponding to the target object to obtain the normalized absolute depth of the reference node of the target object, the first detection module 82 is configured to:
  • when determining the normalized absolute depth of the reference node of the target object based on the initial depth value of the reference node corresponding to the target object and the target feature map corresponding to the target object, the first detection module 82 is configured to:
  • obtain the normalized absolute depth.
  • a pre-trained neural network is deployed in the image processing device, and the neural network includes three branch networks: a target detection network, a key point detection network, and a depth prediction network; the target detection network is used to obtain the target area where the target object is located; the key point detection network is used to obtain the first two-dimensional position information of the multiple key points of the target object in the first image, and the relative depth of each key point with respect to the reference node of the target object; the depth prediction network is used to obtain the absolute depth of the reference node in the camera coordinate system.
  • the embodiments of the present disclosure identify the target area where the target object is located in the first image and, based on the target area, determine the first two-dimensional position information in the first image of multiple key points representing the posture of the target object, the relative depth of each key point relative to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system, so as to obtain more accurately, based on the first two-dimensional position information, relative depth, and absolute depth of the target object, the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • in addition, the embodiment of the present disclosure constitutes an end-to-end target object pose detection framework through the three branch networks of the target detection network, the key point detection network, and the depth prediction network; based on this framework, the first image is processed to obtain the three-dimensional position information of the multiple key points of each target object in the camera coordinate system, with a faster processing speed and higher recognition accuracy.
  • the embodiment of the present disclosure also provides a computer device 10, as shown in FIG. 9, a schematic structural diagram of the computer device 10 provided by the embodiment of the present disclosure, including:
  • identify the target area where the target object in the first image is located; based on the target area corresponding to the target object, determine the first two-dimensional position information of multiple key points representing the posture of the target object in the first image, the relative depth of each key point relative to the reference node of the target object, and the absolute depth of the reference node of the target object in the camera coordinate system;
  • based on the first two-dimensional position information, the relative depth, and the absolute depth of the target object, determine the three-dimensional position information of the multiple key points of the target object in the camera coordinate system.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored, and the computer program executes the steps of the image processing method described in the above method embodiment when the computer program is run by a processor.
  • the storage medium may be a volatile or non-volatile computer readable storage medium.
  • the embodiments of the present disclosure also provide a computer program product, the computer program product carries program code, and the instructions included in the program code can be used to execute the steps of the image processing method described in the above method embodiment.
  • for specific details, please refer to the above method embodiments, which will not be repeated here.
  • the above-mentioned computer program product can be specifically implemented by hardware, software, or a combination thereof.
  • the computer program product is specifically embodied as a computer storage medium.
  • the computer program product is specifically embodied as a software product, such as a software development kit (SDK).
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor.
  • based on such an understanding, the technical solutions of the present disclosure, in essence, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

一种图像处理方法、装置、电子设备及存储介质,其中,该方法包括:识别第一图像中的目标对象的目标区域(S101);基于目标对象对应的目标区域,确定表征目标对象姿态的多个关键点分别在第一图像中的第一二维位置信息、每个关键点相对目标对象的参考节点的相对深度、以及目标对象的参考节点在相机坐标系中的绝对深度(S102);基于目标对象的第一二维位置信息、相对深度、以及绝对深度,确定目标对象的多个关键点分别在相机坐标系中的三维位置信息(S103)。

Description

图像处理方法、装置、电子设备及存储介质
本公开要求在2020年05月13日提交中国专利局、申请号为202010403620.5、申请名称为“图像处理方法、装置、电子设备及存储介质”的中国专利的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及图像处理技术领域,具体而言,涉及一种图像处理方法、装置、电子设备及存储介质。
背景技术
三维人体姿态检测器被广泛应用于安防、游戏、娱乐等领域。当前的三维人体姿态检测方法通常为识别人体关键点在图像中的第一二维位置信息，然后根据预先确定的人体关键点之间的位置关系，将第一二维位置信息转换为三维位置信息。
当前的三维人体姿态检测方法所得到的人体姿态存在较大的误差。
发明内容
本公开实施例至少提供一种图像处理方法、装置、电子设备及存储介质。
第一方面,本公开实施例提供了一种图像处理方法,包括:识别第一图像中的目标对象所在的目标区域;基于所述目标对象所在的目标区域,确定所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度;基于所述目标对象的多个关键点分别对应的所述第一二维位置信息和所述相对深度、以及所述参考节点对应的所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
这样,本公开实施例能够更精确的得到目标对象的多个关键点分别在相机坐标系中的三维位置信息,目标对象的多个关键点分别在相机坐标系中的三维位置信息能够表征目标对象的三维姿态,三维位置信息的精度越高,则得到的目标对象的三维姿态的精度也就越高。
一种可能的实施方式中,还包括:基于所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息,得到所述目标对象的姿态。
这样,基于本公开实施例得到的目标对象的多个关键点分别在相机坐标系中的三维位置信息,由于三维位置信息具有更高的精度,因而基于三维位置信息确定的目标对象的姿态也就更为精确。
一种可能的实施方式中,所述识别所述第一图像中的目标对象所在的目标区域,包括:对所述第一图像进行特征提取,得到所述第一图像的特征图;基于所述特征图,从预先生成的多个候选边界框中确定多个目标边 界框;基于多个所述目标边界框,确定所述目标对象所在的目标区域。
这样,分为两步来确定目标对象所在的目标区域,能够精确的将各个目标对象在第一图像中的位置,从第一图像中检测出来,以提升后续关键点检测过程中的人体信息完整性、以及检测精度。
一种可能的实施方式中,所述基于多个所述目标边界框,确定所述目标对象所在的目标区域,包括:基于多个所述目标边界框以及所述特征图,确定每个所述目标边界框的特征子图;对多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象所在的目标区域。
这样,对多个目标边界框分别对应的特征子图进行边界框回归处理,能够精确的将各个目标对象在第一图像中的位置从第一图像中检测出来。
一种可能的实施方式中,基于所述目标对象所在的目标区域,确定所述目标对象的参考节点在相机坐标系中的绝对深度,包括:基于所述目标对象所在的目标区域以及所述第一图像,确定所述目标对象的目标特征图;对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度;基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
这样,能够尽可能避免相机的内参不同所造成的直接基于目标特征图预测参考节点的绝对深度,所造成对不同相机在相同视角、相同位置获取的不同第一图像获取的绝对深度不同的情况。
一种可能的实施方式中,所述对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度,包括:基于所述第一图像,确定初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值为所述第一图像中与所述第一像素点的位置对应的第二像素点在所述相机坐标系中的初始深度值;基于所述目标对象对应的目标特征图,确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息;基于所述第二二维位置信息、以及所述初始深度图像,确定所述目标对象对应的参考节点的初始深度值;基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
这样,能够使得通过该过程得到参考节点的归一化绝对深度更加精确。
一种可能的实施方式中,所述基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度,包括:对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;将所述特征向量和所述初始深度值进行拼接,得到拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
一种可能的实施方式中,所述参数矩阵包括:所述相机的焦距;
所述基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度,包括:
基于所述归一化绝对深度、所述焦距、所述目标区域的面积、以及所述目标边界框的面积,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
一种可能的实施方式中,所述图像处理方法应用于预先训练好的神经网络中,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
这样,通过目标检测网络、关键点检测网络以及深度预测网络三个分支网络,构成端到端的目标对象姿态检测框架,基于该框架对第一图像进行处理,得到第一图像中每个目标对象的多个关键点分别在相机坐标系中的三维位置信息,处理速度更快,识别精度更高。
第二方面,本公开实施例还提供一种图像处理装置,包括:识别模块,用于识别第一图像中的目标对象所在的目标区域;第一检测模块,用于基于所述目标对象所在的目标区域,确定所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度;第二检测模块,用于基于所述目标对象的多个关键点分别对应的所述第一二维位置信息和所述相对深度、以及所述参考节点对应的所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
一种可能的实施方式中,所述第二检测模块,还用于基于所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息,得到所述目标对象的姿态。
一种可能的实施方式中,所述识别模块,在识别所述第一图像中的目标对象所在的目标区域时,用于:对所述第一图像进行特征提取,得到所述第一图像的特征图;基于所述特征图,从预先生成的多个候选边界框中确定多个目标边界框;基于多个所述目标边界框,确定所述目标对象所在的目标区域。
一种可能的实施方式中,所述识别模块,在基于多个所述目标边界框,确定所述目标对象所在的目标区域时,用于:基于多个所述目标边界框以及所述特征图,确定每个所述目标边界框的特征子图;对多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象所在的目标区域。
一种可能的实施方式中,其中,所述第一检测模块,在基于目标对象所在的目标区域,确定所述目标对象的参考节点在相机坐标系中的绝对深度时,用于:基于所述目标对象所在的目标区域以及所述第一图像,确定所述目标对象的目标特征图;对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度;基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
一种可能的实施方式中,所述第一检测模块,在对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度时,用于:基于所述第一图像,确定初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值为所述第一图像中与所述第一像素点的位置对应的第二像素点在所述相机坐标系中的初始深度值;基于所述目标对象对应的目标特征图,确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息;基于所述第二二维位置信息、以及所述初始深度图像,确定所述目标对象对应的参考节点的初始深度值;基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
一种可能的实施方式中,所述第一检测模块,在基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度时,用于:对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;将所述特征向量和所述初始深度值进行拼接,得到拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
一种可能的实施方式中,所述参数矩阵包括:所述相机的焦距;所述第一检测模块,在基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度时,用于:
基于所述归一化绝对深度、所述焦距、所述目标区域的面积、以及所述目标边界框的面积,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
一种可能的实施方式中,所述图像处理装置利用预先训练好的神经网络实现图像处理,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
第三方面,本公开实施例还提供一种计算机设备,包括:相互连接的 处理器和存储器,所述存储器存储有所述处理器可执行的机器可读指令,当计算机设备运行时,所述机器可读指令被所述处理器执行以实现上述第一方面,或第一方面中任一种可能的实施方式中的图像处理方法的步骤。
第四方面,本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述第一方面,或第一方面中任一种可能的实施方式中的图像处理方法的步骤。
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本公开实施例所提供的一种图像处理方法的流程图;
图2示出了本公开实施例所提供的识别第一图像中目标对象所在的目标区域的具体方法的流程图;
图3示出了本公开实施例所提供的基于目标边界框,确定目标对象对应的目标区域的具体示例;
图4示出了本公开实施例所提供的确定目标对象的参考节点在相机坐标系中的绝对深度的具体方法的流程图;
图5示出了本公开实施例所提供的另一种得到参考节点的归一化绝对深度的具体方法的流程图;
图6示出了本公开实施例所提供的目标对象姿态检测框架的具体示例;
图7示出了本公开实施例所提供的另一种目标对象姿态检测框架的具体示例;
图8示出了本公开实施例所提供的一种图像处理装置的示意图;
图9示出了本公开实施例所提供的一种计算机设备的示意图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中附图,对本公开实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的选定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前 提下所获得的所有其他实施例,都属于本公开保护的范围。
三维人体姿态检测方法通常为通过神经网络识别人体关键点在待识别图像中的第一二维位置信息,然后根据人体关键点之间的相互位置关系(如不同关键点之间的连接关系、相邻关键点之间的距离范围等)将各个人体关键点的第一二维位置信息转换为三维位置信息;但人的体型复杂多变,不同的人体所对应的人体关键点之间的位置关系也各不相同,导致通过这种方法得到的三维人体姿态存在较大的误差。
另外,当前的三维人体姿态检测方法的精度是建立在人体关键点精确估计的基础上,但由于衣服、肢体等遮挡,在很多情况下并不能精确的从图像中将人体关键点识别出来,进而造成通过上述方法得到的三维人体姿态误差会被进一步拉大。
针对以上方案所存在的缺陷,均是发明人在经过实践并仔细研究后得出的结果,因此,上述问题的发现过程以及下文中本公开针对上述问题所提出的解决方案,都应该是发明人在本公开过程中对本公开做出的贡献。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步定义和解释。
基于上述研究,本公开提供了一种图像处理方法及装置,通过识别第一图像中目标对象所在的目标区域,并基于目标区域,确定表征目标对象姿态的多个关键点分别在第一图像中的第一二维位置信息、每个关键点相对于目标对象的参考节点的相对深度、以及目标对象的参考节点在相机坐标系中的绝对深度,从而基于目标对象的第一二维位置信息、相对深度、以及绝对深度,更精确的得到目标对象的多个关键点分别在相机坐标系中的三维位置信息。
为便于对本实施例进行理解,首先对本公开实施例所公开的一种图像处理方法进行详细介绍,本公开实施例所提供的图像处理方法的执行主体一般为具有一定计算能力的计算机设备,该计算机设备例如包括:终端设备或服务器或其它处理设备,终端设备可以为用户设备(User Equipment,UE)、移动设备、用户终端、终端、蜂窝电话、无绳电话、个人数字助理(Personal Digital Assistant,PDA)、手持设备、计算设备、车载设备、可穿戴设备等。在一些可能的实现方式中,该图像处理方法可以通过处理器调用存储器中存储的计算机可读指令的方式来实现。
下面对本公开实施例提供的图像处理方法加以说明。
参见图1所示,为本公开实施例提供的图像处理方法的流程图,所述方法包括步骤S101~S103,其中:
S101:识别第一图像中的目标对象所在的目标区域;
S102：基于所述目标对象所在的目标区域，确定所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度；
S103:基于所述目标对象的多个关键点分别对应的所述第一二维位置信息和所述相对深度、以及所述参考节点对应的所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
下面分别对上述S101~S103加以详细说明。
I:在上述S101中,第一图像中包括有至少一个目标对象。目标对象例如为人、动物、机器人、车辆等待确定姿态的对象。
一种可能的实施方式中,当第一图像中包括的目标对象多于一个的时候,不同目标对象的类别可以相同,也可以不同;例如,多个目标对象均为人;或者多个目标对象均为车辆。又例如,第一图像中的目标对象包括:人和动物;或者第一图像中的目标对象包括人和车辆,具体根据实际的应用场景需要来确定目标对象类别。
目标对象所在的目标区域,是指第一图像中包括有目标对象的区域。
示例性的,参见图2所示,本公开实施例提供一种识别第一图像中目标对象所在的目标区域的具体方法,包括:
S201:对所述第一图像进行特征提取,得到所述第一图像的特征图。
此处,例如可以利用神经网络对第一图像进行特征提取,以得到第一图像的特征图。
S202:基于所述特征图,从预先生成的多个候选边界框中确定多个目标边界框;
S203:基于多个所述目标边界框,确定所述目标对象所在的目标区域。
在具体实施中,例如可以利用边界框预测算法,得到多个目标边界框。边界框预测算法例如包括RoIAlign、ROI-Pooling等,以RoIAlign为例,RoIAlign可以对预先生成的多个候选边界框进行遍历,确定各个候选边界框对应的子图像属于第一图像中任一目标对象的感兴趣区域(region of interest,ROI)值,该ROI值越高,与之对应的候选边界框对应的子图像属于某个目标对象的概率也就越大;在确定了每个候选边界框对应的ROI值后,根据各个候选边界框分别对应的ROI值从大到小的顺序,从候选边界框中确定多个目标边界框。
目标边界框例如为矩形;目标边界框的信息例如包括:目标边界框中任一顶点在第一图像中的坐标,以及目标边界框的高度值和宽度值。或者,目标边界框的信息例如包括:目标边界框中任一顶点在第一图像的特征图中的坐标,以及目标边界框的高度值和宽度值。
在得到多个目标边界框后,基于多个目标边界框,确定第一图像中所有的目标对象分别对应的目标区域。
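上述按ROI值筛选目标边界框的过程，可以用如下示意代码表示；其中函数与变量命名均为示例性假设，并非本公开限定的具体实现：

```python
import torch

def select_target_boxes(candidate_boxes, roi_scores, num_targets):
    """按 ROI 值从大到小的顺序，从预先生成的候选边界框中确定多个目标边界框（示意）。

    candidate_boxes: [M, 4]，预先生成的多个候选边界框
    roi_scores:      [M]，各候选边界框对应子图像属于某个目标对象的 ROI 值
    """
    order = torch.argsort(roi_scores, descending=True)    # ROI 值从大到小排序
    return candidate_boxes[order[:num_targets]]            # 取前 num_targets 个作为目标边界框
```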
参见图3所示,本公开实施例提供一种基于目标边界框,确定目标对象对应的目标区域的具体示例,包括:
S301:基于多个所述目标边界框以及所述特征图,确定每个所述目标边界框的特征子图。
在具体实施中,在目标边界框的信息包括目标边界框上的任一顶点在第一图像中的坐标,以及目标边界框的高度值和宽度值的情况下,特征图中的特征点和第一图像中的像素点具有一定的位置映射关系;根据该目标边界框的相关信息、以及特征图和第一图像之间的映射关系,从第一图像的特征图中确定各个目标边界框分别对应的特征子图。
在目标边界框的信息包括目标边界框中任一顶点在第一图像的特征图中的坐标,以及目标边界框的高度值和宽度值的情况下,可以直接基于该目标边界框,从第一图像的特征图中确定与各个目标边界框分别对应的特征子图。
S302:对多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象所在的目标区域。
此处,例如可以利用边界框回归(Bounding-Box Regression)算法,对各个目标边界框分别对应的特征子图进行边界框回归处理,以得到包括完整目标对象的多个边界框。
利用边界框回归算法，能够准确地将各个目标对象所在的目标区域从第一图像中确定出来，以将目标对象和图像背景区别开，进而减少图像背景对后续图像处理过程的影响。
多个边界框中的每个边界框与一个目标对象对应,基于与该目标对象对应的边界框确定的区域,即为对应目标对象所在的目标区域。
此时,所得到的目标区域的数量,与第一图像中目标对象的数量一致,且每个目标对象对应一个目标区域;若不同的目标对象之间存在相互遮挡的位置关系,则存在相互遮挡关系的目标对象分别对应的目标区域具有一定的重叠度。
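作为理解上述S301~S302的一个示意，下述Python代码给出了利用RoIAlign截取目标边界框对应的特征子图、并通过一个简单的回归头预测边界框修正量的最小示例；其中网络结构、通道数、特征图与边界框的取值均为示意性假设，并非本公开限定的具体实现：

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class BoxRegressionHead(nn.Module):
    """示意性的边界框回归头：对每个特征子图预测 (dx, dy, dw, dh) 修正量。"""
    def __init__(self, in_channels=256, roi_size=7):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * roi_size * roi_size, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 4),
        )

    def forward(self, roi_feats):
        # roi_feats: [K, C, roi_size, roi_size]，每个目标边界框的特征子图
        return self.fc(roi_feats)

# feature_map：第一图像的特征图，形状 [1, C, H, W]（取值仅为示例）
feature_map = torch.randn(1, 256, 80, 80)
# target_boxes：多个目标边界框，格式 (x1, y1, x2, y2)，坐标在第一图像坐标系下
target_boxes = torch.tensor([[10., 20., 110., 220.], [150., 30., 260., 300.]])
boxes_with_idx = torch.cat([torch.zeros(len(target_boxes), 1), target_boxes], dim=1)

# 利用 RoIAlign 截取每个目标边界框对应的特征子图；spatial_scale 为特征图相对第一图像的缩放比例
roi_feats = roi_align(feature_map, boxes_with_idx, output_size=(7, 7), spatial_scale=80 / 640)

# 对特征子图做边界框回归处理，得到边界框修正量，进而确定目标对象所在的目标区域
deltas = BoxRegressionHead()(roi_feats)
```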
本公开另一种实施例中，也可以采用其他目标检测算法识别第一图像中的目标对象所在的目标区域。例如，采用语义分割算法，确定第一图像中每个像素点的语义分割结果，然后根据语义分割结果，确定属于不同目标对象的像素点在第一图像中的位置；然后根据属于同一目标对象的像素点求最小包围框，将最小包围框对应的区域确定为目标对象所在的目标区域。
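针对上述基于语义分割的替代方案，下述代码给出了从分割结果中为属于同一目标对象的像素点求最小包围框的一个简单示意（其中分割结果mask的编码方式为假设）：

```python
import numpy as np

def min_bounding_box(mask: np.ndarray, instance_id: int):
    """从分割结果中，为属于某一目标对象的像素点求最小包围框（示意）。

    mask: 形状为 [H, W] 的分割结果，每个像素的值为其所属目标对象的编号
    返回: (x_min, y_min, w, h)，即该目标对象所在的目标区域
    """
    ys, xs = np.where(mask == instance_id)
    if len(xs) == 0:
        return None  # 该目标对象在第一图像中不存在
    x_min, x_max = xs.min(), xs.max()
    y_min, y_max = ys.min(), ys.max()
    return int(x_min), int(y_min), int(x_max - x_min + 1), int(y_max - y_min + 1)
```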
II:在上述S102中,图像坐标系,是指以第一图像的长和宽两个方向所建立的二维坐标系;相机坐标系,是指以相机的光轴所在方向、以及平行于光轴且相机的光心所在平面中的两个方向建立的三维坐标系。
目标对象的关键点,例如是位于目标对象上,且之间具有相互关系的,并且按照相互关系连接后能够表征目标对象姿态的像素点;例如,在目标对象为人体时,关键点例如包括人体各个关节的关键点。该关键点在图像坐标系中,表示为二维坐标值;在相机坐标系中,表示为三维坐标值。
在具体实施中,例如可以利用关键点检测网络,基于目标对象的目标特征图进行关键点检测处理,得到目标对象的多个关键点分别在第一图像中的二维位置信息,以及每个关键点相对于目标对象的参考节点的相对深度。此处,目标特征图的获取方式可以参见下述对S401的说明,在此不再赘述。
参考节点,例如为在目标对象上预先确定某个部位上的任一像素点。示例性的,可以根据实际的需要来预先确定该参考节点;例如在目标对象为人体时,可以将人体骨盆上的像素点确定为参考节点,或者将人体上任一像素点确定为参考节点,或者将人体的胸腹中央上的像素点确定为参考节点;具体的可以根据需要进行设定。
每个关键点相对于目标对象的参考节点的相对深度，例如为关键点在相机坐标系的深度方向的坐标值、与参考节点在相机坐标系的深度方向的坐标值的差值。关键点的绝对深度，例如为关键点在相机坐标系的深度方向的坐标值。
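作为一种常见的实现思路（并非本公开限定的实现），关键点的第一二维位置信息与相对深度可以分别通过热力图与相对深度图回归得到；下述代码仅为在该假设下的最小示意：

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """示意性的关键点检测头：为 J 个关键点预测二维热力图与相对深度图。"""
    def __init__(self, in_channels=256, num_keypoints=17):
        super().__init__()
        self.heatmap_conv = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)    # 每个关键点一张热力图
        self.rel_depth_conv = nn.Conv2d(in_channels, num_keypoints, kernel_size=1)  # 每个关键点一张相对深度图

    def forward(self, target_feat):
        # target_feat：某个目标对象的目标特征图，形状 [1, C, H, W]
        heatmaps = self.heatmap_conv(target_feat)                     # [1, J, H, W]
        rel_depth_maps = self.rel_depth_conv(target_feat)             # [1, J, H, W]

        b, j, h, w = heatmaps.shape
        idx = heatmaps.view(b, j, -1).argmax(dim=-1)                  # 每个关键点热力图响应最大的位置
        ys = torch.div(idx, w, rounding_mode="floor")                 # 第一二维位置信息（特征图坐标，
        xs = idx % w                                                  # 可再映射回第一图像坐标系）
        rel_depth = rel_depth_maps.view(b, j, -1).gather(-1, idx.unsqueeze(-1)).squeeze(-1)
        return xs, ys, rel_depth                                      # 相对深度：关键点深度与参考节点深度之差
```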
参见图4所示,本公开实施例提供一种基于目标对象对应的目标区域,确定目标对象的参考节点在相机坐标系中的绝对深度的具体方法,包括:
S401:基于所述目标对象所在的目标区域以及所述第一图像,确定所述目标对象的目标特征图。
此处,例如可以基于对第一图像进行特征提取所得到第一图像的特征图、以及所述目标区域,从所述特征图中确定目标对象的目标特征图。
这里，为第一图像提取的特征图中的特征点和第一图像中的像素点具有一定的位置映射关系；在得到各个目标对象所在的目标区域后，能够根据该位置映射关系，确定各个目标对象在第一图像的特征图中的所在位置，然后将与各个目标对象对应的目标特征图从第一图像的特征图中截取出来。
S402:对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度。
此处，由于不同相机的内参不同，目标对象在不同相机的成像中会有所区别；若要直接确定目标对象的参考节点的绝对深度，会存在由于相机内参造成的误差，因此本公开实施例中，为了减少相机内参不同导致的图像差异对绝对深度造成的影响，可以首先基于目标特征图，得到目标对象的参考节点的归一化绝对深度，然后再利用归一化绝对深度和相机内参，得到参考节点的绝对深度。该归一化绝对深度，是利用相机的参数矩阵对参考节点进行归一化后得到的绝对深度，在得到归一化绝对深度后，可以利用相机的参数矩阵，恢复参考节点的绝对深度。
在一种可能的实施方式中,例如可以采用预先训练的深度预测网络,对目标特征图执行深度检测处理,得到目标对象的参考节点的归一化绝对深度。
本公开另一种实施例中，参见图5所示，还提供另一种得到参考节点的归一化绝对深度的具体方法，包括：
S501:基于所述第一图像,确定初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值为所述第一图像中与所述第一像素点的位置对应的第二像素点在所述相机坐标系中的初始深度值。
在具体实施中,初始深度图像中的第一像素点与第一图像中的第二像素点具有一一对应关系,也即,第一像素点在初始深度图像中的坐标值,与位置对应的第二像素点在第一图像中的坐标值相同。
示例性的，可以采用深度预测网络，确定第一图像中每个像素点（第二像素点）的初始深度值；各个第一像素点的初始深度值，构成了第一图像的初始深度图像；在初始深度图像中的任一像素点（第一像素点）的像素值，即为在第一图像中对应位置的像素点（第二像素点）的初始深度值。
S502:基于所述目标对象对应的目标特征图,确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息;
S503:基于所述第二二维位置信息、以及所述初始深度图像,确定所述目标对象对应的参考节点的初始深度值。
此处，目标对象对应的目标特征图，例如可以是基于各个目标对象对应的目标区域，从第一图像的特征图中，为各个目标对象确定的目标特征图。
在得到各个目标对象对应的目标特征图后,例如可以利用预先训练的参考节点检测网络,基于目标特征图中确定目标对象的参考节点在第一图像中的第二二维位置信息。然后利用该第二二维位置信息,从初始深度图像确定与参考节点对应的像素点,并将该从初始深度图像中确定的像素点的像素值,确定为参考节点的初始深度值。
S504:基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
示例性的,例如可以对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;将所述特征向量和所述初始深度值进行拼接,得到拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
此处,例如可以采用一个用于对初始深度值进行调整的神经网络,该神经网络包括多个卷积层;其中,多个卷积层中的部分卷积层用于对目标特征图进行至少一级第一卷积处理;其他卷积层用于对拼接向量进行至少一级第二卷积处理,进而得到该修正值;然后根据该修正值对初始深度值进行调整,得到目标对象的参考节点的归一化深度。
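下述代码给出了上述S501~S504所述流程的一个示意性实现草图；其中卷积层数、通道数，以及用全连接层代替对拼接向量的“第二卷积处理”等均为简化假设，并非本公开限定的具体网络结构：

```python
import torch
import torch.nn as nn

class DepthCorrectionHead(nn.Module):
    """示意性的深度修正头：由目标特征图得到特征向量，与参考节点的初始深度值拼接后预测修正值。"""
    def __init__(self, in_channels=256):
        super().__init__()
        # 至少一级第一卷积处理：将目标特征图压缩为特征向量
        self.first_convs = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # 对拼接向量的后续处理（此处以全连接层示意）：预测初始深度值的修正值
        self.second_stage = nn.Sequential(
            nn.Linear(128 + 1, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),
        )

    def forward(self, target_feat, ref_uv, init_depth_map):
        # init_depth_map：初始深度图像，形状 [1, 1, H, W]；ref_uv：参考节点的第二二维位置信息 (u, v)
        u, v = ref_uv
        init_depth = init_depth_map[:, :, v, u]                  # 参考节点的初始深度值，形状 [1, 1]
        feat_vec = self.first_convs(target_feat)                 # 目标对象的特征向量，形状 [1, 128]
        fused = torch.cat([feat_vec, init_depth], dim=1)         # 拼接向量
        correction = self.second_stage(fused)                    # 初始深度值的修正值
        normalized_abs_depth = init_depth + correction           # 修正值与初始深度值相加，得到归一化绝对深度
        return normalized_abs_depth
```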
承接上述S402,本公开实施例所提供的确定目标对象的参考节点在相机坐标系中的绝对深度的具体方法还包括:
S403:基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
在具体实施中,由于在对不同第一图像进行图像处理过程中,不同的第一图像可能通过不同的相机拍摄而成;而对于不同的相机,所对应的相机内参可能会不同;此处,相机内参例如包括:相机在x轴上的焦距、相机在y轴上的焦距、相机的光心在相机坐标系中的x轴和y轴的坐标。
相机内参不同,即使在相同视角、相同位置获取的第一图像也会有所区别;若直接基于目标特征图预测参考节点的绝对深度,会造成对不同相机在相同视角、相同位置获取的不同第一图像获取的绝对深度不同。
为了避免上述情况的产生,本公开实施例直接预测参考节点的归一化深度,该归一化绝对深度是在不考虑相机内参的情况下得到的;然后根据相机内参、以及归一化绝对深度,恢复参考节点的绝对深度。
在基于归一化绝对深度恢复参考节点的绝对深度时,例如可以基于所述归一化绝对深度、所述焦距、所述目标区域的面积、以及所述目标边界框的面积,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
示例性的，任一目标对象的参考节点的归一化绝对深度与绝对深度满足下述公式（1）所表示的换算关系：

公式（1）：参考节点的绝对深度 $Z^{ref}$ 由其归一化绝对深度 $\hat{Z}^{ref}$、相机焦距 $(f_x, f_y)$、目标区域的面积 $A_{Box}$、以及目标边界框的面积 $A_{RoI}$ 换算得到。

其中，$\hat{Z}^{ref}$ 表示参考节点的归一化绝对深度；$Z^{ref}$ 表示参考节点的绝对深度；$A_{Box}$ 表示目标区域的面积；$A_{RoI}$ 表示目标边界框的面积；$(f_x, f_y)$ 表示相机焦距。示例性的，相机坐标系为三维坐标系，包括 x、y 和 z 三个坐标轴；相机坐标系的原点为相机的光心；相机的光轴为相机坐标系的 z 轴；光心所在的、且垂直于 z 轴的平面为 x 轴和 y 轴所在的平面；$f_x$ 为相机在 x 轴上的焦距，$f_y$ 为相机在 y 轴上的焦距。

这里需要注意的是，由上述S202可知，通过RoIAlign确定的目标边界框有多个，且多个目标边界框的面积均相等。

由于相机焦距在相机获取第一图像的时候已经确定，且目标区域和目标边界框在确定目标区域的时候也已经确定，因而在得到参考节点的归一化绝对深度后，即可根据上述公式（1）得到目标对象的参考节点的绝对深度。
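下述代码给出一种与上文所述依赖关系（归一化绝对深度、焦距、目标区域面积、目标边界框面积）相一致的假设换算形式（类似RootNet风格的面积比开方），仅作示意，并非本公开限定的公式（1）的具体形式：

```python
import math

def recover_absolute_depth(norm_depth, fx, fy, area_box, area_roi):
    """由归一化绝对深度恢复参考节点的绝对深度（示意性的假设形式）。

    norm_depth: 参考节点的归一化绝对深度
    fx, fy:     相机焦距
    area_box:   目标区域的面积
    area_roi:   目标边界框的面积（由 RoIAlign 确定，多个目标边界框的面积均相等）
    """
    # 假设形式：绝对深度与焦距、面积比的平方根成正比
    return norm_depth * math.sqrt(fx * fy * area_roi / area_box)
```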
III：在上述S103中，假设每个目标对象包括 J 个关键点，且第一图像中的目标对象有 N 个；其中，N 个目标对象的三维姿态可以表示为 $\{P_m^{3D}\}_{m=1}^{N}$。

其中，第 m 个目标对象的三维姿态 $P_m^{3D}$ 可以表示为：

$P_m^{3D} = \{(x_{m,j}^{cam},\ y_{m,j}^{cam},\ z_{m,j}^{cam})\}_{j=1}^{J}$

其中，$x_{m,j}^{cam}$ 表示第 m 个目标对象的第 j 个关键点在相机坐标系中 x 轴方向的坐标值；$y_{m,j}^{cam}$ 表示第 m 个目标对象的第 j 个关键点在相机坐标系中 y 轴方向的坐标值；$z_{m,j}^{cam}$ 表示第 m 个目标对象的第 j 个关键点在相机坐标系中 z 轴方向的坐标值。

N 个目标对象所在的目标区域可以表示为 $\{B_m\}_{m=1}^{N}$；其中，第 m 个目标对象所在的目标区域 $B_m$ 表示为：

$B_m = (x_m^{box},\ y_m^{box},\ w_m,\ h_m)$

此处，$(x_m^{box},\ y_m^{box})$ 表示目标区域的左上角所在的顶点的坐标值；$w_m$、$h_m$ 分别表示目标区域的宽度值和高度值。

N 个目标对象的相对于参考节点的三维姿势可以表示为 $\{\tilde{P}_m\}_{m=1}^{N}$；其中，第 m 个目标对象相对于参考节点的三维姿势 $\tilde{P}_m$ 表示为：

$\tilde{P}_m = \{(u_{m,j},\ v_{m,j},\ d_{m,j})\}_{j=1}^{J}$

其中，$u_{m,j}$ 表示第 m 个目标对象的第 j 个关键点在图像坐标系中 x 轴的坐标值；$v_{m,j}$ 表示第 m 个目标对象的第 j 个关键点在图像坐标系中 y 轴的坐标值；也即，$(u_{m,j},\ v_{m,j})$ 表示第 m 个目标对象的第 j 个关键点在图像坐标系中的二维坐标值；$d_{m,j}$ 表示第 m 个目标对象的第 j 个关键点相对于第 m 个目标对象的参考节点的相对深度。

使用相机的内参矩阵 K，通过反投影得到第 m 个目标对象的三维姿势；其中，第 m 个目标对象的第 j 个关键点的三维坐标信息满足下述公式（2）：

$\big[x_{m,j}^{cam},\ y_{m,j}^{cam},\ z_{m,j}^{cam}\big]^{T} = \big(d_{m,j} + Z_m^{ref}\big)\cdot K^{-1}\big[u_{m,j},\ v_{m,j},\ 1\big]^{T}$ （2）

其中，$Z_m^{ref}$ 表示第 m 个目标对象的参考节点在相机坐标系中的绝对深度值。此处，需要注意的是，该 $Z_m^{ref}$ 基于上述公式（1）对应的实施例获得。

内参矩阵 K 例如为 $(f_x, f_y, c_x, c_y)$，即 $K=\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$；其中：$f_x$ 为相机在相机坐标系中 x 轴上的焦距；$f_y$ 为相机在相机坐标系中 y 轴上的焦距；$c_x$ 为相机的光心在相机坐标系中 x 轴上的坐标值；$c_y$ 表示相机的光心在相机坐标系中 y 轴上的坐标值。
通过上述过程,能够得到目标对象的多个关键点分别在相机坐标系中的三维位置信息;针对第m个目标对象,该目标对象的J个关键点分别对应的三维位置信息,表征第m个目标对象的三维姿态。
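上述反投影过程可以用如下示意代码表达；其中函数与变量命名均为示例性假设，并非本公开限定的具体实现：

```python
import numpy as np

def back_project_keypoints(uv, rel_depth, ref_abs_depth, fx, fy, cx, cy):
    """按上文公式（2）所述的反投影，由第一二维位置信息、相对深度与参考节点绝对深度得到三维位置信息。

    uv:            [J, 2]，J 个关键点在图像坐标系中的二维坐标 (u, v)
    rel_depth:     [J]，每个关键点相对参考节点的相对深度
    ref_abs_depth: 参考节点在相机坐标系中的绝对深度
    """
    z = rel_depth + ref_abs_depth                  # 每个关键点在相机坐标系中的深度
    x = (uv[:, 0] - cx) * z / fx                   # x 轴方向坐标值
    y = (uv[:, 1] - cy) * z / fy                   # y 轴方向坐标值
    return np.stack([x, y, z], axis=1)             # [J, 3]，即三维位置信息

# 示例：J = 3 个关键点（数值仅为演示）
uv = np.array([[320., 240.], [330., 260.], [310., 250.]])
rel_depth = np.array([0.00, 0.12, -0.08])
pose_3d = back_project_keypoints(uv, rel_depth, ref_abs_depth=3.5,
                                 fx=1000., fy=1000., cx=320., cy=240.)
```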
本公开实施例通过识别第一图像中目标对象所在的目标区域，并基于目标区域，确定表征目标对象姿态的多个关键点分别在第一图像中的第一二维位置信息、每个关键点相对于目标对象的参考节点的相对深度、以及目标对象的参考节点在相机坐标系中的绝对深度，从而基于目标对象的第一二维位置信息、相对深度、以及绝对深度，更精确的得到目标对象的多个关键点分别在相机坐标系中的三维位置信息。
本公开另一实施例中,还提供另外一种图像处理方法,其中,该图像处理方法应用于预先训练好的神经网络中。
其中,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
上述三个分支网络的具体工作过程可以参见上述实施例所示,在此不再赘述。
本公开实施例通过目标检测网络、关键点检测网络以及深度预测网络三个分支网络,构成端到端的目标对象姿态检测框架,基于该框架对第一图像进行处理,得到第一图像中每个目标对象的多个关键点分别在相机坐标系中的三维位置信息,处理速度更快,识别精度更高。
参见图6所示,本公开实施例还提供一种目标对象姿态检测框架的具体示例,包括:
目标检测网络、关键点检测网络、以及深度预测网络三个网络分支;
其中,目标检测网络对第一图像进行特征提取,得到第一图像的特征图;然后,根据第一特征图,采用RoIAlign从预先生成的多个候选边界框中,确定多个目标边界框;对多个目标边界框执行边界框回归处理,得到与每个目标对象对应的目标区域。将目标区域对应的目标特征图,传输至关键点检测网络、以及深度预测网络。
关键点检测网络,基于目标特征图,确定表征目标对象姿态的多个关键点分别在所述第一图像中的第一二维位置信息、每个关键点相对所述目标对象的参考节点的相对深度。其中,针对每个目标特征图中各个关键点的第一二维位置信息、及相对深度,构成该目标特征图中目标对象的三维姿态。此时的三维姿态,是以自身为参照的三维姿态。
深度预测网络,基于目标特征图,确定目标对象的参考节点在相机坐标系中的绝对深度。
最终,根据目标对象的所述第一二维位置信息、相对深度、以及参考节点的所述绝对深度,确定目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。针对每个目标对象,该目标对象上的多个关键点分别在相机坐标系中的三维位置信息,构成了该目标对象在相机坐标系中的三维姿态。此时的三维姿态,是以相机为参照的三维姿态。
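上述端到端框架的整体前向流程可以概括为如下示意伪代码；各子网络的接口与参数均为假设，仅用于说明三个分支之间的数据流向，其中back_project_keypoints可参考前文公式（2）对应的示意代码：

```python
def detect_3d_poses(first_image, camera_intrinsics, backbone, det_head,
                    keypoint_head, depth_head):
    """示意性的端到端前向流程：目标检测 -> 关键点检测 / 深度预测 -> 反投影。"""
    feature_map = backbone(first_image)                      # 第一图像的特征图

    # 目标检测网络：得到每个目标对象所在的目标区域及对应的目标特征图
    target_regions, target_feats = det_head(feature_map)

    poses_3d = []
    for region, feat in zip(target_regions, target_feats):
        # 关键点检测网络：第一二维位置信息 + 相对深度
        uv, rel_depth = keypoint_head(feat)
        # 深度预测网络：参考节点在相机坐标系中的绝对深度
        ref_abs_depth = depth_head(feat, first_image, region)
        # 反投影得到相机坐标系中的三维位置信息（即该目标对象的三维姿态）
        poses_3d.append(back_project_keypoints(uv, rel_depth, ref_abs_depth,
                                               *camera_intrinsics))
    return poses_3d
```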
参见图7所示,本公开实施例还提供另一种目标对象姿态检测框架的具体示例,包括:
目标检测网络、关键点检测网络、以及深度预测网络;
其中,目标检测网络对第一图像进行特征提取,得到第一图像的特征图;然后,根据第一特征图,采用RoIAlign从预先生成的多个候选边界框中,确定多个目标边界框;对多个目标边界框执行边界框回归处理,得到与每个目标对象对应的目标区域。将目标区域对应的目标特征图,传输至关键点检测网络、以及深度预测网络。
关键点检测网络,基于目标特征图,确定表征目标对象姿态的多个关键点分别在所述第一图像中的第一二维位置信息、每个关键点相对所述目标对象的参考节点的相对深度。其中,针对每个目标特征图中各个关键点的第一二维位置信息、及相对深度,构成该目标特征图中目标对象的三维姿态。此时的三维姿态,是以自身为参照的三维姿态。
深度预测网络,基于第一图像,获取初始深度图像;并基于目标对象对应的目标特征图,确定与目标对象对应的参考节点在所述第一图像中的第二二维位置信息,并基于所述第二二维位置信息、以及所述初始深度图像,确定所述目标对象对应的参考节点的初始深度值;以及对目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;将所述特征向量和参考节点的初始深度值进行拼接,形成拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;将修正值与参考节点的初始深度值相加,得到参考节点的归一化绝对深度值。
然后,通过上述公式(1),恢复参考节点的绝对深度值,然后根据目标对象的所述第一二维位置信息、相对深度、以及参考节点的所述绝对深度,确定目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。针对每个目标对象,该目标对象上的多个关键点分别在相机坐标系中的三维位置信息,构成了该目标对象在相机坐标系中的三维姿态。此时的三维姿态,是以相机为参照的三维姿态。
通过上述两种目标对象姿态检测框架中任一种,都能够得到第一图像中每个目标对象的多个关键点分别在相机坐标系中的三维位置信息,处理速度更快,识别精度更高。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
基于同一发明构思,本公开实施例中还提供了与图像处理方法对应的图像处理装置,由于本公开实施例中的装置解决问题的原理与本公开实施例上述图像处理方法相似,因此装置的实施可以参见方法的实施,重复之处不再赘述。
参照图8所示,为本公开实施例提供的一种图像处理装置的示意图,所述装置包括:识别模块81、第一检测模块82、第二检测模块83;其中,
识别模块81,用于识别所述第一图像中的目标对象所在的目标区域;
第一检测模块82,用于基于所述目标对象对应的目标区域,确定表征所述目标对象姿态的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度;
第二检测模块83,用于基于所述目标对象的所述第一二维位置信息、所述相对深度、以及所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
一种可能的实施方式中,所述识别模块81,在识别所述第一图像中的目标对象所在的目标区域时,用于:
对所述第一图像进行特征提取,得到所述第一图像的特征图;
基于所述特征图,从预先生成的多个候选边界框中,确定多个目标边界框,并基于所述目标边界框,确定所述目标对象对应的目标区域。
一种可能的实施方式中,所述识别模块81,在基于所述目标边界框,确定所述目标对象对应的目标区域时,用于:
基于多个所述目标边界框,以及所述特征图,确定每个所述目标边界框对应的特征子图;
基于多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象对应的目标区域。
一种可能的实施方式中,所述第一检测模块82,在基于所述目标对象对应的目标区域,确定所述目标对象的参考节点在相机坐标系中的绝对深度时,用于:
基于所述目标对象对应的目标区域以及所述第一图像，确定所述目标对象对应的目标特征图；
基于所述目标对象对应的目标特征图执行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度;
基于所述归一化绝对深度以及所述相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
一种可能的实施方式中,所述第一检测模块82,在基于所述目标对象对应的目标特征图执行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度时,用于:
基于所述第一图像,获取初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值,表征所述第一图像中与所述第一像素点位置对应的第二像素点在所述相机坐标系中的初始深度值;
基于所述目标对象对应的目标特征图，确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息，并基于所述第二二维位置信息、以及所述初始深度图像，确定所述目标对象对应的参考节点的初始深度值；
基于所述目标对象对应的参考节点的初始深度值,以及所述目标对象对应的所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
一种可能的实施方式中,所述第一检测模块82,在基于所述目标对象对应的参考节点的初始深度值,以及所述目标对象对应的所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度时,用于:
对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;
将所述特征向量和所述初始深度值进行拼接,形成拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;
基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
一种可能的实施方式中,所述图像处理装置中部署有预先训练好的神经网络,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
本公开实施例通过识别第一图像中目标对象所在的目标区域,并基于目标区域,确定表征目标对象姿态的多个关键点分别在第一图像中的第一二维位置信息、每个关键点相对于目标对象的参考节点的相对深度、以及目标对象的参考节点在相机坐标系中的绝对深度,从而基于目标对象的第一二维位置信息、相对深度、以及绝对深度,更精确的得到目标对象的多个关键点分别在相机坐标系中的三维位置信息。
另外,本公开实施例通过目标检测网络、关键点检测网络以及深度预测网络三个分支网络,构成端到端的目标对象姿态检测框架,基于该框架对第一图像进行处理,得到第一图像中每个目标对象的多个关键点分别在相机坐标系中的三维位置信息,处理速度更快,识别精度更高。
关于装置中的各模块的处理流程、以及各模块之间的交互流程的描述可以参照上述方法实施例中的相关说明,这里不再详述。
本公开实施例还提供了一种计算机设备10,如图9所示,为本公开实施例提供的计算机设备10结构示意图,包括:
处理器11和存储器12;所述存储器12存储有所述处理器11可执行的机器可读指令,当计算机设备运行时,所述机器可读指令被所述处理器执行以实现下述步骤:
识别所述第一图像中的目标对象所在的目标区域;
基于所述目标对象对应的目标区域,确定表征所述目标对象姿态的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度;
基于所述目标对象的所述第一二维位置信息、所述相对深度、以及所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
上述指令的具体执行过程可以参考本公开实施例中所述的图像处理方法的步骤,此处不再赘述。
本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行上述方法实施例中所述的图像处理方法的步骤。其中,该存储介质可以是易失性或非易失的计算机可读取存储介质。
本公开实施例还提供一种计算机程序产品,该计算机程序产品承载有程序代码,所述程序代码包括的指令可用于执行上述方法实施例中所述的图像处理方法的步骤,具体可参见上述方法实施例,在此不再赘述。
其中,上述计算机程序产品可以具体通过硬件、软件或其结合的方式实现。在一个可选实施例中,所述计算机程序产品具体体现为计算机存储介质,在另一个可选实施例中,计算机程序产品具体体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统和装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使 用时,可以存储在一个处理器可执行的非易失的计算机可读取存储介质中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
最后应说明的是:以上所述实施例,仅为本公开的具体实施方式,用以说明本公开的技术方案,而非对其限制,本公开的保护范围并不局限于此,尽管参照前述实施例对本公开进行了详细的说明,本领域的普通技术人员应当理解:任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,其依然可以对前述实施例所记载的技术方案进行修改或可轻易想到变化,或者对其中部分技术特征进行等同替换;而这些修改、变化或者替换,并不使相应技术方案的本质脱离本公开实施例技术方案的精神和范围,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应所述以权利要求的保护范围为准。

Claims (20)

  1. 一种图像处理方法,包括:
    识别第一图像中的目标对象所在的目标区域;
    基于所述目标对象所在的目标区域,确定所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的参考节点在相机坐标系中的绝对深度;
    基于所述目标对象的多个关键点分别对应的所述第一二维位置信息和所述相对深度、以及所述参考节点对应的所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
  2. 根据权利要求1所述的图像处理方法,其中,还包括:基于所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息,得到所述目标对象的姿态。
  3. 根据权利要求1或2所述的图像处理方法,其中,所述识别所述第一图像中的目标对象所在的目标区域,包括:
    对所述第一图像进行特征提取,得到所述第一图像的特征图;
    基于所述特征图,从预先生成的多个候选边界框中确定多个目标边界框;
    基于多个所述目标边界框,确定所述目标对象所在的目标区域。
  4. 根据权利要求3所述的图像处理方法,其中,所述基于多个所述目标边界框,确定所述目标对象所在的目标区域,包括:
    基于多个所述目标边界框以及所述特征图,确定每个所述目标边界框的特征子图;
    对多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象所在的目标区域。
  5. 根据权利要求1-4任一项所述的图像处理方法,其中,基于所述目标对象所在的目标区域,确定所述目标对象的参考节点在相机坐标系中的绝对深度,包括:
    基于所述目标对象所在的目标区域以及所述第一图像,确定所述目标对象的目标特征图;
    对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度;
    基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
  6. 根据权利要求5所述的图像处理方法，其中，所述对所述目标对象对应的目标特征图进行深度识别处理，得到所述目标对象的参考节点的归一化绝对深度，包括：
    基于所述第一图像,确定初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值为所述第一图像中与所述第一像素点的位置对应的第二像素点在所述相机坐标系中的初始深度值;
    基于所述目标对象对应的目标特征图,确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息;
    基于所述第二二维位置信息、以及所述初始深度图像,确定所述目标对象对应的参考节点的初始深度值;
    基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
  7. 根据权利要求6所述的图像处理方法,其中,所述基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度,包括:
    对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;
    将所述特征向量和所述初始深度值进行拼接,得到拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;
    基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
  8. 根据权利要求5-7任一项所述的图像处理方法,其特征在于,所述参数矩阵包括:所述相机的焦距;
    所述基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度,包括:
    基于所述归一化绝对深度、所述焦距、所述目标区域的面积、以及所述目标边界框的面积,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
  9. 根据权利要求1-8任一项所述的图像处理方法,其中,所述图像处理方法应用于预先训练好的神经网络中,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
  10. 一种图像处理装置,其中,包括:
    识别模块,用于识别第一图像中的目标对象所在的目标区域;
    第一检测模块,用于基于所述目标对象所在的目标区域,确定所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、每个所述关键点相对所述目标对象的参考节点的相对深度、以及所述目标对象的 参考节点在相机坐标系中的绝对深度;
    第二检测模块,用于基于所述目标对象的多个关键点分别对应的所述第一二维位置信息和所述相对深度、以及所述参考节点对应的所述绝对深度,确定所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息。
  11. 根据权利要求10所述的图像处理装置,其中,所述第二检测模块,还用于基于所述目标对象的多个关键点分别在所述相机坐标系中的三维位置信息,得到所述目标对象的姿态。
  12. 根据权利要求10或11所述的图像处理装置,其中,所述识别模块,在识别所述第一图像中的目标对象所在的目标区域时,用于:
    对所述第一图像进行特征提取,得到所述第一图像的特征图;
    基于所述特征图,从预先生成的多个候选边界框中确定多个目标边界框;
    基于多个所述目标边界框,确定所述目标对象所在的目标区域。
  13. 根据权利要求12所述的图像处理装置,其中,所述识别模块,在基于多个所述目标边界框,确定所述目标对象所在的目标区域时,用于:
    基于多个所述目标边界框以及所述特征图,确定每个所述目标边界框的特征子图;
    对多个所述目标边界框分别对应的特征子图进行边界框回归处理,得到所述目标对象所在的目标区域。
  14. 根据权利要求10-13任一项所述的图像处理装置,其中,所述第一检测模块,在基于目标对象所在的目标区域,确定所述目标对象的参考节点在相机坐标系中的绝对深度时,用于:
    基于所述目标对象所在的目标区域以及所述第一图像,确定所述目标对象的目标特征图;
    对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度;
    基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
  15. 根据权利要求14所述的图像处理装置,其中,所述第一检测模块,在对所述目标对象对应的目标特征图进行深度识别处理,得到所述目标对象的参考节点的归一化绝对深度时,用于:
    基于所述第一图像,确定初始深度图像;其中,所述初始深度图像中任一第一像素点的像素值为所述第一图像中与所述第一像素点的位置对应的第二像素点在所述相机坐标系中的初始深度值;
    基于所述目标对象对应的目标特征图,确定与所述目标对象对应的参考节点在所述第一图像中的第二二维位置信息;
    基于所述第二二维位置信息、以及所述初始深度图像，确定所述目标对象对应的参考节点的初始深度值；
    基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度。
  16. 根据权利要求15所述的图像处理装置,其中,所述第一检测模块,在基于所述参考节点的初始深度值以及所述目标特征图,确定所述目标对象的参考节点的归一化绝对深度时,用于:
    对所述目标对象对应的目标特征图进行至少一级第一卷积处理,得到所述目标对象的特征向量;
    将所述特征向量和所述初始深度值进行拼接,得到拼接向量,并对所述拼接向量进行至少一级第二卷积处理,得到所述初始深度值的修正值;
    基于所述初始深度值的修正值、以及所述初始深度值,得到所述归一化绝对深度。
  17. 根据权利要求14-16任一项所述的图像处理装置,其特征在于,所述参数矩阵包括:所述相机的焦距;所述第一检测模块,在基于所述归一化绝对深度以及相机的参数矩阵,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度时,用于:
    基于所述归一化绝对深度、所述焦距、所述目标区域的面积、以及所述目标边界框的面积,得到所述目标对象的参考节点在所述相机坐标系中的绝对深度。
  18. 根据权利要求10-17任一项所述的图像处理装置,其中,所述图像处理装置利用预先训练好的神经网络实现图像处理,所述神经网络包括目标检测网络、关键点检测网络以及深度预测网络三个分支网络;所述目标检测网络用于获得所述目标对象所在的目标区域;所述关键点检测网络用于获取所述目标对象的多个关键点分别在所述第一图像中的第一二维位置信息、和每个所述关键点相对所述目标对象的参考节点的相对深度;所述深度预测网络用于获取所述参考节点在所述相机坐标系中的绝对深度。
  19. 一种计算机设备,其中,包括:相互连接的处理器和存储器,所述存储器存储有所述处理器可执行的机器可读指令,当计算机设备运行时,所述机器可读指令被所述处理器执行以实现如权利要求1至9任一所述的图像处理方法的步骤。
  20. 一种计算机可读存储介质,其中,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器运行时执行如权利要求1至9任意一项所述的图像处理方法的步骤。
PCT/CN2021/084625 2020-05-13 2021-03-31 图像处理方法、装置、电子设备及存储介质 WO2021227694A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010403620.5A CN111582207B (zh) 2020-05-13 2020-05-13 图像处理方法、装置、电子设备及存储介质
CN202010403620.5 2020-05-13

Publications (1)

Publication Number Publication Date
WO2021227694A1 true WO2021227694A1 (zh) 2021-11-18

Family

ID=72110786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084625 WO2021227694A1 (zh) 2020-05-13 2021-03-31 图像处理方法、装置、电子设备及存储介质

Country Status (3)

Country Link
CN (1) CN111582207B (zh)
TW (1) TWI777538B (zh)
WO (1) WO2021227694A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2598452A (en) * 2020-06-22 2022-03-02 Ariel Ai Ltd 3D object model reconstruction from 2D images
CN114354618A (zh) * 2021-12-16 2022-04-15 浙江大华技术股份有限公司 一种焊缝检测的方法及装置
CN114782547A (zh) * 2022-04-13 2022-07-22 北京爱笔科技有限公司 一种三维坐标确定方法及装置
CN114972958A (zh) * 2022-07-27 2022-08-30 北京百度网讯科技有限公司 关键点检测方法、神经网络的训练方法、装置和设备
CN115018918A (zh) * 2022-08-04 2022-09-06 南昌虚拟现实研究院股份有限公司 三维坐标的确定方法、装置、电子设备及存储介质
CN115063789A (zh) * 2022-05-24 2022-09-16 中国科学院自动化研究所 基于关键点匹配的3d目标检测方法及装置
US11688136B2 (en) 2020-06-22 2023-06-27 Snap Inc. 3D object model reconstruction from 2D images
WO2023236008A1 (en) * 2022-06-06 2023-12-14 Intel Corporation Methods and apparatus for small object detection in images and videos

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582207B (zh) * 2020-05-13 2023-08-15 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质
EP3965071A3 (en) * 2020-09-08 2022-06-01 Samsung Electronics Co., Ltd. Method and apparatus for pose identification
CN112163480B (zh) * 2020-09-16 2022-09-13 北京邮电大学 一种行为识别方法及装置
CN112528831B (zh) * 2020-12-07 2023-11-24 深圳市优必选科技股份有限公司 多目标姿态估计方法、多目标姿态估计装置及终端设备
CN112907517A (zh) * 2021-01-28 2021-06-04 上海商汤智能科技有限公司 一种图像处理方法、装置、计算机设备及存储介质
CN113344998B (zh) * 2021-06-25 2022-04-29 北京市商汤科技开发有限公司 深度检测方法、装置、计算机设备及存储介质
CN113470112A (zh) * 2021-06-30 2021-10-01 Oppo广东移动通信有限公司 图像处理方法、装置、存储介质以及终端
CN113610967B (zh) * 2021-08-13 2024-03-26 北京市商汤科技开发有限公司 三维点检测的方法、装置、电子设备及存储介质
CN113610966A (zh) * 2021-08-13 2021-11-05 北京市商汤科技开发有限公司 三维姿态调整的方法、装置、电子设备及存储介质
CN116386016B (zh) * 2023-05-22 2023-10-10 杭州睿影科技有限公司 一种异物处理方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120099782A1 (en) * 2010-10-20 2012-04-26 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20180018503A1 (en) * 2015-12-11 2018-01-18 Tencent Technology (Shenzhen) Company Limited Method, terminal, and storage medium for tracking facial critical area
CN107871134A (zh) * 2016-09-23 2018-04-03 北京眼神科技有限公司 一种人脸检测方法及装置
CN108460338A (zh) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 人体姿态估计方法和装置、电子设备、存储介质、程序
CN110378308A (zh) * 2019-07-25 2019-10-25 电子科技大学 改进的基于Faster R-CNN的港口SAR图像近岸舰船检测方法
CN111582207A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472753B (zh) * 2018-10-30 2021-09-07 北京市商汤科技开发有限公司 一种图像处理方法、装置、计算机设备和计算机存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120099782A1 (en) * 2010-10-20 2012-04-26 Samsung Electronics Co., Ltd. Image processing apparatus and method
US20180018503A1 (en) * 2015-12-11 2018-01-18 Tencent Technology (Shenzhen) Company Limited Method, terminal, and storage medium for tracking facial critical area
CN107871134A (zh) * 2016-09-23 2018-04-03 北京眼神科技有限公司 一种人脸检测方法及装置
CN108460338A (zh) * 2018-02-02 2018-08-28 北京市商汤科技开发有限公司 人体姿态估计方法和装置、电子设备、存储介质、程序
CN110378308A (zh) * 2019-07-25 2019-10-25 电子科技大学 改进的基于Faster R-CNN的港口SAR图像近岸舰船检测方法
CN111582207A (zh) * 2020-05-13 2020-08-25 北京市商汤科技开发有限公司 图像处理方法、装置、电子设备及存储介质

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2598452A (en) * 2020-06-22 2022-03-02 Ariel Ai Ltd 3D object model reconstruction from 2D images
US11688136B2 (en) 2020-06-22 2023-06-27 Snap Inc. 3D object model reconstruction from 2D images
GB2598452B (en) * 2020-06-22 2024-01-10 Snap Inc 3D object model reconstruction from 2D images
CN114354618A (zh) * 2021-12-16 2022-04-15 浙江大华技术股份有限公司 一种焊缝检测的方法及装置
CN114782547A (zh) * 2022-04-13 2022-07-22 北京爱笔科技有限公司 一种三维坐标确定方法及装置
CN115063789A (zh) * 2022-05-24 2022-09-16 中国科学院自动化研究所 基于关键点匹配的3d目标检测方法及装置
CN115063789B (zh) * 2022-05-24 2023-08-04 中国科学院自动化研究所 基于关键点匹配的3d目标检测方法及装置
WO2023236008A1 (en) * 2022-06-06 2023-12-14 Intel Corporation Methods and apparatus for small object detection in images and videos
CN114972958A (zh) * 2022-07-27 2022-08-30 北京百度网讯科技有限公司 关键点检测方法、神经网络的训练方法、装置和设备
CN115018918A (zh) * 2022-08-04 2022-09-06 南昌虚拟现实研究院股份有限公司 三维坐标的确定方法、装置、电子设备及存储介质
CN115018918B (zh) * 2022-08-04 2022-11-04 南昌虚拟现实研究院股份有限公司 三维坐标的确定方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
TWI777538B (zh) 2022-09-11
CN111582207A (zh) 2020-08-25
CN111582207B (zh) 2023-08-15
TW202143100A (zh) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2021227694A1 (zh) 图像处理方法、装置、电子设备及存储介质
CN110135455B (zh) 影像匹配方法、装置及计算机可读存储介质
EP3457357B1 (en) Methods and systems for surface fitting based change detection in 3d point-cloud
JP6745328B2 (ja) 点群データを復旧するための方法及び装置
WO2021057742A1 (zh) 定位方法及装置、设备、存储介质
US10872227B2 (en) Automatic object recognition method and system thereof, shopping device and storage medium
US20190295266A1 (en) Point cloud matching method
CN110648397B (zh) 场景地图生成方法、装置、存储介质及电子设备
WO2019042426A1 (zh) 增强现实场景的处理方法、设备及计算机存储介质
CN110986969B (zh) 地图融合方法及装置、设备、存储介质
CN111459269B (zh) 一种增强现实显示方法、系统及计算机可读存储介质
JP5833507B2 (ja) 画像処理装置
Liang et al. Image-based positioning of mobile devices in indoor environments
CN111582204A (zh) 姿态检测方法、装置、计算机设备及存储介质
WO2021244161A1 (zh) 基于多目全景图像的模型生成方法及装置
US20200005078A1 (en) Content aware forensic detection of image manipulations
GB2566443A (en) Cross-source point cloud registration
US20150262362A1 (en) Image Processor Comprising Gesture Recognition System with Hand Pose Matching Based on Contour Features
Tepelea et al. A vision module for visually impaired people by using Raspberry PI platform
Liang et al. Reduced-complexity data acquisition system for image-based localization in indoor environments
JP2013101423A (ja) 画像マッチング装置及び画像マッチングプログラム
Park et al. Estimating the camera direction of a geotagged image using reference images
JP6086491B2 (ja) 画像処理装置およびそのデータベース構築装置
EP3410389A1 (en) Image processing method and device
CN115239776B (zh) 点云的配准方法、装置、设备和介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21804629

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21804629

Country of ref document: EP

Kind code of ref document: A1