WO2021184359A1 - Target following method, target following device, movable device and storage medium - Google Patents

Target following method, target following device, movable device and storage medium

Info

Publication number
WO2021184359A1
WO2021184359A1 · PCT/CN2020/080439 · CN2020080439W
Authority
WO
WIPO (PCT)
Prior art keywords
target
information
key point
angle
image
Prior art date
Application number
PCT/CN2020/080439
Other languages
English (en)
French (fr)
Inventor
任创杰
张李亮
朱高
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN202080004952.4A (CN112639874A)
Priority to PCT/CN2020/080439 (WO2021184359A1)
Publication of WO2021184359A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Definitions

  • Embodiments of the present invention relate to the field of intelligent control technology, and in particular to a target following method, a target following device, a movable device and a storage medium.
  • the embodiment of the present invention provides a target following method, a target following device, a movable device, and a storage medium to solve the technical problem of poor stability of the following process of the movable device in the prior art.
  • the first aspect of the embodiments of the present invention provides a target following method, including: acquiring a captured image; determining, according to the image, posture information of a target and size information of a bounding box in which the target is located; and following the target according to the posture information of the target and the size information of the bounding box.
  • a second aspect of the embodiments of the present invention provides a target following device, including:
  • a memory, used to store a computer program;
  • the processor is configured to run the computer program stored in the memory to: acquire a captured image; determine, according to the image, the posture information of a target and the size information of the bounding box in which the target is located; and follow the target according to the posture information of the target and the size information of the bounding box.
  • a third aspect of the embodiments of the present invention provides a movable device, including the target following device described in the second aspect.
  • a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium in which program instructions are stored, and the program instructions are used to implement the method described in the first aspect.
  • With the target following method, target following device, movable device and storage medium provided by the embodiments of the present invention, a captured image is acquired, and the posture information of the target and the size information of the bounding box in which the target is located are determined according to the image.
  • the posture information of the target and the size information of the bounding box are used to follow the target, which can reduce or avoid problems such as the movable device rushing at people or shaking back and forth caused by changes in the target's posture, and effectively improves the stability and safety of following.
  • FIG. 1 is a schematic flowchart of a target following method according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic flowchart of a target following method provided by Embodiment 2 of the present invention.
  • FIG. 3 is a schematic diagram of the principle of determining key point information in a target following method provided by Embodiment 2 of the present invention.
  • FIG. 4 is a schematic diagram of the positions of the Gaussian distribution area and the zero response background of the confidence feature map in the target following method according to the second embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of a target following method according to Embodiment 3 of the present invention.
  • FIG. 6 is a schematic diagram of the positions of key points in a squat state in a target following method according to Embodiment 3 of the present invention.
  • FIG. 7 is a schematic diagram of the positions of key points in an upright walking state in a target following method provided by Embodiment 3 of the present invention.
  • FIG. 8 is a schematic structural diagram of a target following device according to Embodiment 4 of the present invention.
  • FIG. 1 is a schematic flowchart of a target following method according to Embodiment 1 of the present invention. As shown in Figure 1, the target following method in this embodiment may include:
  • Step 101 Acquire a captured image.
  • the method in this embodiment may be applied to a movable device
  • the movable device may be any movable device such as a drone or an unmanned vehicle
  • a camera may be provided on the movable device
  • the photographing device can be used to photograph the target.
  • acquiring the photographed image in this step may specifically include: acquiring the image photographed by the photographing device in the movable device.
  • the target may be a movable object such as a person or a car.
  • Step 102 Determine the posture information of the target and the size information of the bounding box where the target is located according to the image.
  • the neural network can be trained through samples, and the trained neural network can be used to process the image to obtain the corresponding posture information.
  • the target in the image can be detected by a method such as a target detection algorithm, and then the posture information of the target can be determined according to the neural network.
  • the posture information may include, but is not limited to: standing, walking, squatting, lying down, etc. If the target is a car, the posture information may include, but is not limited to: go straight, turn left, turn right, and so on.
  • the bounding box in which the target is located may be a rectangular frame occupied by the target in the image.
  • the image can be processed through a single object tracking (SOT) algorithm or other algorithms to obtain the bounding box where the target is located.
  • the size information of the bounding box may include the height and/or width of the bounding box.
  • the height may be the length of the bounding box in the first direction
  • the width may be the length of the bounding box in the second direction.
  • the first direction and the second direction may be a vertical direction and a horizontal direction, respectively.
  • Step 103 Follow the target according to the posture information of the target and the size information of the bounding box.
  • following the target may refer to controlling the movable device to follow the target, specifically, it may be achieved by always controlling the distance between the movable device and the target within a preset range. For example, if the target moves forward, the movable device also moves forward, and when the target stops, the movable device also stops.
  • the posture information of the target and the size information of the bounding box can be referred to, and the follow-up strategy of the target can be realized according to the posture information and the size information.
  • following the target according to the posture information of the target and the size information of the bounding box may include: determining, according to the posture information of the target, the strategy for following the target; and following the target according to the determined strategy and the size information of the bounding box.
  • the strategy may include an algorithm for calculating the distance of the target based on the size information of the bounding box.
  • the distance of the target can be calculated by methods such as monocular ranging, and the following process can be controlled according to the distance.
  • the specific algorithm for determining the distance can be adjusted according to the posture information.
  • for example, if the target is in the first posture, the first algorithm is used to calculate the distance based on the size information of the bounding box; if the target is in the second posture, the second algorithm is used to calculate the distance based on the size information of the bounding box.
  • the first algorithm and the second algorithm can be set according to actual needs.
  • a simple example is that the height of the bounding box can be multiplied by a scale factor to obtain the distance of the target.
  • the scale factor corresponding to different algorithms can be different.
  • when the posture information of the target does not change, the farther the target is, the smaller the height of its bounding box in the image, and the closer the target is, the larger that height; therefore the height of the bounding box can be directly multiplied by a scale factor to estimate the distance of the target.
  • when the distance of the target does not change, the size of the corresponding bounding box will still change when the posture of the target changes. For example, when the target is in a squatting state, the height of the bounding box is approximately equal to one third of the height of the bounding box in the upright state. Therefore, different scale factors can be set for different posture information.
  • the first posture may be an upright walking state
  • the first algorithm is to multiply the height of the bounding box of the target by a first coefficient to obtain the distance of the target
  • the second posture is a squatting posture
  • the second algorithm is to multiply the height of the bounding box of the target by a second coefficient to obtain the distance of the target, and the second coefficient may be smaller than the first coefficient.
  • the first coefficient may be 100
  • the second coefficient may be 33.
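  • As a concrete illustration of the posture-dependent strategy above, the sketch below uses a standard pinhole monocular-ranging relation in which the posture selects the assumed real-world height of the target; the function name, focal length and height values are illustrative assumptions rather than values from this disclosure, and they play the same role as the first and second coefficients in the example above.

```python
# Illustrative sketch only: it assumes a pinhole monocular-ranging model,
# distance = focal_length_px * real_height_m / bbox_height_px, where the
# posture-dependent "coefficient" is the assumed real-world height.
ASSUMED_HEIGHT_M = {
    "upright": 1.70,    # standing or walking person (assumed value)
    "squatting": 0.60,  # roughly one third of standing height, as noted above
}

def estimate_distance(bbox_height_px: float, posture: str,
                      focal_length_px: float = 900.0) -> float:
    """Estimate the target distance from the bounding-box pixel height.

    The posture selects the coefficient (here, the assumed real height), so a
    squatting target is not mistaken for a target that has moved far away.
    """
    real_height_m = ASSUMED_HEIGHT_M.get(posture, ASSUMED_HEIGHT_M["upright"])
    return focal_length_px * real_height_m / bbox_height_px

if __name__ == "__main__":
    # Same person at the same spot: standing box ~300 px, squatting box ~100 px.
    print(estimate_distance(300, "upright"))    # ~5.1 m
    print(estimate_distance(100, "squatting"))  # ~5.4 m, not "three times farther"
```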
  • following the target according to the determined strategy and the size information of the bounding box may include: calculating the distance of the target according to the determined algorithm and the size information of the bounding box in which the target is located, and determining the following acceleration according to the distance of the target.
  • the determined strategy may also be the corresponding relationship between the size information of the bounding box and the following acceleration.
  • the height of the bounding box is multiplied by a certain parameter to directly obtain the corresponding acceleration, and the intermediate distance calculation step is omitted.
  • the movable device can be controlled to follow the target with the following acceleration. It is understandable that the farther the distance is, the greater the acceleration can be, and the closer the distance is, the smaller the acceleration can be, or even negative.
  • for a bounding box of the same size, the corresponding following acceleration in the non-upright walking state is less than the following acceleration in the upright walking state.
  • the upright walking state can indicate that the target is in an upright state or a walking state
  • the non-upright walking state can indicate that the target is in a state other than upright and walking, such as squatting, lying down, etc.
  • specifically, during the following process, when the target is in an upright walking state and the distance of the target is D1, the height of the target's bounding box in the image is H1; when the target is in a non-upright walking state such as squatting and the distance of the target is D2, the height of the target's bounding box in the image is H2. Since the user's body bends when squatting, when H1 equals H2, D1 is greater than D2; therefore the following acceleration in the upright walking state should be greater than the following acceleration corresponding to the same bounding box height in the non-upright walking state.
  • with this scheme, for a bounding box of the same size, the corresponding following acceleration in the non-upright walking state can be less than the following acceleration in the upright walking state, which can avoid the problem of the movable device rushing at the person after the user squats down.
  • following the target according to the posture information of the target and the size information of the bounding box may include: if the target is in an upright walking state, following the target according to the size information of the bounding box; if the target is in a non-upright walking state, pausing the following.
  • pausing the following when the target is in a non-upright walking state can simply and effectively realize the control of the movable device in a squatting or other non-upright walking state, and prevent the movable device from rushing at people.
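  • A minimal control sketch combining the two strategies above (reduced acceleration, or simply pausing, in a non-upright state) is shown below; the gain, desired distance and the 0.3 scaling factor are assumptions for illustration only.

```python
# Minimal follow-control sketch with assumed gains and distances (not from the
# patent): the commanded acceleration grows with the distance error, and a
# non-upright posture either pauses the follow or scales the gain down, so the
# vehicle does not lunge forward when the target squats.

def follow_acceleration(distance_m: float, posture: str,
                        desired_distance_m: float = 3.0,
                        gain: float = 0.5,
                        pause_when_not_upright: bool = True) -> float:
    """Return the commanded forward acceleration in m/s^2 (negative = brake)."""
    if posture != "upright":
        if pause_when_not_upright:
            return 0.0            # second strategy: simply pause following
        gain *= 0.3               # first strategy: follow, but much more gently
    error = distance_m - desired_distance_m
    return gain * error           # farther -> accelerate, closer -> decelerate
```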
  • using posture estimation as auxiliary perception, with different postures adopting different following strategies, helps the movable device better plan the following path, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.
  • the user can instruct the movable device to enter the follow mode through voice instructions, wave operations, etc. After entering the follow mode, the method in steps 101 to 103 can be used to follow the target.
  • during the following process, the posture information of the target is detected, and the following is performed according to the posture information combined with the size information of the target's bounding box.
  • compared with a method that follows only according to the size information of the bounding box, this has higher stability.
  • in a method that follows only according to size, when the target squats down, the size of the target in the image decreases, and the movable device will misjudge that the target has moved far away and therefore accelerate forward; but in fact the target has not moved far away, and this misjudgment leads to practical problems such as rushing at people and shaking back and forth.
  • the method in this embodiment needs to implement follow-up based on the posture information of the target, so as to avoid the misjudgment of the target distance caused by the target's changing posture.
  • the posture information of the target and the size information of the bounding box in which the target is located are determined according to the image, and the target is followed based on the posture information of the target and the size information of the bounding box.
  • this can reduce or avoid problems such as the movable device rushing at people or shaking back and forth caused by changes in the target's posture, and effectively improves the stability and safety of following.
  • the second embodiment of the present invention provides a target following method.
  • the key points of the target are determined through images, and then the posture information of the target is determined according to the key points.
  • FIG. 2 is a schematic flowchart of a target following method according to Embodiment 2 of the present invention. As shown in Figure 2, the target following method in this embodiment may include:
  • Step 201 Acquire a captured image.
  • Step 202 Determine the bounding box where the target is located and the key point information of the target according to the image.
  • determining the bounding box of the target according to the image can be achieved by a single target tracking algorithm or other algorithms.
  • Determining the key point information of the target according to the image can be achieved by deep learning algorithms such as neural networks.
  • the key point information in the image can be directly determined through the neural network, or the region of interest (ROI) in the image can be cropped first, and then the key point information can be further determined.
  • correspondingly, determining the key point information of the target according to the image in step 202 may include: determining the ROI image in which the target in the image is located; and determining the key point information in the ROI image according to a neural network.
  • FIG. 3 is a schematic diagram of the principle of determining key point information in a target following method provided in Embodiment 2 of the present invention.
  • the image taken by the movable platform may be an RGB image.
  • a bounding box in which the target in the RGB image is located can be determined, and the category of the bounding box is "person".
  • using the RGB image and the single bounding box provided by the SOT algorithm as input, the corresponding ROI image can be obtained.
  • the size of the RGB image is 1000*800*3.
  • the bounding box where the target is located can be determined from the RGB image.
  • the expression form of the bounding box can be the coordinate information of the four corners of the bounding box.
  • the ROI image can be cropped from the RGB image.
  • the size of the ROI image can be 100*100*3, and the target is located in the ROI image.
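  • A sketch of this cropping step is given below, assuming the bounding box is expressed as pixel corner coordinates and using NumPy slicing; the frame and ROI sizes simply reproduce the 1000*800*3 and 100*100*3 example values.

```python
import numpy as np

def crop_roi(rgb_image: np.ndarray, bbox_xyxy: tuple) -> np.ndarray:
    """Crop the ROI containing the tracked target from an H*W*3 RGB image.

    bbox_xyxy = (x_min, y_min, x_max, y_max) in pixels, e.g. derived from the
    four corner coordinates reported by the single-object tracker.
    """
    x_min, y_min, x_max, y_max = bbox_xyxy
    h, w = rgb_image.shape[:2]
    # Clamp to the image so a box touching the border does not raise an error.
    x_min, x_max = max(0, int(x_min)), min(w, int(x_max))
    y_min, y_max = max(0, int(y_min)), min(h, int(y_max))
    return rgb_image[y_min:y_max, x_min:x_max]

# Example: a 1000*800 frame and a box that yields a 100*100*3 ROI.
frame = np.zeros((800, 1000, 3), dtype=np.uint8)
roi = crop_roi(frame, (450, 300, 550, 400))
assert roi.shape == (100, 100, 3)
```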
  • the ROI image can be input into the neural network model, and the model can be used to determine the key point information.
  • the adopted model may be a convolutional neural network (Convolutional Neural Networks, CNN), and specifically may be a fully convolutional neural network (Fully Convolutional Networks, FCN).
  • the key point information of the target may include position information of multiple key points of the target, and the position information may specifically be the coordinates of the key points.
  • the multiple key points may include, but are not limited to, at least two of: the nose, the middle of the shoulders, the right shoulder joint, the right elbow joint, the right hand, the left shoulder joint, the left elbow joint, the left hand, the right hip joint, the right knee, the right ankle, the left hip joint, the left knee, and the left ankle.
  • the output of the neural network may be the location information of the key points in the image, or the output of the neural network may be the confidence feature map, and the location information of the key points in the image can be determined according to the confidence feature map.
  • the processing for the neural network may include two stages of training and detection.
  • the training phase can be implemented before the detection phase, or the neural network can be trained between any two detections.
  • samples can be used to train the neural network and adjust the parameters in the neural network so that the output result is similar to the target result.
  • the detection stage can be used in the following process, using the fully trained neural network parameters to detect the image and output the confidence feature maps.
  • the training process may include: obtaining training samples, the training samples including sample images and a confidence feature map corresponding to the sample images; training the neural network according to the training samples.
  • the neural network is trained by using the confidence feature map as the target result, so that the output result of the neural network is close to the target result, which can effectively improve the anti-interference performance of the neural network and avoid over-fitting of the neural network.
  • the process of acquiring the training sample may include: acquiring a sample image and position information of key points in the sample image; and determining a confidence feature map corresponding to the sample image according to the position information of the key points.
  • in the confidence feature map corresponding to the sample image, a pixel closer to the key point has a higher corresponding probability.
  • the sample image may be an ROI image cropped from any image obtained from the database.
  • for each sample image, the position information of the key points in the image is determined by manual labeling, and the confidence feature map is generated according to the position information of the key points.
  • suppose manual labeling determines that the position coordinates of the shoulder joint in the image are (50, 50); the confidence feature map corresponding to the shoulder joint can then be generated according to this position information.
  • the principle of generating the confidence feature map is that the closer a pixel is to the true position of the shoulder joint, the greater the probability that the pixel belongs to the shoulder joint.
  • for example, the pixel with coordinates (50, 50) has the largest probability, say 0.8; the probability corresponding to the pixel at (55, 55) should be greater than that of the pixel at (60, 60), for example 0.1 and 0.01 respectively.
  • the probability that pixels at the image edge, far from (50, 50), belong to the shoulder joint is very small, close to 0.
  • the confidence feature map corresponding to the sample image may be generated through a two-dimensional Gaussian distribution according to the position information of the key points.
  • in the confidence feature map, the position coordinates of the pixels can obey a two-dimensional Gaussian distribution whose mean is the key point coordinates and whose variance is D1; or the distance between a pixel and the labeled key point can obey a Gaussian distribution with mean 0 and variance D2.
  • the variances D1 and D2 can be set according to actual needs.
  • the two-dimensional Gaussian distribution is used to determine the confidence feature map corresponding to the sample image, which can effectively simulate the probability that each pixel is a key point and improve the detection accuracy.
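  • One explicit way to write this down (an assumed isotropic form in which the labeled key point (x_k, y_k) is the mean and sigma^2 plays the role of the variance D1) is:

    C_k(x, y) = exp( -((x - x_k)^2 + (y - y_k)^2) / (2 * sigma^2) )

  where C_k(x, y) is the confidence that the pixel at (x, y) belongs to key point k; it is largest at the labeled position and decays with the distance from the key point.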
  • the confidence feature map may also be composed of a Gaussian distribution and a zero-response background. Specifically, within a preset range around the key point, the probability corresponding to each pixel can be determined according to the Gaussian distribution; outside the preset range, a zero-response background can be set, that is, the probability corresponding to each pixel outside the preset range is set to 0.
  • taking the shoulder joint as the key point as an example, within the preset range around the position of the shoulder joint, the Gaussian distribution is used to generate the probability corresponding to each pixel.
  • for example, the preset range may be a circle centered on the shoulder joint with a radius of 5: when a pixel is more than 5 pixels away from the coordinate point of the shoulder joint in the image, that pixel is almost impossible to belong to the shoulder joint, and the corresponding probability is 0.
  • FIG. 4 is a schematic diagram of the positions of the Gaussian distribution area and the zero response background of the confidence feature map in the target following method provided in the second embodiment of the present invention.
  • in the confidence feature map, the black dot in the middle represents the manually labeled key point, and the shaded area represents the Gaussian distribution area.
  • the probability of each pixel in this area is determined by the Gaussian distribution.
  • the area outside the shadow is the zero-response background area, and the probability of each pixel in the zero-response background area is 0.
  • the confidence feature map is composed of Gaussian distribution and zero response background, which can effectively simplify the generation process of the confidence feature map and improve the generation efficiency and accuracy of the confidence feature map.
  • in addition to the Gaussian distribution, other methods can also be used to generate the confidence feature map based on the location of the labeled key point, as long as the probability that a pixel belongs to the key point is lower the farther the pixel is from the key point.
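  • A sketch of how such a training target could be generated, combining a two-dimensional Gaussian around the labeled key point with a zero-response background, is shown below; the sigma value is an assumption, the radius of 5 follows the example above, and the function name is hypothetical.

```python
import numpy as np

def make_confidence_map(height: int, width: int, keypoint_xy: tuple,
                        sigma: float = 2.0, radius: int = 5) -> np.ndarray:
    """Build a training-target confidence map for one labeled key point.

    Inside a circle of `radius` pixels around the key point, the value follows
    a 2-D Gaussian centered on the labeled position; outside it, the map is a
    zero-response background.
    """
    kx, ky = keypoint_xy
    ys, xs = np.mgrid[0:height, 0:width]
    dist_sq = (xs - kx) ** 2 + (ys - ky) ** 2
    gauss = np.exp(-dist_sq / (2.0 * sigma ** 2))
    conf = np.where(dist_sq <= radius ** 2, gauss, 0.0)  # zero-response background
    return conf.astype(np.float32)

# Example: a 25*25 map for a shoulder joint labeled at (12, 12).
heatmap = make_confidence_map(25, 25, (12, 12))
assert heatmap.max() == 1.0 and heatmap[0, 0] == 0.0
```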
  • if multiple key points are labeled in the sample image, a confidence feature map can be generated for each key point.
  • multiple sample images and the corresponding confidence feature maps are acquired to train the neural network, and the neural network is trained to determine, from an image, the confidence feature maps corresponding to the key points in it.
  • determining the key point information in the ROI image according to the neural network may include: inputting the ROI image to the neural network to obtain a confidence feature map corresponding to multiple key points, where any key The confidence feature map corresponding to a point includes the probability that each pixel belongs to the key point; the key point information of the target is determined according to the confidence feature maps corresponding to the multiple key points.
  • for example, if 8 key points (the left and right shoulder joints, left and right hip joints, left and right knees, and left and right ankles) are needed to determine the posture information of the target, the captured image is input into the neural network, the confidence feature maps corresponding to the 8 key points can be obtained through the neural network, and the positions of the 8 key points can be determined respectively according to the 8 confidence feature maps.
  • determining the key point information of the target according to the confidence feature maps corresponding to the multiple key points may include: in the confidence feature map corresponding to any key point, determining the pixel with the highest probability of belonging to that key point; and if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, taking the position information of that key point of the target to be the position information of the pixel with the highest probability.
  • for example, in the confidence feature map corresponding to the shoulder joint, if the pixel with the highest probability is located at (10, 10) and its corresponding probability is 0.7, which is greater than the preset threshold, the credibility that this pixel belongs to the shoulder joint is high enough, and the coordinates of the shoulder joint can be considered to be (10, 10). If the probability corresponding to the pixel with the highest probability is less than the preset threshold, none of the pixels is likely enough to belong to the shoulder joint, and it can be considered that the shoulder joint is missing from the image.
  • the preset threshold can be set according to actual needs, for example, it can be 0.5.
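  • A sketch of this decoding step is shown below: for each confidence feature map, take the pixel with the highest probability and accept it only if that probability exceeds the preset threshold (0.5 in the example above); names are illustrative.

```python
import numpy as np

def keypoints_from_confidence_maps(conf_maps: np.ndarray,
                                   threshold: float = 0.5):
    """Decode key point locations from per-keypoint confidence maps.

    conf_maps has shape (K, H, W). For each of the K maps, the pixel with the
    highest probability is taken as the key point; if that probability is not
    above `threshold`, the key point is treated as missing.
    Returns a list of (x, y) tuples, or None for missing key points.
    """
    keypoints = []
    for cmap in conf_maps:
        y, x = np.unravel_index(np.argmax(cmap), cmap.shape)
        keypoints.append((int(x), int(y)) if cmap[y, x] > threshold else None)
    return keypoints
```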
  • Step 203 Determine the posture information of the target according to the key point information of the target.
  • the corresponding posture information can be determined according to the key point information. Specifically, after the key points are obtained, the limbs can be formed according to the connection relationship formed between the key points, and the formed limbs can be used as the basis for judging the target posture.
  • the posture information of the target can be determined based on at least part of the lines formed between multiple key points of the target, so as to quickly and accurately realize posture detection.
  • for example, if 8 key points are obtained from the image and the 8 key points are connected in pairs, 28 lines can be obtained; the posture information of the target can be determined according to at least some of the 28 lines.
  • the corresponding posture information may be determined according to the length of the connecting lines. For example, if the length of the line between the shoulder joint and the knee is less than the length of the line between the shoulder joint and the hip joint, the target can be considered to be in a squatting state; if the length of the line between the shoulder joint and the knee is approximately equal to the length of the line between the shoulder joint and the hip joint plus the length of the line between the hip joint and the knee, the target can be considered to be upright.
  • the angle information corresponding to at least part of the lines formed between the multiple key points of the target may be calculated; according to the angle information corresponding to the at least part of the lines, it is determined The posture information of the target.
  • the angle information corresponding to each line may include: the angle between the line and the reference line, and/or the angle between the line and any one or more other lines;
  • the reference line is a horizontal line or a vertical line.
  • for example, if the angle between the line connecting the shoulder joint and the hip joint and the vertical line is less than a certain value, the target is considered to be in a leaning state;
  • if the line between the shoulder joint and the hip joint and the line between the hip joint and the knee form an angle of 90°, the target can be considered to be sitting down.
  • the posture information of the target can also be comprehensively determined according to the length and angle of the connection line to improve the accuracy of posture recognition.
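  • As an illustration, the angle between two key-point lines, or between a line and the vertical reference line, can be computed from the direction vectors of the lines, e.g. as in the sketch below (function names are assumptions):

```python
import numpy as np

def angle_between(p0, p1, q0, q1) -> float:
    """Angle in degrees (0..180) between line p0-p1 and line q0-q1."""
    v1 = np.asarray(p1, dtype=float) - np.asarray(p0, dtype=float)
    v2 = np.asarray(q1, dtype=float) - np.asarray(q0, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def angle_to_vertical(p0, p1) -> float:
    """Angle in degrees between line p0-p1 and the vertical reference line."""
    # Image coordinates: y grows downward, so the vertical direction is (0, 1).
    return angle_between(p0, p1, (0, 0), (0, 1))
```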
  • Step 204 Follow the target according to the posture information of the target and the size information of the bounding box.
  • For the specific implementation method and principle of step 204 in this embodiment, reference may be made to the foregoing embodiment, which will not be repeated here.
  • the target following method provided in this embodiment can determine the key point information of the target in the captured image according to the neural network, and can more comprehensively analyze the human body posture according to the key point information.
  • compared with a solution in which the neural network directly outputs the posture information, the recognition accuracy is higher and the approach is more flexible; moreover, when the categories of actions to be recognized need to be changed, all samples do not need to be relabeled, which saves labor cost and reduces the amount of development when requirements change. Determining the positions of the key points through the confidence feature maps, compared with a scheme that directly uses the key point coordinates as the training target, is less prone to over-fitting, has higher recognition accuracy and stronger anti-interference ability, and does not require collecting a large number of samples and labeling the corresponding data, which further reduces the workload of manual labeling.
  • through the two-dimensional Gaussian distribution, the confidence feature map corresponding to the sample image can be determined quickly and accurately, which makes the training process more stable, avoids manual labeling errors, provides anti-interference ability, and improves the key point recognition accuracy.
  • the number of pixels in the confidence feature map output by the neural network may be less than the number of pixels in the input ROI image.
  • for example, the ROI image is an h*w*3 RGB image, where h and w are the length and width of the input respectively, and the neural network outputs an h'*w'*k confidence feature map, where h' and w' are the length and width of the output, h'=0.25*h, w'=0.25*w, and k is the number of key point categories (in this embodiment k=8: the left and right shoulder joints, left and right hip joints, left and right knees, and left and right ankles).
  • if the input ROI image has 100*100 pixels, then 8 confidence feature maps are output, and each confidence feature map includes 25*25 pixels.
  • the size of the target result can be set to 1/4 of the input image, and the function of reducing the image through the neural network can be realized.
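  • A minimal, hypothetical training sketch consistent with the description above is given below (PyTorch is an assumed framework choice; the disclosure itself only specifies a fully convolutional network whose output confidence maps are 1/4 the size of the input): a 100*100 ROI goes in, 8 confidence maps of 25*25 come out, and the output is regressed toward the Gaussian target maps.

```python
import torch
import torch.nn as nn

class TinyKeypointFCN(nn.Module):
    """Toy fully convolutional net: a 3-channel ROI in, K confidence maps out."""
    def __init__(self, num_keypoints: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_keypoints, 1),
        )

    def forward(self, x):  # (N, 3, H, W) -> (N, K, H/4, W/4)
        return self.net(x)

model = TinyKeypointFCN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # regress the output maps toward the Gaussian targets

def train_step(roi_batch, target_maps):
    """roi_batch: (N, 3, 100, 100); target_maps: (N, 8, 25, 25) labels."""
    optimizer.zero_grad()
    loss = loss_fn(model(roi_batch), target_maps)
    loss.backward()
    optimizer.step()
    return float(loss)
```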
  • the third embodiment of the present invention provides a target following method. This embodiment is based on the technical solutions provided in the foregoing embodiments, and specifically determines the posture information of the user through the tilt angle of the body and/or the bending angle of the legs.
  • FIG. 5 is a schematic flowchart of a target following method according to Embodiment 3 of the present invention. As shown in Figure 5, the target following method in this embodiment may include:
  • Step 501 Acquire a captured image.
  • Step 502 Determine the bounding box where the target is located and the key point information of the target according to the image.
  • step 501 to step 502 can refer to the foregoing embodiment, which will not be repeated here.
  • Step 503 Determine the body inclination angle and/or leg bending angle of the target according to the key point information of the target.
  • the body tilt angle of the target may include the left body tilt angle and/or the right body tilt angle, and the body tilt angle on either side is the angle between the first line and the second line on that side,
  • the first line is the line between the shoulder joint and the ipsilateral hip joint of the target, and the second line is the line between the hip joint and the ipsilateral knee.
  • the body inclination angle on the left is the angle of the line between the left shoulder joint and the left hip joint relative to the line between the left hip joint and the left knee;
  • the body inclination angle on the right is the angle of the line between the right shoulder joint and the right hip joint relative to the line between the right hip joint and the right knee.
  • the inclination angle of the body on the left side and the inclination angle of the body on the right are often relatively close. In practical applications, the inclination angle of the body on one side can be calculated only.
  • the leg bending angle of the target may include the left leg bending angle and/or the right leg bending angle, and the leg bending angle on either side is the angle between the third line and the fourth line on that side.
  • the third line is the line between the ankle of the target and the knee on the same side
  • the fourth line is the line between the knee and the hip joint on the same side.
  • the bending angle of the left leg can be the angle of the line between the left ankle and the left knee relative to the line between the left knee and the left hip joint; the right leg is bent The angle can be the angle of the line between the right ankle and the right knee relative to the line between the right knee and the right hip joint.
  • Step 504 Determine the posture information of the target according to the inclination angle of the body and/or the bending angle of the legs of the target.
  • optionally, if the body inclination angle on either side is less than a first angle, or if the leg bending angles on both sides are both less than a second angle, it is determined that the target is in a non-upright walking state; otherwise, it can be determined that the target is in an upright walking state.
  • the first angle and the second angle may be the same or different.
  • the first angle and the second angle may both be 150°.
  • the specific judgment logic can be: if the inclination angle of the body on either side is less than 150°, the target is considered to be in a non-upright walking state; if the bending angle of the legs on both sides is less than 150°, the target is considered In a non-upright walking state; if the inclination angle of the body on both sides is greater than 150°, and the bending angle of at least one leg is greater than 150°, the target is considered to be in an upright walking state.
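  • A self-contained sketch of this judgment logic is shown below, assuming key points are given as (x, y) pixel coordinates and using the 150° thresholds from the example above; the helper and parameter names are illustrative.

```python
import numpy as np

def _joint_angle(joint, end_a, end_b) -> float:
    """Interior angle (degrees) at `joint` between joint->end_a and joint->end_b."""
    v1 = np.asarray(end_a, dtype=float) - np.asarray(joint, dtype=float)
    v2 = np.asarray(end_b, dtype=float) - np.asarray(joint, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def is_upright(side_left: dict, side_right: dict,
               first_angle: float = 150.0, second_angle: float = 150.0) -> bool:
    """Each side dict holds 'shoulder', 'hip', 'knee', 'ankle' (x, y) key points.

    Body tilt = angle at the hip between the hip-shoulder and hip-knee lines;
    leg bend = angle at the knee between the knee-ankle and knee-hip lines.
    Both are close to 180 degrees for a person standing straight.
    """
    tilts = [_joint_angle(s["hip"], s["shoulder"], s["knee"])
             for s in (side_left, side_right)]
    bends = [_joint_angle(s["knee"], s["ankle"], s["hip"])
             for s in (side_left, side_right)]
    if any(t < first_angle for t in tilts):   # body leaning on either side
        return False
    if all(b < second_angle for b in bends):  # both legs clearly bent
        return False
    return True                               # otherwise: upright walking state
```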
  • FIG. 6 is a schematic diagram of the positions of key points in a squat state in a target following method according to Embodiment 3 of the present invention.
  • the black dots represent the key points of the target.
  • the four dots from top to bottom represent the shoulder joints, knees, hip joints and ankles.
  • the body tilt angle ∠a is the angle of the line between the shoulder joint and the hip joint relative to the line between the hip joint and the knee.
  • the leg bending angle ∠b is the angle of the line between the ankle and the knee relative to the line between the knee and the hip joint.
  • the body inclination angle ∠a and the leg bending angle ∠b on one side of the target are both less than 150°, so it is determined that the target is in a non-upright walking state.
  • FIG. 7 is a schematic diagram of the positions of key points in an upright walking state in a target following method provided by Embodiment 3 of the present invention. As shown in FIG. 7, the four dots from top to bottom represent: the shoulder joint, the hip joint, the knee and the ankle. The body inclination angle ∠c and the leg bending angle ∠d on one side of the target are both greater than 150°, so it is judged that the target is in an upright walking state.
  • Step 505 Follow the target according to the posture information of the target and the size information of the bounding box.
  • for the specific implementation of step 505, reference may be made to the foregoing embodiments, which will not be repeated here.
  • the target following method in this embodiment determines whether the target is in an upright walking state through the body tilt angle and the leg bending angle: if the body tilt angle on either side is less than the first angle, or if the leg bending angles on both sides are both less than the second angle, it is determined that the target is in a non-upright walking state, so the posture information of the target can be recognized quickly and accurately. When only one leg is bent at a small angle, the target is still judged to be in an upright walking state, which avoids misjudging the target as being in a non-upright walking state when only one leg is bent, and improves the accuracy of posture judgment.
  • FIG. 8 is a schematic structural diagram of a target following device according to Embodiment 4 of the present invention.
  • the target following device may execute the target following method corresponding to FIG. 1.
  • the target following device may include:
  • the memory 11 is used to store computer programs
  • the processor 12 is configured to run a computer program stored in the memory to: acquire a captured image; determine, according to the image, the posture information of the target and the size information of the bounding box in which the target is located; and follow the target according to the posture information of the target and the size information of the bounding box.
  • the structure of the target follower device may further include a communication interface 13 for communicating with other devices or a communication network.
  • the processor 12 when following the target according to the posture information of the target and the size information of the bounding box, the processor 12 is specifically configured to:
  • the strategy includes an algorithm for calculating the distance of the target based on the size information of the bounding box.
  • when following the target according to the determined strategy and the size information of the bounding box, the processor 12 is specifically configured to: calculate the distance of the target according to the determined algorithm and the size information of the bounding box in which the target is located, and determine the following acceleration according to the distance of the target.
  • for a bounding box of the same size, the corresponding following acceleration in the non-upright walking state is less than the following acceleration in the upright walking state.
  • the processor 12 when acquiring the captured image, is specifically configured to:
  • the processor 12 when following the target according to the posture information of the target and the size information of the bounding box, the processor 12 is specifically configured to:
  • the processor 12 when determining the posture information of the target according to the image, is specifically configured to:
  • the posture information of the target is determined according to the key point information of the target.
  • the key point information of the target includes position information of multiple key points of the target.
  • the processor 12 when determining the key point information of the target according to the image, is specifically configured to:
  • the key point information in the ROI image is determined according to the neural network.
  • the processor 12 when determining the key point information in the ROI image according to a neural network, the processor 12 is specifically configured to:
  • the key point information of the target is determined according to the confidence characteristic maps corresponding to the multiple key points.
  • the processor 12 when determining the key point information of the target according to the confidence characteristic maps corresponding to the multiple key points, the processor 12 is specifically configured to:
  • the location information of the key point of the target is the location information of the pixel with the highest probability.
  • the processor 12 is further configured to: acquire training samples, the training samples including a sample image and a confidence feature map corresponding to the sample image; and train the neural network according to the training samples.
  • when acquiring training samples, the processor 12 is specifically configured to: acquire a sample image and the position information of the key points in the sample image, and determine the confidence feature map corresponding to the sample image according to the position information of the key points, wherein a pixel closer to the key point has a higher corresponding probability.
  • the processor 12 when determining the confidence characteristic map corresponding to the sample image according to the position information of the key point, the processor 12 is specifically configured to:
  • the confidence feature map corresponding to the sample image is determined through a two-dimensional Gaussian distribution.
  • the number of pixels in the confidence feature map output by the neural network is less than the number of pixels in the ROI image.
  • the processor 12 when determining the posture information of the target according to the key point information of the target, the processor 12 is specifically configured to:
  • the posture information of the target is determined according to at least part of the links formed between the multiple key points of the target.
  • the processor 12 when determining the posture information of the target according to at least part of the lines formed between the multiple key points of the target, the processor 12 is specifically configured to:
  • the posture information of the target is determined according to the angle information corresponding to the at least part of the connection.
  • the angle information corresponding to each line includes: the angle between the line and a reference line, and/or the angle between the line and any one or more other lines; the reference line is a horizontal line or a vertical line.
  • the processor 12 when determining the posture information of the target according to the key point information of the target, the processor 12 is specifically configured to:
  • the posture information of the target is determined according to the inclination angle of the body and/or the bending angle of the legs of the target.
  • the body tilt angle of the target includes the left body tilt angle and/or the right body tilt angle, wherein the body tilt angle on either side is the angle between the first line and the second line on that side, the first line is the line between the shoulder joint of the target and the hip joint on the same side, and the second line is the line between the hip joint and the knee on the same side;
  • the leg bending angle of the target includes the left leg bending angle and/or the right leg bending angle, wherein the leg bending angle on either side is the angle between the third line and the fourth line on that side, the third line is the line between the ankle of the target and the knee on the same side, and the fourth line is the line between the knee and the hip joint on the same side.
  • when determining the posture information of the target according to the body inclination angle and/or the leg bending angle of the target, the processor 12 is specifically configured to: if the body inclination angle on either side is less than a first angle, or if the leg bending angles on both sides are both less than a second angle, determine that the target is in a non-upright walking state.
  • the target follower device shown in FIG. 8 can execute the methods of the embodiments shown in FIG. 1 to FIG. 7. For parts that are not described in detail in this embodiment, reference may be made to the related descriptions of the embodiments shown in FIG. 1 to FIG. 7. For the implementation process and technical effects of this technical solution, please refer to the description in the embodiment shown in FIG. 1 to FIG. 7, which will not be repeated here.
  • An embodiment of the present invention also provides a movable device, including the target following device described in any of the foregoing embodiments.
  • the movable device may further include:
  • a photographing device connected to the processor, used to photograph an image and send it to the processor;
  • the driving device is connected to the processor, and is used to drive the movable device to follow the target under the control of the processor.
  • the driving device may be a motor or the like, and the movement of the movable device can be realized by the driving device, thereby realizing following the target.
  • the movable device is a drone or an unmanned vehicle.
  • an embodiment of the present invention provides a storage medium, the storage medium is a computer-readable storage medium, the computer-readable storage medium stores program instructions, and the program instructions are used to implement the embodiments shown in FIGS. 1 to 7 above. Goal follow method in.
  • the disclosed related remote control device and method can be implemented in other ways.
  • the embodiments of the remote control device described above are merely illustrative.
  • the division of the modules or units is only a logical function division; in actual implementation there may be other division methods, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, remote control devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of the present invention essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A target following method, a target following device, a movable device and a storage medium, wherein the method comprises: acquiring a captured image (101); determining, according to the image, posture information of a target and size information of a bounding box in which the target is located (102); and following the target according to the posture information of the target and the size information of the bounding box (103). The present invention can reduce or avoid problems such as the movable device rushing at people or shaking back and forth caused by changes in the target's posture, and effectively improves the stability and safety of following.

Description

Target following method, target following device, movable device and storage medium. Technical Field
Embodiments of the present invention relate to the field of intelligent control technology, and in particular to a target following method, a target following device, a movable device and a storage medium.
Background Art
With the continuous development of technology, movable devices such as unmanned aerial vehicles (UAVs) are used more and more widely. A UAV can intelligently follow a target during flight. A shortcoming of the prior art is that the UAV's following of the target has poor stability: problems such as the UAV rushing at people or shaking back and forth often occur, and in severe cases the user's personal safety may even be endangered.
Summary of the Invention
Embodiments of the present invention provide a target following method, a target following device, a movable device and a storage medium, to solve the technical problem in the prior art that the following process of a movable device has poor stability.
A first aspect of the embodiments of the present invention provides a target following method, including:
acquiring a captured image;
determining, according to the image, posture information of a target and size information of a bounding box in which the target is located;
following the target according to the posture information of the target and the size information of the bounding box.
A second aspect of the embodiments of the present invention provides a target following device, including:
a memory, used to store a computer program;
a processor, used to run the computer program stored in the memory to realize:
acquiring a captured image;
determining, according to the image, posture information of a target and size information of a bounding box in which the target is located;
following the target according to the posture information of the target and the size information of the bounding box.
A third aspect of the embodiments of the present invention provides a movable device, including the target following device according to the second aspect.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium in which program instructions are stored, the program instructions being used to implement the method according to the first aspect.
With the target following method, target following device, movable device and storage medium provided by the embodiments of the present invention, a captured image is acquired, the posture information of the target and the size information of the bounding box in which the target is located are determined according to the image, and the target is followed according to the posture information of the target and the size information of the bounding box. This can reduce or avoid problems such as the movable device rushing at people or shaking back and forth caused by changes in the target's posture, and effectively improves the stability and safety of following.
Brief Description of the Drawings
The drawings described here are used to provide a further understanding of the present invention and constitute a part of the present invention. The exemplary embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
FIG. 1 is a schematic flowchart of a target following method provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of a target following method provided by Embodiment 2 of the present invention;
FIG. 3 is a schematic diagram of the principle of determining key point information in a target following method provided by Embodiment 2 of the present invention;
FIG. 4 is a schematic diagram of the positions of the Gaussian distribution area and the zero-response background in the confidence feature map in a target following method provided by Embodiment 2 of the present invention;
FIG. 5 is a schematic flowchart of a target following method provided by Embodiment 3 of the present invention;
FIG. 6 is a schematic diagram of the positions of the key points in a squatting state in a target following method provided by Embodiment 3 of the present invention;
FIG. 7 is a schematic diagram of the positions of the key points in an upright walking state in a target following method provided by Embodiment 3 of the present invention;
FIG. 8 is a schematic structural diagram of a target following device provided by Embodiment 4 of the present invention.
Detailed Description of the Embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention.
Embodiment 1
Embodiment 1 of the present invention provides a target following method. FIG. 1 is a schematic flowchart of a target following method provided by Embodiment 1 of the present invention. As shown in FIG. 1, the target following method in this embodiment may include:
Step 101: acquire a captured image.
Optionally, the method in this embodiment may be applied to a movable device. The movable device may be any device capable of moving, such as a UAV or an unmanned vehicle. A photographing device may be provided on the movable device, and the photographing device may be used to photograph the target.
Correspondingly, acquiring the captured image in this step may specifically include: acquiring the image captured by the photographing device in the movable device. By photographing the target, the target can be followed according to the captured image. The target may be a movable object such as a person or a vehicle.
Step 102: determine, according to the image, the posture information of the target and the size information of the bounding box in which the target is located.
Optionally, a neural network may be trained with samples, and the trained neural network may be used to process the image to obtain the corresponding posture information. Alternatively, the target in the image may first be detected by a method such as a target detection algorithm, and the posture information of the target may then be determined according to the neural network.
If the target is a person, the posture information may include, but is not limited to: standing, walking, squatting, lying down, and so on. If the target is a vehicle, the posture information may include, but is not limited to: going straight, turning left, turning right, and so on.
The bounding box in which the target is located may be the rectangular box occupied by the target in the image. Optionally, the image may be processed by a single object tracking (SOT) algorithm or another algorithm to obtain the bounding box in which the target is located.
The size information of the bounding box may include the height and/or width of the bounding box. Optionally, the height may be the length of the bounding box in a first direction, and the width may be the length of the bounding box in a second direction. The first direction and the second direction may be the vertical direction and the horizontal direction, respectively.
Step 103: follow the target according to the posture information of the target and the size information of the bounding box.
Following the target may mean controlling the movable device to follow the target; specifically, this may be achieved by always keeping the distance between the movable device and the target within a preset range. For example, if the target walks forward, the movable device also moves forward; if the target stops, the movable device also stops.
During the following process, the posture information of the target and the size information of the bounding box may be referred to, and the strategy for following the target may be implemented according to the posture information and the size information.
In an optional implementation, following the target according to the posture information of the target and the size information of the bounding box may include: determining, according to the posture information of the target, a strategy for following the target; and following the target according to the determined strategy and the size information of the bounding box.
The strategy may include an algorithm for calculating the distance of the target from the size information of the bounding box. Specifically, when following the target, the distance of the target may be calculated by a method such as monocular ranging, and the following process may be controlled according to the distance. To improve the stability of following, the specific algorithm for determining the distance may be adjusted according to the posture information.
For example, if the target is in a first posture, the distance is calculated from the size information of the bounding box by a first algorithm; if the target is in a second posture, the distance is calculated from the size information of the bounding box by a second algorithm.
The first algorithm and the second algorithm may be set according to actual needs. A simple example is that the height of the bounding box may be multiplied by a scale factor to obtain the distance of the target. The scale factors corresponding to different algorithms may be different.
When the posture information of the target does not change, the farther the target is, the smaller the height of the target's bounding box in the image, and the closer the target is, the larger that height; therefore, the distance of the target can be estimated by directly multiplying the height of the bounding box by a scale factor.
When the distance of the target does not change, the size of the corresponding bounding box still changes when the posture of the target changes. For example, when the target is in a squatting state, the height of the bounding box is approximately equal to one third of the height of the bounding box in the upright state. Therefore, different scale factors can be set for different posture information.
Optionally, the first posture may be an upright walking state, and the first algorithm is to multiply the height of the target's bounding box by a first coefficient to obtain the distance of the target; the second posture is a squatting posture, and the second algorithm is to multiply the height of the target's bounding box by a second coefficient to obtain the distance of the target, where the second coefficient may be smaller than the first coefficient. For example, the first coefficient may be 100 and the second coefficient may be 33.
In the case where the strategy includes an algorithm for determining the distance of the target, following the target according to the determined strategy and the size information of the bounding box may include: calculating the distance of the target according to the determined algorithm and the size information of the bounding box in which the target is located; and determining the following acceleration according to the distance of the target.
Of course, the determined strategy may also be a correspondence between the size information of the bounding box and the following acceleration; for example, the height of the bounding box is multiplied by a certain parameter to directly obtain the corresponding acceleration, omitting the intermediate step of calculating the distance.
After the following acceleration is determined, the movable device may be controlled to follow the target with the following acceleration. It is understandable that the farther the distance, the larger the acceleration may be; the closer the distance, the smaller the acceleration may be, and it may even be negative.
Optionally, for a bounding box of the same size, the corresponding following acceleration in the non-upright walking state is smaller than the following acceleration in the upright walking state. The upright walking state may indicate that the target is in an upright state or a walking state, and the non-upright walking state may indicate that the target is in a state other than upright or walking, such as squatting or lying down.
Specifically, during the following process, when the target is in the upright walking state and the distance of the target is D1, the height of the target's bounding box in the image is H1; when the target is in a non-upright walking state such as squatting and the distance of the target is D2, the height of the target's bounding box in the image is H2. Since the user's body bends when squatting, when H1 is equal to H2, D1 is greater than D2; therefore, the following acceleration in the upright walking state should be greater than the following acceleration corresponding to the same bounding box height in the non-upright walking state.
With the above solution, for a bounding box of the same size, the corresponding following acceleration in the non-upright walking state may be smaller than the following acceleration in the upright walking state, which can to some extent avoid the problem of the movable device rushing at the person after the user squats down.
In another optional implementation, following the target according to the posture information of the target and the size information of the bounding box may include: if the target is in the upright walking state, following the target according to the size information of the bounding box; if the target is in a non-upright walking state, pausing the following.
Pausing the following when the target is in a non-upright walking state can simply and effectively realize the control of the movable device when the target squats or is in another non-upright walking state, and prevent the movable device from rushing at people.
Using posture estimation as auxiliary perception, with different postures adopting different following strategies, helps the movable device better plan the following path, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.
In practical applications, the user may instruct the movable device to enter the following mode by means of a voice instruction, a hand-waving operation, or the like. After entering the following mode, the method in steps 101 to 103 above may be used to follow the target.
During the following process, the posture information of the target is detected, and the following is performed according to the posture information combined with the size information of the target's bounding box; compared with a method that follows only according to the size information of the bounding box, this has higher stability. For example, in a method that follows only according to size, when the target squats down, the size of the target in the image decreases, and the movable device will misjudge that the target has moved far away and therefore accelerate forward; but in fact the target has not moved far away, and this misjudgment leads to practical problems such as rushing at people and shaking back and forth. The method in this embodiment, in contrast, follows on the basis of the posture information of the target, avoiding the misjudgment of the target's distance caused by the target changing its posture.
The above gives an example of the following strategy when the target is a person. When the target is another object, different following strategies may also be adopted for different posture information. For example, when the target is a vehicle, changing from going straight to turning may also cause the size information of the bounding box of the target in the image to change; combining the posture information of the target allows the target to be followed better.
With the target following method provided by this embodiment, a captured image is acquired, the posture information of the target and the size information of the bounding box in which the target is located are determined according to the image, and the target is followed according to the posture information of the target and the size information of the bounding box, which can reduce or avoid problems such as the movable device rushing at people or shaking back and forth caused by changes in the target's posture, and effectively improves the stability and safety of following.
实施例二
本发明实施例二提供一种目标跟随方法。本实施例是在上述实施例提供的技术方案的基础上,通过图像确定目标的关键点,再根据关键点确定目标的姿态信息。
图2为本发明实施例二提供的一种目标跟随方法的流程示意图。如图2所 示,本实施例中的目标跟随方法,可以包括:
步骤201、获取拍摄的图像。
步骤202、根据所述图像确定目标所在的边界框以及所述目标的关键点信息。
其中,根据图像确定目标所在的边界框,可以通过单目标跟踪算法或其它算法实现。根据图像确定目标的关键点信息,可以通过神经网络等深度学习算法实现。具体地,可以通过神经网络直接确定图像中的关键点信息,也可以先裁剪出图像中的感兴趣区域(Region Of Interest,ROI),再进一步确定关键点信息,相应的,步骤202中的根据所述图像确定所述目标的关键点信息,可以包括:确定所述图像中的目标所在的ROI图像;根据神经网络确定所述ROI图像中的关键点信息。
图3为本发明实施例二提供的一种目标跟随方法中确定关键点信息的原理示意图。如图3所示,可移动平台拍摄的图像可以为RGB图像,通过单目标跟踪算法或者其它算法,可以确定RGB图像中目标所在的边界框(bounding box),该bounding box的类别为人。使用RGB图像以及SOT算法提供的单个bounding box作为输入,可以得到相应的ROI图像。
例如,RGB图像的大小为1000*800*3,利用SOT的算法,可以从RGB图像中确定目标所在的bounding box,bounding box的表现形式可以为边界框四个角的坐标信息。根据bounding box,可以从GRB图像裁剪出ROI图像,例如,ROI图像的大小可以为100*100*3,目标位于该ROI图像中。
获取到ROI图像后,可以将ROI图像输入到神经网络模型中,利用模型确定关键点信息。本实施例中,所采用的模型可以为卷积神经网络(Convolutional Neural Networks,CNN),具体可以为全卷积神经网络(Fully Convolutional Networks,FCN)。
所述目标的关键点信息可以包括所述目标的多个关键点的位置信息,所述位置信息可以具体为关键点所在的坐标。其中,所述多个关键点可以包括但不限于:鼻子、肩中部、右肩关节、右肘关节、右手、左肩关节、左肘关节、左手、右髋关节、右膝盖、右脚踝、左髋关节、左膝盖、左脚踝中的至少两项。
神经网络的输出可以为图像中关键点的位置信息,或者,所述神经网络的输出可以为置信度特征图,根据置信度特征图可以确定图像中的关键点的 位置信息。下面以神经网络的输出为置信度特征图为例进行说明。
本实施例中,针对神经网络的处理可以包括训练和检测两个阶段。训练阶段可以在检测阶段之前实现,或者,可以在任意两次检测之间对神经网络进行训练。在训练阶段,可以利用样本来训练神经网络,调整神经网络中的参数,使得输出结果与目标结果相近。检测阶段可以用于跟随过程,利用已经经过充分训练的神经网络参数,来对图像进行检测,输出置信度特征图。
下面先介绍神经网络模型的训练阶段。可选的,训练的过程可以包括:获取训练样本,所述训练样本包括样本图像及所述样本图像对应的置信度特征图;根据训练样本,对所述神经网络进行训练。通过将置信度特征图作为目标结果对神经网络进行训练,使得神经网络的输出结果接近目标结果,能够有效提高神经网络的抗干扰性,避免神经网络过拟合。
可选的,训练样本的获取过程可以包括:获取样本图像及所述样本图像中的关键点的位置信息;根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图。其中,所述样本图像对应的置信度特征图中,距离所述关键点越近的像素点对应的概率越高。
所述样本图像可以为从数据库获取的任意图像中裁剪出的ROI图像,针对每个样本图像,利用人工标注的方法来确定该图像中的关键点的位置信息,根据关键点的位置信息,生成置信度特征图。
假设通过人工标注,确定图像中的肩关节所在的位置坐标为(50,50),那么根据该位置信息可以生成肩关节对应的置信度特征图。生成置信度特征图的原理是,像素点越接近肩关节所在的真实位置,该像素点属于肩关节的概率越大,例如,坐标为(50,50)的像素点对应的概率最大,假设可以为0.8,坐标为(55,55)的像素点对应的概率应该大于坐标为(60,60)的像素点对应的概率,例如两者对应的概率可以分别为0.1和0.01,图像边缘的远离(50,50)的像素点属于肩关节的概率非常小,接近于0。
可选的,可以根据关键点的位置信息,通过二维高斯分布生成所述样本图像对应的置信度特征图。具体地,置信度特征图中,像素点的位置坐标,可以服从期望为关键点坐标、方差为D1的二维高斯分布;或者,像素点与标注的关键点之间的距离,可以服从期望为0、方差为D2的高斯分布。其中,方差D1、D2可以根据实际需要来设置。通过二维高斯分布确定样本图像对应的置信度特征图,能够有效模拟各个像素点属于关键点的概率,提高检测准确 性。
可选的,置信度特征图也可以由高斯分布和零响应的背景组成。具体地,在关键点周围预设范围内,可以根据高斯分布确定各个像素点对应的概率,在预设范围之外,可以设置零响应的背景,简单来说,就是将预设范围之外的各个像素点对应的概率设置为0。
以所述关键点为肩关节为例,在肩关节所在位置的预设范围内,采用高斯分布生成各个像素点对应的概率,例如,所述预设范围可以为以肩关节为中心、半径为5的圆,当某一像素点与图像中肩关节所在的坐标点之间间隔5个像素点以上时,该像素点几乎不可能属于肩关节,对应的概率为0。
图4为本发明实施例二提供的一种目标跟随方法中置信度特征图的高斯分布区域和零响应背景的位置示意图。如图4所示,置信度特征图中,中间的黑点表示人工标注的关键点,阴影部分表示高斯分布区域,该区域内每个像素点对应的概率通过高斯分布确定,阴影以外的区域为零响应背景区域,零响应背景区域内各个像素点对应的概率均为0。通过高斯分布和零响应背景组成置信度特征图,能够有效简化置信度特征图的生成过程,提高置信度特征图的生成效率和准确性。
除了高斯分布以外,也可以采用其它方法来根据标注的关键点的位置生成置信度特征图,只要满足像素点与关键点之间的距离越远,像素点属于该关键点的概率越低即可。
若所述样本图像中标注出了多个关键点,则可以针对每一个关键点生成一个置信度特征图。获取多个样本图像及对应的置信度特征图,对神经网络进行训练,神经网络被训练为根据图像确定其中的关键点对应的置信度特征图。
在训练完成后,可以根据训练得到的神经网络对跟随过程中拍摄的图像进行处理。如图3所示,根据神经网络确定所述ROI图像中的关键点信息,可以包括:将所述ROI图像输入至神经网络,得到多个关键点对应的置信度特征图,其中,任一关键点对应的置信度特征图包括各个像素点属于该关键点的概率;根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息。
例如,在确定目标的姿态信息时需要用到左右肩关节、左右髋关节、左右膝盖和左右脚踝共8个关键点,则将拍摄的图像输入神经网络,通过神经网络可以获取8个关键点对应的置信度特征图,根据8个置信度特征图可以分别 确定8个关键点所在的位置。
可选的,根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息,可以包括:在任一关键点对应的置信度特征图中,确定属于该关键点的概率最高的像素点;若所述概率最高的像素点对应的概率大于预设阈值,则所述目标的该关键点的位置信息为所述概率最高的像素点的位置信息。
例如,在肩关节对应的置信度特征图中,若概率最高的像素点的坐标位于(10,10),其对应的概率为0.7,大于预设阈值,则该像素点属于肩关节的可信度足够高,那么可以认为肩关节的坐标为(10,10)。若概率最高的像素点对应的概率小于预设阈值,则说明全部像素点属于肩关节的概率都不够高,那么可以认为图中缺少肩关节。所述预设阈值可以根据实际需要来设置,例如可以为0.5。
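下面给出从单个关键点的置信度特征图中解析关键点位置的示意代码,阈值取0.5仅为示例:

```python
import numpy as np

def decode_keypoint(confidence_map, threshold=0.5):
    """取置信度特征图中概率最高的像素点作为关键点位置(示意实现)。

    若最高概率不超过阈值,则认为图像中缺少该关键点,返回None。
    """
    idx = np.argmax(confidence_map)
    y, x = np.unravel_index(idx, confidence_map.shape)
    prob = float(confidence_map[y, x])
    if prob > threshold:
        return (int(x), int(y)), prob
    return None, prob
```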
步骤203、根据所述目标的关键点信息确定所述目标的姿态信息。
在根据神经网络确定目标的关键点信息后,可以根据关键点信息确定对应的姿态信息。具体地,在获得关键点后,可以根据各个关键点之间形成的连接关系形成肢体,所形成的肢体可以作为目标姿态的判断依据。
可选的,可以根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息,从而快速、准确地实现姿态检测。
例如,通过图像获取了8个关键点,8个关键点中两两相连,可以得到28条连线,根据28条连线中的至少部分连线,可以确定目标的姿态信息。
在一个可选的实施方式中,可以根据连线的长度确定对应的姿态信息。例如,若肩关节与膝盖之间的连线长度,小于肩关节与髋关节之间的连线长度,那么可以认为目标处于蹲下的状态;若肩关节与膝盖之间的连线长度,约等于肩关节与髋关节之间的长度加上髋关节与膝盖之间的连线长度,那么可以认为目标处于直立状态。
在另一个可选的实施方式中,可以计算所述目标的多个关键点之间形成的连线中的至少部分连线对应的角度信息;根据所述至少部分连线对应的角度信息,确定所述目标的姿态信息。
其中,每条连线对应的角度信息可以包括:所述连线与基准线之间的夹角,和/或,所述连线与其它任意一个或多个连线之间的夹角;所述基准线为水平线或竖直线。
例如,若肩关节与髋关节之间的连线,与竖直线之间的角度大于一定值,则认为目标处于身体倾斜状态;若肩关节与髋关节之间的连线,和髋关节与膝盖之间的连线呈90°,那么可以认为目标处于坐下的状态。
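下面给出计算两条连线之间夹角、以及连线与竖直基准线之间夹角的示意代码,坐标采用图像像素坐标系,函数名与实现细节均为示意性假设:

```python
import numpy as np

def angle_between(p_center, p_a, p_b):
    """以p_center为顶点,计算指向p_a与p_b的两条连线之间的夹角(单位:度)。"""
    v1 = np.asarray(p_a, dtype=float) - np.asarray(p_center, dtype=float)
    v2 = np.asarray(p_b, dtype=float) - np.asarray(p_center, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def angle_to_vertical(p_top, p_bottom):
    """计算连线与竖直基准线之间的夹角(单位:度)。"""
    v = np.asarray(p_bottom, dtype=float) - np.asarray(p_top, dtype=float)
    vertical = np.array([0.0, 1.0])  # 图像坐标系中竖直向下的单位向量
    cos = np.dot(v, vertical) / (np.linalg.norm(v) + 1e-6)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```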
当然,也可以根据连线的长度和角度综合确定目标的姿态信息,提高姿态识别的准确性。
步骤204、根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随。
本实施例中的步骤204的具体实现方法和原理可以参见上述实施例,此处不再赘述。
本实施例提供的目标跟随方法,可以根据神经网络确定拍摄图像中目标的关键点信息,根据关键点信息能够更全面地解析人体姿态,相比于根据神经网络直接输出姿态信息的方案来说,识别的准确性更高,更加灵活,并且,当需要更换需识别的动作类别时,无需对所有样本进行重新标注,节约了人工成本,减少需求变更时的开发量;通过置信度特征图确定关键点的位置,相比于直接以关键点坐标作为训练目标的方案来说,不容易发生过拟合,识别准确度较高,具有更强的抗干扰性,无需采集大量样本和标注相应数据,进一步减少了人工标注的工作量;通过二维高斯分布,能够迅速、准确地确定所述样本图像对应的置信度特征图,使得训练过程更稳定,避免人工标注误差,具有抗干扰性,提高了关键点识别准确率。
在上述实施例提供的技术方案的基础上,可选的,所述神经网络输出的置信度特征图的像素点个数可以小于输入的ROI图像的像素点个数。
例如,ROI图像为h*w*3的RGB图像,h和w分别为输入的长和宽,神经网络输出h’*w’*k的置信度特征图,h’和w’为输出的长和宽,其中,h’=0.25*h,w’=0.25*w,k为关键点的类别数量,本实施例中,k=8,分别为左右肩关节,左右髋关节,左右膝盖,左右脚踝。
假设输入的ROI图像有100*100个像素点,那么输出8个置信度特征图,每个置信度特征图包括25*25个像素点。在训练时,可以设置目标结果的尺寸为输入图像的1/4,就可以实现通过神经网络缩小图像的功能。
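下面给出一个输出分辨率为输入1/4、输出通道数等于关键点类别数k的全卷积网络结构示意(基于PyTorch),层数与通道数均为示意性假设,并非本方案限定的网络结构:

```python
import torch
import torch.nn as nn

class KeypointFCN(nn.Module):
    """输出为k个置信度特征图、分辨率为输入1/4的全卷积网络示意结构。"""

    def __init__(self, num_keypoints=8):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                  # 分辨率降为1/2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                  # 分辨率降为1/4
            nn.Conv2d(64, num_keypoints, 1),  # 每个通道对应一个关键点的置信度特征图
        )

    def forward(self, x):
        return self.backbone(x)

# 示例:输入100*100的ROI图像,输出8个25*25的置信度特征图
model = KeypointFCN(num_keypoints=8)
out = model(torch.zeros(1, 3, 100, 100))
print(out.shape)  # torch.Size([1, 8, 25, 25])
```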
将输出的置信度特征图包含的像素点个数设置为小于输入的ROI图像的像素点个数,可以提高拍摄图像的处理效率,减少输出结果的占用空间,并且,由于人工标注关键点是存在一定误差的,通过减少输出图像的尺寸,可以在一定程度上避免误差,提高识别准确性。
实施例三
本发明实施例三提供一种目标跟随方法。本实施例是在上述实施例提供的技术方案的基础上,具体通过身体倾斜角度和/或腿部弯曲角度来确定用户的姿态信息。
图5为本发明实施例三提供的一种目标跟随方法的流程示意图。如图5所示,本实施例中的目标跟随方法,可以包括:
步骤501、获取拍摄的图像。
步骤502、根据所述图像确定目标所在的边界框以及所述目标的关键点信息。
本实施例中,步骤501至步骤502的具体实现方案可以参照前述实施例,此处不再赘述。
步骤503、根据所述目标的关键点信息确定所述目标的身体倾斜角度和/或腿部弯曲角度。
其中,所述目标的身体倾斜角度可以包括左侧身体倾斜角度和/或右侧身体倾斜角度,任意一侧的身体倾斜角度为该侧第一连线和第二连线之间的夹角,所述第一连线为所述目标的该侧肩关节与同侧髋关节之间的连线,所述第二连线为所述髋关节与同侧膝盖之间的连线。
具体地,左侧的身体倾斜角度为左侧肩关节与左侧髋关节之间的连线,相对于左侧髋关节与左侧膝盖之间的连线的角度;右侧的身体倾斜角度为右侧肩关节与右侧髋关节之间的连线,相比于右侧髋关节与右侧膝盖之间的连线的角度。一般情况下,左侧的身体倾斜角度和右侧的身体倾斜角度往往比较接近,在实际应用中,可以仅计算一侧的身体倾斜角度。
所述目标的腿部弯曲角度可以包括左侧腿部弯曲角度和/或右侧腿部弯曲角度,任意一侧的腿部弯曲角度为该侧第三连线和第四连线之间的夹角,所述第三连线为所述目标的该侧脚踝与同侧膝盖之间的连线,所述第四连线为所述膝盖与同侧髋关节之间的连线。当根据身体倾斜角度和腿部弯曲角度共同确定姿态信息时,同侧的第二连线可以与第四连线重合。
具体地,左侧的腿部弯曲角度可以为左侧的脚踝与左侧膝盖之间的连线,相对于左侧膝盖与左侧髋关节之间的连线的角度;右侧的腿部弯曲角度可以为右侧的脚踝与右侧膝盖之间的连线,相对于右侧膝盖与右侧髋关节之间的连线的角度。
步骤504、根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息。
可选的,若任意一侧的身体倾斜角度小于第一角度,或者,若两侧腿部弯曲角度均小于第二角度,则确定所述目标处于非直立行走状态。反之,则可以确定目标处于直立行走状态。
所述第一角度与所述第二角度可以相同,也可以不同。例如,所述第一角度和所述第二角度可以均为150°。
在进行姿态判断时,具体的判断逻辑可以为:若任意一侧的身体倾斜角度小于150°,则认为目标处于非直立行走状态;若两侧的腿部弯曲角度均小于150°,则认为目标处于非直立行走状态;若两侧的身体倾斜角度均大于150°,并且,至少一侧的腿部弯曲角度大于150°,则认为目标处于直立行走状态。
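按照上述判断逻辑,可以写出如下示意代码,根据左右两侧的身体倾斜角度和腿部弯曲角度判断目标是否处于直立行走状态;其中关键点字典的键名以及150°阈值的写法均为示意性假设:

```python
import numpy as np

def joint_angle(p_center, p_a, p_b):
    """以p_center为顶点,计算两条连线之间的夹角(单位:度)。"""
    v1 = np.asarray(p_a, dtype=float) - np.asarray(p_center, dtype=float)
    v2 = np.asarray(p_b, dtype=float) - np.asarray(p_center, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-6)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def is_upright_walking(kps, first_angle=150.0, second_angle=150.0):
    """kps为包含左右肩关节、髋关节、膝盖、脚踝坐标的字典(键名为示意)。

    身体倾斜角度:肩-髋连线与髋-膝连线的夹角(顶点在髋关节);
    腿部弯曲角度:踝-膝连线与膝-髋连线的夹角(顶点在膝盖)。
    """
    for side in ("left", "right"):
        tilt = joint_angle(kps[f"{side}_hip"], kps[f"{side}_shoulder"], kps[f"{side}_knee"])
        if tilt < first_angle:  # 任意一侧身体倾斜角度过小,判定为非直立行走状态
            return False
    bends = [
        joint_angle(kps[f"{side}_knee"], kps[f"{side}_ankle"], kps[f"{side}_hip"])
        for side in ("left", "right")
    ]
    if all(b < second_angle for b in bends):  # 两侧腿部弯曲角度均过小,判定为非直立
        return False
    return True
```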
图6为本发明实施例三提供的一种目标跟随方法中下蹲状态的关键点的位置示意图。如图6所示,黑色的圆点表示目标的关键点,四个圆点从上至下分别代表:肩关节、膝盖、髋关节和脚踝,身体倾斜角度∠a为肩关节与髋关节之间连线相对于髋关节与膝盖之间连线的夹角,腿部弯曲角度∠b为脚踝与膝盖之间连线相对于膝盖与髋关节之间连线的夹角,目标的一侧身体倾斜角度∠a和腿部弯曲角度∠b均小于150°,因此判定目标处于非直立行走状态。
图7为本发明实施例三提供的一种目标跟随方法中直立行走状态的关键点的位置示意图。如图7所示,四个圆点从上至下分别代表:肩关节、髋关节、膝盖和脚踝,目标的一侧身体倾斜角度∠c和腿部弯曲角度∠d均大于150°,因此判定目标处于直立行走状态。
步骤505、根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随。
本实施例中,步骤505的具体实现方案可以参照前述实施例,此处不再赘述。
本实施例提供的目标跟随方法,通过身体倾斜角度和腿部弯曲角度来确定目标是否处于直立行走状态,若任意一侧的身体倾斜角度小于第一角度,或者,若两侧腿部弯曲角度均小于第二角度,则确定所述目标处于非直立行走状态,能够快速准确地识别目标的姿态信息,当只有一侧腿部弯曲角度较小时,仍然判定属于直立行走状态,避免单腿弯曲时被误判为非直立行走状态,提高姿态判断的准确性。
实施例四
图8为本发明实施例四提供的一种目标跟随装置的结构示意图。所述目标跟随装置可以执行上述图1所对应的目标跟随方法,参考附图8所示,所述目标跟随装置可以包括:
存储器11,用于存储计算机程序;
处理器12,用于运行所述存储器中存储的计算机程序以实现:
获取拍摄的图像;
根据所述图像确定目标的姿态信息以及所述目标所在的边界框的尺寸信息;
根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随。
可选的,该目标跟随装置的结构中还可以包括通信接口13,用于与其他设备或通信网络通信。
在一个可实施的方式中,在根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器12具体用于:
根据所述目标的姿态信息,确定对所述目标进行跟随的策略;
根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随。
在一个可实施的方式中,所述策略包括通过边界框的尺寸信息计算所述目标的距离的算法。
在一个可实施的方式中,在根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器12具体用于:
根据所确定的算法以及所述目标所在的边界框的尺寸信息,计算所述目标的距离;
根据所述目标的距离,确定跟随的加速度。
在一个可实施的方式中,对于同样尺寸的边界框,直立行走状态下对应的跟随的加速度大于非直立行走状态下的跟随的加速度。
在一个可实施的方式中,在获取拍摄的图像时,所述处理器12具体用于:
获取可移动设备中的拍摄装置拍摄的图像;
相应的,在根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器12具体用于:
根据所述姿态信息以及所述边界框的尺寸信息,控制所述可移动设备跟随所述目标。
在一个可实施的方式中,在根据所述图像确定目标的姿态信息时,所述处理器12具体用于:
根据所述图像确定目标的关键点信息;
根据所述目标的关键点信息确定所述目标的姿态信息。
在一个可实施的方式中,所述目标的关键点信息包括所述目标的多个关键点的位置信息。
在一个可实施的方式中,在根据所述图像确定目标的关键点信息时,所述处理器12具体用于:
确定所述图像中的目标所在的感兴趣区域ROI图像;
根据神经网络确定所述ROI图像中的关键点信息。
在一个可实施的方式中,在根据神经网络确定所述ROI图像中的关键点信息时,所述处理器12具体用于:
将所述ROI图像输入至神经网络,得到多个关键点对应的置信度特征图,其中,任一关键点对应的置信度特征图包括各个像素点属于该关键点的概率;
根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息。
在一个可实施的方式中,在根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息时,所述处理器12具体用于:
在任一关键点对应的置信度特征图中,确定属于该关键点的概率最高的像素点;
若所述概率最高的像素点对应的概率大于预设阈值,则所述目标的该关键点的位置信息为所述概率最高的像素点的位置信息。
在一个可实施的方式中,在根据神经网络确定所述ROI图像中的关键点信息之前,所述处理器12还用于:
获取训练样本,所述训练样本包括样本图像及所述样本图像对应的置信度特征图;
根据训练样本,对所述神经网络进行训练。
在一个可实施的方式中,在获取训练样本时,所述处理器12具体用于:
获取样本图像及所述样本图像中的关键点的位置信息;
根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图;
其中,所述样本图像对应的置信度特征图中,距离所述关键点越近的像素点对应的概率越高。
在一个可实施的方式中,在根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图时,所述处理器12具体用于:
根据所述关键点的位置信息,通过二维高斯分布确定所述样本图像对应的置信度特征图。
在一个可实施的方式中,所述神经网络输出的置信度特征图的像素点个数小于所述ROI图像的像素点个数。
在一个可实施的方式中,在根据所述目标的关键点信息确定所述目标的姿态信息时,所述处理器12具体用于:
根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息。
在一个可实施的方式中,在根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息时,所述处理器12具体用于:
计算所述目标的多个关键点之间形成的连线中的至少部分连线对应的角度信息;
根据所述至少部分连线对应的角度信息,确定所述目标的姿态信息。
在一个可实施的方式中,每条连线对应的角度信息包括:所述连线与基准线之间的夹角,和/或,所述连线与其它任意一个或多个连线之间的夹角;所述基准线为水平线或竖直线。
在一个可实施的方式中,在根据所述目标的关键点信息确定所述目标的姿态信息时,所述处理器12具体用于:
根据所述目标的关键点信息确定所述目标的身体倾斜角度和/或腿部弯曲角度;
根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息。
在一个可实施的方式中,所述目标的身体倾斜角度包括左侧身体倾斜角度和/或右侧身体倾斜角度,其中,任意一侧的身体倾斜角度为该侧第一连线和第二连线之间的夹角,所述第一连线为所述目标的该侧肩关节与同侧髋关节之间的连线,所述第二连线为所述髋关节与同侧膝盖之间的连线;
所述目标的腿部弯曲角度包括左侧腿部弯曲角度和/或右侧腿部弯曲角度,其中,任意一侧的腿部弯曲角度为该侧第三连线和第四连线之间的夹角,所述第三连线为所述目标的该侧脚踝与同侧膝盖之间的连线,所述第四连线为所述膝盖与同侧髋关节之间的连线。
在一个可实施的方式中,在根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息时,所述处理器12具体用于:
若任意一侧的身体倾斜角度小于第一角度,或者,若两侧腿部弯曲角度均小于第二角度,则确定所述目标处于非直立行走状态。
图8所示目标跟随装置可以执行图1-图7所示实施例的方法,本实施例未详细描述的部分,可参考对图1-图7所示实施例的相关说明。该技术方案的执行过程和技术效果参见图1-图7所示实施例中的描述,在此不再赘述。
本发明实施例还提供一种可移动设备,包括上述任一实施例所述的目标跟随装置。
可选的,所述可移动设备还可以包括:
拍摄装置,与所述处理器连接,用于拍摄图像并发送给所述处理器;
驱动装置,与所述处理器连接,用于在所述处理器的控制下驱动所述可移动设备对所述目标进行跟随。
所述驱动装置可以为电机等,通过驱动装置可以实现可移动设备的移动,从而实现对目标的跟随。
可选的,所述可移动设备为无人机或无人车。
本发明实施例提供的可移动设备中各部件的结构、功能可以参见前述实施例,此处不再赘述。
另外,本发明实施例提供了一种存储介质,该存储介质为计算机可读存储介质,该计算机可读存储介质中存储有程序指令,程序指令用于实现上述图1-图7所示实施例中的目标跟随方法。
以上各个实施例中的技术方案、技术特征在互不冲突的情况下均可以单独使用,或者进行组合,只要未超出本领域技术人员的认知范围,均属于本发明保护范围内的等同实施例。
在本发明所提供的几个实施例中,应该理解到,所揭露的相关装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得计算机处理器(processor)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁盘或者光盘等各种可以存储程序代码的介质。
以上所述仅为本发明的实施例,并非因此限制本发明的专利范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。
最后应说明的是:以上各实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述各实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分或者全部技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (46)

  1. 一种目标跟随方法,其特征在于,包括:
    获取拍摄的图像;
    根据所述图像确定目标的姿态信息以及所述目标所在的边界框的尺寸信息;
    根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随。
  2. 根据权利要求1所述的方法,其特征在于,根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随,包括:
    根据所述目标的姿态信息,确定对所述目标进行跟随的策略;
    根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随。
  3. 根据权利要求2所述的方法,其特征在于,所述策略包括通过边界框的尺寸信息计算所述目标的距离的算法。
  4. 根据权利要求3所述的方法,其特征在于,根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随,包括:
    根据所确定的算法以及所述目标所在的边界框的尺寸信息,计算所述目标的距离;
    根据所述目标的距离,确定跟随的加速度。
  5. 根据权利要求4所述的方法,其特征在于,对于同样尺寸的边界框,直立行走状态下对应的跟随的加速度大于非直立行走状态下的跟随的加速度。
  6. 根据权利要求1所述的方法,其特征在于,获取拍摄的图像,包括:
    获取可移动设备中的拍摄装置拍摄的图像;
    相应的,根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随,包括:
    根据所述姿态信息以及所述边界框的尺寸信息,控制所述可移动设备跟随所述目标。
  7. 根据权利要求1所述的方法,其特征在于,根据所述图像确定目标的姿态信息,包括:
    根据所述图像确定目标的关键点信息;
    根据所述目标的关键点信息确定所述目标的姿态信息。
  8. 根据权利要求7所述的方法,其特征在于,所述目标的关键点信息包括所述目标的多个关键点的位置信息。
  9. 根据权利要求7所述的方法,其特征在于,根据所述图像确定目标的关键点信息,包括:
    确定所述图像中的目标所在的感兴趣区域ROI图像;
    根据神经网络确定所述ROI图像中的关键点信息。
  10. 根据权利要求9所述的方法,其特征在于,根据神经网络确定所述ROI图像中的关键点信息,包括:
    将所述ROI图像输入至神经网络,得到多个关键点对应的置信度特征图,其中,任一关键点对应的置信度特征图包括各个像素点属于该关键点的概率;
    根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息。
  11. 根据权利要求10所述的方法,其特征在于,根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息,包括:
    在任一关键点对应的置信度特征图中,确定属于该关键点的概率最高的像素点;
    若所述概率最高的像素点对应的概率大于预设阈值,则所述目标的该关键点的位置信息为所述概率最高的像素点的位置信息。
  12. 根据权利要求9所述的方法,其特征在于,在根据神经网络确定所述ROI图像中的关键点信息之前,还包括:
    获取训练样本,所述训练样本包括样本图像及所述样本图像对应的置信度特征图;
    根据训练样本,对所述神经网络进行训练。
  13. 根据权利要求12所述的方法,其特征在于,获取训练样本,包括:
    获取样本图像及所述样本图像中的关键点的位置信息;
    根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图;
    其中,所述样本图像对应的置信度特征图中,距离所述关键点越近的像素点对应的概率越高。
  14. 根据权利要求13所述的方法,其特征在于,根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图,包括:
    根据所述关键点的位置信息,通过二维高斯分布确定所述样本图像对应的置信度特征图。
  15. 根据权利要求10所述的方法,其特征在于,所述神经网络输出的置信度特征图的像素点个数小于所述ROI图像的像素点个数。
  16. 根据权利要求7所述的方法,其特征在于,根据所述目标的关键点信息确定所述目标的姿态信息,包括:
    根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息。
  17. 根据权利要求16所述的方法,其特征在于,根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息,包括:
    计算所述目标的多个关键点之间形成的连线中的至少部分连线对应的角度信息;
    根据所述至少部分连线对应的角度信息,确定所述目标的姿态信息。
  18. 根据权利要求17所述的方法,其特征在于,每条连线对应的角度信息包括:所述连线与基准线之间的夹角,和/或,所述连线与其它任意一个或多个连线之间的夹角;所述基准线为水平线或竖直线。
  19. 根据权利要求7所述的方法,其特征在于,根据所述目标的关键点信息确定所述目标的姿态信息,包括:
    根据所述目标的关键点信息确定所述目标的身体倾斜角度和/或腿部弯曲角度;
    根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息。
  20. 根据权利要求19所述的方法,其特征在于,所述目标的身体倾斜角度包括左侧身体倾斜角度和/或右侧身体倾斜角度,其中,任意一侧的身体倾斜角度为该侧第一连线和第二连线之间的夹角,所述第一连线为所述目标的该侧肩关节与同侧髋关节之间的连线,所述第二连线为所述髋关节与同侧膝盖之间的连线;
    所述目标的腿部弯曲角度包括左侧腿部弯曲角度和/或右侧腿部弯曲角度,其中,任意一侧的腿部弯曲角度为该侧第三连线和第四连线之间的夹角,所述第三连线为所述目标的该侧脚踝与同侧膝盖之间的连线,所述第四连线为所述膝盖与同侧髋关节之间的连线。
  21. 根据权利要求20所述的方法,其特征在于,根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息,包括:
    若任意一侧的身体倾斜角度小于第一角度,或者,若两侧腿部弯曲角度均小于第二角度,则确定所述目标处于非直立行走状态。
  22. 一种目标跟随装置,其特征在于,包括:
    存储器,用于存储计算机程序;
    处理器,用于运行所述存储器中存储的计算机程序以实现:
    获取拍摄的图像;
    根据所述图像确定目标的姿态信息以及所述目标所在的边界框的尺寸信息;
    根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随。
  23. 根据权利要求22所述的装置,其特征在于,在根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器具体用于:
    根据所述目标的姿态信息,确定对所述目标进行跟随的策略;
    根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随。
  24. 根据权利要求23所述的装置,其特征在于,所述策略包括通过边界框的尺寸信息计算所述目标的距离的算法。
  25. 根据权利要求24所述的装置,其特征在于,在根据所确定的策略以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器具体用于:
    根据所确定的算法以及所述目标所在的边界框的尺寸信息,计算所述目标的距离;
    根据所述目标的距离,确定跟随的加速度。
  26. 根据权利要求25所述的装置,其特征在于,对于同样尺寸的边界框,直立行走状态下对应的跟随的加速度大于非直立行走状态下的跟随的加速度。
  27. 根据权利要求22所述的装置,其特征在于,在获取拍摄的图像时,所述处理器具体用于:
    获取可移动设备中的拍摄装置拍摄的图像;
    相应的,在根据所述目标的姿态信息以及所述边界框的尺寸信息,对所述目标进行跟随时,所述处理器具体用于:
    根据所述姿态信息以及所述边界框的尺寸信息,控制所述可移动设备跟随所述目标。
  28. 根据权利要求22所述的装置,其特征在于,在根据所述图像确定目标的姿态信息时,所述处理器具体用于:
    根据所述图像确定目标的关键点信息;
    根据所述目标的关键点信息确定所述目标的姿态信息。
  29. 根据权利要求28所述的装置,其特征在于,所述目标的关键点信息包括所述目标的多个关键点的位置信息。
  30. 根据权利要求28所述的装置,其特征在于,在根据所述图像确定目标的关键点信息时,所述处理器具体用于:
    确定所述图像中的目标所在的感兴趣区域ROI图像;
    根据神经网络确定所述ROI图像中的关键点信息。
  31. 根据权利要求30所述的装置,其特征在于,在根据神经网络确定所述ROI图像中的关键点信息时,所述处理器具体用于:
    将所述ROI图像输入至神经网络,得到多个关键点对应的置信度特征图,其中,任一关键点对应的置信度特征图包括各个像素点属于该关键点的概率;
    根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息。
  32. 根据权利要求31所述的装置,其特征在于,在根据所述多个关键点对应的置信度特征图确定所述目标的关键点信息时,所述处理器具体用于:
    在任一关键点对应的置信度特征图中,确定属于该关键点的概率最高的像素点;
    若所述概率最高的像素点对应的概率大于预设阈值,则所述目标的该关键点的位置信息为所述概率最高的像素点的位置信息。
  33. 根据权利要求30所述的装置,其特征在于,在根据神经网络确定所述ROI图像中的关键点信息之前,所述处理器还用于:
    获取训练样本,所述训练样本包括样本图像及所述样本图像对应的置信度特征图;
    根据训练样本,对所述神经网络进行训练。
  34. 根据权利要求33所述的装置,其特征在于,在获取训练样本时,所述处理器具体用于:
    获取样本图像及所述样本图像中的关键点的位置信息;
    根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图;
    其中,所述样本图像对应的置信度特征图中,距离所述关键点越近的像素点对应的概率越高。
  35. 根据权利要求34所述的装置,其特征在于,在根据所述关键点的位置信息,确定所述样本图像对应的置信度特征图时,所述处理器具体用于:
    根据所述关键点的位置信息,通过二维高斯分布确定所述样本图像对应的置信度特征图。
  36. 根据权利要求30所述的装置,其特征在于,所述神经网络输出的置信度特征图的像素点个数小于所述ROI图像的像素点个数。
  37. 根据权利要求28所述的装置,其特征在于,在根据所述目标的关键点信息确定所述目标的姿态信息时,所述处理器具体用于:
    根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息。
  38. 根据权利要求37所述的装置,其特征在于,在根据所述目标的多个关键点之间形成的连线中的至少部分连线,确定所述目标的姿态信息时,所述处理器具体用于:
    计算所述目标的多个关键点之间形成的连线中的至少部分连线对应的角度信息;
    根据所述至少部分连线对应的角度信息,确定所述目标的姿态信息。
  39. 根据权利要求38所述的装置,其特征在于,每条连线对应的角度信息包括:所述连线与基准线之间的夹角,和/或,所述连线与其它任意一个或多个连线之间的夹角;所述基准线为水平线或竖直线。
  40. 根据权利要求28所述的装置,其特征在于,在根据所述目标的关键点信息确定所述目标的姿态信息时,所述处理器具体用于:
    根据所述目标的关键点信息确定所述目标的身体倾斜角度和/或腿部弯曲角度;
    根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息。
  41. 根据权利要求40所述的装置,其特征在于,所述目标的身体倾斜角度包括左侧身体倾斜角度和/或右侧身体倾斜角度,其中,任意一侧的身体倾斜角度为该侧第一连线和第二连线之间的夹角,所述第一连线为所述目标的该侧肩关节与同侧髋关节之间的连线,所述第二连线为所述髋关节与同侧膝盖之间的连线;
    所述目标的腿部弯曲角度包括左侧腿部弯曲角度和/或右侧腿部弯曲角度,其中,任意一侧的腿部弯曲角度为该侧第三连线和第四连线之间的夹角,所述第三连线为所述目标的该侧脚踝与同侧膝盖之间的连线,所述第四连线为所述膝盖与同侧髋关节之间的连线。
  42. 根据权利要求41所述的装置,其特征在于,在根据所述目标的身体倾斜角度和/或腿部弯曲角度,确定所述目标的姿态信息时,所述处理器具体用于:
    若任意一侧的身体倾斜角度小于第一角度,或者,若两侧腿部弯曲角度均小于第二角度,则确定所述目标处于非直立行走状态。
  43. 一种可移动设备,其特征在于,包括权利要求22-42任一项所述的目标跟随装置。
  44. 根据权利要求43所述的设备,其特征在于,所述可移动设备还包括:
    拍摄装置,用于拍摄图像并发送给所述处理器;
    驱动装置,用于在所述处理器的控制下驱动所述可移动设备对所述目标进行跟随。
  45. 根据权利要求43所述的设备,其特征在于,所述可移动设备为无人机或无人车。
  46. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有程序指令,所述程序指令用于实现权利要求1-21中任意一项所述的目标跟随方法。
PCT/CN2020/080439 2020-03-20 2020-03-20 目标跟随方法、目标跟随装置、可移动设备和存储介质 WO2021184359A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080004952.4A CN112639874A (zh) 2020-03-20 2020-03-20 目标跟随方法、目标跟随装置、可移动设备和存储介质
PCT/CN2020/080439 WO2021184359A1 (zh) 2020-03-20 2020-03-20 目标跟随方法、目标跟随装置、可移动设备和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080439 WO2021184359A1 (zh) 2020-03-20 2020-03-20 目标跟随方法、目标跟随装置、可移动设备和存储介质

Publications (1)

Publication Number Publication Date
WO2021184359A1 true WO2021184359A1 (zh) 2021-09-23

Family

ID=75291245

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080439 WO2021184359A1 (zh) 2020-03-20 2020-03-20 目标跟随方法、目标跟随装置、可移动设备和存储介质

Country Status (2)

Country Link
CN (1) CN112639874A (zh)
WO (1) WO2021184359A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115037877A (zh) * 2022-06-08 2022-09-09 湖南大学重庆研究院 自动跟随方法、装置以及安全监测方法、装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515143B (zh) * 2021-06-30 2024-06-21 深圳市优必选科技股份有限公司 机器人导航方法、机器人及计算机可读存储介质
CN114326766B (zh) * 2021-12-03 2024-08-20 深圳先进技术研究院 一种车机协同自主跟踪与降落方法
CN115920420A (zh) * 2023-02-20 2023-04-07 自贡创赢智能科技有限公司 一种跟随式电动恐龙

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108351649A (zh) * 2015-09-15 2018-07-31 深圳市大疆创新科技有限公司 用于uav交互指令和控制的系统和方法
CN108399642A (zh) * 2018-01-26 2018-08-14 上海深视信息科技有限公司 一种融合旋翼无人机imu数据的通用目标跟随方法和系统
CN109241875A (zh) * 2018-08-20 2019-01-18 北京市商汤科技开发有限公司 姿态检测方法及装置、电子设备和存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105892493B (zh) * 2016-03-31 2019-03-01 纳恩博(常州)科技有限公司 一种信息处理方法和移动装置
CN108986164B (zh) * 2018-07-03 2021-01-26 百度在线网络技术(北京)有限公司 基于图像的位置检测方法、装置、设备及存储介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108351649A (zh) * 2015-09-15 2018-07-31 深圳市大疆创新科技有限公司 用于uav交互指令和控制的系统和方法
CN108399642A (zh) * 2018-01-26 2018-08-14 上海深视信息科技有限公司 一种融合旋翼无人机imu数据的通用目标跟随方法和系统
CN109241875A (zh) * 2018-08-20 2019-01-18 北京市商汤科技开发有限公司 姿态检测方法及装置、电子设备和存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115037877A (zh) * 2022-06-08 2022-09-09 湖南大学重庆研究院 自动跟随方法、装置以及安全监测方法、装置

Also Published As

Publication number Publication date
CN112639874A (zh) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2021184359A1 (zh) 目标跟随方法、目标跟随装置、可移动设备和存储介质
CN110570455B (zh) 一种面向房间vr的全身三维姿态跟踪方法
CN107813310B (zh) 一种基于双目视觉多手势机器人控制方法
CN106055091B (zh) 一种基于深度信息和校正方式的手部姿态估计方法
WO2020228217A1 (zh) 移乘搬运护理机器人的人体姿态视觉识别方法、存储介质及电子装置
US6853880B2 (en) Autonomous action robot
CN102971768B (zh) 姿势状态估计装置及姿势状态估计方法
US11940774B2 (en) Action imitation method and robot and computer readable storage medium using the same
JP4149213B2 (ja) 指示位置検出装置及び自律ロボット
CN110728739B (zh) 一种基于视频流的虚拟人控制与交互方法
CN111179426A (zh) 基于深度学习的机器人室内环境三维语义地图构建方法
TWI759767B (zh) 智慧車運動控制方法、設備和儲存介質
WO2022042304A1 (zh) 识别场景轮廓的方法、装置、计算机可读介质及电子设备
Sun et al. Gesture-based piloting of an aerial robot using monocular vision
WO2022217794A1 (zh) 一种动态环境移动机器人的定位方法
US20220262093A1 (en) Object detection method and system, and non-transitory computer-readable medium
CN111898519B (zh) 便携式的特定区域内运动训练辅助视觉伺服机器人系统及姿态评估方法
US10964046B2 (en) Information processing apparatus and non-transitory computer readable medium storing information processing program for estimating face orientation by using an omni-directional camera
CN114036969A (zh) 一种多视角情况下的3d人体动作识别算法
WO2021203368A1 (zh) 图像处理方法、装置、电子设备和存储介质
WO2022266853A1 (en) Methods and devices for gesture recognition
Hata et al. Detection of distant eye-contact using spatio-temporal pedestrian skeletons
Haker et al. Self-organizing maps for pose estimation with a time-of-flight camera
TWI686775B (zh) 利用影像偵測閱讀姿勢之方法及系統、電腦可讀取之記錄媒體及電腦程式產品
CN114789440B (zh) 基于图像识别的目标对接方法、装置、设备及其介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925542

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20925542

Country of ref document: EP

Kind code of ref document: A1