WO2021203368A1 - Image processing method and apparatus, electronic device, and storage medium (图像处理方法、装置、电子设备和存储介质) - Google Patents

Image processing method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021203368A1
WO2021203368A1 (PCT/CN2020/083997, CN2020083997W)
Authority
WO
WIPO (PCT)
Prior art keywords
user
image
target
key point
information
Prior art date
Application number
PCT/CN2020/083997
Other languages
English (en)
French (fr)
Inventor
任创杰
李思晋
李鑫超
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN202080004938.4A priority Critical patent/CN112655021A/zh
Priority to PCT/CN2020/083997 priority patent/WO2021203368A1/zh
Publication of WO2021203368A1 publication Critical patent/WO2021203368A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Definitions

  • the embodiments of the present invention relate to the technical field of drones, and in particular to an image processing method, device, electronic equipment, and storage medium.
  • The disadvantage of the prior art is that the steps required to use the corresponding function are cumbersome and time-consuming, and the use efficiency of the equipment is low.
  • the embodiments of the present invention provide an image processing method, device, electronic equipment, and storage medium, which are used to solve the technical problems of cumbersome operation steps and low operation efficiency of electronic equipment in the prior art.
  • the first aspect of the present invention provides an image processing method, including:
  • the second aspect of the present invention provides an image processing device, including:
  • a memory, configured to store a computer program;
  • a processor, configured to run the computer program stored in the memory to implement the method described above.
  • a third aspect of the present invention provides an electronic device, including the image processing device described in the second aspect.
  • a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium in which program instructions are stored, and the program instructions are used to implement the method described in the first aspect.
  • The image processing method and apparatus, electronic device, and storage medium provided by the embodiments of the present invention can acquire a captured video stream, determine a target whose posture information satisfies a preset condition according to at least one frame of image in the video stream, and enable the function corresponding to the preset condition. This simplifies the steps required to use the corresponding function, reduces the time spent, improves the use efficiency of the device, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.
  • FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention.
  • FIG. 4 is a schematic diagram of the positions of key points of a waving gesture with one hand in an image processing method according to Embodiment 3 of the present invention
  • FIG. 5 is a schematic flowchart of determining user key point information in an image processing method provided in Embodiment 3 of the present invention.
  • FIG. 6 is a schematic diagram of the principle of determining key point information in an image processing method provided by Embodiment 3 of the present invention.
  • FIG. 7 is a schematic diagram of the positions of the Gaussian distribution area and the zero response background of the confidence feature map in the image processing method provided in the third embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention.
  • the image processing method provided by the embodiment of the present invention can determine the user's posture information from the captured image, and activate the corresponding function according to the posture information.
  • The method provided by the embodiments of the present invention can be applied to any electronic device, such as a mobile phone, camera, gimbal, drone, unmanned vehicle, AR (Augmented Reality) device, monitoring device, and so on.
  • FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention. As shown in Figure 1, the image processing method in this embodiment may include:
  • Step 101 Obtain a captured video stream.
  • the execution subject of the method in this embodiment may be an image processing device in a drone.
  • the drone may be provided with a photographing device, and acquiring the photographed video stream in this step may specifically include: acquiring the video stream photographed by the photographing device of the drone.
  • Step 102 Determine a target whose posture information meets a preset condition according to at least one frame of image in the video stream.
  • the video stream shot by the shooting device may include multiple frames of images, at least one frame of image is selected from the multiple frames of images, and a target in which the posture information meets a preset condition is determined.
  • the target may be an object such as a person or a car. If the target is a person, the posture information may include, but is not limited to: standing, walking, squatting, lying down, etc. If the target is a car, the posture information may include, but is not limited to: go straight, turn left, turn right, and so on.
  • Step 103 Enable the function corresponding to the preset condition.
  • the function activated in this step can be any function of the drone, and the preset conditions and activated functions can be set according to actual needs.
  • the posture information satisfying the preset condition may include, but is not limited to, any one or more of: appearing in a predetermined posture, maintaining the predetermined posture for more than a preset time, and changing from the first posture to the second posture.
  • the enabled corresponding functions can include but are not limited to any one or more of take-off, landing, change of attitude, audio recording, video recording, photographing, entering power saving mode, and shutting down.
  • the drone may be provided with an audio playback device, and if a user clapping is detected, the function of automatically playing music may be turned on.
  • The drone can be used to track a vehicle and enable the corresponding function according to the vehicle's posture information. For example, if the vehicle is detected to be turning, the drone can ascend to expand the field of view and avoid losing the vehicle.
  • The image processing method provided in this embodiment can acquire a captured video stream, determine a target whose posture information satisfies a preset condition based on at least one frame of image in the video stream, and enable the function corresponding to the preset condition. This simplifies the steps required to use the corresponding function, reduces the time spent, improves the use efficiency of the drone, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.
  • Embodiment 2 of the present invention provides an image processing method. On the basis of the technical solution provided by the foregoing embodiment, this embodiment automatically enters follow mode when the user is detected waving a hand.
  • FIG. 2 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention. As shown in Figure 2, the image processing method in this embodiment may include:
  • Step 201 Obtain a captured video stream, where at least one frame of image in the video stream is used to determine the posture information of the user.
  • the image used to determine the user's posture information is recorded as the image to be processed.
  • a frame of image can be selected from the video stream as the image to be processed, which is simple and convenient for calculation, and can effectively improve the efficiency of user gesture detection.
  • continuous multiple frames of images of the video stream may be used as images to be processed, which can effectively improve the accuracy of user gesture detection.
  • multiple frames of images can be selected from the video stream at intervals, for example, one frame of images is selected every 1 second, which can balance efficiency and accuracy.
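As an illustration of the frame-sampling strategies above, here is a minimal sketch in Python using OpenCV; the 1-second interval and the function name are assumptions for illustration, not part of the patent text.

```python
import cv2

def sample_frames(video_source, interval_s=1.0):
    """Yield one frame roughly every `interval_s` seconds of the video stream."""
    cap = cv2.VideoCapture(video_source)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS is unknown
    step = max(1, int(round(fps * interval_s)))
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield frame                       # frame to be processed for posture detection
        index += 1
    cap.release()
```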
  • Step 202 For each frame of the at least one frame of image, determine the posture information of at least one user in the image.
  • the neural network can be trained through samples, and the trained neural network can be used to process the image to obtain the corresponding posture information.
  • algorithms such as OpenPose and YOLO can also be used directly to detect the user's posture information in the image.
  • In the case where only one frame of image is to be processed, the posture information of at least one user in that image can be obtained through step 202.
  • In the case where there are multiple frames of images to be processed, the posture information of the users in the multiple frames can be obtained through step 202. Some users may appear in only one or a few frames, but their posture information can still be detected.
  • Step 203 Determine a target to be followed according to the determined posture information of at least one user, where the target to be followed is a user whose posture information meets a preset condition.
  • The target to be followed may be one or more. In a scenario where multiple targets are followed and the targets separate, the following may stop, or some of the targets may be selected to continue following. In this embodiment, a single target to be followed is taken as an example for description.
  • In an optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if there is one and only one user whose posture information satisfies the preset condition, determining that user as the target to be followed.
  • For example, the preset condition may be maintaining a preset posture for more than a preset time. Then, if one and only one user maintains the preset posture for more than the preset time, that user is determined as the target to be followed.
  • Optionally, the preset posture may be a one-handed waving posture, and the preset time may be 1 second. Then, only when a single user keeps waving one hand for more than 1 second can that user become the target to be followed. If a single person waves with both hands, keeps both hands lowered, or does not raise one hand long enough, or if multiple people wave with one hand at the same time, the target to be followed cannot be determined. By triggering the automatic follow function only when one and only one user satisfies the preset condition, single-person tracking can be realized quickly and accurately, and following the wrong target is avoided.
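The "one and only one user" trigger described above can be sketched as follows; the tracker IDs, the per-frame waving flags, and the 1-second hold time are assumed inputs for illustration.

```python
import time

HOLD_SECONDS = 1.0  # preset time from the example above

wave_start = {}     # user_id -> timestamp when the wave posture began

def update_and_pick_target(postures, now=None):
    """postures: dict of user_id -> bool (is the user waving one-handed now).
    Returns a user_id only when exactly one user has waved for > HOLD_SECONDS."""
    now = time.monotonic() if now is None else now
    for uid, waving in postures.items():
        if waving:
            wave_start.setdefault(uid, now)   # remember when the wave began
        else:
            wave_start.pop(uid, None)         # posture broken, reset the timer
    qualified = [uid for uid, t0 in wave_start.items()
                 if postures.get(uid) and now - t0 > HOLD_SECONDS]
    return qualified[0] if len(qualified) == 1 else None
```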
  • In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition, determining the user who is first detected to satisfy the preset condition among the multiple users as the target to be followed.
  • For example, if multiple users wave one hand for more than 1 second, the user who is first detected waving one hand for more than 1 second may be taken as the target to be followed. By taking the user who first satisfies the posture condition as the target to be followed, interference from other users can be effectively avoided and smooth following can be ensured.
  • In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition, determining the user closest to the center of the shooting frame among the multiple users as the target to be followed.
  • For example, among the users who have waved one hand for more than 1 second, the user closest to the center of the frame can be selected as the target to be followed. Selecting, among the qualifying users, the user near the frame center ensures that the target to be followed is closest to the center of the frame, saves the time needed to turn toward the target, and improves the efficiency of following.
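A small sketch of this closest-to-frame-centre tie-break follows; the (x, y, w, h) bounding-box format is an assumption for illustration.

```python
def pick_center_most(candidates, image_w, image_h):
    """candidates: list of (user_id, (x, y, w, h)) whose posture qualifies.
    Returns the user_id whose box centre is nearest the image centre."""
    cx, cy = image_w / 2.0, image_h / 2.0
    def distance_sq(box):
        x, y, w, h = box
        bx, by = x + w / 2.0, y + h / 2.0
        return (bx - cx) ** 2 + (by - cy) ** 2
    return min(candidates, key=lambda item: distance_sq(item[1]))[0]
```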
  • In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition and the multiple users include a preset user, determining the preset user as the target to be followed.
  • For example, if multiple users wave one hand for more than 1 second, identity recognition can be performed on those users, and if a preset user is among them, the preset user can be taken as the target to be followed.
  • The identity recognition can be realized through face recognition, iris recognition, and the like. The preset user may be any user set in advance.
  • For example, the owner of a drone can set himself as the preset user; when several people wave one hand at the same time, the drone can recognize the owner among them and take the owner as the target to be followed.
  • By preferentially following the preset user, the personalized needs of users can be effectively met.
  • Step 204 Follow the target.
  • After the target to be followed is determined, follow mode can be entered and the target can be followed, so that follow mode is entered automatically through a one-handed wave. Besides the one-handed wave, other postures can also be used to trigger automatic following, such as clapping, nodding, and so on.
  • following the target can be achieved by always controlling the distance between the drone and the target within a preset range. For example, if the target moves forward, the drone will also move forward, and if the target stops, the drone will also stop.
  • the specific follow strategy can be set according to actual needs, which is not limited in this embodiment.
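A minimal sketch of such a distance-keeping follow strategy is shown below; the distance estimate, the preset range, and the gain are illustrative assumptions rather than an actual drone control interface.

```python
def follow_command(distance_m, range_min=4.0, range_max=6.0, gain=0.5):
    """Return a forward velocity (m/s) that keeps the distance within range."""
    if distance_m > range_max:
        return gain * (distance_m - range_max)    # target moved away: advance
    if distance_m < range_min:
        return -gain * (range_min - distance_m)   # target too close: back off
    return 0.0                                    # inside the range: hold position
```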
  • The image processing method provided by this embodiment acquires a captured video stream in which at least one frame of image is used to determine the posture information of users, and determines, for each of the at least one frame of image, the posture information of at least one user in the image. If a user's posture satisfies a preset condition, such as a one-handed wave or another posture, that user can be determined as the target to be followed and then followed. Compared with a scheme that requires connecting the drone's remote controller to a mobile phone, opening an application, clicking a series of buttons, and selecting the follow target before follow mode can be entered, this simplifies the steps required to enter follow mode, reduces the time spent, improves the efficiency of automatic following, saves the drone's battery power, and extends the drone's usage time.
  • Embodiment 3 of the present invention provides an image processing method. On the basis of the technical solutions provided by the foregoing embodiments, the detection of the user's posture is realized by first determining key points and then determining the posture information.
  • FIG. 3 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention. As shown in FIG. 3, the image processing method in this embodiment may include:
  • Step 301 Obtain a captured video stream, where at least one frame of image in the video stream is used to determine the posture information of the user.
  • In this embodiment, the specific implementation principle and method of step 301 can be found in the foregoing embodiments and will not be repeated here.
  • Step 302 For each frame of the at least one frame of image, determine at least one user to be analyzed according to the image.
  • In this step, all users in the image can be identified by means such as a multi-object tracking (MOT) algorithm, and the at least one user to be analyzed can be all or some of the users detected in the image.
  • a preset number of users can be selected from all users as the at least one user to be analyzed, which can effectively improve the efficiency of the algorithm and reduce the burden on the device.
  • the preset number can be set according to actual needs, for example, it can be 4.
  • Specifically, if the number of all users in the image is less than or equal to the preset number, all the users are taken as the objects to be analyzed; if the number of all users in the image is greater than the preset number, the users can be filtered according to certain conditions.
  • a preset number of users close to the center of the image may be selected as the at least one user to be analyzed.
  • the image center may refer to the horizontal center line of the image, may also refer to the vertical center line of the image, or may also refer to the center point of the image.
  • In another optional implementation, a preset number of the most-foreground users in the image may be selected as the at least one user to be analyzed.
  • The preset number of most-foreground users may refer to the preset number of users closest to the device. For example, if five users are detected in the image, four of them about 3 meters from the device and one about 10 meters away, the first four can be selected as the objects to be analyzed.
  • The judgment of distance can be realized by means such as changes in image sharpness or infrared detection. By selecting a preset number of users satisfying certain conditions from all the users, users at important positions in the image are not overlooked while efficiency is improved, ensuring that the device enters follow mode normally.
  • Step 303 In each frame of image, for each of the at least one user to be analyzed, determine the key point information of the user, and determine the posture information of the user according to the key point information of the user.
  • the key point information of each user can be detected, and the posture information of the user can be determined according to the key point information.
  • a deep learning algorithm such as a neural network can be used to directly determine the key point information in the image.
  • the key point information of the user may include position information of multiple key points of the user.
  • the location information may specifically be the coordinates of the key point.
  • the multiple key points may include, but are not limited to: nose, middle shoulder, right shoulder joint, right elbow joint, right hand, left shoulder joint, left elbow joint, left hand, right hip joint, right knee, right ankle, At least two of the left hip joint, left knee, and left ankle.
  • the posture information of the user may be determined according to the key point information of the user.
  • FIG. 4 is a schematic diagram of the positions of key points of a waving gesture with one hand in an image processing method according to Embodiment 3 of the present invention.
  • the black dots represent the key points of the user, where the elbow joint 401 on the left is higher than the shoulder joint 402 on the same side, and the elbow joint 404 on the right is lower than the shoulder joint 403 on the same side. Therefore, it can be determined that the user is waving with one hand.
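The test illustrated in FIG. 4 can be sketched as follows. Note that in image coordinates the y-axis grows downward, so "higher" means a smaller y value; the keypoint dictionary layout is an assumption for illustration.

```python
def is_one_handed_wave(kp):
    """kp: dict of name -> (x, y) pixel coordinates, None if the point is missing."""
    needed = ("left_elbow", "left_shoulder", "right_elbow", "right_shoulder")
    if any(kp.get(name) is None for name in needed):
        return False  # cannot decide without all four key points
    left_up = kp["left_elbow"][1] < kp["left_shoulder"][1]    # elbow above shoulder
    right_up = kp["right_elbow"][1] < kp["right_shoulder"][1]
    return left_up != right_up  # exactly one elbow raised above its shoulder
```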
  • Step 304 Determine a target to be followed according to the determined posture information of at least one user, where the target to be followed is a user whose posture information meets a preset condition.
  • Step 305 follow the target.
  • step 304 to step 305 can be referred to the foregoing embodiment, and will not be repeated here.
  • With the image processing method provided by this embodiment, for each frame of image, at least one user to be analyzed can be determined according to the image; for each of the users to be analyzed, the key point information of the user is determined, and the posture information of the user is determined according to the key point information. This can effectively improve detection efficiency and ensure that the corresponding function is enabled promptly and accurately. Moreover, determining the key point information first and then the corresponding posture information allows a more comprehensive analysis of human body posture: compared with a scheme in which a neural network directly outputs posture information, the recognition accuracy is higher and the method is more flexible, and when the action category to be recognized needs to be changed, all samples do not need to be re-annotated, which saves labor costs and reduces the development workload when requirements change.
  • In the technical solution provided by Embodiment 3 above, for each frame of image, one optional way of determining the user's key point information is to determine it directly from the entire image through a deep learning algorithm. Another optional way is to first determine the region of interest (ROI) image where the user is located, and then determine the key point information in the ROI image according to a neural network.
  • FIG. 5 is a schematic flowchart of determining user key point information in an image processing method according to Embodiment 3 of the present invention. For each user to be analyzed in the image, the method in Figure 5 can be used to determine its key point information. As shown in Figure 5, determining the key point information of the user may include:
  • Step 501 Determine the ROI image where the user is located.
  • the captured image can be cropped through the bounding box where the user is located to obtain the ROI image corresponding to the user.
  • FIG. 6 is a schematic diagram of the principle of determining key point information in an image processing method provided in Embodiment 3 of the present invention.
  • the captured image may be an RGB image.
  • the bounding box where the user is located in the RGB image can be determined, and the category of the bounding box is human.
  • the manifestation of the bounding box can be the coordinate information of the four corners of the bounding box, and the ROI image corresponding to the user can be determined through the bounding box and the RGB image.
  • methods such as multi-target tracking algorithms can identify all users in the image and select the users to be analyzed.
  • Specifically, the multi-object tracking algorithm can obtain bounding boxes corresponding to multiple users. When the number of bounding boxes is greater than the preset number, a preset number of bounding boxes are selected from them; using the RGB image and the selected bounding boxes as input, the corresponding ROI images can be obtained.
  • For example, using the MOT algorithm, the bounding boxes of 5 users can be determined from the RGB image, and the bounding boxes of 4 of those users can be selected. According to the 4 selected bounding boxes, 4 ROI images can be cropped from the RGB image, one for each of the 4 users.
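The cropping step can be sketched as follows; the (x, y, w, h) box format and the limit of 4 users are illustrative assumptions, and the frame is assumed to be a NumPy HxWx3 array.

```python
def crop_rois(frame, boxes, max_users=4):
    """frame: HxWx3 RGB array; boxes: list of (x, y, w, h) from the tracker."""
    rois = []
    for (x, y, w, h) in boxes[:max_users]:        # keep at most max_users boxes
        x0, y0 = max(0, int(x)), max(0, int(y))
        x1 = min(frame.shape[1], int(x + w))      # clamp to image bounds
        y1 = min(frame.shape[0], int(y + h))
        rois.append(frame[y0:y1, x0:x1].copy())   # one ROI image per user
    return rois
```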
  • Step 502 Input the ROI image to the neural network to obtain the confidence feature maps corresponding to multiple key points.
  • the confidence feature map corresponding to any key point includes the probability that each pixel belongs to the key point.
  • the ROI image of the user can be input into the neural network model, and the model can be used to determine the confidence feature map corresponding to the user.
  • the adopted model may be a convolutional neural network (Convolutional Neural Networks, CNN), and specifically may be a fully convolutional neural network (Fully Convolutional Networks, FCN).
  • the processing for the neural network may include two stages of training and detection.
  • the training phase can be implemented before the detection phase, or the neural network can be trained between any two detections.
  • samples can be used to train the neural network and adjust the parameters in the neural network so that the output result is similar to the target result.
  • In the detection stage, the fully trained neural network parameters are used to detect the image and output the confidence feature maps.
  • the training process may include: obtaining training samples, the training samples including sample images and a confidence feature map corresponding to the sample images; training the neural network according to the training samples.
  • the neural network is trained by using the confidence feature map as the target result, so that the output result of the neural network is close to the target result, which can effectively improve the anti-interference performance of the neural network and avoid over-fitting of the neural network.
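A hedged sketch of this training stage follows, using PyTorch; the toy network, the mean-squared-error loss, and the optimizer settings are illustrative assumptions, not the patent's exact model.

```python
import torch
import torch.nn as nn

K = 4  # key point categories: left/right shoulder, left/right elbow

net = nn.Sequential(                                        # toy FCN: 3-channel ROI in,
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, K, 1),                                    # K quarter-resolution heatmaps out
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(roi_batch, target_heatmaps):
    """roi_batch: (N,3,h,w); target_heatmaps: (N,K,h/4,w/4) Gaussian label maps (see below)."""
    optimizer.zero_grad()
    pred = net(roi_batch)
    loss = loss_fn(pred, target_heatmaps)  # pull network outputs toward the label maps
    loss.backward()
    optimizer.step()
    return loss.item()
```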
  • the process of acquiring the training sample may include: acquiring a sample image and position information of key points in the sample image; and determining a confidence feature map corresponding to the sample image according to the position information of the key points.
  • In the confidence feature map corresponding to the sample image, a pixel closer to the key point has a higher corresponding probability.
  • The sample image may be an ROI image cropped from any image obtained from a database. For each sample image, the position information of the key points in the image is determined by manual annotation, and the confidence feature maps are generated according to the position information of the key points.
  • Suppose manual annotation determines that the shoulder joint in the image is located at coordinates (50, 50); then the confidence feature map corresponding to the shoulder joint can be generated according to this position information.
  • The principle of generating the confidence feature map is that the closer a pixel is to the true position of the shoulder joint, the greater the probability that the pixel belongs to the shoulder joint.
  • For example, the pixel at (50, 50) has the largest probability, say 0.8; the probability of the pixel at (55, 55) should be greater than that of the pixel at (60, 60), for example 0.1 and 0.01 respectively; and pixels at the image edge far away from (50, 50) have probabilities of belonging to the shoulder joint that are very small, close to 0.
  • the confidence feature map corresponding to the sample image may be generated through a two-dimensional Gaussian distribution according to the position information of the key points.
  • Specifically, in the confidence feature map, the position coordinates of the pixels can obey a two-dimensional Gaussian distribution whose mean is the key point coordinates and whose variance is D1; alternatively, the distance between a pixel and the annotated key point can obey a Gaussian distribution with mean 0 and variance D2. The variances D1 and D2 can be set according to actual needs.
  • the two-dimensional Gaussian distribution is used to determine the confidence feature map corresponding to the sample image, which can effectively simulate the probability that each pixel is a key point and improve the detection accuracy.
  • the confidence feature map may also be composed of a Gaussian distribution and a background with zero response. Specifically, within the preset range around the key point, the probability corresponding to each pixel point can be determined according to the Gaussian distribution. Outside the preset range, a zero-response background can be set. The probability corresponding to each pixel is set to 0.
  • Taking the shoulder joint as the key point as an example, within the preset range around the position of the shoulder joint, the Gaussian distribution is used to generate the probability corresponding to each pixel. For example, the preset range may be a circle centered on the shoulder joint with a radius of 5; when a pixel is more than 5 pixels away from the coordinate point of the shoulder joint in the image, the pixel can hardly belong to the shoulder joint, and its corresponding probability is 0.
  • FIG. 7 is a schematic diagram of the positions of the Gaussian distribution area and the zero response background of the confidence feature map in the image processing method provided in the third embodiment of the present invention.
  • In the confidence feature map, the black dot in the middle represents the manually annotated key point, and the shaded part represents the Gaussian distribution area; the probability of each pixel in this area is determined by the Gaussian distribution. The area outside the shadow is the zero-response background area, in which the probability of every pixel is 0.
  • the confidence feature map is composed of Gaussian distribution and zero response background, which can effectively simplify the generation process of the confidence feature map and improve the generation efficiency and accuracy of the confidence feature map.
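Generating such a label map can be sketched as follows; the radius of 5 is taken from the example above, while the sigma value is an illustrative assumption.

```python
import numpy as np

def gaussian_heatmap(h, w, key_xy, sigma=2.0, radius=5):
    """Return an h x w confidence map for a key point at (kx, ky)."""
    kx, ky = key_xy
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - kx) ** 2 + (ys - ky) ** 2
    heat = np.exp(-d2 / (2.0 * sigma ** 2))  # 2-D Gaussian bump around the key point
    heat[d2 > radius ** 2] = 0.0             # zero-response background outside the circle
    return heat
```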
  • Besides the Gaussian distribution, other methods can also be used to generate the confidence feature map from the position of the annotated key point, as long as the greater the distance between a pixel and the key point, the lower the probability that the pixel belongs to that key point.
  • If multiple key points are annotated in the sample image, a confidence feature map can be generated for each key point.
  • With multiple sample images and their corresponding confidence feature maps, the neural network is trained to determine, from an image, the confidence feature maps corresponding to the key points in it.
  • the actual captured images can be processed according to the neural network obtained by the training. As shown in FIG. 6, inputting the ROI image into the neural network can obtain the confidence feature maps corresponding to multiple key points.
  • Step 503 Determine key point information of the user according to the confidence characteristic maps corresponding to the multiple key points.
  • the position information of the multiple key points can be determined according to the confidence feature maps.
  • For example, if the four key points of the left and right shoulder joints and the left and right elbow joints are needed when determining the target's posture information, the captured image is input into the neural network, the confidence feature maps corresponding to the four key points are obtained through the network, and the positions of the four key points can be determined from the four confidence feature maps respectively.
  • Optionally, determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points in this step may include: in the confidence feature map corresponding to any key point, determining the pixel with the highest probability of belonging to that key point; if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, taking the position information of that pixel as the position information of the user's key point.
  • For example, in the confidence feature map corresponding to the shoulder joint, if the pixel with the highest probability is located at (10, 10) and its probability is 0.7, which is greater than the preset threshold, the credibility that this pixel belongs to the shoulder joint is high enough, and the coordinates of the shoulder joint can be considered to be (10, 10). If the probability of the pixel with the highest probability is less than the preset threshold, the probability that any pixel belongs to the shoulder joint is not high enough, and the shoulder joint can be considered missing from the image.
  • The preset threshold can be set according to actual needs, for example 0.5.
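Decoding a key point from its confidence feature map can be sketched as follows; the threshold of 0.5 matches the example above, and the stride parameter (for the quarter-resolution output discussed below) is an illustrative assumption.

```python
import numpy as np

def decode_keypoint(heatmap, threshold=0.5, stride=1):
    """Return (x, y, confidence) in ROI coordinates, or None if the point is missing."""
    iy, ix = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    p = heatmap[iy, ix]
    if p < threshold:
        return None                 # no pixel is credible enough: key point absent
    return (ix * stride, iy * stride, float(p))  # map back to input resolution
```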
  • After the key point information of the target is determined according to the neural network, the corresponding posture information can be determined according to the key point information. Specifically, after the key points are obtained, limbs can be formed according to the connection relationships between the key points, and the formed limbs can serve as the basis for judging the posture.
  • The method of determining user key point information provided in FIG. 5 determines the positions of the key points through confidence feature maps. Compared with a scheme that directly uses key point coordinates as the training target, overfitting is less likely to occur, the recognition accuracy is higher, and the anti-interference capability is stronger, without the need to collect a large number of samples and annotate corresponding data, which reduces the workload of manual annotation. Through the two-dimensional Gaussian distribution, the confidence feature map corresponding to the sample image can be determined quickly and accurately, making the training process more stable, avoiding manual annotation errors, providing anti-interference capability, and improving the accuracy of key point recognition.
  • On the basis of the technical solutions provided by the above embodiments, optionally, the number of pixels of the confidence feature maps output by the neural network may be smaller than the number of pixels of the input ROI image.
  • For example, the ROI image is an h*w*3 RGB image, where h and w are the input height and width; the neural network outputs h'*w'*k confidence feature maps, where h' and w' are the output height and width, with h' = 0.25*h and w' = 0.25*w, and k is the number of key point categories (in this embodiment k = 4: the left and right shoulder joints and the left and right elbow joints).
  • If the input ROI image has 100*100 pixels, each output confidence feature map includes 25*25 pixels.
  • During training, the size of the target result can be set to 1/4 of the input image, realizing the function of shrinking the image through the neural network.
  • FIG. 8 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention.
  • the image processing device may execute the image processing method corresponding to FIG. 1.
  • the image processing device may include:
  • the memory 11 is used to store computer programs
  • the processor 12 is configured to run a computer program stored in the memory to realize:
  • the structure of the image processing apparatus may further include a communication interface 13 for communicating with other devices or a communication network.
  • the processor 12 when the function corresponding to the preset condition is activated, the processor 12 is specifically configured to:
  • the processor 12 when determining a target whose posture information satisfies a preset condition according to at least one frame of image in the video stream, the processor 12 is specifically configured to:
  • the target to be followed is determined according to the determined posture information of at least one user, where the target to be followed is a user whose posture information meets a preset condition.
  • the processor 12 when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:
  • the processor 12 when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:
  • if the posture information of multiple users meets the preset condition, the user who is first detected to meet the preset condition among the multiple users is determined as the target to be followed.
  • the processor 12 when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:
  • the user closest to the center of the shooting screen among the multiple users is determined as the target to be followed.
  • the processor 12 when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:
  • if the posture information of multiple users meets the preset condition and the multiple users include a preset user, the preset user is determined as the target to be followed.
  • In an implementable manner, when determining that the one and only user whose posture information satisfies the preset condition is the target to be followed, the processor 12 is specifically configured to: if one and only one user maintains the preset posture for more than the preset time, determine that user as the target to be followed.
  • the preset posture is a one-handed waving posture.
  • the processor 12 when determining the posture information of at least one user in the image, is specifically configured to:
  • the processor 12 when determining at least one user to be analyzed according to the image, the processor 12 is specifically configured to:
  • the processor 12 when selecting a preset number of users from all users as the at least one user to be analyzed, the processor 12 is specifically configured to:
  • a preset number of users close to the center of the image are selected as the at least one user to be analyzed.
  • the processor 12 when determining the key point information of the user, is specifically configured to:
  • the key point information in the ROI image is determined according to the neural network.
  • the processor 12 when determining the ROI image of the region of interest where the user is located, the processor 12 is specifically configured to:
  • the captured image is cropped according to the bounding box where the user is located determined according to the multi-target tracking algorithm to obtain the ROI image corresponding to the user.
  • the processor 12 when determining the key point information in the ROI image according to a neural network, the processor 12 is specifically configured to:
  • the key point information of the user is determined according to the confidence characteristic maps corresponding to the multiple key points.
  • the processor 12 when determining the key point information of the user according to the confidence characteristic maps corresponding to the multiple key points, the processor 12 is specifically configured to:
  • the location information of the key point of the user is the location information of the pixel with the highest probability.
  • the processor 12 is further configured to:
  • the training sample including a sample image and a confidence feature map corresponding to the sample image
  • the neural network is trained.
  • the processor 12 when acquiring training samples, is specifically configured to:
  • a pixel point closer to the key point has a higher corresponding probability.
  • the processor 12 when determining the confidence characteristic map corresponding to the sample image according to the position information of the key point, the processor 12 is specifically configured to:
  • the confidence feature map corresponding to the sample image is determined through a two-dimensional Gaussian distribution.
  • the number of pixels in the confidence feature map output by the neural network is less than the number of pixels in the ROI image.
  • the processor 12 when determining the posture information of the user according to the key point information of the user, the processor 12 is specifically configured to:
  • if the elbow joint on either side of the user is higher than the shoulder joint on the same side, and the elbow joint on the other side is lower than the shoulder joint on the same side, it is determined that the user is in a one-handed waving posture.
  • the image processing device shown in FIG. 8 can execute the methods of the embodiments shown in FIG. 1 to FIG. 7. For parts that are not described in detail in this embodiment, reference may be made to the related descriptions of the embodiments shown in FIG. 1 to FIG. 7. For the implementation process and technical effects of this technical solution, please refer to the description in the embodiment shown in FIG. 1 to FIG. 7, which will not be repeated here.
  • An embodiment of the present invention also provides an electronic device, including the image processing device described in any of the foregoing embodiments.
  • the electronic device is a drone or an unmanned vehicle.
  • the electronic device may further include:
  • a photographing device for sending the photographed video stream to the processor
  • the driving device is used to drive the electronic device to follow the target under the control of the processor.
  • the driving device may be a motor or the like, and the movement of the electronic device can be realized by the driving device, so as to realize the following the target.
  • an embodiment of the present invention provides a storage medium, the storage medium is a computer-readable storage medium, the computer-readable storage medium stores program instructions, and the program instructions are used to implement the embodiments shown in FIGS. 1 to 7 above. Image processing methods in.
  • the disclosed related devices and methods can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the modules or units is only a division by logical function; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • The technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer processor to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Abstract

Embodiments of the present invention provide an image processing method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a captured video stream; determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition; and enabling a function corresponding to the preset condition. The image processing method and apparatus, electronic device, and storage medium provided by the embodiments of the present invention can acquire a captured video stream, determine a target whose posture information satisfies a preset condition according to at least one frame of image in the video stream, and enable the function corresponding to the preset condition. This simplifies the steps required to use the corresponding function, reduces the time spent, improves the use efficiency of the device, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.

Description

Image processing method and apparatus, electronic device, and storage medium

Technical Field

Embodiments of the present invention relate to the technical field of unmanned aerial vehicles (drones), and in particular to an image processing method and apparatus, an electronic device, and a storage medium.

Background Art

In the prior art, in the process of a smart device interacting with a user, the user often needs to perform certain operations before the corresponding function can be used. Taking a drone providing an intelligent follow function as an example, a user who wants to enter intelligent follow mode needs to perform a series of complicated operations on the drone or on a bound mobile phone, completing the specified steps one by one as prompted, before the drone's intelligent follow function can be used.

The shortcoming of the prior art is that the steps required to use the corresponding function are cumbersome and time-consuming, and the use efficiency of the device is low.

Summary of the Invention

Embodiments of the present invention provide an image processing method and apparatus, an electronic device, and a storage medium, which are used to solve the technical problems in the prior art that the operation steps of an electronic device are cumbersome and its operation efficiency is low.

A first aspect of the present invention provides an image processing method, including:

acquiring a captured video stream;

determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition;

enabling a function corresponding to the preset condition.

A second aspect of the present invention provides an image processing apparatus, including:

a memory, configured to store a computer program;

a processor, configured to run the computer program stored in the memory to implement:

acquiring a captured video stream;

determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition;

enabling a function corresponding to the preset condition.

A third aspect of the present invention provides an electronic device, including the image processing apparatus described in the second aspect.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing program instructions, the program instructions being used to implement the method described in the first aspect.

The image processing method and apparatus, electronic device, and storage medium provided by the embodiments of the present invention can acquire a captured video stream, determine a target whose posture information satisfies a preset condition according to at least one frame of image in the video stream, and enable the function corresponding to the preset condition. This simplifies the steps required to use the corresponding function, reduces the time spent, improves the use efficiency of the device, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.

Brief Description of the Drawings

The drawings described here are provided for further understanding of the present application and constitute a part of the present application; the exemplary embodiments of the present application and their description are used to explain the present application and do not constitute an improper limitation of the present application. In the drawings:

FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention;

FIG. 2 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention;

FIG. 4 is a schematic diagram of key point positions for a one-handed waving posture in an image processing method according to Embodiment 3 of the present invention;

FIG. 5 is a schematic flowchart of determining user key point information in an image processing method according to Embodiment 3 of the present invention;

FIG. 6 is a schematic diagram of the principle of determining key point information in an image processing method according to Embodiment 3 of the present invention;

FIG. 7 is a schematic diagram of the positions of the Gaussian distribution area and the zero-response background in a confidence feature map in an image processing method according to Embodiment 3 of the present invention;

FIG. 8 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention.

The image processing method provided by the embodiments of the present invention can determine a user's posture information from a captured image and enable the corresponding function according to the posture information. The method provided by the embodiments of the present invention can be applied to any electronic device, such as a mobile phone, a camera, a gimbal, a drone, an unmanned vehicle, an AR (Augmented Reality) device, a monitoring device, and so on.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings, taking a drone as the electronic device by way of example. Where the embodiments do not conflict with one another, the following embodiments and the features in the embodiments may be combined with one another.
Embodiment 1

Embodiment 1 of the present invention provides an image processing method. FIG. 1 is a schematic flowchart of an image processing method according to Embodiment 1 of the present invention. As shown in FIG. 1, the image processing method in this embodiment may include:

Step 101: Acquire a captured video stream.

The execution subject of the method in this embodiment may be an image processing apparatus in a drone. The drone may be provided with a photographing device, and acquiring the captured video stream in this step may specifically include: acquiring the video stream captured by the photographing device of the drone.

Step 102: Determine, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition.

The video stream captured by the photographing device may include multiple frames of images; at least one frame is selected from the multiple frames, and a target whose posture information satisfies the preset condition is determined from it.

The target may be an object such as a person or a vehicle. If the target is a person, the posture information may include, but is not limited to, standing, walking, squatting, lying down, and so on. If the target is a vehicle, the posture information may include, but is not limited to, going straight, turning left, turning right, and so on.

Step 103: Enable the function corresponding to the preset condition.

The function enabled in this step may be any function of the drone, and the preset condition and the enabled function may be set according to actual needs. For example, the posture information satisfying the preset condition may include, but is not limited to, any one or more of: a predetermined posture appearing, a predetermined posture being maintained for more than a preset time, and a change from a first posture to a second posture. The enabled function may include, but is not limited to, any one or more of: taking off, landing, changing attitude, audio recording, video recording, photographing, entering a power-saving mode, and shutting down.

In an optional implementation, the drone may be provided with an audio playback device, and if a user is detected clapping, a function of automatically playing music may be enabled.

In another optional implementation, the drone may be used to track a vehicle and enable the corresponding function according to the vehicle's posture information. For example, if the vehicle is detected to be turning, the drone may ascend to expand the field of view and avoid losing the vehicle.

The image processing method provided by this embodiment can acquire a captured video stream, determine a target whose posture information satisfies a preset condition according to at least one frame of image in the video stream, and enable the function corresponding to the preset condition. This simplifies the steps required to use the corresponding function, reduces the time spent, improves the use efficiency of the drone, provides users with more complete human-computer interaction functions and a friendlier human-computer interaction experience, and improves the user experience.

Embodiment 2

Embodiment 2 of the present invention provides an image processing method. On the basis of the technical solution provided by the foregoing embodiment, this embodiment automatically enters follow mode when the user is detected waving a hand.

FIG. 2 is a schematic flowchart of an image processing method according to Embodiment 2 of the present invention. As shown in FIG. 2, the image processing method in this embodiment may include:

Step 201: Acquire a captured video stream, where at least one frame of image in the video stream is used to determine the posture information of users.

In this step, the images used to determine the users' posture information are denoted as the images to be processed.

In an optional implementation, one frame of image can be selected from the video stream as the image to be processed, which is simple and convenient for calculation and can effectively improve the efficiency of user posture detection.

In another optional implementation, multiple consecutive frames of the video stream can be used as the images to be processed, which can effectively improve the accuracy of user posture detection.

In yet another optional implementation, multiple frames can be selected from the video stream at intervals, for example, one frame every 1 second, which balances efficiency and accuracy.

Step 202: For each of the at least one frame of image, determine the posture information of at least one user in the image.

Optionally, a neural network can be trained with samples, and the trained neural network can be used to process the image to obtain the corresponding posture information. Alternatively, algorithms such as OpenPose and YOLO can be used directly to detect the users' posture information in the image.

In the case where only one frame is to be processed, the posture information of at least one user in that image can be obtained through step 202.

In the case where there are multiple frames to be processed, the posture information of the users in the multiple frames can be obtained through step 202. Some users may appear in only one or a few frames, but their posture information can still be detected.

Step 203: Determine a target to be followed according to the determined posture information of the at least one user, where the target to be followed is a user whose posture information satisfies a preset condition.

Optionally, there may be one or more targets to be followed. In a scenario where multiple targets are followed, when the targets separate, the following may stop, or some of the targets may be selected to continue following. In this embodiment, a single target to be followed is taken as an example for description.

In an optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if there is one and only one user whose posture information satisfies the preset condition, determining that user as the target to be followed.

For example, the preset condition may be maintaining a preset posture for more than a preset time. Then, if one and only one user maintains the preset posture for more than the preset time, that user is determined as the target to be followed.

Optionally, the preset posture may be a one-handed waving posture, and the preset time may be 1 second. Then, only when a single user keeps waving one hand for more than 1 second can that user become the target to be followed. If a single person waves with both hands, keeps both hands lowered, or does not raise one hand long enough, or if multiple people wave with one hand at the same time, the target to be followed cannot be determined. By triggering the automatic follow function only when one and only one user satisfies the preset condition, single-person tracking can be realized quickly and accurately, and following the wrong target is avoided.

In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition, determining the user who is first detected to satisfy the preset condition among the multiple users as the target to be followed.

For example, if multiple users wave one hand for more than 1 second, the user who is first detected waving one hand for more than 1 second can be taken as the target to be followed. By setting the user who first satisfies the posture condition as the target to be followed, interference from other users can be effectively avoided and smooth following can be ensured.

In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition, determining the user closest to the center of the shooting frame among the multiple users as the target to be followed.

For example, among the users who have waved one hand for more than 1 second, the user closest to the center of the frame can be selected as the target to be followed. By selecting, among the qualifying users, the user close to the center of the frame as the target to be followed, it can be ensured that the target to be followed is closest to the center of the frame, the time for turning toward the target is saved, and the efficiency of following is improved.

In another optional implementation, determining the target to be followed according to the determined posture information of the at least one user may include: if the posture information of multiple users satisfies the preset condition and the multiple users include a preset user, determining the preset user as the target to be followed.

For example, if multiple users wave one hand for more than 1 second, identity recognition can be performed on those users; if a preset user is among them, the preset user can be taken as the target to be followed.

The identity recognition can be realized through face recognition, iris recognition, and the like. The preset user may be any user set in advance. For example, the owner of a drone can set himself as the preset user; when several people wave one hand at the same time, the drone can recognize the owner among them and take the owner as the target to be followed. By preferentially following the preset user, the personalized needs of users can be effectively met.

Step 204: Follow the target.

After the target to be followed is determined, follow mode can be entered and the target can be followed, thereby entering follow mode automatically through a one-handed wave. Of course, besides the one-handed wave, other postures such as clapping or nodding can also be used as the posture that triggers automatic following.

Optionally, following the target can be realized by always keeping the distance between the drone and the target within a preset range. For example, if the target walks forward, the drone also moves forward; if the target stops, the drone also stops. The specific follow strategy can be set according to actual needs, which is not limited in this embodiment.

The image processing method provided by this embodiment acquires a captured video stream in which at least one frame of image is used to determine the posture information of users, and determines, for each of the at least one frame of image, the posture information of at least one user in the image. If a user's posture satisfies a preset condition, such as a one-handed wave or another posture, that user can be determined as the target to be followed, and the target is then followed. This effectively realizes entering follow mode directly and automatically through a one-handed wave or another posture. Compared with a scheme that requires connecting the drone's remote controller to a mobile phone, opening an application, clicking a series of buttons, and selecting the follow target before follow mode can be entered, this simplifies the steps required to enter follow mode, reduces the time spent, improves the efficiency of the drone's automatic following, saves the drone's battery power, and extends the drone's usage time.
Embodiment 3

Embodiment 3 of the present invention provides an image processing method. On the basis of the technical solutions provided by the foregoing embodiments, this embodiment realizes the detection of the user's posture by first determining key points and then determining the posture information.

FIG. 3 is a schematic flowchart of an image processing method according to Embodiment 3 of the present invention. As shown in FIG. 3, the image processing method in this embodiment may include:

Step 301: Acquire a captured video stream, where at least one frame of image in the video stream is used to determine the posture information of users.

In this embodiment, the specific implementation principle and method of step 301 can be found in the foregoing embodiments and will not be repeated here.

Step 302: For each of the at least one frame of image, determine at least one user to be analyzed according to the image.

In this step, all users in the image can be identified by means such as a multi-object tracking (MOT) algorithm, and the at least one user to be analyzed can be all or some of the users detected in the image.

Optionally, a preset number of users can be selected from all the users as the at least one user to be analyzed, which can effectively improve the efficiency of the algorithm and reduce the burden on the device. The preset number can be set according to actual needs, for example 4.

Specifically, if the number of users in the image is less than or equal to the preset number, all the users are taken as the objects to be analyzed; if the number of users in the image is greater than the preset number, the users can be filtered according to certain conditions.

In an optional implementation, a preset number of users close to the center of the image can be selected as the at least one user to be analyzed.

The image center may refer to the horizontal center line of the image, the vertical center line of the image, or the center point of the image.

In another optional implementation, a preset number of the most-foreground users in the image can be selected as the at least one user to be analyzed. The preset number of most-foreground users may refer to the preset number of users closest to the device.

For example, if five users are detected in the image, four of them about 3 meters from the device and one about 10 meters away, the first four can be selected as the objects to be analyzed. The distance judgment can be realized by means such as changes in image sharpness or infrared detection.

By selecting a preset number of users satisfying certain conditions from all the users, users at important positions in the image can be prevented from being overlooked while efficiency is improved, ensuring that the device enters follow mode normally.

Step 303: In each frame of image, for each of the at least one user to be analyzed, determine the key point information of the user, and determine the posture information of the user according to the key point information of the user.

After the at least one user to be analyzed is determined, the key point information of each of them can be detected, and the posture information of the user can be determined according to the key point information.

Optionally, a deep learning algorithm such as a neural network can be used to directly determine the key point information in the image. The key point information of the user may include position information of multiple key points of the user. The position information may specifically be the coordinates of the key points.

Optionally, the multiple key points may include, but are not limited to, at least two of: the nose, the middle of the shoulders, the right shoulder joint, the right elbow joint, the right hand, the left shoulder joint, the left elbow joint, the left hand, the right hip joint, the right knee, the right ankle, the left hip joint, the left knee, and the left ankle.

After the key point information of the user is determined, the posture information of the user can be determined according to the key point information.

In the scenario of entering follow mode through a one-handed wave, if the elbow joint on either side of the user is higher than the shoulder joint on the same side, and the elbow joint on the other side is lower than the shoulder joint on the same side, it can be determined that the user is in a one-handed waving posture. Through the height relationship between the shoulder joints and elbow joints on the two sides, whether the user is in a one-handed waving posture can be determined quickly and accurately.

FIG. 4 is a schematic diagram of key point positions for a one-handed waving posture in an image processing method according to Embodiment 3 of the present invention. As shown in FIG. 4, the black dots represent the user's key points; the elbow joint 401 on the left is higher than the shoulder joint 402 on the same side, and the elbow joint 404 on the right is lower than the shoulder joint 403 on the same side, so it can be determined that the user is waving with one hand.

Step 304: Determine a target to be followed according to the determined posture information of the at least one user, where the target to be followed is a user whose posture information satisfies a preset condition.

Step 305: Follow the target.

In this embodiment, the specific implementation principles and processes of steps 304 to 305 can be found in the foregoing embodiments and will not be repeated here.

With the image processing method provided by this embodiment, for each frame of image, at least one user to be analyzed can be determined according to the image; for each of the users to be analyzed, the key point information of the user is determined, and the posture information of the user is determined according to the key point information. This can effectively improve detection efficiency and ensure that the corresponding function is enabled promptly and accurately. Moreover, determining the key point information first and then the corresponding posture information allows a more comprehensive analysis of human body posture: compared with a scheme in which a neural network directly outputs posture information, the recognition accuracy is higher and the method is more flexible, and when the action category to be recognized needs to be changed, all samples do not need to be re-annotated, which saves labor costs and reduces the development workload when requirements change.

In the technical solution provided by Embodiment 3 above, for each frame of image, one optional way of determining the user's key point information is to determine it directly from the entire image through a deep learning algorithm. Another optional way is to first determine the region of interest (ROI) image where the user is located, and then determine the key point information in the ROI image according to a neural network.

FIG. 5 is a schematic flowchart of determining user key point information in an image processing method according to Embodiment 3 of the present invention. For each user to be analyzed in the image, the method in FIG. 5 can be used to determine the user's key point information. As shown in FIG. 5, determining the key point information of the user may include:

Step 501: Determine the ROI image where the user is located.

Optionally, the captured image can be cropped through the bounding box where the user is located to obtain the ROI image corresponding to the user.

FIG. 6 is a schematic diagram of the principle of determining key point information in an image processing method according to Embodiment 3 of the present invention. As shown in FIG. 6, the captured image may be an RGB image. Through a multi-object tracking algorithm or another algorithm, the bounding box where the user is located in the RGB image can be determined, with the category of the bounding box being "person". The bounding box can be represented by the coordinate information of its four corners, and through the bounding box and the RGB image, the ROI image corresponding to the user can be determined.

As described above, all users in the image can be identified through methods such as the multi-object tracking algorithm, and the users to be analyzed can be selected from them. Specifically, bounding boxes corresponding to multiple users can be obtained through the multi-object tracking algorithm; when the number of bounding boxes is greater than the preset number, a preset number of bounding boxes are selected from them, and the corresponding ROI images can be obtained using the RGB image and the selected bounding boxes as input.

For example, using the MOT algorithm, the bounding boxes of 5 users can be determined from the RGB image, and the bounding boxes of 4 of them can be selected. According to the 4 selected bounding boxes, 4 ROI images can be cropped from the RGB image, one for each of the 4 users.

Step 502: Input the ROI image into the neural network to obtain confidence feature maps corresponding to multiple key points.

The confidence feature map corresponding to any key point includes the probability that each pixel belongs to that key point.

After the ROI image of each user is obtained, the user's ROI image can be input into the neural network model, and the model can be used to determine the confidence feature maps corresponding to the user. In this embodiment, the adopted model may be a convolutional neural network (CNN), and specifically may be a fully convolutional network (FCN).

In this embodiment, the processing related to the neural network may include two stages: training and detection. The training stage can be carried out before the detection stage, or the neural network can be trained between any two detections. In the training stage, samples can be used to train the neural network, adjusting the parameters in the neural network so that the output result is close to the target result. In the detection stage, the fully trained neural network parameters are used to detect the image and output the confidence feature maps.

The training stage of the neural network model is introduced first. Optionally, the training process may include: acquiring training samples, the training samples including sample images and confidence feature maps corresponding to the sample images; and training the neural network according to the training samples. By training the neural network with the confidence feature maps as the target result, so that the output result of the neural network approaches the target result, the anti-interference capability of the neural network can be effectively improved and overfitting of the neural network can be avoided.

Optionally, acquiring the training samples may include: acquiring a sample image and the position information of the key points in the sample image; and determining the confidence feature map corresponding to the sample image according to the position information of the key points. In the confidence feature map corresponding to the sample image, a pixel closer to the key point has a higher corresponding probability.

The sample image may be an ROI image cropped from any image obtained from a database. For each sample image, the position information of the key points in the image is determined by manual annotation, and the confidence feature maps are generated according to the position information of the key points.

Suppose manual annotation determines that the shoulder joint in the image is located at coordinates (50, 50); then the confidence feature map corresponding to the shoulder joint can be generated according to this position information. The principle of generating the confidence feature map is that the closer a pixel is to the true position of the shoulder joint, the greater the probability that the pixel belongs to the shoulder joint. For example, the pixel at (50, 50) has the largest probability, say 0.8; the probability of the pixel at (55, 55) should be greater than that of the pixel at (60, 60), for example 0.1 and 0.01 respectively; and pixels at the image edge far away from (50, 50) have probabilities of belonging to the shoulder joint that are very small, close to 0.

Optionally, the confidence feature map corresponding to the sample image can be generated through a two-dimensional Gaussian distribution according to the position information of the key points. Specifically, in the confidence feature map, the position coordinates of the pixels can obey a two-dimensional Gaussian distribution whose mean is the key point coordinates and whose variance is D1; alternatively, the distance between a pixel and the annotated key point can obey a Gaussian distribution with mean 0 and variance D2. The variances D1 and D2 can be set according to actual needs. Determining the confidence feature map corresponding to the sample image through a two-dimensional Gaussian distribution can effectively simulate the probability that each pixel belongs to the key point and improve detection accuracy.

Optionally, the confidence feature map can also be composed of a Gaussian distribution and a zero-response background. Specifically, within a preset range around the key point, the probability corresponding to each pixel can be determined according to the Gaussian distribution; outside the preset range, a zero-response background can be set, that is, the probability corresponding to each pixel outside the preset range is set to 0.

Taking the shoulder joint as the key point as an example, within the preset range around the position of the shoulder joint, the Gaussian distribution is used to generate the probability corresponding to each pixel. For example, the preset range may be a circle centered on the shoulder joint with a radius of 5; when a pixel is more than 5 pixels away from the coordinate point of the shoulder joint in the image, the pixel can hardly belong to the shoulder joint, and its corresponding probability is 0.

FIG. 7 is a schematic diagram of the positions of the Gaussian distribution area and the zero-response background in a confidence feature map in an image processing method according to Embodiment 3 of the present invention. As shown in FIG. 7, in the confidence feature map, the black dot in the middle represents the manually annotated key point, and the shaded part represents the Gaussian distribution area; the probability of each pixel in this area is determined by the Gaussian distribution. The area outside the shadow is the zero-response background area, in which the probability of every pixel is 0. Composing the confidence feature map of a Gaussian distribution and a zero-response background can effectively simplify the generation process of the confidence feature map and improve its generation efficiency and accuracy.

Besides the Gaussian distribution, other methods can also be used to generate the confidence feature map from the position of the annotated key point, as long as the greater the distance between a pixel and the key point, the lower the probability that the pixel belongs to that key point.

If multiple key points are annotated in the sample image, a confidence feature map can be generated for each key point. Multiple sample images and their corresponding confidence feature maps are acquired to train the neural network, and the neural network is trained to determine, from an image, the confidence feature maps corresponding to the key points in it.

After the training is completed, the actually captured images can be processed by the trained neural network. As shown in FIG. 6, inputting the ROI image into the neural network yields the confidence feature maps corresponding to multiple key points.

Step 503: Determine the key point information of the user according to the confidence feature maps corresponding to the multiple key points.

As shown in FIG. 6, after the confidence feature maps corresponding to the multiple key points are determined, the position information of the multiple key points can be determined according to the confidence feature maps.

For example, if the four key points of the left and right shoulder joints and the left and right elbow joints are needed to determine the target's posture information, the captured image is input into the neural network, the confidence feature maps corresponding to the four key points are obtained through the network, and the positions of the four key points can be determined from the four confidence feature maps respectively.

Optionally, determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points in this step may include: in the confidence feature map corresponding to any key point, determining the pixel with the highest probability of belonging to that key point; if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, taking the position information of that pixel as the position information of the user's key point.

For example, in the confidence feature map corresponding to the shoulder joint, if the pixel with the highest probability is located at (10, 10) and its probability is 0.7, which is greater than the preset threshold, the credibility that this pixel belongs to the shoulder joint is high enough, and the coordinates of the shoulder joint can be considered to be (10, 10). If the probability of the pixel with the highest probability is less than the preset threshold, the probability that any pixel belongs to the shoulder joint is not high enough, and the shoulder joint can be considered missing from the image. The preset threshold can be set according to actual needs, for example 0.5.

After the key point information of the target is determined according to the neural network, the corresponding posture information can be determined according to the key point information. Specifically, after the key points are obtained, limbs can be formed according to the connection relationships between the key points, and the formed limbs can serve as the basis for judging the posture.

The method of determining user key point information provided in FIG. 5 determines the positions of the key points through confidence feature maps. Compared with a scheme that directly uses key point coordinates as the training target, overfitting is less likely to occur, the recognition accuracy is higher, and the anti-interference capability is stronger, without the need to collect a large number of samples and annotate corresponding data, which reduces the workload of manual annotation. Through the two-dimensional Gaussian distribution, the confidence feature map corresponding to the sample image can be determined quickly and accurately, making the training process more stable, avoiding manual annotation errors, providing anti-interference capability, and improving the accuracy of key point recognition.

On the basis of the technical solutions provided by the above embodiments, optionally, the number of pixels of the confidence feature maps output by the neural network may be smaller than the number of pixels of the input ROI image.

For example, the ROI image is an h*w*3 RGB image, where h and w are the input height and width; the neural network outputs h'*w'*k confidence feature maps, where h' and w' are the output height and width, with h' = 0.25*h and w' = 0.25*w, and k is the number of key point categories; in this embodiment k = 4, namely the left and right shoulder joints and the left and right elbow joints.

If the input ROI image has 100*100 pixels, each output confidence feature map includes 25*25 pixels. During training, the size of the target result can be set to 1/4 of the input image, realizing the function of shrinking the image through the neural network.

Setting the number of pixels of the output confidence feature maps smaller than that of the input ROI image can improve the processing efficiency of captured images and reduce the space occupied by the output results; moreover, since the manual annotation of key points carries a certain error, reducing the size of the output image can avoid the error to some extent and improve recognition accuracy.
Embodiment 4

FIG. 8 is a schematic structural diagram of an image processing apparatus according to Embodiment 4 of the present invention. The image processing apparatus can execute the image processing method corresponding to FIG. 1 above. As shown in FIG. 8, the image processing apparatus may include:

a memory 11, configured to store a computer program;

a processor 12, configured to run the computer program stored in the memory to implement:

acquiring a captured video stream;

determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition;

enabling a function corresponding to the preset condition.

Optionally, the structure of the image processing apparatus may further include a communication interface 13 for communicating with other devices or a communication network.

In an implementable manner, when enabling the function corresponding to the preset condition, the processor 12 is specifically configured to:

follow the target.

In an implementable manner, when determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition, the processor 12 is specifically configured to:

for each of the at least one frame of image, determine the posture information of at least one user in the image;

determine a target to be followed according to the determined posture information of the at least one user, where the target to be followed is a user whose posture information satisfies the preset condition.

In an implementable manner, when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:

if there is one and only one user whose posture information satisfies the preset condition, determine that user as the target to be followed.

In an implementable manner, when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:

if the posture information of multiple users satisfies the preset condition, determine the user who is first detected to satisfy the preset condition among the multiple users as the target to be followed.

In an implementable manner, when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:

if the posture information of multiple users satisfies the preset condition, determine the user closest to the center of the shooting frame among the multiple users as the target to be followed.

In an implementable manner, when determining the target to be followed according to the posture information of the at least one user, the processor 12 is specifically configured to:

if the posture information of multiple users satisfies the preset condition and the multiple users include a preset user, determine the preset user as the target to be followed.

In an implementable manner, when determining, in the case that there is one and only one user whose posture information satisfies the preset condition, that user as the target to be followed, the processor 12 is specifically configured to:

if one and only one user maintains a preset posture for more than a preset time, determine that user as the target to be followed.

In an implementable manner, the preset posture is a one-handed waving posture.

In an implementable manner, when determining the posture information of at least one user in the image, the processor 12 is specifically configured to:

determine at least one user to be analyzed according to the image;

for each of the at least one user to be analyzed, determine the key point information of the user, and determine the posture information of the user according to the key point information of the user, where the key point information of the user includes position information of multiple key points of the user.

In an implementable manner, when determining at least one user to be analyzed according to the image, the processor 12 is specifically configured to:

identify all users in the image through a multi-object tracking algorithm;

select a preset number of users from all the users as the at least one user to be analyzed.

In an implementable manner, when selecting a preset number of users from all the users as the at least one user to be analyzed, the processor 12 is specifically configured to:

if the number of all users in the image is greater than the preset number, select a preset number of users close to the center of the image as the at least one user to be analyzed.

In an implementable manner, when determining the key point information of the user, the processor 12 is specifically configured to:

determine the region of interest (ROI) image where the user is located;

determine the key point information in the ROI image according to a neural network.

In an implementable manner, when determining the ROI image where the user is located, the processor 12 is specifically configured to:

crop the captured image through the bounding box where the user is located, determined according to the multi-object tracking algorithm, to obtain the ROI image corresponding to the user.

In an implementable manner, when determining the key point information in the ROI image according to the neural network, the processor 12 is specifically configured to:

input the ROI image into the neural network to obtain confidence feature maps corresponding to multiple key points, where the confidence feature map corresponding to any key point includes the probability that each pixel belongs to that key point;

determine the key point information of the user according to the confidence feature maps corresponding to the multiple key points.

In an implementable manner, when determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points, the processor 12 is specifically configured to:

in the confidence feature map corresponding to any key point, determine the pixel with the highest probability of belonging to that key point;

if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, take the position information of that pixel as the position information of the user's key point.

In an implementable manner, before determining the key point information in the ROI image according to the neural network, the processor 12 is further configured to:

acquire training samples, the training samples including sample images and confidence feature maps corresponding to the sample images;

train the neural network according to the training samples.

In an implementable manner, when acquiring the training samples, the processor 12 is specifically configured to:

acquire a sample image and position information of the key points in the sample image;

determine the confidence feature map corresponding to the sample image according to the position information of the key points;

where, in the confidence feature map corresponding to the sample image, a pixel closer to the key point has a higher corresponding probability.

In an implementable manner, when determining the confidence feature map corresponding to the sample image according to the position information of the key points, the processor 12 is specifically configured to:

determine the confidence feature map corresponding to the sample image through a two-dimensional Gaussian distribution according to the position information of the key points.

In an implementable manner, the number of pixels of the confidence feature maps output by the neural network is smaller than the number of pixels of the ROI image.

In an implementable manner, when determining the posture information of the user according to the key point information of the user, the processor 12 is specifically configured to:

if the elbow joint on either side of the user is higher than the shoulder joint on the same side, and the elbow joint on the other side is lower than the shoulder joint on the same side, determine that the user is in a one-handed waving posture.

The image processing apparatus shown in FIG. 8 can execute the methods of the embodiments shown in FIG. 1 to FIG. 7. For the parts not described in detail in this embodiment, reference can be made to the related descriptions of the embodiments shown in FIG. 1 to FIG. 7. For the execution process and technical effects of this technical solution, see the descriptions in the embodiments shown in FIG. 1 to FIG. 7, which will not be repeated here.

An embodiment of the present invention further provides an electronic device, including the image processing apparatus described in any of the foregoing embodiments.

Optionally, the electronic device is a drone or an unmanned vehicle.

Optionally, the electronic device may further include:

a photographing device, configured to send the captured video stream to the processor;

a driving device, configured to drive the electronic device to follow the target under the control of the processor.

The driving device may be a motor or the like; the movement of the electronic device can be realized through the driving device, so as to follow the target.

For the structures and functions of the components of the electronic device provided by the embodiments of the present invention, reference can be made to the foregoing embodiments, which will not be repeated here.

In addition, an embodiment of the present invention provides a storage medium. The storage medium is a computer-readable storage medium storing program instructions, and the program instructions are used to implement the image processing methods in the embodiments shown in FIG. 1 to FIG. 7 above.
The technical solutions and technical features in each of the above embodiments may be used alone or in combination where they do not conflict with one another; as long as they do not exceed the cognitive scope of those skilled in the art, they all belong to equivalent embodiments within the protection scope of the present application.

In the several embodiments provided by the present invention, it should be understood that the disclosed related apparatuses and methods can be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the modules or units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The above are only embodiments of the present invention and do not thereby limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present invention.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or equivalently replace some or all of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (46)

  1. An image processing method, characterized by comprising:
    acquiring a captured video stream;
    determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition;
    enabling a function corresponding to the preset condition.
  2. The method according to claim 1, wherein enabling the function corresponding to the preset condition comprises:
    following the target.
  3. The method according to claim 2, wherein determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition comprises:
    for each of the at least one frame of image, determining posture information of at least one user in the image;
    determining a target to be followed according to the determined posture information of the at least one user, wherein the target to be followed is a user whose posture information satisfies the preset condition.
  4. The method according to claim 3, wherein determining a target to be followed according to the posture information of the at least one user comprises:
    if one and only one user's posture information satisfies the preset condition, determining that user as the target to be followed.
  5. The method according to claim 3, wherein determining a target to be followed according to the posture information of the at least one user comprises:
    if the posture information of multiple users satisfies the preset condition, determining, among the multiple users, the user first detected to satisfy the preset condition as the target to be followed.
  6. The method according to claim 3, wherein determining a target to be followed according to the posture information of the at least one user comprises:
    if the posture information of multiple users satisfies the preset condition, determining, among the multiple users, the user closest to the center of the captured frame as the target to be followed.
  7. The method according to claim 3, wherein determining a target to be followed according to the posture information of the at least one user comprises:
    if the posture information of multiple users satisfies the preset condition and the multiple users include a preset user, determining the preset user as the target to be followed.
  8. The method according to claim 4, wherein, if one and only one user's posture information satisfies the preset condition, determining that user as the target to be followed comprises:
    if one and only one user maintains a preset posture for longer than a preset time, determining that user as the target to be followed.
  9. The method according to claim 8, wherein the preset posture is a one-handed waving posture.
  10. The method according to claim 3, wherein determining posture information of at least one user in the image comprises:
    determining at least one user to be analyzed according to the image;
    for each user of the at least one user to be analyzed, determining key point information of the user, and determining posture information of the user according to the key point information of the user, wherein the key point information of the user comprises position information of multiple key points of the user.
  11. The method according to claim 10, wherein determining at least one user to be analyzed according to the image comprises:
    identifying all users in the image through a multi-target tracking algorithm;
    selecting a preset number of users from all the users as the at least one user to be analyzed.
  12. The method according to claim 11, wherein selecting a preset number of users from all the users as the at least one user to be analyzed comprises:
    if the total number of users in the image is greater than the preset number, selecting the preset number of users closest to the center of the image as the at least one user to be analyzed.
  13. The method according to claim 10, wherein determining key point information of the user comprises:
    determining a region-of-interest (ROI) image in which the user is located;
    determining key point information in the ROI image according to a neural network.
  14. The method according to claim 13, wherein determining the ROI image in which the user is located comprises:
    cropping the captured image according to a bounding box of the user determined by the multi-target tracking algorithm to obtain the ROI image corresponding to the user.
  15. The method according to claim 13, wherein determining key point information in the ROI image according to a neural network comprises:
    inputting the ROI image into the neural network to obtain confidence feature maps corresponding to multiple key points, wherein the confidence feature map corresponding to any key point comprises the probability that each pixel belongs to that key point;
    determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points.
  16. The method according to claim 15, wherein determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points comprises:
    in the confidence feature map corresponding to any key point, determining the pixel with the highest probability of belonging to that key point;
    if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, taking the position information of that pixel as the position information of that key point of the user.
  17. The method according to claim 13, wherein, before determining key point information in the ROI image according to the neural network, the method further comprises:
    acquiring training samples, the training samples comprising sample images and confidence feature maps corresponding to the sample images;
    training the neural network according to the training samples.
  18. The method according to claim 17, wherein acquiring training samples comprises:
    acquiring a sample image and position information of key points in the sample image;
    determining the confidence feature map corresponding to the sample image according to the position information of the key points;
    wherein, in the confidence feature map corresponding to the sample image, pixels closer to a key point correspond to higher probabilities.
  19. The method according to claim 18, wherein determining the confidence feature map corresponding to the sample image according to the position information of the key points comprises:
    determining the confidence feature map corresponding to the sample image through a two-dimensional Gaussian distribution according to the position information of the key points.
  20. The method according to claim 15, wherein the number of pixels in the confidence feature map output by the neural network is smaller than the number of pixels in the ROI image.
  21. The method according to claim 10, wherein determining posture information of the user according to the key point information of the user comprises:
    if an elbow joint on either side of the user is higher than the shoulder joint on the same side, and the elbow joint on the other side is lower than the shoulder joint on the same side, determining that the user is in a one-handed waving posture.
  22. An image processing apparatus, characterized by comprising:
    a memory for storing a computer program;
    a processor for running the computer program stored in the memory to implement:
    acquiring a captured video stream;
    determining, according to at least one frame of image in the video stream, a target whose posture information satisfies a preset condition;
    enabling a function corresponding to the preset condition.
  23. The apparatus according to claim 22, wherein, when enabling the function corresponding to the preset condition, the processor is specifically configured to:
    follow the target.
  24. The apparatus according to claim 23, wherein, when determining, according to at least one frame of image in the video stream, a target whose posture information satisfies the preset condition, the processor is specifically configured to:
    for each of the at least one frame of image, determine posture information of at least one user in the image;
    determine a target to be followed according to the determined posture information of the at least one user, wherein the target to be followed is a user whose posture information satisfies the preset condition.
  25. The apparatus according to claim 24, wherein, when determining the target to be followed according to the posture information of the at least one user, the processor is specifically configured to:
    if one and only one user's posture information satisfies the preset condition, determine that user as the target to be followed.
  26. The apparatus according to claim 24, wherein, when determining the target to be followed according to the posture information of the at least one user, the processor is specifically configured to:
    if the posture information of multiple users satisfies the preset condition, determine, among the multiple users, the user first detected to satisfy the preset condition as the target to be followed.
  27. The apparatus according to claim 24, wherein, when determining the target to be followed according to the posture information of the at least one user, the processor is specifically configured to:
    if the posture information of multiple users satisfies the preset condition, determine, among the multiple users, the user closest to the center of the captured frame as the target to be followed.
  28. The apparatus according to claim 24, wherein, when determining the target to be followed according to the posture information of the at least one user, the processor is specifically configured to:
    if the posture information of multiple users satisfies the preset condition and the multiple users include a preset user, determine the preset user as the target to be followed.
  29. The apparatus according to claim 25, wherein, when determining that, if one and only one user's posture information satisfies the preset condition, that user is the target to be followed, the processor is specifically configured to:
    if one and only one user maintains a preset posture for longer than a preset time, determine that user as the target to be followed.
  30. The apparatus according to claim 29, wherein the preset posture is a one-handed waving posture.
  31. The apparatus according to claim 24, wherein, when determining posture information of at least one user in the image, the processor is specifically configured to:
    determine at least one user to be analyzed according to the image;
    for each user of the at least one user to be analyzed, determine key point information of the user, and determine posture information of the user according to the key point information of the user, wherein the key point information of the user comprises position information of multiple key points of the user.
  32. The apparatus according to claim 31, wherein, when determining the at least one user to be analyzed according to the image, the processor is specifically configured to:
    identify all users in the image through a multi-target tracking algorithm;
    select a preset number of users from all the users as the at least one user to be analyzed.
  33. The apparatus according to claim 32, wherein, when selecting the preset number of users from all the users as the at least one user to be analyzed, the processor is specifically configured to:
    if the total number of users in the image is greater than the preset number, select the preset number of users closest to the center of the image as the at least one user to be analyzed.
  34. The apparatus according to claim 31, wherein, when determining the key point information of the user, the processor is specifically configured to:
    determine a region-of-interest (ROI) image in which the user is located;
    determine key point information in the ROI image according to a neural network.
  35. The apparatus according to claim 34, wherein, when determining the ROI image in which the user is located, the processor is specifically configured to:
    crop the captured image according to a bounding box of the user determined by the multi-target tracking algorithm to obtain the ROI image corresponding to the user.
  36. The apparatus according to claim 34, wherein, when determining key point information in the ROI image according to the neural network, the processor is specifically configured to:
    input the ROI image into the neural network to obtain confidence feature maps corresponding to multiple key points, wherein the confidence feature map corresponding to any key point comprises the probability that each pixel belongs to that key point;
    determine the key point information of the user according to the confidence feature maps corresponding to the multiple key points.
  37. The apparatus according to claim 36, wherein, when determining the key point information of the user according to the confidence feature maps corresponding to the multiple key points, the processor is specifically configured to:
    in the confidence feature map corresponding to any key point, determine the pixel with the highest probability of belonging to that key point;
    if the probability corresponding to the pixel with the highest probability is greater than a preset threshold, take the position information of that pixel as the position information of that key point of the user.
  38. The apparatus according to claim 34, wherein, before determining key point information in the ROI image according to the neural network, the processor is further configured to:
    acquire training samples, the training samples comprising sample images and confidence feature maps corresponding to the sample images;
    train the neural network according to the training samples.
  39. The apparatus according to claim 38, wherein, when acquiring the training samples, the processor is specifically configured to:
    acquire a sample image and position information of key points in the sample image;
    determine the confidence feature map corresponding to the sample image according to the position information of the key points;
    wherein, in the confidence feature map corresponding to the sample image, pixels closer to a key point correspond to higher probabilities.
  40. The apparatus according to claim 39, wherein, when determining the confidence feature map corresponding to the sample image according to the position information of the key points, the processor is specifically configured to:
    determine the confidence feature map corresponding to the sample image through a two-dimensional Gaussian distribution according to the position information of the key points.
  41. The apparatus according to claim 36, wherein the number of pixels in the confidence feature map output by the neural network is smaller than the number of pixels in the ROI image.
  42. The apparatus according to claim 31, wherein, when determining the posture information of the user according to the key point information of the user, the processor is specifically configured to:
    if an elbow joint on either side of the user is higher than the shoulder joint on the same side, and the elbow joint on the other side is lower than the shoulder joint on the same side, determine that the user is in a one-handed waving posture.
  43. An electronic device, characterized by comprising the image processing apparatus according to any one of claims 22-42.
  44. The device according to claim 43, wherein the electronic device is an unmanned aerial vehicle or an unmanned vehicle.
  45. The device according to claim 43, wherein the electronic device further comprises:
    a photographing apparatus for sending a captured video stream to the processor;
    a driving apparatus for driving, under the control of the processor, the electronic device to follow the target.
  46. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions, the program instructions being used to implement the image processing method according to any one of claims 1-21.
PCT/CN2020/083997 2020-04-09 2020-04-09 Image processing method and apparatus, electronic device, and storage medium WO2021203368A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080004938.4A CN112655021A (zh) 2020-04-09 2020-04-09 Image processing method and apparatus, electronic device, and storage medium
PCT/CN2020/083997 WO2021203368A1 (zh) 2020-04-09 2020-04-09 Image processing method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083997 WO2021203368A1 (zh) 2020-04-09 2020-04-09 Image processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021203368A1 true WO2021203368A1 (zh) 2021-10-14

Family

ID=75368434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083997 WO2021203368A1 (zh) 2020-04-09 2020-04-09 Image processing method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112655021A (zh)
WO (1) WO2021203368A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10438062B1 (en) * 2014-03-07 2019-10-08 Draganfly Innovations Inc. Cascade recognition for personal tracking via unmanned aerial vehicle (UAV)
CN106458318A (zh) * 2014-05-23 2017-02-22 莉莉机器人公司 Unmanned aerial copter for photography and/or videography
CN107835371A (zh) * 2017-11-30 2018-03-23 广州市华科尔科技股份有限公司 Gesture-based selfie method for a multi-rotor unmanned aerial vehicle
CN109448007A (zh) * 2018-11-02 2019-03-08 北京迈格威科技有限公司 Image processing method, image processing apparatus, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220012480A1 (en) * 2020-01-17 2022-01-13 Gm Cruise Holdings Llc Gesture based authentication for autonomous vehicles
US11790683B2 (en) * 2020-01-17 2023-10-17 Gm Cruise Holdings Llc Gesture based authentication for autonomous vehicles
CN113886212A (zh) * 2021-10-25 2022-01-04 北京字跳网络技术有限公司 Method and apparatus for controlling user state, electronic device, and storage medium

Also Published As

Publication number Publication date
CN112655021A (zh) 2021-04-13

Similar Documents

Publication Publication Date Title
CN107239728B (zh) UAV interaction apparatus and method based on deep-learning pose estimation
US11703949B2 (en) Directional assistance for centering a face in a camera field of view
CN108388882B (zh) Gesture recognition method based on global-local RGB-D multimodal features
WO2020125499A9 (zh) Operation prompting method and eyeglasses
CN110135249B (zh) Human behavior recognition method based on a temporal attention mechanism and LSTM
CN107168527A (zh) First-person-view gesture recognition and interaction method based on region-based convolutional neural networks
CN106326853B (zh) Face tracking method and apparatus
Sun et al. Gesture-based piloting of an aerial robot using monocular vision
WO2021184359A1 (zh) Target following method, target following apparatus, movable device, and storage medium
WO2021203368A1 (zh) Image processing method and apparatus, electronic device, and storage medium
CN111046734A (zh) Multimodal-fusion gaze estimation method based on dilated convolutions
CN105159452A (zh) Control method and system based on face pose estimation
CN110807391A (zh) Vision-based human-pose command recognition method for human-UAV interaction
Perera et al. Human pose and path estimation from aerial video using dynamic classifier selection
KR102160128B1 (ko) Method and apparatus for generating artificial-intelligence-based smart albums
CN113158833A (zh) Unmanned-vehicle command and control method based on human body pose
CN111860451A (zh) Game interaction method based on facial expression recognition
WO2023273372A1 (zh) Method and apparatus for determining a gesture recognition object
KR101100240B1 (ko) System and method for object learning by a robot using multimodal interaction
CN113591562A (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN111078008B (zh) Control method for an early-education robot
WO2023202062A1 (zh) Image-recognition-based target docking method, terminal device, and medium thereof
CN112183155B (zh) Method and apparatus for building a motion-pose library and generating and recognizing motion poses
Bhat et al. Real-time gesture control UAV with a low resource framework
WO2022217598A1 (zh) Limb recognition method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20930217; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20930217; Country of ref document: EP; Kind code of ref document: A1)