WO2020237611A1 - Image processing method and apparatus, control terminal and mobile device


Info

Publication number
WO2020237611A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
depth information
target
area
Prior art date
Application number
PCT/CN2019/089425
Other languages
French (fr)
Chinese (zh)
Inventor
蔡剑钊
赵峰
周游
Original Assignee
深圳市大疆创新科技有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN201980008862.XA (CN111602139A)
Priority to PCT/CN2019/089425
Publication of WO2020237611A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • The present invention belongs to the field of image processing technology, and in particular relates to an image processing method and device, a control terminal, and a movable device.
  • Computer vision technology replaces human visual organs with imaging systems to achieve tracking and positioning of target objects.
  • In existing schemes, a depth information map is drawn to represent the depth information of the target.
  • Solution 1: referring to Figure 1, feature detection is performed on an image 1, acquired by the imaging system, that includes the target object 2; a feature frame 3 containing the target object 2 and part of the background picture 4 is delimited, and the depth information of the target object 2 is calculated from all the pixels in the feature frame 3 so as to draw the depth information map of the target object 2.
  • Solution 2: an image semantic segmentation algorithm or a semantic parsing algorithm is applied directly to recognize the target object 2 in the image 1, and the depth information map of the target object 2 is drawn from the result.
  • However, in Solution 1, drawing the depth information map from the feature frame 3 introduces a great deal of useless information, such as the background picture 4 and the depth information of unimportant parts of the target object 2. As a result, the final depth information map cannot accurately express the target object 2, and the tracking and positioning accuracy for the target object 2 is poor.
  • In Solution 2, the algorithm is applied to the whole of image 1, which requires more computing resources and incurs higher processing costs.
  • In view of this, the present invention provides an image processing method and device, a control terminal, and a movable device, to solve the prior-art problems that determining the three-dimensional physical information of an object requires large computing resources, resulting in high processing costs, and that the tracking and positioning accuracy for the object is poor.
  • the present invention is implemented as follows:
  • In a first aspect, an embodiment of the present invention provides an image processing method, which may include: acquiring a target image including a target object; determining a target area in the target image, where at least the main body of the target object is located in the target area; determining, in the target area, the subject feature area of the target object; determining the depth information of the target object according to the initial depth information of the subject feature area; and determining the three-dimensional physical information of the target object according to the depth information.
  • In another aspect, an embodiment of the present invention provides an image processing device, and the image processing device may include:
  • a receiver, configured to perform: acquiring a target image including a target object;
  • a processor, configured to execute: determining a target area in the target image, where at least the main body of the target object is located in the target area; determining, in the target area, the subject feature area of the target object; determining the depth information of the target object according to the initial depth information of the subject feature area; and determining the three-dimensional physical information of the target object according to the depth information.
  • In another aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps of the image processing method described above are implemented.
  • In another aspect, a control terminal is provided, which includes the image processing device, a transmitting device, and a receiving device. The transmitting device sends a shooting instruction to a movable device, the receiving device receives the image captured by the movable device, and the image processing device processes the image.
  • In another aspect, a movable device is provided, including a photographing device and further comprising an image processing device; the image processing device receives the image captured by the photographing device and performs image processing.
  • The present invention acquires a target image including the target object; determines a target area in the target image, in which at least the main body of the target object is located; determines, in the target area, the subject feature area of the target object; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information.
  • On the one hand, the present invention removes the interference of the background, of occlusions, and of the non-subject parts of the target object in the target image, which reduces the probability of introducing useless information into the depth calculation and improves the accuracy of the three-dimensional physical information.
  • On the other hand, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
  • FIG. 1 is a flowchart of steps of an image processing method provided by an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a target image provided by an embodiment of the present invention.
  • Fig. 3 is a schematic diagram of another target image provided by an embodiment of the present invention.
  • FIG. 4 is a flowchart of specific steps of an image processing method provided by an embodiment of the present invention.
  • FIG. 5 is a flowchart of specific steps of another image processing method provided by an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of another target image provided by an embodiment of the present invention.
  • FIG. 7 is a scene diagram of acquiring initial depth information of a target object according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of another target image provided by an embodiment of the present invention.
  • FIG. 9 is a probability distribution diagram of a timing matching operation provided by an embodiment of the present invention.
  • FIG. 10 is a block diagram of an image processing device according to an embodiment of the present invention.
  • FIG. 11 is a block diagram of a movable device according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram of the hardware structure of a control terminal provided by an embodiment of the present invention.
  • Fig. 1 is a flowchart of steps of an image processing method provided by an embodiment of the present invention. As shown in Fig. 1, the method may include:
  • Step 101 Obtain a target image including a target object.
  • The image processing method provided by the embodiment of the present invention can be applied to a movable device, such as an unmanned aerial vehicle, an unmanned vehicle, an unmanned boat, or a handheld camera.
  • Such movable equipment is usually provided with an image processing device having a photographing function, and its normal operation depends on the image processing device photographing the objects around the movable equipment and processing their depth information.
  • Taking an unmanned vehicle as an example: when driving autonomously, it needs to use the image processing device installed on it to collect images of objects in its surrounding environment in real time and to process those images to obtain the objects' depth information. The unmanned vehicle can then use the depth information to determine the orientation of objects and achieve autonomous driving.
  • In this step, the target image including the target object is acquired; specifically, the camera in the image processing device may be used to capture one or more target images containing the target object.
  • Step 102 Determine a target area in the target image, where at least the main body of the target object is located in the target area.
  • After the target image is acquired, the target area in the target image can be further determined to detect the object in the target image. The target area includes at least the main body of the target object; that is, the target area may completely or partially overlap the main body of the target object.
  • Referring to FIG. 2, which shows a schematic diagram of a target image provided by an embodiment of the present invention, the target image 10 includes a human target object 11 and two street lamps 12 in the background. If the entire target image 10 were scanned directly to determine the depth information of the human target object 11, the calculation would be excessive; moreover, the irrelevant background and the information of the street lamps 12 would be introduced into the process, making errors in the depth information of the human target object 11 more likely.
  • Therefore, the area where the human target object 11 is located can be roughly selected by the target area frame 13, which may include the entire human target object 11 and a small part of the background area.
  • the main body part of the target object may be determined first, and the target area at least includes the main body part.
  • Delimiting the target area frame 13 to include the entire human target object 11 and then scanning only the area within the frame reduces the amount of calculation to a certain extent; the target area frame 13 also filters out the two irrelevant street lamps 12 in the background, reducing the probability of error when calculating the depth information of the human target object 11.
  • Selecting the area where the human target object 11 is located through the target area frame 13 can be implemented in the following two ways (see the sketch after this list):
  • Manner 1: the target area frame 13 is generated by receiving the user's frame-selection operation, and the area where the human target object 11 is located is selected through the target area frame 13.
  • Manner 2: through deep learning, a recognition model is trained to recognize the human target object 11 in the target image 10, so that after the target image 10 is input to the recognition model, the model automatically outputs the target area frame 13 containing the human target object 11. This is similar to current face-region positioning technology and is not repeated here.
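  • As an illustration of Manner 2, the following minimal sketch locates a person-shaped target area frame with a pretrained pedestrian detector. This is only a stand-in: the patent trains its own recognition model, while the OpenCV HOG people detector, the input file name, and the window stride below are assumptions made for the example.

```python
import cv2

# Sketch of Manner 2: produce a target area frame around a person.
# The OpenCV HOG pedestrian detector stands in for the patent's trained
# recognition model; "target_image.png" is a hypothetical input file.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("target_image.png")
boxes, weights = hog.detectMultiScale(image, winStride=(8, 8))
if len(boxes) > 0:
    x, y, w, h = boxes[0]                 # target area frame (cf. frame 13)
    target_area = image[y:y + h, x:x + w]
```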
  • the shape of the target area is preferably a rectangle.
  • the shape of the target area may also be a circle, an irregular shape, etc., according to actual needs, which is not limited in the embodiment of the present invention.
  • Step 103 Determine the subject characteristic area of the target object in the target area.
  • the subject feature area can accurately reflect the orientation of the target object.
  • Specifically, the subject feature area can represent the center of mass of the entire target object, so that the trajectory generated by the movement of the subject feature area can be taken as the trajectory generated when the target object moves.
  • In the target area, the subject feature area of the target object can be determined. Taking FIG. 2 as an example, the human torso (that is, the area ABCD in FIG. 2) can be defined as the subject feature area, so that the subject feature area is further delimited within the target area frame 13 to reduce the variance generated during the subsequent calculation of depth information.
  • Referring to FIG. 3, the target area frame 23 encircles the entire car 21 and a partial occlusion area 22. The car 21 is a relatively regularly shaped object, so the area of the car 21 outside the occlusion area 22 in the target area frame 23 can be defined as the subject feature area, reducing the variance the occlusion area 22 would introduce into the subsequent calculation of depth information.
  • Step 104 Determine the depth information of the target object according to the initial depth information of the subject feature area.
  • the perception of depth information is the prerequisite for humans to produce stereoscopic object vision.
  • The depth information refers to the number of bits used to store each pixel in the image, which determines the number of colors that each pixel of a color image may have, or the number of gray levels that each pixel of a grayscale image may have.
  • In the embodiment of the present invention, the depth information of an object can be a grayscale image containing the depth information of each pixel; the magnitude of the depth information is expressed by the depth of the gray level, and the grayscale image represents the distance between the object and the camera through a grayscale gradient.
  • Step 103 has determined the subject feature area of the target object, so the initial depth information of the subject feature area can now be used to determine the depth information of the target object.
  • the initial depth information of the subject feature area can be acquired in multiple ways.
  • For example, a current movable device may be configured with a binocular camera module, so the depth information of the target object can be acquired by passive ranging sensing. This method uses two cameras separated by a certain distance to capture two images of the same target object at the same time, uses a stereo matching algorithm to find the pixel points corresponding to the subject feature area in the two images, and then calculates the disparity information according to the triangulation principle; the disparity information can be converted into the initial depth information that characterizes the subject feature area in the scene. A sketch follows.
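  • The sketch below shows a minimal version of this passive pipeline, assuming already-rectified left/right images. The file names, focal length, and baseline are illustrative assumptions, and block matching stands in for whichever stereo matching algorithm the system actually uses.

```python
import cv2
import numpy as np

# Passive ranging sketch: stereo matching -> disparity -> depth by
# triangulation. Assumes rectified grayscale inputs.
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # BM output is fixed-point

focal_px = 700.0     # assumed focal length in pixels
baseline_m = 0.12    # assumed spacing between the two optical centers
depth_m = np.zeros_like(disparity)
valid = disparity > 0
depth_m[valid] = focal_px * baseline_m / disparity[valid]  # Z = f*b/d
```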
  • Alternatively, the initial depth information of the subject feature area can be acquired by active ranging sensing. The most distinctive feature of active ranging sensing is that the energy emitted by the device itself is used to collect the initial depth information, which also makes the acquisition of the depth image independent of the acquisition of the color image. Accordingly, in the embodiment of the present invention, continuous near-infrared pulses can be emitted toward the target object by the movable device, and a sensor on the movable device receives the light pulses reflected by the target object. From the transmission delay of the light pulses, the distance of the target object relative to the emitter can be inferred, finally yielding a depth image containing the initial depth information corresponding to the subject feature area of the target object.
  • Specifically, the depth information of the corresponding target object can be obtained by averaging the initial depth information of the subject feature area (see the sketch below).
  • In other words, the embodiment of the present invention scans and processes only the local subject feature area of the target image to obtain the depth information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
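  • A small sketch of the averaging just described, under the assumption that a depth map and the polygon corners of the subject feature area (the ABCD region) are already available; the depth values and corner coordinates are mock values.

```python
import cv2
import numpy as np

# Average the initial depth values inside the subject feature area to get
# one depth value for the target object.
depth_m = np.random.uniform(4.8, 5.4, size=(480, 640)).astype(np.float32)  # mock depth map
corners = np.array([[120, 80], [220, 80], [230, 260], [110, 260]],
                   dtype=np.int32)                     # assumed A, B, C, D
mask = np.zeros(depth_m.shape, dtype=np.uint8)
cv2.fillConvexPoly(mask, corners, 255)
subject_depth = float(depth_m[(mask == 255) & (depth_m > 0)].mean())
```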
  • Step 105 Determine three-dimensional physical information of the target object according to the depth information.
  • After the depth information of the target object is determined, the three-dimensional physical information of the target object can be further determined according to the depth information; the three-dimensional physical information can be used to indicate the orientation and trajectory of the target object.
  • As noted above, the depth information of an object can be a grayscale image containing the depth information of each pixel; the magnitude of the depth information is expressed by the depth of the gray level, and the grayscale image represents the distance of the object from the camera through a grayscale gradient.
  • Therefore, the depth information of the target object can be converted into a grayscale image; by calculating the grayscale gradient values in the grayscale image and using the correspondence between gradient value and distance, the distance between the target object and the movable device can be determined. From this, the position coordinates of the target object at different times can be determined and associated with the corresponding times to obtain the three-dimensional physical information of the target object.
  • Optionally, the position coordinates of the target object at different times can also be associated with the corresponding times and plotted on a specific map to obtain the three-dimensional physical information of the target object.
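  • One way to realize this step is to back-project the target's image position and depth through the pinhole camera model and log the result against time; the intrinsics and sample measurements below are assumed example values, not parameters from the patent.

```python
import numpy as np

# Turn (pixel position, depth) into a 3-D coordinate and accumulate a
# timestamped trajectory. fx, fy, cx, cy are assumed camera intrinsics.
fx, fy, cx, cy = 700.0, 700.0, 320.0, 240.0

def to_3d(u, v, depth_m):
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

samples = [(0.0, (180, 170, 5.2)), (0.1, (185, 170, 5.0))]  # mock (t, (u, v, Z))
trajectory = [(t, to_3d(u, v, z)) for t, (u, v, z) in samples]
```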
  • In summary, the image processing method provided by the embodiment of the present invention acquires a target image including a target object; determines a target area in the target image, in which at least the main body of the target object is located; determines, in the target area, the subject feature area of the target object; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information.
  • On the one hand, the present invention removes the interference of the background, of occlusions, and of the non-subject parts of the target object in the target image, which reduces the probability of introducing useless information into the depth calculation and improves the accuracy of the three-dimensional physical information.
  • On the other hand, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
  • Fig. 4 is a flowchart of specific steps of an image processing method provided by an embodiment of the present invention. As shown in Fig. 4, the method may include:
  • Step 201 Obtain a target image including the target object.
  • Step 202 Determine a target area in the target image, and at least a main body of the target object is located in the target area.
  • For details of this step, please refer to step 102 above; they are not repeated here.
  • Step 203 Divide the target area into multiple sub-areas by extracting edge features of the target area.
  • An edge feature indicates an obviously changing edge or a discontinuous area in an image. Since edges are the boundary lines between different areas of an image, an edge image can be a binary image, and the purpose of edge detection is to capture areas where brightness changes sharply. Ideally, edge detection on the target area yields edge features composed of a series of continuous curves representing object boundaries, and the intersections of these edge features divide the entire target area into multiple sub-areas.
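  • A minimal sketch of Step 203, using Canny edges and their enclosed contours as the sub-region split; the input crop, the thresholds, and the OpenCV 4 findContours signature are assumptions (the patent does not name a particular edge detector).

```python
import cv2

# Divide the target area into sub-areas along edge features.
image = cv2.imread("target_image.png")      # hypothetical input
target_area = image[60:300, 100:240]        # assumed target-area crop
gray = cv2.cvtColor(target_area, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 100, 200)           # illustrative thresholds
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
sub_regions = [cv2.boundingRect(c) for c in contours]   # (x, y, w, h) boxes
```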
  • Step 204 Determine the classification categories of the multiple sub-regions through the classification model.
  • Step 204 may also be implemented by determining the classification categories of the multiple sub-areas through a convolutional neural network model, or by determining them through a classifier.
  • the training data set can be used to train the classification model.
  • the classification model is used to classify the categories of each sub-region.
  • Specifically, the training process of the classification model can include: training the classification model with the correspondence between regions of preset patterns and the classifications of those patterns, so that, given an input region, the classification model outputs the classification corresponding to that region.
  • multiple sub-regions of the target region can be input into the trained classification model, and the classification model will output the classification category of each sub-region.
  • Step 205 Combine the sub-areas corresponding to the target classification category among the multiple sub-areas to obtain the subject feature area.
  • the target classification category matching the subject feature area can be determined first, and then the subregions corresponding to the target classification category are connected to obtain the subject feature area.
  • For example, the human torso can be defined as the subject feature area; the target classification category is then determined to be the human-torso category, and the sub-areas corresponding to the human-torso category are combined to obtain the subject feature area (see the sketch below).
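  • A hedged sketch of Steps 204 and 205: each sub-region is classified and the regions labelled with the target category are merged into one subject feature area. The classify function below is only a stand-in for the trained classification model, and the sub-region boxes are mock values.

```python
# Classify sub-regions and merge those in the target category.
sub_regions = [(10, 20, 40, 80), (15, 100, 50, 90), (0, 0, 15, 15)]  # mock (x, y, w, h)

def classify(box):
    # Stand-in for the trained classification model (assumption); a real
    # system would classify the cropped pixels, e.g. with a CNN.
    x, y, w, h = box
    return "torso" if w * h > 1000 else "background"

def merge_boxes(boxes):
    x1 = min(x for x, y, w, h in boxes)
    y1 = min(y for x, y, w, h in boxes)
    x2 = max(x + w for x, y, w, h in boxes)
    y2 = max(y + h for x, y, w, h in boxes)
    return x1, y1, x2 - x1, y2 - y1

torso_boxes = [b for b in sub_regions if classify(b) == "torso"]
subject_feature_area = merge_boxes(torso_boxes)   # combined region (cf. ABCD)
```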
  • the offset of the contour of the main feature region is less than or equal to a preset threshold.
  • Specifically, the subject feature area is defined such that, when the target object is under force or in motion, the offset of the contour of the subject feature area is less than or equal to a preset threshold; that is, the subject feature area remains relatively stable while the target object moves or is under force, which avoids introducing too much useless information when the depth information of the target object is calculated later.
  • Measuring the offset of the contour of the subject feature area may specifically include: acquiring consecutive frame images containing the target object at a fixed shooting angle of view, and recording as the offset either the displacement difference of the subject feature area's contour between adjacent frame images, or the displacement difference between the contour in one frame image and the contour in a frame image several frames earlier.
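  • This offset measure can be sketched as the displacement of the contour centroid between two frames, compared against the preset threshold; centroid displacement is one simple realization chosen here as an assumption, and the contours and threshold are mock values.

```python
import numpy as np

# Record the displacement of the subject feature contour between frames
# as the offset. Contours are Nx2 pixel arrays.
def contour_offset(contour_prev, contour_curr):
    return float(np.linalg.norm(contour_curr.mean(axis=0) -
                                contour_prev.mean(axis=0)))

PRESET_THRESHOLD = 3.0                      # pixels, illustrative
c_t0 = np.array([[0, 0], [10, 0], [10, 20], [0, 20]], dtype=np.float32)
c_t1 = c_t0 + np.array([1.0, 0.5])          # mock contour in the next frame
is_stable = contour_offset(c_t0, c_t1) <= PRESET_THRESHOLD
```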
  • Step 206 Determine the depth information of the target object according to the initial depth information of the subject feature area.
  • For details of this step, refer to step 104 above; they are not repeated here.
  • Step 207 Determine the position coordinates of the target object at different times according to the depth information.
  • As described above, the depth information of an object can be a grayscale image containing the depth information of each pixel; the magnitude of the depth information is expressed by the depth of the gray level, and the grayscale image represents the object's distance from the camera through a grayscale gradient.
  • Therefore, the depth information of the target object can be converted into a grayscale image; by calculating the grayscale gradient values in the grayscale image and using the correspondence between gradient value and distance, the spacing between the target object and the movable device can be determined.
  • As the movable device operates, the depth information is constantly updated, so a new grayscale image can be obtained from the updated depth information, and the position coordinates of the target object at different times can be determined from the distance between the target object and the movable device.
  • Step 208 Determine the three-dimensional physical information of the target object according to the position coordinates of the target object at different times.
  • the position coordinates of the target object at different times can be correlated with the corresponding times to obtain the three-dimensional physical information of the target object.
  • Optionally, the position coordinates of the target object at different times can also be associated with the corresponding times and plotted on a specific map to obtain the three-dimensional physical information of the target object.
  • In summary, the image processing method acquires a target image including a target object; determines a target area in the target image, in which at least the main body of the target object is located; determines, in the target area, the subject feature area of the target object; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information.
  • On the one hand, the present invention removes the interference of the background, of occlusions, and of the non-subject parts of the target object in the target image, which reduces the probability of introducing useless information into the depth calculation and improves the accuracy of the three-dimensional physical information.
  • On the other hand, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
  • FIG. 5 is a flowchart of specific steps of an image processing method provided by an embodiment of the present invention. As shown in FIG. 5, the method may include:
  • Step 301 At a preset moment, acquire a first image and a second image of the target object through the binocular camera module.
  • a binocular camera module can be used to determine the initial depth information of the target object.
  • The binocular camera module includes a first camera and a second camera whose optical centers are separated by a fixed distance. It is a device that, based on the principle of binocular parallax, obtains the three-dimensional geometric information of a target object from multiple images.
  • Referring to FIG. 6, which shows a schematic diagram of a target image provided by an embodiment of the present invention: at a preset time T1, the first image 30 of the target object may be acquired by the first camera of the binocular camera module, and the second image 40 of the target object is acquired by the second camera of the binocular camera module.
  • Step 302 Determine a target area in the first image and the second image, where at least the main body of the target object is located in the target area.
  • Specifically, the first target area 31 in the first image 30 and the second target area 41 in the second image 40 can be determined; for the method of determining the target area in an image, refer to step 102 above, which is not repeated here.
  • Step 303 In the target area, determine the subject characteristic area of the target object.
  • Specifically, the first subject feature area EFGH in the first target area 31 and the second subject feature area E'F'G'H' in the second target area 41 can be determined; for the method of determining the subject feature area of the target object within a target area, refer to step 103 above, which is not repeated here.
  • Step 304 Match the first subject feature area of the first image against the second image, and/or match the second subject feature area of the second image against the first image, and calculate the initial depth information.
  • In practice, using the binocular camera module to obtain the initial depth information of the target object involves four steps: camera calibration, binocular correction, binocular matching, and calculation of depth information.
  • Camera calibration is a process of eliminating the distortion of the camera due to the characteristics of the optical lens. Through camera calibration, the internal and external parameters and distortion parameters of the first camera and the second camera of the binocular camera module can be obtained.
  • Binocular correction: after the first image and the second image are acquired, the internal and external parameters and distortion parameters of the first camera and the second camera obtained by camera calibration are used to remove distortion from, and align, the first image and the second image, yielding distortion-free first and second images.
  • Binocular matching: the first subject feature area of the first image is matched against the second image, and/or the second subject feature area of the second image is matched against the first image.
  • For example, the pixels in the first subject feature area EFGH can be matched against the pixels of the entire second image 40, or the pixels in the second subject feature area E'F'G'H' can be matched against the pixels of the entire first image 30; alternatively, both matching operations can be performed.
  • The function of binocular matching is to match the pixels corresponding to the same scene in the left and right views (that is, the first image and the second image). Its purpose is to obtain the disparity value; once the disparity value is obtained, the depth information can be calculated. A sketch of one possible matching approach follows.
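  • The patent leaves the matching algorithm open; one simple realization is to search for the first subject feature area inside the second image by normalized cross-correlation, as in this sketch. The file names and crop coordinates are assumptions.

```python
import cv2

# Locate the first subject feature area (EFGH crop) inside the second
# image with template matching; the peak location gives the horizontal
# shift from which a disparity could be read off.
first_image = cv2.imread("first.png", cv2.IMREAD_GRAYSCALE)
second_image = cv2.imread("second.png", cv2.IMREAD_GRAYSCALE)

template = first_image[80:260, 110:230]        # assumed EFGH bounding box
result = cv2.matchTemplate(second_image, template, cv2.TM_CCOEFF_NORMED)
_, score, _, top_left = cv2.minMaxLoc(result)  # best match and its score
```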
  • In the embodiment of the present invention, the first subject feature area of the first image can be matched against the second image, and/or the second subject feature area of the second image can be matched against the first image.
  • Optionally, step 304 may specifically include:
  • Sub-step 3041 Match the first subject feature area of the first image against the second image, and/or match the second subject feature area of the second image against the first image, to obtain the disparity value.
  • Before the depth information can be calculated, the disparity value between the first camera and the second camera needs to be computed, which specifically includes:
  • matching the first subject feature area of the first image against the second image, and/or matching the second subject feature area of the second image against the first image; specifically, this can be implemented by matching the feature pixel points extracted from the first subject feature area in the second image, and/or matching the feature pixel points extracted from the second subject feature area in the first image.
  • A feature pixel is a pixel whose gray-value change is greater than a preset threshold, or a pixel on an image edge whose curvature is greater than a preset curvature value; it can be a point of drastic change on the target object, such as a corner point or a boundary point. Further, the feature pixels extracted from the first subject feature area are matched in the second image, and/or the feature pixels extracted from the second subject feature area are matched in the first image.
  • For example, the feature pixels extracted in the first subject feature area may be the four corner points E, F, G, and H of the human torso, and the feature pixels extracted in the second subject feature area may be the four corner points E', F', G', and H' of the human torso.
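  • Corner-like feature pixels such as E, F, G, and H can be extracted with any corner detector; the sketch below uses Shi-Tomasi corners as a stand-in for the patent's unspecified extractor, with an assumed input file.

```python
import cv2

# Extract up to four strong corners inside the first subject feature area;
# with maxCorners=4 these can play the role of points E, F, G, H.
roi = cv2.imread("first_subject_area.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
corners = cv2.goodFeaturesToTrack(roi, maxCorners=4,
                                  qualityLevel=0.01, minDistance=10)
```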
  • Then, the initial depth information is calculated according to the disparity value.
  • From the focal length of the binocular camera module, the distance between the optical centers of the first camera and the second camera, and the disparity value, the initial depth information of the target object can be calculated.
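  • The patent names these three quantities without writing out the relation; the textbook triangulation form, stated here as an assumption consistent with the quantities listed above, is

    Z = (f × b) / d

    where Z is the initial depth, f the focal length in pixels, b the distance between the two optical centers, and d the disparity in pixels. For example, with f = 700 px, b = 0.12 m, and d = 16.8 px, Z = 700 × 0.12 / 16.8 = 5 m.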
  • After step 301, the method may further include:
  • Step 305 Acquire a first image and a second image of the target object through the binocular camera module at multiple times.
  • In the embodiment of the present invention, weight values can be applied to the initial depth information so that the calculated depth information of the target object is more stable and accurate.
  • A key feature pixel point is a point that remains relatively stable at different times and whose relative position is unlikely to change. Therefore, in this step it is first necessary to acquire the first image and the second image of the target object at multiple times through the binocular camera module.
  • FIG. 8 shows a schematic diagram of a target image provided by an embodiment of the present invention.
  • At time T1, the first camera of the binocular camera module acquires the first image 30 of the target object and the second camera acquires the second image 40; at time T2, the first camera acquires the third image 50 of the target object and the second camera acquires the fourth image 60.
  • Then, the first target area 31 in the first image 30, the second target area 41 in the second image 40, the third target area 51 in the third image 50, and the fourth target area 61 in the fourth image 60 are determined.
  • Likewise, the first subject feature area EFGH in the first target area 31, the second subject feature area E'F'G'H' in the second target area 41, the third subject feature area IJKL in the third target area 51, and the fourth subject feature area I'J'K'L' in the fourth target area 61 are determined.
  • Step 306 Match the first subject feature area of the first image against the second image acquired at the corresponding time, and/or match the second subject feature area of the second image against the first image acquired at the corresponding time.
  • At each moment, the image obtained by the first camera can be matched against the image obtained by the second camera to determine whether relatively stable key feature pixel points exist between the two images captured at the same time. Specifically, the first subject feature area of the first image is matched against the second image acquired at the corresponding time, and/or the second subject feature area of the second image is matched against the first image acquired at the corresponding time; for example, the pixels in the first subject feature area EFGH can be matched against the pixels of the entire second image 40, or the pixels in the second subject feature area E'F'G'H' against the pixels of the entire first image 30.
  • Step 307 Determine the first number of successful matching times of the matching process.
  • In this step, if a pixel in the image obtained by the first camera has not changed position relative to the corresponding pixel in the image obtained by the second camera, the pixel can be judged to have matched successfully at that moment, which increases its confidence as a key feature pixel. By counting the matching results for the pixel points at each moment, the first number of successful matches is obtained.
  • Step 308 Perform matching processing between the feature regions in the multiple first images acquired at different times.
  • In the embodiment of the present invention, matching processing can also be performed between the feature areas of the first images at different times; for example, the pixel points in the first subject feature area EFGH at time T1 can be matched against the pixels in the third subject feature area IJKL at time T2.
  • Step 309 Determine the second number of successful matching times of the matching process.
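  • A hedged sketch of Steps 308 and 309: track the T1 feature pixels into the T2 frame with pyramidal Lucas-Kanade optical flow, counting successful tracks as the second number of successful matches. LK flow is a stand-in for the patent's unspecified temporal matching, and the point coordinates and file names are assumptions.

```python
import cv2
import numpy as np

# Track E, F, G, H from the T1 first image into the T2 first image and
# count how many points track successfully (c2).
img_t1 = cv2.imread("first_T1.png", cv2.IMREAD_GRAYSCALE)
img_t2 = cv2.imread("first_T2.png", cv2.IMREAD_GRAYSCALE)

pts = np.array([[[120, 80]], [[220, 80]], [[230, 260]], [[110, 260]]],
               dtype=np.float32)                      # mock E, F, G, H
new_pts, status, err = cv2.calcOpticalFlowPyrLK(img_t1, img_t2, pts, None)
second_success_count = int(status.sum())              # c2
```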
  • After step 304 and step 309, the method may further include:
  • Step 310 Set a weight value for the initial depth information according to the first number of successful matches and the second number of successful matches, the weight value being positively correlated with the number of successful matches.
  • Determining the weight value P_t corresponding to the initial depth information from the first number of successful matches c1 and the second number of successful matches c2 can be realized according to formula 1.
  • Taking point E as an example: the first number of successful matches c1 can be obtained, and at the same time the initial depth information Ed of point E can be calculated by binocular matching and an initial confidence assigned to point E; the second number of successful matches c2 can likewise be obtained, and the weight value P_t of point E is then calculated according to formula 1.
  • In the same way, the weight values P_t of points F, G, and H can be obtained. It should be noted that the parameters 60 and 5 in formula 1 can be adjusted based on experience and requirements, which is not limited in the embodiment of the present invention.
  • Step 311 Perform a weighted average calculation according to the initial depth information and the weight value corresponding to the initial depth information to obtain the depth information of the target object.
  • The weight value P_t of point E is applied to the initial depth information corresponding to point E in the subject feature area, so that the confidence of each pixel point contributes to a weighted average, making the calculated depth information of the target object more stable and accurate.
  • For example, the pixels E, F, G, and H in the first subject feature area EFGH are taken as feature pixels; the weight value P_t1 and initial depth information Ed are computed for point E, the weight value P_t2 and initial depth information Fd for point F, the weight value P_t3 and initial depth information Gd for point G, and the weight value P_t4 and initial depth information Hd for point H. A weighted average of these finally yields the depth information of the target object.
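  • The weighted averaging of Step 311 then reduces to the computation below. Formula 1 for the weights is not reproduced in this text, so the weight values here are mock numbers; only the averaging itself is illustrated.

```python
# Weighted average of the corner points' initial depths (Ed, Fd, Gd, Hd)
# using their weight values (P_t1..P_t4); all numbers are mock values.
initial_depth = {"E": 5.1, "F": 5.3, "G": 4.9, "H": 5.2}
weight = {"E": 0.9, "F": 0.8, "G": 0.7, "H": 0.85}

target_depth = (sum(weight[k] * initial_depth[k] for k in initial_depth)
                / sum(weight.values()))
```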
  • Referring to FIG. 9, which shows a probability distribution diagram of the temporal matching operation provided by an embodiment of the present invention: the abscissa is the number of frames, used to index the consecutive frames of first and second images of the target object acquired by the binocular camera module (the first image and the second image acquired at one moment can be regarded as one frame), and the ordinate is the probability, used to indicate the degree of confidence.
  • Step 312 Determine three-dimensional physical information of the target object according to the depth information.
  • For details of this step, refer to step 105 above; they are not repeated here.
  • In summary, the image processing method acquires a target image including a target object; determines a target area in the target image, in which at least the main body of the target object is located; determines, in the target area, the subject feature area of the target object; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information.
  • On the one hand, the present invention removes the interference of the background, of occlusions, and of the non-subject parts of the target object in the target image, which reduces the probability of introducing useless information into the depth calculation and improves the accuracy of the three-dimensional physical information.
  • On the other hand, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
  • FIG. 10 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
  • the image processing apparatus 400 may include: a receiver 401 and a processor 402;
  • the receiver 401 is configured to perform: acquiring a target image including a target object;
  • the processor 402 is configured to execute: determining a target area in the target image, where at least the main body of the target object is located in the target area; determining, in the target area, the subject feature area of the target object; determining the depth information of the target object according to the initial depth information of the subject feature area; and determining the three-dimensional physical information of the target object according to the depth information.
  • Optionally, the processor 402 is further configured to execute: dividing the target area into multiple sub-areas by extracting edge features of the target area, and merging the sub-areas corresponding to the target classification category to obtain the subject feature area.
  • Optionally, the offset of the contour of the subject feature area is less than or equal to a preset threshold.
  • Optionally, the processor 402 is further configured to execute: determining the classification categories of the multiple sub-areas.
  • Optionally, the receiver 401 is further configured to perform: acquiring, at a preset moment, the first image and the second image of the target object through the binocular camera module.
  • Optionally, the processor 402 is further configured to execute: determining the depth information of the target object.
  • Optionally, the processor 402 is further configured to execute: calculating the initial depth information.
  • Optionally, the processor 402 is further configured to execute: matching the feature pixels extracted from the first subject feature area in the second image, and/or matching the feature pixels extracted from the second subject feature area in the first image.
  • A feature pixel is a pixel whose gray-value change is greater than a preset threshold, or a pixel on an image edge whose curvature is greater than a preset curvature value.
  • Optionally, the receiver 401 is further configured to perform: acquiring the first image and the second image of the target object through the binocular camera module at multiple times.
  • Optionally, the processor 402 is further configured to execute: determining the first number of successful matches of the matching processing.
  • Optionally, the processor 402 is further configured to execute: determining the second number of successful matches of the matching processing.
  • Optionally, the processor 402 is further configured to execute: performing a weighted average calculation according to the initial depth information and the corresponding weight values to obtain the depth information of the target object.
  • Optionally, the processor 402 is further configured to execute: determining the three-dimensional physical information of the target object according to the position coordinates of the target object at different times.
  • In summary, the image processing device acquires a target image including a target object; determines a target area in the target image, in which at least the main body of the target object is located; determines, in the target area, the subject feature area of the target object; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information.
  • On the one hand, the present invention removes the interference of the background, of occlusions, and of the non-subject parts of the target object in the target image, which reduces the probability of introducing useless information into the depth calculation and improves the accuracy of the three-dimensional physical information.
  • On the other hand, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, the amount of calculation is reduced and processing efficiency is improved.
  • The embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the image processing method are implemented.
  • An embodiment of the present invention also provides a control terminal, which includes the image processing device, a transmitting device, and a receiving device. The transmitting device sends a shooting instruction to a movable device, the receiving device receives the image captured by the movable device, and the image processing device processes the image.
  • Referring to FIG. 11, an embodiment of the present invention also provides a movable device 500, including a photographing device 501 and further including the image processing device 400 described in FIG. 10; the image processing device 400 receives the image captured by the photographing device 501 and performs image processing.
  • the movable device 500 further includes a controller 502 and a power system 503, and the controller 502 controls the power output of the power system 503 according to the processing result processed by the image processing device 400.
  • The power system includes a motor that drives the propellers and a motor that drives the movement of the pan/tilt head; the controller 502 can therefore change the posture of the movable device 500 or the orientation of the pan/tilt head (that is, the orientation of the photographing device 501) according to the image processing result.
  • the image processing device 400 is integrated in the controller 502.
  • the movable device 500 includes at least one of a drone, an unmanned vehicle, an unmanned boat, and a handheld camera.
  • Referring to FIG. 12, the control terminal 600 includes, but is not limited to: a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, a power supply 611, and other components.
  • Those skilled in the art can understand that the structure of the control terminal shown in FIG. 12 does not constitute a limitation on the control terminal; the control terminal may include more or fewer components than shown, combine certain components, or arrange the components differently.
  • the control terminal includes, but is not limited to,
  • The radio frequency unit 601 can be used to receive and send signals during information transmission or a call; specifically, it receives downlink data from the base station and passes it to the processor 610 for processing, and it sends uplink data to the base station.
  • the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency unit 601 can also communicate with the network and other devices through a wireless communication system.
  • the control terminal provides users with wireless broadband Internet access through the network module 602, such as helping users to send and receive emails, browse web pages, and access streaming media.
  • the audio output unit 603 can convert the audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into audio signals and output them as sounds. Moreover, the audio output unit 603 may also provide audio output related to a specific function performed by the control terminal 600 (for example, call signal reception sound, message reception sound, etc.).
  • the audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
  • the input unit 604 is used to receive audio or video signals.
  • the input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042.
  • The graphics processor 6041 processes the image data of static pictures or videos obtained by an image capture device (such as a camera) in video capture mode or image capture mode.
  • the processed image frame may be displayed on the display unit 606.
  • the image frame processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or sent via the radio frequency unit 601 or the network module 602.
  • the microphone 6042 can receive sound, and can process such sound into audio data.
  • the processed audio data can be converted into a format that can be sent to the mobile communication base station via the radio frequency unit 601 for output in the case of a telephone call mode.
  • the control terminal 600 also includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor includes an ambient light sensor and a proximity sensor.
  • the ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of the ambient light.
  • The proximity sensor can turn off the display panel 6061 and/or the backlight when the control terminal 600 is moved to the ear.
  • As a kind of motion sensor, the accelerometer can detect the magnitude of acceleration in various directions (usually three axes) and, when stationary, the magnitude and direction of gravity; it can be used to recognize the posture of the control terminal (such as horizontal/vertical screen switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tapping). The sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described here.
  • the display unit 606 is used to display information input by the user or information provided to the user.
  • the display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
  • the user input unit 607 may be used to receive inputted numeric or character information, and generate key signal input related to user settings and function control of the control terminal.
  • the user input unit 607 includes a touch panel 6071 and other input devices 6072.
  • The touch panel 6071, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 6071 with a finger, a stylus, or any other suitable object or accessory).
  • Optionally, the touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 610, and receives and executes the commands sent by the processor 610.
  • the touch panel 6071 can be implemented in multiple types such as resistive, capacitive, infrared, and surface acoustic wave.
  • the user input unit 607 may also include other input devices 6072.
  • other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control buttons, switch buttons, etc.), trackball, mouse, and joystick, which will not be repeated here.
  • the touch panel 6071 can cover the display panel 6061.
  • When the touch panel 6071 detects a touch operation on or near it, it transmits the operation to the processor 610 to determine the type of the touch event, and the processor 610 then provides the corresponding visual output on the display panel 6061 according to the type of the touch event.
  • Although in FIG. 12 the touch panel 6071 and the display panel 6061 are two independent components realizing the input and output functions of the control terminal, in some embodiments the touch panel 6071 and the display panel 6061 can be integrated to realize the input and output functions of the control terminal; this is not limited here.
  • The interface unit 608 is an interface for connecting an external device to the control terminal 600.
  • For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on.
  • The interface unit 608 can be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements in the control terminal 600, or to transfer data between the control terminal 600 and an external device.
  • the memory 609 can be used to store software programs and various data.
  • the memory 609 may mainly include a storage program area and a storage data area.
  • The storage program area may store an operating system, application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the storage data area may store data created by the use of the mobile phone (such as audio data and a phone book).
  • the memory 609 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the processor 610 is the control center of the control terminal. It uses various interfaces and lines to connect the various parts of the entire control terminal, and performs the various functions of the control terminal and processes data by running or executing the software programs and/or modules stored in the memory 609 and calling the data stored in the memory 609, thereby monitoring the control terminal as a whole.
  • the processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may alternatively not be integrated into the processor 610.
  • the control terminal 600 may also include a power supply 611 (such as a battery) for supplying power to various components.
  • the power supply 611 may be logically connected to the processor 610 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the control terminal 600 also includes some functional modules that are not shown, which will not be described here.
  • the embodiment of the present invention also provides a control terminal, including a processor 610, a memory 609, and a computer program stored on the memory 609 and executable on the processor 610. When the computer program is executed by the processor 610, each process of the foregoing image processing method embodiment is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here.
  • the embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, each process of the above-mentioned image processing method embodiment is implemented and the same technical effect can be achieved; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • an embodiment of the present invention also provides a movable device, including a photographing device and the image processing device described with reference to FIG. 10; the image processing device receives an image photographed by the photographing device and performs image processing.
  • the movable device further includes a controller and a power system, and the controller controls the power output of the power system according to the processing result of the image processing device.
  • the image processing device may be integrated in the controller.
  • the movable device includes at least one of a drone, an unmanned vehicle, an unmanned boat, and a handheld photographing device.
  • the embodiments of the present application can be provided as a method, a control terminal, or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing terminal equipment, so that a series of operation steps are executed on the computer or other programmable terminal equipment to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal equipment provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Abstract

An image processing method and apparatus, a control terminal and a mobile device. The image processing method comprises: acquiring a target image comprising a target object (101); determining a target area in the target image (102); in the target area, determining a main body feature area of the target object (103); determining, according to initial depth information of the main body feature area, depth information of the target object (104); and determining, according to the depth information, three-dimensional physical information of the target object (105). According to the present invention, during the process of obtaining the depth information, the interference of a background, an obstruction, and a non-main body part of the target object in the target image is removed, and therefore, the probability of introducing useless information during the process of computing the depth information is reduced and the accuracy of the three-dimensional physical information is improved. In addition, in the present invention, scanning and processing are performed with regard to a main body feature area, so as to obtain three-dimensional physical information of a corresponding target object; and compared with directly performing scanning and processing on an overall target image, the amount of computation is reduced and processing efficiency is increased.

Description

Image processing method, device, control terminal and movable device

Technical field

The present invention belongs to the field of image processing technology, and particularly relates to an image processing method and device, a control terminal and a movable device.

Background
As an important branch of intelligent computing, computer vision technology has been extensively developed and applied. Computer vision technology replaces the human visual organs with an imaging system, so as to track and position a target object.

At present, to track and position a target object, the depth information of the target object must first be obtained, that is, a depth map (Depth Map) representing the depth information of the target must be acquired. There are currently two ways to obtain such a depth map. In the first solution, referring to FIG. 1, feature detection is performed on an image 1 that is acquired by the imaging system and includes a target object 2, and a feature frame 3 containing the target object 2 and part of a background picture 4 is delimited, so that the depth information of the target object 2 is calculated from all the pixels in the feature frame 3 and the depth map of the target object 2 is drawn. In the second solution, an image semantic segmentation algorithm or a semantic parsing algorithm is applied directly to the image 1 to recognize the target object 2, and the depth map of the target object 2 is drawn from the result.

However, in the first solution, because the delimited feature frame 3 contains both the target object 2 and part of the background picture 4, a large amount of useless information is introduced when the depth map is drawn from the feature frame 3, such as the depth information of the background picture 4 and the depth information of some unimportant parts of the target object 2. As a result, the final depth map cannot accurately express the target object 2, and the tracking and positioning accuracy for the target object 2 is poor. In the second solution, the algorithm is applied to the whole of image 1, which requires large computing resources and leads to high processing cost.
Summary of the invention

The present invention provides an image processing method, device, control terminal and movable device, so as to solve the problems in the prior art that determining the three-dimensional physical information of an object requires large computing resources, which leads to high processing cost, and that the tracking and positioning accuracy for the object is poor.

In order to solve the above technical problems, the present invention is implemented as follows:

In a first aspect, an embodiment of the present invention provides an image processing method, which may include:

acquiring a target image including a target object;

determining a target area in the target image, where at least the main body part of the target object is located in the target area;

determining, in the target area, a subject feature area of the target object;

determining the depth information of the target object according to initial depth information of the subject feature area; and

determining three-dimensional physical information of the target object according to the depth information.
In a second aspect, an embodiment of the present invention provides an image processing device, which may include a receiver and a processor.

The receiver is configured to acquire a target image including a target object.

The processor is configured to:

determine a target area in the target image, where at least the main body part of the target object is located in the target area;

determine, in the target area, a subject feature area of the target object;

determine the depth information of the target object according to initial depth information of the subject feature area; and

determine three-dimensional physical information of the target object according to the depth information.

In a third aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image processing method described above.

In a fourth aspect, an embodiment of the present invention provides a control terminal, which includes the image processing device, a transmitting device and a receiving device, where the transmitting device sends a shooting instruction to a movable device, the receiving device receives an image taken by the movable device, and the image processing device processes the image.

In a fifth aspect, an embodiment of the present invention provides a movable device including a photographing device, where the movable device further includes an image processing device, and the image processing device receives an image taken by the photographing device and performs image processing.

In the embodiments of the present invention, a target image including a target object is acquired; a target area in the target image is determined, where at least the main body part of the target object is located in the target area; a subject feature area of the target object is determined in the target area; the depth information of the target object is determined according to initial depth information of the subject feature area; and three-dimensional physical information of the target object is determined according to the depth information. In the process of obtaining the depth information, the present invention removes the interference of the background, occluding objects and non-subject parts of the target object in the target image, which reduces the probability of introducing useless information when calculating the depth information and improves the accuracy of the three-dimensional physical information. In addition, the present invention scans and processes only the subject feature area to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, this reduces the amount of calculation and improves the processing efficiency.
Brief description of the drawings

FIG. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a target image provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of another target image provided by an embodiment of the present invention;

FIG. 4 is a flowchart of the specific steps of an image processing method provided by an embodiment of the present invention;

FIG. 5 is a flowchart of the specific steps of another image processing method provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of another target image provided by an embodiment of the present invention;

FIG. 7 is a scene diagram of acquiring the initial depth information of a target object provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of another target image provided by an embodiment of the present invention;

FIG. 9 is a probability distribution diagram of a temporal matching operation provided by an embodiment of the present invention;

FIG. 10 is a block diagram of an image processing device provided by an embodiment of the present invention;

FIG. 11 is a block diagram of a movable device provided by an embodiment of the present invention;

FIG. 12 is a schematic diagram of the hardware structure of a control terminal provided by an embodiment of the present invention.
Detailed description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
FIG. 1 is a flowchart of the steps of an image processing method provided by an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps.

Step 101: acquire a target image including a target object.

The image processing method provided by the embodiment of the present invention can be applied to a movable device such as a drone, an unmanned vehicle, an unmanned boat or a handheld photographing device. A movable device is usually equipped with an image processing device having a photographing function, and its normal operation depends on the image processing device photographing objects around the movable device and obtaining the depth information of those objects through processing.

For example, when an unmanned vehicle drives autonomously, it needs the image processing device installed on it to collect images of objects in its surroundings in real time, and to obtain the depth information of each object by further processing the images. The unmanned vehicle can then use the depth information to determine the orientation of the objects and thus realize autonomous driving.

In this step, the target image including the target object can be acquired by a camera in the image processing device, which captures one or more images whose picture contains the target object.
Step 102: determine a target area in the target image, where at least the main body part of the target object is located in the target area.

Specifically, after the target image including the target object is acquired, the target area in the target image can be further determined in order to detect the object in the target image. The target area may contain at least the main body part of the target object; that is, the target area may completely or partially overlap at least the main body part of the target object.

FIG. 2 shows a schematic diagram of a target image provided by an embodiment of the present invention. The target image 10 includes a human target object 11 and two street lamps 12 in the background. If the entire target image 10 is scanned directly to determine the depth information of the human target object 11, the amount of calculation is excessive, and the irrelevant background and the information of the street lamps 12 are introduced into the calculation, so the probability of error in the depth information of the human target object 11 is high.

Therefore, in the embodiment of the present invention, the area where the human target object 11 is located can be roughly selected by a target area frame 13, which may contain the entire human target object 11 and a small part of the background area.

In addition, when the target object is large or irregularly shaped, the main body part of the target object may be determined first, and the target area is made to contain at least that main body part.

Compared with scanning the entire target image 10 directly, first delimiting the target area frame 13 containing the entire human target object 11 and then scanning only the area within the frame reduces the amount of calculation to a certain extent; moreover, the target area frame 13 filters out the two irrelevant street lamps 12 in the background, which reduces the probability of error when calculating the depth information of the human target object 11.

Specifically, selecting the area where the human target object 11 is located with the target area frame 13 can be implemented in the following two ways.

Way 1: a frame selection operation of the user is received, the target area frame 13 is generated accordingly, and the area where the human target object 11 is located is selected with the target area frame 13.

Way 2: a recognition model capable of recognizing and locating the human target object 11 in the target image 10 is trained through deep learning, so that after the target image 10 is input into the recognition model, the model automatically outputs the target area frame 13 containing the human target object 11. This way is similar to current face region positioning technology and is not described in detail here.

It should be noted that the shape of the target area is preferably a rectangle; of course, depending on actual requirements, the target area may also be circular, irregular, etc., which is not limited in the embodiment of the present invention.
Step 103: determine, in the target area, a subject feature area of the target object.

In the embodiment of the present invention, the subject feature area can accurately reflect the orientation of the target object. For example, when the movable device locates the motion trajectory of the target object, the subject feature area can represent the center of mass of the entire target object, so that the trajectory produced by the movement of the subject feature area can be taken as the trajectory produced by the movement of the target object.

Specifically, the subject feature area of the target object can be determined according to the type of the target object.

For example, referring to FIG. 2, when the target object is a human, the large movements of the limbs during motion cause a large variance in the measured depth information, so the human torso (the area ABCD in FIG. 2) can be defined as the subject feature area. The subject feature area is thus further delimited within the target area frame 13 to reduce the variance in the subsequent depth calculation.

As another example, referring to FIG. 3, when the target object in the target image 20 is a car 21 and the captured target image 20 contains an occluded area 22, the target area frame 23 encloses the entire car 21 as well as part of the occluded area 22. Since the car 21 is an object with a relatively regular shape, the area of the car 21 within the target area frame 23 but outside the occluded area 22 can be defined as the subject feature area, so as to reduce the variance caused by the occluded area 22 in the subsequent depth calculation.
Step 104: determine the depth information of the target object according to the initial depth information of the subject feature area.

In the embodiment of the present invention, the perception of depth information is the prerequisite for human stereoscopic vision. In an image, the depth information refers to the number of bits used to store each pixel, which determines the number of possible colors of each pixel of a color image, or the number of possible gray levels of each pixel of a grayscale image.

In reality, an object within the observation range of the human eye exhibits a depth change from near to far. For example, when a ruler is placed horizontally on a desktop and the user looks at it while standing at the end where the scale begins, the scale marks appear to grow from small to large, and as the line of sight moves towards the other end of the ruler, the intervals between the marks seem to shrink. This is the effect of depth information on human vision.

In the field of computer vision, the depth information of an object can be a grayscale map that contains the depth information of every pixel; the magnitude of the depth is expressed by the gray level, and the grayscale map thus represents, through its gray gradient, how far the object is from the camera.

Therefore, in the embodiment of the present invention, by acquiring the depth information of objects near the movable device, operations such as orientation positioning and ranging can be performed on those objects, thereby improving the intelligent experience of the movable device.

Specifically, since step 103 has determined the subject feature area of the target object, the depth information of the target object can be determined from the initial depth information of the subject feature area.

The initial depth information of the subject feature area can be acquired in several ways. In one implementation, current movable devices may be configured with a binocular camera module, so the depth information of the target object can be acquired by passive ranging sensing: two cameras separated by a fixed distance acquire two images of the same target object at the same time, the pixels corresponding to the subject feature area are found in the two images by a stereo matching algorithm, and disparity information is then calculated according to the triangulation principle; the disparity information can be converted into the initial depth information that characterizes the subject feature area in the scene.

In another implementation, the initial depth information of the subject feature area can be acquired by active ranging sensing. Compared with passive ranging sensing, the most distinctive feature of active ranging sensing is that the energy emitted by the device itself is used to collect the initial depth information, which ensures that the acquisition of the depth image is independent of the acquisition of the color image. Therefore, in the embodiment of the present invention, the movable device may emit continuous near-infrared pulses towards the target object and use its sensor to receive the light pulses reflected back by the target object; by comparing the phase difference between the emitted light pulses and the reflected light pulses, the transmission delay of the light pulses can be deduced, and from it the distance of the target object relative to the emitter, finally yielding a depth image containing the initial depth information corresponding to the subject feature area of the target object.
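The passage gives no formulas, but under a standard reading of phase-based time-of-flight ranging (our assumption, not the patent's), the round-trip delay is recovered from the measured phase difference and converted to distance as:

```latex
% Round-trip delay \tau from the phase difference \Delta\varphi measured at
% modulation frequency f_mod; the factor 2 accounts for the out-and-back path.
\[
  \tau = \frac{\Delta\varphi}{2\pi f_{\mathrm{mod}}},
  \qquad
  Z = \frac{c\,\tau}{2}
\]
```

where c is the speed of light and Z is the distance of the reflecting point from the emitter.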
Further, after the initial depth information of the subject feature area is determined, the depth information of the corresponding target object can be obtained by averaging the initial depth information over the subject feature area. Because the interference of the background, occluding objects and non-subject parts of the target object in the target image has been removed, the probability of introducing useless information when calculating the depth information is reduced and the accuracy of the depth information is improved. In addition, the embodiment of the present invention scans and processes only the local subject feature area of the target image to obtain the depth information of the corresponding target object; compared with scanning and processing the entire target image directly, this reduces the amount of calculation and improves the processing efficiency.
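A minimal sketch of this averaging step, assuming a per-pixel depth map and a binary mask of the subject feature area are already available (both argument names are illustrative):

```python
import numpy as np

def object_depth(depth_map: np.ndarray, subject_mask: np.ndarray) -> float:
    """Average the per-pixel initial depth over the subject feature area.

    depth_map    -- HxW array of per-pixel depth values (0 marks invalid pixels)
    subject_mask -- HxW boolean array, True inside the subject feature area
    """
    valid = subject_mask & (depth_map > 0)  # ignore pixels with no depth reading
    if not valid.any():
        raise ValueError("no valid depth samples in the subject feature area")
    return float(depth_map[valid].mean())
```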
Step 105: determine the three-dimensional physical information of the target object according to the depth information.

In this step, to track the target object, its three-dimensional physical information must be obtained. Therefore, the three-dimensional physical information of the target object can be further determined according to the depth information of the target object; the three-dimensional physical information can be used to indicate the orientation of the target object and its trajectory during motion.

Specifically, the depth information of an object can be a grayscale map that contains the depth information of every pixel; the magnitude of the depth is expressed by the gray level, and the gray gradient represents how far the object is from the camera. The depth information of the target object can be converted into a grayscale map, and by calculating the gray gradient values in the map and using the correspondence between gray gradient values and distance, the distance between the target object and the movable device can be determined, and thus the position coordinates of the target object at different moments. The position coordinates of the target object at different moments can then be associated with the corresponding moments, and optionally plotted on a concrete map, to obtain the three-dimensional physical information of the target object.
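One way to turn a depth reading at a pixel into a position coordinate is standard pinhole back-projection; the intrinsics, pixel values and timestamps below are hypothetical, not taken from the patent:

```python
import numpy as np

def pixel_to_camera_coords(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into 3-D camera coordinates
    using the pinhole model: X = (u - cx) * z / fx, Y = (v - cy) * z / fy."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# hypothetical intrinsics and a tracked subject-centroid pixel at two moments
fx = fy = 700.0
cx, cy = 320.0, 240.0
p_t1 = pixel_to_camera_coords(350, 260, z=4.2, fx=fx, fy=fy, cx=cx, cy=cy)
p_t2 = pixel_to_camera_coords(360, 258, z=4.0, fx=fx, fy=fy, cx=cx, cy=cy)
trajectory = {"T1": p_t1, "T2": p_t2}  # position coordinates keyed by moment
```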
In summary, the image processing method provided by the embodiment of the present invention acquires a target image including a target object; determines a target area in the target image, where at least the main body part of the target object is located in the target area; determines a subject feature area of the target object in the target area; determines the depth information of the target object according to the initial depth information of the subject feature area; and determines the three-dimensional physical information of the target object according to the depth information. In the process of obtaining the depth information, the interference of the background, occluding objects and non-subject parts of the target object in the target image is removed, which reduces the probability of introducing useless information when calculating the depth information and improves the accuracy of the three-dimensional physical information. In addition, scanning and processing only the subject feature area, compared with scanning and processing the entire target image directly, reduces the amount of calculation and improves the processing efficiency.

FIG. 4 is a flowchart of the specific steps of an image processing method provided by an embodiment of the present invention. As shown in FIG. 4, the method may include the following steps.

Step 201: acquire a target image including a target object.

For details of this step, refer to step 101 above; they are not repeated here.

Step 202: determine a target area in the target image, where at least the main body part of the target object is located in the target area.

For details of this step, refer to step 102 above; they are not repeated here.

Step 203: divide the target area into multiple sub-areas by extracting the edge features of the target area.

In the embodiment of the present invention, edge features represent the clearly changing edges or discontinuous areas in an image. Since an edge is the boundary line between different areas of an image, an edge image can be a binary image, and the purpose of edge detection is to capture the areas where the brightness changes sharply. Ideally, performing edge detection on the target area yields edge features composed of a series of continuous curves that represent the boundaries of objects, and the intersections between the edge features divide the entire target area into multiple sub-areas.
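A minimal sketch of this edge-based split, using OpenCV's Canny detector and connected-component labelling (the thresholds and the assumption of a BGR crop of the target area are illustrative, not the patent's):

```python
import cv2
import numpy as np

def split_into_subregions(target_area: np.ndarray):
    """Split the target area into sub-regions bounded by detected edges.

    target_area -- BGR crop of the target area frame
    Returns the number of sub-regions and an HxW label map (0 = edge pixels).
    """
    gray = cv2.cvtColor(target_area, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # binary edge map; thresholds illustrative
    # treat edge pixels as boundaries; label the connected non-edge regions
    non_edge = (edges == 0).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(non_edge)
    return num_labels - 1, labels     # labels 1..num_labels-1 are sub-regions
```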
Step 204: determine the classification categories of the multiple sub-areas through a classification model.

Optionally, step 204 may also be implemented by determining the classification categories of the multiple sub-areas through a convolutional neural network model, or through a classifier.

Based on deep learning, a classification model can be trained on a training data set; the classification model is used to assign a category to each sub-area. Specifically, the training process may use the correspondence between areas with preset patterns and the categories to which those patterns belong, so that after training, the classification model can take an area as input and output the category of that area.

In this step, the multiple sub-areas of the target area can be input into the trained classification model, and the model outputs the classification category of each sub-area.
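The passage does not fix any architecture, so purely as an illustration, a tiny convolutional classifier over fixed-size sub-region crops might look like the following sketch (the class set, input size and layer sizes are all assumptions):

```python
import torch
import torch.nn as nn

class SubRegionClassifier(nn.Module):
    """Tiny CNN mapping a fixed-size sub-region crop to one of n_classes
    (e.g. torso, limb, background); architecture is illustrative only."""
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 16 * 16, n_classes)  # assumes 64x64 inputs

    def forward(self, x):                # x: (N, 3, 64, 64)
        h = self.features(x)             # -> (N, 32, 16, 16)
        return self.head(h.flatten(1))   # class logits per crop

model = SubRegionClassifier()
crops = torch.randn(8, 3, 64, 64)    # eight resized sub-region crops (dummy)
pred = model(crops).argmax(dim=1)    # predicted class index per sub-region
```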
Step 205: among the multiple sub-areas, merge the sub-areas corresponding to a target classification category to obtain the subject feature area.

In this step, the target classification category matching the subject feature area can be determined first, and the sub-areas corresponding to the target classification category are then connected to obtain the subject feature area.

For example, when the target object is a human, the large movements of the limbs during motion cause a large variance in the measured depth information, so the human torso can be defined as the subject feature area and the target classification category is set to the human-torso category; merging the sub-areas corresponding to the human-torso category yields the subject feature area.
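Merging then amounts to taking the union of the sub-regions whose predicted class matches the target class; a sketch, assuming a label map and per-region predictions of the kind produced above:

```python
import numpy as np

def subject_feature_mask(labels: np.ndarray, classes: dict, target_class: str):
    """Union all sub-regions whose predicted class equals the target class.

    labels       -- HxW map of sub-region ids (e.g. from connectedComponents)
    classes      -- mapping {sub_region_id: predicted class name}
    target_class -- e.g. "torso" for a human target
    """
    keep = [rid for rid, cls in classes.items() if cls == target_class]
    return np.isin(labels, keep)  # boolean mask of the subject feature area
```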
Optionally, when the target object is under force or in motion, the offset of the contour of the subject feature area is less than or equal to a preset threshold.

Specifically, the subject feature area is defined such that, when the target object is under force or in motion, the offset of its contour is less than or equal to a preset threshold; that is, the subject feature area remains relatively stable while the target object moves or is under force, so as to avoid introducing too much useless information when the depth information of the target object is calculated later.

The offset of the contour of the subject feature area can be measured as follows: under a fixed shooting angle, consecutive frame images including the target object are acquired, and the displacement difference of the contour of the subject feature area between adjacent frames is recorded as the offset; alternatively, the displacement difference between the contour of the subject feature area in one frame and that in a frame several frames earlier is recorded as the offset.
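The text leaves the exact offset metric open; one plausible reading, sketched below, measures the displacement of the contour's centroid between two frames (the masks are assumed to be non-empty):

```python
import cv2
import numpy as np

def contour_offset(mask_prev: np.ndarray, mask_curr: np.ndarray) -> float:
    """Displacement of the subject-area contour between two frames, taken as
    the distance between contour centroids (one reading of 'offset')."""
    def centroid(mask):
        # image moments of the binary mask; m00 > 0 since masks are non-empty
        m = cv2.moments(mask.astype(np.uint8), binaryImage=True)
        return np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
    return float(np.linalg.norm(centroid(mask_curr) - centroid(mask_prev)))

# stable subject area: the offset stays below a preset pixel threshold
# is_stable = contour_offset(mask_t0, mask_t1) <= PRESET_THRESHOLD
```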
Step 206: determine the depth information of the target object according to the initial depth information of the subject feature area.

For details of this step, refer to step 104 above; they are not repeated here.

Step 207: determine the position coordinates of the target object at different moments according to the depth information.

In the embodiment of the present invention, the depth information of an object can be a grayscale map that contains the depth information of every pixel; the magnitude of the depth is expressed by the gray level, and the gray gradient represents how far the object is from the camera.

Therefore, the depth information of the target object can be converted into a grayscale map, and by calculating the gray gradient values in the map and using the correspondence between gray gradient values and distance, the distance between the target object and the movable device can be determined. When the target object moves, its depth information is continuously updated, so a new grayscale map can be obtained from the updated depth information; the position coordinates of the target object at different moments are thus determined from the distance between the target object and the movable device at each moment.

Step 208: determine the three-dimensional physical information of the target object according to its position coordinates at different moments.

In the embodiment of the present invention, the position coordinates of the target object at different moments can be associated with the corresponding moments to obtain the three-dimensional physical information of the target object; the coordinates may also be plotted, together with the corresponding moments, on a concrete map.

In summary, in the process of obtaining the depth information, the image processing method provided by the embodiment of the present invention removes the interference of the background, occluding objects and non-subject parts of the target object in the target image, which reduces the probability of introducing useless information when calculating the depth information and improves the accuracy of the three-dimensional physical information; moreover, scanning and processing only the subject feature area, compared with scanning and processing the entire target image directly, reduces the amount of calculation and improves the processing efficiency.
FIG. 5 is a flowchart of the specific steps of another image processing method provided by an embodiment of the present invention. As shown in FIG. 5, the method may include the following steps.

Step 301: at a preset moment, acquire a first image and a second image of the target object through a binocular camera module.

In the embodiment of the present invention, a binocular camera module can be used to determine the initial depth information of the target object. The binocular camera module includes a first camera and a second camera whose optical centers are fixed and whose optical-center spacing is fixed; it is a device that obtains the three-dimensional geometric information of a target object from multiple images based on the principle of binocular parallax. Specifically, referring to FIG. 6, which shows a schematic diagram of a target image provided by an embodiment of the present invention, at a preset moment T1 the first camera of the binocular camera module acquires a first image 30 of the target object, and at the same time the second camera acquires a second image 40 of the target object.

Step 302: determine the target areas in the first image and the second image, where at least the main body part of the target object is located in each target area.

In this step, referring to FIG. 6, a first target area 31 in the first image 30 and a second target area 41 in the second image 40 can be determined. For the method of determining the target area in an image, refer to step 102 above; it is not repeated here.

Step 303: determine, in the target areas, the subject feature areas of the target object.

In this step, referring to FIG. 6, a first subject feature area EFGH in the first target area 31 and a second subject feature area E'F'G'H' in the second target area 41 can be determined. For the method of determining the subject feature area of the target object in a target area, refer to step 103 above; it is not repeated here.
Step 304: perform matching processing on the first subject feature area of the first image and the second image, and/or on the second subject feature area of the second image and the first image, and calculate the initial depth information.

In practice, obtaining the initial depth information of the target object with the binocular camera module involves four steps: camera calibration, binocular rectification, binocular matching, and depth calculation.

Camera calibration: camera calibration is the process of eliminating the imaging distortion caused by the characteristics of the optical lenses. Through camera calibration, the intrinsic and extrinsic parameters and the distortion parameters of the first and second cameras of the binocular camera module are obtained.

Binocular rectification: after the first image and the second image are acquired, the intrinsic and extrinsic parameters and distortion parameters obtained from camera calibration are used to perform distortion removal and row alignment on the two images, yielding an undistorted first image and second image.

Binocular matching: the first subject feature area of the first image is matched against the second image, and/or the second subject feature area of the second image is matched against the first image.

Specifically, referring to FIG. 6, the pixels in the first subject feature area EFGH can be matched against the pixels of the entire second image 40, or the pixels in the second subject feature area E'F'G'H' can be matched against the pixels of the entire first image 30, or both matching operations can be performed. The purpose of binocular matching is to match the pixels corresponding to the same scene in the left and right views (that is, the first image and the second image), so as to obtain the disparity values; once the disparity values are obtained, the depth information can be calculated.

In the embodiment of the present invention, since the subject feature area that accurately reflects the center of mass of the target object has already been determined, the first subject feature area of the first image can be matched against the second image, and/or the second subject feature area of the second image can be matched against the first image; either way achieves the purpose of binocular matching. Performing binocular matching only on the subject feature area, compared with performing binocular matching on the entire first and second images directly, reduces the amount of calculation and improves the processing efficiency.
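A sketch of ROI-restricted binocular matching with OpenCV's semi-global matcher; the file names, box coordinates and matcher parameters are illustrative, and the frames are assumed to be already rectified:

```python
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical frames
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

x, y, w, h = 180, 120, 160, 240   # hypothetical subject-feature-area box
x0 = max(0, x - 64)               # pad leftwards to keep room for disparity
roi_l = left[y:y + h, x0:x + w]   # crop both views with the same padded box
roi_r = right[y:y + h, x0:x + w]

stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disp = stereo.compute(roi_l, roi_r).astype(float) / 16.0  # SGBM output is x16
```

Matching only the padded crop, rather than the full frames, is what gives the calculation saving the paragraph above describes.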
Optionally, step 304 may specifically include the following sub-steps.

Sub-step 3041: match the first subject feature area of the first image against the second image, and/or match the second subject feature area of the second image against the first image, to obtain the disparity values.

In this step, the depth-calculation part of determining the depth information can be performed. Calculating the depth information first requires calculating the disparity between the first camera and the second camera, as follows.

Referring to FIG. 7, which shows a scene diagram of acquiring the initial depth information of a target object provided by an embodiment of the present invention, P is a point in the subject feature area of the target object, and OR and OT are the optical centers of the first camera and the second camera, respectively. The imaging points of point P on the photoreceptors of the two cameras are P and P' (the imaging planes are shown rotated to lie in front of the lenses), f is the focal length of the cameras, and B is the distance between the camera centers. Let the disparity between point P and point P' be dis; then dis = B - (Xr - Xt).

Optionally, matching the first subject feature area of the first image against the second image, and/or matching the second subject feature area of the second image against the first image, can be implemented by matching the feature pixels extracted from the first subject feature area in the second image, and/or matching the feature pixels extracted from the second subject feature area in the first image.

Optionally, a feature pixel is a pixel whose gray-value change in the image is greater than a preset threshold, or whose curvature on an image edge is greater than a preset curvature value.

In the embodiment of the present invention, in order to further reduce the amount of data processed when calculating the initial depth information, the feature pixels extracted from the first subject feature area can be matched in the second image, and/or the feature pixels extracted from the second subject feature area can be matched in the first image. Such feature pixels can be points with sharply changing characteristics, such as corner points and boundary points of the target object.
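Such feature pixels can be picked out with a standard corner detector; a sketch with illustrative parameters (the patent does not prescribe a particular detector):

```python
import cv2
import numpy as np

gray = cv2.imread("subject_area.png", cv2.IMREAD_GRAYSCALE)  # hypothetical crop

# Corner-like pixels (sharp intensity/curvature changes), e.g. the four
# torso corners E, F, G, H mentioned below; parameter values are illustrative.
corners = cv2.goodFeaturesToTrack(gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=8)
corners = corners.reshape(-1, 2) if corners is not None else np.empty((0, 2))
```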
For example, referring to FIG. 6, the feature pixels extracted from the first subject feature area can be the four corner points E, F, G and H of the human torso, and the feature pixels extracted from the second subject feature area can be the four corner points E', F', G' and H' of the human torso.
Sub-step 3042: calculate the initial depth information according to the disparity values.

In this step, referring to FIG. 7, let the initial depth information be Z. After the disparity dis = B - (Xr - Xt) is obtained, the similar-triangle relation (B - (Xr - Xt))/B = (Z - f)/Z gives the initial depth information Z = (fB)/(Xr - Xt).
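Written out, the relations used in this and the previous sub-step are:

```latex
\[
  \mathrm{dis} = B - (X_r - X_t),
  \qquad
  \frac{B - (X_r - X_t)}{B} = \frac{Z - f}{Z}
  \;\Longrightarrow\;
  Z = \frac{fB}{X_r - X_t}
\]
```

Multiplying the similar-triangle relation through by BZ gives ZB - Z(X_r - X_t) = ZB - fB, from which Z(X_r - X_t) = fB and the stated result follows.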
Therefore, the initial depth information of the target object can be calculated from the focal length of the binocular camera module, the optical-center distance between the first camera and the second camera, and the disparity value.
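Applied per pixel over the matched subject feature area, this is a one-line conversion (a sketch; the focal length is assumed to be in pixels, so the depth carries the units of the baseline):

```python
import numpy as np

def depth_from_disparity(disp: np.ndarray, f: float, B: float) -> np.ndarray:
    """Z = f * B / d for every matched pixel; invalid (d <= 0) pixels -> 0.

    f -- focal length in pixels, B -- optical-centre baseline (e.g. metres)
    """
    z = np.zeros_like(disp, dtype=float)
    valid = disp > 0
    z[valid] = f * B / disp[valid]
    return z
```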
Optionally, after step 301, the method may further include the following steps.

Step 305: at multiple moments, acquire a first image and a second image of the target object through the binocular camera module.

In the embodiment of the present invention, a temporal matching operation can also be used to determine the key feature pixels in the subject feature area, and a corresponding weight value is assigned to each key feature pixel according to its confidence. In the process of calculating the depth information of the target object from the initial depth information, these weight values can be applied to the initial depth information, making the calculated depth information of the target object more stable and accurate.
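One simple way to apply such weights, sketched under the assumption that each key feature pixel carries an initial depth and a temporal-matching confidence (the numbers are illustrative):

```python
import numpy as np

def weighted_object_depth(depths: np.ndarray, confidences: np.ndarray) -> float:
    """Fuse per-keypoint initial depths into one object depth, weighting each
    key feature pixel by its temporal-matching confidence."""
    w = confidences / confidences.sum()   # normalize weights to sum to 1
    return float(np.dot(w, depths))

# e.g. four stable torso corners with confidences from repeated matching
depth = weighted_object_depth(np.array([4.1, 4.3, 4.2, 4.2]),
                              np.array([0.9, 0.7, 0.8, 0.9]))
```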
具体的,关键特征像素点可以为不同时刻相对较为稳定且不易发生相对位置变化的点。因此,在该步骤中,首先需要通过双目摄像模组在多个时刻,获取目标物体的第一图像以及第二图像。Specifically, the key feature pixel point may be a point that is relatively stable at different times and is unlikely to change in relative position. Therefore, in this step, it is first necessary to acquire the first image and the second image of the target object at multiple times through the binocular camera module.
For example, referring to Fig. 8, which shows a schematic diagram of target images provided by an embodiment of the present invention: at time T1, the first camera of the binocular camera module acquires a first image 30 of the target object while the second camera acquires a second image 40; at time T2, the first camera acquires a third image 50 of the target object while the second camera acquires a fourth image 60.
Further, a first target area 31 can be determined in the first image 30, a second target area 41 in the second image 40, a third target area 51 in the third image 50, and a fourth target area 61 in the fourth image 60.
Further, a first subject feature region EFGH can be determined in the first target area 31, a second subject feature region E'F'G'H' in the second target area 41, a third subject feature region IJKL in the third target area 51, and a fourth subject feature region I'J'K'L' in the fourth target area 61.
Step 306: Match the first subject feature region of the first image against the second image acquired at the corresponding time, and/or match the second subject feature region of the second image against the first image acquired at the corresponding time.
In this step, for images captured at the same instant, the image acquired by the first camera can be matched against the image acquired by the second camera to determine whether relatively stable key feature pixels exist in the two images. Specifically, the first subject feature region of the first image may be matched against the second image acquired at the corresponding time, and/or the second subject feature region of the second image may be matched against the first image acquired at the corresponding time. Referring to Fig. 8, the pixels in the first subject feature region EFGH may be matched against the pixels of the entire second image 40, or the pixels in the second subject feature region E'F'G'H' may be matched against the pixels of the entire first image 30; alternatively, both matching operations may be performed.
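One plausible realization of this region-against-image matching is normalized template matching of a small patch around each feature pixel; the sketch below assumes OpenCV, and the patch size and acceptance score are illustrative assumptions.

    import cv2

    def match_pixel(src_img, dst_img, pt, patch=11, min_score=0.8):
        # Cut a small patch around the feature pixel (x, y) in the source image.
        x, y = int(pt[0]), int(pt[1])
        h = patch // 2
        if x - h < 0 or y - h < 0:
            return None  # the pixel lies too close to the image border
        tmpl = src_img[y - h:y + h + 1, x - h:x + h + 1]
        if tmpl.shape[:2] != (patch, patch):
            return None
        # Slide the patch over the whole other image and keep the best
        # normalized-correlation response.
        res = cv2.matchTemplate(dst_img, tmpl, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(res)
        # loc is the top-left corner of the best match; re-center it.
        return (loc[0] + h, loc[1] + h) if score >= min_score else None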
Step 307: Determine a first matching success count of the matching processing.
In this step, if the position coordinates of a pixel in the image acquired by the first camera have not changed relative to the corresponding pixel in the image acquired by the second camera, it can be determined that this pixel matched successfully at that instant, which increases the confidence of this pixel as a key feature pixel. By accumulating the matching results for each pixel at every instant, the first matching success count is obtained.
For example, referring to Fig. 8, if the positions of the pixels E, F, G, and H in the first subject feature region EFGH have not changed relative to the positions of the pixels E', F', G', and H' in the second subject feature region E'F'G'H', the first matching success count of pixels E, F, G, and H is incremented by one.
步骤308、将不同时刻获取的多个所述第一图像中的特征区域之间进行匹配处理。Step 308: Perform matching processing between the feature regions in the multiple first images acquired at different times.
In this step, matching processing can be performed between the feature regions of the first images acquired at different times.
例如,参照图8,可以将T1时刻下第一主体特征区域EFGH中的像素点与T2时刻下第三主体特征区域IJKL中的像素点进行匹配处理。For example, referring to FIG. 8, the pixel points in the first subject feature region EFGH at time T1 can be matched with the pixels in the third subject feature region IJKL at time T2.
Step 309: Determine a second matching success count of the matching processing.
For example, referring to Fig. 8, if the positions of the pixels E, F, G, and H in the first subject feature region EFGH have not changed relative to the positions of the pixels I, J, K, and L in the third subject feature region IJKL, the second matching success count of pixels E, F, G, and H is incremented by one.
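The bookkeeping of steps 306 to 309 can be sketched as two counters per tracked pixel: c1 for same-instant (stereo) matches and c2 for cross-instant (temporal) matches. The tolerance value below is an assumption; the disclosure only requires that the relative position be unchanged.

    from collections import defaultdict

    # c1: stereo (same-instant) matching successes, as in steps 306-307;
    # c2: temporal (cross-instant) matching successes, as in steps 308-309.
    match_counts = defaultdict(lambda: {"c1": 0, "c2": 0})

    def record_match(pixel_id, kind, position_change, tol=1.5):
        # A match counts as successful when the relative position of the
        # matched pair is (nearly) unchanged; tol is an assumed tolerance.
        if position_change <= tol:
            key = "c1" if kind == "stereo" else "c2"
            match_counts[pixel_id][key] += 1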
可选的,在步骤304和步骤309之后,还可以包括:Optionally, after step 304 and step 309, it may further include:
Step 310: Set a weight value for the initial depth information according to the first matching success count and the second matching success count, the weight value being positively correlated with the number of successful matches.
In this step, the weight value P_t corresponding to the initial depth information can be determined from the first matching success count c1 and the second matching success count c2 according to Formula 1:

(Formula 1: in the published document this equation appears only as an image, PCTCN2019089425-appb-000001, and is not reproduced here.)
Specifically, referring to Fig. 8, at time T1, for point E in the first image 30, the first matching success count c1 can be obtained from the matching operation between point E and point E' in the second image; at the same time, the initial depth information Ed of point E can be computed by binocular matching, and the initial confidence of point E is set as:

(Initial-confidence expression: in the published document this appears only as an image, PCTCN2019089425-appb-000002, and is not reproduced here.)
At time T2, for point E in the first image 30, the second matching success count c2 can be obtained from the matching operation between point E and point I in the third image, and the weight value P_t of point E is then calculated according to Formula 1 above.
In the same way, the weight values P_t of point F, point G, and point H can be obtained. It should be noted that the parameters 60 and 5 in Formula 1 can be updated and set based on experience and requirements, which is not limited in this embodiment of the present invention.
步骤311、根据所述初始深度信息,以及所述初始深度信息对应的权重值,进行加权平均计算,得到所述目标物体的深度信息。Step 311: Perform a weighted average calculation according to the initial depth information and the weight value corresponding to the initial depth information to obtain the depth information of the target object.
When the initial depth information of the subject feature region is averaged to obtain the depth information of the corresponding target object, the weight value P_t of point E is applied to the initial depth information corresponding to point E in the subject feature region, so that the averaging is weighted by the confidence of each pixel and the computed depth information of the target object is more stable and accurate.
For example, referring to Fig. 8, assume the pixels E, F, G, and H in the first subject feature region EFGH have been determined to be feature pixels, and that the weight value P_t1 and the initial depth information Ed have been computed for point E, the weight value P_t2 and the initial depth information Fd for point F, the weight value P_t3 and the initial depth information Gd for point G, and the weight value P_t4 and the initial depth information Hd for point H. The depth information of the target object is then finally obtained by the weighted average:

depth = (P_t1·Ed + P_t2·Fd + P_t3·Gd + P_t4·Hd) / (P_t1 + P_t2 + P_t3 + P_t4)

(In the published document this equation appears only as an image, PCTCN2019089425-appb-000003; the expression above is reconstructed from the weighted-average description in step 311.)
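This weighted average can be sketched directly; the function below assumes per-pixel (weight, initial depth) pairs such as (P_t1, Ed) through (P_t4, Hd), and the name and error handling are illustrative.

    def target_depth(weighted_points):
        # weighted_points: (weight P_t, initial depth) pairs, e.g.
        # [(Pt1, Ed), (Pt2, Fd), (Pt3, Gd), (Pt4, Hd)].
        total = sum(w for w, _ in weighted_points)
        if total == 0:
            raise ValueError("at least one weight must be positive")
        return sum(w * d for w, d in weighted_points) / total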
Further, in practical applications, Fig. 9 shows a probability distribution diagram of the temporal matching operation provided by an embodiment of the present invention. The abscissa is the frame number, where the first image and the second image of the target object acquired by the binocular camera module at one of the multiple instants are regarded as one frame of the continuous multi-frame sequence. The ordinate is the probability, which represents the confidence. When matching is performed between the feature regions of the first images acquired at different times, consecutive successful matches raise the confidence and therefore the weight; conversely, failed matches gradually lower the confidence and therefore the weight. As shown in Fig. 9, after consecutive successful matches the confidence gradually rises, up to a maximum of 100%, and once matching failures occur the confidence gradually falls, down to 0%.
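The rise-and-decay behaviour of Fig. 9 can be sketched as a simple per-frame confidence update clamped to [0, 1]; the step sizes are assumptions, since the disclosure specifies only the qualitative shape of the curve.

    def update_confidence(conf, matched, up=0.1, down=0.2):
        # Move toward 100% on a successful frame-to-frame match and toward
        # 0% on a failure; up and down are assumed step sizes.
        conf = conf + up if matched else conf - down
        return min(1.0, max(0.0, conf))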
步骤312、根据所述深度信息确定所述目标物体的三维物理信息。Step 312: Determine three-dimensional physical information of the target object according to the depth information.
该步骤具体可以参照上述步骤105,此处不再赘述。For details of this step, refer to the above step 105, which will not be repeated here.
In summary, in the image processing method provided by the embodiments of the present invention, a target image including a target object is acquired; a target area is determined in the target image, with at least the main body of the target object located in the target area; a subject feature region of the target object is determined within the target area; the depth information of the target object is determined from the initial depth information of the subject feature region; and the three-dimensional physical information of the target object is determined from the depth information. Because the interference of the background, occluding objects, and non-subject parts of the target object in the target image is removed when obtaining the depth information, the probability of introducing useless information into the depth computation is reduced and the accuracy of the three-dimensional physical information is improved. In addition, the present invention scans and processes only the subject feature region to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, this reduces the amount of computation and improves processing efficiency.
图10是本发明实施例提供的一种图像处理装置的框图,如图10所示,该图像处理装置400可以包括:接收器401和处理器402;FIG. 10 is a block diagram of an image processing apparatus according to an embodiment of the present invention. As shown in FIG. 10, the image processing apparatus 400 may include: a receiver 401 and a processor 402;
所述接收器401用于执行:获取包括目标物体的目标图像;The receiver 401 is configured to perform: acquiring a target image including a target object;
所述处理器402用于执行:The processor 402 is configured to execute:
确定所述目标图像中的目标区域,所述目标物体至少主体部分位于所述目标区域内;Determining a target area in the target image, and at least a main body of the target object is located in the target area;
在所述目标区域中,确定所述目标物体的主体特征区域;In the target area, determine the main feature area of the target object;
根据所述主体特征区域的初始深度信息,确定所述目标物体的深度信息;Determine the depth information of the target object according to the initial depth information of the subject feature area;
根据所述深度信息确定所述目标物体的三维物理信息。The three-dimensional physical information of the target object is determined according to the depth information.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
通过提取所述目标区域的边缘特征,将所述目标区域划分为多个子区域;Dividing the target area into multiple sub-areas by extracting edge features of the target area;
通过分类模型,确定多个所述子区域的分类类别;Determine the classification categories of multiple sub-regions through a classification model;
在多个所述子区域中,合并与目标分类类别对应的子区域,得到所述主体特征区域。Among the multiple sub-regions, sub-regions corresponding to the target classification category are merged to obtain the subject feature region.
可选的,当所述目标物体处于受力或运动状态时,所述主体特征区域的轮廓的偏移量小于或等于预设阈值。Optionally, when the target object is under force or in motion, the offset of the contour of the main feature region is less than or equal to a preset threshold.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
通过卷积神经网络模型,确定多个所述子区域的分类类别;Using a convolutional neural network model to determine the classification categories of multiple sub-regions;
或,通过分类器,确定多个所述子区域的分类类别。Or, through a classifier, the classification categories of multiple sub-regions are determined.
可选的,所述接收器401还用于执行:Optionally, the receiver 401 is further configured to perform:
在预设时刻,通过双目摄像模组获取所述目标物体的第一图像以及第二图像。At a preset moment, the first image and the second image of the target object are acquired through the binocular camera module.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
Perform matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to calculate the initial depth information;
根据所述初始深度信息,确定所述目标物体的深度信息。According to the initial depth information, the depth information of the target object is determined.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
Perform matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to obtain a disparity value;
根据所述视差值,计算得到所述初始深度信息。According to the disparity value, the initial depth information is calculated.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
Perform matching processing, in the second image, on the feature pixels extracted from the first feature region; and/or perform matching processing, in the first image, on the feature pixels extracted from the second subject feature region.
Optionally, a feature pixel is a pixel whose gray-value variation in the image exceeds a preset threshold, or whose curvature on an image edge exceeds a preset curvature value.
可选的,所述接收器401还用于执行:Optionally, the receiver 401 is further configured to perform:
在多个时刻,通过所述双目摄像模组获取所述目标物体的第一图像以及第二图像。At multiple times, the first image and the second image of the target object are acquired through the binocular camera module.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
Perform matching processing between the first subject feature region of the first image and the second image acquired at the corresponding time, and/or between the second subject feature region of the second image and the first image acquired at the corresponding time;
Determine a first matching success count of the matching processing.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
将不同时刻获取的多个所述第一图像中的特征区域之间进行匹配处理;Performing matching processing between the feature regions in the multiple first images acquired at different times;
Determine a second matching success count of the matching processing.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
Set a weight value for the initial depth information according to the first matching success count and the second matching success count, the weight value being positively correlated with the number of successful matches;
根据所述初始深度信息,以及所述初始深度信息对应的权重值,进行加权平均计算,得到所述目标物体的深度信息。According to the initial depth information and the weight value corresponding to the initial depth information, a weighted average calculation is performed to obtain the depth information of the target object.
可选的,所述处理器402还用于执行:Optionally, the processor 402 is further configured to execute:
根据所述深度信息,确定所述目标物体在不同时刻下的位置坐标;Determine the position coordinates of the target object at different times according to the depth information;
根据所述目标物体在不同时刻下的位置坐标,确定所述目标物体的三维物理信息。The three-dimensional physical information of the target object is determined according to the position coordinates of the target object at different times.
In summary, the image processing apparatus provided by the embodiments of the present invention acquires a target image including a target object; determines a target area in the target image, with at least the main body of the target object located in the target area; determines a subject feature region of the target object within the target area; determines the depth information of the target object from the initial depth information of the subject feature region; and determines the three-dimensional physical information of the target object from the depth information. Because the interference of the background, occluding objects, and non-subject parts of the target object in the target image is removed when obtaining the depth information, the probability of introducing useless information into the depth computation is reduced and the accuracy of the three-dimensional physical information is improved. In addition, the present invention scans and processes only the subject feature region to obtain the three-dimensional physical information of the corresponding target object; compared with scanning and processing the entire target image directly, this reduces the amount of computation and improves processing efficiency.
本发明实施例还提供一种计算机可读存储介质,所述计算机可读存储介质上存储计算机程序,所述计算机程序被处理器执行时实现所述的图像处理方法的步骤。The embodiment of the present invention also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the image processing method are implemented.
An embodiment of the present invention further provides a control terminal, which includes the image processing apparatus, a transmitting device, and a receiving device. The transmitting device sends a shooting instruction to a movable device, the receiving device receives an image captured by the movable device, and the image processing apparatus processes the image.
Referring to Fig. 11, an embodiment of the present invention further provides a movable device 500, which includes a photographing apparatus 501 and the image processing apparatus 400 described in Fig. 10; the image processing apparatus 400 receives images captured by the photographing apparatus 501 and performs image processing on them.
Optionally, the movable device 500 further includes a controller 502 and a power system 503, and the controller 502 controls the power output of the power system 503 according to the processing result of the image processing apparatus 400.
具体的,动力系统包括驱动桨的电机以及驱动云台动作的电机,故控制器502可以根据图像处理结果可以改变可移动设备500的姿态或云台朝向(即拍摄装置501朝向)。Specifically, the power system includes a motor that drives the propeller and a motor that drives the movement of the pan/tilt. Therefore, the controller 502 can change the posture of the movable device 500 or the orientation of the pan/tilt (that is, the orientation of the camera 501) according to the image processing result.
可选的,所述图像处理装置400集成于所述控制器502中。Optionally, the image processing device 400 is integrated in the controller 502.
可选的,所述可移动设备500包括无人机、无人车、无人船、手持拍摄设备中的至少一种。Optionally, the movable device 500 includes at least one of a drone, an unmanned vehicle, an unmanned boat, and a handheld camera.
Fig. 12 is a schematic diagram of the hardware structure of a control terminal implementing various embodiments of the present invention. The control terminal 600 includes, but is not limited to, a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611. Those skilled in the art will understand that the control terminal structure shown in Fig. 12 does not limit the control terminal; the control terminal may include more or fewer components than shown, combine certain components, or arrange the components differently. In the embodiments of the present invention, the control terminal includes, but is not limited to, a mobile phone, tablet computer, notebook computer, palmtop computer, vehicle-mounted terminal, wearable device, pedometer, and the like.
It should be understood that, in this embodiment of the present invention, the radio frequency unit 601 can be used to receive and send signals during information transmission or a call; specifically, it receives downlink data from a base station and passes it to the processor 610 for processing, and sends uplink data to the base station. Generally, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, and a duplexer. In addition, the radio frequency unit 601 can also communicate with a network and other devices through a wireless communication system.
控制终端通过网络模块602为用户提供了无线的宽带互联网访问,如帮助用户收发电子邮件、浏览网页和访问流式媒体等。The control terminal provides users with wireless broadband Internet access through the network module 602, such as helping users to send and receive emails, browse web pages, and access streaming media.
音频输出单元603可以将射频单元601或网络模块602接收的或者在存储器609中存储的音频数据转换成音频信号并且输出为声音。而且,音频输出单元603还可以提供与控制终端600执行的特定功能相关的音频输出(例如,呼叫信号接收声音、消息接收声音等等)。音频输出单元603包括扬声器、蜂鸣器以及受话器等。The audio output unit 603 can convert the audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into audio signals and output them as sounds. Moreover, the audio output unit 603 may also provide audio output related to a specific function performed by the control terminal 600 (for example, call signal reception sound, message reception sound, etc.). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is configured to receive audio or video signals. The input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in video capture mode or image capture mode. The processed image frames may be displayed on the display unit 606. Image frames processed by the graphics processor 6041 may be stored in the memory 609 (or another storage medium) or sent via the radio frequency unit 601 or the network module 602. The microphone 6042 can receive sound and process it into audio data; in telephone call mode, the processed audio data can be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 601.
The control terminal 600 further includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 6061 according to the ambient light, and the proximity sensor can turn off the display panel 6061 and/or the backlight when the control terminal 600 is moved to the ear. As a motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used to recognize the attitude of the control terminal (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as pedometer and tapping). The sensor 605 may further include a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, infrared sensor, and the like, which are not described here.
显示单元606用于显示由用户输入的信息或提供给用户的信息。显示单元606可包括显示面板6061,可以采用液晶显示器(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)等形式来配置显示面板6061。The display unit 606 is used to display information input by the user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), etc.
The user input unit 607 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the control terminal. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. The touch panel 6071, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 6071 with a finger, stylus, or any other suitable object or accessory). The touch panel 6071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 610, and receives and executes commands sent by the processor 610. In addition, the touch panel 6071 may be implemented as a resistive, capacitive, infrared, or surface-acoustic-wave panel. Besides the touch panel 6071, the user input unit 607 may also include other input devices 6072, which may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick; these are not described here.
Further, the touch panel 6071 may cover the display panel 6061. When the touch panel 6071 detects a touch operation on or near it, it transmits the operation to the processor 610 to determine the type of touch event, and the processor 610 then provides corresponding visual output on the display panel 6061 according to the type of touch event. Although in the figure the touch panel 6071 and the display panel 6061 are two independent components implementing the input and output functions of the control terminal, in some embodiments the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the control terminal; this is not limited here.
The interface unit 608 is an interface for connecting an external device to the control terminal 600. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (for example, data information or power) from an external device and transmit the received input to one or more elements within the control terminal 600, or may be used to transfer data between the control terminal 600 and an external device.
The memory 609 may be used to store software programs and various data. The memory 609 may mainly include a program storage area and a data storage area; the program storage area may store an operating system and application programs required by at least one function (such as a sound playback function and an image playback function), while the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book). In addition, the memory 609 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 610 is the control center of the control terminal. It connects all parts of the control terminal using various interfaces and lines, and performs the various functions of the control terminal and processes data by running or executing the software programs and/or modules stored in the memory 609 and invoking the data stored in the memory 609, thereby monitoring the control terminal as a whole. The processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 610.
The control terminal 600 may further include a power supply 611 (such as a battery) that supplies power to the various components. Preferably, the power supply 611 may be logically connected to the processor 610 through a power management system, thereby implementing functions such as charge management, discharge management, and power consumption management through the power management system.
另外,控制终端600包括一些未示出的功能模块,在此不再赘述。In addition, the control terminal 600 includes some functional modules not shown, which will not be repeated here.
Preferably, an embodiment of the present invention further provides a control terminal, including a processor 610, a memory 609, and a computer program stored in the memory 609 and executable on the processor 610. When the computer program is executed by the processor 610, each process of the above image processing method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, each process of the above image processing method embodiments is implemented and the same technical effects can be achieved; to avoid repetition, details are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Referring to Fig. 12, an embodiment of the present invention further provides a movable device, including a photographing apparatus and the image processing apparatus described in Fig. 10; the image processing apparatus receives images captured by the photographing apparatus and performs image processing on them.
Optionally, the movable device further includes a controller and a power system, and the controller controls the power output of the power system according to the processing result of the image processing apparatus.
可选的,所述图像处理装置集成于所述控制器中。Optionally, the image processing device is integrated in the controller.
可选的,所述可移动设备包括无人机、无人车、无人船、手持拍摄设备中的至少一种。Optionally, the movable equipment includes at least one of a drone, an unmanned vehicle, an unmanned boat, and a handheld camera.
本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。The various embodiments in this specification are described in a progressive manner. Each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments can be referred to each other.
本领域内的技术人员应明白,本申请的实施例可提供为方法、控制终端、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且, 本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application can be provided as a method, a control terminal, or a computer program product. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.
This application is described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。Although the preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present application.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device that includes the element.
The image processing method and control terminal provided by this application have been described in detail above. Specific examples are used herein to explain the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application based on the idea of this application. In summary, the content of this specification should not be construed as limiting this application.

Claims (35)

  1. 一种图像处理方法,其特征在于,所述方法包括:An image processing method, characterized in that the method includes:
    获取包括目标物体的目标图像;Acquiring a target image including the target object;
    确定所述目标图像中的目标区域,所述目标物体至少主体部分位于所述目标区域内;Determining a target area in the target image, and at least a main body of the target object is located in the target area;
    在所述目标区域中,确定所述目标物体的主体特征区域;In the target area, determine the main feature area of the target object;
    根据所述主体特征区域的初始深度信息,确定所述目标物体的深度信息;Determine the depth information of the target object according to the initial depth information of the subject feature area;
    根据所述深度信息确定所述目标物体的三维物理信息。The three-dimensional physical information of the target object is determined according to the depth information.
  2. 根据权利要求1所述的方法,其特征在于,所述在所述目标区域中,确定所述目标物体的主体特征区域的步骤,包括:The method according to claim 1, wherein the step of determining the main characteristic area of the target object in the target area comprises:
    通过提取所述目标区域的边缘特征,将所述目标区域划分为多个子区域;Dividing the target area into multiple sub-areas by extracting edge features of the target area;
    通过分类模型,确定多个所述子区域的分类类别;Determine the classification categories of multiple sub-regions through a classification model;
    在多个所述子区域中,合并与目标分类类别对应的子区域,得到所述主体特征区域。Among the multiple sub-regions, sub-regions corresponding to the target classification category are merged to obtain the subject feature region.
  3. 根据权利要求1至2任一所述的方法,其特征在于,当所述目标物体处于受力或运动状态时,所述主体特征区域的轮廓的偏移量小于或等于预设阈值。The method according to any one of claims 1 to 2, wherein when the target object is under force or in motion, the offset of the contour of the subject characteristic region is less than or equal to a preset threshold.
  4. 根据权利要求2所述的方法,其特征在于,所述通过分类模型,确定多个所述子区域的分类类别的步骤,包括:The method according to claim 2, wherein the step of determining the classification categories of a plurality of the sub-regions through a classification model comprises:
    通过卷积神经网络模型,确定多个所述子区域的分类类别;Using a convolutional neural network model to determine the classification categories of multiple sub-regions;
    或,通过分类器,确定多个所述子区域的分类类别。Or, through a classifier, the classification categories of multiple sub-regions are determined.
  5. 根据权利要求1所述的方法,其特征在于,所述获取包括目标物体的目标图像的步骤,包括:The method according to claim 1, wherein the step of obtaining a target image including a target object comprises:
    在预设时刻,通过双目摄像模组获取所述目标物体的第一图像以及第二图像。At a preset moment, the first image and the second image of the target object are acquired through the binocular camera module.
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述主体特征区域中像素点的初始深度信息,确定所述目标物体的深度信息的步骤,包括:The method according to claim 5, wherein the step of determining the depth information of the target object according to the initial depth information of the pixels in the subject feature area comprises:
    performing matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to calculate the initial depth information;
    根据所述初始深度信息,确定所述目标物体的深度信息。According to the initial depth information, the depth information of the target object is determined.
  7. The method according to claim 6, wherein performing matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to calculate the initial depth information specifically comprises:
    performing matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to obtain a disparity value;
    根据所述视差值,计算得到所述初始深度信息。According to the disparity value, the initial depth information is calculated.
  8. The method according to claim 6 or 7, wherein performing matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image specifically comprises:
    performing matching processing, in the second image, on the feature pixels extracted from the first feature region; and/or performing matching processing, in the first image, on the feature pixels extracted from the second subject feature region.
  9. The method according to claim 8, wherein a feature pixel is a pixel whose gray-value variation in the image exceeds a preset threshold or whose curvature on an image edge exceeds a preset curvature value.
  10. The method according to claim 6, wherein after performing matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to calculate the initial depth information, the method further comprises:
    在多个时刻,通过所述双目摄像模组获取所述目标物体的第一图像以及第二图像。At multiple times, the first image and the second image of the target object are acquired through the binocular camera module.
  11. 根据权利要求10所述的方法,在多个时刻,通过所述双目摄像模组获取所述目标物体的第一图像以及第二图像之后,还包括:The method according to claim 10, after acquiring the first image and the second image of the target object through the binocular camera module at multiple times, further comprising:
    performing matching processing between the first subject feature region of the first image and the second image acquired at the corresponding time, and/or between the second subject feature region of the second image and the first image acquired at the corresponding time;
    determining a first matching success count of the matching processing.
  12. 根据权利要求11所述的方法,其特征在于,在多个时刻,通过所述双目摄像模组获取所述目标物体的第一图像以及第二图像之后,还包括:The method according to claim 11, wherein after acquiring the first image and the second image of the target object through the binocular camera module at multiple times, the method further comprises:
    将不同时刻获取的多个所述第一图像中的特征区域之间进行匹配处理;Performing matching processing between the feature regions in the multiple first images acquired at different times;
    determining a second matching success count of the matching processing.
  13. 根据权利要求12所述的方法,所述根据所述初始深度信息,确定所述目标物体的深度信息,具体包括:The method according to claim 12, wherein the determining the depth information of the target object according to the initial depth information specifically includes:
    setting a weight value for the initial depth information according to the first matching success count and the second matching success count, the weight value being positively correlated with the number of successful matches;
    根据所述初始深度信息,以及所述初始深度信息对应的权重值,进行加权平均计算,得到所述目标物体的深度信息。According to the initial depth information and the weight value corresponding to the initial depth information, a weighted average calculation is performed to obtain the depth information of the target object.
  14. 根据权利要求1所述的方法,其特征在于,所述根据所述深度信息确定所述目标物体的三维物理信息,包括:The method of claim 1, wherein the determining three-dimensional physical information of the target object according to the depth information comprises:
    根据所述深度信息,确定所述目标物体在不同时刻下的位置坐标;Determine the position coordinates of the target object at different times according to the depth information;
    根据所述目标物体在不同时刻下的位置坐标,确定所述目标物体的三维物理信息。The three-dimensional physical information of the target object is determined according to the position coordinates of the target object at different times.
  15. 一种图像处理装置,其特征在于,所述装置包括:接收器和处理器;An image processing device, characterized in that the device includes: a receiver and a processor;
    所述接收器用于执行:获取包括目标物体的目标图像;The receiver is configured to perform: acquiring a target image including a target object;
    所述处理器用于执行:The processor is used to execute:
    确定所述目标图像中的目标区域,所述目标物体至少主体部分位于所述目标区域内;Determining a target area in the target image, and at least a main body of the target object is located in the target area;
    在所述目标区域中,确定所述目标物体的主体特征区域;In the target area, determine the main feature area of the target object;
    根据所述主体特征区域的初始深度信息,确定所述目标物体的深度信息;Determine the depth information of the target object according to the initial depth information of the subject feature area;
    根据所述深度信息确定所述目标物体的三维物理信息。The three-dimensional physical information of the target object is determined according to the depth information.
  16. 根据权利要求15所述的装置,其特征在于,所述处理器还用于执行:The apparatus according to claim 15, wherein the processor is further configured to execute:
    通过提取所述目标区域的边缘特征,将所述目标区域划分为多个子区域;Dividing the target area into multiple sub-areas by extracting edge features of the target area;
    通过分类模型,确定多个所述子区域的分类类别;Determine the classification categories of multiple sub-regions through a classification model;
    在多个所述子区域中,合并与目标分类类别对应的子区域,得到所述主体特征区域。Among the multiple sub-regions, sub-regions corresponding to the target classification category are merged to obtain the subject feature region.
  17. 根据权利要求15至16任一所述的装置,其特征在于,当所述目标物体处于受力或运动状态时,所述主体特征区域的轮廓的偏移量小于或等于预设阈值。The device according to any one of claims 15 to 16, wherein when the target object is under force or in motion, the offset of the contour of the subject characteristic region is less than or equal to a preset threshold.
  18. 根据权利要求16所述的装置,其特征在于,所述处理器还用于执行:The device according to claim 16, wherein the processor is further configured to execute:
    通过卷积神经网络模型,确定多个所述子区域的分类类别;Using a convolutional neural network model to determine the classification categories of multiple sub-regions;
    或,通过分类器,确定多个所述子区域的分类类别。Or, through a classifier, the classification categories of multiple sub-regions are determined.
  19. 根据权利要求15所述的装置,其特征在于,所述接收器还用于执行:The device according to claim 15, wherein the receiver is further configured to perform:
    在预设时刻,通过双目摄像模组获取所述目标物体的第一图像以及第二图像。At a preset moment, the first image and the second image of the target object are acquired through the binocular camera module.
  20. 根据权利要求19所述的装置,其特征在于,所述处理器还用于执行:The device according to claim 19, wherein the processor is further configured to execute:
    Perform matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to calculate the initial depth information;
    根据所述初始深度信息,确定所述目标物体的深度信息。According to the initial depth information, the depth information of the target object is determined.
  21. 根据权利要求20所述的装置,其特征在于,所述处理器还用于执行:The apparatus according to claim 20, wherein the processor is further configured to execute:
    Perform matching processing between the first subject feature region of the first image and the second image, and/or between the second subject feature region of the second image and the first image, to obtain a disparity value;
    根据所述视差值,计算得到所述初始深度信息。According to the disparity value, the initial depth information is calculated.
  22. 根据权利要求20或21所述的装置,其特征在于,所述处理器还用于执行:The device according to claim 20 or 21, wherein the processor is further configured to execute:
    Perform matching processing, in the second image, on the feature pixels extracted from the first feature region; and/or perform matching processing, in the first image, on the feature pixels extracted from the second subject feature region.
  23. The device according to claim 22, wherein a feature pixel is a pixel whose gray-value variation in the image exceeds a preset threshold or whose curvature on an image edge exceeds a preset curvature value.
  24. The device according to claim 20, wherein the receiver is further configured to:
    acquire, at a plurality of moments, the first image and the second image of the target object through the binocular camera module.
  25. The device according to claim 24, wherein the processor is further configured to:
    match the first subject feature region of the first image against the second image acquired at the corresponding moment, and/or match the second subject feature region of the second image against the first image acquired at the corresponding moment;
    determine a first number of successful matches of the matching process.
  26. The device according to claim 25, wherein the processor is further configured to:
    match the feature regions of the plurality of first images acquired at different moments against one another;
    determine a second number of successful matches of the matching process.
  27. The device according to claim 26, wherein the processor is further configured to:
    set a weight value for the initial depth information according to the first number of successful matches and the second number of successful matches, the weight value being positively correlated with the number of successful matches;
    perform a weighted-average calculation according to the initial depth information and the corresponding weight values, to obtain the depth information of the target object.
  28. The device according to claim 15, wherein the processor is further configured to:
    determine position coordinates of the target object at different moments according to the depth information;
    determine three-dimensional physical information of the target object according to the position coordinates of the target object at different moments.
  29. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the image processing method according to any one of claims 1 to 14.
  30. A control terminal, comprising the image processing device according to any one of claims 15 to 28, a transmitting device and a receiving device, wherein the transmitting device sends a shooting instruction to a movable device, the receiving device receives an image captured by the movable device, and the image processing device processes the image.
  31. The control terminal according to claim 30, wherein the movable device comprises at least one of an unmanned aerial vehicle, an unmanned vehicle, an unmanned boat, and a handheld shooting device.
  32. A movable device comprising a shooting device, wherein the movable device further comprises the image processing device according to any one of claims 15 to 28, and the image processing device receives an image captured by the shooting device and performs image processing on it.
  33. The movable device according to claim 32, wherein the movable device further comprises a controller and a power system, and the controller controls the power output of the power system according to a processing result of the image processing device.
  34. The movable device according to claim 33, wherein the image processing device is integrated in the controller.
  35. The movable device according to claim 32, wherein the movable device comprises at least one of an unmanned aerial vehicle, an unmanned vehicle, an unmanned boat, and a handheld shooting device.
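The claimed processing steps can be illustrated with short sketches. The examples below are editorial illustrations only, not part of the claims: all function names, parameters, thresholds and library choices (OpenCV/NumPy in Python) are assumptions, and each sketch shows one plausible realization rather than the patented method itself. This first sketch follows claims 16 and 18: split the target region along its edge features, classify each sub-region, and merge the sub-regions of the target category into the subject feature region. The classify callable stands in for the claimed classification model (a CNN or other classifier); segment_and_merge is an invented name.

```python
import cv2
import numpy as np

def segment_and_merge(target_region, classify, target_label):
    """Split a target region along its edges, classify each sub-region,
    and merge sub-regions of the target class into one subject mask.

    target_region: BGR crop of the tracked object (numpy array)
    classify:      callable(sub_image) -> int label (e.g. a CNN or SVM)
    target_label:  label whose sub-regions form the subject feature region
    """
    gray = cv2.cvtColor(target_region, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                  # edge features
    # Connected components of the non-edge area give the sub-regions.
    num, labels = cv2.connectedComponents((edges == 0).astype(np.uint8))
    subject_mask = np.zeros(gray.shape, dtype=np.uint8)
    for i in range(1, num):                           # label 0 is background
        mask = labels == i
        if mask.sum() < 25:                           # skip tiny fragments
            continue
        sub = cv2.bitwise_and(target_region, target_region,
                              mask=mask.astype(np.uint8))
        if classify(sub) == target_label:             # classification model
            subject_mask[mask] = 255                  # merge into the result
    return subject_mask
```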
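Claims 20 and 21 compute an initial depth by matching the subject feature regions across the binocular pair and converting the resulting disparity with the pinhole relation depth = focal_length × baseline / disparity. A minimal sketch, assuming a rectified stereo pair and using semi-global block matching as one possible matcher (the claims do not prescribe a specific matching algorithm):

```python
import cv2
import numpy as np

def region_depth(left_gray, right_gray, subject_mask, focal_px, baseline_m):
    """Estimate the initial depth of the subject feature region from a
    rectified stereo pair, via depth = focal * baseline / disparity."""
    matcher = cv2.StereoSGBM_create(minDisparity=0,
                                    numDisparities=64,
                                    blockSize=7)
    # StereoSGBM returns fixed-point disparities scaled by 16.
    disp = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    valid = (disp > 0) & (subject_mask > 0)    # matched subject pixels only
    if not valid.any():
        return None                            # matching failed
    d = np.median(disp[valid])                 # robust disparity estimate
    return focal_px * baseline_m / d           # initial depth in meters
```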
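Claim 23 defines feature pixels as points whose gray-value variation exceeds a threshold or whose curvature on an image edge exceeds a preset value. A sketch of such a selector, where the Harris corner response is an assumed stand-in for the edge-curvature criterion:

```python
import cv2
import numpy as np

def feature_pixels(gray, grad_thresh=40.0):
    """Return (row, col) coordinates of feature pixels: points with a
    large local gray-value change, or high-curvature (corner-like)
    points approximated by the Harris response."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    grad_mag = cv2.magnitude(gx, gy)            # gray-value change
    harris = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)
    corners = harris > 0.01 * harris.max()      # high-curvature points
    return np.argwhere((grad_mag > grad_thresh) | corners)
```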
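Claims 25 to 27 fuse the initial depths obtained at multiple moments by a weighted average whose weights grow with the number of successful matches. A sketch assuming the match counts have already been collected; making each weight directly proportional to its count is one simple choice satisfying the claimed positive correlation:

```python
import numpy as np

def fuse_depths(initial_depths, match_counts):
    """Weighted-average fusion: each initial depth is weighted by its
    number of successful matches (counts assumed positive)."""
    depths = np.asarray(initial_depths, dtype=np.float64)
    weights = np.asarray(match_counts, dtype=np.float64)
    return float(np.average(depths, weights=weights))
```

For example, fuse_depths([2.1, 2.3, 1.9], [5, 8, 2]) weighs the second measurement most heavily because it matched successfully most often.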
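Claim 28 derives three-dimensional physical information from the target's position coordinates at different moments. A sketch that back-projects pixel-plus-depth to camera coordinates with assumed pinhole intrinsics and then estimates average velocity, one example of such physical information:

```python
import numpy as np

def to_3d(u, v, depth, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with its depth to camera-frame
    coordinates using the pinhole model (intrinsics fx, fy, cx, cy)."""
    return np.array([(u - cx) * depth / fx,
                     (v - cy) * depth / fy,
                     depth])

def velocity(positions, timestamps):
    """Average velocity of the target from its 3D positions at
    different moments (one kind of 3D physical information)."""
    p = np.asarray(positions, dtype=np.float64)
    t = np.asarray(timestamps, dtype=np.float64)
    return (p[-1] - p[0]) / (t[-1] - t[0])
```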
PCT/CN2019/089425 2019-05-31 2019-05-31 Image processing method and apparatus, control terminal and mobile device WO2020237611A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980008862.XA CN111602139A (en) 2019-05-31 2019-05-31 Image processing method and device, control terminal and mobile device
PCT/CN2019/089425 WO2020237611A1 (en) 2019-05-31 2019-05-31 Image processing method and apparatus, control terminal and mobile device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/089425 WO2020237611A1 (en) 2019-05-31 2019-05-31 Image processing method and apparatus, control terminal and mobile device

Publications (1)

Publication Number Publication Date
WO2020237611A1

Family

ID=72191934

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/089425 WO2020237611A1 (en) 2019-05-31 2019-05-31 Image processing method and apparatus, control terminal and mobile device

Country Status (2)

Country Link
CN (1) CN111602139A (en)
WO (1) WO2020237611A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114281096A (en) * 2021-11-09 2022-04-05 中时讯通信建设有限公司 Unmanned aerial vehicle tracking control method, device and medium based on target detection algorithm
WO2022213364A1 (en) * 2021-04-09 2022-10-13 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, terminal, and readable storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984017A (en) * 2020-08-31 2020-11-24 苏州三六零机器人科技有限公司 Cleaning equipment control method, device and system and computer readable storage medium
CN111970568B (en) * 2020-08-31 2021-07-16 上海松鼠课堂人工智能科技有限公司 Method and system for interactive video playing
CN112163562B (en) * 2020-10-23 2021-10-22 珠海大横琴科技发展有限公司 Image overlapping area calculation method and device, electronic equipment and storage medium
CN112433529B (en) * 2020-11-30 2024-02-27 东软睿驰汽车技术(沈阳)有限公司 Moving object determining method, device and equipment
CN112967249B (en) * 2021-03-03 2023-04-07 南京工业大学 Intelligent identification method for manufacturing errors of prefabricated pier reinforcing steel bar holes based on deep learning
CN112598698B (en) * 2021-03-08 2021-05-18 南京爱奇艺智能科技有限公司 Long-time single-target tracking method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102779347A (en) * 2012-06-14 2012-11-14 清华大学 Method and device for tracking and locating target for aircraft
CN104573715A (en) * 2014-12-30 2015-04-29 百度在线网络技术(北京)有限公司 Recognition method and device for image main region
CN105786016A (en) * 2016-03-31 2016-07-20 深圳奥比中光科技有限公司 Unmanned plane and RGBD image processing method
CN106887018A (en) * 2015-12-15 2017-06-23 株式会社理光 Solid matching method, controller and system
US9896205B1 (en) * 2015-11-23 2018-02-20 Gopro, Inc. Unmanned aerial vehicle with parallax disparity detection offset from horizontal
CN108475072A (en) * 2017-04-28 2018-08-31 深圳市大疆创新科技有限公司 A kind of tracking and controlling method, device and aircraft

Also Published As

Publication number Publication date
CN111602139A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
WO2020237611A1 (en) Image processing method and apparatus, control terminal and mobile device
US11481923B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
WO2020216054A1 (en) Sight line tracking model training method, and sight line tracking method and device
US20210343041A1 (en) Method and apparatus for obtaining position of target, computer device, and storage medium
US11398044B2 (en) Method for face modeling and related products
CN111223143B (en) Key point detection method and device and computer readable storage medium
CN108989672B (en) Shooting method and mobile terminal
CN109685915B (en) Image processing method and device and mobile terminal
US20220309836A1 (en) Ai-based face recognition method and apparatus, device, and medium
CN109272473B (en) Image processing method and mobile terminal
CN109241832B (en) Face living body detection method and terminal equipment
CN107730460B (en) Image processing method and mobile terminal
CN108881544B (en) Photographing method and mobile terminal
CN111031234B (en) Image processing method and electronic equipment
CN109544445B (en) Image processing method and device and mobile terminal
CN115526983A (en) Three-dimensional reconstruction method and related equipment
CN111008929B (en) Image correction method and electronic equipment
CN111091519B (en) Image processing method and device
CN109840476B (en) Face shape detection method and terminal equipment
CN110908517A (en) Image editing method, image editing device, electronic equipment and medium
CN110555815A (en) Image processing method and electronic equipment
CN110443752B (en) Image processing method and mobile terminal
CN111405361A (en) Video acquisition method, electronic equipment and computer readable storage medium
CN109345636B (en) Method and device for obtaining virtual face image
CN110930372A (en) Image processing method, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19930786

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19930786

Country of ref document: EP

Kind code of ref document: A1