WO2020164092A1 - Image processing method, device, movable platform, unmanned aerial vehicle and storage medium - Google Patents

Image processing method, device, movable platform, unmanned aerial vehicle and storage medium

Info

Publication number
WO2020164092A1
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
image data
confidence
target image
data
Prior art date
Application number
PCT/CN2019/075171
Other languages
English (en)
French (fr)
Inventor
任创杰
李鑫超
李思晋
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to JP2021543242A priority Critical patent/JP2022520019A/ja
Priority to EP19915297.6A priority patent/EP3920095A4/en
Priority to PCT/CN2019/075171 priority patent/WO2020164092A1/zh
Priority to CN201980004951.7A priority patent/CN111213155A/zh
Publication of WO2020164092A1 publication Critical patent/WO2020164092A1/zh
Priority to US17/402,533 priority patent/US20210390329A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64CAEROPLANES; HELICOPTERS
    • B64C39/00Aircraft not otherwise provided for
    • B64C39/02Aircraft not otherwise provided for characterised by special use
    • B64C39/024Aircraft not otherwise provided for characterised by special use of the remote controlled vehicle type, i.e. RPV
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64DEQUIPMENT FOR FITTING IN OR TO AIRCRAFT; FLIGHT SUITS; PARACHUTES; ARRANGEMENT OR MOUNTING OF POWER PLANTS OR PROPULSION TRANSMISSIONS IN AIRCRAFT
    • B64D1/00Dropping, ejecting, releasing, or receiving articles, liquids, or the like, in flight
    • B64D1/16Dropping or releasing powdered, liquid, or gaseous matter, e.g. for fire-fighting
    • B64D1/18Dropping or releasing powdered, liquid, or gaseous matter, e.g. for fire-fighting by spraying, e.g. insecticides
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/188Vegetation
    • GPHYSICS
    • G08SIGNALLING
    • G08GTRAFFIC CONTROL SYSTEMS
    • G08G5/00Traffic control systems for aircraft, e.g. air-traffic control [ATC]
    • G08G5/0047Navigation or guidance aids for a single aircraft
    • G08G5/0069Navigation or guidance aids for a single aircraft specially adapted for an unmanned aircraft
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B64AIRCRAFT; AVIATION; COSMONAUTICS
    • B64UUNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U2101/00UAVs specially adapted for particular uses or applications
    • B64U2101/30UAVs specially adapted for particular uses or applications for imaging, photography or videography
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30242Counting objects in image

Definitions

  • the present invention relates to the field of control technology, in particular to an image processing method, equipment, movable platform, drone and storage medium.
  • the embodiments of the present invention provide an image processing method, equipment, a movable platform, an unmanned aerial vehicle, and a storage medium, which can efficiently and quickly automatically identify a target area in target image data.
  • an embodiment of the present invention provides an image processing method, including:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the position of the target area in the target image data is determined.
  • an embodiment of the present invention provides another image processing method, including:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the number of target objects with the same semantic category on the target image data is determined.
  • an embodiment of the present invention provides an image processing device, including a memory and a processor;
  • the memory is used to store program instructions
  • the processor executes the program instructions stored in the memory, and when the program instructions are executed, the processor is configured to execute the following steps:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the position of the target area in the target image data is determined.
  • an embodiment of the present invention provides another image processing device, including a memory and a processor;
  • the memory is used to store program instructions
  • the processor executes the program instructions stored in the memory, and when the program instructions are executed, the processor is configured to execute the following steps:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the number of point data with the same semantic category on the target image data is determined.
  • an embodiment of the present invention provides a movable platform, including: a memory and a processor;
  • the memory is used to store program instructions
  • the processor executes the program instructions stored in the memory, and when the program instructions are executed, the processor is configured to execute the following steps:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the position of the target area in the target image data is determined.
  • an embodiment of the present invention provides another movable platform, including: a memory and a processor;
  • the memory is used to store program instructions
  • the processor executes the program instructions stored in the memory, and when the program instructions are executed, the processor is configured to execute the following steps:
  • Target image data includes a target image and depth data corresponding to each pixel in the target image
  • the number of target objects with the same semantic category on the target image data is determined.
  • an embodiment of the present invention provides an unmanned aerial vehicle, the unmanned aerial vehicle including: a fuselage; a power system provided on the fuselage for providing flight power; and the image processing device described in the third aspect or the fourth aspect.
  • an embodiment of the present invention provides a computer-readable storage medium, the computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the image processing method of the above-mentioned first or second aspect is implemented.
  • the image processing device may obtain target image data, the target image data including the target image and depth data corresponding to each pixel in the target image, process the target image data to obtain a semantic confidence feature map of the target image data, and determine the location of the target region in the target image data according to the confidence feature map.
  • Fig. 1 is a schematic structural diagram of an image processing system provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of an image processing method according to an embodiment of the present invention.
  • FIG. 3a is a schematic diagram of a confidence feature map provided by an embodiment of the present invention.
  • FIG. 3b is a schematic diagram of an interface of target image data provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of marking a target object according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of reference frame selection in an embodiment of a method for 3D reconstruction of a target scene provided by an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present invention.
  • Fig. 8 is a schematic structural diagram of another image processing device provided by an embodiment of the present invention.
  • the image processing method provided in the embodiment of the present invention may be executed by an image processing system, and the image processing system includes an image processing device and a movable platform.
  • the image processing equipment may be set on a movable platform (such as a drone) equipped with a load (such as a camera, an infrared detection device, a surveying instrument, etc.).
  • the image processing device may also be set on other movable devices, such as a robot that can move autonomously, an unmanned vehicle, an unmanned boat, and other movable devices.
  • the image processing device may be a component of a movable platform, that is, the movable platform includes the image processing device; in other embodiments, the image processing device may also be spatially independent of the movable platform.
  • FIG. 1 is a schematic structural diagram of an image processing system provided by an embodiment of the present invention.
  • the image processing system shown in FIG. 1 includes: an image processing device 11 and a movable platform 12, the image processing device 11 may be a control terminal of the movable platform 12, and specifically may be any one or more of a remote control, a smart phone, a tablet computer, a laptop computer, a ground station, and a wearable device (watch, bracelet).
  • the movable platform 12 may include movable equipment such as robots, unmanned vehicles, and unmanned ships that can move autonomously.
  • the movable platform 12 includes a power system 121, which is used to provide power for the movable platform 12 to move.
  • the movable platform 12 may also include a camera device 122, which is arranged on the main body of the movable platform 12.
  • the camera device 122 is used for image or video shooting during the movement of the movable platform 12, and includes, but is not limited to, a multispectral imager, a hyperspectral imager, a visible light camera, and an infrared camera.
  • the image processing device 11 in the image processing system can obtain target image data through the camera device 122 mounted on the movable platform 12, and process the target image data to obtain a semantic confidence feature map of the target image data, so as to determine the position of the target area in the target image data according to the confidence feature map.
  • the target image data includes a target image and depth data corresponding to each pixel in the target image.
  • FIG. 2 is a schematic flowchart of an image processing method provided by an embodiment of the present invention.
  • the method may be executed by an image processing device, wherein the specific explanation of the image processing device is as described above.
  • the method of the embodiment of the present invention includes the following steps.
  • the image processing device may obtain target image data.
  • the target image data includes a target image and depth data corresponding to each pixel in the target image.
  • the target image data may be obtained by the target image and depth data captured by a camera device mounted on a movable platform.
  • the target image includes, but is not limited to, an image taken from a top-view perspective.
  • the target image data includes a color image; or, the target image data includes a color image and depth data corresponding to the color image; or, the target image data includes an orthophoto; or, The target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • S202 Process the target image data to obtain a semantic confidence feature map of the target image data.
  • the image processing device may process the target image data to obtain the semantic confidence feature map of the target image data.
  • when the image processing device processes the target image data to obtain a semantic confidence feature map of the target image data, it may process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data, and then, based on the position data and height data corresponding to the target image data and on the semantic category and semantic confidence of each pixel, generate point cloud data containing semantic categories and semantic confidences, so as to generate the confidence feature map shown in FIG. 3a according to that point cloud data.
  • FIG. 3a is a schematic diagram of a confidence feature map provided by an embodiment of the present invention. As shown in FIG. 3a, the confidence feature map includes point cloud data containing semantic categories and semantic confidences.
  • Figure 3b is used as an example for illustration.
  • Figure 3b is a schematic diagram of a target image data interface provided by an embodiment of the present invention.
  • the image processing device can generate point cloud data containing semantic categories and semantic confidences, as shown in FIG. 3a, based on the position data and height data of the target image data 31 shown in FIG. 3b and on the semantic category and semantic confidence of each pixel.
  • For example, assuming a pixel in the target image data 31 has position data m, height data h, and semantic confidences K1, K2, ..., Kn for its n semantic channels, the image processing device can generate from these values the point cloud data containing semantic categories and semantic confidences shown in FIG. 3a.
  • the point cloud data and the confidence feature map both include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels.
  • each point data contained in the point cloud data corresponds to a pixel point in the target image data.
  • the point cloud data in the confidence feature map is composed of multiple circles generated by a Gaussian distribution, and the confidence feature map generated by the Gaussian distribution improves the stability of the confidence feature map.
  • this embodiment does not limit the correspondence between the point cloud data and the pixels in the target image data.
  • the point cloud data can have a one-to-one correspondence with the pixels in the image data; alternatively, each point data can correspond to multiple pixels, in which case the semantics of a point are determined by the clustering result of those pixels.
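  • As a concrete illustration of this structure, the following Python sketch (hypothetical names, not code from the patent) stores a point's position data, height data, and multiple semantic categories with different confidence levels, and takes the most confident category as the point's semantics:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class SemanticPoint:
    x: float          # position data, e.g. longitude or local easting
    y: float          # position data, e.g. latitude or local northing
    height: float     # height data
    # semantic category -> confidence, one entry per semantic channel
    confidences: Dict[str, float] = field(default_factory=dict)

    @property
    def semantic_category(self) -> str:
        # the category with the highest confidence is taken as the point's semantics
        return max(self.confidences, key=self.confidences.get)

p = SemanticPoint(x=113.95, y=22.54, height=100.0,
                  confidences={"farmland": 0.10, "fruit_tree": 0.80, "river": 0.05})
assert p.semantic_category == "fruit_tree"
```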
  • the semantic recognition model may be a Convolutional Neural Network (CNN) model
  • the architecture of the CNN model mainly includes an input layer, a convolutional layer, an excitation layer, and a pooling layer.
  • the CNN model may include a plurality of subnets.
  • the subnets are arranged in a sequence from lowest to highest, and the input image data is processed through each of the subnets in the sequence.
  • the subnets in the sequence include multiple module subnets and optionally one or more other subnets, all of which are composed of one or more conventional neural network layers, such as a max-pooling layer, a convolutional layer, a fully connected layer, a regularization layer, etc.
  • each module subnet receives the output representation generated by the previous subnet in the sequence; processes it through a pass-through convolution to generate a pass-through output; processes it through one or more groups of neural network layers to generate one or more group outputs; and concatenates the pass-through output and the group outputs to generate the output representation of the module subnet.
  • the input layer is used to input image data
  • the convolutional layer is used to perform operations on the image data
  • the excitation layer is used to perform nonlinear mapping on the output result of the convolutional layer
  • the pooling layer is used to compress the amount of data and parameters, reduce overfitting, and improve performance.
  • This solution uses semantically annotated sample image data as input data, feeds it into the input layer of the CNN model, and, after the convolutional-layer calculation, outputs different semantic confidences through multiple channels, such as a farmland channel (confidence), a fruit-tree channel (confidence), a river channel (confidence), etc.
  • the output result of the CNN can be expressed as a tensor value.
  • the tensor value represents the three-dimensional point cloud information of a pixel and the semantic information of its n channels, where K1, K2, ..., Kn represent the confidences, and the semantic channel with the highest confidence in the tensor data is used as the semantic category of the pixel.
  • that is, if the i-th channel has the highest confidence, the semantic category corresponding to the i-th channel is taken as the semantic category of the pixel.
  • the addition of depth data adds a dimension of information to the RGB pixel information obtained by the mobile platform.
  • the depth data may be obtained by shooting with a binocular camera, or it may be calculated by processing a series of consecutive image frames captured by a monocular camera during the flight of the aircraft.
  • the multiple semantic categories with different confidence levels are output from multiple channels after recognition by the semantic recognition model. In some embodiments, the difference from a general neural network output is that a segmented output function is added after the output channels of the neural network: if a channel's confidence result is negative, it is set to zero, ensuring that the confidences output by the neural network are positive floating-point data. With positive floating-point data as the semantic channel confidences, the greater confidence can be obtained directly by subtracting two pixel data. Since tensor subtraction only needs to subtract the numerical contents of the corresponding arrays, its computational load is very small, and the calculation speed can be greatly improved under the same computing power. This is especially suitable for high-precision map drawing, which requires a large amount of calculation while computing power is tight.
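  • In effect, the segmented output function is a ReLU-style clamp, followed by a per-pixel argmax over the semantic channels. A minimal numpy sketch of this behavior (the array shapes and names are assumptions, not from the patent):

```python
import numpy as np

def segmented_output(channel_scores: np.ndarray) -> np.ndarray:
    """Set negative channel confidences to zero so that all outputs are
    non-negative floating-point data, as described above."""
    return np.maximum(channel_scores, 0.0)

# H x W x n tensor of raw per-channel scores from the network (toy data here)
scores = np.random.randn(480, 640, 3).astype(np.float32)
conf = segmented_output(scores)
category = conf.argmax(axis=-1)   # index i of the most confident channel per pixel
confidence = conf.max(axis=-1)    # K_i, the confidence of that channel
```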
  • the position data corresponding to the target image data includes the longitude and latitude of the target image
  • the height data corresponding to the target image data is the height of the target image.
  • the location data and altitude data of the target image data may be obtained according to Global Positioning System (GPS) information, or they may be calculated according to real-time kinematic (RTK) carrier-phase differential technology.
  • the feature data corresponding to the target image data may be calculated according to the position data and height data of the target image data.
  • after the image processing device processes the target image data to obtain the semantic confidence feature map of the target image data, it can post-process the confidence feature map according to the semantic confidence of each point data in the confidence feature map, and update the confidence feature map according to the result of the post-processing.
  • when post-processing the confidence feature map according to the semantic confidence of each point data, the image processing device may detect the semantic confidence of each point data in the confidence feature map, delete the point data whose semantic confidence is less than or equal to a preset confidence threshold, and update the confidence feature map based on the point cloud data remaining after the deletion.
  • For example, the image processing device may detect the semantic confidence of each point data in the confidence feature map, delete the point data whose semantic confidence is less than or equal to a preset confidence threshold of 0.6, and update the confidence feature map based on the point cloud data remaining after the deletion, as in the sketch below.
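  • A minimal sketch of this deletion step, assuming each point data has been reduced to a dict holding its best category and confidence (an illustrative structure only):

```python
def filter_by_confidence(points, threshold=0.6):
    """Keep only point data whose semantic confidence exceeds the preset
    threshold (0.6 in the example above); the rest are deleted."""
    return [p for p in points if p["confidence"] > threshold]

points = [{"category": "tree", "confidence": 0.9},
          {"category": "tree", "confidence": 0.4}]
assert len(filter_by_confidence(points)) == 1
```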
  • the point cloud data in the confidence feature map can use point data of different shapes to mark different semantic categories, such as circular point data to mark trees, square point data to mark people, and triangular point data to mark rice, in order to distinguish target objects of different semantic categories.
  • the point cloud data in the confidence feature map can also use point data of different colors to mark different semantic categories, such as green round point data to mark trees, yellow round point data to mark people, and red round point data to mark rice.
  • the point cloud data in the confidence feature map may also be implemented by other marking methods, which are not specifically limited in the embodiment of the present invention.
  • the image processing device may obtain a sample database, the sample database including sample image data, generate an initial semantic recognition model according to a preset semantic recognition algorithm, and then train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • the sample image data includes a sample image and depth data corresponding to each pixel in the sample image
  • the sample image may be an RGB image
  • the depth data may be obtained through a depth image.
  • the image processing device may generate an initial semantic recognition model according to a preset semantic recognition algorithm, take the sample image data including semantic annotation information as input data, and input it into the initial semantic recognition model for training to obtain a training result, where the training result includes the semantics of each pixel in the sample image and the confidence of each semantic.
  • the image processing device may compare the semantics of each pixel in the sample image in the training result with the semantic annotation information of the sample image; if they do not match, it adjusts the parameters in the initial semantic recognition model until the semantics of each pixel in the training-result sample image match the semantic annotation information.
  • the sample image may include a color image or an orthoimage; in some embodiments, the orthoimage is a top-view image that has been geometrically corrected (for example, to have a uniform scale). Unlike an uncorrected top-view image, distances measured on an orthophoto correspond to actual distances, because it is a true description of the earth's surface obtained after geometric correction; the orthophoto is informative, intuitive, and measurable.
  • the color image may be an RGB image determined according to RGB values.
  • the depth data reflects the distance from the camera device to the subject.
  • when the image processing device trains and optimizes the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model, it may call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image to obtain a recognition result; if the recognition result does not match the semantic annotation information included in the sample image data, the model parameters of the initial semantic recognition model can be optimized to obtain the semantic recognition model.
  • S203 Determine the position of the target area in the target image data according to the confidence feature map.
  • the image processing device may determine the position of the target area in the target image data according to the confidence feature map.
  • when the image processing device determines the position of the target area in the target image data according to the confidence feature map, it may acquire the position data and semantic category of each point data in the confidence feature map, determine the image regions with the same semantic category in the confidence feature map according to that position data and semantic category, and then determine, from the image regions with the same semantic category, the position data of the target area on the ground in the target image data.
  • For example, the target object on the ground in the target image data shown in FIG. 4 can be determined, together with the position data of the target area corresponding to that target object on the ground; as shown in FIG. 4, the semantic category of the marked target object is a tree.
  • the semantic category of the target object may also include people, telephone poles, crops, etc., which is not specifically limited in the embodiment of the present invention.
  • when the image processing device determines image regions with the same semantic category in the confidence feature map according to the position data and semantic category of each point data, it may determine, from the semantic categories on the confidence feature map, the image regions in which the same semantic category is continuous, and perform an edge processing operation on each such region to obtain the image areas of different semantic categories on the point cloud map, as in the sketch below.
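  • One conventional way to realize "regions in which the same semantic category is continuous" is connected-component labeling. The sketch below assumes the semantics have been rasterized into an integer category grid and uses scipy's labeling; the edge processing operation (e.g. morphological smoothing) is omitted:

```python
import numpy as np
from scipy import ndimage

def regions_with_same_category(category_map: np.ndarray):
    """For each semantic category, label the connected image regions
    (4-connectivity) in which that category is continuous."""
    regions = {}
    for cat in np.unique(category_map):
        labels, count = ndimage.label(category_map == cat)
        regions[int(cat)] = (labels, count)  # label image and number of regions
    return regions

# toy grid: 0 = background, 1 = tree; the two tree patches are separate regions
grid = np.array([[1, 1, 0],
                 [0, 0, 0],
                 [0, 1, 1]])
assert regions_with_same_category(grid)[1][1] == 2
```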
  • the image processing device may plan a route according to the position and semantic category of the target area in the target image data, and control the movable platform to move according to the route.
  • the movable platform can be controlled to move along the route and perform tasks corresponding to the semantic category of the target area.
  • when the image processing device plans a route according to the location and semantic category of the target area in the target image data, it may classify the image areas with different semantic categories on the confidence feature map, and plan the route corresponding to the image areas of each category according to those classified areas.
  • in the process of controlling the movable platform to move along the route, the image processing device may determine whether the semantic category corresponding to the movable platform's current position in the confidence feature map matches the semantic category of the target task; if it does, the movable platform is controlled to execute the target task, and if it does not, the movable platform is controlled to stop executing the target task, as in the sketch below.
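  • A minimal sketch of that check, assuming the confidence feature map has been reduced to a mapping from grid position to semantic category (all names are illustrative):

```python
def should_execute_task(category_at, position, task_category):
    """Execute the target task only while the semantic category at the
    platform's current position matches the task's semantic category."""
    return category_at.get(position) == task_category

grid = {(0, 0): "tree", (0, 1): "river"}
assert should_execute_task(grid, (0, 0), "tree")        # keep executing the task
assert not should_execute_task(grid, (0, 1), "tree")    # stop executing the task
```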
  • the movable platform includes, but is not limited to, an unmanned aerial vehicle or an unmanned vehicle that automatically travels along a route.
  • in the process of controlling the movable platform to move according to the route, the movable platform may be controlled to stay at a marked point on the route to perform a predetermined operation corresponding to the target task.
  • the predetermined operation includes a pesticide spraying operation
  • the pesticide spraying operation includes an operation of circular spraying around a designated point.
  • For example, the image processing device can plan a route for the area where the trees are located according to the location and semantic category of the target area in the target image data.
  • the image processing device can control the drone to move along the route, and control the drone to perform the task of spraying pesticides while moving along the route.
  • the image processing device may obtain target image data, the target image data including the target image and depth data corresponding to each pixel in the target image, process the target image data to obtain a semantic confidence feature map of the target image data, and determine the location of the target region in the target image data according to the confidence feature map.
  • FIG. 5 is a schematic flowchart of another image processing method provided by an embodiment of the present invention.
  • the method may be executed by an image processing device, wherein the specific explanation of the image processing device is as described above.
  • the difference between the embodiment of the present invention and the embodiment described in FIG. 2 is that the embodiment of the present invention mainly describes in detail the counting of target objects with the same semantic category in the target image data.
  • the target object can be fruit trees, buildings, people, vehicles, etc. that can be identified and counted in the target image.
  • the image processing device may obtain target image data.
  • the target image data includes the target image and depth data corresponding to each pixel in the target image.
  • the target image data includes a color image; or, the target image data includes a color image and depth data corresponding to the color image; or, the target image data includes an orthophoto; or, The target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • S502 Process the target image data to obtain a semantic confidence feature map of the target image data.
  • the image processing device may process the target image data to obtain a semantic confidence feature map of the target image data.
  • the image processing device may process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data, and, according to the position data and height data corresponding to the target image data and the semantic category and semantic confidence of each pixel, generate point cloud data containing the semantic categories and semantic confidences, so as to generate the confidence feature map from that point cloud data.
  • the specific implementation is as described above and will not be repeated here.
  • the point cloud data and the confidence feature map each include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels; the point cloud Each point data contained in the data corresponds to each pixel point in the target image data.
  • after the image processing device processes the target image data to obtain the semantic confidence feature map of the target image data, it can post-process the confidence feature map according to the semantic confidence of each point data in the confidence feature map, and update the confidence feature map according to the result of the post-processing. The specific implementation is as described above and will not be repeated here.
  • when post-processing the confidence feature map according to the semantic confidence of each point data, the image processing device may detect the semantic confidence of each point data in the confidence feature map, delete the point data whose semantic confidence is less than or equal to the preset confidence threshold, and update the confidence feature map based on the point cloud data remaining after the deletion.
  • the specific implementation is as described above and will not be repeated here.
  • the image processing device may obtain a sample database, the sample database including sample image data, generate an initial semantic recognition model according to a preset semantic recognition algorithm, and train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • the image processing device may call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image to obtain a recognition result; if the recognition result does not match the semantic annotation information included in the sample image data, the model parameters of the initial semantic recognition model can be optimized to obtain the semantic recognition model.
  • the specific implementation is as described above and will not be repeated here.
  • S503 Determine the number of target objects with the same semantic category on the target image data according to the confidence feature map.
  • the image processing device may determine the number of target objects with the same semantic category on the target image data according to the confidence feature map.
  • when the image processing device determines the number of target objects with the same semantics on the target image data according to the confidence feature map, it can classify the point data of different semantic categories on the confidence feature map based on the semantic category of each point data, calculate the number of point data of each category, and take the number of point data of a category on the confidence feature map as the number of target objects with that same semantics on the target image data.
  • Figure 3a can be taken as an example for illustration. It is assumed that the image processing device determines that the semantic categories on the confidence feature map are all trees according to the semantic category of each point data on the confidence feature map as shown in Figure 3a. If the image processing device calculates that the number of point data whose semantic category is a tree on the confidence feature map is 300, it can be determined that the number of trees on the target image data is 300.
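  • The counting step itself reduces to tallying point data per category, as in this sketch (the per-point categories are assumed to come from the confidence feature map):

```python
from collections import Counter

def count_targets(categories):
    """The number of point data of a category is taken as the number of
    target objects of that category (e.g. 300 "tree" points -> 300 trees)."""
    return Counter(categories)

counts = count_targets(["tree"] * 300 + ["person"] * 12)
assert counts["tree"] == 300
```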
  • the point data in the confidence feature map can be marked with different shapes; when the image processing device determines the number of target objects with the same semantics on the target image data according to the confidence feature map, different semantic categories can be determined according to the shape of each point data on the confidence feature map.
  • For example, the image processing device can calculate the number of circular point data on the confidence feature map to determine the number of trees, and calculate the number of square point data on the confidence feature map to determine the number of people.
  • the point data in the confidence feature map can also be marked with different colors; when the image processing device determines the number of target objects with the same semantics on the target image data according to the confidence feature map, different semantic categories can be determined according to the color of each point data on the confidence feature map.
  • For example, the image processing device can calculate the number of red circular point data on the confidence feature map to determine the number of trees, and calculate the number of yellow circular point data on the confidence feature map to determine the number of people.
  • the image processing device can track the feature points in the target image data according to the target image data. In some embodiments, the image processing device can determine the point cloud data according to the feature points.
  • an implementation method taking drones as an example can be:
  • acquiring first pose information of the first image frame in the world coordinate system, where the first pose information includes first real-time kinematic (RTK) information and first pan/tilt angle information;
  • estimating, according to the first pose information, second pose information of the second image frame in the world coordinate system, where the second pose information includes second RTK information and second pan/tilt angle information;
  • the first image frame and the second image frame are two adjacent frames in the image sequence.
  • the pose of the second image frame is estimated according to the RTK information and pan/tilt angle information of the first image frame provided by the sensor. Because the accurate RTK information and pan/tilt angle information provided by the sensor are used, the accuracy of the estimated pose information of the second image frame will be greatly improved, and the accurate pose information improves the accuracy and speed of feature matching.
  • performing feature matching between the feature information of the first image frame and the feature information of the second image frame according to the first pose information and the second pose information may specifically include: acquiring the features of the first image frame and the second image frame, and determining, according to the first pose information and the second pose information, the corresponding search range in the second image frame in which to perform feature matching. Because accurate pose information is acquired, not only can an accurate search range be determined, but the search range can also be greatly reduced, which improves both the accuracy and the speed of feature matching; a sketch follows.
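  • The idea can be sketched as standard guided matching under pinhole notation: project a feature from the first frame into the second using the sensor-derived relative pose, then search only a small window around the prediction. K, R, t and the depth guess below are assumed inputs, not the patent's exact formulation:

```python
import numpy as np

def predicted_search_window(pt1, K, R, t, depth_guess, radius=20.0):
    """Project pixel pt1 from frame 1 into frame 2 using the relative pose
    (R, t) estimated from RTK and pan/tilt information, and return a search
    window (u_min, v_min, u_max, v_max) around the predicted location."""
    # back-project at the assumed depth, transform into frame 2, re-project
    p1 = np.linalg.inv(K) @ np.array([pt1[0], pt1[1], 1.0]) * depth_guess
    q = K @ (R @ p1 + t)
    u, v = q[0] / q[2], q[1] / q[2]
    return (u - radius, v - radius, u + radius, v + radius)
```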
  • the overlap rate between two adjacent frames of images is low, resulting in poor tracking of feature points.
  • a check is added to determine whether the previous frame is a key frame; if it is, the feature information of the key frame is used to replace the original feature information of the previous frame. Since the key frame has an additional 3D point cloud generation operation, the available 3D point cloud generated from the image of the overlapping area can be used to the maximum within a limited time, so that the number of effective feature points for tracking is increased.
  • the RTK information and pan/tilt angle information provided by the sensor are added to the pose calculation, so that the pose calculation has higher accuracy and is less susceptible to interference from mismatching. It solves the problem that in the prior art, in the vision-based solution, when there is a mismatch, the accuracy of the pose solution is reduced or even errors occur.
  • before fusing the 3D point cloud of the key frame, the method for 3D reconstruction of the target scene provided in this embodiment may further include: using a nonlinear optimization method to optimize the pose information of the key frame and the position of the corresponding three-dimensional point cloud.
  • This embodiment does not limit the specific algorithm used in the nonlinear optimization.
  • the Gauss-Newton method, the Levenberg-Marquardt method, etc. may be used.
  • optimization processing is performed based on the RTK information and the pan/tilt angle information. It can include:
  • constructing a local map, which can be composed of the current frame, the co-visible key frames of the current frame, and the point clouds they can observe;
  • adding the RTK information and pan/tilt angle information corresponding to each key frame participating in the optimization, so that the pose calculation of the key frames and the positions of the three-dimensional point cloud are more accurate.
  • This embodiment introduces more accurate sensor information, that is, RTK information and pan/tilt angle information, in the nonlinear optimization process.
  • the optimized cost function not only considers the reprojection error, but also considers the gap between the currently estimated pose and the pose provided by the sensors.
  • This solves the problem of poor stability in the prior art caused by considering only the visual reprojection error; a sketch of such a cost follows.
  • this embodiment will also perform global optimization on all reserved key frames and three-dimensional point clouds. It is understandable that adding RTK information and pan/tilt angle information to the global optimization makes the final output result more accurate.
  • a reference frame is selected for the key frame in the image sequence, then the depth map of the key frame is determined according to the selected reference frame, and the three-dimensional point cloud of the key frame is obtained according to the depth map of the key frame.
  • the reference frame may include at least a first image frame and a second image frame. Wherein, the first image frame is located before the key frame in time series, and the second image frame is located after the key frame in time series.
  • the drone can fly along the planned route.
  • when the drone flies along a route, a considerable part of the current image frame does not exist in the previously captured image frames. That is to say, if the reference frame only includes image frames taken before the current image frame, then when the depth map of the current image frame is determined from the reference frame, a considerable part of the disparity will have no solution, and there will inevitably be large invalid areas in the depth map.
  • the reference frame in this embodiment includes both the first image frame located before the key frame in time sequence and the second image frame located after the key frame in time sequence, which improves the overlap rate between the key frame and the reference frame, reduces the area where the disparity is unsolvable, and improves the accuracy of the depth map of the key frame obtained from the reference frame.
  • the reference frame includes two frames adjacent to the key frame.
  • Suppose the overlap rate between two adjacent frames is 70%.
  • Then, if the reference frame only includes the image frame before the key frame, at least 30% of the disparity in the key frame is unsolvable.
  • the reference frame selection strategy provided in this embodiment enables all areas in the key frame to find matching areas in the reference frame, avoiding unsolvable disparity and improving the accuracy of the key frame's depth map.
  • the first image frame may include a preset number of image frames before the Nth frame
  • the second image frame may include a preset number of image frames after the Nth frame.
  • the first image frame may be one of a preset number of image frames before the Nth frame
  • the second image frame may be one of a preset number of image frames after the Nth frame.
  • the reference frame may include at least a third image frame.
  • the epipolar directions of the third image frame and the key frame are not parallel.
  • the epipolar line in this embodiment is the epipolar line in epipolar geometry, that is, the line of intersection between the epipolar plane and the image.
  • the epipolar directions of the third image frame and the key frame are not parallel, that is, the first intersection line of the epipolar plane with the third image frame is not parallel to the second intersection line of the epipolar plane with the key frame.
  • the third image frame may include an image frame that has overlapping pixels with the key frame in the air belt adjacent to the key frame.
  • the third image frame may be an image frame with the highest overlap rate with the key frame in the adjacent flight belt of the key frame.
  • FIG. 6 is a schematic diagram of reference frame selection in an embodiment of a method for 3D reconstruction of a target scene provided by an embodiment of the present invention.
  • the solid line indicates the flight route of the drone, which covers the target scene; the arrow indicates the flight direction of the drone; and each black circle and black square on the flight route indicates a position where the drone's shooting device captured an image, that is, each black circle and black square corresponds to an image frame of the target scene.
  • the image sequence of the target scene can be obtained through the shooting device on the UAV, such as a monocular camera, including multiple consecutive image frames in time sequence.
  • M-1, M, M+1, N-1, N, N+1 in FIG. 6 represent the frame numbers of image frames, and N and M are natural numbers, and the specific values of N and M are not limited in this embodiment .
  • the reference frame may include the N-1th frame and the N+1th frame shown in the figure.
  • the reference frame may include the Mth frame shown in the figure.
  • the reference frame may include the Mth frame, the N-1th frame, and the N+1th frame shown in FIG. 6.
  • the reference frame may also include more image frames, for example, it may also include the M-1th frame, the M+1th frame, the N-2th frame, and so on.
  • the overlap ratio of the key frame and the reference frame and the calculation speed can be comprehensively considered for selection.
  • an implementation manner of obtaining the depth map of the key frame based on the reference frame may be: obtaining the depth map of the key frame according to the disparity between the key frame and the reference frame.
  • the depth map of the key frame can be obtained according to the disparity of the same object in the key frame and the reference frame.
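  • The disparity-to-depth correspondence follows the standard stereo relation depth = f·B/d (focal length in pixels, baseline between the key frame and the reference frame, disparity in pixels); a one-function sketch with assumed inputs:

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Standard stereo relation: depth = f * B / d."""
    return focal_px * baseline_m / disparity_px if disparity_px > 0 else float("inf")

# e.g. f = 1000 px, B = 5 m between key frame and reference frame, d = 50 px
assert depth_from_disparity(50.0, 1000.0, 5.0) == 100.0
```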
  • an implementation manner of obtaining the three-dimensional point cloud of the key frame based on the image sequence may be: obtaining the depth map of the key frame according to the image sequence, and obtaining the three-dimensional point cloud of the key frame according to the depth map of the key frame.
  • an implementation manner of obtaining the depth map of the key frame may be: determining the matching cost corresponding to the key frame according to the image sequence, and determining the depth map of the key frame according to the matching cost.
  • the pixel points in the image sequence and the key frame can be matched to determine the matching cost corresponding to the key frame.
  • matching cost aggregation can be performed, and then the disparity is determined, and the depth map of the key frame is determined according to the correspondence between the disparity and the depth.
  • disparity optimization may be performed to enhance the disparity, and the depth map of the key frame is determined according to the optimized and enhanced disparity.
  • the flying height of drones is usually about 100 meters, and drones usually shoot vertically downward. Because the ground undulates, sunlight is reflected differently, and the images captured by drones contain non-negligible illumination changes, which reduce the accuracy of the 3D reconstruction of the target scene.
  • determining the matching cost corresponding to the key frame according to the image sequence may include: determining, according to the image sequence, the first-type matching cost and the second-type matching cost corresponding to the key frame; and determining the matching cost corresponding to the key frame to be equal to a weighted sum of the first-type matching cost and the second-type matching cost.
  • In this way, the matching cost is more robust to illumination, which reduces the influence of illumination changes on 3D reconstruction and improves the accuracy of 3D reconstruction.
  • the weighting coefficients of the first-type matching cost and the second-type matching cost can be set according to specific needs, which is not limited in this embodiment.
  • the first-type matching cost may be determined based on zero-normalized cross-correlation (ZNCC). Based on ZNCC, the similarity between the key frame and the reference frame can be accurately measured.
  • the second type of matching cost may be determined based on the illumination invariant feature.
  • the illumination-invariant features in the image frames collected by the drone can be extracted, such as local binary patterns (LBP), the census sequence, etc., and then the second-type matching cost can be determined based on the illumination-invariant features.
  • the census sequence in this embodiment can be determined in the following way: select any point in the image frame and draw a 3×3 rectangle centered on that point; compare every point in the rectangle except the center with the center point, recording 1 where the gray value is less than the center point's and 0 where it is greater; the resulting length-8 sequence of 0s and 1s is used as the census sequence of the center point, that is, the gray value of the center pixel is replaced by its census sequence.
  • the Hamming distance can be used to determine the second type matching cost of the key frame.
  • the matching cost corresponding to the key frame can be equal to the weighted sum of ZNCC and census.
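  • A compact sketch of the two cost types and their weighted combination. The 3×3 census and Hamming distance follow the description above; ZNCC is a similarity, so 1 − ZNCC serves as the first-type cost, and the equal weights are an assumption (the text leaves them to specific needs):

```python
import numpy as np

def census_3x3(img: np.ndarray, r: int, c: int):
    """Census sequence of pixel (r, c): 1 where a neighbor's gray value is
    less than the center's, 0 otherwise (8 neighbors in a 3x3 window)."""
    center = img[r, c]
    return [1 if img[r + dr, c + dc] < center else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if not (dr == 0 and dc == 0)]

def hamming(a, b) -> int:
    return sum(x != y for x, y in zip(a, b))

def zncc(pa: np.ndarray, pb: np.ndarray) -> float:
    """Zero-normalized cross-correlation between two patches."""
    a, b = pa - pa.mean(), pb - pb.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def matching_cost(patch_a: np.ndarray, patch_b: np.ndarray, w1=0.5, w2=0.5) -> float:
    """Weighted sum of the first-type (1 - ZNCC) and second-type
    (normalized census Hamming) costs; patches are square, odd-sized floats."""
    c = patch_a.shape[0] // 2
    census_cost = hamming(census_3x3(patch_a, c, c), census_3x3(patch_b, c, c)) / 8.0
    return w1 * (1.0 - zncc(patch_a, patch_b)) + w2 * census_cost
```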
  • an implementation manner of determining the depth map of the key frame may be: dividing the key frame into multiple image blocks; determining the matching cost corresponding to each image block according to the image sequence; and determining the matching cost corresponding to the key frame according to the matching cost corresponding to each of the image blocks.
  • one or more of the following methods may be used to divide the key frame into multiple image blocks:
  • the key frame may be divided into a plurality of image blocks in a clustering manner.
  • the number of image blocks may be preset, and then the key frames are divided according to the preset number of image blocks.
  • the size of the image block can be preset, and then the key frames are divided according to the preset size of the image block.
  • the matching cost corresponding to each image block can be determined in parallel according to the image sequence.
  • software and/or hardware may be used to determine the matching cost corresponding to each image block in parallel.
  • multithreading may be used to determine the matching cost corresponding to each image block in parallel
  • a graphics processing unit (GPU) may be used to determine the matching cost corresponding to each image block in parallel, as in the sketch below.
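  • A thread-pool sketch of the per-block parallelism (the multithreading variant; a GPU kernel would be the alternative mentioned above). cost_fn stands for whatever per-block matching-cost routine is used:

```python
from concurrent.futures import ThreadPoolExecutor

def block_costs_parallel(blocks, cost_fn):
    """Compute each image block's matching cost in parallel and return the
    per-block results in the original block order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(cost_fn, blocks))

# e.g. block_costs_parallel(blocks, lambda b: b.size)  # trivial stand-in cost_fn
```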
  • On the basis of the above embodiments, the three-dimensional reconstruction method of the target scene provided by this embodiment divides the key frame into multiple image blocks, determines the matching cost corresponding to each image block in parallel according to the image sequence, and then determines the matching cost corresponding to the key frame according to the matching cost of each block; this increases the calculation speed of the matching cost and improves the real-time performance of 3D reconstruction of the target scene.
  • the depth sampling times can be determined according to the depth range and accuracy.
  • the depth sampling times are positively correlated with the depth range and negatively correlated with the accuracy. For example, if the depth range is 50 meters and the accuracy requirement is 0.1 meters, then the depth sampling times can be 500.
  • preset depth sampling times can be used; alternatively, simultaneous localization and mapping (SLAM) can be used to recover some sparse 3D points in the key frame, the depth range of the entire key frame can be determined from these sparse 3D points, and the depth sampling times can then be determined from the depth range and the accuracy requirement of the entire key frame. If the number of depth samples is N, the matching cost needs to be calculated N times for each pixel in the key frame; for a key frame of 640*480 pixels, 640*480*N matching costs need to be calculated.
  • determining the matching cost corresponding to each image block according to the image sequence may include: determining the number of depth samples of each image block according to the sparse points in that image block; and determining the matching cost corresponding to each image block according to the image sequence and the number of depth samples of that image block.
  • the key frame can contain a variety of subjects, such as pedestrians, cars, trees, and tall buildings; the depth range of the entire key frame is therefore relatively large, and under a preset accuracy requirement the number of depth samples is correspondingly large.
  • the depth range corresponding to each image block in the key frame is relatively small. For example, when an image block contains only pedestrians, the depth range corresponding to that image block will be much smaller than the depth range of the entire key frame, and under the same accuracy requirement the number of depth samples can be greatly reduced. That is, under the same accuracy requirement, the number of depth samples of an image block in the key frame is always less than or equal to the number of depth samples of the entire key frame, as in the sketch below.
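
The whole-frame and per-block sample counts can be sketched as follows; the ceiling rounding, and the use of the min/max depths of the SLAM-recovered sparse points as the depth range, are assumptions made for illustration:

```python
import math

def depth_sample_count(depth_range, accuracy):
    """Number of depth samples: positively correlated with the depth
    range, negatively with the accuracy (e.g. 50 m / 0.1 m -> 500)."""
    return math.ceil(depth_range / accuracy)

def block_sample_count(sparse_depths, accuracy):
    """Per-block count from the sparse 3D points inside the block; its
    depth range, hence its count, is at most that of the whole frame."""
    depth_range = max(sparse_depths) - min(sparse_depths)
    return depth_sample_count(depth_range, accuracy)
```
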
  • the image processing device may obtain target image data, where the target image data includes the target image and depth data corresponding to each pixel in the target image, and process the target image data to obtain a semantic confidence feature map of the target image data, so as to determine the number of target objects with the same semantic category in the target image data according to the confidence feature map.
  • FIG. 7 is a schematic structural diagram of an image processing device provided by an embodiment of the present invention.
  • the image processing device includes: a memory 701, a processor 702, and a data interface 703.
  • the memory 701 may include volatile memory; the memory 701 may also include non-volatile memory; the memory 701 may also include a combination of the foregoing types of memories.
  • the processor 702 may be a central processing unit (CPU).
  • the processor 702 may further include a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof. For example, it may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or any combination thereof.
  • the memory 701 is used to store program instructions.
  • the processor 702 can call the program instructions stored in the memory 701 to perform the following steps:
  • obtain target image data, where the target image data includes a target image and depth data corresponding to each pixel in the target image;
  • process the target image data to obtain a semantic confidence feature map of the target image data; and determine the position of the target area in the target image data according to the confidence feature map.
  • when the processor 702 processes the target image data to obtain the semantic confidence feature map of the target image data, it is specifically configured to: process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data; generate point cloud data containing semantic categories and semantic confidences according to the position data and height data corresponding to the target image data and the semantic category and semantic confidence of each pixel; and generate the confidence feature map according to the point cloud data containing semantic categories and semantic confidences.
  • the point cloud data and the confidence feature map both include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels.
  • after the processor 702 processes the target image data to obtain the semantic confidence feature map of the target image data, it is further configured to: perform post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map; and update the confidence feature map according to the result of the post-processing.
  • when the processor 702 performs post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map, it is specifically configured to: detect the semantic confidence of each point data in the confidence feature map; and delete the point data whose semantic confidence is less than or equal to a preset confidence threshold. Updating the confidence feature map according to the result of the post-processing then includes: updating the confidence feature map based on the point cloud data remaining after the deletion. A sketch of this thresholding step follows.
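
A sketch of the deletion-by-threshold post-processing; the point-data layout follows the description above (position data, height data, and per-category confidences), while the dictionary field names and the 0.6 threshold are illustrative assumptions:

```python
def postprocess_confidence_map(points, threshold=0.6):
    """Drop point data whose best semantic confidence is at or below
    the preset threshold, then return the updated feature map."""
    kept = []
    for p in points:  # p: dict with 'position', 'height', 'confidences'
        best = max(p['confidences'].values())  # highest category confidence
        if best > threshold:
            kept.append(p)
    return kept  # the confidence feature map is rebuilt from these points
```
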
  • when determining the position of the target area in the target image data according to the confidence feature map, the processor 702 is specifically configured to: obtain the position data and semantic category of each point data in the confidence feature map; determine the image areas with the same semantic category in the confidence feature map according to the position data and semantic category of each point data; and determine the position data of the target area in the target image data according to the image areas with the same semantic category.
  • after determining the position of the target area in the target image data according to the confidence feature map, the processor 702 is further configured to: plan a route according to the position and semantic category of the target area in the target image data; and control a movable platform to move according to the route.
  • when planning a route according to the position and semantic category of the target area in the target image data, the processor 702 is specifically configured to: classify the image areas of different semantic categories according to the image areas with different semantic categories on the confidence feature map; and plan the route corresponding to each category of image area according to the different categories of image areas.
  • when controlling the movable platform to move according to the route, the processor 702 is specifically configured to: in the process of controlling the movable platform to move according to the route, judge whether the semantic category corresponding to the current position of the movable platform in the confidence feature map matches the semantic category of the target task; and if so, control the movable platform to execute the target task.
  • the movable platform includes an unmanned aerial vehicle or an unmanned vehicle that automatically travels according to a route.
  • the processor 702 is further configured to: in the process of controlling the movable platform to move according to the route, control the movable platform to stay at the marked points in the route so as to perform a predetermined operation corresponding to the target task.
  • the predetermined operation includes a pesticide spraying operation.
  • the pesticide spraying operation includes an operation of circular spraying around a designated point.
  • the target image data includes a color image; or,
  • the target image data includes a color image and depth data corresponding to the color image; or,
  • the target image data includes an orthophoto; or,
  • the target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • before processing the target image data based on the semantic recognition model, the processor 702 is further configured to: obtain a sample database, the sample database including sample image data; generate an initial semantic recognition model according to a preset semantic recognition algorithm; and train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • when training and optimizing the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model, the processor 702 is specifically configured to: call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image, so as to obtain a recognition result; and if the recognition result matches the semantic annotation information included in the sample image data, optimize the model parameters of the initial semantic recognition model to obtain the semantic recognition model.
  • in the embodiment of the present invention, the image processing device may obtain target image data, the target image data including the target image and depth data corresponding to each pixel in the target image; process the target image data to obtain a semantic confidence feature map of the target image data; and determine the position of the target region in the target image data according to the confidence feature map, thereby identifying the target region quickly and efficiently.
  • FIG. 8 is a schematic structural diagram of another image processing device provided by an embodiment of the present invention.
  • the image processing device includes: a memory 801, a processor 802, and a data interface 803.
  • the memory 801 may include volatile memory; the memory 801 may also include non-volatile memory; the memory 801 may also include a combination of the foregoing types of memories.
  • the processor 802 may be a central processing unit (CPU).
  • the processor 802 may further include a hardware chip.
  • the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof. For example, it may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or any combination thereof.
  • the memory 801 is used to store program instructions.
  • the processor 802 can call the program instructions stored in the memory 801 to perform the following steps:
  • obtain target image data, where the target image data includes a target image and depth data corresponding to each pixel in the target image;
  • process the target image data to obtain a semantic confidence feature map of the target image data; and determine the number of target objects with the same semantic category in the target image data according to the confidence feature map.
  • when the processor 802 determines the number of target objects with the same semantics in the target image data according to the confidence feature map, it is specifically configured to: classify the point data of different semantic categories on the confidence feature map according to the semantic category of each point data; calculate the number of point data of each category on the confidence feature map; and determine the number of point data of each category as the number of target objects with the same semantics in the target image data. A sketch of this counting step follows.
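
The counting step can then be sketched as follows, assuming each point data carries a single resolved category label (its highest-confidence semantic channel); the field name is an illustrative assumption:

```python
from collections import Counter

def count_targets_by_category(points):
    """Number of target objects per semantic category, taken as the
    number of point data of that category on the confidence map."""
    return Counter(p['category'] for p in points)

# e.g. Counter({'tree': 300, 'person': 12}) -> 300 trees counted
```
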
  • when the processor 802 processes the target image data to obtain the semantic confidence feature map of the target image data, it is specifically configured to: process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data; generate point cloud data containing semantic categories and semantic confidences according to the position data and height data corresponding to the target image data and the semantic category and semantic confidence of each pixel; and generate the confidence feature map according to the point cloud data containing semantic categories and semantic confidences.
  • the point cloud data and the confidence feature map both include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels; each point data contained in the point cloud data corresponds to a pixel in the target image data.
  • after the processor 802 processes the target image data to obtain the semantic confidence feature map of the target image data, it is further configured to: perform post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map; and update the confidence feature map according to the result of the post-processing.
  • when the processor 802 performs post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map, it is specifically configured to: detect the semantic confidence of each point data in the confidence feature map; and delete the point data whose semantic confidence is less than or equal to a preset confidence threshold. Updating the confidence feature map according to the result of the post-processing then includes: updating the confidence feature map based on the point cloud data remaining after the deletion.
  • the target image data includes a color image; or,
  • the target image data includes a color image and depth data corresponding to the color image; or,
  • the target image data includes an orthophoto; or,
  • the target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • before processing the target image data based on the semantic recognition model, the processor 802 is further configured to: obtain a sample database, the sample database including sample image data; generate an initial semantic recognition model according to a preset semantic recognition algorithm; and train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • when training and optimizing the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model, the processor 802 is specifically configured to: call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image, so as to obtain a recognition result; and if the recognition result matches the semantic annotation information included in the sample image data, optimize the model parameters of the initial semantic recognition model to obtain the semantic recognition model.
  • in the embodiment of the present invention, the image processing device may obtain target image data, where the target image data includes the target image and depth data corresponding to each pixel in the target image, and process the target image data to obtain a semantic confidence feature map of the target image data, so as to determine the number of target objects with the same semantic category in the target image data according to the confidence feature map.
  • the embodiment of the present invention also provides a movable platform.
  • the movable platform includes: a power system for providing the movable platform with power to move; a memory; and a processor, the processor being configured to execute the following steps:
  • obtain target image data, where the target image data includes a target image and depth data corresponding to each pixel in the target image;
  • process the target image data to obtain a semantic confidence feature map of the target image data; and determine the position of the target area in the target image data according to the confidence feature map.
  • when the processor processes the target image data to obtain the semantic confidence feature map of the target image data, it is specifically configured to: process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data; generate point cloud data containing semantic categories and semantic confidences according to the position data and height data corresponding to the target image data and the semantic category and semantic confidence of each pixel; and generate the confidence feature map according to the point cloud data containing semantic categories and semantic confidences.
  • the point cloud data and the confidence feature map both include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels.
  • after the processor processes the target image data to obtain the semantic confidence feature map of the target image data, it is further configured to: perform post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map; and update the confidence feature map according to the result of the post-processing.
  • when the processor performs post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map, it is specifically configured to: detect the semantic confidence of each point data in the confidence feature map; and delete the point data whose semantic confidence is less than or equal to a preset confidence threshold. Updating the confidence feature map according to the result of the post-processing then includes: updating the confidence feature map based on the point cloud data remaining after the deletion.
  • when determining the position of the target area in the target image data according to the confidence feature map, the processor is specifically configured to: obtain the position data and semantic category of each point data in the confidence feature map; determine the image areas with the same semantic category in the confidence feature map according to the position data and semantic category of each point data; and determine the position data of the target area in the target image data according to the image areas with the same semantic category.
  • after determining the position of the target area in the target image data according to the confidence feature map, the processor is further configured to: plan a route according to the position and semantic category of the target area in the target image data; and control the movable platform to move according to the route.
  • when planning a route according to the position and semantic category of the target area in the target image data, the processor is specifically configured to: classify the image areas of different semantic categories according to the image areas with different semantic categories on the confidence feature map; and plan the route corresponding to each category of image area according to the different categories of image areas.
  • when controlling the movable platform to move according to the route, the processor is specifically configured to: in the process of controlling the movable platform to move according to the route, judge whether the semantic category corresponding to the current position of the movable platform in the confidence feature map matches the semantic category of the target task; and if so, control the movable platform to execute the target task.
  • the movable platform includes an unmanned aerial vehicle or an unmanned vehicle that automatically travels according to a route.
  • the processor is further configured to: in the process of controlling the movable platform to move according to the route, control the movable platform to stay at the marked points in the route so as to perform a predetermined operation corresponding to the target task.
  • the predetermined operation includes a pesticide spraying operation.
  • the pesticide spraying operation includes an operation of circular spraying around a designated point.
  • the target image data includes a color image; or,
  • the target image data includes a color image and depth data corresponding to the color image; or,
  • the target image data includes an orthophoto; or,
  • the target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • before processing the target image data based on the semantic recognition model, the processor is further configured to: obtain a sample database, the sample database including sample image data; generate an initial semantic recognition model according to a preset semantic recognition algorithm; and train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • when training and optimizing the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model, the processor is specifically configured to: call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image, so as to obtain a recognition result; and if the recognition result matches the semantic annotation information included in the sample image data, optimize the model parameters of the initial semantic recognition model to obtain the semantic recognition model.
  • in the embodiment of the present invention, the movable platform can obtain target image data, the target image data including the target image and depth data corresponding to each pixel in the target image; process the target image data to obtain a semantic confidence feature map of the target image data; and determine the position of the target region in the target image data according to the confidence feature map, thereby identifying the target region quickly and efficiently.
  • the embodiment of the present invention also provides another movable platform.
  • the movable platform includes: a power system for providing the movable platform with power to move; a memory; and a processor, the processor being configured to execute the following steps:
  • obtain target image data, where the target image data includes a target image and depth data corresponding to each pixel in the target image;
  • process the target image data to obtain a semantic confidence feature map of the target image data; and determine the number of target objects with the same semantic category in the target image data according to the confidence feature map.
  • when the processor determines the number of target objects with the same semantics in the target image data according to the confidence feature map, it is specifically configured to: classify the point data of different semantic categories on the confidence feature map according to the semantic category of each point data; calculate the number of point data of each category on the confidence feature map; and determine the number of point data of each category as the number of target objects with the same semantics in the target image data.
  • when the processor processes the target image data to obtain the semantic confidence feature map of the target image data, it is specifically configured to: process the target image data based on a semantic recognition model to obtain the semantic category and semantic confidence of each pixel in the target image data; generate point cloud data containing semantic categories and semantic confidences according to the position data and height data corresponding to the target image data and the semantic category and semantic confidence of each pixel; and generate the confidence feature map according to the point cloud data containing semantic categories and semantic confidences.
  • the point cloud data and the confidence feature map both include a plurality of point data, and each point data includes position data, height data, and multiple semantic categories with different confidence levels; each point data contained in the point cloud data corresponds to a pixel in the target image data.
  • after the processor processes the target image data to obtain the semantic confidence feature map of the target image data, it is further configured to: perform post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map; and update the confidence feature map according to the result of the post-processing.
  • when the processor performs post-processing on the confidence feature map according to the semantic confidence of each point data in the confidence feature map, it is specifically configured to: detect the semantic confidence of each point data in the confidence feature map; and delete the point data whose semantic confidence is less than or equal to a preset confidence threshold. Updating the confidence feature map according to the result of the post-processing then includes: updating the confidence feature map based on the point cloud data remaining after the deletion.
  • the target image data includes a color image; or,
  • the target image data includes a color image and depth data corresponding to the color image; or,
  • the target image data includes an orthophoto; or,
  • the target image data includes an orthophoto and depth data corresponding to the orthophoto.
  • before processing the target image data based on the semantic recognition model, the processor is further configured to: obtain a sample database, the sample database including sample image data; generate an initial semantic recognition model according to a preset semantic recognition algorithm; and train and optimize the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model.
  • the sample image data includes a sample image and semantic annotation information; or, the sample image data includes a sample image, depth data corresponding to each pixel in the sample image, and semantic annotation information.
  • when training and optimizing the initial semantic recognition model based on each sample image data in the sample database to obtain the semantic recognition model, the processor is specifically configured to: call the initial semantic recognition model to recognize the sample image included in the sample image data and the depth data corresponding to each pixel in the sample image, so as to obtain a recognition result; and if the recognition result matches the semantic annotation information included in the sample image data, optimize the model parameters of the initial semantic recognition model to obtain the semantic recognition model.
  • in the embodiment of the present invention, the movable platform can obtain target image data, where the target image data includes the target image and depth data corresponding to each pixel in the target image, and process the target image data to obtain a semantic confidence feature map of the target image data, so as to determine the number of target objects with the same semantic category in the target image data according to the confidence feature map.
  • An embodiment of the present invention also provides an unmanned aerial vehicle, including: a fuselage; a power system provided on the fuselage for providing flight power, the power system including propellers and motors for driving the propellers to rotate; a camera device for capturing target image data; and the image processing device described with reference to FIG. 7 or FIG. 8.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores a computer program.
  • when the computer program is executed by a processor, it implements the image processing method described in the embodiment corresponding to FIG. 2 or FIG. 5 of the present invention, and can also implement the image processing device of the corresponding embodiment of the present invention described in FIG. 7 or FIG. 8, which will not be repeated here.
  • the computer-readable storage medium may be the internal storage unit of the device described in any of the foregoing embodiments, such as the hard disk or memory of the device.
  • the computer-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk equipped on the device, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, and the like.
  • the computer-readable storage medium may also include both an internal storage unit of the device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • those of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

An embodiment of the present invention provides an image processing method, a device, a movable platform, an unmanned aerial vehicle, and a storage medium. The method includes: obtaining target image data, the target image data including a target image and depth data corresponding to each pixel in the target image; processing the target image data to obtain a semantic confidence feature map of the target image data; and determining the position of a target region in the target image data according to the confidence feature map. In this way, the confidence feature map can be generated automatically, so that the target region in the target image data is identified quickly and efficiently.

Description

图像处理方法、设备、可移动平台、无人机及存储介质 技术领域
本发明涉及控制技术领域,尤其涉及一种图像处理方法、设备、可移动平台、无人机及存储介质。
背景技术
目前可移动平台(如无人机、无人车、无人船)的发展越来越重要,发展速度越来越快。可移动平台的应用非常多,其中,以挂载有拍摄装置的无人机为例,无人机在航拍技术上的应用尤为广泛。然而,传统的无人机的航拍技术在拍摄过程中无法自动识别所拍摄图像中某图像区域中目标对象的数量,需依靠人工来判断拍摄图像中该图像区域中目标对象的数量,这种方法操作繁琐,效率较低。因此如何更高效、快速地识别目标对象成为研究的重点。
发明内容
本发明实施例提供了一种图像处理方法、设备、可移动平台、无人机及存储介质,可高效、快速地自动识别出目标图像数据中的目标区域。
第一方面,本发明实施例提供了一种图像处理方法,包括:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
第二方面,本发明实施例提供了另一种图像处理方法,包括:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
第三方面,本发明实施例提供了一种图像处理设备,包括存储器和处理器;
所述存储器,用于存储程序指令;
所述处理器,执行所述存储器存储的程序指令,当程序指令被执行时,所述处理器用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
第四方面,本发明实施例提供了另一种图像处理设备,包括存储器和处理器;
所述存储器,用于存储程序指令;
所述处理器,执行所述存储器存储的程序指令,当程序指令被执行时,所述处理器用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的点数据的数量。
第五方面,本发明实施例提供了一种可移动平台,包括:存储器和处理器;
所述存储器,用于存储程序指令;
所述处理器,执行所述存储器存储的程序指令,当程序指令被执行时,所述处理器用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
第六方面,本发明实施例提供了另一种可移动平台,包括:存储器和处理 器;
所述存储器,用于存储程序指令;
所述处理器,执行所述存储器存储的程序指令,当程序指令被执行时,所述处理器用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
第七方面,本发明实施例提供了一种无人机,所述无人机包括:机身;设置于所述机身上的动力系统,用于提供飞行动力;如上述第三方面或第四方面所述的图像处理设备。
第八方面,本发明实施例提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时实现如上述第一方面或第二方面所述的图像处理方法。
本发明实施例中,图像处理设备可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,以及根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。通过这种实施方式,实现了快速、高效地识别目标图像数据中的目标区域,从而提高图像处理效率。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本发明实施例提供的一种图像处理系统的结构示意图;
图2是本发明实施例提供的一种图像处理方法的流程示意图;
图3a是本发明实施例提供的一种置信度特征图的示意图;
图3b是本发明实施例提供的一种目标图像数据的界面示意图;
图4是本发明实施例提供的一种标记目标对象的示意图;
图5是本发明实施例提供的另一种图像处理方法的流程示意图;
图6是本发明实施例提供的目标场景三维重建方法一实施例中参考帧选取的示意图;
图7是本发明实施例提供的一种图像处理设备的结构示意图;
图8是本发明实施例提供的另一种图像处理设备的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
下面结合附图,对本发明的一些实施方式作详细说明。在不冲突的情况下,下述的实施例及实施例中的特征可以相互组合。
在本发明实施例提供的图像处理方法可以由一种图像处理系统执行,所述图像处理系统包括图像处理设备和可移动平台。在某些实施例中,所述图像处理设备可以设置在配置有负载(如拍摄装置、红外探测装置、测绘仪等)的可移动平台(如无人机)上。在其他实施例中,所述图像处理设备还可以设置在其他可移动设备上,如能够自主移动的机器人、无人车、无人船等可移动设备。在某些实施例中,所述图像处理设备可以是可移动平台的部件,即所述可移动平台包括所述图像处理设备;在其他实施例中,所述图像处理设备还可以在空间上独立于可移动平台。下面结合附图对本发明实施例中的图像处理系统进行举例说明。
具体请参见图1,图1是本发明实施例提供的一种图像处理系统的结构示意图,如图1所示的图像处理系统包括:图像处理设备11和可移动平台12,所述图像处理设备11可以为可移动平台12的控制终端,具体地可以为遥控器、智能手机、平板电脑、膝上型电脑、地面站、穿戴式设备(手表、手环)中的任意一种或多种。所述可移动平台12可以包括能够自主移动的机器人、无人车、无人船等可移动设备。可移动平台12包括动力系统121,动力系统用于 为可移动平台12提供移动的动力,可移动平台12还可以包括摄像装置122,摄像装置122通过设置于可移动平台12的主体上。摄像装置122用于在可移动平台32的移动过程中进行图像或视频拍摄,包括但不限于多光谱成像仪、高光谱成像仪、可见光相机及红外相机等。
本发明实施例中,所述图像处理系统中图像处理设备11可以通过挂载在所述可移动平台12上的摄像装置122获取目标图像数据,并对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,从而根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。在某些实施例中,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据。
下面结合附图对应用于可移动平台的图像处理方法的进行示意性说明。
请参见图2,图2是本发明实施例提供的一种图像处理方法的流程示意图,所述方法可以由图像处理设备执行,其中,所述图像处理设备的具体解释如前所述。具体地,本发明实施例的所述方法包括如下步骤。
S201:获取目标图像数据。
本发明实施例中,图像处理设备可以获取目标图像数据,在某些实施例中,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据。在某些实施例中,所述目标图像数据可以通过挂载在可移动平台上的摄像装置拍摄得到的目标图像和景深数据得到,在某些实施例中,所述目标图像包括但不限于俯视图视角下的图像。
在某些实施例中,所述目标图像数据包括彩色图像;或者,所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,所述目标图像数据包括正射影像;或者,所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
S202:对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图。
本发明实施例中,图像处理设备可以对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图。
在一个实施例中,图像处理设备在对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,可以基于语义识别模型处理所述目 标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,并根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据,从而根据所述包含语义类别和语义的置信度的点云数据,生成如图3a所示的置信度特征图,图3a是本发明实施例提供的一种置信度特征图的示意图,如图3a所示,所述置信度特征图包括包含语义类别和语义的置信度的点云数据。
具体可以图3b为例进行说明,图3b是本发明实施例提供的一种目标图像数据的界面示意图,所述图像处理设备可以根据如图3b所示的目标图像数据31的位置数据、高度数据、每个像素点的语义类别和语义置信度,生成如图3a所示包含语义类别和语义的置信度的点云数据。例如,假设所述目标图像数据31的位置数据为m、高度数据为h、所述目标图像数据31中n个像素点对应的语义类别和语义置信度分别为K1,K2,…,Kn,则图像处理设备可以根据所述位置数据为m、高度数据为h、所述目标图像数据31中n个像素点对应的语义类别和语义置信度分别为K1,K2,…,Kn,生成如图3a所示包含语义类别和语义的置信度的点云数据。
在某些实施例中,所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别,所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。在某些实施例中,所述置信度特征图中的点云数据是由高斯分布生成的多个圆形组成,通过高斯分布生成的置信度特征图,提高了置信度特征图的稳定性。当然,本实施例不对点云数据与目标图像数据中像素点的对应关系进行限定,点云数据可以与图像数据中的像素点呈一一对应关系;每个点云数据也可以对应多个像素点,其语义由多个像素点的聚类结果决定。
在某些实施例中,所述语义识别模型可以为卷积神经网络(Convolutional Neural Network,CNN)模型,所述CNN模型的架构主要包括输入层、卷积层、激励层、池化层。在神经网络模型中,可以包括多个子网,所述子网被布置在从最低到最高的序列中,并且,通过所述序列中的子网中的每一个来处理输入的图像数据。序列中的子网包括多个模块子网以及可选地包括一个或多个其它子网,所述其它子网均由一个或者多个常规神经网络层组成,例如最大池化层、 卷积层、全连接层、正则化层等。每个子网接收由序列中的前子网生成的在前输出表示;通过直通卷积来处理所述在前输出表示,以生成直通输出;通过神经网络层的一个或者多个群组来处理在前输出表示,以生成一个或者多个群组,连接所述直通输出和所述群组输出,以生成所述模块子网的输出表示。
在某些实施例中,所述输入层用于输入图像数据,所述卷积层用于对所述图像数据进行运算,所述激励层用于对卷积层输出的结果做非线性映射,所述池化层用于压缩数据和参数的量,减少过拟合,提高性能。本方案采用进行语义标注后的样本图像数据作为输入数据,输入CNN模型的输入层,经过卷积层计算之后,通过多个通道输出不同语义的置信度,例如,农田通道(置信度)、果树通道(置信度)、河流通道(置信度)等。作为CNN的输出结果,可以表示为一个张量数值,例如,对于某一个像素点{经纬度,高度,K1,K2,…,Kn},该张量数值表示了像素点的三维点云信息和n个通道的语义信息,其中,K1,K2,…,Kn表示置信度,张量数据中置信度最大的语义通道被作为该像素点的语义类别。例如,第i个语义通道的置信度Ki=0.8,是最高的置信度,则该第i个通道对应的语义类别被作为该像素点的语义类别。在某些实施例中,景深数据的加入,为可移动平台获得的RGB像素信息增加了一个维度的信息,利用RGB数据集合景深数据,能够优化训练的过程,并且大大提高训练模型对地面物体识别的准确度。景深数据是通过双目相机拍摄获得的数据,可以是通过单目相机在飞机飞行过程中对一系列连续图像帧处理获得的数据计算得到。
在某些实施例中,所述不同置信度的多个语义类别是通过语义识别模型识别之后从多个通道输出得到的;在某些实施例中,与一般神经网络输出的结果不同的是,在神经网络的输出通道后增加分段输出函数,若通道置信度结果为负值,则将通道置信度结果置为零,保证神经网络输出的置信度为正浮点数据。使用正浮点数据作为语义通道的置信度,可以直接通过两个像素点数据的减法运算获得较大的置信度,由于张量的减法运算只需要对数组对应的数值内容进行减法操作,其运算量非常小,在同等算力的情况下,可以大大提高运算速度。尤其适合高精度地图绘制过程中,由于高精度地图需要大量运算,而造成的算力紧张问题。
在某些实施例中,所述目标图像数据对应的位置数据包括所述目标图像的 经度和纬度,所述目标图像数据对应的高度数据为所述目标图像的高度。在某些实施例中,所述目标图像数据的位置数据和高度数据可以根据全球定位系统(Global Positioning System,GPS)信息得到,或者,所述目标图像数据的位置数据和高度数据可以根据载波相位差分技术(Real-time kinematic,RTK)计算得到。在某些实施例中,所述目标图像数据对应的地物数据可以根据所述目标图像数据的位置数据和高度数据计算得到。通过这种实施方式可以生成目标图像数据的语义的置信度特征图,以便可移动平台在拍摄应用中可以根据置信度特征图上的语义类别,确定地物类别。
在一个实施例中,所述图像处理设备对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,可以根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理,并根据后处理的结果更新所述置信度特征图。
在一个实施例中,所述图像处理设备根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,可以检测所述置信度特征图中每个点数据的语义的置信度,并对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理,以使所述图像处理设备在根据后处理的结果更新所述置信度特征图时,可以基于所述删除处理后的点云数据,更新所述置信度特征图。
例如,假设预设置信度阈值为0.6,所述图像处理设备根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,可以检测所述置信度特征图中每个点数据的语义的置信度,并对所述置信度特征图中语义的置信度小于或等于预设置信度阈值0.6的点数据进行删除处理,以使所述图像处理设备在根据后处理的结果更新所述置信度特征图时,可以基于所述删除处理后的点云数据,更新所述置信度特征图。
在某些实施例中,所述置信度特征图中的点云数据可以用不同形状的点数据来标记不同的语义类别,如用圆形的点数据来标记树、用方形的点数据来标记人、用三角形的点数据来标记水稻等,以便于对不同语义类别的目标对象进行区分。在某些实施例中,所述置信度特征图中的点云数据还可以用不同颜色的点数据来标记不同的语义类别,如用绿色圆形的点数据来标记树、用黄色圆形的点数据来标记人、用红色圆形的点数据来标记水稻等。当然,所述置信度 特征图中的点云数据还可以用其他的标记方式来实现,本发明实施例不做具体限定。
在一个实施例中,所述图像处理设备在基于语义识别模型处理所述目标图像数据之前,可以获取样本数据库,所述样本数据库包括样本图像数据,并根据预设的语义识别算法生成初始语义识别模型,从而基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型。在某些实施例中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
在某些实施例中,所述样本图像数据包括样本图像和样本图像中各像素点对应的景深数据,所述样本图像可以是RGB图像,所述景深数据可以通过深度图像获取。所述图像处理设备可以根据预设的语义识别算法生成初始语义识别模型,并将所述包括语义标注信息的样本图像数据作为输入数据,输入该初始语义识别模型中进行训练,得到训练结果,其中,所述训练结果包括所述样本图像中每个像素点的语义以及各语义的置信度。在得到训练结果之后,所述图像处理设备可以将所述训练结果中样本图像中每个像素点的语义与所述样本图像的语义标注信息进行对比,如果不匹配,则调整所述初始语义识别模型中的参数,直至训练结果样本图像中每个像素点的语义与所述语义标注信息相匹配时,生成所述语义识别模型。
在一些实施例中,所述样本图像可以包括彩色图像或正射影像;在某些实施例中,所述正射影像是一种经过几何纠正(比如使之拥有统一的比例尺)的俯视图像,与没有纠正过的俯视图像不同的是,正射影像量可用于测实际距离,因为它是通过几何纠正后得到的地球表面的真实描述,所述正射影像具有信息量丰富、直观、可量测的特性。在某些实施例中,所述彩色图像可以是根据RGB值确定的RGB图像。在某些实施例中,所述景深数据反映所述摄像装置到被拍摄物的距离。
在一个实施例中,所述图像处理设备在基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型时,可以调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果,若所 述识别结果与所述样本图像数据包括的语义标注信息相匹配,则可以对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
S203:根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
本发明实施例中,图像处理设备可以根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
在一个实施例中,所述图像处理设备在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置时,可以获取所述置信度特征图中每个点数据的位置数据和语义类别,并根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域,从而根据所述置信度特征图中具有相同语义类别的图像区域,确定所述目标图像数据中地面上的目标区域的位置数据。
例如,根据图3a所示的置信度特征图可以确定出如图4所示的目标图像数据中地面上的目标对象,并确定出所述地面上目标对象对应的目标区域的位置数据,图4是本发明实施例提供的一种标记目标对象的示意图,如图4所示标记的目标对象的语义类别为树。当然在其他实施例中,所述目标对象的语义类别还可以包括人、电线杆、农作物等,本发明实施例不做具体限定。
在一个实施例中,所述图像处理设备在根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域时,可以根据所述置信度特征图上的语义类别,确定所述置信度特征图上具有连续相同语义类别的图像区域,并对所述具有连续相同语义类别的各图像区域进行边沿处理操作,以得到所述点云地图上不同语义类别的各图像区域。
在一个实施例中,所述图像处理设备在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置之后,可以根据所述目标图像数据中目标区域的位置和语义类别,规划航线,并控制可移动平台按照所述航线移动。通过这种实施方式,可以控制可移动平台按照所述航线移动,并执行与所述目标区域的语义类别对应的任务。
在一些实施例中,所述图像处理设备在根据所述目标图像数据中目标区域的位置和语义类别,规划航线时,可以根据所述置信度特征图上具有不同语义类别的图像区域,对不同语义类别的图像区域进行分类,并根据不同类别的图像区域,规划各类别的图像区域对应的航线。
在一些实施例中,所述图像处理设备在控制所述可移动平台按照所述航线移动的过程中,可以判断所述可移动平台的当前位置在所述置信度特征图中所对应的语义类别是否与目标任务的语义类别相匹配。如果判断结果为是,则控制所述可移动平台执行所述目标任务,如果判断结果为否,则控制所述可移动平台停止执行所述目标任务。在某些实施例中,所述可移动平台包括但不限于无人机或者按照航线自动行驶的无人车。
在一些实施例中,在控制所述可移动平台按照所述航线移动的过程中,控制所述可移动平台在所述航线中的标记点停留,以执行与目标任务对应的预定操作。在某些实施例中,所述预定操作包括农药喷洒操作,所述农药喷洒操作包括围绕指定点进行环形喷洒的操作。
例如,假设所述可移动平台为无人机,所述目标区域的语义类别为树,则所述图像处理设备可以根据所述目标图像数据中目标区域的位置和语义类别,规划出树所在区域的航线。当所述无人机需要执行喷洒农药的任务时,所述图像处理设备可以控制无人机按照所述航线移动,以及控制无人机在按照所述航线移动的过程中执行喷洒农药的任务。
本发明实施例中,图像处理设备可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,以及根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。通过这种实施方式,实现了快速、高效地识别目标图像数据中的目标区域的位置,从而提高了对图像区域的定位效率。
请参见图5,图5是本发明实施例提供的另一种图像处理方法的流程示意图,所述方法可以由图像处理设备执行,其中,图像处理设备的具体解释如前所述。本发明实施例与上述图2所述实施例的区别在于,本发明实施例主要是对目标图像数据中具有相同语义类别的目标对象的计数进行详细的说明。目标对象可以是果树、建筑、人、车辆等等在目标图像中可以被识别和计数的物体。
S501:获取目标图像数据。
本发明实施例中,图像处理设备可以获取目标图像数据,在某些实施例中,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数 据。
在某些实施例中,所述目标图像数据包括彩色图像;或者,所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,所述目标图像数据包括正射影像;或者,所述目标图像数据包括正射影像和所述正射影像对应的景深数据。具体实施例如前所述,此处不再赘述。
S502:对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图。
本发明实施例中,图像处理设备可以对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图。
在一个实施例中,所述图像处理设备可以基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,并根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据,从而根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。具体实施例如前所述,此处不再赘述。
在某些实施例中,所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别;所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。
在一个实施例中,所述图像处理设备对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,可以根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理,并根据后处理的结果更新所述置信度特征图。具体实施例如前所述,此处不再赘述。
在一个实施例中,所述图像处理设备在根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,可以检测所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理,并基于所述删除处理后的点云数据,更新所述置信度特征图。具体实施例如前所述,此处不再赘述。
在一个实施例中,所述图像处理设备在基于语义识别模型处理所述目标图像数据之前,可以获取样本数据库,所述样本数据库包括样本图像数据;并根据预设的语义识别算法生成初始语义识别模型,以及基于所述样本数据库中的 各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型。其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。具体实施例如前所述,此处不再赘述。
在一个实施例中,所述图像处理设备可以调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果,若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则可以对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。具体实施例如前所述,此处不再赘述。
S503:根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
本发明实施例中,图像处理设备可以根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
在一个实施例中,图像处理设备在根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,可以根据所述置信度特征图上各点数据的语义类别,对所述置信度特征图上不同语义类别的点数据进行分类,并计算所述置信度特征图上不同类别的点数据的数量,从而确定所述置信度特征图上不同类别的点数据的数量为所述目标图像数据上具有相同语义的目标对象的数量。
具体可以图3a为例进行说明,假设所述图像处理设备根据如图3a所示的置信度特征图上各点数据的语义类别,确定出所述置信度特征图上的语义类别为均为树,如果所述图像处理设备计算所述置信度特征图上语义类别为树的点数据的数量为300,从而可以确定所述目标图像数据上树的数量为300。
在一个实施例中,所述置信度特征图中的点数据可以用不同的形状进行标记,图像处理设备在根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,可以根据所述置信度特征图上各点数据的形状确定不同的语义类别。假设所述置信度特征图中包括圆形点数据的图像区域和方形点数据的图像区域,且圆形代表树,方形代表人,则所述图像处理设备可以通过计算所述置信度特征图上圆形点数据的数量来确定树的数量,并通过计算所述置信度特征图上方形点数据的数量来确定人的数量。
在一个实施例中,所述置信度特征图中的点数据可以用不同的颜色进行标记,图像处理设备在根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,可以根据所述置信度特征图上各点数据的颜色确定不同的语义类别。假设所述置信度特征图中包括红色圆形点数据的图像区域和黄色圆形点数据的图像区域,且红色圆形代表树,黄色圆形代表人,则所述图像处理设备可以通过计算所述置信度特征图上红色圆形点数据的数量来确定树的数量,并通过计算所述置信度特征图上黄色圆形点数据的数量来确定人的数量。
在一些实施例中,图像处理设备可以根据目标图像数据对目标图像数据中的特征点进行跟踪,在某些实施例中,所述图像处理设备可以根据所述特征点确定点云数据。其中,以无人机为例的一种实现方式可以是:
获取第一图像帧在世界坐标系中的第一位姿信息,所述第一位姿信息包括:第一实时动态RTK信息和第一云台角信息;
根据所述第一位姿信息,估计第二图像帧在世界坐标系中的第二位姿信息,所述第二位姿信息包括:第二RTK信息和第二云台角信息;
根据所述第一位姿信息和所述第二位姿信息对所述第一图像帧的特征信息和所述第二图像帧的特征信息进行特征匹配;
根据特征匹配结果,进行特征点的跟踪;
其中,所述第一图像帧和所述第二图像帧为所述图像序列中相邻的两帧。
现有基于视觉的方案中通常采用匀速运动模型对相机下一帧的位姿进行估计,由于无人机机动灵敏,其运行通常不符合匀速运动模型,因此基于匀速运动模型估计的位姿将极不准确,进而导致特征点的跟踪数量和精度降低。
为了获得准确的位姿估计,本实施例中根据传感器提供的第一图像帧的RTK信息和云台角信息,对第二图像帧的位姿进行估计。由于采用了传感器提供的准确的RTK信息和云台角信息,因此估计出的第二图像帧的位姿信息的准确度将大幅提升,准确的位姿信息提高了特征匹配的准确度和速度。
本实施例中根据第一位姿信息和第二位姿信息对第一图像帧的特征信息和第二图像帧的特征信息进行特征匹配,具体可以包括:获取第一图像帧和第二图像帧的特征,针对第一图像帧的特征,根据第一位姿信息和第二位姿信息,在第二图像帧中确定相应的搜索范围,进行特征匹配。由于获取了准确的位姿 信息,不仅可以确定准确的搜索范围,而且可以大大缩小搜索范围,因此不仅提高了特征匹配的准确率而且提高了特征匹配的速度。
由于无人机飞行速度较快,因此相邻两帧图像之间的重叠率较低,导致特征点跟踪效果差。本实施例中在特征跟踪时,加入对上一帧是否为关键帧的判断,若为关键帧,则用关键帧的特征信息替换上一帧原始的特征信息。由于关键帧有额外的三维点云生成操作,可以在限定的时间内最大限度的利用重叠区域图像生成的可用三维点云,使得跟踪的有效特征点数量得到提升。
本实施例中在完成特征跟踪之后,需要利用所有的特征点匹配对进行位姿解算。本实施例在位姿解算中加入传感器提供的RTK信息和云台角信息,使得位姿解算精度更高且不易受到误匹配的干扰。解决了现有技术中,基于视觉的方案中,当存在误匹配时,导致位姿解算精度降低甚至出现错误的问题。
在上述实施例的基础上,为了进一步提高目标场景三维重建的准确性,本实施例提供的目标场景三维重建方法,在融合关键帧的三维点云之前,还可以包括:根据所述关键帧对应的RTK信息和云台角信息,采用非线性优化的方式对所述关键帧的位姿信息及三维点云的位置进行优化。
本实施例对于非线性优化所采用的具体算法不做限制,例如可以采用高斯牛顿法、裂纹伯格-马夸尔特方法等。
本实施例中在根据关键帧及其三维点云构建全局一致性的地图之前,根据RTK信息和云台角信息进行优化处理。具体可以包括:
首先维护一个局部地图,该局部地图可以由当前帧、当前帧的共视关键帧及它们所能观测到的点云组成。本实施例在利用非线性优化调整局部地图时,加入每一个参与优化的关键帧对应的RTK信息与云台角信息,使得关键帧的位姿解算及三维点云的位置更加精确。
本实施例通过在非线性优化过程中,引入更将精确的传感器信息,即RTK信息与云台角信息,优化后的代价函数不仅考虑了重投影误差,而且考虑了当前估计的位姿与传感器提供的位姿之间的差距,采用优化后的代价函数可以得到最优的位姿估计。解决了现有技术中仅考虑视觉重投影误差,所带了的稳定性差的问题。
可选的,在实时测量结束后,本实施例还会对所有保留下的关键帧和三维点云进行全局的优化。可以理解的是,在该全局优化中加入RTK信息与云台角 信息,使得最终输出的结果更加精确。
在上一实施例的基础上,为了获得更加精准的关键帧的三维点云关键帧,以提高目标场景三维重建的准确度,本实施例提供的目标场景三维重建方法中,可以在所述图像序列中为所述关键帧选取参考帧,然后根据所选取的参考帧,确定所述关键帧的深度图,根据关键帧的深度图获取关键帧的三维点云。参考帧至少可以包括第一图像帧和第二图像帧。其中,第一图像帧在时序上位于所述关键帧之前,第二图像帧在时序上位于所述关键帧之后。
无人机航拍时,可以沿着规划的航线飞行。当无人机沿着一条航线飞行时,当前图像帧中存在相当大的一部分区域不存在于之前拍摄的图像帧中。也就是说,若参考帧中仅包括当前图像帧之前拍摄的图像帧,根据参考帧确定当前图像帧的深度图时,会存在相当大的一部分区域的视差无解,深度图中必然会存在大片的无效区域。
因此,为了避免关键帧中的区域在参考帧中无相应的匹配区域,而导致该区域对应的深度图无效,本实施例中的参考帧既包括在时序上位于参考帧之前的第一图像帧,也包括在时序上位于参考帧之后的第二图像帧,提高了关键帧与参考帧之间的重叠率,减小了视差无解的区域,进而提高了基于参考帧获得的关键帧的深度图的准确性。
可选的,若关键帧为第N帧,则第一图像帧为第N-1帧,第二图像帧为第N+1帧,即参考帧包括与关键帧相邻的前后两帧。举例来说,若无人机在航拍时,相邻两帧之间的重叠率为70%,若参考帧仅包括关键帧之前的图像帧,则关键帧中至少有30%区域的视差无解。而本实施例提供的参考帧的选取策略,使得关键帧中的全部区域均可以在参考帧中找到与之相匹配的区域,避免了视差无解现象的产生,提高了关键帧的深度图的准确性。
可选的,若关键帧为第N帧,则第一图像帧可以包括第N帧之前预设数量的图像帧,第二图像帧可以包括第N帧之后预设数量的图像帧。
可选的,若关键帧为第N帧,则第一图像帧可以为第N帧之前预设数量的图像帧中的一帧,第二图像帧可以为第N帧之后预设数量的图像帧中的一帧。
在上述任一实施例的基础上,为了提高关键帧的深度图的可靠性,以提高目标场景三维重建的可靠性,本实施例提供的目标场景三维重建方法中,参考帧至少可以包括第三图像帧。其中,第三图像帧与关键帧的极线方向不平行。
本实施例中的极线为对极几何中的极线,即极平面与图像之间的交线。第三图像帧与关键帧的极线方向不平行,也就是说,极平面与第三图像帧的第一交线,与该极平面与关键帧的第二交线,不平行。
当关键帧中存在重复纹理时,若关键帧与参考帧的极线方向平行,则会出现沿着平行极线分布的重复纹理,将会降低该区域对应的深度图的可靠性。因此,本实施例通过选取与关键帧的极线方向不平行的第三图像帧作为参考帧,避免了出现重复纹理沿着平行极线分布的现象,提高了深度图的可靠性。
可选的,第三图像帧可以包括关键帧相邻航带中与关键帧存在重叠像素的图像帧。
可选的,第三图像帧可以为关键帧相邻航带中与关键帧的重叠率最高的图像帧。
下面通过一个具体的示例来说明本发明实施例提供的参考帧的选取方法。图6是本发明实施例提供的目标场景三维重建方法一实施例中参考帧选取的示意图。如图6所示,其中的实线用于表示无人机的飞行航线,航线覆盖了目标场景,箭头表示无人机的飞行方向,飞行航线上的黑色圆圈和黑色正方形表示无人机的拍摄装置在该位置进行拍摄,即黑色圆圈和黑色正方形对应目标场景的一个图像帧。当无人机沿着飞行航线飞行时,通过无人机上搭载的拍摄装置,如单目相机,便可以获取到目标场景的图像序列,包含了在时序上连续的多个图像帧。图6中的M-1、M、M+1、N-1、N、N+1表示图像帧的帧号,N和M为自然数,本实施例对N和M的具体取值不做限制。
若黑色正方形表示的第N帧为关键帧,在一种可能的实现方式中,参考帧可以包括图中所示的第N-1帧和第N+1帧。
若黑色正方形表示的第N帧为关键帧,在又一种可能的实现方式中,参考帧可以包括图中所示的第M帧。
若黑色正方形表示的第N帧为关键帧,在另一种可能的实现方式中,参考帧可以包括图中所示的第M帧、第N-1帧和第N+1帧,即图3中虚线圆圈中包括的图像帧。
可以理解的是,参考帧还可以包括更多的图像帧,例如还可以包括第M-1帧、第M+1帧、第N-2帧等。在具体实现时,可以综合考虑关键帧与参考帧的重叠率以及计算速度,进行选取。
在一些实施例中,基于参考帧获得关键帧的深度图的一种实现方式可以是:根据所述关键帧和所述参考帧之间的像差,获得所述关键帧的深度图。
本实施例中可以根据同一对象在关键帧和参考帧中的像差,获得关键帧的深度图。
在一些实施例中,基于所述图像序列获得所述关键帧的三维点云的一种实现方式可以是:根据所述图像序列,获得所述关键帧的深度图;根据所述关键帧的深度图,获得所述关键帧的三维点云。
在一些实施例中,根据所述图像序列,获得所述关键帧的深度图的一种实现方式可以是:根据所述图像序列,确定所述关键帧对应的匹配代价;根据所述关键帧对应的匹配代价,确定所述关键帧的深度图。
本实施例中可以通过对图像序列与关键帧中的像素点进行匹配,以确定关键帧对应的匹配代价。在确定了关键帧对应的匹配代价之后,可以进行匹配代价聚合,然后确定视差,根据视差与深度之间的对应关系,确定关键帧的深度图。可选的,在确定视差之后,还可以进行视差优化,视差加强。根据优化以及加强之后的视差,确定关键帧的深度图。
无人机的飞行高度通常在100米左右,且无人机通常都是垂直朝下进行拍摄的,由于地面高低起伏,对阳光的反射具有差异性,无人机拍摄的图像具有不可忽视的光照变化,光照变化将降低目标场景三维重建的准确性。
在上述任一实施例的基础上,为了提高目标场景三维重建对于光照的鲁棒性,本实施例提供的目标场景三维重建方法中,根据图像序列,确定关键帧对应的匹配代价,可以包括:根据图像序列,确定关键帧对应的第一类型匹配代价和第二类型匹配代价;确定关键帧对应的匹配代价等于第一类型匹配代价和第二类型匹配代价的加权和。
本实施例中在计算匹配代价时,通过将第一类型匹配代价与第二类型匹配代价进行融合,相较于仅采用单一类型匹配代价,提高了匹配代价对于光照的鲁棒性,进而减少了光照变化对于三维重建的影响,提高了三维重建的准确性。本实施例中第一类型匹配代价和第二类型匹配代价的加权系数可以根据具体需要进行设置,本实施例对此不做限制。
可选的,第一类型匹配代价可以基于零均值归一化互相关(Zero-based Normalized Cross Correlation,ZNCC)确定。基于ZNCC可以精确的度量关键 帧与参考帧之间的相似性。
可选的,第二类型匹配代价可以基于光照不变特征确定。本实施例中,可以提取无人机所采集的图像帧中的光照不变特征,例如局部二值模式(Local Binary Patterns,LBP),census序列等,然后可以基于光照不变特征确定第二类型匹配代价。
本实施例中的census序列可以通过如下方式确定:在图像帧中选取任一点,以该点为中心划出一个例如3×3的矩形,矩形中除中心点之外的每一点都与中心点进行比较,灰度值小于中心点即记为1,灰度值大于中心点的则记为0,以所得长度为8的只有0和1的序列作为该中心点的census序列,即中心像素的灰度值被census序列替换。
经过census变换后,可以采用汉明距离确定关键帧的第二类型匹配代价。
例如,关键帧对应的匹配代价可以等于ZNCC和census两种匹配代价的加权和。
在一些实施例中,根据关键帧对应的匹配代价,确定关键帧的深度图的一种实现方式可以是:将关键帧划分成多个图像块;根据图像序列,确定每一个图像块对应的匹配代价;根据每一个所述图像块对应的匹配代价,确定关键帧对应的匹配代价。
本实施例中可以采用如下方式中的一种或者多种将关键帧划分为多个图像块:
(1)采用聚类的方式,将关键帧划分成多个图像块。本实施例中例如可以根据关键帧的色彩信息和/或纹理信息,采用聚类的方式,将关键帧划分成多个图像块。
(2)将关键帧均匀划分成多个图像块。本实施例中例如可以预先设置图像块的数量,然后根据预先设置的图像块的数量,对关键帧进行划分。
(3)将关键帧划分成预设大小的多个图像块。例如可以预先设置图像块的大小,然后根据预先设置的图像块的大小,对关键帧进行划分。
可选的,在将关键帧划分成多个图像块之后,可以根据图像序列,并行确定每一个图像块对应的匹配代价。本实施例中例如可以采用软件和/或硬件的方式并行确定每一个图像块对应的匹配代价。具体的,例如可以采用多线程并行确定每一个图像块对应的匹配代价,和/或,可以采用图形处理器(Graphics  Processing Unit,GPU)并行确定每一个图像块对应的匹配代价。
本实施例提供的目标场景三维重建方法,在上述实施例的基础上,通过将关键帧划分成多个图像块,根据图像序列,并行确定每一个图像块对应的匹配代价,然后根据每一个图像块对应的匹配代价,确定关键帧对应的匹配代价,提高了匹配代价的计算速度,进而提高了目标场景三维重建的实时性。
深度采样次数可以根据深度范围和精度确定,深度采样次数与深度范围正相关,与精度负相关。举例来说,若深度范围为50米,精度要求为0.1米,则深度采样次数可以为500。
在确定关键帧的匹配代价时,可以采用预设深度采样次数,也可以采用即时定位与地图构建(Simultaneous Localization and Mapping,SLAM)恢复出关键帧中一些稀疏的三维点,然后根据这些稀疏的三维点确定整个关键帧的深度范围,然后根据整个关键帧的深度范围以及精度要求,确定深度采样次数。若深度采样次数为N,则需要针对关键帧中每一个像素点计算N次匹配代价。对于640*480像素大小的关键帧,需要计算640*480*N次匹配代价。
在上述任一实施例的基础上,为了进一步提高处理速度,提高目标场景三维重建的实时性,本实施例提供的目标场景三维重建方法中,根据图像序列,确定每一个图像块对应的匹配代价,可以包括:根据每一个图像块中的稀疏点确定该图像块的深度采样次数;根据图像序列以及每一个图像块的深度采样次数,确定每一个图像块对应的匹配代价。
需要说明的是,当无人机垂直朝下进行拍摄时,关键帧中可以包含多种拍摄对象,例如行人、汽车、树木、高楼等,因此整个关键帧的深度范围比较大,在预设精度要求下,深度采样次数较大。然而关键帧中各个图像块对应的深度范围是比较小的,比如当一个图像块中仅包括行人时,该图像块对应的深度范围将远远小于整个关键帧的深度范围,在相同精度要求下,可以大幅减小深度采样次数。也就是说,在相同精度要求下,关键帧中图像块的深度采样次数必定小于等于关键帧整体的深度采样次数。
本发明实施例中,图像处理设备可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,从而根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的 数量。通过这种实施方式,实现了基于置信度特征图,自动对目标图像数据上具有相同语义的目标对象进行计数,提高了计数效率。
请参见图7,图7是本发明实施例提供的一种图像处理设备的结构示意图。具体的,所述图像处理设备包括:存储器701、处理器702以及数据接口703。
所述存储器701可以包括易失性存储器(volatile memory);存储器701也可以包括非易失性存储器(non-volatile memory);存储器701还可以包括上述种类的存储器的组合。所述处理器702可以是中央处理器(central processing unit,CPU)。所述处理器702还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路(application-specific integrated circuit,ASIC),可编程逻辑器件(programmable logic device,PLD)或其组合。具体例如可以是复杂可编程逻辑器件(complex programmable logic device,CPLD),现场可编程逻辑门阵列(field-programmable gate array,FPGA)或其任意组合。
进一步地,所述存储器701用于存储程序指令,当程序指令被执行时所述处理器702可以调用存储器701中存储的程序指令,用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
进一步地,所述处理器702对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
进一步地,所述点云数据和所述置信度特征图均包含复数个点数据,每个 点数据包括位置数据、高度数据和不同置信度的多个语义类别。
进一步地,所述处理器702对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
根据后处理的结果更新所述置信度特征图。
进一步地,所述处理器702在根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
检测所述置信度特征图中每个点数据的语义的置信度;
对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
所述根据后处理的结果更新所述置信度特征图,包括:
基于所述删除处理后的点云数据,更新所述置信度特征图。
进一步地,所述处理器702在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置时,具体用于:
获取所述置信度特征图中每个点数据的位置数据和语义类别;
根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域;
根据所述置信度特征图中具有相同语义类别的图像区域,确定所述目标图像数据中目标区域的位置数据。
进一步地,所述处理器702在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置之后,还用于:
根据所述目标图像数据中目标区域的位置和语义类别,规划航线;
控制可移动平台按照所述航线移动。
进一步地,所述处理器702在根据所述目标图像数据中目标区域的位置和语义类别,规划航线时,具体用于:
根据所述置信度特征图上具有不同语义类别的图像区域,对不同语义类别的图像区域进行分类;
根据不同类别的图像区域,规划各类别的图像区域对应的航线。
进一步地,所述处理器702控制可移动平台按照所述航线移动时,具体用 于:
在控制所述可移动平台按照所述航线移动的过程中,判断所述可移动平台的当前位置在所述置信度特征图中所对应的语义类别是否与目标任务的语义类别相匹配;
如果判断结果为是,则控制所述可移动平台执行所述目标任务。
进一步地,所述可移动平台包括无人机或者按照航线自动行驶的无人车。
进一步地,所述处理器还用于:
在控制所述可移动平台按照所述航线移动的过程中,控制所述可移动平台在所述航线中的标记点停留,以执行与目标任务对应的预定操作。
进一步地,所述预定操作包括农药喷洒操作。
进一步地,所述农药喷洒操作包括围绕指定点进行环形喷洒的操作。
进一步地,所述目标图像数据包括彩色图像;或者,
所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
所述目标图像数据包括正射影像;或者,
所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
进一步地,所述处理器702基于语义识别模型处理所述目标图像数据之前,还用于:
获取样本数据库,所述样本数据库包括样本图像数据;
根据预设的语义识别算法生成初始语义识别模型;
基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
进一步地,所述处理器702基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型时,具体用于:
调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
本发明实施例中,图像处理设备可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,以及根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。通过这种实施方式,实现了快速、高效地识别目标图像数据中的目标区域的位置,从而提高了对图像区域的定位效率。
请参见图8,图8是本发明实施例提供的另一种图像处理设备的结构示意图。具体的,所述图像处理设备包括:存储器801、处理器802以及数据接口803。
所述存储器801可以包括易失性存储器(volatile memory);存储器801也可以包括非易失性存储器(non-volatile memory);存储器801还可以包括上述种类的存储器的组合。所述处理器802可以是中央处理器(central processing unit,CPU)。所述处理器802还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路(application-specific integrated circuit,ASIC),可编程逻辑器件(programmable logic device,PLD)或其组合。具体例如可以是复杂可编程逻辑器件(complex programmable logic device,CPLD),现场可编程逻辑门阵列(field-programmable gate array,FPGA)或其任意组合。
进一步地,所述存储器801用于存储程序指令,当程序指令被执行时所述处理器802可以调用存储器801中存储的程序指令,用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
进一步地,所述处理器802根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,具体用于:
根据所述置信度特征图上各点数据的语义类别,对所述置信度特征图上不同语义类别的点数据进行分类;
计算所述置信度特征图上不同类别的点数据的数量;
确定所述置信度特征图上不同类别的点数据的数量为所述目标图像数据上具有相同语义的目标对象的数量。
进一步地,所述处理器802对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
进一步地,所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别;所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。
进一步地,所述处理器802对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
根据后处理的结果更新所述置信度特征图。
进一步地,所述处理器802根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
检测所述置信度特征图中每个点数据的语义的置信度;
对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
所述根据后处理的结果更新所述置信度特征图,包括:
基于所述删除处理后的点云数据,更新所述置信度特征图。
进一步地,所述目标图像数据包括彩色图像;或者,
所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
所述目标图像数据包括正射影像;或者,
所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
进一步地,所述处理器802基于语义识别模型处理所述目标图像数据之前,还用于:
获取样本数据库,所述样本数据库包括样本图像数据;
根据预设的语义识别算法生成初始语义识别模型;
基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
进一步地,所述处理器802基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型,具体用于:
调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
本发明实施例中,图像处理设备可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,从而根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。通过这种实施方式,实现了基于置信度特征图,自动对目标图像数据上具有相同语义的目标对象进行计数,提高了计数效率。
本发明实施例还提供了一种可移动平台,具体的,所述可移动平台包括:动力系统,用于为可移动平台提供移动的动力;存储器和处理器;处理器,用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
进一步地,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
进一步地,所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别。
进一步地,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
根据后处理的结果更新所述置信度特征图。
进一步地,所述处理器在根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
检测所述置信度特征图中每个点数据的语义的置信度;
对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
所述根据后处理的结果更新所述置信度特征图,包括:
基于所述删除处理后的点云数据,更新所述置信度特征图。
进一步地,所述处理器在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置时,具体用于:
获取所述置信度特征图中每个点数据的位置数据和语义类别;
根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域;
根据所述置信度特征图中具有相同语义类别的图像区域,确定所述目标图 像数据中目标区域的位置数据。
进一步地,所述处理器在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置之后,还用于:
根据所述目标图像数据中目标区域的位置和语义类别,规划航线;
控制可移动平台按照所述航线移动。
进一步地,所述处理器在根据所述目标图像数据中目标区域的位置和语义类别,规划航线时,具体用于:
根据所述置信度特征图上具有不同语义类别的图像区域,对不同语义类别的图像区域进行分类;
根据不同类别的图像区域,规划各类别的图像区域对应的航线。
进一步地,所述处理器控制可移动平台按照所述航线移动时,具体用于:
在控制所述可移动平台按照所述航线移动的过程中,判断所述可移动平台的当前位置在所述置信度特征图中所对应的语义类别是否与目标任务的语义类别相匹配;
如果判断结果为是,则控制所述可移动平台执行所述目标任务。
进一步地,所述可移动平台包括无人机或者按照航线自动行驶的无人车。
进一步地,所述处理器还用于:
在控制所述可移动平台按照所述航线移动的过程中,控制所述可移动平台在所述航线中的标记点停留,以执行与目标任务对应的预定操作。
进一步地,所述预定操作包括农药喷洒操作。
进一步地,所述农药喷洒操作包括围绕指定点进行环形喷洒的操作。
进一步地,所述目标图像数据包括彩色图像;或者,
所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
所述目标图像数据包括正射影像;或者,
所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
进一步地,所述处理器基于语义识别模型处理所述目标图像数据之前,还用于:
获取样本数据库,所述样本数据库包括样本图像数据;
根据预设的语义识别算法生成初始语义识别模型;
基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进 行训练优化,得到所述语义识别模型;
其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
进一步地,所述处理器基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型时,具体用于:
调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
本发明实施例中,可移动平台可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,以及根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。通过这种实施方式,实现了快速、高效地识别目标图像数据中的目标区域的位置,从而提高了对图像区域的定位效率。
本发明实施例还提供了另一种可移动平台,具体的,所述可移动平台包括:动力系统,用于为可移动平台提供移动的动力;存储器和处理器;处理器,用于执行如下步骤:
获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
进一步地,所述处理器根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,具体用于:
根据所述置信度特征图上各点数据的语义类别,对所述置信度特征图上不同语义类别的点数据进行分类;
计算所述置信度特征图上不同类别的点数据的数量;
确定所述置信度特征图上不同类别的点数据的数量为所述目标图像数据上具有相同语义的目标对象的数量。
进一步地,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
进一步地,所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别;所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。
进一步地,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
根据后处理的结果更新所述置信度特征图。
进一步地,所述处理器根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
检测所述置信度特征图中每个点数据的语义的置信度;
对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
所述根据后处理的结果更新所述置信度特征图,包括:
基于所述删除处理后的点云数据,更新所述置信度特征图。
进一步地,所述目标图像数据包括彩色图像;或者,
所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
所述目标图像数据包括正射影像;或者,
所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
进一步地,所述处理器基于语义识别模型处理所述目标图像数据之前,还用于:
获取样本数据库,所述样本数据库包括样本图像数据;
根据预设的语义识别算法生成初始语义识别模型;
基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
进一步地,所述处理器基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型,具体用于:
调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
本发明实施例中,可移动平台可以获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据,并对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,从而根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。通过这种实施方式,实现了基于置信度特征图,自动对目标图像数据上具有相同语义的目标对象进行计数,提高了计数效率。
本发明实施例还提供了一种无人机,包括:机身;设置于所述机身上的动力系统,用于提供飞行动力;摄像装置,用于拍摄目标图像数据;所述动力系统包括:桨叶、电机,用于驱动桨叶转动;如图7或图8所述的图像处理设备。
在本发明的实施例中还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现本发明图图2或图5所对应实施例中描述的图像处理方法方式,也可实现图7或图8所述本发明所对应实施例的图像处理设备,在此不再赘述。
所述计算机可读存储介质可以是前述任一项实施例所述的设备的内部存 储单元,例如设备的硬盘或内存。所述计算机可读存储介质也可以是所述设备的外部存储设备,例如所述设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,所述计算机可读存储介质还可以既包括所述设备的内部存储单元也包括外部存储设备。所述计算机可读存储介质用于存储所述计算机程序以及所述设备所需的其他程序和数据。所述计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
以上所揭露的仅为本发明部分实施例而已,当然不能以此来限定本发明之权利范围,因此依本发明权利要求所作的等同变化,仍属本发明所涵盖的范围。

Claims (54)

  1. 一种图像处理方法,其特征在于,包括:
    获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
    对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
    根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
  2. 根据权利要求1所述的方法,其特征在于,对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,包括:
    基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
    根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
    根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
  3. 根据权利要求2所述的方法,其特征在于,
    所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别。
  4. 根据权利要求3所述的方法,其特征在于,所述对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还包括:
    根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
    根据后处理的结果更新所述置信度特征图。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述置信度特征 图中每个点数据的语义的置信度,对所述置信度特征图进行后处理,包括:
    检测所述置信度特征图中每个点数据的语义的置信度;
    对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
    所述根据后处理的结果更新所述置信度特征图,包括:
    基于所述删除处理后的点云数据,更新所述置信度特征图。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述置信度特征图,确定所述目标图像数据中目标区域的位置,包括:
    获取所述置信度特征图中每个点数据的位置数据和语义类别;
    根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域;
    根据所述置信度特征图中具有相同语义类别的图像区域,确定所述目标图像数据中目标区域的位置数据。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述置信度特征图,确定所述目标图像数据中目标区域的位置之后,还包括:
    根据所述目标图像数据中目标区域的位置和语义类别,规划航线;
    控制可移动平台按照所述航线移动。
  8. 根据权利要求6所述的方法,其特征在于,所述根据所述目标图像数据中目标区域的位置和语义类别,规划航线,包括:
    根据所述置信度特征图上具有不同语义类别的图像区域,对不同语义类别的图像区域进行分类;
    根据不同类别的图像区域,规划各类别的图像区域对应的航线。
  9. 根据权利要求7所述的方法,其特征在于,所述控制可移动平台按照所述航线移动,包括:
    在控制所述可移动平台按照所述航线移动的过程中,判断所述可移动平台的当前位置在所述置信度特征图中所对应的语义类别是否与目标任务的语义 类别相匹配;
    如果判断结果为是,则控制所述可移动平台执行所述目标任务。
  10. 根据权利要求7-9任一项所述的方法,其特征在于,
    所述可移动平台包括无人机或者按照航线自动行驶的无人车。
  11. 根据权利要求9所述的方法,其特征在于,还包括:
    在控制所述可移动平台按照所述航线移动的过程中,控制所述可移动平台在所述航线中的标记点停留,以执行与目标任务对应的预定操作。
  12. 根据权利要求11所述的方法,其特征在于,所述预定操作包括农药喷洒操作。
  13. 根据权利要求12所述的方法,其特征在于,所述农药喷洒操作包括围绕指定点进行环形喷洒的操作。
  14. 根据权利要求1所述的方法,其特征在于,
    所述目标图像数据包括彩色图像;或者,
    所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
    所述目标图像数据包括正射影像;或者,
    所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
  15. 根据权利要求2所述的方法,其特征在于,所述基于语义识别模型处理所述目标图像数据之前,还包括:
    获取样本数据库,所述样本数据库包括样本图像数据;
    根据预设的语义识别算法生成初始语义识别模型;
    基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
    其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标 注信息。
  16. 根据权利要求15所述的方法,其特征在于,所述基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型,包括:
    调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
    若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
  17. 一种图像处理方法,其特征在于,包括:
    获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
    对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
    根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
  18. 根据权利要求17所述的方法,其特征在于,所述根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量,包括:
    根据所述置信度特征图上各点数据的语义类别,对所述置信度特征图上不同语义类别的点数据进行分类;
    计算所述置信度特征图上不同类别的点数据的数量;
    确定所述置信度特征图上不同类别的点数据的数量为所述目标图像数据上具有相同语义的目标对象的数量。
  19. 根据权利要求17所述的方法,其特征在于,所述对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图,包括:
    基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
    根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
    根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
  20. 根据权利要求19所述的方法,其特征在于,
    所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别;
    所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。
  21. 根据权利要求20所述的方法,其特征在于,所述对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还包括:
    根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
    根据后处理的结果更新所述置信度特征图。
  22. 根据权利要求21所述的方法,其特征在于,所述根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理,包括:
    检测所述置信度特征图中每个点数据的语义的置信度;
    对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
    所述根据后处理的结果更新所述置信度特征图,包括:
    基于所述删除处理后的点云数据,更新所述置信度特征图。
  23. 根据权利要求17所述的方法,其特征在于,
    所述目标图像数据包括彩色图像;或者,
    所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
    所述目标图像数据包括正射影像;或者,
    所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
  24. 根据权利要求19所述的方法,其特征在于,所述基于语义识别模型处理所述目标图像数据之前,还包括:
    获取样本数据库,所述样本数据库包括样本图像数据;
    根据预设的语义识别算法生成初始语义识别模型;
    基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
    其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
  25. 根据权利要求24所述的方法,其特征在于,所述基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型,包括:
    调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
    若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
  26. 一种图像处理设备,其特征在于,所述设备包括:存储器和处理器;
    所述存储器,用于存储程序指令;
    所述处理器,调用存储器中存储的程序指令,用于执行如下步骤:
    获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
    对目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
    根据所述置信度特征图,确定所述目标图像数据中目标区域的位置。
  27. 根据权利要求26所述的设备,其特征在于,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
    基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
    根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的置信度的点云数据;
    根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
  28. 根据权利要求27所述的设备,其特征在于,
    所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别。
  29. 根据权利要求28所述的设备,其特征在于,所述处理器在对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
    根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
    根据后处理的结果更新所述置信度特征图。
  30. 根据权利要求29所述的设备,其特征在于,所述处理器根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
    检测所述置信度特征图中每个点数据的语义的置信度;
    对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
    所述根据后处理的结果更新所述置信度特征图,包括:
    基于所述删除处理后的点云数据,更新所述置信度特征图。
  31. 根据权利要求30所述的设备,其特征在于,所述处理器在根据所述置信度特征图,确定所述目标图像数据中目标区域的位置时,具体用于:
    获取所述置信度特征图中每个点数据的位置数据和语义类别;
    根据所述置信度特征图中每个点数据的位置数据和语义类别,确定所述置信度特征图中具有相同语义类别的图像区域;
    根据所述置信度特征图中具有相同语义类别的图像区域,确定所述目标图像数据中目标区域的位置数据。
  32. 根据权利要求31所述的设备,其特征在于,所述处理器根据所述置信度特征图,确定所述目标图像数据中目标区域的位置之后,还用于:
    根据所述目标图像数据中目标区域的位置和语义类别,规划航线;
    控制可移动平台按照所述航线移动。
  33. 根据权利要求31所述的设备,其特征在于,所述处理器根据所述目标图像数据中目标区域的位置和语义类别,规划航线时,具体用于:
    根据所述置信度特征图上具有不同语义类别的图像区域,对不同语义类别的图像区域进行分类;
    根据不同类别的图像区域,规划各类别的图像区域对应的航线。
  34. 根据权利要求32所述的设备,其特征在于,所述处理器控制可移动平台按照所述航线移动时,具体用于:
    在控制所述可移动平台按照所述航线移动的过程中,判断所述可移动平台的当前位置在所述置信度特征图中所对应的语义类别是否与目标任务的语义类别相匹配;
    如果判断结果为是,则控制所述可移动平台执行所述目标任务。
  35. 根据权利要求32-34任一项所述的设备,其特征在于,
    所述可移动平台包括无人机或者按照航线自动行驶的无人车。
  36. 根据权利要求34所述的设备,其特征在于,所述处理器还用于:
    在控制所述可移动平台按照所述航线移动的过程中,控制所述可移动平台在所述航线中的标记点停留,以执行与目标任务对应的预定操作。
  37. 根据权利要求36所述的设备,其特征在于,所述预定操作包括农药喷洒操作。
  38. 根据权利要求37所述的设备,其特征在于,所述农药喷洒操作包括围绕指定点进行环形喷洒的操作。
  39. 根据权利要求26所述的设备,其特征在于,
    所述目标图像数据包括彩色图像;或者,
    所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
    所述目标图像数据包括正射影像;或者,
    所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
  40. 根据权利要求27所述的设备,其特征在于,所述处理器基于语义识别模型处理所述目标图像数据之前,还用于:
    获取样本数据库,所述样本数据库包括样本图像数据;
    根据预设的语义识别算法生成初始语义识别模型;
    基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
    其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
  41. 根据权利要求40所述的设备,其特征在于,所述处理器基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型时,具体用于:
    调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以 及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
    若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
  42. 一种图像处理设备,其特征在于,所述设备包括:存储器和处理器;
    所述存储器,用于存储程序指令;
    所述处理器,调用存储器中存储的程序指令,用于执行如下步骤:
    获取目标图像数据,所述目标图像数据包括目标图像以及所述目标图像中各像素点对应的景深数据;
    对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图;
    根据所述置信度特征图,确定所述目标图像数据上具有相同语义类别的目标对象的数量。
  43. 根据权利要求42所述的设备,其特征在于,所述处理器根据所述置信度特征图,确定所述目标图像数据上具有相同语义的目标对象的数量时,具体用于:
    根据所述置信度特征图上各点数据的语义类别,对所述置信度特征图上不同语义类别的点数据进行分类;
    计算所述置信度特征图上不同类别的点数据的数量;
    确定所述置信度特征图上不同类别的点数据的数量为所述目标图像数据上具有相同语义的目标对象的数量。
  44. 根据权利要求42所述的设备,其特征在于,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图时,具体用于:
    基于语义识别模型处理所述目标图像数据,以获得所述目标图像数据中每个像素点所具有的语义类别和语义的置信度;
    根据所述目标图像数据对应的位置数据、高度数据以及所述目标图像数据中每个像素点所具有的语义类别和语义的置信度,生成包含语义类别和语义的 置信度的点云数据;
    根据所述包含语义类别和语义的置信度的点云数据,生成所述置信度特征图。
  45. 根据权利要求44所述的设备,其特征在于,
    所述点云数据和所述置信度特征图均包含复数个点数据,每个点数据包括位置数据、高度数据和不同置信度的多个语义类别;
    所述点云数据包含的每个点数据与所述目标图像数据中的每个像素点对应。
  46. 根据权利要求45所述的设备,其特征在于,所述处理器对所述目标图像数据进行处理,得到所述目标图像数据的语义的置信度特征图之后,还用于:
    根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理;
    根据后处理的结果更新所述置信度特征图。
  47. 根据权利要求46所述的设备,其特征在于,所述处理器根据所述置信度特征图中每个点数据的语义的置信度,对所述置信度特征图进行后处理时,具体用于:
    检测所述置信度特征图中每个点数据的语义的置信度;
    对所述置信度特征图中语义的置信度小于或等于预设置信度阈值的点数据进行删除处理;
    所述根据后处理的结果更新所述置信度特征图,包括:
    基于所述删除处理后的点云数据,更新所述置信度特征图。
  48. 根据权利要求42所述的设备,其特征在于,
    所述目标图像数据包括彩色图像;或者,
    所述目标图像数据包括彩色图像和所述彩色图像对应的景深数据;或者,
    所述目标图像数据包括正射影像;或者,
    所述目标图像数据包括正射影像和所述正射影像对应的景深数据。
  49. 根据权利要求44所述的设备,其特征在于,所述处理器基于语义识别模型处理所述目标图像数据之前,还用于:
    获取样本数据库,所述样本数据库包括样本图像数据;
    根据预设的语义识别算法生成初始语义识别模型;
    基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型;
    其中,所述样本图像数据包括样本图像和语义标注信息;或者,所述样本图像数据包括样本图像、所述样本图像中各个像素点对应的景深数据和语义标注信息。
  50. 根据权利要求49所述的设备,其特征在于,所述处理器基于所述样本数据库中的各个样本图像数据对所述初始语义识别模型进行训练优化,得到所述语义识别模型时。具体用于:
    调用所述初始语义识别模型对所述样本图像数据包括的所述样本图像以及所述样本图像中各个像素点对应的景深数据进行识别,得到识别结果;
    若所述识别结果与所述样本图像数据包括的语义标注信息相匹配,则对所述初始语义识别模型的模型参数进行优化,以得到所述语义识别模型。
  51. 一种可移动平台,其特征在于,包括:
    动力系统,用于为所述可移动平台提供移动的动力;
    如权利要求26-41中任一项所述的图像处理设备。
  52. 一种可移动平台,其特征在于,包括:
    动力系统,用于为所述可移动平台提供移动的动力;
    如权利要求42-50中任一项所述的图像处理设备。
  53. 一种无人机,其特征在于,包括:
    机身;
    设置于所述机身上的动力系统,用于提供飞行动力;
    如权利要求26-50中任一项所述的图像处理设备。
  54. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至25任一项所述方法。
PCT/CN2019/075171 2019-02-15 2019-02-15 图像处理方法、设备、可移动平台、无人机及存储介质 WO2020164092A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2021543242A JP2022520019A (ja) 2019-02-15 2019-02-15 画像処理方法、装置、移動可能なプラットフォーム、プログラム
EP19915297.6A EP3920095A4 (en) 2019-02-15 2019-02-15 IMAGE PROCESSING METHOD AND APPARATUS, MOVABLE PLATFORM, UNMANNED AIR VEHICLE AND STORAGE MEDIA
PCT/CN2019/075171 WO2020164092A1 (zh) 2019-02-15 2019-02-15 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
CN201980004951.7A CN111213155A (zh) 2019-02-15 2019-02-15 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
US17/402,533 US20210390329A1 (en) 2019-02-15 2021-08-14 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/075171 WO2020164092A1 (zh) 2019-02-15 2019-02-15 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/402,533 Continuation US20210390329A1 (en) 2019-02-15 2021-08-14 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium

Publications (1)

Publication Number Publication Date
WO2020164092A1 true WO2020164092A1 (zh) 2020-08-20

Family

ID=70788972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/075171 WO2020164092A1 (zh) 2019-02-15 2019-02-15 Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium

Country Status (5)

Country Link
US (1) US20210390329A1 (zh)
EP (1) EP3920095A4 (zh)
JP (1) JP2022520019A (zh)
CN (1) CN111213155A (zh)
WO (1) WO2020164092A1 (zh)


Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI760782B (zh) * 2019-07-08 2022-04-11 National Taiwan University System and method for orchard recognition over a geographic area
EP3819674A1 (en) * 2019-11-08 2021-05-12 Outsight Rolling environment sensing and gps optimization
US11395149B2 (en) 2020-05-01 2022-07-19 Digital Global Systems, Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11653213B2 (en) 2020-05-01 2023-05-16 Digital Global Systems. Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11849332B2 (en) 2020-05-01 2023-12-19 Digital Global Systems, Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11665547B2 (en) 2020-05-01 2023-05-30 Digital Global Systems, Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11638160B2 (en) 2020-05-01 2023-04-25 Digital Global Systems, Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11700533B2 (en) 2020-05-01 2023-07-11 Digital Global Systems, Inc. System, method, and apparatus for providing dynamic, prioritized spectrum management and utilization
US11898871B2 (en) * 2021-09-15 2024-02-13 Here Global B.V. Apparatus and methods for providing a map layer of one or more temporary dynamic obstructions
WO2023050385A1 (zh) * 2021-09-30 2023-04-06 SZ DJI Technology Co., Ltd. Control method and apparatus for an unmanned aerial vehicle, unmanned aerial vehicle, and computer-readable storage medium
CN114842678B (zh) * 2022-03-28 2024-04-26 中国民用航空中南地区空中交通管理局广西分局 Similar-day measurement system for civil aviation air traffic control operation sites
US11843953B1 (en) 2022-08-02 2023-12-12 Digital Global Systems, Inc. System, method, and apparatus for providing optimized network resources
US11711726B1 (en) 2022-08-02 2023-07-25 Digital Global Systems, Inc. System, method, and apparatus for providing optimized network resources
US11570627B1 (en) 2022-08-02 2023-01-31 Digital Global Systems, Inc. System, method, and apparatus for providing optimized network resources
US11751064B1 (en) 2022-08-02 2023-09-05 Digital Global Systems, Inc. System, method, and apparatus for providing optimized network resources
CN115880575B (zh) * 2022-10-26 2023-05-16 The 54th Research Institute of China Electronics Technology Group Corporation Method for extracting newly added buildings from remote sensing images by combining change information and building features
CN115661376B (zh) * 2022-12-28 2023-04-07 深圳市安泽拉科技有限公司 Target reconstruction method and system based on unmanned aerial vehicle images
CN116665228B (zh) * 2023-07-31 2023-10-13 Hundsun Technologies Inc. Image processing method and apparatus
CN117372273B (zh) * 2023-10-26 2024-04-19 航天科工(北京)空间信息应用股份有限公司 Orthoimage generation method, apparatus, device, and storage medium for unmanned aerial vehicle images


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9870617B2 (en) * 2014-09-19 2018-01-16 Brain Corporation Apparatus and methods for saliency detection based on color occurrence analysis
US10789468B2 (en) * 2014-09-22 2020-09-29 Sikorsky Aircraft Corporation Context-based autonomous perception

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103026368A (zh) * 2010-07-30 2013-04-03 Qualcomm Incorporated Object recognition using incremental feature extraction
GB2532948A (en) * 2014-12-02 2016-06-08 Nokia Technologies Oy Objection recognition in a 3D scene
CN109073404A (zh) * 2016-05-02 2018-12-21 Google LLC Systems and methods for generating navigation directions based on landmarks and real-time images
CN106228162A (zh) * 2016-07-22 2016-12-14 Wang Wei Fast object recognition method for mobile robots based on deep learning
CN107871117A (zh) * 2016-09-23 2018-04-03 Samsung Electronics Co., Ltd. Apparatus and method for detecting objects
CN108230346A (zh) * 2017-03-30 2018-06-29 Beijing SenseTime Technology Development Co., Ltd. Method and apparatus for segmenting semantic features of an image, and electronic device
CN109284779A (zh) * 2018-09-04 2019-01-29 Army Engineering University of PLA Object detection method based on a deep fully convolutional network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3920095A4 *
XIE, HONGHUI: "UVA Aerial Image-based Tobacco Plants Recognizing and Counting", MASTER’S DISSERTATION, 20 March 2018 (2018-03-20), CN, pages 1 - 62, XP009522732 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115482478A (zh) * 2022-09-14 2022-12-16 北京远度互联科技有限公司 Road recognition method and apparatus, unmanned aerial vehicle, device, and storage medium
CN115482478B (zh) * 2022-09-14 2023-07-18 北京远度互联科技有限公司 Road recognition method and apparatus, unmanned aerial vehicle, device, and storage medium

Also Published As

Publication number Publication date
CN111213155A (zh) 2020-05-29
JP2022520019A (ja) 2022-03-28
US20210390329A1 (en) 2021-12-16
EP3920095A1 (en) 2021-12-08
EP3920095A4 (en) 2022-03-02

Similar Documents

Publication Publication Date Title
WO2020164092A1 (zh) Image processing method, device, movable platform, unmanned aerial vehicle, and storage medium
US20220028163A1 (en) Computer Vision Systems and Methods for Detecting and Modeling Features of Structures in Images
Goforth et al. GPS-denied UAV localization using pre-existing satellite imagery
WO2020103110A1 (zh) 一种基于点云地图的图像边界获取方法、设备及飞行器
CN112270249A (zh) 一种融合rgb-d视觉特征的目标位姿估计方法
WO2020103108A1 (zh) 一种语义生成方法、设备、飞行器及存储介质
WO2020103109A1 (zh) 一种地图生成方法、设备、飞行器及存储介质
CN111998862B (zh) 一种基于bnn的稠密双目slam方法
Pang et al. SGM-based seamline determination for urban orthophoto mosaicking
CN110097498B (zh) 基于无人机航迹约束的多航带图像拼接与定位方法
US11769225B2 (en) Image processing apparatus, image processing method, and program
CN111915517A (zh) 一种适用于室内光照不利环境下rgb-d相机全局定位方法
Axelsson et al. Roof type classification using deep convolutional neural networks on low resolution photogrammetric point clouds from aerial imagery
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
Florea et al. Wilduav: Monocular uav dataset for depth estimation tasks
Liu et al. Comparison of 2D image models in segmentation performance for 3D laser point clouds
Majdik et al. Micro air vehicle localization and position tracking from textured 3d cadastral models
CN117496401A (zh) 一种用于视频测量影像序列椭圆形目标点全自动识别与跟踪方法
Sujiwo et al. Robust and accurate monocular vision-based localization in outdoor environments of real-world robot challenge
CN115565072A (zh) 一种道路垃圾识别和定位方法、装置、电子设备及介质
Xu et al. Uav image geo-localization by point-line-patch feature matching and iclk optimization
Zhang et al. Feature regions segmentation based RGB-D visual odometry in dynamic environment
Sikdar et al. Unconstrained Vision Guided UAV Based Safe Helicopter Landing
CN110717981A (zh) 小型机器人室内可通行区域获取方法及装置
Sambolek et al. Person Detection and Geolocation Estimation in UAV Aerial Images: An Experimental Approach.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19915297

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021543242

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019915297

Country of ref document: EP

Effective date: 20210901