CN113661513A - Image processing method, image processing device, image processing system and storage medium - Google Patents


Info

Publication number
CN113661513A
Authority
CN
China
Prior art keywords
dimensional
projection model
initial
target image
image area
Prior art date
Legal status
Pending
Application number
CN201980094989.8A
Other languages
Chinese (zh)
Inventor
徐斌
陈晓智
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN113661513A

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01C: MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 11/00: Photogrammetry or videogrammetry, e.g. stereogrammetry; Photographic surveying
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformation in the plane of the image
    • G06T 3/40: Scaling the whole image or part thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

An image processing method, an image processing device, an image processing system, and a storage medium, wherein the method comprises: acquiring an initial image captured by a photographing device; processing the initial image to determine a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area; determining initial three-dimensional coordinates of the target object according to a projection model of the two-dimensional target image area and the semantic information of the target object; extracting region feature points from the two-dimensional target image area, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points; and determining target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional target detection can be performed adaptively on different types of images, and the efficiency and effectiveness of image processing are improved.

Description

Image processing method, image processing device, image processing system and storage medium

Technical Field
The embodiments of the present invention relate to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing system, and a storage medium.
Background
At present, most schemes for three-dimensional target detection on monocular images are designed for images obtained through the common pinhole imaging projection mode, and few schemes address other projection models. If a common monocular three-dimensional object detection algorithm is applied directly to an image obtained through another projection mode (such as a fisheye image), the three-dimensional object detection performance on that image degrades. Therefore, how to perform three-dimensional target detection more effectively on different types of images is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention discloses an image processing method, an image processing device, an image processing system and a storage medium, which can adaptively perform three-dimensional target detection on different types of images and improve the efficiency and effectiveness of image processing.
In a first aspect, an embodiment of the present invention provides an image processing method, including:
acquiring an initial image shot by a shooting device;
processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In a second aspect, an embodiment of the present invention provides an image processing device, including a memory and a processor, wherein:
the memory is used for storing programs;
the processor to execute the memory-stored program, the processor to, when executed:
acquiring an initial image shot by a shooting device;
processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In a third aspect, an embodiment of the present invention provides an image processing system, including:
a photographing device and the image processing apparatus of the first aspect described above.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method according to the first aspect.
The embodiment of the invention can process the initial image captured by the photographing device to determine a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area, determine initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, extract region feature points from the two-dimensional target image area, and determine three-dimensional coordinate adjustment information of the target object based on the region feature points, thereby determining target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional target detection can be performed adaptively on different types of images, and the efficiency and effectiveness of image processing are improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without inventive labor.
Fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of determining three-dimensional coordinates based on a pinhole imaging model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of determining three-dimensional coordinates based on an equidistant projection model according to an embodiment of the present invention;
FIG. 5 is a flow chart of another image processing method according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a further image processing method according to an embodiment of the present invention;
FIG. 7 is a block diagram of an image processing method according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a further image processing method according to an embodiment of the present invention;
FIG. 9 is a block diagram illustrating an image processing method according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
In the fields of advanced driver assistance systems (ADAS), automatic driving, unmanned aerial vehicles, and robot navigation, obtaining the three-dimensional coordinate positions of surrounding dynamic obstacles is very important. Among the many available sensors, a visual camera can capture surrounding scene information, and a monocular camera is often selected because of its lower cost. Therefore, how to perform three-dimensional detection of dynamic obstacles through a monocular camera, including information such as the three-dimensional coordinate position, three-dimensional size, and orientation of each obstacle, is a problem that needs to be solved urgently. Meanwhile, various lenses and projection models exist at present: besides common lenses that adopt pinhole imaging, fisheye lenses with ultra-large fields of view are also used, and their projection mode differs from the common pinhole imaging mode, which makes three-dimensional target detection across various monocular photographing devices even more difficult.
In one embodiment, two-dimensional or three-dimensional detection of target objects on images mostly employs convolutional neural networks in deep learning. For two-dimensional target detection, a convolutional neural network can be used for extracting the features of the image, and then the category and the circumscribed rectangle of the corresponding target object are estimated directly from the features.
For three-dimensional target detection of a specific target object, the detection result can generally be divided into three aspects: one is the three-dimensional size of the target object, usually expressed by the length, width, and height of the circumscribed cuboid obtained by detection; one is the three-dimensional position of the target object, usually represented by the center of the circumscribed cuboid in the camera coordinate system, which can be converted into coordinates in other coordinate systems if necessary; and one is the orientation information of the target object, usually represented by the pitch angle, yaw angle, and roll angle of the circumscribed cuboid. In three-dimensional detection of a target object with a monocular camera, multiple frames at different moments in the time sequence often need to be combined with the motion information of the camera, or of the movable platform carrying the camera, to realize three-dimensional detection. If a single frame from a monocular camera is used for three-dimensional detection of a target object, semantic recognition of the target is required, and the target object can be detected by combining preset three-dimensional information experience values corresponding to the semantic information of the target object; the obtained result is therefore inaccurate, and only rough three-dimensional information can be provided.
For example, if an image of the periphery of a vehicle is captured by a monocular camera mounted on the vehicle and the image includes a target object (e.g., another vehicle), the image may first be semantically recognized through a neural network; if the target object is recognized as a car, preset three-dimensional information corresponding to a car, such as a height of 1.5 m, may be assigned to it, and the three-dimensional information of the car can finally be obtained by combining this preset information with the camera parameters of the monocular camera.
In one embodiment, for any calibrated photographing device, such as a camera, the parameters, such as the focal length (f) and the coordinates of the optical center (cx, cy), are known, so that when a three-dimensional object is detected, its three-dimensional coordinates can be roughly estimated according to the size of the object on the image.
The embodiment of the invention provides an image processing method, which comprises: processing an initial image captured by a photographing device to determine a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area; determining initial three-dimensional coordinates of the target object according to a projection model of the two-dimensional target image area and the semantic information of the target object; extracting region feature points from the two-dimensional target image area and determining three-dimensional coordinate adjustment information of the target object based on the region feature points; and determining target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional target detection can be performed adaptively on different types of images, and the efficiency and effectiveness of image processing are improved.
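The steps above can be sketched in code. All of the helper names, the pinhole fallback used for the initial coordinates, and the numeric values below are illustrative assumptions, not part of the patent; a real implementation would use trained neural networks for the detection and adjustment steps.

```python
def detect_2d(image):
    # Steps 1-2 stand-in: a 2D target region plus semantic information
    # (class label and a preset height prior); values are hard-coded here.
    region = {"center": (600.0, 500.0), "pixel_height": 150.0}
    semantics = {"class": "car", "height_m": 1.5}
    return region, semantics

def initial_3d(region, semantics, f=1000.0, cx=500.0, cy=500.0):
    # Step 3 stand-in: rough coordinates, here via a pinhole projection model.
    d = f * semantics["height_m"] / region["pixel_height"]  # depth from size prior
    u, v = region["center"]
    return [(u - cx) * d / f, (v - cy) * d / f, d]

def adjustment_3d(region):
    # Step 4 stand-in: the offset that would come from region feature points.
    return [0.1, 0.0, -0.2]

def target_3d(image):
    region, semantics = detect_2d(image)
    xyz0 = initial_3d(region, semantics)
    delta = adjustment_3d(region)
    # Step 5: target coordinates = initial coordinates + adjustment information.
    return [a + b for a, b in zip(xyz0, delta)]
```

With the hard-coded values, the depth is 1000 * 1.5 / 150 = 10, so the initial coordinates come out as [1.0, 0.0, 10.0] and the refined target coordinates as [1.1, 0.0, 9.8].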
The image processing method provided in the embodiments of the present invention may be executed by an image processing system, wherein the image processing system may include a camera and an image processing apparatus, and in some embodiments, the camera may be integrated with the image processing apparatus; in some embodiments, the camera may be spatially independent of the image processing apparatus, and the camera and the image processing apparatus establish a communication connection in a wired or wireless manner. In some embodiments, the camera may be disposed on a movable platform configured with a load (e.g., camera, infrared detection device, mapper, etc.); in other embodiments, the camera may be spatially independent of the movable platform. In some embodiments, the camera may include, but is not limited to, a motion camera, a panoramic camera, a fisheye camera, and the like. In some embodiments, the number of the photographing devices may be one or more. In certain embodiments, the movable platform comprises a movable device such as a drone, a robot capable of autonomous movement, an unmanned vehicle, an unmanned ship, or the like. In some embodiments, the image processing device may include one or more of a smartphone, a tablet, a laptop, and a wearable device. In some embodiments, the method and the device can be applied to ADAS, automatic driving, robots, unmanned aerial vehicles and other movable platforms to sense obstacles, and achieve the functions of obstacle avoidance, subsequent path planning and the like. 
In one embodiment, the movable platform may be a vehicle, the photographing device may be a fisheye camera, and the image processing device may be a computing platform mounted on the vehicle. Specifically, the fisheye camera is installed on a rear-view mirror of the vehicle to capture images of the environment at the side of the vehicle and send them to the on-board computing platform for processing; compared with an ordinary lens, it provides a wider field of view, thereby reducing or eliminating blind areas in the camera's field of view.
An image processing system according to an embodiment of the present invention is schematically illustrated with reference to fig. 1.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present invention, where the image processing system includes a camera 11 and an image processing apparatus 12, and a communication connection may be established between the camera 11 and the image processing apparatus 12 through a wireless communication connection. In some scenarios, the communication connection between the camera 11 and the image processing apparatus 12 may also be established by a wired communication connection. In some embodiments, the camera 11 may be provided on the image processing apparatus 12. In other embodiments, the camera 11 and the image processing device 12 are independent of each other, and the image processing device 12 may include one or more of a smartphone, a tablet, a laptop, and a wearable device.
In this embodiment of the present invention, the image processing device 12 may obtain an initial image captured by the photographing device 11, process the initial image, determine a two-dimensional target image region of the initial image and semantic information of a target object included in the two-dimensional target image region, determine initial three-dimensional coordinates of the target object according to a projection model of the two-dimensional target image region and the semantic information of the target object, extract region feature points from the two-dimensional target image region, determine three-dimensional coordinate adjustment information of the target object based on the region feature points, and determine target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional target detection can be performed adaptively on different types of images, and the efficiency and effectiveness of image processing are improved.
An image processing method provided by an embodiment of the present invention is schematically described below with reference to the drawings.
Referring to fig. 2, fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present invention. The method may be executed by an image processing device, where the image processing device is as described above. Specifically, the method of the embodiment of the present invention includes the following steps.
S201: and acquiring an initial image obtained by shooting by a shooting device.
In the embodiment of the invention, the image processing equipment can acquire the initial image obtained by shooting by the shooting device. In some embodiments, the initial image may be obtained by photographing the target object by the photographing device. In some embodiments, the camera is explained as described above, and is not described herein.
S202: and processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image, and determine a two-dimensional target image area of the initial image and semantic information of a target object included in the two-dimensional target image area.
In one embodiment, the image processing device may perform image processing and semantic analysis on the initial image to determine semantic information of the target object. In certain embodiments, the semantic information includes, but is not limited to, a category, location information, etc. of the target object, which in one example includes semantics of different categories of objects, such as cars, drones, etc. In some embodiments, the position information includes two-dimensional coordinates of the target object.
In one embodiment, when the image processing device processes the initial image to determine the two-dimensional target image area of the initial image and the semantic information of the target object included in the two-dimensional target image area, the image processing device may process the initial image according to a first neural network to determine the two-dimensional target image area of the initial image and the semantic information of the target object included in the two-dimensional target image area. In certain embodiments, the first neural network is a convolutional neural network.
The semantic information of the target object contained in the two-dimensional target image area and the two-dimensional target image area of the initial image is determined, so that the subsequent determination of the initial three-dimensional coordinate of the target object is facilitated.
S203: and determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object.
In this embodiment of the present invention, the image processing device may determine the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image region and the semantic information of the target object.
In some embodiments, the projection model includes, but is not limited to, any one of a pinhole imaging model, a Snell's window projection model, an equal-area projection model, an equidistant projection model, a stereographic projection model, and the like.
In one embodiment, the image processing apparatus may acquire the projection model of the two-dimensional target image region and acquire the parameter of the photographing device when determining the initial three-dimensional coordinate of the target object based on the projection model of the two-dimensional target image region and the semantic information of the target object, and determine the initial three-dimensional coordinate of the two-dimensional target image region based on the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the photographing device. In some embodiments, the projection model may be a preset projection model.
In some embodiments, the parameters of the camera include an internal reference comprising a focal length of the camera and an external reference comprising an optical center of the camera.
In one embodiment, when acquiring the projection model of the two-dimensional target image area, the image processing device may determine category information of the two-dimensional target image area according to feature point information of the two-dimensional target image area, and determine the projection model of the two-dimensional target image area according to the category information of the two-dimensional target image area.
For example, if the image processing device determines, according to the feature point information of the two-dimensional target image region, that the category information of the two-dimensional target image region is a fisheye image, the projection model of the two-dimensional target image region may be determined to be any one of the Snell's window projection model, the equal-area projection model, the equidistant projection model, the stereographic projection model, and the like corresponding to fisheye images.
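This category-based selection can be sketched as a simple lookup. The category labels, the model names, and the fallback to a pinhole model are hypothetical illustrations, not specified by the patent:

```python
# Candidate projection models per predicted image category.
CATEGORY_MODELS = {
    "fisheye": ["snell_window", "equal_area", "equidistant", "stereographic"],
    "pinhole": ["pinhole"],
}

def select_projection_models(category):
    """Return the candidate projection models for a predicted image category,
    falling back to the pinhole model for unknown categories."""
    return CATEGORY_MODELS.get(category, ["pinhole"])
```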
In an embodiment, when obtaining the projection model of the two-dimensional target image region, the image processing device may process the initial image according to a first neural network to obtain the projection model of the initial image, and determine that the projection model of the initial image is the projection model of the two-dimensional target image region.
In an embodiment, when obtaining the projection model of the two-dimensional target image region, the image processing device may process the initial image according to a third neural network to obtain the projection model of the initial image, and determine that the projection model of the initial image is the projection model of the two-dimensional target image region. In some embodiments, the third neural network may be a convolutional neural network, the third neural network being different from the first neural network.
In one embodiment, the projection model comprises a first projection model; the image processing apparatus may acquire an image height of the target object in the two-dimensional target image region and an actual height of the target object when determining an initial three-dimensional coordinate of the two-dimensional target image region according to a projection model of the two-dimensional target image region, semantic information of the target object, and a parameter of the photographing device, and determine the initial three-dimensional coordinate of the two-dimensional target image region using the first projection model according to the image height, the actual height, the semantic information of the target object, and the parameter of the photographing device. In certain embodiments, the first projection model comprises a pinhole imaging model.
In one embodiment, when determining the initial three-dimensional coordinates of the two-dimensional target image area using the first projection model based on the image height, the actual height, the semantic information of the target object, and the parameter of the photographing device, the image processing apparatus may determine the actual distance of the target object from the photographing device using the first projection model based on the image height, the actual height, and the parameter of the photographing device, and acquire the two-dimensional coordinates of the target object in an image coordinate system, and determine the initial three-dimensional coordinates of the two-dimensional target image area using the first projection model based on the parameter of the photographing device, the actual distance, the two-dimensional coordinates, and the semantic information of the target object.
By way of example, fig. 3 is a schematic diagram illustrating determination of three-dimensional coordinates based on a pinhole imaging model according to an embodiment of the present invention. As shown in fig. 3, f is the focal length of the camera, h is the image height of the target object in the two-dimensional target image area, H is the actual height of the target object, and D is the distance from the target object to the camera, so that f/D = h/H. For target detection, h is the height of the target object in the two-dimensional target image region; since the class to which the target object in each two-dimensional target image region belongs can be predicted, H can be obtained approximately (for example, if the target object is a car, H can be assumed to be 1 meter), and f is known from the camera intrinsic parameters, so the actual distance D from the target object to the camera can be calculated as D = f * H / h. This is the Z-direction coordinate of the target object in the camera coordinate system. Similarly, using the pinhole imaging projection model, according to the optical center (cx, cy) of the camera and the actual distance D between the target object and the camera, the X and Y coordinates of the target object in the camera coordinate system can be calculated, as shown in the following formula (1):
X = (u - cx) * D / f
Y = (v - cy) * D / f
Z = D        (1)
u and v are coordinates of the target object in the image coordinate system, and can be approximated by using coordinates of a center point of the target object.
In one embodiment, after calculating the two-dimensional coordinates of the target object in the image coordinate system, the image processing device may calculate the initial three-dimensional coordinates of the two-dimensional target image area where the target object is located through the two-dimensional coordinates of the target object in the image coordinate system, the camera internal reference, and the semantic information of the target object.
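The pinhole computation above can be sketched as a small function. It assumes only the similar-triangles relation f/D = h/H from fig. 3 and the back-projection of formula (1); the function name and the example numbers are illustrative, not from the patent:

```python
def pinhole_initial_3d(u, v, h_img, H_real, f, cx, cy):
    # Depth from similar triangles: f / D = h_img / H_real  =>  D = f * H_real / h_img
    D = f * H_real / h_img
    # Back-project the (approximate) object center to camera coordinates, formula (1)
    X = (u - cx) * D / f
    Y = (v - cy) * D / f
    return X, Y, D
```

For example, with f = 1000 px, optical center (500, 500), a class height prior H = 1.5 m, and an object 150 px tall centered at pixel (600, 500), the estimate is (X, Y, Z) = (1.0, 0.0, 10.0).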
In one embodiment, the projection model comprises a second projection model; the image processing apparatus may acquire an image distance of the target object in the two-dimensional target image region and acquire a two-dimensional coordinate of the target object in an image coordinate system when determining an initial three-dimensional coordinate of the two-dimensional target image region according to a projection model of the two-dimensional target image region, semantic information of the target object, and a parameter of the photographing device, to determine the initial three-dimensional coordinate of the two-dimensional target image region using the second projection model according to the image distance, the two-dimensional coordinate, the semantic information of the target object, and the parameter of the photographing device. In some embodiments, the second projection model includes any one of a Snell's window projection model, an equal-area projection model, an equidistant projection model, and a stereographic projection model; in other embodiments, the second projection model may further include other projection models, which are not specifically limited herein.
In one embodiment, when the image processing apparatus determines the initial three-dimensional coordinates of the two-dimensional target image region using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object, and the parameter of the photographing device, the image processing apparatus may determine a photographing angle of view of the photographing device using the second projection model according to the image distance, the two-dimensional coordinates, and the parameter of the photographing device, and acquire an actual distance of the target object from the photographing device to determine the initial three-dimensional coordinates of the two-dimensional target image region using the second projection model according to the photographing angle of view, the actual distance, and the semantic information of the target object.
In one embodiment, when the image processing apparatus determines the initial three-dimensional coordinates of the two-dimensional target image region using the second projection model according to the photographing angle of view, the actual distance, and the semantic information of the target object, the image processing apparatus may determine the physical distance of the target object using the second projection model according to the photographing angle of view and the actual distance, and determine the initial three-dimensional coordinates of the two-dimensional target image region according to the parameters of the photographing device, the photographing angle of view, the two-dimensional coordinates, the physical distance, and the semantic information of the target object.
Specifically, fig. 4 is taken as an example for illustration; fig. 4 is a schematic diagram of determining three-dimensional coordinates based on an equidistant projection model according to an embodiment of the present invention. As shown in fig. 4, f is the focal length of the camera, t is the image distance of the target object in the two-dimensional target image area, θ is the corresponding shooting angle of view, r is the actual distance from the target object to the camera, and z is the depth of the object in the camera coordinate system. According to the principle of equidistant projection, the following formula (2) holds:
t = f · θ, where t = √((u − cx)² + (v − cy)²)    (2)
wherein u and v are pixel coordinates of the target object in the image coordinate system, and f, cx and cy are camera intrinsic parameters.
For target object detection, let h be the height of the target object in the two-dimensional target image area and H be the actual height of the target object; then the following formula (3) holds:
h / H ≈ f / r, i.e., r ≈ f · H / h    (3)
in one embodiment, the category to which the target object in the two-dimensional target image region belongs may be predicted from semantic information of the target object, and then H may be approximately obtained (e.g., if the target object is a car, then H may be assumed to be 1 meter). As is known, u and v are coordinates of the target object in the image coordinate system, and may be approximated using the coordinates of the center point of the target object. Therefore, theta can be calculated firstly, then r value can be calculated, and finally X, Y, Z coordinates of the target object under the camera coordinate system, namely the initial three-dimensional coordinates, can be obtained.
In an embodiment, the above describes how to calculate the initial three-dimensional coordinates of the target object from the camera intrinsic parameters, the two-dimensional coordinates of the target object in the image coordinate system, the semantic information of the target object, the image distance of the target object in the two-dimensional target image area, and the projection model. The pinhole-imaging and equidistant projection models are described in detail; for other projection models, the initial X, Y, and Z coordinates can be calculated similarly, which is not described herein again.
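The equidistant-projection computation described above can be sketched as follows. This is a minimal illustration, assuming the approximate scale relation r ≈ f·H/h and center-point pixel coordinates; the function name and argument layout are ours, not the patent's implementation.

```python
import math

def initial_xyz_equidistant(u, v, h_img, H_real, f, cx, cy):
    """Estimate initial (X, Y, Z) camera-frame coordinates of a target under
    the equidistant projection model.
    u, v      : pixel coordinates of the target's center point
    h_img     : height of the target in the image, in pixels
    H_real    : assumed physical height from the semantic class (e.g. a car)
    f, cx, cy : camera intrinsics (focal length and optical center, pixels)
    """
    du, dv = u - cx, v - cy
    t = math.hypot(du, dv)       # radial image distance of the target
    theta = t / f                # equidistant projection: t = f * theta
    r = f * H_real / h_img       # approximate distance along the viewing ray
    if t == 0:                   # target on the optical axis
        return 0.0, 0.0, r
    X = r * math.sin(theta) * du / t
    Y = r * math.sin(theta) * dv / t
    Z = r * math.cos(theta)
    return X, Y, Z
```

By construction the returned point lies at distance r from the camera; only its direction depends on (u, v) and θ.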
In an embodiment, the image processing device may further detect the two-dimensional target image region according to a second neural network, to obtain region information of the two-dimensional target image region; the region information comprises any one or more of category information, three-dimensional size information, orientation information, and two-dimensional bounding-box (circumscribed rectangle) information of the two-dimensional target image region. In some embodiments, the second neural network may be a convolutional neural network, the second neural network being different from the first and third neural networks.
S204: and extracting the region characteristic points of the two-dimensional target image region, and determining the three-dimensional coordinate adjustment information of the target object based on the region characteristic points.
In this embodiment of the present invention, the image processing device may perform region feature point extraction on the two-dimensional target image region, and determine three-dimensional coordinate adjustment information of the target object based on the region feature point.
In one embodiment, when performing region feature point extraction on the two-dimensional target image region, the image processing device may process the two-dimensional target image region according to a second neural network to extract feature point information of the two-dimensional target image region. In certain embodiments, the second neural network is a convolutional neural network. In certain embodiments, the second neural network is different from the first neural network and the third neural network.
S205: and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In this embodiment of the present invention, the image processing device may determine the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In one embodiment, the image processing apparatus may process the two-dimensional target image region according to a second neural network to determine three-dimensional coordinate adjustment information of the two-dimensional target image region when determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information, and determine the target three-dimensional coordinate of the two-dimensional target image region according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinate.
In one embodiment, when determining the target three-dimensional coordinates of the two-dimensional target image area according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinates, the image processing apparatus may adjust the initial three-dimensional coordinates according to the three-dimensional coordinate adjustment information and determine the adjusted three-dimensional coordinates as the target three-dimensional coordinates of the two-dimensional target image area.
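The adjustment step described above can be sketched as follows. The additive form of the correction is an assumption, since the text only states that the initial coordinates are "adjusted" according to the adjustment information without specifying the operation.

```python
def refine_coordinates(initial_xyz, adjustment_xyz):
    # Apply the three-dimensional coordinate adjustment information to the
    # initial coordinates; an additive residual correction (as commonly
    # produced by neural-network regression heads) is assumed here.
    return tuple(c + d for c, d in zip(initial_xyz, adjustment_xyz))
```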
In this embodiment of the present invention, the image processing apparatus may process the initial image captured by the photographing device to determine a two-dimensional target image region of the initial image and semantic information of a target object included in the two-dimensional target image region, determine initial three-dimensional coordinates of the target object according to a projection model of the two-dimensional target image region and the semantic information of the target object, and determine three-dimensional coordinate adjustment information of the target object based on region feature points extracted from the two-dimensional target image region, so as to determine the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional object detection can be performed adaptively on different types of images, improving the efficiency and effectiveness of image processing.
Referring to fig. 5 in detail, fig. 5 is a schematic flowchart of another image processing method according to an embodiment of the present invention, where the method may be executed by an image processing apparatus, and the image processing apparatus is specifically explained as described above. Specifically, the method of the embodiment of the present invention includes the following steps.
S501: and acquiring an initial image obtained by shooting by a shooting device.
In the embodiment of the invention, the image processing equipment can acquire the initial image obtained by shooting by the shooting device. In some embodiments, the camera is explained as described above, and is not described herein.
S502: and processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image, and determine a two-dimensional target image area of the initial image and semantic information of a target object included in the two-dimensional target image area. The specific embodiments are as described above and will not be described herein.
S503: and acquiring a projection model of the two-dimensional target image area, wherein the projection model is a preset projection model.
In the embodiment of the present invention, the image processing device may obtain a projection model of the two-dimensional target image area, where the projection model is a preset projection model. In some embodiments, the projection model is explained as described above, and is not described here.
S504: and determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object.
In this embodiment of the present invention, the image processing device may determine the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image region and the semantic information of the target object. The specific embodiments are as described above and will not be described herein.
S505: and extracting the region characteristic points of the two-dimensional target image region, and determining the three-dimensional coordinate adjustment information of the target object based on the region characteristic points.
In this embodiment of the present invention, the image processing device may perform region feature point extraction on the two-dimensional target image region, and determine three-dimensional coordinate adjustment information of the target object based on the region feature point. The specific embodiments are as described above and will not be described herein.
S506: and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In this embodiment of the present invention, the image processing device may determine the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information. The specific embodiments are as described above and will not be described herein.
In this embodiment of the present invention, the image processing device may process the initial image captured by the photographing device to determine a two-dimensional target image region of the initial image and semantic information of a target object included in the two-dimensional target image region, determine initial three-dimensional coordinates of the target object according to a preset projection model and the semantic information of the target object, and determine three-dimensional coordinate adjustment information of the target object based on region feature points extracted from the two-dimensional target image region, so as to determine the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, three-dimensional object detection can be performed adaptively on different types of images, improving the efficiency and effectiveness of image processing.
Referring to fig. 6 in detail, fig. 6 is a schematic flowchart of another image processing method according to an embodiment of the present invention, which can be executed by an image processing apparatus, where the image processing apparatus is specifically explained as described above. Specifically, the method of the embodiment of the present invention includes the following steps.
S601: and acquiring an initial image obtained by shooting by a shooting device.
In the embodiment of the invention, the image processing equipment can acquire the initial image obtained by shooting by the shooting device. In some embodiments, the camera is explained as described above, and is not described herein.
S602: and processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image, and determine a two-dimensional target image area of the initial image and semantic information of a target object included in the two-dimensional target image area. The specific embodiments are as described above and will not be described herein.
S603: and processing the initial image according to a third neural network to obtain a projection model of the initial image, and determining the projection model of the initial image as a projection model of the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image according to a third neural network to obtain a projection model of the initial image, and determine that the projection model of the initial image is a projection model of the two-dimensional target image region.
Specifically, fig. 7 is taken as an example for illustration; fig. 7 is a schematic structural diagram of an image processing method according to an embodiment of the present invention. As shown in fig. 7, the image processing device may process an initial image 72 through a first neural network 71 to obtain a two-dimensional target image area 73, and detect the two-dimensional target image area according to a second neural network 74 to obtain area information 75 of the two-dimensional target image area, where the area information 75 includes any one or more of category information, three-dimensional size information, orientation information, and two-dimensional bounding-box information of the two-dimensional target image area. A projection model 77 of the two-dimensional target image area is determined by a third neural network 76, where the projection model 77 is at least one model selected from a plurality of projection models. Initial three-dimensional coordinates 791 of the target object are determined according to the projection model 77 of the two-dimensional target image area, the semantic information of the target object, and parameters 78 of the photographing device; region feature points of the two-dimensional target image area are extracted, and three-dimensional coordinate adjustment information 792 of the target object is determined based on the region feature points; and the target three-dimensional coordinates 710 of the target object are determined based on the initial three-dimensional coordinates 791 and the three-dimensional coordinate adjustment information 792.
Through this implementation, the corresponding projection model can be selected automatically, and adaptive support for multiple projection models is achieved within the same algorithm framework, so that all types of monocular images can be accommodated, improving the efficiency and effectiveness of image processing.
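The data flow of fig. 7 can be summarized in Python-like pseudocode. All network objects, method names, and the additive refinement step here are illustrative placeholders, not the patent's implementation.

```python
def detect_3d(initial_image, camera_params, first_nn, second_nn, third_nn):
    """Sketch of the fig. 7 pipeline: 2D detection, projection-model
    selection, back-projection, and coordinate refinement."""
    region = first_nn(initial_image)                  # 2D target image area (73)
    region_info = second_nn.detect(region)            # category, size, orientation (75)
    projection_model = third_nn(initial_image)        # selected projection model (77)
    initial_xyz = projection_model.back_project(
        region, region_info, camera_params)           # initial 3D coordinates (791)
    features = second_nn.extract_features(region)     # region feature points
    adjustment = second_nn.regress_adjustment(features)  # adjustment info (792)
    # Apply the adjustment (an additive correction is assumed here).
    return tuple(c + d for c, d in zip(initial_xyz, adjustment))
```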
S604: and determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object.
In this embodiment of the present invention, the image processing device may determine the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image region and the semantic information of the target object. The specific embodiments are as described above and will not be described herein.
S605: and extracting the region characteristic points of the two-dimensional target image region, and determining the three-dimensional coordinate adjustment information of the target object based on the region characteristic points.
In this embodiment of the present invention, the image processing device may perform region feature point extraction on the two-dimensional target image region, and determine three-dimensional coordinate adjustment information of the target object based on the region feature point. The specific embodiments are as described above and will not be described herein.
S606: and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In this embodiment of the present invention, the image processing device may determine the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information. The specific embodiments are as described above and will not be described herein.
In this embodiment of the present invention, the image processing apparatus may process an initial image captured by the photographing device to determine a two-dimensional target image region of the initial image and semantic information of a target object included in the two-dimensional target image region, determine initial three-dimensional coordinates of the target object according to the semantic information of the target object and a projection model of the two-dimensional target image region obtained by processing the initial image through a third neural network, and determine three-dimensional coordinate adjustment information of the target object based on region feature points extracted from the two-dimensional target image region, thereby determining the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, the projection model can be determined through a neural network so that three-dimensional object detection can be performed adaptively on different types of images, improving the efficiency and effectiveness of image processing.
Referring to fig. 8 in detail, fig. 8 is a flowchart illustrating another image processing method according to an embodiment of the present invention, where the method may be executed by an image processing apparatus, and the image processing apparatus is specifically explained as described above. Specifically, the method of the embodiment of the present invention includes the following steps.
S801: and acquiring an initial image obtained by shooting by a shooting device.
In the embodiment of the invention, the image processing equipment can acquire the initial image obtained by shooting by the shooting device. In some embodiments, the camera is explained as described above, and is not described herein.
S802: and processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image, and determine a two-dimensional target image area of the initial image and semantic information of a target object included in the two-dimensional target image area. The specific embodiments are as described above and will not be described herein.
S803: and processing the initial image according to a first neural network to obtain a projection model of the initial image, and determining the projection model of the initial image as a projection model of the two-dimensional target image area.
In this embodiment of the present invention, the image processing device may process the initial image according to a first neural network to obtain a projection model of the initial image, and determine that the projection model of the initial image is a projection model of the two-dimensional target image region.
In one embodiment, the image processing apparatus may process the two-dimensional target image region according to a second neural network to extract feature point information of the two-dimensional target image region, determine category information of the two-dimensional target image region according to the feature point information of the two-dimensional target image region, and determine a projection model of the two-dimensional target image region according to the category information of the two-dimensional target image region.
Specifically, fig. 9 is taken as an example for illustration; fig. 9 is a schematic structural diagram of an image processing method according to an embodiment of the present invention. As shown in fig. 9, the image processing device may process a two-dimensional target image area 92 through a first neural network 91 to extract feature point information 93 of the two-dimensional target image area 92, and detect the two-dimensional target image area according to a second neural network 94 to obtain area information 95 of the two-dimensional target image area. A projection model 96 of the two-dimensional target image area 92 is determined through the first neural network 91, and initial three-dimensional coordinates 981 of the target object are determined according to the projection model 96 of the two-dimensional target image area, parameters 97 of the photographing device, and the semantic information of the target object; region feature points of the two-dimensional target image area are extracted, and three-dimensional coordinate adjustment information 982 of the target object is determined based on the region feature points; and the target three-dimensional coordinates 99 of the target object are determined according to the initial three-dimensional coordinates 981 and the three-dimensional coordinate adjustment information 982.
Through this implementation, the neural network that determines the projection model and the neural network that extracts the feature point information of the two-dimensional target image area can be shared, reducing the amount of computation and the complexity and further improving the image processing efficiency.
S804: and determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object.
In this embodiment of the present invention, the image processing device may determine the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image region and the semantic information of the target object.
In one embodiment, when determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, the image processing device may acquire parameters of the photographing device, and determine the initial three-dimensional coordinates of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object, and the parameters of the photographing device. In some embodiments, the parameters of the photographing device include intrinsic parameters, such as the focal length and the optical center of the camera.
S805: and extracting the region characteristic points of the two-dimensional target image region, and determining the three-dimensional coordinate adjustment information of the target object based on the region characteristic points.
In this embodiment of the present invention, the image processing device may perform region feature point extraction on the two-dimensional target image region, and determine three-dimensional coordinate adjustment information of the target object based on the region feature point.
S806: and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In this embodiment of the present invention, the image processing device may determine the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
In an embodiment of the present invention, the image processing apparatus may process an initial image captured by the photographing device to determine a two-dimensional target image region of the initial image and semantic information of a target object included in the two-dimensional target image region, process the initial image according to a first neural network to obtain a projection model of the two-dimensional target image region, determine initial three-dimensional coordinates of the target object according to the projection model and the semantic information of the target object, and determine three-dimensional coordinate adjustment information of the target object based on region feature points extracted from the two-dimensional target image region, so as to determine the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. Through this implementation, the projection model of the two-dimensional target image region can be determined according to the category information of the two-dimensional target image region derived from the feature point information, so that three-dimensional object detection can be performed adaptively on different types of images, improving the efficiency and effectiveness of image processing.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. Specifically, the image processing apparatus includes: memory 1001, processor 1002.
In one embodiment, the image processing apparatus further comprises a data interface 1003, and the data interface 1003 is used for transferring data information between the image processing apparatus and other apparatuses.
The memory 1001 may include a volatile memory (volatile memory); the memory 1001 may also include a non-volatile memory (non-volatile memory); the memory 1001 may also comprise a combination of memories of the kind described above. The processor 1002 may be a Central Processing Unit (CPU). The processor 1002 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), or any combination thereof.
The memory 1001 is used for storing programs, and the processor 1002 can call the programs stored in the memory 1001 for executing the following steps:
acquiring an initial image shot by a shooting device;
processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
Further, when the processor 1002 processes the initial image and determines the two-dimensional target image area of the initial image and semantic information of a target object included in the two-dimensional target image area, specifically, the processor is configured to:
and processing the initial image according to a first neural network, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
Further, when the processor 1002 performs the region feature point extraction on the two-dimensional target image region, it is specifically configured to:
and processing the two-dimensional target image area according to a second neural network so as to extract the characteristic point information of the two-dimensional target image area.
Further, the processor 1002 is further configured to:
detecting the two-dimensional target image area according to a second neural network to obtain area information of the two-dimensional target image area;
the area information comprises any one or more of category information, three-dimensional size information, orientation information, and two-dimensional bounding-box (circumscribed rectangle) information of the two-dimensional target image area.
Further, when the processor 1002 determines the initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, it is specifically configured to:
acquiring a projection model of the two-dimensional target image area;
acquiring parameters of the shooting device;
and determining the initial three-dimensional coordinates of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameters of the shooting device.
Further, the projection model is a preset projection model.
Further, when the processor 1002 acquires the projection model of the two-dimensional target image region, it is specifically configured to:
determining the category information of the two-dimensional target image area according to the characteristic point information of the two-dimensional target image area;
and determining a projection model of the two-dimensional target image area according to the category information of the two-dimensional target image area.
Further, when the processor 1002 acquires the projection model of the two-dimensional target image region, it is specifically configured to:
processing the initial image according to a first neural network to obtain a projection model of the initial image;
and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
Further, when the processor 1002 acquires the projection model of the two-dimensional target image region, it is specifically configured to:
processing the initial image according to a third neural network to obtain a projection model of the initial image;
and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
Further, the projection model comprises a first projection model; the processor 1002 is specifically configured to, when determining the initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the shooting device:
acquiring the image height of the target object in the two-dimensional target image area and the actual height of the target object;
and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object and the parameters of the shooting device.
Further, when the processor 1002 determines the initial three-dimensional coordinate of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object, and the parameter of the shooting device, specifically, the processor is configured to:
determining the actual distance between the target object and the shooting device by utilizing the first projection model according to the image height, the actual height and the parameters of the shooting device;
acquiring two-dimensional coordinates of the target object in an image coordinate system;
and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the parameters of the shooting device, the actual distance, the two-dimensional coordinates and the semantic information of the target object.
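The two steps above (depth from the ratio of actual height to image height, then back-projection of the 2D coordinates) can be made concrete. The Python sketch below is illustrative only: it assumes the first projection model is the pinhole model and that the intrinsic parameters supply a focal length f and a principal point (cx, cy) in pixels; the function names are not taken from this disclosure.

```python
def distance_from_height(focal_px, image_height_px, actual_height_m):
    """Pinhole similar triangles: depth Z = f * H_actual / h_image."""
    return focal_px * actual_height_m / image_height_px

def back_project(u, v, depth, focal_px, cx, cy):
    """Map pixel (u, v) at a known depth to camera-frame 3D coordinates."""
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)

# Example: a 1.5 m tall object imaged 150 px tall with a 600 px focal
# length lies 6 m from the camera; a pixel at the principal point
# back-projects onto the optical axis.
z = distance_from_height(600.0, 150.0, 1.5)
point = back_project(320.0, 240.0, z, 600.0, 320.0, 240.0)
```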
Further, the projection model comprises a second projection model; the processor 1002 is specifically configured to, when determining the initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the shooting device:
acquiring the image distance of the target object in the two-dimensional target image area;
acquiring two-dimensional coordinates of the target object in an image coordinate system;
and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object and the parameters of the shooting device.
Further, when the processor 1002 determines the initial three-dimensional coordinate of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinate, the semantic information of the target object, and the parameter of the shooting device, specifically, the processor is configured to:
determining a shooting visual angle of the shooting device by using the second projection model according to the image distance, the two-dimensional coordinates and parameters of the shooting device;
acquiring the actual distance between the target object and the shooting device;
and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the shooting visual angle, the actual distance and the semantic information of the target object.
Further, when the processor 1002 determines the initial three-dimensional coordinate of the two-dimensional target image area by using the second projection model according to the shooting angle, the actual distance, and the semantic information of the target object, specifically, the processor is configured to:
determining the physical distance of the target object by utilizing the second projection model according to the shooting visual angle and the actual distance;
and determining the initial three-dimensional coordinate of the two-dimensional target image area according to the parameters of the shooting device, the shooting visual angle, the two-dimensional coordinate, the physical distance and the semantic information of the target object.
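One concrete instance of the steps above (shooting view angle from the image distance, then 3D coordinates from that angle and the actual distance) is the equidistant fisheye mapping, which appears later among the listed second projection models. This Python sketch is illustrative only; the azimuth argument and both function names are assumptions, not details taken from this disclosure.

```python
import math

def view_angle_equidistant(radial_px, focal_px):
    """Equidistant fisheye mapping r = f * theta, inverted: theta = r / f."""
    return radial_px / focal_px

def point_from_angle(theta, azimuth, actual_distance):
    """Convert incidence angle, image-plane azimuth and range to 3D coords."""
    x = actual_distance * math.sin(theta) * math.cos(azimuth)
    y = actual_distance * math.sin(theta) * math.sin(azimuth)
    z = actual_distance * math.cos(theta)
    return (x, y, z)
```

For instance, a point imaged 300 px from the principal point with a 600 px focal length is seen 0.5 rad off the optical axis under this model.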
Further, when determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information, the processor 1002 is specifically configured to:
processing the two-dimensional target image area according to a second neural network, and determining three-dimensional coordinate adjustment information of the two-dimensional target image area;
and determining the target three-dimensional coordinate of the two-dimensional target image area according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinate.
Further, the parameters of the shooting device include intrinsic parameters and extrinsic parameters, the intrinsic parameters including a focal length of the shooting device, and the extrinsic parameters including an optical center of the shooting device.
Further, the first projection model includes a pinhole imaging model.
Further, the second projection model includes any one of a Snell's window projection model, an equal-area projection model, an equidistant projection model, and a stereographic projection model.
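The fisheye-type projection models just listed (apart from the Snell's window model, which has no single standard mapping function) have textbook forms relating the incidence angle theta to the radial image distance r. The dictionary below is an illustrative sketch using those textbook formulas; it is not quoted from this disclosure.

```python
import math

# Textbook fisheye mapping functions r(theta), with focal length f in pixels.
FISHEYE_MODELS = {
    "equidistant":   lambda theta, f: f * theta,
    "equal_area":    lambda theta, f: 2.0 * f * math.sin(theta / 2.0),  # equisolid angle
    "stereographic": lambda theta, f: 2.0 * f * math.tan(theta / 2.0),
}

# All three map the optical axis (theta = 0) to the image centre, and for
# small theta they agree to first order with r ~= f * theta.
```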
In this embodiment of the present invention, the image processing apparatus may process the initial image captured by the shooting device to determine a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area, determine initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, and, by performing region feature point extraction on the two-dimensional target image area, determine three-dimensional coordinate adjustment information of the target object based on the region feature points, so as to determine the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. In this manner, multiple projection models can be adaptively supported within the same algorithm framework so as to handle monocular images of all types, improving the efficiency and effectiveness of image processing.
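The overall flow summarized in this paragraph can be sketched as a single function. Every callable below is a hypothetical stand-in for the first/second neural networks and projection models described in the embodiments; nothing here reproduces an actual disclosed implementation.

```python
def detect_3d_target(initial_image, camera_params,
                     detect_2d, select_projection_model,
                     extract_feature_points, predict_adjustment):
    """Pipeline sketch: 2D detection -> initial 3D estimate -> refined 3D."""
    # Step 1: 2D target image area and semantic information (first network).
    region, semantics = detect_2d(initial_image)
    # Step 2: initial 3D coordinates via the selected projection model.
    projection_model = select_projection_model(region)
    initial_xyz = projection_model(region, semantics, camera_params)
    # Step 3: region feature points -> 3D adjustment (second network).
    features = extract_feature_points(region)
    delta = predict_adjustment(features)
    # Step 4: final target 3D coordinates = initial estimate + adjustment.
    return tuple(a + b for a, b in zip(initial_xyz, delta))
```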
An embodiment of the present invention further provides an image processing system, including a shooting device and the image processing apparatus described above. In this embodiment of the present invention, the image processing apparatus may process the initial image captured by the shooting device to determine a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area, determine initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, and, by performing region feature point extraction on the two-dimensional target image area, determine three-dimensional coordinate adjustment information of the target object based on the region feature points, so as to determine the target three-dimensional coordinates of the target object according to the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information. In this manner, three-dimensional target detection can be performed adaptively on different types of images, improving the efficiency and effectiveness of image processing.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, it implements the method described in the embodiment corresponding to fig. 2, fig. 5, fig. 6, or fig. 8 of the present invention, and may also implement the apparatus of the embodiment corresponding to fig. 10 of the present invention; details are not repeated here.
The computer readable storage medium may be an internal storage unit of the device according to any of the foregoing embodiments, for example, a hard disk or a memory of the device. The computer readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the device. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the apparatus. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
The above disclosure is intended to be illustrative of only some embodiments of the invention, and is not intended to limit the scope of the invention.

Claims (55)

  1. An image processing method, comprising:
    acquiring an initial image shot by a shooting device;
    processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
    determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
    extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
    and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
  2. The method according to claim 1, wherein the processing the initial image to determine the two-dimensional target image area of the initial image and semantic information of the target object included in the two-dimensional target image area comprises:
    and processing the initial image according to a first neural network, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
  3. The method according to claim 1, wherein the performing of the region feature point extraction on the two-dimensional target image region comprises:
    and processing the two-dimensional target image area according to a second neural network so as to extract the characteristic point information of the two-dimensional target image area.
  4. The method of claim 1, further comprising:
    detecting the two-dimensional target image area according to a second neural network to obtain area information of the two-dimensional target image area;
    the area information comprises any one or more of category information, three-dimensional size information, orientation information and two-dimensional bounding box information of the two-dimensional target image area.
  5. The method according to claim 1 or 2, wherein determining initial three-dimensional coordinates of the target object based on the projected model of the two-dimensional target image area and semantic information of the target object comprises:
    acquiring a projection model of the two-dimensional target image area;
    acquiring parameters of the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameters of the shooting device.
  6. The method of claim 5, wherein the projection model is a preset projection model.
  7. The method of claim 5, wherein said obtaining a projection model of the two-dimensional target image region comprises:
    determining the category information of the two-dimensional target image area according to the characteristic point information of the two-dimensional target image area;
    and determining a projection model of the two-dimensional target image area according to the category information of the two-dimensional target image area.
  8. The method of claim 5, wherein said obtaining a projection model of the two-dimensional target image region comprises:
    processing the initial image according to a first neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  9. The method of claim 5, wherein said obtaining a projection model of the two-dimensional target image region comprises:
    processing the initial image according to a third neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  10. The method of any of claims 5-9, wherein the projection model comprises a first projection model; the determining an initial three-dimensional coordinate of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameter of the shooting device includes:
    acquiring the image height of the target object in the two-dimensional target image area and the actual height of the target object;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object and the parameters of the shooting device.
  11. The method of claim 10, wherein determining initial three-dimensional coordinates of the two-dimensional target image area using the first projection model based on the image height, the actual height, semantic information of the target object, and parameters of the camera comprises:
    determining the actual distance between the target object and the shooting device by utilizing the first projection model according to the image height, the actual height and the parameters of the shooting device;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the parameters of the shooting device, the actual distance, the two-dimensional coordinates and the semantic information of the target object.
  12. The method of any of claims 5-9, wherein the projection model comprises a second projection model; the determining an initial three-dimensional coordinate of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameter of the shooting device includes:
    acquiring the image distance of the target object in the two-dimensional target image area;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object and the parameters of the shooting device.
  13. The method of claim 12, wherein determining initial three-dimensional coordinates of the two-dimensional target image area using the second projection model based on the image distance, the two-dimensional coordinates, semantic information of the target object, and parameters of the camera comprises:
    determining a shooting visual angle of the shooting device by using the second projection model according to the image distance, the two-dimensional coordinates and parameters of the shooting device;
    acquiring the actual distance between the target object and the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the shooting visual angle, the actual distance and the semantic information of the target object.
  14. The method of claim 13, wherein determining initial three-dimensional coordinates of the two-dimensional target image area using the second projection model based on the capture perspective, the actual distance, and semantic information of the target object comprises:
    determining the physical distance of the target object by utilizing the second projection model according to the shooting visual angle and the actual distance;
    and determining the initial three-dimensional coordinate of the two-dimensional target image area according to the parameters of the shooting device, the shooting visual angle, the two-dimensional coordinate, the physical distance and the semantic information of the target object.
  15. The method of claim 4, wherein determining the target three-dimensional coordinates of the target object based on the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information comprises:
    processing the two-dimensional target image area according to a second neural network, and determining three-dimensional coordinate adjustment information of the two-dimensional target image area;
    and determining the target three-dimensional coordinate of the two-dimensional target image area according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinate.
  16. The method of claim 1, wherein the parameters of the shooting device include intrinsic parameters and extrinsic parameters, the intrinsic parameters including a focal length of the shooting device, and the extrinsic parameters including an optical center of the shooting device.
  17. The method of claim 10, wherein the first projection model comprises a pinhole imaging model.
  18. The method of claim 12, wherein the second projection model comprises any one of a Snell's window projection model, an equal-area projection model, an equidistant projection model, and a stereographic projection model.
  19. An image processing apparatus, comprising: a memory and a processor, wherein:
    the memory is configured to store a program; and
    the processor is configured to execute the program stored in the memory, and when executing the program, the processor is configured to:
    acquiring an initial image shot by a shooting device;
    processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
    determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
    extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
    and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
  20. The apparatus according to claim 19, wherein the processor is configured to, when processing the initial image and determining the two-dimensional target image area of the initial image and semantic information of the target object included in the two-dimensional target image area, specifically:
    and processing the initial image according to a first neural network, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
  21. The device according to claim 19, wherein the processor, when performing the region feature point extraction on the two-dimensional target image region, is specifically configured to:
    and processing the two-dimensional target image area according to a second neural network so as to extract the characteristic point information of the two-dimensional target image area.
  22. The device of claim 19, wherein the processor is further configured to:
    detecting the two-dimensional target image area according to a second neural network to obtain area information of the two-dimensional target image area;
    the area information comprises any one or more of category information, three-dimensional size information, orientation information and two-dimensional bounding box information of the two-dimensional target image area.
  23. The apparatus according to claim 19 or 21, wherein, when determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, the processor is specifically configured to:
    acquiring a projection model of the two-dimensional target image area;
    acquiring parameters of the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameters of the shooting device.
  24. The apparatus of claim 23, wherein the projection model is a preset projection model.
  25. The apparatus of claim 23, wherein the processor, when obtaining the projection model of the two-dimensional target image region, is specifically configured to:
    determining the category information of the two-dimensional target image area according to the characteristic point information of the two-dimensional target image area;
    and determining a projection model of the two-dimensional target image area according to the category information of the two-dimensional target image area.
  26. The apparatus of claim 23, wherein the processor, when obtaining the projection model of the two-dimensional target image region, is specifically configured to:
    processing the initial image according to a first neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  27. The apparatus of claim 23, wherein the processor, when obtaining the projection model of the two-dimensional target image region, is specifically configured to:
    processing the initial image according to a third neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  28. The apparatus of any of claims 23-27, wherein the projection model comprises a first projection model; the processor is specifically configured to, when determining the initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the photographing device:
    acquiring the image height of the target object in the two-dimensional target image area and the actual height of the target object;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object and the parameters of the shooting device.
  29. The apparatus of claim 28, wherein, when determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object, and the parameters of the shooting device, the processor is specifically configured to:
    determining the actual distance between the target object and the shooting device by utilizing the first projection model according to the image height, the actual height and the parameters of the shooting device;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the parameters of the shooting device, the actual distance, the two-dimensional coordinates and the semantic information of the target object.
  30. The apparatus of any of claims 23-27, wherein the projection model comprises a second projection model; the processor is specifically configured to, when determining an initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the photographing device:
    acquiring the image distance of the target object in the two-dimensional target image area;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object and the parameters of the shooting device.
  31. The apparatus of claim 30, wherein, when determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object, and the parameters of the shooting device, the processor is specifically configured to:
    determining a shooting visual angle of the shooting device by using the second projection model according to the image distance, the two-dimensional coordinates and parameters of the shooting device;
    acquiring the actual distance between the target object and the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the shooting visual angle, the actual distance and the semantic information of the target object.
  32. The apparatus of claim 31, wherein the processor, when determining the initial three-dimensional coordinates of the two-dimensional target image region using the second projection model based on the capture perspective, the actual distance, and the semantic information of the target object, is specifically configured to:
    determining the physical distance of the target object by utilizing the second projection model according to the shooting visual angle and the actual distance;
    and determining the initial three-dimensional coordinate of the two-dimensional target image area according to the parameters of the shooting device, the shooting visual angle, the two-dimensional coordinate, the physical distance and the semantic information of the target object.
  33. The device of claim 22, wherein the processor, when determining the target three-dimensional coordinate of the target object based on the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information, is specifically configured to:
    processing the two-dimensional target image area according to a second neural network, and determining three-dimensional coordinate adjustment information of the two-dimensional target image area;
    and determining the target three-dimensional coordinate of the two-dimensional target image area according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinate.
  34. The apparatus of claim 19, wherein the parameters of the shooting device include intrinsic parameters and extrinsic parameters, the intrinsic parameters including a focal length of the shooting device, and the extrinsic parameters including an optical center of the shooting device.
  35. The apparatus of claim 28, wherein the first projection model comprises a pinhole imaging model.
  36. The apparatus of claim 30, wherein the second projection model comprises any one of a Snell's window projection model, an equal-area projection model, an equidistant projection model, and a stereographic projection model.
  37. An image processing system, comprising:
    the shooting device is used for shooting to obtain an initial image;
    an image processing apparatus for:
    acquiring an initial image shot by a shooting device;
    processing the initial image, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area;
    determining an initial three-dimensional coordinate of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object;
    extracting region feature points of the two-dimensional target image region, and determining three-dimensional coordinate adjustment information of the target object based on the region feature points;
    and determining the target three-dimensional coordinate of the target object according to the initial three-dimensional coordinate and the three-dimensional coordinate adjustment information.
  38. The system according to claim 37, wherein the image processing device is configured to, when processing the initial image and determining the two-dimensional target image area of the initial image and semantic information of the target object included in the two-dimensional target image area, specifically:
    and processing the initial image according to a first neural network, and determining a two-dimensional target image area of the initial image and semantic information of a target object contained in the two-dimensional target image area.
  39. The system according to claim 37, wherein the image processing device, when performing the region feature point extraction on the two-dimensional target image region, is specifically configured to:
    and processing the two-dimensional target image area according to a second neural network so as to extract the characteristic point information of the two-dimensional target image area.
  40. The system of claim 37, wherein the image processing device is further configured to:
    detecting the two-dimensional target image area according to a second neural network to obtain area information of the two-dimensional target image area;
    the area information comprises any one or more of category information, three-dimensional size information, orientation information and two-dimensional bounding box information of the two-dimensional target image area.
  41. The system according to claim 37 or 39, wherein, when determining the initial three-dimensional coordinates of the target object according to the projection model of the two-dimensional target image area and the semantic information of the target object, the image processing device is specifically configured to:
    acquiring a projection model of the two-dimensional target image area;
    acquiring parameters of the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area according to the projection model of the two-dimensional target image area, the semantic information of the target object and the parameters of the shooting device.
  42. The system of claim 41, wherein the projection model is a preset projection model.
  43. The system according to claim 41, wherein the image processing device, when acquiring the projection model of the two-dimensional target image area, is specifically configured to:
    determining the category information of the two-dimensional target image area according to the feature point information of the two-dimensional target image area;
    and determining a projection model of the two-dimensional target image area according to the category information of the two-dimensional target image area.
  44. The system according to claim 41, wherein the image processing device, when acquiring the projection model of the two-dimensional target image area, is specifically configured to:
    processing the initial image according to a first neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  45. The system according to claim 41, wherein the image processing device, when acquiring the projection model of the two-dimensional target image area, is specifically configured to:
    processing the initial image according to a third neural network to obtain a projection model of the initial image;
    and determining the projection model of the initial image as the projection model of the two-dimensional target image area.
  46. The system of any of claims 41-45, wherein the projection model comprises a first projection model; the image processing device is specifically configured to, when determining the initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the shooting device:
    acquiring the image height of the target object in the two-dimensional target image area and the actual height of the target object;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object and the parameters of the shooting device.
  47. The system according to claim 46, wherein the image processing device, when determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the image height, the actual height, the semantic information of the target object, and the parameters of the shooting device, is specifically configured to:
    determining the actual distance between the target object and the shooting device by utilizing the first projection model according to the image height, the actual height and the parameters of the shooting device;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the first projection model according to the parameters of the shooting device, the actual distance, the two-dimensional coordinates and the semantic information of the target object.
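The two-step procedure of claims 46 and 47 (similar triangles for the actual distance, then back-projection of the two-dimensional coordinates) can be sketched with a pinhole-style first projection model. All function and variable names, and the exact parameterization, are illustrative assumptions rather than text from the patent:

```python
def pinhole_initial_3d(u, v, fx, fy, cx, cy, image_height_px, actual_height_m):
    """Estimate an initial 3D coordinate from a single image under a pinhole model.

    Similar-triangles relation: depth Z = f * H_actual / h_image, then the
    pixel (u, v) is back-projected through the intrinsics (fx, fy, cx, cy).
    """
    # Actual distance between the target object and the shooting device (claim 47, step 1)
    z = fy * actual_height_m / image_height_px
    # Back-project the 2D image coordinates to an initial 3D coordinate (claim 47, step 3)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Example: a 1.5 m tall object imaged 150 px tall with f = 1000 px, seen at the
# principal point (960, 540) -> the object lies 10 m straight ahead.
print(pinhole_initial_3d(960, 540, 1000, 1000, 960, 540, 150, 1.5))  # (0.0, 0.0, 10.0)
```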
  48. The system of any of claims 41-45, wherein the projection model comprises a second projection model; the image processing device is specifically configured to, when determining the initial three-dimensional coordinate of the two-dimensional target image region according to the projection model of the two-dimensional target image region, the semantic information of the target object, and the parameter of the shooting device:
    acquiring the image distance of the target object in the two-dimensional target image area;
    acquiring two-dimensional coordinates of the target object in an image coordinate system;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object and the parameters of the shooting device.
  49. The system according to claim 48, wherein the image processing device, when determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the image distance, the two-dimensional coordinates, the semantic information of the target object, and the parameters of the shooting device, is specifically configured to:
    determining a shooting angle of view of the shooting device by using the second projection model according to the image distance, the two-dimensional coordinates and the parameters of the shooting device;
    acquiring the actual distance between the target object and the shooting device;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the shooting angle of view, the actual distance and the semantic information of the target object.
  50. The system according to claim 49, wherein the image processing device, when determining the initial three-dimensional coordinates of the two-dimensional target image area by using the second projection model according to the shooting angle of view, the actual distance, and the semantic information of the target object, is specifically configured to:
    determining the physical distance of the target object by using the second projection model according to the shooting angle of view and the actual distance;
    and determining the initial three-dimensional coordinates of the two-dimensional target image area according to the parameters of the shooting device, the shooting angle of view, the two-dimensional coordinates, the physical distance and the semantic information of the target object.
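Claims 48–50 recover the initial coordinates from the image distance through the second projection model. A minimal sketch under the equidistant fisheye mapping (one of the candidate models in claim 54); the function name, the decomposition into off-axis and along-axis components, and all variable names are illustrative assumptions:

```python
import math

def equidistant_initial_3d(u, v, f, cx, cy, actual_distance_m):
    """Sketch: initial 3D coordinate under an equidistant fisheye mapping.

    Equidistant model: r = f * theta, where r is the image distance of the
    pixel from the principal point (cx, cy) and theta is the angle of view.
    """
    du, dv = u - cx, v - cy
    r = math.hypot(du, dv)                     # image distance (claim 48)
    theta = r / f                              # angle of view (claim 49)
    z = actual_distance_m * math.cos(theta)    # depth along the optical axis
    if r == 0.0:
        return (0.0, 0.0, z)
    rho = actual_distance_m * math.sin(theta)  # physical off-axis distance (claim 50)
    return (rho * du / r, rho * dv / r, z)

# A pixel ~261.8 px from the principal point with f = 500 px sees theta = 30 deg;
# at a 10 m range the point sits 5 m off-axis and ~8.66 m down the optical axis.
print(equidistant_initial_3d(960 + 500 * math.pi / 6, 540, 500, 960, 540, 10.0))
```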
  51. The system according to claim 40, wherein the image processing device, when determining the target three-dimensional coordinates of the target object based on the initial three-dimensional coordinates and the three-dimensional coordinate adjustment information, is specifically configured to:
    processing the two-dimensional target image area according to a second neural network, and determining three-dimensional coordinate adjustment information of the two-dimensional target image area;
    and determining the target three-dimensional coordinate of the two-dimensional target image area according to the three-dimensional coordinate adjustment information and the initial three-dimensional coordinate.
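Claim 51 refines the geometric estimate with network-predicted adjustment information. Read as an additive per-axis offset (an assumption; the patent does not fix the combination rule), the step is simply:

```python
def refine_coordinates(initial_xyz, adjustment_xyz):
    """Target 3D coordinate = initial estimate + predicted per-axis adjustment."""
    return tuple(i + d for i, d in zip(initial_xyz, adjustment_xyz))

print(refine_coordinates((0.0, 0.0, 10.0), (0.5, -0.25, 0.5)))  # (0.5, -0.25, 10.5)
```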
  52. The system of claim 37, wherein the parameters of the shooting device comprise an intrinsic parameter and an extrinsic parameter, the intrinsic parameter comprising a focal length of the shooting device, and the extrinsic parameter comprising an optical center of the shooting device.
  53. The system of claim 46, wherein the first projection model comprises a pinhole imaging model.
  54. The system of claim 48, wherein the second projection model comprises any one of a Snell's window projection model, an equal-area projection model, an equidistant projection model, and a stereographic projection model.
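The second projection models enumerated in claim 54 differ only in how the angle of view theta maps to the radial image distance r for focal length f. The standard fisheye formulas are sketched below; matching each formula to the patent's exact terms (in particular, which mapping its Snell's window model denotes) is an assumption:

```python
import math

# Radial mapping r(f, theta) for common fisheye projection models.
FISHEYE_MODELS = {
    "equidistant":   lambda f, theta: f * theta,                    # r = f*theta
    "equal_area":    lambda f, theta: 2 * f * math.sin(theta / 2),  # equisolid angle
    "stereographic": lambda f, theta: 2 * f * math.tan(theta / 2),
    # Assumed stand-in for the Snell's window model; the patent may define it differently.
    "orthographic":  lambda f, theta: f * math.sin(theta),
}

# Near the optical axis all mappings agree to first order: r ~= f * theta.
for name, r_of in FISHEYE_MODELS.items():
    print(f"{name:13s} r(f=1, theta=60deg) = {r_of(1.0, math.pi / 3):.4f}")
```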
  55. A computer-readable storage medium having a computer program stored therein, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 18.
CN201980094989.8A 2019-12-27 2019-12-27 Image processing method, image processing device, image processing system and storage medium Pending CN113661513A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129364 WO2021128314A1 (en) 2019-12-27 2019-12-27 Image processing method and device, image processing system and storage medium

Publications (1)

Publication Number Publication Date
CN113661513A true CN113661513A (en) 2021-11-16

Family

ID=76573529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980094989.8A Pending CN113661513A (en) 2019-12-27 2019-12-27 Image processing method, image processing device, image processing system and storage medium

Country Status (2)

Country Link
CN (1) CN113661513A (en)
WO (1) WO2021128314A1 (en)

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8564657B2 (en) * 2009-05-29 2013-10-22 Honda Research Institute Europe Gmbh Object motion detection system based on combining 3D warping techniques and a proper object motion detection
CN101839692B (en) * 2010-05-27 2012-09-05 西安交通大学 Method for measuring three-dimensional position and stance of object with single camera
US9083961B2 (en) * 2012-09-28 2015-07-14 Raytheon Company System for correcting RPC camera model pointing errors using 2 sets of stereo image pairs and probabilistic 3-dimensional models
CN107248178B (en) * 2017-06-08 2020-09-25 上海赫千电子科技有限公司 Fisheye camera calibration method based on distortion parameters
CN109764858B (en) * 2018-12-24 2021-08-06 中公高科养护科技股份有限公司 Photogrammetry method and system based on monocular camera
CN109741241B (en) * 2018-12-26 2023-09-05 斑马网络技术有限公司 Fisheye image processing method, device, equipment and storage medium
CN110070025B (en) * 2019-04-17 2023-03-31 上海交通大学 Monocular image-based three-dimensional target detection system and method
CN110060202B (en) * 2019-04-19 2021-06-08 湖北亿咖通科技有限公司 Monocular SLAM algorithm initialization method and system

Also Published As

Publication number Publication date
WO2021128314A1 (en) 2021-07-01

Similar Documents

Publication Publication Date Title
EP3876141A1 (en) Object detection method, related device and computer storage medium
WO2020107372A1 (en) Control method and apparatus for photographing device, and device and storage medium
US11057604B2 (en) Image processing method and device
US11669972B2 (en) Geometry-aware instance segmentation in stereo image capture processes
US20180114067A1 (en) Apparatus and method for extracting objects in view point of moving vehicle
JPWO2017057054A1 (en) Information processing apparatus, information processing method, and program
EP2610778A1 (en) Method of detecting an obstacle and driver assist system
CN111213153A (en) Target object motion state detection method, device and storage medium
CN110751336B (en) Obstacle avoidance method and obstacle avoidance device of unmanned carrier and unmanned carrier
EP3703008A1 (en) Object detection and 3d box fitting
CN112837207A (en) Panoramic depth measuring method, four-eye fisheye camera and binocular fisheye camera
US20210064872A1 (en) Object detecting system for detecting object by using hierarchical pyramid and object detecting method thereof
CN107749069B (en) Image processing method, electronic device and image processing system
CN113052907B (en) Positioning method of mobile robot in dynamic environment
WO2020019111A1 (en) Method for acquiring depth information of target object, and movable platform
JP7384158B2 (en) Image processing device, moving device, method, and program
CN110036411B (en) Apparatus and method for generating electronic three-dimensional roaming environment
CN112514366A (en) Image processing method, image processing apparatus, and image processing system
KR102048999B1 (en) Autonomous driving devise and method
CN113936042B (en) Target tracking method and device and computer readable storage medium
CN113661513A (en) Image processing method, image processing device, image processing system and storage medium
CN112291701B (en) Positioning verification method, positioning verification device, robot, external equipment and storage medium
CN113436256A (en) Shooting device state identification method and device, computer equipment and storage medium
WO2021097807A1 (en) Method and device for calibrating external parameters of detection device, and mobile platform
WO2020000311A1 (en) Method, apparatus and device for image processing, and unmanned aerial vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination