CN111784659A - Image detection method and device, electronic equipment and storage medium - Google Patents

Image detection method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN111784659A
CN111784659A (Application CN202010605474.4A)
Authority
CN
China
Prior art keywords
image
detected
environment
ground
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010605474.4A
Other languages
Chinese (zh)
Inventor
李莹莹
叶晓青
谭啸
孙昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010605474.4A priority Critical patent/CN111784659A/en
Publication of CN111784659A publication Critical patent/CN111784659A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle

Abstract

The application discloses an image detection method and apparatus, an electronic device and a storage medium, relating to the fields of automatic driving, computer vision and deep learning. The specific implementation scheme is as follows: acquiring an image to be detected acquired by a camera in an environment to be detected; fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected; and predicting the depth of the obstacle in the image to be detected from the camera according to the fusion features of the image to be detected. Compared with the prior art, the depth of the obstacle in the image from the camera is predicted in combination with the depth map features of the environment to be detected, which improves the accuracy and robustness of image detection.

Description

Image detection method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the field of computer vision, automatic driving and deep learning in computer technology, in particular to an image detection method, an image detection device, electronic equipment and a storage medium.
Background
Unmanned driving technology senses the environment around a vehicle through sensors and controls the steering and speed of the vehicle according to the sensed road, vehicle position, obstacles and the like, so that the vehicle can travel safely and reliably on the road. Three-dimensional vehicle detection, which detects the obstacles around a vehicle, is therefore of great importance to unmanned driving.
Currently, three-dimensional vehicle detection in road scenes is mainly based on images from vehicle-mounted binocular cameras or on radar data. For three-dimensional vehicle detection in a fixed monitoring scene, a network can directly predict the projections of the eight vertices of the three-dimensional detection frame on the image, or the length, width, height and orientation-angle information, so as to realize obstacle detection; alternatively, the image combined with depth information from monocular depth estimation is converted into a pseudo point cloud, and obstacle detection is performed with a 3D point cloud detection method.
However, the method relying on binocular cameras has high accuracy requirements for obstacle depth estimation and high computational complexity, and cannot meet the requirements of real-time performance and robustness. The method relying on radar does not fit the application scenario of a monitoring camera: the point cloud generated by the radar is sparse, and the detection accuracy at long range is low. The method based on two-dimensional images is affected by perspective projection and the apparent size of objects in the image, so the estimated 3D detection frame is not accurate enough and the detection accuracy is insufficient. Therefore, existing three-dimensional vehicle detection methods cannot meet the requirements of accuracy and robustness at the same time.
Disclosure of Invention
The application provides an image detection method and device, an electronic device and a storage medium.
According to a first aspect of the present application, there is provided a method of image detection, comprising:
acquiring an image to be detected acquired by a camera in an environment to be detected;
fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected;
and predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected.
According to a second aspect of the present application, there is provided an apparatus for image detection, comprising:
the acquisition module is used for acquiring an image to be detected acquired by a camera in an environment to be detected;
the fusion module is used for fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected;
and the prediction module is used for predicting the depth of the obstacle in the image to be detected from the camera according to the fusion features of the image to be detected.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect described above.
According to a fifth aspect of the present application, there is provided a method of image detection, comprising:
fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected;
and predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected.
According to the technology of the application, the technical problem that the three-dimensional vehicle detection in the prior art cannot meet the requirements of precision and robustness at the same time is solved. Compared with the prior art, the depth of the obstacle in the image from the camera is predicted by combining the depth map features of the environment to be detected, and the accuracy and robustness of image detection are improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a scene schematic diagram of a method for image detection according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of a method for image detection according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of an image detection principle provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of another image detection method according to an embodiment of the present disclosure;
fig. 5 is a schematic flowchart of another image detection method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an apparatus for image detection according to an embodiment of the present disclosure;
fig. 7 is a block diagram of an electronic device for implementing the method of image detection according to the embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Currently, three-dimensional vehicle detection in road scenes is mainly based on images from vehicle-mounted binocular cameras or on radar data. For three-dimensional vehicle detection in a fixed monitoring scene, a network can directly predict the projections of the eight vertices of the three-dimensional detection frame on the image, or the length, width, height and orientation-angle information, so as to realize obstacle detection; alternatively, the image combined with depth information from monocular depth estimation is converted into a pseudo point cloud, and obstacle detection is performed with a 3D point cloud detection method.
However, the method relying on binocular cameras has high accuracy requirements for obstacle depth estimation and high computational complexity, and cannot meet the requirements of real-time performance and robustness. The method relying on radar does not fit the application scenario of a monitoring camera: the point cloud generated by the radar is sparse, and the detection accuracy at long range is low. The method based on two-dimensional images is affected by perspective projection and the apparent size of objects in the image, so the estimated 3D detection frame is not accurate enough and the detection accuracy is insufficient. Therefore, existing three-dimensional vehicle detection methods cannot meet the requirements of accuracy and robustness at the same time.
The application provides an image detection method and apparatus, an electronic device and a storage medium, applied to the fields of computer vision, automatic driving and deep learning within computer technology, so as to solve the technical problem that three-dimensional vehicle detection cannot meet the requirements of accuracy and robustness at the same time, and to achieve the effect of improving the accuracy and robustness of image detection. The inventive concept of the application is as follows: when performing image detection, depth map features are fused on the basis of the existing global features and local features.
For the purpose of clearly understanding the technical solution of the present application, the terms referred to in the present application are explained below:
monocular camera: a camera with only one vision sensor.
Binocular camera: a camera with two vision sensors. Using the principle of triangulation, a binocular camera can acquire depth information of a scene and reconstruct the three-dimensional shape and position of the surrounding scene.
Depth map: also called a distance map, an image in which the distance from the camera to each point in the scene is taken as the pixel value.
Internal reference of the camera (camera intrinsics): parameters related to the camera's own characteristics, such as the focal length of the camera, the pixel size, etc.
The following describes a usage scenario of the present application.
Fig. 1 is a scene schematic diagram of an image detection method according to an embodiment of the present application. As shown in fig. 1, a camera 102 on a vehicle 101 captures an image of the surroundings of the vehicle and transmits the image to a server 103. The server 103 detects the image, determines the depth of an obstacle in the image from the camera 102, generates instruction information for the vehicle 101, and controls the steering and speed of the vehicle 101 to perform automatic driving.
The camera 102 may be a monocular camera. The server 103 may be an independent server or a server in a cloud service platform.
It should be noted that the application scenario of the present invention may be the automatic driving scenario in fig. 1, but is not limited to this, and may also be applied to other scenarios requiring image detection.
It is understood that the above-mentioned method for image detection may be implemented by the apparatus for image detection provided in the embodiments of the present application, and the apparatus for image detection may be part or all of a certain device, and may be, for example, a server or a processor in the server.
The following takes a server integrated or installed with relevant execution codes as an example, and details the technical solution of the embodiment of the present application with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a schematic flowchart of an image detection method according to an embodiment of the present application, and this embodiment relates to a process of how to perform image detection. As shown in fig. 2, the method includes:
s201, acquiring an image to be detected collected by a camera in an environment to be detected.
The camera may be a monocular camera, and correspondingly, the image to be measured may be a monocular image.
In the application, the camera can acquire the image to be detected in the environment to be detected in real time, and then the server can receive the image to be detected sent by the camera. In the automatic driving scene, the camera can be installed at any position of the vehicle. For example, a camera may be installed in front of the vehicle to capture an image to be measured in front of the vehicle; alternatively, the camera may be mounted behind the vehicle to capture the image to be measured behind the vehicle.
The environment to be detected is not limited. Taking automatic driving as an example, if the vehicle is in a road scene, the environment to be detected may be a road; if the vehicle is in a fixed monitoring scene, the environment to be detected may be a parking lot under fixed monitoring, and the like.
S202, fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected.
In this step, after the server acquires the image to be detected collected by the camera in the environment to be detected, the global feature of the image to be detected, the local feature of the image to be detected, and the depth map feature of the environment to be detected may be fused to acquire the fusion feature of the image to be detected.
A global feature describes an overall attribute of the image, and may include, for example, color features, texture features and shape features. Global features have the advantages of good invariance, simple calculation and intuitive representation.
A local feature is a local expression of image characteristics, and may include, for example, scale-invariant feature transform (SIFT) features, speeded up robust features (SURF) and DAISY dense features.
The embodiment of the application does not limit how the global features and local features of the image to be detected are obtained. In an optional implementation, the image to be detected may be input into a backbone network (Backbone), and the local features and global features output by the Backbone may be obtained. The Backbone is a neural network model used for target detection, and may be, for example, ResNet or DenseNet. The embodiment of the application does not limit the Backbone, which can be set according to the actual situation.
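As a hedged illustration of this step, the sketch below shows one way to obtain local and global features from a ResNet backbone (PyTorch is assumed); treating the shallower stages as "local" features and the deeper stages as "global" features, as well as the input size and resulting shapes, are illustrative assumptions rather than the exact network of this application.

```python
# A minimal sketch (PyTorch assumed) of extracting local and global features of
# the image to be detected with a randomly initialized ResNet-50 backbone.
import torch
import torchvision

backbone = torchvision.models.resnet50()               # randomly initialized backbone
stages = list(backbone.children())                      # conv1 ... layer4, avgpool, fc
local_extractor = torch.nn.Sequential(*stages[:6])      # conv1 .. layer2 -> local detail
global_extractor = torch.nn.Sequential(*stages[6:8])    # layer3, layer4  -> global context

image = torch.randn(1, 3, 384, 1280)                    # image to be detected (N, C, H, W)
local_feat = local_extractor(image)                     # (1, 512, 48, 160)
global_feat = global_extractor(local_feat)              # (1, 2048, 12, 40)
```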
In an optional implementation manner, before acquiring the fusion features of the image to be measured, the server extracts the depth map features of the environment to be measured from the ground depth map of the environment to be measured. Similarly, the server may input the ground depth map of the environment to be measured into the backbone network, and obtain the depth map feature of the environment to be measured output by the backbone network.
The embodiment of the present application also does not limit how the global features, the local features and the depth map features are fused. Illustratively, if the Backbone adopted in feature extraction is ResNet, feature fusion is correspondingly performed in an element-wise sum manner, which combines multiple features into one composite vector.
Illustratively, if the Backbone used in extracting the features is DenseNet, the features are correspondingly fused in a concat manner, which directly concatenates the multiple features.
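As a minimal sketch of the two fusion manners mentioned above (PyTorch assumed): element-wise sum requires the three features to share one shape (for example after 1x1 convolutions and resizing to a common resolution), while concat only requires a common spatial size. All shapes below are assumptions chosen for illustration.

```python
import torch

def fuse_elementwise_sum(global_feat, local_feat, depth_feat):
    # All three features must already have identical (N, C, H, W) shapes.
    return global_feat + local_feat + depth_feat

def fuse_concat(global_feat, local_feat, depth_feat):
    # Features only need to share spatial size; channels are stacked.
    return torch.cat([global_feat, local_feat, depth_feat], dim=1)

g = torch.randn(1, 256, 48, 160)
l = torch.randn(1, 256, 48, 160)
d = torch.randn(1, 256, 48, 160)
print(fuse_elementwise_sum(g, l, d).shape)   # torch.Size([1, 256, 48, 160])
print(fuse_concat(g, l, d).shape)            # torch.Size([1, 768, 48, 160])
```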
In addition, before extracting the depth map features of the environment to be measured from the ground depth map of the environment to be measured, the server can also determine the depth of at least one point on the ground of the environment to be measured from the camera according to the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system. And then, establishing a ground depth map of the environment to be measured according to the depth of at least one point on the ground of the environment to be measured from the camera.
In the present application, features can be extracted quickly and accurately through the backbone network. In addition, since regressing three-dimensional positions is ambiguous across different cameras, adding the depth map features of the environment to be detected can eliminate this ambiguity and improve the generalization capability, thereby meeting the accuracy and robustness requirements of image detection.
S203, predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected.
In this step, after the server fuses the global feature of the image to be detected, the local feature of the image to be detected, and the depth map feature of the environment to be detected, and obtains the fusion feature of the image to be detected, the depth of the obstacle in the image to be detected from the camera can be predicted according to the fusion feature of the image to be detected.
In some embodiments, the server may input the fusion features of the image to be detected into the neural network model, and obtain the depth from the obstacle to the camera in the image to be detected output by the neural network model.
The neural network model is a convolutional neural network model or a fully-connected neural network model.
It should be noted that, in the embodiment of the present application, the building process of the neural network model is not limited, and a common convolutional layer or a full connection layer may be used to build the neural network.
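A hedged sketch of such a prediction head assembled from common convolutional layers follows; the channel counts, layer depth and single-channel output (one predicted depth per feature map location) are illustrative assumptions, not the exact structure used in this application.

```python
import torch

class DepthHead(torch.nn.Module):
    def __init__(self, in_channels: int = 768):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
            torch.nn.ReLU(inplace=True),
            torch.nn.Conv2d(256, 1, kernel_size=1),   # one depth value per location
        )

    def forward(self, fused_feat: torch.Tensor) -> torch.Tensor:
        return self.net(fused_feat)

head = DepthHead(in_channels=768)
fused = torch.randn(1, 768, 48, 160)          # fusion features of the image to be detected
depth_pred = head(fused)                      # (1, 1, 48, 160)
```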
Fig. 3 is a schematic diagram illustrating the principle of image detection provided in an embodiment of the present application. Fig. 3 shows a monocular 3D region proposal network for object detection (M3D-RPN). In the M3D-RPN, on one hand, after acquiring the image to be detected, the server inputs the image to be detected into the Backbone and acquires the global features and local features of the image to be detected output by the Backbone. On the other hand, the server also inputs the ground depth map acquired in advance into the Backbone and acquires the depth map features of the environment to be detected output by the Backbone. Then, the server performs feature fusion on the global features, the local features and the depth map features, and outputs the fused features to a prediction module for depth prediction to obtain a depth prediction result. In addition, the prediction module can also obtain the original 3D prediction result of the existing M3D-RPN from the global features and local features, and combine the original 3D prediction result and the depth prediction result into a new 3D prediction result.
The image detection method provided by the embodiment of the application comprises the steps of firstly obtaining an image to be detected collected by a camera in an environment to be detected, and then fusing the global feature of the image to be detected, the local feature of the image to be detected and the depth map feature of the environment to be detected to obtain the fusion feature of the image to be detected. And finally, predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected. Compared with the prior art, the depth of the obstacle in the image from the camera is predicted by combining the depth map features of the environment to be detected, and the accuracy and robustness of image detection are improved.
On the basis of the above embodiments, how to obtain the depth map features of the environment to be measured is described below. Fig. 4 is a schematic flowchart of another image detection method according to an embodiment of the present application, and as shown in fig. 4, the image detection method includes:
s301, according to the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system, the depth of the at least one point on the ground of the environment to be measured from the camera is determined.
In some embodiments, the server may obtain coordinates of at least one point on the ground of the environment to be measured in the image coordinate system. And then, according to the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system, the internal reference of the camera and the ground equation, determining the depth of the at least one point on the ground of the environment to be measured from the camera.
Specifically, the server may back-project the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system according to the internal reference of the camera and the ground equation, so as to calculate the depth of the at least one point on the ground of the environment to be measured from the camera.
The internal reference of the camera and the ground equation are calibrated in advance, the coordinates of each point in the image coordinate system are known, and in the early stage of constructing the ground depth map each point can be assumed to be a point on the ground.
Illustratively, let a point on the ground of the environment to be measured be Corner(x, y), let the camera internal reference matrix K be as shown in equation (1), and let the ground equation be as shown in equation (2). Accordingly, the depth of the point on the ground from the camera can be calculated by equations (3) to (5). Equations (1) to (5) are as follows:
K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]] ..................(1)
a·x + b·y + c·z + d = 0 ..................(2)
Img_p = [x, y, 1]^T ..................(3)
point_cam = K^(-1) · Img_p ..................(4)
depth = -d / (a·X_cam + b·Y_cam + c·Z_cam) ..................(5)
where (X_cam, Y_cam, Z_cam) are the components of point_cam; a, b, c and d are adjustable parameters of the ground equation; f_x is the focal length f expressed in pixels along the x axis of the imaging plane; f_y is the focal length f expressed in pixels along the y axis of the imaging plane; c_x is the offset of the origin of the physical imaging plane in the x-axis direction; and c_y is the offset of the origin of the physical imaging plane in the y-axis direction.
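The following sketch (NumPy assumed) illustrates equations (1) to (5): a pixel Corner(x, y), assumed to lie on the ground plane, is back-projected through K^(-1) and intersected with the plane to obtain its depth from the camera. The intrinsic values and plane coefficients below are illustrative assumptions, not calibrated values.

```python
import numpy as np

def ground_point_depth(x, y, K, plane):
    a, b, c, d = plane
    img_p = np.array([x, y, 1.0])               # homogeneous pixel, equation (3)
    point_cam = np.linalg.inv(K) @ img_p        # viewing ray, equation (4)
    s = -d / (np.array([a, b, c]) @ point_cam)  # ray/plane intersection scale
    return s * point_cam[2]                     # depth along the camera z axis, equation (5)

K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]])         # camera internal reference, equation (1)
plane = (0.0, -1.0, 0.05, 1.5)                  # ground equation coefficients, equation (2)
print(ground_point_depth(800.0, 700.0, K, plane))
```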
In the method, the depth of at least one point on the ground from the camera can be quickly and accurately determined through the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system, the internal reference of the camera and the ground equation, and then the ground depth map of the environment to be measured is established.
S302, establishing a ground depth map of the environment to be measured according to the depth of at least one point on the ground of the environment to be measured from the camera.
In the step, after the server determines the depth of the at least one point on the ground of the environment to be measured from the camera according to the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system, the ground depth map of the environment to be measured can be established according to the depth of the at least one point on the ground of the environment to be measured from the camera.
In some embodiments, the server may use the depth of at least one point on the ground of the environment to be measured from the camera as a pixel value, thereby creating a ground depth map of the environment to be measured.
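Extending the per-point computation to the whole image gives a hedged sketch of how such a ground depth map could be built, with the depth of each (assumed ground) pixel from the camera used directly as the pixel value (NumPy assumed; the image size, internal reference and plane coefficients are illustrative).

```python
import numpy as np

def build_ground_depth_map(height, width, K, plane):
    a, b, c, d = plane
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x (H*W)
    rays = np.linalg.inv(K) @ pixels                     # back-projected viewing rays
    denom = np.array([a, b, c]) @ rays
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = -d / denom * rays[2]
    depth = np.where(np.isfinite(depth) & (depth > 0), depth, 0.0)  # 0 where the ray misses the ground
    return depth.reshape(height, width).astype(np.float32)

K = np.array([[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]])
plane = (0.0, -1.0, 0.05, 1.5)
ground_depth_map = build_ground_depth_map(1080, 1920, K, plane)
```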
S303, extracting the depth map characteristics of the environment to be measured from the ground depth map of the environment to be measured.
For example, the server may input the ground depth map of the environment to be measured into the backbone network, and obtain the depth map feature of the environment to be measured output by the backbone network.
S304, acquiring an image to be detected collected by the camera in the environment to be detected.
S305, fusing the global feature of the image to be detected, the local feature of the image to be detected and the depth map feature of the environment to be detected to obtain the fusion feature of the image to be detected.
S306, according to the fusion characteristics of the image to be detected, the depth of the obstacle in the image to be detected from the camera is predicted.
The technical terms, technical effects, technical features and alternative embodiments of S304-S306 can be understood with reference to S201-S203 shown in fig. 2, and are not repeated here.
On the basis of the above embodiment, how to predict the depth of an obstacle in an image from a camera is described below. Fig. 5 is a schematic flowchart of another image detection method according to an embodiment of the present application, and as shown in fig. 5, the image detection method includes:
s401, acquiring an image to be detected collected by a camera in an environment to be detected.
S402, fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected.
And S403, inputting the fusion characteristics of the image to be detected into the neural network model, and acquiring the depth of the obstacle in the image to be detected from the camera, which is output by the neural network model.
The neural network model is a convolution neural network model or a full-connection neural network model.
It should be noted that, in the embodiment of the present application, the building process of the neural network model is not limited, and a common convolutional layer or a full connection layer may be used to build the neural network.
In the present application, the depth of the obstacle from the camera can be predicted from the fused features using an existing neural network model, so that extra computation can be avoided and the timeliness of image detection is improved.
The image detection method provided by the embodiment of the application comprises the steps of firstly obtaining an image to be detected collected by a camera in an environment to be detected, and then fusing the global feature of the image to be detected, the local feature of the image to be detected and the depth map feature of the environment to be detected to obtain the fusion feature of the image to be detected. And finally, inputting the fusion characteristics of the image to be detected into the neural network model, and acquiring the depth of the obstacle in the image to be detected, which is output by the neural network model, from the camera. Compared with the prior art, the depth of the obstacle in the image from the camera is predicted by combining the depth map features of the environment to be detected, and the accuracy and robustness of image detection are improved.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program information, the program may be stored in a computer readable storage medium, and the program executes the steps including the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Fig. 6 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present application. The image detection device may be implemented by software, hardware or a combination of the two, and may be, for example, the server or a chip in the server, which is used to execute the image detection method. As shown in fig. 6, the apparatus 500 for image detection includes: an acquisition module 501, a fusion module 502, a prediction module 503, an extraction module 504, a drawing module 505, and a calculation module 506.
An obtaining module 501, configured to obtain an image to be detected acquired by a camera in an environment to be detected;
the fusion module 502 is configured to fuse the global feature of the image to be detected, the local feature of the image to be detected, and the depth map feature of the environment to be detected, so as to obtain a fusion feature of the image to be detected;
and the predicting module 503 is configured to predict the depth of the obstacle in the image to be detected from the camera according to the fusion feature of the image to be detected.
In an alternative embodiment, the apparatus 500 for image detection further includes:
the extracting module 504 is configured to extract a depth map feature of the environment to be measured from the ground depth map of the environment to be measured.
In an optional implementation manner, the extracting module 504 is specifically configured to input the ground depth map of the environment to be measured into the backbone network, and obtain the depth map feature of the environment to be measured output by the backbone network.
In an alternative embodiment, the apparatus 500 for image detection further includes:
and the drawing module 505 is configured to establish a ground depth map of the environment to be measured according to the depth of the at least one point on the ground of the environment to be measured from the camera.
In an alternative embodiment, the apparatus 500 for image detection further includes:
and the calculating module 506 is configured to determine a depth of at least one point on the ground of the environment to be measured from the camera according to coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system.
In an optional implementation manner, the calculating module 506 is specifically configured to obtain coordinates of at least one point on the ground of the environment to be measured in the image coordinate system; and determining the depth of at least one point on the ground of the environment to be measured from the camera according to the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system, the internal reference of the camera and the ground equation.
In an optional embodiment, the calculating module 506 is specifically configured to back-project the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system according to the internal reference of the camera and the ground equation, and calculate the depth of the at least one point on the ground of the environment to be measured from the camera.
In an optional embodiment, the prediction module 503 is specifically configured to input the fusion feature of the image to be detected into the neural network model, and obtain the depth from the obstacle to the camera in the image to be detected, which is output by the neural network model.
In an alternative embodiment, the neural network model is a convolutional neural network model or a fully-connected neural network model.
In an optional implementation manner, the image to be measured is a monocular image.
The image detection apparatus provided in the embodiment of the present application can perform the actions of the image detection method in the foregoing method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 7 is a block diagram of an electronic device for implementing the image detection method according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 7 illustrates an example of a processor 601.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of image detection provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of image detection provided herein.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of image detection in the embodiments of the present application (for example, the obtaining module 501, the fusion module 502, the prediction module 503, the extraction module 504, the drawing module 505 and the calculation module 506 shown in fig. 6). The processor 601 executes various functional applications and data processing of the server, i.e., implements the method of image detection in the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device for image detection, and the like. Further, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected to the electronic device for image detection via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of image detection may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 7 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for image detection, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The embodiment of the application also provides a chip which comprises a processor and an interface. Wherein the interface is used for inputting and outputting data or instructions processed by the processor. The processor is configured to perform the methods provided in the above method embodiments. The chip can be applied to a server.
The present invention also provides a computer-readable storage medium, which may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and in particular, the computer readable storage medium stores program information, and the program information is used for the above method.
Embodiments of the present application also provide a program, which when executed by a processor, is configured to perform the method for image detection provided by the above method embodiments.
Embodiments of the present application further provide a program product, such as a computer-readable storage medium, having stored therein instructions, which, when executed on a computer, cause the computer to perform the method for image detection provided by the above method embodiments.
According to the technical scheme of the embodiment of the application, the image to be detected collected by the camera in the environment to be detected is firstly obtained, and then the global feature of the image to be detected, the local feature of the image to be detected and the depth map feature of the environment to be detected are fused to obtain the fusion feature of the image to be detected. And finally, predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected. Compared with the prior art, the depth of the obstacle in the image from the camera is predicted by combining the depth map features of the environment to be detected, and the accuracy and robustness of image detection are improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (23)

1. A method of image detection, comprising:
acquiring an image to be detected acquired by a camera in an environment to be detected;
fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected;
and predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected.
2. The method of claim 1, further comprising, prior to said obtaining the fused feature of the image under test:
and extracting the depth map characteristics of the environment to be detected from the ground depth map of the environment to be detected.
3. The method of claim 2, wherein the extracting depth map features of the environment under test from the ground depth map of the environment under test comprises:
inputting the ground depth map of the environment to be detected into a backbone network, and acquiring the depth map characteristics of the environment to be detected output by the backbone network.
4. The method of claim 2, further comprising, prior to said extracting depth map features of the environment under test from the ground depth map of the environment under test:
and establishing a ground depth map of the environment to be measured according to the depth of at least one point on the ground of the environment to be measured from the camera.
5. The method of claim 4, further comprising, prior to said establishing a ground depth map of said environment under test:
and determining the depth of at least one point on the ground of the environment to be detected from the camera according to the coordinates of the at least one point on the ground of the environment to be detected in an image coordinate system.
6. The method of claim 5, wherein the determining the depth of the at least one point on the ground of the environment to be measured from the camera according to the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system comprises:
acquiring coordinates of at least one point on the ground of the environment to be measured in an image coordinate system;
and determining the depth of at least one point on the ground of the environment to be detected from the camera according to the coordinates of the at least one point on the ground of the environment to be detected in the image coordinate system, the internal parameters of the camera and the ground equation.
7. The method of claim 6, wherein the determining the depth of the at least one point on the ground of the environment to be measured from the camera according to the coordinates of the at least one point on the ground of the environment to be measured in the image coordinate system, the internal reference of the camera, and a ground equation comprises:
and back-projecting the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system according to the internal reference of the camera and the ground equation, and calculating the depth of the at least one point on the ground of the environment to be measured from the camera.
8. The method according to any one of claims 1-7, wherein the predicting the depth of the obstacle in the image to be tested from the camera according to the fused feature of the image to be tested comprises:
and inputting the fusion characteristics of the image to be detected into a neural network model, and acquiring the depth of the obstacle in the image to be detected, which is output by the neural network model, from the camera.
9. The method of claim 8, wherein the neural network model is a convolutional neural network model or a fully-connected neural network model.
10. The method according to any one of claims 1-7, wherein the image to be tested is a monocular image.
11. An apparatus for image detection, comprising:
the acquisition module is used for acquiring an image to be detected acquired by a camera in an environment to be detected;
the fusion module is used for fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain the fusion features of the image to be detected;
and the prediction module is used for predicting the depth of the obstacle in the image to be detected from the camera according to the fusion characteristics of the image to be detected.
12. The apparatus of claim 11, further comprising:
and the extraction module is used for extracting the depth map characteristics of the environment to be detected from the ground depth map of the environment to be detected.
13. The apparatus according to claim 12, wherein the extracting module is specifically configured to input the ground depth map of the environment to be measured into a backbone network, and obtain the depth map feature of the environment to be measured output by the backbone network.
14. The apparatus of claim 12, further comprising:
and the drawing module is used for establishing a ground depth map of the environment to be detected according to the depth of at least one point on the ground of the environment to be detected from the camera.
15. The apparatus of claim 14, further comprising:
and the computing module is used for determining the depth of at least one point on the ground of the environment to be detected from the camera according to the coordinates of the at least one point on the ground of the environment to be detected in the image coordinate system.
16. The device according to claim 15, wherein the computing module is specifically configured to obtain coordinates of at least one point on the ground of the environment to be measured in an image coordinate system; and determining the depth of at least one point on the ground of the environment to be detected from the camera according to the coordinates of the at least one point on the ground of the environment to be detected in the image coordinate system, the internal parameters of the camera and the ground equation.
17. The apparatus according to claim 16, wherein the computing module is specifically configured to back-project the coordinates of at least one point on the ground of the environment to be measured in the image coordinate system according to the internal reference of the camera and the ground equation, and compute the depth of the at least one point on the ground of the environment to be measured from the camera.
18. The apparatus according to any one of claims 11-17, wherein the prediction module is specifically configured to input the fusion features of the image to be measured into a neural network model, and obtain the depth of an obstacle in the image to be measured from the camera, which is output by the neural network model.
19. The apparatus of claim 18, wherein the neural network model is a convolutional neural network model or a fully-connected neural network model.
20. The apparatus according to any one of claims 11-17, wherein the image to be measured is a monocular image.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
23. A method of image detection, comprising:
fusing the global features of the image to be detected, the local features of the image to be detected and the depth map features of the environment to be detected to obtain fused features of the image to be detected;
and predicting the depth of an obstacle in the image to be detected from the camera according to the fused features of the image to be detected.
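For orientation only, the depth computation recited in claims 14-17 can be illustrated with a short sketch. The code below is not taken from the patent: it assumes a pinhole camera with a known intrinsic matrix K and a ground plane written in the camera frame as n·X + d = 0, back-projects a pixel onto that plane to obtain the depth of the corresponding ground point, and then fills a ground depth map pixel by pixel; all names and the example numbers are illustrative.

import numpy as np

def ground_point_depth(u, v, K_inv, plane_normal, plane_d):
    """Back-project pixel (u, v) onto the ground plane n.X + d = 0 in the camera
    frame and return the depth (Z) of the intersection, or None if the viewing
    ray does not hit the ground in front of the camera."""
    ray = K_inv @ np.array([u, v, 1.0])   # viewing ray through the pixel
    denom = plane_normal @ ray
    if denom >= -1e-6:                    # ray parallel to or pointing away from the ground
        return None
    scale = -plane_d / denom              # ray parameter where it meets the plane
    return scale * ray[2]                 # depth along the optical axis

# Assumed intrinsics and a flat ground 1.5 m below a forward-looking camera.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
K_inv = np.linalg.inv(K)
normal, d = np.array([0.0, -1.0, 0.0]), 1.5   # plane: -Y + 1.5 = 0, i.e. Y = 1.5 m

# Claim 14: build the ground depth map by evaluating the depth at every pixel.
H, W = 480, 640
ground_depth_map = np.zeros((H, W), dtype=np.float32)
for v in range(H):
    for u in range(W):
        z = ground_point_depth(u, v, K_inv, normal, d)
        if z is not None:
            ground_depth_map[v, u] = z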
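Claims 12-13 recite extracting the depth map features of the environment by feeding the ground depth map through a backbone network, but the patent does not fix a particular backbone. The following is a minimal, assumed sketch using a small convolutional network in PyTorch; the class name GroundDepthBackbone and the layer sizes are hypothetical.

import torch
import torch.nn as nn

class GroundDepthBackbone(nn.Module):
    """Tiny convolutional backbone that turns a 1-channel ground depth map
    into a feature map (an assumed stand-in for the claimed backbone network)."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, ground_depth_map: torch.Tensor) -> torch.Tensor:
        # ground_depth_map: (N, 1, H, W) depths in metres; returns (N, C, H/4, W/4) features
        return self.layers(ground_depth_map)

# Usage with a random depth map of VGA size:
depth_features = GroundDepthBackbone()(torch.rand(1, 1, 480, 640))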
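Claims 11, 18-19 and 23 describe fusing the global features of the image, the local features of the image and the depth map features of the environment, and then regressing obstacle depth from the fused features with a neural network model. The block below is an assumed sketch rather than the patent's architecture: it fuses the three feature maps by channel-wise concatenation, which is one simple realisation of "fusing", and applies a small convolutional head; module and variable names are illustrative.

import torch
import torch.nn as nn

class FusionDepthHead(nn.Module):
    """Concatenate global, local and depth-map features, then regress one depth
    value per spatial location with a small convolutional head."""
    def __init__(self, global_ch=64, local_ch=64, depth_ch=64):
        super().__init__()
        fused_ch = global_ch + local_ch + depth_ch
        self.head = nn.Sequential(
            nn.Conv2d(fused_ch, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),   # one depth value per location
        )

    def forward(self, global_feat, local_feat, depth_map_feat):
        # The three feature maps are assumed to share the same spatial size.
        fused = torch.cat([global_feat, local_feat, depth_map_feat], dim=1)
        return self.head(fused)

# Usage with random feature maps of matching size:
g = torch.rand(1, 64, 60, 80)
l = torch.rand(1, 64, 60, 80)
d = torch.rand(1, 64, 60, 80)
obstacle_depth = FusionDepthHead()(g, l, d)   # (1, 1, 60, 80)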
CN202010605474.4A 2020-06-29 2020-06-29 Image detection method and device, electronic equipment and storage medium Pending CN111784659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010605474.4A CN111784659A (en) 2020-06-29 2020-06-29 Image detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010605474.4A CN111784659A (en) 2020-06-29 2020-06-29 Image detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111784659A CN111784659A (en) 2020-10-16

Family

ID=72760794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010605474.4A Pending CN111784659A (en) 2020-06-29 2020-06-29 Image detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111784659A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362444A (en) * 2021-05-21 2021-09-07 北京百度网讯科技有限公司 Point cloud data generation method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481279A (en) * 2017-05-18 2017-12-15 华中科技大学 A kind of monocular video depth map computational methods
CN107578436A (en) * 2017-08-02 2018-01-12 南京邮电大学 A kind of monocular image depth estimation method based on full convolutional neural networks FCN
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108416803A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of scene depth restoration methods of the Multi-information acquisition based on deep neural network
CN108765481A (en) * 2018-05-25 2018-11-06 亮风台(上海)信息科技有限公司 A kind of depth estimation method of monocular video, device, terminal and storage medium
CN109035319A (en) * 2018-07-27 2018-12-18 深圳市商汤科技有限公司 Monocular image depth estimation method and device, equipment, program and storage medium
CN110334628A (en) * 2019-06-26 2019-10-15 华中科技大学 A kind of outdoor monocular image depth estimation method based on structuring random forest
CN110490919A (en) * 2019-07-05 2019-11-22 天津大学 A kind of depth estimation method of the monocular vision based on deep neural network
WO2019226405A1 (en) * 2018-05-24 2019-11-28 Microsoft Technology Licensing, Llc Fusion of depth images into global volumes

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107481279A (en) * 2017-05-18 2017-12-15 华中科技大学 A kind of monocular video depth map computational methods
CN107578436A (en) * 2017-08-02 2018-01-12 南京邮电大学 A kind of monocular image depth estimation method based on full convolutional neural networks FCN
CN108171212A (en) * 2018-01-19 2018-06-15 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN108416803A (en) * 2018-03-14 2018-08-17 大连理工大学 A kind of scene depth restoration methods of the Multi-information acquisition based on deep neural network
WO2019226405A1 (en) * 2018-05-24 2019-11-28 Microsoft Technology Licensing, Llc Fusion of depth images into global volumes
CN108765481A (en) * 2018-05-25 2018-11-06 亮风台(上海)信息科技有限公司 A kind of depth estimation method of monocular video, device, terminal and storage medium
CN109035319A (en) * 2018-07-27 2018-12-18 深圳市商汤科技有限公司 Monocular image depth estimation method and device, equipment, program and storage medium
WO2020019761A1 (en) * 2018-07-27 2020-01-30 深圳市商汤科技有限公司 Monocular image depth estimation method and apparatus, device, program and storage medium
CN110334628A (en) * 2019-06-26 2019-10-15 华中科技大学 A kind of outdoor monocular image depth estimation method based on structuring random forest
CN110490919A (en) * 2019-07-05 2019-11-22 天津大学 A kind of depth estimation method of the monocular vision based on deep neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID EIGEN et al.: "Depth Map Prediction from a Single Image using a Multi-Scale Deep Network", arXiv, page 3 *
LING Yan et al.: "Fully convolutional network for salient object detection enhanced by multi-scale context information", Journal of Computer-Aided Design & Computer Graphics, vol. 31, no. 11, 30 November 2019 (2019-11-30) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362444A (en) * 2021-05-21 2021-09-07 北京百度网讯科技有限公司 Point cloud data generation method and device, electronic equipment and storage medium
CN113362444B (en) * 2021-05-21 2023-06-16 北京百度网讯科技有限公司 Point cloud data generation method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US11668571B2 (en) Simultaneous localization and mapping (SLAM) using dual event cameras
KR102407504B1 (en) Method and apparatus for detecting obstacle, electronic device and storage medium
CN111612760B (en) Method and device for detecting obstacles
CN112652016B (en) Point cloud prediction model generation method, pose estimation method and pose estimation device
CN112132829A (en) Vehicle information detection method and device, electronic equipment and storage medium
CN110793544B (en) Method, device and equipment for calibrating parameters of roadside sensing sensor and storage medium
CN111324115B (en) Obstacle position detection fusion method, obstacle position detection fusion device, electronic equipment and storage medium
EP3968266B1 (en) Obstacle three-dimensional position acquisition method and apparatus for roadside computing device
CN105043350A (en) Binocular vision measuring method
CN111721281B (en) Position identification method and device and electronic equipment
CN111578839B (en) Obstacle coordinate processing method and device, electronic equipment and readable storage medium
CN111612753B (en) Three-dimensional object detection method and device, electronic equipment and readable storage medium
CN111666876B (en) Method and device for detecting obstacle, electronic equipment and road side equipment
KR20200075727A (en) Method and apparatus for calculating depth map
CN113887400B (en) Obstacle detection method, model training method and device and automatic driving vehicle
CN111767843B (en) Three-dimensional position prediction method, device, equipment and storage medium
CN111402326B (en) Obstacle detection method, obstacle detection device, unmanned vehicle and storage medium
CN112668428A (en) Vehicle lane change detection method, roadside device, cloud control platform and program product
CN112487979A (en) Target detection method, model training method, device, electronic device and medium
CN111797745A (en) Training and predicting method, device, equipment and medium of object detection model
CN112184914A (en) Method and device for determining three-dimensional position of target object and road side equipment
WO2024083006A1 (en) Three-dimensional imaging method and apparatus, device, and storage medium
CN112509126A (en) Method, device, equipment and storage medium for detecting three-dimensional object
CN111784659A (en) Image detection method and device, electronic equipment and storage medium
CN115147809B (en) Obstacle detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination