CN111161138B - Target detection method, device, equipment and medium for two-dimensional panoramic image

Target detection method, device, equipment and medium for two-dimensional panoramic image


Publication number
CN111161138B
Authority
CN
China
Prior art keywords: image, dimensional, perspective, panoramic image, candidate
Prior art date
Legal status
Active
Application number
CN201911414124.3A
Other languages
Chinese (zh)
Other versions
CN111161138A (en)
Inventor
Not disclosed (不公告发明人)
Current Assignee
Beijing Urban Network Neighbor Information Technology Co Ltd
Original Assignee
Beijing Urban Network Neighbor Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Urban Network Neighbor Information Technology Co Ltd filed Critical Beijing Urban Network Neighbor Information Technology Co Ltd
Priority to CN201911414124.3A priority Critical patent/CN111161138B/en
Publication of CN111161138A publication Critical patent/CN111161138A/en
Application granted granted Critical
Publication of CN111161138B publication Critical patent/CN111161138B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/08 Projecting images onto non-planar surfaces, e.g. geodetic screens
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/18 Image warping, e.g. rearranging pixels individually

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Studio Devices (AREA)

Abstract

Disclosed are a method, an apparatus, a device, and a medium for object detection of a two-dimensional panoramic image. The method comprises the following steps: determining at least two perspective images of the two-dimensional panoramic image, wherein the at least two perspective images respectively comprise partial image information in the two-dimensional panoramic image; performing target detection on the at least two perspective images to determine candidate target detection results for the at least two perspective images; and determining a panoramic target detection result of the two-dimensional panoramic image based on the candidate target detection results of the at least two perspective images.

Description

Target detection method, device, equipment and medium for two-dimensional panoramic image
Technical Field
The present application relates to the field of image processing, and more particularly, to a method, an apparatus, a device, and a medium for detecting a target in a two-dimensional panoramic image.
Background
The two-dimensional panoramic image may show 360 ° panoramic information in a two-dimensional manner. A panoramic camera may be utilized to acquire a two-dimensional panoramic image. Based on the acquired two-dimensional panoramic image, a planar panoramic image may be directly displayed through an electronic device or a corresponding three-dimensional model may be generated based on the panoramic image, and the generated model may be displayed in a three-dimensional manner.
The panoramic image may provide the user with all scene information within a 360° viewing angle. Because all of this scene information must be represented in a single image, the objects and scenes represented in the two-dimensional panoramic image exhibit a certain degree of distortion.
Disclosure of Invention
According to an aspect of the present application, there is provided a target detection method for a two-dimensional panoramic image, including: determining at least two perspective images of the two-dimensional panoramic image, wherein the at least two perspective images respectively comprise partial image information in the two-dimensional panoramic image; performing target detection on the at least two perspective images to determine candidate target detection results for the at least two perspective images; and determining a panoramic target detection result of the two-dimensional panoramic image based on the candidate target detection results of the at least two perspective images.
In some embodiments, for each of the at least two perspective images, the candidate target detection result indicates a probability that a candidate target belonging to a predetermined category exists in the perspective image and a position of the candidate target in the perspective image.
In some embodiments, the two-dimensional panoramic image corresponds to a three-dimensional space that is at least partially enclosed by the walls and floor of the three-dimensional space.
In some embodiments, the candidate target is an object present on a wall in the three-dimensional space.
In some embodiments, determining at least two perspective images of the two-dimensional panoramic image comprises: converting the two-dimensional panoramic image into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space; determining at least two perspective images of the two-dimensional panoramic image based on the omnidirectional image.
In some embodiments, determining at least two perspective images based on the omnidirectional image comprises: and carrying out perspective unfolding on the omnidirectional image based on at least two preset perspective view angles to obtain at least two perspective images corresponding to the at least two perspective view angles.
In some embodiments, the at least two perspective images partially overlap.
In some embodiments, performing object detection on the at least two perspective images to determine candidate object detection results for the at least two perspective images comprises: respectively processing the at least two perspective images by using a deep neural network for target detection to obtain the candidate target detection results, wherein the candidate target detection results include the probability of a candidate target belonging to a predetermined category existing in each perspective image and the position of the candidate target in the perspective image.
In some embodiments, the deep neural network is at least one of:
YOLO; RCNN; Fast-RCNN; Faster-RCNN; SSD.
In some embodiments, performing object detection on the at least two perspective images to determine candidate object detection results for the at least two perspective images comprises: for each perspective image, determining a one-dimensional feature representation for the perspective image; and performing target detection on the one-dimensional feature representation to obtain a one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates the probability that the pixel points in the perspective image corresponding to each element in the one-dimensional feature representation belong to a candidate target of a predetermined category, and the candidate target detection result of the perspective image is determined based on the one-dimensional target detection result.
In some embodiments, determining a one-dimensional feature representation for the perspective image comprises: processing the perspective image by using a convolutional neural network comprising at least one convolutional layer and at least one pooling layer to obtain image features of the perspective image; and determining the image features of the perspective image as the one-dimensional feature representation; wherein the size of the perspective image is H × W and the size of the image features is 1 × W, where H is the number of pixels of the perspective image in the height direction, W is the number of pixels of the perspective image in the width direction, and each element in the image features of the perspective image corresponds to a column of pixels in the perspective image.
In some embodiments, performing target detection on the one-dimensional feature representation comprises: processing the one-dimensional feature representation by using an LSTM to obtain a detection feature of the one-dimensional feature representation; and processing the detection feature by using a fully connected layer to obtain a one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates, for each element in the one-dimensional feature representation, the scores of the element belonging to targets of a plurality of predetermined categories.
In some embodiments, determining the candidate target detection result of the perspective image based on the one-dimensional target detection result comprises: for each element in the one-dimensional feature representation, determining the highest score of the element based on the one-dimensional target detection result and determining the element as a target belonging to the predetermined category having the highest score; and determining the size and position of the candidate target in the perspective image based on the position, in the one-dimensional feature representation, of each element determined as a target belonging to the predetermined category.
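To make the pipeline of the preceding embodiments more concrete, the following is a minimal sketch of a per-column detection head: a convolutional network with pooling collapses an H × W perspective image into a 1 × W feature, an LSTM processes the W elements as a sequence, and a fully connected layer outputs category scores per element (i.e., per pixel column). The framework (PyTorch), the layer sizes, and the names such as ColumnDetector and num_classes are illustrative assumptions and are not specified by the application.

```python
# Hedged sketch of the per-column detection head described above; layer sizes,
# category count and framework (PyTorch) are illustrative assumptions.
import torch
import torch.nn as nn

class ColumnDetector(nn.Module):
    def __init__(self, num_classes=3, hidden=128):    # e.g. background / door / window
        super().__init__()
        # Convolution + pooling that keep the width W but shrink the height H to 1.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),          # halve H only
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
            nn.AdaptiveAvgPool2d((1, None)),           # collapse H -> (B, 64, 1, W)
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True)  # detection features of the 1-D representation
        self.fc = nn.Linear(hidden, num_classes)           # per-element category scores

    def forward(self, image):                          # image: (B, 3, H, W)
        feat = self.backbone(image).squeeze(2)         # (B, 64, W)
        feat = feat.permute(0, 2, 1)                   # (B, W, 64): one element per pixel column
        seq, _ = self.lstm(feat)
        return self.fc(seq)                            # (B, W, num_classes)

scores = ColumnDetector()(torch.randn(1, 3, 512, 1024))   # -> (1, 1024, num_classes)
labels = scores.argmax(dim=-1)                             # highest-scoring category per column
```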
In some embodiments, determining a panoramic object detection result for the two-dimensional panoramic image based on the candidate object detection results of the at least two perspective images comprises: determining the position, in the omnidirectional image, of a candidate target in the at least two perspective images based on the mapping relationship between the at least two perspective images and the omnidirectional image; and determining the position of the candidate target in the two-dimensional panoramic image based on the mapping relationship between the two-dimensional panoramic image and the omnidirectional image.
In some embodiments, the method further comprises: and vertically correcting the two-dimensional panoramic image so that the ground in the corrected two-dimensional panoramic image is parallel to a horizontal line.
According to another aspect of the present application, there is also provided an object detection apparatus for a two-dimensional panoramic image, including: an image to be detected determining unit configured to determine at least two perspective images of the two-dimensional panoramic image, wherein the at least two perspective images respectively include partial image information in the two-dimensional panoramic image; an object detection unit configured to perform object detection on the at least two perspective images to determine candidate object detection results for the at least two perspective images; and a result determination unit configured to determine a panoramic target detection result of the two-dimensional panoramic image based on the candidate target detection results of the at least two perspective images.
In some embodiments, for each of the at least two perspective images, the candidate target detection result indicates a probability that a candidate target belonging to a predetermined category exists in the perspective image and a position of the candidate target in the perspective image.
In some embodiments, the two-dimensional panoramic image corresponds to a three-dimensional space that is at least partially enclosed by the walls and floor of the three-dimensional space.
In some embodiments, the candidate target is an object present on a wall in the three-dimensional space.
In some embodiments, the image to be detected determining unit is configured to convert the two-dimensional panoramic image into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space; determining at least two perspective images of the two-dimensional panoramic image based on the omnidirectional image.
In some embodiments, determining at least two perspective images based on the omnidirectional image comprises: and carrying out perspective unfolding on the omnidirectional image based on at least two preset perspective view angles to obtain at least two perspective images corresponding to the at least two perspective view angles.
In some embodiments, the at least two perspective images partially overlap.
In some embodiments, the target detection unit is configured to process the at least two perspective images respectively by using a deep neural network for target detection to obtain the candidate target detection result, wherein the candidate target detection result includes a probability of a candidate target belonging to a predetermined category existing in each perspective image and a position of the candidate target in the perspective image.
In some embodiments, the deep neural network is at least one of:
YOLO; RCNN; Fast-RCNN; Faster-RCNN; SSD.
In some embodiments, the object detection unit is configured to determine, for each perspective image, a one-dimensional feature representation for the perspective image, and to perform target detection on the one-dimensional feature representation to obtain a one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates the probability that the pixel points in the perspective image corresponding to each element in the one-dimensional feature representation belong to a candidate target of a predetermined category, and the candidate target detection result of the perspective image is determined based on the one-dimensional target detection result.
In some embodiments, determining a one-dimensional feature representation for the perspective image comprises: processing the perspective image by using a convolutional neural network comprising at least one convolutional layer and at least one pooling layer to obtain image features of the perspective image; and determining the image features of the perspective image as the one-dimensional feature representation; wherein the size of the perspective image is H × W and the size of the image features is 1 × W, where H is the number of pixels of the perspective image in the height direction, W is the number of pixels of the perspective image in the width direction, and each element in the image features of the perspective image corresponds to a column of pixels in the perspective image.
In some embodiments, performing target detection on the one-dimensional feature representation comprises: processing the one-dimensional feature representation by using an LSTM to obtain a detection feature of the one-dimensional feature representation; and processing the detection feature by using a fully connected layer to obtain a one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates, for each element in the one-dimensional feature representation, the scores of the element belonging to targets of a plurality of predetermined categories.
In some embodiments, determining the candidate target detection result of the perspective image based on the one-dimensional target detection result comprises: for each element in the one-dimensional feature representation, determining the highest score of the element based on the one-dimensional target detection result and determining the element as a target belonging to the predetermined category having the highest score; and determining the size and position of the candidate target in the perspective image based on the position, in the one-dimensional feature representation, of each element determined as a target belonging to the predetermined category.
In some embodiments, the result determination unit is configured to determine a position of a candidate target in the at least two perspective images in the omnidirectional image based on a mapping relationship between the at least two perspective images and the omnidirectional image; and determining the position of the candidate target in the two-dimensional panoramic image based on the mapping relation between the two-dimensional panoramic image and the omnidirectional image.
In some embodiments, the apparatus further includes a vertical correction unit configured to vertically correct the two-dimensional panoramic image such that a ground in the corrected two-dimensional panoramic image is parallel to a horizontal line.
According to still another aspect of the present application, there is also provided an object detection apparatus including: a processor; and a memory storing computer-readable program instructions, wherein the computer-readable program instructions, when executed by the processor, perform the object detection method for a two-dimensional panoramic image as described above.
According to still another aspect of the present application, there is also provided a computer-readable storage medium for storing computer-readable instructions which, when executed by a computer, implement the object detection method for a two-dimensional panoramic image as described above.
By using the method, the device, the equipment and the medium for detecting the target of the two-dimensional panoramic image, the target detection task aiming at the two-dimensional panoramic image can be converted into the target detection task aiming at the perspective image corresponding to the two-dimensional panoramic image. The object displayed in the perspective image has less deformation relative to the two-dimensional panoramic image, and the accuracy of target detection on the two-dimensional panoramic image can be improved by determining the panoramic target in the two-dimensional panoramic image based on the candidate target obtained by target detection on the perspective image.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from them by those skilled in the art without creative effort. The following drawings are not necessarily drawn to scale; emphasis is instead placed upon illustrating the subject matter of the present application.
FIG. 1 illustrates an exemplary scene graph of an image processing system according to the present application;
FIG. 2 shows a schematic flow diagram of a target detection method for a two-dimensional panoramic image according to an embodiment of the present application;
FIG. 3A shows a two-dimensional panoramic image before vertical correction;
FIG. 3B shows a vertically corrected two-dimensional panoramic image;
FIG. 4 shows a schematic flow chart of a method of determining an image to be detected according to an embodiment of the present application;
FIG. 5 is an example of a coordinate system of a two-dimensional panoramic image and coordinates of pixel points in the two-dimensional panoramic image, according to an embodiment of the present application;
FIGS. 6A-6D illustrate examples of two-dimensional panoramic images to be detected with viewing angles of 0°, 90°, 180°, and 270°, respectively, generated according to the method illustrated in FIG. 4;
FIG. 7 shows a schematic flow chart of another method of determining an image to be detected according to the present application;
FIGS. 8A-8F illustrate examples of perspective pictures generated according to the method of FIG. 7;
FIG. 9 shows a schematic flow chart of a method of determining an image to be detected according to an embodiment of the present application;
FIG. 10 shows an example of a plurality of scaled two-dimensional panoramic images, according to an embodiment of the present application;
FIG. 11 shows a schematic flow diagram of a target detection method according to an embodiment of the present application;
FIG. 12 shows a schematic block diagram of an object detection apparatus for a two-dimensional panoramic image according to an embodiment of the present application; and
FIG. 13 illustrates an architecture of a computing device according to an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings, and obviously, the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without any creative effort also belong to the protection scope of the present application.
As used in this application and the appended claims, the singular forms "a," "an," and "the" do not denote a limitation of quantity and are intended to include the plural as well, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and a method or apparatus may also include other steps or elements.
Although various references are made herein to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative and different aspects of the systems and methods may use different modules.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
The panoramic target detection result may be determined by performing target detection on the two-dimensional panoramic image using a one-stage or two-stage target detection model based on deep learning (e.g., YOLO, FastRCNN), or the like. The panoramic target detection result indicates panoramic targets which belong to a preset category and exist in the two-dimensional panoramic image and the positions of the panoramic targets in the two-dimensional panoramic image. However, since the object in the panoramic image is displayed according to a polar coordinate system, the object may be bent to different degrees, thereby increasing the difficulty of target detection. Since the output result of the model based on the deep learning is related to the number and accuracy of the data sets used for training to some extent, and the display effect of the panoramic image is different from that of the object in the image generally used for target detection (i.e., the object displayed based on the rectangular coordinate system), the accuracy of the panoramic target detection result obtained by directly performing target detection on the two-dimensional panoramic image by using the existing target detection model is not high.
In order to improve the accuracy of a panoramic target detection result in a panoramic image, the application provides a method for performing target detection on the panoramic image.
Fig. 1 shows an exemplary scene diagram of an image processing system according to the present application. As shown in fig. 1, the image processing system 100 may include a user terminal 110, a network 120, a server 130, and a database 140.
The user terminal 110 may be, for example, a computer 110-1, a mobile phone 110-2 shown in fig. 1. It is to be appreciated that the user terminal may be virtually any other type of electronic device capable of performing data processing, which may include, but is not limited to, a desktop computer, a laptop computer, a tablet computer, a smartphone, a smart home device, a wearable device, and the like.
The user terminal provided by the application can be used for receiving the image to be processed and realizing image processing by using the method provided by the application. In some embodiments, the user terminal may capture an image to be processed through an image capture device (e.g., a camera, a video camera, etc.) provided on the user terminal. For example, the user terminal may also be implemented as a panoramic image capture device including an image capture unit and a processing unit. In other embodiments, the user terminal may also receive images to be processed from a separately located image capture device (e.g., a camera, a video camera, a panoramic camera, etc.). For another example, the user terminal may also receive an image to be processed from the server via the network. The image to be processed may be a single image or a frame in a video.
In some embodiments, the image to be processed may be a two-dimensional panoramic image. The two-dimensional panoramic image may be an image directly captured by a panoramic camera, a two-dimensional image developed based on a three-dimensional omnidirectional image, or a two-dimensional panoramic image generated based on a plurality of perspective images in a space.
In some implementations, the two-dimensional panoramic image corresponds to a three-dimensional space, i.e., the two-dimensional panoramic image is a panoramic image of a three-dimensional space. For example, the two-dimensional panoramic image here is a two-dimensional panoramic image of a single three-dimensional space (e.g., a single living room or a single bedroom). For example, the ratio of the length and width of the two-dimensional panoramic image may be 2: 1.
The three-dimensional space may be a living space, an office space (e.g., an office), a sales space (e.g., a store), an exhibition space (e.g., an exhibition hall), or other suitable space. For example, the living space may be a bedroom, a living room, a kitchen, a hotel, a residential home, or the like. For example, the three-dimensional space is formed to be at least partially closed by wall surfaces and a floor surface of the three-dimensional space.
In some embodiments, the image processing method provided by the present application may be performed by a processing unit of a user terminal. In some implementations, the user terminal may execute the image processing method of the present application by using an application program built in the user terminal. In other implementations, the user terminal may execute the image processing method provided by the present application by calling an application program stored outside the user terminal.
In other embodiments, the user terminal transmits the received image to be processed to the server 130 via the network 120, and the server 130 performs the image processing method. In some implementations, the server 130 may perform the image processing method of the present application using an application program built into the server. In other implementations, the server 130 may perform the image processing method of the present application by calling an application program stored outside the server. After the server 130 performs the image processing method of the present application and obtains the corresponding image processing result, the server 130 may transmit the image processing result to an output device integrated on the user terminal 110 and/or any output device independent of the user terminal 110. The output device may output the image processing result in any manner, such as image, text, video, or audio.
The network 120 may be a single network, or a combination of at least two different networks. For example, network 120 may include, but is not limited to, one or a combination of local area networks, wide area networks, public networks, private networks, and the like.
The server 130 may be a single server or a group of servers, each server in the group being connected via a wired or wireless network. A group of servers may be centralized, such as a data center, or distributed. The server 130 may be local or remote.
Database 140 may generally refer to a device having a storage function. The database 140 is mainly used to store the various data utilized, generated, and output by the user terminal 110 and the server 130 in operation. The database 140 may be local or remote. The database 140 may include various memories, such as a random access memory (RAM) and a read only memory (ROM). The storage devices mentioned above are only examples, and the storage devices that the system can use are not limited to these.
The database 140 may be interconnected or in communication with the server 130 or a portion thereof via the network 120, or directly interconnected or in communication with the server 130, or a combination thereof.
In some embodiments, the database 140 may be a stand-alone device. In other embodiments, the database 140 may also be integrated in at least one of the user terminal 110 and the server 130. For example, the database 140 may be provided on the user terminal 110 or may be provided on the server 130. For another example, the database 140 may be distributed, with a part provided on the user terminal 110 and another part provided on the server 130.
The flow of the image processing method for object detection of a two-dimensional panoramic image provided by the present application will be described in detail below. Such an image processing method is also referred to as an object detection method hereinafter.
Fig. 2 shows a schematic flow diagram of a method of object detection for a two-dimensional panoramic image according to an embodiment of the present application. The object detection method illustrated in fig. 2 may be performed using a user terminal or a server illustrated in fig. 1.
In step S202, at least one image to be detected may be determined based on the two-dimensional panoramic image to be processed.
In some embodiments, at least one perspective-converted two-dimensional panoramic image may be determined by performing perspective conversion on the two-dimensional panoramic image to be processed. The original two-dimensional panoramic image to be processed and the two-dimensional panoramic image after at least one view angle conversion can be determined as an image to be detected.
In other embodiments, a perspective image corresponding to the two-dimensional panoramic image to be processed may be determined as an image to be detected. Since an object in the panoramic image is displayed based on a polar coordinate system, the object in the panoramic image is deformed compared with the actual object. An object in a perspective image, however, is displayed based on a rectangular coordinate system and is therefore less deformed relative to the actual object.
In still other embodiments, the two-dimensional panoramic image to be processed may be scaled in at least one direction in a three-dimensional space, and the scaled two-dimensional panoramic image may be determined as the image to be detected. In some implementations, the at least one direction in the three-dimensional space can be at least one of a horizontal direction parallel to a floor of the three-dimensional space and a vertical direction perpendicular to the floor. The two-dimensional panoramic image may be scaled in a plurality of directions, and a plurality of scaled two-dimensional panoramic images corresponding to each direction may be determined as an image to be detected.
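As a simple illustration of this scaling step, the sketch below resizes the panorama separately along its horizontal and vertical image axes to obtain additional images to be detected. The use of OpenCV, the particular scale factors, and the treatment of the image axes as stand-ins for the horizontal and vertical directions of the three-dimensional space are assumptions made for illustration only.

```python
# Hedged sketch: generate additional images to be detected by scaling the
# panorama along one direction at a time. Scale factors are arbitrary examples.
import cv2
import numpy as np

panorama = np.zeros((512, 1024, 3), dtype=np.uint8)   # stand-in for the panorama to be processed
images_to_detect = [panorama]
for fx, fy in [(1.5, 1.0), (0.75, 1.0),               # horizontal-only scaling
               (1.0, 1.5), (1.0, 0.75)]:              # vertical-only scaling
    images_to_detect.append(cv2.resize(panorama, None, fx=fx, fy=fy,
                                       interpolation=cv2.INTER_LINEAR))
```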
It is to be understood that the two-dimensional panoramic image to be processed may be processed based on at least one of the above-described methods to obtain an image to be detected including image information of the two-dimensional panoramic image to be processed. Based on the method for determining the images to be detected, the mapping relationship between each image to be detected and the two-dimensional panoramic image to be processed can be determined. Based on such a mapping relationship, after determining candidate targets in an image to be detected by using a target detection method to be described later, the position of the corresponding panoramic target in the two-dimensional panoramic image to be processed can be determined based on the positions of the candidate targets in the image to be detected. For example, in the case where the image to be detected is a perspective image, the position of the candidate target in the perspective image in the omnidirectional image may be determined based on the mapping relationship between the perspective image and the omnidirectional image. Further, the position of the panoramic target existing in the two-dimensional panoramic image to be processed may be determined based on the mapping relationship between the two-dimensional panoramic image to be processed and the omnidirectional image.
In step S204, object detection may be performed on at least one image to be detected to determine at least one candidate object detection result. The candidate target detection result corresponding to each image to be detected can be obtained by performing target detection on at least one image to be detected, respectively.
The candidate target detection result may indicate a position of a candidate target existing in the corresponding image to be detected and a category to which the candidate target belongs.
The candidate target may be an object present on a wall surface in the three-dimensional space. For example, a candidate target may be any kind of object located on a wall surface, such as a door, a window, furniture, a lamp, a decoration, etc. A candidate object may also be any object located in a three-dimensional space, such as a person, an animal, furniture placed on a horizontal floor, such as a table, a chair, etc.
The principle of the present application will be described below by taking the case where the number of categories to which candidate objects may belong is two as an example. It is understood that, depending on the practical application, a person skilled in the art may set the categories to which candidate objects belong according to actual requirements, so as to include a greater or smaller number of categories.
Further, the principles of the present application will be described below by way of example with the categories to which the candidate objects belong being "doors" and "windows". Similarly, the skilled person can set the category to which the candidate target belongs as other objects according to actual needs.
In some embodiments, target detection may be performed on the at least one image to be detected using a deep learning based model. For example, a one-stage or two-stage deep neural network model may be used to perform target detection on the at least one image to be detected. The deep neural network model may include at least one of YOLO, RCNN, Fast-RCNN, and SSD.
It is understood that one skilled in the art can also use any variation of the deep neural network model described above or other deep neural network models having the same effect as the deep neural network model described above for target detection of at least one image to be detected without departing from the principles of the present application.
By using the deep learning based model to perform target detection on the at least one image to be detected, whether a candidate target of a predetermined category exists in each image to be detected, and the position of the candidate target in the image to be detected, can be determined.
For example, in the case where the candidate objects of the predetermined category used for detection based on the deep learning based model include a door and a window, the candidate object detection result may include probabilities that the candidate objects present in the image to be detected belong to the category "door" and the category "window", respectively, and a position of an object frame indicating the candidate objects present in the image to be detected. Wherein the position of the target frame may include the size of the target frame and the coordinates of a feature point on the target frame (e.g., the center point of the target frame) in the image to be detected. The target frame may be any regular or irregular geometric figure such as a circle, rectangle, parallelogram, etc.
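As a possible concrete form of this step, the sketch below runs a pretrained detector on an image to be detected and keeps, for each detection, the class, the probability, and the position of the target frame. The use of torchvision's Faster R-CNN with default weights is purely an illustrative assumption; in practice a detector from the list above (YOLO, RCNN, Fast-RCNN, SSD) trained on the predetermined categories such as "door" and "window" would be used.

```python
# Hedged sketch: candidate target detection on each image to be detected with an
# off-the-shelf detector. torchvision / Faster R-CNN is an assumption; any of the
# models listed in the text, trained on the predetermined categories, could be used.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_candidates(image_tensor, score_threshold=0.5):
    """image_tensor: (3, H, W) float tensor in [0, 1]."""
    with torch.no_grad():
        out = model([image_tensor])[0]
    candidates = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if score >= score_threshold:
            # box = (x1, y1, x2, y2): position of the target frame in the image to be detected
            candidates.append({"category": int(label), "probability": float(score),
                               "box": box.tolist()})
    return candidates
```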
In other embodiments, object detection may be performed on a one-dimensional feature representation of each image to be detected to determine the candidate object detection result corresponding to each image to be detected. The one-dimensional feature representation may take the form of a one-dimensional vector. In the case where the size of the image to be detected is H × W, the size of the one-dimensional feature representation may be 1 × W, where H is the number of pixels of the image to be detected in the height direction and W is the number of pixels of the image to be detected in the width direction. Each element in the one-dimensional feature representation may represent feature information of a column of pixels in the corresponding image to be detected.
In this case, the candidate object detection result may include the probability that each element in the one-dimensional feature representation is a candidate object belonging to a predetermined category, i.e., the probability of belonging to the category "door" or the category "window". Furthermore, the position, in the one-dimensional feature representation, of an element determined as a candidate object belonging to a predetermined category may be used to represent the position of a candidate object of the predetermined category present in the image to be detected.
As mentioned before, one element in the one-dimensional feature representation may correspond to a column of pixels in the image to be detected. Therefore, when one element in the one-dimensional feature representation is determined as a candidate object belonging to a predetermined category, the column of pixels in the image to be detected corresponding to that element may be determined as pixels of the candidate object belonging to the predetermined category. Taking an image to be detected of size 512 × 1024 as an example, the size of its one-dimensional feature representation may be 1 × 1024, that is, a 1024-dimensional vector. When the 100th and 500th elements of the one-dimensional feature representation of the image to be detected are determined as candidate targets belonging to a predetermined category, the 100th and 500th columns of pixels in the image to be detected can be determined as targets of the predetermined category.
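Continuing the 512 × 1024 example, one way to turn the per-column one-dimensional detection result into candidate targets is to group consecutive columns assigned the same non-background category into a column range, which then gives the horizontal position and width of the candidate target in the image to be detected. The label encoding and the grouping rule in the sketch below are illustrative assumptions rather than steps prescribed by the application.

```python
# Hedged sketch: grouping per-column labels (0 = background, 1 = "door", 2 = "window")
# into candidate targets. The encoding and grouping rule are illustrative assumptions.
import numpy as np

def columns_to_candidates(column_labels):
    """column_labels: 1-D array of length W with one category label per pixel column."""
    candidates = []
    start = None
    for i, label in enumerate(np.append(column_labels, 0)):   # sentinel to close the last run
        if start is None and label != 0:
            start, current = i, label
        elif start is not None and label != current:
            candidates.append({"category": int(current),
                               "columns": (start, i - 1)})     # column range in the image
            start, current = (i, label) if label != 0 else (None, None)
    return candidates

labels = np.zeros(1024, dtype=int)
labels[95:105] = 2        # e.g. columns around index 100 classified as "window"
labels[495:520] = 1       # e.g. columns around index 500 classified as "door"
print(columns_to_candidates(labels))
```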
In step S206, a panoramic object detection result of the two-dimensional panoramic image to be processed may be determined based on the candidate object detection result determined in step S204.
In some embodiments, the candidate target detection results of each image to be detected may be combined to obtain a panoramic target detection result. In some implementations, the panoramic target detection result may include candidate targets included in each candidate target detection result. The position of the candidate target indicated in each candidate target detection result in the two-dimensional panoramic image to be processed may be determined based on the relationship between the image to be detected and the two-dimensional panoramic image to be processed. In other implementation manners, the candidate target detection result of each image to be detected may be screened, and the panoramic target detection result in the two-dimensional panoramic image to be processed may be determined based on the screened candidate target detection result. It can be understood that, in the case that the number of the images to be detected is greater than or equal to two, the detection results of the candidate targets of different images to be detected may include the detection result for the same target. In this case, only the candidate object appearing in the candidate object detection results of the at least two images to be detected may be determined as the panoramic object. That is, if a certain candidate object is detected only in one image to be detected, the candidate object may be determined to be a false detection and thus not be regarded as a panoramic object present in the two-dimensional panoramic image to be processed.
In some embodiments, when two candidate objects of different categories are detected at the same position of the two-dimensional panoramic image in two different images to be detected, the candidate object with the higher probability, as output by the object detection method for that position, may be determined as the panoramic object existing at that position of the two-dimensional panoramic image.
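A simplified sketch of this merging and screening logic is given below: a candidate is kept as a panoramic target only if a candidate at approximately the same panorama position is found in at least one other image to be detected, and when two kept candidates of different categories coincide, the one with the higher probability is retained. The data layout and the position-matching tolerance are assumptions for illustration.

```python
# Hedged sketch of merging candidate detection results from several images to be
# detected. Each candidate is assumed to already be mapped into panorama coordinates:
# {"category": ..., "probability": ..., "u": horizontal panorama position}.
def merge_candidates(per_image_candidates, tol=0.02):
    def same_position(a, b):
        return abs(a["u"] - b["u"]) < tol            # tolerance is an illustrative assumption

    kept = []
    for i, cands in enumerate(per_image_candidates):
        for cand in cands:
            # Keep only candidates confirmed by at least one other image to be detected.
            confirmed = any(same_position(cand, other)
                            for j, others in enumerate(per_image_candidates) if j != i
                            for other in others)
            if not confirmed:
                continue                             # likely a false detection
            clash = next((k for k in kept if same_position(cand, k)), None)
            if clash is None:
                kept.append(cand)
            elif cand["probability"] > clash["probability"]:
                kept[kept.index(clash)] = cand       # higher-probability category wins
    return kept
```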
In some embodiments, the method 200 may further include a vertical correction step (not shown) before step S202. As previously described, the two-dimensional panoramic image may correspond to a single three-dimensional space. When a two-dimensional panoramic image is captured using a panoramic camera, if the shooting position of the panoramic camera is not perpendicular to the horizontal plane, the object in the resulting two-dimensional panoramic image will be skewed. The vertical correction step may be used to eliminate skew of an object in the two-dimensional panoramic image, so that when the two-dimensional panoramic image is displayed, a ground plane which should be parallel to a horizontal plane in the real world is parallel to a horizontal direction of the two-dimensional panoramic image, and a line which should be perpendicular to the ground plane is perpendicular to the horizontal direction of the two-dimensional panoramic image.
The vertical correction step may include converting the two-dimensional panoramic image rendered based on the polar coordinate system into an omnidirectional image rendered based on the rectangular coordinate system. Then, at least two lines perpendicular to the ground of the two-dimensional panoramic image in the omnidirectional image based on the rectangular coordinate system may be extracted. The angular difference between the photographed vertical direction and the actual vertical direction of the current omnidirectional image may be determined based on the angular difference between the average direction of the at least two lines and the standard vertical direction. The pitch angle of the two-dimensional panoramic image based on the polar coordinates is adjusted through the angle difference, so that the angle difference between the average direction of at least two lines and the standard vertical direction is zero, and the two-dimensional panoramic image after vertical correction can be obtained.
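The averaging step of the vertical correction can be illustrated by the sketch below, which, given the endpoints of line segments that should be vertical (their extraction from the omnidirectional image is not shown), computes the average angular deviation from the standard vertical direction; this average is the angle difference by which the pitch of the panorama would be adjusted. This is a heavily simplified, assumption-laden illustration, not the full correction procedure.

```python
# Hedged sketch of the averaging step in vertical correction: given endpoints of
# line segments that should be vertical, compute the average angular deviation
# from the standard vertical direction. Segment extraction is not shown here.
import math

def mean_deviation_from_vertical(segments):
    """segments: list of ((x1, y1), (x2, y2)) endpoints of lines that should be vertical."""
    deviations = []
    for (x1, y1), (x2, y2) in segments:
        angle = math.atan2(x2 - x1, y2 - y1)      # 0 when the segment is exactly vertical
        deviations.append(angle)
    return sum(deviations) / len(deviations)       # pitch adjustment that would zero the deviation

# Example: two nearly vertical wall edges leaning slightly to one side.
print(math.degrees(mean_deviation_from_vertical([((100, 0), (104, 200)),
                                                 ((300, 0), (306, 200))])))
```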
Fig. 3A shows a two-dimensional panoramic image before vertical correction. Fig. 3B shows a vertically corrected two-dimensional panoramic image. It can be seen that the two-dimensional panoramic image in fig. 3A shows a case where the vertical wall surface is not perpendicular to the ground. Each line representing a vertical wall surface in the two-dimensional panoramic image in fig. 3B obtained after the processing in the vertical correction step is perpendicular to the ground.
Then, the panoramic object detection result in the two-dimensional panoramic image may be obtained by performing steps S202, S204, and S206 shown in fig. 2 on the vertically corrected two-dimensional panoramic image.
By using the target detection method for the two-dimensional panoramic image, the image to be detected comprising the image information of the two-dimensional panoramic image can be determined based on the original two-dimensional panoramic image. The image to be detected can show the image information in the original two-dimensional panoramic image in a changed manner. The target detection is carried out on the image information displayed in different modes through the image to be detected, and the detection precision of the target detection aiming at the two-dimensional panoramic image can be improved. In addition, the application provides a method for target detection based on one-dimensional feature representation of the image. By regarding the one-dimensional characteristic representation of the image as a sequence formed by each row of pixel points in the image, the target detection task in the image can be converted into a sequence labeling problem for the sequence formed by each row of pixel points, and the target detection process for the two-dimensional panoramic image can be simplified.
Fig. 4 shows a schematic flow chart of a method of determining an image to be detected according to an embodiment of the application.
As shown in fig. 4, in step S402, a two-dimensional panoramic image may be converted into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space.
Fig. 5 is an example of a coordinate system of a two-dimensional panoramic image and coordinates of pixel points in the two-dimensional panoramic image according to an embodiment of the present application.
The width and length of the two-dimensional panoramic image 500 may be W and H, respectively. In some embodiments, the width and length of the two-dimensional panoramic image 500 may be expressed as the number of pixels of the two-dimensional panoramic image in the width direction and in the length direction, respectively. For example, W and H may be 1000 pixels and 500 pixels, respectively.
As shown in fig. 5, the coordinate system of the two-dimensional panoramic image may be composed of two coordinate axes U and V perpendicular to each other and intersecting each other, the intersection of the coordinate axes U and V being a coordinate origin o1 of the coordinate system of the two-dimensional panoramic image, the coordinate origin o1 being disposed at the center of the two-dimensional panoramic image 500; in this case, the coordinates of the pixel point T of the two-dimensional panoramic image 500 may be represented by (U, V), where U and V are the coordinate values of the above-mentioned labeled pixel point T corresponding to the coordinate axes U and V, respectively.
When the vertical distance between the pixel point T and the coordinate axis V in the two-dimensional panoramic image 500 is t1 pixels and the vertical distance between the pixel point T and the coordinate axis U is t2 pixels, u = t1/W and v = t2/H; that is, the coordinates of the pixel point T may be represented by (t1/W, t2/H). This exemplarily describes how the coordinates of points in the coordinate system of the two-dimensional panoramic image, such as the end points of the at least two lines mentioned above, can be acquired based on their position information in the two-dimensional panoramic image.
For example, when the vertical distance between the pixel T and the coordinate axis V is 500 pixels and the vertical distance between the pixel T and the coordinate axis U is 250 pixels, the coordinates of the pixel T are obtained as (0.5, 0.5). For another example, when the vertical distance between the pixel T and the coordinate axis V is 1000 pixels and the vertical distance between the pixel T and the coordinate axis U is 500 pixels, the coordinates of the pixel T are (1, 1).
It should be noted that the origin o1 of the two-dimensional rectangular coordinate system is not limited to be set at the center of the two-dimensional panoramic image 500, and according to the actual application requirement, the origin o1 of the two-dimensional rectangular coordinate system may also be set at the lower left corner, the lower right corner or the upper left corner of the two-dimensional panoramic image 500; correspondingly, the coordinates of the marked pixel points T in the coordinate system of the two-dimensional panoramic image and the acquired position information of the pixel points will change adaptively.
The inverse process of the equidistant columnar projection may be utilized to project the pixel points in the two-dimensional panoramic image 500 to the three-dimensional projection space to obtain the projection points of the pixel points in the two-dimensional panoramic image 500 in the three-dimensional projection space.
Equidistant cylindrical projection may be used to project points on a sphere onto a cylinder (e.g., a cylinder tangent to the sphere at its equator) and then unroll the cylinder into a plane along one of its generatrices. Under equidistant cylindrical projection, the meridians of the sphere are mapped to equally spaced vertical lines distributed along the width direction of the plane, and the parallels of latitude of the sphere are mapped to equally spaced horizontal lines distributed along the length direction of the plane.
For example, in the equidistant cylindrical projection, the longitude λ and the latitude α on the spherical surface and the coordinates (u, v) of the corresponding point on the plane satisfy the following expression (1).
u = (λ - λ0) × cos α0,  v = α - α0   (1)
Here, α0 is a reference latitude, λ0 is the central meridian, u is the coordinate value of the point on the plane in the width direction of the plane, and v is the coordinate value of the point on the plane in the length direction of the plane.
The longitude and the latitude of the projection point of the pixel point in the three-dimensional projection space on the spherical surface can be obtained by using the coordinates (u, v) of the pixel point in the two-dimensional panoramic image 500 in the coordinate system of the two-dimensional panoramic image and based on the expression (1); then, the spherical coordinates of the projected point in the three-dimensional projection space may be obtained based on the longitude and the latitude of the projected point on the spherical surface, and the three-dimensional rectangular coordinates of the projected point in the three-dimensional projection space may be obtained based on the conversion relationship between the three-dimensional spherical coordinates and the three-dimensional rectangular coordinates.
In the three-dimensional rectangular coordinate system, the transformation between the three-dimensional rectangular coordinate system and the uv coordinates of the two-dimensional panoramic image can be expressed by expression (2):
x = r × cos(v′) × cos(u′)
y = r × cos(v′) × sin(u′)
z = r × sin(v′)   (2)
where v′ = π × (v - 0.5), u′ = 2π × (u - 0.5), u and v are the coordinates of the pixel point in the two-dimensional panoramic image, and r denotes the radius of the omnidirectional image. The value of r may be set according to the actual situation.
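Under the convention reconstructed above (the exact form of expressions (1) and (2) is an assumption, since the equation images are not reproduced in this text), the conversion from normalized panorama coordinates (u, v) to three-dimensional rectangular coordinates on the omnidirectional image can be sketched as follows.

```python
# Hedged sketch of expression (2): normalized panorama coordinates (u, v) in [0, 1]
# to 3-D rectangular coordinates on a sphere of radius r. The exact axis convention
# is an assumption made for illustration.
import numpy as np

def panorama_uv_to_xyz(u, v, r=1.0):
    u_prime = 2.0 * np.pi * (u - 0.5)     # longitude in [-pi, pi]
    v_prime = np.pi * (v - 0.5)           # latitude in [-pi/2, pi/2]
    x = r * np.cos(v_prime) * np.cos(u_prime)
    y = r * np.cos(v_prime) * np.sin(u_prime)
    z = r * np.sin(v_prime)
    return x, y, z
```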
Referring back to fig. 4, the two-dimensional panoramic image may be converted into an omnidirectional image in three-dimensional space using the process described in connection with fig. 5.
It is to be understood that although one possible process of converting a two-dimensional panoramic image into an omnidirectional image in a three-dimensional space is described in the present application by taking only the projection manner shown in fig. 5 as an example, the implementation of step S402 is not limited thereto. For example, the two-dimensional panoramic image may also be converted into an omnidirectional image in a three-dimensional space based on optical parameters of a camera used to acquire the two-dimensional panoramic image.
In step S404, the omni-directional image determined in step S402 may be expanded based on at least two preset viewing directions to obtain at least two images to be detected. The image to be detected determined in step S404 is a two-dimensional panoramic image corresponding to at least two preset view angle directions, and therefore, the image to be detected determined in step S404 is also referred to as a two-dimensional panoramic image to be detected in the present application.
In some embodiments, the preset at least two viewing angle directions may include at least two of 0°, 90°, 180°, and 270°. The two-dimensional panoramic image corresponding to the 0° viewing angle direction is the same as the two-dimensional panoramic image used to generate the omnidirectional image in step S402. For each of the other viewing angle directions, the omnidirectional image is rotated horizontally in the three-dimensional space by the preset viewing angle, and the rotated omnidirectional image is then unfolded into a two-dimensional panoramic image, so that the two-dimensional panoramic image to be detected corresponding to that preset viewing angle direction is obtained.
It will be appreciated that a greater or lesser number of viewing directions may be provided by those skilled in the art depending on the actual situation. Further, the person skilled in the art may also set the viewing angle direction to any value in the interval of [0,360) according to actual needs.
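Because rotating the omnidirectional image about its vertical axis only shifts longitude, the two-dimensional panoramic images to be detected for viewing angle directions such as 0°, 90°, 180°, and 270° can equivalently be obtained by circularly shifting the columns of the equirectangular panorama, as in the sketch below. This equivalence, and the use of NumPy, are observations and assumptions of this sketch rather than the wording of the application, which describes rotating the omnidirectional image and unfolding it again.

```python
# Hedged sketch: panoramas for several viewing-angle directions via a circular
# horizontal shift of the equirectangular image (equivalent to rotating the
# omnidirectional image about the vertical axis and unfolding it again).
import numpy as np

panorama = np.zeros((512, 1024, 3), dtype=np.uint8)   # stand-in for the vertically corrected panorama
h, w = panorama.shape[:2]
images_to_detect = []
for angle in (0, 90, 180, 270):                       # preset viewing-angle directions
    shift = int(round(angle / 360.0 * w))
    images_to_detect.append(np.roll(panorama, -shift, axis=1))
```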
Fig. 6A to 6D show examples of two-dimensional panoramic images to be detected whose viewing angles are 0°, 90°, 180°, and 270°, respectively, generated according to the method shown in fig. 4. Fig. 6A, 6B, 6C, and 6D respectively show information of the same three-dimensional space at different viewing angles. It can be seen that in fig. 6A, a portion of one window appears at each of the left and right edges of the image due to the limitation of two-dimensional panoramic image presentation. It is understood that if object detection were performed using only the two-dimensional panoramic image in fig. 6A, such an incomplete object divided into two parts might not be detected. However, according to the technical solution provided by the present application, since the images to be detected include the two-dimensional panoramic images to be detected corresponding to the preset at least two viewing angle directions, even if an incomplete target exists at an edge of one of the two-dimensional panoramic images to be detected, the target is not displayed in a divided manner in the two-dimensional panoramic images to be detected corresponding to the other viewing angles, because the target is no longer located at the image edge. As can be seen with reference to fig. 6B-6D, the window in fig. 6A that is divided into two parts by the image edges is represented as a complete window in fig. 6B-6D. Therefore, even if the window cannot be detected in fig. 6A, the candidate target detection results obtained for the two-dimensional panoramic images to be detected in fig. 6B, 6C, and 6D can compensate for this, so that the detection result for the window is not omitted from the final panoramic target detection result.
By using the above method for determining the image to be detected, at least two two-dimensional panoramic images to be detected corresponding to different panoramic viewing angles can be determined. By using the target detection method provided later in the present application, candidate target detection results of the at least two two-dimensional panoramic images to be detected corresponding to different panoramic viewing angles can be determined. A complete panoramic target detection result can then be obtained by combining the candidate target detection results of the two-dimensional panoramic images to be detected at the different panoramic viewing angles.
Fig. 7 shows a schematic flow chart of a further method for determining an image to be detected according to the present application.
As shown in fig. 7, in step S702, a two-dimensional panoramic image may be converted into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space. Step S702 may be performed in the same manner as step S402 described in conjunction with fig. 4, and will not be described again.
In step S704, at least two perspective images of the two-dimensional panoramic image may be determined based on the omnidirectional image determined in step S702, and the at least two perspective images may be determined as images to be detected. The at least two perspective images respectively include partial image information of the two-dimensional panoramic image.
The omnidirectional image may be perspectively unfolded based on at least two predetermined perspective viewing angles to obtain perspective images respectively corresponding to the predetermined perspective viewing angles. A perspective viewing angle may include a line-of-sight direction and a field-of-view range. With the sphere center of the omnidirectional image as the viewpoint, the two-dimensional panoramic image may be reprojected using a spherical reprojection algorithm under different line-of-sight directions and field-of-view ranges to generate perspective images corresponding to the different line-of-sight directions.
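A minimal Python/NumPy sketch of this spherical reprojection is given below for illustration. It assumes an equirectangular panorama, a virtual pinhole camera placed at the sphere center, and nearest-neighbour sampling; the function name perspective_view and the default field of view and output size are assumptions made only for this example.

```python
import numpy as np

def perspective_view(pano, yaw_deg, fov_deg=60.0, out_size=(256, 256)):
    """Sample a pinhole (perspective) view from an equirectangular panorama.

    yaw_deg : line-of-sight direction measured in the horizontal plane.
    fov_deg : field of view of the virtual perspective camera.
    """
    h, w = pano.shape[:2]
    out_h, out_w = out_size
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels

    # Ray direction for every output pixel (camera looks along +z before rotation).
    j, i = np.meshgrid(np.arange(out_w), np.arange(out_h))
    x = (j - out_w / 2.0) / f
    y = (i - out_h / 2.0) / f
    z = np.ones_like(x)
    norm = np.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm

    # Rotate the rays about the vertical axis by the yaw angle.
    yaw = np.radians(yaw_deg)
    xr = x * np.cos(yaw) + z * np.sin(yaw)
    zr = -x * np.sin(yaw) + z * np.cos(yaw)

    # Convert ray directions to panorama coordinates (longitude, latitude).
    lon = np.arctan2(xr, zr)                 # in [-pi, pi)
    lat = np.arcsin(np.clip(y, -1.0, 1.0))   # in [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[v, u]

# Example: six 60-degree views covering the full horizontal circle.
# views = [perspective_view(pano, a) for a in range(0, 360, 60)]
```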
Fig. 8A-8F illustrate examples of perspective images generated according to the method in fig. 7. In the perspective images shown in fig. 8A to 8F, the field-of-view range in the pitch angle direction of the omnidirectional image may be set to 60°, the field-of-view range in the rotation angle direction of the omnidirectional image may be set to 60°, and the six perspective images shown in fig. 8A to 8F may be obtained by rotating the line of sight around the omnidirectional image through 360°. It can be seen that, since the pixel points of an object in the perspective image are projected onto the perspective image along lines of sight taking the sphere center of the omnidirectional image as the viewpoint, the object in the perspective image is displayed in the manner in which it would be observed by human eyes, without the deformation that the object exhibits in the two-dimensional panoramic image.
It is understood that the person skilled in the art can set the parameters of the perspective view angle of the perspective image according to actual needs. For example, a person skilled in the art may set the angle of view range of the perspective image in the pitch angle direction of the omnidirectional image in the range of 0 to 180 degrees, and may set the angle of view range in the rotation angle direction of the omnidirectional image in the range of 0 to 360 degrees.
In some embodiments, the perspective images corresponding to different perspective viewing angles may partially overlap or may be independent of each other. That is, different perspective images may have partially identical image information.
By utilizing the above method for determining the image to be detected, at least two perspective images to be detected corresponding to different perspective viewing angles can be determined. By using the target detection method provided later in the present application, candidate target detection results of the at least two images to be detected corresponding to different perspective viewing angles can be determined. A complete panoramic target detection result can then be obtained by combining the candidate target detection results of the images to be detected at the different perspective viewing angles.
FIG. 9 shows a schematic flow chart of a method of determining an image to be detected according to an embodiment of the present application.
In step S902, a two-dimensional panoramic image may be converted into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space. Step S902 may be performed in the same manner as step S402 described in conjunction with fig. 4, and will not be described again.
In step S904, the omni-directional image may be scaled in at least one direction in the three-dimensional space to determine a scaled omni-directional image.
In some embodiments, the at least one direction in the three-dimensional space may include a horizontal direction parallel to the ground and a vertical direction perpendicular to the ground.
As described above, when converting a two-dimensional panoramic image into an omnidirectional image in a three-dimensional rectangular coordinate system, the expression (2) can be used to determine the mapping relationship between the pixel points in the two-dimensional panoramic image and the pixel points in the omnidirectional image.
Then, in step S904, the mapping relationship may be stretched in the x direction and the z direction. The scaling transformation of the omnidirectional image in the three-dimensional space can be represented by expression (3):
x = kx · r · cos(v′) · cos(u′)
y = ky · r · cos(v′) · sin(u′)
z = kz · r · sin(v′)        (3)
where v′ = π × (v − 0.5), u′ = 2π × (u − 0.5), u and v are the coordinates of a pixel point in the two-dimensional panoramic image, and r denotes the radius of the omnidirectional image. The value of r can be set according to the actual situation. kx, ky, and kz are the scaling factors for the x, y, and z directions, respectively. By setting kx, ky, and kz, the zoom effect of the omnidirectional image in the three-dimensional space can be determined.
In step S906, the scaled omni-directional image may be expanded to obtain at least one image to be detected. The scaled omnidirectional image can be expanded by means of, for example, equidistant cylindrical projection to obtain an image to be detected. The image to be detected generated in step S906 may be a scaled two-dimensional panoramic image.
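For illustration, one possible way to realize steps S904 and S906 together is to resample the panorama by inverse mapping: for every pixel of the scaled two-dimensional panoramic image, the corresponding viewing direction is un-scaled by (kx, ky, kz), re-normalized onto the sphere, and looked up in the original panorama. The following Python/NumPy sketch assumes an equirectangular layout and an axis convention in which z follows the latitude term; both are assumptions made for this example rather than details fixed by the present description.

```python
import numpy as np

def scaled_panorama(pano, kx=1.0, ky=1.0, kz=1.0):
    """Re-render the panorama after scaling the omnidirectional image by
    (kx, ky, kz) along the three axes (axis convention is an assumption).
    """
    h, w = pano.shape[:2]
    # Normalized coordinates of every pixel of the output (scaled) panorama.
    u = (np.arange(w) + 0.5) / w
    v = (np.arange(h) + 0.5) / h
    uu, vv = np.meshgrid(u, v)
    lon = 2 * np.pi * (uu - 0.5)
    lat = np.pi * (vv - 0.5)

    # Viewing direction of each output pixel on the unit sphere.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)

    # Invert the scaling: un-scale, then re-normalize onto the unit sphere.
    x, y, z = x / kx, y / ky, z / kz
    n = np.sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n

    # Back to source panorama coordinates and nearest-neighbour sampling.
    src_lon = np.arctan2(y, x)
    src_lat = np.arcsin(np.clip(z, -1.0, 1.0))
    su = ((src_lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    sv = np.clip(((src_lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[sv, su]

# Example: two differently zoomed images to be detected.
# detect_imgs = [scaled_panorama(pano, kx=2.0), scaled_panorama(pano, kz=2.0)]
```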
As can be appreciated, through the scaling processing in step S904 the omnidirectional image is deformed in at least one of the x, y, and z directions. When the deformed omnidirectional image is expanded into an image to be detected of the same size as the original two-dimensional panoramic image, the objects in the scaled two-dimensional panoramic image are correspondingly deformed.
FIG. 10 illustrates an example of a plurality of scaled two-dimensional panoramic images according to an embodiment of the present application. Taking ky = 1 as an example, when kx = 1 and kz = 1, the omnidirectional image is not zoomed, so the obtained image to be detected is the same as the original two-dimensional panoramic image. When kx = 2 and kz = 1, it can be seen that the objects displayed in the scaled two-dimensional panoramic image are deformed with respect to the original two-dimensional panoramic image. Similarly, when kx and kz take other values, e.g., kx = 1, kz = 2 or kx = 2, kz = 2, the objects in the scaled two-dimensional panoramic image are deformed in different ways.
Referring back to fig. 9, part of the information in the image to be detected generated in step S906 may be enlarged or reduced. By setting different zooming parameters, at least two images to be detected that are zoomed in different ways can be obtained.
In some embodiments, the principles described in fig. 4 and fig. 7 may be combined to determine the image to be detected. For example, step S906 may include expanding the scaled omnidirectional image based on at least two preset panoramic viewing angles to determine scaled two-dimensional panoramic images respectively corresponding to the at least two panoramic viewing angles as the images to be detected. For another example, step S906 may further include determining, based on the scaled omnidirectional image, at least two perspective images of the scaled two-dimensional panoramic image as the images to be detected.
By using the target detection method provided later in the present application, candidate target detection results of at least two images to be detected corresponding to different zooming effects can be determined. A complete panoramic target detection result can then be obtained by combining the candidate target detection results of the images to be detected with different zooming effects.
Fig. 11 shows a schematic flow diagram of a target detection method according to an embodiment of the present application. The target detection method provided by the present application is described in fig. 11 by taking the target detection performed on the image to be detected as an example. However, it is understood that the object detection method illustrated in fig. 11 may be applied to any image in which object detection is to be performed. For example, the method shown in fig. 11 may be used to process a two-dimensional panoramic image to be processed and directly obtain a panoramic object detection result for the two-dimensional panoramic image to be processed. Further, it is understood that the image to be detected referred to in fig. 11 may be an image to be detected determined using the methods described in fig. 4, 7, and 9.
As shown in fig. 11, in step S1102, a one-dimensional feature representation for the image to be detected may be determined. In some embodiments, the image to be detected may be processed using a convolutional neural network to determine a one-dimensional feature representation representing the image features of the image to be detected. For example, the image to be detected may be processed by using a convolutional neural network including at least one convolutional layer and at least one pooling layer to obtain the image features of the image to be detected, and the image features of the image to be detected may be determined as the one-dimensional feature representation. The one-dimensional feature representation can be represented as a one-dimensional vector, and the one-dimensional vector comprises a plurality of vector elements for representing vector information.
In one implementation, the image to be detected may be processed using a DenseNet169 network to reduce the image to be detected into a one-dimensional feature representation. It is to be understood that any other convolutional neural network model may be utilized to process the image to be detected to obtain the one-dimensional feature representation of the image to be detected. For example, when the image to be detected is a two-dimensional panoramic image to be detected with a size of 512 × 1024, the convolutional neural network may output a one-dimensional feature representation with a size of 1 × 1024. Each element in the one-dimensional feature representation represents an overall feature of the image information of one column of pixels in the two-dimensional panoramic image to be detected.
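As a simplified sketch only, the following PyTorch module shows one way a convolutional network could collapse the height dimension so that the output contains one feature vector per pixel column, which is the property relied on above. The layer sizes and the module name ColumnFeatureExtractor are assumptions for illustration; the DenseNet169 backbone mentioned above is not reproduced in this sketch.

```python
import torch
import torch.nn as nn

class ColumnFeatureExtractor(nn.Module):
    """Convolution + pooling layers that pool only over the image height,
    producing one feature vector per pixel column of the input image."""
    def __init__(self, in_ch=3, feat_ch=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                      # pool over height only
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(64, feat_ch, 3, padding=1), nn.ReLU(),
        )
        self.squeeze_height = nn.AdaptiveAvgPool2d((1, None))  # height -> 1

    def forward(self, img):                   # img: (B, 3, H, W)
        f = self.backbone(img)                # (B, C, H', W)
        f = self.squeeze_height(f)            # (B, C, 1, W)
        return f.squeeze(2).permute(0, 2, 1)  # (B, W, C): one vector per column

# x = torch.randn(1, 3, 512, 1024)
# features = ColumnFeatureExtractor()(x)      # shape (1, 1024, 128)
```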
In some embodiments, in step S1102, the image to be detected may be scaled in at least one direction in the three-dimensional space to determine a scaled image to be detected. Then, the scaled image to be detected is processed by using a convolutional neural network comprising at least one convolutional layer and at least one pooling layer to obtain the image features of the scaled image to be detected, and the image features of the scaled image to be detected are determined as the one-dimensional feature representation.
In step S1104, target detection may be performed on the one-dimensional feature representation to obtain a one-dimensional target detection result of the one-dimensional feature representation. Wherein the one-dimensional object detection result indicates a probability that an element in the one-dimensional feature representation belongs to an object of a predetermined class.
In some embodiments, target detection may be performed on the one-dimensional feature representation using an LSTM series model. For example, target detection can be performed on the one-dimensional feature representation using Bi-LSTM.
In some implementations, the one-dimensional feature representation can be processed using an LSTM to obtain detection features of the one-dimensional feature representation, and the detection features can then be processed using a fully connected layer to obtain the one-dimensional target detection result of the one-dimensional feature representation.
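A minimal sketch of such a labelling head is given below for illustration: a bidirectional LSTM over the one-dimensional feature representation followed by a fully connected layer that scores every element. The class set (door, window, background), the hidden size, and the module name ColumnLabeler are assumptions for this example.

```python
import torch
import torch.nn as nn

class ColumnLabeler(nn.Module):
    """Bi-LSTM over the one-dimensional feature representation, followed by a
    fully connected layer giving per-element class scores."""
    def __init__(self, feat_ch=128, hidden=64, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(feat_ch, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, seq):          # seq: (B, W, feat_ch)
        out, _ = self.lstm(seq)      # (B, W, 2*hidden)
        return self.fc(out)          # (B, W, num_classes): per-column scores

# scores = ColumnLabeler()(features)      # (1, 1024, 3)
# labels = scores.argmax(dim=-1)          # class index for every pixel column
```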
With the above method, the target detection performed on the image to be detected differs from the conventional one-stage or two-stage target detection methods in that it is converted into a sequence labeling task. Using the LSTM model, it is possible to label whether each element in the one-dimensional feature representation belongs to a candidate target of a predetermined category. Taking the example of the predetermined categories comprising the category "door" and the category "window", the LSTM model may output the probability that each element in the one-dimensional feature representation belongs to the category "door" and the category "window", respectively. Whether each element belongs to a predetermined category can be determined by comparing the probability that the element belongs to the category "door" or the category "window", respectively, with a preset probability threshold. That is, an element in the one-dimensional feature representation may be determined to belong to the category "door" when the probability that the element belongs to the category "door" is greater than the probability threshold. An element in the one-dimensional feature representation may be determined to belong to the category "window" when the probability that the element belongs to the category "window" is greater than the probability threshold. In one implementation, the preset probability threshold may be set to 0.5.
In still other embodiments, the one-dimensional target detection result indicates, for each element in the one-dimensional feature representation, a score for each of a plurality of predetermined categories. Taking the example that the predetermined categories include "door", "window", and "background", the one-dimensional target detection result may include, for each element, a score for each of these three predetermined categories. For example, for an element in the one-dimensional feature representation, a result [0.8, 0.3, 1.1] may be output, where 0.8 represents the score for the element belonging to "door", 0.3 represents the score for the element belonging to "window", and 1.1 represents the score for the element belonging to "background". The highest score for the element may be determined, and the element may be determined to belong to the predetermined category having the highest score. In the above example, the highest score corresponds to "background", so it can be determined that the element belongs to the "background". Since each element in the one-dimensional feature representation represents the image features of a column of pixels in the image to be detected, the fact that the element belongs to the "background" means that the column of pixels in the image to be detected corresponding to the element belongs to the "background".
In step S1106, an object present in the image to be detected and a position of the object in the image to be detected may be determined based on the one-dimensional object detection result.
As mentioned above, each element in the one-dimensional feature representation of the image to be detected corresponds to the overall feature of a column of pixel points in the image to be detected. Therefore, when an element in the one-dimensional feature representation is determined in step S1104 to belong to the category "door" or the category "window", it may be determined that the corresponding column of pixel points in the image to be detected belongs to the category "door" or the category "window". Moreover, the position of the element in the one-dimensional feature representation indicates the position of the corresponding column of pixel points in the image to be detected. For example, when the 100th and 500th elements of the one-dimensional feature representation of the image to be detected are determined as candidate targets belonging to a predetermined category, the 100th and 500th columns of pixels in the image to be detected can be determined as targets of the predetermined category.
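For illustration only, the per-column labels can be turned into target positions by grouping consecutive columns that share the same non-background label, as in the following sketch; the class indices and the helper name columns_to_targets are assumptions made for this example.

```python
import numpy as np

def columns_to_targets(labels, background_id=2):
    """Group consecutive columns with the same non-background label into
    targets and return (class_id, first_column, last_column) tuples."""
    targets = []
    start, current = None, None
    for col, lab in enumerate(list(labels) + [background_id]):  # sentinel at the end
        if start is None:
            if lab != background_id:
                start, current = col, lab
        elif lab != current:
            targets.append((current, start, col - 1))
            start, current = (col, lab) if lab != background_id else (None, None)
    return targets

# labels = np.array([2, 2, 0, 0, 0, 2, 1, 1, 2])   # 0 = door, 1 = window, 2 = background
# columns_to_targets(labels)  ->  [(0, 2, 4), (1, 6, 7)]
```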
It is understood that, with the object detection method provided in fig. 11, it is possible to determine whether or not an object of a predetermined category exists in the three-dimensional space in the horizontal direction, regardless of the position of the object in the vertical direction in the three-dimensional space.
By using the target detection method provided by the present application, the two-dimensional image to be detected can be reduced to a one-dimensional feature representation, and the one-dimensional target detection result of the one-dimensional feature representation can be obtained by performing sequence labeling on the one-dimensional feature representation. Compared with the conventional one-stage or two-stage target detection methods, the target detection method provided by the present application processes a smaller amount of data, and therefore has a higher calculation speed and consumes fewer calculation resources.
Fig. 12 shows a schematic block diagram of an object detection apparatus for a two-dimensional panoramic image according to an embodiment of the present application. The object detection apparatus shown in fig. 12 may be implemented by a user terminal or a server shown in fig. 1.
As shown in fig. 12, the object detection apparatus 1200 may include an image to be detected determination unit 1210, an object detection unit 1220, and a result determination unit 1230.
The image to be detected determination unit 1210 may be configured to determine at least one image to be detected based on the two-dimensional panoramic image to be processed.
In some embodiments, at least one viewing-angle-converted two-dimensional panoramic image may be determined by performing viewing angle conversion on the two-dimensional panoramic image to be processed. The original two-dimensional panoramic image to be processed and the at least one viewing-angle-converted two-dimensional panoramic image can be determined as images to be detected. For example, the image to be detected determination unit 1210 may be configured to perform the method of determining an image to be detected described above in connection with fig. 4.
In other embodiments, perspective images corresponding to the two-dimensional panoramic image to be processed may be determined as images to be detected. Since objects in the panoramic image are displayed based on a polar coordinate system, they are deformed compared with the actual objects. Objects in a perspective image, however, are displayed based on a rectangular coordinate system and therefore exhibit little deformation relative to the actual objects. For example, the image to be detected determination unit 1210 may be configured to perform the method of determining an image to be detected described above in connection with fig. 7.
In still other embodiments, the two-dimensional panoramic image to be processed may be scaled in at least one direction in a three-dimensional space, and the scaled two-dimensional panoramic image may be determined as the image to be detected. In some implementations, the at least one direction in the three-dimensional space can be at least one of a horizontal direction parallel to the floor of the three-dimensional space and a vertical direction perpendicular to the floor. The two-dimensional panoramic image may be scaled in a plurality of directions, and the plurality of scaled two-dimensional panoramic images corresponding to the respective directions may be determined as images to be detected. For example, the image to be detected determination unit 1210 may be configured to perform the method of determining an image to be detected described above in connection with fig. 9.
It is to be understood that the two-dimensional panoramic image to be processed may be processed based on at least one of the above-described methods to obtain an image to be detected including image information of the two-dimensional panoramic image to be processed. Based on the method for determining the images to be detected, the mapping relationship between each image to be detected and the two-dimensional panoramic image to be processed can be determined. Based on such a mapping relationship, after determining candidate targets in an image to be detected by using a target detection method to be described later, the position of the corresponding panoramic target in the two-dimensional panoramic image to be processed can be determined based on the positions of the candidate targets in the image to be detected.
The object detection unit 1220 may be configured to perform object detection on at least one image to be detected to determine at least one candidate object detection result. The candidate target detection result corresponding to each image to be detected can be obtained by performing target detection on at least one image to be detected, respectively.
The candidate target detection result may indicate a position of a candidate target existing in the corresponding image to be detected and a category to which the candidate target belongs.
The candidate target may be an object present on a wall surface in the three-dimensional space. For example, a candidate target may be any kind of object located on a wall surface, such as a door, a window, furniture, a lamp, a decoration, etc. A candidate object may also be any object located in a three-dimensional space, such as a person, an animal, furniture placed on a horizontal floor, such as a table, a chair, etc.
The principle of the present application will be described below by taking the number of categories to which the candidate object belongs as an example of two. It is understood that, according to the practical application, a person skilled in the art may set the category to which the candidate object belongs according to the practical requirement, so as to include a greater or lesser number of categories.
Further, the principles of the present application will be described below by way of example with the categories to which the candidate objects belong being "doors" and "windows". Similarly, the skilled person can set the category to which the candidate target belongs as other objects according to actual needs.
In some embodiments, target detection may be performed on the at least one image to be detected using a deep learning-based model. For example, a one-stage or two-stage deep neural network model can be used to perform target detection on the at least one image to be detected. The deep neural network model may include at least one of YOLO, RCNN, Fast-RCNN, and SSD.
It is understood that one skilled in the art can also use any variation of the deep neural network model described above or other deep neural network models having the same effect as the deep neural network model described above for target detection of at least one image to be detected without departing from the principles of the present application.
By using the deep learning-based model to perform target detection on the at least one image to be detected, it can be determined whether a candidate target of a predetermined category exists in each image to be detected and, if so, the position of the candidate target in the image to be detected.
For example, in the case where the candidate objects of the predetermined category used for detection based on the deep learning based model include a door and a window, the candidate object detection result may include probabilities that the candidate objects present in the image to be detected belong to the category "door" and the category "window", respectively, and a position of an object frame indicating the candidate objects present in the image to be detected. Wherein the position of the target frame may include the size of the target frame and the coordinates of a feature point on the target frame (e.g., the center point of the target frame) in the image to be detected. The target frame may be any regular or irregular geometric figure such as a circle, rectangle, parallelogram, etc.
In other embodiments, the object detection unit 1220 may be configured to perform the target detection method described above with reference to fig. 11.
That is, the target detection unit 1220 may be configured to perform target detection on the one-dimensional feature representation of each image to be detected to determine the candidate target detection result corresponding to each image to be detected. The one-dimensional feature representation may be represented in the form of a one-dimensional vector. In the case where the size of the image to be detected is H × W, the size of the one-dimensional feature representation may be 1 × W, where H is the number of pixels of the image to be detected in the height direction and W is the number of pixels of the image to be detected in the width direction. Each element in the one-dimensional feature representation may represent the feature information of a column of pixels in the corresponding image to be detected.
In this case, the candidate target detection result may include the probability that each element in the one-dimensional feature representation is a candidate target belonging to a predetermined category, i.e., the probability of belonging to the category "door" or the category "window". Furthermore, the position, in the one-dimensional feature representation, of an element determined as a candidate target belonging to a predetermined category may be used to represent the position of the candidate target of the predetermined category present in the image to be detected.
The object detection unit 1220 may include a feature representation determination unit and a one-dimensional object detection unit 1222. The feature representation determination unit may be configured to determine a one-dimensional feature representation for the image to be detected. In some embodiments, the image to be detected may be processed using a convolutional neural network to determine a one-dimensional feature representation representing the image features of the image to be detected. For example, the image to be detected may be processed by using a convolutional neural network including at least one convolutional layer and at least one pooling layer to obtain the image features of the image to be detected, and the image features of the image to be detected may be determined as the one-dimensional feature representation. The one-dimensional feature representation can be represented as a one-dimensional vector, and the one-dimensional vector comprises a plurality of vector elements for representing vector information.
In one implementation, the feature representation determination unit may process the image to be detected using a DenseNet169 network to reduce the image to be detected into a one-dimensional feature representation. It is to be understood that any other convolutional neural network model may be utilized to process the image to be detected to obtain the one-dimensional feature representation of the image to be detected. For example, when the image to be detected is a two-dimensional panoramic image to be detected with a size of 512 × 1024, the convolutional neural network may output a one-dimensional feature representation with a size of 1 × 1024. Each element in the one-dimensional feature representation represents an overall feature of the image information of one column of pixels in the two-dimensional panoramic image to be detected.
In some embodiments, the feature representation determination unit may be configured to scale the image to be detected in at least one direction in the three-dimensional space to determine a scaled image to be detected. Then, the feature representation determining unit may be configured to process the scaled image to be detected using a convolutional neural network comprising at least one convolutional layer and at least one pooling layer to obtain image features of the scaled image to be detected, and determine the image features of the scaled image to be detected as the one-dimensional feature representation.
The one-dimensional target detection unit may be configured to perform target detection on the one-dimensional feature representation to obtain a one-dimensional target detection result of the one-dimensional feature representation. Wherein the one-dimensional object detection result indicates a probability that an element in the one-dimensional feature representation belongs to an object of a predetermined class.
In some embodiments, the one-dimensional object detection unit may perform object detection on the one-dimensional feature representation using an LSTM series model. For example, target detection can be performed on the one-dimensional feature representation using Bi-LSTM.
In some implementations, the one-dimensional target detection unit may process the one-dimensional feature representation using an LSTM to obtain detection features of the one-dimensional feature representation, and may then process the detection features using a fully connected layer to obtain the one-dimensional target detection result of the one-dimensional feature representation.
With the above method, the target detection performed on the image to be detected differs from the conventional one-stage or two-stage target detection methods in that it is converted into a sequence labeling task. Using the LSTM model, it is possible to label whether each element in the one-dimensional feature representation belongs to a candidate target of a predetermined category. Taking the example of the predetermined categories comprising the category "door" and the category "window", the LSTM model may output the probability that each element in the one-dimensional feature representation belongs to the category "door" and the category "window", respectively. Whether each element belongs to a predetermined category can be determined by comparing the probability that the element belongs to the category "door" or the category "window", respectively, with a preset probability threshold. That is, an element in the one-dimensional feature representation may be determined to belong to the category "door" when the probability that the element belongs to the category "door" is greater than the probability threshold. An element in the one-dimensional feature representation may be determined to belong to the category "window" when the probability that the element belongs to the category "window" is greater than the probability threshold. In one implementation, the preset probability threshold may be set to 0.5.
In still other embodiments, the one-dimensional target detection result indicates, for each element in the one-dimensional feature representation, a score for each of a plurality of predetermined categories. Taking the example that the predetermined categories include "door", "window", and "background", the one-dimensional target detection result may include, for each element, a score for each of these three predetermined categories. For example, for an element in the one-dimensional feature representation, a result [0.8, 0.3, 1.1] may be output, where 0.8 represents the score for the element belonging to "door", 0.3 represents the score for the element belonging to "window", and 1.1 represents the score for the element belonging to "background". The highest score for the element may be determined, and the element may be determined to belong to the predetermined category having the highest score. In the above example, the highest score corresponds to "background", so it can be determined that the element belongs to the "background". Since each element in the one-dimensional feature representation represents the image features of a column of pixels in the image to be detected, the fact that the element belongs to the "background" means that the column of pixels in the image to be detected corresponding to the element belongs to the "background".
The result determining unit 1230 may be configured to determine a panoramic object detection result of the two-dimensional panoramic image to be processed based on the candidate object detection result determined by the object detecting unit 1220.
In some embodiments, the candidate target detection results of each image to be detected may be combined to obtain a panoramic target detection result. In some implementations, the panoramic target detection result may include candidate targets included in each candidate target detection result. The position of the candidate target indicated in each candidate target detection result in the two-dimensional panoramic image to be processed may be determined based on the relationship between the image to be detected and the two-dimensional panoramic image to be processed. In other implementation manners, the candidate target detection result of each image to be detected may be screened, and the panoramic target detection result in the two-dimensional panoramic image to be processed may be determined based on the screened candidate target detection result. It can be understood that, in the case that the number of the images to be detected is greater than or equal to two, the detection results of the candidate targets of different images to be detected may include the detection result for the same target. In this case, only the candidate object appearing in the candidate object detection results of the at least two images to be detected may be determined as the panoramic object. That is, if a certain candidate object is detected only in one image to be detected, the candidate object may be determined to be a false detection and thus not be regarded as a panoramic object present in the two-dimensional panoramic image to be processed.
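One possible way to realize this screening is sketched below for illustration: candidate detections that have already been mapped back to panorama coordinates are kept as panoramic targets only if detections of the same category appear, at approximately the same panorama position, in at least two images to be detected. The position tolerance, the data layout, and the helper name merge_view_detections are assumptions for this example.

```python
def merge_view_detections(per_view_results, min_views=2, tol=20):
    """per_view_results[i] is a list of (class_id, panorama_x) pairs for the
    i-th image to be detected, already mapped to panorama coordinates.
    A candidate is kept only if a same-class detection within `tol` pixels
    appears in at least `min_views` different images to be detected."""
    flat = [(view_idx, cls, pos)
            for view_idx, dets in enumerate(per_view_results)
            for (cls, pos) in dets]
    merged = []
    for view_idx, cls, pos in flat:
        views_seen = {v for v, c, p in flat if c == cls and abs(p - pos) <= tol}
        already_kept = any(c == cls and abs(p - pos) <= tol for c, p in merged)
        if len(views_seen) >= min_views and not already_kept:
            merged.append((cls, pos))
    return merged

# panorama_targets = merge_view_detections(per_view_results)
```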
In some embodiments, the result determination unit 1230 may be configured to determine the target present in the image to be detected and the position of the target in the image to be detected based on the one-dimensional target detection result. As mentioned above, each element in the one-dimensional feature representation of the image to be detected corresponds to the overall feature of a column of pixel points in the image to be detected. Therefore, when an element in the one-dimensional feature representation is determined to belong to the category "door" or the category "window", it may be determined that the corresponding column of pixel points in the image to be detected belongs to the category "door" or the category "window". Moreover, the position of the element in the one-dimensional feature representation indicates the position of the corresponding column of pixel points in the image to be detected. For example, when the 100th and 500th elements of the one-dimensional feature representation of the image to be detected are determined as candidate targets belonging to a predetermined category, the 100th and 500th columns of pixels in the image to be detected can be determined as targets of the predetermined category.
In some embodiments, the apparatus 1200 may further include a vertical correction unit (not shown in the figures). As previously described, the two-dimensional panoramic image may correspond to a single three-dimensional space. When a two-dimensional panoramic image is captured using a panoramic camera, if the shooting orientation of the panoramic camera is not perpendicular to the horizontal plane, the objects in the resulting two-dimensional panoramic image will be skewed. The vertical correction unit may be used to eliminate the skew of objects in the two-dimensional panoramic image, so that, when the two-dimensional panoramic image is displayed, a ground plane that should be parallel to the horizontal plane in the real world is parallel to the horizontal direction of the two-dimensional panoramic image, and a line that should be perpendicular to the ground plane is perpendicular to the horizontal direction of the two-dimensional panoramic image.
The vertical correction unit may be configured to convert the two-dimensional panoramic image presented based on the polar coordinate system into an omnidirectional image presented based on the rectangular coordinate system. Then, at least two lines in the rectangular-coordinate-based omnidirectional image that are perpendicular to the ground of the two-dimensional panoramic image may be extracted. The angular difference between the shooting vertical direction of the current omnidirectional image and the actual vertical direction may be determined based on the angular difference between the average direction of the at least two lines and the standard vertical direction. The pitch angle of the polar-coordinate-based two-dimensional panoramic image is then adjusted by this angle difference, so that the angle difference between the average direction of the at least two lines and the standard vertical direction becomes zero, thereby obtaining the vertically corrected two-dimensional panoramic image.
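For illustration only, once the angular difference has been estimated from the extracted vertical lines, the correction itself can be realized by rotating the viewing directions of the omnidirectional image and re-expanding the panorama, as in the sketch below. The line-extraction and angle-estimation steps are not shown, and the rotation axis chosen here is an assumption for this example.

```python
import numpy as np

def adjust_pitch(pano, delta_deg):
    """Re-render the panorama after rotating the omnidirectional image by
    delta_deg about a horizontal axis (correction step only)."""
    h, w = pano.shape[:2]
    uu, vv = np.meshgrid((np.arange(w) + 0.5) / w, (np.arange(h) + 0.5) / h)
    lon, lat = 2 * np.pi * (uu - 0.5), np.pi * (vv - 0.5)
    x, y, z = np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)

    # Rotate viewing directions about the y axis (axis choice is an assumption).
    d = np.radians(delta_deg)
    xr = x * np.cos(d) + z * np.sin(d)
    zr = -x * np.sin(d) + z * np.cos(d)

    # Look up the corresponding pixel in the original panorama.
    src_lon = np.arctan2(y, xr)
    src_lat = np.arcsin(np.clip(zr, -1.0, 1.0))
    su = ((src_lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    sv = np.clip(((src_lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[sv, su]

# corrected = adjust_pitch(pano, estimated_angle_difference_deg)
```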
Furthermore, the method or apparatus according to the embodiments of the present application may also be implemented by means of the architecture of the computing device shown in fig. 13. Fig. 13 illustrates the architecture of the computing device. As shown in fig. 13, the computing device 1300 may include a bus 1310, one or at least two CPUs 1320, a read-only memory (ROM) 1330, a random access memory (RAM) 1340, a communication port 1350 for connecting to a network, input/output components 1360, a hard disk 1370, and so forth. A storage device in the computing device 1300, such as the ROM 1330 or the hard disk 1370, may store various data or files used in the processing and/or communication of the target detection method provided herein as well as program instructions executed by the CPU. The computing device 1300 may also include a user interface 1380. Of course, the architecture shown in fig. 13 is merely exemplary, and one or at least two components of the computing device shown in fig. 13 may be omitted as desired when implementing different devices.
According to another aspect of the present application, there is also provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a computer, can perform the method as described above.
Portions of the technology may be considered "articles" or "articles of manufacture" in the form of executable code and/or associated data, which may be embodied or carried out by a computer readable medium. Tangible, non-transitory storage media may include memory or storage for use by any computer, processor, or similar device or associated module. For example, various semiconductor memories, tape drives, disk drives, or any similar device capable of providing a storage function for software.
All or a portion of the software may sometimes communicate over a network, such as the internet or other communication network. Such communication may load software from one computer device or processor to another. For example: from a server or host computer of the video object detection device to a hardware platform of a computer environment, or other computer environment implementing a system, or similar functionality related to providing information needed for object detection. Thus, another medium capable of transferring software elements may also be used as a physical connection between local devices, such as optical, electrical, electromagnetic waves, etc., propagating through cables, optical cables, air, etc. The physical medium used for the carrier wave, such as an electric, wireless or optical cable or the like, may also be considered as the medium carrying the software. As used herein, unless limited to a tangible "storage" medium, other terms referring to a computer or machine "readable medium" refer to media that participate in the execution of any instructions by a processor.
This application uses specific words to describe embodiments of the application. Reference to "a first/second embodiment," "an embodiment," and/or "some embodiments" means a feature, structure, or characteristic described in connection with at least one embodiment of the application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereof. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block", "module", "engine", "unit", "component", or "system". Furthermore, aspects of the present application may be embodied as a computer product, including computer readable program code, embodied in one or more computer readable media.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the claims. It is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.

Claims (18)

1. A target detection method for a two-dimensional panoramic image, comprising:
determining at least two perspective images of the two-dimensional panoramic image, wherein the at least two perspective images respectively comprise partial image information in the two-dimensional panoramic image;
performing target detection on the at least two perspective images to determine at least two candidate target detection results corresponding to the at least two perspective images; and
and determining a panoramic target detection result of the two-dimensional panoramic image based on at least two candidate target detection results corresponding to the at least two perspective images.
2. The object detection method according to claim 1, wherein, for each of the at least two perspective images, the candidate object detection result indicates a probability that a candidate object belonging to a predetermined category exists in the perspective image and a position of the candidate object in the perspective image.
3. The object detection method of claim 1, wherein the two-dimensional panoramic image corresponds to a three-dimensional space that is at least partially enclosed by wall surfaces and a floor surface of the three-dimensional space.
4. The object detection method according to claim 3, wherein the candidate object is an object existing on a wall surface in the three-dimensional space.
5. The object detection method of claim 3, wherein determining at least two perspective images of the two-dimensional panoramic image comprises:
converting the two-dimensional panoramic image into an omnidirectional image in a three-dimensional space based on a coordinate system of the three-dimensional space;
determining at least two perspective images of the two-dimensional panoramic image based on the omnidirectional image.
6. The object detection method of claim 5, wherein determining at least two perspective images based on the omnidirectional image comprises:
and carrying out perspective unfolding on the omnidirectional image based on at least two preset perspective view angles to obtain at least two perspective images corresponding to the at least two perspective view angles.
7. The object detection method of claim 1, wherein the at least two perspective images partially overlap.
8. The object detection method of claim 1, wherein performing object detection on the at least two perspective images to determine at least two candidate object detection results corresponding to the at least two perspective images comprises:
and respectively processing the at least two perspective images by using a deep neural network for target detection to obtain at least two candidate target detection results, wherein the candidate target detection results comprise the probability of the candidate target belonging to a preset category existing in each perspective image and the position of the candidate target in the perspective image.
9. The target detection method of claim 8, wherein the deep neural network is at least one of:
YOLO;
RCNN;
Fast-RCNN;
Faster-RCNN;
SSD.
10. The object detection method of claim 1, wherein performing object detection on the at least two perspective images to determine at least two candidate object detection results corresponding to the at least two perspective images comprises:
for each of the perspective images:
determining a one-dimensional feature representation for the perspective image;
performing target detection on the one-dimensional feature representation to obtain a one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates the probability that pixel points in the perspective image corresponding to each element in the one-dimensional feature representation belong to a candidate target of a predetermined category,
and determining a candidate target detection result corresponding to the perspective image based on the one-dimensional target detection result.
11. The object detection method of claim 10, wherein determining a one-dimensional feature representation for the perspective image comprises:
processing the perspective image by using a convolutional neural network comprising at least one convolutional layer and at least one pooling layer to obtain image features of the perspective image;
determining the image features of the perspective image as the one-dimensional feature representation;
wherein the size of the perspective image is H × W and the size of the image features is 1 × W, where H is the number of pixels of the perspective image in the height direction, W is the number of pixels of the perspective image in the width direction, and each element in the image features of the perspective image corresponds to a column of pixels in the perspective image.
12. The object detection method of claim 11, wherein object detecting the one-dimensional feature representation comprises:
processing the one-dimensional feature representation by using an LSTM to obtain a detection feature of the one-dimensional feature representation;
and processing the detection features by using a fully connected layer to obtain the one-dimensional target detection result of the one-dimensional feature representation, wherein the one-dimensional target detection result indicates, for each element in the one-dimensional feature representation, scores for targets of a plurality of predetermined categories.
13. The object detection method of claim 12, wherein determining the candidate object detection result corresponding to the perspective image based on the one-dimensional object detection result comprises:
for each element in the one-dimensional feature representation, determining a highest score for the element based on the one-dimensional target detection result and determining the element as a target belonging to the predetermined category having the highest score;
determining the size and position of the candidate object in the perspective image based on the positions, in the one-dimensional feature representation, of the elements determined as targets belonging to the predetermined category.
14. The object detection method of claim 5 or 6, wherein determining the panoramic object detection result of the two-dimensional panoramic image based on the at least two candidate object detection results corresponding to the at least two perspective images comprises:
determining the position of a candidate target in the at least two perspective images in the omnidirectional image based on the mapping relation between the at least two perspective images and the omnidirectional image; and
determining the position of the candidate target in the two-dimensional panoramic image based on the mapping relation between the two-dimensional panoramic image and the omnidirectional image.
15. The object detection method of claim 1, further comprising:
and vertically correcting the two-dimensional panoramic image so that the ground in the corrected two-dimensional panoramic image is parallel to a horizontal line.
16. An object detection apparatus for a two-dimensional panoramic image, comprising:
an image to be detected determining unit configured to determine at least two perspective images of the two-dimensional panoramic image, wherein the at least two perspective images respectively include partial image information in the two-dimensional panoramic image;
an object detection unit configured to perform object detection on the at least two perspective images to determine at least two candidate object detection results corresponding to the at least two perspective images; and
a result determination unit configured to determine a panoramic target detection result of the two-dimensional panoramic image based on at least two candidate target detection results corresponding to the at least two perspective images.
17. An object detection device comprising:
a processor; and
a memory having computer-readable program instructions stored therein,
wherein the computer readable program instructions, when executed by the processor, perform the instructions for the object detection method for a two-dimensional panoramic image of any of claims 1-15.
18. A computer-readable storage medium storing computer-readable instructions which, when executed by a computer, implement the object detection method for a two-dimensional panoramic image according to any one of claims 1 to 15.
CN201911414124.3A 2019-12-31 2019-12-31 Target detection method, device, equipment and medium for two-dimensional panoramic image Active CN111161138B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911414124.3A CN111161138B (en) 2019-12-31 2019-12-31 Target detection method, device, equipment and medium for two-dimensional panoramic image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911414124.3A CN111161138B (en) 2019-12-31 2019-12-31 Target detection method, device, equipment and medium for two-dimensional panoramic image

Publications (2)

Publication Number Publication Date
CN111161138A CN111161138A (en) 2020-05-15
CN111161138B (en) 2021-05-07

Family

ID=70560086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911414124.3A Active CN111161138B (en) 2019-12-31 2019-12-31 Target detection method, device, equipment and medium for two-dimensional panoramic image

Country Status (1)

Country Link
CN (1) CN111161138B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270670B (en) * 2020-11-09 2023-09-12 云南电网有限责任公司昆明供电局 Panoramic target detection method in power grid inspection
CN113344957B (en) * 2021-07-19 2022-03-01 北京城市网邻信息技术有限公司 Image processing method, image processing apparatus, and non-transitory storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7184609B2 (en) * 2002-06-28 2007-02-27 Microsoft Corp. System and method for head size equalization in 360 degree panoramic images
US7817830B2 (en) * 2005-05-20 2010-10-19 Imaging Sciences International Llc Locating an elongated object in a three-dimensional data array
CN105335748B (en) * 2014-08-07 2018-10-12 株式会社理光 Image characteristic extracting method and system
KR102282456B1 (en) * 2014-12-05 2021-07-28 한화테크윈 주식회사 Device and Method for displaying heatmap on the floor plan
US20160307350A1 (en) * 2015-04-14 2016-10-20 Magor Communications Corporation View synthesis - panorama
CN106127115B (en) * 2016-06-16 2020-01-31 哈尔滨工程大学 hybrid visual target positioning method based on panoramic vision and conventional vision
CN109117788A (en) * 2018-08-10 2019-01-01 重庆大学 A kind of public transport compartment crowding detection method merging ResNet and LSTM
CN110163271B (en) * 2019-05-13 2020-12-01 武汉大学 Panoramic image target detection method based on spherical projection grid and spherical convolution
CN110276286B (en) * 2019-06-13 2022-03-04 中国电子科技集团公司第二十八研究所 Embedded panoramic video stitching system based on TX2

Also Published As

Publication number Publication date
CN111161138A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111127655B (en) House layout drawing construction method and device, and storage medium
WO2020001168A1 (en) Three-dimensional reconstruction method, apparatus, and device, and storage medium
WO2020206903A1 (en) Image matching method and device, and computer readable storage medium
US11748906B2 (en) Gaze point calculation method, apparatus and device
EP3576017A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
TWI720447B (en) Image positioning method and system thereof
EP3467788B1 (en) Three-dimensional model generation system, three-dimensional model generation method, and program
WO2022151661A1 (en) Three-dimensional reconstruction method and apparatus, device and storage medium
CN112712584B (en) Space modeling method, device and equipment
CN110111388A (en) Three-dimension object pose parameter estimation method and visual apparatus
CN104994367A (en) Image correcting method and camera
CN111340866A (en) Depth image generation method, device and storage medium
CN111161138B (en) Target detection method, device, equipment and medium for two-dimensional panoramic image
CN113643414B (en) Three-dimensional image generation method and device, electronic equipment and storage medium
CN111091117B (en) Target detection method, device, equipment and medium for two-dimensional panoramic image
US20180253858A1 (en) Detection of planar surfaces for use in scene modeling of a captured scene
WO2023024441A1 (en) Model reconstruction method and related apparatus, and electronic device and storage medium
WO2023029969A1 (en) Image processing method and apparatus, and electronic device and computer-readable storage medium
US20230209205A1 (en) Vertical correction for panoramic image
Kim et al. Room layout estimation with object and material attributes information using a spherical camera
WO2021142843A1 (en) Image scanning method and device, apparatus, and storage medium
CN114640833A (en) Projection picture adjusting method and device, electronic equipment and storage medium
CN107203961B (en) Expression migration method and electronic equipment
CN112634366A (en) Position information generation method, related device and computer program product
CN111178300B (en) Target detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant