WO2022126921A1 - Panoramic picture detection method, apparatus, terminal and storage medium - Google Patents

Panoramic picture detection method, apparatus, terminal and storage medium

Info

Publication number
WO2022126921A1
WO2022126921A1 · PCT/CN2021/083845 · CN2021083845W
Authority
WO
WIPO (PCT)
Prior art keywords
detection
detection frame
projection
spherical
panoramic picture
Prior art date
Application number
PCT/CN2021/083845
Other languages
English (en)
French (fr)
Inventor
刘杰 (Liu Jie)
王健宗 (Wang Jianzong)
瞿晓阳 (Qu Xiaoyang)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022126921A1 publication Critical patent/WO2022126921A1/zh


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a method, device, terminal and storage medium for detecting panoramic pictures.
  • A panoramic photo (panorama) usually refers to a photo that covers at least the normal effective viewing angle of the human eyes (about 90° horizontally and 70° vertically), or includes peripheral vision (about 180° horizontally and 90° vertically), or even the full 360° scene. Panoramic pictures are widely used: in addition to virtual reality displays in exhibition halls and scenic exhibitions, they are used even more in street view services. Google Maps officially launched its Street View service as early as 2007, and major Chinese map service providers such as Tencent and Baidu have launched Street View services in their related products. Panoramic images are generally shot with a professional VR panoramic camera, and the pictures generated after shooting are saved in a special projection format.
  • Equidistant cylindrical projection is currently the most widely used 360° panoramic projection method. It maps meridians to equidistant vertical lines and parallels of latitude to equidistant horizontal lines; this format is intuitive and the projection is rectangular. However, this projection is neither equal-area nor conformal, so extreme deformation occurs at the two poles. Because private information such as faces and license plates must be blurred, and famous scenic spots and their introductions must be marked, target detection is a very important task for both street view panoramic pictures and VR panoramic pictures. However, the inventors found that, due to this projection format, objects to be detected near the two poles are greatly deformed because individual pixels are stretched, and such deformation seriously degrades detection performance.
  • Existing solutions, such as changing the shape of the convolution kernel, require modifying the target detection network; the operation is complex, the engineering effort is large, existing target detection frameworks cannot be reused, and the performance is poor.
  • The present application provides a panoramic picture detection method, device, terminal and storage medium to solve the problem in existing panoramic picture detection that objects at the two poles are detected inaccurately due to excessive deformation.
  • A technical solution adopted in the present application is to provide a method for detecting a panoramic picture, comprising: dividing the obtained equidistant cylindrical projection panoramic picture into multiple side-by-side spherical polar plane projection pictures; inputting the spherical polar plane projection pictures into a pre-trained detection network to obtain the detection frame information of each spherical polar plane projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to mark the detection frames on the equidistant cylindrical projection panoramic picture and obtain an equidistant cylindrical projection panoramic picture with detection frames.
  • A detection device for a panoramic picture, comprising: a dividing module for dividing the obtained equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical polar plane projection pictures; a detection module for inputting the plurality of spherical polar plane projection pictures into the pre-trained detection network to obtain the detection frame information of each spherical polar plane projection picture; and a projection module for performing coordinate projection transformation on the detection frames based on the detection frame information, so as to mark the detection frames on the equidistant cylindrical projection panoramic picture and obtain an equidistant cylindrical projection panoramic picture with detection frames.
  • A terminal, including a memory, a processor, and a program file stored in the memory and runnable on the processor, wherein, when the processor executes the program file, the following steps are implemented: dividing the obtained equidistant cylindrical projection panoramic picture into multiple side-by-side spherical polar plane projection pictures; inputting the multiple spherical polar plane projection pictures into the pre-trained detection network to obtain the detection frame information of each spherical polar plane projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to mark the detection frames on the equidistant cylindrical projection panoramic picture and obtain the equidistant cylindrical projection panoramic picture with detection frames.
  • Another technical solution adopted in the present application is to provide a storage medium storing a program file capable of realizing the panoramic picture detection method, the program file implementing the following steps when executed by a processor: dividing the obtained equidistant cylindrical projection panoramic picture into multiple side-by-side spherical polar plane projection pictures; inputting the multiple spherical polar plane projection pictures into the pre-trained detection network to obtain the detection frame information of each spherical polar plane projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to mark the detection frames on the equidistant cylindrical projection panoramic picture and obtain the equidistant cylindrical projection panoramic picture with detection frames.
  • The beneficial effects of the present application are as follows: the detection method divides the equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical polar plane projection pictures, inputs each spherical polar plane projection picture into the pre-trained detection network for detection to obtain the detection frame information of each picture, and then projects the detection frames onto the equidistant cylindrical projection panoramic picture according to the detection frame information, generating an equidistant cylindrical projection panoramic picture with detection frames and completing the image detection. Dividing the equidistant cylindrical projection panoramic picture into multiple sub-projection pictures reduces the deformation of objects at the two poles of the picture, thereby improving detection accuracy and performance.
  • FIG. 1 is a schematic flowchart of a method for detecting a panoramic picture according to a first embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for detecting a panoramic picture according to a second embodiment of the present application
  • FIG. 3 is a schematic diagram of functional modules of a device for detecting panoramic pictures according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • The terms "first", "second" and "third" in this application are used for descriptive purposes only and should not be construed as indicating or implying relative importance, or as implying the number of indicated technical features. Thus, a feature defined as "first", "second" or "third" may expressly or implicitly include at least one such feature.
  • In this application, "a plurality of" means at least two, such as two, three, etc., unless otherwise expressly and specifically defined. All directional indications (such as up, down, left, right, front, rear, etc.) in the embodiments of the present application are only used to explain the relative positional relationship between components under a certain posture (as shown in the accompanying drawings).
  • FIG. 1 is a schematic flowchart of a method for detecting a panoramic image according to a first embodiment of the present application. It should be noted that, if there is substantially the same result, the method of the present application is not limited to the sequence of the processes shown in FIG. 1 . As shown in Figure 1, the method includes the steps:
  • Step S101 Divide the obtained equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical projection pictures.
  • There are many ways to implement panoramic projection; they can usually be divided into the following four types:
  • Spherical projection, also known as spherical rectangular projection or equidistant cylindrical projection, is the most common way to unwrap the surrounding sphere and is the projection method generally supported by current panorama software.
  • the panoramic image after projection processing is a 2:1 ratio picture, just like a world map.
  • The equator is the horizontal line passing through the middle of the image; only this line remains horizontal and undistorted, while all other lines are distorted to varying degrees. The closer to the two poles, the more severe the deformation, and each pole point is stretched into a full line of pixels at the upper or lower edge.
  • Horizontal lines such as building outlines and roads are all curved, while vertical lines such as building edges, telephone poles, and straight trees are not deformed.
  • Cube face projection: surround vision can be realized not only with a sphere but also with a cube. In a six-faced cube with the viewpoint at its center, applying appropriate image compensation to each viewing angle achieves the same look-around effect as spherical projection.
  • The advantage of this projection method is that each cube face is a square image with a horizontal viewing angle of 90° and a vertical viewing angle of 90°.
  • The pixel density and quality of each face of the cube map are consistent, and the cube face images can be finely adjusted and modified during image post-processing.
  • Circular projection or mirror spherical projection
  • Such a projected image looks like a picture taken with a circular fisheye lens: the image is extremely distorted, and its viewing angle reaches 360°, covering all of three-dimensional space.
  • The beauty of this projection is that it is a continuous, seam-free image. But since all lines in the image are extremely distorted, it is almost impossible to modify and adjust them correctly in post-processing.
  • Asteroid ("little planet") projection uses the same projection method as circular projection; it differs only in the two-dimensional presentation and the shape of the picture, and can be trimmed as needed.
  • In this application, the projection format addressed is the spherical (equidistant cylindrical) projection, in which the length of the panoramic picture represents a full circle (360°) and the width represents half a circle (180°).
  • After the equidistant cylindrical projection panoramic picture to be projected is acquired, it is divided into a plurality of side-by-side spherical polar plane projection pictures in the horizontal direction.
  • the generation of the spherical polar plane projection image is as follows:
  • The equidistant cylindrical projection panoramic image corresponds to a spherical surface.
  • the two-dimensional coordinates of the point (x, y, z) on the spherical surface on the plane are
  • Each point represents a pixel
  • the above calculation is performed on the pixel matrix in the area on the sphere to convert each pixel on the sphere into two-dimensional coordinates.
  • The converted pixels are divided into multiple side-by-side spherical polar plane projection pictures.
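The division above can be sketched as follows. The patent's exact polar-plane mapping formula is not reproduced in the source text, so this Python sketch uses the standard gnomonic (perspective) mapping as an illustrative stand-in; the function name, the 90° field of view, and the nearest-neighbour sampling are all assumptions, not taken from the patent.

```python
import numpy as np

def equirect_to_view(pano, yaw_deg, fov_deg=90.0, out_size=256):
    """Sample one plane view from an equirectangular panorama.

    pano: H x W (x C) array covering 360 x 180 degrees (W == 2*H).
    yaw_deg: horizontal angle of the view centre.
    Standard gnomonic mapping, used here only as a stand-in for the
    patent's (unpublished here) polar-plane formula.
    """
    h, w = pano.shape[:2]
    # Focal length, in output pixels, for the requested field of view.
    f = (out_size / 2) / np.tan(np.radians(fov_deg) / 2)
    # Pixel grid of the output view, centred on the optical axis.
    u, v = np.meshgrid(np.arange(out_size) - out_size / 2 + 0.5,
                       np.arange(out_size) - out_size / 2 + 0.5)
    # 3D ray for each output pixel; the camera looks along +x, then the
    # whole view is rotated horizontally by yaw_deg.
    x, y, z = np.full_like(u, f), u, -v
    lon = np.arctan2(y, x) + np.radians(yaw_deg)   # longitude of each ray
    lat = np.arctan2(z, np.hypot(x, y))            # latitude of each ray
    # Longitude/latitude -> source pixel coordinates (nearest neighbour).
    px = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    py = ((0.5 - lat / np.pi) * h).clip(0, h - 1).astype(int)
    return pano[py, px]
```

Sampling one such view per yaw angle yields the side-by-side sub-projection pictures that are then fed to the detection network.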
  • Step S102 Inputting a plurality of spherical polar plane projection pictures into a pre-trained detection network to obtain detection frame information of each spherical polar plane projection picture.
  • the detection network needs to be trained first, and then the trained detection network is used to detect the spherical projection picture.
  • The steps of pre-training the detection network include:
  • AutoML refers to automatic machine learning, which is the process of automating the end-to-end process of applying machine learning to real-world problems.
  • Traditional machine learning models can be roughly divided into the following four parts: data acquisition, data preprocessing, optimization, and application; while AutoML realizes automation from three aspects: feature engineering, model selection, and hyperparameter optimization.
  • feature engineering is the process of converting raw data into features, which can better describe potential problems to the predictive model, thereby improving the accuracy of the model for unseen data.
  • Feature engineering usually includes three tasks: feature generation, feature selection, and feature encoding; model selection refers to the automatic selection of models.
  • The traditional method is to select the best-performing model, or combination of models, from traditional models such as KNN, SVM, and decision trees.
  • Hyperparameters are parameters set before learning begins, rather than parameters obtained through training, such as the number and depth of trees or the learning rate of a neural network. Even the structure of the neural network, including the number of layers, the types of layers, and the connections between layers, belongs to the category of hyperparameters; hyperparameter optimization is the process of optimizing these hyperparameters.
  • the detection network is constructed based on AutoML.
  • In the detection network, a search is performed on the connection part of the feature extraction layers: all possible combinations in the search space are tried, and the combination with the highest mAP is selected to obtain the optimized detection network.
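The search described above (trying every combination in the search space and keeping the one with the highest mAP) can be sketched as follows; `candidate_links` and `evaluate_map` are hypothetical names, and the training and validation needed to measure mAP are assumed to happen inside the supplied callback.

```python
import itertools

def search_connections(candidate_links, evaluate_map):
    """Exhaustively try every subset of feature-layer connections and keep
    the one with the highest mAP, as the text describes.

    candidate_links: possible connections between feature extraction layers.
    evaluate_map: caller-supplied function returning the validation mAP of a
    network built with the given connection combination.
    """
    best_combo, best_map = None, float("-inf")
    for r in range(len(candidate_links) + 1):
        for combo in itertools.combinations(candidate_links, r):
            score = evaluate_map(combo)
            if score > best_map:
                best_combo, best_map = combo, score
    return best_combo, best_map
```

Exhaustive search is only tractable for small search spaces; AutoML systems typically replace the inner loop with a guided strategy, but the selection criterion (highest mAP) is the same.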
  • Step S103 performing coordinate projection transformation on the detection frame based on the detection frame information, so as to mark the detection frame on the equidistant cylindrical projection panoramic picture, and obtain the equidistant cylindrical projection panoramic picture with the detection frame.
  • In step S103, by inputting the plurality of spherical polar plane projection pictures into the pre-trained detection network, the detection frame on each spherical polar plane projection picture is obtained and the coordinates of its center point are recorded; the center point coordinates are then converted through coordinate transformation to obtain the three-dimensional coordinates of the detection frame in the three-dimensional coordinate system of the equidistant cylindrical projection panoramic picture.
  • the coordinate transformation formula is as follows:
  • In step S102, when the detection network detects the spherical polar plane projection pictures, the length and width of each detection frame are also obtained. The detection frame is marked on the panoramic picture according to its three-dimensional center coordinates (x, y, z) and its length and width, so as to obtain the equidistant cylindrical projection panoramic picture with detection frames and complete the detection of objects on the equidistant cylindrical projection panoramic picture.
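Projecting a detection frame centre from one view back toward the panorama can be sketched as below, under the same caveat as before: the patent's exact transformation formula is not reproduced in the source text, so the inverse gnomonic mapping here is an illustrative stand-in, and every parameter name is hypothetical.

```python
import math

def view_point_to_pano(u, v, yaw_deg, fov_deg, out_size, pano_w, pano_h):
    """Map a detection-frame centre (u, v), in pixels of one perspective
    view, back to equirectangular pixel coordinates (px, py)."""
    # Focal length implied by the view's field of view.
    f = (out_size / 2) / math.tan(math.radians(fov_deg) / 2)
    # Reconstruct the 3D ray through (u, v); the camera looks along +x.
    x = f
    y = u - out_size / 2 + 0.5
    z = -(v - out_size / 2 + 0.5)
    # Ray -> longitude/latitude, applying the view's horizontal rotation.
    lon = math.atan2(y, x) + math.radians(yaw_deg)
    lat = math.atan2(z, math.hypot(x, y))
    # Longitude/latitude -> equirectangular pixel coordinates.
    px = ((lon / (2 * math.pi) + 0.5) * pano_w) % pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py
```

Applying this to the frame centre, and scaling the frame's length and width accordingly, places the detection frame on the panorama as the step describes.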
  • The panoramic picture detection method divides the equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical polar plane projection pictures, inputs each picture into a pre-trained detection network to obtain detection frame information, and then constructs a 360° panoramic image with detection frames based on that information. Dividing the equidistant cylindrical projection panoramic picture into multiple sub-projection pictures reduces the deformation of objects at the poles of the picture, thereby improving detection accuracy and performance and making the generated panoramic pictures easier to view.
  • FIG. 2 is a schematic flowchart of a method for detecting a panoramic image according to a second embodiment of the present application. It should be noted that, if there is substantially the same result, the method of the present application is not limited to the sequence of the processes shown in FIG. 2 . As shown in Figure 2, the method includes the steps:
  • Step S201 Divide the equidistant cylindrical projection panoramic picture into four side-by-side spherical polar plane projection pictures; the horizontal and vertical spans of each picture are both 180°, and adjacent pictures horizontally overlap by a 90° area.
  • In step S201, when dividing the equidistant cylindrical projection panoramic picture, it is divided into four side-by-side spherical polar plane projection pictures in the horizontal direction, with the horizontal and vertical spans of each picture both equal to 180°, so that adjacent pictures overlap by 90° in the horizontal direction. In this way, an object split in two at the edge of one picture still appears whole in the adjacent picture, reducing the difficulty of detection.
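The geometry of step S201 can be checked with a few lines; the function and its defaults are illustrative, not from the patent.

```python
def view_layout(num_views=4, span_deg=180.0):
    """Yaw centres and pairwise horizontal overlap for evenly spaced views.

    With four 180-degree views the step between centres is 360/4 = 90
    degrees, so adjacent views overlap by 180 - 90 = 90 degrees,
    matching step S201.
    """
    step = 360.0 / num_views
    centers = [i * step for i in range(num_views)]
    overlap = span_deg - step
    return centers, overlap
```

Because the overlap equals half of each view's span, any object cut at one view's edge lies well inside its neighbour.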
  • Step S202 Inputting a plurality of spherical projection images into a pre-trained detection network to obtain a detection frame of each spherical projection image, as well as a detection category and a confidence score of the detection frame.
  • In step S202, it should be noted that when the detection network detects objects on each spherical polar plane projection picture to obtain detection frames, the same object may generate multiple detection frames on the same picture. Therefore, during detection, the detection frames are filtered by non-maximum suppression until only the optimal detection frame for each object remains.
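Non-maximum suppression, as referenced above, can be sketched in its standard form; the patent does not specify an overlap threshold, so the 0.5 default here is an assumption.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Standard non-maximum suppression: repeatedly keep the highest-scoring
    box and drop any remaining box that overlaps it by more than iou_thresh.
    Returns the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

After this per-picture filtering, at most one detection frame per object remains on each spherical polar plane projection picture; duplicates across overlapping pictures are handled separately in steps S203 to S205.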
  • The calculation of the detection category and of the confidence score of the detection frame belongs to the prior art, and details are not described here.
  • Step S203 Determine one or more detection frames corresponding to the same detection category of the adjacent spherical polar plane projection pictures.
  • When the detection category corresponds to one detection frame, step S204 is executed; when the detection category corresponds to multiple detection frames, step S205 is executed.
  • In step S203, it should be understood that adjacent spherical polar plane projection pictures horizontally overlap by a 90° area. Therefore, after the adjacent pictures are separately input into the pre-trained detection network, an object in the overlapping area may be detected on both pictures, so that the same object corresponds to multiple detection frames, that is, the same detection category corresponds to multiple detection frames.
  • Step S204 Use the detection frame as the target detection frame of the detection category.
  • In step S204, when the detection category corresponds to only one detection frame, that detection frame is directly used as the target detection frame of the detection category.
  • Step S205 Calculate the detection frame score of each detection frame according to the confidence score, and select the detection frame with the highest detection frame score as the target detection frame of the detection category.
  • step S205 when the detection category corresponds to multiple detection frames, the detection frame score of each detection frame is calculated according to the confidence score corresponding to each detection frame, and then the detection frame with the highest detection frame score is used as the target of the detection category Check box.
  • In step S205, calculating the detection frame score of each detection frame according to the confidence score includes: confirming the multiple pending detection frames of the same detection category on different spherical polar plane projection pictures; obtaining the first center coordinates of the first pending detection frame on its spherical polar plane projection picture and the second center coordinates of the first pending detection frame on the target spherical polar plane projection picture, and calculating the Euclidean distance between the first and second center coordinates; randomly selecting, on another spherical polar plane projection picture, a second pending detection frame of the same detection category as the first pending detection frame; calculating the intersection ratio of the first and second pending detection frames; and calculating the detection frame score of the first pending detection frame from its confidence score, the Euclidean distance, and the intersection ratio.
  • The intersection ratio (intersection over union) refers to the ratio of the intersection to the union of two rectangular boxes.
  • The calculation formula of the detection frame score is:
  • s′ᵢ is the detection frame score;
  • sᵢ is the confidence score;
  • dᵢ is the Euclidean distance;
  • σ₁ and σ₂ are preset balance parameters; preferably, in this embodiment, σ₁ is 0.1 and σ₂ is 0.6.
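The exact score formula is not reproduced in the source text, so the sketch below combines the listed symbols (the confidence score, the Euclidean distance, the intersection ratio, σ₁ = 0.1, σ₂ = 0.6) into one plausible linear form purely for illustration; the patent's actual formula may differ.

```python
SIGMA1, SIGMA2 = 0.1, 0.6  # balance parameters from the embodiment

def frame_score(confidence, distance, iou_value,
                sigma1=SIGMA1, sigma2=SIGMA2):
    """Combine s_i, d_i and the intersection ratio into a score s'_i.

    ASSUMPTION: confidence minus a distance penalty plus an overlap bonus
    is only one plausible reading of the listed symbols, not the patent's
    formula. `distance` is assumed to be normalised to a comparable scale.
    """
    return confidence - sigma1 * distance + sigma2 * iou_value

def pick_target_frame(frames):
    """frames: list of (confidence, distance, iou) tuples for one detection
    category; return the index of the frame with the highest s'_i, which
    step S205 takes as the target detection frame."""
    scores = [frame_score(*f) for f in frames]
    return max(range(len(frames)), key=scores.__getitem__)
```

Whatever the exact formula, the selection rule of step S205 is the same: compute s′ᵢ for every pending detection frame of the category and keep the highest.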
  • Step S206 Obtain the coordinates of the center point of the target detection frame and perform coordinate projection transformation, so as to project the target detection frame onto the equidistant cylindrical projection panoramic picture to obtain the equidistant cylindrical projection panoramic picture with the detection frame.
  • In step S206, after the target detection frame corresponding to each detection category is confirmed, the center point coordinates of the target detection frame are obtained and transformed by coordinate projection into three-dimensional coordinates; the target detection frame is then projected onto the equidistant cylindrical projection panoramic picture. This cycle repeats until all target detection frames are projected onto the equidistant cylindrical projection panoramic picture, yielding the equidistant cylindrical projection panoramic picture with detection frames and completing the detection of objects on the equidistant cylindrical projection panoramic picture.
  • The panoramic picture detection method of the second embodiment of the present application divides the equidistant cylindrical projection panoramic picture into four side-by-side spherical polar plane projection pictures, and its way of selecting the target detection frame makes the selection of detection frames more accurate, further improving detection accuracy.
  • FIG. 3 is a schematic diagram of functional modules of an apparatus for detecting a panoramic picture according to an embodiment of the present application.
  • the detection device 30 of the panoramic picture includes a division module 31 , a detection module 32 and a projection module 33 .
  • the dividing module 31 is configured to divide the obtained equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical projection pictures.
  • the detection module 32 is used for inputting a plurality of spherical polar plane projection pictures into the pre-trained detection network to obtain detection frame information of each spherical polar plane projection picture.
  • the projection module 33 is configured to perform coordinate projection transformation on the detection frame based on the detection frame information, so as to mark the detection frame on the equidistant cylindrical projection panoramic picture, and obtain the equidistant cylindrical projection panoramic picture with the detection frame.
  • The division module 31 divides the obtained equidistant cylindrical projection panoramic picture into a plurality of side-by-side spherical polar plane projection pictures: the equidistant cylindrical projection panoramic picture is divided into four side-by-side spherical polar plane projection pictures, the horizontal and vertical spans of each picture are both 180°, and adjacent pictures horizontally overlap by a 90° area.
  • The operation of the detection module 32 inputting the plurality of spherical polar plane projection pictures into the pre-trained detection network to obtain the detection frame information of each picture may also be: inputting the plurality of spherical polar plane projection pictures into the pre-trained detection network to obtain the detection frame of each picture, as well as the detection category and confidence score of each detection frame; and determining the one or more detection frames corresponding to the same detection category on adjacent spherical polar plane projection pictures.
  • The operation of the projection module 33 performing coordinate projection transformation on the detection frames based on the detection frame information, so as to mark the detection frames on the equidistant cylindrical projection panoramic picture and obtain the panoramic picture with detection frames, may also be: when the detection category corresponds to one detection frame, using that detection frame as the target detection frame of the detection category; when the detection category corresponds to multiple detection frames, calculating the detection frame score of each detection frame according to the confidence score and selecting the detection frame with the highest score as the target detection frame of the detection category; and obtaining the center point coordinates of the target detection frame and performing coordinate projection transformation to project the target detection frame onto the equidistant cylindrical projection panoramic picture, obtaining an equidistant cylindrical projection panoramic picture with detection frames.
  • The operation of the projection module 33 calculating the detection frame score of each detection frame according to the confidence score may also be: confirming the multiple pending detection frames of the same detection category on different spherical polar plane projection pictures; obtaining the first center coordinates of the first pending detection frame on its spherical polar plane projection picture and the second center coordinates of the first pending detection frame on the target spherical polar plane projection picture, and calculating the Euclidean distance between the first and second center coordinates; randomly selecting, on another spherical polar plane projection picture, a second pending detection frame of the same detection category as the first pending detection frame; calculating the intersection ratio of the first and second pending detection frames; and calculating the detection frame score of the first pending detection frame from its confidence score, the Euclidean distance, and the intersection ratio.
  • The calculation formula of the detection frame score is:
  • s′ᵢ is the detection frame score;
  • sᵢ is the confidence score;
  • dᵢ is the Euclidean distance;
  • σ₁ and σ₂ are preset balance parameters.
  • The panoramic picture detection device 30 further includes a training module for pre-training the detection network. The operation of the training module pre-training the detection network may be: establishing an initial detection network based on AutoML; acquiring training samples; and training the initial detection network with the training samples until the pre-set training index is met, obtaining the trained detection network.
  • FIG. 4 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • the terminal 40 includes a processor 41 and a memory 42 coupled to the processor 41 .
  • the memory 42 stores program instructions, and when the program instructions are executed by the processor 41, the processor 41 executes the steps of the panorama image detection method in the above embodiment.
  • the processor 41 may also be referred to as a CPU (Central Processing Unit, central processing unit).
  • the processor 41 may be an integrated circuit chip with signal processing capability.
  • The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • FIG. 5 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
  • the storage medium of this embodiment of the present application stores a program file 51 capable of implementing all of the above methods, wherein the program file 51 may be stored in the above-mentioned storage medium in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include media capable of storing program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs, as well as terminal devices such as computers, servers, mobile phones, and tablets.
  • the storage medium may be non-volatile or volatile.
  • the disclosed terminal, apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of units is only a division by logical function; in actual implementation there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a panoramic picture detection method, apparatus, terminal, and storage medium. The method includes: dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures; inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each stereographic projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames. In this way, the present application splits the equirectangular projection panoramic picture into multiple sub-projection pictures to reduce the deformation of objects at the poles, improving detection accuracy and performance.

Description

Panoramic picture detection method, apparatus, terminal, and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on December 18, 2020, with application number 202011509078.8 and entitled "Panoramic picture detection method, apparatus, terminal, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to a panoramic picture detection method, apparatus, terminal, and storage medium.
Background
A panoramic photo (panorama) generally refers to a photo that covers at least the normal effective visual angle of the human eyes (roughly 90° horizontally and 70° vertically) or the peripheral visual field (roughly 180° horizontally and 90° vertically), up to a complete 360° scene. Panoramic pictures have a wide range of uses: besides virtual-reality displays in exhibition halls and scenic-area exhibitions, they are used above all in street-view services. Google Maps officially launched its street-view service as early as 2007, and the major domestic map service providers, such as Tencent and Baidu, have also brought street-view services online in their products. Panoramic images are generally captured with professional VR panoramic cameras, and the generated pictures are saved with a special projection. Equirectangular projection is currently the most widely used 360° panoramic projection: it maps meridians to equally spaced vertical straight lines and parallels to equally spaced horizontal straight lines. This format is intuitive and the projection is rectangular, but the projection is neither equal-area nor conformal, so extreme deformation occurs at the two poles. Because privacy information such as faces and license plates needs to be masked, and famous sights and in-area introductions need to be annotated, object detection is an important task for both street-view and VR panoramic pictures. However, the inventors found that, because of the projection format, objects to be detected near the two poles suffer large deformation as each of their pixels is stretched, and such deformation seriously degrades detection performance. Existing solutions, such as changing the convolution kernel shape, require modifying the object detection network, which is complicated, labor-intensive, cannot use existing object detection frameworks, and performs poorly.
Summary
The present application provides a panoramic picture detection method, apparatus, terminal, and storage medium, to solve the problem in existing panoramic picture detection that objects near the two poles are detected inaccurately due to excessive deformation.
To solve the above technical problem, one technical solution adopted by the present application is to provide a panoramic picture detection method, including: dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures; inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each stereographic projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
To solve the above technical problem, another technical solution adopted by the present application is to provide a panoramic picture detection apparatus, including: a division module configured to divide an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures; a detection module configured to input the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each stereographic projection picture; and a projection module configured to perform coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
To solve the above technical problem, a further technical solution adopted by the present application is to provide a terminal, including a memory, a processor, and a program file stored on the memory and executable on the processor, where the processor implements the following steps when executing the program file: dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures; inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each stereographic projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
To solve the above technical problem, a further technical solution adopted by the present application is to provide a storage medium storing a program file capable of implementing the panoramic picture detection method, where the program file, when executed by a processor, implements the following steps: dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures; inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each stereographic projection picture; and performing coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
The beneficial effects of the present application are as follows: the panoramic picture detection method of the present application divides an equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures, inputs each stereographic projection picture into a pre-trained detection network for detection to obtain the detection frame information of each picture, and then projects the detection frames of each stereographic projection picture onto the equirectangular projection panoramic picture according to the detection frame information, generating an equirectangular projection panoramic picture with detection frames and completing the picture detection. By splitting the equirectangular projection panoramic picture into multiple sub-projection pictures, the deformation of objects near the two poles is reduced, which improves detection accuracy and performance.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the panoramic picture detection method according to the first embodiment of the present application;
FIG. 2 is a schematic flowchart of the panoramic picture detection method according to the second embodiment of the present application;
FIG. 3 is a schematic diagram of the functional modules of the panoramic picture detection apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the terminal according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of the storage medium according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.
The terms "first", "second", and "third" in the present application are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly specifying the number of the indicated technical features. Thus, features qualified by "first", "second", or "third" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, such as two or three, unless otherwise expressly and specifically limited. All directional indications (such as up, down, left, right, front, back, and so on) in the embodiments of the present application are only used to explain the relative positional relationships, movements, and the like between components in a specific posture (as shown in the drawings); if that specific posture changes, the directional indication changes accordingly. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
FIG. 1 is a schematic flowchart of the panoramic picture detection method according to the first embodiment of the present application. It should be noted that, if substantially the same result is obtained, the method of the present application is not limited to the flow sequence shown in FIG. 1. As shown in FIG. 1, the method includes the following steps:
Step S101: Divide the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures.
It should be noted that panoramic projection can be implemented in many ways, usually divided into the following four:
1. Spherical projection, also called spherical rectangular projection or equirectangular projection. This is the most common way to unwrap a surround sphere and the projection commonly supported by panorama software. The projected panoramic image is a picture with a 2:1 aspect ratio, like a world map: the equator is the horizontal line through the middle of the image, and only content on this line stays level; everything else is distorted to varying degrees. The closer to the two poles, the more severe the deformation, and the top and bottom end points at the poles each become a single line of pixels. As in the panoramic pictures we commonly see, horizontal building lines and roads are curved, while vertical building lines, utility poles, and straight trees are not deformed.
2. Cube-face projection. Surround vision can be achieved not only with a sphere but also with a cube. Inside a six-faced cube, if the viewpoint is at its exact center, then with appropriate image compensation for each viewing angle, the same surround effect as spherical projection can be achieved. The advantage of this projection is that each cube-face picture is a square image with a 90° horizontal and 90° vertical field of view. The pixel density and quality of every face are consistent, so in post-processing the cube-face pictures can be finely adjusted and modified.
3. Circular projection, or mirror-ball projection, a kind of angular projection. Such a projected image looks like a picture taken with an extreme circular fisheye lens: the image is severely twisted and deformed, with a 360° field of view that includes all imagery of the three-dimensional space. Its advantage is that it is a continuous, seamless image, but since all lines in the image are extremely distorted, it is almost impossible to correctly modify and adjust the image in post-processing.
4. Little-planet projection, which uses the same projection method as circular projection but differs in the two-dimensional presentation and picture format, and can be cropped as needed.
In this embodiment, the projection concerned is the spherical projection, where, for an equirectangular projection panoramic picture, the length represents the full circumference, i.e., 360°, and the width represents half the circumference, i.e., 180°. After the equirectangular projection panoramic picture to be projected is acquired, it is divided in the horizontal direction into multiple side-by-side stereographic projection pictures. Specifically, a stereographic projection picture is generated as follows:
The equirectangular projection panoramic picture corresponds to a sphere. Suppose the projection plane is the plane tangent to the sphere at z = 0; then a point (x, y, z) on the sphere has two-dimensional coordinates on the plane of
Figure PCTCN2021083845-appb-000001
Each point represents a pixel. The above calculation is applied to the pixel matrix of a region on the sphere to convert every pixel on the sphere into two-dimensional coordinates, and the converted pixels are then divided into multiple side-by-side stereographic projection pictures.
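The pixel-wise mapping described above can be sketched in code. The patent gives its projection formula only as an image, so the function below uses the textbook stereographic projection from the north pole (0, 0, 1) onto the plane z = 0; the patent's actual axis and sign conventions may differ.

```python
import numpy as np

def stereographic_project(points):
    """Project unit-sphere points (x, y, z) onto the plane z = 0.

    Uses projection from the north pole (0, 0, 1); points at the pole
    itself (z = 1) have no image under this mapping.
    """
    x, y, z = points[..., 0], points[..., 1], points[..., 2]
    denom = 1.0 - z
    return np.stack([x / denom, y / denom], axis=-1)

# A point on the equator maps to itself under this convention.
print(stereographic_project(np.array([[1.0, 0.0, 0.0]])))  # [[1. 0.]]
```

Applying such a mapping to every pixel of a sphere region yields the two-dimensional pixel grid that is then cut into the side-by-side sub-pictures.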
Step S102: Input the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each stereographic projection picture.
In this embodiment, it should be noted that before the equirectangular projection panoramic picture is detected, the detection network needs to be trained first, and the trained detection network is then used to detect the stereographic projection pictures. Specifically, the step of pre-training the detection network includes:
1. Establishing an initial detection network based on AutoML.
It should be noted that AutoML refers to automated machine learning, the process of automating the end-to-end application of machine learning to real-world problems. A traditional machine-learning pipeline can be roughly divided into four parts: data collection, data preprocessing, optimization, and application. AutoML automates three aspects: feature engineering, model selection, and hyperparameter optimization. Feature engineering is the process of turning raw data into features that better describe the underlying problem to the predictive model, thereby improving the model's accuracy on unseen data; it usually includes feature generation, feature selection, and feature encoding. Model selection refers to automated selection of the model: the traditional approach picks one, or the best-performing combination, of traditional models such as KNN, SVM, or decision trees, whereas here, without human intervention, a network structure most effective for the current task is generated automatically. Hyperparameters are parameters set before learning rather than obtained through training, such as the number and depth of trees or the learning rate of a neural network; in hyperparameter learning, even the structure of the neural network, including the number of layers, the types of the layers, and the connections between layers, belongs to the category of hyperparameters. Hyperparameter optimization is the process of optimizing these hyperparameters.
Manually adjusting parameters costs a great deal of labor and time, and the direction of optimization is hard to find; optimizing the hyperparameter selection both saves labor and time and yields better performance.
2. Acquiring training samples, and training the initial detection network with the training samples until a pre-set training index is met, to obtain the trained detection network.
Specifically, in this embodiment, the detection network is built based on AutoML. When training the detection network, the connections of the feature-extraction layers of the network are searched: all possible combinations in the search space are tried, and the combination with the highest mAP is selected, yielding the optimized detection network.
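The connection search just described can be sketched as a brute-force loop over the search space; the candidate dictionary and the mAP evaluator below are illustrative stand-ins, not the API of any specific AutoML framework.

```python
from itertools import product

def search_feature_connections(candidates, evaluate_map):
    """Try every combination of feature-layer connection options and
    keep the one with the highest mAP.

    `candidates` maps each connection slot to its list of options;
    `evaluate_map` scores one combination (any callable returning a
    number stands in for training and evaluating the detector).
    """
    best_combo, best_map = None, float("-inf")
    for combo in product(*candidates.values()):
        config = dict(zip(candidates.keys(), combo))
        score = evaluate_map(config)
        if score > best_map:
            best_combo, best_map = combo, score
    return best_combo, best_map
```

In practice AutoML frameworks prune this exhaustive search heavily, but the selection criterion (highest mAP wins) is the one named in the text.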
Step S103: Perform coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
In step S103, the multiple stereographic projection pictures are input into the pre-trained detection network to obtain the detection frames on each stereographic projection picture. The center-point coordinates of each detection frame are recorded and then subjected to coordinate transformation, yielding the three-dimensional coordinates of the detection frame in the three-dimensional coordinate system of the equirectangular projection panoramic picture, where the coordinate transformation formula is:
Figure PCTCN2021083845-appb-000002
where (x, y, z) are the three-dimensional coordinates of the detection frame's center point after coordinate transformation, and (X, Y) are the two-dimensional coordinates of the center point on the stereographic projection picture. In addition, in step S102, when the detection network detects the stereographic projection pictures, the length and width of each detection frame are also obtained. When the detection frame is annotated onto the equirectangular projection panoramic picture, the annotation is made according to the three-dimensional coordinates (x, y, z) and the length and width of the detection frame, thereby obtaining an equirectangular projection panoramic picture with detection frames and completing the detection of objects on the equirectangular projection panoramic picture.
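The back-projection of a box center from the plane to the sphere can likewise be sketched. The coordinate-transform formula in the text appears only as an image, so the function below is the textbook inverse of stereographic projection from the north pole; its axis and sign conventions may not match the patent's exactly.

```python
def inverse_stereographic(X, Y):
    """Map a plane point (X, Y) back to a point (x, y, z) on the unit sphere.

    Inverse of the projection from the north pole (0, 0, 1) onto z = 0.
    """
    r2 = X * X + Y * Y
    scale = 1.0 / (1.0 + r2)
    return 2.0 * X * scale, 2.0 * Y * scale, (r2 - 1.0) * scale

# Round trip: the equator point (1, 0, 0) projects to (1, 0) and back.
print(inverse_stereographic(1.0, 0.0))  # (1.0, 0.0, 0.0)
```

The resulting (x, y, z), together with the box's length and width, is what gets drawn back onto the equirectangular panorama.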
The panoramic picture detection method of the first embodiment of the present application divides an equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures, inputs each stereographic projection picture into a pre-trained detection network for detection to obtain detection frame information, and then constructs a 360° panoramic picture with detection frames based on that information. Splitting the equirectangular projection panoramic picture into multiple sub-projection pictures reduces the deformation of objects at the poles, thereby improving detection accuracy and performance and making the generated panoramic picture more viewable.
FIG. 2 is a schematic flowchart of the panoramic picture detection method according to the second embodiment of the present application. It should be noted that, if substantially the same result is obtained, the method of the present application is not limited to the flow sequence shown in FIG. 2. As shown in FIG. 2, the method includes the following steps:
Step S201: Divide the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures, each spanning 180° both horizontally and vertically, with adjacent stereographic projection pictures horizontally overlapping by a 90° region.
In step S201, when dividing the equirectangular projection panoramic picture, it is divided in the horizontal direction into four side-by-side stereographic projection pictures, each with a horizontal and vertical span of 180°, so that adjacent stereographic projection pictures overlap by a 90° region in the horizontal direction. In this way, an object that is cut in half at the edge of one stereographic projection picture appears intact in the adjacent one, which lowers the detection difficulty.
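The split into four views can be expressed as horizontal windows; the helper below simply enumerates the start/end longitudes (in degrees) and makes the 90° overlap between neighbors explicit. The function and parameter names are illustrative.

```python
def view_windows(n_views=4, span=180, step=90):
    """Horizontal (start, end) longitude windows of the sub-views, in degrees.

    With a 180-degree span and a 90-degree step, each window overlaps
    its neighbour by 90 degrees, matching the division described above.
    """
    windows = []
    for i in range(n_views):
        start = (i * step) % 360
        windows.append((start, (start + span) % 360))
    return windows

print(view_windows())  # [(0, 180), (90, 270), (180, 0), (270, 90)]
```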
Step S202: Input the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frames of each stereographic projection picture, together with the detection category and confidence score of each detection frame.
In step S202, it should be noted that when the detection network detects the objects on each stereographic projection picture, the same object may produce multiple detection frames on the same picture. Therefore, during detection, non-maximum suppression is also used to filter the detection frames until only the optimal detection frame for that object remains. The detection process also obtains the category of the object inside the detection frame, i.e., the detection category, as well as the confidence score of the detection frame. The computation of the confidence score is well-known existing technology and is not repeated here.
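The non-maximum suppression step mentioned above can be sketched in its standard greedy form; the IoU threshold of 0.5 is a common default, not a value given by the patent.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best-scoring box, drop heavy overlaps, repeat."""
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

The same `iou` helper also computes the intersection-over-union used later when merging candidates across overlapping views.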
Step S203: Determine the one or more detection frames corresponding to the same detection category on adjacent stereographic projection pictures. When the detection category corresponds to one detection frame, execute step S204; when the detection category corresponds to multiple detection frames, execute step S205.
In step S203, it should be understood that adjacent stereographic projection pictures horizontally overlap by a 90° region. Therefore, after the adjacent stereographic projection pictures are separately input into the pre-trained detection network, an object in the overlapping region may be detected on both pictures, so that the same object corresponds to multiple detection frames, i.e., the same detection category corresponds to multiple detection frames.
Step S204: Take the detection frame as the target detection frame of the detection category.
In step S204, when the detection category corresponds to only one detection frame, that detection frame is directly taken as the target detection frame of the detection category.
Step S205: Calculate the detection frame score of each detection frame according to the confidence score, and select the detection frame with the highest detection frame score as the target detection frame of the detection category.
In step S205, when the detection category corresponds to multiple detection frames, the detection frame score of each detection frame is calculated according to its confidence score, and the detection frame with the highest detection frame score is taken as the target detection frame of the detection category.
Specifically, in step S205, calculating the detection frame score of each detection frame according to the confidence score includes:
1. Confirming multiple pending detection frames of the same detection category on different stereographic projection pictures.
2. Acquiring the first center coordinates of the target stereographic projection picture and the second center coordinates of the first pending detection frame on the target stereographic projection picture, and calculating the Euclidean distance between the first center coordinates and the second center coordinates.
Specifically, the current stereographic projection picture is taken as the target stereographic projection picture, its center-point coordinates are taken as the first center coordinates, the second center coordinates of the first pending detection frame on the target picture are acquired, and the Euclidean distance between the two is calculated from the first and second center coordinates.
3. Randomly selecting, on another stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame.
Specifically, on a stereographic projection picture adjacent to the target stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame is selected.
4. Calculating the intersection-over-union of the first pending detection frame and the second pending detection frame.
Specifically, the intersection-over-union is the ratio of the intersection of the two rectangular frames to their union.
5. Calculating the detection frame score of the first pending detection frame according to its confidence score, the Euclidean distance, and the intersection-over-union.
Specifically, the detection frame score is calculated as:
Figure PCTCN2021083845-appb-000003
where s′ i is the detection frame score, s i is the confidence score,
Figure PCTCN2021083845-appb-000004
is the intersection-over-union of the two pending detection frames, d i is the Euclidean distance, and σ 1 and σ 2 are preset balance parameters; preferably, in this embodiment, σ 1 is 0.1 and σ 2 is 0.6.
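The exact score formula above is reproduced only as an image, so the sketch below shows just one plausible shape consistent with the named quantities: the confidence s_i rescaled by soft-NMS-style Gaussian factors in the IoU and in the center distance, with the balance parameters σ1 = 0.1 and σ2 = 0.6 from the text. The functional form, and the direction of each penalty, is an assumption, not the patent's formula.

```python
import math

def box_score(confidence, iou_pair, center_dist, sigma1=0.1, sigma2=0.6):
    """Illustrative detection-frame re-score (NOT the patent's exact formula).

    Combines the confidence score with the IoU of the two pending boxes
    and the Euclidean distance of the box center from the picture center,
    using Gaussian penalty factors as a stand-in.
    """
    return (confidence
            * math.exp(-(iou_pair ** 2) / sigma1)
            * math.exp(-(center_dist ** 2) / sigma2))
```

With zero overlap and zero distance this reduces to the raw confidence; any overlap or off-center distance discounts it.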
Step S206: Acquire the center-point coordinates of the target detection frame and perform coordinate projection transformation, so as to project the target detection frame onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
In step S206, after the target detection frame of each detection category is confirmed, its center-point coordinates are acquired and transformed by coordinate projection into three-dimensional coordinates, and the target detection frame is projected onto the equirectangular projection panoramic picture. This is repeated in turn until all target detection frames have been projected onto the equirectangular projection panoramic picture, obtaining an equirectangular projection panoramic picture with detection frames and completing the detection of objects on the equirectangular projection panoramic picture.
On the basis of the first embodiment, the panoramic picture detection method of the second embodiment of the present application divides the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures with overlapping regions between adjacent pictures, which prevents an object at the edge of a projection picture from being cut in half and becoming hard to detect. Moreover, for the extra detection frames generated in the overlapping regions, detection frame scores are calculated and the highest-scoring frame is selected as the target detection frame, making the selection of detection frames more accurate and further improving the detection accuracy.
FIG. 3 is a schematic diagram of the functional modules of the panoramic picture detection apparatus according to an embodiment of the present application. As shown in FIG. 3, the panoramic picture detection apparatus 30 includes a division module 31, a detection module 32, and a projection module 33.
The division module 31 is configured to divide the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures.
The detection module 32 is configured to input the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each stereographic projection picture.
The projection module 33 is configured to perform coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
Optionally, the operation of the division module 31 to divide the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures may be: dividing the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures, each spanning 180° both horizontally and vertically, with adjacent stereographic projection pictures horizontally overlapping by a 90° region.
Optionally, the operation of the detection module 32 to input the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each stereographic projection picture may also be: inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frames of each stereographic projection picture, together with the detection category and confidence score of each detection frame; and determining the one or more detection frames corresponding to the same detection category on adjacent stereographic projection pictures.
Optionally, the operation of the projection module 33 to perform coordinate projection transformation on the detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames, may also be: when the detection category corresponds to one detection frame, taking the detection frame as the target detection frame of the detection category; when the detection category corresponds to multiple detection frames, calculating the detection frame score of each detection frame according to the confidence score and selecting the detection frame with the highest score as the target detection frame of the detection category; and acquiring the center-point coordinates of the target detection frame and performing coordinate projection transformation, so as to project the target detection frame onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
Optionally, the operation of the projection module 33 to calculate the detection frame score of each detection frame according to the confidence score may also be: confirming multiple pending detection frames of the same detection category on different stereographic projection pictures; acquiring the first center coordinates of the target stereographic projection picture and the second center coordinates of the first pending detection frame on the target stereographic projection picture, and calculating the Euclidean distance between the first center coordinates and the second center coordinates; randomly selecting, on another stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame; calculating the intersection-over-union of the first pending detection frame and the second pending detection frame; and calculating the detection frame score of the first pending detection frame according to its confidence score, the Euclidean distance, and the intersection-over-union.
Optionally, the detection frame score is calculated as:
Figure PCTCN2021083845-appb-000005
where s′ i is the detection frame score, s i is the confidence score,
Figure PCTCN2021083845-appb-000006
is the intersection-over-union of the two pending detection frames, d i is the Euclidean distance, and σ 1 and σ 2 are preset balance parameters.
Optionally, the panoramic picture detection apparatus 30 further includes a training module for pre-training the detection network. The operation of the training module to pre-train the detection network may be: establishing an initial detection network based on AutoML; and acquiring training samples and training the initial detection network with the training samples until a pre-set training index is met, to obtain the trained detection network.
For other details of the technical solutions implemented by the modules of the panoramic picture detection apparatus of the above embodiment, reference may be made to the description of the panoramic picture detection method in the above embodiments, which is not repeated here.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively simple; for relevant details, refer to the corresponding description of the method embodiments.
Please refer to FIG. 4, which is a schematic structural diagram of the terminal according to an embodiment of the present application. As shown in FIG. 4, the terminal 40 includes a processor 41 and a memory 42 coupled to the processor 41.
The memory 42 stores program instructions which, when executed by the processor 41, cause the processor 41 to execute the steps of the panoramic picture detection method in the above embodiments.
The processor 41 may also be referred to as a CPU (Central Processing Unit). The processor 41 may be an integrated circuit chip with signal processing capability. The processor 41 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to FIG. 5, which is a schematic structural diagram of the storage medium according to an embodiment of the present application. The storage medium of the embodiment of the present application stores a program file 51 capable of implementing all of the above methods; the program file 51 may be stored in the storage medium in the form of a software product and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include media capable of storing program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs, as well as terminal devices such as computers, servers, mobile phones, and tablets. The storage medium may be non-volatile or volatile.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division of units is only a division by logical function, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. The above are only implementations of the present application and do not thereby limit the patent scope of the present application; any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (22)

  1. A panoramic picture detection method, comprising:
    dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures;
    inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures;
    performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  2. The panoramic picture detection method according to claim 1, wherein the dividing the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures comprises:
    dividing the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures, each of the stereographic projection pictures spanning 180° both horizontally and vertically, with adjacent stereographic projection pictures horizontally overlapping by a 90° region.
  3. The panoramic picture detection method according to claim 2, wherein the inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures comprises:
    inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frames of each of the stereographic projection pictures, together with the detection category and confidence score of each detection frame;
    determining one or more detection frames corresponding to a same detection category on adjacent stereographic projection pictures.
  4. The panoramic picture detection method according to claim 3, wherein the performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames, comprises:
    when the detection category corresponds to one detection frame, taking the detection frame as the target detection frame of the detection category;
    when the detection category corresponds to multiple detection frames, calculating a detection frame score of each of the detection frames according to the confidence score, and selecting the detection frame with the highest detection frame score as the target detection frame of the detection category;
    acquiring the center-point coordinates of the target detection frame and performing coordinate projection transformation, so as to project the target detection frame onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  5. The panoramic picture detection method according to claim 4, wherein the calculating a detection frame score of each of the detection frames according to the confidence score comprises:
    confirming multiple pending detection frames of the same detection category on different stereographic projection pictures;
    acquiring the first center coordinates of a target stereographic projection picture and the second center coordinates of a first pending detection frame on the target stereographic projection picture, and calculating the Euclidean distance between the first center coordinates and the second center coordinates;
    randomly selecting, on another stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame;
    calculating the intersection-over-union of the first pending detection frame and the second pending detection frame;
    calculating the detection frame score of the first pending detection frame according to the confidence score of the first pending detection frame, the Euclidean distance, and the intersection-over-union.
  6. The panoramic picture detection method according to claim 5, wherein the detection frame score is calculated as:
    Figure PCTCN2021083845-appb-100001
    wherein s′ i is the detection frame score, s i is the confidence score,
    Figure PCTCN2021083845-appb-100002
    is the intersection-over-union of the two pending detection frames, d i is the Euclidean distance, and σ 1 and σ 2 are preset balance parameters.
  7. The panoramic picture detection method according to claim 1, further comprising pre-training the detection network, the step of pre-training the detection network comprising:
    establishing an initial detection network based on AutoML;
    acquiring training samples, and training the initial detection network with the training samples until a pre-set training index is met, to obtain the trained detection network.
  8. A panoramic picture detection apparatus, comprising:
    a division module configured to divide an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures;
    a detection module configured to input the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures;
    a projection module configured to perform coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  9. A terminal, comprising a memory, a processor, and a program file stored on the memory and executable on the processor, wherein the processor implements the following steps when executing the program file:
    dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures;
    inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures;
    performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  10. The terminal according to claim 9, wherein the dividing the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures comprises:
    dividing the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures, each of the stereographic projection pictures spanning 180° both horizontally and vertically, with adjacent stereographic projection pictures horizontally overlapping by a 90° region.
  11. The terminal according to claim 10, wherein the inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures comprises:
    inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frames of each of the stereographic projection pictures, together with the detection category and confidence score of each detection frame;
    determining one or more detection frames corresponding to a same detection category on adjacent stereographic projection pictures.
  12. The terminal according to claim 11, wherein the performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames, comprises:
    when the detection category corresponds to one detection frame, taking the detection frame as the target detection frame of the detection category;
    when the detection category corresponds to multiple detection frames, calculating a detection frame score of each of the detection frames according to the confidence score, and selecting the detection frame with the highest detection frame score as the target detection frame of the detection category;
    acquiring the center-point coordinates of the target detection frame and performing coordinate projection transformation, so as to project the target detection frame onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  13. The terminal according to claim 12, wherein the calculating a detection frame score of each of the detection frames according to the confidence score comprises:
    confirming multiple pending detection frames of the same detection category on different stereographic projection pictures;
    acquiring the first center coordinates of a target stereographic projection picture and the second center coordinates of a first pending detection frame on the target stereographic projection picture, and calculating the Euclidean distance between the first center coordinates and the second center coordinates;
    randomly selecting, on another stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame;
    calculating the intersection-over-union of the first pending detection frame and the second pending detection frame;
    calculating the detection frame score of the first pending detection frame according to the confidence score of the first pending detection frame, the Euclidean distance, and the intersection-over-union.
  14. The terminal according to claim 13, wherein the detection frame score is calculated as:
    Figure PCTCN2021083845-appb-100003
    wherein s′ i is the detection frame score, s i is the confidence score,
    Figure PCTCN2021083845-appb-100004
    is the intersection-over-union of the two pending detection frames, d i is the Euclidean distance, and σ 1 and σ 2 are preset balance parameters.
  15. The terminal according to claim 9, further comprising pre-training the detection network, the step of pre-training the detection network comprising:
    establishing an initial detection network based on AutoML;
    acquiring training samples, and training the initial detection network with the training samples until a pre-set training index is met, to obtain the trained detection network.
  16. A storage medium storing a program file capable of implementing a panoramic picture detection method, wherein the program file, when executed by a processor, implements the following steps:
    dividing an acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures;
    inputting the multiple stereographic projection pictures into a pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures;
    performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  17. The storage medium according to claim 16, wherein the dividing the acquired equirectangular projection panoramic picture into multiple side-by-side stereographic projection pictures comprises:
    dividing the equirectangular projection panoramic picture into four side-by-side stereographic projection pictures, each of the stereographic projection pictures spanning 180° both horizontally and vertically, with adjacent stereographic projection pictures horizontally overlapping by a 90° region.
  18. The storage medium according to claim 17, wherein the inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frame information of each of the stereographic projection pictures comprises:
    inputting the multiple stereographic projection pictures into the pre-trained detection network to obtain the detection frames of each of the stereographic projection pictures, together with the detection category and confidence score of each detection frame;
    determining one or more detection frames corresponding to a same detection category on adjacent stereographic projection pictures.
  19. The storage medium according to claim 18, wherein the performing coordinate projection transformation on detection frames based on the detection frame information, so as to annotate the detection frames onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames, comprises:
    when the detection category corresponds to one detection frame, taking the detection frame as the target detection frame of the detection category;
    when the detection category corresponds to multiple detection frames, calculating a detection frame score of each of the detection frames according to the confidence score, and selecting the detection frame with the highest detection frame score as the target detection frame of the detection category;
    acquiring the center-point coordinates of the target detection frame and performing coordinate projection transformation, so as to project the target detection frame onto the equirectangular projection panoramic picture and obtain an equirectangular projection panoramic picture with detection frames.
  20. The storage medium according to claim 19, wherein the calculating a detection frame score of each of the detection frames according to the confidence score comprises:
    confirming multiple pending detection frames of the same detection category on different stereographic projection pictures;
    acquiring the first center coordinates of a target stereographic projection picture and the second center coordinates of a first pending detection frame on the target stereographic projection picture, and calculating the Euclidean distance between the first center coordinates and the second center coordinates;
    randomly selecting, on another stereographic projection picture, a second pending detection frame with the same detection category as the first pending detection frame;
    calculating the intersection-over-union of the first pending detection frame and the second pending detection frame;
    calculating the detection frame score of the first pending detection frame according to the confidence score of the first pending detection frame, the Euclidean distance, and the intersection-over-union.
  21. The storage medium according to claim 20, wherein the detection frame score is calculated as:
    Figure PCTCN2021083845-appb-100005
    wherein s′ i is the detection frame score, s i is the confidence score,
    Figure PCTCN2021083845-appb-100006
    is the intersection-over-union of the two pending detection frames, d i is the Euclidean distance, and σ 1 and σ 2 are preset balance parameters.
  22. The storage medium according to claim 16, further comprising pre-training the detection network, the step of pre-training the detection network comprising:
    establishing an initial detection network based on AutoML;
    acquiring training samples, and training the initial detection network with the training samples until a pre-set training index is met, to obtain the trained detection network.
PCT/CN2021/083845 2020-12-18 2021-03-30 Panoramic picture detection method, apparatus, terminal, and storage medium WO2022126921A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011509078.8 2020-12-18
CN202011509078.8A CN112529006B (zh) 2020-12-18 Panoramic picture detection method, apparatus, terminal, and storage medium

Publications (1)

Publication Number Publication Date
WO2022126921A1 true WO2022126921A1 (zh) 2022-06-23

Family

ID=75001988

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083845 WO2022126921A1 (zh) 2020-12-18 2021-03-30 Panoramic picture detection method, apparatus, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN112529006B (zh)
WO (1) WO2022126921A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529006B (zh) * 2020-12-18 2023-12-22 平安科技(深圳)有限公司 Panoramic picture detection method, apparatus, terminal, and storage medium
CN115423812B (zh) * 2022-11-05 2023-04-18 松立控股集团股份有限公司 Planarized display method for panoramic surveillance

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150271567A1 (en) * 2012-10-29 2015-09-24 Telefonaktiebolaget L M Ericsson (Publ) 3d video warning module
CN110827193A * 2019-10-21 2020-02-21 国家广播电视总局广播电视规划院 Panoramic video saliency detection method based on multi-channel features
CN111260539A * 2020-01-13 2020-06-09 魔视智能科技(上海)有限公司 Fisheye-image target recognition method and system
US10740957B1 * 2018-06-14 2020-08-11 Kilburn Live, Llc Dynamic split screen
CN111666434A * 2020-05-26 2020-09-15 武汉大学 Street-view picture retrieval method based on deep global features
CN111913343A * 2020-07-27 2020-11-10 微幻科技(北京)有限公司 Panoramic image display method and apparatus
CN112529006A * 2020-12-18 2021-03-19 平安科技(深圳)有限公司 Panoramic picture detection method, apparatus, terminal, and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171759A * 2018-01-26 2018-06-15 上海小蚁科技有限公司 Calibration method and apparatus for a dual-fisheye-lens panoramic camera, storage medium, and terminal
CN111160326B * 2020-04-02 2020-07-28 南京安科医疗科技有限公司 CT scanning panoramic real-time monitoring method and system


Also Published As

Publication number Publication date
CN112529006B (zh) 2023-12-22
CN112529006A (zh) 2021-03-19

Similar Documents

Publication Publication Date Title
US10540576B1 (en) Panoramic camera systems
US11538229B2 (en) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN109242961B (zh) 一种脸部建模方法、装置、电子设备和计算机可读介质
US11748906B2 (en) Gaze point calculation method, apparatus and device
WO2020001168A1 (zh) Three-dimensional reconstruction method, apparatus and device, and storage medium
WO2021103137A1 (zh) Indoor scene illumination estimation model, method and apparatus, storage medium, and rendering method
CN109887003B (zh) Method and device for three-dimensional tracking initialization
WO2019238114A1 (zh) Three-dimensional reconstruction method, apparatus and device for a dynamic model, and storage medium
WO2018014601A1 (zh) Orientation tracking method, method for implementing augmented reality, and related apparatus and device
WO2022126921A1 (zh) Panoramic picture detection method, apparatus, terminal, and storage medium
WO2023284713A1 (zh) Three-dimensional dynamic tracking method and apparatus, electronic device, and storage medium
WO2017027322A1 (en) Automatic connection of images using visual features
WO2020151268A1 (zh) Method for generating a dynamic 3D little-planet image, and portable terminal
WO2020034515A1 (zh) Integral-imaging three-dimensional display method, apparatus, device, and storage medium
US11922568B2 (en) Finite aperture omni-directional stereo light transport
WO2023169283A1 (zh) Method and apparatus for generating binocular stereoscopic panoramic images, device, storage medium, and product
WO2022247126A1 (zh) Visual positioning method and apparatus, device, medium, and program
CN109166178B (zh) Panoramic image saliency-map generation method and system fusing visual and behavioral characteristics
CN111210506A (zh) Three-dimensional restoration method and system, terminal device, and storage medium
CN117730530A (zh) Image processing method and apparatus, device, and storage medium
CN117726747A (zh) Three-dimensional reconstruction method and apparatus for completing weakly textured scenes, storage medium, and device
US20230005213A1 (en) Imaging apparatus, imaging method, and program
CN114445560A (zh) Head-mounted device and three-dimensional reconstruction method, apparatus, system, and medium therefor
CN117635875A (zh) Three-dimensional reconstruction method and apparatus, and terminal

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904861

Country of ref document: EP

Kind code of ref document: A1