CN113569636B - Fisheye image feature processing method and system based on spherical features and electronic equipment

Info

Publication number
CN113569636B
CN113569636B (application CN202110693974.2A)
Authority
CN
China
Prior art keywords
features
spherical
feature map
fisheye image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110693974.2A
Other languages
Chinese (zh)
Other versions
CN113569636A (en)
Inventor
苗敬博
刘延伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN202110693974.2A
Publication of CN113569636A
Application granted
Publication of CN113569636B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The application provides a fisheye image feature processing method and system based on spherical features, and an electronic device. The fisheye image feature processing method based on spherical features comprises the following steps: extracting multi-scale features from the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the model. The model extracts planar features and spherical features from the multi-scale feature map, extracts planar features containing spherical domain information based on a spatial self-attention mechanism, and fuses the planar features containing the spherical domain information with the spherical features to obtain the fusion feature map.

Description

Fisheye image feature processing method and system based on spherical features and electronic equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a fisheye image feature processing method, system, electronic device, and storage medium based on spherical features.
Background
Panoramic images play an important role in fields such as autonomous driving and video surveillance because of their large field of view. Unlike conventional planar images, an omnidirectional image covers a scene of 180 degrees or more, which cannot be projected onto a bounded image plane by conventional pinhole imaging. Omnidirectional images therefore capture more information but require special projection modes. These projections introduce unavoidable deformation in the resulting planar image: spherical information is forced onto the plane through a nonlinear mapping, so interpolation and pixel discarding occur during projection. Worse, targets at different positions in the image are deformed in different directions and to different degrees, and the distortion patterns of different projection formats are unrelated to one another. Conventional images have none of these properties, so directly transferring conventional algorithms does not solve the distortion problem. With the rapid development of industries such as autonomous driving, the practical demand for panoramic image processing algorithms has become more urgent.
In the prior art, the most intuitive approach is to first apply distortion correction to the panoramic image so that it exhibits translation-invariance properties similar to those of a planar image, i.e., the input panoramic image is rectified in a preprocessing step. Another approach is to modify the image processing algorithm itself.
However, the distortion-correction approaches have several problems. They overlook the face detection step in the actual recognition pipeline; they partition the image by degree of deformation, which cannot reflect how the deformation varies continuously with position; and they rely on detecting circular arcs. Since faces appear at different positions in a fisheye image, multiple targets are not handled completely and robustly. Moreover, because a panoramic image contains more information than a planar image, projecting it onto a plane necessarily produces distortion; forcibly eliminating the deformation loses original information, and running correction before the detection algorithm adds extra latency. The algorithm-modification approaches, on the other hand, rely on multi-directional projection, which is expensive in time and hardware because the computation is repeated for each direction. Since kernel weights at different latitudes of the network are unrelated, the model size grows significantly, the network must be retuned whenever its structure changes, portability is poor, and the model cannot be applied to omnidirectional images in other projection formats. Furthermore, although some prior art shares weights, it implicitly assumes that features on the sphere can be characterized by interpolation on the 2D plane defined by the equiangular projection, which is a significant problem: it is difficult to learn spherical features by optimizing a 2D convolution to fit the image deformation. Finally, although fast Fourier transforms have been used to accelerate spherical convolution, pointwise multiplication in the frequency domain is itself expensive, so the computational overhead remains large and the method cannot yet be applied widely and efficiently.
Disclosure of Invention
The application provides a fisheye image feature processing method, system, electronic device, and storage medium based on spherical features, which aim to overcome many problems in the prior art: they solve the image distortion problem through a single unified network and perform face detection using the spherical features of the image at low complexity.
Specifically, the embodiment of the application provides the following technical scheme:
in a first aspect, an embodiment of the present application provides a fisheye image feature processing method based on spherical features, including:
acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing spherical domain information with the spherical features to obtain the fusion feature map.
Further, the fisheye image feature processing method based on spherical features further comprises:
the fisheye image feature processing model comprises a feature map extraction layer, a feature map optimization layer and a feature map fusion layer;
the feature map extraction layer is used for extracting planar features and spherical features in the multi-scale feature map;
the feature map optimization layer is used for extracting planar features containing spherical domain information based on a spatial self-attention mechanism; and
the feature map fusion layer is used for fusing the planar features containing the spherical domain information with the spherical features to obtain a fusion feature map.
Further, the fisheye image feature processing method based on spherical features further comprises:
the extracting of planar features containing spherical domain information based on the spatial self-attention mechanism comprises: using the spherical features to guide the planar features to attend to the corresponding distortion position information in the fisheye image.
Further, the fisheye image feature processing method based on spherical features further comprises:
the extracting of the planar features and the spherical features in the multi-scale feature map comprises:
extracting an intermediate layer of the multi-scale features to perform spherical convolution, while transferring the spherical features to the other layers through path enhancement and upsampling to share information.
Further, the fisheye image feature processing method based on spherical features further comprises:
performing, by a context-aware module, a convolution operation on the output fusion feature map to obtain a feature-enhanced fusion feature map,
wherein the convolution operation performed on the output fusion feature map by the context-aware module comprises: adopting small convolution kernels and a multi-branch structure to perform convolution operations at different scales, enhancing the extraction of context information across multiple convolution layers through different receptive fields, and adopting a spatially/channel-separable convolution decomposition to perform the convolution operations in the frequency domain and the spatial domain respectively.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the method further comprises the steps of: based on the fusion feature map with enhanced features, face detection is performed through a regression head network and a classification head network,
the face detection through the regression head network and the classification head network comprises the following steps:
determining coordinates of the fusion feature map through the regression head network;
determining the category of the fusion feature map through the classification head network; and
and carrying out the face detection based on the coordinates and the category.
Further, the fisheye image feature processing method based on spherical features further comprises:
the projection mode of the fisheye image comprises equirectangular projection (ERP), cube map projection (CMP) and strip projection (SSP).
In a second aspect, an embodiment of the present application further provides a fisheye image feature processing system based on spherical features, including:
the multi-scale feature acquisition module is used for extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
the fisheye image feature processing module is used for inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing spherical domain information with the spherical features to obtain the fusion feature map.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the steps of the above fisheye image feature processing method based on spherical features are implemented when the processor executes the program.
In a fourth aspect, an embodiment of the present application further provides a storage medium including a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described fisheye image feature processing method based on spherical features.
As can be seen from the above technical solutions, the fisheye image feature processing method, system, electronic device and storage medium based on spherical features provided by the embodiments of the present application overcome many problems in the prior art: they solve the image distortion problem through a unified network and use the spherical features of the image for face detection at low complexity. The technical scheme of the application uses planar 2D and spherical 3D image features simultaneously. Rather than simply superimposing them, it uses a spatial self-attention mechanism to guide the planar 2D features to attend to the distortion-related positions in the image, and then combines the spherical features with the optimized planar features to form the final fisheye image features, avoiding the failure to capture the rotation-invariant spherical image information that occurs when only 2D features are used. Meanwhile, feature interaction between different layers raises the utilization of the spherical features: the spherical information of the middle layer is propagated to the other layers, avoiding unnecessary spherical convolutions. In addition, thanks to the attention mechanism, the extraction of spherical information can be completed with only two spherical-domain convolution layers, greatly reducing the computation.
Drawings
In order to more clearly illustrate the application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a fisheye image feature processing method based on spherical features according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a fisheye image feature processing system based on spherical features according to an embodiment of the application; and
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Terms and phrases used herein have their ordinary meaning as known to those of ordinary skill in the art, except where this application further describes and explains them. If a term or phrase used herein has a meaning inconsistent with its commonly known meaning, the meaning given in this application prevails; if a term is not defined in this application, it has the meaning commonly understood by those of ordinary skill in the art.
In view of this, in a first aspect, an embodiment of the present application proposes a fisheye image feature processing method based on spherical features. As summarized above, it overcomes many problems in the prior art: it solves image distortion within a unified network, uses a spatial self-attention mechanism to guide the planar 2D features toward distortion-related positions before combining them with the spherical features, shares the middle layer's spherical information with the other layers to avoid unnecessary spherical convolutions, and completes the extraction of spherical information with only two spherical-domain convolution layers, greatly reducing the computation.
The fisheye image feature processing method based on spherical features of the present application is described below with reference to Fig. 1.
Fig. 1 is a flowchart of a fisheye image feature processing method based on spherical features according to an embodiment of the application.
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may include the following steps:
s1: acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
s2: inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
Specifically, the fisheye image feature processing method based on spherical features provided by an embodiment of the present application may be further described as including, but not limited to, the following steps: extracting multi-scale features of the image, i.e., extracting image feature layers at multiple scales with a scaling stride of 2 (the last layer of each convolution stage has stride 2), so that successive convolution stages yield feature maps of different sizes; converting the multi-scale features into rotation-invariant features of the image extracted on the sphere; guiding the planar feature map, through the spherical feature map, to attend to the corresponding distortion positions in the image, yielding a planar feature map carrying distortion information; fusing the planar feature map containing distortion information and the spherical feature map along the channel direction to obtain a total feature map of the fisheye image; feeding the total feature map into a computation-optimized context-aware module; and classifying and regressing the feature information with a regression head network and a classification head network to obtain the final class and coordinate outputs.
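As a minimal illustration of the first step of this pipeline, the sketch below (in PyTorch; the module name, stage widths, and activation choices are assumptions, not taken from the patent) builds convolution stages whose last layer has stride 2 and collects one feature map per scale:

```python
import torch
import torch.nn as nn

class MultiScaleBackbone(nn.Module):
    """Hypothetical multi-scale extractor: each stage ends with a
    stride-2 convolution, halving the spatial size per scale."""
    def __init__(self, in_ch=3, widths=(32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=1, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, stride=2, padding=1),  # last layer of the stage: stride 2
                nn.ReLU(inplace=True),
            ))
            ch = w

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one feature map per scale
        return feats

# e.g. a 640x640 fisheye image yields 320-, 160-, 80- and 40-pixel maps
feats = MultiScaleBackbone()(torch.randn(1, 3, 640, 640))
```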
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: the fisheye image feature processing model comprises a feature map extraction layer, a feature map optimization layer and a feature map fusion layer; the feature map extraction layer is used for extracting planar features and spherical features in the multi-scale feature map; the feature map optimization layer is used for extracting planar features containing spherical domain information based on a spatial self-attention mechanism; and the feature map fusion layer is used for fusing the planar features containing the spherical domain information with the spherical features to obtain a fusion feature map.
Specifically, for the fisheye image feature processing model, as shown in Table 1, the attention module and the context module were trained and ablation experiments were performed on the FDDB-360 (planar) dataset and the Widerface-360 (fisheye image) dataset, respectively. Training used four NVIDIA Tesla P40 (24 GB) GPUs with a batch size of 8×8. Stochastic gradient descent with momentum set to 0.9 was used as the optimization method. In the different training phases, the weights were updated with a decaying learning rate (decay rate set to 5e-4), initialized to 1e-3. The network with MobileNet as the backbone was trained for 250 epochs, and the network with ResNet as the backbone for 100 epochs.
TABLE 1
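The training setup described above corresponds to a standard optimizer configuration; a hedged sketch follows (the 5e-4 decay is applied here as weight decay, one common reading of such setups, and the learning-rate schedule itself is an assumption):

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, 3, padding=1)  # placeholder for the full detection network

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=1e-3,            # initial learning rate from the described setup
    momentum=0.9,       # momentum from the described setup
    weight_decay=5e-4,  # the quoted 5e-4 decay, read here as weight decay (assumption)
)
# a fading learning rate across training phases; the exact schedule is an assumption
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)

for epoch in range(250):  # 250 epochs quoted for the MobileNet backbone
    # ... forward/backward passes over the 8x8 batches would go here ...
    scheduler.step()
```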
As shown in Table 2, comparing the method provided by the application with the prior art, the method achieves higher accuracy on both distorted fisheye images and ordinary planar images. Widerface is divided into three difficulty subsets: hard, medium, and easy.
TABLE 2
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: the extracting of the plane features and the spherical features in the multi-scale feature map comprises the following steps: intermediate layers in the multi-scale features are extracted for spherical convolution while the spherical features are transferred to other layers using path enhancement and upsampling, respectively, to share information.
In particular, detection algorithms impose requirements on running time, while the complexity of spherical convolution is high. During experiments, the medium-sized feature map was found to matter most for perceiving distorted faces in fisheye images and to be the most helpful for attending to position-based distortion: higher-level feature maps carry richer semantics and stronger semantic encoding capability, whereas larger feature maps learn more content information, such as contours and edges, and learn less about distortion than smaller ones. Based on this observation, and in combination with spherical convolution, the method performs spherical convolution only on the middle layer of the multi-layer features, and uses path enhancement and upsampling to propagate the spherical features to the other layers, sharing the information. This "partial convolution, overall sharing" strategy exploits the sensitivity of the middle feature layer to spherical features while avoiding repeated, expensive spherical convolutions in the other layers.
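A minimal sketch of this "partial convolution, overall sharing" strategy follows (the spherical convolution itself is stubbed with a plain convolution, since the patent does not give its implementation; the names and the three-level pyramid are assumptions): only the middle level goes through the spherical branch, and its output is shared downward by a stride-2 convolution (path enhancement) and upward by upsampling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSphereFeatures(nn.Module):
    """Apply (stubbed) spherical convolution to the middle pyramid level
    only, then share its output with the other levels."""
    def __init__(self, ch):
        super().__init__()
        # placeholder for a real S^2/SO(3) convolution; a plain conv
        # keeps the sketch runnable
        self.sphere_conv = nn.Conv2d(ch, ch, 3, padding=1)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)  # path enhancement

    def forward(self, feats):
        low, mid, high = feats  # large, medium, small feature maps
        sph = self.sphere_conv(mid)  # spherical features on the middle level only
        to_low = F.interpolate(sph, size=low.shape[-2:], mode="bilinear",
                               align_corners=False)  # upsample toward the larger map
        to_high = self.down(sph)     # downsample toward the smaller map
        return to_low, sph, to_high
```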
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: extracting planar features containing spherical domain information based on a spatial self-attention mechanism, i.e., using the spherical features to guide the planar features to attend to the corresponding distortion position information in the fisheye image.
For example, since spherical convolution transfers the feature map onto the sphere for its operation, it can extract rotation-invariant features of the image; these spherical convolution features can therefore be used to guide the planar image to capture the pixel regions associated with the deformation at each position.
Specifically, a common self-attention mechanism encodes an image's own information (through convolution and similar operations) and uses the encoded information as a query to raise its own attention (i.e., weights) on specific target regions. In the spatial self-attention mechanism used here, the 2D planar information is transferred to the sphere for feature extraction during encoding, and the spherical signal then serves as the query that guides the planar information to attend to the distortion positions in the image. In other words, the spatial self-attention mechanism sends the planar image feature map through the spherical convolution network and combines the resulting feature map with the input by pointwise operations to obtain the final fisheye image feature map.
More specifically, the spatial attention module is used to extract the spherical information; its operator is:

y_i = \frac{1}{C(x)} \sum_{\forall j} f(x_i, x_j) \, g(x_j)

where x_i denotes the planar image feature at point i, x_j denotes the spherical feature at point j, and y_i is the planar feature map based on spherical attention, with the same dimensions as x_i. A single-layer convolution g(x_j) = W_g x_j is used rather than a linear embedding, where W_g encodes the planar image into a weight vector of the input signal, and f captures the long-range dependencies between a specific point x_i of the planar feature map and all other points. The other points belong to the feature map extracted by the spherical convolution, which contains the distortion information. f(x_i, x_j) acts as the fusion term between the spherical-domain features and the planar-domain features, e.g. in the embedded-Gaussian form

f(x_i, x_j) = e^{\theta(x_i)^{\top} \phi(x_j)},

with C(x) the corresponding normalization factor.

Since spherical convolution handles rotation-invariant signals over the spherical domain, the spherical CNN (denoted here \Phi_{sph}) supplies the spherical signal to the attention mechanism. Specifically, two successive spherical convolution layers are employed; weighing computation against performance, only these two layers are used, converting the feature map into the S^2 and SO(3) domains. The first convolution layer operates on the sphere domain, converting the H \times W feature map in the planar coordinate system into a feature map of size \alpha \times \beta in the spherical coordinate system. The second convolution layer converts from the sphere domain to the SO(3) domain, with an \alpha \times \beta \times \gamma output. Finally, the planar features based on spherical attention are combined with the features extracted by the spherical convolution to obtain the final fisheye image feature map.
Furthermore, object detection is a computer-vision image processing technique that identifies and locates objects in images or videos. In particular, object detection can count the objects in an environment and determine and track their precise locations while labeling them accurately. In practical scenarios such as driving and surveillance, the detected objects can therefore be classified immediately and simultaneously localized within the image.
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: performing a convolution operation on the output fusion feature map through a context-aware module to obtain a feature-enhanced fusion feature map, where the convolution operation through the context-aware module comprises: adopting small convolution kernels and a multi-branch structure to perform convolutions at different scales, enhancing the extraction of context information across multiple convolution layers through different receptive fields, and adopting a spatially/channel-separable convolution decomposition to perform the convolutions in the frequency domain and the spatial domain respectively.
Specifically, to meet the computation and running-speed requirements, the context-aware module is improved: small convolution kernels are adopted, and a multi-branch structure performs convolutions at different scales. The different receptive fields enhance the module's ability to extract context information across the layers (as in Inception), and a spatially/channel-separable convolution decomposition, performing convolutions in the frequency domain and the spatial domain, allows small-scale faces to be detected while keeping the computation small.
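A hedged sketch of such a context module (branch widths and kernel choices are assumptions; the frequency-domain branch is omitted here): small kernels, parallel branches with different receptive fields, and a spatially separable 3×1/1×3 pair standing in for a full 3×3:

```python
import torch
import torch.nn as nn

class ContextModule(nn.Module):
    """Multi-branch context aggregation with small kernels; assumes the
    channel count ch is divisible by 4 so the output keeps ch channels."""
    def __init__(self, ch):
        super().__init__()
        b = ch // 4
        self.branch1 = nn.Conv2d(ch, b, 1)                        # pointwise
        self.branch2 = nn.Sequential(                             # 3x3 receptive field
            nn.Conv2d(ch, b, 1), nn.Conv2d(b, b, 3, padding=1))
        self.branch3 = nn.Sequential(                             # spatially separable 3x3
            nn.Conv2d(ch, b, 1),
            nn.Conv2d(b, b, (3, 1), padding=(1, 0)),
            nn.Conv2d(b, b, (1, 3), padding=(0, 1)))
        self.branch4 = nn.Sequential(                             # ~5x5 field via two 3x3s
            nn.Conv2d(ch, b, 1),
            nn.Conv2d(b, b, 3, padding=1), nn.Conv2d(b, b, 3, padding=1))

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

y = ContextModule(128)(torch.randn(1, 128, 40, 40))  # output keeps 128 channels
```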
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: based on the feature-enhanced fusion feature map, face detection is performed through a regression head network and a classification head network, wherein the face detection performed through the regression head network and the classification head network comprises: determining the coordinates of the fusion feature map through a regression head network; determining the category of the fusion feature map through a classification head network; and performing face detection based on the coordinates and the category.
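The two heads can be sketched as light convolutional branches over the fused feature map (the anchor count, class count, and kernel size are placeholders, not taken from the patent):

```python
import torch.nn as nn

class DetectionHeads(nn.Module):
    """Regression head predicts box coordinates; classification head
    predicts class scores (e.g. face / background) per anchor."""
    def __init__(self, ch, num_anchors=2, num_classes=2):
        super().__init__()
        self.reg_head = nn.Conv2d(ch, num_anchors * 4, 3, padding=1)           # (x, y, w, h)
        self.cls_head = nn.Conv2d(ch, num_anchors * num_classes, 3, padding=1)  # class scores

    def forward(self, fused):
        return self.reg_head(fused), self.cls_head(fused)
```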
Specifically, the fisheye image, as one projection format of the panoramic image, is the focus of the method; however, the spherical feature extraction process applies to the various panoramic projection formats, such as equirectangular projection (ERP), cube map projection (CMP) and strip projection (SSP). Projections of all these formats can map the image information onto the sphere for spherical convolution, and finally the position-based distortion features are extracted as the input of the attention mechanism.
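For equirectangular projection, for example, mapping image pixels back onto the sphere is a direct coordinate change; the small sketch below uses one common ERP convention for the longitude/latitude ranges (an assumption, not quoted from the patent):

```python
import numpy as np

def erp_pixel_to_sphere(u, v, width, height):
    """Map an ERP pixel (u, v) to spherical angles.
    Longitude spans [-pi, pi) across the width, latitude [pi/2, -pi/2]
    down the height -- one common ERP convention."""
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return lon, lat

def sphere_to_xyz(lon, lat):
    """Unit-sphere Cartesian coordinates, useful for spherical convolution grids."""
    return (np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat))
```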
In summary, the application provides, for the first time, a fisheye image feature processing algorithm that extracts a spherical feature map of the image and fuses the spherical signal with the planar signal using a self-attention mechanism, finally obtaining a fisheye-image-based feature map for face classification and detection tasks.
Based on the same inventive concept, in another aspect, an embodiment of the present application provides a fisheye image feature processing system based on spherical features.
The spherical feature-based fisheye image feature processing system provided by the application is described below with reference to fig. 2, and the spherical feature-based fisheye image feature processing system described below and the spherical feature-based fisheye image feature processing method described above can be referred to correspondingly.
Fig. 2 is a schematic structural diagram of a fisheye image feature processing system based on spherical features according to an embodiment of the application.
In this embodiment, the fisheye image feature processing system 1 based on spherical features includes: the multi-scale feature map acquisition module 10 is used for acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and a fisheye image feature processing module 20, configured to input the multi-scale feature map into a trained fisheye image feature processing model, to obtain a fused feature map of the fisheye image output by the fisheye image feature processing model, where the fisheye image feature processing model is configured to extract planar features and spherical features from the multi-scale feature map, extract planar features containing spherical domain information based on a spatial self-attention mechanism, and fuse the planar features containing spherical domain information with the spherical features to obtain a fused feature map.
The fisheye image feature processing system based on spherical features provided by the embodiment of the application can be used for executing the fisheye image feature processing method based on spherical features described in the above embodiment, and the working principle and the beneficial effects are similar, so that details are not described herein, and the detailed description can be found in the description of the above embodiment.
In this embodiment, it should be noted that, each module in the apparatus of the embodiment of the present application may be integrated into one body, or may be separately deployed. The modules may be combined into one module or may be further split into a plurality of subunits.
In yet another aspect, a further embodiment of the present application provides an electronic device based on the same inventive concept.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the application.
In this embodiment, it should be noted that the electronic device may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a spherical feature based fisheye image feature processing method comprising: extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
Further, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor is implemented to perform a fisheye image feature processing method based on spherical features, the method comprising: extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Moreover, in the present application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the present application, the description of the terms "embodiment," "this embodiment," "yet another embodiment," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (9)

1. A fisheye image feature processing method based on spherical features, characterized by comprising the following steps:
extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the model, wherein the model is used for extracting planar features and spherical features from the multi-scale feature map, extracting planar features containing spherical domain information based on a spatial self-attention mechanism, and fusing the planar features containing the spherical domain information with the spherical features to obtain the fusion feature map;
performing, by a context-aware module, a convolution operation on the output fusion feature map to obtain a feature-enhanced fusion feature map,
wherein the convolution operation performed on the output fusion feature map by the context-aware module comprises: adopting small convolution kernels and a multi-branch structure to perform convolution operations at different scales, enhancing the extraction of context information across multiple convolution layers through different receptive fields, and adopting a spatially/channel-separable convolution decomposition to perform the convolution operations in the frequency domain and the spatial domain respectively, and the method further comprises the following steps:
extracting multi-scale features of the image;
extracting image feature layers at multiple scales with a scaling stride of 2, obtaining feature maps of different sizes through successive convolution layers;
converting the multi-scale features into rotation-invariant features of the image extracted on the sphere;
guiding the planar feature map, through the spherical feature map, to attend to the corresponding distortion positions in the image, obtaining a planar feature map with distortion information;
fusing the planar feature map containing the distortion information and the spherical feature map in the channel direction to obtain a total feature map of the fisheye image;
sending the total feature map to a computation-optimized context-aware module;
and classifying and regressing the feature information by using a regression head network and a classification head network to obtain the final output of the category and the coordinates.
2. The fisheye image feature processing method based on spherical features according to claim 1, wherein the fisheye image feature processing model comprises a feature map extraction layer, a feature map optimization layer and a feature map fusion layer;
the feature map extraction layer is used for extracting planar features and spherical features in the multi-scale feature map;
the feature map optimization layer is used for extracting planar features containing spherical domain information based on a spatial self-attention mechanism; and
the feature map fusion layer is used for fusing the planar features containing the spherical domain information with the spherical features to obtain a fusion feature map.
3. The fisheye image feature processing method based on spherical features according to claim 1, wherein the extracting of planar features containing spherical domain information based on a spatial self-attention mechanism comprises: using the spherical features to guide the planar features to attend to the corresponding distortion position information in the fisheye image.
4. The fisheye image feature processing method based on spherical features according to claim 1, wherein the extracting of the planar features and the spherical features in the multi-scale feature map comprises:
extracting an intermediate layer of the multi-scale features to perform spherical convolution, while transferring the spherical features to the other layers through path enhancement and upsampling to share information.
5. The spherical feature-based fisheye image feature processing method of claim 1, further comprising: based on the fusion feature map with enhanced features, face detection is performed through a regression head network and a classification head network,
the face detection through the regression head network and the classification head network comprises the following steps:
determining coordinates of the fusion feature map through the regression head network;
determining the category of the fusion feature map through the classification head network; and
and carrying out the face detection based on the coordinates and the category.
6. The fisheye image feature processing method based on spherical features according to claim 2, wherein the projection mode of the fisheye image comprises equirectangular projection (ERP), cube map projection (CMP) and strip projection (SSP).
7. A spherical feature-based fisheye image feature processing system, wherein the system performs the spherical feature-based fisheye image feature processing method of any one of claims 1-6.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the spherical feature based fisheye image feature processing method of any of claims 1-6 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the fisheye image feature processing method based on spherical features as claimed in any one of claims 1 to 6.
CN202110693974.2A 2021-06-22 2021-06-22 Fisheye image feature processing method and system based on spherical features and electronic equipment Active CN113569636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110693974.2A CN113569636B (en) 2021-06-22 2021-06-22 Fisheye image feature processing method and system based on spherical features and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110693974.2A CN113569636B (en) 2021-06-22 2021-06-22 Fisheye image feature processing method and system based on spherical features and electronic equipment

Publications (2)

Publication Number Publication Date
CN113569636A CN113569636A (en) 2021-10-29
CN113569636B (en) 2023-12-05

Family

ID=78162553

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110693974.2A Active CN113569636B (en) 2021-06-22 2021-06-22 Fisheye image feature processing method and system based on spherical features and electronic equipment

Country Status (1)

Country Link
CN (1) CN113569636B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651767A (en) * 2016-12-30 2017-05-10 北京星辰美豆文化传播有限公司 Panoramic image obtaining method and apparatus
CN110189247A (en) * 2019-05-16 2019-08-30 北京航空航天大学 The method, apparatus and system that image generates
CN110827193A (en) * 2019-10-21 2020-02-21 国家广播电视总局广播电视规划院 Panoramic video saliency detection method based on multi-channel features
CN111666434A (en) * 2020-05-26 2020-09-15 武汉大学 Streetscape picture retrieval method based on depth global features
CN112200045A (en) * 2020-09-30 2021-01-08 华中科技大学 Remote sensing image target detection model establishing method based on context enhancement and application

Also Published As

Publication number Publication date
CN113569636A (en) 2021-10-29

Similar Documents

Publication Publication Date Title
Shivakumar et al. Dfusenet: Deep fusion of rgb and sparse depth information for image guided dense depth completion
Liao et al. DR-GAN: Automatic radial distortion rectification using conditional GAN in real-time
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN109712071B (en) Unmanned aerial vehicle image splicing and positioning method based on track constraint
Liang et al. A survey of 3D object detection
WO2023142602A1 (en) Image processing method and apparatus, and computer-readable storage medium
Hambarde et al. Single image depth estimation using deep adversarial training
Liu et al. Study of human action recognition based on improved spatio-temporal features
JP2023059794A (en) Semantic graph embedding lifted for all azimuth direction location recognition
Zhou et al. YOLO-CIR: The network based on YOLO and ConvNeXt for infrared object detection
CN111368733B (en) Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal
Sundaram et al. FSSCaps-DetCountNet: fuzzy soft sets and CapsNet-based detection and counting network for monitoring animals from aerial images
Giang et al. TopicFM: Robust and interpretable topic-assisted feature matching
Zheng et al. Feature pyramid of bi-directional stepped concatenation for small object detection
Li et al. Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module
Mo et al. PVDet: Towards pedestrian and vehicle detection on gigapixel-level images
CN113743300A (en) Semantic segmentation based high-resolution remote sensing image cloud detection method and device
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN112668662A (en) Outdoor mountain forest environment target detection method based on improved YOLOv3 network
CN113569636B (en) Fisheye image feature processing method and system based on spherical features and electronic equipment
CN115272450A (en) Target positioning method based on panoramic segmentation
CN117036658A (en) Image processing method and related equipment
CN114693951A (en) RGB-D significance target detection method based on global context information exploration
CN114897842A (en) Infrared small target segmentation detection method based on texture enhancement network
Zhou et al. Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant