CN113569636B - Fisheye image feature processing method and system based on spherical features and electronic equipment - Google Patents
- Publication number
- CN113569636B (application number CN202110693974.2A)
- Authority
- CN
- China
- Prior art keywords
- features
- spherical
- feature map
- fisheye image
- feature
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The application provides a fisheye image feature processing method and system based on spherical features, and an electronic device. The fisheye image feature processing method based on spherical features comprises the following steps: extracting multi-scale features from the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the model. The fisheye image feature processing model is used for extracting planar features and spherical features from the multi-scale feature map, extracting planar features containing sphere-domain information based on a spatial self-attention mechanism, and fusing the planar features containing sphere-domain information with the spherical features to obtain the fusion feature map.
Description
Technical Field
The present application relates to the field of image processing, and in particular, to a fisheye image feature processing method, system, electronic device, and storage medium based on spherical features.
Background
Panoramic images play an important role in fields such as autonomous driving and video surveillance because of their large field of view. Unlike conventional planar images, an omnidirectional image covers a scene of 180 degrees or more, which cannot be projected onto an image of limited extent using conventional pinhole imaging. While capturing more information, the omnidirectional image therefore requires its own projection modes. However, these projection modes produce unavoidable deformation in the planar image: spherical information is forced onto the plane through a nonlinear mapping, so interpolation and pixel discarding occur during projection. More troublesome still, targets at different positions in the image deform in different directions and to different degrees, and the distortion rules of different projection formats are unrelated to one another. Conventional images do not have these properties, so directly migrating an existing algorithm does not solve the distortion problem. With the rapid development of industries such as autonomous driving, the practical demand for panoramic image processing algorithms has become ever more urgent.
In the prior art, the most intuitive method is to first perform distortion correction on the panoramic image so that it exhibits translation-invariance characteristics similar to those of a planar image; that is, the input panoramic image is rectified in a preprocessing step. Another approach is to improve the image processing algorithm directly.
However, on the one hand, for distortion-correction methods, the prior art not only ignores the face detection step of the actual recognition problem but also partitions the image by deformation degree, which cannot reflect how the deformation varies continuously with position; in addition, the prior art relies on the detection of circular arcs. The difficulty is that different faces sit at different positions in the fisheye image, and such algorithms do not handle multiple targets completely and robustly. Because a panoramic image contains more information than a planar image, distortion necessarily arises when it is projected onto the plane; forcibly eliminating the deformation always loses original information, and running a correction step before the detection algorithm adds extra time cost. On the other hand, for methods that improve the image processing algorithm directly, the prior art adopts multi-directional projection, whose repeated execution in multiple directions is costly in both time and hardware. Because the kernel weights at different latitudes in the network are unrelated, the model size grows significantly, the network must be adjusted whenever its structure changes, portability is poor, and the model cannot be applied to omnidirectional images in other projection formats. Furthermore, although the prior art allows weight sharing, it implicitly assumes that features on the sphere can be characterized by interpolation on the 2D plane defined by the equiangular projection, which is a significant problem. In other words, it is difficult to learn spherical features by optimizing a 2D convolution to fit the deformation of the image.
In addition, although the prior art uses the fast Fourier transform to accelerate convolution, the point-wise multiplication itself has high complexity in the frequency domain and the computational overhead is large, so the method cannot yet be applied widely and efficiently.
Disclosure of Invention
The application provides a fisheye image feature processing method, system, electronic device, and storage medium based on spherical features, aiming to overcome the above problems in the prior art: solving the image distortion problem through a unified network and performing face detection using the spherical features of the image at lower complexity.
Specifically, the embodiment of the application provides the following technical scheme:
in a first aspect, an embodiment of the present application provides a fisheye image feature processing method based on spherical features, including:
acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing spherical domain information with the spherical features to obtain the fusion feature map.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the fisheye image feature processing model comprises a feature image extraction layer, a feature image optimization layer and a feature image fusion layer;
the feature map extraction layer is used for extracting plane features and spherical features in the multi-scale feature map;
the feature map optimization layer is used for extracting plane features containing sphere information based on a spatial self-attention mechanism; and
and the feature map fusion layer is used for fusing the plane features containing the sphere domain information with the spherical features to obtain a fusion feature map.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the extracting plane features containing sphere information based on the spatial self-attention mechanism comprises the following steps: and guiding the plane feature to pay attention to corresponding distortion position information in the fisheye image by using the spherical feature.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the extracting the plane features and the spherical features in the multi-scale feature map comprises the following steps:
and extracting an intermediate layer in the multi-scale features to perform spherical convolution, and simultaneously transmitting the spherical features to other layers by utilizing path enhancement and upsampling to share information.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the method further comprises the steps of: the context sensing module carries out convolution operation on the output fusion characteristic diagram to obtain a fusion characteristic diagram with enhanced characteristics,
the convolution operation of the output fusion feature map through the context sensing module comprises the following steps: the method adopts a small convolution kernel and a multi-branch structure to carry out convolution operation with different scales, enhances the extraction capability of context information in a plurality of convolution layers through different receptive fields, and adopts a convolution decomposition mode with separable space channels to respectively carry out convolution operation in a frequency domain and a space domain.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the method further comprises the steps of: based on the fusion feature map with enhanced features, face detection is performed through a regression head network and a classification head network,
the face detection through the regression head network and the classification head network comprises the following steps:
determining coordinates of the fusion feature map through the regression head network;
determining the category of the fusion feature map through the classification head network; and
and carrying out the face detection based on the coordinates and the category.
Further, the fisheye image feature processing method based on the spherical feature further comprises the following steps:
the projection mode of the fisheye image comprises an equal rectangular projection ERP, a cube projection CMP and a stripe projection SSP.
In a second aspect, an embodiment of the present application further provides a fisheye image feature processing system based on spherical features, including:
the multi-scale feature acquisition module is used for extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
the fisheye image feature processing module is used for inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing spherical domain information with the spherical features to obtain the fusion feature map.
In a third aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the steps of the above fisheye image feature processing method based on spherical features are implemented when the processor executes the program.
In a fourth aspect, an embodiment of the present application further provides a storage medium including a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above-described fisheye image feature processing method based on spherical features.
As can be seen from the above technical solutions, the fisheye image feature processing method, system, electronic device, and storage medium based on spherical features provided by the embodiments of the application aim to overcome many problems in the prior art: they solve the image distortion problem through a unified network and use the spherical features of the image to perform face detection at lower complexity. The technical scheme of the application uses planar 2D and spherical 3D image features simultaneously. Rather than simply superimposing them, a spatial self-attention mechanism guides the planar 2D features to focus on the relevant distortion positions in the image, and the spherical features are then combined with the optimized planar features to form the final fisheye image features, avoiding the failure to capture the rotation-invariant image information of the sphere that occurs when only 2D features are used. At the same time, feature interaction between different layers improves the utilization of the spherical features: the spherical information of the intermediate layer is transferred to the other layers, avoiding unnecessary spherical convolutions. Moreover, thanks to the attention mechanism, the extraction of spherical information can be completed with only two layers of sphere-domain convolution, greatly reducing the amount of computation.
Drawings
In order to more clearly illustrate the technical solutions of the application or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of a fisheye image feature processing method based on spherical features according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a fisheye image feature processing system based on spherical features according to an embodiment of the application; and
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The various terms and phrases used herein have the ordinary meaning known to those of ordinary skill in the art, unless otherwise described and explained in this application. If a term or phrase as used herein has a meaning inconsistent with its known meaning, the meaning expressed in this application controls; terms not defined in this application have the meaning commonly understood by those of ordinary skill in the art.
In view of this, in a first aspect, an embodiment of the present application proposes a fisheye image feature processing method based on spherical features, which aims to overcome many problems in the prior art: it solves the image distortion problem through a unified network and uses the spherical features of the image to perform face detection at lower complexity. The technical scheme uses planar 2D and spherical 3D image features simultaneously. Rather than simply superimposing them, a spatial self-attention mechanism guides the planar 2D features to focus on the relevant distortion positions in the image, and the spherical features are then combined with the optimized planar features to form the final fisheye image features, avoiding the failure to capture the rotation-invariant image information of the sphere that occurs when only 2D features are used. At the same time, feature interaction between different layers improves the utilization of the spherical features: the spherical information of the intermediate layer is transferred to the other layers, avoiding unnecessary spherical convolutions. Moreover, thanks to the attention mechanism, the extraction of spherical information can be completed with only two layers of sphere-domain convolution, greatly reducing the amount of computation.
The fish-eye image feature processing method based on spherical features of the present application is described below with reference to fig. 1.
Fig. 1 is a flowchart of a fisheye image feature processing method based on spherical features according to an embodiment of the application.
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may include the following steps:
s1: acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
s2: inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
Specifically, the fisheye image feature processing method based on spherical features provided by an embodiment of the present application may be further described as including, but not limited to, the following steps: extracting multi-scale features of the image, i.e., extracting image feature layers at multiple scales with a downsampling stride of 2 (the last layer of each convolution stage has stride 2), so that successive convolution layers yield feature maps of different sizes; converting the multi-scale features to the sphere and extracting the rotation-invariant features of the image; guiding the planar feature map, via the spherical feature map, to attend to the corresponding distortion positions in the image, obtaining a planar feature map carrying distortion information; fusing the planar feature map containing distortion information and the spherical feature map along the channel direction to obtain the overall feature map of the fisheye image; sending the overall feature map to the computation-optimized context-aware module; and classifying and regressing the feature information with a regression head network and a classification head network to obtain the final category and coordinate outputs.
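The stride-2 multi-scale extraction in the first step determines the pyramid of feature-map sizes; a minimal sketch of that bookkeeping is below. The number of levels and channel widths are illustrative assumptions, not values from the application:

```python
import numpy as np  # only for consistency with the other sketches

def pyramid_shapes(h, w, channels, num_levels):
    """Spatial sizes of a stride-2 feature pyramid: the last conv layer of
    each stage halves the resolution of the previous stage's output."""
    shapes = []
    for level in range(num_levels):
        h, w = h // 2, w // 2
        shapes.append((channels[level], h, w))
    return shapes

# A hypothetical 640x640 input with four pyramid levels.
shapes = pyramid_shapes(640, 640, [64, 128, 256, 512], 4)
```

The middle levels of such a pyramid are the ones the application later singles out for spherical convolution.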
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: the fish-eye image feature processing model comprises a feature map extraction layer, a feature map optimization layer and a feature map fusion layer; the feature map extraction layer is used for extracting plane features and spherical features in the multi-scale feature map; the feature map optimization layer is used for extracting plane features containing sphere information based on a spatial self-attention mechanism; and the feature map fusion layer is used for fusing the plane features containing the spherical domain information with the spherical features to obtain a fusion feature map.
Specifically, for the fisheye image feature processing model, as shown in Table 1, the attention module and the context module were trained and ablation experiments were run on the FDDB-360 (planar) dataset and the Widerface-360 (fisheye) dataset, respectively. Training was performed on four NVIDIA Tesla P40 (24 GB) GPUs with a batch size of 8×8. Stochastic gradient descent with momentum 0.9 was used as the optimization method. In the different training phases, the weights were updated with a decaying learning rate (decay rate 5e-4), initialized to 1e-3. The network with MobileNet as the backbone was trained for 250 epochs, and the network with ResNet as the backbone for 100 epochs.
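The decaying learning rate described above admits several realisations; the application does not state which schedule is used, so the inverse-time-decay form below is only one assumed sketch consistent with the stated base rate of 1e-3 and decay rate of 5e-4:

```python
def lr_at(step, base_lr=1e-3, decay=5e-4):
    """Hypothetical inverse-time decay: the learning rate fades from
    base_lr at step 0, halving once decay * step reaches 1."""
    return base_lr / (1.0 + decay * step)

lr_start = lr_at(0)       # 1e-3 at initialization
lr_later = lr_at(2000)    # halved after 2000 steps with decay 5e-4
```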
TABLE 1
As shown in Table 2, comparing the method provided by the application with the prior art, the method achieves higher accuracy on both distorted fisheye images and ordinary planar images. Widerface is divided into three subsets by difficulty: hard, medium, and easy.
TABLE 2
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: the extracting of the plane features and the spherical features in the multi-scale feature map comprises the following steps: intermediate layers in the multi-scale features are extracted for spherical convolution while the spherical features are transferred to other layers using path enhancement and upsampling, respectively, to share information.
In particular, detection imposes requirements on the running time of the algorithm, while the complexity of spherical convolution is high. During experiments it was found that the medium-sized feature maps are the most important for perceiving distorted faces in the fisheye image and the most helpful for position-based attention to distortion: higher-level feature maps have richer semantics and stronger semantic encoding ability, while larger feature maps learn more content information, such as contours and edges, and less distortion than smaller ones. Based on this observation, and combined with spherical convolution, the method extracts only the intermediate layer of the multi-layer features for spherical convolution, and meanwhile uses path enhancement and upsampling, respectively, to transfer the spherical features to the other layers, realizing information sharing. This "partial convolution, overall sharing" strategy effectively exploits the sensitivity of the intermediate feature layer to spherical features while avoiding repeated, expensive spherical convolutions in the other layers.
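The "partial convolution, overall sharing" strategy can be sketched as below. The spherical convolution itself is replaced by a stand-in transform, and nearest-neighbour resizing stands in for the path-enhancement and upsampling connections; all names are illustrative assumptions:

```python
import numpy as np

def nn_resize(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    c, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]

def share_middle_level(levels, mid_idx, sphere_fn):
    """Run the (stand-in) spherical transform only on the middle pyramid
    level, then resize its output to every other level and add it in, so
    all levels receive spherical cues without their own spherical conv."""
    sphere = sphere_fn(levels[mid_idx])
    out = []
    for i, feat in enumerate(levels):
        shared = sphere if i == mid_idx else nn_resize(sphere, *feat.shape[1:])
        out.append(feat + shared)
    return out

# Three pyramid levels of 8 channels; the middle one carries the spherical pass.
levels = [np.ones((8, s, s)) for s in (64, 32, 16)]
fused = share_middle_level(levels, 1, sphere_fn=lambda x: 0.5 * x)
```

Only one level pays the cost of the spherical transform, which is the point of the strategy.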
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: extracting planar features containing sphere information based on a spatial self-attention mechanism includes: the spherical surface features are used for guiding the plane features to pay attention to corresponding distortion position information in the fisheye image.
For example, since the sphere convolution converts the feature map into a sphere for operation, rotation invariance features of the image can be extracted, the feature of the sphere convolution can be used to guide the planar image to capture the pixel area associated with the current position deformation information.
Specifically, a common self-attention mechanism encodes an image's own information (through convolution and similar operations) and uses the encoded information as a query to raise the image's own attention (i.e., weights) on a specific target region. In the spatial self-attention mechanism of this application, by contrast, the 2D planar information is transferred to the sphere for feature extraction during encoding, and the spherical signal is then used as the query to guide the planar information to attend to the distortion positions in the image. In other words, the spatial self-attention mechanism sends the planar image feature map into the spherical convolution network, and the resulting feature map and the input undergo a point-to-point operation to obtain the final fisheye image feature map.
More specifically, the spatial attention module is used to extract sphere information, the operators of which are as follows:
wherein x is i ,x j A planar image feature map representing point i and a spherical feature map representing point j, y i Representing a plane feature map based on spherical attention, its dimensions and x i Similarly, we use single layer convolution:rather than linear embedding, wherein +.>Is to encode a planar image into a weight vector represented by an input signal and f capture a specific point x in a planar feature map i Long range dependencies with all other points. The other points belong to a feature map extracted from the spherical convolution, which contains distortion information. f (x) i ,x j ) A feature fusion layer representing spherical domain features and planar domain features, as follows:
Since spherical convolution handles rotation-invariant signals over the sphere domain, the spherical CNN is used as the embedding that produces the spherical attention signal. Specifically, two successive spherical convolution layers are employed; weighing computation against performance, only two layers are used, converting the feature map into the S² and SO(3) domains. The first convolution layer operates on the sphere domain, converting the H×W feature map in the planar coordinate system into a feature map of size α×β in the spherical coordinate system. The second convolution layer converts from the sphere domain to the SO(3) domain, with an α×β×γ output. Finally, the sphere-attention-based planar features are combined with the features extracted by spherical convolution to obtain the final fisheye image feature map.
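The shape bookkeeping of this pipeline can be sketched as follows. The resolutions H, W, α, β, γ and the channel count are illustrative assumptions, as is the final fusion step (here: marginalizing the γ axis, nearest-neighbour upsampling back to the planar grid, and channel concatenation — stand-ins for the patented inverse projection and fusion layer).

```python
import numpy as np

H, W, C = 64, 128, 16            # planar feature map (assumed sizes)
alpha, beta, gamma = 32, 32, 32  # spherical sampling resolutions (assumed)

plane_feat = np.ones((C, H, W))                 # planar branch
s2_feat = np.ones((C, alpha, beta))             # after 1st layer: S^2 domain
so3_feat = np.ones((C, alpha, beta, gamma))     # after 2nd layer: SO(3) domain

# Bring spherical features back onto the planar grid: collapse the extra
# rotation axis gamma, then upsample by simple repetition.
sphere_on_plane = so3_feat.mean(axis=3)
sphere_on_plane = sphere_on_plane.repeat(H // alpha, axis=1) \
                                 .repeat(W // beta, axis=2)

# Fuse the two branches along the channel direction.
fused = np.concatenate([plane_feat, sphere_on_plane], axis=0)
print(fused.shape)  # (32, 64, 128)
```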
Furthermore, object detection is a computer vision technique that identifies and locates objects in images or videos. In particular, object detection can be used to count objects in a scene and to determine and track their precise locations with accurate labeling. In practical application scenarios such as driving and surveillance, detected objects can therefore be classified immediately and localized within the image.
In this embodiment, it should be noted that the spherical-feature-based fisheye image feature processing method may further include: performing a convolution operation on the output fusion feature map through a context sensing module to obtain a feature-enhanced fusion feature map, where performing the convolution operation on the output fusion feature map through the context sensing module includes: adopting small convolution kernels and a multi-branch structure to perform convolution operations at different scales, enhancing the extraction of context information across multiple convolution layers through different receptive fields, and performing the convolution operations in the frequency domain and the spatial domain respectively by means of a spatially-channel-separable convolution decomposition.
Specifically, to meet the requirements on computation and running speed, the context sensing module is improved: small convolution kernels are adopted, a multi-branch structure performs convolution operations at different scales, the extraction of context information within the layers is enhanced through different receptive fields (as described in Inception), and a spatially-channel-separable convolution decomposition performs the convolution operations in the frequency domain and the spatial domain, allowing small-scale faces to be detected while keeping the computational cost low.
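The saving behind separable convolution can be shown with a small sketch: a k×k kernel that factors as a column kernel times a row kernel (rank-1) can be applied as a k×1 pass followed by a 1×k pass, with identical output and 2k instead of k² multiplies per pixel. The patent does not spell out its exact decomposition, so this only demonstrates the core identity; real networks typically learn the two factored layers directly.

```python
import numpy as np

def conv2d(img, ker):
    """Naive 'valid' 2-D cross-correlation, for illustration only."""
    kh, kw = ker.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

rng = np.random.default_rng(1)
img = rng.standard_normal((16, 16))
col = rng.standard_normal((3, 1))   # 3x1 vertical factor
row = rng.standard_normal((1, 3))   # 1x3 horizontal factor

full = conv2d(img, col @ row)          # one 3x3 convolution: 9 MACs/pixel
sep = conv2d(conv2d(img, col), row)    # 3x1 then 1x3: 6 MACs/pixel
print(np.allclose(full, sep))  # True: same output, fewer operations
```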
In this embodiment, it should be noted that the fisheye image feature processing method based on spherical features may further include: based on the feature-enhanced fusion feature map, face detection is performed through a regression head network and a classification head network, wherein the face detection performed through the regression head network and the classification head network comprises: determining the coordinates of the fusion feature map through a regression head network; determining the category of the fusion feature map through a classification head network; and performing face detection based on the coordinates and the category.
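The two heads described above can be sketched as per-location linear maps (i.e., 1×1 convolutions) over the fused feature map: one producing box coordinates, the other class scores. The channel counts, head shapes, and the 2-class (face/background) setup below are illustrative assumptions, not the patented network.

```python
import numpy as np

def detection_heads(fused, W_reg, W_cls):
    """Apply regression and classification heads as 1x1 convolutions.
    fused: (C, H, W); W_reg: (4, C) box coordinates; W_cls: (K, C) logits."""
    C, H, W = fused.shape
    flat = fused.reshape(C, H * W)
    boxes = (W_reg @ flat).reshape(4, H, W)     # regression head: coordinates
    logits = (W_cls @ flat).reshape(-1, H, W)   # classification head: category
    return boxes, logits

rng = np.random.default_rng(2)
fused = rng.standard_normal((32, 20, 20))       # feature-enhanced fusion map
boxes, logits = detection_heads(
    fused,
    rng.standard_normal((4, 32)),               # 4 box coordinates
    rng.standard_normal((2, 32)))               # face vs. background
print(boxes.shape, logits.shape)  # (4, 20, 20) (2, 20, 20)
```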
Specifically, as a projection format of the panoramic image, the fisheye image is the focus of this method. However, the spherical feature extraction process is applicable to various panoramic projection formats, such as equirectangular projection (ERP), cube map projection (CMP), and strip projection (SSP): projections in each of these formats can map image information onto a sphere for spherical convolution, with position-based distortion features ultimately extracted as input elements of the attention mechanism.
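For the ERP case, the mapping from an image pixel to a point on the sphere is a simple angular conversion; a sketch follows. The pixel-center offset and axis convention (x toward longitude 0 on the equator) are assumptions, and CMP/SSP need their own inverse mappings.

```python
import numpy as np

def erp_to_sphere(u, v, W, H):
    """Map an equirectangular pixel (u, v) in a WxH image to a unit vector
    on the sphere. Column u spans longitude [-pi, pi), row v spans latitude
    [pi/2, -pi/2]; sampling at pixel centers is an assumed convention."""
    lon = (u + 0.5) / W * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / H * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

p = erp_to_sphere(256, 128, 512, 256)  # centre of a 512x256 ERP image
print(np.linalg.norm(p))  # 1.0 — always lands on the unit sphere
```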
In summary, the application provides a fisheye image feature processing algorithm that, for the first time, extracts a spherical feature map of the image and fuses the spherical signal with the planar signal using a self-attention mechanism, finally obtaining a fisheye-image-based feature map for face classification and detection tasks.
Based on the same inventive concept, in another aspect, an embodiment of the present application provides a fisheye image feature processing system based on spherical features.
The spherical feature-based fisheye image feature processing system provided by the application is described below with reference to fig. 2, and the spherical feature-based fisheye image feature processing system described below and the spherical feature-based fisheye image feature processing method described above can be referred to correspondingly.
Fig. 2 is a schematic structural diagram of a fisheye image feature processing system based on spherical features according to an embodiment of the application.
In this embodiment, the fisheye image feature processing system 1 based on spherical features includes: the multi-scale feature map acquisition module 10 is used for acquiring a fisheye image and extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and a fisheye image feature processing module 20, configured to input the multi-scale feature map into a trained fisheye image feature processing model, to obtain a fused feature map of the fisheye image output by the fisheye image feature processing model, where the fisheye image feature processing model is configured to extract planar features and spherical features from the multi-scale feature map, extract planar features containing spherical domain information based on a spatial self-attention mechanism, and fuse the planar features containing spherical domain information with the spherical features to obtain a fused feature map.
The fisheye image feature processing system based on spherical features provided by the embodiment of the application can be used for executing the fisheye image feature processing method based on spherical features described in the above embodiment, and the working principle and the beneficial effects are similar, so that details are not described herein, and the detailed description can be found in the description of the above embodiment.
In this embodiment, it should be noted that the modules in the apparatus of the embodiment of the present application may be integrated together or deployed separately. The above modules may be combined into one module, or further split into a plurality of subunits.
In yet another aspect, a further embodiment of the present application provides an electronic device based on the same inventive concept.
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the application.
In this embodiment, it should be noted that the electronic device may include: processor 310, communication interface (Communications Interface) 320, memory 330 and communication bus 340, wherein processor 310, communication interface 320, memory 330 accomplish communication with each other through communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform a spherical feature based fisheye image feature processing method comprising: extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
Further, the logic instructions in the memory 330 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application may be embodied essentially, or in the part contributing to the prior art, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In yet another aspect, the present application also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor is implemented to perform a fisheye image feature processing method based on spherical features, the method comprising: extracting multi-scale features in the fisheye image to obtain a multi-scale feature map; and inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing the spherical domain information with the spherical features to obtain the fusion feature map.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present application without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Moreover, in the present application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, in the present application, the description of the terms "embodiment," "this embodiment," "yet another embodiment," and the like, means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (9)
1. The fish-eye image feature processing method based on the spherical features is characterized by comprising the following steps of:
extracting multi-scale features in the fisheye image to obtain a multi-scale feature map;
inputting the multi-scale feature map into a trained fisheye image feature processing model to obtain a fusion feature map of the fisheye image output by the fisheye image feature processing model, wherein the fisheye image feature processing model is used for extracting plane features and spherical features from the multi-scale feature map, extracting plane features containing spherical domain information based on a spatial self-attention mechanism, and fusing the plane features containing spherical domain information with the spherical features to obtain the fusion feature map;
the context sensing module carries out convolution operation on the output fusion characteristic diagram to obtain a fusion characteristic diagram with enhanced characteristics,
the convolution operation of the output fusion feature map through the context sensing module comprises the following steps: the method adopts a small convolution kernel and a multi-branch structure to carry out convolution operation with different scales, enhances the extraction capability of context information in a plurality of convolution layers through different receptive fields, and adopts a convolution decomposition mode with separable space channels to respectively execute the convolution operation in a frequency domain and a space domain, and the method further comprises the following steps:
extracting multi-scale features of the image;
extracting image feature layers at multiple scales, setting the scaling stride to 2, and obtaining feature maps of different sizes through successive convolution layers;
converting the multi-scale features onto a sphere to extract rotation-invariant features of the image;
guiding, through the spherical feature map, the planar feature map to notice the corresponding distortion positions in the image, obtaining a planar feature map with distortion information;
fusing the plane feature map containing the distortion information and the spherical feature map in the channel direction to obtain a total feature map of the fisheye image;
sending the total feature map to a context sensing module subjected to calculation optimization;
and classifying and regressing the characteristic information by using a regression head network and a classification head network to obtain the final output of the category and the coordinate.
2. The fisheye image feature processing method based on spherical features according to claim 1, wherein the fisheye image feature processing model comprises a feature map extraction layer, a feature map optimization layer and a feature map fusion layer;
the feature map extraction layer is used for extracting plane features and spherical features in the multi-scale feature map;
the feature map optimization layer is used for extracting plane features containing sphere information based on a spatial self-attention mechanism; and
and the feature map fusion layer is used for fusing the plane features containing the sphere domain information with the spherical features to obtain a fusion feature map.
3. The method for processing fish-eye image features based on spherical features according to claim 1, wherein the extracting planar features containing spherical domain information based on a spatial self-attention mechanism comprises: and guiding the plane feature to pay attention to corresponding distortion position information in the fisheye image by using the spherical feature.
4. The method for processing fish-eye image features based on spherical features according to claim 1, wherein the extracting the planar features and the spherical features in the multi-scale feature map comprises:
extracting an intermediate layer of the multi-scale features to perform spherical convolution, while transmitting the spherical features to other layers for information sharing by means of path enhancement and upsampling.
5. The spherical feature-based fisheye image feature processing method of claim 1, further comprising: based on the fusion feature map with enhanced features, face detection is performed through a regression head network and a classification head network,
the face detection through the regression head network and the classification head network comprises the following steps:
determining coordinates of the fusion feature map through the regression head network;
determining the category of the fusion feature map through the classification head network; and
and carrying out the face detection based on the coordinates and the category.
6. The method for processing the fisheye image features based on the spherical features according to claim 2, wherein the projection modes of the fisheye image comprise equirectangular projection ERP, cube map projection CMP and strip projection SSP.
7. A spherical feature-based fisheye image feature processing system, wherein the spherical feature-based fisheye image feature processing system performs the spherical feature-based fisheye image feature processing method of any one of claims 1-6.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the spherical feature based fisheye image feature processing method of any of claims 1-6 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the fisheye image feature processing method based on spherical features as claimed in any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110693974.2A CN113569636B (en) | 2021-06-22 | 2021-06-22 | Fisheye image feature processing method and system based on spherical features and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113569636A CN113569636A (en) | 2021-10-29 |
CN113569636B true CN113569636B (en) | 2023-12-05 |
Family
ID=78162553
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110693974.2A Active CN113569636B (en) | 2021-06-22 | 2021-06-22 | Fisheye image feature processing method and system based on spherical features and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113569636B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106651767A (en) * | 2016-12-30 | 2017-05-10 | 北京星辰美豆文化传播有限公司 | Panoramic image obtaining method and apparatus |
CN110189247A (en) * | 2019-05-16 | 2019-08-30 | 北京航空航天大学 | The method, apparatus and system that image generates |
CN110827193A (en) * | 2019-10-21 | 2020-02-21 | 国家广播电视总局广播电视规划院 | Panoramic video saliency detection method based on multi-channel features |
CN111666434A (en) * | 2020-05-26 | 2020-09-15 | 武汉大学 | Streetscape picture retrieval method based on depth global features |
CN112200045A (en) * | 2020-09-30 | 2021-01-08 | 华中科技大学 | Remote sensing image target detection model establishing method based on context enhancement and application |
Also Published As
Publication number | Publication date |
---|---|
CN113569636A (en) | 2021-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Shivakumar et al. | Dfusenet: Deep fusion of rgb and sparse depth information for image guided dense depth completion | |
Liao et al. | DR-GAN: Automatic radial distortion rectification using conditional GAN in real-time | |
CN110782420A (en) | Small target feature representation enhancement method based on deep learning | |
CN109712071B (en) | Unmanned aerial vehicle image splicing and positioning method based on track constraint | |
Liang et al. | A survey of 3D object detection | |
WO2023142602A1 (en) | Image processing method and apparatus, and computer-readable storage medium | |
Hambarde et al. | Single image depth estimation using deep adversarial training | |
Liu et al. | Study of human action recognition based on improved spatio-temporal features | |
JP2023059794A (en) | Semantic graph embedding lifted for all azimuth direction location recognition | |
Zhou et al. | YOLO-CIR: The network based on YOLO and ConvNeXt for infrared object detection | |
CN111368733B (en) | Three-dimensional hand posture estimation method based on label distribution learning, storage medium and terminal | |
Sundaram et al. | FSSCaps-DetCountNet: fuzzy soft sets and CapsNet-based detection and counting network for monitoring animals from aerial images | |
Giang et al. | TopicFM: Robust and interpretable topic-assisted feature matching | |
Zheng et al. | Feature pyramid of bi-directional stepped concatenation for small object detection | |
Li et al. | Self-supervised coarse-to-fine monocular depth estimation using a lightweight attention module | |
Mo et al. | PVDet: Towards pedestrian and vehicle detection on gigapixel-level images | |
CN113743300A (en) | Semantic segmentation based high-resolution remote sensing image cloud detection method and device | |
CN114972492A (en) | Position and pose determination method and device based on aerial view and computer storage medium | |
CN112668662A (en) | Outdoor mountain forest environment target detection method based on improved YOLOv3 network | |
CN113569636B (en) | Fisheye image feature processing method and system based on spherical features and electronic equipment | |
CN115272450A (en) | Target positioning method based on panoramic segmentation | |
CN117036658A (en) | Image processing method and related equipment | |
CN114693951A (en) | RGB-D significance target detection method based on global context information exploration | |
CN114897842A (en) | Infrared small target segmentation detection method based on texture enhancement network | |
Zhou et al. | Improved YOLOv7 models based on modulated deformable convolution and swin transformer for object detection in fisheye images |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |