CN117830776A - Feature fusion method and system for vehicle-mounted sensor data and satellite image data - Google Patents

Feature fusion method and system for vehicle-mounted sensor data and satellite image data

Info

Publication number
CN117830776A
CN117830776A (application CN202311765226.6A)
Authority
CN
China
Prior art keywords
feature
attention
vehicle-mounted sensor
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311765226.6A
Other languages
Chinese (zh)
Inventor
尹玉成
蔡晨
石涤文
王一鹏
张志军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Heading Data Intelligence Co Ltd
Original Assignee
Heading Data Intelligence Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Heading Data Intelligence Co Ltd filed Critical Heading Data Intelligence Co Ltd
Priority to CN202311765226.6A priority Critical patent/CN117830776A/en
Publication of CN117830776A publication Critical patent/CN117830776A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Abstract

The invention provides a method and a system for fusing features of vehicle-mounted sensor data and satellite image data, wherein the method comprises the following steps: acquiring a satellite image of the acquisition area of a vehicle-mounted sensor; extracting vehicle-mounted sensor image features through a multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image features through a UNet network and an FPN network to obtain a second feature; adjusting the fusion weights of different positions in the image based on position embedding, and obtaining the Q, K and V values of the attention network through linear projection learning; taking the segmentation mask and the distance mask in the second feature as attention masks to filter interference information in the satellite image data, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature; and aligning the second feature with the third feature, and fusing the aligned second and third features. According to this scheme, feature fusion of the vehicle-mounted sensor image and the satellite image can be achieved, improving the efficiency and quality of high-precision map production.

Description

Feature fusion method and system for vehicle-mounted sensor data and satellite image data
Technical Field
The invention belongs to the field of high-precision maps, and particularly relates to a method and a system for fusing vehicle-mounted sensor data and satellite image data characteristics.
Background
Vehicle-mounted sensor data and satellite image data both play an important role in producing crowdsourced high-precision maps and can provide much of the key data for map making. At present, high-precision maps are mostly produced using one kind of data alone or the two kinds of data separately, for example, building the map from vehicle-mounted sensor data and then adjusting it with satellite images. However, both kinds of data have certain defects: the vehicle-mounted sensor struggles with complex intersections and scenes occluded by obstacles ahead, while satellite images are occluded by tall vegetation or building shadows, so actual high-precision map production is inefficient and error-prone.
Disclosure of Invention
In view of the above, the embodiments of the invention provide a method and a system for fusing features of vehicle-mounted sensor data and satellite image data, which are used for solving the problems that existing high-precision map production is inefficient and error-prone.
In a first aspect of the embodiments of the present invention, a method for feature fusion of vehicle-mounted sensor data and satellite image data is provided, including:
acquiring the satellite image data of the acquisition area of the vehicle-mounted sensor;
extracting vehicle-mounted sensor image features through a multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image data features through a UNet network and an FPN network to obtain a second feature;
adjusting the fusion weights of different positions in the image based on position embedding, learning the first feature through linear projection to obtain the attention network Q value, and learning the second feature to obtain the attention network K and V values;
taking the segmentation mask and the distance mask in the second feature as attention masks, filtering interference information in the satellite image data based on the attention masks, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature;
and aligning the second feature with the third feature, and fusing the aligned second feature and third feature.
In a second aspect of the embodiments of the present invention, there is provided a system for feature fusion of vehicle-mounted sensor data and satellite image data, including:
the data acquisition module, used for acquiring the satellite image data of the acquisition area of the vehicle-mounted sensor;
the feature extraction module, used for extracting vehicle-mounted sensor image features through the multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image data features through the UNet network and the FPN network to obtain a second feature;
the feature learning module, used for adjusting the fusion weights of different positions in the image based on position embedding, learning the first feature through linear projection to obtain the attention network Q value, and learning the second feature to obtain the attention network K and V values;
the filtering and extraction module, used for taking the segmentation mask and the distance mask in the second feature as attention masks, filtering interference information in the satellite image data based on the attention masks, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature;
and the alignment and fusion module, used for aligning the second feature with the third feature and fusing the aligned second and third features.
In a third aspect of the embodiments of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to the first aspect of the embodiments of the present invention when executing the computer program.
In a fourth aspect of the embodiments of the present invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the embodiments of the present invention.
In the embodiments of the invention, features are extracted from the vehicle-mounted sensor image and the satellite image respectively, the fusion weights of different positions in the image are adjusted based on position embedding, and the corresponding Q, K and V values of the attention network are obtained through linear projection learning; interference information in the satellite image data is filtered based on the attention mask, sensor image features are extracted based on a masked cross-attention mechanism and aligned with the satellite image features, and the aligned features are fused. BEV (bird's-eye view)-level fusion of the vehicle-mounted sensor image and the satellite image is thereby realized, which improves high-precision map production efficiency and can guarantee the accuracy and reliability of high-precision map production.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of a method for feature fusion of vehicle-mounted sensor data and satellite image data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a feature fusion system for vehicle-mounted sensor data and satellite image data according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more comprehensible, the technical solutions in the embodiments of the present invention are described in detail below with reference to the accompanying drawings, and it is apparent that the embodiments described below are only some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the term "comprising" and similar terms in the description, claims and drawings of the invention are intended to cover a non-exclusive inclusion, such as a process, method, system or apparatus comprising a series of steps or elements without being limited to the listed steps or elements. Furthermore, "first" and "second" are used to distinguish between different objects and not to describe a particular order.
Referring to fig. 1, a flow chart of a method for feature fusion of vehicle-mounted sensor data and satellite image data according to an embodiment of the present invention includes:
S101, acquiring the satellite image data of the acquisition area of the vehicle-mounted sensor;
the vehicle-mounted sensor can comprise a vehicle-mounted camera and a laser radar, and a defensive film image corresponding to a data acquisition area of the vehicle-mounted sensor needs to be acquired. The satellite image data is a satellite photo image, namely a satellite image, and is an image for scanning ground features.
The conversion relation between the vehicle-mounted sensor data and the toilet images needs to be determined so as to convert ground object targets acquired by the vehicle-mounted sensor.
Optionally, constructing a transformation matrix between the satellite image coordinate system and the global coordinate system of the vehicle-mounted sensor by a key point alignment method; and carrying out coordinate transformation on the vehicle-mounted sensor data and the vehicle position based on the transformation matrix.
The key points are generally positions with obvious features on the map, such as intersections, traffic signs and landmarks. The conversion relationship between the vehicle-mounted sensor and the satellite images can be established based on the key point positions, the sensor acquisition positions, the target marks and the like, and the ground object targets acquired by the vehicle-mounted sensor can be converted according to this conversion relationship (i.e., the transformation matrix), as in the sketch below.
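As an illustration only (the disclosure does not prescribe an implementation), such a transformation matrix between the sensor's global coordinates and the satellite image coordinates can be fitted from matched key points by least squares; the function names and the choice of a 2D affine model here are assumptions:

```python
import numpy as np

def fit_affine_transform(src_pts, dst_pts):
    """Fit a 2D affine matrix mapping sensor global coordinates (src_pts)
    to satellite image coordinates (dst_pts) by least squares.
    Inputs are (N, 2) arrays of matched key points (N >= 3), e.g.
    intersections, traffic signs, landmarks."""
    n = src_pts.shape[0]
    # Design matrix for x' = a*x + b*y + c and y' = d*x + e*y + f
    A = np.hstack([src_pts, np.ones((n, 1))])             # (N, 3)
    params, *_ = np.linalg.lstsq(A, dst_pts, rcond=None)  # (3, 2)
    T = np.eye(3)
    T[:2, :] = params.T                                   # 2x3 affine block
    return T

def apply_transform(T, pts):
    """Apply the homogeneous transform T to (N, 2) points."""
    homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (homo @ T.T)[:, :2]
```

With the matrix fitted once, sensor detections and the vehicle position could then be projected into the satellite image coordinate system via apply_transform.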
The corresponding satellite map area is acquired according to the sampling position collected by the vehicle-mounted sensor and the inverse transformation information, which comprises the transformation matrix between the satellite image coordinate system and the vehicle-mounted sensor coordinate system.
S102, extracting vehicle-mounted sensor image features through a multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image data features through a UNet network and an FPN network to obtain a second feature;
A multi-layer perceptron (MLP) is a feed-forward neural network that maps input feature vectors to output feature vectors. After the vehicle-mounted sensor image is processed by the multi-layer perceptron, the features in the vehicle-mounted sensor image, i.e., the first feature, are obtained through inverse perspective transformation.
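A minimal sketch of this step is given below, assuming PyTorch, flattened per-pixel camera features, and a precomputed inverse-perspective-mapping (IPM) sampling grid derived from the camera calibration; the class name and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SensorBEVExtractor(nn.Module):
    """Illustrative sketch: an MLP refines per-pixel camera features,
    then a fixed IPM grid resamples them into the bird's-eye view
    (the 'first feature'). The grid would be precomputed from the
    camera intrinsics/extrinsics; here it is a constructor argument."""
    def __init__(self, in_dim, hid_dim, out_dim, ipm_grid):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, out_dim),
        )
        # ipm_grid: (1, H_bev, W_bev, 2) sampling locations in [-1, 1]
        self.register_buffer("ipm_grid", ipm_grid)

    def forward(self, feats):                      # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        x = feats.flatten(2).transpose(1, 2)       # (B, H*W, C)
        x = self.mlp(x)                            # (B, H*W, out_dim)
        x = x.transpose(1, 2).reshape(B, -1, H, W)
        # Inverse perspective transform: resample image-plane features
        # onto the ground plane using the precomputed grid.
        grid = self.ipm_grid.expand(B, -1, -1, -1)
        return F.grid_sample(x, grid, align_corners=False)  # (B, out, Hb, Wb)
```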
The UNet network is a pixel-level classification network whose structure comprises two parts, an encoding part and a decoding part: the first half performs feature extraction and the second half performs up-sampling, and it can distinguish whether a pixel belongs to the foreground or the background. The FPN (Feature Pyramid Network) is an object detection network able to predict objects at different scales. In this embodiment, the UNet and FPN structures are combined to identify and extract the features in the satellite image, which can effectively improve the accuracy of multi-scale target detection and identification.
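A compact sketch of such a combined UNet/FPN satellite-image branch follows; the channel widths, depth, and the choice of the finest pyramid level as the second feature are assumptions for illustration, not details fixed by the disclosure:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.BatchNorm2d(cout), nn.ReLU(),
    )

class UNetFPN(nn.Module):
    """UNet-style encoder whose multi-scale maps are merged by FPN
    lateral 1x1 convolutions plus a top-down pathway; the finest level
    serves as the 'second feature'."""
    def __init__(self, in_ch=3, widths=(32, 64, 128, 256), fpn_ch=64):
        super().__init__()
        chans = [in_ch] + list(widths)
        self.stages = nn.ModuleList(
            conv_block(chans[i], chans[i + 1]) for i in range(len(widths)))
        self.pool = nn.MaxPool2d(2)
        self.lateral = nn.ModuleList(nn.Conv2d(w, fpn_ch, 1) for w in widths)
        self.smooth = nn.Conv2d(fpn_ch, fpn_ch, 3, padding=1)

    def forward(self, x):                          # x: (B, 3, H, W)
        feats = []
        for i, stage in enumerate(self.stages):
            x = stage(x if i == 0 else self.pool(x))
            feats.append(x)                        # scales 1, 1/2, 1/4, ...
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        p = laterals[-1]                           # coarsest level
        for lower in reversed(laterals[:-1]):      # top-down merging
            p = lower + F.interpolate(p, size=lower.shape[-2:], mode="nearest")
        return self.smooth(p)                      # (B, fpn_ch, H, W)
```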
S103, adjusting the fusion weights of different positions in the image based on position embedding, learning the first feature through linear projection to obtain the attention network Q value, and learning the second feature to obtain the attention network K and V values;
Position embedding is a technique that encodes the information of each position in a sequence into a fixed-length vector; it provides the model with information about positional order, and the model automatically adjusts the fusion weights during learning.
The input feature sequence is linearly projected: the input features are multiplied by three trainable parameter matrices respectively to obtain the corresponding Q, K and V, i.e., Query, Key and Value of the attention network, as sketched below. The linear projection can be a bias-free linear layer operating on multi-dimensional data features.
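As an illustrative sketch only (dimensions and names are assumptions), the projection step can look as follows, with learnable position embeddings added before the bias-free projections:

```python
import torch
import torch.nn as nn

class QKVProjection(nn.Module):
    """Minimal sketch: learnable position embeddings are added so the
    model can reweight different spatial positions, then bias-free
    linear layers project the first feature to queries (Q) and the
    second feature to keys (K) and values (V)."""
    def __init__(self, dim, num_pos_q, num_pos_kv):
        super().__init__()
        self.pos_emb_q = nn.Parameter(torch.zeros(1, num_pos_q, dim))
        self.pos_emb_kv = nn.Parameter(torch.zeros(1, num_pos_kv, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)

    def forward(self, first_feat, second_feat):
        # first_feat: (B, Nq, dim), second_feat: (B, Nk, dim) flattened maps
        q = self.to_q(first_feat + self.pos_emb_q)
        k = self.to_k(second_feat + self.pos_emb_kv)
        v = self.to_v(second_feat)          # values carry the content
        return q, k, v
```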
S104, taking the segmentation mask and the distance mask in the second feature as attention masks, filtering interference information in the satellite image data based on the attention masks, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature;
The segmentation mask is derived from a segmentation model, and the distance mask may be determined according to the following equation:

$$M_{\mathrm{dist}}(x,y)=\begin{cases}0, & \mathrm{Eul}(x,y) < D\\ -\infty, & \mathrm{Eul}(x,y)\ge D\end{cases}$$

where Eul(x, y) denotes the Euclidean distance of position (x, y). That is, when the distance is greater than or equal to the fixed threshold D, the target is filtered out by assigning the value -inf; when Eul(x, y) is smaller than D, the corresponding target is retained.
The attention mask is a combination of the segmentation mask and the distance mask, which may be obtained by logically ANDing the segmentation mask and the distance mask, as in the sketch below.
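A minimal sketch of the combined mask, assuming flattened BEV positions and an additive 0/-inf convention so the mask can simply be added to the attention scores; names and shapes are illustrative assumptions:

```python
import torch

def build_attention_mask(seg_mask, coords, vehicle_xy, D):
    """Combine the segmentation mask with the distance mask by logical AND.
    seg_mask: boolean (B, N) foreground mask from the segmentation model;
    coords: (N, 2) BEV positions; vehicle_xy: ego position (2,);
    D: distance threshold from the equation above. Valid positions get 0,
    masked positions get -inf."""
    eul = torch.linalg.norm(coords - vehicle_xy, dim=-1)   # Eul(x, y), (N,)
    dist_keep = (eul < D).unsqueeze(0)                     # (1, N) bool
    keep = seg_mask & dist_keep                            # logical AND
    mask = torch.zeros_like(keep, dtype=torch.float32)
    mask[~keep] = float("-inf")
    return mask                                            # (B, N)
```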
The masked cross-attention mechanism adjusts the attention over the encoder output through a masked cross-attention layer, thereby obtaining the encoder information for the decoding position. The first feature can be further refined by the masked cross-attention mechanism to obtain the third feature.
Specifically, a segmentation mask and a distance mask of the second feature are created; the query vector of the first feature and the key and value vectors of the second feature, learned after linear projection, are acquired; a dot product of the query vector of the first feature and the key vector of the second feature is computed and divided by the square root of the query dimension to obtain the attention score; based on the segmentation mask and the distance mask, the attention score is multiplied element by element by the segmentation mask of the second feature to mask padding positions, and multiplied element by element by the distance mask of the second feature to mask positions in the second feature that exceed a predetermined distance; softmax normalization is applied to the masked attention score to obtain the attention weights; and the attention weights are weighted-summed with the value vector of the second feature to obtain the third feature output by the cross attention.
More generally, for a source sequence and a target sequence: a segmentation mask and a distance mask are created, where the segmentation mask relates to padding and shields padded positions in the source and target sequences, and the distance mask shields remote positions in the target sequence; the source and target sequences are linearly projected to obtain the query (Q) vectors of the source sequence and the key (K) and value (V) vectors of the target sequence; the attention score is computed as the dot product of the source queries and the target keys divided by the square root of the query dimension; the segmentation mask is applied by multiplying the attention score element by element to shield padding positions, and the distance mask is applied likewise to shield remote positions in the target sequence; softmax normalization yields the attention weights; and the weighted sum of the attention weights with the target value vectors gives the cross-attention output.
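The steps above can be sketched as follows. Note one deliberate simplification: the disclosure describes multiplying the scores element by element with the masks, while this sketch uses the equivalent standard additive form, in which masked positions receive -inf before the softmax; all shapes are assumptions.

```python
import math
import torch
import torch.nn.functional as F

def masked_cross_attention(q, k, v, attn_mask):
    """Masked cross-attention: scaled dot-product scores, additive
    0/-inf mask built from the segmentation and distance masks,
    softmax normalization, then a weighted sum over the values.
    q: (B, Nq, d) from the first feature; k, v: (B, Nk, d) from the
    second feature; attn_mask: (B, Nk) or (B, Nq, Nk)."""
    d = q.shape[-1]
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d)  # (B, Nq, Nk)
    if attn_mask.dim() == 2:
        attn_mask = attn_mask.unsqueeze(1)       # broadcast over queries
    scores = scores + attn_mask                  # -inf removes masked keys
    weights = F.softmax(scores, dim=-1)          # attention weights
    return torch.matmul(weights, v)              # third feature: (B, Nq, d)
```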
S105, aligning the second feature with the third feature, and fusing the aligned second and third features.
The coordinate offset of each position of the second feature is predicted through neural-network convolution layers, and the position of the second feature is adjusted based on the coordinate offset; after feature alignment, the second feature and the third feature are spliced and fused, as in the sketch below.
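A minimal sketch of this alignment-and-fusion step, assuming both features have been reshaped to (B, C, H, W) BEV maps; the offset head's layer sizes and the use of grid_sample for resampling are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OffsetAlign(nn.Module):
    """A small convolutional head predicts a per-position (dx, dy)
    offset (in normalized coordinates) for the second feature, which
    is resampled with grid_sample and concatenated with the third
    feature."""
    def __init__(self, ch):
        super().__init__()
        self.offset = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2, 3, padding=1),   # per-pixel (dx, dy)
        )

    def forward(self, second_feat, third_feat):   # both (B, C, H, W)
        B, _, H, W = second_feat.shape
        delta = self.offset(second_feat).permute(0, 2, 3, 1)   # (B, H, W, 2)
        # Base identity sampling grid in normalized [-1, 1] coordinates
        ys = torch.linspace(-1, 1, H, device=second_feat.device)
        xs = torch.linspace(-1, 1, W, device=second_feat.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        base = torch.stack([gx, gy], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
        aligned = F.grid_sample(second_feat, base + delta, align_corners=True)
        return torch.cat([aligned, third_feat], dim=1)   # spliced fusion
```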
In this embodiment, vehicle-mounted sensor image features are extracted through a multi-layer perceptron and inverse perspective transformation, and satellite image data features are extracted through a UNet network and an FPN network; the fusion weights of different positions in the image are adjusted based on position embedding, and the attention network Q, K and V values are obtained through linear projection learning; interference information in the satellite image data is filtered based on the attention mask, and vehicle-mounted sensor image features are extracted based on a masked cross-attention mechanism; feature fusion is performed after feature alignment. BEV-level fusion of the vehicle-mounted sensor image and satellite image features is thereby realized, which can improve high-precision map production efficiency and guarantee the accuracy and reliability of feature extraction and fusion.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a feature fusion system for vehicle-mounted sensor data and satellite image data provided in an embodiment of the present invention, which includes:
the data acquisition module 210, configured to acquire the satellite image data of the acquisition area of the vehicle-mounted sensor;
wherein, the data acquisition module 210 further comprises:
the coordinate conversion module is used for constructing a transformation matrix between the satellite image coordinate system and the global coordinate system of the vehicle-mounted sensor through a key point alignment method; and carrying out coordinate transformation on the vehicle-mounted sensor data based on the transformation matrix.
The feature extraction module 220 is configured to extract vehicle-mounted sensor image features through the multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extract satellite image data features through the UNet network and the FPN network to obtain a second feature;
the feature learning module 230 is configured to adjust fusion weights of different positions in the image based on position embedding, learn a first feature through linear projection to obtain an attention network Q value, and learn a second feature to obtain an attention network K value and a V value;
the filtering and extracting module 240 is configured to take the segmentation mask and the distance mask in the second feature as attention masks, filter interference information in the toilet sheet image data based on the attention masks, and extract the first feature based on a cross attention mechanism with masks to obtain a third feature;
specifically, a segmentation mask of the first feature and a distance mask of the second feature are created;
creating a segmentation mask and a distance mask for the second feature;
acquiring the query vector of the first feature and the key and value vectors of the second feature, learned after linear projection;
performing a dot product of the query vector of the first feature and the key vector of the second feature and dividing by the square root of the query dimension to calculate the attention score;
based on the segmentation mask and the distance mask, multiplying the attention score element by element by the segmentation mask of the second feature to mask padding positions, and multiplying the attention score element by element by the distance mask of the second feature to mask positions in the second feature that exceed a predetermined distance;
performing softmax normalization on the masked attention score to obtain the attention weights;
and performing a weighted summation of the attention weights and the value vector of the second feature to obtain the third feature output by the cross attention.
The alignment and fusion module 250 is configured to align the second feature with the third feature and fuse the aligned second and third features.
Wherein aligning the second feature with the third feature comprises:
and predicting the coordinate offset of each position of the second feature through the neural network convolution layer, and carrying out position adjustment on the second feature based on the coordinate offset.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described system and module may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device is used for feature fusion of the vehicle-mounted sensor data and the satellite image data. As shown in fig. 3, the electronic device 3 of this embodiment includes: a memory 310, a processor 320 and a system bus 330, the memory 310 containing an executable program 3101 stored thereon. It will be understood by those skilled in the art that the electronic device structure shown in fig. 3 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or arrange components differently.
The following describes the respective constituent elements of the electronic device in detail with reference to fig. 3:
the memory 310 may be used to store software programs and modules, and the processor 320 may execute various functional applications and data processing of the electronic device by executing the software programs and modules stored in the memory 310. The memory 310 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device (such as cache data), and the like. In addition, memory 310 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
The memory 310 contains an executable program 3101 of the above-described method. The executable program 3101 may be divided into one or more modules/units, which are stored in the memory 310 and executed by the processor 320 to perform the feature fusion of the vehicle-mounted sensor data and the satellite image data. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the executable program 3101 in the electronic device 3. For example, the executable program 3101 may be divided into functional modules such as a data acquisition module, a feature extraction module, a feature learning module, a filtering and extraction module, and an alignment and fusion module.
Processor 320 is a control center of the electronic device that utilizes various interfaces and lines to connect various portions of the overall electronic device, perform various functions of the electronic device and process data by running or executing software programs and/or modules stored in memory 310, and invoking data stored in memory 310, thereby performing overall condition monitoring of the electronic device. Optionally, processor 320 may include one or more processing units; preferably, the processor 320 may integrate an application processor that primarily handles operating systems, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 320.
The system bus 330 is used to connect the functional components of the computer and can transmit data, address and control information; the system bus may be, for example, a PCI bus, an ISA bus or a CAN bus. Instructions from the processor 320 are transferred to the memory 310 through the bus, the memory 310 feeds data back to the processor 320, and the system bus 330 is responsible for data and instruction interaction between the processor 320 and the memory 310. Of course, the system bus 330 may also connect other devices, such as a network interface and a display device.
In an embodiment of the present invention, the executable program executed by the processor 320 included in the electronic device includes:
acquiring the satellite image data of the acquisition area of the vehicle-mounted sensor;
extracting vehicle-mounted sensor image features through a multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image data features through a UNet network and an FPN network to obtain a second feature;
adjusting the fusion weights of different positions in the image based on position embedding, learning the first feature through linear projection to obtain the attention network Q value, and learning the second feature to obtain the attention network K and V values;
taking the segmentation mask and the distance mask in the second feature as attention masks, filtering interference information in the satellite image data based on the attention masks, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature;
and aligning the second feature with the third feature, and fusing the aligned second feature and third feature.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system, apparatus and module may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
Each of the foregoing embodiments is described with its own emphasis; for parts not detailed or illustrated in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A feature fusion method for vehicle-mounted sensor data and satellite image data, characterized by comprising the following steps:
acquiring the satellite image data of the acquisition area of the vehicle-mounted sensor;
extracting vehicle-mounted sensor image features through a multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extracting satellite image data features through a UNet network and an FPN network to obtain a second feature;
adjusting the fusion weights of different positions in the image based on position embedding, learning the first feature through linear projection to obtain the attention network Q value, and learning the second feature to obtain the attention network K and V values;
taking the segmentation mask and the distance mask in the second feature as attention masks, filtering interference information in the satellite image data based on the attention masks, and extracting the first feature based on a masked cross-attention mechanism to obtain a third feature;
and aligning the second feature with the third feature, and fusing the aligned second feature and third feature.
2. The method of claim 1, wherein the acquiring the satellite image data of the vehicle-mounted sensor acquisition area further comprises:
constructing a transformation matrix between a satellite image coordinate system and a global coordinate system of the vehicle-mounted sensor by a key point alignment method;
and carrying out coordinate transformation on the vehicle-mounted sensor data based on the transformation matrix.
3. The method of claim 1, wherein the taking the segmentation mask and the distance mask in the second feature as attention masks, filtering the interference information in the satellite image data based on the attention masks, and extracting the first feature based on the masked cross-attention mechanism to obtain the third feature comprises:
creating a segmentation mask and a distance mask for the second feature;
acquiring the query vector of the first feature and the key and value vectors of the second feature, learned after linear projection;
performing a dot product of the query vector of the first feature and the key vector of the second feature and dividing by the square root of the query dimension to calculate the attention score;
based on the segmentation mask and the distance mask, multiplying the attention score element by element by the segmentation mask of the second feature to mask padding positions, and multiplying the attention score element by element by the distance mask of the second feature to mask positions in the second feature that exceed a predetermined distance;
performing softmax normalization on the masked attention score to obtain the attention weights;
and performing a weighted summation of the attention weights and the value vector of the second feature to obtain the third feature output by the cross attention.
4. The method of claim 1, wherein the aligning the second feature with the third feature comprises:
and predicting the coordinate offset of each position of the second feature through the neural network convolution layer, and carrying out position adjustment on the second feature based on the coordinate offset.
5. A feature fusion system for vehicle-mounted sensor data and satellite image data, characterized by comprising:
a data acquisition module, configured to acquire the satellite image data of the acquisition area of the vehicle-mounted sensor;
a feature extraction module, configured to extract vehicle-mounted sensor image features through the multi-layer perceptron and inverse perspective transformation to obtain a first feature, and extract satellite image data features through the UNet network and the FPN network to obtain a second feature;
a feature learning module, configured to adjust the fusion weights of different positions in the image based on position embedding, learn the first feature through linear projection to obtain the attention network Q value, and learn the second feature to obtain the attention network K and V values;
a filtering and extraction module, configured to take the segmentation mask and the distance mask in the second feature as attention masks, filter interference information in the satellite image data based on the attention masks, and extract the first feature based on a masked cross-attention mechanism to obtain a third feature;
and an alignment and fusion module, configured to align the second feature with the third feature and fuse the aligned second and third features.
6. The system of claim 5, wherein the data acquisition module further comprises:
the coordinate conversion module is used for constructing a transformation matrix between the satellite image coordinate system and the global coordinate system of the vehicle-mounted sensor through a key point alignment method; and carrying out coordinate transformation on the vehicle-mounted sensor data based on the transformation matrix.
7. The system of claim 5, wherein the taking the segmentation mask and the distance mask in the second feature as attention masks, filtering the interference information in the satellite image data based on the attention masks, and extracting the first feature based on the masked cross-attention mechanism to obtain the third feature comprises:
creating a segmentation mask and a distance mask for the second feature;
acquiring the query vector of the first feature and the key and value vectors of the second feature, learned after linear projection;
performing a dot product of the query vector of the first feature and the key vector of the second feature and dividing by the square root of the query dimension to calculate the attention score;
based on the segmentation mask and the distance mask, multiplying the attention score element by element by the segmentation mask of the second feature to mask padding positions, and multiplying the attention score element by element by the distance mask of the second feature to mask positions in the second feature that exceed a predetermined distance;
performing softmax normalization on the masked attention score to obtain the attention weights;
and performing a weighted summation of the attention weights and the value vector of the second feature to obtain the third feature output by the cross attention.
8. The system of claim 5, wherein the aligning the second feature with the third feature comprises:
and predicting the coordinate offset of each position of the second feature through the neural network convolution layer, and carrying out position adjustment on the second feature based on the coordinate offset.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the method for feature fusion of vehicle-mounted sensor data and satellite image data according to any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed, implements the steps of the method for feature fusion of vehicle-mounted sensor data and satellite image data according to any one of claims 1 to 4.
CN202311765226.6A 2023-12-20 2023-12-20 Feature fusion method and system for vehicle-mounted sensor data and satellite image data Pending CN117830776A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311765226.6A CN117830776A (en) 2023-12-20 2023-12-20 Feature fusion method and system for vehicle-mounted sensor data and satellite image data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311765226.6A CN117830776A (en) 2023-12-20 2023-12-20 Feature fusion method and system for vehicle-mounted sensor data and satellite image data

Publications (1)

Publication Number Publication Date
CN117830776A (en) 2024-04-05

Family

ID=90520109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311765226.6A Pending CN117830776A (en) 2023-12-20 2023-12-20 Feature fusion method and system for vehicle-mounted sensor data and toilet sheet image data

Country Status (1)

Country Link
CN (1) CN117830776A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination