CN116740355A - Automatic driving image segmentation method, device, equipment and storage medium - Google Patents

Automatic driving image segmentation method, device, equipment and storage medium

Info

Publication number
CN116740355A
CN116740355A
Authority
CN
China
Prior art keywords
feature map
image
network model
vector
autopilot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310709950.0A
Other languages
Chinese (zh)
Inventor
邢春上
张松林
陈博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Faw Nanjing Technology Development Co ltd
FAW Group Corp
Original Assignee
Faw Nanjing Technology Development Co ltd
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Faw Nanjing Technology Development Co ltd, FAW Group Corp filed Critical Faw Nanjing Technology Development Co ltd
Priority to CN202310709950.0A priority Critical patent/CN116740355A/en
Publication of CN116740355A publication Critical patent/CN116740355A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision, and discloses a method, device, equipment and storage medium for segmenting an automatic driving image. The method comprises the following steps: acquiring an automatic driving image and inputting it into a pre-trained residual neural network model to obtain an initial feature map; inputting the initial feature map into a pre-trained channel attention network model to obtain a feature vector; multiplying the initial feature map by the feature vector to obtain a weighted feature map, and segmenting the automatic driving image into a plurality of sub-images according to the weighted feature map. By extracting the feature map with a residual neural network model and weighting it with a channel attention network model, the technical scheme improves the accuracy of edge segmentation between different semantic regions of the image and reduces the computation required for image segmentation.

Description

Automatic driving image segmentation method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method, apparatus, device, and storage medium for segmenting an automatic driving image.
Background
With the rapid development of automatic driving technology, higher requirements are placed on its accuracy, stability and intelligence. An automatic driving vehicle carries various sensors, such as visual cameras and laser radars; owing to their low cost, visual cameras have become one of the most widely used sensors on mass-produced automatic driving vehicles. However, visual cameras are often affected by objective conditions such as backlight and darkness, which degrades the post-processing of the raw image. Therefore, in addition to using high-precision sensors, accurately segmenting the raw image is one of the most important tasks for achieving high-precision positioning and high stability in an automatic driving system.
Existing image segmentation methods fall mainly into three categories: threshold-based, edge-based, and region-based. Threshold-based methods cannot handle images whose gray-value distributions are relatively complex; edge-based methods often suffer from insufficiently accurate edge detection, which degrades the segmentation result; and region-based methods are computationally expensive and time-consuming.
Disclosure of Invention
The invention provides a method, device, equipment and storage medium for segmenting an automatic driving image, which can improve the accuracy of edge segmentation between different semantic regions of the image and reduce the computation required for image segmentation.
According to an aspect of the present invention, there is provided a method for segmenting an automatic driving image, comprising:
acquiring an automatic driving image, inputting the automatic driving image into a pre-trained residual neural network model, and acquiring an initial feature map corresponding to the automatic driving image output by the residual neural network model;
inputting the initial feature map into a pre-trained channel attention network model, and acquiring a feature vector corresponding to the initial feature map output by the channel attention network model; the channel attention network model is established based on a global average pooling algorithm and a global maximum pooling algorithm;
multiplying the initial feature map by the feature vector to obtain a weighted feature map, and dividing the automatic driving image into a plurality of sub-images according to the weighted feature map.
According to another aspect of the present invention, there is provided an apparatus for segmenting an automatic driving image, comprising:
the initial feature map acquisition module is used for acquiring an automatic driving image, inputting the automatic driving image into a pre-trained residual neural network model and acquiring an initial feature map corresponding to the automatic driving image output by the residual neural network model;
the feature vector acquisition module is used for inputting the initial feature map to a pre-trained channel attention network model and acquiring a feature vector corresponding to the initial feature map output by the channel attention network model; the channel attention network model is established based on a global average pooling algorithm and a global maximum pooling algorithm;
and the weighted feature map acquisition module is used for multiplying the initial feature map and the feature vector to acquire a weighted feature map and dividing the automatic driving image into a plurality of sub-images according to the weighted feature map.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program enabling the at least one processor to perform the method for segmenting an automatic driving image according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the method for segmenting an automatic driving image according to any one of the embodiments of the present invention.
According to the technical scheme of the embodiment of the invention, an automatic driving image is acquired and input into a pre-trained residual neural network model, and an initial feature map corresponding to the automatic driving image output by the residual neural network model is acquired; the initial feature map is then input into a pre-trained channel attention network model, and a feature vector corresponding to the initial feature map output by the channel attention network model is acquired; finally, the initial feature map is multiplied by the feature vector to obtain a weighted feature map, and the automatic driving image is divided into a plurality of sub-images according to the weighted feature map. By extracting the feature map with the residual neural network model and weighting it with the channel attention network model, the accuracy of edge segmentation between different semantic regions of the image is improved and the computation required for image segmentation is reduced.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention, and that a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1A is a flowchart of a method for segmenting an automatic driving image according to Embodiment 1 of the present invention;
fig. 1B is a schematic diagram of a residual block structure according to Embodiment 1 of the present invention;
fig. 1C is a schematic diagram of a channel attention network model according to Embodiment 1 of the present invention;
fig. 1D is a flowchart of another method for segmenting an automatic driving image according to Embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of an automatic driving image segmentation apparatus according to Embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of an electronic device implementing a method for segmenting an automatic driving image according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," "target," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiment 1
Fig. 1A is a flowchart of a method for segmenting an automatic driving image according to Embodiment 1 of the present invention. The method is applicable to segmenting regions with different semantic information in an automatic driving image, and may be performed by an automatic driving image segmentation apparatus. The apparatus may be implemented in hardware and/or software and is typically configured in an electronic device, for example a computer, a server, or a vehicle-mounted device. As shown in fig. 1A, the method includes:
s110, acquiring an automatic driving image, inputting the automatic driving image into a pre-trained residual neural network model, and acquiring an initial feature map corresponding to the automatic driving image output by the residual neural network model.
In this embodiment, the automatic driving image may be acquired in real time by a visual camera mounted on the vehicle while the vehicle is running, or may be acquired from the internet. The automatic driving image may contain different types of objects, such as people, vehicles, and the driving environment. Typically, the automatic driving image is an RGB three-channel image.
In this embodiment, a convolutional neural network may be used to extract a feature map from the automatic driving image; typically, this network is a residual neural network (Residual Neural Network, ResNet). In a specific example, an initial ResNet model may be built based on preset model parameters and trained on labeled image samples until a trained ResNet model is obtained.
The residual neural network model may include a preset number of residual blocks, each residual block including a first convolution layer and a second convolution layer whose convolution kernels are of size 1×1. The preset number of residual blocks may be, for example, 10. In this embodiment, setting the convolution kernel size to 1×1 ensures that the output feature map of each residual block keeps the same size as its input. For example, if the automatic driving image size is [H, W, C], where C is the number of channels and W and H are the width and height (in pixels) of each channel, the corresponding initial feature map size is also [H, W, C].
It should be noted that the core of the residual neural network is the residual block (Residual Block), which may consist of several convolution layers. In a residual block, one part of the input features is processed by the convolution layers and then added to the other part of the features that bypasses the convolution layers to obtain the final output. It will be appreciated that the number of convolution layers in a residual block may be adapted to the task scenario.
In this embodiment, extracting the features of the automatic driving image with a residual neural network model effectively prevents overfitting and improves feature extraction efficiency.
S120, inputting the initial feature map to a pre-trained channel attention network model, and acquiring a feature vector corresponding to the initial feature map output by the channel attention network model.
Wherein the channel attention network model may be built based on a global average pooling algorithm and a global maximum pooling algorithm.
In this embodiment, a channel attention mechanism may be used to weight the features of each channel of the feature map. Specifically, the initial feature map is pooled by both global average pooling (Global Average Pooling, AvgPool) and global maximum pooling (Global Max Pooling, MaxPool), and the results of the two pooling operations are superimposed to obtain the final feature vector, which guides the corresponding weighting of the initial feature map.
The channel attention network model may comprise a pooling layer, a multi-layer perceptron and an adder, the parameters of which may be determined by pre-training. In a specific example, an initial channel attention network model may be built according to preset model parameters and then trained with pre-labeled feature samples to obtain the trained channel attention network model.
In this embodiment, the channel attention mechanism scores each channel of the initial feature map and assigns higher weights to higher-scoring channels, so that those channels receive a fuller visual expression in the final segmented image. This effectively improves the expression of different semantic information in the image and the quality of the edge segmentation between regions with different semantics.
S130, multiplying the initial feature map and the feature vector to obtain a weighted feature map, and dividing the automatic driving image into a plurality of sub-images according to the weighted feature map.
In this embodiment, the feature vector may be multiplied by the initial feature map to weight its features, and the product is taken as the weighted feature map. The automatic driving image is then segmented according to the feature values in the weighted feature map to obtain a plurality of sub-images, thereby extracting the target objects. Alternatively, after the weighted feature map is acquired, region segmentation may be performed on the original automatic driving image according to the feature values, for example using lines or colors. This embodiment does not specifically limit the method of image segmentation based on the weighted feature map.
Optionally, after the automatic driving image is divided into a plurality of sub-images, object recognition may be performed on each sub-image according to the weighted feature map to determine the object class of each sub-image, for example person, vehicle or animal.
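The multiplication in S130 is a per-channel scaling: each channel of the initial feature map is scaled by its entry in the 1×1×C feature vector, broadcast over the spatial dimensions. A minimal sketch of this broadcasting in Python (PyTorch), with purely illustrative weight values, is:

    import torch

    initial_fm = torch.rand(1, 3, 4, 4)          # initial feature map, [B, C, H, W]
    feature_vec = torch.tensor([0.2, 1.0, 0.5])  # hypothetical 1x1xC channel weights

    # Broadcast the per-channel weights over H and W to obtain the weighted feature map.
    weighted_fm = initial_fm * feature_vec.view(1, 3, 1, 1)
    assert torch.allclose(weighted_fm[0, 1], initial_fm[0, 1])  # weight 1.0 leaves channel 1 unchanged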
According to the technical scheme of this embodiment, an automatic driving image is acquired and input into a pre-trained residual neural network model, and an initial feature map corresponding to the automatic driving image output by the residual neural network model is acquired; the initial feature map is then input into a pre-trained channel attention network model, and a feature vector corresponding to the initial feature map output by the channel attention network model is acquired; finally, the initial feature map is multiplied by the feature vector to obtain a weighted feature map, and the automatic driving image is divided into a plurality of sub-images according to the weighted feature map. By extracting the feature map with the residual neural network model and weighting it with the channel attention network model, the accuracy of edge segmentation between different semantic regions of the image is improved and the computation required for image segmentation is reduced.
In an optional implementation of this embodiment, inputting the automatic driving image into a pre-trained residual neural network model and obtaining the initial feature map corresponding to the automatic driving image output by the residual neural network model may include:
inputting the current image feature into the first convolution layer of the current residual block to obtain a first convolution feature, and performing nonlinear feature mapping on the first convolution feature through a preset activation function to obtain an intermediate feature;
inputting the intermediate feature into the second convolution layer of the current residual block to obtain a second convolution feature, and adding the second convolution feature to the current image feature to obtain a target feature;
performing nonlinear feature mapping on the target feature through the preset activation function to obtain the current feature map output by the current residual block.
The current image feature may be the feature map output by the previous residual block, or the original image features of the automatic driving image. The preset activation function may be the ReLU function.
In a specific example, the residual block structure may be as shown in fig. 1B, where x denotes the current image feature, "weight layer" denotes a convolution layer, and ReLU is the preset activation function, which may be defined as ReLU(x) = max(x, 0). F(x) denotes the output feature after processing by the convolution layers, and F(x) + x is the final output of the residual block. Specifically, the current image feature passes through the first convolution layer, the ReLU activation function and the second convolution layer in turn to obtain the second convolution feature, which is added to the current image feature and then processed by the ReLU activation function to obtain the final output current feature map.
Further, inputting the automatic driving image into the pre-trained residual neural network model and obtaining the initial feature map corresponding to the automatic driving image output by the residual neural network model may include:
obtaining, through the current residual block, the current feature map y according to the formula y = σ(F(x, a) + x), where σ(·) denotes the preset activation function, F(·) denotes the residual function, x denotes the current image feature, and a denotes the convolution layer weight parameters.
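As an illustrative sketch only (the patent gives no reference code), a residual block of this form could be written in PyTorch as follows; the class name ResidualBlock1x1 and the stacking depth are assumptions:

    import torch
    import torch.nn as nn

    class ResidualBlock1x1(nn.Module):
        """Residual block with two 1x1 convolution layers, as described above.

        Because both kernels are 1x1 with stride 1, the output feature map keeps
        the input's [H, W, C] size, so y = ReLU(F(x) + x) needs no projection shortcut.
        """
        def __init__(self, channels: int):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            f = self.conv2(self.relu(self.conv1(x)))  # F(x, a): conv -> ReLU -> conv
            return self.relu(f + x)                   # y = sigma(F(x, a) + x)

    # A preset number of such blocks (e.g. 10, per the example above) stacked
    # into a feature extractor that preserves the [H, W, C] input size:
    backbone = nn.Sequential(*[ResidualBlock1x1(channels=3) for _ in range(10)])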
In another optional implementation of this embodiment, inputting the initial feature map into the pre-trained channel attention network model and obtaining the feature vector corresponding to the initial feature map output by the channel attention network model may include:
processing the initial feature map through the global average pooling algorithm to obtain a first pooling vector, and processing the initial feature map through the global maximum pooling algorithm to obtain a second pooling vector;
inputting the first pooling vector and the second pooling vector into a multi-layer perceptron, and obtaining a first mapping vector and a second mapping vector output by the multi-layer perceptron;
adding the first mapping vector and the second mapping vector to obtain a superposition vector, and processing the superposition vector through a preset activation function to obtain the feature vector corresponding to the initial feature map.
In a specific example, the structure of the channel attention network model may be as shown in fig. 1C. In this embodiment, the initial feature map (H×W×C) is reduced over its height and width dimensions by global maximum pooling and global average pooling to obtain two one-dimensional pooling vectors; the two vectors are then fed into a preset multi-layer perceptron (Multilayer Perceptron, MLP) for feature mapping, yielding two mapped one-dimensional vectors of size 1×1×C, namely the first mapping vector and the second mapping vector; finally, the two vectors are added and processed by the ReLU activation function to obtain the final 1×1×C feature vector. The pooling algorithms may be implemented by a pooling layer.
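A minimal sketch of this channel attention computation in PyTorch might look as follows. The module name ChannelAttention, the hidden width of the perceptron, and the use of a single shared MLP for both pooling vectors are assumptions; the text above only fixes the AvgPool/MaxPool, MLP, addition and activation structure:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        """Channel attention as described above: global average and global maximum
        pooling over H and W, a shared multi-layer perceptron, element-wise addition
        of the two mapped vectors, then the preset activation."""
        def __init__(self, channels: int, reduction: int = 4):
            super().__init__()
            hidden = max(channels // reduction, 1)  # hidden width is an assumption
            self.mlp = nn.Sequential(
                nn.Linear(channels, hidden),
                nn.ReLU(inplace=True),
                nn.Linear(hidden, channels),
            )
            self.act = nn.ReLU(inplace=True)  # the text names ReLU as the preset activation

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            avg = x.mean(dim=(2, 3))                    # first pooling vector, [B, C]
            mx = x.amax(dim=(2, 3))                     # second pooling vector, [B, C]
            v = self.act(self.mlp(avg) + self.mlp(mx))  # superposition vector -> activation
            return v.view(x.size(0), -1, 1, 1)          # 1x1xC feature vector per sample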
Optionally, processing the initial feature map by a global average pooling algorithm to obtain a first pooled vector may include:
according to the formulaCalculating a first pooling vector AvgPool, wherein H and W respectively represent the height and width of an image channel, N represents an image channel index, N and M represent the size of an initial feature map, and c i,j Representing the feature values in the initial feature map. i and j represent feature map coordinates.
Optionally, processing the initial feature map by a global maximum pooling algorithm to obtain a second pooled vector may include:
according to the formulaA second pooling vector MaxPool is calculated, where Max (·) represents a Max-taking operation.
In a specific implementation of this embodiment, the flow of the method for segmenting the automatic driving image may be as shown in fig. 1D: the original automatic driving image is first input into the pre-trained residual neural network model to obtain the initial feature map; the initial feature map is then input into the pre-trained channel attention network model to obtain a 1×1×C feature vector; finally, the initial feature map is multiplied by the feature vector to obtain the weighted feature map, and image segmentation is performed based on the weighted feature map.
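Putting the pieces together, a hedged end-to-end sketch of this flow, reusing the ResidualBlock1x1 and ChannelAttention sketches above (all sizes illustrative), is:

    import torch
    import torch.nn as nn

    image = torch.rand(1, 3, 224, 224)  # automatic driving image, [B, C, H, W]

    backbone = nn.Sequential(*[ResidualBlock1x1(channels=3) for _ in range(10)])
    attention = ChannelAttention(channels=3)

    initial_fm = backbone(image)        # initial feature map, same size as the input
    weights = attention(initial_fm)     # 1x1xC feature vector
    weighted_fm = initial_fm * weights  # channel-weighted feature map (S130)
    # The automatic driving image is then divided into sub-images according to the
    # feature values of weighted_fm; the patent leaves the concrete rule open.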
Embodiment 2
Fig. 2 is a schematic structural diagram of an automatic driving image segmentation apparatus according to Embodiment 2 of the present invention. As shown in fig. 2, the apparatus includes: an initial feature map acquisition module 210, a feature vector acquisition module 220, and a weighted feature map acquisition module 230; wherein:
the initial feature map obtaining module 210 is configured to obtain an autopilot image, input the autopilot image to a pre-trained residual neural network model, and obtain an initial feature map corresponding to the autopilot image output by the residual neural network model;
the feature vector obtaining module 220 is configured to input the initial feature map to a pre-trained channel attention network model, and obtain a feature vector corresponding to the initial feature map output by the channel attention network model; the channel attention network model is established based on a global average pooling algorithm and a global maximum pooling algorithm;
the weighted feature map obtaining module 230 is configured to multiply the initial feature map with the feature vector to obtain a weighted feature map, and divide the autopilot image into a plurality of sub-images according to the weighted feature map.
According to the technical scheme of this embodiment, an automatic driving image is acquired and input into a pre-trained residual neural network model, and an initial feature map corresponding to the automatic driving image output by the residual neural network model is acquired; the initial feature map is then input into a pre-trained channel attention network model, and a feature vector corresponding to the initial feature map output by the channel attention network model is acquired; finally, the initial feature map is multiplied by the feature vector to obtain a weighted feature map, and the automatic driving image is divided into a plurality of sub-images according to the weighted feature map. By extracting the feature map with the residual neural network model and weighting it with the channel attention network model, the accuracy of edge segmentation between different semantic regions of the image is improved and the computation required for image segmentation is reduced.
Optionally, the residual neural network model includes a preset number of residual blocks, where the residual blocks include a first convolution layer and a second convolution layer, and convolution kernel sizes of the first convolution layer and the second convolution layer are 1×1.
Optionally, the initial feature map obtaining module 210 is specifically configured to input a current image feature to a first convolution layer of a current residual block to obtain a first convolution feature, and perform nonlinear feature mapping on the first convolution feature through a preset activation function to obtain an intermediate feature;
inputting the intermediate feature to a second convolution layer of the current residual block to obtain a second convolution feature, and adding the second convolution feature to the current image feature to obtain a target feature;
and carrying out nonlinear feature mapping on the target features through a preset activation function to obtain a current feature map output by the current residual block.
Optionally, the initial feature map obtaining module 210 is specifically configured to obtain, through the current residual block, the current feature map y according to the formula y = σ(F(x, a) + x), where σ(·) denotes a preset activation function, F(·) denotes a residual function, x denotes the current image feature, and a denotes the convolution layer weight parameters.
Optionally, the feature vector obtaining module 220 includes:
the pooling vector acquisition unit is used for processing the initial feature map through a global average pooling algorithm to acquire a first pooling vector, and processing the initial feature map through a global maximum pooling algorithm to acquire a second pooling vector;
the mapping vector acquisition unit is used for inputting the first pooling vector and the second pooling vector into the multi-layer perceptron and acquiring a first mapping vector and a second mapping vector output by the multi-layer perceptron;
and the feature vector acquisition unit is used for adding the first mapping vector and the second mapping vector to acquire a superposition vector, and processing the superposition vector through a preset activation function to acquire a feature vector corresponding to the initial feature map.
Optionally, the pooling vector acquisition unit is specifically configured to calculate the first pooling vector according to the formula AvgPool_n = (1/(H×W)) · Σ_{i=1}^{N} Σ_{j=1}^{M} c_{i,j}, wherein H and W respectively represent the height and width of an image channel, n represents the image channel index, N and M represent the size of the initial feature map, and c_{i,j} represents a feature value in the initial feature map.
Optionally, the pooling vector acquisition unit is further specifically configured to calculate the second pooling vector according to the formula MaxPool_n = Max(c_{i,j}), where Max(·) represents the maximum-taking operation.
The automatic driving image segmentation device provided by this embodiment of the invention can execute the automatic driving image segmentation method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Embodiment 3
Fig. 3 shows a schematic diagram of an electronic device 30 that may be used to implement an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches) and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit the implementations of the invention described and/or claimed herein.
As shown in fig. 3, the electronic device 30 includes at least one processor 31 and a memory communicatively connected to the at least one processor 31, such as a read-only memory (ROM) 32 and a random access memory (RAM) 33. The memory stores a computer program executable by the at least one processor, and the processor 31 can perform various appropriate actions and processes according to the computer program stored in the ROM 32 or loaded from the storage unit 38 into the RAM 33. The RAM 33 may also store various programs and data required for the operation of the electronic device 30. The processor 31, the ROM 32 and the RAM 33 are connected to each other via a bus 34, and an input/output (I/O) interface 35 is also connected to the bus 34.
Various components in electronic device 30 are connected to I/O interface 35, including: an input unit 36 such as a keyboard, a mouse, etc.; an output unit 37 such as various types of displays, speakers, and the like; a storage unit 38 such as a magnetic disk, an optical disk, or the like; and a communication unit 39 such as a network card, modem, wireless communication transceiver, etc. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 31 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 31 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, digital signal processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 31 performs the respective methods and processes described above, such as the method for segmenting an automatic driving image.
In some embodiments, the method for segmenting an automatic driving image may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into the RAM 33 and executed by the processor 31, one or more steps of the above-described method for segmenting an automatic driving image may be performed. Alternatively, in other embodiments, the processor 31 may be configured to perform the method for segmenting an automatic driving image in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS (virtual private server) services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for segmenting an automatic driving image, comprising:
acquiring an automatic driving image, inputting the automatic driving image into a pre-trained residual neural network model, and acquiring an initial feature map corresponding to the automatic driving image output by the residual neural network model;
inputting the initial feature map into a pre-trained channel attention network model, and acquiring a feature vector corresponding to the initial feature map output by the channel attention network model; the channel attention network model is established based on a global average pooling algorithm and a global maximum pooling algorithm;
multiplying the initial feature map by the feature vector to obtain a weighted feature map, and dividing the automatic driving image into a plurality of sub-images according to the weighted feature map.
2. The method of claim 1, wherein the residual neural network model comprises a preset number of residual blocks, each residual block comprising a first convolution layer and a second convolution layer, the first and second convolution layers having a convolution kernel size of 1×1.
3. The method of claim 2, wherein inputting the automatic driving image into a pre-trained residual neural network model and obtaining an initial feature map corresponding to the automatic driving image output by the residual neural network model comprises:
inputting the current image characteristic into a first convolution layer of a current residual block to obtain a first convolution characteristic, and carrying out nonlinear characteristic mapping on the first convolution characteristic through a preset activation function to obtain an intermediate characteristic;
inputting the intermediate feature to a second convolution layer of the current residual block to obtain a second convolution feature, and adding the second convolution feature to the current image feature to obtain a target feature;
and carrying out nonlinear feature mapping on the target features through a preset activation function to obtain a current feature map output by the current residual block.
4. The method according to claim 3, wherein inputting the automatic driving image into a pre-trained residual neural network model and obtaining an initial feature map corresponding to the automatic driving image output by the residual neural network model comprises:
obtaining, through the current residual block, a current feature map y according to the formula y = σ(F(x, a) + x), wherein σ(·) denotes a preset activation function, F(·) denotes a residual function, x denotes the current image feature, and a denotes the convolution layer weight parameters.
5. The method of claim 1, wherein inputting the initial feature map to a pre-trained channel attention network model and obtaining feature vectors corresponding to the initial feature map output by the channel attention network model comprises:
processing the initial feature map through a global average pooling algorithm to obtain a first pooling vector, and processing the initial feature map through a global maximum pooling algorithm to obtain a second pooling vector;
inputting the first pooling vector and the second pooling vector into a multi-layer perceptron, and obtaining a first mapping vector and a second mapping vector which are output by the multi-layer perceptron;
and adding the first mapping vector and the second mapping vector to obtain a superposition vector, and processing the superposition vector through a preset activation function to obtain a feature vector corresponding to the initial feature map.
6. The method of claim 5, wherein processing the initial feature map by a global averaging pooling algorithm to obtain a first pooled vector comprises:
according to the formulaCalculating a first pooling vector AvgPool, wherein H and W respectively represent the height and width of an image channel, N represents an image channel index, N and M represent the size of an initial feature map, and c i,j Representing the feature values in the initial feature map.
7. The method of claim 6, wherein processing the initial feature map by a global maximization algorithm to obtain a second pooled vector comprises:
according to the formulaA second pooling vector MaxPool is calculated, where Max (·) represents a Max-taking operation.
8. An apparatus for segmenting an automatic driving image, comprising:
the initial feature map acquisition module is used for acquiring an automatic driving image, inputting the automatic driving image into a pre-trained residual neural network model and acquiring an initial feature map corresponding to the automatic driving image output by the residual neural network model;
the feature vector acquisition module is used for inputting the initial feature map to a pre-trained channel attention network model and acquiring a feature vector corresponding to the initial feature map output by the channel attention network model; the channel attention network model is established based on a global average pooling algorithm and a global maximum pooling algorithm;
and the weighted feature map acquisition module is used for multiplying the initial feature map and the feature vector to acquire a weighted feature map and dividing the automatic driving image into a plurality of sub-images according to the weighted feature map.
9. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program enabling the at least one processor to perform the method for segmenting an automatic driving image according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions for causing a processor to perform the method for segmenting an automatic driving image according to any one of claims 1-7.
CN202310709950.0A 2023-06-15 2023-06-15 Automatic driving image segmentation method, device, equipment and storage medium Pending CN116740355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310709950.0A CN116740355A (en) 2023-06-15 2023-06-15 Automatic driving image segmentation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310709950.0A CN116740355A (en) 2023-06-15 2023-06-15 Automatic driving image segmentation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116740355A (en) 2023-09-12

Family

ID=87916433

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310709950.0A Pending CN116740355A (en) 2023-06-15 2023-06-15 Automatic driving image segmentation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116740355A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994002A (en) * 2023-09-25 2023-11-03 杭州安脉盛智能技术有限公司 Image feature extraction method, device, equipment and storage medium
CN116994002B (en) * 2023-09-25 2023-12-19 杭州安脉盛智能技术有限公司 Image feature extraction method, device, equipment and storage medium
CN117788836A (en) * 2024-02-23 2024-03-29 中国第一汽车股份有限公司 Image processing method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN116740355A (en) Automatic driving image segmentation method, device, equipment and storage medium
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN113947188A (en) Training method of target detection network and vehicle detection method
CN115358392A (en) Deep learning network training method, text detection method and text detection device
CN114202648B (en) Text image correction method, training device, electronic equipment and medium
CN111932530B (en) Three-dimensional object detection method, device, equipment and readable storage medium
CN117036457A (en) Roof area measuring method, device, equipment and storage medium
CN114882313B (en) Method, device, electronic equipment and storage medium for generating image annotation information
CN116363444A (en) Fuzzy classification model training method, fuzzy image recognition method and device
CN115937537A (en) Intelligent identification method, device and equipment for target image and storage medium
CN115761698A (en) Target detection method, device, equipment and storage medium
CN111815658B (en) Image recognition method and device
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN117333487B (en) Acne classification method, device, equipment and storage medium
CN116012873B (en) Pedestrian re-identification method and device, electronic equipment and storage medium
CN114581746B (en) Object detection method, device, equipment and medium
CN112633276B (en) Training method, recognition method, device, equipment and medium
CN116229209B (en) Training method of target model, target detection method and device
CN114926447B (en) Method for training a model, method and device for detecting a target
CN117333873A (en) Instance segmentation method and device, electronic equipment and storage medium
CN117746013A (en) Label detection method, device, equipment and storage medium
CN114445690A (en) License plate detection method, model training method, device, medium, and program product
CN117808829A (en) Tooth segmentation method and device, storage medium and electronic equipment
CN117593713A (en) BEV time sequence model distillation method, device, equipment and medium
CN116385775A (en) Image tag adding method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination