CN113128386B - Obstacle recognition method, obstacle recognition device and electronic equipment - Google Patents


Info

Publication number
CN113128386B
CN113128386B (application number CN202110394670.6A)
Authority
CN
China
Prior art keywords
feature map
image
target
training
convolution
Prior art date
Legal status
Active
Application number
CN202110394670.6A
Other languages
Chinese (zh)
Other versions
CN113128386A (en)
Inventor
高翔
何洪刚
王磊
方昌銮
黄凯明
Current Assignee
Streamax Technology Co Ltd
Original Assignee
Streamax Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Streamax Technology Co Ltd filed Critical Streamax Technology Co Ltd
Priority to CN202110394670.6A
Publication of CN113128386A
Application granted
Publication of CN113128386B
Legal status: Active

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The application discloses an obstacle recognition method, an obstacle recognition device, an electronic device and a computer-readable storage medium. The obstacle recognition method comprises the following steps: acquiring a road image acquired by a preset camera, wherein the preset camera is installed on a target vehicle; inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set combined with an attention mechanism, and the training sample set comprises at least one training image and a mask image associated with each training image; and if a segmentation result output by the semantic segmentation model is obtained, mapping the segmentation result back to the road image so as to extract obstacle information contained in the road image. Through the scheme of the application, obstacles appearing on a road can be recognized in a timely and accurate manner.

Description

Obstacle recognition method, obstacle recognition device and electronic equipment
Technical Field
The application belongs to the technical field of image processing, and particularly relates to an obstacle recognition method, an obstacle recognition device, electronic equipment and a computer readable storage medium.
Background
As the number of vehicles on the road grows, road safety is an increasing concern. Driver behavior varies widely, so garbage may be thrown from vehicles at any time; in addition, vehicles may shed the objects they transport when jolted on rough roads. At present, most cities still rely on sanitation workers actively patrolling to find obstacles on the road. Limited by traffic conditions and the patrol speed of the sanitation workers, obstacles cannot be recognized in a timely and accurate manner, leaving hidden dangers for road safety.
Disclosure of Invention
The application provides an obstacle recognition method, an obstacle recognition device, an electronic device and a computer-readable storage medium, which can recognize obstacles appearing on a road in a timely and accurate manner.
In a first aspect, the present application provides a method for identifying an obstacle, including:
acquiring a road image acquired by a preset camera, wherein the preset camera is installed on a target vehicle;
inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set and an attention mechanism, and the training sample set comprises at least one training image and mask images associated with each training image;
and if the segmentation result output by the semantic segmentation model is obtained, mapping the segmentation result back to the road image so as to extract obstacle information contained in the road image.
In a second aspect, the present application provides an obstacle recognition device, comprising:
the acquisition unit is used for acquiring road images acquired by a preset camera, wherein the preset camera is arranged on a target vehicle;
the segmentation unit is used for inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set combined with an attention mechanism, and the training sample set comprises at least one training image and a mask image associated with each training image;
and a mapping unit configured to map the segmentation result back to the road image to extract obstacle information included in the road image, if the segmentation result output by the semantic segmentation model is obtained.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the method of the first aspect described above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by one or more processors, implements the steps of the method of the first aspect described above.
Compared with the prior art, the beneficial effects of the present application are as follows: while the vehicle is running, the camera mounted on the vehicle photographs the road, and the electronic device can then segment the captured road image through the semantic segmentation model, so that obstacle information is extracted from the road image and obstacles that may appear on the road are found in time. Because the semantic segmentation model is trained based on a training sample set combined with an attention mechanism, it gives greater weight to the obstacle information present in the training images, and can therefore recognize obstacle information more quickly when facing a road image that contains it, taking both recognition accuracy and recognition timeliness into account. It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments or in the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic implementation flow chart of an obstacle recognition method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a training flow of a semantic segmentation model according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a training process for a semantic segmentation model provided by an embodiment of the present application;
FIG. 4 is an exemplary diagram of a training image provided by an embodiment of the present application;
FIG. 5 is an exemplary diagram of a heat map provided by an embodiment of the present application;
fig. 6 is a block diagram of a structure of an obstacle identifying apparatus provided in an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical solutions proposed in the present application, the following description is made by specific embodiments.
Referring to fig. 1, fig. 1 shows an obstacle identifying method provided in an embodiment of the present application, and details are as follows:
step 101, acquiring a road image acquired by a preset camera.
In this embodiment of the present application, a camera, that is, a preset camera, may be set on the vehicle in advance, so that road images can be acquired by the preset camera. For ease of illustration, the vehicle may be referred to as the target vehicle. By way of example only, the camera may be mounted at the front of the vehicle body (e.g., at the front windshield of the target vehicle) or at the rear of the vehicle body (e.g., at the rear windshield of the target vehicle). If the camera is mounted at the front of the vehicle body, it faces the front of the target vehicle and can acquire road images in front of the target vehicle; if the camera is mounted at the rear of the vehicle body, it faces the rear of the target vehicle and can acquire road images behind the target vehicle. It should be noted that wherever the camera is mounted, it should not be obstructed by other objects.
The preset camera can report the road images it acquires to the electronic device with low latency, and the electronic device executes the steps provided by the embodiments of the present application to detect obstacles on the road. The electronic device can be arranged on the target vehicle and use the power supply of the target vehicle; alternatively, the electronic device may be deployed in the cloud, which is not limited herein.
In some embodiments, the preset camera may be started to operate immediately after the ignition of the target vehicle is started; alternatively, the preset camera may start to operate when the target vehicle is in a driving state, that is, the electronic device may acquire the road image acquired by the preset camera when the target vehicle is in the driving state.
Step 102, inputting the road image into a trained semantic segmentation model.
In the embodiment of the application, the electronic device may analyze the road image acquired by the preset camera based on a preset semantic segmentation model to determine whether an obstacle has been photographed in the road image. In order to improve the processing timeliness of the semantic segmentation model, the semantic segmentation model is trained based on a training sample set combined with an attention mechanism.
The training sample set comprises at least one training image and a mask image associated with each training image. That is, the training sample set includes at least one image pair, each pair consisting of one training image and the mask image corresponding to that training image. Considering that the obstacle recognition method in the embodiment of the present application is mainly used for recognizing obstacles on a road, possible obstacle situations on a road can first be simulated, and the simulated scenes photographed to obtain a plurality of training images. Then, for each training image, a mask image is obtained based on the marked obstacle region: the pixel values of pixels in the obstacle region are marked 1, and the pixel values of pixels in the background region (i.e., the region outside the obstacle region) are marked 0, so that a mask image uniquely corresponding to each training image is obtained. Of course, an image acquired by the preset camera while the target vehicle is driving can also be used as a training image, with the corresponding mask image obtained by manual marking, thereby forming a new image pair and enriching the training sample set.
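By way of illustration only, the marking described above might be implemented as in the following Python sketch, assuming obstacle regions are annotated as polygons; the annotation format and the function name are hypothetical, not part of the patent.

```python
import cv2
import numpy as np

def make_mask_image(image_hw, obstacle_polygons):
    """Build the binary mask image for one training image: pixels inside a
    marked obstacle region are set to 1, background pixels to 0."""
    mask = np.zeros(image_hw, dtype=np.uint8)
    for poly in obstacle_polygons:  # each polygon: an N x 2 list of (x, y) points (assumed format)
        cv2.fillPoly(mask, [np.asarray(poly, dtype=np.int32)], 1)
    return mask
```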
Step 103, if the segmentation result output by the semantic segmentation model is obtained, mapping the segmentation result back to the road image to extract the obstacle information contained in the road image.
In the embodiment of the application, the semantic segmentation model is used only to segment obstacle regions from the image. Thus, for an image with no obstacle information (e.g., a road image obtained by photographing a clean road), the semantic segmentation model regards every pixel in the image as background, and no segmentation result is output. On this basis, only when the semantic segmentation model outputs a segmentation result is the road image considered to contain an obstacle region. In practice, the segmentation result output by the semantic segmentation model is also represented in the form of a mask, i.e., the pixel values of pixels in the obstacle region are marked 1 and the pixel values of pixels in the background region are marked 0. Mapping the segmentation result back into the original road image therefore means multiplying the pixel value of each pixel in the segmentation result by the pixel value of the pixel at the corresponding position in the road image; in this way the pixels of the obstacle region are extracted from the road image, yielding the obstacle information contained in the road image. For ease of understanding, this process can be thought of as a matting operation.
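A minimal sketch of this matting step, assuming a binary H×W segmentation mask and an H×W×3 road image (the array and function names are illustrative):

```python
import numpy as np

def map_segmentation_back(road_image, seg_mask):
    """Multiply each pixel of the road image by the mask value (1 in the
    obstacle region, 0 in the background) at the same position, so that
    only the obstacle pixels survive."""
    return road_image * seg_mask[..., None]  # broadcast the mask over the color channels
```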
In some embodiments, the number of preset cameras is not limited; for example, cameras may be installed at both the front and the rear of the body of the target vehicle. For convenience of explanation, the camera mounted at the front of the body may be denoted the first camera, the camera mounted at the rear the second camera, the road image collected by the first camera the first road image, and the image collected by the second camera the second road image. If no obstacle information exists in the first road image but obstacle information exists in the second road image matched with it, it can be determined that the target vehicle has thrown an obstacle, and a reminder message may be output to the target vehicle.
Since the first road image shows the scene in front of the target vehicle, and vehicles generally travel forward, an obstacle detected in the first road image can be assumed not to have been thrown by the target vehicle. Conversely, consider a clean area of road: before the target vehicle reaches it, the area lies ahead of the vehicle, so the first camera captures a first road image of the area that contains no obstacle. If, while the target vehicle passes over the area, something is thrown (for example, the driver throws garbage out of the window, or objects transported by the vehicle are scattered), an obstacle appears in the area. As the target vehicle continues forward, the area falls behind it, and the second camera captures a second road image of the same area, which now contains the obstacle. Based on this process, when no obstacle appears in the first road image but an obstacle appears in the matched second road image, it can be concluded that the target vehicle has thrown an obstacle, and a reminder message may be output to the target vehicle.
It should be noted that the first camera and the second camera obviously cannot photograph the same area at the same moment. Typically the first camera captures a first road image containing a certain area, and only after a preset period, once the target vehicle has travelled some distance, does the second camera capture a second road image containing the same area. A matched first road image and second road image (i.e., a pair whose photographed object is the same area) are therefore separated by a time delay, and that delay is determined by the speed of the target vehicle: the faster the vehicle, the smaller the delay; the slower the vehicle, the greater the delay.
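Purely to illustrate this relationship, the sketch below pairs the two image streams by a speed-dependent delay and applies the decision rule of the preceding paragraphs; the camera-gap parameter and both function names are assumptions:

```python
def matching_delay_s(camera_gap_m: float, vehicle_speed_mps: float) -> float:
    """Delay between the front-camera and rear-camera shots of the same ground
    area: the faster the target vehicle, the smaller the delay. camera_gap_m is
    the (assumed, calibrated) distance the vehicle travels between the two shots."""
    return camera_gap_m / max(vehicle_speed_mps, 0.1)  # guard against near-zero speed

def throwing_detected(first_has_obstacle: bool, second_has_obstacle: bool) -> bool:
    """Decision rule for a matched image pair: the front view of the area is
    clean, but the rear view of the same area contains an obstacle."""
    return (not first_has_obstacle) and second_has_obstacle
```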
The following describes the training of the semantic segmentation model: the semantic segmentation model comprises at least one convolution-pooling structure, wherein the convolution-pooling structure comprises at least one convolution layer and one pooling layer; in the training process of the semantic segmentation model, the obstacle recognition method further comprises, for each convolution-pooling structure:
step 201, obtaining a first feature map output by the convolution-pooling structure.
In the embodiment of the present application, if the convolution-pooling structure is the first convolution-pooling structure in the semantic segmentation model, its input is the training image used in the current training iteration; otherwise, its input is the feature map passed on from the output of the previous convolution-pooling structure. For convenience of explanation, the training image used in the current iteration is referred to in this embodiment as the target training image.
For any convolution-pooling structure, the input first undergoes a convolution operation through the structure's convolution layer(s), and the result of the convolution is then pooled by the pooling layer, producing the feature map output by the structure. It should be noted that the pooling operation performed by the pooling layer is usually max pooling. For ease of illustration, the embodiment of the present application refers to the feature map output by the convolution-pooling structure as the first feature map. In fact, no matter where a convolution-pooling structure is located in the semantic segmentation model, the first feature map it outputs is obtained directly or indirectly from the target training image.
Step 202, splicing the first feature map with the target mask image to obtain a spliced feature map.
In the embodiment of the application, the attention mechanism is adopted mainly so that the semantic segmentation model pays more attention to the obstacle information in the image; that is, relative to the background information in the image, the obstacle information is the object to be detected. On this basis, the electronic device may obtain a target mask image from the training sample set, where the target mask image is the mask image associated with the target training image. The target mask image and the first feature map are joined by a direct splicing operation, and the resulting new image may be denoted the spliced feature map.
In some embodiments, when splicing the first feature map with the target mask image, the electronic device first adjusts the target mask image to a target size, where the target size is the size of the first feature map; that is, the sizes of the first feature map and the target mask image are unified. The target mask image is then copied so that the number of target mask images equals the number of channels of the first feature map, and finally the first feature map is spliced with all the target mask images in the channel dimension to obtain the spliced feature map. The spliced feature map obtained through this process keeps the target size, while its number of channels becomes twice that of the first feature map. By way of example only, the electronic device may scale the target mask image using a linear interpolation operation to unify the sizes of the first feature map and the target mask image; the interpolation method is not limited herein.
Step 203, fusing the spliced feature map to obtain a second feature map.
In this embodiment of the present application, the splicing operation alone does not let the target mask image act directly on the first feature map. The intent is for the information in the target mask image to influence the first feature map directly, so that the semantic segmentation model pays relatively more attention to the data related to obstacle information in the first feature map and relatively ignores the data related to background information. On this basis, the electronic device fuses the spliced feature map, i.e., fuses together the data of the pixels at the same position across the multiple channels, to obtain the second feature map.
In some embodiments, the electronic device may use a 1×1 convolution kernel to fuse the spliced feature images in the channel dimension, so as to obtain the second feature image.
Step 204, transferring the second feature map to the network layer that follows the convolution-pooling structure in the semantic segmentation model.
In the embodiment of the present application, after the second feature map is obtained, it may be passed through an activation function to the next network layer and used as that layer's input; alternatively, the second feature map may first undergo average pooling for a further dimension reduction, with the reduced second feature map then passed through an activation function to the next network layer as its input, which is not limited herein. If the current convolution-pooling structure is not the last one, the next network layer is typically a convolution layer (the first convolution layer of the next convolution-pooling structure); if the current convolution-pooling structure is the last one, the next network layer is typically the fully connected layer.
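The following PyTorch sketch puts steps 201 to 204 together for one convolution-pooling structure. It is a minimal illustration, assuming bilinear interpolation for the resize, ReLU as the activation function, and an output channel count C'' equal to the input channel count C'; none of these choices is fixed by the patent, and the class name is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskAttentionFusion(nn.Module):
    """Illustrative attention block appended after one convolution-pooling
    structure: splice the first feature map with the target mask image,
    fuse with a 1x1 convolution, and pass through an activation function."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 convolution that fuses the 2*C'-channel spliced map down to
        # C'' channels (assumed here to equal C').
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.ReLU()

    def forward(self, first_feature_map, target_mask):
        # first_feature_map: (B, C', H', W'); target_mask: (B, 1, H, W), values in {0, 1}.
        b, c, h, w = first_feature_map.shape
        # Step 202a: adjust the target mask image to the target size (the
        # size of the first feature map), e.g. by linear interpolation.
        mask = F.interpolate(target_mask.float(), size=(h, w), mode="bilinear",
                             align_corners=False)
        # Step 202b: copy the mask to match the channel count of the first
        # feature map, then splice in the channel dimension -> 2*C' channels.
        spliced = torch.cat([first_feature_map, mask.expand(b, c, h, w)], dim=1)
        # Step 203: fuse per-position data across channels with the 1x1 convolution.
        second_feature_map = self.fuse(spliced)
        # Step 204: activation, then on to the next network layer.
        return self.act(second_feature_map)
```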
It will be appreciated that the embodiment of the present application actually improves the training process of an existing semantic segmentation model, incorporating the attention mechanism by adding the above steps 201-204 after each convolution-pooling structure. During training, the semantic segmentation model finally outputs a training result (in mask form) corresponding to the target training image; the training result is compared with the target mask image to calculate a loss, and the parameters of the semantic segmentation model are adjusted based on the loss until the loss converges, at which point training ends and the trained semantic segmentation model is ready for application.
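A minimal sketch of one such training iteration, assuming the model takes the target mask image as a second input (since every convolution-pooling structure splices it in during training) and using binary cross-entropy as a stand-in for the loss, which the patent does not specify:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, target_training_image, target_mask_image):
    """One parameter update; repeat over the training sample set until the loss converges."""
    optimizer.zero_grad()
    # Training result in mask form: per-pixel obstacle probability in [0, 1].
    pred_mask = model(target_training_image, target_mask_image)
    # Compare the training result with the target mask image to obtain the loss.
    loss = F.binary_cross_entropy(pred_mask, target_mask_image.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```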
The above procedure is described below by way of a simple specific example:
referring to fig. 3, it is assumed that the semantic segmentation model has a plurality of convolution-pooling structures, namely convolution-pooling structures 1, 2, … …, n, respectively.
For convolution-pooling structure 1, the input is a target training image of size H×W (e.g., 300×300), where H is the height and W the width. After passing through convolution layer 11, convolution layer 12 and pooling layer 13 in convolution-pooling structure 1, a first feature map F11 of size H′×W′×C′ (for example 32×32×64) is obtained, where C′ is the number of channels.
The H×W target mask image is adjusted to the size H′×W′ and then copied, giving C′ target mask images of size H′×W′; these C′ target mask images are spliced with the first feature map F11 of H′×W′×C′ in the channel dimension to obtain a spliced feature map F12 of H′×W′×(C′+C′) (i.e., H′×W′×2C′).
The spliced feature map F12 of H′×W′×(C′+C′) is fused by a 1×1 convolution kernel, which on the one hand reduces the computation required by the subsequent parts of the semantic segmentation model, and on the other hand performs a channel dimension reduction, fusing F12 in the channel dimension into the second feature map F13 of H′×W′×C″.
The second feature map F13 of H′×W′×C″ is passed through an activation function to convolution-pooling structure 2 and used as its input; the subsequent operation is substantially the same as for convolution-pooling structure 1 and is not repeated here.
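Tying the worked example to the MaskAttentionFusion sketch above (shapes only; the tensors are random placeholders):

```python
import torch

f11 = torch.randn(1, 64, 32, 32)              # first feature map F11: H' x W' x C' = 32 x 32 x 64
mask = torch.randint(0, 2, (1, 1, 300, 300))  # target mask image at the original H x W = 300 x 300
block = MaskAttentionFusion(channels=64)
f13 = block(f11, mask)                        # second feature map F13, input to convolution-pooling structure 2
print(f13.shape)                              # torch.Size([1, 64, 32, 32])
```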
In some embodiments, the value at each position of the second feature map actually represents the degree of attention at that position. After the second feature map is obtained, the feature map under a random channel of the second feature map can therefore be extracted as the feature map to be displayed, and a heat map generated from it. Mathematically, this amounts to converting a three-dimensional feature map (the second feature map of H′×W′×C″) into the matrix corresponding to one two-dimensional slice (the feature map to be displayed, of H′×W′). Generating the heat map from the feature map to be displayed may specifically be: adjusting the feature map to be displayed to the original size, where the original size is the size of the target training image, and then generating the heat map from the adjusted feature map to be displayed. For a pixel with coordinates (x, y) in the adjusted feature map to be displayed, the larger its pixel value, the hotter the point at the same position (i.e., coordinates (x, y)) in the generated heat map.
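A sketch of this visualization, assuming bilinear resizing and min-max normalization; a colormap step (e.g. OpenCV's cv2.applyColorMap) would typically follow, and all names here are illustrative:

```python
import random
import torch.nn.functional as F

def heat_map(second_feature_map, original_hw):
    """second_feature_map: (1, C'', H', W'); original_hw: (H, W) of the target
    training image. Returns an H x W map where larger values are 'hotter'."""
    # Extract the feature map under a random channel as the map to display.
    idx = random.randrange(second_feature_map.shape[1])
    to_display = second_feature_map[:, idx:idx + 1]
    # Adjust the feature map to be displayed to the original size.
    resized = F.interpolate(to_display, size=original_hw, mode="bilinear",
                            align_corners=False)[0, 0]
    # Min-max normalize so that larger pixel values map to higher heat.
    return (resized - resized.min()) / (resized.max() - resized.min() + 1e-8)
```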
In some embodiments, the semantic segmentation model typically adopts a UNet network structure. The deeper the convolution-pooling structure, the more faithfully the second feature map obtained from it describes the degree of attention over the target training image. That is, among the convolution-pooling structures 1, 2, 3, ..., n of the semantic segmentation model, the heat map generated from the second feature map of convolution-pooling structure n is relatively the best.
For ease of understanding, reference is made to figs. 4 and 5: fig. 4 gives an example of a training image, and fig. 5 gives an example of a heat map based on the training image of fig. 4.
It will be appreciated that other deep learning models may also be trained in the above manner combined with the attention mechanism. For example, a terminal may be provided on the vehicle that performs a preliminary check, through a target detection model, of whether the road image contains obstacle information; when the terminal preliminarily confirms obstacle information through the target detection model, it reports the road image to the cloud, and the cloud further checks, through the semantic segmentation model, whether the road image contains obstacle information. Either the target detection model adopted by the terminal or the semantic segmentation model adopted by the cloud can be trained in combination with the attention mechanism.
From the above, according to the embodiment of the present application, while the vehicle is running, the camera mounted on the vehicle photographs the road, and the electronic device can then segment the captured road image through the semantic segmentation model, so that obstacle information is extracted from the road image and obstacles that may appear on the road are found in time. Because the semantic segmentation model is trained based on a training sample set combined with an attention mechanism, it gives greater weight to the obstacle information present in the training samples, and can therefore recognize obstacle information more quickly when facing a road image that contains it, taking both recognition accuracy and recognition timeliness into account.
Corresponding to the obstacle recognition method proposed in the foregoing, the embodiment of the application provides an obstacle recognition device, where the obstacle recognition device is integrated in an electronic device. Referring to fig. 6, an obstacle identifying apparatus 600 in the embodiment of the present application includes:
an obtaining unit 601, configured to obtain a road image collected by a preset camera, where the preset camera is installed on a target vehicle;
a segmentation unit 602, configured to input the road image into a trained semantic segmentation model, where the semantic segmentation model is obtained by training based on a training sample set in combination with an attention mechanism, and the training sample set includes at least one training image and mask images associated with each training image;
and a mapping unit 603 configured to, when obtaining the segmentation result output by the semantic segmentation model, map the segmentation result back to the road image to extract obstacle information included in the road image.
Optionally, the semantic segmentation model includes at least one convolution-pooling structure, and the convolution-pooling structure includes at least one convolution layer and one pooling layer; in the training process of the semantic segmentation model, the obstacle recognition device 600 further includes, for each convolution-pooling structure:
a feature map obtaining unit, configured to obtain a first feature map output by the convolution-pooling structure;
the feature map stitching unit is used for stitching the first feature map with a target mask image to obtain a stitched feature map, wherein the target mask image is a mask image associated with a target training image, and the first feature map is obtained based on the target training image;
the feature map fusion unit is used for fusing the spliced feature maps to obtain a second feature map;
and the feature map transmitting unit is used for transmitting the second feature map to the network layer that follows the convolution-pooling structure in the semantic segmentation model.
Optionally, the feature map stitching unit includes:
an adjustment subunit, configured to adjust the target mask image to a target size, where the target size is a size of the first feature map;
a copying subunit configured to copy the target mask images so that the number of the target mask images is the same as the number of channels of the first feature map;
and the splicing subunit is used for splicing the first feature map with all the target mask images in the channel dimension to obtain a spliced feature map, wherein the size of the spliced feature map is the target size, and the number of channels of the spliced feature map is twice the number of channels of the first feature map.
Optionally, the feature map fusion unit is specifically configured to use a 1×1 convolution kernel to fuse the spliced feature map in a channel dimension, so as to obtain a second feature map.
Optionally, the obstacle identifying apparatus 600 further includes:
the to-be-displayed feature map obtaining unit is used for obtaining a to-be-displayed feature map, wherein the to-be-displayed feature map is a feature map of a random channel in the second feature map;
and the heat map generating unit is used for generating a heat map based on the feature map to be displayed.
Optionally, the heat map generating unit includes:
a size adjustment subunit, configured to adjust the feature map to be displayed to an original size, where the original size is the size of the target training image;
and the heat map generating subunit, configured to generate a heat map based on the adjusted feature map to be displayed.
From the above, according to the embodiment of the present application, while the vehicle is running, the camera mounted on the vehicle photographs the road, and the electronic device can then segment the captured road image through the semantic segmentation model, so that obstacle information is extracted from the road image and obstacles that may appear on the road are found in time. Because the semantic segmentation model is trained based on a training sample set combined with an attention mechanism, it gives greater weight to the obstacle information present in the training samples, and can therefore recognize obstacle information more quickly when facing a road image that contains it, taking both recognition accuracy and recognition timeliness into account.
The embodiment of the application further provides an electronic device. Referring to fig. 7, the electronic device 7 in the embodiment of the application includes: a memory 701, one or more processors 702 (only one is shown in fig. 7) and a computer program stored in the memory 701 and executable on the processors. The memory 701 is used for storing software programs and units, and the processor 702 executes various functional applications and data processing by running the software programs and units stored in the memory 701. Specifically, the processor 702 implements the following steps by running the above computer program stored in the memory 701:
acquiring a road image acquired by a preset camera, wherein the preset camera is installed on a target vehicle;
inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set and an attention mechanism, and the training sample set comprises at least one training image and mask images associated with each training image;
and if the segmentation result output by the semantic segmentation model is obtained, mapping the segmentation result back to the road image so as to extract obstacle information contained in the road image.
Assuming that the above is a first possible implementation, in a second possible implementation provided by the first possible implementation, the semantic segmentation model includes at least one convolution-pooling structure, the convolution-pooling structure including at least one convolution layer and one pooling layer; during the training of the semantic segmentation model, the processor 702, by running the above computer program stored in the memory 701, further implements the following steps for each convolution-pooling structure:
acquiring a first characteristic diagram output by the convolution-pooling structure;
splicing the first feature map and a target mask image to obtain a spliced feature map, wherein the target mask image is a mask image associated with a target training image, and the first feature map is obtained based on the target training image;
fusing the spliced feature images to obtain a second feature image;
delivering the second feature map to a next network layer in the semantic segmentation model for the convolution-pooling structure.
In a third possible implementation manner provided by the second possible implementation manner, the stitching the first feature map with the target mask image includes:
adjusting the target mask image to a target size, wherein the target size is the size of the first feature map;
duplicating the target mask images so that the number of the target mask images is the same as the number of channels of the first feature map;
and in the channel dimension, splicing the first feature map and all the target mask images to obtain a spliced feature map, wherein the size of the spliced feature map is the target size, and the number of channels of the spliced feature map is twice the number of channels of the first feature map.
In a fourth possible implementation manner provided by the second possible implementation manner, the fusing the spliced feature map to obtain a second feature map includes:
and fusing the spliced feature map in the channel dimension by using a 1×1 convolution kernel, so as to obtain a second feature map.
In a fifth possible implementation manner provided by the second possible implementation manner, after the spliced feature map is fused to obtain the second feature map, the processor 702 further implements the following steps when executing the computer program stored in the memory 701:
acquiring a feature map to be displayed, wherein the feature map to be displayed is a feature map of a random channel in the second feature map;
generating a heat map based on the feature map to be displayed.
In a sixth possible implementation manner provided by the fifth possible implementation manner, the generating a heat map based on the feature map to be displayed includes:
adjusting the feature map to be displayed to an original size, wherein the original size is the size of the target training image;
and generating a heat map based on the adjusted feature map to be displayed.
It should be appreciated that in embodiments of the present application, the processor 702 may be a central processing unit (CPU), but may also be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Memory 701 may include read only memory and random access memory, and provides instructions and data to processor 702. Some or all of memory 701 may also include non-volatile random access memory. For example, the memory 701 may also store information of a device class.
From the above, according to the embodiment of the present application, while the vehicle is running, the camera mounted on the vehicle photographs the road, and the electronic device can then segment the captured road image through the semantic segmentation model, so that obstacle information is extracted from the road image and obstacles that may appear on the road are found in time. Because the semantic segmentation model is trained based on a training sample set combined with an attention mechanism, it gives greater weight to the obstacle information present in the training samples, and can therefore recognize obstacle information more quickly when facing a road image that contains it, taking both recognition accuracy and recognition timeliness into account.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or as combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiments described above are merely illustrative, e.g., the division of modules or units described above is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The integrated units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow of the methods of the above embodiments, which may also be completed by instructing associated hardware through a computer program; the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer readable memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable storage medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. A method of identifying an obstacle, comprising:
acquiring a road image acquired by a preset camera, wherein the preset camera is installed on a target vehicle;
inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set combined with an attention mechanism, and the training sample set comprises at least one training image and a mask image associated with each training image;
if the segmentation result output by the semantic segmentation model is obtained, mapping the segmentation result back to the road image so as to extract obstacle information contained in the road image;
wherein the semantic segmentation model comprises at least one convolution-pooling structure comprising at least one convolution layer and one pooling layer; in the training process of the semantic segmentation model, for each convolution-pooling structure, the obstacle recognition method further comprises:
acquiring a first feature map output by the convolution-pooling structure;
splicing the first feature map and a target mask image to obtain a spliced feature map, wherein the target mask image is a mask image associated with a target training image, and the first feature map is obtained based on the target training image;
fusing the spliced feature map in the channel dimension by using a 1×1 convolution kernel to obtain a second feature map;
the second feature map is passed to a next network layer of the convolution-pooling structure in the semantic segmentation model.
2. The obstacle identifying method as claimed in claim 1, wherein the stitching the first feature map with a target mask image comprises:
adjusting the target mask image to a target size, wherein the target size is the size of the first feature map;
copying the target mask images so that the number of the target mask images is the same as the number of channels of the first feature map;
and in the channel dimension, splicing the first feature map and all the target mask images to obtain a spliced feature map, wherein the size of the spliced feature map is the target size, and the number of channels of the spliced feature map is twice the number of channels of the first feature map.
3. The obstacle identifying method as claimed in claim 1, wherein after the spliced feature map is fused in the channel dimension using a 1×1 convolution kernel to obtain a second feature map, the obstacle identifying method further comprises:
acquiring a feature map to be displayed, wherein the feature map to be displayed is a feature map of a random channel in the second feature map;
and generating a heat map based on the feature map to be displayed.
4. The obstacle recognition method as claimed in claim 3, wherein the generating a heat map based on the feature map to be displayed comprises:
adjusting the feature map to be displayed to an original size, wherein the original size is the size of the target training image;
and generating a heat map based on the adjusted feature map to be displayed.
5. An obstacle recognition device, characterized by comprising:
the acquisition unit is used for acquiring road images acquired by a preset camera, wherein the preset camera is installed on a target vehicle;
the segmentation unit is used for inputting the road image into a trained semantic segmentation model, wherein the semantic segmentation model is obtained by training based on a training sample set combined with an attention mechanism, and the training sample set comprises at least one training image and mask images associated with the training images;
the mapping unit is used for mapping the segmentation result back to the road image if the segmentation result output by the semantic segmentation model is obtained, so as to extract obstacle information contained in the road image;
wherein the semantic segmentation model comprises at least one convolution-pooling structure comprising at least one convolution layer and one pooling layer; in the training process of the semantic segmentation model, for each convolution-pooling structure, the obstacle recognition device further includes:
the feature map acquisition unit is used for acquiring a first feature map output by the convolution-pooling structure;
the feature map stitching unit is used for stitching the first feature map with a target mask image to obtain a stitched feature map, wherein the target mask image is a mask image associated with a target training image, and the first feature map is obtained based on the target training image;
the feature map fusion unit is used for fusing the spliced feature map in the channel dimension by using a 1×1 convolution kernel to obtain a second feature map;
and the feature map transmission unit is used for transmitting the second feature map to the next network layer of the convolution-pooling structure in the semantic segmentation model.
6. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 4 when executing the computer program.
7. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 4.
CN202110394670.6A 2021-04-13 2021-04-13 Obstacle recognition method, obstacle recognition device and electronic equipment Active CN113128386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110394670.6A CN113128386B (en) 2021-04-13 2021-04-13 Obstacle recognition method, obstacle recognition device and electronic equipment

Publications (2)

Publication Number / Publication Date
CN113128386A (en): 2021-07-16
CN113128386B (en): 2024-02-09

Family

ID=76776664

Family Applications (1)

Application Number / Title / Status
CN202110394670.6A: Obstacle recognition method, obstacle recognition device and electronic equipment (Active; granted as CN113128386B)

Country Status (1)

Country Link
CN (1) CN113128386B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936268B (en) * 2021-12-16 2022-04-15 比亚迪股份有限公司 Obstacle detection method for rail vehicle, computer device, and storage medium
CN116434151B (en) * 2023-06-14 2023-08-29 云途信息科技(杭州)有限公司 Pavement foreign matter identification method, device, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805889A (en) * 2018-05-07 2018-11-13 中国科学院自动化研究所 The fining conspicuousness method for segmenting objects of margin guide and system, equipment
CN110443818A (en) * 2019-07-02 2019-11-12 中国科学院计算技术研究所 A kind of Weakly supervised semantic segmentation method and system based on scribble
CN111428726A (en) * 2020-06-10 2020-07-17 中山大学 Panorama segmentation method, system, equipment and storage medium based on graph neural network
CN111951249A (en) * 2020-08-13 2020-11-17 浙江理工大学 Mobile phone light guide plate defect visual detection method based on multitask learning network
CN112016476A (en) * 2020-08-31 2020-12-01 山东大学 Method and system for predicting visual saliency of complex traffic guided by target detection
CN112200226A (en) * 2020-09-27 2021-01-08 北京达佳互联信息技术有限公司 Image processing method based on reinforcement learning, image processing method and related device
CN112424793A (en) * 2020-10-14 2021-02-26 深圳市锐明技术股份有限公司 Object identification method, object identification device and electronic equipment
CN112418176A (en) * 2020-12-09 2021-02-26 江西师范大学 Remote sensing image semantic segmentation method based on pyramid pooling multilevel feature fusion network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10679351B2 (en) * 2017-08-18 2020-06-09 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
US11003945B2 (en) * 2019-05-22 2021-05-11 Zoox, Inc. Localization using semantically segmented images

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation; Fatemehsadat Saleh et al.; arXiv.org; pp. 1-17 *
ConceptMask: Large-Scale Segmentation from Semantic Concepts; Yufei Wang et al.; arXiv.org; pp. 1-32 *
A deep-learning-based segmentation and diagnosis method for large-size pathology images; Wang Yanhong et al.; China Digital Medicine; Vol. 16, No. 3; pp. 80-83 *

Also Published As

Publication number Publication date
CN113128386A (en) 2021-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant