CN112528944A - Image identification method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112528944A (application number CN202011544194.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- pixel
- preset
- exposure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/582—Recognition of traffic signs
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Neural networks; combinations of networks
- G06N3/08—Neural networks; learning methods
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/0002—Image analysis; inspection of images, e.g. flaw detection
- G06T7/10—Image analysis; segmentation; edge detection
- G06V10/10—Image acquisition
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V20/584—Recognition of vehicle lights or traffic lights
- G06V30/1983—Character recognition; syntactic or structural pattern recognition, e.g. symbolic string recognition
- G06T2207/20221—Image fusion; image merging
- G06T2207/30252—Vehicle exterior; vicinity of vehicle
- G06V10/16—Image acquisition using multiple overlapping images; image stitching
Abstract
The image identification method and device, the electronic equipment and the storage medium provided by the embodiments of the invention can acquire at least two groups of single-exposure image information within one frame time, where each group of single-exposure image information is the image information acquired during one exposure; superpose the at least two groups of single-exposure image information; and identify target objects in the superposed image based on the object types that may appear in a preset scene. Because recognition does not have to wait for wide dynamic fusion, the loss of part of the details during the wide dynamic fusion process no longer lowers the recognition accuracy, so the accuracy of image identification can be improved.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
At present, image recognition is widely applied. For example, in the field of automatic driving, traffic signs at intersections can be recognised from images acquired in real time, so that corresponding measures such as steering and braking can be taken. During image recognition, in order to capture the detailed characteristics of both high-illumination and low-illumination areas, a long-exposure long-frame image and a short-exposure short-frame image are fused through wide dynamic fusion, and image recognition is then performed on the fused image.
However, when the illumination in the image is uneven, the wide dynamic fusion of the long-exposure and short-exposure images is affected by the differences between the targets in the image: some targets in the wide dynamic fusion image are still over-exposed or under-exposed after fusion, so part of the detail in the wide dynamic fusion image is lost, and the recognition accuracy is not high enough when image recognition is performed on the wide dynamic fusion image.
Disclosure of Invention
An embodiment of the invention provides an image recognition method, an image recognition device, an electronic device and a storage medium, which are used for improving the accuracy of image recognition. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided an image recognition method, including:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Optionally, the superimposing at least two sets of single-exposure image information includes:
respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and overlapping the data of the same image channel in the data of each image channel.
Optionally, the identifying of the target object is performed on the superimposed image based on an object type that may appear in a preset scene, and the identifying includes:
detecting whether the superimposed images contain traffic light frame images or not;
and after the traffic light frame image is detected, identifying the color and the shape of the traffic light for the traffic light frame image.
Optionally, the identifying of the target object is performed on the superimposed image based on an object type that may appear in a preset scene, and the identifying includes:
and performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the target objects in the superposed image.
Optionally, after performing semantic segmentation on the superimposed image based on object classes that may appear in a preset scene, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further includes:
judging whether a plurality of target objects in the superposed image contain traffic marks or not based on the mask image;
and if the superposed images contain the traffic signs, carrying out image recognition on the images at the corresponding positions in the superposed images based on the positions of the traffic signs in the mask images.
Optionally, the method further includes:
calculating the weight of each pixel point combined with pixel type information according to the target types of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type;
and performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, calculating the weight of each pixel point combined with the pixel type information according to the target types of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type, including:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
Optionally, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information, including:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
In a second aspect of the present invention, there is provided an image recognition apparatus comprising:
the image acquisition module is used for acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one exposure;
the image superposition module is used for superposing at least two groups of single-exposure image information;
and the target identification module is used for identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Optionally, the image overlaying module includes:
the data acquisition sub-module is used for respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
Optionally, the object recognition module includes:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
Optionally, the object recognition module includes:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the plurality of target objects in the superposed image.
Optionally, the apparatus further comprises:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
Optionally, the apparatus further comprises:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, the weight calculating module is specifically configured to, according to the target classes of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
Optionally, the wide dynamic fusion module is specifically configured to combine the weight of the pixel type information based on each pixel point, and according to a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
In a third aspect of an implementation of the present invention, there is provided an electronic device comprising a processor, a memory;
a memory for storing a computer program;
and a processor for implementing any of the image recognition methods described above when executing the computer program stored in the memory.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program is executed by a processor to implement any one of the image recognition methods described above.
The embodiment of the invention has the following beneficial effects:
the image identification method, the image identification device, the electronic equipment and the storage medium provided by the embodiment of the invention can acquire at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one-time exposure; superposing at least two groups of single-exposure image information; and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
In the image recognition process, after at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a first flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a second image recognition method according to an embodiment of the present application;
fig. 3 is a diagram illustrating an example of traffic light detection provided in an embodiment of the present application;
FIG. 4 is a diagram illustrating an example of image recognition provided by an embodiment of the present application;
fig. 5 is a third flowchart illustrating an image recognition method according to an embodiment of the present application;
fig. 6 is a fourth flowchart illustrating an image recognition method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an example of image recognition performed by a computer system in a vehicle according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the accuracy of image recognition, the embodiment of the application provides an image recognition method.
In one embodiment of the present application, the method includes:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
By applying the image identification method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are obtained, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
The following describes in detail the image recognition method provided in the embodiments of the present application with specific embodiments.
In the embodiment of the present application, image recognition may refer to the following: at least two groups of single-exposure image information shot by the same camera for the same scene are obtained, where each group of single-exposure image information is the image information obtained during one exposure, and the targets in the at least two groups of single-exposure image information are then identified. For example:
during the running of a vehicle, a computer system in the vehicle can acquire images around the vehicle through a camera connected to the computer system, and judge whether obstacles exist by analysing the images, so that it can intervene in the driver's driving or control an unmanned vehicle according to the analysis result, for example by taking emergency braking or obstacle avoidance measures. When the computer system in the vehicle acquires images around the vehicle through the camera connected to it, in order to improve the quality of the acquired images, at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superposed, and target objects are identified in the superposed image based on the object types that may appear in the preset scene, so as to judge whether an obstacle exists.
Specifically, referring to fig. 1, fig. 1 is a first schematic flow chart of an image recognition method provided in the embodiment of the present application, including:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The at least two sets of single-exposure image information may be captured for the same scene by the same camera. In practical use, the at least two sets of single-exposure image information may be obtained by a single-frame double-exposure technique or by a multi-frame imaging technique, for example, when obtaining two sets of single-exposure image information, the two sets of single-exposure image information may be a long-frame image and a short-frame image obtained by the single-frame double-exposure technique, respectively.
In this embodiment, the image recognition method may be executed by an intelligent terminal, and the intelligent terminal may be a computer system or a computer in a vehicle of a vehicle, a server, or the like.
Step S12, at least two sets of single-exposure image information are superimposed.
The superposition of the at least two sets of single-exposure image information can be carried out in a channel superposition mode. Optionally, the superimposing at least two sets of single-exposure image information includes: respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information; and overlapping the data of the same image channel in the data of each image channel.
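As an illustration only, the following minimal sketch shows one way such channel-wise superposition could be performed, assuming each group of single-exposure image information has already been decoded into an H x W x C array; the array layout, the use of numpy and the grouping by concatenation are assumptions of this sketch, not details fixed by the embodiment.

```python
import numpy as np

def superpose_exposures(exposure_images):
    """Superpose several single-exposure images channel by channel.

    exposure_images: list of H x W x C arrays, one per single exposure
    within the same frame time (e.g. a long frame and a short frame).
    """
    # Split every exposure into its per-channel data.
    channel_data = [np.split(img, img.shape[2], axis=2) for img in exposure_images]

    # Group the data of the same image channel across exposures,
    # so channel k of every exposure ends up next to each other.
    superposed_channels = []
    num_channels = len(channel_data[0])
    for k in range(num_channels):
        same_channel = [per_image[k] for per_image in channel_data]
        superposed_channels.append(np.concatenate(same_channel, axis=2))

    # Result has C * num_exposures channels and can be fed to the recognition network.
    return np.concatenate(superposed_channels, axis=2)

# Usage: superposed = superpose_exposures([long_frame, short_frame])
```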
The feature extraction performed respectively on each group of single-exposure image information in the at least two groups of single-exposure image information can be implemented with a convolutional neural network, and the training of the convolutional neural network can be completed in a supervised manner. For example, a large number of pictures containing artificial markers are input into the convolutional neural network to be trained, feature extraction is performed on these pictures by the convolutional neural network to be trained to obtain the data of each image channel, and the data are superposed to obtain a superposed image. The superposed image is then compared with the artificial markers, the error of the convolutional neural network to be trained is calculated, the weights of the convolutional neural network to be trained are corrected according to the obtained error, and training is repeated until the calculated error is smaller than a preset error threshold, thereby obtaining the trained convolutional neural network.
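A schematic of the supervised training loop described above, assuming a PyTorch-style model, an L1 loss against the artificial markers and an Adam optimizer; these are illustrative choices rather than requirements of the embodiment.

```python
import torch

def train_feature_extractor(model, loader, epochs=10, error_threshold=0.01):
    """Supervised training sketch: compare the superposed output against the
    manual labels and revise the network weights until the error is small."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.L1Loss()
    for epoch in range(epochs):
        epoch_error = 0.0
        for images, labels in loader:            # images: labelled training pictures
            superposed = model(images)           # feature extraction + channel superposition
            error = loss_fn(superposed, labels)  # error against the artificial markers
            optimizer.zero_grad()
            error.backward()                     # correct the weights according to the error
            optimizer.step()
            epoch_error += error.item()
        if epoch_error / len(loader) < error_threshold:
            break                                # error below the preset threshold: training done
    return model
```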
And step S13, identifying the target object of the superposed image based on the object type which can appear in the preset scene.
The object types that may appear in the preset scene may be types set according to the application scene. For example, when a computer system in a vehicle recognizes images captured during the driving of the vehicle, the types may include five classes: roads, road markings, moving obstacles, traffic signs, and others.
The identification of the target object may be performed on the superimposed image, and the target objects and the target categories of the target objects in the superimposed image may be obtained by performing semantic segmentation on the superimposed image. For example, when detecting whether an object of a preset object class exists in the image, the class of each object is identified and compared with the preset object class, and whether an object of the preset object class exists in the image is thereby determined.
By identifying the target object in the superposed image based on the object types appearing in the preset scene, it can be judged whether there are vehicles, pedestrians and the like in the image, and whether to intervene in or control the driving state of the vehicle can be decided according to the recognition result.
By applying the image identification method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are obtained, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
In some embodiments, in order to further improve the accuracy of the traffic light recognition, when the superimposed image is recognized for a target object based on an object category that may appear in a preset scene, the method further includes recognizing the color and shape of the traffic light for the superimposed image, see fig. 2, where fig. 2 is a second flowchart of the image recognition method provided in the embodiment of the present application, and includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S131, whether the superimposed image contains a traffic light frame image is detected.
As shown in fig. 3, after a long frame image and a short frame image are obtained, feature extraction may optionally be performed first and the feature images then fused in a channel superposition mode or the like to obtain a fused image; alternatively, the long frame image and the short frame image may be directly fused. The traffic light frame in the fused image is then detected to obtain the traffic light frame image, and the color and shape of the traffic light are identified to obtain the recognition result of the current state of the traffic light. When the detection is performed through a preset neural network model, the neural network model may be a YOLO model (a kind of object detection algorithm). For example, the current fused image is input into a previously trained YOLO model, and whether the current fused image contains a traffic light frame image and, if so, the position of the traffic light frame image in the current fused image are determined. The training method of the YOLO model is similar to that of the feature extraction model and is not described here again.
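For illustration, a hedged sketch of this detection step, assuming a hypothetical detector callable that returns (class name, score, box) tuples; this interface is an assumption for the example, not the YOLO API itself.

```python
def detect_traffic_light_frames(fused_image, detector, score_threshold=0.5):
    """Run a YOLO-style detector on the fused image and return the bounding
    boxes of traffic light frames, if any are present."""
    # Assumed detector output: iterable of (class_name, score, (x1, y1, x2, y2)).
    detections = detector(fused_image)
    return [box for cls, score, box in detections
            if cls == "traffic_light_frame" and score >= score_threshold]
```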
Step S132, after the traffic light frame image is detected, the color and the shape of the traffic light are identified for the traffic light frame image.
Specifically, whether the superimposed image contains a traffic light frame image is detected, and when it does, the traffic light frame image is cropped out. The color and shape of the traffic light are then identified from the cropped traffic light frame image to obtain the recognition result of the current state of the traffic light. The recognition result may include a red light, a green light or a yellow light. When the camera is a camera on an unmanned vehicle, the unmanned vehicle can take braking or similar operations according to the recognition result.
The color and the shape of the traffic light can be identified through a pre-trained traffic light identification model; the training method of the traffic light identification model is similar to that of the feature extraction model and is not described in detail here.
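A hedged sketch of this recognition step, assuming a hypothetical classifier callable that maps a cropped traffic light frame image to a state label; the cropping by array slicing is also an assumption of the example.

```python
def recognize_traffic_light_state(fused_image, boxes, classifier):
    """Crop each detected traffic light frame and classify its current state."""
    states = []
    for x1, y1, x2, y2 in boxes:
        crop = fused_image[y1:y2, x1:x2]   # intercepted traffic light frame image
        states.append(classifier(crop))    # e.g. "red", "green" or "yellow"
    return states
```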
In this way, whether the superposed image contains a traffic light frame image is detected, and after the traffic light frame image is detected, the color and the shape of the traffic light are identified from the traffic light frame image. The recognition result of the current state of the traffic light can thus be obtained, so that operations such as braking are performed according to the recognition result, ensuring safety during driving.
In the actual use process, the position of the traffic light frame relative to the camera can be determined based on the superposed images; and storing the traffic light state identification result and the position of the traffic light frame relative to the camera.
Determining the position of the traffic light frame relative to the camera based on the superposed image, and storing the traffic light state recognition result together with that position, makes it convenient to detect the state of the traffic light at the stored position later, improving the efficiency of traffic light detection. For example, when the color and shape of the traffic light have been identified and the current state is recognised as a red light, the fused image at the next moment can be obtained and the color and shape of the traffic light identified directly at the stored position of the traffic light frame relative to the camera, so that the current state is recognised without detecting the traffic light frame again. Moreover, storing the traffic light state recognition result and the position of the traffic light frame relative to the camera makes it convenient for the computer system in the vehicle to intervene or take control according to the detection result: for example, when the distance of the traffic light frame from the camera is detected to be greater than a certain threshold, it can be judged that the vehicle is still far from the traffic light intersection and can continue to run, and when the distance is smaller than the threshold, it can be judged that the vehicle is close to the traffic light intersection and braking measures need to be taken in advance.
Optionally, in step S13, based on the object class that may appear in the preset scene, the identifying of the target object is performed on the superimposed image, where the identifying includes: step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The mask map is an image generated in the image recognition process and used for recognizing each target in the map, the boundary of the target can be labeled by a labeling frame for different targets in the map, and different types of targets can be labeled by different colors. For example, vehicles are labeled green and pedestrians are labeled red.
The above-mentioned semantic segmentation of the superposed image based on the object classes that may appear in the preset scene, generating a mask map that identifies a plurality of target objects and the target classes of those target objects in the superposed image, can be implemented by a preset Encode-Decode convolutional neural network model, as shown in fig. 4, which is an example diagram of image recognition provided in an embodiment of the present application. As shown in fig. 4, after the long frame image and the short frame image are obtained, feature extraction may optionally be performed first and the feature images fused in a channel superposition manner or the like to obtain a fused image, or the long frame image and the short frame image may be directly fused; the fused image is then encoded and decoded for semantic segmentation to generate a mask map. The encoding process performs feature extraction on the current fused image, and the decoding process restores, layer by layer from the feature map, a segmentation map with the same resolution as the input image. Based on the preset object classes, semantic segmentation of the current fused image is realised by the encoding process, and the mask map corresponding to the current fused image, marking the plurality of target objects and their target classes, is generated by the decoding process. The Encode-Decode model can be implemented with a convolutional neural network, and its training can be completed in a supervised manner. For example, a large number of pictures containing artificial markers are input into the convolutional neural network model to be trained, which performs semantic segmentation on them and outputs mask maps; the output is then compared with the artificial markers, the error of the model to be trained is calculated, the weights of the model are revised according to the obtained error, and training is repeated until the calculated error is smaller than a preset error threshold, giving the trained convolutional neural network model.
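As an illustration of the Encode-Decode idea, a minimal PyTorch sketch follows; the layer sizes, channel counts and five-class output are assumptions made for the example, not the network actually trained in the embodiment.

```python
import torch.nn as nn

class EncodeDecodeSegmenter(nn.Module):
    """Minimal encoder-decoder sketch: the encoder extracts features from the
    fused image, the decoder restores a segmentation map at the input resolution."""
    def __init__(self, in_channels=6, num_classes=5):  # e.g. road, marking, obstacle, sign, other
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, fused_image):
        features = self.encoder(fused_image)
        logits = self.decoder(features)
        # The mask map marks, per pixel, the target class with the highest score.
        return logits.argmax(dim=1)
```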
In some embodiments, in order to further improve the accuracy of traffic sign recognition, after semantic segmentation is performed on the superimposed image based on object classes that may appear in a preset scene, and a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image is generated, the method further includes performing traffic sign recognition based on the mask map, referring to fig. 5, where fig. 5 is a third flowchart of an image recognition method provided in an embodiment of the present application, and includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The specific implementation manner of this step may be the same as that in the above embodiment, which may specifically be referred to; details are not described here again.
Step S14 is to determine whether or not the plurality of target objects in the superimposed image include traffic signs based on the mask map.
The traffic sign image in the present application may be an image of a road sign, such as a left turn prohibition, a whistling prohibition, a speed limit of 80km/h, and the like, or may be information played on a display screen for indicating a traffic condition, such as a detour request for a preceding accident. This is not limited in this application.
Whether the multiple target objects in the superimposed image contain a traffic sign is judged based on the mask map as follows: when the mask map marking the multiple target objects in the superimposed image and their target categories is generated, the target categories include the category corresponding to traffic signs, so whether the multiple target objects in the current fused image contain a traffic sign is judged according to the category corresponding to traffic signs identified while generating the mask map. For example, when the mask map corresponding to the current fused image is generated according to the multiple target objects and target categories, each category of target is marked with one color, for example traffic signs are marked red; when judging whether the multiple target objects in the current fused image contain a traffic sign, it is then only necessary to check whether the mask map contains a target marked in red.
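A hedged sketch of this check, assuming the mask map stores one class id per pixel and that a hypothetical class id is reserved for traffic signs; the class id and the bounding-box return value are assumptions of the example.

```python
import numpy as np

TRAFFIC_SIGN_CLASS = 3  # hypothetical class id (or colour code) used for traffic signs in the mask map

def find_traffic_sign(mask_map):
    """Check whether the mask map contains traffic-sign pixels and, if so, return
    the bounding box of that region so the corresponding area of the superposed
    image can be cropped and recognised."""
    ys, xs = np.where(mask_map == TRAFFIC_SIGN_CLASS)
    if ys.size == 0:
        return None                       # no traffic sign among the target objects
    return xs.min(), ys.min(), xs.max(), ys.max()
```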
In step S15, if the superimposed image includes a traffic sign, the image at the corresponding position in the superimposed image is recognized based on the position of the traffic sign in the mask map.
When the mask map corresponding to the current fused image is generated according to the multiple target objects and target categories, target objects of the same category are marked with the same color according to their category and position, giving the mask map. Then, based on the position of the traffic sign in the mask map, the mask map is compared with the superposed image to determine the position of the traffic sign in the superposed image, and the image at the corresponding position in the superposed image is recognised. For example, the image at the corresponding position in the superposed image is recognised through a pre-trained traffic sign recognition model. The training process of this pre-trained traffic sign recognition model is similar to that of the feature extraction model, which can be referred to. The traffic sign recognition result may include conditions such as no left turn, no horn use, and a speed limit of 80 km/h.
By applying the method provided by the embodiment of the application, whether the multiple target objects in the superposed image contain the traffic signs or not is judged based on the mask image, and if the superposed image contains the traffic signs, the image recognition is carried out on the image at the corresponding position in the superposed image based on the position of the traffic signs in the mask image. The traffic signs in the acquired images can be identified to obtain an identification result, so that the computer system in the vehicle can control or intervene the vehicle according to the identification result.
In some embodiments, in order to further improve the quality of the acquired image, after performing semantic segmentation on the superimposed image based on object classes that may appear in a preset scene and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further includes performing wide dynamic fusion on at least two sets of single-exposure image information, see fig. 6, where fig. 6 is a fourth flowchart of the image recognition method provided in the embodiment of the present application, and the fourth flowchart includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The specific implementation manner of this step may be the same as that in the above embodiment, which may specifically be referred to; details are not described here again.
Step S16, calculating the weight of each pixel point in combination with the pixel type information according to the target type of the plurality of target objects in the mask map and the preset pixel value of the mask map of each preset target type.
When calculating the weight of each pixel point combined with pixel type information according to the target classes of the multiple target objects in the mask map and the preset pixel value of the mask map for each preset target class, a different preset pixel value can be set for each target class, and the weight of each pixel point combined with pixel type information is then calculated from the preset pixel value of its target class.
Optionally, calculating the weight of each pixel point combined with the pixel type information according to the target types of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type, including:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
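The preset formula itself is not reproduced in the text above. Purely as an illustration, the sketch below assumes a Gaussian-form weight built from the quantities that are defined (I_ci, μ_c, σ_c); this assumed form is a common choice for such weights but is not necessarily the exact preset formula of the embodiment, and the per-class scalar mean is likewise a simplification.

```python
import numpy as np

def class_aware_weight(image, mask_map, class_mean, class_sigma):
    """Hedged sketch of a per-pixel weight combining pixel value and pixel class.

    image:       one single-exposure image I_i, shape H x W
    mask_map:    per-pixel target class c from semantic segmentation, shape H x W
    class_mean:  preset pixel value mu_c of the mask map for each class, shape (num_classes,)
    class_sigma: spread sigma_c of the preset pixel values for each class, shape (num_classes,)

    ASSUMPTION: a Gaussian-form weight is used here only because the text defines
    mu_c and sigma_c without reproducing the preset formula.
    """
    mu = class_mean[mask_map]      # mu_c for the class at each (x, y)
    sigma = class_sigma[mask_map]  # sigma_c for the class at each (x, y)
    return np.exp(-((image - mu) ** 2) / (2.0 * sigma ** 2))
```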
And step S17, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information.
Optionally, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information, including:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
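A minimal sketch of the fusion step using the formula stated above; the per-exposure weight maps are assumed to come from a function such as the one sketched after step S16, and the small epsilon guard is an implementation detail of the example.

```python
def wide_dynamic_fusion(images, weights, eps=1e-8):
    """Fuse single-exposure images with the stated formula
    I_WDR(x, y) = sum_i I_i(x, y) * W_ci(x, y) / sum_i W_ci(x, y).

    images:  list of H x W single-exposure images I_i
    weights: list of H x W per-pixel weights W_ci (e.g. from class_aware_weight)
    """
    numerator = sum(img * w for img, w in zip(images, weights))
    denominator = sum(weights) + eps   # eps only guards against division by zero
    return numerator / denominator
```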
By applying the image identification method provided by the embodiment of the application, the weight of each pixel point combined with the pixel type information is calculated according to the target types of a plurality of target objects in the mask image and the preset pixel value of each preset target type of the mask image, and the wide dynamic fusion of at least two groups of single-exposure image information is carried out based on the weight of each pixel point combined with the pixel type information. Therefore, each pixel point can be combined with the weight of the pixel type information to perform wide dynamic fusion on the long frame image and the short frame image to obtain a wide dynamic fusion image, and the quality of the obtained image is improved.
Referring to fig. 7, fig. 7 is a diagram illustrating an example of image recognition performed by a computer system in a vehicle according to an embodiment of the present application, including:
when the image recognition method is applied to the computer system in the unmanned vehicle, the camera on the vehicle connected with the computer system in the unmanned vehicle can be used for collecting the image of the environment around the vehicle.
The camera on the unmanned vehicle can collect multiple long frame images and short frame images of the same scene. Feature extraction can optionally be performed on the collected long frame and short frame images, and the feature images are then fused in a channel superposition mode or the like to obtain fused images; alternatively, the long frame images and short frame images can be directly fused to obtain the fused images.
Through the fused images, the computer system in the unmanned vehicle can identify traffic lights. The traffic light frame can be detected through a pre-trained network model, for example a pre-trained YOLO detection network model. Whether a traffic light frame exists is judged, and when it exists, the traffic light frame image is directly taken from the fused image. Then, the color and shape of the traffic light in the traffic light frame image are identified through a pre-trained traffic light recognition model to obtain a recognition result. The recognition result may include: red, green, yellow. The unmanned vehicle can determine whether to brake or continue travelling according to the recognition result.
Based on the fused image, the computer system in the unmanned vehicle can also perform identification of traffic signs. The fused image is semantically segmented by an encoding and decoding model to generate a mask map for identifying a plurality of target objects and target classes of the plurality of target objects in the current fused image, for example, the mask map for identifying the plurality of target objects and the target classes of the plurality of target objects in the current fused image is semantically segmented by a pre-trained convolutional neural network model of Encode-Decode. When performing semantic segmentation, performing semantic segmentation on the current fusion image based on a preset object class, where the preset object class may include a traffic sign. And then acquiring an image corresponding to the traffic sign in the fused image according to the position of the traffic sign, and recognizing the image of the traffic sign through a pre-trained traffic sign recognition model to obtain a recognition result. Wherein, the identification result may include: forbidding left turn, forbidding whistling, limiting speed by 80km/h and the like. The computer system in the unmanned vehicle can control the running state of the vehicle according to the recognition result of the traffic sign, for example, when the recognition result is the speed limit of 80km/h, the speed limit can be compared with the current vehicle speed, and when the vehicle speed exceeds 80km/h, a deceleration measure is taken.
Based on the fused image and the mask map, the computer system in the unmanned vehicle can also perform wide dynamic fusion to obtain a wide dynamic fusion image. The weight of each pixel point combined with pixel type information is calculated according to the target classes of the multiple target objects in the mask map and the preset pixel value of the mask map for each target class, and the long frame image and the short frame image are fused based on these weights to obtain the wide dynamic fusion image, in which the details of the dark parts are preserved while the bright parts are not over-saturated. The computer system in the unmanned vehicle can record the driving state of the vehicle from the wide dynamic fusion image, or display it through a streaming-media rearview mirror or the like for the occupants to watch.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application, including:
an image obtaining module 801, configured to obtain at least two sets of single-exposure image information within one frame time, where each set of single-exposure image information is obtained in a single-exposure process;
an image overlaying module 802, configured to overlay at least two sets of single-exposure image information;
and the target identification module 803 is configured to identify a target object for the superimposed image based on an object type that may appear in a preset scene.
Optionally, the image overlaying module 802 includes:
the data acquisition sub-module is used for performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
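As an illustration of the data acquisition and data superposition sub-modules, a minimal sketch follows; element-wise summation of same-channel data is assumed here as the superposition operation, which the embodiment does not mandate.

```python
import numpy as np

def superimpose_exposures(exposures):
    """Split each single-exposure image into its channels and superpose the
    data of the same channel across exposures (element-wise summation is
    assumed as the superposition operation).

    exposures -- list of HxWxC arrays captured within one frame time
    """
    stacked = np.stack([e.astype(np.float64) for e in exposures])      # N x H x W x C
    channels = [stacked[..., c].sum(axis=0) for c in range(stacked.shape[-1])]
    return np.stack(channels, axis=-1)                                 # H x W x C superimposed image
```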
Optionally, the target identifying module 803 includes:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
Optionally, the target identifying module 803 includes:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the plurality of target objects in the superposed image.
Optionally, the apparatus further comprises:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
Optionally, the apparatus further comprises:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, the weight calculating module is specifically configured to, according to the target classes of the plurality of target objects in the mask image and the preset pixel value of the mask image for each preset target class, through a preset formula:
calculate the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
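The preset formula itself appears only as an image in the original publication and is not reproduced in this text. Given the variables defined above, one plausible reading, stated purely as an assumption, is a Gaussian similarity between the pixel value and the preset class pixel value:

```latex
W_{ci}(x, y) = \exp\!\left( -\frac{\bigl( I_{ci}(x, y) - \mu_c(x, y) \bigr)^2}{2\,\sigma_c^{2}} \right)
```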
Optionally, the wide dynamic fusion module is specifically configured to calculate, based on the weight of each pixel point combined with the pixel type information, the pixel value of each pixel point in the wide dynamic fusion image according to the formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
where I_i(x, y) denotes the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) denotes the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
By applying the image recognition method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superimposed, and the target object is recognized in the superimposed image based on the object classes that can appear in the preset scene. Recognition therefore does not have to wait for wide dynamic fusion, which avoids the reduced recognition accuracy caused by the loss of detail in the wide dynamic fusion process, so the accuracy of image recognition can be improved.
Referring to fig. 9, an embodiment of the present invention further provides an electronic device, including a processor 901 and a memory 902;
a memory 902 for storing a computer program;
the processor 901 is configured to implement the following steps when executing the program stored in the memory:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Alternatively, when the processor 901 executes a program stored in the memory 902, any of the image recognition methods described above may be implemented.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program realizes the steps of any one of the above image recognition methods when executed by a processor.
In a further embodiment, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image recognition methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (17)
1. An image recognition method, comprising:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing the at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
2. The method of claim 1, wherein said superimposing said at least two sets of single-exposure image information comprises:
performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information respectively, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and superposing the data of the same image channel in the data of each image channel.
3. The method according to claim 1, wherein the identifying the target object for the superimposed image based on the object class that can appear in the preset scene comprises:
detecting whether the superimposed image contains a traffic light frame image or not;
and after the traffic light frame image is detected, identifying the color and the shape of the traffic light for the traffic light frame image.
4. The method according to claim 1, wherein the identifying the target object for the superimposed image based on the object class that can appear in the preset scene comprises:
and performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects in the superposed image and the target classes of the target objects.
5. The method of claim 4, wherein after semantically segmenting the superimposed image based on object classes that may occur in a preset scene, generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further comprises:
judging whether a plurality of target objects in the superposed image contain traffic marks or not based on the mask image;
and if the superposed image contains the traffic sign, carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image.
6. The method of claim 4, further comprising:
calculating the weight of each pixel point combined with pixel type information according to the target types of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type;
and performing wide dynamic fusion on the at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
7. The method according to claim 6, wherein the calculating the weight of each pixel point in combination with the pixel type information according to the target class of the plurality of target objects in the mask image and the preset pixel value of the preset mask image of each target class comprises:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, wherein W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
8. The method according to claim 7, wherein said performing wide dynamic fusion on said at least two sets of single-exposure image information based on weights of said respective pixel points in combination with pixel type information comprises:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
calculating to obtain the pixel value of each pixel point in the wide dynamic fusion image, wherein I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) represents the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
9. An image recognition apparatus, comprising:
the image acquisition module is used for acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one exposure;
the image superposition module is used for superposing the at least two groups of single-exposure image information;
and the target identification module is used for identifying the target object of the superposed image based on the object type which can appear in the preset scene.
10. The apparatus of claim 9, wherein the image overlay module comprises:
the data acquisition sub-module is used for performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
11. The apparatus of claim 9, wherein the object recognition module comprises:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
12. The apparatus of claim 9, wherein the object recognition module comprises:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object categories which can appear in a preset scene to generate a mask image for identifying a plurality of target objects in the superposed image and the target categories of the plurality of target objects.
13. The apparatus of claim 12, further comprising:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
14. The apparatus of claim 13, further comprising:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on the at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
15. The apparatus of claim 14,
the weight calculation module is specifically configured to, according to the target categories of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target category, through a preset formula:
calculate the weight of each pixel point combined with the pixel type information, wherein W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
16. The apparatus of claim 15,
the wide dynamic fusion module is specifically configured to, based on the weight of each pixel point combined with the pixel type information, according to the formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
calculate the pixel value of each pixel point in the wide dynamic fusion image, wherein I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) represents the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
17. An electronic device comprising a processor, a memory;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 8 when executing a program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011544194.3A CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011544194.3A CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112528944A true CN112528944A (en) | 2021-03-19 |
CN112528944B CN112528944B (en) | 2024-08-06 |
Family
ID=74976066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011544194.3A Active CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112528944B (en) |
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101888487A (en) * | 2010-06-02 | 2010-11-17 | 中国科学院深圳先进技术研究院 | High dynamic range video imaging system and image generating method |
CN103973990A (en) * | 2014-05-05 | 2014-08-06 | 浙江宇视科技有限公司 | Wide dynamic fusion method and device |
CN104182968A (en) * | 2014-08-05 | 2014-12-03 | 西北工业大学 | Method for segmenting fuzzy moving targets by wide-baseline multi-array optical detection system |
CN107153816A (en) * | 2017-04-16 | 2017-09-12 | 五邑大学 | A kind of data enhancement methods recognized for robust human face |
US20180302543A1 (en) * | 2017-04-18 | 2018-10-18 | Qualcomm Incorporated | Hdr/wdr image time stamps for sensor fusion |
CN109035181A (en) * | 2017-06-08 | 2018-12-18 | 泰邦泰平科技(北京)有限公司 | A kind of wide dynamic range image processing method based on mean picture brightness |
CN109429001A (en) * | 2017-08-25 | 2019-03-05 | 杭州海康威视数字技术股份有限公司 | Image-pickup method, device, electronic equipment and computer readable storage medium |
CN107730481A (en) * | 2017-09-19 | 2018-02-23 | 浙江大华技术股份有限公司 | A kind of traffic lights image processing method and traffic lights image processing apparatus |
CN110351489A (en) * | 2018-04-04 | 2019-10-18 | 展讯通信(天津)有限公司 | Generate the method, apparatus and mobile terminal of HDR image |
WO2020078269A1 (en) * | 2018-10-16 | 2020-04-23 | 腾讯科技(深圳)有限公司 | Method and device for three-dimensional image semantic segmentation, terminal and storage medium |
CN111489320A (en) * | 2019-01-29 | 2020-08-04 | 华为技术有限公司 | Image processing method and device |
CN111886625A (en) * | 2019-05-13 | 2020-11-03 | 深圳市大疆创新科技有限公司 | Image fusion method, image acquisition equipment and movable platform |
CN110619593A (en) * | 2019-07-30 | 2019-12-27 | 西安电子科技大学 | Double-exposure video imaging system based on dynamic scene |
CN110516610A (en) * | 2019-08-28 | 2019-11-29 | 上海眼控科技股份有限公司 | A kind of method and apparatus for road feature extraction |
CN110728620A (en) * | 2019-09-30 | 2020-01-24 | 北京市商汤科技开发有限公司 | Image processing method and device and electronic equipment |
CN111127358A (en) * | 2019-12-19 | 2020-05-08 | 苏州科达科技股份有限公司 | Image processing method, device and storage medium |
CN111246052A (en) * | 2020-01-21 | 2020-06-05 | 浙江大华技术股份有限公司 | Wide dynamic adjustment method and device, storage medium and electronic device |
CN111368845A (en) * | 2020-03-16 | 2020-07-03 | 河南工业大学 | Feature dictionary construction and image segmentation method based on deep learning |
CN112085673A (en) * | 2020-08-27 | 2020-12-15 | 宁波大学 | Multi-exposure image fusion method for removing strong ghost |
Non-Patent Citations (4)
Title |
---|
JONATHAN LONG等: "Fully Convolutional Networks for Semantic Segmentation", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, 31 December 2015 (2015-12-31), pages 3431 - 3440 * |
K. RAM PRABHAKAR等: "DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs", 《ARXIV》, 23 December 2017 (2017-12-23), pages 1 - 9 * |
九点澡堂子: "Kernels (similarity) kernel functions", page 1, Retrieved from the Internet <URL:《https://blog.csdn.net/weixin_38278334/article/details/82289378》> * |
叶年进: "Research on HDR Imaging Methods Based on Deep Learning", 《China Master's Theses Full-text Database, Information Science and Technology》, no. 07, 15 July 2020 (2020-07-15), pages 138 - 917 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255609A (en) * | 2021-07-02 | 2021-08-13 | 智道网联科技(北京)有限公司 | Traffic identification recognition method and device based on neural network model |
CN113255609B (en) * | 2021-07-02 | 2021-10-29 | 智道网联科技(北京)有限公司 | Traffic identification recognition method and device based on neural network model |
WO2023126736A1 (en) * | 2021-12-30 | 2023-07-06 | Mobileye Vision Technologies Ltd. | Image position dependent blur control within hdr blending scheme |
Also Published As
Publication number | Publication date |
---|---|
CN112528944B (en) | 2024-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019223586A1 (en) | Method and apparatus for detecting parking space usage condition, electronic device, and storage medium | |
US11380104B2 (en) | Method and device for detecting illegal parking, and electronic device | |
US20130336581A1 (en) | Multi-cue object detection and analysis | |
CN110689724B (en) | Automatic motor vehicle zebra crossing present pedestrian auditing method based on deep learning | |
WO2016145626A1 (en) | Traffic abnormity detection method and device, and image monitoring system | |
CN111814593B (en) | Traffic scene analysis method and equipment and storage medium | |
JP6700373B2 (en) | Apparatus and method for learning object image packaging for artificial intelligence of video animation | |
WO2020007589A1 (en) | Training a deep convolutional neural network for individual routes | |
CN112528944A (en) | Image identification method and device, electronic equipment and storage medium | |
US20210295058A1 (en) | Apparatus, method, and computer program for identifying state of object, and controller | |
CN111539268A (en) | Road condition early warning method and device during vehicle running and electronic equipment | |
CN111724607B (en) | Steering lamp use detection method and device, computer equipment and storage medium | |
CN114419552A (en) | Illegal vehicle tracking method and system based on target detection | |
CN108573244B (en) | Vehicle detection method, device and system | |
CN112699711B (en) | Lane line detection method and device, storage medium and electronic equipment | |
CN111768630A (en) | Violation waste image detection method and device and electronic equipment | |
Špoljar et al. | Lane detection and lane departure warning using front view camera in vehicle | |
CN112784817B (en) | Method, device and equipment for detecting lane where vehicle is located and storage medium | |
CN113435350A (en) | Traffic marking detection method, device, equipment and medium | |
CN114040094A (en) | Method and equipment for adjusting preset position based on pan-tilt camera | |
CN114141022A (en) | Emergency lane occupation behavior detection method and device, electronic equipment and storage medium | |
CN114693722B (en) | Vehicle driving behavior detection method, detection device and detection equipment | |
CN115761699A (en) | Traffic signal lamp classification method and device and electronic equipment | |
CN115762153A (en) | Method and device for detecting backing up | |
CN115249407B (en) | Indicator light state identification method and device, electronic equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||