CN112528944A - Image identification method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112528944A (application number CN202011544194.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- target
- pixel
- preset
- exposure
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V20/582—Recognition of traffic signs
- G06F18/253—Fusion techniques of extracted features
- G06N3/045—Neural networks; combinations of networks
- G06N3/08—Neural networks; learning methods
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T7/0002—Image analysis; inspection of images, e.g. flaw detection
- G06T7/10—Image analysis; segmentation; edge detection
- G06V10/10—Image acquisition
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V20/584—Recognition of vehicle lights or traffic lights
- G06V30/1983—Character recognition; syntactic or structural pattern recognition, e.g. symbolic string recognition
- G06T2207/20221—Image fusion; image merging
- G06T2207/30252—Vehicle exterior; vicinity of vehicle
- G06V10/16—Image acquisition using multiple overlapping images; image stitching
Abstract
The image identification method and device, the electronic equipment and the storage medium provided by the embodiments of the invention can acquire at least two groups of single-exposure image information within one frame time, where each group of single-exposure image information is the image information acquired during one exposure; superpose the at least two groups of single-exposure image information; and identify target objects in the superposed image based on the object types that may appear in a preset scene. Because recognition does not have to wait for wide dynamic fusion, the loss of part of the details during the wide dynamic fusion process no longer lowers the recognition accuracy, so the accuracy of image identification can be improved.
Description
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
At present, image recognition is widely applied. For example, in the field of automatic driving, traffic signs at intersections can be recognised from images acquired in real time, so that corresponding measures such as steering and braking can be taken. During image recognition, in order to capture the detailed characteristics of both high-illumination and low-illumination areas, a long-exposure long-frame image and a short-exposure short-frame image are fused through wide dynamic fusion, and image recognition is then performed on the fused image.
However, when the illumination in the image is uneven, the wide dynamic fusion of the long-exposure and short-exposure images is affected by the differences between the targets in the image: some targets in the wide dynamic fusion image are still over-exposed or under-exposed after fusion, so part of the detail in the wide dynamic fusion image is lost, and the recognition accuracy is not high enough when image recognition is performed on the wide dynamic fusion image.
Disclosure of Invention
An embodiment of the invention provides an image recognition method, an image recognition device, an electronic device and a storage medium, which are used for improving the accuracy of image recognition. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided an image recognition method, including:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Optionally, the superimposing at least two sets of single-exposure image information includes:
respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and overlapping the data of the same image channel in the data of each image channel.
Optionally, the identifying of the target object is performed on the superimposed image based on an object type that may appear in a preset scene, and the identifying includes:
detecting whether the superimposed images contain traffic light frame images or not;
and after the traffic light frame image is detected, identifying the color and the shape of the traffic light for the traffic light frame image.
Optionally, the identifying of the target object is performed on the superimposed image based on an object type that may appear in a preset scene, and the identifying includes:
and performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the target objects in the superposed image.
Optionally, after performing semantic segmentation on the superimposed image based on object classes that may appear in a preset scene, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further includes:
judging whether a plurality of target objects in the superposed image contain traffic marks or not based on the mask image;
and if the superposed images contain the traffic signs, carrying out image recognition on the images at the corresponding positions in the superposed images based on the positions of the traffic signs in the mask images.
Optionally, the method further includes:
calculating the weight of each pixel point combined with pixel type information according to the target types of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type;
and performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, calculating the weight of each pixel point combined with the pixel type information according to the target types of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type, including:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
Optionally, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information, including:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
In a second aspect of the present invention, there is provided an image recognition apparatus comprising:
the image acquisition module is used for acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one exposure;
the image superposition module is used for superposing at least two groups of single-exposure image information;
and the target identification module is used for identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Optionally, the image overlaying module includes:
the data acquisition sub-module is used for respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
Optionally, the object recognition module includes:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
Optionally, the object recognition module includes:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the plurality of target objects in the superposed image.
Optionally, the apparatus further comprises:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
Optionally, the apparatus further comprises:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, the weight calculating module is specifically configured to, according to the target classes of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
Optionally, the wide dynamic fusion module is specifically configured to combine the weight of the pixel type information based on each pixel point, and according to a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
In a third aspect of an implementation of the present invention, there is provided an electronic device comprising a processor, a memory;
a memory for storing a computer program;
and a processor for implementing any of the image recognition methods described above when executing the computer program stored in the memory.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program is executed by a processor to implement any one of the image recognition methods described above.
The embodiment of the invention has the following beneficial effects:
the image identification method, the image identification device, the electronic equipment and the storage medium provided by the embodiment of the invention can acquire at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one-time exposure; superposing at least two groups of single-exposure image information; and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
In the image recognition process, after at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a first flowchart of an image recognition method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a second image recognition method according to an embodiment of the present application;
fig. 3 is a diagram illustrating an example of traffic light detection provided in an embodiment of the present application;
FIG. 4 is a diagram illustrating an example of image recognition provided by an embodiment of the present application;
fig. 5 is a third flowchart illustrating an image recognition method according to an embodiment of the present application;
fig. 6 is a fourth flowchart illustrating an image recognition method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an example of image recognition performed by a computer system in a vehicle according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the accuracy of image recognition, the embodiment of the application provides an image recognition method.
In one embodiment of the present application, the method includes:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
By applying the image identification method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are obtained, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
The following describes in detail the image recognition method provided in the embodiments of the present application with specific embodiments.
In the embodiment of the present application, image recognition may refer to the following: at least two groups of single-exposure image information shot by the same camera for the same scene are obtained, where each group of single-exposure image information is the image information obtained during one exposure, and the targets in the at least two groups of single-exposure image information are then identified. For example:
during the running of a vehicle, a computer system in the vehicle can acquire images around the vehicle through a camera connected to the computer system, and judge whether obstacles exist by analysing the images, so that it can intervene in the driver's driving or control an unmanned vehicle according to the analysis result, for example by taking emergency braking or obstacle avoidance measures. When the computer system in the vehicle acquires images around the vehicle through the camera connected to it, in order to improve the quality of the acquired images, at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superposed, and target objects are identified in the superposed image based on the object types that may appear in the preset scene, so as to judge whether an obstacle exists.
Specifically, referring to fig. 1, fig. 1 is a first schematic flow chart of an image recognition method provided in the embodiment of the present application, including:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The at least two sets of single-exposure image information may be captured for the same scene by the same camera. In practical use, the at least two sets of single-exposure image information may be obtained by a single-frame double-exposure technique or by a multi-frame imaging technique, for example, when obtaining two sets of single-exposure image information, the two sets of single-exposure image information may be a long-frame image and a short-frame image obtained by the single-frame double-exposure technique, respectively.
In this embodiment, the image recognition method may be executed by an intelligent terminal, and the intelligent terminal may be a computer system or a computer in a vehicle of a vehicle, a server, or the like.
Step S12, at least two sets of single-exposure image information are superimposed.
The superposition of the at least two sets of single-exposure image information can be carried out in a channel superposition mode. Optionally, the superimposing at least two sets of single-exposure image information includes: respectively extracting the characteristics of each group of single-exposure image information images in the at least two groups of single-exposure image information to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information; and overlapping the data of the same image channel in the data of each image channel.
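As an illustration only, the following minimal sketch shows one way such channel-wise superposition could be performed, assuming each group of single-exposure image information has already been decoded into an H x W x C array; the array layout, the use of numpy and the grouping by concatenation are assumptions of this sketch, not details fixed by the embodiment.

```python
import numpy as np

def superpose_exposures(exposure_images):
    """Superpose several single-exposure images channel by channel.

    exposure_images: list of H x W x C arrays, one per single exposure
    within the same frame time (e.g. a long frame and a short frame).
    """
    # Split every exposure into its per-channel data.
    channel_data = [np.split(img, img.shape[2], axis=2) for img in exposure_images]

    # Group the data of the same image channel across exposures,
    # so channel k of every exposure ends up next to each other.
    superposed_channels = []
    num_channels = len(channel_data[0])
    for k in range(num_channels):
        same_channel = [per_image[k] for per_image in channel_data]
        superposed_channels.append(np.concatenate(same_channel, axis=2))

    # Result has C * num_exposures channels and can be fed to the recognition network.
    return np.concatenate(superposed_channels, axis=2)

# Usage: superposed = superpose_exposures([long_frame, short_frame])
```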
The feature extraction performed respectively on each group of single-exposure image information in the at least two groups of single-exposure image information can be implemented with a convolutional neural network, and the training of the convolutional neural network can be completed in a supervised manner. For example, a large number of pictures containing artificial markers are input into the convolutional neural network to be trained, feature extraction is performed on these pictures by the convolutional neural network to be trained to obtain the data of each image channel, and the data are superposed to obtain a superposed image. The superposed image is then compared with the artificial markers, the error of the convolutional neural network to be trained is calculated, the weights of the convolutional neural network to be trained are corrected according to the obtained error, and training is repeated until the calculated error is smaller than a preset error threshold, thereby obtaining the trained convolutional neural network.
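A schematic of the supervised training loop described above, assuming a PyTorch-style model, an L1 loss against the artificial markers and an Adam optimizer; these are illustrative choices rather than requirements of the embodiment.

```python
import torch

def train_feature_extractor(model, loader, epochs=10, error_threshold=0.01):
    """Supervised training sketch: compare the superposed output against the
    manual labels and revise the network weights until the error is small."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.L1Loss()
    for epoch in range(epochs):
        epoch_error = 0.0
        for images, labels in loader:            # images: labelled training pictures
            superposed = model(images)           # feature extraction + channel superposition
            error = loss_fn(superposed, labels)  # error against the artificial markers
            optimizer.zero_grad()
            error.backward()                     # correct the weights according to the error
            optimizer.step()
            epoch_error += error.item()
        if epoch_error / len(loader) < error_threshold:
            break                                # error below the preset threshold: training done
    return model
```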
And step S13, identifying the target object of the superposed image based on the object type which can appear in the preset scene.
The object types that may appear in the preset scene may be types set according to the application scene. For example, when a computer system in a vehicle recognizes images captured during the driving of the vehicle, the types may include five classes: roads, road markings, moving obstacles, traffic signs, and others.
The identification of the target object may be performed on the superimposed image, and the target objects and the target categories of the target objects in the superimposed image may be obtained by performing semantic segmentation on the superimposed image. For example, when detecting whether an object of a preset object class exists in the image, the class of each object is identified and compared with the preset object class, and whether an object of the preset object class exists in the image is thereby determined.
By identifying the target object in the superposed image based on the object types appearing in the preset scene, it can be judged whether there are vehicles, pedestrians and the like in the image, and whether to intervene in or control the driving state of the vehicle can be decided according to the recognition result.
By applying the image identification method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are obtained, the at least two groups of single-exposure image information are superposed; the target object is identified for the superposed image based on the object type which can appear in the preset scene, identification after wide dynamic fusion is not needed, the problem that the accuracy of identification is not high enough due to the loss of part of details in the wide dynamic fusion process is avoided, and therefore the accuracy of image identification can be improved.
In some embodiments, in order to further improve the accuracy of the traffic light recognition, when the superimposed image is recognized for a target object based on an object category that may appear in a preset scene, the method further includes recognizing the color and shape of the traffic light for the superimposed image, see fig. 2, where fig. 2 is a second flowchart of the image recognition method provided in the embodiment of the present application, and includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S131, whether the superimposed image contains a traffic light frame image is detected.
As shown in fig. 3, after a long frame image and a short frame image are obtained, feature extraction may optionally be performed first and the feature images then fused in a channel superposition mode or the like to obtain a fused image; alternatively, the long frame image and the short frame image may be directly fused. The traffic light frame in the fused image is then detected to obtain the traffic light frame image, and the color and shape of the traffic light are identified to obtain the recognition result of the current state of the traffic light. When the detection is performed through a preset neural network model, the neural network model may be a YOLO model (a kind of object detection algorithm). For example, the current fused image is input into a previously trained YOLO model, and whether the current fused image contains a traffic light frame image and, if so, the position of the traffic light frame image in the current fused image are determined. The training method of the YOLO model is similar to that of the feature extraction model and is not described here again.
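For illustration, a hedged sketch of this detection step, assuming a hypothetical detector callable that returns (class name, score, box) tuples; this interface is an assumption for the example, not the YOLO API itself.

```python
def detect_traffic_light_frames(fused_image, detector, score_threshold=0.5):
    """Run a YOLO-style detector on the fused image and return the bounding
    boxes of traffic light frames, if any are present."""
    # Assumed detector output: iterable of (class_name, score, (x1, y1, x2, y2)).
    detections = detector(fused_image)
    return [box for cls, score, box in detections
            if cls == "traffic_light_frame" and score >= score_threshold]
```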
Step S132, after the traffic light frame image is detected, the color and the shape of the traffic light are identified for the traffic light frame image.
Specifically, whether the superimposed image contains a traffic light frame image is detected, and when it does, the traffic light frame image is cropped out. The color and shape of the traffic light are then identified from the cropped traffic light frame image to obtain the recognition result of the current state of the traffic light. The recognition result may include a red light, a green light or a yellow light. When the camera is a camera on an unmanned vehicle, the unmanned vehicle can take braking or similar operations according to the recognition result.
The color and the shape of the traffic light can be identified through a pre-trained traffic light identification model; the training method of the traffic light identification model is similar to that of the feature extraction model and is not described in detail here.
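A hedged sketch of this recognition step, assuming a hypothetical classifier callable that maps a cropped traffic light frame image to a state label; the cropping by array slicing is also an assumption of the example.

```python
def recognize_traffic_light_state(fused_image, boxes, classifier):
    """Crop each detected traffic light frame and classify its current state."""
    states = []
    for x1, y1, x2, y2 in boxes:
        crop = fused_image[y1:y2, x1:x2]   # intercepted traffic light frame image
        states.append(classifier(crop))    # e.g. "red", "green" or "yellow"
    return states
```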
In this way, whether the superposed image contains a traffic light frame image is detected, and after the traffic light frame image is detected, the color and the shape of the traffic light are identified from the traffic light frame image. The recognition result of the current state of the traffic light can thus be obtained, so that operations such as braking are performed according to the recognition result, ensuring safety during driving.
In the actual use process, the position of the traffic light frame relative to the camera can be determined based on the superposed images; and storing the traffic light state identification result and the position of the traffic light frame relative to the camera.
Determining the position of the traffic light frame relative to the camera based on the superposed image, and storing the traffic light state recognition result together with that position, makes it convenient to detect the state of the traffic light at the stored position later, improving the efficiency of traffic light detection. For example, when the color and shape of the traffic light have been identified and the current state is recognised as a red light, the fused image at the next moment can be obtained and the color and shape of the traffic light identified directly at the stored position of the traffic light frame relative to the camera, so that the current state is recognised without detecting the traffic light frame again. Moreover, storing the traffic light state recognition result and the position of the traffic light frame relative to the camera makes it convenient for the computer system in the vehicle to intervene or take control according to the detection result: for example, when the distance of the traffic light frame from the camera is detected to be greater than a certain threshold, it can be judged that the vehicle is still far from the traffic light intersection and can continue to run, and when the distance is smaller than the threshold, it can be judged that the vehicle is close to the traffic light intersection and braking measures need to be taken in advance.
Optionally, in step S13, based on the object class that may appear in the preset scene, the identifying of the target object is performed on the superimposed image, where the identifying includes: step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The mask map is an image generated in the image recognition process and used for recognizing each target in the map, the boundary of the target can be labeled by a labeling frame for different targets in the map, and different types of targets can be labeled by different colors. For example, vehicles are labeled green and pedestrians are labeled red.
The above-mentioned semantic segmentation of the superposed image based on the object classes that may appear in the preset scene, generating a mask map that identifies a plurality of target objects and the target classes of those target objects in the superposed image, can be implemented by a preset Encode-Decode convolutional neural network model, as shown in fig. 4, which is an example diagram of image recognition provided in an embodiment of the present application. As shown in fig. 4, after the long frame image and the short frame image are obtained, feature extraction may optionally be performed first and the feature images fused in a channel superposition manner or the like to obtain a fused image, or the long frame image and the short frame image may be directly fused; the fused image is then encoded and decoded for semantic segmentation to generate a mask map. The encoding process performs feature extraction on the current fused image, and the decoding process restores, layer by layer from the feature map, a segmentation map with the same resolution as the input image. Based on the preset object classes, semantic segmentation of the current fused image is realised by the encoding process, and the mask map corresponding to the current fused image, marking the plurality of target objects and their target classes, is generated by the decoding process. The Encode-Decode model can be implemented with a convolutional neural network, and its training can be completed in a supervised manner. For example, a large number of pictures containing artificial markers are input into the convolutional neural network model to be trained, which performs semantic segmentation on them and outputs mask maps; the output is then compared with the artificial markers, the error of the model to be trained is calculated, the weights of the model are revised according to the obtained error, and training is repeated until the calculated error is smaller than a preset error threshold, giving the trained convolutional neural network model.
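As an illustration of the Encode-Decode idea, a minimal PyTorch sketch follows; the layer sizes, channel counts and five-class output are assumptions made for the example, not the network actually trained in the embodiment.

```python
import torch.nn as nn

class EncodeDecodeSegmenter(nn.Module):
    """Minimal encoder-decoder sketch: the encoder extracts features from the
    fused image, the decoder restores a segmentation map at the input resolution."""
    def __init__(self, in_channels=6, num_classes=5):  # e.g. road, marking, obstacle, sign, other
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, fused_image):
        features = self.encoder(fused_image)
        logits = self.decoder(features)
        # The mask map marks, per pixel, the target class with the highest score.
        return logits.argmax(dim=1)
```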
In some embodiments, in order to further improve the accuracy of traffic sign recognition, after semantic segmentation is performed on the superimposed image based on object classes that may appear in a preset scene, and a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image is generated, the method further includes performing traffic sign recognition based on the mask map, referring to fig. 5, where fig. 5 is a third flowchart of an image recognition method provided in an embodiment of the present application, and includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The specific implementation manner of this step may be the same as that in the above embodiment, which may specifically be referred to; details are not described here again.
Step S14 is to determine whether or not the plurality of target objects in the superimposed image include traffic signs based on the mask map.
The traffic sign image in the present application may be an image of a road sign, such as a left turn prohibition, a whistling prohibition, a speed limit of 80km/h, and the like, or may be information played on a display screen for indicating a traffic condition, such as a detour request for a preceding accident. This is not limited in this application.
Whether the multiple target objects in the superimposed image contain a traffic sign is judged based on the mask map as follows: when the mask map marking the multiple target objects in the superimposed image and their target categories is generated, the target categories include the category corresponding to traffic signs, so whether the multiple target objects in the current fused image contain a traffic sign is judged according to the category corresponding to traffic signs identified while generating the mask map. For example, when the mask map corresponding to the current fused image is generated according to the multiple target objects and target categories, each category of target is marked with one color, for example traffic signs are marked red; when judging whether the multiple target objects in the current fused image contain a traffic sign, it is then only necessary to check whether the mask map contains a target marked in red.
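A hedged sketch of this check, assuming the mask map stores one class id per pixel and that a hypothetical class id is reserved for traffic signs; the class id and the bounding-box return value are assumptions of the example.

```python
import numpy as np

TRAFFIC_SIGN_CLASS = 3  # hypothetical class id (or colour code) used for traffic signs in the mask map

def find_traffic_sign(mask_map):
    """Check whether the mask map contains traffic-sign pixels and, if so, return
    the bounding box of that region so the corresponding area of the superposed
    image can be cropped and recognised."""
    ys, xs = np.where(mask_map == TRAFFIC_SIGN_CLASS)
    if ys.size == 0:
        return None                       # no traffic sign among the target objects
    return xs.min(), ys.min(), xs.max(), ys.max()
```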
In step S15, if the superimposed image includes a traffic sign, the image at the corresponding position in the superimposed image is recognized based on the position of the traffic sign in the mask map.
When the mask map corresponding to the current fused image is generated according to the multiple target objects and target categories, target objects of the same category are marked with the same color according to their category and position, giving the mask map. Then, based on the position of the traffic sign in the mask map, the mask map is compared with the superposed image to determine the position of the traffic sign in the superposed image, and the image at the corresponding position in the superposed image is recognised. For example, the image at the corresponding position in the superposed image is recognised through a pre-trained traffic sign recognition model. The training process of this pre-trained traffic sign recognition model is similar to that of the feature extraction model, which can be referred to. The traffic sign recognition result may include conditions such as no left turn, no horn use, and a speed limit of 80 km/h.
By applying the method provided by the embodiment of the application, whether the multiple target objects in the superposed image contain the traffic signs or not is judged based on the mask image, and if the superposed image contains the traffic signs, the image recognition is carried out on the image at the corresponding position in the superposed image based on the position of the traffic signs in the mask image. The traffic signs in the acquired images can be identified to obtain an identification result, so that the computer system in the vehicle can control or intervene the vehicle according to the identification result.
In some embodiments, in order to further improve the quality of the acquired image, after performing semantic segmentation on the superimposed image based on object classes that may appear in a preset scene and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further includes performing wide dynamic fusion on at least two sets of single-exposure image information, see fig. 6, where fig. 6 is a fourth flowchart of the image recognition method provided in the embodiment of the present application, and the fourth flowchart includes:
in step S11, at least two sets of single-exposure image information within one frame time are acquired, where each set of single-exposure image information is acquired during a single exposure.
The specific implementation manner of this step may be the same as step S11 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S12, at least two sets of single-exposure image information are superimposed.
The specific implementation manner of this step may be the same as step S12 in fig. 1, and may specifically refer to fig. 1, which is not described herein again.
Step S133, based on the object class that may appear in the preset scene, semantically segmenting the superimposed image, and generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image.
The specific implementation manner of this step may be the same as that in the above embodiment, which may specifically be referred to; details are not described here again.
Step S16, calculating the weight of each pixel point in combination with the pixel type information according to the target type of the plurality of target objects in the mask map and the preset pixel value of the mask map of each preset target type.
When calculating the weight of each pixel point combined with pixel type information according to the target classes of the multiple target objects in the mask map and the preset pixel value of the mask map for each preset target class, a different preset pixel value can be set for each target class, and the weight of each pixel point combined with pixel type information is then calculated from the preset pixel value of its target class.
Optionally, calculating the weight of each pixel point combined with the pixel type information according to the target types of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type, including:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with pixel type information, of the pixel point at target coordinate (x, y); I_ci(x, y) is the pixel value at coordinate (x, y) of the class-c target in the i-th image; μ_c(x, y) is the preset pixel value for the class-c target class in the mask map; and σ_c is the variance of the preset pixel values for each target class in the mask map.
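The preset formula itself is not reproduced in the text above. Purely as an illustration, the sketch below assumes a Gaussian-form weight built from the quantities that are defined (I_ci, μ_c, σ_c); this assumed form is a common choice for such weights but is not necessarily the exact preset formula of the embodiment, and the per-class scalar mean is likewise a simplification.

```python
import numpy as np

def class_aware_weight(image, mask_map, class_mean, class_sigma):
    """Hedged sketch of a per-pixel weight combining pixel value and pixel class.

    image:       one single-exposure image I_i, shape H x W
    mask_map:    per-pixel target class c from semantic segmentation, shape H x W
    class_mean:  preset pixel value mu_c of the mask map for each class, shape (num_classes,)
    class_sigma: spread sigma_c of the preset pixel values for each class, shape (num_classes,)

    ASSUMPTION: a Gaussian-form weight is used here only because the text defines
    mu_c and sigma_c without reproducing the preset formula.
    """
    mu = class_mean[mask_map]      # mu_c for the class at each (x, y)
    sigma = class_sigma[mask_map]  # sigma_c for the class at each (x, y)
    return np.exp(-((image - mu) ** 2) / (2.0 * sigma ** 2))
```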
And step S17, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information.
Optionally, based on the weight of each pixel point combined with the pixel type information, performing wide dynamic fusion on at least two groups of single-exposure image information, including:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = Σ_i I_i(x, y) * W_ci(x, y) / Σ_i W_ci(x, y),
calculating the pixel value of each pixel point in the wide dynamic fusion image, where I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) is the pixel value of the pixel point with coordinates (x, y) in the wide dynamic fusion image.
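A minimal sketch of the fusion step using the formula stated above; the per-exposure weight maps are assumed to come from a function such as the one sketched after step S16, and the small epsilon guard is an implementation detail of the example.

```python
def wide_dynamic_fusion(images, weights, eps=1e-8):
    """Fuse single-exposure images with the stated formula
    I_WDR(x, y) = sum_i I_i(x, y) * W_ci(x, y) / sum_i W_ci(x, y).

    images:  list of H x W single-exposure images I_i
    weights: list of H x W per-pixel weights W_ci (e.g. from class_aware_weight)
    """
    numerator = sum(img * w for img, w in zip(images, weights))
    denominator = sum(weights) + eps   # eps only guards against division by zero
    return numerator / denominator
```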
By applying the image identification method provided by the embodiment of the application, the weight of each pixel point combined with the pixel type information is calculated according to the target types of a plurality of target objects in the mask image and the preset pixel value of each preset target type of the mask image, and the wide dynamic fusion of at least two groups of single-exposure image information is carried out based on the weight of each pixel point combined with the pixel type information. Therefore, each pixel point can be combined with the weight of the pixel type information to perform wide dynamic fusion on the long frame image and the short frame image to obtain a wide dynamic fusion image, and the quality of the obtained image is improved.
Referring to fig. 7, fig. 7 is a diagram illustrating an example of image recognition performed by a computer system in a vehicle according to an embodiment of the present application, including:
when the image recognition method is applied to the computer system in the unmanned vehicle, the camera on the vehicle connected with the computer system in the unmanned vehicle can be used for collecting the image of the environment around the vehicle.
The camera on the unmanned vehicle can collect multiple long frame images and short frame images of the same scene. Feature extraction can optionally be performed on the collected long frame and short frame images, and the feature images are then fused in a channel superposition mode or the like to obtain fused images; alternatively, the long frame images and short frame images can be directly fused to obtain the fused images.
Through the fused images, the computer system in the unmanned vehicle can identify traffic lights. The traffic light frame can be detected through a pre-trained network model, for example a pre-trained YOLO detection network model. Whether a traffic light frame exists is judged, and when it exists, the traffic light frame image is directly taken from the fused image. Then, the color and shape of the traffic light in the traffic light frame image are identified through a pre-trained traffic light recognition model to obtain a recognition result. The recognition result may include: red, green, yellow. The unmanned vehicle can determine whether to brake or continue travelling according to the recognition result.
Based on the fused image, the computer system in the unmanned vehicle can also perform identification of traffic signs. The fused image is semantically segmented by an encoding and decoding model to generate a mask map for identifying a plurality of target objects and target classes of the plurality of target objects in the current fused image, for example, the mask map for identifying the plurality of target objects and the target classes of the plurality of target objects in the current fused image is semantically segmented by a pre-trained convolutional neural network model of Encode-Decode. When performing semantic segmentation, performing semantic segmentation on the current fusion image based on a preset object class, where the preset object class may include a traffic sign. And then acquiring an image corresponding to the traffic sign in the fused image according to the position of the traffic sign, and recognizing the image of the traffic sign through a pre-trained traffic sign recognition model to obtain a recognition result. Wherein, the identification result may include: forbidding left turn, forbidding whistling, limiting speed by 80km/h and the like. The computer system in the unmanned vehicle can control the running state of the vehicle according to the recognition result of the traffic sign, for example, when the recognition result is the speed limit of 80km/h, the speed limit can be compared with the current vehicle speed, and when the vehicle speed exceeds 80km/h, a deceleration measure is taken.
Based on the fused image and the mask map, the computer system in the unmanned vehicle can also perform wide dynamic fusion to obtain a wide dynamic fusion image. The weight of each pixel point combined with pixel type information is calculated according to the target classes of the multiple target objects in the mask map and the preset pixel value of the mask map for each target class, and the long frame image and the short frame image are fused based on these weights to obtain the wide dynamic fusion image, in which the details of the dark parts are preserved while the bright parts are not over-saturated. The computer system in the unmanned vehicle can record the driving state of the vehicle from the wide dynamic fusion image, or display it through a streaming-media rearview mirror or the like for the occupants to watch.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application, including:
an image obtaining module 801, configured to obtain at least two sets of single-exposure image information within one frame time, where each set of single-exposure image information is obtained in a single-exposure process;
an image overlaying module 802, configured to overlay at least two sets of single-exposure image information;
and the target identification module 803 is configured to identify a target object for the superimposed image based on an object type that may appear in a preset scene.
Optionally, the image overlaying module 802 includes:
the data acquisition sub-module is used for performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
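As an illustration of the data acquisition and data superposition sub-modules, a minimal sketch follows; element-wise summation of same-channel data is assumed here as the superposition operation, which the embodiment does not mandate.

```python
import numpy as np

def superimpose_exposures(exposures):
    """Split each single-exposure image into its channels and superpose the
    data of the same channel across exposures (element-wise summation is
    assumed as the superposition operation).

    exposures -- list of HxWxC arrays captured within one frame time
    """
    stacked = np.stack([e.astype(np.float64) for e in exposures])      # N x H x W x C
    channels = [stacked[..., c].sum(axis=0) for c in range(stacked.shape[-1])]
    return np.stack(channels, axis=-1)                                 # H x W x C superimposed image
```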
Optionally, the target identifying module 803 includes:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
Optionally, the target identifying module 803 includes:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects and target classes of the plurality of target objects in the superposed image.
Optionally, the apparatus further comprises:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
Optionally, the apparatus further comprises:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
Optionally, the weight calculating module is specifically configured to, according to the target classes of the plurality of target objects in the mask image and the preset pixel value of the mask image for each preset target class, through a preset formula:
calculate the weight of each pixel point combined with the pixel type information, where W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
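The preset formula itself appears only as an image in the original publication and is not reproduced in this text. Given the variables defined above, one plausible reading, stated purely as an assumption, is a Gaussian similarity between the pixel value and the preset class pixel value:

```latex
W_{ci}(x, y) = \exp\!\left( -\frac{\bigl( I_{ci}(x, y) - \mu_c(x, y) \bigr)^2}{2\,\sigma_c^{2}} \right)
```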
Optionally, the wide dynamic fusion module is specifically configured to calculate, based on the weight of each pixel point combined with the pixel type information, the pixel value of each pixel point in the wide dynamic fusion image according to the formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
where I_i(x, y) denotes the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) denotes the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
By applying the image recognition method provided by the embodiment of the application, after at least two groups of single-exposure image information within one frame time are acquired, the at least two groups of single-exposure image information are superimposed, and the target object is recognized in the superimposed image based on the object classes that can appear in the preset scene. Recognition therefore does not have to wait for wide dynamic fusion, which avoids the reduced recognition accuracy caused by the loss of detail in the wide dynamic fusion process, so the accuracy of image recognition can be improved.
Referring to fig. 9, an embodiment of the present invention further provides an electronic device, including a processor 901 and a memory 902;
a memory 902 for storing a computer program;
the processor 901 is configured to implement the following steps when executing the program stored in the memory:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
Alternatively, when the processor 901 executes a program stored in the memory 902, any of the image recognition methods described above may be implemented.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program realizes the steps of any one of the above image recognition methods when executed by a processor.
In a further embodiment, the present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the image recognition methods of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (17)
1. An image recognition method, comprising:
acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is image information acquired in a single-exposure process;
superposing the at least two groups of single-exposure image information;
and identifying the target object of the superposed image based on the object type which can appear in the preset scene.
2. The method of claim 1, wherein said superimposing said at least two sets of single-exposure image information comprises:
performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information respectively, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and superposing the data of the same image channel in the data of each image channel.
3. The method according to claim 1, wherein the identifying the target object for the superimposed image based on the object class that can appear in the preset scene comprises:
detecting whether the superimposed image contains a traffic light frame image or not;
and after the traffic light frame image is detected, identifying the color and the shape of the traffic light for the traffic light frame image.
4. The method according to claim 1, wherein the identifying the target object for the superimposed image based on the object class that can appear in the preset scene comprises:
and performing semantic segmentation on the superposed image based on object classes which can appear in a preset scene to generate a mask image for identifying a plurality of target objects in the superposed image and the target classes of the target objects.
5. The method of claim 4, wherein after semantically segmenting the superimposed image based on object classes that may occur in a preset scene, generating a mask map that identifies a plurality of target objects and target classes of the plurality of target objects in the superimposed image, the method further comprises:
judging whether a plurality of target objects in the superposed image contain traffic marks or not based on the mask image;
and if the superposed image contains the traffic sign, carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image.
6. The method of claim 4, further comprising:
calculating the weight of each pixel point combined with pixel type information according to the target types of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target type;
and performing wide dynamic fusion on the at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
7. The method according to claim 6, wherein the calculating the weight of each pixel point in combination with the pixel type information according to the target class of the plurality of target objects in the mask image and the preset pixel value of the preset mask image of each target class comprises:
according to the target classes of a plurality of target objects in the mask image and preset pixel values of the mask image of each preset target class, through a preset formula:
calculating the weight of each pixel point combined with the pixel type information, wherein W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
8. The method according to claim 7, wherein said performing wide dynamic fusion on said at least two sets of single-exposure image information based on weights of said respective pixel points in combination with pixel type information comprises:
based on the weight of each pixel point combined with the pixel type information, through a formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
calculating to obtain the pixel value of each pixel point in the wide dynamic fusion image, wherein I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) represents the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
9. An image recognition apparatus, comprising:
the image acquisition module is used for acquiring at least two groups of single-exposure image information within one frame time, wherein each group of single-exposure image information is the image information acquired in the process of one exposure;
the image superposition module is used for superposing the at least two groups of single-exposure image information;
and the target identification module is used for identifying the target object of the superposed image based on the object type which can appear in the preset scene.
10. The apparatus of claim 9, wherein the image overlay module comprises:
the data acquisition sub-module is used for performing feature extraction on the image of each group of single-exposure image information in the at least two groups of single-exposure image information, to obtain the data of each image channel of each group of single-exposure image information in the at least two groups of single-exposure image information;
and the data superposition submodule is used for superposing the data of the same image channel in the data of each image channel.
11. The apparatus of claim 9, wherein the object recognition module comprises:
the lamp frame detection submodule is used for detecting whether the superposed images contain the traffic light lamp frame images or not;
and the color identification submodule is used for identifying the color and the shape of the traffic light frame image after the traffic light frame image is detected.
12. The apparatus of claim 9, wherein the object recognition module comprises:
and the semantic segmentation submodule is used for performing semantic segmentation on the superposed image based on object categories which can appear in a preset scene to generate a mask image for identifying a plurality of target objects in the superposed image and the target categories of the plurality of target objects.
13. The apparatus of claim 12, further comprising:
the traffic sign judging module is used for judging whether a plurality of target objects in the superposed image contain traffic signs or not based on the mask image;
and the traffic sign recognition module is used for carrying out image recognition on the image at the corresponding position in the superposed image based on the position of the traffic sign in the mask image if the superposed image contains the traffic sign.
14. The apparatus of claim 13, further comprising:
the weight calculation module is used for calculating the weight of each pixel point combined with the pixel type information according to the target types of the target objects in the mask image and the preset pixel values of the mask image of each preset target type;
and the wide dynamic fusion module is used for performing wide dynamic fusion on the at least two groups of single-exposure image information based on the weight of each pixel point combined with the pixel type information.
15. The apparatus of claim 14,
the weight calculation module is specifically configured to, according to the target categories of the plurality of target objects in the mask image and preset pixel values of the mask image of each preset target category, through a preset formula:
calculate the weight of each pixel point combined with the pixel type information, wherein W_ci(x, y) is the weight, combined with the pixel type information, of the pixel point at coordinate (x, y) for the c-th target class in the i-th image; I_ci(x, y) is the pixel value at coordinate (x, y) for the c-th target class in the i-th image; μ_c(x, y) is the preset pixel value of the c-th target class in the mask map; and σ_c is the variance of the preset pixel values of the target classes in the mask map.
16. The apparatus of claim 15,
the wide dynamic fusion module is specifically configured to, based on the weight of each pixel point combined with the pixel type information, according to the formula:
I_WDR(x, y) = ∑_i I_i(x, y) * W_ci(x, y) / ∑_i W_ci(x, y),
calculate the pixel value of each pixel point in the wide dynamic fusion image, wherein I_i(x, y) represents the pixel value at coordinate (x, y) in the i-th image of the at least two groups of single-exposure image information, and I_WDR(x, y) represents the pixel value of the pixel point at coordinate (x, y) in the wide dynamic fusion image.
17. An electronic device comprising a processor, a memory;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 8 when executing a program stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011544194.3A CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011544194.3A CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112528944A true CN112528944A (en) | 2021-03-19 |
CN112528944B CN112528944B (en) | 2024-08-06 |
Family
ID=74976066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011544194.3A Active CN112528944B (en) | 2020-12-23 | 2020-12-23 | Image recognition method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112528944B (en) |
Patent Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101888487A (en) * | 2010-06-02 | 2010-11-17 | 中国科学院深圳先进技术研究院 | High dynamic range video imaging system and image generating method |
CN103973990A (en) * | 2014-05-05 | 2014-08-06 | 浙江宇视科技有限公司 | Wide dynamic fusion method and device |
CN104182968A (en) * | 2014-08-05 | 2014-12-03 | 西北工业大学 | Method for segmenting fuzzy moving targets by wide-baseline multi-array optical detection system |
CN107153816A (en) * | 2017-04-16 | 2017-09-12 | 五邑大学 | A kind of data enhancement methods recognized for robust human face |
US20180302543A1 (en) * | 2017-04-18 | 2018-10-18 | Qualcomm Incorporated | Hdr/wdr image time stamps for sensor fusion |
CN109035181A (en) * | 2017-06-08 | 2018-12-18 | 泰邦泰平科技(北京)有限公司 | A kind of wide dynamic range image processing method based on mean picture brightness |
CN109429001A (en) * | 2017-08-25 | 2019-03-05 | 杭州海康威视数字技术股份有限公司 | Image-pickup method, device, electronic equipment and computer readable storage medium |
CN107730481A (en) * | 2017-09-19 | 2018-02-23 | 浙江大华技术股份有限公司 | A kind of traffic lights image processing method and traffic lights image processing apparatus |
CN110351489A (en) * | 2018-04-04 | 2019-10-18 | 展讯通信(天津)有限公司 | Generate the method, apparatus and mobile terminal of HDR image |
WO2020078269A1 (en) * | 2018-10-16 | 2020-04-23 | 腾讯科技(深圳)有限公司 | Method and device for three-dimensional image semantic segmentation, terminal and storage medium |
CN111489320A (en) * | 2019-01-29 | 2020-08-04 | 华为技术有限公司 | Image processing method and device |
CN111886625A (en) * | 2019-05-13 | 2020-11-03 | 深圳市大疆创新科技有限公司 | Image fusion method, image acquisition equipment and movable platform |
CN110619593A (en) * | 2019-07-30 | 2019-12-27 | 西安电子科技大学 | Double-exposure video imaging system based on dynamic scene |
CN110516610A (en) * | 2019-08-28 | 2019-11-29 | 上海眼控科技股份有限公司 | A kind of method and apparatus for road feature extraction |
CN110728620A (en) * | 2019-09-30 | 2020-01-24 | 北京市商汤科技开发有限公司 | Image processing method and device and electronic equipment |
CN111127358A (en) * | 2019-12-19 | 2020-05-08 | 苏州科达科技股份有限公司 | Image processing method, device and storage medium |
CN111246052A (en) * | 2020-01-21 | 2020-06-05 | 浙江大华技术股份有限公司 | Wide dynamic adjustment method and device, storage medium and electronic device |
CN111368845A (en) * | 2020-03-16 | 2020-07-03 | 河南工业大学 | Feature dictionary construction and image segmentation method based on deep learning |
CN112085673A (en) * | 2020-08-27 | 2020-12-15 | 宁波大学 | Multi-exposure image fusion method for removing strong ghost |
Non-Patent Citations (4)
Title |
---|
JONATHAN LONG等: "Fully Convolutional Networks for Semantic Segmentation", 《PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》, 31 December 2015 (2015-12-31), pages 3431 - 3440 * |
K. RAM PRABHAKAR等: "DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs", 《ARXIV》, 23 December 2017 (2017-12-23), pages 1 - 9 * |
九点澡堂子: "Kernels (similarity) kernel functions", page 1, Retrieved from the Internet <URL:《https://blog.csdn.net/weixin_38278334/article/details/82289378》> * |
叶年进: "Research on HDR Imaging Methods Based on Deep Learning", 《China Master's Theses Full-text Database, Information Science and Technology》, no. 07, 15 July 2020 (2020-07-15), pages 138 - 917 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255609A (en) * | 2021-07-02 | 2021-08-13 | 智道网联科技(北京)有限公司 | Traffic identification recognition method and device based on neural network model |
CN113255609B (en) * | 2021-07-02 | 2021-10-29 | 智道网联科技(北京)有限公司 | Traffic identification recognition method and device based on neural network model |
WO2023126736A1 (en) * | 2021-12-30 | 2023-07-06 | Mobileye Vision Technologies Ltd. | Image position dependent blur control within hdr blending scheme |
Also Published As
Publication number | Publication date |
---|---|
CN112528944B (en) | 2024-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019223586A1 (en) | Method and apparatus for detecting parking space usage condition, electronic device, and storage medium | |
US11380104B2 (en) | Method and device for detecting illegal parking, and electronic device | |
US20130336581A1 (en) | Multi-cue object detection and analysis | |
CN110689724B (en) | Automatic motor vehicle zebra crossing present pedestrian auditing method based on deep learning | |
WO2016145626A1 (en) | Traffic abnormity detection method and device, and image monitoring system | |
CN111814593B (en) | Traffic scene analysis method and equipment and storage medium | |
JP6700373B2 (en) | Apparatus and method for learning object image packaging for artificial intelligence of video animation | |
WO2020007589A1 (en) | Training a deep convolutional neural network for individual routes | |
CN112528944A (en) | Image identification method and device, electronic equipment and storage medium | |
US20210295058A1 (en) | Apparatus, method, and computer program for identifying state of object, and controller | |
CN111539268A (en) | Road condition early warning method and device during vehicle running and electronic equipment | |
CN111724607B (en) | Steering lamp use detection method and device, computer equipment and storage medium | |
CN114419552A (en) | Illegal vehicle tracking method and system based on target detection | |
CN108573244B (en) | Vehicle detection method, device and system | |
CN112699711B (en) | Lane line detection method and device, storage medium and electronic equipment | |
CN111768630A (en) | Violation waste image detection method and device and electronic equipment | |
Špoljar et al. | Lane detection and lane departure warning using front view camera in vehicle | |
CN112784817B (en) | Method, device and equipment for detecting lane where vehicle is located and storage medium | |
CN113435350A (en) | Traffic marking detection method, device, equipment and medium | |
CN114040094A (en) | Method and equipment for adjusting preset position based on pan-tilt camera | |
CN114141022A (en) | Emergency lane occupation behavior detection method and device, electronic equipment and storage medium | |
CN114693722B (en) | Vehicle driving behavior detection method, detection device and detection equipment | |
CN115761699A (en) | Traffic signal lamp classification method and device and electronic equipment | |
CN115762153A (en) | Method and device for detecting backing up | |
CN115249407B (en) | Indicator light state identification method and device, electronic equipment, storage medium and product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||