CN111079563A - Traffic signal lamp identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN111079563A
CN111079563A
Authority
CN
China
Prior art keywords
traffic signal
signal lamp
outer frame
attribute
inner frame
Prior art date
Legal status
Pending
Application number
CN201911180310.5A
Other languages
Chinese (zh)
Inventor
吕廷迅
史信楚
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN201911180310.5A
Publication of CN111079563A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]

Abstract

The application discloses a traffic signal lamp identification method and device, an electronic device and a storage medium. The method is applied to a traffic signal lamp identification model and comprises the following steps: analyzing a target image with a multi-stage network to obtain features at a plurality of stages; fusing the stage features to obtain fused features; identifying an outer frame containing a group of traffic signal lamps by bounding box regression based on the fused features; identifying an inner frame containing a single traffic signal lamp by a mask representation method based on the fused features and the identified outer frame; and determining attribute information of the traffic signal lamps according to the identified outer frame and inner frames. With this scheme, a group of traffic signal lamps can be identified as a whole, and even when several bulbs in the group are lit simultaneously, the position, attributes and other information of each bulb can be accurately identified through the mask technique.

Description

Traffic signal lamp identification method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of automatic driving, and in particular to a traffic signal lamp identification method and device, an electronic device and a storage medium.
Background
Autonomous driving is developing rapidly. Whether a vehicle goes straight or stops, turns left or turns right, is decided from the current state of the traffic lights at an intersection, so accurate traffic light detection is crucial to autonomous driving.
The main difficulty of existing traffic light detection schemes is that each traffic light bulb is very small and hard to detect. Some schemes choose not to detect each small bulb and only detect the outer frame around a group of bulbs, but such schemes cannot handle the case where several small bulbs are lit at the same time. Other existing schemes detect both the outer frame containing a group of traffic lights and an inner frame for each bulb: the large outer frame is used to judge whether a traffic light exists at a position in the image, and the small inner frames then determine the state of each bulb in the group.
Summary of the application
In view of the above, the present application is proposed to provide a traffic signal identification method, apparatus, electronic device and storage medium that overcome or at least partially solve the above-mentioned problems.
According to an aspect of the present application, there is provided a traffic signal lamp recognition method, applied to a traffic signal lamp recognition model, comprising:
analyzing a target image based on a multi-stage network to obtain a plurality of stage features;
fusing the plurality of stage features to obtain fused features;
identifying an outer frame containing a group of traffic signal lamps by a bounding box regression method based on the fused features;
identifying an inner frame containing a single traffic signal lamp by a mask representation method based on the fused features and the identified outer frame;
and determining attribute information of the traffic signal lamps according to the identified outer frame and the identified inner frames.
Optionally, the method further includes: producing a traffic signal lamp data set for training the traffic signal lamp recognition model, specifically comprising:
acquiring pictures of actual intersections;
performing attribute labeling on each picture, wherein the attribute labels comprise at least one of the following: the position attribute of the outer frame, the position attribute of the inner frame, the color attribute, the shape attribute and the direction attribute;
electronically labeling the attribute-labeled pictures and saving an annotation file;
and parsing the annotation file, extracting the outer frame and inner frame data, and generating the data set.
Optionally, identifying an outer frame containing a group of traffic signal lamps by the bounding box regression method based on the fused features comprises:
extracting candidate regions from the fused features with an RPN (Region Proposal Network), and identifying the outer frame from the candidate regions by the bounding box regression method;
and identifying the inner frame of the traffic signal lamp based on the fused features, the identified outer frame and the mask representation method comprises:
obtaining a mask from the fused features based on the identified outer frame, obtaining the inner frame attributes through label transformation of the mask, and obtaining the inner frame of the traffic signal lamp through filtering.
Optionally, obtaining the inner frame attributes through label transformation of the mask comprises:
encoding the mask to obtain a pixel point vector, and decoding the pixel point vector to obtain the inner frame attributes.
Optionally, the mask pixel point vector includes at least one of the following: the color attribute, the ratio of the inner frame width to the outer frame width, the ratio of the inner frame height to the outer frame height, the shape attribute, the direction attribute, and the ratio of the inner frame center point x or y value to the outer frame width.
Optionally, the method further includes:
and visualizing the outer frame and the inner frames.
Optionally, the method further includes:
selecting ResNet as the backbone of the multi-stage network, and initializing the model with pre-trained model weights or Kaiming initialization, to complete the configuration of the traffic signal lamp recognition model.
According to another aspect of the present application, there is provided a traffic signal lamp recognition apparatus, applied to a traffic signal lamp recognition model, comprising:
an analysis unit, adapted to analyze a target image based on a multi-stage network to obtain a plurality of stage features;
a fusion unit, adapted to fuse the plurality of stage features to obtain fused features;
a regression unit, adapted to identify an outer frame containing a group of traffic signal lamps by a bounding box regression method based on the fused features;
a mask unit, adapted to identify an inner frame containing a single traffic signal lamp by a mask representation method based on the fused features and the identified outer frame;
and a recognition unit, adapted to determine attribute information of the traffic signal lamps according to the identified outer frame and the identified inner frames.
In accordance with yet another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method as any one of the above.
According to a further aspect of the application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement a method as in any above.
Therefore, the traffic signal lamp identification scheme disclosed by the application is applied to a traffic signal lamp identification model and comprises the following steps: analyzing a target image based on a multi-stage network to obtain a plurality of stage features; fusing the stage features to obtain fused features; identifying the outer frame of the traffic signal lamps from the fused features by a bounding box regression method; identifying the inner frame of each traffic signal lamp based on the fused features, the identified outer frame and the mask representation method; and determining attribute information of each traffic signal lamp according to the identified outer frame and inner frames. With this scheme, not only can a group of traffic signal lamps be identified, but even when several bulbs in the group are lit simultaneously, the position and attributes of each bulb can be accurately identified through the mask technique.
The foregoing is only an overview of the technical solutions of the present application. In order to make the technical means of the present application clearer, so that it can be implemented according to the content of the description, and to make the above and other objects, features and advantages of the present application more understandable, a detailed description of the application follows.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a traffic signal identification method according to one embodiment of the present application;
FIG. 2 illustrates a schematic structural diagram of a traffic signal light identification device according to one embodiment of the present application;
FIG. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application;
FIG. 5 illustrates a schematic structural view of a traffic light outer frame and an inner frame according to an embodiment of the present application;
FIG. 6 shows a flow diagram of data set production according to one embodiment of the present application;
FIG. 7 illustrates a structural schematic of a recognition model network architecture according to one embodiment of the present application;
FIG. 8 illustrates a flow diagram of a mask label transformation according to one embodiment of the present application;
FIG. 9 shows a schematic diagram of an output visual recognition result according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 illustrates a traffic signal light recognition method according to an embodiment of the present application, which is applied to a traffic signal light recognition model, and specifically includes the following steps:
step S110, multi-stage analysis is carried out on the target image based on the multi-stage network, and a plurality of stage characteristics are obtained.
The current deep learning generally refers to deep neural network learning, a neural network model with a specific recognition target is formed by selecting different networks or network models and performing targeted training, a mask branch in a MaskRCNN model is referred to in the application, a traffic signal lamp recognition model MaskTLNet is established, interested areas and attributes are extracted through learning, and the specific target is recognized from an image.
In this step, an image to be recognized or an image under training is input into a traffic signal recognition model, and features associated with a recognition or detection target are extracted by analyzing through a multistage network in the model.
And step S120, fusing the multiple stage characteristics to obtain fused characteristics.
For example, an FPN module can be used to extract and fuse the edge or semantic features produced by the network at different stages: the features of different levels serve as the input of the FPN module, which outputs enhanced, multi-level fused features.
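The top-down fusion performed by an FPN can be sketched as follows. This is a minimal numpy illustration of the idea only, not the model's actual implementation: the 1x1 lateral convolutions are replaced by channel-mixing matrices, and nearest-neighbour upsampling stands in for the learned pathway.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fpn_fuse(c_feats, lateral_ws):
    """Top-down FPN fusion: project each stage to a common channel count
    with a 1x1 'conv' (here a channel-mixing matrix), then add the
    upsampled coarser map into each finer lateral map."""
    # c_feats: list of (C_i, H_i, W_i) maps, fine to coarse;
    # lateral_ws: matching (out_c, C_i) projection matrices.
    laterals = [np.einsum('oc,chw->ohw', w, c) for w, c in zip(lateral_ws, c_feats)]
    fused = [laterals[-1]]  # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        fused.append(lat + upsample2x(fused[-1]))
    return fused[::-1]  # fine to coarse, all with out_c channels
```

Each output level now carries both the fine spatial detail of its own stage and the stronger semantics of the coarser stages, which is what makes very small objects such as bulbs detectable.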
Step S130, based on the fusion characteristics, an outer frame containing a group of traffic signal lamps is identified by a surrounding frame regression method.
Based on the fused features, candidate regions that may contain traffic lights are found through an RPN, and the outer frames that actually contain traffic lights are then screened out of the candidate regions by a regression algorithm such as bounding box regression.
Bounding box regression finds, by regression, a mapping that brings the frame produced by the raw model closer to the ground-truth frame; it is prior art and is not described further here.
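The mapping is conventionally parameterised as offsets (tx, ty, tw, th) between a proposal and a target box. The sketch below shows the standard encode/decode pair used in the R-CNN family; it illustrates the prior-art technique, not code from this application.

```python
import numpy as np

def bbox_transform(proposal, target):
    # Encode a ground-truth box as regression offsets (tx, ty, tw, th)
    # relative to a proposal; both boxes are (x1, y1, x2, y2).
    px, py = (proposal[0] + proposal[2]) / 2, (proposal[1] + proposal[3]) / 2
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    gx, gy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    gw, gh = target[2] - target[0], target[3] - target[1]
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def bbox_decode(proposal, deltas):
    # Apply predicted offsets to a proposal to recover the refined box.
    px, py = (proposal[0] + proposal[2]) / 2, (proposal[1] + proposal[3]) / 2
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    cx, cy = px + deltas[0] * pw, py + deltas[1] * ph
    w, h = pw * np.exp(deltas[2]), ph * np.exp(deltas[3])
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
```

Because widths and heights are regressed in log space, the predicted box always keeps a positive size regardless of the raw network output.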
Referring to fig. 5, the large frame in the image is an outer frame containing a set of traffic lights during labeling and recognition.
Step S140, based on the fusion features and the identified outer frame, an inner frame including a single traffic signal lamp is identified by using a mask representation method.
A mask covers (wholly or partially) the image to be processed with a selected image, graphic or object, to control the region or process of image processing. The covering image or object is called the mask or template; in digital image processing a mask is a two-dimensional matrix array, and sometimes a multi-valued image is used.
In the present embodiment, the mask is further extracted from the fused features using the identified outer frame, and the inner frame containing a bulb is obtained by transforming the mask; the small frames in fig. 5 are examples.
Step S150, the attribute information of the traffic signal lamps is determined according to the identified outer frame and inner frames of the traffic signal lamps.
Through the steps above, specific attribute information of a group of traffic lights, and of each traffic light in the image, can be obtained, including the position, shape, color and other information of the group and of each bulb.
In this scheme, the position of the large frame around a group of traffic signal lamps is obtained by the bounding box regression method from the field of object detection, while each bulb's inner frame within the outer frame is handled by the mask representation method. Because the outer frame and the inner frames are detected by different modules, the model is easy to train and the problem that each bulb is too small to detect reliably is overcome.
In one embodiment, the method further comprises: making a data set of the traffic signal lamp for training the traffic signal lamp recognition model, and specifically comprising: acquiring and obtaining a picture of an actual intersection; performing attribute labeling on each picture, wherein the attribute labeling comprises at least one of the following items: the position attribute of the outer frame, the position attribute of the inner frame, the color attribute, the shape attribute and the direction attribute; carrying out electronic labeling on the pictures with the attribute labels and storing a labeling file; and analyzing the label file, extracting the data of the outer frame and the inner frame, and generating the data set.
Referring to the data set production flow diagram of fig. 6, in a specific implementation a data collection vehicle records driving videos at actual intersections, and the collected videos are split into frames; for example, about 46,000 high-definition pictures with a resolution of 1920 × 1080 can be generated.
Each picture is labeled manually. The outer frame GroupBox is labeled by its top-left and bottom-right corners. Each inner frame InnerBox is labeled with an attribute vector AttrVector containing position, color, shape and direction attributes. The color attribute takes the values red, green, yellow and undefined; the shape attribute covers 9 traffic light shapes in total: rider, pedestrian, bicycle, timer, circle, arrow, built-in timer, undefined and line; the direction attribute takes the values straight, left turn and right turn; and the position attribute is the width and height of each InnerBox relative to the GroupBox.
A web-based labeling platform is used, and each labeled picture is saved as a json annotation file. Finally the annotation files are parsed, the GroupBox and InnerBox data are extracted, and the data are preprocessed to generate the ground truth for model training.
For training, the data are split into a training set of 41,668 pictures and a validation set of 5,000 pictures.
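Parsing an annotation file into ground truth amounts to converting each InnerBox into coordinates relative to its GroupBox. The json schema below (`GroupBox`, `InnerBoxes`, `bbox` keys) is an assumption made for illustration; the labeling platform's actual file format is not disclosed in the application.

```python
import json

def parse_annotation(js):
    """Parse one labeled picture into ground truth for training.
    Assumed schema: GroupBox and each bbox are [x1, y1, x2, y2] in pixels."""
    ann = json.loads(js)
    gx1, gy1, gx2, gy2 = ann["GroupBox"]
    gw, gh = gx2 - gx1, gy2 - gy1
    inner = []
    for box in ann["InnerBoxes"]:
        x1, y1, x2, y2 = box["bbox"]
        inner.append({
            "color": box["color"], "shape": box["shape"],
            "direction": box["direction"],
            # position attributes are stored relative to the GroupBox
            "rel_w": (x2 - x1) / gw, "rel_h": (y2 - y1) / gh,
            "rel_cx": ((x1 + x2) / 2 - gx1) / gw,
            "rel_cy": ((y1 + y2) / 2 - gy1) / gh,
        })
    return {"group_box": [gx1, gy1, gx2, gy2], "inner": inner}
```

Storing the InnerBox position as ratios of the GroupBox makes the inner-frame targets scale-invariant, matching the relative width/height labeling described above.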
In one embodiment, step S130 includes: extracting candidate regions from the fused features with an RPN network, and identifying the outer frame from the candidate regions by the bounding box regression method.
Step S140 includes: obtaining a mask from the fused features based on the identified outer frame, obtaining the inner frame attributes through label transformation of the mask, and obtaining the inner frame of the traffic signal lamp through filtering.
Referring to fig. 7, in the network architecture of the recognition model in this embodiment, the backbone extracts features from the image with a multi-stage ResNet50 network, the features are fused by the FPN algorithm, and the network head is divided into three branches: RPNHead, GroupTLHead and MaskTLHead. RPNHead extracts regions of interest; for example, about 1000 candidate boxes may be extracted for further screening of the FPN features. GroupTLHead predicts the GroupBox. MaskTLHead further processes the FPN features using the outer frame obtained from the GroupBox to obtain a mask, extracts the inner frame attribute information from the mask through label transformation, filters out most useless frames by thresholding the scores of the color, shape and direction attributes, and removes redundant inner frames through non-maximum suppression (NMS).
Non-maximum suppression filters out non-maximum frames covering a target: the frame with the highest confidence is kept as the final result and redundant frames are eliminated, finding the best detection position for the target and controlling the number of regions of interest.
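The greedy NMS procedure described above can be sketched in a few lines. This is the standard textbook algorithm, shown for illustration; the threshold value is an example, not one stated in the application.

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and drop every remaining box overlapping it by more than thresh."""
    order = np.argsort(scores)[::-1].tolist()
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

Combined with the attribute-score thresholds, this is how the head prunes the candidate inner frames down to one box per bulb.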
Optionally, in the data preprocessing and model training stages, two approaches to labeling and identifying the inner frames may be adopted. One is to label and identify an inner frame for every traffic signal lamp inside the large frame of the group, and then obtain the attribute information of the bulb in each inner frame by the mask representation method.
The other is to label and identify only the inner frames of lit traffic signal lamps, which further improves recognition efficiency and reduces the use of computing resources.
In one embodiment, step S140 further includes: encoding the mask to obtain a pixel point vector, and decoding the pixel point vector to obtain the inner frame attributes.
Fig. 8 is a schematic flow chart of the mask label transformation. The mask in this embodiment is a three-dimensional tensor of shape (mask_c, mask_h, mask_w), where mask_w and mask_h are the width and height of the mask (28 by default) and mask_c is 20, representing the attributes of each pixel. Encoding the mask yields the AttrVector of each pixel, and decoding the AttrVector yields the InnerBox attributes such as width, height, position, color, shape and direction.
In one embodiment, the mask pixel point vector includes at least one of: the color attribute, the ratio of the width of the inner frame to the width of the outer frame, the ratio of the height of the inner frame to the height of the outer frame, the shape attribute, the direction attribute, and the ratio of the value of the center point x or y of the inner frame to the width of the outer frame.
Specifically, the mask pixel point vector may consist of 20 values: indices 0 to 3 represent the color attribute, 4 the ratio of the InnerBox width to the GroupBox width, 5 the ratio of the InnerBox height to the GroupBox height, 6 to 14 the shape attribute, 15 to 17 the direction attribute, 18 the ratio of the InnerBox center point x value to the GroupBox width, and 19 the ratio of the InnerBox center point y value to the GroupBox width.
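The decode step of this label transformation can be sketched directly from the 20-value layout above. The index layout follows the embodiment; the ordering of names inside the color, shape and direction lists is an assumption for illustration, since the application does not fix which index maps to which class.

```python
import numpy as np

def decode_attr_vector(v):
    """Decode one 20-dim mask pixel AttrVector into InnerBox attributes.
    Layout per the embodiment: 0-3 color scores, 4 width ratio,
    5 height ratio, 6-14 shape scores, 15-17 direction scores,
    18-19 center-point ratios. Class-name orderings are assumed."""
    colors = ["red", "green", "yellow", "undefined"]
    shapes = ["rider", "pedestrian", "bicycle", "timer", "circle",
              "arrow", "built-in timer", "undefined", "line"]
    directions = ["straight", "left turn", "right turn"]
    return {
        "color": colors[int(np.argmax(v[0:4]))],
        "w_ratio": float(v[4]), "h_ratio": float(v[5]),
        "shape": shapes[int(np.argmax(v[6:15]))],
        "direction": directions[int(np.argmax(v[15:18]))],
        "cx_ratio": float(v[18]), "cy_ratio": float(v[19]),
    }
```

Applying this per pixel (or to a pooled vector per bulb) recovers the InnerBox position relative to the GroupBox along with its color, shape and direction.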
The shape and direction attributes in the mask pixel point vector disclosed in the above embodiment correspond to the various situations of a real scene. Specifically, when installed vertically, the traffic light colors from top to bottom should be red, yellow, green; when installed horizontally, the traffic lights from left to right should be red, yellow, green.
The model of this embodiment also suits other traffic light layouts, for example a directional arrow light installed above each lane, whose color changes over time to indicate whether that lane may pass or must stop.
With the mask acquisition and representation described in this embodiment, the traffic signal lamps are identified with the outer frame and the inner frames detected by different modules, so the model is easy to train and the problem that the small frames are too small to detect is overcome.
In a specific embodiment, the method further comprises: and carrying out visual operation on the outer frame and the inner frame.
For convenient checking and display, in this embodiment the outer frame GroupBox and the inner frames InnerBox can be sent together to a visualization module; visualizing an InnerBox requires the specific position of the frame and its color, shape and direction attributes. An image output by the visualization module is shown in fig. 9.
In a specific embodiment, the method further comprises: selecting ResNet as the backbone of the multi-stage network, and initializing the model with pre-trained model weights or Kaiming initialization, to complete the configuration of the traffic signal lamp recognition model.
When building the recognition model, ResNet50, ResNet101 or similar can be chosen as the backbone, where C1 to C5 denote the networks of the first to fifth stages. The backbone weights are initialized with model weights pre-trained on ImageNet, the other layers with Kaiming initialization, and the whole network is then trained. In a specific implementation, MaskTLNet is preferably trained end to end; the initial learning rate can be set to 0.0025, with a warm-up strategy for the first 500 iterations. For the loss function, cross-entropy loss is used for classification and smooth-L1 for coordinate regression, and the optimizer is stochastic gradient descent (SGD) with a batch size of 16.
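A linear warm-up over the first 500 iterations, as mentioned above, can be sketched as follows. The starting factor of 1/3 is a common convention in detection frameworks and is an assumption here; the application only states the base rate and the warm-up length.

```python
def learning_rate(step, base_lr=0.0025, warmup_steps=500, warmup_factor=1 / 3):
    """Linear warm-up schedule: ramp from base_lr * warmup_factor up to
    base_lr over the first warmup_steps iterations, then hold base_lr.
    (The warm-up factor is an assumed value, not from the application.)"""
    if step < warmup_steps:
        alpha = step / warmup_steps
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    return base_lr
```

Starting with a reduced rate stabilises the early SGD updates while the randomly initialised head layers settle, after which the full rate of 0.0025 is used.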
FIG. 2 illustrates a schematic structural diagram of a traffic signal light identification device 200 according to one embodiment of the present application; the device is applied to a traffic signal lamp recognition model, and specifically comprises:
the analyzing unit 210 is adapted to analyze the target image based on the multi-stage network to obtain a plurality of stage features.
Deep learning here generally refers to deep neural network learning: by selecting different networks or network models and training them for a specific target, a neural network model for that recognition target is formed, such as an RCNN model or a model improved on its basis, which can learn to extract regions of interest and their attributes and recognize the specific target in an image.
In this unit, an image to be recognized or an image under training is input to a traffic signal recognition model, and features associated with a recognition or detection target are extracted by analysis through a multistage network in the model.
The fusion unit 220 is adapted to fuse the plurality of stage features to obtain a fusion feature.
For example, the FPN module can be used to extract and fuse edge features or semantic features in networks at different stages, so that features at different levels are used as input of the FPN module to output enhanced multi-level fusion features.
The regression unit 230 is adapted to identify an outer frame containing a group of traffic signal lamps by a bounding box regression method based on the fused features.
Based on the fused features, candidate regions that may contain traffic lights are found through an RPN, and the outer frames that actually contain traffic lights are then screened out of the candidate regions by a regression algorithm such as bounding box regression.
Bounding box regression finds, by regression, a mapping that brings the frame produced by the raw model closer to the ground-truth frame; it is prior art and is not described further here.
Referring to fig. 5, the large frame in the image is an outer frame containing a set of traffic lights during labeling and recognition.
And a masking unit 240 adapted to identify an inner frame containing a single traffic signal lamp by using a masking representation method based on the fusion feature and the identified outer frame.
A mask covers (wholly or partially) the image to be processed with a selected image, graphic or object, to control the region or process of image processing. The covering image or object is called the mask or template; in digital image processing a mask is a two-dimensional matrix array, and sometimes a multi-valued image is used.
In the present embodiment, a mask is further extracted from the fusion features by using a surrounding frame or an outer frame, and an inner frame containing a bulb is obtained by performing a transformation process on the mask, which is exemplified by a small frame in fig. 5.
The identification unit 250 is adapted to determine attribute information of the traffic signal lamp according to the identified outer frame of the traffic signal lamp and the inner frame of the traffic signal lamp.
Through this unit or module, specific attribute information of a group of traffic lights, and of each traffic light in the image, can be obtained, including the position, shape, color and other information of the group and of each bulb.
In a specific embodiment, the apparatus further comprises a data set making unit, adapted to make a data set of traffic signal lamps for training the multi-stage network, specifically by: acquiring pictures of actual intersections; performing attribute labeling on each picture, wherein the attribute labeling comprises at least one of the following items: the position attribute of the outer frame, the position attribute of the inner frame, the color attribute, the shape attribute and the direction attribute; electronically labeling the pictures with the attribute labels and storing a labeling file; and analyzing the labeling file, extracting the data of the outer frame and the inner frame, and generating a data set for model training.
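The analysis of the labeling file described above might look like the following sketch, assuming (purely for illustration) that each labeling file is JSON with fields named `outer_boxes`, `inner_boxes`, `color`, `shape` and `direction`; the patent does not specify the file format.

```python
import json

def parse_label(label):
    """Turn one parsed labeling file (a dict) into training records.
    Field names are illustrative, not the patent's actual format."""
    records = []
    for group in label["outer_boxes"]:
        records.append({
            "outer": group["bbox"],  # box of the whole lamp group
            "inner": [
                {"bbox": bulb["bbox"],
                 "color": bulb.get("color"),
                 "shape": bulb.get("shape"),
                 "direction": bulb.get("direction")}
                for bulb in group["inner_boxes"]
            ],
        })
    return records

def parse_annotation(path):
    """Load a labeling file from disk and extract its records."""
    with open(path) as f:
        return parse_label(json.load(f))
```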
In a particular embodiment, the regression unit 230 is adapted to: extract candidate regions from the fusion features by using a Region Proposal Network (RPN), and identify the outer frame from the candidate regions according to the bounding box regression method.
The mask unit 240 is adapted to: obtain a mask from the fusion features based on the identified outer frame, obtain the attributes of the inner frame through label transformation of the mask, and obtain the inner frame of the traffic signal lamp through filtering.
In one embodiment, the mask unit 240 is further adapted to: encode the mask to obtain a pixel point vector, and decode the pixel point vector to obtain the attributes of the inner frame.
In a specific embodiment, the mask pixel point vector includes at least one of: the color attribute, the ratio of the width of the inner frame to the width of the outer frame, the ratio of the height of the inner frame to the height of the outer frame, the shape attribute, the direction attribute, and the ratio of the value of the center point x or y of the inner frame to the width of the outer frame.
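The encoding and decoding of such a pixel point vector can be sketched as follows; the integer codings for color, shape and direction, and the choice of the x coordinate for the center-point ratio, are assumptions for illustration only. Boxes are given as (cx, cy, w, h).

```python
def encode_inner_frame(inner, outer):
    """Encode one inner frame, relative to its outer frame, as the
    attribute vector listed above. The integer codes are assumed:
    e.g. color 0=red/1=yellow/2=green, shape 0=circle/1=arrow."""
    icx, icy, iw, ih = inner["bbox"]
    ox, oy, ow, oh = outer
    return [
        inner["color"],     # color attribute (assumed coding)
        iw / ow,            # ratio of inner width to outer width
        ih / oh,            # ratio of inner height to outer height
        inner["shape"],     # shape attribute (assumed coding)
        inner["direction"], # direction attribute (assumed coding)
        (icx - ox) / ow,    # center-point x ratio w.r.t. outer width
    ]

def decode_inner_frame(vec, outer):
    """Invert the encoding; the center-y value is not carried by
    this vector, matching the attribute list above."""
    color, wr, hr, shape, direction, xr = vec
    ox, _, ow, oh = outer
    return {"color": color, "shape": shape, "direction": direction,
            "w": wr * ow, "h": hr * oh, "cx": ox + xr * ow}
```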
In a specific embodiment, the apparatus further comprises a visualization unit adapted to perform visualization operations on the outer frame and the inner frame.
In a specific embodiment, the apparatus further comprises a configuration unit adapted to select ResNet as the backbone network of the multi-stage network and to initialize the model by adopting pre-trained model weights or Kaiming initialization, thereby realizing the configuration of the traffic signal lamp identification model.
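Kaiming (He) initialization draws each weight from a zero-mean Gaussian whose standard deviation is sqrt(2 / fan_in), which keeps activation variance stable through ReLU layers. A minimal sketch, standing in for a deep-learning framework's built-in initializer (e.g. PyTorch's `torch.nn.init.kaiming_normal_`):

```python
import numpy as np

def kaiming_normal(fan_in, fan_out, rng=None):
    """He/Kaiming normal initialization for a weight matrix feeding a
    ReLU layer: zero-mean Gaussian with std = sqrt(2 / fan_in)."""
    rng = np.random.default_rng(rng)
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))
```

In practice the alternative mentioned above, loading pre-trained ResNet weights, is usually preferred when such weights are available, with Kaiming initialization used for the newly added heads.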
To sum up, the technical scheme disclosed by the application is applied to a traffic signal lamp identification model and comprises: analyzing the target image based on the multi-stage network to obtain a plurality of image features; fusing the image features to obtain fusion features; identifying an outer frame of the traffic signal lamps according to the fusion features and a bounding box regression method; identifying an inner frame of each traffic signal lamp based on the fusion features, the identified outer frame and a mask representation method; and determining the traffic signal lamps according to the identified outer frame and inner frames. According to this scheme, not only can a group of traffic signal lamps be identified, but the position and attributes of each bulb can also be accurately identified through the mask technique, even when several bulbs in the group are lit simultaneously.
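The multi-stage analysis and fusion summarized above resembles a feature-pyramid scheme, in which a deeper (coarser) stage feature is upsampled and merged with a shallower one. A minimal sketch, assuming the channel counts are already matched (the 1x1 lateral convolutions of a real FPN are omitted):

```python
import numpy as np

def fuse_stages(shallow, deep):
    """Fuse two stage features FPN-style: nearest-neighbour upsample
    the deeper (coarser) map (C, H/k, W/k) to the shallower map's
    resolution (C, H, W) and add element-wise."""
    factor = shallow.shape[-1] // deep.shape[-1]
    upsampled = deep.repeat(factor, axis=-2).repeat(factor, axis=-1)
    return shallow + upsampled
```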
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a traffic signal identification device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 300 comprises a processor 310 and a memory 320 arranged to store computer executable instructions (computer readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer readable program code 331 for performing any of the method steps described above. For example, the storage space 330 may comprise respective computer readable program codes 331 for implementing the various steps of the above method. The computer readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium as described with reference to fig. 4. Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 400 stores computer readable program code 331 for performing the steps of the method according to the application, readable by the processor 310 of the electronic device 300. When executed by the electronic device 300, the computer readable program code 331 causes the electronic device 300 to perform the steps of the method described above; in particular, the computer readable program code 331 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 331 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (10)

1. A traffic signal lamp identification method is applied to a traffic signal lamp identification model and comprises the following steps:
performing multi-stage analysis on the target image based on a multi-stage network to obtain a plurality of stage characteristics;
fusing the plurality of stage features to obtain fused features;
identifying an outer frame comprising a group of traffic signal lamps by utilizing a bounding box regression method based on the fusion features;
identifying an inner frame containing a single traffic signal lamp by using a mask representation method based on the fusion characteristics and the identified outer frame;
attribute information of each traffic signal lamp is determined according to the identified outer frame containing a group of traffic signal lamps and the identified inner frame containing a single traffic signal lamp.
2. The method of claim 1, wherein the method further comprises: making a data set of the traffic signal lamp for training the traffic signal lamp recognition model, and specifically comprising:
acquiring pictures of an actual intersection;
performing attribute labeling on each picture, wherein the attribute labeling comprises at least one of the following items: the position attribute of the outer frame, the position attribute of the inner frame, the color attribute, the shape attribute and the direction attribute;
carrying out electronic labeling on the pictures with the attribute labels and storing a labeling file;
and analyzing the label file, extracting the data of the outer frame and the inner frame, and generating the data set.
3. The method of claim 1,
the identifying the outer frame containing the group of traffic signal lamps by utilizing a bounding box regression method based on the fusion characteristics comprises the following steps:
extracting a candidate region from the fusion features by using an RPN (Region Proposal Network), and identifying the outer frame from the candidate region according to the bounding box regression method;
the identifying the inner frame of the traffic signal lamp based on the fusion feature, the identified outer frame and the mask representation method comprises:
and acquiring a mask from the fusion characteristics based on the identified outer frame, acquiring the attribute of the inner frame through the label transformation of the mask, and acquiring the inner frame of the traffic signal lamp through filtering.
4. The method of claim 3, wherein the obtaining in-frame properties through label transformation of the mask comprises:
and coding the mask to obtain a pixel point vector, decoding the pixel point vector, and obtaining the attribute of the inner frame.
5. The method of claim 4, wherein the mask pixel point vector comprises at least one of: the color attribute, the ratio of the width of the inner frame to the width of the outer frame, the ratio of the height of the inner frame to the height of the outer frame, the shape attribute, the direction attribute, and the ratio of the value of the center point x or y of the inner frame to the width of the outer frame.
6. The method of any one of claims 1-5, further comprising:
and carrying out visual operation on the outer frame and the inner frame.
7. The method of any one of claims 1-5, further comprising:
selecting ResNet as a backbone network of the multi-stage network, and initializing the traffic signal lamp identification model by adopting pre-trained model weights or Kaiming initialization, to realize the configuration of the traffic signal lamp identification model.
8. A traffic signal recognition device, which is applied to a traffic signal recognition model, comprises:
the analysis unit is suitable for analyzing the target image based on the multi-stage network to obtain a plurality of stage characteristics;
the fusion unit is suitable for fusing the plurality of stage characteristics to obtain fusion characteristics;
the regression unit is suitable for identifying the outer frame of the traffic signal lamp according to the fusion features and the bounding box regression method;
the mask unit is suitable for identifying the inner frame of the traffic signal lamp based on the fusion characteristics, the identified outer frame and the mask representation method;
and the identification unit is suitable for determining the traffic signal lamp according to the identified outer frame of the traffic signal lamp and the identified inner frame of the traffic signal lamp.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN201911180310.5A 2019-11-27 2019-11-27 Traffic signal lamp identification method and device, electronic equipment and storage medium Pending CN111079563A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911180310.5A CN111079563A (en) 2019-11-27 2019-11-27 Traffic signal lamp identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911180310.5A CN111079563A (en) 2019-11-27 2019-11-27 Traffic signal lamp identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111079563A true CN111079563A (en) 2020-04-28

Family

ID=70311777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911180310.5A Pending CN111079563A (en) 2019-11-27 2019-11-27 Traffic signal lamp identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111079563A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650641A (en) * 2016-12-05 2017-05-10 北京文安智能技术股份有限公司 Traffic light positioning and identification method, device and system
CN106910356A (en) * 2016-10-21 2017-06-30 高萍 A kind of moving traffic light
CN107038420A (en) * 2017-04-14 2017-08-11 北京航空航天大学 A kind of traffic lights recognizer based on convolutional network
CN108363957A (en) * 2018-01-19 2018-08-03 成都考拉悠然科技有限公司 Road traffic sign detection based on cascade network and recognition methods
KR20180116808A (en) * 2017-04-18 2018-10-26 현대자동차주식회사 Apparatus and method for predicting traffic light conditions
CN108830199A (en) * 2018-05-31 2018-11-16 京东方科技集团股份有限公司 Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN109284669A (en) * 2018-08-01 2019-01-29 辽宁工业大学 Pedestrian detection method based on Mask RCNN
CN109657622A (en) * 2018-12-21 2019-04-19 广东工业大学 A kind of detection of traffic lights and recognition methods, device and equipment


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753746A (en) * 2020-06-28 2020-10-09 苏州科达科技股份有限公司 Attribute recognition model training method, recognition method, electronic device, and storage medium
CN111753746B (en) * 2020-06-28 2022-07-29 苏州科达科技股份有限公司 Attribute recognition model training method, recognition method, electronic device, and storage medium
US11527156B2 (en) 2020-08-03 2022-12-13 Toyota Research Institute, Inc. Light emitting component-wise traffic light state, signal, and transition estimator
CN111967368A (en) * 2020-08-12 2020-11-20 广州小鹏车联网科技有限公司 Traffic light identification method and device
CN112699773A (en) * 2020-12-28 2021-04-23 北京百度网讯科技有限公司 Traffic light identification method and device and electronic equipment
CN112699773B (en) * 2020-12-28 2023-09-01 阿波罗智联(北京)科技有限公司 Traffic light identification method and device and electronic equipment
CN112861831A (en) * 2021-04-25 2021-05-28 北京三快在线科技有限公司 Target object identification method and device, storage medium and electronic equipment
CN113963060A (en) * 2021-09-22 2022-01-21 腾讯科技(深圳)有限公司 Vehicle information image processing method and device based on artificial intelligence and electronic equipment
CN113963060B (en) * 2021-09-22 2022-03-18 腾讯科技(深圳)有限公司 Vehicle information image processing method and device based on artificial intelligence and electronic equipment
CN114821194A (en) * 2022-05-30 2022-07-29 深圳市科荣软件股份有限公司 Equipment running state identification method and device
CN115796239A (en) * 2022-12-14 2023-03-14 北京登临科技有限公司 AI algorithm architecture implementation device, convolution calculation unit and related method and equipment
CN115796239B (en) * 2022-12-14 2023-10-31 北京登临科技有限公司 Device for realizing AI algorithm architecture, convolution computing device, and related methods and devices

Similar Documents

Publication Publication Date Title
CN111079563A (en) Traffic signal lamp identification method and device, electronic equipment and storage medium
Hamaguchi et al. Building detection from satellite imagery using ensemble of size-specific detectors
CN110956094B (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-flow network
US20190244107A1 (en) Domain adaption learning system
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
US20190122059A1 (en) Signal light detection
CN111428875A (en) Image recognition method and device and corresponding model training method and device
KR20180135898A (en) Systems and methods for training object classifiers by machine learning
CN111814575B (en) Household pattern recognition method based on deep learning and image processing
CN108830199A (en) Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN103069434A (en) Multi-mode video event indexing
US10726535B2 (en) Automatically generating image datasets for use in image recognition and detection
US10242264B1 (en) System and method for training a machine-learning model to identify real-world elements
CN112307840A (en) Indicator light detection method, device, equipment and computer readable storage medium
CN111241964A (en) Training method and device of target detection model, electronic equipment and storage medium
Lauziere et al. A model-based road sign identification system
CN112052907A (en) Target detection method and device based on image edge information and storage medium
CN116883763A (en) Deep learning-based automobile part defect detection method and system
CN109902751B (en) Dial digital character recognition method integrating convolution neural network and half-word template matching
CN114820644A (en) Method and apparatus for classifying pixels of an image
Zhang et al. Image-based approach for parking-spot detection with occlusion handling
CN113158954A (en) Automatic traffic off-site zebra crossing area detection method based on AI technology
CN117294818A (en) Building site panoramic monitoring method for airport construction
CN108648210B (en) Rapid multi-target detection method and device under static complex scene
Kada et al. ALS point cloud classification using Pointnet++ and KPConv with prior knowledge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200428