CN113780148A

CN113780148A - Traffic sign image recognition model training method and traffic sign image recognition method

Info

Publication number: CN113780148A
Application number: CN202111040058.5A
Authority: CN
Inventors: 徐鑫; 张亮亮
Original assignee: Jingdong Kunpeng Jiangsu Technology Co Ltd
Current assignee: Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2021-12-10

Abstract

The embodiments of the disclosure disclose a traffic sign image recognition model training method and a traffic sign image recognition method. One specific implementation of the training method comprises the following steps: inputting a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features; updating a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table; inputting a sign image to be trained into a to-be-trained image recognition network to generate to-be-trained image features; generating an image recognition result based on the template image features and the to-be-trained image features; and, in response to determining that the image recognition result satisfies a preset condition, generating a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network. This implementation can improve the training efficiency of the traffic sign image recognition model.

Description

Traffic sign image recognition model training method and traffic sign image recognition method

Technical Field

The embodiments of the disclosure relate to the field of computer technology, and in particular to a traffic sign image recognition model training method and apparatus, an electronic device, and a computer-readable medium.

Background

Traffic sign image recognition has become an active research topic in recent years in computer vision, image processing, pattern recognition, and related fields, and traffic sign recognition technology has significant research and application value. At present, a traffic sign image recognition model is commonly trained by taking natural scene images as training samples and training a convolutional neural network on them.

However, training a traffic sign image recognition model in this manner often has the following technical problems:

First, the difference between natural scene images affected by the environment (e.g., wear, reflection, occlusion) and standard traffic sign images is not taken into account, which reduces the training efficiency of the traffic sign image recognition model;

Second, because natural scene images have complex backgrounds, the semantic features of the traffic signs in them are weak: the inter-class feature variance between different classes of traffic signs is small, while the intra-class feature variance within the same class is large. Using natural scene images directly as training samples therefore leads to low recognition accuracy of the traffic sign image recognition model.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Some embodiments of the present disclosure propose a traffic sign image recognition model training method and a traffic sign image recognition method to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide a method for training a traffic sign image recognition model, the method including: inputting a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further includes a sign image to be trained and sample label information, and the initial recognition model further includes a to-be-trained image recognition network; updating a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table; inputting the sign image to be trained into the to-be-trained image recognition network to generate to-be-trained image features; generating an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result represents the difference between the traffic sign template image and the sign image to be trained; and, in response to determining that the image recognition result satisfies a preset condition, generating a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network.

Optionally, the method further includes: adjusting parameters in the initial recognition model in response to determining that the image recognition result does not satisfy the preset condition.

Optionally, the sample label information includes a name, size information, category information, and usage information of the traffic sign template image; and updating the preset sign feature index table based on the template image features and the corresponding sample label information to generate the updated index table includes: storing the template image features together with the traffic sign template image name, size information, category information, and usage information as a table record in the sign feature index table to generate the updated index table.

Optionally, the preprocessed training sample is generated by the following method: acquiring a traffic sign template image set and a natural scene image set; extracting traffic signs from the natural scene images in the natural scene image set to generate a traffic sign image set; matching each traffic sign template image in the traffic sign template image set with the corresponding traffic sign image in the traffic sign image set to obtain a traffic sign image sample pair set; and generating the preprocessed training sample based on the traffic sign image sample pair set.

Optionally, the initial recognition model further includes an output network; and generating the image recognition result based on the template image features and the to-be-trained image features includes: determining the difference between the template image features and the to-be-trained image features to generate difference features; and inputting the difference features into the output network to generate the image recognition result.
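The difference-feature variant above can be illustrated with a minimal sketch. The disclosure does not specify how the difference is computed or what the output network looks like, so the element-wise absolute difference, the single sigmoid unit, and the names `difference_feature` and `OutputNetwork` are all assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def difference_feature(template_feat: Sequence[float], train_feat: Sequence[float]) -> List[float]:
    """Assumed form: element-wise absolute difference of the two branch outputs."""
    if len(template_feat) != len(train_feat):
        raise ValueError("feature dimensions must match")
    return [abs(t - x) for t, x in zip(template_feat, train_feat)]


class OutputNetwork:
    """Hypothetical one-layer output network computing sigmoid(w . d + b).

    With negative weights, a small difference gives a score near sigmoid(b)
    and a large difference gives a score near 0.
    """

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def __call__(self, diff: Sequence[float]) -> float:
        z = sum(w * d for w, d in zip(self.weights, diff)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


# Hand-picked illustrative weights, not trained values.
net = OutputNetwork(weights=[-4.0, -4.0, -4.0], bias=2.0)
similar = net(difference_feature([0.2, 0.5, 0.1], [0.2, 0.5, 0.1]))
different = net(difference_feature([1.0, 0.0, 0.0], [0.0, 1.0, 1.0]))
```

In a real model the output network would be trained jointly with the two branches; here its weights are fixed only to show how a difference feature maps to a recognition score.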

Optionally, the template image recognition network in the initial recognition model includes at least one template image convolutional layer and at least one template image fully connected layer, the to-be-trained image recognition network in the initial recognition model includes at least one to-be-trained image convolutional layer and at least one to-be-trained image fully connected layer, and the template image fully connected layers and the to-be-trained image fully connected layers do not share weights.

Optionally, the method further includes: in response to determining that a target sample does not match the updated index table, inputting the traffic sign template image included in the target sample into the template image recognition network included in the initial recognition model to generate a target feature, wherein the target sample further includes target sample label information; and adjusting the updated index table according to the target feature and the target sample label information, and updating the traffic sign image recognition model according to the adjusted index table.
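The incremental update above can be sketched as follows, assuming that "does not match the updated index table" means the target sample's unique identifier is absent from the table and that the table is a plain dictionary keyed by that identifier. `update_index_for_new_sign` and `toy_encoder` are hypothetical names; the encoder stands in for the trained template image recognition network.

```python
from typing import Callable, Dict, List, Sequence

Feature = List[float]


def update_index_for_new_sign(
    index_table: Dict[str, dict],
    sample_id: str,
    template_image: Sequence[float],
    label_info: dict,
    template_encoder: Callable[[Sequence[float]], Feature],
) -> bool:
    """Add a sign that the updated index table does not yet contain.

    Returns True if a new record was added, False if the target sample
    already matched an existing record (assumed: lookup by unique identifier).
    """
    if sample_id in index_table:
        return False
    # Only the template branch is run; the to-be-trained branch is untouched.
    target_feature = template_encoder(template_image)
    index_table[sample_id] = {"feature": target_feature, **label_info}
    return True


def toy_encoder(image: Sequence[float]) -> Feature:
    # Hypothetical stand-in for the template image recognition network.
    return [sum(image) / len(image)]


table = {"sign_001": {"feature": [0.3], "name": "speed_limit_60"}}
added = update_index_for_new_sign(table, "sign_002", [0.25, 0.75], {"name": "no_entry"}, toy_encoder)
```

The design point this illustrates is that a new sign type can be registered by writing one table record, without retraining the to-be-trained image recognition network.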

In a second aspect, some embodiments of the present disclosure provide a traffic sign image recognition method, including: acquiring a traffic sign image; and inputting the traffic sign image into a traffic sign image recognition model to generate traffic sign recognition information, wherein the traffic sign image recognition model is generated by the traffic sign image recognition model training method.

In a third aspect, some embodiments of the present disclosure provide a training apparatus for a traffic sign image recognition model, the apparatus including: a first input unit configured to input a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further includes a sign image to be trained and sample label information, and the initial recognition model further includes a to-be-trained image recognition network; an updating unit configured to update a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table; a second input unit configured to input the sign image to be trained into the to-be-trained image recognition network to generate to-be-trained image features; a first generating unit configured to generate an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result represents the difference between the traffic sign template image and the sign image to be trained; and a second generating unit configured to generate a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies a preset condition.

Optionally, the training apparatus for a traffic sign image recognition model further includes a first adjusting subunit configured to adjust parameters in the initial recognition model in response to determining that the image recognition result does not satisfy the preset condition.

Optionally, the sample label information includes a name, size information, category information, and usage information of the traffic sign template image; and the updating unit is further configured to store the template image features together with the traffic sign template image name, size information, category information, and usage information as a table record in the sign feature index table to generate the updated index table.

Optionally, the initial recognition model further includes an output network; and the first generating unit is further configured to determine the difference between the template image features and the to-be-trained image features to generate difference features, and to input the difference features into the output network to generate the image recognition result.

Optionally, the training apparatus for a traffic sign image recognition model further includes an input subunit and a second adjusting subunit. The input subunit is configured to, in response to determining that a target sample does not match the updated index table, input the traffic sign template image included in the target sample into the template image recognition network included in the initial recognition model to generate a target feature, wherein the target sample further includes target sample label information. The second adjusting subunit is configured to adjust the updated index table according to the target feature and the target sample label information, and to update the traffic sign image recognition model according to the adjusted index table.

In a fourth aspect, some embodiments of the present disclosure provide a traffic sign image recognition apparatus, including: an acquisition unit configured to acquire a traffic sign image; and a recognition unit configured to input the traffic sign image into a traffic sign image recognition model to generate traffic sign recognition information, wherein the traffic sign image recognition model is generated by the traffic sign image recognition model training method described above.

In a fifth aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device having one or more programs stored thereon, which when executed by one or more processors, cause the one or more processors to implement the method described in any of the implementations of the first aspect.

In a sixth aspect, some embodiments of the present disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following beneficial effect: the training method for the traffic sign image recognition model can improve the model's training efficiency. Specifically, training efficiency is low because the differences between natural scene images affected by the environment (e.g., wear, reflection, occlusion) and standard traffic sign images are not taken into account. To address this, the training method of some embodiments of the present disclosure first generates an updated index table that records the template image features of the traffic sign template images. It then generates an image recognition result that represents the difference between a traffic sign template image and a sign image to be trained, so that the traffic sign image recognition model learns to recognize the difference between a sign image to be trained and its traffic sign template image. Building on this capability, the updated index table further establishes the correspondence between the to-be-trained image features and the template image features. As a result, natural scene images that deviate from the traffic sign template images because of environmental influences can be recognized quickly, the model converges faster during training, and the training efficiency of the traffic sign image recognition model is improved.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a schematic diagram of an application scenario of a traffic sign image recognition model training method of some embodiments of the present disclosure;

FIG. 2 is a schematic diagram of one application scenario of the traffic sign image recognition method of some embodiments of the present disclosure;

FIG. 3 is a flow diagram of some embodiments of a traffic sign image recognition model training method according to the present disclosure;

FIG. 4 is a flow diagram of further embodiments of a traffic sign image recognition model training method according to the present disclosure;

FIG. 5 is a flow diagram of some embodiments of a traffic sign image recognition method according to the present disclosure;

FIG. 6 is a schematic structural diagram of some embodiments of a traffic sign image recognition model training apparatus according to the present disclosure;

FIG. 7 is a schematic diagram of a structure of some embodiments of a traffic sign image recognition device according to the present disclosure;

FIG. 8 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions related to the invention are shown in the drawings. The embodiments of the present disclosure and the features in those embodiments may be combined with each other where they do not conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It should be noted that the modifiers "a", "an", and "the" in this disclosure are illustrative rather than limiting; those skilled in the art will understand them to mean "one or more" unless the context clearly indicates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 is a schematic diagram of an application scenario of a traffic sign image recognition model training method according to some embodiments of the present disclosure.

In the application scenario of fig. 1, first, the computing device 101 may input the traffic sign template image 1021 included in the preprocessed training sample 102 into the template image recognition network 1031 included in the preset initial recognition model 103 to generate the template image features 104, where the training sample 102 further includes a sign image 1022 to be trained and sample label information 1023, and the initial recognition model 103 further includes a to-be-trained image recognition network 1032. Next, the computing device 101 may update the preset sign feature index table 105 based on the template image features 104 and the corresponding sample label information 1023 to generate the updated index table 106. Then, the computing device 101 may input the sign image 1022 to be trained into the to-be-trained image recognition network 1032 to generate the to-be-trained image features 107. Thereafter, the computing device 101 may generate an image recognition result 108 based on the template image features 104 and the to-be-trained image features 107, where the image recognition result 108 represents the difference between the traffic sign template image 1021 and the sign image 1022 to be trained. Finally, in response to determining that the image recognition result 108 satisfies a preset condition, the computing device 101 may generate the traffic sign image recognition model 109 based on the updated index table 106 and the to-be-trained image recognition network 1032.

Fig. 2 is a schematic diagram of an application scenario of the traffic sign image recognition method according to some embodiments of the present disclosure.

In the context of fig. 2, first, the computing device 201 may acquire a traffic sign image 202. The computing device 201 may then input the traffic sign image 202 to a traffic sign image recognition model 203 to generate traffic sign recognition information 204, wherein the traffic sign image recognition model 203 is generated by the traffic sign image recognition model training method as described above.
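The recognition step above does not say how the model turns an image into recognition information. One plausible reading, given that the final model consists of the updated index table and the to-be-trained image recognition network, is a nearest-neighbour lookup: the network produces a feature for the query image and the closest template feature in the table determines the sign. The sketch below assumes exactly that, with Euclidean distance and a dictionary-based table; `recognize` and the table layout are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def recognize(query_feature: Sequence[float], index_table: Dict[str, dict]) -> Tuple[str, dict, float]:
    """Return (sample_id, record, distance) of the nearest template feature.

    query_feature is assumed to come from the trained image recognition network;
    each record in index_table is assumed to hold a "feature" list plus label fields.
    """
    best_id, best_dist = None, math.inf
    for sample_id, record in index_table.items():
        dist = math.dist(query_feature, record["feature"])  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = sample_id, dist
    if best_id is None:
        raise ValueError("index table is empty")
    return best_id, index_table[best_id], best_dist


index_table = {
    "sign_a": {"feature": [0.0, 0.0], "name": "stop"},
    "sign_b": {"feature": [1.0, 1.0], "name": "yield"},
}
sign_id, record, distance = recognize([0.9, 0.8], index_table)
```

A linear scan is shown for clarity; a production system with many sign classes might use an approximate nearest-neighbour index instead, which the disclosure neither requires nor rules out.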

The computing devices 101 and 201 may be hardware or software. When a computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or as a single server or a single terminal device. When a computing device is software, it may be installed in the hardware devices listed above. It may be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. No specific limitation is imposed here.

It should be understood that the number of computing devices in fig. 1-2 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.

With continued reference to fig. 3, a flow 300 of some embodiments of a traffic sign image recognition model training method according to the present disclosure is shown. The process 300 of the traffic sign image recognition model training method comprises the following steps:

Step 301, inputting the traffic sign template image included in the preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features.

In some embodiments, the execution body of the traffic sign image recognition model training method (such as the computing device 101 shown in fig. 1) may input the traffic sign template image included in the preprocessed training sample into the template image recognition network included in the preset initial recognition model to generate the template image features. The training sample may further include a sign image to be trained and sample label information, and the initial recognition model may further include a to-be-trained image recognition network. The traffic sign template image may be an internationally recognized traffic sign graphic with standard dimensions, standard definition, standard colors, and the like. The template image features correspond to the traffic sign template image. The sample label information may include a number of the traffic sign template image, which serves as a unique identifier of that template image, so the template image features also correspond to the sample label information. Because traffic sign template images have standard sizes, clear colors, and similar properties, the template image features of different traffic sign template images can differ markedly. The to-be-trained image recognition network is used to recognize the sign image to be trained.

In some optional implementations of some embodiments, the template image recognition network in the initial recognition model may include at least one template image convolutional layer and at least one template image fully connected layer, and the to-be-trained image recognition network may include at least one to-be-trained image convolutional layer and at least one to-be-trained image fully connected layer. The template image fully connected layers and the to-be-trained image fully connected layers may not share weights, whereas the template image convolutional layers and the to-be-trained image convolutional layers may share weights. Sharing convolutional layer weights reduces the number of model parameters and thereby improves training efficiency. In practice, if features are extracted only by the template image convolutional layers and the to-be-trained image convolutional layers, training may fail to converge. Training may also fail to converge if fully connected layers are added to both branches but their weights are shared, which reduces training efficiency. Therefore, to improve training efficiency, at least one template image fully connected layer and at least one to-be-trained image fully connected layer are added to the initial recognition model, and their weights are not shared.
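The weight-sharing arrangement described above can be illustrated with a toy, dependency-free sketch: both branches hold a reference to the same convolution object (shared weights), while each branch owns its own fully connected layer (unshared weights). The 1-D convolution on flat vectors, the layer sizes, and the class names are simplifying assumptions; a real implementation would use 2-D convolutions in a deep learning framework.

```python
import random
from typing import List

Vector = List[float]


class SharedConv1D:
    """Toy 1-D 'valid' convolution + ReLU standing in for the convolutional layers."""

    def __init__(self, kernel: Vector) -> None:
        self.kernel = kernel

    def __call__(self, x: Vector) -> Vector:
        k = len(self.kernel)
        return [
            max(0.0, sum(w * x[i + j] for j, w in enumerate(self.kernel)))
            for i in range(len(x) - k + 1)
        ]


class Dense:
    """Fully connected layer computing y = W x + b."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random) -> None:
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.weights, self.bias)]


class Branch:
    def __init__(self, conv: SharedConv1D, fc: Dense) -> None:
        self.conv, self.fc = conv, fc

    def __call__(self, x: Vector) -> Vector:
        return self.fc(self.conv(x))


class TwoBranchModel:
    def __init__(self, input_dim: int = 16, kernel_size: int = 3, feature_dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        conv = SharedConv1D([rng.uniform(-0.5, 0.5) for _ in range(kernel_size)])
        conv_out = input_dim - kernel_size + 1
        # One conv object is used by both branches, so its weights are shared.
        # Each branch gets its own Dense object, so fully connected weights are NOT shared.
        self.template_branch = Branch(conv, Dense(conv_out, feature_dim, rng))
        self.train_branch = Branch(conv, Dense(conv_out, feature_dim, rng))


model = TwoBranchModel()
template_feature = model.template_branch([0.1 * i for i in range(16)])
train_feature = model.train_branch([0.1 * i for i in range(16)])
```

Because the convolution is shared by reference, any update to its kernel during training affects both branches at once, while the two fully connected layers can specialize to template images and natural scene crops respectively.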

Step 302, updating a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table.

In some embodiments, the execution body may update the preset sign feature index table based on the template image features and the corresponding sample label information to generate the updated index table. The preset sign feature index table may be an empty data table created in advance for storing template image features and the corresponding sample label information. The template image features and the unique identifier included in the sample label information can therefore be stored in the sign feature index table to update it, yielding the updated index table.
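A minimal in-memory version of the sign feature index table might look like the sketch below. It assumes one record per unique identifier, starts empty as the text describes, and includes the optional name, size, category, and usage fields from the optional implementation of the first aspect. `SignRecord` and `SignFeatureIndex` are hypothetical names, and a deployed system could equally use a database table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SignRecord:
    sample_id: str                  # unique identifier from the sample label information
    feature: List[float]            # template image feature from the template branch
    name: Optional[str] = None      # optional label fields
    size: Optional[str] = None
    category: Optional[str] = None
    usage: Optional[str] = None


class SignFeatureIndex:
    """Hypothetical sign feature index table, created empty in advance."""

    def __init__(self) -> None:
        self._records: Dict[str, SignRecord] = {}

    def update(self, record: SignRecord) -> None:
        # Insert, or overwrite, the record keyed by its unique identifier.
        self._records[record.sample_id] = record

    def get(self, sample_id: str) -> Optional[SignRecord]:
        return self._records.get(sample_id)

    def __len__(self) -> int:
        return len(self._records)


index = SignFeatureIndex()
index.update(SignRecord("sign_001", [0.12, 0.80], name="speed_limit_60", category="prohibitory"))
```

Keying by the unique identifier means that re-running the template branch on the same template simply refreshes its stored feature instead of creating a duplicate row.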

Step 303, inputting the sign image to be trained into the to-be-trained image recognition network to generate to-be-trained image features.

In some embodiments, the execution body may input the sign image to be trained into the to-be-trained image recognition network to generate the to-be-trained image features. The sign image to be trained may be an image cropped from a natural scene image. Because natural scene images are susceptible to environmental influences, the initial recognition model can be organized as two network branches that extract features from the natural scene images and the traffic sign template images separately during model training. This prevents the image features of the natural scene images and the traffic sign template images from being mixed, which in turn avoids negative transfer during training and improves training efficiency.

Step 304, generating an image recognition result based on the template image features and the to-be-trained image features.

In some embodiments, the execution body may generate an image recognition result based on the template image features and the to-be-trained image features. The image recognition result represents the difference between the traffic sign template image and the sign image to be trained; for example, a feature variance value between the template image features and the to-be-trained image features may be determined as the image recognition result. In addition, the image recognition result may correspond to the unique identifier of the traffic sign template image, so the image recognition results can be used to determine whether all of the preset traffic sign template images have participated in model training.
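The disclosure does not define "feature variance value" precisely. The sketch below assumes it means the mean squared difference between the two feature vectors, and pairs each value with the template's unique identifier so that the set of templates that have participated in training can be read off the results. `RecognitionResult`, `recognition_result`, and `trained_template_ids` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Set


@dataclass(frozen=True)
class RecognitionResult:
    sample_id: str           # unique identifier of the traffic sign template image
    feature_variance: float  # assumed: mean squared difference of the two features


def recognition_result(sample_id: str, template_feat: Sequence[float], train_feat: Sequence[float]) -> RecognitionResult:
    if len(template_feat) != len(train_feat) or not template_feat:
        raise ValueError("features must be non-empty and of equal length")
    msd = sum((t - x) ** 2 for t, x in zip(template_feat, train_feat)) / len(template_feat)
    return RecognitionResult(sample_id, msd)


def trained_template_ids(results: Iterable[RecognitionResult]) -> Set[str]:
    """Identifiers of the template images that have participated in training so far."""
    return {r.sample_id for r in results}


r1 = recognition_result("sign_001", [1.0, 2.0], [1.0, 0.0])  # (0 + 4) / 2 = 2.0
r2 = recognition_result("sign_002", [0.0], [0.0])
```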

Step 305, in response to determining that the image recognition result satisfies a preset condition, generating a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network.

In some embodiments, the execution body may generate the traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies the preset condition. The preset condition may be that the feature variance value in the image recognition result is less than or equal to a preset variance threshold and that the unique identifiers corresponding to the image recognition results indicate that all of the preset traffic sign template images have been trained. Because the template image recognition network is used only to recognize traffic sign template images, it can be removed once all of the preset traffic sign template images have been trained, and the traffic sign image recognition model can be formed from the generated updated index table and the to-be-trained image recognition network. This improves the image recognition efficiency of the traffic sign image recognition model.
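The two-part preset condition and the final model assembly can be sketched as follows. Results are represented as plain `(unique identifier, feature variance value)` tuples, the threshold is a caller-supplied number, and the final model is shown as a simple dictionary; all of these representations and the function names are assumptions for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def meets_preset_condition(
    results: Iterable[Tuple[str, float]],
    all_template_ids: Set[str],
    variance_threshold: float,
) -> bool:
    """results: (template unique identifier, feature variance value) pairs."""
    results = list(results)
    # Part 1: every feature variance value is within the preset threshold.
    variances_ok = all(variance <= variance_threshold for _, variance in results)
    # Part 2: every preset template image has been trained.
    seen_ids = {sample_id for sample_id, _ in results}
    return variances_ok and all_template_ids <= seen_ids


def build_recognition_model(index_table: Dict[str, dict], train_branch: object) -> Dict[str, object]:
    # The template image recognition network is deliberately omitted:
    # it is only needed while template features are being written to the table.
    return {"index_table": index_table, "image_network": train_branch}


results = [("sign_001", 0.01), ("sign_002", 0.02)]
ready = meets_preset_condition(results, {"sign_001", "sign_002"}, variance_threshold=0.05)
```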

Optionally, the execution body may further adjust parameters in the initial recognition model in response to determining that the image recognition result does not satisfy the preset condition. An image recognition result that does not satisfy the preset condition indicates that the model has not yet converged and training is not finished, so the parameters in the initial recognition model need to be adjusted for the next round of training.
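The "check the condition, otherwise adjust parameters" loop can be illustrated at toy scale. The disclosure names no optimizer or loss, so this sketch assumes the to-be-trained network is reduced to a single linear map W, the preset condition is a mean-squared-difference threshold, and the parameter adjustment is plain gradient descent. `train_until_condition` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[Sequence[float], Sequence[float]]  # (input feature, template feature)


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mse_loss(w: Matrix, pairs: Sequence[Pair]) -> float:
    """Mean over pairs and dimensions of (W x - t)^2."""
    total, count = 0.0, 0
    for x, t in pairs:
        for p, ti in zip(matvec(w, x), t):
            total += (p - ti) ** 2
            count += 1
    return total / count


def train_until_condition(pairs: Sequence[Pair], dim: int, threshold: float = 1e-6,
                          lr: float = 0.5, max_steps: int = 1000) -> Tuple[Matrix, float, int]:
    w = [[0.0] * dim for _ in range(dim)]
    for step in range(max_steps):
        loss = mse_loss(w, pairs)
        if loss <= threshold:  # preset condition met: stop training
            return w, loss, step
        # Preset condition not met: adjust parameters with one gradient step.
        grad = [[0.0] * dim for _ in range(dim)]
        scale = 2.0 / (len(pairs) * dim)
        for x, t in pairs:
            err = [p - ti for p, ti in zip(matvec(w, x), t)]
            for r in range(dim):
                for c in range(dim):
                    grad[r][c] += scale * err[r] * x[c]
        for r in range(dim):
            for c in range(dim):
                w[r][c] -= lr * grad[r][c]
    return w, mse_loss(w, pairs), max_steps


pairs = [([1.0, 0.0], [0.5, -0.2]), ([0.0, 1.0], [0.1, 0.9])]
w, loss, steps = train_until_condition(pairs, dim=2)
```

In the full method the loop would also need every preset template to have been seen before stopping; that coverage check is omitted here to keep the parameter-adjustment step in focus.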

With further reference to FIG. 4, a flow 400 of further embodiments of a traffic sign image recognition model training method is illustrated. The process 400 of the traffic sign image recognition model training method comprises the following steps:

Step 401, inputting the traffic sign template image included in the preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features.

In some embodiments, the execution body of the traffic sign image recognition model training method (such as the computing device 101 shown in fig. 1) may input the traffic sign template image included in the preprocessed training sample into the template image recognition network included in the preset initial recognition model to generate the template image features. The training sample may further include a sign image to be trained and sample label information, and the initial recognition model may further include a to-be-trained image recognition network.

As an example, the template image recognition network may include a first template image convolutional layer, a first template image pooling layer, a second template image convolutional layer, a second template image pooling layer, a third template image convolutional layer, a third template image pooling layer, a fourth template image convolutional layer, a fourth template image pooling layer, a fifth template image convolutional layer, a fifth template image pooling layer, a first template image fully connected layer, and a second template image fully connected layer. The to-be-trained image recognition network may include a first to-be-trained image convolutional layer, a first to-be-trained image pooling layer, a second to-be-trained image convolutional layer, a second to-be-trained image pooling layer, a third to-be-trained image convolutional layer, a third to-be-trained image pooling layer, a fourth to-be-trained image convolutional layer, a fourth to-be-trained image pooling layer, a fifth to-be-trained image convolutional layer, a fifth to-be-trained image pooling layer, a first to-be-trained image fully connected layer, and a second to-be-trained image fully connected layer.
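The layer layout above (five convolution and pooling stages followed by two fully connected layers per branch) can be enumerated and its spatial sizes traced with a small helper. The kernel size, padding, pooling factor, and 224-pixel input are VGG-style assumptions not stated in the disclosure, and `branch_layers` and `trace_spatial_size` are hypothetical names.

```python
from typing import List


def branch_layers(prefix: str, n_conv_blocks: int = 5, n_fc: int = 2) -> List[str]:
    """Layer names for one branch: conv/pool pairs, then fully connected layers."""
    layers: List[str] = []
    for i in range(1, n_conv_blocks + 1):
        layers += [f"{prefix}_conv{i}", f"{prefix}_pool{i}"]
    layers += [f"{prefix}_fc{j}" for j in range(1, n_fc + 1)]
    return layers


def trace_spatial_size(input_size: int, n_conv_blocks: int = 5, kernel: int = 3,
                       padding: int = 1, pool: int = 2) -> List[int]:
    """Spatial side length after each conv+pool stage (assumed hyperparameters)."""
    sizes = [input_size]
    size = input_size
    for _ in range(n_conv_blocks):
        size = size + 2 * padding - kernel + 1  # stride-1 convolution
        size = size // pool                     # non-overlapping max pooling
        sizes.append(size)
    return sizes


template_layers = branch_layers("template")
train_layers = branch_layers("to_be_trained")
sizes = trace_spatial_size(224)
```

Under these assumptions each pooling stage halves the feature map, so a 224 x 224 input reaches 7 x 7 before the first fully connected layer.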

In some optional implementations of some embodiments, the preprocessed training samples may be generated by:

First, acquiring a traffic sign template image set and a natural scene image set. The traffic sign template image set may represent the preset traffic sign template images. The natural scene images in the natural scene image set may be used to generate the above-mentioned to-be-trained sign images.

Second, extracting traffic signs from each natural scene image in the natural scene image set to generate a traffic sign image set. The traffic sign extraction may be performed on each natural scene image by a target extraction algorithm (e.g., DeepLab (Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs), HRNet (Deep High-Resolution Representation Learning for Visual Recognition), or the like) to generate the traffic sign image set.

Third, matching each traffic sign template image in the traffic sign template image set with the corresponding traffic sign image in the traffic sign image set to obtain a traffic sign image sample pair set. The traffic sign template images and the traffic sign images may be labeled with unique identifiers. A traffic sign template image and a traffic sign image having the same unique identifier may then be determined to match. Finally, the unique identifier shared by the matched traffic sign template image and traffic sign image may be determined as the sample label, and the matched traffic sign template image and traffic sign image are combined into a traffic sign image sample pair.
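The identifier-based matching in the third step can be sketched as follows. This is a minimal illustration, assuming each image is represented only by a hypothetical file path and that the identifiers are plain strings; real annotation data would carry more fields.

```python
from typing import Dict, List, Tuple


def match_by_identifier(
    templates: Dict[str, str],           # unique identifier -> template image path
    sign_images: List[Tuple[str, str]],  # (unique identifier, sign image path)
) -> List[Tuple[str, str, str]]:
    """Return (template_path, sign_image_path, sample_label) triples.

    A template and a sign image match when they share the same unique
    identifier; that identifier then serves as the sample label.
    """
    pairs = []
    for identifier, sign_path in sign_images:
        template_path = templates.get(identifier)
        if template_path is None:
            continue  # no template carries this identifier, so no pair is formed
        pairs.append((template_path, sign_path, identifier))
    return pairs


if __name__ == "__main__":
    # Hypothetical identifiers and paths, for illustration only.
    templates = {"speed_limit_30": "tpl/speed_limit_30.png"}
    signs = [("speed_limit_30", "scene/0001.png"), ("no_parking", "scene/0002.png")]
    print(match_by_identifier(templates, signs))
```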

Fourth, generating the preprocessed training samples based on the traffic sign image sample pair set. Traffic sign image sample pairs may be drawn randomly without replacement from the traffic sign image sample pair set as the preprocessed training samples. In this way, a different traffic sign image sample pair is drawn each time for training the traffic sign image recognition model.
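Random drawing without replacement, as described in the fourth step, can be sketched with a shuffled index order so that each sample pair is used exactly once per pass. The seed parameter is an assumption added for reproducibility.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def draw_without_replacement(pairs: Sequence[T], seed: int = 0) -> Iterator[T]:
    """Yield every sample pair exactly once, in a random order."""
    rng = random.Random(seed)  # local generator; does not disturb global state
    order: List[int] = list(range(len(pairs)))
    rng.shuffle(order)
    for idx in order:
        yield pairs[idx]
```

Calling the generator again with a different seed for each training pass would repeat the process with a new order.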

In some optional implementations of some embodiments, the matching, by the executing subject, of each traffic sign template image in the traffic sign template image set with a traffic sign image in the traffic sign image set to obtain the traffic sign image sample pair set may further include the following steps:

First, performing image annotation on each traffic sign template image in the traffic sign template image set and each traffic sign image in the traffic sign image set to obtain a template image annotation information set and a traffic sign image annotation information set. The image annotation may consist of labeling the traffic sign template images and the traffic sign images with unique identifiers, which are then used as the template image annotation information and the traffic sign image annotation information. A unique identifier may characterize the category and number of a traffic sign template image or traffic sign image.

Second, performing image enhancement on each traffic sign template image in the traffic sign template image set and each traffic sign image in the traffic sign image set to obtain an enhanced traffic sign template image set and an enhanced traffic sign image set. This increases the diversity of the traffic sign images, so that the model acquires better representation capability for traffic signs affected by the natural environment in real scenes, thereby improving the recognition accuracy of the traffic sign image recognition model.

As an example, the enhancement may be a rotation of the traffic sign image within an angular range (e.g., [-20°, 20°]). The traffic sign image may also be scaled, e.g., to between 20 and 200 pixels.
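The enhancement ranges in the example above can be expressed as a parameter sampler. This sketch only draws the parameters; the actual rotation and resizing would be done by an image library. Interpreting "between 20 and 200 pixels" as the length of the longer side is an assumption.

```python
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AugmentParams:
    angle_deg: float  # rotation angle in degrees
    target_side: int  # length of the longer side after scaling, in pixels (assumed meaning)


def sample_augmentation(rng: random.Random,
                        max_angle: float = 20.0,
                        min_side: int = 20,
                        max_side: int = 200) -> AugmentParams:
    """Draw a rotation in [-max_angle, max_angle] and a size in [min_side, max_side]."""
    return AugmentParams(
        angle_deg=rng.uniform(-max_angle, max_angle),
        target_side=rng.randint(min_side, max_side),  # randint is inclusive on both ends
    )


def scaled_size(width: int, height: int, target_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals target_side, keeping aspect ratio."""
    scale = target_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```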

Third, classifying each enhanced traffic sign template image in the enhanced traffic sign template image set to obtain a set of classified template image groups. Each enhanced traffic sign template image may be classified according to preset categories (e.g., "warning", "prohibitory", "mandatory", etc.). Each classified template image group may characterize one class of traffic signs.

Fourth, classifying each enhanced traffic sign image in the enhanced traffic sign image set according to the preset categories to obtain a set of classified sign image groups. Each classified sign image group may characterize one class of traffic signs.

Fifth, determining the classified template images and the classified sign images corresponding to the same traffic sign category as sample pairs. In this way, positive and negative sample pairs may be defined, yielding a positive and negative sample pair set that may include a plurality of positive sample pairs and a plurality of negative sample pairs. Each positive or negative sample pair in this set, together with its corresponding template image annotation information or traffic sign image annotation information, may be determined as a traffic sign image sample pair, yielding the traffic sign image sample pair set. A positive or negative sample pair may therefore also include template image annotation information and traffic sign image annotation information. A positive sample pair may include a traffic sign template image (i.e., a classified template image) and a traffic sign image of the same category (i.e., a classified sign image). A negative sample pair may include a traffic sign template image and a traffic sign image of similar appearance.
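One way to realize the fifth step is sketched below: pairs are formed only within the same broad category group, a pair whose specific sign identifiers agree is labeled positive, and any other pair within the group is treated as a look-alike negative. The disclosure does not state how look-alike signs are selected, so this grouping rule, the tuple layout, and the 1/0 labels are assumptions; the annotation information is omitted for brevity.

```python
from itertools import product
from typing import Dict, List, Tuple

# Each image is described as (sign_id, broad_category, path); layout is assumed.
Image = Tuple[str, str, str]


def build_pairs(templates: List[Image], signs: List[Image]) -> List[Tuple[str, str, int]]:
    """Return (template_path, sign_path, label); label 1 = positive, 0 = negative."""
    # Group both sets by broad category, mirroring the classified image groups.
    groups_t: Dict[str, List[Image]] = {}
    groups_s: Dict[str, List[Image]] = {}
    for img in templates:
        groups_t.setdefault(img[1], []).append(img)
    for img in signs:
        groups_s.setdefault(img[1], []).append(img)

    pairs = []
    for category, tpl_group in groups_t.items():
        for (t_id, _, t_path), (s_id, _, s_path) in product(tpl_group, groups_s.get(category, [])):
            # Same specific sign -> positive pair; same group but different sign -> negative.
            pairs.append((t_path, s_path, 1 if t_id == s_id else 0))
    return pairs
```

With this rule, a standard speed-limit-30 template paired with a weight-limit-30 t scene image, both in the prohibitory group, yields a negative pair.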

In addition, when generating the positive and negative sample pair set, the quantity balance between the positive and negative sample pairs of different categories may be taken into account, and the proportion of the corresponding positive or negative sample pairs in the set may be increased according to preset hard-sample identifiers, so that hard samples are better distinguished during model training. In practice, to speed up model convergence, the pixel mean of each base channel (e.g., the red, green, and blue channels) may be subtracted from the pixels of the traffic sign template images and the traffic sign images. The pixel values of the images included in the training samples are thus normalized before model training, which improves the efficiency of model training.
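The per-channel mean subtraction described above can be sketched in plain Python. Representing an image as a flat list of (R, G, B) tuples is an assumption made to keep the example dependency-free; an array library would normally be used.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B); flat pixel lists are an assumed layout


def channel_means(images: Sequence[Sequence[Pixel]]) -> Tuple[float, ...]:
    """Mean of each RGB channel over all pixels of all images."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                totals[c] += pixel[c]
            count += 1
    if count == 0:
        raise ValueError("no pixels to average")
    return tuple(t / count for t in totals)


def subtract_means(image: Sequence[Pixel], means: Sequence[float]) -> List[Tuple[float, ...]]:
    """Center every pixel by subtracting the per-channel mean."""
    return [tuple(p[c] - means[c] for c in range(3)) for p in image]
```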

As an example, in a positive sample pair, the traffic sign template image is the standard speed-limit-30 sign, and the traffic sign image may be a speed-limit-30 sign in a natural scene. In a negative sample pair, the traffic sign template image is the standard speed-limit-30 sign, and the look-alike traffic sign image may be a weight-limit-30 t (tons) sign.

Step 402, updating a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table.

In some embodiments, the sample label information may further include a traffic sign template image name, traffic sign template image size information, traffic sign template image category information, and traffic sign template image usage information. The executing subject may store the template image features, the traffic sign template image name, the traffic sign template image size information, the traffic sign template image category information, and the traffic sign template image usage information as table records in the sign feature index table to generate an updated index table. Specifically, corresponding table fields may be set in the sign feature index table in advance. Then, the traffic sign template image name, size information, category information, and usage information included in the sample label information, together with the template image features, may be stored in the sign feature index table to obtain the updated index table.
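The table records described above can be modeled as follows. This is a minimal in-memory sketch; keying the table by template image name and storing size information as a string are assumptions, and a deployed system might use a database table with the same fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SignRecord:
    name: str              # traffic sign template image name (used as key; assumed)
    size: str              # size information, e.g. "64x64" (format assumed)
    category: str          # category information
    usage: str             # usage information
    feature: List[float]   # template image feature vector


@dataclass
class SignFeatureIndex:
    records: Dict[str, SignRecord] = field(default_factory=dict)

    def upsert(self, record: SignRecord) -> None:
        """Store the record, overwriting any earlier record with the same name."""
        self.records[record.name] = record
```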

Step 403, inputting the to-be-trained sign image into the to-be-trained image recognition network to generate to-be-trained image features.

In some embodiments, the specific implementation manner and technical effects of step 403 may refer to step 303 in those embodiments corresponding to fig. 3, and are not described herein again.

Step 404, determining the difference between the template image features and the to-be-trained image features to generate difference features.

In some embodiments, the executing subject may determine the difference between the template image features and the to-be-trained image features to generate difference features. The initial recognition model may further include an output network, which may consist of at least one fully connected layer that maps the difference features to the recognition result. Specifically, the 1-norm (L1 distance) of the difference between the template image features and the to-be-trained image features may be determined as the difference feature.
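Because the difference feature is passed on to a fully connected layer, a common reading in twin-network designs is the component-wise absolute difference, whose entries sum to the 1-norm. The sketch below assumes that reading; the disclosure itself only names the 1-norm.

```python
from typing import List, Sequence


def difference_feature(template_feat: Sequence[float], sign_feat: Sequence[float]) -> List[float]:
    """Component-wise absolute difference |t_k - s_k| (assumed form of the difference feature)."""
    if len(template_feat) != len(sign_feat):
        raise ValueError("feature vectors must have the same length")
    return [abs(t - s) for t, s in zip(template_feat, sign_feat)]


def l1_norm(vec: Sequence[float]) -> float:
    """1-norm of a vector: the sum of absolute values."""
    return sum(abs(v) for v in vec)
```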

Step 405, inputting the difference features into an output network to generate an image recognition result.

In some embodiments, the executing subject may input the difference features into the output network to generate the image recognition result. Specifically, the difference features may be input into the at least one fully connected layer included in the output network for feature mapping to obtain the image recognition result. The image recognition result may characterize the category of the to-be-trained sign image.

As an example, the image recognition result may be a label characterizing a category, e.g., a, b, c, etc. The image recognition result may also be a classification value characterizing whether the traffic sign template image and the to-be-trained sign image belong to the same category, e.g., 0 or 1.
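For the 0/1 form of the result, the output network can be sketched as a single fully connected unit followed by a sigmoid and a threshold. The single unit, the sigmoid, the 0.5 threshold, and the example weights are assumptions for illustration; the disclosure only states that the output network contains at least one fully connected layer.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def output_network(diff: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One fully connected unit + sigmoid: probability that the pair is the same category."""
    z = sum(w * d for w, d in zip(weights, diff)) + bias
    return sigmoid(z)


def classify(prob: float, threshold: float = 0.5) -> int:
    """Map the probability to the classification value (1 = same category, 0 = different)."""
    return 1 if prob >= threshold else 0
```

With negative weights, a small difference feature yields a high probability of "same category", and a large one yields a low probability.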

Step 406, in response to determining that the image recognition result satisfies a preset condition, generating a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network.

In some embodiments, the executing subject may generate the traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies the preset condition. Whether the image recognition result satisfies the preset condition may be determined through the following steps:

First, determining the image recognition results of historical training samples to obtain a historical recognition result set. The historical training samples may be training samples input into the initial recognition model for model training before the current image recognition result is generated. The number of historical recognition results in the set may indicate the number of training samples input into the initial recognition model for model training before the current image recognition result is generated. If the preprocessed training sample is the first training sample, the historical recognition result set may be determined to contain only one historical recognition result, namely the current image recognition result. Specifically, each historical recognition result in the set may also include a classification value indicating whether the traffic sign template image and the to-be-trained sign image belong to the same category.

Second, determining a training loss value through a loss function over the historical recognition result set, wherein L represents the training loss value, i represents a serial number, n represents the number of historical recognition results in the historical recognition result set, y represents the classification value in a historical recognition result, and y_i represents the classification value in the i-th historical recognition result in the historical recognition result set.

ŷ represents the classification value of the positive or negative sample pair in the traffic sign image sample pair corresponding to a historical recognition result (e.g., the classification value of a positive sample pair may be the value characterizing the same category), and ŷ_i represents the classification value of the positive or negative sample pair in the traffic sign image sample pair corresponding to the i-th historical recognition result in the historical recognition result set.
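The exact loss formula is not reproduced in this text. A common choice consistent with the symbols defined above is binary cross-entropy, L = -(1/n) Σ_{i=1..n} [ŷ_i log y_i + (1 - ŷ_i) log(1 - y_i)], treating y_i as a predicted probability and ŷ_i as the 0/1 pair label. The sketch below assumes that form, together with the threshold comparison of the third step; it is not necessarily the formula of the original filing.

```python
import math
from typing import Sequence


def bce_loss(predictions: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over the n historical recognition results (assumed loss).

    predictions: y_i, predicted probability that a pair shows the same category.
    labels: ŷ_i, 1 for a positive sample pair and 0 for a negative one.
    """
    if not predictions or len(predictions) != len(labels):
        raise ValueError("need equally long, non-empty sequences")
    total = 0.0
    for y, y_hat in zip(predictions, labels):
        y = min(max(y, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y_hat * math.log(y) + (1 - y_hat) * math.log(1.0 - y)
    return -total / len(predictions)


def meets_preset_condition(loss: float, loss_threshold: float) -> bool:
    """Training may stop once the loss is at or below the preset threshold."""
    return loss <= loss_threshold
```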

Specifically, the correspondence between the historical recognition result and the sample label can be determined through a training sample.

Third, in response to determining that the training loss value is less than or equal to a preset loss threshold, determining that the image recognition result satisfies the preset condition. The initial recognition model with the template image recognition network removed may then be determined as the traffic sign image recognition model. This model may include the updated index table, the to-be-trained image recognition network, and the output network.

Optionally, the executing subject may further perform the following steps:

First, in response to determining that a target sample does not match the updated index table, inputting the traffic sign template image included in the target sample into the template image recognition network included in the initial recognition model to generate a target feature, wherein the target sample may further include target sample label information. The target sample may be a sample containing a traffic sign template image that differs from every traffic sign template image in the traffic sign template image set. In practice, new traffic sign template images may be added over time, so target samples may be introduced to adjust the updated index table. The target sample may include a positive or negative sample pair consisting of a target traffic sign template image and a to-be-trained traffic sign image, together with the target sample label information. A mismatch between the target sample and the updated index table indicates that the target traffic sign template image included in the target sample differs from every traffic sign template image in the traffic sign template image set.

Second, adjusting the updated index table according to the target feature and the target sample label information, and updating the traffic sign image recognition model according to the adjusted updated index table. The target feature and the target sample label information may be added to the updated index table to obtain an expanded index table. Updating the traffic sign image recognition model may then consist of replacing the updated index table in the model with the expanded index table. In this way, the traffic sign image recognition model gains the ability to recognize more types of traffic sign images. Because the functionality of the model can be extended merely by updating the index table, the recognition efficiency of the traffic sign image recognition model can be improved.
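Extending the model with a new template, as described in the two steps above, touches only the index table. The sketch below assumes that "matching the index" means the template name already exists as a key, and uses a hypothetical `embed` callable as a stand-in for the template image recognition network; both are assumptions for illustration.

```python
from typing import Callable, Dict, List

Feature = List[float]


def extend_index(index: Dict[str, dict],
                 template_name: str,
                 template_image: object,
                 label_info: dict,
                 embed: Callable[[object], Feature]) -> Dict[str, dict]:
    """Return an expanded copy of the index if the template is not yet present.

    `embed` is a hypothetical stand-in for the template image recognition network.
    """
    if template_name in index:   # target sample matches the index: nothing to add
        return index
    expanded = dict(index)       # build a new table; the deployed one stays untouched
    expanded[template_name] = {"feature": embed(template_image), **label_info}
    return expanded
```

Swapping the returned table into the deployed model is then the only update needed; the trained networks are left unchanged.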

As can be seen from fig. 4, compared with the description of some embodiments corresponding to fig. 3, the flow 400 of the traffic sign image recognition model training method in some embodiments corresponding to fig. 4 embodies the steps of preprocessing the training samples and generating the image recognition result. First, the design of positive and negative sample pairs amplifies the feature difference between natural scene images and traffic sign template images. Then, by updating the preset sign feature index table, the difference between the image features of natural scene images and the template image features can be recorded explicitly. Finally, natural scene images can be recognized by means of the updated index table. Therefore, the solution described in these embodiments can further improve the recognition accuracy of the traffic sign image recognition model while also improving its training efficiency.

With further reference to fig. 5, a flow 500 of some embodiments of a traffic sign image recognition method according to the present disclosure is shown. The flow 500 of the traffic sign image recognition method includes the following steps:

Step 501, acquiring a traffic sign image.

In some embodiments, the subject of execution of the traffic sign image recognition method (e.g., computing device 201 shown in fig. 2) may obtain the traffic sign image. The traffic sign image may be a traffic sign image in a natural scene, or may be a standard traffic sign image.

Step 502, inputting the traffic sign image into the traffic sign image recognition model to generate traffic sign recognition information.

In some embodiments, the executing subject may input the traffic sign image into the traffic sign image recognition model to generate the traffic sign recognition information, wherein the traffic sign image recognition model may be generated through the steps in those embodiments corresponding to fig. 3 or fig. 4. Because the updated index table is introduced in the training process of the traffic sign image recognition model, the image features extracted from a natural scene image can be matched against the index. Therefore, the accuracy of traffic sign image recognition can be improved.
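The index matching mentioned above is not specified in detail. One simple possibility is a nearest-neighbor search over the stored template features, sketched below with an L1 distance and an optional rejection threshold; the metric, the threshold, and the dictionary layout are assumptions. An alternative would be to score each index entry with the output network and take the highest "same category" probability.

```python
from typing import Dict, Optional, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def lookup(query: Sequence[float],
           index: Dict[str, Sequence[float]],
           max_distance: Optional[float] = None) -> Tuple[Optional[str], float]:
    """Return (template name, distance) of the index entry closest to the query feature."""
    best_name, best_dist = None, float("inf")
    for name, feature in index.items():
        d = l1_distance(query, feature)
        if d < best_dist:
            best_name, best_dist = name, d
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist  # no stored template is close enough
    return best_name, best_dist
```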

With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a traffic sign image recognition model training apparatus, which correspond to those shown in fig. 3, and which can be applied in various electronic devices.

As shown in fig. 6, the traffic sign image recognition model training apparatus 600 of some embodiments includes: a first input unit 601, an updating unit 602, a second input unit 603, a first generating unit 604, and a second generating unit 605. The first input unit 601 is configured to input a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further includes a to-be-trained sign image and sample label information, and the initial recognition model further includes a to-be-trained image recognition network; the updating unit 602 is configured to update a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table; the second input unit 603 is configured to input the to-be-trained sign image into the to-be-trained image recognition network to generate to-be-trained image features; the first generating unit 604 is configured to generate an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result characterizes the difference between the traffic sign template image and the to-be-trained sign image; and the second generating unit 605 is configured to generate a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies a preset condition.

In an optional implementation manner of some embodiments, the above-mentioned training apparatus 600 for a traffic sign image recognition model may further include a first adjusting subunit. Wherein the first adjusting subunit is configured to adjust a parameter in the initial recognition model in response to determining that the image recognition result does not satisfy the preset condition.

In an optional implementation of some embodiments, the sample label information includes a traffic sign template image name, traffic sign template image size information, traffic sign template image category information, and traffic sign template image usage information; and the updating unit 602 is further configured to store the template image features, the traffic sign template image name, the traffic sign template image size information, the traffic sign template image category information, and the traffic sign template image usage information as table records in the sign feature index table to generate the updated index table.

In an alternative implementation of some embodiments, the preprocessed training samples are generated by: acquiring a traffic sign template image set and a natural scene image set; extracting traffic signs from the natural scene images in the natural scene image set to generate a traffic sign image set; matching each traffic sign template image in the traffic sign template image set with the corresponding traffic sign image in the traffic sign image set to obtain a traffic sign image sample pair set; and generating the preprocessed training sample based on the traffic sign image sample pair set.

In an optional implementation of some embodiments, the initial recognition model further includes an output network; and the first generating unit 604 is further configured to determine the difference between the template image features and the to-be-trained image features to generate difference features, and to input the difference features into the output network to generate the image recognition result.

In an optional implementation of some embodiments, the template image recognition network in the initial recognition model includes at least one template image convolution layer and at least one template image fully connected layer, the to-be-trained image recognition network in the initial recognition model includes at least one to-be-trained image convolution layer and at least one to-be-trained image fully connected layer, and the at least one template image fully connected layer and the at least one to-be-trained image fully connected layer do not share weights.

In an optional implementation of some embodiments, the above-mentioned training apparatus 600 further includes an input subunit and a second adjusting subunit. The input subunit is configured to, in response to determining that a target sample does not match the updated index table, input the traffic sign template image included in the target sample into the template image recognition network included in the initial recognition model to generate a target feature, wherein the target sample further includes target sample label information; the second adjusting subunit is configured to adjust the updated index table according to the target feature and the target sample label information, and to update the traffic sign image recognition model according to the adjusted updated index table.

It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 3. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.

Referring next to fig. 7, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a traffic sign image recognition apparatus, which correspond to those shown in fig. 5, and which can be applied in various electronic devices.

As shown in fig. 7, a traffic sign image recognition apparatus 700 of some embodiments includes: an acquisition unit 701 and a recognition unit 702. Wherein, the acquiring unit 701 is configured to acquire a traffic sign image; a recognition unit 702 configured to input the traffic sign image into a traffic sign image recognition model to generate traffic sign recognition information, wherein the traffic sign image recognition model may be generated through the steps in those embodiments corresponding to fig. 3 or fig. 4.

It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 5. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.

Referring now to fig. 8, shown is a schematic diagram of an electronic device 800 suitable for use in implementing some embodiments of the present disclosure. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 8, an electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage means 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the electronic device 800. The processing means 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 808; and a communication device 809. The communication device 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 8 illustrates an electronic device 800 having various devices, it is to be understood that not all illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 8 may represent one device or multiple devices as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through communications device 809, or installed from storage device 808, or installed from ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be contained in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: input a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further comprises a to-be-trained sign image and sample label information, and the initial recognition model further comprises a to-be-trained image recognition network; update a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table; input the to-be-trained sign image into the to-be-trained image recognition network to generate to-be-trained image features; generate an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result characterizes the difference between the traffic sign template image and the to-be-trained sign image; and generate a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies a preset condition.

Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first input unit, an update unit, a second input unit, a first generation unit, and a second generation unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the second generation unit may also be described as "a unit that generates a traffic sign image recognition model".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SoCs), Complex Programmable Logic Devices (CPLDs), and the like.

Claims

1. A traffic sign image recognition model training method, comprising the following steps:

inputting a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further comprises a to-be-trained sign image and sample label information, and the initial recognition model further comprises a to-be-trained image recognition network;

updating a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table;

inputting the to-be-trained sign image into the to-be-trained image recognition network to generate to-be-trained image features;

generating an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result represents the difference between the traffic sign template image and the to-be-trained sign image;

and, in response to determining that the image recognition result satisfies a preset condition, generating a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network.

2. The method of claim 1, wherein the method further comprises:

and adjusting parameters in the initial recognition model in response to determining that the image recognition result does not satisfy the preset condition.

3. The method of claim 1, wherein the sample label information includes a traffic sign template image name, traffic sign template image size information, traffic sign template image category information, and traffic sign template image usage information; and

wherein updating the preset sign feature index table based on the template image features and the corresponding sample label information to generate the updated index table comprises:

storing the template image features, the traffic sign template image name, the traffic sign template image size information, the traffic sign template image category information, and the traffic sign template image usage information in the sign feature index table as a table record to generate the updated index table.

4. The method of claim 1, wherein the preprocessed training samples are generated by:

acquiring a traffic sign template image set and a natural scene image set;

extracting traffic signs from all natural scene images in the natural scene image set to generate a traffic sign image set;

matching each traffic sign template image in the traffic sign template image set with the corresponding traffic sign image in the traffic sign image set to obtain a traffic sign image sample pair set;

and generating the preprocessed training sample based on the traffic sign image sample pair set.

5. The method of claim 1, wherein the initial recognition model further comprises an output network; and

wherein generating the image recognition result based on the template image features and the to-be-trained image features comprises:

determining the difference between the template image features and the to-be-trained image features to generate difference features;

inputting the difference features to the output network to generate an image recognition result.

6. The method of claim 1, wherein the template image recognition network in the initial recognition model comprises at least one template image convolutional layer and at least one template image fully connected layer, the to-be-trained image recognition network in the initial recognition model comprises at least one to-be-trained image convolutional layer and at least one to-be-trained image fully connected layer, and the at least one template image fully connected layer and the at least one to-be-trained image fully connected layer do not share weights.

7. The method of claim 1, wherein the method further comprises:

in response to determining that a target sample does not match the updated index table, inputting a traffic sign template image included in the target sample to a template image recognition network included in the initial recognition model to generate a target feature, wherein the target sample further includes target sample label information;

and adjusting the updated index table according to the target feature and the target sample label information, and updating the traffic sign image recognition model according to the adjusted index table.

8. A traffic sign image recognition method, comprising:

acquiring a traffic sign image;

inputting the traffic sign image into a traffic sign image recognition model to generate traffic sign recognition information, wherein the traffic sign image recognition model is generated by the method according to any one of claims 1 to 7.

9. A traffic sign image recognition model training device, comprising:

a first input unit configured to input a traffic sign template image included in a preprocessed training sample into a template image recognition network included in a preset initial recognition model to generate template image features, wherein the training sample further comprises a to-be-trained sign image and sample label information, and the initial recognition model further comprises a to-be-trained image recognition network;

an updating unit configured to update a preset sign feature index table based on the template image features and the corresponding sample label information to generate an updated index table;

a second input unit configured to input the to-be-trained sign image into the to-be-trained image recognition network to generate to-be-trained image features;

a first generation unit configured to generate an image recognition result based on the template image features and the to-be-trained image features, wherein the image recognition result represents the difference between the traffic sign template image and the to-be-trained sign image;

and a second generation unit configured to generate a traffic sign image recognition model based on the updated index table and the to-be-trained image recognition network in response to determining that the image recognition result satisfies a preset condition.

10. A traffic sign image recognition device, comprising:

an acquisition unit configured to acquire a traffic sign image;

a recognition unit configured to input the traffic sign image into a traffic sign image recognition model to generate traffic sign recognition information, wherein the traffic sign image recognition model is generated by the method according to any one of claims 1 to 7.

11. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 8.

12. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-8.
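As a further illustration, the recognition method of claim 8 and the index-table adjustment of claim 7 can be sketched as a nearest-neighbour lookup over the updated index table. The sketch assumes (none of this is stated in the claims) that recognition compares the trained network's features with each stored template feature by Euclidean distance, that an image farther than `max_distance` from every record is rejected, and that a target sample "does not match" the table when its template name is absent. `TrafficSignRecognizer`, its methods, the toy `dense` network, and the sign names are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[Vector]


def dense(weights: Matrix, x: Vector) -> Vector:
    """Toy stand-in for a recognition network: one fully connected layer, y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


class TrafficSignRecognizer:
    """Hypothetical wrapper around an updated index table and a trained network."""

    def __init__(self, index_table, trained_weights, template_weights, max_distance):
        self.index_table = index_table            # name -> {"feature": ..., label info}
        self.trained_weights = trained_weights    # trained image recognition network
        self.template_weights = template_weights  # template network, used for new templates
        self.max_distance = max_distance          # assumed rejection threshold

    def recognize(self, sign_image: Vector):
        """Claim 8 sketch: return the label info of the nearest index-table record."""
        feat = dense(self.trained_weights, sign_image)
        best_name, best_dist = None, math.inf
        for name, record in self.index_table.items():
            dist = math.dist(feat, record["feature"])
            if dist < best_dist:
                best_name, best_dist = name, dist
        if best_name is None or best_dist > self.max_distance:
            return None  # no record is close enough
        record = self.index_table[best_name]
        return {k: v for k, v in record.items() if k != "feature"}

    def add_template(self, template_image: Vector, label: dict) -> bool:
        """Claim 7 sketch: add a record for a template the table does not yet contain."""
        if label["name"] in self.index_table:
            return False  # the sample already matches an existing record
        feature = dense(self.template_weights, template_image)
        self.index_table[label["name"]] = {"feature": feature, **label}
        return True


identity = [[1.0, 0.0], [0.0, 1.0]]
recognizer = TrafficSignRecognizer(
    index_table={"sign_a": {"feature": [1.0, 0.0], "name": "sign_a", "category": "category_1"}},
    trained_weights=identity,
    template_weights=identity,
    max_distance=0.5,
)
added = recognizer.add_template([0.0, 1.0], {"name": "sign_b", "category": "category_2"})
```

Because the recognizer consults the table at recognition time, adding a record in `add_template` changes its output without retraining either network, which is one plausible reading of how claim 7 updates the model through the adjusted index table.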