CN113807315A - Method, device, equipment and medium for constructing recognition model of object to be recognized - Google Patents

Method, device, equipment and medium for constructing recognition model of object to be recognized Download PDF

Info

Publication number
CN113807315A
Authority
CN
China
Prior art keywords
recognized, picture, identified, graph, sample picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111171015.0A
Other languages
Chinese (zh)
Other versions
CN113807315B (en)
Inventor
陈茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wensihai Huizhike Technology Co ltd
Original Assignee
Wensihai Huizhike Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wensihai Huizhike Technology Co ltd
Priority to CN202111171015.0A
Publication of CN113807315A
Application granted; publication of CN113807315B
Legal status: Active

Classifications

    • G06F16/5838: Information retrieval of still image data; retrieval characterised by metadata automatically derived from the content, using colour
    • G06F16/5854: Retrieval characterised by metadata automatically derived from the content, using shape and object relationship
    • G06F16/587: Retrieval characterised by metadata using geographical or spatial information, e.g. location
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods


Abstract

The application provides a method, an apparatus, a device, and a medium for constructing a recognition model of an object to be recognized. The method comprises: obtaining sample pictures; acquiring a first position coordinate of the graph of the object to be recognized in each sample picture containing the object; inputting each sample picture into an initial recognition model of the object to be recognized to obtain a second position coordinate of a predicted graph of the object; and training the initial recognition model based on the second position coordinate of the predicted graph and the first position coordinate of the labelled graph, to obtain the recognition model of the object to be recognized. The method and device solve the prior-art problem that the recognition accuracy of a trained recognition model is not high.

Description

Method, device, equipment and medium for constructing recognition model of object to be recognized
Technical Field
The present application relates to the field of computer information technology, and in particular, to a method, an apparatus, a device, and a medium for constructing an identification model of an object to be identified.
Background
With the rapid development of automation technology in recent years, the demand for automatic detection and recognition of pictures keeps growing. For example, traffic signs are an important component of road facilities and an important carrier of road traffic information: they convey key information such as speed limits and upcoming road-condition changes, provide road information to drivers, and give timely safety warnings that encourage cautious driving. Traffic sign recognition in the field of automatic driving therefore needs to be both faster and more accurate.
In the prior art there are many picture recognition methods. A common one is to build a recognition model and feed it the picture to be recognized, so as to determine whether that picture contains the desired object. With this method, however, the recognition model is trained on sample pictures that do or do not contain the object to be recognized, and training is completed simply by comparing the model's prediction of whether a sample picture contains the object against the ground truth. Because only picture-level labels are used, the accuracy of the resulting recognition model is limited.
Disclosure of Invention
In view of this, an object of the present application is to provide a method, an apparatus, a device, and a medium for constructing a recognition model of an object to be recognized, so as to solve the prior-art problem that the recognition accuracy of a trained recognition model is not high.
In a first aspect, an embodiment of the present application provides a method for constructing a recognition model of an object to be recognized, where the method includes:
obtaining a sample picture;
acquiring a first position coordinate of a graph of the object to be identified in each sample picture with the object to be identified;
for each sample picture, inputting the sample picture into an identification initial model of the object to be identified to obtain a second position coordinate of a prediction graph of the object to be identified;
and training the recognition initial model of the object to be recognized based on the second position coordinate of the prediction graph of the object to be recognized and the first position coordinate of the graph of the object to be recognized to obtain the recognition model of the object to be recognized.
Further, the training of the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified includes:
if the sample picture corresponding to the object to be recognized prediction graph is a picture without the object to be recognized, adjusting the training parameters of the object to be recognized recognition initial model until the object to be recognized prediction graph output by the trained object to be recognized recognition initial model is empty;
if the sample picture corresponding to the predicted graph of the object to be recognized is a picture with the object to be recognized, acquiring a first pixel point of the predicted graph of the object to be recognized from the predicted graph of the object to be recognized;
acquiring a first pixel number marked as an object to be identified in a sample picture, and acquiring a second pixel number marked as the object to be identified from a predicted graph of the object to be identified;
calculating a loss value based on the second position coordinate of the first pixel point, the first position coordinate corresponding to the first pixel point, the first pixel number and the second pixel number;
and if the loss value is greater than the preset loss threshold, adjusting the training parameters of the initial model for recognizing the object to be recognized until the loss value of the trained initial model for recognizing the object to be recognized is not greater than the loss threshold.
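The training steps above can be sketched in Python. The exact form of the loss is not given in the text, so the equal weighting of the coordinate term and the pixel-count term (`count_weight`) is an assumption, as are the function names:

```python
def combined_loss(pred_coords, true_coords, pred_pixel_count, true_pixel_count,
                  count_weight=1.0):
    """Hypothetical loss combining the coordinate error of the matched pixel
    points with the difference between the two pixel counts, as the steps
    above describe. The weighting between the terms is an assumption."""
    # Mean squared error between second (predicted) and first (labelled)
    # position coordinates of the matched pixel points.
    coord_err = sum((px - tx) ** 2 + (py - ty) ** 2
                    for (px, py), (tx, ty) in zip(pred_coords, true_coords))
    coord_err /= max(len(pred_coords), 1)
    # Penalty for a mismatch between the number of pixels labelled as the
    # object in the sample picture and in the predicted graph.
    count_err = abs(pred_pixel_count - true_pixel_count)
    return coord_err + count_weight * count_err


def keep_training(loss_value, loss_threshold):
    """Training parameters keep being adjusted while the loss exceeds the
    preset threshold (decision logic only; updates are model-specific)."""
    return loss_value > loss_threshold
```

A perfect prediction (identical coordinates and pixel counts) gives a loss of zero, so training stops once the loss falls to the threshold or below.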
Further, the method further comprises:
adjusting the obtained sample picture to the input picture size required by the identification model of the object to be identified;
carrying out data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture;
selecting enhanced pictures with random numbers to be spliced to obtain spliced pictures;
adjusting the spliced picture to the size of the input picture, and acquiring the position coordinates of each object to be identified in the spliced picture with the adjusted size;
and expanding the sample picture with the adjusted size according to the spliced picture with the adjusted size.
Further, the data enhancement comprises: random scaling, color gamut variation, and flipping.
Further, the data enhancement includes random amplification, and performs data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture, including:
and adding an additional bar around the sample picture with the adjusted size to obtain an enhanced picture with the additional bar.
Further, the method further comprises:
acquiring a picture to be recognized, and adjusting the acquired picture to be recognized to an input picture size required by the recognition model of the object to be recognized;
and inputting the picture to be recognized with the adjusted size into the recognition model of the object to be recognized to obtain the image of the object to be recognized.
Further, the object to be identified is a traffic sign, and the method further includes:
and inquiring a preset mapping relation library of each traffic sign template graph and the traffic sign type, and identifying the traffic sign type of the object graph to be identified.
In a second aspect, an embodiment of the present application provides an apparatus for constructing a recognition model of an object to be recognized, where the apparatus includes:
the sample picture acquisition module is used for acquiring a sample picture;
the first position coordinate acquisition module is used for acquiring a first position coordinate of a graph of the object to be identified in each sample picture with the object to be identified;
the second position coordinate acquisition module is used for inputting the sample picture into the identification initial model of the object to be identified aiming at each sample picture to obtain a second position coordinate of the prediction graph of the object to be identified;
and the identification model determining module of the object to be identified is used for training the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified to obtain the identification model of the object to be identified.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor; the processor and the memory communicate via the bus when the electronic device runs; and the machine-readable instructions, when executed by the processor, perform the steps of the method for constructing a recognition model of an object to be recognized described above.
In a fourth aspect, the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the steps of the method for constructing a recognition model of an object to be recognized described above.
According to the method and the device for constructing the identification model of the object to be identified, a sample picture is obtained; then, acquiring a first position coordinate of the graph of the object to be identified in each sample picture with the object to be identified; for each sample picture, inputting the sample picture into an identification initial model of the object to be identified to obtain a second position coordinate of a prediction graph of the object to be identified; and finally, training the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified to obtain the identification model of the object to be identified.
According to the method and device for constructing the recognition model of the object to be recognized, when the initial recognition model is trained, the position coordinates of the object's graph in the sample picture are compared with the position coordinates of the predicted graph, and pixel-level labels are compared as well. The trained recognition model therefore locates the object to be recognized more accurately, and its recognition accuracy is higher.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a flowchart of a method for constructing an object recognition model to be recognized according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for training an object to be recognized to recognize an initial model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an apparatus for constructing an identification model of an object to be identified according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
With the rapid development of automation technology in recent years, the demand for automatic detection and recognition of pictures keeps growing. For example, traffic signs are an important component of road facilities and an important carrier of road traffic information: they convey key information such as speed limits and upcoming road-condition changes, provide road information to drivers, and give timely safety warnings that encourage cautious driving. Traffic sign recognition in the field of automatic driving therefore needs to be both faster and more accurate.
It has been found that there are many methods for identifying an object to be identified in the prior art, such as color-based detection, shape-based detection, multi-feature fusion-based detection, and candidate region-based target detection algorithms. However, there are many disadvantages to the several approaches described above.
Color-based detection methods fall into two categories. The first is the RGB color-model method, which segments the captured RGB image directly; this reduces the amount of computation, greatly improves speed, and meets the algorithm's real-time requirement. It has a clear drawback, however: when the environment around the traffic sign is complex, the sign blends with background noise and the algorithm cannot achieve a good detection result. The second is the HSI color-model method. Because the HSI color space is largely invariant to illumination, it is more robust, but converting RGB to HSI costs a certain amount of computation, so real-time performance has to be improved with hardware support.
The basic idea of the shape-based detection method is to divide the image into cells, accumulate histograms of edge directions within each cell, and finally combine the histogram entries into a feature that describes the object. This method is invariant to rotation and scaling, but it is computationally expensive.
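The per-cell histogram step of this shape-based method can be sketched as follows. The 9-bin unsigned-orientation layout is a common convention (as in HOG descriptors) and an assumption here, not something the text specifies:

```python
import math

def cell_orientation_histogram(gx, gy, bins=9):
    """Accumulate a magnitude-weighted histogram of edge directions for one
    cell; gx and gy are the cell's horizontal and vertical gradient grids."""
    hist = [0.0] * bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(dy, dx)) % 180
            hist[int(angle // (180 / bins)) % bins] += magnitude
    return hist
```

Concatenating such histograms over all cells yields the final descriptor; the per-pixel trigonometry over every cell is part of what makes the method computationally expensive.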
The detection method based on multi-feature fusion combines information from the RGB and HSI color channels to segment the traffic sign. The algorithm merges the segmentation results of the RGB and HSI color spaces, overcoming the image-information loss caused by segmenting on the S channel alone in HSI space, and improves detection accuracy; however, its detection speed is extremely low and cannot meet the requirements of real-time applications.
The candidate-region-based target detection algorithm uses a rich hierarchy of feature layers for accurate object detection and semantic segmentation, and achieves excellent detection accuracy by classifying object proposals with a deep convolutional neural network. Its detection speed is slow, however, because it repeatedly extracts and stores features for every candidate region, consuming a great deal of computation time and storage.
Whether detection of the object to be recognized is based on color, shape, multi-feature fusion, or candidate regions, each approach has a corresponding recognition model. In the prior art these models are basically trained on the picture as a whole, for example on the color or the shape of the entire object to be recognized. The precision of this training mode is not high, so the prediction precision of the resulting recognition model is not high either, and recognition errors may occur.
Based on this, the embodiments of the present application provide a method for constructing a recognition model of an object to be recognized, so as to solve the prior-art problem of low recognition accuracy and to improve the recognition accuracy of the trained model.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for constructing an object recognition model according to an embodiment of the present disclosure. As shown in fig. 1, a method for constructing an object recognition model to be recognized according to an embodiment of the present application includes:
and S101, acquiring a sample picture.
It should be noted that a sample picture is a training sample in the model training set used to train the prediction model. A sample picture may or may not contain the object to be recognized. As an optional implementation, the sample picture may be a picture with or without a traffic sign. A traffic sign conveys guidance, restriction, warning, or indication information in words or symbols; in general, safe, conspicuous, clear, and bright traffic signs are an important measure for implementing traffic management and ensuring the safety and smoothness of road traffic.

A sample picture with traffic signs can contain various sign types, distinguished in several ways: primary and auxiliary signs; movable and fixed signs; illuminated, luminous, and reflective signs; and variable-information signs reflecting changes in the driving environment.

After a sample picture is obtained, the traffic signs in it need to be labelled. There are many ways to do this: the sample picture can be labelled manually, or the signs can be identified with existing target detection algorithms based on color, shape, multi-feature fusion, or candidate regions. How those algorithms work is described in detail in the prior art and is not repeated here. As an optional implementation, the sample picture may be taken by a camera or uploaded by a user; the application does not limit this.
Here, it should be noted that the above example for the sample picture is merely an example, and actually, the sample picture is not limited to the above example.
When sample pictures are used to train the recognition model of the object to be recognized, different sample pictures may have different sizes, so adjusting all obtained sample pictures to the same size speeds up construction of the recognition model. As an alternative embodiment, the sample picture is processed through the following steps:
step 1011, adjusting the obtained sample picture to the input picture size required by the identification model of the object to be identified.
It should be noted that the recognition model of the object to be recognized is a model for recognizing the object in a picture. The input picture size is the preset picture size required by this recognition model.
In a specific implementation of step 1011, the sample picture obtained in step S101 is adjusted to the input picture size required by the recognition model, yielding a picture of the same size as the input. First, judge whether the sample picture is larger than the input picture size; if so, shrink it to the input size. If the sample picture is smaller than the input size, add additional bars around it until it matches the input size. Here, an additional bar is an extra border of uniform color added around the original picture, outside its normal content. As an alternative embodiment the bar may be black or gray; the application is not limited in this respect. In a specific implementation, once the sample picture is judged smaller than the input size, additional bars are added around it so that the padded picture matches the input picture size. For example, if the obtained sample picture has a 16:9 aspect ratio and the input picture size is 4:3, additional bars must be added around the original sample picture so that the adjusted picture reaches 4:3.
Here, it should be noted that the above selection of the color of the additional bar is merely an example, and in reality, the color of the additional bar is not limited to the above example.
In this way, when the recognition model of the object to be recognized is constructed, all sample pictures are adjusted to the same size, namely the picture size required by the traffic sign recognition model. Picture size no longer needs to be considered during model construction, each processed sample picture matches the input picture size, and the speed of constructing the recognition model is improved.
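The resize-or-pad decision of step 1011 can be sketched as pure geometry (the color of the bars aside); the function name is an illustration, not the patent's terminology:

```python
def letterbox_geometry(width, height, target_w, target_h):
    """Return the scaled size of the sample picture and the padding offsets
    of the additional bars needed to reach the model's input picture size."""
    scale = min(target_w / width, target_h / height)  # uniform fit scale
    new_w, new_h = int(width * scale), int(height * scale)
    pad_left = (target_w - new_w) // 2
    pad_top = (target_h - new_h) // 2
    return new_w, new_h, pad_left, pad_top
```

For the 16:9 example in the text, `letterbox_geometry(1600, 900, 400, 300)` scales the picture to 400 x 225 and centres it with bars above and below (`pad_top` = 37) to reach the 4:3 input size.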
Step 1012: performing data enhancement processing on the resized sample picture to obtain an enhanced picture.
As an optional embodiment, the data enhancement comprises: random scaling, color gamut variation, and flipping.
An enhanced picture is a picture obtained by applying data enhancement to the resized sample picture. Random scaling is an operation that scales the resized sample picture; the color gamut change alters its brightness, saturation, and hue; and flipping mirrors it left to right.
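Minimal sketches of these operations, working on a row-major grid of 0-255 values; the scale range in `random_scale` and all function names are assumptions, since the text does not specify them:

```python
import random

def random_scale(width, height, lo=0.5, hi=1.5):
    """Randomly rescale the picture's dimensions (range lo..hi is assumed)."""
    s = random.uniform(lo, hi)
    return int(width * s), int(height * s)

def adjust_brightness(pixels, factor):
    """One part of the color gamut change: scale brightness, clamped to 255."""
    return [[min(255, int(v * factor)) for v in row] for row in pixels]

def horizontal_flip(pixels):
    """Flip the picture left to right."""
    return [row[::-1] for row in pixels]
```

In practice these would operate on full three-channel images (e.g. NumPy arrays), and the hue and saturation changes would be done in an HSV representation; the grids here only illustrate the operations.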
As an optional implementation manner, the data enhancement includes random amplification, and performs data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture, including:
and adding an additional bar around the sample picture with the adjusted size to obtain an enhanced picture with the additional bar.
Random enlargement is an operation that randomly enlarges the resized sample picture. The additional bar is, as before, an extra border of uniform color added around the original picture, outside its normal content. As an alternative embodiment the bar may be black or gray; the application is not limited in this respect. In a specific implementation, when the data enhancement operation is random enlargement, an additional bar can be added around the resized sample picture to obtain an enhanced picture with the bar. Because the color inside the bar is uniform, when the initial recognition model processes such a picture and detects that a pixel has the same color as the preset bar, it determines that the position of that pixel cannot contain the object to be recognized; the model therefore only examines the image area outside the additional bars.
Step 1013: selecting a random number of enhanced pictures and splicing them to obtain a spliced picture.
It should be noted that splicing means stitching at least two enhanced pictures into one spliced picture. As an alternative embodiment, four enhanced pictures may be selected at random and stitched together using Mosaic data enhancement. Specifically, Mosaic enhancement randomly selects four enhanced pictures and splices them in a randomly distributed layout. Continuing the four-picture example: first read four enhanced pictures at random and place them together in some layout, for instance the first in the upper left corner, the second in the upper right, the third in the lower left, and the fourth in the lower right. After placement, a fixed rectangular region of each of the four pictures is cropped, and the crops are stitched into a new picture that serves as the spliced picture.
This splicing mode greatly enriches the model training set; in particular, random scaling adds many small targets, which improves the robustness of the prediction model. Splicing several pictures into one spliced picture before prediction and then feeding the spliced picture into the initial model for identifying the object to be identified means that four enhanced pictures are passed to the neural network at once. This enriches the backgrounds of the detected objects and allows the data of several sample pictures to be computed in a single pass, so that the GPU is used efficiently.
Here, it should be noted that the selection of the splicing manner of the enhanced pictures and the selection of the splicing number of the enhanced pictures are merely examples, and in practice, the splicing manner of the enhanced pictures and the splicing number of the enhanced pictures are not limited to the above examples.
Step 1014: adjust the spliced picture to the input picture size, and acquire the position coordinates of each object to be identified in the resized spliced picture.
For the above step 1014, after the stitched picture is obtained, the stitched picture is adjusted to the input picture size, and specifically, the method for adjusting the size of the stitched picture is the same as the method for adjusting the size of the acquired sample picture to the input picture size in step 1011, and is not described herein again. After the size is adjusted, the position coordinates of each object to be identified in the spliced picture with the adjusted size are also acquired. After the sample picture is randomly scaled and spliced, the position coordinates of the object to be recognized in the sample picture are also changed. For example, the sample picture has a size of 500 pixels × 500 pixels, and the position coordinates of the object to be recognized in the sample picture are (100, 50). When stitching, the sample picture is reduced in size to 50% of the original sample picture. At this time, the size of the reduced sample picture is 250 pixels × 250 pixels, and the position coordinates of the object to be recognized in the reduced sample picture are (50, 25).
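The coordinate adjustment in the worked example (a (100, 50) coordinate in a picture scaled to 50% becomes (50, 25)) can be expressed as a small helper. The optional offset argument, an assumption added here, covers the case where the resized picture is pasted at some position inside a spliced picture:

```python
def transform_coordinate(coord, scale, offset=(0, 0)):
    """Map an (x, y) position coordinate through a resize by `scale`,
    then through a paste at `offset` inside a larger spliced picture."""
    x, y = coord
    ox, oy = offset
    return (x * scale + ox, y * scale + oy)
```

For the example in the text, `transform_coordinate((100, 50), 0.5)` yields (50.0, 25.0); if that reduced picture were then pasted 250 pixels to the right in a mosaic, the coordinate would become (300.0, 25.0).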
Step 1015: expand the set of resized sample pictures according to the resized spliced picture.
In step 1015, after the resized spliced picture is obtained, it is also used as a resized sample picture, so that the training data for constructing the initial model for identifying the object to be identified is richer, and the resulting identification model of the object to be identified is more accurate.
S102, aiming at each sample picture with the object to be identified in the sample pictures, obtaining a first position coordinate of the object to be identified in the sample picture.
It should be noted that the object to be recognized refers to an object existing in the sample picture that is desired to be recognized from it. The first position coordinates represent the contour position coordinates of the graph of the object to be identified in the sample picture. Continuing with the above embodiment, when the sample picture is a picture with a traffic sign, the object to be recognized is the traffic sign in the sample picture, and the first position coordinate is the position coordinate of the pattern of the traffic sign in the sample picture.
For step S102, for each sample picture with the object to be identified, a first position coordinate of the object to be identified in the sample picture is obtained. Specifically, after a sample picture with an object to be recognized is recognized, the contour of the object to be recognized is obtained, and the contour pixel points of the object to be recognized are marked in the sample picture according to the pixel points lying on the contour. After the contour pixel points of the object to be recognized in the sample picture are obtained, a coordinate system can be established with the vertex at the lower left corner of the sample picture as the origin, and the first position coordinates of the graph of the object to be recognized in the sample picture are determined based on that coordinate system.
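One possible way to realize the contour marking and the lower-left-origin coordinate system described above is sketched below, assuming the object has already been segmented into a boolean mask; the 4-neighbour contour test and the helper name are illustrative choices, not details mandated by the application:

```python
import numpy as np

def contour_coordinates(mask: np.ndarray):
    """Return the (x, y) coordinates of the object's contour pixel points,
    with the origin at the picture's lower-left corner. `mask` is an H x W
    boolean array marking object pixels; a contour pixel is an object pixel
    with at least one non-object (or out-of-bounds) 4-neighbour."""
    h, w = mask.shape
    coords = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c]:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(not (0 <= nr < h and 0 <= nc < w) or not mask[nr, nc]
                   for nr, nc in neighbours):
                coords.append((c, h - 1 - r))  # x = column, y counted from the bottom row
    return coords
```

The `h - 1 - r` conversion implements the lower-left-corner origin, since image arrays conventionally index rows from the top.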
Here, it should be noted that the above-described manner of acquiring the first position coordinates of the graph of the object to be recognized in the sample picture is merely an example; in practice, the manner of acquiring the first position coordinates is not limited to the above example.
S103, aiming at each sample picture, inputting the sample picture into an identification initial model of the object to be identified to obtain a second position coordinate of the prediction graph of the object to be identified.
It should be noted that the identification initial model of the object to be identified refers to an initial model for identifying the object to be identified in the sample picture. The object to be recognized prediction graph refers to a graph recognized by the object to be recognized recognition initial model aiming at the sample picture. Since the sample picture may be a picture with the object to be recognized or a picture without the object to be recognized, the second position coordinates of the predicted pattern of the object to be recognized, which is recognized by the initial model for recognizing the object to be recognized, may not exist.
In the specific implementation of step S103, for each sample picture, the sample picture is input into the initial model for identifying the object to be identified, and the neural network in the initial model for identifying the object to be identified is used to determine the second position coordinates of the predicted graph of the object to be identified in the sample picture.
Specifically, after the initial model for recognizing the object to be recognized determines the predicted graph of the object to be recognized in the sample picture, the predicted graph is also marked to obtain its second position coordinates in the sample picture. After the predicted graph of the object to be recognized is recognized, its contour is obtained, and the contour pixel points of the predicted graph are marked in the sample picture according to the pixel points lying on the contour. After the contour pixel points of the predicted graph in the sample picture are obtained, a coordinate system can be established with the vertex at the lower left corner of the sample picture as the origin, and the second position coordinates of the predicted graph of the object to be recognized in the sample picture are determined based on that coordinate system.
And S104, training the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified to obtain the identification model of the object to be identified.
After the second position coordinate of the predicted graph of the object to be recognized and the first position coordinate of the graph of the object to be recognized are determined, the initial model of the object to be recognized is trained by using the two parameters, so as to obtain the recognition model of the object to be recognized in step S104.
Here, the first position coordinate is the contour position coordinate of the graph of the object to be identified in the resized sample picture.
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for training an object to be recognized to recognize an initial model according to an embodiment of the present disclosure. As shown in fig. 2, the training the object to be recognized identification initial model based on the second position coordinate of the object to be recognized prediction graph and the first position coordinate of the object to be recognized graph includes:
S201, if the sample picture corresponding to the predicted graph of the object to be recognized is a picture without the object to be recognized, adjusting the training parameters of the initial model for recognizing the object to be recognized until the predicted graph output by the trained initial model is empty.
In step S201, the sample pictures include pictures with the object to be recognized and pictures without it. When the initial model for recognizing the object to be recognized outputs a predicted graph for a picture that contains no object to be recognized, the recognition is considered wrong, and the training parameters of the initial model need to be modified; specifically, the training parameters may be the learning rate, the network parameters, and the like. The training parameters are adjusted continuously in an iterative manner: in each iteration, the initial model outputs a new predicted graph of the object to be recognized; if that predicted graph is not empty, the training parameters are adjusted again and a new predicted graph is produced with the new parameters, until the predicted graph output by the trained initial model is empty. At that point, the recognition of the initial model is considered accurate.
S202, if the sample picture corresponding to the predicted graph of the object to be recognized is a picture with the object to be recognized, obtaining a first pixel point of the predicted graph of the object to be recognized from the predicted graph of the object to be recognized.
For step S202, when the initial model for recognizing the object to be recognized recognizes a picture with the object to be recognized, it outputs a predicted graph of the object to be recognized. The predicted graph contains both the region of the object to be recognized and regions that do not belong to it, so the pixel points marked as the object to be recognized need to be obtained as the first pixel points. Here, a pixel point means one of the many small squares into which an image is divided. According to the embodiment provided by the application, the obtained predicted graph of the object to be recognized is divided into a number of small squares, and the pixel points marked as the object to be recognized are obtained as the first pixel points.
S203, acquiring a first pixel number marked as an object to be identified in the sample picture, and acquiring a second pixel number marked as the object to be identified from the predicted graph of the object to be identified.
It should be noted that the number of pixels refers to the total number of pixels for marking the object to be recognized. For step S203, in a specific implementation, the total number of the pixel points of the object to be identified marked in the sample picture is obtained from the sample picture based on the object to be identified in the sample picture, and is used as the first pixel number of the object to be identified. And acquiring the total number of pixel points of the marked object to be recognized based on the first pixel points marked as the object to be recognized from the object to be recognized prediction graph output by the initial model for recognizing the object to be recognized, and taking the total number as the second pixel number of the object to be recognized.
S204, calculating a loss value based on the second position coordinate of the first pixel point, the first position coordinate corresponding to the first pixel point, the first pixel number and the second pixel number.
It should be noted that the loss value is the value of a loss function, which maps a random event, or the value of a random variable related to it, to a non-negative real number representing the "risk" or "loss" of that event. In applications, the loss function is usually associated with an optimization problem as the learning criterion, i.e., the model is solved and evaluated by minimizing the loss function.
For step S204, the loss value of the initial model for identifying the object to be identified consists of two parts: one part is calculated from the error between the first position coordinates and the second position coordinates of the first pixel points, and the other part judges the accuracy of the initial model by calculating a loss from the first pixel number and the second pixel number.
When the loss is calculated from the error between the first and second position coordinates of a first pixel point, whether the prediction of the initial model is accurate is judged by comparing the two coordinates; when they differ, the prediction is considered inaccurate. For example, if the first position coordinate of the first pixel point is determined to be (250, 250) and the second position coordinate is determined to be (100, 50), an error exists between them, that is, the prediction of the initial model for identifying the object to be identified is inaccurate. At this time, the loss value of the initial model in its current state needs to be calculated. The manner in which the loss value is calculated is described in detail in the prior art and will not be elaborated here.
When the loss value is calculated from the first pixel number and the second pixel number, whether the prediction of the initial model is accurate is judged by comparing the two numbers; when they differ, the prediction is considered inaccurate, and the loss value of the initial model in its current state needs to be calculated. The manner in which the loss value is calculated is described in detail in the prior art and will not be elaborated here.
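The application defers the exact loss formulas to the prior art; the sketch below shows one plausible instantiation of the two-part loss, assuming a mean squared coordinate error for the first part and an absolute pixel-count difference for the second. The function name and the weighting between the two parts are assumptions:

```python
def combined_loss(first_coords, second_coords,
                  first_pixel_count, second_pixel_count,
                  count_weight=1.0):
    """Two-part loss: mean squared error between the first (ground-truth)
    and second (predicted) position coordinates of the first pixel points,
    plus a weighted penalty on the labelled-vs-predicted pixel-count gap."""
    coord_loss = sum((x1 - x2) ** 2 + (y1 - y2) ** 2
                     for (x1, y1), (x2, y2)
                     in zip(first_coords, second_coords)) / len(first_coords)
    count_loss = abs(first_pixel_count - second_pixel_count)
    return coord_loss + count_weight * count_loss
```

With the coordinates from the example above, `combined_loss([(250, 250)], [(100, 50)], 10, 10)` evaluates to 62500.0: the squared errors 150² + 200² with no pixel-count penalty.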
When the sample picture is a spliced picture, that is, when it may contain several original sample pictures, there are correspondingly several first pixel points marked as the object to be identified, and several corresponding first position coordinates, second position coordinates, first pixel numbers, and second pixel numbers. In that case, the parameters corresponding to each object to be recognized must be compared separately. For example, suppose the sample picture is spliced from two pictures: sample picture A containing object A to be identified, and sample picture B containing object B to be identified. After the spliced picture is input into the initial model, the model correspondingly outputs two predicted graphs, predicted graph A for the object in sample picture A and predicted graph B for the object in sample picture B. The two predicted graphs are then compared separately, predicted graph A against object A in sample picture A and predicted graph B against object B in sample picture B, to judge whether the prediction of the initial model is accurate.
And S205, if the loss value is greater than a preset loss threshold, adjusting the training parameters of the to-be-recognized object recognition initial model until the loss value of the trained to-be-recognized object recognition initial model is not greater than the loss threshold.
In the embodiments provided in the present application, the loss threshold is a criterion set in advance. As an alternative implementation, the threshold may be chosen at the point where the second derivative of the loss curve is close to 0: when the second derivative approaches 0, the slope of the loss curve barely changes, that is, the change in the loss value between two iterations of the initial model is already small. When the loss value approaches the loss threshold, the initial model for identifying the object to be identified is considered to have reached a convergence state, and its prediction at that point is relatively accurate.
For step S205, after the loss value of the initial model in its current state is calculated in step S204, the training parameters of the initial model are adjusted continuously; specifically, they may be the learning rate, the network parameters, and the like. The loss of the initial model is minimized iteratively: the loss value is computed in each iteration, and whenever it has not reached the loss threshold, the training parameters are updated and a new loss value is computed with the new parameters, so that the loss value shows a fluctuating downward trend over the iterations. Finally, when the loss value plateaus, that is, when the loss value of the trained initial model is no longer greater than the loss threshold and no longer drops noticeably compared with the previous iteration, the initial model for identifying the object to be identified is considered to have converged; training then ends, and the identification model of the object to be identified is obtained.
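The iterate-until-below-threshold procedure described above can be sketched generically. The helper names and the toy loss and update functions in the test are illustrative assumptions; in practice, `update_params` would be a gradient-based optimizer step over the network parameters and learning rate:

```python
def train_until_converged(compute_loss, update_params, params,
                          loss_threshold, max_iters=1000):
    """Iterate: compute the loss; stop once it is no greater than the
    threshold; otherwise update the training parameters and try again."""
    for _ in range(max_iters):
        loss = compute_loss(params)
        if loss <= loss_threshold:
            return params, loss            # converged: loss at or below threshold
        params = update_params(params)     # adjust training parameters
    return params, compute_loss(params)    # give up after max_iters
```

The `max_iters` guard, an added assumption, prevents the loop from running forever if the loss never reaches the threshold.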
According to the method and device for constructing the recognition model of the object to be recognized provided by the application, when the initial model is trained, the position coordinates of the graph of the object to be recognized in the sample picture are compared with the position coordinates of the predicted graph, and the pixel marks are compared as well. The higher the accuracy of the resulting identification model, the more accurately the object to be identified is recognized.
After the identification model of the object to be identified is constructed, the identification model of the object to be identified is used for identifying the object to be identified in the sample picture, and specifically, the method further comprises the following steps:
a: and acquiring a picture to be recognized, and adjusting the acquired picture to be recognized to the size of the input picture required by the recognition model of the object to be recognized.
It should be noted that the picture to be recognized refers to a picture that may contain the object to be recognized and that is to be processed by the recognition model. As an optional implementation manner, the picture to be recognized may be a picture shot by a camera or a picture uploaded by a user, and the application is not limited in this respect.
For the above steps, in specific implementation, after the picture to be recognized is acquired, the acquired picture to be recognized is adjusted to the input picture size required by the recognition model of the object to be recognized. Specifically, the method for adjusting the size of the picture to be recognized is the same as the method for adjusting the size of the acquired sample picture to the input picture in step 1011, and is not repeated here.
B: and inputting the picture to be recognized with the adjusted size into the recognition model of the object to be recognized to obtain the image of the object to be recognized.
For the above step, in a specific implementation, the resized picture to be recognized is input into the recognition model of the object to be recognized to obtain the object map to be recognized. Here, the object map to be recognized refers to a picture with the object to be recognized. Specifically, when obtaining the object map, the pixel points of the object to be recognized in the resized picture are first determined. These pixel points are marked, and the position coordinates of the object to be recognized in the resized picture are acquired. The determined position coordinates are then drawn in the resized picture, that is, the positions corresponding to the coordinates are connected with lines to obtain a position frame diagram. The picture content inside the position frame diagram is the object to be identified, so it is taken as the object to be identified in the picture to be identified.
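Connecting the determined position coordinates into a position frame diagram amounts, in the common axis-aligned case, to taking the bounding box of the marked pixel coordinates; the sketch below assumes that simplification, and the helper name is illustrative:

```python
def bounding_box(coords):
    """Return the position frame of the recognized object as the axis-aligned
    box (x_min, y_min, x_max, y_max) enclosing its marked pixel coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```

The picture content inside this box would then be cropped out as the object map to be recognized.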
As an optional implementation manner, the object to be identified is a traffic sign, and the method further includes:
and inquiring a preset mapping relation library of each traffic sign template graph and the traffic sign type, and identifying the traffic sign type of the object graph to be identified.
It should be noted that the traffic sign template map refers to a pre-stored template map used for identifying the type of a traffic sign. The mapping relation library refers to a database that stores mapping relations between objects, i.e., a database that represents information in the form of objects. A mapping relation generally refers to object-relational mapping, which is used in object-oriented programming languages to convert between data of different type systems. According to the embodiment provided by the application, preset traffic sign template maps and traffic sign types can be stored in the mapping relation library, with one traffic sign template map corresponding to one traffic sign type.
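A minimal sketch of such a mapping relation library follows, assuming each template map is identified by a name string; the image-matching step that a real implementation would need to associate a recognized object map with a template is omitted, and all names and entries here are hypothetical:

```python
def build_sign_library(entries):
    """Build a mapping relation library: each traffic sign template map
    (identified here by a name string) corresponds to one traffic sign type."""
    return dict(entries)

def lookup_sign_type(library, template_name, default="unknown"):
    """Query the library for the traffic sign type of a matched template map."""
    return library.get(template_name, default)
```

In use, the recognition model's output would first be matched against the stored template maps, and the matched template's name would then be looked up to obtain the traffic sign type.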
Traffic sign types can be distinguished in various ways: main signs and auxiliary signs; movable signs and fixed signs; illuminated signs, luminous signs, and reflective signs; and variable-information signs that reflect changes in the driving environment. The main signs may include the following four major categories: road traffic warning signs, which warn drivers and pedestrians of danger so that measures can be taken in time; road traffic indication signs, which direct drivers and pedestrians to travel in the specified directions and places; road traffic guide signs, which indicate the direction of the road; and road traffic prohibition signs, which impose restrictions on certain traffic behaviors of vehicles and pedestrians.
Here, it should be noted that the above description for the traffic sign type in the mapping relation library is merely an example, and actually, the traffic sign type in the mapping relation library is not limited to the above example.
As an optional implementation manner, after the traffic sign map in the picture to be recognized is obtained, the traffic sign type of the object map to be recognized may be recognized by querying a mapping relation library of preset traffic sign template maps and traffic sign types.
According to the embodiment provided by the application, the picture to be recognized can be input into the recognition model of the object to be recognized, the traffic sign map in the picture to be recognized is quickly recognized, the preset mapping relation library of the traffic sign template maps and the traffic sign types is inquired, the traffic sign types of the traffic sign map are recognized, road information is provided for vehicles in time, and the unmanned vehicles can be helped to select correct roads to run.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an apparatus for constructing a recognition model of an object to be recognized according to an embodiment of the present disclosure. As shown in fig. 3, the apparatus 300 for constructing an object recognition model to be recognized includes:
a sample picture obtaining module 301, configured to obtain a sample picture;
a first position coordinate obtaining module 302, configured to obtain, for each sample picture with an object to be identified in the sample pictures, a first position coordinate of the object to be identified in the sample picture;
a second position coordinate obtaining module 303, configured to, for each sample picture, input the sample picture into the identification initial model of the object to be identified, and obtain a second position coordinate of the prediction graph of the object to be identified;
and the to-be-recognized object recognition model determining module 304 is configured to train the to-be-recognized object recognition initial model based on the second position coordinate of the to-be-recognized object prediction graph and the first position coordinate of the to-be-recognized object graph, so as to obtain the to-be-recognized object recognition model.
Further, the training of the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified includes:
if the sample picture corresponding to the object to be recognized prediction graph is a picture without the object to be recognized, adjusting the training parameters of the object to be recognized recognition initial model until the object to be recognized prediction graph output by the trained object to be recognized recognition initial model is empty;
if the sample picture corresponding to the predicted graph of the object to be recognized is a picture with the object to be recognized, acquiring a first pixel point of the predicted graph of the object to be recognized from the predicted graph of the object to be recognized;
acquiring a first pixel number marked as an object to be identified in a sample picture, and acquiring a second pixel number marked as the object to be identified from a predicted graph of the object to be identified;
calculating a loss value based on the second position coordinate of the first pixel point, the first position coordinate corresponding to the first pixel point, the first pixel number and the second pixel number;
and if the loss value is greater than the preset loss threshold, adjusting the training parameters of the initial model for recognizing the object to be recognized until the loss value of the trained initial model for recognizing the object to be recognized is not greater than the loss threshold.
Further, the apparatus 300 for constructing an object recognition model to be recognized is further configured to:
adjusting the obtained sample picture to the input picture size required by the identification model of the object to be identified;
carrying out data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture;
selecting enhanced pictures with random numbers to be spliced to obtain spliced pictures;
adjusting the spliced picture to the size of the input picture, and acquiring the position coordinates of each object to be identified in the spliced picture with the adjusted size;
and expanding the sample picture with the adjusted size according to the spliced picture with the adjusted size.
Further, the data enhancement comprises: random scaling, gamut variation, flipping.
Further, the data enhancement includes random amplification, and performs data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture, including:
and adding an additional bar around the sample picture with the adjusted size to obtain an enhanced picture with the additional bar.
Further, the apparatus 300 for constructing an object recognition model to be recognized is further configured to:
acquiring a picture to be recognized, and adjusting the acquired picture to be recognized to an input picture size required by the recognition model of the object to be recognized;
and inputting the picture to be recognized with the adjusted size into the recognition model of the object to be recognized to obtain the image of the object to be recognized.
Further, the object to be recognized is a traffic sign, and the apparatus 300 for constructing a recognition model of the object to be recognized is further configured to:
and inquiring a preset mapping relation library of each traffic sign template graph and the traffic sign type, and identifying the traffic sign type of the object graph to be identified.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 4, the electronic device 400 includes a processor 410, a memory 420, and a bus 430.
The memory 420 stores machine-readable instructions executable by the processor 410, when the electronic device 400 runs, the processor 410 communicates with the memory 420 through the bus 430, and when the machine-readable instructions are executed by the processor 410, the steps of the method for constructing the identification model of the object to be identified in the method embodiments shown in fig. 1 and fig. 2 can be executed, so that the problem that the identification precision of the identification model of the object to be identified obtained by training in the prior art is not high is solved, and specific implementation manners can refer to the method embodiments and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the steps of the method for constructing the recognition model of the object to be recognized in the method embodiments shown in fig. 1 and fig. 2 may be executed, so as to solve the problem that the recognition accuracy of the recognition model of the object to be recognized obtained by training in the prior art is not high.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that like reference numbers and letters denote like items in the figures; once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still, within the technical scope disclosed in the present application, modify the technical solutions described in the foregoing embodiments, easily conceive of changes, or make equivalent substitutions of some technical features; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and shall all be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of constructing a recognition model of an object to be recognized, the method comprising:
obtaining a sample picture;
acquiring a first position coordinate of a graph of the object to be identified in each sample picture with the object to be identified;
for each sample picture, inputting the sample picture into an identification initial model of the object to be identified to obtain a second position coordinate of a prediction graph of the object to be identified;
and training the recognition initial model of the object to be recognized based on the second position coordinate of the prediction graph of the object to be recognized and the first position coordinate of the graph of the object to be recognized to obtain the recognition model of the object to be recognized.
2. The method according to claim 1, wherein the first position coordinates are contour position coordinates of the object to be recognized in the sample picture, and the training of the recognition initial model of the object to be recognized based on the second position coordinates of the prediction graph of the object to be recognized and the first position coordinates of the graph of the object to be recognized comprises:
if the sample picture corresponding to the prediction graph of the object to be recognized is a picture without the object to be recognized, adjusting the training parameters of the recognition initial model of the object to be recognized until the prediction graph of the object to be recognized output by the trained recognition initial model is empty;
if the sample picture corresponding to the prediction graph of the object to be recognized is a picture with the object to be recognized, acquiring a first pixel point from the prediction graph of the object to be recognized;
acquiring a first pixel number marked as the object to be recognized in the sample picture, and acquiring a second pixel number marked as the object to be recognized from the prediction graph of the object to be recognized;
calculating a loss value based on the second position coordinate of the first pixel point, the first position coordinate corresponding to the first pixel point, the first pixel number and the second pixel number;
and if the loss value is greater than a preset loss threshold, adjusting the training parameters of the recognition initial model of the object to be recognized until the loss value of the trained recognition initial model of the object to be recognized is not greater than the loss threshold.
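Claim 2 does not fix a loss formula, only its inputs: the matched pixel coordinates and the two pixel counts. A minimal sketch, assuming a squared-error coordinate term plus a relative pixel-count discrepancy term (both are illustrative choices, not the claimed formula):

```python
import numpy as np

def contour_loss(pred_coords, true_coords, n_true_pixels, n_pred_pixels):
    """Combine a coordinate-regression term with a pixel-count term.

    pred_coords / true_coords: matched (second/first) position coordinates
    of the first pixel points; n_true_pixels / n_pred_pixels: the first and
    second pixel numbers marked as the object to be recognized.
    """
    # Coordinate term: mean squared error over matched contour positions.
    coord_term = np.mean((np.asarray(pred_coords, dtype=float)
                          - np.asarray(true_coords, dtype=float)) ** 2)
    # Count term: relative discrepancy between predicted and labeled
    # object-pixel counts (guarded against empty labels).
    count_term = abs(n_pred_pixels - n_true_pixels) / max(n_true_pixels, 1)
    return float(coord_term + count_term)
```

Training then reduces to the usual loop: compute this loss per sample and adjust the initial model's parameters until the loss falls to the threshold or below.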
3. The method of claim 1, further comprising:
adjusting the obtained sample picture to the input picture size required by the identification model of the object to be identified;
carrying out data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture;
selecting a random number of the enhanced pictures and splicing them to obtain a spliced picture;
adjusting the spliced picture to the size of the input picture, and acquiring the position coordinates of each object to be identified in the spliced picture with the adjusted size;
and expanding the sample picture with the adjusted size according to the spliced picture with the adjusted size.
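The splice-and-resize steps of claim 3 resemble mosaic-style augmentation. A dependency-free sketch, in which the side-by-side splice, the candidate counts {2, 4}, and the nearest-neighbour resize are all assumptions for illustration (a real mosaic typically tiles a 2x2 grid and remaps the label coordinates accordingly):

```python
import random
import numpy as np

def nn_resize(img, out_h, out_w):
    """Nearest-neighbour resize, to avoid an image-library dependency."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]

def mosaic_splice(enhanced_pictures, input_hw=(64, 64)):
    """Select a random number of enhanced pictures, splice them side by
    side, then resize the spliced picture back to the input size."""
    k = random.choice([2, 4])                    # random number of pictures
    chosen = random.sample(enhanced_pictures, k)
    h, w = input_hw
    tiles = [nn_resize(p, h, w // k) for p in chosen]
    spliced = np.concatenate(tiles, axis=1)      # naive horizontal splice
    return nn_resize(spliced, h, w)              # back to the input size
```

The resulting spliced pictures, with their remapped object coordinates, are then appended to the resized sample set as the expansion step describes.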
4. The method of claim 3, wherein the data enhancement comprises: random scaling, color gamut variation, and flipping.
5. The method according to claim 4, wherein the data enhancement comprises random amplification, and the performing of data enhancement processing on the sample picture with the adjusted size to obtain an enhanced picture comprises:
adding bars around the sample picture with the adjusted size to obtain an enhanced picture with the added bars.
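Adding constant-value bars around a resized picture is the familiar letterbox-style padding. A minimal sketch; the pad width, the grayscale fill value 114 (a convention borrowed from YOLO-family pipelines), and the single-channel shape are assumptions:

```python
import numpy as np

def add_bars(picture, pad, fill=114):
    """Surround the resized sample picture with constant-value bars on all
    four sides, producing the enhanced picture with added bars."""
    h, w = picture.shape[:2]
    out = np.full((h + 2 * pad, w + 2 * pad), fill, dtype=picture.dtype)
    out[pad:pad + h, pad:pad + w] = picture      # original content centered
    return out
```

Any labeled object coordinates would be shifted by the same `pad` offset so annotations stay aligned with the padded picture.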
6. The method according to any one of claims 1 to 5, further comprising:
acquiring a picture to be recognized, and adjusting the acquired picture to be recognized to an input picture size required by the recognition model of the object to be recognized;
and inputting the picture to be recognized with the adjusted size into the recognition model of the object to be recognized to obtain the image of the object to be recognized.
7. The method of claim 6, wherein the object to be identified is a traffic sign, the method further comprising:
and inquiring a preset mapping relation library of each traffic sign template graph and the traffic sign type, and identifying the traffic sign type of the object graph to be identified.
8. An apparatus for constructing a recognition model of an object to be recognized, the apparatus comprising:
the sample picture acquisition module is used for acquiring a sample picture;
the first position coordinate acquisition module is used for acquiring a first position coordinate of a graph of the object to be identified in each sample picture with the object to be identified;
the second position coordinate acquisition module is used for inputting the sample picture into the identification initial model of the object to be identified aiming at each sample picture to obtain a second position coordinate of the prediction graph of the object to be identified;
and the identification model determining module of the object to be identified is used for training the identification initial model of the object to be identified based on the second position coordinate of the prediction graph of the object to be identified and the first position coordinate of the graph of the object to be identified to obtain the identification model of the object to be identified.
9. An electronic device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions, when executed by the processor, performing the steps of the method of constructing a recognition model of an object to be recognized according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method of constructing a recognition model of an object to be recognized according to any one of claims 1 to 7.
CN202111171015.0A 2021-10-08 2021-10-08 Method, device, equipment and medium for constructing object recognition model to be recognized Active CN113807315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111171015.0A CN113807315B (en) 2021-10-08 2021-10-08 Method, device, equipment and medium for constructing object recognition model to be recognized

Publications (2)

Publication Number Publication Date
CN113807315A true CN113807315A (en) 2021-12-17
CN113807315B CN113807315B (en) 2024-06-04

Family

ID=78897340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111171015.0A Active CN113807315B (en) 2021-10-08 2021-10-08 Method, device, equipment and medium for constructing object recognition model to be recognized

Country Status (1)

Country Link
CN (1) CN113807315B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115564656A (en) * 2022-11-11 2023-01-03 成都智元汇信息技术股份有限公司 Multi-graph merging and graph recognizing method and device based on scheduling

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002229727A (en) * 2001-02-02 2002-08-16 Canon Inc Coordinate input device
CN102156980A (en) * 2011-01-14 2011-08-17 耿则勋 Method for evaluating influence of data compression on positioning accuracy of remote sensing image
CN106340062A (en) * 2015-07-09 2017-01-18 长沙维纳斯克信息技术有限公司 Three-dimensional texture model file generating method and device
CN110472602A (en) * 2019-08-20 2019-11-19 腾讯科技(深圳)有限公司 A kind of recognition methods of card card, device, terminal and storage medium
CN111476159A (en) * 2020-04-07 2020-07-31 哈尔滨工业大学 Method and device for training and detecting detection model based on double-angle regression
CN111523465A (en) * 2020-04-23 2020-08-11 中船重工鹏力(南京)大气海洋信息系统有限公司 Ship identity recognition system based on camera calibration and deep learning algorithm
CN112508109A (en) * 2020-12-10 2021-03-16 锐捷网络股份有限公司 Training method and device for image recognition model
CN112560834A (en) * 2019-09-26 2021-03-26 武汉金山办公软件有限公司 Coordinate prediction model generation method and device and graph recognition method and device
CN113021355A (en) * 2021-03-31 2021-06-25 重庆正格技术创新服务有限公司 Agricultural robot operation method for predicting sheltered crop picking point
CN113096017A (en) * 2021-04-14 2021-07-09 南京林业大学 Image super-resolution reconstruction method based on depth coordinate attention network model
CN113436251A (en) * 2021-06-24 2021-09-24 东北大学 Pose estimation system and method based on improved YOLO6D algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JAIDEV: "Weighted Loss Functions for Instance Segmentation", Retrieved from the Internet <URL:https://jaidevd.com/posts/weighted-loss-functions-for-instance-segmentation/> *
SENBINYU: "Classification and Summary of Loss Functions in Image Segmentation", Retrieved from the Internet <URL:https://blog.csdn.net/senbinyu/article/details/108232122> *


Similar Documents

Publication Publication Date Title
CN111563442B (en) Slam method and system for fusing point cloud and camera image data based on laser radar
CN111178355B (en) Seal identification method, device and storage medium
CN109871829B (en) Detection model training method and device based on deep learning
CN110969592B (en) Image fusion method, automatic driving control method, device and equipment
CN111738252B (en) Text line detection method, device and computer system in image
CN110288612B (en) Nameplate positioning and correcting method and device
CN113989167B (en) Contour extraction method, device, equipment and medium based on seed point self-growth
CN113158977B (en) Image character editing method for improving FANnet generation network
CN113255578B (en) Traffic identification recognition method and device, electronic equipment and storage medium
US20220358634A1 (en) Methods and systems of utilizing image processing systems to measure objects
CN111368682A (en) Method and system for detecting and identifying station caption based on faster RCNN
CN111126393A (en) Vehicle appearance refitting judgment method and device, computer equipment and storage medium
JP2009163682A (en) Image discrimination device and program
CN114898321A (en) Method, device, equipment, medium and system for detecting road travelable area
CN113807315B (en) Method, device, equipment and medium for constructing object recognition model to be recognized
CN114005120A (en) License plate character cutting method, license plate recognition method, device, equipment and storage medium
CN117593420A (en) Plane drawing labeling method, device, medium and equipment based on image processing
CN110874170A (en) Image area correction method, image segmentation method and device
CN114118127B (en) Visual scene sign detection and recognition method and device
CN102682308B (en) Imaging processing method and device
CN112381034A (en) Lane line detection method, device, equipment and storage medium
CN115393379A (en) Data annotation method and related product
CN105654457A (en) Device and method for processing image
CN117523087B (en) Three-dimensional model optimization method based on content recognition
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant