CN110288082B - Convolutional neural network model training method and device and computer readable storage medium - Google Patents

Convolutional neural network model training method and device and computer readable storage medium

Info

Publication number
CN110288082B
CN110288082B
Authority
CN
China
Prior art keywords
sample
neural network
convolutional neural
network model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910485052.5A
Other languages
Chinese (zh)
Other versions
CN110288082A (en)
Inventor
朱延东
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910485052.5A
Publication of CN110288082A
Application granted
Publication of CN110288082B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses a convolutional neural network model training method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises the following steps: dividing the convolutional neural network into a plurality of convolution stages, wherein each convolution stage is composed of at least one convolutional layer; determining parameters of the convolutional neural network; inputting a positive training sample set into the convolutional neural network for training to obtain feature images of the plurality of convolution stages for each sample image; for each sample image, fusing the corresponding feature images of the plurality of convolution stages; and obtaining a positive sample convolutional neural network model from the fused feature image of each sample image, wherein the positive sample convolutional neural network model is used to identify a target region. According to the embodiments of the invention, the feature images of the convolutional neural network at a plurality of convolution stages are fused during the training of the positive sample convolutional neural network model, which improves the correct identification rate of the positive sample convolutional neural network model for the target region.

Description

Convolutional neural network model training method and device and computer readable storage medium
Technical Field
The present disclosure relates to the field of convolutional neural network model training technologies, and in particular to a convolutional neural network model training method and apparatus, and a computer-readable storage medium.
Background
Many captured video images contain automobiles, and images containing automobiles generally contain license plates, so for privacy reasons the license plates in such video images need to be processed or covered with other images. When processing an image containing a license plate, correctly identifying the license plate region in the image is critical.
In the prior art, a plain single-branch ("straight-barrel") network is usually used to train the model, and the trained model is then used to identify the license plate area in an image. As a result, the edges of the identified license plate area are very blurry, and the outline of the license plate cannot be accurately located. In addition, some areas similar to a license plate, such as blue-background road signs, may be misjudged as license plates.
Disclosure of Invention
The technical problem to be solved by the present disclosure is to provide a convolutional neural network model training method to at least partially solve the technical problem of low target area identification accuracy in the prior art. In addition, a convolutional neural network model training apparatus, a convolutional neural network model training hardware apparatus, a computer-readable storage medium and a convolutional neural network model training terminal are also provided.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a convolutional neural network model training method comprises the following steps:
segmenting a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolutional stages; wherein the plurality of convolutional layers are sequentially connected in series;
determining parameters of the convolutional neural network;
inputting a positive training sample set into the convolutional neural network for training to obtain a plurality of characteristic images of convolution stages corresponding to each sample image; wherein the positive training sample set is composed of a plurality of sample images marked with target areas;
for each sample image, fusing the corresponding characteristic images of a plurality of convolution stages;
obtaining a positive sample convolution neural network model according to the feature image after each sample image is fused; wherein the positive sample convolutional neural network model is used to identify the target region.
Further, the obtaining a positive sample convolutional neural network model according to the feature image after the fusion of each sample image includes:
for each sample image, identifying the fused characteristic image through an output layer of the convolutional neural network to obtain a prediction target area;
determining a prediction error according to the prediction target areas of all the sample images and the real target areas contained in the corresponding sample images;
if the prediction error is larger than a preset error, re-determining the parameters of the convolutional neural network according to the preset error, continuing to repeat the training process until the prediction error is smaller than or equal to the preset error, and ending the training process to obtain the positive sample convolutional neural network model.
Further, the method further comprises:
inputting a verification sample set into the positive sample convolutional neural network model for identification to obtain a prediction target area of a verification sample image; the verification sample set consists of a plurality of verification sample images marked with target areas;
and if the prediction error determined according to the prediction target areas of all the verification sample images and the real target areas contained in the corresponding verification sample images is less than or equal to the preset error, determining that the positive sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained positive sample convolutional neural network model passes the verification.
Further, the method further comprises:
and if the number of the positive training samples exceeds the preset number, grouping the positive training samples, wherein each group of positive training samples is used as a positive training sample set.
Further, the determining a prediction error according to the prediction target regions of all the sample images and the real target region included in the corresponding sample image includes:
calculating the loss between the prediction target area and the real target area contained in the corresponding sample image by adopting a loss function aiming at each sample image;
the prediction error is determined from the loss of all sample images.
Further, the determining a prediction error according to the loss of all sample images includes:
respectively setting weight for each sample image according to the loss of each sample image;
and performing weighted fusion on the loss according to the weight of each sample image, and taking the loss after weighted fusion as the prediction error.
Further, the larger the loss, the smaller the weight assigned to the corresponding sample image.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a convolutional neural network model training method comprises the following steps:
acquiring a negative training sample set; the negative training sample set consists of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas;
and training by adopting any one of the convolutional neural network model training methods according to the negative training sample set to obtain a negative sample convolutional neural network model.
Further, the background area and/or the foreground area are/is a binary image.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target area identification method, comprising:
acquiring an image to be identified;
and inputting the image to be recognized into a positive sample convolutional neural network model obtained by training by adopting any one of the convolutional neural network model training methods for recognition, and obtaining a target area.
Further, the method further comprises:
inputting the target area into a negative sample convolutional neural network model obtained by training by adopting any one of the convolutional neural network model training methods for classification;
determining, according to the classification result, whether the target area is a foreground area or a background area;
if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
Further, the step of inputting the target area into a negative sample convolutional neural network model obtained by training with the convolutional neural network model training method according to any one of the above methods for classification includes:
carrying out binarization processing on the target area to obtain a binarized image;
and inputting the binarized image into a negative sample convolutional neural network model obtained by training with any one of the above convolutional neural network model training methods for classification.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a convolutional neural network model training apparatus, comprising:
the stage division module is used for dividing a plurality of convolution layers of the convolutional neural network to obtain a plurality of convolution stages; wherein the plurality of convolutional layers are sequentially connected in series;
a parameter determination module for determining parameters of the convolutional neural network;
the characteristic image acquisition module is used for inputting a positive training sample set into the convolutional neural network for training to obtain a plurality of characteristic images of convolution stages corresponding to each sample image; wherein the positive training sample set is composed of a plurality of sample images marked with target areas;
the characteristic fusion module is used for fusing the corresponding characteristic images in the plurality of convolution stages for each sample image;
the positive sample model training module is used for obtaining a positive sample convolution neural network model according to the feature image after each sample image is fused; wherein the positive sample convolutional neural network model is used to identify the target region.
Further, the positive sample model training module is specifically configured to: for each sample image, identifying the fused characteristic image through an output layer of the convolutional neural network to obtain a prediction target area; determining a prediction error according to the prediction target areas of all the sample images and the real target areas contained in the corresponding sample images; if the prediction error is larger than a preset error, re-determining the parameters of the convolutional neural network according to the preset error, continuing to repeat the training process until the prediction error is smaller than or equal to the preset error, and ending the training process to obtain the positive sample convolutional neural network model.
Further, the apparatus further comprises:
the model verification module is used for inputting a verification sample set into the positive sample convolutional neural network model for identification to obtain a prediction target area of a verification sample image; the verification sample set consists of a plurality of verification sample images marked with target areas; and if the prediction error determined according to the prediction target areas of all the verification sample images and the real target areas contained in the corresponding verification sample images is less than or equal to the preset error, determining that the positive sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained positive sample convolutional neural network model passes the verification.
Further, the apparatus further comprises:
and the grouping module is used for grouping the positive training samples if the number of the positive training samples exceeds the preset number, and each group of positive training samples is used as a positive training sample set.
Further, the positive sample model training module is specifically configured to: calculating the loss between the prediction target area and the real target area contained in the corresponding sample image by adopting a loss function aiming at each sample image; the prediction error is determined from the loss of all sample images.
Further, the positive sample model training module is specifically configured to: respectively setting weight for each sample image according to the loss of each sample image; and performing weighted fusion on the loss according to the weight of each sample image, and taking the loss after weighted fusion as the prediction error.
Further, the larger the loss, the smaller the weight assigned to the corresponding sample image.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a convolutional neural network model training apparatus, comprising:
the negative sample determining module is used for determining a negative training sample set; the negative training sample set consists of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas;
and the negative sample model training module is used for training by adopting any one of the convolutional neural network model training methods according to the negative training sample set to obtain a negative sample convolutional neural network model.
Further, the background area and/or the foreground area are/is a binary image.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target area identifying apparatus comprising:
the image acquisition module is used for acquiring an image to be identified;
and the image identification module is used for inputting the image to be identified into the positive sample convolutional neural network model obtained by training by adopting any one of the convolutional neural network model training methods for identification to obtain a target area.
Further, the image recognition module is further configured to: inputting the target area into a negative sample convolutional neural network model obtained by training by adopting any one of the convolutional neural network model training methods for classification; determining the target area as a foreground area or a background area according to the classification result; if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
Further, the image recognition module is specifically configured to: carry out binarization processing on the target area to obtain a binarized image; and input the binarized image into a negative sample convolutional neural network model obtained by training with any one of the above convolutional neural network model training methods for classification.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
an electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions, so that the processor when executing implements any one of the above convolutional neural network model training methods.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a computer readable storage medium storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform any one of the convolutional neural network model training methods described above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
an electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor, when executing, implements the convolutional neural network model training method of any one of the above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the convolutional neural network model training method of any one of the above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
an electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions, so that the processor implements the target area identification method described in any one of the above items when executed.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a computer readable storage medium storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform a target area identification method as any one of the above.
In order to achieve the above object, according to still another aspect of the present disclosure, the following technical solutions are also provided:
a convolutional neural network model training terminal comprises any convolutional neural network model training device.
In order to achieve the above object, according to still another aspect of the present disclosure, the following technical solutions are also provided:
a data reading terminal comprises any one of the data reading devices.
According to the embodiments of the present disclosure, the feature images of the convolutional neural network at a plurality of convolution stages are fused during the training of the positive sample convolutional neural network model, which improves the correct identification rate of the positive sample convolutional neural network model for the target area.
The foregoing is a summary of the present disclosure. In order that the technical means of the present disclosure may be more clearly understood, preferred embodiments are described in detail below; the disclosure may also be embodied in other specific forms without departing from its spirit or essential attributes.
Drawings
FIG. 1a is a schematic flow chart diagram of a convolutional neural network model training method according to one embodiment of the present disclosure;
FIG. 1b is a schematic structural diagram of a convolutional neural network in a convolutional neural network model training method according to an embodiment of the present disclosure;
FIG. 1c is a schematic diagram of a convolution process of convolution layers in a convolutional neural network model training method according to an embodiment of the present disclosure;
FIG. 1d is a diagram illustrating convolution results of convolutional layers in a convolutional neural network model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow diagram of a convolutional neural network model training method according to one embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram of a target area identification method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a convolutional neural network model training device according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a convolutional neural network model training device according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a target area recognition apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in this specification. It is to be understood that the described embodiments are merely some, rather than all, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within this description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other in the absence of conflict. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
Example one
In order to solve the technical problem of low target area identification accuracy in the prior art, the embodiment of the disclosure provides a convolutional neural network model training method. As shown in fig. 1a, the convolutional neural network model training method mainly includes the following steps S11 to S15. Wherein:
step S11: segmenting a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolutional stages; wherein the plurality of convolutional layers are sequentially connected in series.
Convolutional Neural Networks (CNNs) are a class of feed-forward neural networks that involve convolution calculations and have a deep structure; a CNN mainly includes an input layer, a plurality of convolutional layers, a pooling layer, a fully-connected layer, and an output layer. As shown in fig. 1b, an example convolutional neural network structure includes three convolutional layers: convolutional layer 1, convolutional layer 2, and convolutional layer 3.
Each convolutional layer includes a convolution kernel, which may be a matrix, used to convolve the input image. The calculation multiplies each local matrix of the input image element-wise with the convolution kernel matrix at the corresponding positions and then sums the products.
For example, as shown in fig. 1c, the input is a two-dimensional 3x4 matrix and the convolution kernel is a 2x2 matrix. Assuming the convolution shifts one pixel at a time, the top-left 2x2 part of the input is first convolved with the kernel, i.e. the elements at each position are multiplied and then added, giving the S00 element of the output matrix S, whose value is aw + bx + ey + fz. The input window is then shifted one pixel to the right, so that the matrix of four elements (b, c, f, g) is convolved with the kernel, giving element S01 of the output matrix S; in the same way, elements S02, S10, S11 and S12 of the output matrix S can be obtained. The resulting convolution output is a 2x3 matrix S, as shown in fig. 1d.
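By way of illustration only, the stride-1 convolution described above can be sketched in Python with NumPy; the concrete input values below are arbitrary, and only the 3x4-input / 2x2-kernel shapes follow the example:

```python
import numpy as np

def conv2d_valid(x, k):
    """Slide the kernel k over x one pixel at a time (stride 1, no padding);
    at each position, multiply elements at corresponding positions and sum."""
    out_h = x.shape[0] - k.shape[0] + 1
    out_w = x.shape[1] - k.shape[1] + 1
    s = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            s[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return s

x = np.arange(12, dtype=float).reshape(3, 4)   # 3x4 input matrix (a..l)
k = np.array([[1.0, 2.0],                      # 2x2 kernel (w, x,
              [3.0, 4.0]])                     #             y, z)
s = conv2d_valid(x, k)
print(s.shape)  # (2, 3): the 2x3 output S; S[0, 0] = a*w + b*x + e*y + f*z
```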
The convolution stages can be divided as desired. Specifically, division points may be set at chosen convolutional layers, and the convolution stages are determined by the division points: the closer a division point is to the output layer, the more convolutional layers the corresponding convolution stage includes. As shown in fig. 1b, a division point may be set at each of convolutional layer 2 and convolutional layer 3, so that convolutional layers 1 and 2 form a first convolution stage, and convolutional layers 1, 2 and 3 form a second convolution stage.
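As a hedged sketch only, the staged structure of fig. 1b might look as follows in Python (PyTorch); the channel counts, kernel sizes and the name StagedCNN are illustrative assumptions, not the patented network:

```python
import torch
import torch.nn as nn

class StagedCNN(nn.Module):
    """Three serially connected convolutional layers with division points set
    at layers 2 and 3: stage 1 = layers 1-2, stage 2 = layers 1-3 (fig. 1b)."""
    def __init__(self, in_channels=3, channels=16):
        super().__init__()
        # padding=1 keeps the spatial size fixed so stage outputs can later be
        # fused pixel-wise (an assumption; the patent does not specify sizes)
        self.conv1 = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        stage1 = torch.relu(self.conv2(x))        # feature image of stage 1
        stage2 = torch.relu(self.conv3(stage1))   # feature image of stage 2
        return stage1, stage2

stage1, stage2 = StagedCNN()(torch.randn(1, 3, 64, 64))  # two stage feature images
```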
Step S12: determining parameters of the convolutional neural network.
The parameters include those of the convolution kernels of the convolutional layers, for example the kernel size, which may be set to 3 × 3; different convolutional layers may have different convolution kernels. The parameters may also include those of the pooling layer, such as the size of the pooling matrix (for example 3 × 3), and those of the output layer, such as a linear coefficient matrix and a bias vector.
Step S13: inputting a positive training sample set into the convolutional neural network for training to obtain a plurality of characteristic images of convolution stages corresponding to each sample image; wherein the positive training sample set is composed of a plurality of sample images marked with target areas.
The target area may be a license plate area.
Specifically, the positive training sample set is converted into multi-dimensional vectors by the input layer of the convolutional neural network, and convolution calculations are then performed sequentially by the convolutional layers to obtain the feature image of each convolution stage. Referring to the example in step S11, the network is divided into two convolution stages, where the first convolution stage includes convolutional layer 1 and convolutional layer 2: after the multi-dimensional vector is convolved by convolutional layer 1, the result is input into convolutional layer 2 for a further convolution calculation, and the feature image computed by convolutional layer 2 is taken as the feature image of the first convolution stage. Similarly, the feature image computed by convolutional layer 2 is input into convolutional layer 3, and the feature image computed by convolutional layer 3 is taken as the feature image of the second convolution stage.
Step S14: and for each sample image, fusing the characteristic images of the corresponding multiple convolution stages.
Specifically, for each sample image, the pixel values at the same position in the feature images of the convolution stages may be summed, and the summed values used as the pixel values of a fused feature image. Alternatively, the pixel values at the same position in the feature images of the convolution stages may be weighted, and the weighted values used as the pixel values of the fused feature image; when setting the weights, the feature image from a convolution stage closer to the input layer is given a larger weight.
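A minimal sketch of this pixel-wise fusion, assuming the stage feature images have equal spatial size (see the padding note in the sketch above); the 0.6/0.4 weights are an illustrative assumption, with the earlier stage weighted more heavily:

```python
import torch

def fuse_stage_features(stage_feats, weights=None):
    """Fuse same-sized stage feature images pixel-wise: a plain sum when no
    weights are given, otherwise a weighted sum (stages closer to the input
    layer receive the larger weights)."""
    if weights is None:
        weights = [1.0] * len(stage_feats)
    fused = torch.zeros_like(stage_feats[0])
    for w, f in zip(weights, stage_feats):
        fused = fused + w * f
    return fused

f1 = torch.randn(1, 16, 64, 64)  # stage 1 feature image (illustrative)
f2 = torch.randn(1, 16, 64, 64)  # stage 2 feature image
fused = fuse_stage_features([f1, f2], weights=[0.6, 0.4])
```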
Step S15: obtaining a positive sample convolution neural network model according to the feature image after each sample image is fused; wherein the positive sample convolutional neural network model is used to identify the target region.
Because the feature images obtained in different convolution stages contain different features, and the feature images from convolution stages closer to the input layer contain more feature information, the edge of the target area is easier to delineate, so the edges of target areas identified by the positive sample convolutional neural network model are clearer.
In addition, the feature images of the convolutional neural network at a plurality of convolution stages are fused in the training process of the positive sample convolutional neural network model, so that the fused feature images contain more feature information, and the correct recognition rate of the positive sample convolutional neural network model to the target area can be improved.
In an optional embodiment, step S15 specifically includes:
step S151: and aiming at each sample image, identifying the fused characteristic image through an output layer of the convolutional neural network to obtain a prediction target area.
The output layer comprises a Softmax activation function and is used for identifying the fused characteristic images and outputting the prediction target area.
Step S152: and determining a prediction error according to the prediction target areas of all the sample images and the real target area contained in the corresponding sample image.
Step S153: if the prediction error is larger than a preset error, re-determine the parameters of the convolutional neural network according to the preset error and repeat the training process of steps S13, S14, S151, S152 and S153 until the prediction error is smaller than or equal to the preset error; the training process then ends, yielding the positive sample convolutional neural network model.
The preset error can be user-defined.
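A hedged end-to-end sketch of steps S13 to S153 in Python (PyTorch); the optimizer, loss function, fusion weights and the one-channel output head are assumptions for illustration, and gradient updates stand in for "re-determining the parameters":

```python
import torch
import torch.nn as nn

def train_until_preset_error(model, head, loader, preset_error, max_rounds=100):
    """model: network returning (stage1, stage2) feature images (step S13);
    head: output layer mapping the fused feature image to a predicted
    target-region mask (step S151), e.g. head = nn.Conv2d(16, 1, kernel_size=1);
    loader yields (images, true_masks) pairs of positive training samples."""
    opt = torch.optim.SGD(list(model.parameters()) + list(head.parameters()), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(max_rounds):
        total_loss, n = 0.0, 0
        for images, true_masks in loader:
            stage1, stage2 = model(images)
            fused = 0.6 * stage1 + 0.4 * stage2       # step S14: fuse stage features
            pred = head(fused)                        # step S151: predicted target region
            loss = loss_fn(pred, true_masks)          # step S152: error vs. real region
            opt.zero_grad(); loss.backward(); opt.step()  # step S153: update parameters
            total_loss += loss.item() * images.shape[0]
            n += images.shape[0]
        if total_loss / n <= preset_error:            # prediction error <= preset error:
            break                                     # training ends
    return model, head
```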
In an optional embodiment, in order to ensure the identification accuracy of the positive sample convolutional neural network model, the method further includes a verification process of the positive sample convolutional neural network model, which is as follows:
step S16: inputting a verification sample set into the positive sample convolutional neural network model for identification to obtain a prediction target area of a verification sample image; wherein the verification sample set is composed of a plurality of verification sample images marked with target areas.
Step S17: and if the prediction error determined according to the prediction target areas of all the verification sample images and the real target areas contained in the corresponding verification sample images is less than or equal to the preset error, determining that the positive sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained positive sample convolutional neural network model passes the verification.
In an optional embodiment, to increase the training speed, the method further comprises:
and if the number of the positive training samples exceeds the preset number, grouping the positive training samples, wherein each group of positive training samples is used as a positive training sample set.
The preset number can be set in a user-defined mode.
For example, when the number of training samples is on the order of ten thousand, training all of the data at once not only increases the amount of computation but also slows training. Therefore, the positive training samples can be grouped, and each group of positive training samples trained separately as one positive training sample set; for example, every 100 positive training samples may form one group.
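The grouping can be sketched as follows; the group size of 100 follows the example above:

```python
def group_positive_samples(samples, group_size=100):
    """Split the positive training samples into groups; each group is then
    used as one positive training sample set."""
    return [samples[i:i + group_size] for i in range(0, len(samples), group_size)]

sample_sets = group_positive_samples(list(range(10000)))
print(len(sample_sets))  # 100 sets of 100 samples each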
In an alternative embodiment, step S152 includes:
calculating the loss between the prediction target area and the real target area contained in the corresponding sample image by adopting a loss function aiming at each sample image;
the prediction error is determined from the loss of all sample images.
Wherein the loss function may measure the output loss of the training samples.
Further, the determining a prediction error according to the loss of all sample images includes:
respectively setting weight for each sample image according to the loss of each sample image;
and performing weighted fusion on the loss according to the weight of each sample image, and taking the loss after weighted fusion as the prediction error.
Wherein the higher the loss, the lower the weight corresponding to the sample image.
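A minimal sketch of this weighted fusion; the inverse-loss scheme 1/(1 + loss) is an assumption, since the description only requires that larger losses map to smaller weights:

```python
import torch

def weighted_prediction_error(per_sample_losses):
    """Assign each sample a weight that shrinks as its loss grows, normalize
    the weights, and take the weighted sum of the losses as the prediction error."""
    losses = torch.as_tensor(per_sample_losses, dtype=torch.float32)
    weights = 1.0 / (1.0 + losses)      # larger loss -> smaller weight
    weights = weights / weights.sum()   # normalize the weights
    return (weights * losses).sum()

print(weighted_prediction_error([0.2, 0.5, 3.0]))  # the outlier sample is down-weighted
```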
Example two
In order to solve the technical problem of low target area identification accuracy in the prior art, the embodiment of the disclosure further provides a convolutional neural network model training method. The convolutional neural network model training method mainly comprises the following steps: acquiring a negative training sample set; the negative training sample set consists of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas; and training by adopting the convolutional neural network model training method in the first embodiment according to the negative training sample set to obtain a negative sample convolutional neural network model. As shown in fig. 2, the method specifically includes:
step S21: segmenting a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolutional stages; wherein the plurality of convolutional layers are sequentially connected in series.
Step S22: determining parameters of the convolutional neural network.
Step S23: and inputting the negative training sample set into the convolutional neural network for training to obtain the characteristic images of a plurality of convolution stages corresponding to each training sample.
The negative training sample set is composed of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas.
Wherein the non-target area is an image area approximating the target area.
For example, the target area may be a license plate area, and the non-target area may be an area resembling a license plate. A guideboard in the image background, for instance, is relatively small and shows white characters on a blue background, and so is easily misrecognized as a license plate; using such areas as training samples eliminates these false identifications and further improves the model's correct recognition rate.
Herein, an image containing a non-target region is defined as a background region, and an image containing a target region is defined as a foreground region.
Step S24: and fusing the characteristic images of the corresponding multiple convolution stages for each training sample.
Step S25: obtaining a negative sample convolution neural network model according to the feature image after each training sample is fused; wherein the negative sample convolutional neural network model is used to identify the background region and foreground region.
In an optional embodiment, step S25 specifically includes:
step S251: and aiming at each training sample, identifying the fused characteristic image through an output layer of the convolutional neural network to obtain a prediction background area or a prediction foreground area.
Step S252: determining a prediction error according to the predicted foreground or background regions of all the training samples and the real foreground or background regions contained in the corresponding training samples.
Step S253: if the prediction error is larger than the preset error, re-determining the parameters of the convolutional neural network according to the preset error, continuing to repeat the training process until the prediction error is smaller than or equal to the preset error, and ending the training process to obtain the negative sample convolutional neural network model.
In an optional embodiment, the method further comprises:
step S26: inputting the verification sample set into the negative sample convolutional neural network model for identification to obtain a prediction background area or a prediction foreground area of the verification sample image; wherein, the verification sample set is composed of a plurality of foreground areas marked with target areas and a plurality of background areas marked with non-target areas.
Step S27: and if the prediction error determined according to the prediction background area or the prediction foreground area of all the verification sample images and the real background area or the real foreground area contained in the corresponding verification samples is less than or equal to the preset error, determining that the negative sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained negative sample convolutional neural network model passes the verification.
In an optional embodiment, the method further comprises:
and if the number of the negative training samples exceeds the preset number, grouping the negative training samples, wherein each group of negative training samples is used as a negative training sample set.
In an alternative embodiment, step S252 specifically includes:
calculating the loss between the prediction foreground area or the prediction background area and the real foreground area or the real background area contained in the corresponding training sample by adopting a loss function aiming at each training sample;
the prediction error is determined from the loss of all training samples.
Further, the determining a prediction error according to the loss of all training samples includes:
respectively setting weight for each training sample according to the loss of each training sample;
and performing weighted fusion on the loss according to the weight of each training sample, and taking the loss after weighted fusion as the prediction error.
Further, the larger the loss, the smaller the weight assigned to the corresponding training sample.
Further, the background area and/or the foreground area are/is a binary image.
For the convenience of subsequent identification, binarization processing can be performed on the background region and/or the foreground region. For example, when the target region is a license plate region, since a license plate shows white characters on a blue background, the region corresponding to the license plate is approximately white after binarization, its features are more distinct, and processing is more convenient and faster. Based on this principle, the present embodiment binarizes the background region and the foreground region: for example, the threshold may be set to 0.5, pixel values smaller than 0.5 are reset to 0, and pixel values greater than or equal to 0.5 are reset to 1. The binarized background regions and/or foreground regions are then used as the negative training sample set to train the negative sample convolutional neural network model.
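A sketch of this binarization, assuming pixel values normalized to [0, 1] as in the 0.5 threshold of the example above:

```python
import numpy as np

def binarize(image, threshold=0.5):
    """Reset pixels below the threshold to 0 and the rest to 1."""
    return (np.asarray(image, dtype=float) >= threshold).astype(np.uint8)

binary_region = binarize(np.random.rand(32, 96))  # illustrative license-plate-sized crop
```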
In the embodiment, the background region containing the non-target region is used as the training sample, the negative sample convolutional neural network model is obtained through training, the regions similar to the target region and appearing in the background region can be filtered, and only the real target region is reserved.
EXAMPLE III
An embodiment of the present disclosure further provides a target area identification method, as shown in fig. 3, specifically including:
and S31, acquiring the image to be recognized.
The image to be recognized can be acquired in real time through a camera, or a pre-stored image to be recognized can be acquired locally.
And S32, inputting the image to be recognized into a positive sample convolutional neural network model for recognition to obtain a target area.
The positive sample convolutional neural network model is obtained by training by using the convolutional neural network model training method described in the first embodiment, and the specific training process refers to the first embodiment.
In an optional embodiment, the method further comprises:
step S33: and inputting the target area into a negative sample convolutional neural network model for classification.
The negative sample convolutional neural network model is obtained by training by adopting the convolutional neural network model training method described in the second embodiment, and the specific training process refers to the second embodiment.
Step S34: determining, according to the classification result, whether the target area is a foreground area or a background area.
For an explanation of the foreground region or the background region, see the above-described embodiments.
Step S35: if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
Further, step S33 includes:
step S331: and carrying out binarization processing on the target area to obtain a binarized image.
For example, the threshold value may be set to 0.5, the pixel values corresponding to pixel points having pixel values less than 0.5 may be reset to 0, and the pixel values corresponding to pixel points having pixel values greater than or equal to 0.5 may be reset to 1.
Step S332: and inputting the binary image into a negative sample convolutional neural network model for classification.
The negative sample convolutional neural network model is obtained by training by adopting the convolutional neural network model training method described in the second embodiment.
In this embodiment, the negative sample convolutional neural network model corresponds to a model obtained by using a background region and a foreground region after binarization as training samples. The model classifies the input binary image, and if the classification result is the foreground region, the target region is determined to be a real target region; and if the classification result is the background area, determining that the target area is a misjudged target area.
In this embodiment, the negative sample convolutional neural network model is used to classify the target region obtained by identifying the positive sample convolutional neural network model, and further determine whether the target region is a real target region, so that regions similar to the target region in the background region can be excluded and filtered, and only the real target region is reserved.
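A hedged sketch of the overall cascade in this embodiment; the call signatures of the two models, and the assumption that region crops are tensors normalized to [0, 1], are for illustration only:

```python
def identify_real_target_regions(image, positive_model, negative_model, threshold=0.5):
    """positive_model(image) is assumed to return candidate target-region crops
    (step S32); each crop is binarized (step S331) and classified by
    negative_model as foreground (True) or background (False) (step S332);
    only real target regions are kept (step S35)."""
    real_regions = []
    for region in positive_model(image):
        binary = (region >= threshold).float()  # step S331: binarize the region
        if negative_model(binary):              # foreground -> real target region
            real_regions.append(region)
        # background -> misjudged target region, discarded
    return real_regions
```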
It will be appreciated by those skilled in the art that obvious modifications (e.g., combinations of the enumerated modes) or equivalents may be made to the above-described embodiments.
In the above, although the steps in the embodiment of the convolutional neural network model training method are described in the above sequence, it should be clear to those skilled in the art that the steps in the embodiment of the present disclosure are not necessarily performed in the above sequence, and may also be performed in other sequences such as reverse, parallel, and cross, and further, on the basis of the above steps, those skilled in the art may also add other steps, and these obvious modifications or equivalents should also be included in the protection scope of the present disclosure, and are not described herein again.
For convenience of description, only the relevant parts of the embodiments of the present disclosure are shown, and details of the specific techniques are not disclosed, please refer to the embodiments of the method of the present disclosure.
Example four
In order to solve the technical problem of low target area identification accuracy in the prior art, the embodiment of the disclosure provides a convolutional neural network model training device. The apparatus may perform the steps in the convolutional neural network model training method described in the first embodiment. As shown in fig. 4, the apparatus mainly includes: a stage division module 41, a parameter determination module 42, a feature image acquisition module 43, a feature fusion module 44 and a positive sample model training module 45. Wherein:
the stage division module 41 is configured to divide a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolutional stages; wherein the plurality of convolutional layers are sequentially connected in series;
the parameter determining module 42 is used for determining the parameters of the convolutional neural network;
the feature image acquisition module 43 is configured to input a positive training sample set into the convolutional neural network for training, so as to obtain feature images of the plurality of convolution stages for each sample image; wherein the positive training sample set is composed of a plurality of sample images marked with target areas;
the feature fusion module 44 is configured to fuse, for each sample image, feature images of a plurality of corresponding convolution stages;
the positive sample model training module 45 is configured to obtain a positive sample convolutional neural network model according to the feature image after each sample image is fused; wherein the positive sample convolutional neural network model is used to identify the target region.
Further, the positive sample model training module 45 is specifically configured to: for each sample image, identifying the fused characteristic image through an output layer of the convolutional neural network to obtain a prediction target area; determining a prediction error according to the prediction target areas of all the sample images and the real target areas contained in the corresponding sample images; if the prediction error is larger than a preset error, re-determining the parameters of the convolutional neural network according to the preset error, continuing to repeat the training process until the prediction error is smaller than or equal to the preset error, and ending the training process to obtain the positive sample convolutional neural network model.
Further, the apparatus further comprises: a model verification module 46;
the model verification module 46 is configured to input a verification sample set into the positive sample convolutional neural network model for identification, so as to obtain a prediction target region of a verification sample image; the verification sample set consists of a plurality of verification sample images marked with target areas; and if the prediction error determined according to the prediction target areas of all the verification sample images and the real target areas contained in the corresponding verification sample images is less than or equal to the preset error, determining that the positive sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained positive sample convolutional neural network model passes the verification.
Further, the apparatus further comprises: a grouping module 47;
the grouping module 47 is configured to group the positive training samples if the number of the positive training samples exceeds a preset number, where each group of the positive training samples serves as a positive training sample set.
Further, the positive sample model training module 45 is specifically configured to: calculating the loss between the prediction target area and the real target area contained in the corresponding sample image by adopting a loss function aiming at each sample image; the prediction error is determined from the loss of all sample images.
Further, the positive sample model training module 45 is specifically configured to: respectively setting weight for each sample image according to the loss of each sample image; and performing weighted fusion on the loss according to the weight of each sample image, and taking the loss after weighted fusion as the prediction error.
Further, the more the loss, the less the weight corresponding to the sample image.
Further, the target area is a license plate area.
For detailed descriptions of the working principle, the realized technical effect, and the like of the embodiment of the convolutional neural network model training device, reference may be made to the related descriptions in the embodiment of the convolutional neural network model training method, and further description is omitted here.
EXAMPLE five
In order to solve the technical problem of low target area identification accuracy in the prior art, the embodiment of the disclosure provides a convolutional neural network model training device. The apparatus may perform the steps in the convolutional neural network model training method described in the second embodiment above. As shown in fig. 5, the apparatus mainly includes: a negative sample determination module 51 and a negative sample model training module 52. Wherein:
the negative sample determining module 51 is configured to determine a negative training sample set; the negative training sample set consists of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas;
the negative sample model training module 52 is configured to obtain a negative sample convolutional neural network model according to the negative training sample set.
Specifically, the convolutional neural network model training method described in the first embodiment of the present invention may be used for training.
Further, the background area and/or the foreground area are/is a binary image.
Further, the target area is a license plate area.
For detailed descriptions of the working principle, the realized technical effect, and the like of the embodiment of the convolutional neural network model training device, reference may be made to the related descriptions in the embodiment of the convolutional neural network model training method, and further description is omitted here.
EXAMPLE six
In order to solve the technical problem of low accuracy of target area identification in the prior art, the embodiment of the present disclosure provides a target area identification device. The apparatus may perform the steps of the target area identification method described in the third embodiment. As shown in fig. 6, the apparatus mainly includes: an image acquisition module 61 and an image recognition module 62. Wherein:
the image acquisition module 61 is used for acquiring an image to be identified;
the image recognition module 62 is configured to input the image to be recognized into the positive sample convolutional neural network model for recognition, so as to obtain a target region.
The positive sample convolutional neural network model is obtained by training by adopting the convolutional neural network model training method described in the first embodiment.
Further, the image recognition module 62 is further configured to: inputting the target area into a negative sample convolutional neural network model for classification; determining the target area as a foreground area or a background area according to the classification result; if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
The negative sample convolutional neural network model is obtained by training by adopting the convolutional neural network model training method described in the second embodiment.
Further, the image recognition module 62 is specifically configured to: carrying out binarization processing on the target area to obtain a binarization image; and inputting the binary image into a negative sample convolutional neural network model for identification.
The negative sample convolutional neural network model is obtained by training by adopting the convolutional neural network model training method described in the second embodiment.
Further, the target area is a license plate area.
For detailed descriptions of the working principle, the technical effect of implementation, and the like of the embodiment of the target area identification device, reference may be made to the description of the embodiment of the target area identification method, and further description is omitted here.
EXAMPLE seven
Referring now to FIG. 7, shown is a schematic diagram of an electronic device suitable for implementing embodiments of the present disclosure. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data necessary for the operation of the electronic device. The processing device 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, and the like; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 708 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 709. The communication device 709 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While FIG. 7 illustrates an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided; more or fewer means may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: segment a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolution stages, wherein the plurality of convolutional layers are sequentially connected in series; determine parameters of the convolutional neural network; input a positive training sample set into the convolutional neural network for training to obtain feature images of the plurality of convolution stages corresponding to each sample image, wherein the positive training sample set is composed of a plurality of sample images marked with target areas; for each sample image, fuse the corresponding feature images of the plurality of convolution stages; and obtain a positive sample convolutional neural network model according to the fused feature image of each sample image, wherein the positive sample convolutional neural network model is used to identify the target region.
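As a non-authoritative sketch of the program steps just listed, the PyTorch code below splits a small serial backbone into three convolution stages, collects one feature image per stage, fuses them by bilinear upsampling and channel-wise concatenation, and trains against annotated target-region masks. The stage layout, the concatenation-based fusion, the mask-style supervision, and every hyperparameter are assumptions chosen for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStageFusionNet(nn.Module):
    """Serial convolutional layers segmented into stages whose feature images are fused."""
    def __init__(self):
        super().__init__()
        # Three convolution stages, each with at least one convolutional layer,
        # connected in series.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
            nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
        ])
        # Output layer: predicts a per-pixel target-region mask from the fused features.
        self.head = nn.Conv2d(16 + 32 + 64, 1, kernel_size=1)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # one feature image per convolution stage
        # Fusion: upsample every stage's feature image to the first stage's
        # resolution and concatenate along the channel axis.
        size = feats[0].shape[-2:]
        fused = torch.cat(
            [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
             for f in feats], dim=1)
        return self.head(fused)

def train_positive_model(loader, epochs=10, lr=1e-3):
    """loader yields (images, masks); masks mark the annotated target areas and
    are assumed pre-resized to the head's output resolution (half the input)."""
    model = MultiStageFusionNet()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for images, masks in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimizer.step()
    return model
```

Fusing shallow, high-resolution feature images with deeper, semantically stronger ones is what the disclosure credits for the improved recognition rate; concatenation after upsampling is only one plausible fusion operator, and element-wise addition after 1x1 projections would serve as well.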
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The foregoing description is only illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in this disclosure.

Claims (15)

1. A target area identification method, comprising:
acquiring an image to be identified;
inputting the image to be identified into a positive sample convolutional neural network model for identification to obtain a target area;
inputting the target area into a negative sample convolutional neural network model for classification;
determining, according to the classification result, whether the target area is a foreground area or a background area; if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
2. The method of claim 1, wherein the positive sample convolutional neural network model is generated by:
segmenting a plurality of convolutional layers of the convolutional neural network to obtain a plurality of convolutional stages; wherein the plurality of convolutional layers are sequentially connected in series;
determining parameters of the convolutional neural network;
inputting a positive training sample set into the convolutional neural network for training to obtain feature images of a plurality of convolution stages corresponding to each sample image; wherein the positive training sample set is composed of a plurality of sample images marked with target areas;
for each sample image, fusing the corresponding feature images of the plurality of convolution stages;
obtaining a positive sample convolutional neural network model according to the fused feature image of each sample image; wherein the positive sample convolutional neural network model is used to identify the target region.
3. The method according to claim 2, wherein said obtaining a positive sample convolutional neural network model according to the fused feature image of each sample image comprises:
for each sample image, identifying the fused feature image through an output layer of the convolutional neural network to obtain a prediction target area;
determining a prediction error according to the prediction target areas of all the sample images and the real target areas contained in the corresponding sample images;
if the prediction error is larger than a preset error, re-determining the parameters of the convolutional neural network according to the preset error and repeating the training process until the prediction error is smaller than or equal to the preset error, whereupon the training process ends and the positive sample convolutional neural network model is obtained.
4. The method of claim 2, further comprising:
inputting a verification sample set into the positive sample convolutional neural network model for identification to obtain a prediction target area of a verification sample image; the verification sample set consists of a plurality of verification sample images marked with target areas;
and if the prediction error determined according to the prediction target areas of all the verification sample images and the real target areas contained in the corresponding verification sample images is less than or equal to the preset error, determining that the positive sample convolutional neural network model passes the verification, otherwise, re-determining the parameters of the convolutional neural network, and continuing training until the obtained positive sample convolutional neural network model passes the verification.
5. The method of claim 2, further comprising:
and if the number of the positive training samples exceeds the preset number, grouping the positive training samples, wherein each group of positive training samples is used as a positive training sample set.
6. The method according to claim 3, wherein determining the prediction error according to the prediction target regions of all the sample images and the real target region contained in the corresponding sample image comprises:
calculating, for each sample image, the loss between the prediction target area and the real target area contained in the corresponding sample image by using a loss function;
and determining the prediction error according to the losses of all the sample images.
7. The method of claim 6, wherein determining the prediction error based on the loss of all sample images comprises:
setting a weight for each sample image according to the loss of that sample image;
and performing weighted fusion on the losses according to the weights of the sample images, and taking the weighted-fused loss as the prediction error.
8. The method of claim 7, wherein a sample image with a higher loss corresponds to a lower weight.
9. The method of any one of claims 1-8, wherein the target region is a license plate region.
10. The method of claim 1, wherein the negative sample convolutional neural network model is generated by:
acquiring a negative training sample set; the negative training sample set consists of a plurality of background areas marked with non-target areas and a plurality of foreground areas marked with target areas;
training by replacing, according to the negative training sample set, the positive training samples in the training steps of any one of claims 2 to 9 with the negative training samples, to obtain a negative sample convolutional neural network model.
11. The method according to claim 10, characterized in that the background region and/or the foreground region is a binarized image.
12. The method of claim 1, wherein the classifying the target region input negative sample convolutional neural network model comprises:
performing binarization processing on the target area to obtain a binarized image;
and inputting the binarized image into the negative sample convolutional neural network model for classification.
13. A target area identifying apparatus, comprising:
the image acquisition module is used for acquiring an image to be identified;
the image identification module is used for inputting the image to be identified into a positive sample convolutional neural network model for identification to obtain a target area; inputting the target area into a negative sample convolutional neural network model for classification; determining, according to the classification result, whether the target area is a foreground area or a background area; if the classification result is the foreground area, determining that the target area is a real target area; and if the classification result is the background area, determining that the target area is a misjudged target area.
14. An electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor, when executing, performs the method of any one of claims 1 to 12.
15. A computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform the method of any one of claims 1-12.
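Claims 6 to 8 recite computing a per-sample loss, assigning each sample image a weight that decreases as its loss increases, and taking the weighted fusion of the losses as the prediction error. The sketch below shows one way such a weighting could be realized in Python; the inverse-loss formula is an illustrative assumption, not a construction required by the claims.

```python
import torch

def weighted_prediction_error(losses, eps=1e-8):
    """losses: 1-D tensor with one loss value per sample image.
    Higher-loss samples receive lower weights (claim 8), and the weighted
    fusion of the losses is returned as the prediction error (claim 7).
    With this inverse-loss choice the fused value equals the harmonic
    mean of the losses."""
    weights = 1.0 / (losses + eps)      # lower weight for higher loss
    weights = weights / weights.sum()   # normalize the weights to sum to 1
    return (weights * losses).sum()
```

For example, `weighted_prediction_error(torch.tensor([0.2, 0.5, 2.0]))` evaluates to about 0.4, well below the arithmetic mean of 0.9, so a few hard outliers do not dominate the error used in the convergence test of claim 3.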
CN201910485052.5A 2019-06-05 2019-06-05 Convolutional neural network model training method and device and computer readable storage medium Active CN110288082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910485052.5A CN110288082B (en) 2019-06-05 2019-06-05 Convolutional neural network model training method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910485052.5A CN110288082B (en) 2019-06-05 2019-06-05 Convolutional neural network model training method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110288082A CN110288082A (en) 2019-09-27
CN110288082B true CN110288082B (en) 2022-04-05

Family

ID=68003340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910485052.5A Active CN110288082B (en) 2019-06-05 2019-06-05 Convolutional neural network model training method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110288082B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051969A (en) * 2019-12-26 2021-06-29 深圳市超捷通讯有限公司 Object recognition model training method and vehicle-mounted device
CN111259968A (en) * 2020-01-17 2020-06-09 腾讯科技(深圳)有限公司 Illegal image recognition method, device, equipment and computer readable storage medium
CN111325107B (en) * 2020-01-22 2023-05-23 广州虎牙科技有限公司 Detection model training method, device, electronic equipment and readable storage medium
CN111382870A (en) * 2020-03-06 2020-07-07 商汤集团有限公司 Method and device for training neural network
CN111401307B (en) * 2020-04-08 2022-07-01 中国人民解放军海军航空大学 Satellite remote sensing image target association method and device based on depth measurement learning
CN111523597B (en) * 2020-04-23 2023-08-25 北京百度网讯科技有限公司 Target recognition model training method, device, equipment and storage medium
CN111523596B (en) * 2020-04-23 2023-07-04 北京百度网讯科技有限公司 Target recognition model training method, device, equipment and storage medium
CN111627033B (en) * 2020-05-30 2023-10-20 郑州大学 Method, equipment and computer readable storage medium for dividing difficult sample instance
CN112149693A (en) * 2020-10-16 2020-12-29 上海智臻智能网络科技股份有限公司 Training method of contour recognition model and detection method of target object
CN112101359B (en) * 2020-11-11 2021-02-12 广州华多网络科技有限公司 Text formula positioning method, model training method and related device
CN112419303B (en) * 2020-12-09 2023-08-15 上海联影医疗科技股份有限公司 Neural network training method, system, readable storage medium and device
CN112597828A (en) * 2020-12-11 2021-04-02 京东数字科技控股股份有限公司 Training method and device of webpage recognition model and webpage recognition method
CN113344200A (en) * 2021-06-17 2021-09-03 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784654A (en) * 2016-08-26 2018-03-09 杭州海康威视数字技术股份有限公司 Image partition method, device and full convolutional network system
CN108182394A (en) * 2017-12-22 2018-06-19 浙江大华技术股份有限公司 Training method, face identification method and the device of convolutional neural networks
CN108492271A (en) * 2018-03-26 2018-09-04 中国电子科技集团公司第三十八研究所 A kind of automated graphics enhancing system and method for fusion multi-scale information
CN108509978A (en) * 2018-02-28 2018-09-07 中南大学 The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108985295A (en) * 2018-07-25 2018-12-11 南京烽火星空通信发展有限公司 A kind of logo image detecting method based on deep learning
CN109101932A (en) * 2018-08-17 2018-12-28 佛山市顺德区中山大学研究院 The deep learning algorithm of multitask and proximity information fusion based on target detection
US20190026586A1 (en) * 2017-07-19 2019-01-24 Vispek Inc. Portable substance analysis based on computer vision, spectroscopy, and artificial intelligence
CN109359515A (en) * 2018-08-30 2019-02-19 东软集团股份有限公司 A kind of method and device that the attributive character for target object is identified
CN109508675A (en) * 2018-11-14 2019-03-22 广州广电银通金融电子科技有限公司 A kind of pedestrian detection method for complex scene
CN109766840A (en) * 2019-01-10 2019-05-17 腾讯科技(深圳)有限公司 Facial expression recognizing method, device, terminal and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268524A (en) * 2014-09-24 2015-01-07 朱毅 Convolutional neural network image recognition method based on dynamic adjustment of training targets
CN106022232A (en) * 2016-05-12 2016-10-12 成都新舟锐视科技有限公司 License plate detection method based on deep learning
CN106815596A (en) * 2016-12-08 2017-06-09 中国银联股份有限公司 A kind of Image Classifier method for building up and device
CN108229455B (en) * 2017-02-23 2020-10-16 北京市商汤科技开发有限公司 Object detection method, neural network training method and device and electronic equipment
CN107506350A (en) * 2017-08-16 2017-12-22 京东方科技集团股份有限公司 A kind of method and apparatus of identification information
CN107679455A (en) * 2017-08-29 2018-02-09 平安科技(深圳)有限公司 Target tracker, method and computer-readable recording medium
CN107871126A (en) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 Model recognizing method and system based on deep-neural-network
CN108169745A (en) * 2017-12-18 2018-06-15 电子科技大学 A kind of borehole radar target identification method based on convolutional neural networks
CN108121986B (en) * 2017-12-29 2019-12-17 深圳云天励飞技术有限公司 Object detection method and device, computer device and computer readable storage medium
CN108830285B (en) * 2018-03-14 2021-09-21 江南大学 Target detection method for reinforcement learning based on fast-RCNN
CN109145769A (en) * 2018-08-01 2019-01-04 辽宁工业大学 The target detection network design method of blending image segmentation feature
CN109087315B (en) * 2018-08-22 2021-02-23 中国科学院电子学研究所 Image identification and positioning method based on convolutional neural network
CN109684982B (en) * 2018-12-19 2020-11-20 深圳前海中创联科投资发展有限公司 Flame detection method based on video analysis and combined with miscible target elimination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"[Machine Learning] Local Linear Regression"; Feiyang_Lee; https://blog.csdn.net/HerosOfEarth/article/details/51969517; 2016-07-20; see the main text *

Also Published As

Publication number Publication date
CN110288082A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110288082B (en) Convolutional neural network model training method and device and computer readable storage medium
CN110321958B (en) Training method of neural network model and video similarity determination method
CN110276346B (en) Target area recognition model training method, device and computer readable storage medium
CN110287955B (en) Target area determination model training method, device and computer readable storage medium
CN111476309A (en) Image processing method, model training method, device, equipment and readable medium
CN110276345B (en) Convolutional neural network model training method and device and computer readable storage medium
CN112184738A (en) Image segmentation method, device, equipment and storage medium
CN114282581B (en) Training sample acquisition method and device based on data enhancement and electronic equipment
CN110287816B (en) Vehicle door motion detection method, device and computer readable storage medium
CN112329762A (en) Image processing method, model training method, device, computer device and medium
CN112801888A (en) Image processing method, image processing device, computer equipment and storage medium
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN111767750A (en) Image processing method and device
CN110287817B (en) Target recognition and target recognition model training method and device and electronic equipment
CN110110696B (en) Method and apparatus for processing information
CN113592033B (en) Oil tank image recognition model training method, oil tank image recognition method and device
CN113033707B (en) Video classification method and device, readable medium and electronic equipment
CN110288691B (en) Method, apparatus, electronic device and computer-readable storage medium for rendering image
CN110852242A (en) Watermark identification method, device, equipment and storage medium based on multi-scale network
CN114422698B (en) Video generation method, device, equipment and storage medium
CN113255812B (en) Video frame detection method and device and electronic equipment
CN115482415A (en) Model training method, image classification method and device
CN113869367A (en) Model capability detection method and device, electronic equipment and computer readable medium
CN112712070A (en) Question judging method and device for bead calculation questions, electronic equipment and storage medium
CN113205092A (en) Text detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant