CN110287955B - Target area determination model training method, device and computer readable storage medium


Info

Publication number
CN110287955B
Authority
CN
China
Prior art keywords
training
target area
image
key points
target
Prior art date
Legal status
Active
Application number
CN201910485063.3A
Other languages
Chinese (zh)
Other versions
CN110287955A (en)
Inventor
朱延东 (Zhu Yandong)
王长虎 (Wang Changhu)
Current Assignee
Douyin Vision Co Ltd
Douyin Vision Beijing Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN201910485063.3A
Publication of CN110287955A
Application granted
Publication of CN110287955B
Legal status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target area determination model training method and device, an electronic device, and a computer-readable storage medium. The method comprises the following steps: performing target area identification on a sample image to obtain an image area containing a target area; forming a training sample set from a plurality of fixed-size image areas and marking a plurality of key points, located in the target area, on the image areas in the training sample set; and training each training channel of a convolutional neural network independently until its convergence condition is met, to obtain a target area determination model comprising a plurality of training channels, where the training channels are respectively used to predict the displacements of the key points relative to a reference point. In embodiments of the present disclosure, the training sample set is trained through parallel training channels, and the resulting training channels predict the displacements of the key points relative to the reference point; the target area is then obtained from the key points, so the identification accuracy of the target area can be improved.

Description

Target area determination model training method, device and computer readable storage medium
Technical Field
The present disclosure relates to the field of target region determination model training technologies, and in particular, to a target region determination model training method, apparatus, and computer-readable storage medium.
Background
Many captured video images contain automobiles, and images containing automobiles generally contain license plates; for privacy reasons, the license plates in such video images often need to be processed or covered with other images. When processing an image containing a license plate, accurately identifying the license plate region in the image is critical.
In the prior art, a license plate region is typically identified by using a model to judge whether each pixel is foreground or background, where the foreground is the license plate region to be identified and the background is the rest of the image. With this method, however, the classifier considers only whether a pixel belongs to the license plate; relying on this single per-pixel judgment makes the recovered license plate region prone to distortion by misjudged points.
Disclosure of Invention
The technical problem solved by the present disclosure is to provide a target region determination model training method that at least partially addresses the misjudgment of target regions in the prior art. A target area determination model training device, a target area determination model training hardware device, a computer-readable storage medium, and a target area determination model training terminal are also provided.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target area determination model training method comprises the following steps:
carrying out target area identification on the sample image to obtain an image area containing the target area;
scaling the image area to a fixed size, forming a training sample set by a plurality of image areas of the fixed size, and marking a plurality of key points on the image areas in the training sample set, wherein the plurality of key points are located in the target area;
inputting the set of training samples into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel;
each training channel is independently trained according to the training sample set until respective convergence conditions are met, and a target area determination model comprising a plurality of training channels is obtained; the plurality of training channels of the target region determination model are respectively used for predicting the displacement of the plurality of key points relative to a reference point, wherein the reference point is located in the target region.
Further, each training channel is independently trained according to the training sample set until respective convergence conditions are met, so as to obtain a target area determination model including a plurality of training channels, including:
determining parameters of each training channel;
each training channel calculates according to the determined parameters, the marked key points and the reference point to obtain a predicted displacement;
determining a prediction error according to the prediction displacement and the real displacement of each image area;
and readjusting the parameters of the training channel corresponding to the prediction error larger than the preset error, continuously repeating the training process of the corresponding training channel until the prediction error of the corresponding training channel is smaller than or equal to the preset error, and finishing the training process of the corresponding training channel.
Further, the target area is a polygonal area, and the key points are vertices of the polygonal area.
Further, the reference point is a center point of the image area.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target area determination method, comprising:
carrying out target area identification on a currently input video frame to obtain an image area containing the target area;
scaling the image region to a fixed size;
inputting the image area with fixed size into a target area determination model obtained by training by adopting the target area determination model training method of any one of claims 1 to 5;
respectively predicting and obtaining a plurality of displacements relative to a reference point through a plurality of training channels of the target area determination model; wherein the reference point is located within the target region;
determining a plurality of key points according to the displacements of the relative reference points and the coordinates of the reference points;
and determining a target area according to the plurality of key points.
Further, the determining a target area according to the plurality of key points includes:
and taking a closed area formed by the connection lines of the plurality of key points as a target area.
Further, the method further comprises:
acquiring a next frame of video image, and taking the next frame of video image as the currently input video frame;
and expanding a preset area to the periphery by taking the image area as a reference, and zooming the expanded image area into a fixed size to be used as an image area which corresponds to the next frame of video image and contains the target area.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target region determination model training apparatus, comprising:
the sample identification module is used for identifying a target area of a sample image to obtain an image area containing the target area;
a training set determining module, configured to scale the image region to a fixed size, form a training sample set from a plurality of image regions of the fixed size, and mark a plurality of key points on the image region in the training sample set, where the plurality of key points are located in the target region;
a training set input module for inputting the training sample set into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel;
the model training module is used for each training channel to independently train according to the training sample set until the respective convergence condition is met, and a target area determination model containing a plurality of training channels is obtained; the plurality of training channels of the target region determination model are respectively used for predicting the displacement of the plurality of key points relative to a reference point, wherein the reference point is located in the target region.
Further, the model training module is specifically configured to: determining parameters of each training channel; each training channel calculates according to the determined parameters, the marked key points and the reference point to obtain a predicted displacement;
determining a prediction error according to the prediction displacement and the real displacement of each image area; and readjusting the parameters of the training channel corresponding to the prediction error larger than the preset error, continuously repeating the training process of the corresponding training channel until the prediction error of the corresponding training channel is smaller than or equal to the preset error, and finishing the training process of the corresponding training channel.
Further, the target area is a polygonal area, and the key points are vertices of the polygonal area.
Further, the reference point is a center point of the image area.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a target area determination apparatus comprising:
the area identification module is used for identifying a target area of a currently input video frame to obtain an image area containing the target area;
an image scaling module to scale the image region to a fixed size;
an image input module, configured to input the image area with the fixed size into a target area determination model obtained by training with the target area determination model training method according to any one of claims 1 to 5;
the prediction module is used for respectively predicting and obtaining the displacements of a plurality of relative reference points through a plurality of training channels of the target area determination model; wherein the reference point is located within the target region;
a key point determining module for determining a plurality of key points according to the displacements of the relative reference points and the coordinates of the reference points;
and the region determining module is used for determining a target region according to the plurality of key points.
Further, the region determining module is specifically configured to: and taking a closed area formed by the connection lines of the plurality of key points as a target area.
Further, the area identification module is further configured to: acquiring a next frame of video image, and taking the next frame of video image as the currently input video frame; and expanding a preset area to the periphery by taking the image area as a reference, and zooming the expanded image area into a fixed size to be used as an image area which corresponds to the next frame of video image and contains the target area.
Further, the target area is a license plate area.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
an electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor, when executing them, implements the target region determination model training method described in any one of the above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a computer readable storage medium storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform a target area determination model training method as any one of the above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
an electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor configured to execute the computer readable instructions, so that the processor implements the target area determination method described in any one of the above.
In order to achieve the above object, according to one aspect of the present disclosure, the following technical solutions are provided:
a computer readable storage medium storing non-transitory computer readable instructions which, when executed by a computer, cause the computer to perform a target area determination method as in any one of the above.
In order to achieve the above object, according to still another aspect of the present disclosure, the following technical solutions are also provided:
a target area determination model training terminal comprises any one of the target area determination model training devices.
In order to achieve the above object, according to still another aspect of the present disclosure, the following technical solutions are also provided:
a data reading terminal comprises any one of the data reading devices.
In embodiments of the present disclosure, the training sample set is trained through a plurality of parallel training channels to obtain a plurality of trained channels, and these channels predict the displacements, relative to a reference point, of a plurality of key points associated with the target area; the target area is then obtained from these displacements, so the identification accuracy of the target area can be improved.
The foregoing is a summary of the present disclosure. In order to promote a clear understanding of the technical means of the present disclosure, the disclosure may be embodied in other specific forms without departing from its spirit or essential attributes.
Drawings
FIG. 1a is a schematic flow chart diagram of a target region determination model training method according to one embodiment of the present disclosure;
FIG. 1b is a schematic diagram of a convolution process of a convolutional layer in a target region determination model training method according to an embodiment of the present disclosure;
FIG. 1c is a diagram illustrating convolution results of convolutional layers in a target region determination model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart diagram of a target area determination method according to one embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a target area determination model training apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a target area determination apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from this specification. It should be understood that the described embodiments are merely some, rather than all, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details of this description without departing from the spirit of the disclosure. It should be noted that the features in the following embodiments and examples may be combined with each other where there is no conflict. All other embodiments obtained by a person skilled in the art from the disclosed embodiments without creative effort shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
Example one
In order to solve the technical problem of low target region identification accuracy in the prior art, an embodiment of the present disclosure provides a target region determination model training method. As shown in FIG. 1a, the target region determination model training method mainly includes the following steps S11 to S14:
step S11: and identifying a target area of the sample image to obtain an image area containing the target area.
The target area may be a license plate area.
The sample image is an image that is input into the convolutional neural network as a training sample. Specifically, it may be captured by a camera of the terminal device, or pre-stored in and retrieved from a local database.
Specifically, an existing convolutional neural network model may be used to identify the sample image and obtain an image region containing the target region.
Step S12: the image area is scaled to a fixed size, a training sample set is composed of a plurality of image areas of the fixed size, and the image areas in the training sample set are labeled with a plurality of key points, wherein the plurality of key points are located in the target area.
In order to ensure that the trained target region determination model is generally applicable, that is, that image regions of any size can be handled, the sizes of the image regions are unified to a fixed size in this embodiment, where the fixed size can be set as desired.
The key points are pixel points within the target region, specifically pixel points on the boundary of the target region, and may be represented by coordinates. For example, if the target region is a license plate region, then since a license plate region is usually rectangular, the key points may be the pixel points at the four corners of the license plate region.
Step S13: inputting the set of training samples into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel.
A convolutional neural network (CNN) is a type of feed-forward neural network that performs convolution calculations and has a deep structure; it mainly comprises an input layer, convolutional layers, pooling layers, fully-connected layers, and an output layer, and may include multiple convolutional layers. In this context, the convolutional neural network may be a plain straight-through convolutional neural network or a deep convolutional neural network; no specific limitation is imposed here.
A convolutional layer contains convolution kernels. A convolution kernel may be a matrix used to convolve the input image; the calculation multiplies the elements of each local patch of the input image with the elements at the corresponding positions of the convolution kernel matrix and then sums the products. In this context, each training channel corresponds to a different convolution kernel.
For example, as shown in FIG. 1b, suppose the input is a two-dimensional 3x4 matrix and the convolution kernel is a 2x2 matrix, and the kernel is shifted by one pixel at a time. The 2x2 patch at the top-left of the input is first convolved with the kernel, i.e., the elements at corresponding positions are multiplied and summed, giving element S00 of the output matrix S, whose value is aw + bx + ey + fz. The patch is then shifted one pixel to the right, and the matrix of the four elements (b, c, f, g) is convolved with the kernel to obtain element S01 of the output matrix S; in the same way, elements S02, S10, S11, and S12 of the output matrix S are obtained. The resulting convolution output is a 2x3 matrix S, as shown in FIG. 1c.
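As a purely illustrative sketch (not part of the patent text), the multiply-and-sum computation described above can be reproduced in a few lines of Python; the input values and kernel weights below are arbitrary placeholders:

```python
import numpy as np

def conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Stride-1 'valid' convolution as used in CNNs (elementwise
    multiply-and-sum, no kernel flip), matching the FIG. 1b walkthrough."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the local patch with the kernel position by position, then sum.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(12, dtype=float).reshape(3, 4)  # a 3x4 input, as in FIG. 1b
k = np.array([[1.0, 2.0], [3.0, 4.0]])        # a 2x2 kernel (placeholder weights)
s = conv2d_valid(x, k)
print(s.shape)  # (2, 3): the 2x3 output matrix S of FIG. 1c
```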
Each training channel is independent: the channels use different convolution kernels at the convolutional layer but share the other layers of the convolutional neural network.
The number of training channels is determined by the number of marked key points: if 6 key points are marked, there are 6 corresponding training channels, with one key point per training channel. See the illustrative sketch below.
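The patent does not prescribe a particular framework or layer configuration; purely as an illustration of the parallel-channel idea, one possible sketch in PyTorch (all layer sizes, names, and the choice of four key points are assumptions) is:

```python
import torch
import torch.nn as nn

class MultiChannelDisplacementNet(nn.Module):
    """Sketch: a shared backbone plus K parallel conv 'channels', one per
    key point. Each channel predicts the (dx, dy) displacement of its key
    point relative to the reference point. Layer sizes are illustrative."""

    def __init__(self, num_keypoints: int = 4):
        super().__init__()
        self.backbone = nn.Sequential(            # layers shared by all channels
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # One parallel convolution kernel (head) per key point.
        self.channels = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(16, 8, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(8, 2),                  # (dx, dy) for this key point
            )
            for _ in range(num_keypoints)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x)
        # Each channel predicts independently; stack to (batch, K, 2).
        return torch.stack([head(feat) for head in self.channels], dim=1)

model = MultiChannelDisplacementNet(num_keypoints=4)
out = model(torch.randn(1, 3, 64, 128))  # a fixed-size input region, e.g. 64x128
print(out.shape)  # torch.Size([1, 4, 2])
```

Here each element of `channels` plays the role of one training channel: it has its own convolution kernel but shares the backbone layers with the other channels.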
Step S14: each training channel is independently trained according to the training sample set until respective convergence conditions are met, and a target area determination model comprising a plurality of training channels is obtained; the plurality of training channels of the target region determination model are respectively used for predicting the displacement of the plurality of key points relative to a reference point, wherein the reference point is located in the target region.
The reference point can be chosen as desired. Specifically, reference points may be labeled before training, or calculated from the labeled key points. For example, when the target region is a license plate region and the corresponding key points are the four vertices of the license plate region, the center point of the license plate region may be determined from the four marked key points and used as the reference point.
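For illustration only, computing such a center reference point from four labeled vertices might look like this (the coordinates are hypothetical):

```python
def center_reference_point(keypoints):
    """Mean of the marked key points, e.g. the four license plate vertices."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Four labeled vertices of a hypothetical license plate region:
print(center_reference_point([(10, 20), (90, 20), (90, 50), (10, 50)]))  # (50.0, 35.0)
```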
In this embodiment, the training sample set is trained through a plurality of parallel training channels, so that the trained target region determination model contains a plurality of training channels. These training channels predict the displacements, relative to the reference point, of a plurality of key points associated with the target region; the key points can then be recovered from these displacements, the target region is obtained from the key points, and the target region identification accuracy can be improved.
In an alternative embodiment, step S14 includes:
step S141: parameters for each training channel are determined.
The parameters include the parameters of the convolution kernels of the convolutional layers, for example the size of the convolution matrix, which may be set to 3 × 3; different convolutional layers may use different convolution kernels. Parameters of the pooling layer, such as the size of the pooling matrix (for example 3 × 3), and parameters of the output layer, such as the linear coefficient matrix and the bias vector, may also be included. The parameters of each training channel are different.
Step S142: each training channel calculates the predicted displacement according to the determined parameters, the marked key points, and the reference point.
Specifically, the training sample set first passes through the input layer of the convolutional neural network, where it is converted into multi-dimensional vectors, and convolution calculation is then performed by the convolutional layer to obtain the feature maps of the convolution stage. In this context, the convolutional layer contains a plurality of parallel convolution kernels; after an input image enters the convolutional layer, it is convolved with the different convolution kernels to obtain a plurality of convolution results, which then pass through the pooling layer, the fully-connected layer, and the output layer for prediction.
Step S143: determining a prediction error according to the predicted displacement and the true displacement of each image area.
The preset error can be user-defined.
Specifically, a loss function may be used to calculate the loss between the predicted displacement and the true displacement, and this loss is taken as the prediction error.
Step S144: for a training channel whose prediction error is larger than the preset error, readjusting the parameters of that channel and repeating its training process until its prediction error is smaller than or equal to the preset error, at which point the training of that channel is finished.
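Continuing the illustrative PyTorch sketch above, the per-channel training loop of steps S141 to S144 could be approximated as follows; the loss function, learning rate, preset error, and synthetic data are all assumptions for illustration, not details from the patent:

```python
import torch
import torch.nn as nn

# Continuing the MultiChannelDisplacementNet sketch above.
model = MultiChannelDisplacementNet(num_keypoints=4)
head = model.channels[0]                      # the training channel for key point 0
criterion = nn.MSELoss()                      # loss between predicted and true displacement
params = list(model.backbone.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.01)
preset_error = 1e-3                           # user-defined preset error

# Synthetic stand-ins for the labeled training sample set:
images = torch.randn(8, 3, 64, 128)           # fixed-size image regions
true_disp = torch.randn(8, 2)                 # true displacement of key point 0

for step in range(10000):
    pred_disp = head(model.backbone(images))  # predicted displacement for this channel
    loss = criterion(pred_disp, true_disp)    # prediction error
    if loss.item() <= preset_error:           # convergence condition reached
        break
    optimizer.zero_grad()
    loss.backward()                           # readjust this channel's parameters
    optimizer.step()
```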
In an alternative embodiment, the target region is a polygon region, and the key points are vertices of the polygon region.
Specifically, when the target region is a license plate region, the corresponding polygon is a quadrilateral, specifically a rectangle. When marking a training sample, only the vertices of the target region within the image region are marked, for example the four vertices of the rectangle.
Example two
In order to solve the technical problem of low target area determination accuracy in the prior art, an embodiment of the present disclosure further provides a target area determination method, which, as shown in FIG. 2, specifically includes:
and S21, identifying the target area of the currently input video frame to obtain an image area containing the target area.
The input video frame may be captured in real time by a camera, or a pre-stored video image may be obtained locally.
The target area may be a license plate area, in which case the corresponding target area is a rectangular region.
Specifically, an existing convolutional neural network model may be used to initially identify the input video frame and obtain an image region containing the target region.
Step S22: scaling the image area to a fixed size.
The fixed size is the same as that of the training-sample image areas input when training the target area determination model.
Step S23: inputting the fixed-size image area into a target area determination model obtained by training with the target area determination model training method.
The target area determination model is obtained by training with the target area determination model training method described in the first embodiment; for the specific training process, refer to the first embodiment.
Step S24: respectively predicting, through the plurality of training channels of the target area determination model, a plurality of displacements relative to a reference point, wherein the reference point is located within the target region.
Each training channel is used to predict one displacement relative to the reference point.
The reference point may be preset; specifically, it may be set as the center point of the target area or as another pixel point within the target area.
Step S25: determining a plurality of key points according to the displacements relative to the reference point and the coordinates of the reference point.
For example, if the target region is a license plate region, the target region corresponds to a rectangular region. When training the target region determination model, the four vertices of the rectangular region (upper-left, lower-left, upper-right, and lower-right) may be selected as key points, yielding four training channels that respectively predict the displacements of these four key points relative to the reference point. In this embodiment, the four displacements relative to the reference point are then obtained through the four training channels of the target region determination model, and the four key points are calculated from these displacements and the coordinates of the reference point.
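As a small illustration (with placeholder coordinates), recovering the four key points from the predicted displacements and the reference point amounts to a simple addition:

```python
# Hypothetical values for illustration; image coordinates with y increasing downward.
reference = (120.0, 80.0)  # e.g. the center of the fixed-size image region
displacements = [(-40.0, -15.0), (-40.0, 15.0), (40.0, -15.0), (40.0, 15.0)]

# key point = reference point + predicted displacement (dx, dy)
keypoints = [(reference[0] + dx, reference[1] + dy) for dx, dy in displacements]
print(keypoints)  # upper-left, lower-left, upper-right, lower-right vertices
```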
Step S26: determining the target area according to the plurality of key points.
In this embodiment, the displacements relative to the reference point are respectively predicted through the plurality of training channels of the target area determination model, the key points are obtained from these displacements, and the target area is then obtained from the key points, thereby improving the target area identification accuracy.
In an optional embodiment, step S26 specifically includes:
and taking a closed area formed by the connection lines of the plurality of key points as a target area.
Specifically, referring to the example in step S25, if the four key points (upper-left, lower-left, upper-right, and lower-right) are obtained, connecting them yields a rectangular region, which is the target region.
In an optional embodiment, the method further comprises:
step 27: and acquiring a next frame of video image, and taking the next frame of video image as the currently input video frame.
Step 28: and expanding a preset area to the periphery by taking the image area as a reference, and zooming the expanded image area into a fixed size to be used as an image area which corresponds to the next frame of video image and contains the target area.
Specifically, when determining the target region in the next video frame, the image region from step S21 may be expanded outward to serve as the image region for the next frame, after which steps S22-S26 are performed. Since the position of the target region changes little between consecutive video frames, the target region generally remains inside the expanded image region; the license plate region can therefore be identified directly through steps S22-S26, without running a complex target region identification model on the next frame to obtain an image region containing the target region, which is more efficient.
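A minimal sketch of this region expansion, assuming an (x0, y0, x1, y1) box representation and a 20% margin (both assumptions, not specified in the patent):

```python
def expanded_region(region, image_w, image_h, margin=0.2):
    """Expand the previous frame's image region outward by a preset margin,
    clamped to the frame bounds. Illustrative only; the margin value and
    clamping policy are assumptions."""
    x0, y0, x1, y1 = region
    dw, dh = (x1 - x0) * margin, (y1 - y0) * margin
    return (max(0, x0 - dw), max(0, y0 - dh),
            min(image_w, x1 + dw), min(image_h, y1 + dh))

# The expanded region is then cropped and scaled to the fixed training size
# before being fed to the target area determination model for the next frame.
print(expanded_region((100, 60, 260, 140), image_w=1920, image_h=1080))
```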
It will be appreciated by those skilled in the art that obvious modifications (e.g., combinations of the enumerated modes) or equivalents may be made to the above-described embodiments.
In the above, although the steps in the embodiment of the target area determination model training method are described in the above sequence, it should be clear to those skilled in the art that the steps in the embodiment of the present disclosure are not necessarily performed in the above sequence, and may also be performed in other sequences such as reverse, parallel, and cross, and further, on the basis of the above steps, those skilled in the art may also add other steps, and these obvious modifications or equivalents should also be included in the protection scope of the present disclosure, and are not described herein again.
For convenience of description, only the parts relevant to the embodiments of the present disclosure are shown; for undisclosed technical details, please refer to the method embodiments of the present disclosure.
EXAMPLE III
In order to solve the technical problem of low target area determination accuracy in the prior art, an embodiment of the present disclosure provides a target region determination model training apparatus. The apparatus may perform the steps of the target region determination model training method described in the first embodiment. As shown in FIG. 3, the apparatus mainly includes a sample recognition module 31, a training set determination module 32, a training set input module 33, and a model training module 34, wherein:
the sample identification module 31 is configured to perform target area identification on a sample image to obtain an image area including the target area;
the training set determining module 32 is configured to scale the image region to a fixed size, form a training sample set by a plurality of image regions of the fixed size, and mark a plurality of key points on the image regions in the training sample set, where the plurality of key points are located in the target region;
a training set input module 33 for inputting the training sample set into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel;
the model training module 34 is used for each training channel to independently train according to the training sample set until respective convergence conditions are met, so as to obtain a target area determination model comprising a plurality of training channels; the plurality of training channels of the target region determination model are respectively used for predicting the displacement of the plurality of key points relative to a reference point, wherein the reference point is located in the target region.
Further, the model training module 34 is specifically configured to: determining parameters of each training channel; each training channel calculates according to the determined parameters, the marked key points and the reference point to obtain a predicted displacement; determining a prediction error according to the prediction displacement and the real displacement of each image area; and readjusting the parameters of the training channel corresponding to the prediction error larger than the preset error, continuously repeating the training process of the corresponding training channel until the prediction error of the corresponding training channel is smaller than or equal to the preset error, and finishing the training process of the corresponding training channel.
Further, the target area is a polygonal area, and the key points are vertices of the polygonal area.
Further, the reference point is a center point of the image area.
Further, the target area is a license plate area.
For detailed descriptions of the working principle, the achieved technical effect, and the like of the embodiment of the target area determination model training apparatus, reference may be made to the related descriptions in the foregoing embodiment of the target area determination model training method, and details are not repeated here.
Example four
In order to solve the technical problem of low target area determination accuracy in the prior art, an embodiment of the present disclosure provides a target area determination apparatus. The apparatus may perform the steps of the target area determination method described in the second embodiment. As shown in FIG. 4, the apparatus mainly includes a region identification module 41, an image scaling module 42, an image input module 43, a prediction module 44, a keypoint determination module 45, and a region determination module 46, wherein:
the region identification module 41 is configured to perform target region identification on a currently input video frame to obtain an image region including the target region;
an image scaling module 42 for scaling the image area to a fixed size;
the image input module 43 is configured to input the image region with the fixed size into a target region determination model obtained by training with the target region determination model training method according to any one of claims 1 to 5;
the prediction module 44 is configured to respectively predict, through a plurality of training channels of the target region determination model, a plurality of displacements relative to a reference point; wherein the reference point is located within the target region;
the key point determining module 45 is configured to determine a plurality of key points according to the displacements of the plurality of relative reference points and the reference point coordinates;
the region determining module 46 is configured to determine a target region according to the plurality of key points.
Further, the region determining module 46 is specifically configured to: and taking a closed area formed by the connection lines of the plurality of key points as a target area.
Further, the area identification module is further configured to: acquiring a next frame of video image, and taking the next frame of video image as the currently input video frame; and expanding a preset area to the periphery by taking the image area as a reference, and zooming the expanded image area into a fixed size to be used as an image area which corresponds to the next frame of video image and contains the target area.
Further, the target area is a license plate area.
For detailed descriptions of the working principle, the technical effect of implementation, and the like of the embodiment of the target area determining apparatus, reference may be made to the description of the embodiment of the target area determining method, and details are not repeated here.
EXAMPLE five
Referring now to FIG. 5, shown is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic device are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: carrying out target area identification on the sample image to obtain an image area containing the target area; scaling the image area to a fixed size, forming a training sample set by a plurality of image areas of the fixed size, and marking a plurality of key points on the image areas in the training sample set, wherein the plurality of key points are located in the target area; inputting the set of training samples into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel; each training channel is independently trained according to the training sample set until respective convergence conditions are met, and a target area determination model comprising a plurality of training channels is obtained; the plurality of training channels of the target region determination model are respectively used for predicting the displacement of the plurality of key points relative to a reference point, wherein the reference point is located in the target region.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of an element does not in some cases constitute a limitation on the element itself.
The foregoing description is merely a description of the preferred embodiments of the disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of the features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in this disclosure.

Claims (15)

1. A target area determination model training method is characterized by comprising the following steps:
carrying out target area identification on the sample image to obtain an image area containing the target area;
scaling the image area to a fixed size, forming a training sample set by a plurality of image areas of the fixed size, and marking a plurality of key points on the image areas in the training sample set, wherein the plurality of key points are located in the target area;
inputting the set of training samples into a convolutional neural network; the convolutional layer of the convolutional neural network comprises a plurality of convolutional kernels which are parallel, and each convolutional kernel corresponds to one training channel;
each training channel is independently trained according to the training sample set until respective convergence conditions are met, and a target area determination model comprising a plurality of training channels is obtained; the method comprises the steps that a plurality of training channels of a target region determination model are respectively used for predicting the displacement of a plurality of key points relative to a reference point, wherein the reference point is located in the target region, and the displacement of the plurality of key points relative to the reference point is used for obtaining the target region.
2. The method of claim 1, wherein each training channel is independently trained according to the training sample set until a respective convergence condition is satisfied, and obtaining a target region determination model including a plurality of training channels comprises:
determining parameters of each training channel;
each training channel calculates according to the determined parameters, the marked key points and the reference point to obtain a predicted displacement;
determining a prediction error according to the prediction displacement and the real displacement of each image area;
and readjusting the parameters of the training channel corresponding to the prediction error larger than the preset error, continuously repeating the training process of the corresponding training channel until the prediction error of the corresponding training channel is smaller than or equal to the preset error, and finishing the training process of the corresponding training channel.
3. The method of claim 1 or 2, wherein the target region is a polygonal region and the keypoints are vertices of the polygonal region.
4. The method according to claim 1 or 2, wherein the reference point is a center point of the image area.
5. The method of claim 1 or 2, wherein the target area is a license plate area.
6. A target area determination method, comprising:
carrying out target area identification on a currently input video frame to obtain an image area containing the target area;
scaling the image region to a fixed size;
inputting the image area with fixed size into a target area determination model obtained by training by adopting the target area determination model training method of any one of claims 1 to 5;
respectively predicting and obtaining a plurality of displacements relative to a reference point through a plurality of training channels of the target area determination model; wherein the reference point is located within the target region;
determining a plurality of key points according to the displacements of the relative reference points and the coordinates of the reference points;
and determining a target area according to the plurality of key points.
7. The method of claim 6, wherein determining a target region from the plurality of keypoints comprises:
and taking a closed area formed by the connection lines of the plurality of key points as a target area.
8. The method of claim 6, further comprising:
acquiring a next frame of video image, and taking the next frame of video image as the currently input video frame;
and expanding the image area outward by a preset margin, and scaling the expanded image area to the fixed size to serve as the image area containing the target area for the next frame of video image.
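Claim 8's frame-to-frame step (expand the previous frame's image area outward by a preset margin, then rescale to the fixed size) is a plain crop-and-resize; a sketch follows, with the 20% margin being an assumed value.

```python
# Sketch of claim 8: expand the previous frame's image area by a preset
# margin and rescale it as the next frame's input. The 20% margin and
# 64x64 fixed size are assumptions.
import cv2

def next_frame_crop(frame, box, margin=0.2, fixed_size=(64, 64)):
    """frame: HxWx3 image of the next video frame; box: (x0, y0, x1, y1) of
    the previous image area. Returns the expanded, fixed-size image area."""
    h, w = frame.shape[:2]
    x0, y0, x1, y1 = box
    dx, dy = margin * (x1 - x0), margin * (y1 - y0)
    # Expand outward by the preset margin, clamped to the frame borders.
    x0, y0 = max(0, int(x0 - dx)), max(0, int(y0 - dy))
    x1, y1 = min(w, int(x1 + dx)), min(h, int(y1 + dy))
    crop = frame[y0:y1, x0:x1]
    return cv2.resize(crop, fixed_size)  # scale the expanded area to the fixed size
```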
9. The method of any one of claims 6-8, wherein the target region is a license plate region.
10. A target region determination model training apparatus, comprising:
a sample identification module, configured to perform target area identification on a sample image to obtain an image area containing the target area;
a training set determining module, configured to scale the image area to a fixed size, form a training sample set from a plurality of image areas of the fixed size, and mark a plurality of key points on each image area in the training sample set, wherein the plurality of key points are located within the target area;
a training set input module, configured to input the training sample set into a convolutional neural network, wherein a convolutional layer of the convolutional neural network comprises a plurality of parallel convolution kernels, and each convolution kernel corresponds to one training channel;
and a model training module, configured to train each training channel independently according to the training sample set until its respective convergence condition is met, to obtain a target area determination model comprising a plurality of training channels, wherein the plurality of training channels of the target area determination model are respectively used for predicting displacements of the plurality of key points relative to a reference point, the reference point is located within the target area, and the displacements of the plurality of key points relative to the reference point are used for obtaining the target area.
11. A target area determination apparatus, comprising:
a region identification module, configured to perform target area identification on a currently input video frame to obtain an image area containing the target area;
an image scaling module, configured to scale the image area to a fixed size;
an image input module, configured to input the image area of the fixed size into a target area determination model obtained by training with the target area determination model training method according to any one of claims 1 to 5;
a prediction module, configured to predict, through the plurality of training channels of the target area determination model, a plurality of displacements relative to a reference point, wherein the reference point is located within the target area;
a key point determining module, configured to determine a plurality of key points according to the displacements relative to the reference point and the coordinates of the reference point;
and a region determining module, configured to determine a target area according to the plurality of key points.
12. An electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that, when executing, the processor performs the target area determination model training method according to any one of claims 1-5.
13. A computer-readable storage medium storing non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform the target area determination model training method of any one of claims 1-5.
14. An electronic device, comprising:
a memory for storing non-transitory computer readable instructions; and
a processor for executing the computer readable instructions such that, when executing, the processor performs the target area determination method according to any one of claims 6-9.
15. A computer-readable storage medium storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the target area determination method of any one of claims 6-9.
CN201910485063.3A 2019-06-05 2019-06-05 Target area determination model training method, device and computer readable storage medium Active CN110287955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910485063.3A CN110287955B (en) 2019-06-05 2019-06-05 Target area determination model training method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910485063.3A CN110287955B (en) 2019-06-05 2019-06-05 Target area determination model training method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110287955A CN110287955A (en) 2019-09-27
CN110287955B true CN110287955B (en) 2021-06-22

Family

ID=68003418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910485063.3A Active CN110287955B (en) 2019-06-05 2019-06-05 Target area determination model training method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110287955B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326836B (en) * 2020-02-28 2024-06-14 深圳市丰驰顺行信息技术有限公司 License plate recognition method, license plate recognition device, server and storage medium
CN111383219B (en) * 2020-03-12 2020-12-01 众能联合数字技术有限公司 Method and system for intelligently detecting cleanliness of aerial work platform equipment
CN111476169B (en) * 2020-04-08 2023-11-07 智慧互通科技股份有限公司 Complex scene road side parking behavior identification method based on video frame
CN113516673B (en) * 2020-04-10 2022-12-02 阿里巴巴集团控股有限公司 Image detection method, device, equipment and storage medium
CN113807147B (en) * 2020-06-15 2024-05-21 北京达佳互联信息技术有限公司 Target detection and network training method and device thereof
CN112541902A (en) * 2020-12-15 2021-03-23 平安科技(深圳)有限公司 Similar area searching method, similar area searching device, electronic equipment and medium
CN112507921B (en) * 2020-12-16 2024-03-19 平安银行股份有限公司 Target area-based graphic searching method, system, electronic device and storage medium


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4955616B2 (en) * 2008-06-27 2012-06-20 富士フイルム株式会社 Image processing apparatus, image processing method, and image processing program
CN106295476B (en) * 2015-05-29 2019-05-17 腾讯科技(深圳)有限公司 Face key point localization method and device
US10621725B2 (en) * 2017-04-12 2020-04-14 Here Global B.V. Small object detection from a large image
CN107492121B (en) * 2017-07-03 2020-12-29 广州新节奏智能科技股份有限公司 Two-dimensional human body bone point positioning method of monocular depth video
CN107491726B (en) * 2017-07-04 2020-08-04 重庆邮电大学 Real-time expression recognition method based on multichannel parallel convolutional neural network
CN107766811A (en) * 2017-10-10 2018-03-06 浙江大学 A kind of face identification method and system based on complicated flow structure
CN107862705B (en) * 2017-11-21 2021-03-30 重庆邮电大学 Unmanned aerial vehicle small target detection method based on motion characteristics and deep learning characteristics

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599830A (en) * 2016-12-09 2017-04-26 中国科学院自动化研究所 Method and apparatus for positioning face key points
CN109145928A (en) * 2017-06-16 2019-01-04 杭州海康威视数字技术股份有限公司 It is a kind of based on the headstock of image towards recognition methods and device
CN108510734A (en) * 2018-03-30 2018-09-07 深圳市金溢科技股份有限公司 A kind of information of vehicles matching process of roadside unit and a kind of roadside unit
CN108764048A (en) * 2018-04-28 2018-11-06 中国科学院自动化研究所 Face critical point detection method and device
CN109214343A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating face critical point detection model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Application of License Plate Location in Complex Environments; Gao Wei; China Master's Theses Full-text Database, Information Science and Technology Series; 2016-10-15; pp. I138-430 *

Also Published As

Publication number Publication date
CN110287955A (en) 2019-09-27

Similar Documents

Publication Publication Date Title
CN110287955B (en) Target area determination model training method, device and computer readable storage medium
CN110276345B (en) Convolutional neural network model training method and device and computer readable storage medium
CN110288082B (en) Convolutional neural network model training method and device and computer readable storage medium
CN110321958B (en) Training method of neural network model and video similarity determination method
CN110276346B (en) Target area recognition model training method, device and computer readable storage medium
CN110298413B (en) Image feature extraction method and device, storage medium and electronic equipment
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
CN110211195B (en) Method, device, electronic equipment and computer-readable storage medium for generating image set
CN112329762A (en) Image processing method, model training method, device, computer device and medium
CN114282581A (en) Training sample obtaining method and device based on data enhancement and electronic equipment
CN110555861B (en) Optical flow calculation method and device and electronic equipment
CN112330788A (en) Image processing method, image processing device, readable medium and electronic equipment
CN111862352A (en) Positioning model optimization method, positioning method and positioning equipment
CN110287817B (en) Target recognition and target recognition model training method and device and electronic equipment
CN113592033B (en) Oil tank image recognition model training method, oil tank image recognition method and device
CN111382695A (en) Method and apparatus for detecting boundary points of object
CN111862351B (en) Positioning model optimization method, positioning method and positioning equipment
CN110321858B (en) Video similarity determination method and device, electronic equipment and storage medium
CN110378936B (en) Optical flow calculation method and device and electronic equipment
CN113255812B (en) Video frame detection method and device and electronic equipment
CN114419322B (en) Image instance segmentation method and device, electronic equipment and storage medium
CN114494379A (en) Convex hull assisted three-dimensional point cloud registration method, equipment and medium based on FPFH (field programmable gate flash)
CN114155545A (en) Form identification method and device, readable medium and electronic equipment
CN114004229A (en) Text recognition method and device, readable medium and electronic equipment
CN114495080A (en) Font identification method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Douyin Vision Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: Tiktok vision (Beijing) Co.,Ltd.

Address after: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee after: Tiktok vision (Beijing) Co.,Ltd.
Address before: 100041 B-0035, 2 floor, 3 building, 30 Shixing street, Shijingshan District, Beijing.
Patentee before: BEIJING BYTEDANCE NETWORK TECHNOLOGY Co.,Ltd.