CN111008634A - Character recognition method and character recognition device based on instance segmentation - Google Patents


Info

Publication number
CN111008634A
Authority
CN
China
Prior art keywords
image
instance segmentation
segmentation
instance
recognized
Prior art date
Legal status
Granted
Application number
CN201911159564.9A
Other languages
Chinese (zh)
Other versions
CN111008634B (en)
Inventor
许永喜
孙巍巍
师小凯
邓一星
Current Assignee
Beijing Elite Road Technology Co ltd
Original Assignee
Beijing Elite Road Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Elite Road Technology Co ltd filed Critical Beijing Elite Road Technology Co ltd
Priority to CN201911159564.9A
Publication of CN111008634A
Application granted
Publication of CN111008634B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/243Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/625License plates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition


Abstract

The present application provides a character recognition method and a character recognition apparatus based on instance segmentation, which are used to improve the accuracy of character recognition. The character recognition method based on instance segmentation comprises the following steps: acquiring an image to be recognized, wherein the image to be recognized includes at least one character to be recognized; performing instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized; performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized; and performing character recognition on the second instance segmentation image to obtain the at least one character to be recognized.

Description

Character recognition method and character recognition device based on instance segmentation
Technical Field
The present disclosure relates to the fields of image pattern recognition, video object detection and tracking, intelligent video surveillance and intelligent transportation, and in particular to a character recognition method and a character recognition apparatus based on instance segmentation.
Background
With the application of machine learning technology in the field of image processing, the effectiveness of image processing techniques has greatly improved. An important task in the field of image processing is character recognition, and an important application scenario of character recognition is license plate recognition.
Currently, a license plate recognition system mainly includes three processing modules: license plate detection, character segmentation and character recognition. Traditional character segmentation separates the pixel regions belonging to different objects, for example separating the foreground from the background, or separating the region of each character from the background. This segmentation is coarse and of low precision, which inevitably affects the accuracy of subsequent character recognition.
Therefore, the accuracy of character recognition in the prior art is low.
Disclosure of Invention
The embodiments of the present application provide a character recognition method and a character recognition apparatus based on instance segmentation, which are used to improve the accuracy of character recognition.
In a first aspect, the present application provides a character recognition method based on instance segmentation, including: acquiring an image to be recognized, wherein the image to be recognized includes at least one character to be recognized; performing instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized; performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized; and performing character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
In the embodiments of the present application, instance segmentation is performed on the image to be recognized, achieving pixel-level segmentation, which helps ensure the accuracy of character recognition.
Further, in the embodiments of the present application, instance segmentation is first performed on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized, and the subsequent instance segmentation and character recognition are then performed on the basis of the first instance segmentation image. On the one hand, segmenting out the carrier first eliminates the interference of other background objects in the image to be recognized, which improves the accuracy of the subsequent character segmentation.
On the other hand, because the interference of other background objects in the image to be recognized is eliminated, the detection range of the at least one character to be recognized is reduced, so the segmentation of the at least one character to be recognized becomes more efficient; that is, the efficiency of character recognition is improved.
In one possible design, performing instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized includes: performing instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two inclination angles. Correspondingly, performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized includes: performing instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two inclination angles.
In the embodiments of the present application, because the training sample sets of the first and second instance segmentation networks cover at least two illumination intensities and/or at least two inclination angles, the first instance segmentation model obtained by training the first instance segmentation network and the second instance segmentation model obtained by training the second instance segmentation network can eliminate character recognition errors caused by the illumination intensity and/or the inclination angle of the image to be recognized, which further improves the accuracy of character recognition.
In one possible design, the structure of the first instance segmentation network and the structure of the second instance segmentation network are the same; or the structures of the two networks are different, in which case the size of the convolution kernels of the first instance segmentation network is larger than the size of the convolution kernels of the second instance segmentation network.
In the embodiments of the present application, when the structures of the first and second instance segmentation networks differ, the convolution kernels of the first network are larger than those of the second. Because the image area of the carrier of the at least one character to be recognized is larger than the image area of each individual character, this arrangement allows higher-resolution feature maps of the carrier to be extracted, which further improves the accuracy of instance segmentation and thus of character recognition.
In one possible design, the number of convolution layers of the first instance segmentation network and of the second instance segmentation network is less than a preset number of convolution layers.
In the embodiments of the present application, keeping the number of convolution layers of both networks below the preset number reduces the amount of computation and improves the efficiency of character recognition. The preset number of convolution layers may be 23, 20 or 18, and can be set according to actual needs.
In one possible design, performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized includes: performing edge fitting on the first instance segmentation image to obtain a fitted first instance segmentation image; and performing instance segmentation on the fitted first instance segmentation image to obtain the second instance segmentation image.
Because the edge of the obtained first instance segmentation image is not regular enough, in a specific implementation the edge points obtained by instance segmentation can be used to fit the edge of the first instance segmentation image, making the edge more regular and facilitating subsequent feature selection.
In a second aspect, an embodiment of the present application further provides a character recognition apparatus, including:
an acquisition module, configured to acquire an image to be recognized, wherein the image to be recognized includes at least one character to be recognized;
a first segmentation module, configured to perform instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized;
a second segmentation module, configured to perform instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized;
and a recognition module, configured to perform character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
In a possible design, when performing instance segmentation on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized, the first segmentation module is specifically configured to:
perform instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two inclination angles.
Correspondingly, when performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized, the second segmentation module is specifically configured to:
perform instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two inclination angles.
In one possible design, the structure of the first instance segmentation network and the structure of the second instance segmentation network are the same; or
the structure of the first instance segmentation network and the structure of the second instance segmentation network are different;
wherein, when the structures of the two networks are different, the size of the convolution kernels of the first instance segmentation network is larger than the size of the convolution kernels of the second instance segmentation network.
In one possible design, the number of convolution layers of the first instance segmentation network and of the second instance segmentation network is less than a preset number of convolution layers.
In a possible design, when performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized, the second segmentation module is specifically configured to:
perform edge fitting on the first instance segmentation image to obtain a fitted first instance segmentation image;
and perform instance segmentation on the fitted first instance segmentation image to obtain the second instance segmentation image.
In a third aspect, the present application further provides a character recognition apparatus, including:
a memory storing instructions;
and a processor configured to read the instructions stored in the memory and execute the method according to the first aspect or any design of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method of the above aspects.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
In the embodiments of the present application, instance segmentation is performed on the image to be recognized, achieving pixel-level segmentation, which helps ensure the accuracy of character recognition.
Further, in the embodiments of the present application, instance segmentation is first performed on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized, and the subsequent instance segmentation and character recognition are then performed on the basis of the first instance segmentation image.
Furthermore, in the embodiments of the present application, because the interference of other background objects in the image to be recognized is eliminated, the detection range of the at least one character to be recognized is reduced, so the segmentation of the at least one character to be recognized becomes more efficient; that is, the efficiency of character recognition is improved.
Drawings
FIG. 1 is a schematic diagram of a U-net segmentation network in the prior art;
FIG. 2 is a schematic view of a vehicle image captured in the present application;
FIG. 3 is a schematic diagram of an application scenario provided by the present application;
FIG. 4 is a schematic flow chart of the character recognition method based on instance segmentation according to the present application;
FIGS. 5A-5B are schematic diagrams illustrating the labeling of sample data in the present application;
FIG. 6 is a schematic illustration of the first correction to the first instance segmentation image according to the present application;
FIG. 7 is a schematic illustration of the second correction to the first instance segmentation image according to the present application;
FIG. 8 is a schematic view of a specific flow of the instance segmentation based character recognition method applied to license plate recognition according to the present application;
FIG. 9 is a schematic view of the basic flow of the instance segmentation based character recognition method applied to license plate recognition according to the present application;
FIG. 10 is a schematic structural diagram of a character recognition apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of another character recognition apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of the character recognition apparatus provided in an embodiment of the present application when the apparatus is a server;
FIG. 13 is a schematic structural diagram of the character recognition apparatus provided in an embodiment of the present application when the apparatus is a terminal device.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Hereinafter, some terms in the embodiments of the present application are explained to facilitate understanding by those skilled in the art.
(1) The U-net segmentation network. Referring to FIG. 1, the U-net segmentation network includes convolution layers, max-pooling layers (down-sampling), deconvolution layers (up-sampling) and activation functions (ReLU), which are described in detail below.
Max pooling, i.e. the down-sampling process:
Assume the size of the input image is 572 × 572. Two convolution operations with 3 × 3 × 64 kernels (64 convolution kernels, which yield 64 feature maps) produce an image of size 568 × 568 × 64. A 2 × 2 max-pooling operation is then performed, changing the size to 284 × 284 × 64. Each 3 × 3 convolution is followed by a ReLU activation, and pooling down-samples the original image. Each down-sampling doubles the number of channels, i.e. the number of convolution kernels is doubled.
When the bottom layer is reached, i.e. after the 4th max pooling, the image size becomes 32 × 32 × 512; then, after two 3 × 3 × 1024 convolution operations, an image of size 28 × 28 × 1024 is obtained.
Deconvolution layer, i.e. the up-sampling process:
First, a 2 × 2 deconvolution operation is performed to obtain an image of size 56 × 56 × 512; the feature map from before the corresponding max-pooling layer is then copied and cropped and concatenated with the deconvolved image to obtain an image of size 56 × 56 × 1024, after which a 3 × 3 × 512 convolution operation is performed.
The above process is repeated 4 times, i.e. 4 rounds of 2 × 2 deconvolution plus 3 × 3 convolutions are performed, and the number of convolution kernels is halved by the first 3 × 3 convolution after each concatenation. When the top layer is reached, i.e. after the 4th deconvolution, the image size becomes 392 × 392 × 64; copying, cropping and concatenation yield an image of size 392 × 392 × 128; two 3 × 3 × 64 convolution operations then produce an image of size 388 × 388 × 64; finally, a 1 × 1 × 2 convolution operation converts the 64-channel feature vectors into the required number of classification results.
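As an illustration of the down/up-sampling structure just described, the following is a minimal PyTorch sketch; the class and helper names are assumptions for demonstration, not code from the patent, but the layer counts and channel widths follow the description above:

```python
import torch
import torch.nn as nn


def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by ReLU (no padding, as in the
    # original U-net, so each conv shrinks the feature map by 2 pixels).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3), nn.ReLU(inplace=True),
    )


def center_crop(feat, target):
    # Crop an encoder feature map to the spatial size of the decoder map
    # before concatenation (the "copy and crop" step described above).
    dh = (feat.size(2) - target.size(2)) // 2
    dw = (feat.size(3) - target.size(3)) // 2
    return feat[:, :, dh:dh + target.size(2), dw:dw + target.size(3)]


class MiniUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        chs = [64, 128, 256, 512, 1024]  # channels double at each down-sampling
        self.downs = nn.ModuleList(
            [double_conv(3, chs[0])] +
            [double_conv(chs[i], chs[i + 1]) for i in range(4)])
        self.pool = nn.MaxPool2d(2)
        self.ups = nn.ModuleList(
            [nn.ConvTranspose2d(chs[i + 1], chs[i], 2, stride=2)
             for i in reversed(range(4))])
        self.up_convs = nn.ModuleList(
            [double_conv(chs[i + 1], chs[i]) for i in reversed(range(4))])
        self.head = nn.Conv2d(chs[0], n_classes, 1)  # final 1x1 convolution

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.downs):
            x = block(x)
            if i < 4:
                skips.append(x)
                x = self.pool(x)  # 2x2 max pooling halves the resolution
        for up, conv, skip in zip(self.ups, self.up_convs, reversed(skips)):
            x = up(x)  # 2x2 deconvolution doubles the resolution
            x = torch.cat([center_crop(skip, x), x], dim=1)
            x = conv(x)
        return self.head(x)


# For a 572x572 input the output is 388x388, matching the sizes above.
out = MiniUNet()(torch.randn(1, 3, 572, 572))
print(out.shape)  # torch.Size([1, 2, 388, 388])
```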
(2) The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the former and latter objects, unless otherwise specified. Also, in the description of the embodiments of the present application, the terms "first", "second" and the like are used for descriptive purposes only and are not intended to indicate or imply relative importance or order.
Before describing the instance segmentation based character recognition method provided by the present application, its application scenarios are first described; these include, but are not limited to, the following scenarios, which are briefly described below.
First, the license plate recognition scenario
By way of example, license plate recognition scenarios include, but are not limited to, road traffic monitoring, on-site investigation of traffic accidents, automatic recording of traffic violations, highway electronic toll collection systems, parking lot safety management, intelligent community management, and the like. Taking the automatic recording of traffic violations as an example, a video image is acquired by an image acquisition device and sent to a server; the server extracts a vehicle image (for example, the vehicle image shown in FIG. 2) from the acquired video, and then recognizes the license plate in the vehicle image, so that the information of the offending vehicle is obtained automatically.
Second, the Optical Character Recognition (OCR) scenario
OCR is the process of converting the words in a document to be recognized into a text format through character recognition, for example recognizing zip codes, scanned or faxed book pages, the gender field on an identity card, the vehicle type on a driving license, and the like.
It should be understood that the character recognition method based on instance segmentation provided by the embodiments of the present application can be applied to an implementation environment in which a terminal device interacts with a server. Referring to FIG. 3, this implementation environment includes a server and a terminal device connected to the server through a network. The terminal device sends video data it has recorded to the server, and the server executes the instance segmentation based character recognition method provided by the embodiments of the present application to recognize characters in each video frame of that video data; alternatively, the terminal device sends image data it has acquired to the server, and the server executes the method on that image data.
The terminal device may communicate with the server through a wireless network such as Wireless Fidelity (WiFi), the third generation mobile communication technology (3G), the fourth generation mobile communication technology (4G) or the fifth generation mobile communication technology (5G), or through a wired network. The terminal device may be an independent device with an image acquisition function, such as a camera, or a device integrating an image acquisition module, such as a computer, a smartphone or a tablet computer. The server may be an application server or a World Wide Web (Web) server; in actual deployment it may be an independent server or a cluster composed of multiple servers, or it may be an embedded intelligent analysis device on which the algorithm runs, such as an embedded development board.
Of course, the character recognition method based on instance segmentation provided by the embodiments of the present application can also be applied directly on a terminal device. The terminal device may be an intelligent-analysis dome camera (a dome camera is a spherical camera used for acquiring video data), or a portable device such as a mobile phone, a tablet computer, a notebook computer, or a wearable device with a wireless communication function (e.g., a smart watch or smart glasses). Exemplary mobile devices include, but are not limited to, devices running various mobile operating systems.
In the following description, the character recognition method based on instance segmentation is explained with a terminal device as the executing entity. It should be understood that the executing entity of the method is not limited to the terminal device; the method may also be applied to other devices with an image processing function, such as a server. Referring to FIG. 4, a schematic flow chart of the character recognition method based on instance segmentation according to an embodiment of the present application, the flow of the method is described as follows:
S401: acquiring an image to be recognized, wherein the image to be recognized includes at least one character to be recognized.
In a specific implementation, the character recognition result depends on the clarity of the image to be recognized: when the clarity of the image is low, the accuracy of character recognition is low, and in the case of a recognition error the image needs to be acquired again, which also affects the efficiency of character recognition.
The clarity of the image to be recognized is easily affected by environmental factors at acquisition time, such as jitter of the terminal device or the illumination intensity. Therefore, in the embodiments of the present application, the influence of jitter and/or illumination intensity on character recognition should be avoided. In a specific implementation, when the image acquisition apparatus acquires an image, a stability parameter of the terminal device is obtained first. The stability parameter is used to evaluate the degree of stability of the terminal device, i.e. to indicate whether the terminal device is in a stable state, or in other words to indicate its jitter level. For example, when the jitter amplitude of the terminal device is large, the stability parameter indicates a low degree of stability.
In the embodiments of the present application, the stability parameter may be measured by a sensor in the terminal device, such as a gyroscope, a gravity sensor, an acceleration sensor or a rotation vector sensor. Illustratively, the rotation angle of the image acquisition apparatus within a preset time period is measured by the gyroscope, its displacement in the direction of gravity within the preset time period is measured by the gravity sensor, and its acceleration in one direction within the preset time period is measured by the acceleration sensor.
In addition, when the image acquisition apparatus acquires an image, the illumination intensity of the environment in which the terminal device is currently located is obtained, for example by a light sensor in the terminal device.
It should be understood that in actual operation the illumination intensity may be obtained first, or the stability parameter may be obtained first, or both may be obtained simultaneously; the order described above does not limit their acquisition order. After the stability parameter and the illumination intensity of the terminal device are obtained, whether they satisfy preset conditions is determined, and the image is acquired only when both satisfy the preset conditions. As an example, the preset conditions may be that the acceleration value in the stability parameter is smaller than a preset acceleration, and that the illumination intensity is greater than a first preset intensity and smaller than a second preset intensity. In a specific implementation, when the stability parameter or the illumination intensity does not satisfy the preset conditions, it can be adjusted through manual intervention so that both satisfy the conditions.
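As a minimal sketch of this gating logic (all threshold values and names here are illustrative assumptions, not values from the patent):

```python
def ready_to_capture(acceleration, lux,
                     max_acceleration=0.5,       # m/s^2, hypothetical threshold
                     min_lux=50, max_lux=20000): # hypothetical intensity range
    """Gate image capture on stability and illumination, as described above.

    `acceleration` is the sensor reading used as the stability parameter and
    `lux` is the measured illumination intensity of the environment.
    """
    stable = acceleration < max_acceleration
    well_lit = min_lux < lux < max_lux
    return stable and well_lit


if ready_to_capture(acceleration=0.2, lux=800):
    print("conditions met: acquire the image to be recognized")
```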
It should be noted that when the instance segmentation based character recognition method of the present application is applied to a license plate recognition scenario, most terminal devices are fixed in place, i.e. already in a stable state, so in that scenario it is not necessary to obtain the stability parameter of the terminal device. Once the illumination intensity of the environment in which the terminal device is located satisfies the preset condition, the terminal device can generate real-time video data and obtain video frames from it, i.e. the image to be recognized in step S401.
When the instance segmentation based character recognition method is applied to an OCR scenario, the terminal device is usually handheld, i.e. held by a user, and while held it may shake under the influence of external factors. Therefore, in this scenario, both the stability parameter of the terminal device and the illumination intensity of its environment need to be obtained, and the terminal device acquires the image to be recognized only after both satisfy the preset conditions.
Having described how the terminal device acquires the image to be recognized in step S401, the at least one character to be recognized contained in the image to be recognized is briefly described. In the embodiments of the present application, the at least one character to be recognized may be any one or a combination of Chinese characters, numbers, symbols and English letters. As an example, the license plate "Chuan A12345" of a car includes a Chinese character as well as an English letter and numbers. As another example, a passage in a book such as '"theory and application research of intelligent transportation" has raised an upsurge' includes both Chinese characters and symbols.
After the terminal device obtains the image to be recognized, the image is processed for subsequent character recognition. Step S402 is described first: performing instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized; followed by step S403: performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized.
Here, the carrier of the at least one character to be recognized refers to the direct carrier of the at least one character. As an example, if the image to be recognized is a vehicle image, the carrier of the at least one character to be recognized is the license plate, not the vehicle; as another example, if the image to be recognized shows a beverage bottle, the carrier is the label attached to the bottle, not the bottle itself.
The specific implementation of steps S402 and S403 is described in detail below and includes the following:
performing instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two inclination angles;
and correspondingly, performing instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two inclination angles.
It should be noted that an instance segmentation model is obtained through the iterative learning and training of an instance segmentation network; the segmentation network corresponding to the first instance segmentation model is referred to as the first instance segmentation network, and the segmentation network corresponding to the second instance segmentation model is referred to as the second instance segmentation network.
In the embodiments of the present application, the structure of the first instance segmentation network and the structure of the second instance segmentation network may be the same. For example, both may consist of a contracting path and an expanding path, where every two convolution layers on the contracting path are followed by a 2 × 2 max-pooling layer and every convolution layer is followed by an activation function, so that the original image is progressively down-sampled. Note that on the contracting path the convolution kernels of the topmost layer are larger than those of the lower layers, for example 6 × 6 at the topmost layer and 3 × 3 at the lower layers; on the expanding path, each step has one 2 × 2 deconvolution layer and two 3 × 3 convolution layers, and the last layer of the network is a 1 × 1 convolution layer; in total this network includes 23 convolution layers.
In a specific implementation, the structure of the first instance segmentation network and the structure of the second instance segmentation network may also be different, in which case the size of the convolution kernels of the first instance segmentation network is larger than that of the second. As an example, the network introduced in the preceding paragraph may serve as the first instance segmentation network and the U-net segmentation network as the second. The kernels of the first network are designed larger because the image area of the carrier of the at least one character to be recognized is larger than the image area of the at least one character itself; to extract higher-resolution feature maps, the convolution kernels of the network that segments the image to be recognized can therefore be set larger.
Further, in the embodiments of the present application, to reduce the amount of computation and improve the efficiency of instance segmentation, the number of convolution layers of the first and second instance segmentation networks is kept below a preset number. The preset number of convolution layers may be 17, 18, 23 or another value, set according to actual needs. As an example, with a preset number of 23, the first instance segmentation network may have 18 convolution layers and the second 20; or both may have 20 convolution layers; further examples are omitted here. Accordingly, to further reduce computation, the convolution layers of the first network designed above may be reduced to 18, and those of the U-net segmentation network to 20.
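The kernel-size and layer-count choices above can be summarized in a small configuration sketch; the dataclass and field names are illustrative assumptions, while the 6 × 6/3 × 3 kernels and the 18/20 layer counts are the example values given in the text:

```python
from dataclasses import dataclass

@dataclass
class SegNetConfig:
    name: str
    first_layer_kernel: int  # kernel size of the topmost contracting layer
    inner_kernel: int        # kernel size of the remaining layers
    num_conv_layers: int     # must stay below the preset limit

PRESET_MAX_CONV_LAYERS = 23  # the example preset from the text

carrier_net = SegNetConfig("first (carrier) instance segmentation network",
                           first_layer_kernel=6, inner_kernel=3,
                           num_conv_layers=18)
char_net = SegNetConfig("second (character) instance segmentation network",
                        first_layer_kernel=3, inner_kernel=3,
                        num_conv_layers=20)

for cfg in (carrier_net, char_net):
    assert cfg.num_conv_layers < PRESET_MAX_CONV_LAYERS
```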
It can be understood that the first instance segmentation model segments the carrier of the at least one character to be recognized and may therefore be called a carrier instance segmentation model; as an example, when the character recognition method provided by the present application is applied to license plate recognition, the first instance segmentation model may specifically be called a license plate instance segmentation model. The second instance segmentation model segments the at least one character to be recognized and may therefore specifically be called a character instance segmentation model. Of course, in a specific implementation the first and second instance segmentation models may also be named otherwise according to their actual functions, which is not limited in the embodiments of the present application.
Having introduced the structures of the first and second instance segmentation networks, how the two networks are trained to obtain the first and second instance segmentation models is described below.
At present, a conventional character recognition method normalizes the image to be recognized into an image composed of pixel gray levels, where each pixel is set to 0 or 1, and then performs character recognition. Such an algorithm is easily affected by external conditions, such as light intensity and inclination angle: when the light is too strong or too weak, the input feature value (0 or 1) of a pixel is set incorrectly, causing characters to be misrecognized.
In the embodiments of the present application, to reduce the probability of character misrecognition, the influence of illumination intensity, inclination angle and the like is taken into account in the training samples when training the first and second instance segmentation networks. These are described separately below; in the following description, the application of the method to license plate character recognition is taken as an example.
1) Training a first instance segmentation network
A training sample set is first prepared.
In the embodiment of the present application, the training samples in the training sample set are collected in the following two ways, which are described below.
Mode one
The terminal device collects vehicle images during different periods of each day over a cycle, where the cycle may be half a month, one week, five working days, etc., set according to actual needs. Because vehicle images are collected at different times of day, the resulting image data covers different illumination intensities; and because the photographed vehicles are in motion during collection (moving forward, turning left, turning right, and so on), the data also covers different inclination angles.
Mode two
A vehicle image is acquired with the terminal device, and image processing techniques are then used to adjust the illumination intensity of the image and to stretch, zoom and rotate it, yielding a group of images that includes vehicle photos corresponding to different illumination intensities and to different inclination angles.
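A minimal sketch of such augmentation using OpenCV follows; the gain, bias and angle values are illustrative choices, not parameters from the patent:

```python
import cv2

def augment(image):
    """Generate illumination and tilt variants of one vehicle image."""
    variants = []
    h, w = image.shape[:2]
    # Different illumination intensities: scale and shift pixel values.
    for gain, bias in [(0.6, 0), (1.0, 0), (1.4, 20)]:
        variants.append(cv2.convertScaleAbs(image, alpha=gain, beta=bias))
    # Different inclination angles: rotate about the image center.
    for angle in (-10, -5, 5, 10):
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        variants.append(cv2.warpAffine(image, m, (w, h)))
    # Stretching/zooming: resize with a different scale factor per axis.
    variants.append(cv2.resize(image, (int(w * 1.2), h)))
    return variants

samples = augment(cv2.imread("vehicle.jpg"))
```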
Then, the first instance segmentation network is trained with the obtained training sample set.
After the vehicle images are collected or the processed vehicle images are obtained, 75% of the vehicle image data is used as training data for the first instance segmentation network, i.e. its training sample set, and the remaining 25% is used as test data for the first instance segmentation network.
Before the first instance segmentation network is trained, the image annotation tool Labelme is used to label the vehicle images in the training sample set, i.e. the license plate region is annotated on each vehicle image in the training sample set (see FIGS. 5A and 5B); the labeled vehicle image data is then input into the first instance segmentation network, and the first instance segmentation model is obtained after training.
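A minimal sketch of the 75%/25% split, assuming each sample is an image path paired with its Labelme annotation file; the helper is illustrative, not code from the patent:

```python
import random

def split_samples(samples, train_ratio=0.75, seed=0):
    # Shuffle, then cut the labeled samples into the 75%/25%
    # train/test partition described above.
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical (image, Labelme annotation) pairs:
pairs = [("vehicle_%03d.jpg" % i, "vehicle_%03d.json" % i) for i in range(100)]
train_set, test_set = split_samples(pairs)
print(len(train_set), len(test_set))  # 75 25
```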
2) Training a second instance segmentation network
Training the second instance segmentation network likewise starts by preparing a training sample set; it should be understood that here the training samples are license plate image data. The training samples of the second instance segmentation network are obtained in the same way as those of the first instance segmentation network, which is not repeated here.
After the license plate images are collected or the processed license plate images are obtained, 75% of the license plate image data is used as training data for the second instance segmentation network, i.e. its training sample set, and the remaining 25% is used as test data for the second instance segmentation network.
Before the second instance segmentation network is trained, Labelme is used to label the training samples in the training sample set, i.e. the character regions are annotated on the license plate images; the labeled license plate image data is then input into the second instance segmentation network, and the second instance segmentation model is obtained after training.
After the first and second instance segmentation models are obtained, the image to be recognized is segmented. First, instance segmentation is performed on the image to be recognized by using the first instance segmentation model to obtain the first instance segmentation image of the carrier of the at least one character to be recognized.
After the first instance segmentation image is obtained, step S403 is executed next. In a specific implementation, step S403 includes the following:
performing edge fitting on the first instance segmentation image to obtain a fitted first instance segmentation image;
and performing instance segmentation on the fitted first instance segmentation image to obtain the second instance segmentation image.
In a specific implementation, the edge of the obtained first instance segmentation image may consist of discrete points or irregular line segments. To make the edge more regular and lay a foundation for subsequent feature selection, the edge pixels obtained by instance segmentation can be used to fit the edge of the first instance segmentation image. The fitting method may be least squares fitting, gradient descent fitting, Gauss-Newton or Levenberg-Marquardt fitting, or another fitting algorithm; further examples are omitted here.
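As an illustration, a least-squares fit of one roughly horizontal edge might look as follows; this is a sketch under that assumption, and the names are illustrative:

```python
import numpy as np

def fit_edge(points):
    # `points` is an (N, 2) array of edge pixels (x, y) produced by the
    # instance segmentation; model y as a linear function of x.
    x, y = points[:, 0], points[:, 1]
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line fit
    return slope, intercept

# Noisy edge pixels scattered around y = 0.05 x + 40:
pts = np.array([[x, 0.05 * x + 40 + np.random.randn()] for x in range(200)])
print(fit_edge(pts))
```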
After the first instance segmentation image is fitted, instance segmentation is performed on the fitted first instance segmentation image with the second instance segmentation model. When the terminal device acquires the image to be recognized, the acquired image may be inclined due to external factors, such as the angle and position at which the license plate is mounted, the trajectory of the vehicle, or the inclination of the terminal device itself; when the inclination angle of the acquired image is greater than or equal to 2 degrees, the subsequent segmentation accuracy and character recognition rate are affected. Therefore, in the embodiments of the present application, before instance segmentation is performed on the fitted first instance segmentation image, the inclination of the fitted image needs to be corrected.
In a specific implementation, the correction of the fitted image mainly comprises the following steps.
Step one: angle detection
In the embodiments of the present application, a Radon transform based method is used to find the straight line segments forming the rectangle and their inclination angles. The basic idea of the Radon transform is point-line duality: before the transform the image lies in image space, and after the transform in parameter space. The specific steps are as follows:
The fitted image is binarized, the binary image is edge-detected and then Radon transformed, and the peak in the Radon transform matrix is computed. After the Radon transform, straight line segments in the original image correspond to points in Radon space, and the longer a segment, the greater the brightness of the corresponding point. Therefore, the peak point (ρ, θ) is found in Radon space, and θ is the inclination angle of the corresponding straight line segment in the original image.
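A minimal sketch of this angle-detection step using scikit-image's Radon transform (the function and variable names are assumptions):

```python
import numpy as np
from skimage.transform import radon

def detect_skew(edge_image):
    # Estimate the inclination angle of the dominant straight line segment
    # via the Radon transform, as in step one above. `edge_image` is the
    # edge map of the binarized, fitted plate image (a 2-D numpy array).
    angles = np.arange(0.0, 180.0, 0.5)              # candidate angles theta
    sinogram = radon(edge_image, theta=angles, circle=False)
    _, col = np.unravel_index(np.argmax(sinogram), sinogram.shape)
    return angles[col]  # theta of the peak point (rho, theta)
```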
Step two: correction
The correction is performed in two steps. First, one side of the quadrangle representing the license plate region is rotated to be parallel to the y-axis, as shown in FIG. 6. Without loss of generality, assume the center of rotation is (x0, y0) and the rotation angle is α; any point (x, y) in the original image is converted to a point (xk, yk), which can be described by the following formula:

xk = (x - x0)·cos α - (y - y0)·sin α + x0
yk = (x - x0)·sin α + (y - y0)·cos α + y0

where α takes a negative value for clockwise rotation and a positive value for counterclockwise rotation.
Then, the transformed new image is operated on again. The pixel positions are adjusted according to the slope of the horizontal edge of the license plate: the lowest point of the parallelogram is taken as the reference pixel, a horizontal reference line is drawn through it, each pixel on the bottom edge sinks vertically onto the reference line, and the other pixels of the image move down accordingly, as shown in FIG. 7.
Let the coordinates of the lowest point A be (x1, y1) and the included angle between the oblique side and the reference line be β; the coordinates of any pixel (x, y) in the license plate are corrected to (x', y'), and the relationship can be expressed by the following formula:

x' = x
y' = y - (x - x1)·tan β
Thus, through these two corrections, the tilted image is restored to the normal position.
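A sketch of applying the two corrections with affine warps, assuming the angles have already been detected; all names and the use of OpenCV here are illustrative, not the patent's implementation:

```python
import numpy as np
import cv2

def correct_plate(img, alpha_deg, beta_deg, lowest_point):
    # `alpha_deg` is the detected tilt angle, `beta_deg` the angle between
    # the oblique bottom edge and the reference line, and `lowest_point`
    # the point A = (x1, y1).
    h, w = img.shape[:2]
    # Step 1: rotate about the image center by alpha.
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), alpha_deg, 1.0)
    rotated = cv2.warpAffine(img, rot, (w, h))
    # Step 2: vertical shear, y' = y - (x - x1) * tan(beta).
    x1, _ = lowest_point
    t = np.tan(np.radians(beta_deg))
    shear = np.float32([[1, 0, 0], [-t, 1, t * x1]])
    return cv2.warpAffine(rotated, shear, (w, h))
```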
After tilt correction is applied to the fitted first instance segmentation image, the corrected first instance segmentation image is obtained, and instance segmentation is then performed on the corrected first instance segmentation image with the second instance segmentation model to obtain the second instance segmentation image.
After the second instance segmentation image is obtained, step S404 is executed: performing character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
In the embodiments of the present application, character recognition on the second instance segmentation image may be performed with a Support Vector Machine (SVM), or with a neural network, for example at least one of a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a feed-forward neural network and a feedback neural network. The neural network model may be obtained by training on sample characters, where the sample characters are data labeled with the actual character results.
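As an illustration of the neural-network option, a minimal CNN character classifier might look as follows; the layer sizes, the 32 × 32 input resolution and the class count are assumptions, not values specified by the patent:

```python
import torch.nn as nn

class CharClassifier(nn.Module):
    def __init__(self, n_classes=65):  # e.g. digits + letters + province chars
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):  # x: (batch, 1, 32, 32) cropped character masks
        z = self.features(x)
        return self.head(z.flatten(1))  # per-character class scores
```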
The above steps describe the character recognition method based on instance segmentation provided by the present application in detail. A complete flow for applying the method to license plate recognition is given below; see FIG. 8.
Step 1: acquire the image to be recognized; the image to be recognized is extracted from acquired video data, i.e. data including at least two frames of images that the terminal device acquired with its image acquisition unit.
Step 2: obtain the license plate instance segmentation model; for its structure and training process, refer to the introduction of the first instance segmentation model above, not repeated here.
Step 3: perform instance segmentation on the image to be recognized with the license plate instance segmentation model to obtain the segmented license plate image (the hatched region in the corresponding figure is the segmented license plate image).
Step 4: perform edge fitting on the segmented license plate image to obtain the fitted license plate image; the fitting method is as introduced above, not repeated here.
Step 5: correct the fitted license plate image to obtain the corrected license plate image.
Step 6: obtain the character instance segmentation model; for its structure and training process, refer to the description of the second instance segmentation model above, not repeated here.
Step 7: perform instance segmentation on the corrected license plate image with the character instance segmentation model to obtain the character segmentation image.
Step 8: perform character recognition on the character segmentation image to obtain the at least one character to be recognized; the recognition method is as described above, not repeated here.
Step 9: output the at least one character to be recognized.
In summary, referring to FIG. 9, the basic license plate recognition flow provided by the present application mainly includes: collecting the vehicle image, locating the license plate (i.e. performing the two instance segmentations on the image to be recognized), performing character recognition on the license plate, outputting the at least one character to be recognized, and processing the at least one character further. An end-to-end code sketch of this flow follows.
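Tying the steps together, a sketch of the FIG. 8 flow; every function here is a hypothetical stub standing in for the corresponding step described above:

```python
def load_model(name):           # steps 2 and 6: load a trained model
    return name

def segment(model, image):      # steps 3 and 7: instance segmentation
    return image

def fit_edges(plate):           # step 4: edge fitting
    return plate

def detect_and_correct(plate):  # step 5: Radon angle detection + correction
    return plate

def recognize(char_img):        # step 8: per-character recognition
    return "?"

def recognize_plate(frame):
    plate_model = load_model("plate_instance_segmentation")
    char_model = load_model("char_instance_segmentation")
    plate = segment(plate_model, frame)    # segmented license plate image
    plate = fit_edges(plate)               # fitted license plate image
    plate = detect_and_correct(plate)      # corrected license plate image
    chars = segment(char_model, plate)     # character segmentation image
    return [recognize(c) for c in chars]   # step 9: output characters

print(recognize_plate("frame"))
```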
Based on the same inventive concept, and referring to FIG. 10, the present application further provides a character recognition apparatus 1000, including:
an acquisition module 1001, configured to acquire an image to be recognized, wherein the image to be recognized includes at least one character to be recognized;
a first segmentation module 1002, configured to perform instance segmentation on the image to be recognized to obtain a first instance segmentation image of the carrier of the at least one character to be recognized;
a second segmentation module 1003, configured to perform instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized;
and a recognition module 1004, configured to perform character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
In a possible design, when performing instance segmentation on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized, the first segmentation module 1002 is specifically configured to:
perform instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two tilt angles;
correspondingly, when performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized, the second segmentation module 1003 is specifically configured to:
perform instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two tilt angles.
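One hedged way to build such a training sample set is to synthesize illumination and tilt variants of each labeled image; the specific gain and angle values below are illustrative choices, not values from the patent:

```python
import cv2
import numpy as np

def augment_sample(image_bgr, gains=(0.5, 1.0, 1.6), angles=(-15, 0, 15)):
    # Generate variants at several illumination intensities (pixel
    # gains) and tilt angles (rotations about the image center).
    h, w = image_bgr.shape[:2]
    variants = []
    for g in gains:
        lit = np.clip(image_bgr.astype(np.float32) * g, 0, 255).astype(np.uint8)
        for a in angles:
            M = cv2.getRotationMatrix2D((w / 2, h / 2), a, 1.0)
            variants.append(cv2.warpAffine(lit, M, (w, h)))
    return variants
```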
In one possible design, the structure of the first instance segmentation network is the same as the structure of the second instance segmentation network; or
the structure of the first instance segmentation network is different from the structure of the second instance segmentation network,
wherein, when the two structures differ, the convolution kernel of the first instance segmentation network is larger than the convolution kernel of the second instance segmentation network; a toy sketch of this design choice follows.
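The toy backbone builder below illustrates the kernel-size relationship (and, in the spirit of the lightweight design described next, keeps the number of convolution layers deliberately small); the kernel sizes and channel widths are illustrative assumptions:

```python
import torch.nn as nn

def make_seg_backbone(kernel_size, channels=(3, 16, 32)):
    # A few convolution layers with a configurable kernel size.
    layers = []
    for cin, cout in zip(channels, channels[1:]):
        layers += [nn.Conv2d(cin, cout, kernel_size,
                             padding=kernel_size // 2), nn.ReLU()]
    return nn.Sequential(*layers)

first_net = make_seg_backbone(kernel_size=7)   # coarser, plate-scale features
second_net = make_seg_backbone(kernel_size=3)  # finer, character-scale features
```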
In one possible design, the number of convolution layers of each of the first instance segmentation network and the second instance segmentation network is less than a preset number of convolution layers.
In a possible design, when performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized, the second segmentation module 1003 is specifically configured to:
perform edge fitting on the first instance segmentation image to obtain a fitted first instance segmentation image;
and perform instance segmentation on the fitted first instance segmentation image to obtain the second instance segmentation image.
Referring to fig. 11, an embodiment of the present application further provides a character recognition apparatus 1100, including:
a memory 1101 for storing instructions;
and a processor 1102, configured to read the instructions stored in the memory to implement the character recognition method based on instance segmentation as shown in fig. 4.
There may be one or more memories 1101, and the memory 1101 may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk memory, or the like.
The processor 1102 may include one or more processing cores, for example a 4-core or an 8-core processor. The processor 1102 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, transistor logic, a hardware component, or any combination thereof. The processor 1102 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU, processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 1102 may further integrate a graphics processing unit (GPU), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 1102 may further include an artificial intelligence (AI) processor for handling operations related to machine learning. The processor may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure.
As mentioned before the introduction of the instance segmentation-based character recognition method provided by the present application, the execution subject of the present application may be a server or a terminal device, so the character recognition apparatus 1100 here may be a server or a terminal device; when the character recognition apparatus 1100 is a terminal device, the terminal device may be, for example, an intelligent-analysis all-in-one dome camera or a mobile phone.
Referring to fig. 12, when the character recognition apparatus 1100 is a server, the character recognition apparatus 1100 may further include at least one power source, at least one wired or wireless network interface, at least one input/output interface, and/or at least one operating system.
Referring to fig. 13, when the character recognition apparatus 1100 is a terminal device and the terminal device is a dome camera, the dome camera may further include a wireless communication module, a voice acquisition module, sensors, a power supply, and other components. Those of ordinary skill in the art will appreciate that the dome camera configuration described above does not limit the dome camera, which may include more or fewer components than those shown, combine some components, or arrange the components differently.
The following specifically describes each component, taking the terminal device as a dome camera:
a wireless communication module, for example a Wi-Fi module, a Bluetooth module, a 3G module, a 4G module, a 5G communication module, or another next-generation communication module;
a voice acquisition module, such as a microphone, configured to acquire voice information for automatic sound-source positioning, or for voice-based identity recognition, and the like;
at least one sensor, such as a light sensor, an acceleration sensor, or a gravity sensor. The dome camera also comprises a power supply for supplying power to each component; the power supply may be logically connected to the processor through a power management system, so that charging, discharging, power consumption management, and other functions are managed through the power management system.
The present application further provides a computer storage medium. The storage medium may include a memory, and the memory may store a program, where the program, when executed, performs all the steps executed by the terminal device as described in the method embodiment shown in fig. 4.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A character recognition method based on instance segmentation, characterized by comprising:
acquiring an image to be recognized, wherein the image to be recognized comprises at least one character to be recognized;
performing instance segmentation on the image to be recognized to obtain a first instance segmentation image of a carrier of the at least one character to be recognized;
performing instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized;
and performing character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
2. The method according to claim 1, wherein performing instance segmentation on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized comprises:
performing instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two tilt angles;
and correspondingly, performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized comprises:
performing instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two tilt angles.
3. The method according to claim 2, wherein
the structure of the first instance segmentation network is the same as the structure of the second instance segmentation network; or
the structure of the first instance segmentation network is different from the structure of the second instance segmentation network,
wherein, when the two structures differ, the convolution kernel of the first instance segmentation network is larger than the convolution kernel of the second instance segmentation network.
4. The method according to claim 2, wherein
the number of convolution layers of each of the first instance segmentation network and the second instance segmentation network is less than a preset number of convolution layers.
5. The method according to claim 1, wherein performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized comprises:
performing edge fitting on the first instance segmentation image to obtain a fitted first instance segmentation image;
and performing instance segmentation on the fitted first instance segmentation image to obtain the second instance segmentation image.
6. A character recognition apparatus, comprising:
an obtaining module, configured to obtain an image to be recognized, wherein the image to be recognized comprises at least one character to be recognized;
a first segmentation module, configured to perform instance segmentation on the image to be recognized to obtain a first instance segmentation image of a carrier of the at least one character to be recognized;
a second segmentation module, configured to perform instance segmentation on the first instance segmentation image to obtain a second instance segmentation image of the at least one character to be recognized;
and a recognition module, configured to perform character recognition on the second instance segmentation image to obtain the at least one character to be recognized.
7. The apparatus according to claim 6, wherein, when performing instance segmentation on the image to be recognized to obtain the first instance segmentation image of the carrier of the at least one character to be recognized, the first segmentation module is specifically configured to:
perform instance segmentation on the image to be recognized by using a first instance segmentation model to obtain the first instance segmentation image, wherein the training sample set of the first instance segmentation network corresponding to the first instance segmentation model covers at least two illumination intensities and/or at least two tilt angles;
and correspondingly, when performing instance segmentation on the first instance segmentation image to obtain the second instance segmentation image of the at least one character to be recognized, the second segmentation module is specifically configured to:
perform instance segmentation on the first instance segmentation image by using a second instance segmentation model to obtain the second instance segmentation image, wherein the training sample set of the second instance segmentation network corresponding to the second instance segmentation model covers at least two illumination intensities and/or at least two tilt angles.
8. The apparatus according to claim 7, wherein
the structure of the first instance segmentation network is the same as the structure of the second instance segmentation network; or
the structure of the first instance segmentation network is different from the structure of the second instance segmentation network,
wherein, when the two structures differ, the convolution kernel of the first instance segmentation network is larger than the convolution kernel of the second instance segmentation network.
9. A character recognition apparatus, comprising:
a memory storing instructions;
a processor for reading instructions stored in the memory to perform the method of any one of claims 1-5.
10. A computer-readable storage medium having stored thereon instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-5.
CN201911159564.9A 2019-11-22 2019-11-22 Character recognition method and character recognition device based on instance segmentation Active CN111008634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911159564.9A CN111008634B (en) 2019-11-22 2019-11-22 Character recognition method and character recognition device based on instance segmentation


Publications (2)

Publication Number Publication Date
CN111008634A true CN111008634A (en) 2020-04-14
CN111008634B CN111008634B (en) 2023-08-22

Family

ID=70112912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911159564.9A Active CN111008634B (en) 2019-11-22 2019-11-22 Character recognition method and character recognition device based on instance segmentation

Country Status (1)

Country Link
CN (1) CN111008634B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408933A (en) * 2008-05-21 2009-04-15 浙江师范大学 Method for recognizing license plate character based on wide gridding characteristic extraction and BP neural network
CN107122777A (en) * 2017-04-25 2017-09-01 云南省交通科学研究所 A kind of vehicle analysis system and analysis method based on video file
US20190057507A1 (en) * 2017-08-18 2019-02-21 Samsung Electronics Co., Ltd. System and method for semantic segmentation of images
CN108921163A (en) * 2018-06-08 2018-11-30 南京大学 A kind of packaging coding detection method based on deep learning
CN109753914A (en) * 2018-12-28 2019-05-14 安徽清新互联信息科技有限公司 A kind of license plate character recognition method based on deep learning
CN109815956A (en) * 2018-12-28 2019-05-28 安徽清新互联信息科技有限公司 A kind of license plate character recognition method based on adaptive location segmentation
CN109948510A (en) * 2019-03-14 2019-06-28 北京易道博识科技有限公司 A kind of file and picture example dividing method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
周志颖: "Application of MATLAB in a Vehicle License Plate Recognition System" (MATLAB在汽车牌照识别系统中的应用), 考试周刊 (Exam Weekly), no. 09
杜媛: "License Plate Recognition Algorithm Based on Image Segmentation and a Multi-Feature Model" (基于图像分割与多特征模型的车牌识别算法), 国外电子测量技术 (Foreign Electronic Measurement Technology), no. 08
艾合麦提江・麦提托合提; 艾斯卡尔・艾木都拉; 阿布都萨拉木・达吾提: "A Survey of Deep-Learning-Based Scene Text Detection and Recognition" (基于深度学习的场景文字检测与识别综述), 电视技术 (Video Engineering), no. 14
高云; 郭继亮; 黎煊; 雷明刚; 卢军; 童宇: "A Deep-Learning-Based Instance Segmentation Method for Images of Group-Housed Pigs" (基于深度学习的群猪图像实例分割方法), 农业机械学报 (Transactions of the Chinese Society for Agricultural Machinery), no. 04

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626187A (en) * 2020-05-25 2020-09-04 北京海益同展信息科技有限公司 Identity marking method and device, electronic equipment and storage medium
CN111626187B (en) * 2020-05-25 2023-08-08 京东科技信息技术有限公司 Identity marking method and device, electronic equipment and storage medium
CN113128470A (en) * 2021-05-13 2021-07-16 北京有竹居网络技术有限公司 Stroke recognition method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
CN111008634B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
TWI677826B (en) License plate recognition system and method
CN107545263B (en) Object detection method and device
WO2021136528A1 (en) Instance segmentation method and apparatus
EP4050305A1 (en) Visual positioning method and device
CN111898668A (en) Small target object detection method based on deep learning
US11430199B2 (en) Feature recognition assisted super-resolution method
CN113490947A (en) Detection model training method and device, detection model using method and storage medium
CN111008634B (en) Character recognition method and character recognition device based on instance segmentation
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
JP2011060221A (en) Discriminator generation method, computer program, discriminator generating device and predetermined object detecting device
CN112200189A (en) Vehicle type identification method and device based on SPP-YOLOv3 and computer readable storage medium
CN114387346A (en) Image recognition and prediction model processing method, three-dimensional modeling method and device
CN113592720B (en) Image scaling processing method, device, equipment and storage medium
CN114240816A (en) Road environment sensing method and device, storage medium, electronic equipment and vehicle
CN115588188A (en) Locomotive, vehicle-mounted terminal and driver behavior identification method
Gerhardt et al. Neural network-based traffic sign recognition in 360° images for semi-automatic road maintenance inventory
CN115601730A (en) Method and device for identifying traffic light and electronic equipment
WO2022205018A1 (en) License plate character recognition method and apparatus, and device and storage medium
JP2020038572A (en) Image learning program, image learning method, image recognition program, image recognition method, creation program for learning data set, creation method for learning data set, learning data set, and image recognition device
CN112101139B (en) Human shape detection method, device, equipment and storage medium
KR102458896B1 (en) Method and device for segmentation map based vehicle license plate recognition
CN111968145B (en) Box type structure identification method and device, electronic equipment and storage medium
Wang et al. Aprus: An Airborne Altitude-Adaptive Purpose-Related UAV System for Object Detection
Barra et al. Can Existing 3D Monocular Object Detection Methods Work in Roadside Contexts? A Reproducibility Study
CN114495044A (en) Label identification method, label identification device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant