CN114399670A

CN114399670A - Control method for extracting characters in pictures in 5G messages in real time

Info

Publication number: CN114399670A
Application number: CN202210038976.2A
Authority: CN
Inventors: 黄书涵; 陈淼生; 郑仲嵩
Original assignee: China Telecom Fufu Information Technology Co Ltd
Current assignee: China Telecom Fufu Information Technology Co Ltd
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2022-04-26

Abstract

The invention discloses a control method for extracting characters from pictures in 5G messages in real time. It introduces graphics-based preprocessing optimized for the characteristics of 5G message junk pictures, so that a small time cost in graphics operations yields a large efficiency gain in the optical character recognition stage. By controlling the sample pictures, the training direction of the model can be chosen flexibly according to the current picture variants, which improves recognition accuracy. The preprocessing algorithm and the recognition model are both optimized so that speed and accuracy reach reasonable thresholds and meet the requirement of real-time picture authentication. The extracted text is then passed through an ordinary text filtering stage, and a final judgment result is returned to manage and control junk picture messages.

Description

Control method for extracting characters in pictures in 5G messages in real time

Technical Field

The invention relates to the field of 5G technical application, in particular to a control method for extracting characters in pictures in 5G messages in real time.

Background

With the advent of the 5G era, operators have developed 5G messaging based on RCS (Rich Communication Services) in the hope of offering richer message services beyond the traditional short message service (SMS) and multimedia message service (MMS). However, spam has never been eradicated: it has followed users from short messages and multimedia messages to 5G messages. Perfecting the spam management and control platform is a long-term race between operators and spam senders.

Unlike IM software, 5G messaging has a weak client, so information management and control capabilities must be provided on the network equipment side rather than on the terminal. The real-time nature of 5G messaging also requires that management and control add little delay. Among the media types supported by 5G messages, real-time monitoring of text is a mature technology, while streaming media is difficult to filter in real time with current computing power. Before 5G messaging, real-time picture communication was carried mainly by multimedia messages. The multimedia message traffic of every operator is currently very low, and this low load lets equipment resources cope with real-time picture processing; multimedia messages are also rarely used in real-time interactive scenes, so the requirement on processing speed is not strict.

Disclosure of Invention

The invention aims to provide a control method for extracting characters in a picture in a 5G message in real time.

The technical scheme adopted by the invention is as follows:

A control method for extracting characters in pictures in 5G messages in real time comprises the following steps:

Step 1, preprocessing the picture by graphics processing under the opencv framework, which specifically comprises the following steps:

Step 1-1, graying the picture; the grayscale image is a single-channel image containing only brightness information and no color information, and each pixel holds the brightness value of that single channel;

Step 1-2, performing threshold segmentation and binarization on the grayscale picture;

Because different pictures have different gray-level distributions, binarizing all pictures with one fixed threshold may blend the subject into the background or highlight noise, either of which causes interference. The segmentation threshold therefore needs to be calculated for each picture.

Step 1-3, performing noise reduction on the binarized picture;

Analysis of a large number of samples shows that most noise points in junk pictures are isolated small dots, distributed in large numbers over the non-subject areas of the picture. This is typically the result of the producer using algorithms to add various kinds of noise to the picture in order to obstruct information monitoring. A noise reduction algorithm is therefore needed to remove these isolated noise points effectively.

Step 1-4, performing edge detection to obtain the character outlines, and obtaining text blocks after morphological dilation and erosion;

Useless graphic information in the binary image interferes with the efficiency and accuracy of character recognition; meanwhile, the line-splitting capability of cnocr is easily affected by a complicated text layout in the picture and becomes inaccurate. Edge detection is first performed on the binary image to obtain an edge-highlighted image of the characters, followed by morphological erosion and dilation to smooth the block areas;

Step 1-5, obtaining the four-corner coordinates of the largest rectangle occupied by the outer edge of each text block, and extracting each text block from the binary image;

The pixel coordinates of each block outline are identified, and the corresponding sub-image is then cut from the original binary image according to the resulting rectangle coordinates. Since cnocr scans pictures file by file, the sub-images obtained also need to be pieced together into one regular multi-line image so that speed and recognition rate are optimal.

Step 1-6, neatly splicing the text blocks into one picture;

Step 2, performing optical character recognition under the cnocr suite, with the model trained according to the characteristics of 5G message junk pictures; this improves recognition of non-standard characters and of the fonts commonly used in junk pictures, and thereby improves the management and control effect.

Step 3, the management and control service process performs keyword matching on the extracted text and returns a management and control result in real time; meanwhile, the text is sent to a statistics module for natural language recognition to discover suspicious content and generate recommended strategies.

Further, the threshold segmentation in step 1-2 adopts the OTSU method.

Further, an eight-neighborhood algorithm is adopted in step 1-3 for noise reduction; alternatively, the eight-neighborhood algorithm is combined with a connected-domain algorithm for noise reduction.

Further, a Sobel operator is adopted in step 1-4 for edge detection.

Further, based on the analyzed feature data, the preprocessing of step 1 also excludes pictures that obviously have no text characteristics from the processing flow, so as to reduce the processing load.

Further, the training of step 2 comprises the following procedures:

Step 2-1, selecting a large number of sample pictures and binarizing them uniformly;

Step 2-2, generating a training set and a test set and converting them into binary files;

Step 2-3, running training on the training set with a training script;

Step 2-4, verifying the effect on the test set with a test script and importing the new model.

By adopting the above technical scheme, the invention introduces graphics-based preprocessing optimized for the characteristics of 5G message junk pictures, so that a small time cost in graphics operations yields a large efficiency gain in the optical character recognition stage. By controlling the sample pictures, the training direction of the model can be chosen flexibly according to the current picture variants, which improves recognition accuracy.

Compared with the prior art, the main advantages of the invention are: 1. Recognition is fast. With GPU hardware support, a large volume of concurrent real-time processing can be supported through framework optimization; without a GPU, a large volume of concurrent quasi-real-time processing, or real-time processing for a low-traffic system, can still be achieved. 2. The optical character recognition is built on a domestic open-source suite whose Chinese recognition rate is far higher than that of Tesseract, which is commonly used in the industry. 3. The model is trained on a large number of existing sample picture cases, so the recognition rate for non-standard fonts is high.

Drawings

The invention is described in further detail below with reference to the accompanying drawing and the detailed description.

Fig. 1 is a schematic flow chart of the control method for extracting text in a picture in a 5G message in real time according to the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.

In 5G messaging, pictures are sent directly and integrated into the interactive scene in a manner similar to IM. Unlike IM, however, 5G messaging does not maintain a list of communicating parties; the recipient of a 5G message can be addressed simply by number. Therefore, given that management and control cannot be located at the terminal, the platform's requirements on real-time performance and accuracy of picture monitoring are much higher than for multimedia messages.

According to feedback data from multimedia message platforms and IM platforms, junk pictures in real-time communication mainly embed the relevant text in picture form and add interference factors to the picture to evade platform monitoring. For this kind of junk information, the invention provides a character extraction method based on opencv image preprocessing and cnocr optical character recognition. The preprocessing algorithm and the recognition model are both optimized so that speed and accuracy reach reasonable thresholds and meet the requirement of real-time picture authentication. The extracted text is then passed through an ordinary text filtering stage, and a final judgment result is returned to manage and control junk picture messages.

OpenCV is an open-source computer vision library originally released under a BSD license. It provides interfaces and functions for processing images conveniently and is one of the most widely used vision libraries in the industry.

cnocr is a domestic lightweight open-source OCR library based on a convolutional neural network (CNN) and a recurrent neural network (RNN). It supports GPU hardware, and it is superior to Tesseract, the Google open-source library commonly used in the industry, in Chinese recognition and model training.

As shown in fig. 1, the present invention discloses a control method for extracting text in a picture in a 5G message in real time, which comprises the following steps:

Step 1, graphics processing, which uses custom-written algorithms together with the packaged functions of the opencv framework. It mainly comprises the following steps:

Step 1-1, graying. The gray value range is [0, 255], from darkest to brightest. The conversion from an RGB image to a grayscale image can be done by calling cvtColor().
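As an illustration of what the graying step computes, the following minimal sketch applies the weighted sum Y = 0.299 R + 0.587 G + 0.114 B, which is the formula OpenCV documents for RGB to gray conversion. The function name `to_gray` and the list-of-tuples image layout are assumptions made for this example; they are not part of the patent or of OpenCV.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples into rows of 0-255 brightness values."""
    gray = []
    for row in rgb_image:
        # Weighted sum of the three channels, rounded to the nearest integer.
        gray.append([int(round(0.299 * r + 0.587 * g + 0.114 * b))
                     for r, g, b in row])
    return gray


# A 2x2 toy picture: white, black, pure red, pure blue.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 0, 255)]]
gray = to_gray(image)
```

In a real deployment a single call to cvtColor() replaces this loop; the sketch only shows why the result is a single-channel brightness image.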

Step 1-2, binarization. The OTSU method first counts the number of pixels at each gray level in the whole image and computes the probability distribution of the gray levels. It then traverses the gray levels as candidate thresholds; for each candidate it obtains the background and foreground proportions $\omega_1$ and $\omega_2$ and the background and foreground average gray levels $\mu_1$ and $\mu_2$, and calculates the inter-class variance $g$ through the objective function. The gray level that maximizes $g$ is the target threshold. The formula is as follows:

$g = \omega_1 \omega_2 (\mu_1 - \mu_2)^2$

The obtained threshold is used as a parameter when calling threshold().
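The threshold search can be sketched in plain Python as follows. This is a minimal illustration under the assumption that pixels with a gray value less than or equal to the candidate threshold form the background class; the names `otsu_threshold` and `binarize` are hypothetical. OpenCV can also perform the same search internally when the THRESH_OTSU flag is passed to threshold().

```python
def otsu_threshold(gray):
    """Return the gray level that maximizes g = w1 * w2 * (mu1 - mu2) ** 2."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_g = 0, -1.0
    count_bg = 0   # number of pixels with value <= t
    sum_bg = 0     # sum of gray values of those pixels
    for t in range(256):
        count_bg += hist[t]
        sum_bg += t * hist[t]
        count_fg = total - count_bg
        if count_bg == 0 or count_fg == 0:
            continue  # one class is empty, variance undefined
        w1 = count_bg / total
        w2 = count_fg / total
        mu1 = sum_bg / count_bg
        mu2 = (sum_all - sum_bg) / count_fg
        g = w1 * w2 * (mu1 - mu2) ** 2
        if g > best_g:
            best_g, best_t = g, t
    return best_t


def binarize(gray, t):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if v > t else 0 for v in row] for row in gray]


sample = [[10, 12, 200],
          [11, 210, 205]]
t = otsu_threshold(sample)
binary = binarize(sample, t)
```

Because the threshold is recomputed per picture, a dark picture and a bright picture each get a split that separates their own subject from their own background.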

Step 1-3, noise reduction. The principle of eight-neighborhood noise reduction is to traverse all non-white points in the picture and count the non-white points among the 8 points surrounding each one; if the count is less than a certain threshold, the point is judged to be noise and set to white. For an M×N picture the time complexity of the method is only O(MN), which makes it an effective and simple preprocessing method.
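A minimal sketch of the eight-neighborhood rule is given below, assuming a binary picture stored as rows of 0 (black) and 255 (white) values. The function name and the default `min_neighbors=2` are illustrative assumptions; the text does not state the threshold actually used.

```python
def denoise_eight_neighborhood(binary, min_neighbors=2):
    """Set to white every non-white pixel with fewer than min_neighbors
    non-white pixels among its 8 neighbors."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]  # decisions are based on the input only
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 255:
                continue
            neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width \
                            and binary[ny][nx] != 255:
                        neighbors += 1
            if neighbors < min_neighbors:
                out[y][x] = 255  # isolated point: treat as noise
    return out


noisy = [[255, 255, 255, 255, 0],
         [255, 0, 0, 255, 255],
         [255, 0, 0, 255, 255],
         [255, 255, 255, 255, 255]]
clean = denoise_eight_neighborhood(noisy)
```

Each pixel is visited once and inspects a constant number of neighbors, which is where the O(MN) cost comes from.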

Step 1-4, edge detection. Edge detection is performed by obtaining the first-order gradient of the image with the Sobel operator. Two 3×3 matrices are convolved with the original image to obtain the horizontal and vertical gradient values at each point.

After the convolution, the square root of the sum of the squares of the horizontal and vertical gradient values is calculated; if it is larger than the threshold, the point is regarded as an edge point.
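The computation can be sketched as follows, using the standard 3×3 Sobel kernels. The function name `sobel_edges`, the default threshold of 128 and the choice to leave the one-pixel border unmarked are assumptions made for this example only.

```python
import math

# Standard Sobel kernels for the horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_edges(image, threshold=128):
    """Mark interior pixels whose gradient magnitude exceeds the threshold."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    # The kernel is applied without flipping; flipping only
                    # changes the sign of gx and gy, not the magnitude.
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            if math.sqrt(gx * gx + gy * gy) > threshold:
                edges[y][x] = 255
    return edges


# Four identical rows with a vertical black-to-white step.
step = [[0, 0, 255, 255, 255] for _ in range(4)]
edges = sobel_edges(step)
```

In the toy picture only the two columns on either side of the step respond, while the uniform white area on the right produces no edge.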

Step 1-5, morphological processing. The dilation operation can be implemented by directly calling dilate(), and the function for the erosion operation is erode().
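To show what these two operations do to an edge image, here is a minimal sketch with a 3×3 square structuring element, assuming white (255) pixels are the foreground. The helper names mirror the OpenCV function names for readability but are plain Python stand-ins, and the border handling (windows clipped to the picture) is an assumption of this example.

```python
def _morph(binary, pick):
    """Apply pick (max or min) over each pixel's 3x3 window, clipped at borders."""
    height, width = len(binary), len(binary[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [binary[ny][nx]
                      for ny in range(max(0, y - 1), min(height, y + 2))
                      for nx in range(max(0, x - 1), min(width, x + 2))]
            out[y][x] = pick(window)
    return out


def dilate(binary):
    """A pixel becomes white if any pixel in its window is white."""
    return _morph(binary, max)


def erode(binary):
    """A pixel stays white only if every pixel in its window is white."""
    return _morph(binary, min)


# Two edge pixels separated by a one-pixel gap.
strokes = [[0, 0, 0, 0, 0],
           [0, 255, 0, 255, 0],
           [0, 0, 0, 0, 0]]
merged = dilate(strokes)      # the gap between the strokes is filled
smoothed = erode(merged)      # shrinks the block back without reopening the gap
```

Dilation followed by erosion merges nearby character strokes into one solid block, which is what makes the later bounding-rectangle step return whole text regions instead of single strokes.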

Step 1-6, defining the text areas. Blocks whose area is too small or whose aspect ratio is clearly unreasonable are eliminated first. For each remaining block, the boundingRect() function, which returns a standard rectangle of class Rect, gives the smallest rectangle containing the block, from which the pixel coordinates of its four corners are obtained.
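The filtering and bounding-rectangle logic can be sketched without OpenCV as below. Blocks are found as 8-connected groups of non-zero pixels; `min_area` and `max_aspect` are hypothetical limits chosen for the example, since the text does not give the actual values, and each box is reported as (left, top, right, bottom) with inclusive coordinates.

```python
from collections import deque


def find_text_boxes(mask, min_area=4, max_aspect=20.0):
    """Return bounding boxes of blocks that pass the area and aspect filters."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if mask[y][x] == 0 or seen[y][x]:
                continue
            # Breadth-first search over one 8-connected block.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < height and 0 <= nx < width \
                                and mask[ny][nx] != 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            box_w, box_h = x1 - x0 + 1, y1 - y0 + 1
            aspect = max(box_w, box_h) / min(box_w, box_h)
            if box_w * box_h >= min_area and aspect <= max_aspect:
                boxes.append((x0, y0, x1, y1))
    return boxes


blocks = [[255, 255, 255, 255, 0, 0, 0, 0],
          [255, 255, 255, 255, 0, 0, 255, 0],
          [0, 0, 0, 0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0, 0]]
boxes = find_text_boxes(blocks)
```

The four corners of a box follow directly from the tuple: (left, top), (right, top), (left, bottom) and (right, bottom).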

Step 1-7, splicing the blocks. The corresponding sub-images are cut out of the original binary image according to the four-corner coordinates using the ROI method, and are spliced into one regular multi-line image.
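A minimal sketch of cutting and splicing follows, assuming boxes are (left, top, right, bottom) tuples with inclusive coordinates and that the page background is white. The names `crop` and `stack_lines`, the left alignment and the one-row gap between lines are illustrative choices, not details given in the text.

```python
def crop(binary, box):
    """Cut the rectangle described by box out of the binary image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in binary[y0:y1 + 1]]


def stack_lines(subimages, background=255, gap=1):
    """Stack sub-images vertically, left-aligned and padded to equal width."""
    width = max(len(sub[0]) for sub in subimages)
    page = []
    for index, sub in enumerate(subimages):
        if index > 0:
            # Blank rows keep neighbouring text lines apart.
            page.extend([[background] * width for _ in range(gap)])
        for row in sub:
            page.append(row + [background] * (width - len(row)))
    return page


source = [[0, 0, 255, 255],
          [255, 255, 255, 255],
          [255, 0, 0, 0]]
pieces = [crop(source, (0, 0, 1, 0)), crop(source, (1, 2, 3, 2))]
page = stack_lines(pieces)
```

Handing the recognizer one tidy page rather than many scattered fragments is what lets it be invoked once per picture.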

Step 2, optical character recognition under the cnocr suite. The recognition process itself is not changed; the main work is training a corresponding model according to the characteristics of 5G message junk pictures. The training comprises the following procedures:

Step 2-1, selecting sample pictures, cutting out single lines of text and saving them with uniformly numbered file names. The images are binarized to serve as the picture data source.

Step 2-2, writing a program that generates the training set and the test set from the picture file names, the corresponding correct text, and the indexes of its characters in the Label file.
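A program of this kind might look like the sketch below. The line format (file name followed by space-separated character indexes), the 1-based indexing and the rule of sending every fifth sample to the test set are all assumptions made for illustration; the format actually expected by the training script should be taken from the cnocr documentation.

```python
def build_label_lines(samples, charset):
    """Turn (file name, text) pairs into 'file idx idx ...' lines."""
    # Hypothetical convention: a character's index is its 1-based position
    # in the label list.
    index = {ch: i for i, ch in enumerate(charset, start=1)}
    lines = []
    for filename, text in samples:
        ids = [str(index[ch]) for ch in text]
        lines.append(" ".join([filename] + ids))
    return lines


def split_train_test(lines, test_every=5):
    """Send every test_every-th line to the test set, the rest to training."""
    train = [line for i, line in enumerate(lines, start=1) if i % test_every]
    test = [line for i, line in enumerate(lines, start=1) if i % test_every == 0]
    return train, test


charset = ["a", "b", "c"]
samples = [("%06d.jpg" % n, text)
           for n, text in enumerate(["ab", "cab", "b", "ca", "abc"], start=1)]
lines = build_label_lines(samples, charset)
train_lines, test_lines = split_train_test(lines)
```

A character missing from the label list raises a KeyError here, which is a convenient early warning that the sample text and the label file are out of sync.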

Step 2-3, converting the data into a binary format using RecordIO from the artificial intelligence suite mxnet. To improve IO efficiency, mxnet does not read picture files directly; instead, the picture list and labels are converted into a binary file in RecordIO format, so that data can be read sequentially during training, which greatly improves the IO rate. The conversion is done by calling the script im2rec.

Step 2-4, training the model with the training script cnocr_train. To increase speed, the configuration may use GPU training.

Step 2-5, cnocr provides the evaluation tool cnocr_evaluation.py, which can test the recognition effect of the new model on the test set.

Step 2-6, importing the new model file, that is, specifying the new model when ocr() is used for processing in the program.

Step 3, the management and control service process parses the picture out of the protocol and feeds it into the above process; it then performs keyword matching on the extracted text and returns a management and control result in real time. Meanwhile, the text is sent to a statistics module for natural language recognition to discover suspicious content and generate recommended strategies.
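The keyword matching part of this step can be sketched as follows. The function name `control_decision`, the result fields and the normalization (removing whitespace and lowercasing before matching) are assumptions for illustration; the text does not describe the matching rules of the actual service.

```python
def control_decision(text, keywords):
    """Return which keywords occur in the extracted text and whether to block."""
    def normalize(value):
        # OCR output often contains stray spaces and line breaks, so both the
        # text and the keywords are compared with whitespace removed.
        return "".join(value.split()).lower()

    normalized = normalize(text)
    hits = [word for word in keywords if normalize(word) in normalized]
    return {"blocked": bool(hits), "hits": hits}


keywords = ["free prize", "loan"]
spam_result = control_decision("Win a FREE\nprize today", keywords)
normal_result = control_decision("See you at the station", keywords)
```

A plain substring scan like this stays fast for a small keyword list; a large list would more likely call for a multi-pattern matcher.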

The specific effects of the invention are described below:

The test set of jpg pictures is divided into four groups of 20 pictures each, according to whether the pictures contain background noise and interfering color blocks and whether they contain 50 or 150 characters. An ordinary CPU server is used as the test hardware. The test set is preprocessed and then recognized by OCR with the self-trained model, and the time spent in each stage is recorded. The results, with times in milliseconds, are shown in the table below.

Item	50 characters, no interference	50 characters, with interference	150 characters, no interference	150 characters, with interference
Preprocessing time (ms)	17	101	58	133
OCR time after preprocessing (ms)	37	151	111	205
Accuracy (preprocessing + OCR)	98.9%	94.5%	97.3%	94.5%
Direct OCR time (ms)	605	866	973	1380
Accuracy (direct OCR)	44.5%	<10%	26%	<10%

It can be seen that character extraction for a single ordinary picture is more than 6 times faster than direct OCR on the original picture, and accuracy rises from essentially unusable to about 95%. If a cluster architecture with GPUs can be used for deployment, the requirements of real-time and quasi-real-time authentication of 5G messages can basically be met. The comprehensive processing scheme proposed by the invention is therefore effective.
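The text does not state how the speed comparison was derived from the table. As one possible reading, the sketch below divides the direct OCR time by the combined preprocessing and OCR time for each group; the variable names are illustrative, and the figures are copied from the table above.

```python
groups = ["50 chars, no interference", "50 chars, with interference",
          "150 chars, no interference", "150 chars, with interference"]
preprocess_ms = [17, 101, 58, 133]
ocr_ms = [37, 151, 111, 205]
direct_ocr_ms = [605, 866, 973, 1380]

# Total pipeline time per group: preprocessing plus OCR on the spliced image.
pipeline_ms = [p + o for p, o in zip(preprocess_ms, ocr_ms)]

# Speed-up of the pipeline over direct OCR, rounded to one decimal place.
speedups = [round(d / t, 1) for d, t in zip(direct_ocr_ms, pipeline_ms)]

for name, total, ratio in zip(groups, pipeline_ms, speedups):
    print(f"{name}: {total} ms in total, {ratio}x faster than direct OCR")
```

Under this reading the ratio depends strongly on the group, and it is largest for short text without interference.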

Possible future application scenarios of the invention: 1. Real-time or quasi-real-time management and control of text in pictures in multimedia communication, as represented by the 5G message service. 2. Recognition of image watermarks and text marks in big-data service systems that handle various kinds of images.

The invention provides picture management and control capability for 5G real-time messaging and improves the overall security of the service. The image-text recognition capability does not depend on real-time authentication interfaces provided by specialist companies, which saves the cost of using such interfaces; the network where the system is located also does not need to be connected to the networks of those companies, which improves the overall security of the system.

It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Claims

1. A control method for extracting characters in pictures in 5G messages in real time, characterized in that it comprises the following steps:

Step 1-3, performing noise reduction on the binarized picture to effectively remove isolated noise points;

Step 1-4, performing edge detection to obtain an edge-highlighted image of the characters, and obtaining text blocks after morphological dilation and erosion;

Step 1-6, neatly splicing the text blocks into one picture;

Step 2, performing optical character recognition under the cnocr suite, with the model trained according to the characteristics of 5G message junk pictures, so as to improve recognition of non-standard characters and of the fonts commonly used in junk pictures;

2. The method according to claim 1, characterized in that: the threshold segmentation in step 1-2 adopts the OTSU method.

3. The method according to claim 1, characterized in that: an eight-neighborhood algorithm is adopted in step 1-3 for noise reduction; alternatively, the eight-neighborhood algorithm is combined with a connected-domain algorithm for noise reduction.

4. The method according to claim 1, characterized in that: a Sobel operator is adopted in step 1-4 for edge detection.

5. The method according to claim 1, characterized in that: based on the analyzed feature data, the preprocessing of step 1 also excludes pictures that obviously have no text characteristics from the processing flow, so as to reduce the processing load.

6. The method according to claim 1, characterized in that: the training of step 2 comprises the following procedures:

Step 2-2, generating a training set and a test set and converting them into binary files;

Step 2-3, running training on the training set with a training script;