CN111382740A - Text picture analysis method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111382740A
Authority
CN
China
Prior art keywords
text
picture
result
training
neural network
Prior art date
Legal status
Granted
Application number
CN202010173437.0A
Other languages
Chinese (zh)
Other versions
CN111382740B (en)
Inventor
郑泽重
范有文
潭江龙
Current Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202010173437.0A
Publication of CN111382740A
Application granted
Publication of CN111382740B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/63 Scene text, e.g. street names
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a text picture parsing method and device, computer equipment and a storage medium. The method comprises: acquiring picture data to be analyzed to obtain a picture to be analyzed; inputting the picture to be analyzed into a text direction recognition model for text direction recognition to obtain a recognition result; preprocessing the picture to be analyzed by using the recognition result to obtain an intermediate picture; inputting the intermediate picture into a text region positioning model for text region positioning to obtain a positioning result; processing the intermediate picture again according to the positioning result to obtain a text region picture; performing line segment detection on the text region picture to obtain key information; intercepting the text region picture according to the key information to obtain a key text picture; parsing the key text picture to form a parsing result; and sending the parsing result to a terminal so that the terminal displays the parsing result. The invention achieves efficient, high-accuracy parsing of text pictures.

Description

Text picture analysis method and device, computer equipment and storage medium
Technical Field
The invention relates to a picture analysis method, in particular to a text picture analysis method, a text picture analysis device, computer equipment and a storage medium.
Background
With the development of the internet, the popularization of mobile phones and the everyday demand for extracting information from pictures, a great deal of image recognition software has appeared, for example software that extracts the identity card number from a photo of an identity card, the street number of a store in a picture, or the license plate number of a car in a picture.
However, the error rate of the text information formed after image recognition software analyzes a picture is high; at present, the accurate way to analyze such pictures is manual identification and analysis, in which the key information of the text picture is extracted by hand.
Therefore, it is necessary to design a new method that parses text pictures efficiently and accurately.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a text picture parsing method, a text picture parsing device, computer equipment and a storage medium.
In order to achieve the purpose, the invention adopts the following technical scheme: the text picture analysis method comprises the following steps:
acquiring picture data to be analyzed to obtain a picture to be analyzed;
inputting the picture to be analyzed into a text direction identification model for text direction identification to obtain an identification result;
preprocessing the picture to be analyzed by using the identification result to obtain an intermediate picture;
inputting the intermediate picture into a text region positioning model to perform text region positioning so as to obtain a positioning result;
processing the intermediate picture again according to the positioning result to obtain a text region picture;
performing line segment detection on the text region picture to obtain key information;
intercepting the text area picture according to the key information to obtain a key text picture;
analyzing the key text picture to form an analysis result;
sending the analysis result to a terminal so that the terminal can display the analysis result;
the text direction recognition model is obtained by training a text picture with a text direction label as a sample set;
the text region positioning model is obtained by training a text picture with text region labels and upward characters as a sample set.
The further technical scheme is as follows: the text direction recognition model is obtained by training a text picture with a text direction label as a sample set, and comprises the following steps:
acquiring a text picture with a text direction label to obtain a first sample set, and dividing the first sample set into a first training set and a first testing set;
constructing a first convolution neural network and a first loss function;
inputting the first training set into a first convolutional neural network for convolutional training to obtain a first training result;
calculating a first training result and a loss value of the text direction label by using a first loss function to obtain a first loss value;
judging whether the first loss value is kept unchanged;
if the first loss value does not remain unchanged, adjusting parameters of the first convolutional neural network, and returning to the step of inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result;
if the first loss value is kept unchanged, inputting a first test set into a first convolution neural network for convolution test to obtain a first test result;
judging whether the first test result meets the condition;
if the first test result meets the condition, taking the first convolution neural network as a text direction recognition model;
and if the first test result does not meet the condition, executing the adjustment of the parameter of the first convolutional neural network.
The further technical scheme is as follows: the text region positioning model is obtained by training a text picture with text region labels and upward characters as a sample set, and comprises the following steps:
acquiring a text picture with a text area label and upward characters to obtain a second sample set, and dividing the second sample set into a second training set and a second testing set;
constructing a second convolutional neural network and a second loss function;
inputting the second training set into a second convolutional neural network for convolutional training to obtain a second training result;
calculating the loss values of the second training result and the text region label by using a second loss function to obtain a second loss value;
judging whether the second loss value is kept unchanged;
if the second loss value does not remain unchanged, adjusting parameters of the second convolutional neural network, and returning to the step of inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result;
if the second loss value is kept unchanged, inputting a second test set into a second convolutional neural network for convolutional test to obtain a second test result;
judging whether the second test result meets the condition;
if the second test result meets the condition, taking the second convolutional neural network as a text region positioning model;
and if the second test result does not meet the condition, executing the adjustment of the parameters of the second convolutional neural network.
The further technical scheme is as follows: the performing line segment detection on the text region picture to obtain key information includes:
detecting line segments in the text region picture according to the Hough transform principle to obtain line segment information;
and performing line segment filtering and merging according to the line segment information to obtain key information.
The further technical scheme is as follows: the intercepting the text region picture according to the key information to obtain a key text picture comprises the following steps:
intercepting the corresponding position of the text region picture according to the key information to obtain a candidate key picture;
and splicing the candidate key pictures to obtain a key text picture.
The further technical scheme is as follows: the parsing the key text picture to form a parsing result includes:
recognizing the key text picture by adopting an OCR technology to obtain a text recognition result;
and analyzing the text recognition result according to different characteristics of different data to obtain an analysis result.
The further technical scheme is as follows: the analyzing the text recognition result according to different characteristics of different data to obtain an analysis result includes:
and analyzing the text recognition result through logic judgment and a regular expression, and outputting the result in a specific format to obtain an analysis result.
The invention also provides a text picture analysis device, which comprises:
the picture to be analyzed acquiring unit is used for acquiring picture data to be analyzed so as to obtain a picture to be analyzed;
the text direction identification unit is used for inputting the picture to be analyzed into the text direction identification model to identify the text direction so as to obtain an identification result;
the preprocessing unit is used for preprocessing the picture to be analyzed by utilizing the identification result to obtain an intermediate picture;
the area positioning unit is used for inputting the intermediate picture into the text area positioning model to perform text area positioning so as to obtain a positioning result;
the reprocessing unit is used for reprocessing the intermediate picture according to the positioning result to obtain a text region picture;
the line segment detection unit is used for performing line segment detection on the text region picture to obtain key information;
the intercepting unit is used for intercepting the text region picture according to the key information to obtain a key text picture;
the analysis unit is used for analyzing the key text picture to form an analysis result;
and the sending unit is used for sending the analysis result to the terminal so that the terminal can display the analysis result.
The invention also provides computer equipment which comprises a memory and a processor, wherein the memory is stored with a computer program, and the processor realizes the method when executing the computer program.
The invention also provides a storage medium storing a computer program which, when executed by a processor, is operable to carry out the method as described above.
Compared with the prior art, the invention has the following beneficial effects: two models are used to perform text direction recognition and text region detection on the text picture, yielding a text region picture whose characters face upward and which contains only text; line segment detection and interception are then performed on the text region picture, and the intercepted content is parsed with OCR technology, so that text pictures are parsed automatically, efficiently and accurately.
The invention is further described below with reference to the accompanying drawings and specific embodiments.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a text picture parsing method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a text picture parsing method according to an embodiment of the present invention;
fig. 3 is a schematic sub-flow diagram of a text picture parsing method according to an embodiment of the present invention;
fig. 4 is a schematic sub-flow diagram of a text picture parsing method according to an embodiment of the present invention;
fig. 5 is a schematic sub-flow diagram of a text picture parsing method according to an embodiment of the present invention;
fig. 6 is a schematic block diagram of a text picture parsing apparatus according to an embodiment of the present invention;
fig. 7 is a schematic block diagram of a line segment detecting unit of the text picture parsing apparatus according to the embodiment of the present invention;
fig. 8 is a schematic block diagram of an intercepting unit of a text picture parsing apparatus according to an embodiment of the present invention;
fig. 9 is a schematic block diagram of a parsing unit of a text picture parsing apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic view of an application scenario of a text picture parsing method according to an embodiment of the present invention. Fig. 2 is a schematic flowchart of a text picture parsing method according to an embodiment of the present invention. The text picture analysis method is applied to a server. The server performs data interaction with the terminal, a user inputs a text picture to be analyzed from the terminal, the server performs text direction identification, text region positioning, line segment detection, interception and text content analysis on the text picture to form an analysis result, and the analysis result is sent to the terminal so that the terminal displays the analysis result.
Fig. 2 is a schematic flowchart of a text picture parsing method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S110 to S190.
And S110, acquiring picture data to be analyzed to obtain a picture to be analyzed.
In this embodiment, the picture to be parsed refers to text picture data in which the key text data carries a special mark, such as an underline; it may be input through the terminal.
And S120, inputting the picture to be analyzed into the text direction recognition model for text direction recognition to obtain a recognition result.
In this embodiment, the recognition result refers to the rotation angle and rotation center required to rotate the current picture to be analyzed so that its text characters face upward.
The text direction recognition model is obtained by training a text picture with a text direction label as a sample set, and the text direction recognition is carried out by utilizing a model formed by a convolutional neural network, so that a recognition mode with high efficiency and high accuracy can be achieved.
In an embodiment, the text direction recognition model is obtained by training a text picture with text direction labels as a sample set, and may include the following steps S121 to S129.
S121, obtaining a text picture with a text direction label to obtain a first sample set, and dividing the first sample set into a first training set and a first testing set.
In this embodiment, the first sample set is identification card picture data inclined at different angles from the horizontal line, and the identification card picture data is labeled with a text direction label, where the text direction label is an angle required to rotate the identification card picture to be parallel to the horizontal line.
After the first sample set is divided into a first training set and a first test set, the first training set is first used to train the first convolutional neural network until the network can output text directions that meet the requirements; the first test set is then used to verify the trained first convolutional neural network, ensuring that, when used as the text direction recognition model, the whole network outputs text directions with accuracy that meets the requirements.
And S122, constructing a first convolution neural network and a first loss function.
In this embodiment, the first convolutional neural network is a convolutional-neural-network-based classifier that can be used to detect text direction. Its first layer uses a first convolution kernel of size 7 × 7 × 5, its second layer uses 5 × 5 × 5 maximum pooling, its third layer uses a second convolution kernel of size 5 × 3 × 5, its fourth layer uses SPP (spatial pyramid pooling), and its fifth layer is a fully connected layer.
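As a rough illustration only (the layer sizes above are reproduced as written; the channel counts, input size, number of direction classes and the use of adaptive pooling as a stand-in for SPP are all assumptions), the five-layer classifier might be sketched in PyTorch as follows:

```python
import torch
import torch.nn as nn

class TextDirectionNet(nn.Module):
    """Sketch of the five-layer direction classifier described above.
    Kernel sizes follow the text where possible; channel counts, the
    four direction classes and the AdaptiveMaxPool2d stand-in for SPP
    are assumptions."""
    def __init__(self, num_directions: int = 4):  # e.g. 0/90/180/270 degrees
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 5, kernel_size=7, padding=3),            # first convolution kernel
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=5, stride=5),                # maximum pooling
            nn.Conv2d(5, 5, kernel_size=(5, 3), padding=(2, 1)),  # second convolution kernel
            nn.ReLU(inplace=True),
            nn.AdaptiveMaxPool2d((4, 4)),                         # fixed-size output in place of SPP
        )
        self.classifier = nn.Linear(5 * 4 * 4, num_directions)    # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.flatten(self.features(x), 1))

net = TextDirectionNet()
print(net(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 4])
```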
And S123, inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result.
In this embodiment, the first training result includes an angle of the text direction, that is, an angle required for the text image to rotate to the direction of the text characters facing upwards.
S124, calculating the loss values of the first training result and the text direction label by using the first loss function to obtain a first loss value.
In this embodiment, the first loss value refers to a degree of fitting of the first training result to the text direction label.
Specifically, the degree of fitting between the first training result and the text direction label is calculated using the first loss function; it can also be regarded as the degree of difference.
And S125, judging whether the first loss value is kept unchanged.
In this embodiment, when the first loss value remains unchanged, the current first convolutional neural network has converged, that is, the first loss value is essentially constant and very small, and the current first convolutional neural network can be used as the text direction recognition model. In general, the first loss value is relatively large when training starts and becomes smaller as training proceeds. If the first loss value does not remain unchanged, the current first convolutional neural network cannot yet be used as the text direction recognition model: the estimated text direction would be inaccurate, which would make the later text information recognition inaccurate.
And S126, if the first loss value does not remain unchanged, adjusting the parameters of the first convolutional neural network, and executing again the step of inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result.
In this embodiment, adjusting the parameter of the first convolutional neural network refers to adjusting the weight value of each layer in the first convolutional neural network. By continuous training, a first convolutional neural network meeting the requirements can be obtained.
And S127, if the first loss value is kept unchanged, inputting the first test set into the first convolutional neural network for convolutional test to obtain a first test result.
In this embodiment, the first test result refers to the text directions obtained for the first test set after text direction recognition is performed on it.
S128, judging whether the first test result meets the condition;
s129, if the first test result meets the condition, using the first convolution neural network as a text direction recognition model;
if the first test result does not meet the condition, the step S126 is executed.
When both the precision and the recall of the first test result meet the conditions, the degree of fitting meets the requirements and the first test result can be considered satisfactory; otherwise the first test result is considered unsatisfactory. Training stops when the first convolutional neural network converges. The trained first convolutional neural network is then tested, and if the first test result is poor, the training strategy is adjusted and the network is trained again. Of course, testing is also interleaved with training so that the training situation can be checked in real time; after training and testing, the execution accuracy of the whole first convolutional neural network is evaluated with the two indexes of precision and recall.
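A minimal sketch of this train-until-the-loss-stops-changing procedure (steps S123 to S129) is given below. The plateau tolerance, the optimizer, the cross-entropy loss and the per-class precision/recall evaluation are assumptions; the patent only says that the loss value should remain unchanged and that precision and recall are evaluated:

```python
import torch
import torch.nn as nn

def train_until_converged(net, train_loader, tol=1e-4, max_epochs=100):
    """Steps S123 to S126: train, compute the loss against the direction
    labels, and stop once the loss no longer changes (convergence)."""
    criterion = nn.CrossEntropyLoss()          # assumed first loss function
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
    prev_loss = float("inf")
    for epoch in range(max_epochs):
        total, count = 0.0, 0
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(net(images), labels)   # first loss value
            loss.backward()                         # adjust network parameters
            optimizer.step()
            total += loss.item()
            count += 1
        epoch_loss = total / count
        if abs(prev_loss - epoch_loss) < tol:       # loss "remains unchanged"
            return epoch
        prev_loss = epoch_loss
    return max_epochs

@torch.no_grad()
def test_precision_recall(net, test_loader, target_class=0):
    """Steps S127 and S128: evaluate the trained network with precision
    and recall for one class (per-class evaluation is an assumption)."""
    tp = fp = fn = 0
    for images, labels in test_loader:
        preds = net(images).argmax(dim=1)
        tp += ((preds == target_class) & (labels == target_class)).sum().item()
        fp += ((preds == target_class) & (labels != target_class)).sum().item()
        fn += ((preds != target_class) & (labels == target_class)).sum().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```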
And S130, preprocessing the picture to be analyzed by utilizing the identification result to obtain an intermediate picture.
In this embodiment, the intermediate picture refers to the picture to be analyzed rotated so that its text characters face upward. Specifically, the picture to be analyzed is rotated by the specified angle about the specified rotation center given in the recognition result, yielding a picture whose text characters face upward.
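Concretely, this preprocessing step (S130) can be done with OpenCV; the helper name and the border handling below are illustrative assumptions:

```python
import cv2

def rotate_to_upright(image, center, angle_degrees):
    """Rotate the picture to be analyzed by the angle from the
    recognition result, about the given rotation center, so that
    the text characters face upward (step S130)."""
    h, w = image.shape[:2]
    matrix = cv2.getRotationMatrix2D(center, angle_degrees, 1.0)
    return cv2.warpAffine(image, matrix, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_REPLICATE)

# e.g. intermediate = rotate_to_upright(img, (img.shape[1] / 2, img.shape[0] / 2), 90)
```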
S140, inputting the intermediate picture into a text region positioning model to perform text region positioning so as to obtain a positioning result.
In this embodiment, the positioning result refers to the information of the region where the text content is located, and includes the length and width of the text region, the coordinates of the center point, and the tilt angle.
The text region positioning model is obtained by training a text picture with a text region label and upward characters as a sample set, and the text region is identified by using a model formed by a convolutional neural network, so that an identification mode with high efficiency and high accuracy can be achieved.
In an embodiment, the text region location model is obtained by training a text picture with text region labels and characters facing upwards as a sample set, and may include the following steps S141 to S149.
S141, obtaining a text picture with a text area label and upward characters to obtain a second sample set, and dividing the second sample set into a second training set and a second testing set.
In this embodiment, the second sample set refers to a text picture with text region labels, such as text region length and width, and center coordinates, and with characters facing upward.
After the second sample set is divided into a second training set and a second test set, the second training set is first used to train the second convolutional neural network until the network can output text regions that meet the requirements; the second test set is then used to verify the trained second convolutional neural network, ensuring that, when used as the text region positioning model, the whole network outputs text region information with accuracy that meets the requirements.
And S142, constructing a second convolutional neural network and a second loss function.
In this embodiment, the second convolutional neural network is a bidirectional LSTM (Long Short-Term Memory network).
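The network is introduced as convolutional yet named a bidirectional LSTM; one common way to reconcile the two (an assumption, not stated in the patent) is the CTPN-style arrangement sketched below, where convolutional feature maps are fed as a sequence through a bidirectional LSTM that regresses the region parameters of step S143:

```python
import torch
import torch.nn as nn

class TextRegionNet(nn.Module):
    """Sketch: CNN features -> bidirectional LSTM -> region parameters
    (length, width, center x/y, tilt angle). The CNN backbone and all
    sizes are assumptions; the patent only names a bidirectional LSTM."""
    def __init__(self, feat_channels=64, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(feat_channels, hidden, batch_first=True,
                              bidirectional=True)
        self.head = nn.Linear(2 * hidden, 5)  # (length, width, cx, cy, angle)

    def forward(self, x):
        f = self.backbone(x)                  # (B, C, H, W)
        seq = f.mean(dim=2).permute(0, 2, 1)  # collapse height -> (B, W, C)
        out, _ = self.bilstm(seq)
        return self.head(out[:, -1, :])       # last step summarizes the row sequence
```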
And S143, inputting the second training set into a second convolutional neural network for convolutional training to obtain a second training result.
In this embodiment, the second training result includes the length and width of the text region, the center point coordinates, and the tilt angle.
And S144, calculating the loss values of the second training result and the text region label by using a second loss function to obtain a second loss value.
In this embodiment, the second loss value refers to a degree of fitting of the second training result to the text region label.
Specifically, the degree of fitting between the second training result and the text region label is calculated using the second loss function; it can also be regarded as the degree of difference.
And S145, judging whether the second loss value is kept unchanged.
In this embodiment, when the second loss value remains unchanged, the current second convolutional neural network has converged, that is, the second loss value is essentially constant and very small, and the current second convolutional neural network can be used as the text region positioning model. In general, the second loss value is relatively large when training starts and becomes smaller as training proceeds. If the second loss value does not remain unchanged, the current second convolutional neural network cannot yet be used as the text region positioning model: the estimated text region information would be inaccurate, which would make the later text information recognition inaccurate.
And S146, if the second loss value does not remain unchanged, adjusting the parameters of the second convolutional neural network, and executing again the step of inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result.
In this embodiment, adjusting the parameter of the second convolutional neural network refers to adjusting the weight value of each layer in the second convolutional neural network. By continuous training, a second convolutional neural network satisfying the requirements can be obtained.
And S147, if the second loss value is kept unchanged, inputting the second test set into a second convolutional neural network for convolutional test to obtain a second test result.
In this embodiment, the second test result refers to the text region information obtained for the second test set after text region detection is performed on it.
S148, judging whether the second test result meets the condition;
s149, if the second test result meets the condition, using the second convolutional neural network as a text region positioning model;
if the second test result does not meet the condition, the step S146 is executed.
When both the precision and the recall of the second test result meet the conditions, the degree of fitting meets the requirements and the second test result can be considered satisfactory; otherwise the second test result is considered unsatisfactory. Training stops when the second convolutional neural network converges. The trained second convolutional neural network is then tested, and if the second test result is poor, the training strategy is adjusted and the network is trained again. Of course, testing is also interleaved with training so that the training situation can be checked in real time; after training and testing, the execution accuracy of the whole second convolutional neural network is evaluated with the two indexes of precision and recall.
And S150, processing the intermediate picture again according to the positioning result to obtain a text region picture.
In this embodiment, the text region picture refers to a picture formed by cropping the intermediate picture according to the positioning result and then resizing it to a set, fixed width. By referring to the coordinates of the corresponding pictures in a template file of standard size and layout, the picture is adjusted to a standard state by perspective transformation. Locating the text region with a convolutional-neural-network-based model improves the accuracy and efficiency of the whole text picture parsing.
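Under the assumption that the positioning result is expressed in OpenCV's rotated-rectangle convention ((cx, cy), (w, h), angle), step S150 might be sketched as follows; the fixed width of 600 px is an illustrative value:

```python
import cv2
import numpy as np

def crop_text_region(intermediate, rect, fixed_width=600):
    """Step S150: cut the located text region out of the intermediate
    picture with a perspective transform, then resize to a fixed width.
    `rect` is ((cx, cy), (w, h), angle) as in cv2.minAreaRect."""
    (w, h) = rect[1]
    src = cv2.boxPoints(rect).astype(np.float32)       # 4 corners of the region
    dst = np.array([[0, h - 1], [0, 0], [w - 1, 0], [w - 1, h - 1]],
                   dtype=np.float32)                   # standard (template) state
    matrix = cv2.getPerspectiveTransform(src, dst)
    region = cv2.warpPerspective(intermediate, matrix, (int(w), int(h)))
    scale = fixed_width / region.shape[1]
    return cv2.resize(region, (fixed_width, int(region.shape[0] * scale)))
```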
And S160, carrying out line segment detection on the text region picture to obtain key information.
In this embodiment, the key information refers to position information of specific text content in the text region picture.
In an embodiment, referring to fig. 3, the step S160 may include steps S161 to S162.
And S161, detecting line segments in the text region picture according to the Hough transform principle to obtain line segment information.
In the present embodiment, the line segment information refers to the position information of the line segment in the text area picture.
The Hough transform is a feature detection technique used to identify features of a given kind in an object, for example straight lines. Given an object and the kind of shape to be identified, the algorithm votes in a parameter space, and the shape of the object is determined by the local maxima in the accumulator space.
And S162, filtering and merging the line segments according to the line segment information to obtain key information.
Naturally, the line segment information includes position information for line segments of different lengths, so the line segment information needs to be processed to obtain the real key information.
Using image processing packages such as OpenCV and PIL, the text region picture can be operated on: recognized straight lines are filtered (for example, segments that are too short or too long are discarded) and adjacent straight lines are merged into a single line. This improves the accuracy of line segment detection and, in turn, the accuracy of the whole text picture parsing.
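Steps S161 and S162 can be sketched with OpenCV's probabilistic Hough transform; the edge detector, the vote threshold and the cut-offs for "too short", "too long" and "adjacent" are all assumptions:

```python
import cv2
import numpy as np

def detect_underlines(region, min_len=40, max_len=1200, merge_gap=10):
    """S161: detect line segments via the Hough transform.
    S162: filter out segments that are too short or too long
    (assuming near-horizontal underlines) and merge adjacent
    segments whose y coordinates are within merge_gap pixels."""
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    raw = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                          minLineLength=min_len, maxLineGap=5)
    segments = [] if raw is None else [tuple(l[0]) for l in raw]
    segments = [s for s in segments
                if min_len <= abs(s[2] - s[0]) <= max_len]   # length filter
    segments.sort(key=lambda s: (s[1], s[0]))
    merged = []
    for x1, y1, x2, y2 in segments:
        if merged and abs(y1 - merged[-1][1]) <= merge_gap:  # adjacent lines
            mx1, my1, mx2, my2 = merged[-1]
            merged[-1] = (min(mx1, x1), my1, max(mx2, x2), my2)
        else:
            merged.append((x1, y1, x2, y2))
    return merged  # key information: positions of the underline segments
```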
S170, intercepting the text area picture according to the key information to obtain a key text picture.
In the present embodiment, the key text picture refers to a picture of an area including only text key information.
In an embodiment, referring to fig. 4, the step S170 may include steps S171 to S172.
S171, intercepting the corresponding position of the text region picture according to the key information to obtain a candidate key picture.
In this embodiment, the candidate key pictures refer to all pictures including text key information, including complete or incomplete pictures.
Generally, the text key information is already marked in some way in the text picture; here, assuming for example that the text key information is underlined, the picture block of the text key information above each underline is intercepted.
And S172, splicing the candidate key pictures to obtain a key text picture.
Incomplete pictures can occur during interception, for example when the information of a single picture block is scattered, so the candidate key pictures need to be spliced in the order they were intercepted, according to some rule (for example, whether they are centered), to generate the key text picture; this further improves the accuracy of the whole text picture parsing.
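Steps S171 and S172 might then look like the sketch below; the 30 px band cut above each underline and the horizontal splicing rule are assumptions:

```python
import numpy as np

def cut_and_splice(region, underlines, band_height=30):
    """S171: intercept the picture block sitting on top of each
    underline segment. S172: splice the candidate key pictures,
    in detection order, into one key text picture."""
    candidates = []
    for x1, y1, x2, y2 in underlines:
        top = max(0, y1 - band_height)
        candidates.append(region[top:y1, x1:x2])       # block above the line
    if not candidates:
        return None
    height = max(c.shape[0] for c in candidates)
    padded = [np.pad(c, ((height - c.shape[0], 0), (0, 0), (0, 0)))
              for c in candidates]                     # align heights
    return np.hstack(padded)                           # key text picture
```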
And S180, analyzing the key text picture to form an analysis result.
In this embodiment, the parsing result includes numbers or words, for example words such as names and numbers such as telephone numbers.
In an embodiment, referring to fig. 5, the step S180 may include steps S181 to S182.
And S181, recognizing the key text picture by using an OCR (Optical Character Recognition) technology to obtain a text Recognition result.
In the present embodiment, the text recognition result refers to detailed text information.
And S182, analyzing the text recognition result according to different characteristics of different data to obtain an analysis result.
Specifically, the text recognition result is analyzed through logic judgment and a regular expression, and is output in a specific format to obtain an analysis result.
According to the different characteristics of the different kinds of text key information, for example one field being numeric and another being characters, the text key information is parsed by means of logic judgment, regular expressions and the like. For example, the regular expressions for numbers and for characters can be written as [\d\s]+ and [\u4e00-\u9fa5\d\s]+ respectively, which can pick data with different characteristics out of the text recognition result produced by the OCR technology. The output is produced in some defined format, such as JSON or XML, but is not limited to these.
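Putting S181 and S182 together, with pytesseract as one possible OCR engine (the patent does not name one) and the two regular expressions quoted above:

```python
import json
import re
import pytesseract  # one possible OCR engine; the patent does not name one

def parse_key_text_picture(key_text_picture):
    """S181: OCR the key text picture. S182: pull out numeric and
    Chinese-character fields with regular expressions and emit JSON."""
    text = pytesseract.image_to_string(key_text_picture, lang="chi_sim")
    numbers = re.findall(r"[\d\s]+", text)                 # numeric fields
    words = re.findall(r"[\u4e00-\u9fa5\d\s]+", text)      # character fields
    result = {
        "numbers": [n.strip() for n in numbers if n.strip()],
        "words": [w.strip() for w in words if w.strip()],
    }
    return json.dumps(result, ensure_ascii=False)  # defined output format: JSON
```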
And S190, sending the analysis result to a terminal so that the terminal can display the analysis result.
And sending the analysis result to a terminal for displaying, so that the user can conveniently check the analysis result.
According to the text picture parsing method above, two models are used to perform text direction recognition and text region detection on the text picture, so as to obtain a text region picture whose characters face upward and which contains only text; line segment detection and interception are then performed on the text region picture, and the intercepted content is parsed with OCR technology. This automates the parsing of text pictures and parses them efficiently and accurately.
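Tying the steps together, a server-side routine for the whole method might be sketched as follows; every helper name here is illustrative, and predict_direction and predict_region stand for inference with the two trained models:

```python
def parse_text_picture(image, direction_model, region_model):
    """End-to-end sketch of S110 to S190 using the helpers sketched above."""
    center, angle = predict_direction(direction_model, image)   # S120 (assumed helper)
    intermediate = rotate_to_upright(image, center, angle)      # S130
    rect = predict_region(region_model, intermediate)           # S140 (assumed helper)
    region = crop_text_region(intermediate, rect)               # S150
    underlines = detect_underlines(region)                      # S160
    key_picture = cut_and_splice(region, underlines)            # S170
    return parse_key_text_picture(key_picture)                  # S180; sent to the terminal in S190
```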
Fig. 6 is a schematic block diagram of a text picture parsing apparatus 300 according to an embodiment of the present invention. As shown in fig. 6, the present invention further provides a text picture parsing apparatus 300 corresponding to the above text picture parsing method. The text picture parsing apparatus 300 includes a unit for performing the above-described text picture parsing method, and the apparatus may be configured in a server. Specifically, referring to fig. 6, the text image parsing apparatus 300 includes an image to be parsed acquiring unit 301, a text direction identifying unit 302, a preprocessing unit 303, a region locating unit 304, a re-processing unit 305, a line segment detecting unit 306, a clipping unit 307, a parsing unit 308, and a sending unit 309.
A to-be-analyzed picture obtaining unit 301, configured to obtain picture data to be analyzed, so as to obtain a to-be-analyzed picture; the text direction identification unit 302 is configured to input the picture to be analyzed into a text direction identification model to perform text direction identification, so as to obtain an identification result; a preprocessing unit 303, configured to preprocess the picture to be analyzed by using the identification result to obtain an intermediate picture; a region positioning unit 304, configured to input the intermediate picture into a text region positioning model to perform text region positioning, so as to obtain a positioning result; a reprocessing unit 305, configured to reprocess the intermediate picture according to the positioning result to obtain a text region picture; a line segment detection unit 306, configured to perform line segment detection on the text region picture to obtain key information; an intercepting unit 307, configured to intercept the text region picture according to the key information to obtain a key text picture; the parsing unit 308 is configured to parse the key text picture to form a parsing result; a sending unit 309, configured to send the analysis result to the terminal, so that the terminal displays the analysis result.
In an embodiment, the apparatus further includes: the first construction unit is used for training by taking a text picture with a text direction label as a sample set to obtain a text direction recognition model.
In an embodiment, the apparatus further includes: and the second construction unit is used for training by taking a text picture with a text region label and with upward characters as a sample set to obtain a text region positioning model.
In an embodiment, the first constructing unit includes a first sample obtaining subunit, a first network constructing subunit, a first training subunit, a first calculating subunit, a first loss value judging subunit, a first adjusting subunit, a first testing subunit, and a first testing result judging subunit.
The first sample acquiring subunit is used for acquiring a text picture with a text direction label to obtain a first sample set, and dividing the first sample set into a first training set and a first testing set; the first network constructing subunit is used for constructing a first convolutional neural network and a first loss function; the first training subunit is used for inputting the first training set into a first convolutional neural network for convolutional training to obtain a first training result; the first calculating subunit is used for calculating a first training result and a loss value of the text direction label by using a first loss function to obtain a first loss value; a first loss value judging subunit, configured to judge whether the first loss value remains unchanged; a first adjusting subunit, configured to adjust a parameter of the first convolutional neural network if the first loss value is not maintained, and perform convolutional training by inputting the first training set into the first convolutional neural network, so as to obtain a first training result; the first test subunit is configured to, if the first loss value remains unchanged, input the first test set into the first convolutional neural network for a convolutional test to obtain a first test result; a first test result judging subunit, configured to judge whether the first test result meets a condition; if the first test result meets the condition, taking the first convolution neural network as a text direction recognition model; and if the first test result does not meet the condition, executing the adjustment of the parameter of the first convolutional neural network.
In an embodiment, the second constructing unit includes a second sample obtaining subunit, a second network constructing subunit, a second training subunit, a second calculating subunit, a second loss value judging subunit, a second adjusting subunit, a second testing subunit, and a second testing result judging subunit.
The second sample acquisition subunit is configured to acquire a text picture with a text region label and with upward characters to obtain a second sample set, and divide the second sample set into a second training set and a second testing set; the second network construction subunit is used for constructing a second convolutional neural network and a second loss function; the second training subunit is used for inputting the second training set into a second convolutional neural network for convolutional training to obtain a second training result; the second calculating subunit is configured to calculate a second training result and a loss value of the text region label by using a second loss function to obtain a second loss value; a second loss value judgment subunit, configured to judge whether the second loss value remains unchanged; a second adjusting subunit, configured to adjust a parameter of the second convolutional neural network if the second loss value is not maintained, and perform convolutional training by inputting the second training set to the second convolutional neural network, so as to obtain a second training result; the second test subunit is configured to, if the second loss value remains unchanged, input the second test set into a second convolutional neural network for convolutional test to obtain a second test result; a second test result judging subunit, configured to judge whether the second test result meets a condition; if the second test result meets the condition, taking the second convolutional neural network as a text region positioning model; and if the second test result does not meet the condition, executing the adjustment of the parameters of the second convolutional neural network.
In one embodiment, as shown in FIG. 7, the line segment detecting unit 306 includes a detecting subunit 3061 and a line segment processing subunit 3062.
A detecting subunit 3061, configured to detect line segments within the text region picture according to the Hough transform principle to obtain line segment information; and a line segment processing subunit 3062, configured to perform line segment filtering and merging according to the line segment information to obtain key information.
In one embodiment, as shown in fig. 8, the truncation unit 307 includes a position truncation subunit 3071 and a splicing subunit 3072.
A position capturing subunit 3071, configured to capture, according to the key information, a corresponding position of the text region picture to obtain a candidate key picture; and the splicing subunit 3072 is configured to splice the candidate key pictures to obtain a key text picture.
In one embodiment, as shown in fig. 9, the parsing unit 308 includes an OCR recognition subunit 3081 and a text parsing subunit 3082.
The OCR identifying subunit 3081 is configured to identify the key text picture by using an OCR technology to obtain a text identification result; the text parsing subunit 3082 is configured to parse the text recognition result according to different features of different data to obtain a parsing result.
Specifically, the text parsing subunit 3082 is configured to parse the text recognition result through logic judgment and a regular expression, and output the text recognition result in a specific format to obtain a parsing result.
It should be noted that, as can be clearly understood by those skilled in the art, the specific implementation processes of the text image parsing apparatus 300 and each unit may refer to the corresponding descriptions in the foregoing method embodiments, and for convenience and brevity of description, no further description is provided herein.
The text picture parsing apparatus 300 may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 10.
Referring to fig. 10, fig. 10 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server, wherein the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 10, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032 comprises program instructions that, when executed, cause the processor 502 to perform a text picture parsing method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the non-volatile storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute a text picture parsing method.
The network interface 505 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 10 is a block diagram of only a portion of the configuration relevant to the present teachings and is not intended to limit the computing device 500 to which the present teachings may be applied, and that a particular computing device 500 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 502 is configured to run the computer program 5032 stored in the memory to implement the following steps:
acquiring picture data to be analyzed to obtain a picture to be analyzed; inputting the picture to be analyzed into a text direction identification model for text direction identification to obtain an identification result; preprocessing the picture to be analyzed by using the identification result to obtain an intermediate picture; inputting the intermediate picture into a text region positioning model to perform text region positioning so as to obtain a positioning result; the intermediate picture is processed again according to the positioning result to obtain a text region picture; performing line segment detection on the text region picture to obtain key information; intercepting the text area picture according to the key information to obtain a key text picture; analyzing the key text picture to form an analysis result; and sending the analysis result to a terminal so that the terminal displays the analysis result.
The text direction recognition model is obtained by training a text picture with a text direction label as a sample set; the text region positioning model is obtained by training a text picture with text region labels and upward characters as a sample set.
In an embodiment, when implementing the step that the text direction recognition model is trained by using a text picture with a text direction label as a sample set, the processor 502 specifically implements the following steps:
acquiring a text picture with a text direction label to obtain a first sample set, and dividing the first sample set into a first training set and a first testing set; constructing a first convolution neural network and a first loss function; inputting the first training set into a first convolutional neural network for convolutional training to obtain a first training result; calculating a first training result and a loss value of the text direction label by using a first loss function to obtain a first loss value; judging whether the first loss value is kept unchanged; if the first loss value is not maintained, adjusting parameters of the first convolutional neural network, and executing the first training set to be input into the first convolutional neural network for convolutional training to obtain a first training result; if the first loss value is kept unchanged, inputting a first test set into a first convolution neural network for convolution test to obtain a first test result; judging whether the first test result meets the condition; if the first test result meets the condition, taking the first convolution neural network as a text direction recognition model; and if the first test result does not meet the condition, executing the adjustment of the parameter of the first convolutional neural network.
In an embodiment, when the processor 502 implements the text region location model by using a text picture with text region labels and characters facing upward as a sample set for training, the following steps are implemented:
acquiring a text picture with a text area label and upward characters to obtain a second sample set, and dividing the second sample set into a second training set and a second testing set; constructing a second convolutional neural network and a second loss function; inputting the second training set into a second convolutional neural network for convolutional training to obtain a second training result; calculating the loss values of the second training result and the text region label by using a second loss function to obtain a second loss value; judging whether the second loss value is kept unchanged; if the second loss value is not maintained, adjusting parameters of the second convolutional neural network, and executing the convolutional training by inputting a second training set into the second convolutional neural network to obtain a second training result; if the second loss value is kept unchanged, inputting a second test set into a second convolutional neural network for convolutional test to obtain a second test result; judging whether the second test result meets the condition; if the second test result meets the condition, taking the second convolutional neural network as a text region positioning model; and if the second test result does not meet the condition, executing the adjustment of the parameters of the second convolutional neural network.
In an embodiment, when implementing the step of performing line segment detection on the text region picture to obtain key information, the processor 502 specifically implements the following steps:
detecting line segments in the text region picture according to the Hough transform principle to obtain line segment information; and performing line segment filtering and merging according to the line segment information to obtain key information.
In an embodiment, when implementing the step of intercepting the text region picture according to the key information to obtain a key text picture, the processor 502 specifically implements the following steps:
intercepting the corresponding position of the text region picture according to the key information to obtain a candidate key picture; and splicing the candidate key pictures to obtain a key text picture.
In an embodiment, when the processor 502 implements the step of parsing the key text picture to form a parsing result, the following steps are specifically implemented:
recognizing the key text picture by adopting an OCR technology to obtain a text recognition result; and analyzing the text recognition result according to different characteristics of different data to obtain an analysis result.
In an embodiment, when the processor 502 implements the step of analyzing the text recognition result according to different features of different data to obtain an analysis result, the following steps are specifically implemented:
and analyzing the text recognition result through logic judgment and a regular expression, and outputting the result in a specific format to obtain an analysis result.
It should be understood that, in the embodiment of the present application, the processor 502 may be a central processing unit (CPU); the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program includes program instructions, and the computer program may be stored in a storage medium, which is a computer-readable storage medium. The program instructions are executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of:
acquiring picture data to be analyzed to obtain a picture to be analyzed; inputting the picture to be analyzed into a text direction identification model for text direction identification to obtain an identification result; preprocessing the picture to be analyzed by using the identification result to obtain an intermediate picture; inputting the intermediate picture into a text region positioning model to perform text region positioning so as to obtain a positioning result; the intermediate picture is processed again according to the positioning result to obtain a text region picture; performing line segment detection on the text region picture to obtain key information; intercepting the text area picture according to the key information to obtain a key text picture; analyzing the key text picture to form an analysis result; and sending the analysis result to a terminal so that the terminal displays the analysis result.
The text direction recognition model is obtained through training with text pictures carrying text direction labels as a sample set; the text region positioning model is obtained through training with text pictures carrying text region labels and upward-facing characters as a sample set.
In an embodiment, when executing the computer program to implement the step in which the text direction recognition model is obtained through training with text pictures carrying text direction labels as a sample set, the processor specifically implements the following steps:
acquiring text pictures with text direction labels to obtain a first sample set, and dividing the first sample set into a first training set and a first test set; constructing a first convolutional neural network and a first loss function; inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result; calculating a loss value between the first training result and the text direction labels by using the first loss function to obtain a first loss value; judging whether the first loss value remains unchanged; if the first loss value does not remain unchanged, adjusting parameters of the first convolutional neural network, and returning to the step of inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result; if the first loss value remains unchanged, inputting the first test set into the first convolutional neural network for a convolutional test to obtain a first test result; judging whether the first test result meets a preset condition; if the first test result meets the preset condition, taking the first convolutional neural network as the text direction recognition model; and if the first test result does not meet the preset condition, returning to the step of adjusting the parameters of the first convolutional neural network.
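The disclosure fixes neither the architecture of the first convolutional neural network nor the first loss function; as an illustrative assumption, a small four-class classifier (0°, 90°, 180°, 270°) with a cross-entropy loss could fill those roles:

import torch.nn as nn

class DirectionNet(nn.Module):
    """Tiny CNN classifying a text picture into four rotation classes."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

first_loss_fn = nn.CrossEntropyLoss()  # assumed stand-in for the first loss function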
In an embodiment, when executing the computer program to implement the step in which the text region positioning model is obtained through training with text pictures carrying text region labels and upward-facing characters as a sample set, the processor specifically implements the following steps:
acquiring text pictures with text region labels and upward-facing characters to obtain a second sample set, and dividing the second sample set into a second training set and a second test set; constructing a second convolutional neural network and a second loss function; inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result; calculating a loss value between the second training result and the text region labels by using the second loss function to obtain a second loss value; judging whether the second loss value remains unchanged; if the second loss value does not remain unchanged, adjusting parameters of the second convolutional neural network, and returning to the step of inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result; if the second loss value remains unchanged, inputting the second test set into the second convolutional neural network for a convolutional test to obtain a second test result; judging whether the second test result meets a preset condition; if the second test result meets the preset condition, taking the second convolutional neural network as the text region positioning model; and if the second test result does not meet the preset condition, returning to the step of adjusting the parameters of the second convolutional neural network.
In an embodiment, when the processor executes the computer program to implement the step of performing the line segment detection on the text region picture to obtain the key information, the following steps are specifically implemented:
detecting line segments in the text region picture according to the Hough transform principle to obtain line segment information; and performing line segment filtering and merging according to the line segment information to obtain the key information.
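A hedged sketch of this step with OpenCV's probabilistic Hough transform follows; all thresholds, and the restriction to near-horizontal segments, are assumptions of the sketch rather than requirements of the disclosure:

import cv2
import numpy as np

def detect_and_merge_lines(region_img):
    """Detect line segments via the Hough transform on a grayscale picture,
    then filter and merge them into key information."""
    edges = cv2.Canny(region_img, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                           minLineLength=40, maxLineGap=5)
    lines = [] if segs is None else [s[0] for s in segs]
    horiz = [l for l in lines if abs(int(l[3]) - int(l[1])) <= 2]  # filtering
    horiz.sort(key=lambda l: l[1])
    merged = []
    for x1, y1, x2, y2 in horiz:                  # merging nearby segments
        if merged and abs(int(y1) - merged[-1][1]) <= 5:
            m = merged[-1]
            merged[-1] = [min(m[0], x1), m[1], max(m[2], x2), m[3]]
        else:
            merged.append([x1, y1, x2, y2])
    return merged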
In an embodiment, when the processor executes the computer program to implement the step of intercepting the text region picture according to the key information to obtain a key text picture, the following steps are specifically implemented:
intercepting corresponding positions of the text region picture according to the key information to obtain candidate key pictures; and splicing the candidate key pictures to obtain the key text picture.
In an embodiment, when the processor executes the computer program to implement the step of analyzing the key text picture to form an analysis result, the following steps are specifically implemented:
recognizing the key text picture by using OCR technology to obtain a text recognition result; and analyzing the text recognition result according to the different characteristics of different types of data to obtain the analysis result.
In an embodiment, when the processor executes the computer program to implement the step of analyzing the text recognition result according to the different characteristics of different types of data to obtain the analysis result, the following steps are specifically implemented:
analyzing the text recognition result through logical judgment and regular expressions, and outputting the result in a specific format to obtain the analysis result.
The storage medium may be any of various computer-readable storage media that can store program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The steps of the methods of the embodiments of the invention may be reordered, combined, and deleted according to actual needs, and the units of the devices of the embodiments may likewise be merged, divided, and deleted according to actual needs. In addition, the functional units of the embodiments may be integrated into one processing unit, may each exist physically alone, or two or more of them may be integrated into one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the methods of the embodiments of the present invention.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A text picture analysis method, characterized by comprising the following steps:
acquiring picture data to be analyzed to obtain a picture to be analyzed;
inputting the picture to be analyzed into a text direction recognition model for text direction recognition to obtain a recognition result;
preprocessing the picture to be analyzed by using the recognition result to obtain an intermediate picture;
inputting the intermediate picture into a text region positioning model to perform text region positioning so as to obtain a positioning result;
reprocessing the intermediate picture according to the positioning result to obtain a text region picture;
performing line segment detection on the text region picture to obtain key information;
intercepting the text area picture according to the key information to obtain a key text picture;
analyzing the key text picture to form an analysis result;
sending the analysis result to a terminal so that the terminal can display the analysis result;
wherein the text direction recognition model is obtained through training with text pictures carrying text direction labels as a sample set;
and the text region positioning model is obtained through training with text pictures carrying text region labels and upward-facing characters as a sample set.
2. The method according to claim 1, wherein obtaining the text direction recognition model through training with text pictures carrying text direction labels as a sample set comprises:
acquiring text pictures with text direction labels to obtain a first sample set, and dividing the first sample set into a first training set and a first test set;
constructing a first convolutional neural network and a first loss function;
inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result;
calculating a loss value between the first training result and the text direction labels by using the first loss function to obtain a first loss value;
judging whether the first loss value remains unchanged;
if the first loss value does not remain unchanged, adjusting parameters of the first convolutional neural network, and returning to the step of inputting the first training set into the first convolutional neural network for convolutional training to obtain a first training result;
if the first loss value remains unchanged, inputting the first test set into the first convolutional neural network for a convolutional test to obtain a first test result;
judging whether the first test result meets a preset condition;
if the first test result meets the preset condition, taking the first convolutional neural network as the text direction recognition model;
and if the first test result does not meet the preset condition, returning to the step of adjusting the parameters of the first convolutional neural network.
3. The method according to claim 1, wherein obtaining the text region positioning model through training with text pictures carrying text region labels and upward-facing characters as a sample set comprises:
acquiring text pictures with text region labels and upward-facing characters to obtain a second sample set, and dividing the second sample set into a second training set and a second test set;
constructing a second convolutional neural network and a second loss function;
inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result;
calculating a loss value between the second training result and the text region labels by using the second loss function to obtain a second loss value;
judging whether the second loss value remains unchanged;
if the second loss value does not remain unchanged, adjusting parameters of the second convolutional neural network, and returning to the step of inputting the second training set into the second convolutional neural network for convolutional training to obtain a second training result;
if the second loss value remains unchanged, inputting the second test set into the second convolutional neural network for a convolutional test to obtain a second test result;
judging whether the second test result meets a preset condition;
if the second test result meets the preset condition, taking the second convolutional neural network as the text region positioning model;
and if the second test result does not meet the preset condition, returning to the step of adjusting the parameters of the second convolutional neural network.
4. The method according to claim 1, wherein the performing line segment detection on the text region picture to obtain key information comprises:
detecting line segments in the text region picture according to the Hough transform principle to obtain line segment information;
and performing line segment filtering and merging according to the line segment information to obtain key information.
5. The method according to claim 1, wherein the intercepting the text region picture according to the key information to obtain a key text picture comprises:
intercepting corresponding positions of the text region picture according to the key information to obtain candidate key pictures;
and splicing the candidate key pictures to obtain a key text picture.
6. The method according to claim 1, wherein analyzing the key text picture to form an analysis result comprises:
recognizing the key text picture by using OCR technology to obtain a text recognition result;
and analyzing the text recognition result according to the different characteristics of different types of data to obtain the analysis result.
7. The method according to claim 6, wherein analyzing the text recognition result according to the different characteristics of different types of data to obtain the analysis result comprises:
analyzing the text recognition result through logical judgment and regular expressions, and outputting the result in a specific format to obtain the analysis result.
8. A text picture analysis device, characterized by comprising:
a to-be-analyzed picture acquiring unit, used for acquiring picture data to be analyzed to obtain a picture to be analyzed;
a text direction recognition unit, used for inputting the picture to be analyzed into a text direction recognition model for text direction recognition to obtain a recognition result;
a preprocessing unit, used for preprocessing the picture to be analyzed by using the recognition result to obtain an intermediate picture;
a region positioning unit, used for inputting the intermediate picture into a text region positioning model for text region positioning to obtain a positioning result;
a reprocessing unit, used for reprocessing the intermediate picture according to the positioning result to obtain a text region picture;
a line segment detection unit, used for performing line segment detection on the text region picture to obtain key information;
an intercepting unit, used for intercepting the text region picture according to the key information to obtain a key text picture;
an analyzing unit, used for analyzing the key text picture to form an analysis result;
and a sending unit, used for sending the analysis result to a terminal so that the terminal displays the analysis result.
9. A computer device, characterized in that the computer device comprises a memory, on which a computer program is stored, and a processor, which when executing the computer program implements the method according to any of claims 1 to 7.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
CN202010173437.0A 2020-03-13 2020-03-13 Text picture analysis method, text picture analysis device, computer equipment and storage medium Active CN111382740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010173437.0A CN111382740B (en) 2020-03-13 2020-03-13 Text picture analysis method, text picture analysis device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111382740A true CN111382740A (en) 2020-07-07
CN111382740B CN111382740B (en) 2023-11-21

Family

ID=71218716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010173437.0A Active CN111382740B (en) 2020-03-13 2020-03-13 Text picture analysis method, text picture analysis device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111382740B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214240A (en) * 2017-07-03 2019-01-15 佳能株式会社 The method and device of testing document
US20190065841A1 (en) * 2017-08-25 2019-02-28 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for acquiring text data from trademark image, computer device and storage medium
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110399798A (en) * 2019-06-25 2019-11-01 朱跃飞 A kind of discrete picture file information extracting system and method based on deep learning
CN110348441A (en) * 2019-07-10 2019-10-18 深圳市华云中盛科技有限公司 VAT invoice recognition methods, device, computer equipment and storage medium
CN110363199A (en) * 2019-07-16 2019-10-22 济南浪潮高新科技投资发展有限公司 Certificate image text recognition method and system based on deep learning
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN110781885A (en) * 2019-10-24 2020-02-11 泰康保险集团股份有限公司 Text detection method, device, medium and electronic equipment based on image processing
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814798A (en) * 2020-07-14 2020-10-23 深圳中兴网信科技有限公司 Method for digitizing titles and readable storage medium
CN112101317A (en) * 2020-11-17 2020-12-18 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium
CN112101317B (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium
WO2022105569A1 (en) * 2020-11-17 2022-05-27 深圳壹账通智能科技有限公司 Page direction recognition method and apparatus, and device and computer-readable storage medium
CN112434689A (en) * 2020-12-01 2021-03-02 天冕信息技术(深圳)有限公司 Method, device and equipment for identifying information in picture and storage medium
CN112926564A (en) * 2021-02-25 2021-06-08 中国平安人寿保险股份有限公司 Picture analysis method, system, computer device and computer-readable storage medium
CN112784932A (en) * 2021-03-01 2021-05-11 北京百炼智能科技有限公司 Font identification method and device and storage medium
CN112784932B (en) * 2021-03-01 2024-06-07 北京百炼智能科技有限公司 Font identification method, device and storage medium

Also Published As

Publication number Publication date
CN111382740B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN111382740B (en) Text picture analysis method, text picture analysis device, computer equipment and storage medium
CN110348441B (en) Value-added tax invoice identification method and device, computer equipment and storage medium
EP3432197B1 (en) Method and device for identifying characters of claim settlement bill, server and storage medium
CN112115893A (en) Instrument panel pointer reading identification method and device, computer equipment and storage medium
WO2018010657A1 (en) Structured text detection method and system, and computing device
CN109034069B (en) Method and apparatus for generating information
US20150199568A1 (en) Automated document recognition, identification, and data extraction
CN112561080B (en) Sample screening method, sample screening device and terminal equipment
CN110909807A (en) Network verification code identification method and device based on deep learning and computer equipment
CN111340022A (en) Identity card information identification method and device, computer equipment and storage medium
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN112686257A (en) Storefront character recognition method and system based on OCR
CN111027456A (en) Mechanical water meter reading identification method based on image identification
CN113763348A (en) Image quality determination method and device, electronic equipment and storage medium
CN114429636B (en) Image scanning identification method and device and electronic equipment
CN113066223A (en) Automatic invoice verification method and device
CN113420657A (en) Intelligent verification method and device, computer equipment and storage medium
US10198613B2 (en) Latent fingerprint pattern estimation
CN112149564B (en) Face classification and recognition system based on small sample learning
CN112825145B (en) Human body orientation detection method and device, electronic equipment and computer storage medium
CN115063826A (en) Mobile terminal driver license identification method and system based on deep learning
CN110083807B (en) Contract modification influence automatic prediction method, device, medium and electronic equipment
CN113971810A (en) Document generation method, device, platform, electronic equipment and storage medium
CN113569839A (en) Certificate identification method, system, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant