CN111797610A - Font and key sentence analysis method based on image processing - Google Patents


Info

Publication number
CN111797610A
CN111797610A (application CN202010671365.2A)
Authority
CN
China
Prior art keywords
font, text, key, detected, analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010671365.2A
Other languages
Chinese (zh)
Inventor
耿绘绘
张誉文
Current Assignee
Zhengzhou Mayuan Network Technology Co ltd
Original Assignee
Zhengzhou Mayuan Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhengzhou Mayuan Network Technology Co ltd filed Critical Zhengzhou Mayuan Network Technology Co ltd
Priority to CN202010671365.2A priority Critical patent/CN111797610A/en
Publication of CN111797610A publication Critical patent/CN111797610A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a font and key-sentence analysis method based on image processing. Text sub-images to be detected are fed simultaneously into a font-analysis twin (Siamese) network and a key-sentence-analysis twin network to obtain a font analysis result and a key-sentence analysis result for each sub-image. The per-sub-image results are then integrated into a font analysis result and an article-structure judgment for the whole text image to be detected, from which a scoring range for the text image is derived. The method assists the marking teacher during grading, is unaffected by the teacher's subjective factors, produces accurate detection results, and improves the teacher's working efficiency.

Description

Font and key sentence analysis method based on image processing
Technical Field
The invention relates to the field of artificial intelligence and image processing, in particular to a font and key sentence analysis method based on image processing.
Background
At present, key sentences in English compositions and the handwriting itself are generally analyzed manually, i.e., through the subjective judgment of a marking teacher. This approach has several drawbacks: it consumes a great deal of the teacher's time and energy; when the teacher overlooks key sentences in the composition, the scoring result becomes inaccurate; and inaccurate scores dampen students' enthusiasm for learning.
Disclosure of Invention
In order to solve the above problems, the present invention provides a font and key-sentence analysis method based on image processing, the method comprising:
building twin (Siamese) networks, namely a font-analysis twin network and a key-sentence-analysis twin network, where each twin network comprises two weight-sharing branches and a distance-calculation module, and each branch comprises an encoder and a fully connected layer;
training the twin networks: a training set is constructed from collected English text images; after a cropping operation, each image in the training set is labeled as to whether its handwriting is attractive and whether it contains a key sentence; two cropped images from the training set are fed simultaneously into the two branches of each twin network, and the font-analysis and key-sentence-analysis twin networks output the Euclidean distance between the two font feature vectors and between the two key-sentence feature vectors, respectively;
cropping the text image to be detected into n text sub-images; selecting one branch from each of the trained key-sentence-analysis and font-analysis twin networks and feeding the sub-images into the selected branches in turn; each of the two distance-calculation modules, and each encoder and fully connected layer in the two selected branches, corresponds to a block; every block randomly selects an available node, a private blockchain is generated according to the inference order of the two selected branches and the distance-calculation modules, and computation then proceeds at the corresponding nodes along the private chain;
comparing the Euclidean distances to obtain the font analysis result and key-sentence analysis result of each text sub-image; integrating the per-sub-image font and key-sentence analysis results to obtain the font analysis result and article-structure judgment of the text image to be detected; and deriving the scoring range of the text image from that font analysis result and article-structure judgment.
In the training set, the ratio of English text images containing key sentences to those not containing key sentences is 1:1, as is the ratio of images with attractive handwriting to images with unattractive handwriting.
The feature vectors obtained during training are stored in four categories: contains a key sentence, does not contain a key sentence, attractive font, and unattractive font.
The cropping operation is performed with a sliding window whose size is set to cover at least two lines of English text.
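The sliding-window cropping step can be sketched as follows. This is a minimal illustration, not the patent's implementation; window size and stride are assumptions (the patent only requires the window to cover at least two text lines).

```python
# Hypothetical sketch of the sliding-window cropping operation.
# `image` is a 2-D list of pixel rows; win_h/win_w and the strides are assumed.
def sliding_window_crops(image, win_h, win_w, stride_y, stride_x):
    """Yield (top, left, crop) for every window position inside `image`."""
    h, w = len(image), len(image[0])
    for top in range(0, h - win_h + 1, stride_y):
        for left in range(0, w - win_w + 1, stride_x):
            crop = [row[left:left + win_w] for row in image[top:top + win_h]]
            yield top, left, crop
```

Each yielded crop is one text sub-image; for a text image this produces the n sub-images referred to above.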
The node selected by the block corresponding to an encoder is a local server node; the nodes selected by the blocks corresponding to the fully connected layers and distance-calculation modules are cloud server nodes. Data transmitted between nodes is encrypted and decrypted with an encryption/decryption algorithm.
The inference order is specifically:
Key-sentence-analysis twin network: the node selected by the block corresponding to the encoder in the selected branch extracts features from the input image to obtain a first feature map; after a flattening operation, the first feature map is sent to the node selected by the block corresponding to the fully connected layer, which computes the key-sentence feature vector; the distance-calculation module at the node selected by its block then computes the Euclidean distances between this key-sentence feature vector and the two stored categories of feature vectors (contains a key sentence, does not contain a key sentence).
Font-analysis twin network: the node selected by the block corresponding to the encoder in the selected branch extracts features from the input image to obtain a second feature map; after flattening, the second feature map is sent to the node selected by the block corresponding to the fully connected layer, which computes the font feature vector; the distance-calculation module at the node selected by its block then computes the Euclidean distances between this font feature vector and the two stored categories of feature vectors (attractive font, unattractive font).
The comparison operation is specifically: two thresholds α and β are set. If any of the Euclidean distances between the key-sentence feature vector and the stored contains-key-sentence feature vectors is below α, the sub-image is judged to contain a key sentence; if any of the distances to the stored does-not-contain-key-sentence feature vectors is below α, it is judged not to contain a key sentence. Likewise, if any of the distances between the font feature vector and the stored attractive-font feature vectors is below β, the font is judged attractive; if any of the distances to the stored unattractive-font feature vectors is below β, the font is judged unattractive.
The integration operation is specifically: for the n key-sentence analysis results, let r be the ratio of the number of sub-images judged to contain a key sentence to n. When r is at least a first threshold T1, the article structure of the text image to be detected is judged excellent; when r is at least a second threshold T2 but below T1 (with T2 < T1), the structure is judged good; when r is below T2, the structure is judged general.
For the n font analysis results, when the ratio of the number of attractive-font results to n is at least a threshold T3, the font analysis result of the text image to be detected is attractive; otherwise it is judged unattractive.
The scoring rules are as follows, with a full score of e: when the article structure is general and the font is judged general, the scoring range is [0, a]; when the structure is general and the font is attractive, (a, b]; when the structure is good and the font is general, (b, c]; when the structure is good and the font is attractive, or the structure is excellent and the font is general, (c, d]; and when the structure is excellent and the font is attractive, (d, e], where 0 < a < b < c < d < e.
The invention has the beneficial effects that:
1. The method uses a relatively simple twin network for coarse recognition; the network is small and runs fast. It assists the marking teacher's grading process by simultaneously determining whether an English composition contains key sentences and whether its handwriting is attractive, and combines the two analysis results into an objective and fair scoring range. The teacher can then assign a specific score within that range based on other factors, which reduces workload and improves efficiency.
2. Existing practice relies mainly on the marking teacher manually judging whether the composition contains key sentences, a judgment easily influenced by the teacher's subjective factors, which makes the score less objective. The proposed method is unaffected by subjective human factors and can accurately and quickly determine whether an English composition contains key sentences, assisting the grading process without requiring excessive effort from the teacher. It thereby reduces the teacher's workload and effectively avoids scoring errors caused by key sentences being overlooked for subjective reasons.
3. The method adopts blockchain and encryption technology, which ensures data security, improves the security of systems using the method, and prevents data leakage during transmission.
Drawings
Fig. 1 is a flow chart of the method.
Detailed Description
To make the invention comprehensible to those skilled in the art, it is described in detail below with reference to an embodiment and the accompanying drawing, fig. 1.
Example (b):
A method for analyzing fonts and key sentences based on image processing comprises the following steps:
Acquire English text images. Since the size, format, etc. of the writing area in an English test paper are fixed, images of the same writing area are captured by cameras with identical pose.
Build the twin networks, comprising a font-analysis twin network and a key-sentence-analysis twin network. As shown in fig. 1, the two twin networks share the same architecture but have different weights; each comprises two weight-sharing branches and a distance-calculation module. Specifically:
The key-sentence-analysis twin network comprises two branches and a first distance-calculation module. The left branch comprises a first encoder and a first fully connected layer; the right branch comprises a second encoder and a second fully connected layer; both fully connected layers are connected to the first distance-calculation module.
The font-analysis twin network comprises two branches and a second distance-calculation module. The left branch comprises a third encoder and a third fully connected layer; the right branch comprises a fourth encoder and a fourth fully connected layer; both fully connected layers are connected to the second distance-calculation module.
Each encoder extracts features from an input image to obtain a feature map; after a flattening operation, the feature map is sent to the fully connected layer, which computes a feature vector.
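The flatten-then-fully-connected step that turns a feature map into a feature vector can be illustrated in miniature. This is not the patent's actual architecture (the encoder itself is unspecified); it only shows the flattening and the linear layer.

```python
# Minimal illustration of the flattening operation and a fully connected
# layer. The weight/bias values used in any example are arbitrary.
def flatten(feature_map):
    """Flatten a 2-D feature map into a 1-D vector, row by row."""
    return [v for row in feature_map for v in row]

def fully_connected(x, weights, biases):
    """One linear layer: `weights` holds one row of input-sized weights per
    output unit; returns the feature vector W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]
```

In the patent's networks, the output of `fully_connected` plays the role of the key-sentence or font feature vector.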
Train the twin networks:
A training set is constructed from the collected English text images. After the cropping operation, each image is labeled as to whether its handwriting is attractive and whether it contains a key sentence; it is suggested that a group of marking teachers be surveyed so that each image's attractiveness label reflects multiple opinions. After screening, a certain number of images are retained as the training data set, and it is suggested that the ratio of images containing key sentences to images not containing them be 1:1, and likewise 1:1 for images with attractive versus unattractive handwriting;
Two English text images randomly selected from the training set are cropped with the sliding window and then fed simultaneously into the two branches of each twin network; the font-analysis twin network and the key-sentence-analysis twin network output the Euclidean distance between the two font feature vectors and between the two key-sentence feature vectors, respectively. The twin networks are trained with a contrastive loss function:

L = \frac{1}{2N} \sum_{n=1}^{N} \left[ Y D^2 + (1 - Y) \max(m - D, 0)^2 \right]

where Y is a label indicating whether the two samples match: Y = 1 means the two samples are similar (matching), and Y = 0 means they do not match; N is the number of sample pairs; and m is a set margin threshold, which differs between the two twin networks (α in the key-sentence-analysis twin network, β in the font-analysis twin network). Here

D = \lVert X_1 - X_2 \rVert_2 = \sqrt{\sum_{p=1}^{P} (X_{1p} - X_{2p})^2}

is the Euclidean distance between the feature vectors X_1 and X_2 output by the two branches, where P is the feature dimension of the sample (taken as 1 in the twin networks of the present invention).

According to this loss function, when two samples are similar (Y = 1), the loss term is D^2, the squared Euclidean distance between the similar samples: the loss is small when similar samples are close in Euclidean distance and grows as they move apart.

When the two samples do not match (Y = 0), the loss term is \max(m - D, 0)^2: the loss is large when the Euclidean distance between mismatched samples is small, and small when it is large. The margin m restricts attention to distances between 0 and m; once mismatched samples are farther apart than m, the requirement is satisfied and the loss term is zero.
After training, the twin networks ensure that feature vectors of similar samples are close and feature vectors of dissimilar samples are far apart, with the threshold m serving as the criterion: a Euclidean distance below m means the samples are similar, and a distance above m means they are dissimilar. In the two trained twin networks, the feature vectors of the English-text training images are stored in four categories: contains a key sentence, does not contain a key sentence, attractive font, and unattractive font. Within a category the feature vectors are close in Euclidean distance; between categories they are far apart.
It should be noted that the two branches of a twin network share weights. Both branches are used during training with the contrastive loss function, which suits situations where the number of samples is limited; during actual analysis, only one branch is used to output a feature vector, whose Euclidean distances to the stored feature vectors are then computed.
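The contrastive loss above can be written directly from its formula. A minimal sketch in plain Python, operating on pairs of feature vectors rather than on a real network's outputs:

```python
import math

def euclidean(x1, x2):
    """Euclidean distance D between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def contrastive_loss(pairs, m):
    """pairs: list of (x1, x2, y) with y = 1 for matching samples, 0 otherwise.
    Implements L = 1/(2N) * sum(y*D^2 + (1-y)*max(m-D, 0)^2)."""
    total = 0.0
    for x1, x2, y in pairs:
        d = euclidean(x1, x2)
        total += y * d ** 2 + (1 - y) * max(m - d, 0.0) ** 2
    return total / (2 * len(pairs))
```

Matching pairs are penalized by their squared distance, while mismatched pairs contribute nothing once they are farther apart than the margin m, exactly as described above.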
The text image to be detected is cropped with a sliding window into n text sub-images; the window size is suggested to cover at least two lines of English text. Each sub-image is fed in turn into one branch of the font-analysis twin network and one branch of the key-sentence-analysis twin network, yielding a font feature vector and a key-sentence feature vector. The distance-calculation modules of the two branches compute the Euclidean distances between the key-sentence feature vector and the two stored categories of feature vectors (contains / does not contain a key sentence), and between the font feature vector and the two stored categories (attractive / unattractive font). A comparison operation then yields the font analysis result and key-sentence analysis result of the sub-image, specifically:
Using the set thresholds α and β of the two twin networks: if any Euclidean distance between the key-sentence feature vector and a stored contains-key-sentence feature vector is below α, the sub-image is judged to contain a key sentence; if any distance to a stored does-not-contain-key-sentence feature vector is below α, it is judged not to contain one. Likewise, if any distance between the font feature vector and a stored attractive-font feature vector is below β, the font is judged attractive; if any distance to a stored unattractive-font feature vector is below β, it is judged unattractive.
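The comparison rule can be sketched as a nearest-stored-vector test. A minimal illustration; the assumption that the positive category is checked first (when both categories fall within the threshold) is mine, since the patent does not specify a tie-breaking order.

```python
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def judge(feature, positive_store, negative_store, threshold):
    """Return 'positive' if `feature` is within `threshold` of any stored
    positive-category vector, 'negative' for the negative category, else None
    (undecided). Mirrors the patent's comparison operation; the
    positive-first ordering is an assumption."""
    if any(euclidean(feature, s) < threshold for s in positive_store):
        return "positive"
    if any(euclidean(feature, s) < threshold for s in negative_store):
        return "negative"
    return None
```

For key sentences, `positive_store` / `negative_store` would hold the contains / does-not-contain categories with threshold α; for fonts, the attractive / unattractive categories with threshold β.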
The font analysis results and key-sentence analysis results of the text sub-images are integrated as follows: for the n key-sentence analysis results, let r be the ratio of the number of sub-images judged to contain a key sentence to n. When r is at least a first threshold T1, the article structure of the text image to be detected is judged excellent; when r is at least a second threshold T2 but below T1, the structure is judged good; when r is below T2, the structure is judged general.
For the n font analysis results, when the ratio of the number of attractive-font results to n is at least a threshold T3, the font analysis result of the text image to be detected is that the font is attractive; otherwise it is judged unattractive.
The font analysis result and article-structure judgment result of the text image to be detected are thus obtained.
The scoring range of the text image to be detected is obtained from its font analysis result and article-structure judgment, with the following specific rule. This embodiment assumes a full score of 30: structure general and font general, [0, 10]; structure general and font attractive, (10, 15]; structure good and font general, (15, 20]; structure good and font attractive, or structure excellent and font general, (20, 25]; structure excellent and font attractive, (25, 30].
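The integration and scoring steps can be sketched together. The structure thresholds T1 = 2/3 and T2 = 1/3 are assumed example values (the patent leaves the thresholds unspecified); the scoring table follows the embodiment's 30-point rule.

```python
from fractions import Fraction

def article_structure(results, t1=Fraction(2, 3), t2=Fraction(1, 3)):
    """results: per-sub-image booleans (True = contains a key sentence).
    t1 and t2 are assumed example thresholds, not values from the patent."""
    r = Fraction(sum(results), len(results))
    if r >= t1:
        return "excellent"
    if r >= t2:
        return "good"
    return "general"

def score_range(structure, font_attractive):
    """Scoring ranges for a full score of 30, per the embodiment's rule."""
    table = {
        ("general", False): (0, 10),
        ("general", True): (10, 15),
        ("good", False): (15, 20),
        ("good", True): (20, 25),
        ("excellent", False): (20, 25),
        ("excellent", True): (25, 30),
    }
    return table[(structure, font_attractive)]
```

For example, a composition whose sub-images mostly contain key sentences and whose handwriting is judged attractive falls into the top range.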
The marking teacher then gives a specific score based on the obtained scoring range and other influencing factors.
Considering data security during transmission, the method also adopts blockchain and encryption technology.
The implementer selects one branch in each of the two trained twin networks. The encoder and fully connected layer in each selected branch, and each distance-calculation module, correspond to one block each. Every block randomly selects an available node for computation; the available nodes comprise local server nodes and cloud server nodes in a hybrid-cloud arrangement: the blocks corresponding to encoders select local server nodes, while the blocks corresponding to fully connected layers and distance-calculation modules select cloud server nodes. Each node contains a distance-calculation module, a trained encoder, and a fully connected layer. A private blockchain is generated according to the inference order of the selected branches, and data transmitted between nodes is encrypted with an encryption algorithm.
Nodes are randomly selected as follows: a random-number seed is generated, and a random-number sequence is produced from the seed by the middle-square method. The random numbers in the sequence are assigned to the available local server nodes in turn, the nodes are sorted by the size of their random numbers, and each local server node thereby obtains an index. Each block corresponding to an encoder in a selected branch is assigned a fixed label, and each block selects the node whose index equals its label, completing the selection of local server nodes for the encoder blocks. The number of nodes is greater than the number of blocks.
Similarly, following the same random selection method, each cloud server node obtains an index; each block corresponding to a fully connected layer or distance-calculation module in the selected branches is assigned a fixed label and selects the node with the matching index, completing the selection of cloud server nodes for those blocks.
The implementer may derive the random-number seed from values such as the IP address or the global time.
The middle-square method generates the random-number sequence as follows: let the obtained seed be X0; take X0 mod 10000 to obtain a four-digit number; square it to obtain an eight-digit number, padding with leading zeros if it has fewer than eight digits; take the middle four digits of the result as the next random number X1; take X1 mod 10000 to obtain a new four-digit number; and repeat to obtain the sequence. The number of random numbers in the sequence equals the number of available nodes.
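The middle-square steps above translate directly into code. A minimal sketch:

```python
def middle_square_sequence(seed, count):
    """Generate `count` random numbers by the middle-square method: reduce
    the current value mod 10000, square it, zero-pad the square to eight
    digits, and take the middle four digits as the next value."""
    x = seed % 10000
    out = []
    for _ in range(count):
        sq = str(x * x).zfill(8)   # eight digits, zero-padded on the left
        x = int(sq[2:6])           # middle four digits become the next number
        out.append(x)
    return out
```

For example, from seed 1234: 1234^2 = 01522756, middle four digits 5227; 5227^2 = 27321529, middle four digits 3215; and so on. In practice `count` would equal the number of available nodes.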
The generation of the private blockchain is illustrated with the left branch of the key-sentence-analysis twin network and the first distance-calculation module:
The inference order of this branch and module is: the first encoder extracts features from the input image to obtain a first feature map; after flattening, the first feature map is sent to the first fully connected layer, which outputs the key-sentence feature vector; the first distance-calculation module then computes the Euclidean distances between that feature vector and the two stored categories of feature vectors (contains / does not contain a key sentence).
The blocks corresponding to the encoder, fully connected layer, and distance-calculation module of the left branch are numbered in order: the first encoder corresponds to block [1], the first fully connected layer to block [2], and the first distance-calculation module to block [3]. Block [1] contains the parameter and image-information data of the first encoder, block [2] that of the first fully connected layer, and block [3] the information data of the first distance-calculation module. Suppose the three blocks have randomly selected nodes by the method above:
block [1] selects local server node 5, block [2] selects cloud server node 8, and block [3] selects cloud server node 7. Specifically, the encoder in node 5 computes the data of block [1], the fully connected layer in node 8 computes block [2], and the distance-calculation module in node 7 computes block [3]. The blocks are then linked in inference order to form the private blockchain: block [1] to block [2], and block [2] to block [3].
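The linking of blocks in inference order can be sketched as a hash-linked list. This is a simplified illustration under my own assumptions (SHA-256 linking, string payloads), not the patent's actual chain format.

```python
import hashlib

def build_private_chain(blocks):
    """blocks: list of (name, node, payload) tuples in inference order.
    Each chain entry records its payload, its assigned node, and the hash
    of the previous entry, forming a minimal private chain."""
    chain = []
    prev_hash = "0" * 64  # genesis predecessor
    for name, node, payload in blocks:
        entry = {
            "block": name,
            "node": node,
            "payload": payload,
            "prev_hash": prev_hash,
        }
        h = hashlib.sha256(repr(entry).encode()).hexdigest()
        entry["hash"] = h
        chain.append(entry)
        prev_hash = h
    return chain
```

With the example above, the three entries would be block [1] on local node 5, block [2] on cloud node 8, and block [3] on cloud node 7, each linked to its predecessor.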
Similarly, the left branch of the font-analysis twin network and the second distance-calculation module generate their own private blockchain by the same steps.
Thus, for each text sub-image to be detected, two private blockchains are generated according to the inference order, and computing at the corresponding nodes along these chains yields the sub-image's font analysis result and key-sentence analysis result.
The embodiment chooses to use the RC5 encryption algorithm, and data between all nodes is transmitted based on the encryption and decryption algorithm until the analysis of all english composition fonts and key sentences is completed. The mechanism of the RC5 encryption algorithm is:
creating a key group: the RC5 algorithm uses 2r +2 key-dependent 32-bit words for encryption, where r denotes the number of rounds of encryption. A key group is created by first copying the key bytes into an array L of 32-bit words (note here whether the processors are in little-endian order or big-endian order), and the last word can be padded with zeros if necessary. Then, the array S is initialized by using a linear congruence generator, and finally L and S are mixed.
Encryption processing: after the key set is created, encryption of the plaintext is started, and when encryption is performed, the plaintext packet is firstly divided into two 32-bit words: a and B (for example, in the case of assuming that the byte order of the processor is little-endian, w is 32, the first plaintext byte enters the lowest byte of a, the fourth plaintext byte enters the highest byte of a, the fifth plaintext byte enters the lowest byte of B, and so on), and the addition is performed by moving left in a loop. The output ciphertext is the content in registers a and B.
Decryption: the ciphertext block is split into two words A and B (stored the same way as during encryption), and the rounds are inverted using subtraction and circular right shifts.
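The mechanism above can be made concrete with a minimal RC5-32 sketch in pure Python. The round count r = 12 and the key are illustrative, and a real deployment should use a vetted cryptographic library rather than this sketch:

```python
# Hedged sketch of RC5-32 (word size w = 32): key schedule, encryption,
# decryption. Round count and key below are illustrative only.

M = 0xFFFFFFFF  # 32-bit mask

def rotl(x, s):
    s %= 32
    return ((x << s) | (x >> (32 - s))) & M

def rotr(x, s):
    s %= 32
    return ((x >> s) | (x << (32 - s))) & M

def expand_key(key, r):
    """Build the 2r + 2 key-dependent 32-bit words S from the key bytes."""
    t = 2 * r + 2
    c = max(1, (len(key) + 3) // 4)
    # Copy key bytes into an array L of 32-bit words (little-endian here),
    # zero-padding the last word if necessary.
    L = [int.from_bytes(key[i:i + 4].ljust(4, b"\x00"), "little")
         for i in range(0, 4 * c, 4)]
    # Initialize S with the linear sequence S[i] = P + i*Q (mod 2^32).
    P, Q = 0xB7E15163, 0x9E3779B9
    S = [(P + i * Q) & M for i in range(t)]
    # Mix L and S together.
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):
        A = S[i] = rotl((S[i] + A + B) & M, 3)
        B = L[j] = rotl((L[j] + A + B) & M, (A + B) & M)
        i, j = (i + 1) % t, (j + 1) % c
    return S

def encrypt(block, S, r):
    """Encrypt one two-word block using additions and data-dependent
    circular left shifts."""
    A = (block[0] + S[0]) & M
    B = (block[1] + S[1]) & M
    for i in range(1, r + 1):
        A = (rotl(A ^ B, B) + S[2 * i]) & M
        B = (rotl(B ^ A, A) + S[2 * i + 1]) & M
    return A, B

def decrypt(block, S, r):
    """Invert the rounds using subtractions and circular right shifts."""
    A, B = block
    for i in range(r, 0, -1):
        B = rotr((B - S[2 * i + 1]) & M, A) ^ A
        A = rotr((A - S[2 * i]) & M, B) ^ B
    return (A - S[0]) & M, (B - S[1]) & M

S = expand_key(b"example key bytes", r=12)
ct = encrypt((0x12345678, 0x9ABCDEF0), S, r=12)
pt = decrypt(ct, S, r=12)  # recovers (0x12345678, 0x9ABCDEF0)
```

Because each round's rotation amount depends on the data itself, RC5's strength rests largely on these data-dependent rotations; the round count r trades speed against security margin.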
The implementer may choose a different encryption method if desired.
The above description is intended to provide those skilled in the art with a better understanding of the present invention and is not intended to limit the present invention.

Claims (9)

1. A font and key-sentence analysis method based on image processing, characterized by comprising the following steps:
building twin networks, namely a font analysis twin network and a key-sentence analysis twin network, each comprising two weight-sharing branches and a distance calculation module, each branch comprising an encoder and a fully connected layer;
training the twin networks: constructing a training set from collected English text images; after cropping, labeling each training image as to whether its font is attractive and whether it contains a key sentence; cropping two images from the training set and feeding them simultaneously into the two twin networks; the processing of the font analysis twin network and the key-sentence analysis twin network yields the Euclidean distance between the two key-sentence feature vectors and the Euclidean distance between the two font feature vectors;
cropping the text image to be detected to obtain n text sub-images to be detected; selecting one branch from each of the trained key-sentence analysis twin network and font analysis twin network, and feeding the text sub-images to be detected into the selected branches in turn, wherein the two distance calculation modules and the encoders and fully connected layers of the two branches each correspond to a block; every block randomly selects an available node, the blocks generate private blockchains according to the inference order of the two selected branches and the distance calculation modules, and the computation is then performed at the corresponding nodes along the private blockchains;
comparing the Euclidean distances to obtain the font analysis result and key-sentence analysis result of each text sub-image to be detected; integrating the font analysis results and key-sentence analysis results of the text sub-images to obtain the font analysis result and article-structure judgment of the text image to be detected; and obtaining the scoring range of the text image to be detected from its font analysis result and article-structure judgment.
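The distance step in claim 1 can be illustrated numerically. The linear `embed` function below is a stand-in for the encoder plus fully connected layer, and the weights (shared by both branches) are illustrative, not the patent's network:

```python
# Numeric sketch of the twin-network distance step: both branches share the
# same (hypothetical) embedding, and the Euclidean distance between the two
# feature vectors measures similarity.

import math

def embed(pixels, weights):
    # Shared-weight stand-in for "encoder + fully connected layer":
    # a single linear projection of the flattened image.
    return [sum(p * w for p, w in zip(pixels, row)) for row in weights]

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

weights = [[0.5, -0.25, 0.1], [0.0, 0.3, -0.2]]  # shared by both branches
img_a, img_b = [1.0, 0.0, 2.0], [1.0, 0.5, 2.0]
d = euclidean(embed(img_a, weights), embed(img_b, weights))
```

Because the branches share weights, identical inputs always map to identical feature vectors and hence to a distance of zero, which is what makes the Euclidean distance a meaningful similarity measure here.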
2. The method of claim 1, wherein in the training set the ratio of English text images containing key sentences to English text images not containing key sentences is 1:1, and the ratio of English text images with attractive fonts to English text images with unattractive fonts is 1:1.
3. The method of claim 1, wherein the feature vectors obtained during training are stored in four categories: contains a key sentence, contains no key sentence, attractive font, and unattractive font.
4. The method of claim 1, wherein the cropping is performed with a sliding window sized to contain at least two lines of English text.
5. The method of claim 1, wherein the node selected for the block corresponding to an encoder is a local server node, the nodes selected for the blocks corresponding to the fully connected layers and the distance calculation modules are cloud server nodes, and data transmitted between nodes is encrypted and decrypted with an encryption and decryption algorithm.
6. The method according to claim 1, characterized in that the inference order is specifically:
for the key-sentence analysis twin network: the node selected for the block corresponding to the encoder in the selected branch performs feature extraction on the input image to obtain a first feature map; after a flattening operation, the first feature map is sent to the node selected for the block corresponding to the fully connected layer, which computes the key-sentence feature vector; and the node selected for the block corresponding to the distance calculation module computes the Euclidean distances between the key-sentence feature vector and the two stored categories of feature vectors, contains-key-sentence and no-key-sentence;
for the font analysis twin network: the node selected for the block corresponding to the encoder in the selected branch performs feature extraction on the input image to obtain a second feature map; after a flattening operation, the second feature map is sent to the node selected for the block corresponding to the fully connected layer, which computes the font feature vector; and the node selected for the block corresponding to the distance calculation module computes the Euclidean distances between the font feature vector and the two stored categories of feature vectors, attractive font and unattractive font.
7. The method of claim 1, wherein the comparing operation is specifically: setting two thresholds α and β; when any of the Euclidean distances between the key-sentence feature vector and the stored contains-key-sentence feature vectors is below α, judging that a key sentence is contained, and when any of the Euclidean distances between the key-sentence feature vector and the stored no-key-sentence feature vectors is below α, judging that no key sentence is contained; when any of the Euclidean distances between the font feature vector and the stored attractive-font feature vectors is below β, judging the font attractive, and when any of the Euclidean distances between the font feature vector and the stored unattractive-font feature vectors is below β, judging the font unattractive.
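A sketch of this comparison with an illustrative threshold; the claim does not say which check wins when both classes have a distance below the threshold, so checking the positive class first is an assumption:

```python
# Sketch of the threshold comparison in claim 7; the tie-break order
# (positive class checked first) and the threshold value are assumptions.

def classify(dists_pos, dists_neg, threshold):
    """Return True if any distance to a stored positive-class vector is
    below the threshold, False if any distance to a stored negative-class
    vector is, and None if neither check fires (undecided)."""
    if any(d < threshold for d in dists_pos):
        return True
    if any(d < threshold for d in dists_neg):
        return False
    return None

alpha = 0.5  # illustrative key-sentence threshold
# Distances to stored "contains key sentence" / "no key sentence" vectors:
has_key_sentence = classify([0.3, 0.9], [0.8, 1.2], alpha)
```

The same function serves the font decision with the threshold β and the attractive/unattractive vector stores.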
8. The method of claim 1, wherein the integrating operation is specifically: for the n key-sentence analysis results, when the ratio of the number of results containing a key sentence to n is greater than or equal to a first threshold, the article-structure judgment of the text image to be detected is excellent; when the ratio is greater than or equal to a second threshold and less than the first threshold, the judgment is good; and when the ratio is less than the second threshold, the judgment is general; for the n font analysis results, when the ratio of the number of attractive results to n is greater than or equal to a third threshold, the font analysis result of the text image to be detected is attractive; otherwise it is judged unattractive. [The threshold values appear in the original only as formula images FDA0002582417790000021 through FDA0002582417790000025.]
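The integration rule can be sketched as below. The actual ratio thresholds appear in the original only as formula images, so 2/3 and 1/3 are stand-in values:

```python
# Sketch of the integration in claim 8; T_HIGH and T_LOW are hypothetical
# stand-ins for the threshold formula images lost from the text.

T_HIGH, T_LOW = 2 / 3, 1 / 3

def structure_grade(results):
    """results: per-sub-image booleans, True = contains a key sentence."""
    ratio = sum(results) / len(results)
    if ratio >= T_HIGH:
        return "excellent"
    if ratio >= T_LOW:
        return "good"
    return "general"

def font_grade(results):
    """results: per-sub-image booleans, True = font judged attractive."""
    ratio = sum(results) / len(results)
    return "attractive" if ratio >= T_HIGH else "unattractive"

grade = structure_grade([True, True, False, True])  # ratio 0.75
```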
9. The method of claim 1, wherein the specific scoring rule is: the full score is e; when the article structure is general and the font is general, the score range is [0, a]; when the structure is general and the font is attractive, or the structure is good and the font is general, the range is (a, b]; when the structure is good and the font is attractive, the range is (b, c]; when the structure is excellent and the font is general, the range is (c, d]; and when the structure is excellent and the font is attractive, the range is (d, e], where 0 < a < b < c < d < e.
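The scoring rule amounts to a small lookup table. The breakpoints a through e below are illustrative (the claim only requires 0 < a < b < c < d < e):

```python
# Sketch of the scoring rule in claim 9 with hypothetical breakpoints;
# each pair (structure grade, font attractive?) maps to a score interval.

a, b, c, d, e = 60, 70, 80, 90, 100  # illustrative breakpoints

def score_range(structure, font_attractive):
    """Map (structure grade, font attractive?) to a (low, high] interval."""
    table = {
        ("general", False): (0, a),
        ("general", True): (a, b),
        ("good", False): (a, b),
        ("good", True): (b, c),
        ("excellent", False): (c, d),
        ("excellent", True): (d, e),
    }
    return table[(structure, font_attractive)]

rng = score_range("excellent", True)
```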
CN202010671365.2A 2020-07-13 2020-07-13 Font and key sentence analysis method based on image processing Withdrawn CN111797610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010671365.2A CN111797610A (en) 2020-07-13 2020-07-13 Font and key sentence analysis method based on image processing

Publications (1)

Publication Number Publication Date
CN111797610A true CN111797610A (en) 2020-10-20

Family

ID=72808525

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010671365.2A Withdrawn CN111797610A (en) 2020-07-13 2020-07-13 Font and key sentence analysis method based on image processing

Country Status (1)

Country Link
CN (1) CN111797610A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507671A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Method, device and readable medium for adjusting text space
CN112507671B (en) * 2020-12-18 2024-01-12 北京百度网讯科技有限公司 Method, apparatus, and readable medium for adjusting text distance
CN113204974A (en) * 2021-05-14 2021-08-03 清华大学 Method, device and equipment for generating confrontation text and storage medium
CN113205084A (en) * 2021-07-05 2021-08-03 北京一起教育科技有限责任公司 English dictation correction method and device and electronic equipment
CN113205084B (en) * 2021-07-05 2021-10-08 北京一起教育科技有限责任公司 English dictation correction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201020