WO2021042505A1

WO2021042505A1 - Note generation method and apparatus based on character recognition technology, and computer device

Info

Publication number: WO2021042505A1
Application number: PCT/CN2019/116337
Authority: WO
Inventors: 温桂龙
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-09-03
Filing date: 2019-11-07
Publication date: 2021-03-11
Also published as: CN110705233B; CN110705233A

Abstract

A note generation method and apparatus based on character recognition technology, a computer device, and a storage medium, the method comprising: acquiring a specified picture with handwritten characters and printed characters; if the specified picture is not similar to the picture previously acquired by a specified terminal, respectively recognising the handwritten characters and the printed characters in the specified picture as handwritten character text and printed character text, and extracting feature data of the handwritten characters in the specified picture; inputting the feature data into an emotion recognition model trained on the basis of a neural network model and acquiring a predicted emotion type outputted by the emotion recognition model; acquiring a target character typesetting type corresponding to the predicted emotion type; and typesetting the printed character text and the handwritten character text on the basis of the target text typesetting type to generate a handwritten note. The degree of information preservation is thereby increased.

Description

Note generation method, device and computer equipment based on text recognition technology

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on September 3, 2019, the application number is 201910828605.2, and the invention title is "Note Generation Method, Apparatus and Computer Equipment Based on Text Recognition Technology". The entire content of the application is approved The reference is incorporated in this application.

Technical field

This application relates to the computer field, in particular to a method, device, computer equipment and storage medium for generating notes based on text recognition technology.

Background technique

When reading physical books, a large number of people have the habit of taking notes or excerpts. If these physical books with handwritten notes can be converted into digital file texts that are more suitable for editing, it will be more conducive to the user's later collation and editing, and it will be conducive to the understanding and dissemination of information. The prior art generally can only mechanically recognize physical books with handwritten notes, and the obtained texts generally do not distinguish between the original recorded content of the book and the handwritten notes, or retain the handwritten text in the form of pictures (in order to preserve all the information of the handwritten text), and then Splicing with printed text; thus causing the problem of information loss, or the problem of consuming a lot of computing resources for note generation. Therefore, the prior art lacks a perfect technical solution for generating handwritten notes.

technical problem

The main purpose of this application is to provide a note generation method, device, computer equipment, and storage medium based on text recognition technology, aiming to improve the preservation of information when generating notes.

Technical solutions

In order to achieve the above-mentioned purpose of the invention, this application proposes a note generation method based on text recognition technology, which is applied to a designated terminal, and includes:

Obtain designated pictures with handwritten text and printed text;

Using a preset picture similarity judgment method to judge whether the specified picture is similar to the picture previously obtained by the specified terminal;

If the designated picture is not similar to the picture previously obtained by the designated terminal, the handwritten text and printed text in the designated picture are recognized as handwritten text and printed text, respectively, by using a preset text recognition technology , And extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

The feature data is input into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text and is related to the pre-collected handwritten text. Trained on sample data composed of emotion categories associated with text;

Acquiring the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence relationship between the emotion category and the text typesetting type;

The printed text and the handwritten text are typeset according to the target text typesetting type to generate the note.

Beneficial effect

The note generation method, device, computer equipment, and storage medium based on text recognition technology of this application use the emotion recognition model to recognize the emotion category of the note writer when writing notes, and select the corresponding typesetting method according to the emotion category, thereby integrating the emotion The category information (or excitement, or sadness, etc.) is preserved in the form of typesetting, which overcomes the defect of information loss (such as loss of emotion) when the existing text recognition technology recognizes text. Improve the preservation of information.

Description of the drawings

FIG. 1 is a schematic flowchart of a note generation method based on text recognition technology according to an embodiment of the application;

2 is a schematic block diagram of the structure of a note generation device based on text recognition technology according to an embodiment of the application;

FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.

The realization, functional characteristics, and advantages of the purpose of this application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

The best implementation of this application

1, an embodiment of the present application provides a method for generating notes based on text recognition technology, which is applied to a designated terminal, and includes:

S1. Obtain designated pictures with handwritten text and printed text;

S2, using a preset method for judging picture similarity, judge whether the specified picture is similar to the picture previously obtained by the specified terminal;

S3. If the designated picture is not similar to the picture previously acquired by the designated terminal, use a preset text recognition technology to recognize the handwritten text and printed text in the designated picture as handwritten text and printed text, respectively Text, and extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

S4. Input the characteristic data into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text and is related to the pre-collected Trained on sample data composed of emotional categories associated with handwritten text;

S5. Obtain the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence between the emotion category and the text typesetting type;

S6. Typesetting the printed text and the handwritten text according to the target text typesetting type to generate the note.

As described in step S1 above, a designated picture with handwritten text and printed text is acquired. The designated picture may be a picture with handwritten text and printed text collected in real time through a preset camera, or may be a pre-stored picture with handwritten text and printed text. Printed text refers to the font used by publications to publish text, and is the font used for text printed in batches, where publications are physical carriers such as books and magazines. Therefore, there is a clear distinction between handwritten text and printed text.

As described in step S2 above, a preset picture similarity determination method is used to determine whether the designated picture is similar to the picture previously acquired by the designated terminal. The picture similarity judgment method is for example: sequentially comparing the corresponding pixels in the two pictures, if the number of the same pixels in the number of all the pixels is greater than a predetermined threshold, then the judgment is similar; if the same pixels If the proportion of the number of pixels in the number of all pixels is not greater than a predetermined threshold, it is determined that they are not similar. If the designated picture is similar to the picture previously obtained by the designated terminal, it indicates that the designated picture has undergone recognition processing, and only the recognition result of the last time needs to be called out, and there is no need to perform the recognition operation again.

As described in step S3 above, if the designated picture is not similar to the picture previously acquired by the designated terminal, the handwritten text and printed text in the designated picture are respectively recognized as handwritten by using a preset text recognition technology. Text and printed text, as well as extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text. If the designated picture is not similar to the picture previously acquired by the designated terminal, it indicates that the designated picture has not undergone identification processing and is a brand new picture, and therefore needs to be identified. The preset text recognition technology is, for example, OCR (Optical Character Recognition, optical character recognition) technology, in which one or more of the following technical means can be used in the recognition process: Grayscale: RGB model is used to represent each image For each pixel, take the average value of R, G, and B of each pixel instead of the original R, G, and B values to obtain the gray value of the image; binarization: divide the pixels of the image into black and white Part, black is regarded as foreground information, and white is regarded as background information to process other objects and backgrounds in the original image except the target text; noise reduction: median filter, mean filter, adaptive Wiener filter, etc. Filtering to deal with image noise caused by the process of image acquisition, compression, transmission, etc.; tilt correction: use Hough transform and other methods to process the image to correct image tilt caused by photographing. Text segmentation: Use projection operation to segment text, project a single line of text or multiple lines of text on the X axis, and accumulate the values. The text area must have a relatively large value, and the interval area must have no value. Then consider the rationality of the interval. In this way, a single text is segmented; feature extraction: extract the special points of these pixels, such as extreme points, isolated points, etc., as the feature points of the image, and then perform dimensionality reduction processing on them to increase the processing speed. Classification: use SVM (Support Vector Machine) classifier to classify and get the initial recognition result; processing result: use NLP (Natural Language Processing, natural language processing) method to process and optimize the output of the initial recognition result to eliminate Some misrecognized texts that are close to the correct text but have nothing to do with the context. The method for extracting the feature data of the handwritten text in the designated picture, where the feature data includes at least the position of the repetition and the number of repetitions in the handwritten text, for example, includes: dividing the pen of the handwritten text into multiple points for data Collect and analyze, obtain the pressure value of each point, the clarity of the sequence when writing, etc. by identifying the data change trend of the pixel, and then obtain the characteristic data including the position of the heavy pen and the number of the heavy pen. The heavy pen refers to the handwritten text with the greatest force Strokes.

As described in step S4 above, input the characteristic data into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text, and It is trained on sample data composed of emotion categories associated with the pre-collected handwritten text. The neural network model can be any model, such as VGG16 model, VGG-F model, ResNet152 model, ResNet50 model, DPN131 model, AlexNet model, DenseNet model, etc., and the DPN model is preferred. DPN (Dual Path Network) is a neural network structure, which introduces the core content of DenseNet on the basis of ResNeXt, which makes the model use features more fully. The above-mentioned DPN, ResNeXt and DenseNet are existing network structures and will not be repeated here. The emotion categories can be classified in any manner, for example, including tension, happiness, sadness, indignation, and so on.

As described in the above step S5, the target text typesetting type corresponding to the predicted emotion type is obtained according to the preset correspondence between the emotion category and the text typesetting type. The preset correspondence between the emotion category and the text typesetting type is, for example, when the emotion category is a stable emotion, the identifier is used to replace the original handwritten text, and the recognized handwritten text is recorded at the end of the text without destroying the printed text When the emotion category is agitation, typeset the handwritten text with a special font in the original place of the handwritten text. The text typesetting can be any feasible way. Among them, the type of text layout corresponds to the emotion category. For example, for the passionate emotion category, red font and bold are used to reflect; for the sad emotion category, green font and italics are used to reflect. Of course, the typesetting type can also include any other feasible types.

As described in step S6, the printed text and the handwritten text are typeset according to the target text typesetting type to generate the note. Since the handwritten notes obtained by typesetting the printed text and the handwritten text according to the target text typesetting further retain the information of the original handwritten text, the recognition is more relevant and the user experience Better, the rate of missing information is lower.

In one embodiment, the step S2 of judging whether the designated picture is similar to the picture previously obtained by the designated terminal by using a preset picture similarity judgment method includes:

S201: Perform gray-scale processing on the designated picture and the picture previously acquired by the designated terminal respectively to obtain a first gray-scale picture and a second gray-scale picture;

S202: Calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale picture, and calculate the average value B of the gray values of all the pixels in the gray-scale picture;

S203. According to the formula:

Calculate the overall variance of the m-th column or m-th row of the grayscale image

Where N is the total number of columns or rows in the grayscale picture;

S204, according to the formula:

Obtain the difference between the overall variance of the m-th column or m-th row of the first gray-scale picture and the second gray-scale picture

among them,

Is the overall variance of the m-th column or m-th row of the first grayscale picture,

Is the overall variance of the m-th column or the m-th row of the second grayscale picture;

S205, judgment

Whether it is less than the preset variance error threshold;

S206, if

If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously obtained by the specified terminal.

As described above, the use of a preset picture similarity judgment method is implemented to judge whether the specified picture is similar to the picture previously obtained by the specified terminal. Among them, grayscale refers to the color representing a grayscale color. For example, in the RGB model, if R=G=B, the color represents a grayscale color, and the value of R=G=B is called grayscale. Therefore, each pixel of a grayscale image only needs one byte to store the grayscale value (also called intensity value, brightness value), which reduces the storage capacity. The gray scale range is, for example, 0-255 (when the values of R, G, and B are all 0-255, of course, it will also change with the change of the value range of R, G, and B). The gray-scale processing method can be any method, such as the component method, the maximum value method, the average method, and the weighted average method. Among them, since there are only 256 value ranges for gray values, image comparison on this basis can greatly reduce the amount of calculation. Then calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale picture, and calculate the average value B of the gray values of all the pixels in the gray-scale picture. Wherein, the process of calculating the average value Am of the gray values of all pixels in the m-th column or m-th row of the gray-scale picture includes: collecting all the pixels in the m-th column or m-th row of the gray-scale picture Add the gray values of all pixels in the mth column or mth row, and divide the sum of the gray values obtained by the summation by the mth column or The number of all pixels in the m rows is the average value Am of the gray values of all the pixels in the mth column or mth row of the grayscale image. The process of calculating the average value B of the gray values of all pixels in the gray image includes: calculating the sum of the gray values of all pixels in the gray image, and then dividing the sum of the gray values by According to the number of pixels, the average value B of the gray values of all pixels in the gray image is obtained. According to the formula:

Where N is the total number of columns or rows in the grayscale picture. In this application, the overall variance is used to measure the average of the gray values Am of the pixels in the m-th column or the m-th row of the gray-scale picture and the average of the gray-scale values of all pixels in the gray-scale picture. The difference between the value B.

According to the formula:

Obtain the difference between the overall variance of the m-th column or m-th row of the two grayscale images

among them,

Is the overall variance of the mth column or mth row of the first grayscale image,

Is the overall variance of the mth column or mth row of the second grayscale image. Difference of population variance

It reflects the difference in the gray value of the m-th column or m-th row of the two gray-scale pictures. when

When it is smaller, such as 0, it means

Equal to or approximately equal to

It can be considered that the gray value of the m-th column or row of the first gray-scale image is the same or approximately the same as the gray value of the m-th column or m-th row of the second gray-scale image (approximate judgment to save computing power , And because the overall variance of the two different pictures is generally not equal, the accuracy of the judgment is very high), on the contrary, the gray value of the mth column or mth row of the first grayscale image is considered to be the same as the second grayscale value. The gray values of the m-th column or m-th row of the picture are different. judgment

Whether it is less than the preset variance error threshold. among them

The return value is

The maximum value in. If

If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously acquired by the specified terminal. Approximate judgment is used (because all gray values of grayscale pictures converted from two different pictures are generally not equal, and all grayscale values of grayscale pictures converted from the same picture are generally equal), it is possible to reduce the cost of calculation Under the premise of resources, it is determined whether the designated picture is similar to the picture previously acquired by the designated terminal. Accordingly, when the designated picture is not similar to the picture previously acquired by the designated terminal, the subsequent steps are performed (if the designated picture is similar to the picture previously acquired by the designated terminal, it indicates that the designated picture is similar to the picture previously acquired by the designated terminal. The specified picture has been processed for note generation, so there is no need to process it again), reducing unnecessary resource consumption.

S211: Compare corresponding pixels in the designated picture and the picture previously acquired by the designated terminal in turn, and count the number of the same pixels;

S212. According to the formula: the proportion of the same pixels=the number of the same pixels/the number of all the pixels in the specified picture, to obtain the proportion of the same pixels;

S213: Determine whether the proportion of the same pixel points is greater than a preset proportion threshold;

S214: If the proportion of the same pixel points is greater than a preset proportion threshold, it is determined that the specified picture is similar to the picture previously obtained by the specified terminal.

As described above, it is possible to use a preset method for judging the similarity of pictures to judge whether the designated picture is similar to the picture previously acquired by the designated terminal. In order to accurately determine whether the designated picture is similar to the picture previously acquired by the designated terminal, this embodiment adopts a method of successively comparing pixels for judgment. If the two pictures are the same, the number of the same pixels should account for the vast majority, that is, the proportion of the same pixels is close to 1. According to this, according to the formula: the proportion of the same pixels = the number of the same pixels/the number of all the pixels in the specified picture, the proportion of the same pixels is calculated, and if the proportion of the same pixels is greater than With a preset proportion threshold, it is determined that the designated picture is similar to the picture previously acquired by the designated terminal.

In one embodiment, the color of the handwritten text is different from the color of the printed text, and the preset text recognition technology is used to recognize the handwritten text and the printed text in the designated picture as handwritten text. And step S3 of printed text, including:

S301. Collect the value of the R color channel, the value of the G color channel, and the value of the B color channel in the RGB color model of the pixel in the specified picture, and convert the specified picture into the specified picture according to a preset three-value method. The RGB color of the pixel is set to (0,0,0), (255,255,255) or (P,P,P), where P is a preset value greater than 0 and less than 255 to obtain a temporary picture composed of three colors ；

S302. Calculate the area occupied by the three colors in the temporary picture, and respectively adopt a preset text segmentation method for the area occupied by the two colors with a smaller area to obtain a single handwritten text that is divided and a single handwritten text that is divided. Printed text

S303. Extract the text features of the single handwritten text and the text features of the single printed text, and input them into a preset support vector machine for classification to obtain recognized handwritten text and printed text.

As described above, the recognition of handwritten text and printed text using the three-value method is realized. In order to distinguish between handwritten text and printed text more accurately, this application uses a three-value method, that is, according to a preset three-value method, the RGB color of the pixel in the specified picture is set to (0,0,0 ), (255,255,255) or (P,P,P), where P is a preset value greater than 0 and less than 255, obtain a temporary picture composed of three colors, and calculate the proportion of the three colors in the temporary picture Area, and use the preset text segmentation method for the area occupied by the two colors with the smaller area (because the largest area is definitely the background, so there is no need to analyze the area with the largest area) to obtain a single handwritten text that is divided And separate printed text. The support vector machine is a generalized linear classifier that performs binary classification of data in a supervised learning manner, and is suitable for comparing the recognized text with the pre-stored text to output the most similar text. According to this, the text features of the single handwritten text and the text features of the single printed text are extracted, and input into a preset support vector machine for classification, and the recognized handwritten text and printed text are obtained. The character feature is, for example, a special point in the pixel point corresponding to the character, such as an extreme point, an isolated point, etc.

In one embodiment, the collection of the values of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture is performed according to a preset three-value method The step S301 of setting the RGB color of the pixel in the designated picture to (0, 0, 0), (255, 255, 255) or (P, P, P) includes:

S3011, collect the values of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the formula: F1=MIN{ROUND[(a1R+a2G+ a3B)/L,0],A}, get the reference value F1, where MIN is the minimum value function, ROUND is the rounding function, a1, a2, and a3 are all positive numbers greater than 0 and less than L, and L is an integer greater than 0 , A is the first threshold parameter with a preset value in the range (0,255), R, G, and B are respectively the value of the R color channel and the G color in the RGB color model of the designated pixel in the designated picture The value of the channel and the value of the B color channel;

S3012. Determine whether the value of the reference value F1 is equal to A;

S3013. If the value of the reference value F1 is not equal to A, then according to the formula: F2=MAX{ROUND[(a1R+a2G+a3B)/L,0],B}, obtain the reference value F2, where MIN is the maximum value Function, B is the second threshold parameter with a preset value in the range (0,255), and B is greater than A;

S3014. Determine whether the value of the reference value F2 is equal to B;

S3015: If the value of the reference value F2 is not equal to B, set the RGB color of the designated pixel to (255, 255, 255).

As mentioned above, the value of the R color channel, the value of the G color channel, and the value of the B color channel in the RGB color model of the pixel in the specified picture are collected, and all the values are calculated according to the preset three-value method. The RGB color of the pixel in the specified picture is set to (0,0,0), (255,255,255) or (P,P,P). This application adopts the formula: F1=MIN{ROUND[(a1R+a2G+a3B)/L,0],A} and the formula: F2=MAX{ROUND[(a1R+a2G+a3B)/L,0],B} To determine the RGB color of the designated pixel. Further, if the value of the reference value F1 is not equal to A, the RGB color of the designated pixel is set to (0, 0, 0). Further, if the value of the reference value F2 is equal to B, the RGB color of the designated pixel is set to (P, P, P). Three-value processing is realized, so that the background, printed text, and handwritten text can be completely distinguished to facilitate subsequent recognition processing. The ROUND function is a rounding function, ROUND(X,a) refers to the rounding operation of the real number X according to the decimal place a, where a is an integer greater than or equal to 0, for example, ROUND(2.4,0)=2.

In one embodiment, said inputting said characteristic data into an emotion recognition model based on neural network model training to obtain the predicted emotion category output by said emotion recognition model, wherein said emotion recognition model is based on pre-collected handwritten text, And before step S4 trained by sample data composed of emotion categories associated with the pre-collected handwritten text, the method includes:

S401. Retrieve pre-collected sample data, and divide the sample data into a training set and a test set; wherein the sample data includes pre-collected handwritten text and emotion categories associated with the pre-collected handwritten text;

S402: Input the sample data of the training set into a preset neural network model for training to obtain an initial emotion recognition model, where the stochastic gradient descent method is used in the training process;

S403: Use the sample data of the test set to verify the initial emotion recognition model.

S404: If the verification of the initial emotion recognition model is passed, record the initial emotion recognition model as the emotion recognition model.

As described above, the emotion recognition model is set. This application is based on a neural network model to train an emotion recognition model. Among them, the neural network model can be VGG16 model, VGG-F model, ResNet152 model, ResNet50 model, DPN131 model, AlexNet model, DenseNet model, etc. Among them, the stochastic gradient descent method is to randomly sample some training data to replace the entire training set. If the sample size is large (for example, hundreds of thousands), then only tens of thousands or thousands of samples may be used, and iterative When the optimal solution is reached, the training speed can be improved. Further, the training can also use the reverse conduction rule to update the parameters of each layer of the neural network. The reverse conduction law is based on the gradient descent method, and its input-output relationship is essentially a mapping relationship: the function of a neural network with n-input and m-output is from n-dimensional Euclidean space to m-dimensional Ou A continuous mapping of a finite field in the space, this mapping is highly non-linear, which is conducive to the update of the parameters of each layer of the neural network model. Obtain the initial emotion recognition model. The sample data of the test set is then used to verify the initial emotion recognition model, and if the verification is passed, the initial emotion recognition model is recorded as the emotion recognition model.

In one embodiment, after the step S6 of generating the note, the step S6 of formatting the printed text and the handwritten text according to the target text typesetting includes:

S61. Receive an acquisition request for acquiring a handwritten note sent by a second terminal, where the acquisition request records a reading format supported by the second terminal;

S62: Determine whether the reading format of the reading software can display the notes;

S63. If the reading format of the reading software can display the note, send the note to the second terminal.

As described above, it is realized that the note is sent to the second terminal. Since the second terminal may not support reading and displaying the note, the note is formatted and then sent to the second terminal to avoid the second terminal from failing to recognize the handwritten note. Based on this, it is determined whether the reading format of the reading software can display the note; if the reading format of the reading software can display the note, the note is sent to the second terminal. Further, if the reading format of the reading software cannot display the note, the format of the note is converted to the reading format of the reading software, and then sent to the second terminal.

Referring to FIG. 2, an embodiment of the present application provides a note generation device based on text recognition technology, which is applied to a designated terminal, and includes:

The designated picture acquiring unit 10 is used to acquire designated pictures with handwritten text and printed text;

The similarity determination unit 20 is configured to use a preset picture similarity determination method to determine whether the designated picture is similar to the picture previously acquired by the designated terminal;

The feature data acquiring unit 30 is configured to, if the designated picture is not similar to the picture previously acquired by the designated terminal, use a preset text recognition technology to recognize the handwritten text and printed text in the designated picture as Handwritten text and printed text, as well as extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the repetition number in the handwritten text;

The predicted emotion category obtaining unit 40 is configured to input the feature data into an emotion recognition model trained based on a neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text , And training from sample data composed of emotion categories associated with the pre-collected handwritten text;

The typesetting type obtaining unit 50 is configured to obtain the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence between the emotion category and the text typesetting type;

The typesetting unit 60 is configured to typeset the printed text and the handwritten text according to the target text typesetting type to generate the note.

The operations performed by the above-mentioned units respectively correspond to the steps of the note generation method based on the text recognition technology of the foregoing embodiment, and will not be repeated here.

In one embodiment, the similarity judgment unit 20 includes:

A grayscale subunit, configured to perform grayscale processing on the designated picture and the picture previously acquired by the designated terminal, respectively, to obtain a first grayscale picture and a second grayscale picture;

The average value calculation subunit is used to calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale image, and calculate the average value B of the gray values of all the pixels in the gray-scale image ；

The overall variance calculation subunit is used according to the formula:

Where N is the total number of columns or rows in the grayscale picture;

The difference of variance calculation subunit is used according to the formula:

among them,

Error threshold judgment subunit, used to judge

Whether it is less than the preset variance error threshold;

Similarity determination subunit, used if

The operations performed by the above-mentioned sub-units respectively correspond to the steps of the note generation method based on the text recognition technology of the foregoing embodiment, and will not be repeated here.

In one embodiment, the similarity judgment unit 20 includes:

The same pixel count subunit, which is used to sequentially compare corresponding pixels in the designated picture and the picture previously obtained by the designated terminal, and count the number of identical pixels;

The same pixel ratio calculation subunit is used to obtain the same pixel ratio according to the formula: the same pixel ratio=the number of the same pixels/the number of all pixels in the specified picture;

The proportion threshold judging subunit is used to judge whether the proportion of the same pixel is greater than a preset proportion threshold;

The second similarity determination subunit is configured to determine that the designated picture is similar to the picture previously obtained by the designated terminal if the proportion of the same pixel is greater than the preset proportion threshold.

In one embodiment, the color of the handwritten text is different from the color of the printed text, and the characteristic data acquiring unit 30 includes:

The temporary picture generation subunit is used to collect the value of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the preset three-value method Set the RGB color of the pixel in the specified picture to (0,0,0), (255,255,255) or (P,P,P), where P is a preset value greater than 0 and less than 255, and the Temporary pictures composed of various colors;

The segmentation subunit is used to calculate the area occupied by the three colors in the temporary picture, and use the preset text segmentation method for the area occupied by the two colors with the smaller area to obtain the divided single handwritten text and Separate single printed text;

The recognition subunit is used to extract the text features of the single handwritten text and the text features of the single printed text, and input them into a preset support vector machine for classification to obtain the recognized handwritten text text and printed text text.

In one embodiment, the temporary picture generation subunit includes:

The reference value F1 calculation module is used to collect the value of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the formula: F1=MIN{ROUND [(a1R+a2G+a3B)/L,0],A}, get the reference value F1, where MIN is the minimum value function, ROUND is the rounding function, a1, a2, and a3 are all positive numbers greater than 0 and less than L, L is an integer greater than 0, A is the first threshold parameter with a preset value in the range (0, 255), R, G, and B are respectively the R color in the RGB color model of the designated pixel in the designated picture The numerical value of the channel, the numerical value of the G color channel and the numerical value of the B color channel;

The reference value F1 judgment module is used to judge whether the value of the reference value F1 is equal to A;

The reference value F2 calculation module is used to obtain the reference value F2 according to the formula: F2=MAX{ROUND[(a1R+a2G+a3B)/L,0],B} if the value of the reference value F1 is not equal to A , Where MIN is the maximum value function, B is the second threshold parameter with a preset value in the range (0,255), and B is greater than A;

The reference value F2 judgment module is used to judge whether the value of the reference value F2 is equal to B;

The color setting module is configured to set the RGB color of the designated pixel to (255, 255, 255) if the value of the reference value F2 is not equal to B.

The operations performed by the above-mentioned modules respectively correspond to the steps of the note generation method based on the text recognition technology of the foregoing embodiment, and will not be repeated here.

In one embodiment, the device includes:

The sample data retrieval unit is used to retrieve pre-collected sample data and divide the sample data into a training set and a test set; wherein the sample data includes pre-collected handwritten characters and is associated with the pre-collected handwritten characters Emotional category;

The training unit is used to input the sample data of the training set into the preset neural network model for training to obtain the initial emotion recognition model, wherein the stochastic gradient descent method is used in the training process;

A verification unit for verifying the initial emotion recognition model by using sample data of the test set;

The marking unit is configured to record the initial emotion recognition model as the emotion recognition model if the verification of the initial emotion recognition model is passed.

In one embodiment, the device includes:

A reading format obtaining unit, configured to receive an obtaining request for obtaining handwritten notes sent by a second terminal, wherein the obtaining request records a reading format supported by the second terminal;

The reading format judgment unit is used to judge whether the reading format of the reading software can display the notes;

The note sending unit is configured to send the note to the second terminal if the reading format of the reading software can display the note.

3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in the figure. The computer equipment includes a processor, a memory, a network interface, and a database connected through a system bus. Among them, the processor designed by the computer is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used to store the data used in the note generation method based on text recognition technology. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to realize a note generation method based on text recognition technology.

The above-mentioned processor executes the above-mentioned note generation method based on text recognition technology, wherein the steps included in the method respectively correspond to the steps of executing the note generation method based on text recognition technology of the aforementioned embodiment one-to-one, and will not be repeated here.

An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a method for generating notes based on text recognition technology is realized, wherein the steps included in the method are respectively the same as those in The steps of the note generation method based on the text recognition technology of the embodiment correspond one to one, and will not be repeated here. The computer-readable storage medium is, for example, a non-volatile computer-readable storage medium or a volatile computer-readable storage medium.

Claims

A note generation method based on text recognition technology, applied to a designated terminal, is characterized in that it includes:

Obtain designated pictures with handwritten text and printed text;

Using a preset picture similarity judgment method to judge whether the specified picture is similar to the picture previously obtained by the specified terminal;

If the designated picture is not similar to the picture previously obtained by the designated terminal, the handwritten text and printed text in the designated picture are recognized as handwritten text and printed text, respectively, by using a preset text recognition technology , And extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

The feature data is input into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text and is related to the pre-collected handwritten text. Trained on sample data composed of emotion categories associated with text;

Acquiring the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence relationship between the emotion category and the text typesetting type;

The printed text and the handwritten text are typeset according to the target text typesetting type to generate the note.
The method for generating notes based on text recognition technology according to claim 1, wherein the predetermined method for judging the similarity of pictures is used to judge whether the designated picture is similar to the picture previously obtained by the designated terminal. The steps include:

Performing gray-scale processing on the designated picture and the picture previously acquired by the designated terminal, respectively, to obtain a first gray-scale picture and a second gray-scale picture;

Calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale picture, and calculate the average value B of the gray values of all the pixels in the gray-scale picture;

According to the formula:
Calculate the overall variance of the m-th column or m-th row of the grayscale image
Where N is the total number of columns or rows in the grayscale picture;

According to the formula:
Obtain the difference between the overall variance of the m-th column or m-th row of the first gray-scale picture and the second gray-scale picture
among them,
Is the overall variance of the m-th column or m-th row of the first grayscale picture,
Is the overall variance of the m-th column or the m-th row of the second grayscale picture;

judgment
Whether it is less than the preset variance error threshold;

If
If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously acquired by the specified terminal.
The method for generating notes based on text recognition technology according to claim 1, wherein the predetermined method for judging the similarity of pictures is used to judge whether the designated picture is similar to the picture previously obtained by the designated terminal. The steps include:

Sequentially compare the corresponding pixels in the designated picture and the picture previously acquired by the designated terminal, and count the number of the same pixels;

According to the formula: the proportion of the same pixels=the number of the same pixels/the number of all the pixels in the specified picture, the proportion of the same pixels is obtained;

Judging whether the proportion of the same pixel points is greater than a preset proportion threshold;

If the proportion of the same pixel is greater than the preset proportion threshold, it is determined that the specified picture is similar to the picture previously obtained by the specified terminal.
The method for generating notes based on text recognition technology according to claim 1, wherein the color of the handwritten text is different from the color of the printed text, and the predetermined text recognition technology is used to convert the designated picture The steps of recognizing the handwritten text and printed text as handwritten text and printed text respectively include:

Collect the value of the R color channel, the value of the G color channel, and the value of the B color channel in the RGB color model of the pixels in the designated picture, and convert the pixels in the designated picture according to a preset three-value method The RGB color of the point is set to (0,0,0), (255,255,255) or (P,P,P), where P is a preset value greater than 0 and less than 255 to obtain a temporary picture composed of three colors;

Calculate the area occupied by the three colors in the temporary image, and use the preset text segmentation method for the area occupied by the two colors with the smaller area to obtain a single handwritten text divided and a single printed body divided Text

The text features of the single handwritten text and the text features of the single printed text are extracted and input into a preset support vector machine for classification to obtain recognized handwritten text and printed text.
The method for generating notes based on text recognition technology according to claim 4, wherein the value of the R color channel, the value of the G color channel, and the value of the B color channel in the RGB color model of the pixel in the specified picture are collected. The value of the color channel, and the step of setting the RGB color of the pixel in the specified picture to (0,0,0), (255,255,255) or (P,P,P) according to the preset three-value method, include:

Collect the value of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the formula: F1=MIN{ROUND[(a1R+a2G+a3B) /L,0],A}, get the reference value F1, where MIN is the minimum value function, ROUND is the rounding function, a1, a2, and a3 are all positive numbers greater than 0 and less than L, L is an integer greater than 0, A It is the preset value of the first threshold parameter within the range (0,255), R, G, and B are respectively the value of the R color channel and the value of the G color channel in the RGB color model of the specified pixel in the specified picture. Value and value of B color channel;

Determine whether the value of the reference value F1 is equal to A;

If the value of the reference value F1 is not equal to A, then the reference value F2 is obtained according to the formula: F2=MAX{ROUND[(a1R+a2G+a3B)/L,0],B}, where MIN is the maximum value function, B is the second threshold parameter with a preset value in the range (0,255), and B is greater than A;

Determine whether the value of the reference value F2 is equal to B;

If the value of the reference value F2 is not equal to B, the RGB color of the designated pixel is set to (255, 255, 255).
The method for generating notes based on text recognition technology according to claim 1, wherein said inputting said characteristic data into an emotion recognition model based on neural network model training to obtain the predicted emotion category output by said emotion recognition model Before the step of training the emotion recognition model based on sample data composed of pre-collected handwritten text and emotion categories associated with the pre-collected handwritten text, it includes:

Retrieve pre-collected sample data, and divide the sample data into a training set and a test set; wherein the sample data includes pre-collected handwritten characters and emotion categories associated with the pre-collected handwritten characters;

Input the sample data of the training set into the preset neural network model for training to obtain the initial emotion recognition model, where the stochastic gradient descent method is used in the training process;

Verifying the initial emotion recognition model by using sample data of the test set;

If the verification of the initial emotion recognition model is passed, the initial emotion recognition model is recorded as the emotion recognition model.
The method for generating notes based on text recognition technology according to claim 1, characterized in that the said printed text and said handwritten text are typeset according to said target text typesetting type to generate said note After the steps, include:

Receiving an acquisition request for acquiring a handwritten note sent by a second terminal, where the acquisition request records a reading format supported by the second terminal;

Determine whether the reading format of the reading software can display the notes;

If the reading format of the reading software can display the note, the note is sent to the second terminal.
A note generation device based on text recognition technology, which is applied to a designated terminal, and is characterized in that it includes:

Designated picture acquisition unit for acquiring designated pictures with handwritten text and printed text;

A similarity judgment unit, configured to use a preset image similarity judgment method to judge whether the designated picture is similar to the picture previously acquired by the designated terminal;

The feature data acquisition unit is configured to, if the designated picture is not similar to the picture previously acquired by the designated terminal, use a preset text recognition technology to recognize the handwritten text and printed text in the designated picture as handwriting. Text and printed text, and extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

The predicted emotion category obtaining unit is configured to input the characteristic data into an emotion recognition model trained based on a neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text, And training from sample data composed of emotion categories associated with the pre-collected handwritten text;

The typesetting type obtaining unit is configured to obtain the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence between the emotion category and the text typesetting type;

The typesetting unit is configured to typeset the printed text and the handwritten text according to the target text typesetting type to generate the note.
8. The note generation device based on text recognition technology according to claim 8, wherein the similarity judgment unit comprises:

A grayscale subunit, configured to perform grayscale processing on the designated picture and the picture previously acquired by the designated terminal, respectively, to obtain a first grayscale picture and a second grayscale picture;

The average value calculation subunit is used to calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale image, and calculate the average value B of the gray values of all the pixels in the gray-scale image ；

The overall variance calculation subunit is used according to the formula:
Calculate the overall variance of the m-th column or m-th row of the grayscale image
Where N is the total number of columns or rows in the grayscale picture;

The difference of variance calculation subunit is used according to the formula:
Obtain the difference between the overall variance of the m-th column or m-th row of the first gray-scale picture and the second gray-scale picture
among them,
Is the overall variance of the m-th column or m-th row of the first grayscale picture,
Is the overall variance of the m-th column or the m-th row of the second grayscale picture;

Error threshold judgment subunit, used to judge
Whether it is less than the preset variance error threshold;

Similarity determination subunit, used if
If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously acquired by the specified terminal.
8. The note generation device based on text recognition technology according to claim 8, wherein the similarity judgment unit comprises:

The same pixel count subunit, which is used to sequentially compare corresponding pixels in the designated picture and the picture previously obtained by the designated terminal, and count the number of identical pixels;

The same pixel ratio calculation subunit is used to obtain the same pixel ratio according to the formula: the same pixel ratio=the number of the same pixels/the number of all pixels in the specified picture;

The proportion threshold judging subunit is used to judge whether the proportion of the same pixel is greater than a preset proportion threshold;

The second similarity determination subunit is configured to determine that the designated picture is similar to the picture previously obtained by the designated terminal if the proportion of the same pixel is greater than the preset proportion threshold.
8. The note generation device based on text recognition technology according to claim 8, wherein the color of the handwritten text is different from the color of the printed text, and the characteristic data acquiring unit includes:

The temporary picture generation subunit is used to collect the value of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the preset three-value method Set the RGB color of the pixel in the specified picture to (0,0,0), (255,255,255) or (P,P,P), where P is a preset value greater than 0 and less than 255, and the Temporary pictures composed of various colors;

The segmentation subunit is used to calculate the area occupied by the three colors in the temporary picture, and use the preset text segmentation method for the area occupied by the two colors with the smaller area to obtain the divided single handwritten text and Separate single printed text;

The recognition subunit is used to extract the text features of the single handwritten text and the text features of the single printed text, and input them into a preset support vector machine for classification to obtain the recognized handwritten text text and printed text text.
The note generation device based on text recognition technology according to claim 11, wherein the temporary picture generation subunit comprises:

The reference value F1 calculation module is used to collect the value of the R color channel, the value of the G color channel and the value of the B color channel in the RGB color model of the pixel in the specified picture, and according to the formula: F1=MIN{ROUND [(a1R+a2G+a3B)/L,0],A}, get the reference value F1, where MIN is the minimum value function, ROUND is the rounding function, a1, a2, and a3 are all positive numbers greater than 0 and less than L, L is an integer greater than 0, A is the first threshold parameter with a preset value in the range (0, 255), R, G, and B are respectively the R color in the RGB color model of the designated pixel in the designated picture The numerical value of the channel, the numerical value of the G color channel and the numerical value of the B color channel;

The reference value F1 judgment module is used to judge whether the value of the reference value F1 is equal to A;

The reference value F2 calculation module is used to obtain the reference value F2 according to the formula: F2=MAX{ROUND[(a1R+a2G+a3B)/L,0],B} if the value of the reference value F1 is not equal to A , Where MIN is the maximum value function, B is the second threshold parameter with a preset value in the range (0,255), and B is greater than A;

The reference value F2 judgment module is used to judge whether the value of the reference value F2 is equal to B;

The color setting module is configured to set the RGB color of the designated pixel to (255, 255, 255) if the value of the reference value F2 is not equal to B.
8. The note generation device based on text recognition technology according to claim 8, said device comprising:

The sample data retrieval unit is used to retrieve pre-collected sample data and divide the sample data into a training set and a test set; wherein the sample data includes pre-collected handwritten characters and is associated with the pre-collected handwritten characters Emotional category;

The training unit is used to input the sample data of the training set into the preset neural network model for training to obtain the initial emotion recognition model, wherein the stochastic gradient descent method is used in the training process;

A verification unit for verifying the initial emotion recognition model by using sample data of the test set;

The marking unit is configured to record the initial emotion recognition model as the emotion recognition model if the verification of the initial emotion recognition model is passed.
8. The note generation device based on text recognition technology according to claim 8, wherein the device comprises:

A reading format obtaining unit, configured to receive an obtaining request for obtaining handwritten notes sent by a second terminal, wherein the obtaining request records a reading format supported by the second terminal;

The reading format judgment unit is used to judge whether the reading format of the reading software can display the notes;

The note sending unit is configured to send the note to the second terminal if the reading format of the reading software can display the note.
A computer device includes a memory and a processor, the memory stores a computer program, and is characterized in that the processor implements a method for generating notes based on text recognition technology when the processor executes the computer program, and the text recognition technology-based Note generation methods, including:

Obtain designated pictures with handwritten text and printed text;

Using a preset picture similarity judgment method to judge whether the specified picture is similar to the picture previously obtained by the specified terminal;

If the designated picture is not similar to the picture previously obtained by the designated terminal, the handwritten text and printed text in the designated picture are recognized as handwritten text and printed text, respectively, by using a preset text recognition technology , And extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

The feature data is input into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text and is related to the pre-collected handwritten text. Trained on sample data composed of emotion categories associated with text;

Acquiring the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence relationship between the emotion category and the text typesetting type;

The printed text and the handwritten text are typeset according to the target text typesetting type to generate the note.
The computer device according to claim 15, wherein the step of determining whether the designated picture is similar to the picture previously obtained by the designated terminal by using a preset picture similarity judgment method comprises:

Performing gray-scale processing on the designated picture and the picture previously acquired by the designated terminal, respectively, to obtain a first gray-scale picture and a second gray-scale picture;

Calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale picture, and calculate the average value B of the gray values of all the pixels in the gray-scale picture;

According to the formula:
Calculate the overall variance of the m-th column or m-th row of the grayscale image
Where N is the total number of columns or rows in the grayscale picture;

According to the formula:
Obtain the difference between the overall variance of the m-th column or m-th row of the first gray-scale picture and the second gray-scale picture
among them,
Is the overall variance of the m-th column or m-th row of the first grayscale picture,
Is the overall variance of the m-th column or the m-th row of the second grayscale picture;

judgment
Whether it is less than the preset variance error threshold;

If
If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously acquired by the specified terminal.
The computer device according to claim 15, wherein the step of determining whether the designated picture is similar to the picture previously obtained by the designated terminal by using a preset picture similarity judgment method comprises:

Sequentially compare the corresponding pixels in the designated picture and the picture previously acquired by the designated terminal, and count the number of the same pixels;

According to the formula: the proportion of the same pixels=the number of the same pixels/the number of all the pixels in the specified picture, the proportion of the same pixels is obtained;

Judging whether the proportion of the same pixel points is greater than a preset proportion threshold;

If the proportion of the same pixel is greater than the preset proportion threshold, it is determined that the specified picture is similar to the picture previously obtained by the specified terminal.
A non-volatile computer-readable storage medium with a computer program stored thereon, wherein the computer program is characterized in that, when the computer program is executed by a processor, a method for generating notes based on text recognition technology is implemented, and the text recognition technology-based Note generation methods, including:

Obtain designated pictures with handwritten text and printed text;

Using a preset picture similarity judgment method to judge whether the specified picture is similar to the picture previously obtained by the specified terminal;

If the designated picture is not similar to the picture previously obtained by the designated terminal, the handwritten text and printed text in the designated picture are recognized as handwritten text and printed text, respectively, by using a preset text recognition technology , And extracting feature data of the handwritten text in the designated picture, wherein the feature data includes at least the repetition position and the number of repetitions in the handwritten text;

The feature data is input into the emotion recognition model trained based on the neural network model to obtain the predicted emotion category output by the emotion recognition model, wherein the emotion recognition model is based on pre-collected handwritten text and is related to the pre-collected handwritten text. Trained on sample data composed of emotion categories associated with text;

Acquiring the target text typesetting type corresponding to the predicted emotion type according to the preset correspondence relationship between the emotion category and the text typesetting type;

The printed text and the handwritten text are typeset according to the target text typesetting type to generate the note.
The non-volatile computer-readable storage medium according to claim 18, wherein the predetermined method for determining the similarity of pictures is used to determine whether the designated picture is the same as the picture previously acquired by the designated terminal. Similar steps include:

Performing gray-scale processing on the designated picture and the picture previously acquired by the designated terminal, respectively, to obtain a first gray-scale picture and a second gray-scale picture;

Calculate the average value Am of the gray values of all pixels in the m-th column or the m-th row of the gray-scale picture, and calculate the average value B of the gray values of all the pixels in the gray-scale picture;

According to the formula:
Calculate the overall variance of the m-th column or m-th row of the grayscale image
Where N is the total number of columns or rows in the grayscale picture;

According to the formula:
Obtain the difference between the overall variance of the m-th column or m-th row of the first gray-scale picture and the second gray-scale picture
among them,
Is the overall variance of the m-th column or m-th row of the first grayscale picture,
Is the overall variance of the m-th column or the m-th row of the second grayscale picture;

judgment
Whether it is less than the preset variance error threshold;

If
If it is less than the preset variance error threshold, it is determined that the specified picture is similar to the picture previously acquired by the specified terminal.
The non-volatile computer-readable storage medium according to claim 18, wherein the predetermined method for determining the similarity of pictures is used to determine whether the designated picture is the same as the picture previously acquired by the designated terminal. Similar steps include:

Sequentially compare the corresponding pixels in the designated picture and the picture previously acquired by the designated terminal, and count the number of the same pixels;

According to the formula: the proportion of the same pixels=the number of the same pixels/the number of all the pixels in the specified picture, the proportion of the same pixels is obtained;

Judging whether the proportion of the same pixel points is greater than a preset proportion threshold;

If the proportion of the same pixel is greater than the preset proportion threshold, it is determined that the specified picture is similar to the picture previously obtained by the specified terminal.