CN118116014A - Text detection method and device, electronic equipment and storage medium - Google Patents

Text detection method and device, electronic equipment and storage medium

Info

Publication number
CN118116014A
Authority
CN
China
Prior art keywords
stroke
target
character
text
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211486252.0A
Other languages
Chinese (zh)
Inventor
马咪娜
李忠利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Publication of CN118116014A publication Critical patent/CN118116014A/en
Pending legal-status Critical Current

Abstract

The application discloses a text detection method and device, an electronic device and a storage medium, wherein the text detection method includes: acquiring a character image to be detected, which is obtained by writing a target character based on electronic paper; performing stroke prediction processing on the character to be detected in the character image to be detected to obtain a stroke prediction result; extracting, based on the stroke prediction result, the stroke feature of each predicted stroke in a target stroke detection dimension to obtain the target stroke feature of each predicted stroke in the target stroke detection dimension; for each predicted stroke, acquiring the standard stroke feature of the standard stroke in the target stroke detection dimension and determining the stroke feature offset between the target stroke feature and the standard stroke feature; and outputting a stroke detection result of the predicted stroke in the target stroke detection dimension based on a comparison of the stroke feature offset of the predicted stroke with a target stroke dimension threshold. The application can accurately detect the normalization of strokes in handwritten characters while preserving a good writing experience for users.

Description

Text detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a text detection method, a text detection device, an electronic device, and a storage medium.
Background
With the development of computer technology, it is now possible to use an image recognition technology to recognize handwritten characters, for example, to collect a handwritten copybook image of a paper copybook by a camera of an electronic device, and then to recognize each handwritten character in the handwritten copybook image by using the image recognition technology.
However, the related art lacks finer-grained evaluation of the writing normalization of handwritten characters; for example, the normalization of strokes in a handwritten character cannot be accurately evaluated, so that the requirements of some services (such as education services) cannot be met.
Disclosure of Invention
In order to solve the problems in the prior art, the embodiment of the application provides a text detection method, a text detection device, electronic equipment and a storage medium. The technical scheme is as follows:
In one aspect, a text detection method is provided, including:
acquiring a character image to be detected corresponding to the target character; the character image to be detected comprises characters of a type to be detected, which are obtained by writing the target characters based on electronic paper;
carrying out stroke prediction processing on the text to be detected in the text image to be detected to obtain a stroke prediction result; the stroke prediction result indicates a stroke area corresponding to at least one predicted stroke in the text image to be detected and a stroke type of each predicted stroke;
Extracting the stroke characteristics of the target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain target stroke characteristics of each predicted stroke in the target stroke detection dimension;
for each predicted stroke, acquiring standard stroke characteristics of a standard stroke in the target stroke detection dimension, and determining stroke characteristic offset between target stroke characteristics of the predicted stroke and the standard stroke characteristics; the standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters;
Based on the comparison condition of the stroke characteristic offset corresponding to the predicted stroke and a target stroke dimension threshold, outputting a stroke detection result of the predicted stroke in the target stroke detection dimension; the target stroke dimension threshold is a stroke dimension threshold corresponding to the target stroke detection dimension.
In another aspect, a text detection device is provided, the device comprising:
the character image acquisition module to be detected is used for acquiring a character image to be detected corresponding to the target character; the character image to be detected comprises characters of a type to be detected, which are obtained by writing the target characters based on electronic paper;
The stroke prediction module is used for carrying out stroke prediction processing on the text to be detected in the text image to be detected to obtain a stroke prediction result; the stroke prediction result indicates a stroke area corresponding to at least one predicted stroke in the text image to be detected and a stroke type of each predicted stroke;
the stroke feature extraction module is used for extracting stroke features of a target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain target stroke features of each predicted stroke in the target stroke detection dimension;
The stroke characteristic offset determining module is used for obtaining standard stroke characteristics of standard strokes in the target stroke detection dimension for each predicted stroke and determining stroke characteristic offset between the target stroke characteristics of the predicted stroke and the standard stroke characteristics; the standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters;
The stroke detection result output module is used for outputting a stroke detection result of the predicted stroke in the target stroke detection dimension based on the comparison condition of the stroke characteristic offset corresponding to the predicted stroke and the target stroke dimension threshold; the target stroke dimension threshold is a stroke dimension threshold corresponding to the target stroke detection dimension.
In one exemplary embodiment, the target stroke detection dimensions include a stroke size dimension, a stroke position dimension, and a stroke angle dimension; the stroke feature extraction module comprises:
the first extraction module is used for extracting coordinate information of each pixel point in a stroke area corresponding to each predicted stroke;
The first stroke characteristic determining module is used for determining stroke size characteristics and stroke position characteristics of the predicted stroke based on the coordinate information of each pixel point;
And the second stroke characteristic determining module is used for determining the stroke writing direction of the predicted stroke based on the stroke type of the predicted stroke and determining the stroke angle characteristic of the predicted stroke based on the stroke writing direction and the coordinate information of each pixel point.
In an exemplary embodiment, the first stroke feature determination module includes:
The coordinate selection module is used for selecting a transverse maximum coordinate, a transverse minimum coordinate, a longitudinal maximum coordinate and a longitudinal minimum coordinate from the coordinate information of each pixel point;
The stroke size feature determining module is used for determining the stroke size feature of the predicted stroke based on the difference value between the transverse maximum coordinate and the transverse minimum coordinate and the difference value between the longitudinal maximum coordinate and the longitudinal minimum coordinate;
and the stroke position feature determining module is used for determining a horizontal average coordinate and a vertical average coordinate based on the coordinate information of each pixel point to obtain the stroke position feature of the predicted stroke.
In an exemplary embodiment, the second stroke feature determination module includes:
The coordinate determining module is used for determining stroke starting point coordinate information and stroke ending point coordinate information from the coordinate information of each pixel point based on the stroke writing direction;
The transverse length determining module is used for determining the longitudinal length of the predicted stroke based on the longitudinal coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information, and determining the transverse length of the predicted stroke based on the transverse coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information;
And the stroke angle characteristic determining module is used for determining an angle corresponding to the sine value as the stroke angle characteristic of the predicted stroke by taking the ratio of the longitudinal length to the transverse length as the sine value.
In an exemplary embodiment, the apparatus further comprises:
The character feature extraction module is used for extracting character features of the target character detection dimension of the character of the type to be detected based on the stroke prediction result to obtain the target character features of the character of the type to be detected in the target character detection dimension;
the standard character feature acquisition module is used for acquiring standard character features of the standard type characters in the target character detection dimension;
And the character detection result output module is used for outputting a character detection result of the character of the type to be detected in the target character detection dimension based on the character feature offset between the target character feature and the standard character feature.
In an exemplary embodiment, the target text detection dimension includes a text size dimension and a text position dimension; the text detection result output module comprises:
the target text area determining module is used for determining the sum of areas of stroke areas corresponding to all predicted strokes in the at least one predicted stroke to obtain the target text area of the text of the type to be detected; the target text area is used as a target text feature of the text size dimension;
the target pixel center point determining module is used for determining the target pixel center point of the character of the type to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in the at least one predicted stroke; and the target pixel center point is used as a target character feature of the character position dimension.
In an exemplary embodiment, the text detection result output module includes:
The first text detection result determining module is used for determining the distance between the target pixel center point and the standard pixel center point to obtain a distance offset, and determining a first text detection result of the text to be detected in the text position dimension based on the distance offset;
The area offset determining module is used for determining the area difference between the target text area and the standard text area to obtain an area offset;
The second text detection result determining module is used for determining a second text detection result of the text of the type to be detected in the text size dimension when the area offset is larger than an area offset threshold;
And the first output sub-module is used for outputting the first character detection result and the second character detection result as character detection results of the characters of the type to be detected.
In an exemplary embodiment, the target text detection dimension further includes a text width dimension; the text detection result output module further comprises:
the target character width determining module is used for determining the target character width of the character to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in the at least one predicted stroke when the area offset is smaller than or equal to the area offset threshold; the target character width is used as a target character feature of the character width dimension;
a third text detection result determining module, configured to determine a width difference between the width of the target text and the width of the standard text to obtain a width offset, and determine a third text detection result of the text to be detected in the text width dimension based on the width offset;
and the second output sub-module is used for outputting the first character detection result and the third character detection result as character detection results of the characters of the type to be detected.
In an exemplary embodiment, the apparatus further comprises:
A first stroke score determination module for determining, for each of the predicted strokes, a stroke score for the predicted stroke in the target stroke detection dimension based on a ratio of the stroke characteristic offset to a standard stroke characteristic for the corresponding standard stroke in the target stroke detection dimension;
The second stroke score determining module is used for averaging the stroke scores of all the predicted strokes in the target stroke detection dimension in the at least one predicted stroke to obtain the stroke score of the character of the type to be detected;
the first text score determining module is used for determining a first text score of the text of the type to be detected in the target text detection dimension based on the ratio between the text feature offset of the text of the type to be detected in the target text detection dimension and the standard text feature;
The second text score determining module is used for determining the image similarity between the text image to be detected and the standard type text image corresponding to the standard type text to obtain a second text score of the text of the type to be detected;
and the character score output module is used for outputting the character score of the character of the type to be detected based on the first character score, the second character score and the stroke score of the character of the type to be detected.
In an exemplary embodiment, the text image acquisition module to be detected includes:
The copybook writing image acquisition module is used for responding to writing submitting instructions aiming at the current copybook for handwriting, and acquiring images of the current copybook for handwriting to obtain copybook writing images; the copybook writing image comprises at least one character written on the current copybook for handwriting based on electronic paper, and each written character is positioned in a writing area of the current copybook for handwriting;
the image segmentation module is used for segmenting the copybook writing image according to the writing areas to obtain a plurality of writing area images;
The writing area image selecting module is used for selecting writing area images containing characters from the writing area images to obtain at least one target writing area image;
and the character image to be detected determining module is used for taking any target writing area image in the at least one target writing area image as the character image to be detected.
In another aspect, there is provided an electronic device including a processor and a memory, where at least one instruction or at least one program is stored in the memory, where the at least one instruction or the at least one program is loaded and executed by the processor to implement the text detection method of any of the above aspects.
In another aspect, a computer readable storage medium having at least one instruction or at least one program stored therein is provided, the at least one instruction or the at least one program loaded and executed by a processor to implement a text detection method as in any of the above aspects.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the text detection method of any of the above aspects.
According to the embodiment of the application, the stroke prediction result is obtained by carrying out stroke prediction processing on the character image to be detected of the target character, and stroke characteristic extraction in the target stroke detection dimension is carried out on each predicted stroke based on the stroke prediction result, so as to obtain the target stroke characteristic of each predicted stroke in the target stroke detection dimension. Further, for each predicted stroke, the standard stroke characteristic of the corresponding standard stroke in the target stroke detection dimension is obtained, the stroke characteristic offset between the target stroke characteristic and the standard stroke characteristic is determined, and the stroke detection result of the predicted stroke in the target stroke detection dimension is output based on the comparison condition of the stroke characteristic offset and the target stroke dimension threshold of the target stroke detection dimension.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 1b is an example, provided by an embodiment of the present application, of displaying a handwriting practicing copybook based on the electronic paper display technology of a terminal;
FIG. 2 is a schematic flow chart of a text detection method according to an embodiment of the present application;
FIG. 3 is a flowchart of another text detection method according to an embodiment of the present application;
FIG. 4 is an example of a stroke prediction process based on a stroke prediction model provided by an embodiment of the present application;
FIG. 5 is an example of a sample text image provided by an embodiment of the present application;
FIG. 6 is a flowchart of another text detection method according to an embodiment of the present application;
FIG. 7 is an example of a mapping relationship between predicted strokes and standard strokes in standard type text corresponding to a target text provided by an embodiment of the present application;
FIG. 8 is a flowchart of another text detection method according to an embodiment of the present application;
FIG. 9 is a flowchart of another text detection method according to an embodiment of the present application;
FIG. 10 is a flowchart of another text detection method according to an embodiment of the present application;
FIG. 11 is a final output example of text detection provided by an embodiment of the present application;
FIG. 12 is a block diagram of a text detection device according to an embodiment of the present application;
fig. 13 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
It will be appreciated that in the specific embodiments of the present application, related data such as user information is involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
Referring to fig. 1a, a schematic diagram of an implementation environment provided by an embodiment of the present application is shown, where the implementation environment includes a terminal 110 and a server 120, and communication between the terminal 110 and the server 120 may be through a wired or wireless network connection.
The terminal 110 is based on an electronic paper display technology; the electronic paper is a black-and-white ink screen that can be written on, and its writing experience is similar to that of paper. The terminal 110 includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, and the like. The terminal 110 is installed with client software having a text detection function, such as an application (App), which may be a stand-alone application or a subroutine in another application. The text detection function specifically includes detecting the normalization of strokes in handwritten characters and outputting a stroke detection result; for example, the stroke detection result may indicate whether the size, position, and angle of a stroke are normalized, and the like.
By way of example, the above-described application installed in the terminal 110 may be a handwriting practicing copybook application, which may include a handwriting practicing copybook including a copybook and a writing area corresponding to the copybook, which may be a field, a square, a trapezoid, or the like. The handwriting practicing copybook application may display the handwriting practicing copybook through the electronic paper display technology of the terminal 110, and further, the user may write in the writing area of the handwriting practicing copybook based on the electronic paper, and the handwriting practicing copybook application may detect the stroke normalization of the characters written in the handwriting practicing copybook based on the character detection function thereof.
Fig. 1b shows an example of displaying a handwriting practicing copybook based on the electronic paper display technology of the terminal, where "good, learning, heaven" are the copybook characters in the handwriting practicing copybook, and the field-character grids on the same line as each copybook character are the writing areas corresponding to that copybook character. The display interface of the handwriting practicing copybook is also provided with a "submit" control; the user can send a writing submission instruction for the current handwriting practicing copybook by triggering the "submit" control, so that the terminal 110 can, in response to the writing submission instruction, detect the written characters in the current handwriting practicing copybook, including stroke normalization detection within each character. The specific character detection process will be described in detail in the subsequent content of the embodiments of the application.
The server 120 may provide background services for the applications in the terminal 110, such as data storage and data processing services. The server 120 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
In an exemplary embodiment, the terminal 110 and the server 120 may be node devices in the blockchain system, and may share the acquired and generated information to other node devices in the blockchain system, so as to implement information sharing between multiple node devices. The plurality of node devices in the blockchain system can be configured with the same blockchain, the blockchain consists of a plurality of blocks, and the blocks adjacent to each other in front and back have an association relationship, so that the data in any block can be detected through the next block when being tampered, thereby avoiding the data in the blockchain from being tampered, and ensuring the safety and reliability of the data in the blockchain.
The embodiment of the application can be applied to various scenes, including but not limited to cloud technology, artificial intelligence, intelligent transportation, auxiliary driving and the like.
Cloud technology refers to a hosting technology that integrates hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied under the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support: the background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, every article may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system back-end support, which can only be realized through cloud computing.
Referring to fig. 2, a flow chart of a text detection method according to an embodiment of the application is shown; the method can be applied to the terminal shown in fig. 1a. It is noted that the present specification provides method operation steps as described in the examples or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. In an actual system or product execution, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment). As shown in fig. 2, the method may include:
s201, obtaining a character image to be detected corresponding to the target character.
The character image to be detected comprises characters of the type to be detected, which are obtained by writing the target characters based on electronic paper.
It should be noted that, in the embodiment of the present application, the character image to be detected is an image of a single character. The characters concerned take strokes as their basic components, and in particular are characters formed of distinguishable, discontinuous strokes; for example, the characters may include Chinese characters, Japanese characters, kana, Korean, Burmese, Thai letters, and the like.
In an exemplary embodiment, the step S201 may include the following steps in fig. 3 when implemented:
s301, responding to a writing submitting instruction aiming at the current handwriting practicing copybook, and acquiring an image of the current handwriting practicing copybook to obtain a copybook writing image.
The copybook writing image comprises at least one character written on the current copybook for handwriting based on electronic paper, and each written character is positioned in one writing area of the current copybook for handwriting, namely, only a single character can be contained in one writing area.
Specifically, a terminal based on an electronic paper display technology can display a handwriting practicing copybook (as shown in fig. 1b), and position coordinates are set for each writing area in the handwriting practicing copybook. A user of the terminal can write in the writing areas of the currently displayed handwriting practicing copybook, each writing area being used for writing a single character. After writing is completed, the user can send a writing submission instruction for the current handwriting practicing copybook by triggering a writing submission control, and the terminal, in response to the writing submission instruction for the current handwriting practicing copybook, calls an image acquisition device to acquire an image of the current handwriting practicing copybook, so as to obtain the collected copybook writing image. It can be appreciated that the copybook writing image can include at least one character written by the user on the current copybook based on electronic paper.
S303, cutting the copybook writing image according to the writing areas to obtain a plurality of writing area images.
In a specific implementation, when displaying the handwriting practicing copybook, the terminal can automatically generate corresponding coordinates for each writing area (such as each field-character grid) in the handwriting practicing copybook; the copybook writing image can then be segmented according to the position coordinates of each writing area, so as to obtain a plurality of writing area images.
S305, selecting a writing area image containing characters from the plurality of writing area images to obtain at least one target writing area image.
It will be appreciated that the user may have written in only some of the writing areas of the handwriting practicing copybook, and therefore writing area images containing text may need to be selected from the plurality of segmented writing area images to obtain at least one target writing area image.
S307, taking any one of the at least one target writing area image as a character image to be detected.
It can be understood that the target text corresponding to the text image to be detected is the copybook text associated, in the handwriting practicing copybook, with the writing area corresponding to the corresponding target writing area image. Taking the example shown in fig. 1b, the copybook text associated with the writing areas of the first row of the handwriting practicing copybook is "good", and the copybook text associated with the writing areas of the third row is "learning".
According to the above embodiment, in response to the writing submission instruction for the current handwriting practicing copybook, the single-character images written in the current handwriting practicing copybook are automatically collected and segmented, and any one of the single-character images can be used as a character image to be detected, so that the character detection efficiency for the characters written in the current handwriting practicing copybook can be improved.
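For illustration only, the following is a minimal sketch of steps S303 to S307 above, assuming the terminal already knows the pixel coordinates of every writing area in the captured copybook writing image; the function names and the ink-pixel test used to decide whether a writing area contains a character are assumptions, not details taken from this application.

```python
import cv2
import numpy as np

def split_writing_areas(copybook_image, cell_boxes, ink_threshold=200, min_ink_pixels=50):
    """Crop each writing area (S303) and keep only the crops that contain writing (S305)."""
    target_area_images = []
    for (x1, y1, x2, y2) in cell_boxes:                  # known coordinates of each writing area
        crop = copybook_image[y1:y2, x1:x2]
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        ink_pixels = int(np.sum(gray < ink_threshold))   # count dark ("ink") pixels
        if ink_pixels >= min_ink_pixels:                 # this writing area actually contains a character
            target_area_images.append(crop)
    # any element of target_area_images can be used as a character image to be detected (S307)
    return target_area_images
```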
S203, carrying out stroke prediction processing on the text to be detected in the text image to be detected, and obtaining a stroke prediction result.
The stroke prediction result indicates a stroke area corresponding to at least one predicted stroke in the text image to be detected and a stroke type of each predicted stroke. For example, stroke types may include horizontal, vertical, dot, left-falling, right-falling, and so forth.
In a specific implementation, the stroke prediction processing can be implemented based on a pre-trained stroke prediction model, and the output stroke prediction result is obtained by inputting the text image to be detected into the stroke prediction model for stroke prediction processing.
The stroke prediction model may be an instance segmentation model based on a deep neural network; for example, the stroke prediction model may be Mask R-CNN. By inputting the text image to be detected into the Mask R-CNN, at least one stroke area (one predicted stroke for each stroke area) of the text in the text image to be detected and the stroke type of each predicted stroke are predicted.
As shown in fig. 4, which is an example of stroke prediction processing based on the stroke prediction model, a to-be-detected text image containing the to-be-detected text "corresponding" is input into the stroke prediction model. The stroke prediction model first performs feature extraction on the to-be-detected text image with a feature extraction network (such as a residual network 101) to obtain a feature map, and determines a plurality of candidate regions of interest based on the feature map. The plurality of candidate regions of interest are then input into a Region Proposal Network (RPN) for binary classification and candidate-box regression, so as to filter out part of the candidate regions of interest and obtain a plurality of target candidate regions of interest. An ROIAlign operation is then performed on each target candidate region of interest (that is, the pixels of the to-be-detected text image are first made to correspond to the feature map, and the feature map is then made to correspond to a fixed-size output feature map) to obtain fixed-size candidate region-of-interest feature maps. Finally, these feature maps pass through several fully connected layers (such as 2 fully connected layers) and are respectively fed into a stroke classification layer and a stroke region regression layer, where the stroke classification layer is used to determine the stroke type of the stroke in each target candidate region of interest and the stroke region regression layer is used to regress the position of the corresponding stroke region; stroke segmentation processing is further performed on each target candidate region of interest, so that at least one segmented stroke area and the stroke type of each predicted stroke are obtained as the stroke prediction result.
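As a rough illustration of this stroke prediction step, the sketch below uses a torchvision-style Mask R-CNN; the application describes a ResNet-101 backbone, while the off-the-shelf ResNet-50 FPN variant, the number of stroke classes and the score threshold used here are assumptions made only for brevity.

```python
import torch
import torchvision

NUM_STROKE_TYPES = 33   # assumed number of stroke classes; "+ 1" below is the background class

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=NUM_STROKE_TYPES + 1)
model.eval()

def predict_strokes(char_image, score_threshold=0.5):
    """char_image: float tensor (C, H, W); returns (binary stroke mask, stroke type) pairs."""
    with torch.no_grad():
        output = model([char_image])[0]          # dict with "boxes", "labels", "scores", "masks"
    strokes = []
    for mask, label, score in zip(output["masks"], output["labels"], output["scores"]):
        if score >= score_threshold:
            stroke_area = mask[0] > 0.5          # segmented stroke region
            strokes.append((stroke_area, int(label)))
    return strokes                               # the stroke prediction result
```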
Based on this, in one possible implementation, a training process for the stroke prediction model may also be included. A sample character image set is usually prepared before training, where the sample character images in the sample character image set are images of really written (e.g., handwritten) single characters collected by a terminal based on an electronic paper display technology, and each sample character image corresponds to labeling information, the labeling information including labeled stroke areas and the labeled stroke type of the stroke corresponding to each labeled stroke area. Fig. 5 shows an example of a sample character image provided by an embodiment of the present application, where each rectangular region represents the labeled stroke area corresponding to one stroke of the character, and different stroke types are labeled differently, such as the numeric labels in fig. 5: the stroke type "horizontal" is labeled with the numeral 7, the stroke type "dot" is labeled with the numeral 17, and so on.
After the sample character image set is prepared, instance segmentation training can be performed on a preset deep neural network model (such as Mask R-CNN) based on the sample character image set to obtain the stroke prediction model. The loss function in the instance segmentation training process comprises a classification loss, a regression loss and a mask loss; the total loss is calculated from the classification loss, the regression loss and the mask loss, and the model parameters of the preset deep neural network model are then adjusted by back-propagation based on the total loss until a preset training end condition is reached (for example, the number of iterations reaches a preset iteration threshold, or the total loss reaches a preset minimum loss threshold). Training then ends, and the preset deep neural network model with the model parameters at the end of training is used as the stroke prediction model.
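A sketch of one iteration of the instance segmentation training described above is given below, again assuming a torchvision-style Mask R-CNN whose forward pass in training mode returns the classification, regression and mask losses (plus RPN losses) that are summed into the total loss; the dataset loading, optimizer settings and stopping condition are omitted or assumed.

```python
import torch
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=34)  # 33 stroke types + background (assumed)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    """targets: list of dicts with labeled stroke "boxes", "labels" (stroke types) and "masks"."""
    model.train()
    loss_dict = model(images, targets)       # classification, regression, mask (and RPN) losses
    total_loss = sum(loss for loss in loss_dict.values())
    optimizer.zero_grad()
    total_loss.backward()                    # back-propagate the total loss to adjust model parameters
    optimizer.step()
    return float(total_loss)
```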
By learning to identify stroke types and segment stroke areas on a large amount of labeled real writing data, the stroke prediction model can extract strokes accurately and completely, ensuring the accuracy of the subsequent in-character stroke detection; meanwhile, because the stroke prediction model learns stroke features from real written character images, it can better handle the various fonts appearing in real user scenarios, ensuring the generalization and robustness of the overall technical method.
S205, extracting the stroke characteristics of the target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain the target stroke characteristics of each predicted stroke in the target stroke detection dimension.
The target stroke detection dimension can be set according to actual needs. In order to improve accuracy of a subsequent stroke detection result, the target stroke detection dimension comprises a stroke size dimension, a stroke position dimension and a stroke angle dimension, so that stroke characteristics of a predicted stroke are extracted from multiple dimensions.
Based on this, in an exemplary embodiment, as shown in fig. 6, the step S205 described above may be implemented to perform the following steps for each predicted stroke:
S601, extracting coordinate information of each pixel point in a stroke area corresponding to the predicted stroke.
Specifically, the coordinate information of the pixel points in the stroke area corresponding to the predicted stroke in the text image to be detected is extracted, so that a pixel point set corresponding to the predicted stroke can be obtained, in which the pixel points are represented by their coordinate information. For example, the pixel point set corresponding to a certain predicted stroke can be expressed as S = {(x_i, y_i)}, i ∈ [1, N], where (x_i, y_i) represents the coordinate information of pixel point i in the text image to be detected, and N is the total number of elements in the pixel point set.
S603, determining stroke size characteristics and stroke position characteristics of the predicted stroke based on the coordinate information of each pixel point.
Specifically, the stroke size feature and the stroke position feature of the corresponding predicted stroke may be determined based on the above-described pixel point set S, respectively.
In an exemplary embodiment, when determining the stroke size feature of the predicted stroke based on the coordinate information of each pixel point corresponding to the predicted stroke, the horizontal maximum coordinate, the horizontal minimum coordinate, the vertical maximum coordinate and the vertical minimum coordinate may be selected from the coordinate information of each pixel point; further, a stroke size characteristic of the predicted stroke is determined based on a difference between the lateral maximum coordinate and the lateral minimum coordinate and a difference between the longitudinal maximum coordinate and the longitudinal minimum coordinate.
In a specific implementation, denoting the horizontal maximum coordinate x_max, the horizontal minimum coordinate x_min, the vertical maximum coordinate y_max and the vertical minimum coordinate y_min together as {x_max, x_min, y_max, y_min}, the stroke size feature F_s(S) of the predicted stroke S can be expressed as the following formula (1):

F_s(S) = (x_max − x_min, y_max − y_min)    (1)
When determining the stroke position feature of the predicted stroke based on the coordinate information of each pixel point corresponding to the predicted stroke, the horizontal average coordinate and the vertical average coordinate may be determined based on the coordinate information of each pixel point corresponding to the predicted stroke, so as to obtain the stroke position feature of the predicted stroke.
In a specific implementation, the point indicated by the horizontal average coordinate and the vertical average coordinate is the center point of the pixel point set corresponding to the predicted stroke, that is, the stroke position feature F_p(S) of the predicted stroke S can be expressed as the following formula (2):

F_p(S) = (x̄, ȳ),  x̄ = (1/N) Σ_{i=1..N} x_i,  ȳ = (1/N) Σ_{i=1..N} y_i    (2)

where x̄ represents the transverse (horizontal) average coordinate and ȳ represents the longitudinal (vertical) average coordinate.
S605, determining the stroke writing direction of the predicted stroke based on the stroke type of the predicted stroke, and determining the stroke angle characteristic of the predicted stroke based on the stroke writing direction and the coordinate information of each pixel point.
Because different stroke types correspond to different stroke writing directions, for example, the stroke type of transverse is the left-right direction, the stroke type of vertical is the up-down direction, and the stroke writing direction of the predicted stroke can be determined based on the corresponding relation between the stroke type and the stroke writing direction after the stroke type of the predicted stroke is obtained.
Further, the stroke angle characteristics of the predicted stroke can be determined by combining the stroke writing direction of the predicted stroke and the pixel point set of the predicted stroke.
In an exemplary embodiment, when determining the stroke angle characteristic of the predicted stroke, the stroke start point coordinate information and the stroke end point coordinate information may be determined from the coordinate information of each pixel point corresponding to the predicted stroke based on the stroke writing direction of the predicted stroke; then, based on the longitudinal coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information, determining the longitudinal length of the predicted stroke, and based on the transverse coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information, determining the transverse length of the predicted stroke; and finally, determining the angle corresponding to the sine value as the stroke angle characteristic of the predicted stroke by taking the ratio of the longitudinal length to the transverse length as the sine value.
Illustratively, the stroke angle characteristic F_a(S) of the predicted stroke S may be expressed as the following formula (3):

F_a(S) = arcsin( (y_end − y_begin) / (x_end − x_begin) )    (3)

where (x_begin, y_begin) represents the stroke start point coordinate information, (x_end, y_end) represents the stroke end point coordinate information, y_end − y_begin represents the longitudinal length, and x_end − x_begin represents the transverse length.
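A minimal sketch of formulas (1) to (3) follows, assuming each predicted stroke is given as an array of (x, y) pixel coordinates of its stroke area and that the stroke writing direction has already been used to pick out the start and end points; the clamping of the ratio into [-1, 1] and the degree unit are assumptions added only so the example runs.

```python
import numpy as np

def stroke_features(pixel_coords, start_point, end_point):
    """pixel_coords: (N, 2) array of (x, y); returns stroke size, position and angle features."""
    xs, ys = pixel_coords[:, 0], pixel_coords[:, 1]
    # formula (1): size from the horizontal and vertical coordinate spans
    size_feature = (xs.max() - xs.min(), ys.max() - ys.min())
    # formula (2): position as the mean pixel coordinate (the stroke center point)
    position_feature = (xs.mean(), ys.mean())
    # formula (3): angle whose sine is the ratio of longitudinal length to transverse length
    (x_begin, y_begin), (x_end, y_end) = start_point, end_point
    ratio = (y_end - y_begin) / (x_end - x_begin) if x_end != x_begin else 1.0
    ratio = max(-1.0, min(1.0, ratio))       # keep arcsin defined (assumption)
    angle_feature = float(np.degrees(np.arcsin(ratio)))
    return size_feature, position_feature, angle_feature
```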
In the embodiment, for each predicted stroke, the stroke characteristics (including the stroke size characteristics, the stroke position characteristics and the stroke angle characteristics) of the predicted stroke with fine granularity and multiple dimensions are determined from the stroke size dimensions, the stroke position dimensions and the stroke angle dimensions, so that the follow-up output of the stroke detection results with fine granularity and multiple dimensions is facilitated, and the accuracy of character detection is further improved.
S207, for each predicted stroke, acquiring standard stroke characteristics of the standard stroke in a target stroke detection dimension, and determining stroke characteristic offset between the target stroke characteristics of the predicted stroke and the standard stroke characteristics.
The standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters. The stroke feature offset may be an absolute value of a difference between the target stroke feature and the corresponding standard stroke feature.
Specifically, a standard text library may be maintained, where the standard text library is used to store each standard type of text; the standard type may be set based on actual needs. For example, for Chinese characters, regular script may be used as the standard type, or a Song typeface or a bold (Hei) typeface may be used as the standard type. For example, for the Chinese character "answer", the regular-script form of the character "answer" can be stored in the standard text library as the standard type text.
For each standard type text in the standard text library, the stroke features of each standard stroke of that text in the target stroke detection dimension may be maintained, where the target stroke detection dimension may include the stroke size dimension, the stroke position dimension and the stroke angle dimension described in the foregoing embodiments of the present application. For example, {S′, (F_s(S′), F_p(S′), F_a(S′))} may be stored for each standard type text, where S′ represents a standard stroke of the standard type text, F_s(S′) represents the standard stroke size feature of the standard stroke S′, F_p(S′) represents the standard stroke position feature of the standard stroke S′, and F_a(S′) represents the standard stroke angle feature of the standard stroke S′.
Based on this, for each standard type text, the stroke extraction may be performed in advance based on the corresponding standard type text image to obtain the stroke area of at least one standard stroke of each standard type text in the corresponding standard type text image and the stroke type of each standard stroke, and further, the stroke characteristics of each standard stroke in the target stroke detection dimension may be determined by referring to step S601 to step S605 in the foregoing method embodiment shown in fig. 6 according to the embodiment of the present application to obtain the standard stroke characteristics.
Specifically, when the target stroke detection dimension includes a stroke size dimension, a stroke position dimension and a stroke angle dimension, the standard stroke features obtained with reference to steps S601 to S605 in the embodiment shown in fig. 6 may include a standard stroke size feature, a standard stroke position feature and a standard stroke angle feature. Determining the stroke characteristic offset between the target stroke characteristic and the standard stroke characteristic of the predicted stroke in step S207 described above then includes: determining a stroke size feature offset between the stroke size feature of the predicted stroke and the standard stroke size feature; determining a stroke position feature offset between the stroke position feature of the predicted stroke and the standard stroke position feature; and determining a stroke angle feature offset between the stroke angle feature of the predicted stroke and the standard stroke angle feature.
In a specific implementation, in order to obtain accurate stroke types and stroke areas of all strokes in standard type characters, the stroke areas of all standard type character images can be marked in a manual marking mode, and the stroke types of the strokes corresponding to all stroke areas are marked. In order to improve efficiency, the trained stroke prediction model according to the embodiment of the application can be used for carrying out stroke prediction processing on the standard type text image so as to output the stroke area of at least one standard stroke in the corresponding standard type text and the stroke type of each standard stroke.
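Purely as an illustration, the standard text library described above might be organized as follows; the dictionary layout, the key names and all example feature values are assumptions.

```python
# {standard type text: [per-stroke record of (stroke type, F_s(S'), F_p(S'), F_a(S'))]}
STANDARD_TEXT_LIBRARY = {
    "answer": [
        {"stroke_type": "left-falling", "size": (18.0, 30.0), "position": (40.0, 24.0), "angle": -55.0},
        {"stroke_type": "horizontal",   "size": (52.0,  4.0), "position": (64.0, 52.0), "angle":   2.0},
        # ... one record per standard stroke of the standard type text
    ],
}

def get_standard_stroke_features(target_text, standard_stroke_index):
    """Look up the standard stroke features of the stroke matched with a predicted stroke."""
    return STANDARD_TEXT_LIBRARY[target_text][standard_stroke_index]
```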
When the step S207 is implemented, the standard type text corresponding to the target text can be searched from the standard text library based on the target text, and the standard strokes matched with each predicted stroke in the standard type text can be determined, so that the standard stroke characteristics of the standard strokes matched with each predicted stroke in the target stroke detection dimension can be obtained.
The determination of the standard strokes matched with each predicted stroke in the standard type text corresponding to the target text can be regarded as a process of mapping the obtained predicted strokes to the standard strokes in the standard type text. In a specific implementation, a distance matrix C from the predicted strokes to the standard strokes may be calculated, where element C_{i,j} of the distance matrix C represents the distance from predicted stroke i to standard stroke j. This distance may be calculated as the distance between the center point position of predicted stroke i and the center point position of standard stroke j, and may be a Euclidean distance, a cosine distance, a Manhattan distance, or the like. For example, the Euclidean distance between the stroke position feature F_p(i) of predicted stroke i and the standard stroke position feature F_p(j) of standard stroke j may be calculated as the value of C_{i,j}.
After the distance matrix C is obtained, the mapping process may be converted into a minimum-weight matching problem. Specifically, a matching matrix X is initialized, where element X_{i,j} of the matching matrix X indicates whether predicted stroke i matches standard stroke j; typically X is a 0/1 matrix, with 0 indicating that predicted stroke i does not match standard stroke j and 1 indicating that it does. The optimization objective of the minimum-weight matching problem can then be expressed as: min Σ_{i,j} C_{i,j} · X_{i,j}. Solving this optimization objective yields a target matching matrix X, and the values of the elements in the target matching matrix then give the mapping relationship between each predicted stroke and the standard strokes in the standard type text corresponding to the target text, as shown in fig. 7, which is an example of such a mapping relationship; the standard stroke matched with each predicted stroke can be obtained through this mapping relationship. For example, the above optimization objective may be solved by a matching algorithm such as the Hungarian matching algorithm or the Jonker-Volgenant algorithm.
Based on this, before the implementation of the step S207, the method may further include: for each predicted stroke, determining the distance between the predicted stroke and each standard stroke in the standard type text corresponding to the target text; generating a distance matrix based on the distance between each predicted stroke and each standard stroke; constructing an initialization matching matrix, wherein elements in the initialization matching matrix represent whether the predicted strokes are matched with standard strokes in standard type characters corresponding to the target characters; determining a target matching matrix based on the sum of the products of the elements in the distance matrix and the elements in the initialization matching matrix; wherein each element in the target matching matrix minimizes the sum.
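The stroke mapping just described can be sketched as follows, building the distance matrix C from the stroke position features and solving the minimum-weight matching with SciPy's linear_sum_assignment (a Hungarian-style solver, i.e. one of the matching algorithms mentioned above); using only the position features as the distance is the example choice described above, not the only possibility.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_strokes(predicted_positions, standard_positions):
    """Arguments are lists of (x, y) stroke position features; returns {predicted i: standard j}."""
    P = np.asarray(predicted_positions, dtype=float)
    Q = np.asarray(standard_positions, dtype=float)
    # C[i, j] = Euclidean distance between predicted stroke i and standard stroke j
    C = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(C)    # minimizes the sum over i, j of C[i, j] * X[i, j]
    return dict(zip(rows.tolist(), cols.tolist()))
```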
S209, based on the comparison condition of the stroke characteristic offset corresponding to the predicted stroke and the target stroke dimension threshold, outputting a stroke detection result of the predicted stroke in the target stroke detection dimension.
The target stroke dimension threshold is a stroke dimension threshold corresponding to a target stroke detection dimension, and the target stroke dimension threshold can be set based on actual experience.
The stroke detection result of the predicted stroke in the target stroke detection dimension may include a textual, descriptive detection result, such as a stroke comment.
In a specific implementation, for the target stroke detection dimension, stroke detection results of a textual descriptive nature can be preset in combination with the target stroke dimension threshold; for example, a corresponding stroke detection result 1 is output when the target stroke dimension threshold is exceeded, and a corresponding stroke detection result 2 is output when it is not exceeded. Then, after the stroke feature offset of the predicted stroke is obtained, the stroke feature offset is compared with the target stroke dimension threshold: if the stroke feature offset exceeds the target stroke dimension threshold, stroke detection result 1 is output, and if it does not exceed the target stroke dimension threshold, the corresponding stroke detection result 2 is output. It can be appreciated that which stroke detection result is output under which comparison condition may be set according to the actual situation.
When the target stroke detection dimension includes a stroke size dimension, a stroke position dimension and a stroke angle dimension, stroke detection results for the three dimensions may be output for each predicted stroke. For example, denote the target stroke dimension thresholds as {T_s, T_p, T_a}, where T_s represents the stroke size dimension threshold, T_p represents the stroke position dimension threshold and T_a represents the stroke angle dimension threshold. Assuming that predicted stroke A matches standard stroke B, |F_s(A) − F_s(B)| represents the stroke size feature offset, |F_p(A) − F_p(B)| represents the stroke position feature offset, and |F_a(A) − F_a(B)| represents the stroke angle feature offset. A stroke comment (i.e., stroke detection result) may then be output for predicted stroke A according to the following strategy (an illustrative sketch follows the list):
(1) If |F_s(A) − F_s(B)| > T_s, a stroke size dimension comment is output (bigger/smaller/shorter);
(2) If |F_p(A) − F_p(B)| > T_p, a stroke position dimension comment is output (up/left/down/right);
(3) If |F_a(A) − F_a(B)| > T_a, a stroke angle dimension comment is output (not straight/offset).
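A sketch of the comment strategy (1) to (3) is given below; the threshold values, the comment wording and the way the multi-component size and position offsets are reduced to single numbers are assumptions.

```python
def stroke_comments(pred, std, T_s=8.0, T_p=6.0, T_a=10.0):
    """pred/std: dicts with "size" (w, h), "position" (x, y) and "angle" features of matched strokes."""
    comments = []
    size_offset = abs(pred["size"][0] - std["size"][0]) + abs(pred["size"][1] - std["size"][1])
    if size_offset > T_s:                                        # strategy (1)
        comments.append("stroke size comment: bigger/smaller/shorter than the standard stroke")
    pos_offset = ((pred["position"][0] - std["position"][0]) ** 2 +
                  (pred["position"][1] - std["position"][1]) ** 2) ** 0.5
    if pos_offset > T_p:                                         # strategy (2)
        comments.append("stroke position comment: too far up/left/down/right")
    if abs(pred["angle"] - std["angle"]) > T_a:                  # strategy (3)
        comments.append("stroke angle comment: not straight/offset")
    return comments                                              # stroke detection results
```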
The technical solution of the embodiment of the application can accurately detect the normalization of strokes in handwritten characters while preserving the user's writing experience, thereby meeting the requirements of related services; in addition, fine-grained multidimensional stroke comments can be output, which improves the accuracy of character detection.
In an exemplary implementation, on the basis of outputting the stroke detection result of each predicted stroke, the embodiment of the application can further output a text detection result for the whole text, so as to realize comprehensive and accurate text detection. Specifically, as shown in fig. 8, after obtaining the stroke detection result of each predicted stroke, the method may further include:
S801, extracting character features of the target character detection dimension of the character of the type to be detected based on the stroke prediction result, and obtaining the target character features of the character of the type to be detected in the target character detection dimension.
The target text detection dimension can be set based on actual needs; in general, text detection in multiple dimensions yields finer-grained and more accurate text detection results.
S803, obtaining standard character features of standard type characters corresponding to the target characters in the detection dimension of the target characters.
Specifically, for each standard type character in the standard character library, its standard character features in the target character detection dimension can be prepared in advance, so that they can be directly retrieved and used, which improves character detection efficiency. The standard character features of a standard type character in the target character detection dimension are determined in the same way as the target character features of the character of the type to be detected, which will be described later.
S805, based on the word feature offset between the target word feature and the standard word feature, outputting a word detection result of the word of the type to be detected in the target word detection dimension.
Specifically, the text detection result of the text of the type to be detected in the target text detection dimension may include a descriptive detection result, such as a text comment.
In a specific implementation, for the target text detection dimension, text detection results of a descriptive nature may be preset in combination with a target text dimension threshold, for example, text detection result 1 corresponds to the text feature offset exceeding the target text dimension threshold, and text detection result 2 corresponds to the offset not exceeding it. Then, after the text feature offset is obtained, it is compared with the target text dimension threshold: if the offset exceeds the threshold, text detection result 1 is output; otherwise, text detection result 2 is output. It can be understood that which text detection result is output for which comparison outcome may be set according to the actual situation.
In an exemplary embodiment, as shown in fig. 9, the target text detection dimension may include a text size dimension and a text position dimension, and then the step S801 may include, when implemented:
S901, determining the sum of areas of stroke areas corresponding to all predicted strokes in at least one predicted stroke, and obtaining the target text area of the text of the type to be detected.
The target text area is used as a target text feature of a text size dimension.
S903, determining a target pixel center point of the text of the type to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in at least one predicted stroke.
The target pixel center point is used as a target character feature of a character position dimension.
Specifically, the coordinates of the pixel points in the stroke areas corresponding to the predicted strokes in the at least one predicted stroke are averaged, which gives the coordinates of the target pixel center point of the text of the type to be detected.
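As a hedged sketch, assuming each predicted stroke's area is available as a binary mask over the image (one plausible form of the stroke prediction result), the target text area and the target pixel center point can be computed as follows.

```python
# Hedged sketch: compute the whole-character features of steps S901 and S903
# from per-stroke binary masks (an assumed representation of the stroke areas).
import numpy as np

def char_area_and_center(stroke_masks):
    """stroke_masks: list of 2-D boolean arrays of the same shape,
    one mask per predicted stroke (True inside the stroke area)."""
    # S901: target character area = sum of the areas of all stroke regions.
    area = sum(int(mask.sum()) for mask in stroke_masks)
    # S903: target pixel center = mean coordinate of all stroke pixels.
    ys = np.concatenate([np.nonzero(mask)[0] for mask in stroke_masks])
    xs = np.concatenate([np.nonzero(mask)[1] for mask in stroke_masks])
    center = (float(xs.mean()), float(ys.mean()))
    return area, center
```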
Accordingly, the standard character features acquired in step S803 include a standard pixel center point and a standard character area, and may further include a standard character width corresponding to the target character width.
Further, the step S805 may include the following steps in fig. 9 when implemented:
S905, determining the distance between the target pixel center point and the standard pixel center point to obtain a distance offset, and determining a first character detection result of the character to be detected in the character position dimension based on the distance offset.
The standard pixel center point is the pixel center point of the standard type text corresponding to the target text; its determination may refer to step S903, i.e., it can be obtained by averaging the coordinates of the pixel points in the stroke area corresponding to each standard stroke in the at least one standard stroke of the standard type text.
The distance between the target pixel center point and the standard pixel center point may be a euclidean distance, a manhattan distance, or the like, and taking the euclidean distance as an example, the distance offset is the euclidean distance between the target pixel center point and the standard pixel center point.
In a specific implementation, when determining a first text detection result of a text of a type to be detected in a text position dimension based on a distance offset, a text position dimension threshold may be set, and then the distance offset is compared with the text position dimension threshold, so as to output the first text detection result according to the comparison result, where the first text detection result may be a detection result of a preset text descriptive property, such as a text position comment.
For example, where D represents the distance offset and Tpos represents the text position dimension threshold, if D > Tpos, a text position dimension comment (up/left/down/right) may be output.
S907, determining the area difference between the target text area and the standard text area to obtain the area offset.
The standard text area is the text area of the standard type text corresponding to the target text; its determination may refer to step S901, i.e., the standard text area can be obtained by summing the areas of the stroke areas corresponding to the standard strokes in the at least one standard stroke of the standard type text.
When AreaH is used to represent the target text area of the text of the type to be detected and AreaK is used to represent the standard text area of the corresponding standard type text, the area offset may be represented as |AreaH - AreaK|.
S909, determining whether the area offset is larger than an area offset threshold.
Specifically, if the area offset is greater than the area offset threshold, the following steps S911 to S913 may be performed; otherwise, if the area offset is less than or equal to the area offset threshold and the target text detection dimension further includes a text width dimension, steps S915 to S919 may be performed.
The area offset threshold may be set based on actual needs.
S911, determining a second character detection result of the character of the type to be detected in the character size dimension.
Specifically, the second text detection result may be a detection result of a preset text descriptive property, such as a text size comment.
For example, with the area offset threshold denoted by Tarea, if |AreaH - AreaK| > Tarea, a text size dimension comment (larger/smaller) can be output.
And S913, outputting the first character detection result and the second character detection result as character detection results of the characters of the type to be detected.
S915, determining the target character width of the character of the type to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in at least one predicted stroke.
The target character width is used as the target character feature of the character width dimension.
In a specific implementation, the minimum and maximum transverse coordinates can be selected from the pixel coordinates, and the target character width of the character of the type to be detected is then obtained as the difference between the two.
S917, determining a width difference value between the width of the target character and the width of the standard character to obtain a width offset, and determining a third character detection result of the character to be detected in the character width dimension based on the width offset.
The standard character width is the character width of the standard type character corresponding to the target character, and it is determined in the same way as the target character width.
If WidthH is used to represent the target character width and WidthK is used to represent the standard character width, the width offset may be represented as |WidthH - WidthK|.
When a third character detection result of the character of the type to be detected in the character width dimension is determined based on the width offset, a character width dimension threshold can be set, and the width offset is then compared with this threshold, so that the third character detection result can be output according to the comparison result; the third character detection result may be a preset descriptive detection result, such as a character width comment. For example, with the character width dimension threshold denoted by Twidth, if |WidthH - WidthK| > Twidth, a character width dimension comment (wider/narrower) is output.
S919, outputting the first text detection result and the third text detection result as text detection results of the text of the type to be detected.
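Pulling steps S905 to S919 together, a hedged sketch of the whole-character decision flow is given below; the use of Euclidean distance, the threshold names, and the comment strings are assumptions made for illustration.

```python
# Hedged sketch of the text-level detection flow: position comment first, then
# either a size comment (large area offset) or a width comment (small area offset).
import math

def char_comments(target, standard, thresholds):
    """target/standard: dicts with 'center' (x, y), 'area', and 'width';
    thresholds: dict with 'position' (Tpos), 'area' (Tarea), 'width' (Twidth)."""
    results = []
    # S905: distance offset between the pixel center points.
    if math.dist(target['center'], standard['center']) > thresholds['position']:
        results.append('character position deviates (up/left/down/right)')
    # S907-S909: the area offset decides which second comment is produced.
    if abs(target['area'] - standard['area']) > thresholds['area']:
        # S911-S913: character size comment.
        results.append('character size deviates (larger/smaller)')
    elif abs(target['width'] - standard['width']) > thresholds['width']:
        # S915-S919: character width comment.
        results.append('character width deviates (wider/narrower)')
    return results
```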
In this embodiment, the character as a whole is detected in three dimensions (character size, width, and position) and character detection results of the different dimensions are output; superimposing these whole-character results on the output stroke detection results further improves the accuracy of character detection.
To further improve the accuracy of text detection, in an exemplary embodiment, the text may be scored to quantify the quality of the text, as shown in fig. 10, and the method may further include:
S1001, for each predicted stroke, determining a stroke score of the predicted stroke in the target stroke detection dimension based on a ratio of the stroke characteristic offset to a standard stroke characteristic of the corresponding standard stroke in the target stroke detection dimension.
Assuming that predicted stroke A matches standard stroke B, a stroke score for predicted stroke A in the target stroke detection dimension may be determined based on the following equation (5):
Wherein |Fk(A)-Fk(B)| represents the stroke characteristic offset of predicted stroke A in target stroke detection dimension k; Fk(B) represents the standard stroke characteristic of standard stroke B in target stroke detection dimension k; and Fk(A) represents the target stroke characteristic of predicted stroke A in target stroke detection dimension k.
When the target stroke detection dimension k includes a stroke size dimension, a stroke position dimension, and a stroke angle dimension, the stroke scores of predicted stroke A in the three stroke detection dimensions can be obtained based on the above formula (5).
S1003, stroke scores of all predicted strokes in the target stroke detection dimension in at least one predicted stroke are averaged to obtain the stroke score of the character of the type to be detected.
Specifically, the stroke scores of all predicted strokes in each stroke detection dimension are summed and averaged, so that the stroke score of the whole character of the type to be detected can be obtained.
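As a hedged sketch of steps S1001 and S1003, the code below assumes a per-dimension score of the form 1 minus the ratio of the feature offset to the standard feature, clipped to [0, 1]; the exact form of equation (5) may differ, so this is an illustrative assumption only.

```python
# Hedged sketch of the stroke scoring of steps S1001 and S1003. The score form
# (1 - offset/standard, clipped to [0, 1]) is an assumption for illustration.
def stroke_dim_score(pred_feat, std_feat):
    """Per-dimension score of a predicted stroke against its matched standard stroke."""
    if std_feat == 0:
        return 0.0
    ratio = abs(pred_feat - std_feat) / std_feat   # offset / standard feature
    return max(0.0, 1.0 - ratio)

def char_stroke_score(pred_strokes, std_strokes, dims=('size', 'position', 'angle')):
    """pred_strokes/std_strokes: equal-length lists of per-stroke feature dicts,
    already matched one-to-one. Returns the averaged stroke score of the character."""
    scores = [stroke_dim_score(p[d], s[d])
              for p, s in zip(pred_strokes, std_strokes) for d in dims]
    return sum(scores) / len(scores) if scores else 0.0
```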
S1005, determining a first character score of the character of the type to be detected in the target character detection dimension based on the ratio of the character feature offset of the character of the type to be detected in the target character detection dimension to the standard character feature.
In a specific implementation, the first text score of the text of the type to be detected in the target text detection dimension can be calculated by referring to the above formula (5), with the numerator replaced by the text feature offset and the denominator replaced by the standard text feature.
S1007, determining the image similarity between the to-be-detected text image and the standard type text image corresponding to the standard type text, and obtaining a second text score of the to-be-detected type text.
The image similarity between the character image to be detected and the standard type character image corresponding to the standard type character can represent the regularity of the character.
In a specific implementation, the image similarity may be a structural similarity (Structural Similarity, SSIM).
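Assuming scikit-image is available and both images are grayscale arrays of the same size, the SSIM-based similarity can be computed as follows; the choice of library is an assumption for illustration, not part of the method.

```python
# Hedged sketch: SSIM between the character image to be detected and the
# corresponding standard character image, used here as a regularity measure.
from skimage.metrics import structural_similarity

def regularity_score(image_to_detect, standard_image):
    """Both inputs: 2-D uint8 grayscale arrays of identical shape."""
    return float(structural_similarity(image_to_detect, standard_image, data_range=255))
```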
S1009, outputting a text score of the text of the type to be detected based on the first text score, the second text score, and the stroke score of the text of the type to be detected.
Specifically, the first text score may include scores in the text size dimension, the text width dimension, and the text position dimension; the text score of the text of the type to be detected can then be obtained by weighted summation of the various scores and output. The weight of each score may be set based on actual experience.
In a specific implementation, the scores may be organized into four parts: a size score V1, a position score V2, a regularity score V3, and a stroke score V4, and the final score is the weighted sum w1V1 + w2V2 + w3V3 + w4V4; wherein V1 is the average of the character size dimension score and the character width dimension score; V2 is the score calculated from the character position dimension offset; V3 is the score based on image similarity; and V4 is the mean of the stroke detection dimension scores of all predicted strokes.
In practical application, after the text score of the text of the type to be detected is obtained, the writing grade corresponding to the score can be determined from a preset mapping between score ranges and writing grades, and the corresponding writing grade can then be output together with the text score. The writing grades can be divided based on actual needs; for example, scores above 80 correspond to grade A+, scores from 70 to 80 to grade A, scores from 60 to 70 to grade B, and scores of 60 or below to grade C.
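A short sketch of the weighted aggregation and the example grade mapping described above; the weight values are placeholders to be tuned from experience, and the scores are assumed to be normalized to a 0-100 scale.

```python
# Hedged sketch of the final score (w1*V1 + w2*V2 + w3*V3 + w4*V4) and the
# example grade bands; the weights and the 0-100 scale are illustrative assumptions.
def final_text_score(v_size, v_position, v_regularity, v_stroke,
                     weights=(0.25, 0.25, 0.25, 0.25)):
    w1, w2, w3, w4 = weights
    return w1 * v_size + w2 * v_position + w3 * v_regularity + w4 * v_stroke

def writing_grade(score):
    """Maps a 0-100 score to the example grade bands given in the text."""
    if score > 80:
        return 'A+'
    if score > 70:
        return 'A'
    if score > 60:
        return 'B'
    return 'C'
```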
As shown in fig. 11, in the final output example of text detection provided by the embodiment of the present application, the detection result for the "good" character of the type to be detected includes an overall text score of 82 points (with the corresponding writing grade A+), a text evaluation (i.e., the text detection result: the writing position is too far to the left), and a stroke evaluation (i.e., the stroke detection result: a stroke is not straight enough), which greatly improves the detection accuracy of the written character.
The text detection device provided by the embodiment of the present application corresponds to the text detection method provided by the above embodiments, so that the implementation of the text detection method is also applicable to the text detection device provided by the embodiment, and will not be described in detail in the embodiment.
Referring to fig. 12, a schematic structural diagram of a text detection device according to an embodiment of the present application is shown, where the device has a function of implementing the text detection method in the above method embodiment, and the function may be implemented by hardware or implemented by executing corresponding software by hardware. As shown in fig. 12, the text detection apparatus 1200 may include:
The to-be-detected text image obtaining module 1210 is configured to obtain a to-be-detected text image corresponding to the target text; the character image to be detected comprises characters of a type to be detected, which are obtained by writing the target characters based on electronic paper;
The stroke prediction module 1220 is configured to perform a stroke prediction process on the text of the type to be detected in the text image to be detected, so as to obtain a stroke prediction result; the stroke prediction result indicates at least one stroke area corresponding to the predicted stroke in the text image to be detected and the stroke type of each predicted stroke;
A stroke feature extraction module 1230, configured to perform, based on the stroke prediction result, a stroke feature extraction of a target stroke detection dimension for each of the predicted strokes, to obtain a target stroke feature of each of the predicted strokes in the target stroke detection dimension;
A stroke feature offset determining module 1240, configured to obtain, for each of the predicted strokes, a standard stroke feature of the standard stroke in the target stroke detection dimension, and determine a stroke feature offset between the target stroke feature of the predicted stroke and the standard stroke feature; the standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters;
The stroke detection result output module 1250 is configured to output a stroke detection result of the predicted stroke in the target stroke detection dimension based on a comparison situation of the stroke feature offset corresponding to the predicted stroke and the target stroke dimension threshold; the target stroke dimension threshold is a stroke dimension threshold corresponding to the target stroke detection dimension.
In one exemplary embodiment, the target stroke detection dimension includes a stroke size dimension, a stroke position dimension, and a stroke angle dimension; the stroke feature extraction module 1230 includes:
The first extraction module is used for extracting coordinate information of each pixel point in a stroke area corresponding to each predicted stroke;
the first stroke characteristic determining module is used for determining stroke size characteristics and stroke position characteristics of the predicted stroke based on the coordinate information of each pixel point;
And the second stroke characteristic determining module is used for determining the stroke writing direction of the predicted stroke based on the stroke type of the predicted stroke and determining the stroke angle characteristic of the predicted stroke based on the stroke writing direction and the coordinate information of each pixel point.
In one exemplary embodiment, the first stroke feature determination module includes:
the coordinate selection module is used for selecting a transverse maximum coordinate, a transverse minimum coordinate, a longitudinal maximum coordinate and a longitudinal minimum coordinate from the coordinate information of each pixel point;
The stroke size feature determining module is used for determining the stroke size feature of the predicted stroke based on the difference value between the transverse maximum coordinate and the transverse minimum coordinate and the difference value between the longitudinal maximum coordinate and the longitudinal minimum coordinate;
And the stroke position feature determining module is used for determining a horizontal average coordinate and a vertical average coordinate based on the coordinate information of each pixel point to obtain the stroke position feature of the predicted stroke.
In one exemplary embodiment, the second stroke feature determination module includes:
The coordinate determining module is used for determining stroke starting point coordinate information and stroke ending point coordinate information from the coordinate information of each pixel point based on the stroke writing direction;
The transverse length determining module is used for determining the longitudinal length of the predicted stroke based on the longitudinal coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information, and determining the transverse length of the predicted stroke based on the transverse coordinate values in the stroke starting point coordinate information and the stroke ending point coordinate information;
And the stroke angle characteristic determining module is used for determining an angle corresponding to the sine value as the stroke angle characteristic of the predicted stroke by taking the ratio of the longitudinal length to the transverse length as the sine value.
In an exemplary embodiment, the apparatus further comprises:
The character feature extraction module is used for extracting character features of the target character detection dimension of the character of the type to be detected based on the stroke prediction result, and obtaining the target character features of the character of the type to be detected in the target character detection dimension;
The standard character feature acquisition module is used for acquiring the standard character feature of the standard type character in the target character detection dimension;
and the character detection result output module is used for outputting a character detection result of the character of the type to be detected in the target character detection dimension based on the character feature offset between the target character feature and the standard character feature.
In one exemplary embodiment, the target text detection dimension includes a text size dimension and a text position dimension; the text detection result output module comprises:
The target text area determining module is used for determining the sum of areas of stroke areas corresponding to all predicted strokes in the at least one predicted stroke to obtain the target text area of the text of the type to be detected; the target text area is used as a target text feature of the text size dimension;
The target pixel center point determining module is used for determining the target pixel center point of the character of the type to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in the at least one predicted stroke; the target pixel center point serves as a target text feature for the text position dimension.
In an exemplary embodiment, the text detection result output module includes:
The first text detection result determining module is used for determining the distance between the target pixel center point and the standard pixel center point to obtain a distance offset, and determining a first text detection result of the text to be detected in the text position dimension based on the distance offset;
the area offset determining module is used for determining the area difference between the target text area and the standard text area to obtain an area offset;
the second text detection result determining module is used for determining a second text detection result of the text of the type to be detected in the text size dimension when the area offset is larger than the area offset threshold;
And the first output sub-module is used for outputting the first character detection result and the second character detection result as character detection results of the characters of the type to be detected.
In an exemplary embodiment, the target text detection dimension further includes a text width dimension; the text detection result output module further comprises:
the target character width determining module is used for determining the target character width of the character to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in the at least one predicted stroke when the area offset is smaller than or equal to the area offset threshold; the target character width is used as the target character feature of the character width dimension;
The third character detection result determining module is used for determining a width difference value between the width of the target character and the width of the standard character to obtain a width offset, and determining a third character detection result of the character to be detected in the character width dimension based on the width offset;
and the second output sub-module is used for outputting the first character detection result and the third character detection result as character detection results of the characters of the type to be detected.
In an exemplary embodiment, the apparatus further comprises:
A first stroke score determination module for determining, for each of the predicted strokes, a stroke score for the predicted stroke in the target stroke detection dimension based on a ratio of the stroke characteristic offset to a standard stroke characteristic of the corresponding standard stroke in the target stroke detection dimension;
The second stroke score determining module is used for averaging the stroke scores of all the predicted strokes in the target stroke detection dimension in the at least one predicted stroke to obtain the stroke score of the character of the type to be detected;
The first character score determining module is used for determining a first character score of the character of the type to be detected in the target character detection dimension based on the ratio between the character feature offset of the character of the type to be detected in the target character detection dimension and the standard character feature;
the second text score determining module is used for determining the image similarity between the text image to be detected and the standard type text image corresponding to the standard type text to obtain a second text score of the text of the type to be detected;
The character score output module is used for outputting the character score of the character of the type to be detected based on the first character score, the second character score and the stroke score of the character of the type to be detected.
In an exemplary embodiment, the text image to be detected acquisition module 1210 includes:
The copybook writing image acquisition module is used for responding to writing submitting instructions aiming at the current copybook for handwriting, and acquiring images of the current copybook for handwriting to obtain copybook writing images; the copybook writing image comprises at least one character written on the current copybook based on electronic paper, and each written character is positioned in a writing area of the current copybook;
The image segmentation module is used for segmenting the copybook writing image according to the writing areas to obtain a plurality of writing area images;
the writing area image selecting module is used for selecting writing area images containing characters from the plurality of writing area images to obtain at least one target writing area image;
and the character image to be detected determining module is used for taking any target writing area image in the at least one target writing area image as the character image to be detected.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
The embodiment of the application provides an electronic device, which includes a processor and a memory, wherein at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement any text detection method provided by the above method embodiments.
The memory may be used to store software programs and modules, and the processor performs various functional applications and data processing by executing the software programs and modules stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The method embodiments provided by the embodiments of the present application may be performed in a computer terminal, a server, or a similar computing device, i.e., the electronic device may include a computer terminal, a server, or a similar computing device. Taking a terminal based on an electronic paper display technology as an example, fig. 13 is a block diagram of a hardware structure of an electronic device running a text detection method according to an embodiment of the present application, specifically:
The terminal can include RF (Radio Frequency) circuitry 1310, memory 1320 including one or more computer-readable storage media, input unit 1330, display unit 1340, sensor 1350, audio circuitry 1360, WiFi (Wireless Fidelity) module 1370, processor 1380 including one or more processing cores, and power supply 1390, among other components. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 13 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The RF circuit 1310 may be used for receiving and transmitting signals during a message or a call; in particular, downlink information received from a base station is passed to one or more processors 1380 for processing, and uplink data is transmitted to the base station. Typically, the RF circuit 1310 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 1310 may also communicate with networks and other terminals via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), etc.
The memory 1320 may be used to store software programs and modules, and the processor 1380 may perform various functional applications and data processing by executing the software programs and modules stored in the memory 1320. The memory 1320 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the terminal, etc. In addition, memory 1320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, memory 1320 may also include a memory controller to provide access to memory 1320 by processor 1380 and input unit 1330.
The input unit 1330 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, input unit 1330 may include a touch-sensitive surface 1331 and other input devices 1332. Touch-sensitive surface 1331, also referred to as a touch display screen or touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on touch-sensitive surface 1331 or thereabout by any suitable object or accessory such as a finger, stylus, etc.) and actuate the corresponding connection device according to a predetermined program. Alternatively, touch-sensitive surface 1331 may include both a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 1380, and can receive commands from the processor 1380 and execute them. In addition, the touch-sensitive surface 1331 may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. In addition to touch-sensitive surface 1331, input unit 1330 may also include other input devices 1332. In particular, other input devices 1332 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 1340 may be used to display information input by a user or information provided to the user and various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 1340 may include a display panel 1341 based on electronic paper display technology; alternatively, the display panel 1341 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 1331 may overlay the display panel 1341; upon detection of a touch operation on or near the touch-sensitive surface 1331, the touch operation is communicated to the processor 1380 to determine the type of touch event, and the processor 1380 then provides a corresponding visual output on the display panel 1341 according to the type of touch event. The touch-sensitive surface 1331 and the display panel 1341 may be implemented as two separate components for input and output functions, although in some embodiments the touch-sensitive surface 1331 may be integrated with the display panel 1341 to implement the input and output functions.
The terminal can also include at least one sensor 1350, such as a light sensor, a motion sensor, and other sensors. In particular, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 1341 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1341 and/or backlight when the terminal is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the device is stationary, and the device can be used for applications of recognizing the gesture of a terminal (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may be configured for the terminal are not described in detail herein.
The audio circuit 1360, speaker 1361, and microphone 1362 may provide an audio interface between the user and the terminal. The audio circuit 1360 may transmit the electrical signal converted from received audio data to the speaker 1361, where it is converted into a sound signal and output; on the other hand, the microphone 1362 converts collected sound signals into electrical signals, which are received by the audio circuit 1360 and converted into audio data; the audio data is then processed by the processor 1380 and transmitted, for example, to another terminal via the RF circuit 1310, or output to the memory 1320 for further processing. The audio circuit 1360 may also include an earphone jack to allow peripheral headphones to communicate with the terminal.
WiFi belongs to a short-distance wireless transmission technology, and the terminal can help a user to send and receive e-mails, browse webpages, access streaming media and the like through a WiFi module 1370, so that wireless broadband Internet access is provided for the user. Although fig. 13 shows a WiFi module 1370, it is understood that it does not belong to the essential constitution of the terminal, and may be omitted entirely as required within the scope of not changing the essence of the invention.
Processor 1380 is a control center of the terminal, connects various portions of the overall terminal using various interfaces and lines, performs various functions of the terminal and processes data by running or executing software programs and/or modules stored in memory 1320, and invoking data stored in memory 1320. Optionally, processor 1380 may include one or more processing cores; preferably, processor 1380 may integrate an application processor primarily handling operating systems, user interfaces, applications, etc., with a modem processor primarily handling wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1380.
The terminal also includes a power supply 1390 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 1380 via a power management system to facilitate charge, discharge, and power management functions via the power management system. The power supply 1390 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, etc., which will not be described herein. In particular, in this embodiment, the terminal further includes a memory, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the one or more processors. The one or more programs described above include instructions for performing text detection provided by the method embodiments described above.
Embodiments of the present application also provide a computer readable storage medium that may be disposed in an electronic device to store at least one instruction or at least one program for implementing a text detection method, where the at least one instruction or the at least one program is loaded and executed by the processor to implement any of the text detection methods provided in the above method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the electronic device executes any one of the text detection methods provided in the above method embodiments.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a usb disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the application is not intended to limit the application; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (14)

1. A text detection method, the method comprising:
acquiring a character image to be detected corresponding to the target character; the character image to be detected comprises characters of a type to be detected, which are obtained by writing the target characters based on electronic paper;
carrying out stroke prediction processing on the text to be detected in the text image to be detected to obtain a stroke prediction result; the stroke prediction result indicates a stroke area corresponding to at least one predicted stroke in the text image to be detected and a stroke type of each predicted stroke;
Extracting the stroke characteristics of the target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain target stroke characteristics of each predicted stroke in the target stroke detection dimension;
for each predicted stroke, acquiring standard stroke characteristics of a standard stroke in the target stroke detection dimension, and determining stroke characteristic offset between target stroke characteristics of the predicted stroke and the standard stroke characteristics; the standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters;
Based on the comparison condition of the stroke characteristic offset corresponding to the predicted stroke and a target stroke dimension threshold, outputting a stroke detection result of the predicted stroke in the target stroke detection dimension; the target stroke dimension threshold is a stroke dimension threshold corresponding to the target stroke detection dimension.
2. The method of claim 1, wherein the target stroke detection dimension comprises a stroke size dimension, a stroke position dimension, and a stroke angle dimension; the step of extracting the stroke characteristics of the target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain the target stroke characteristics of each predicted stroke in the target stroke detection dimension comprises the following steps:
For each predicted stroke, extracting coordinate information of each pixel point in a stroke area corresponding to the predicted stroke;
Determining stroke size characteristics and stroke position characteristics of the predicted strokes based on the coordinate information of each pixel point;
And determining a stroke writing direction of the predicted stroke based on the stroke type of the predicted stroke, and determining a stroke angle characteristic of the predicted stroke based on the stroke writing direction and coordinate information of each pixel point.
3. The method of claim 2, wherein determining the stroke size feature and the stroke position feature of the predicted stroke based on the coordinate information of the pixels comprises:
Selecting a transverse maximum coordinate, a transverse minimum coordinate, a longitudinal maximum coordinate and a longitudinal minimum coordinate from the coordinate information of each pixel point;
Determining stroke size characteristics of the predicted stroke based on a difference between the lateral maximum coordinate and the lateral minimum coordinate and a difference between the longitudinal maximum coordinate and the longitudinal minimum coordinate;
And determining a horizontal average coordinate and a vertical average coordinate based on the coordinate information of each pixel point to obtain stroke position characteristics of the predicted stroke.
4. The method of claim 2, wherein determining the stroke angle feature of the predicted stroke based on the stroke writing direction and the coordinate information of the pixels comprises:
determining stroke starting point coordinate information and stroke ending point coordinate information from the coordinate information of each pixel point based on the stroke writing direction;
Determining a longitudinal length of the predicted stroke based on longitudinal coordinate values in the stroke start point coordinate information and the stroke end point coordinate information, and determining a transverse length of the predicted stroke based on transverse coordinate values in the stroke start point coordinate information and the stroke end point coordinate information;
and taking the ratio of the longitudinal length to the transverse length as a sine value, and determining the angle corresponding to the sine value as the stroke angle characteristic of the predicted stroke.
5. The method according to any one of claims 1 to 4, further comprising:
Extracting character features of the target character detection dimension of the character of the type to be detected based on the stroke prediction result, and obtaining target character features of the character of the type to be detected in the target character detection dimension;
acquiring standard character features of the standard type characters in the target character detection dimension;
and outputting a character detection result of the character of the type to be detected in the target character detection dimension based on the character feature offset between the target character feature and the standard character feature.
6. The method of claim 5, wherein the target text detection dimension comprises a text size dimension and a text position dimension; the step of extracting the character features of the target character detection dimension of the character of the type to be detected based on the stroke prediction result, the step of obtaining the target character features of the character of the type to be detected in the target character detection dimension comprises the following steps:
determining the sum of areas of stroke areas corresponding to all predicted strokes in the at least one predicted stroke to obtain a target character area of the character of the type to be detected; the target text area is used as a target text feature of the text size dimension;
determining a target pixel center point of the text of the type to be detected based on coordinate information of pixel points in a stroke area corresponding to each predicted stroke in the at least one predicted stroke; and the target pixel center point is used as a target character feature of the character position dimension.
7. The method of claim 6, wherein outputting a text detection result for the text of the type to be detected in the target text detection dimension based on a text feature offset between the target text feature and the standard text feature comprises:
determining the distance between the target pixel center point and the standard pixel center point to obtain a distance offset, and determining a first character detection result of the character to be detected in the character position dimension based on the distance offset;
Determining an area difference value between the target text area and the standard text area to obtain an area offset;
when the area offset is larger than an area offset threshold, determining a second character detection result of the character of the type to be detected in the character size dimension;
and outputting the first character detection result and the second character detection result as character detection results of the characters of the type to be detected.
8. The method of claim 7, wherein the target literal detection dimension further comprises a literal width dimension; the method further comprises the steps of:
When the area offset is smaller than or equal to the area offset threshold, determining the target character width of the character of the type to be detected based on the coordinate information of the pixel point in the stroke area corresponding to each predicted stroke in the at least one predicted stroke; the target character width is used as a target character feature of the character width dimension;
determining a width difference value between the width of the target character and the width of the standard character to obtain a width offset, and determining a third character detection result of the character to be detected in the character width dimension based on the width offset;
and outputting the first character detection result and the third character detection result as character detection results of the characters of the type to be detected.
9. The method of claim 5, wherein the method further comprises:
Determining, for each of the predicted strokes, a stroke score for the predicted stroke in the target stroke detection dimension based on a ratio of the stroke characteristic offset to a standard stroke characteristic for the corresponding standard stroke in the target stroke detection dimension;
averaging the stroke scores of all the predicted strokes in the target stroke detection dimension in the at least one predicted stroke to obtain the stroke score of the character of the type to be detected;
Determining a first text score of the text of the type to be detected in the target text detection dimension based on the ratio between the text feature offset of the text of the type to be detected in the target text detection dimension and the standard text feature;
Determining the image similarity between the to-be-detected character image and the standard type character image corresponding to the standard type character to obtain a second character score of the to-be-detected type character;
And outputting the character score of the character of the type to be detected based on the first character score, the second character score and the stroke score of the character of the type to be detected.
10. The method according to claim 1, wherein the obtaining the text image to be detected corresponding to the target text includes:
Responding to a writing submitting instruction aiming at a current handwriting practicing copybook, and acquiring an image of the current handwriting practicing copybook to obtain a copybook writing image; the copybook writing image comprises at least one character written on the current copybook for handwriting based on electronic paper, and each written character is positioned in a writing area of the current copybook for handwriting;
Dividing the copybook writing image according to the writing areas to obtain a plurality of writing area images;
Selecting a writing area image containing characters from the plurality of writing area images to obtain at least one target writing area image;
and taking any target writing area image in the at least one target writing area image as the character image to be detected.
11. A text detection device, the device comprising:
the character image acquisition module to be detected is used for acquiring a character image to be detected corresponding to the target character; the character image to be detected comprises characters of a type to be detected, which are obtained by writing the target characters based on electronic paper;
The stroke prediction module is used for carrying out stroke prediction processing on the text to be detected in the text image to be detected to obtain a stroke prediction result; the stroke prediction result indicates a stroke area corresponding to at least one predicted stroke in the text image to be detected and a stroke type of each predicted stroke;
the stroke feature extraction module is used for extracting stroke features of a target stroke detection dimension of each predicted stroke based on the stroke prediction result to obtain target stroke features of each predicted stroke in the target stroke detection dimension;
The stroke characteristic offset determining module is used for obtaining standard stroke characteristics of standard strokes in the target stroke detection dimension for each predicted stroke and determining stroke characteristic offset between the target stroke characteristics of the predicted stroke and the standard stroke characteristics; the standard strokes are strokes matched with the predicted strokes in standard type characters corresponding to the target characters;
The stroke detection result output module is used for outputting a stroke detection result of the predicted stroke in the target stroke detection dimension based on the comparison condition of the stroke characteristic offset corresponding to the predicted stroke and the target stroke dimension threshold; the target stroke dimension threshold is a stroke dimension threshold corresponding to the target stroke detection dimension.
12. An electronic device comprising a processor and a memory, wherein the memory stores at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the text detection method of any of claims 1-10.
13. A computer readable storage medium having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by a processor to implement the text detection method of any of claims 1-10.
14. A computer program product comprising a computer program which, when executed by a processor, implements the text detection method of any of claims 1 to 10.
CN202211486252.0A 2022-11-24 Text detection method and device, electronic equipment and storage medium Pending CN118116014A (en)

Publications (1)

Publication Number Publication Date
CN118116014A true CN118116014A (en) 2024-05-31


Similar Documents

Publication Publication Date Title
CN111813532B (en) Image management method and device based on multitask machine learning model
CN107885430B (en) Audio playing method and device, storage medium and electronic equipment
CN106874906B (en) Image binarization method and device and terminal
CN111209423B (en) Image management method and device based on electronic album and storage medium
CN110162604B (en) Statement generation method, device, equipment and storage medium
CN112802111B (en) Object model construction method and device
CN111209377B (en) Text processing method, device, equipment and medium based on deep learning
CN109495616B (en) Photographing method and terminal equipment
CN112163577A (en) Character recognition method and device in game picture, electronic equipment and storage medium
CN111507094B (en) Text processing model training method, device and equipment based on deep learning
CN110544287B (en) Picture allocation processing method and electronic equipment
CN109992753B (en) Translation processing method and terminal equipment
CN112541489A (en) Image detection method and device, mobile terminal and storage medium
CN109726726B (en) Event detection method and device in video
CN114299546A (en) Method and device for identifying pet identity, storage medium and electronic equipment
US20230135661A1 (en) Image processing method and apparatus for smart pen, and electronic device
CN107885770B (en) Target domain database construction method, target domain database sample identification method, terminal and storage medium
CN112840308A (en) Method for optimizing font and related equipment
CN118116014A (en) Text detection method and device, electronic equipment and storage medium
CN115563255A (en) Method and device for processing dialog text, electronic equipment and storage medium
CN115130456A (en) Sentence parsing and matching model training method, device, equipment and storage medium
CN116259083A (en) Image quality recognition model determining method and related device
CN115841575A (en) Key point detection method, device, electronic apparatus, storage medium, and program product
CN114612531A (en) Image processing method and device, electronic equipment and storage medium
CN111399728B (en) Setting method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication