CN108304814B - Method for constructing character type detection model and computing equipment


Info

Publication number
CN108304814B
CN108304814B (application CN201810128155.1A)
Authority
CN
China
Prior art keywords
picture
character area
character
text
print
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810128155.1A
Other languages
Chinese (zh)
Other versions
CN108304814A (en)
Inventor
徐行
刘辉
刘宁
张东祥
郭龙
陈李江
李启林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Avanti Technology Co ltd
Original Assignee
Hainan Yunjiang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Yunjiang Technology Co ltd filed Critical Hainan Yunjiang Technology Co ltd
Priority to CN201810128155.1A priority Critical patent/CN108304814B/en
Publication of CN108304814A publication Critical patent/CN108304814A/en
Application granted granted Critical
Publication of CN108304814B publication Critical patent/CN108304814B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a method for constructing a character type detection model and a character type detection method, both suitable for being executed in a computing device. The model construction method comprises the following steps: collecting training pictures; expanding each training picture into a square picture; acquiring the results of labeling the print character areas and handwritten character areas of each square picture; and training a convolutional neural network according to the training pictures and their labeling results to obtain the character type detection model. The detection method comprises the following steps: acquiring an original picture to be recognized and segmenting it into a plurality of sub-pictures; detecting the print character areas and handwritten character areas in each sub-picture with the character type detection model to obtain the coordinate information and character type of each character area; and merging adjacent character areas of the same type that belong to different sub-pictures, so as to obtain the print character areas and handwritten character areas in the original picture. The invention also discloses a corresponding computing device.

Description

Method for constructing character type detection model and computing equipment
Technical Field
The invention relates to the field of image data processing, in particular to a construction method of a character type detection model, a character type detection method and computing equipment.
Background
With the development of computer and internet technologies, automated equipment is increasingly used to read students' examination papers. When analyzing a test paper, it is often necessary to identify whether the text in each recognition area is a handwritten font or a printed font. Current text recognition methods typically distinguish the two based on character color or on the text features of short-answer regions. Such methods place very high demands on image quality: shadows, ink bleed-through, blurring and the like all lead to low detection precision. Moreover, these methods can generally only perform segmentation and detection along horizontal text lines, so they handle rotated images poorly. In addition, characters have many kinds of features, and detecting and distinguishing handwritten characters from color features alone cannot fully exploit the characteristics of handwriting, which limits the detection effect to a certain extent.
Therefore, it is desirable to provide a more efficient method for detecting handwritten and printed text.
Disclosure of Invention
In view of the above problems, the present invention provides a method for constructing a character type detection model, a character type detection method and a computing device, which aim to solve, or at least alleviate, the above problems.
According to one aspect of the present invention, there is provided a method for constructing a character type detection model, suitable for being executed in a computing device, the method comprising: collecting training pictures, wherein each training picture comprises at least one of print characters and handwritten characters; expanding each training picture into a square picture according to its length and width values; acquiring the results of labeling the print character areas and handwritten character areas of each square picture; and training a convolutional neural network according to the training pictures and their labeling results to obtain the character type detection model.
Optionally, in the method for constructing a text type detection model according to the present invention, the convolutional neural network includes 6 convolutional layers and 2 fully-connected layers.
Optionally, in the method for constructing a character type detection model according to the present invention, the convolution kernels of the intermediate convolutional layers in the convolutional neural network comprise 3 × 3, 5 × 5 and 7 × 7 convolution kernels, and the final output layer comprises 3 classes: print character regions, handwritten character regions, and background regions.
Optionally, in the method for constructing a text type detection model according to the present invention, the operation of labeling the print character areas and handwritten character areas of a square picture comprises: determining each text line in the square picture and the character areas in each text line; labeling the character area types line by line, wherein the character area types comprise print character areas and handwritten character areas; and storing the coordinate information of each character area in each text line together with the character type to which each area belongs.
Optionally, in the method for constructing a text type detection model according to the present invention, the step of expanding a training picture into a square picture according to its length and width values comprises: creating a square white background image whose side equals the larger of the length and the width, and placing the training picture at the center of the white background image.
According to another aspect of the present invention, there is provided a text type detection method, adapted to be executed in a computing device that stores a text type detection model constructed with the above construction method, the text type detection method comprising: acquiring an original picture whose character types are to be recognized, and segmenting the original picture into a plurality of sub-pictures, wherein the sub-pictures do not overlap and adjoin one another; detecting the print character areas and handwritten character areas in each sub-picture with the text type detection model to obtain the coordinate information of each character area and the character type to which it belongs; and merging adjacent character areas of the same type that belong to different sub-pictures, and using the set of print character areas and the set of handwritten character areas from all sub-pictures as the print character areas and handwritten character areas in the original picture.
Optionally, in the text type detection method according to the present invention, the step of merging adjacent character areas of the same type that belong to different sub-pictures comprises: acquiring, for the print character areas and handwritten character areas in each sub-picture, their first coordinate information relative to that sub-picture, and converting the first coordinate information into second coordinate information relative to the original picture; and detecting, from the second coordinate information of each character area, whether two or more character areas of the same type are adjacent, and if so, merging the adjacent areas to obtain all print character areas and handwritten character areas in the original picture.
Optionally, in the text type detection method according to the present invention, the step of splitting the original picture into a plurality of sub-pictures includes: and expanding the original picture into a square picture according to the length and width value of the original picture, and dividing the square picture into a plurality of subgraphs.
Optionally, in the text type detecting method according to the present invention, the coordinate information of the text region includes an upper left corner vertex coordinate and a lower right corner vertex coordinate of the text region.
Optionally, in the text type detection method according to the present invention, if the coordinate value of the top-left vertex of the original picture within the square picture is (x, y), the coordinate value of the top-left vertex of a certain sub-picture within the square picture is (x1, y1), and the coordinate value of the top-left vertex of a character area within that sub-picture is (x2, y2), then the coordinate value of that character area in the original picture is (x1 + x2 - x, y1 + y2 - y).
According to yet another aspect of the invention, there is provided a computing device comprising: at least one processor; and a memory storing program instructions, wherein the program instructions are configured to be executed by the at least one processor, the program instructions comprising instructions for performing the method of constructing a text type detection model and/or the method of text type detection as described above.
According to yet another aspect of the present invention, there is provided a readable storage medium storing program instructions which, when read and executed by a computing device, cause the computing device to perform the method of constructing a text type detection model and/or the text type detection method as described above.
According to the technical scheme of the invention, during model training a large number of text pictures containing print and handwritten characters are collected, square expansion is applied to them, the print character areas and handwritten character areas are labeled manually, and the results are input into a convolutional neural network for learning, yielding the character type detection model. The square expansion effectively reduces the problem of poor training results caused by labeled areas that are too small or irregularly sized. The manual labeling can be carried out line by line in the horizontal direction, so that subsequent model training can recognize single-line character areas; this avoids the coarse results of whole-picture detection and improves the granularity and precision of detection.
When the model is used, the original picture to be recognized can be divided into a plurality of sub-pictures according to its actual size, and the print character areas and handwritten character areas in each sub-picture are detected separately. Finally, the print and handwritten character areas of all sub-pictures are merged to obtain the print and handwritten character areas of the original picture. Slicing the original picture into sub-pictures suits the region detection model better and, compared with recognizing directly on the original picture, improves the granularity and precision of recognition. After the results of all sub-pictures are merged, the print and handwritten character areas obtained are closer to reality, and the region fragments formed during sub-picture detection are reduced, yielding regions that better match the character distribution in the original picture.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a block diagram of a computing device 100, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a method 200 for building a text type detection model according to one embodiment of the invention;
FIG. 3 illustrates a flow diagram of a text type detection method 300 according to one embodiment of the invention;
FIGS. 4A and 4B respectively illustrate example pictures that meet model training requirements;
FIGS. 4C and 4D respectively show example pictures that do not meet model training requirements;
FIGS. 5A and 5B are diagrams illustrating a square expansion process performed on a picture, respectively;
FIG. 6 is a diagram illustrating the line-by-line labeling of the character areas of each text line, according to one embodiment of the invention;
FIG. 7 shows a schematic structural diagram of a convolutional neural network, according to one embodiment of the present invention;
FIG. 8 illustrates a diagram of adaptively partitioning an original picture into multiple subgraphs, according to an embodiment of the present invention; and
FIG. 9 shows a schematic diagram of a transformation of the base coordinate system according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, computing device 100 typically includes system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processor 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level one cache 110 and a level two cache 112, a processor core 114, and registers 116. The example processor core 114 may include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, system memory 106 may be any type of memory, including but not limited to: volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.), or any combination thereof. System memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, application 122 may be arranged to operate with program data 124 on an operating system. Program data 124 includes instructions, and in computing device 100 according to the present invention, program data 124 includes instructions for performing text type detection model construction method 200 and/or text type detection method 300.
Computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (e.g., output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via the bus/interface controller 130. The example output device 142 includes a graphics processing unit 148 and an audio processing unit 150. They may be configured to facilitate communication with various external devices, such as a display or speakers, via one or more a/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device) or other peripherals (e.g., printer, scanner, etc.) via one or more I/O ports 158. An example communication device 146 may include a network controller 160, which may be arranged to facilitate communications with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied by computer readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transport mechanism, and may include any information delivery media. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media such as a wired network or dedicated wired network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer readable media as used herein may include both storage media and communication media.
Computing device 100 may be implemented as a server, such as a file server, a database server, an application server or a WEB server, or as part of a small-form-factor portable (or mobile) electronic device, such as a cellular telephone, a personal digital assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application-specific device, or a hybrid device that includes any of the above functions. Computing device 100 may also be implemented as a personal computer including both desktop and notebook computer configurations. In some embodiments, computing device 100 is configured to perform the method 200 of constructing a text type detection model and/or the text type detection method 300 according to the present invention.
FIG. 2 illustrates a method 200 for building a text-type detection model according to one embodiment of the invention, which may be performed in a computing device, such as computing device 100. As shown in fig. 2, the method begins at step S220.
In step S220, training pictures are collected, wherein each training picture includes at least one of print characters and handwritten characters.
For a specific application scenario, text pictures containing print and/or handwriting appropriate to the scenario can be collected. It should be noted that the number of text lines in a picture should not be too large and the lines should not be too dense, in order to reduce the labor cost of subsequent manual labeling. FIGS. 4A and 4B show example pictures that meet the model training requirements, with an appropriate number of text lines and appropriate spacing; FIGS. 4C and 4D show example pictures that do not meet the requirements: the text lines are too many and too dense.
Subsequently, in step S240, each training picture is expanded into a square picture according to the length and width value of each training picture.
Usually the collected training pictures do not necessarily meet the training requirements of the subsequent detection model, so square expansion needs to be performed on each picture; this reduces the problem of poor training results caused by labeled areas that are too small or irregularly sized. The square expansion works from the original size of the picture (say width w and height h): take the larger of w and h as the side of a white background image, and place the picture at the center of the white image, thus expanding the original picture into a square picture of w × w or h × h. FIGS. 5A and 5B show two examples of the square processing: in FIG. 5A the width w of the picture is greater than the height h, so the picture is expanded to a square with side w; in FIG. 5B the width w is smaller than the height h, so the picture is expanded to a square with side h. Of course, if the picture is already square, no expansion is performed.
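As a concrete illustration, a minimal sketch of this expansion step is given below in Python, assuming the PIL library; the helper name pad_to_square is illustrative and not part of the invention.

```python
from PIL import Image

def pad_to_square(picture: Image.Image) -> Image.Image:
    """Center a picture on a pure-white square canvas (sketch of step S240)."""
    w, h = picture.size
    side = max(w, h)  # side length is the larger of width and height
    if w == h:
        return picture  # already square: no expansion is performed
    canvas = Image.new("RGB", (side, side), (255, 255, 255))  # white background
    canvas.paste(picture, ((side - w) // 2, (side - h) // 2))  # center placement
    return canvas
```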
Subsequently, in step S260, the results of labeling the print character areas and handwritten character areas of each square picture are acquired.
The operation of labeling the print character areas and handwritten character areas of a square picture comprises the following steps: determining each text line in the square picture and the character areas in each text line; labeling the character area types line by line, wherein the character area types comprise print character areas and handwritten character areas; and storing the coordinate information of each character area in each text line together with the character type to which it belongs. The coordinate information of a character area generally comprises the top-left vertex coordinates and the bottom-right vertex coordinates of the area; of course, other coordinate representations may be chosen, such as the bottom-left and top-right vertex coordinates, or the top-left vertex coordinates together with the length and width of the area, as long as the position of a character area can be represented accurately, and the present invention is not limited in this regard. In addition, it should be understood that the character areas may be recognized by any existing region recognition method, for example an OCR recognition method, and the present invention is not limited in this regard either.
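For illustration only, one plausible storage format for these labeling results is sketched below; the field names and coordinate values are assumptions, not prescribed by the patent.

```python
# Hypothetical labeling records: one entry per character area, stored line by line.
# Coordinates are (top-left vertex, bottom-right vertex); "type" is the character type.
labels = [
    {"text_line": 1, "top_left": (32, 40),  "bottom_right": (612, 78),  "type": "print"},
    {"text_line": 2, "top_left": (32, 92),  "bottom_right": (590, 130), "type": "print"},
    {"text_line": 3, "top_left": (45, 150), "bottom_right": (420, 190), "type": "handwritten"},
]
```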
FIG. 6 is a schematic diagram illustrating the line-by-line labeling of character areas according to an embodiment of the present invention: the 4 text lines are print text, the first 3 text lines each contain one character area, and the 4th text line contains four character areas. This line-by-line labeling enables subsequent model training to recognize single-line character areas, avoids the coarse results of whole-picture detection, and improves the granularity and precision of detection.
Subsequently, in step S280, the convolutional neural network is trained according to each training picture and its labeling result, so as to obtain a character type detection model.
The method performs model training on an existing labeled picture set of a certain scale. Specifically, the square-processed picture set and the labeling information of each picture are used to train an improved detection model based on a fast region convolutional neural network. The training model is modified from a fast region convolutional neural network (ZF network) detection model. The structure and the contents of each layer of the convolutional neural network can be set by those skilled in the art as needed, and the present invention is not limited thereto.
According to an embodiment of the present invention, the convolutional neural network comprises 6 convolutional layers and 2 fully connected layers; FIG. 7 shows a schematic diagram of its structure. Considering that the input pictures of a deep neural network need a fixed size (different pictures are all cropped to a specified size), the invention crops the input w × w or h × h original pictures to a uniform size through multi-scale processing, for example 224 × 224, to ensure that the model supports multi-scale image input. In addition, convolution kernels of several sizes can be used in the intermediate convolutional layers, such as 3 × 3, 5 × 5 and 7 × 7 kernels; a dropout strategy is applied as appropriate after the convolutional layers; and the number of classes of the output layer is set to 3, covering print, handwriting and background. The background refers to a pure white background with pixel values RGB(255, 255, 255), which does not interfere with or affect the original picture area in the neural network computation. Of course, each layer of the convolutional neural network may be configured differently as needed, and the present invention is not limited thereto.
As shown in FIG. 7, the convolutional neural network comprises a 12-layer structure, in which the layers are labeled input layer, convolution (conv), pooling (pool), fully connected (fc) and output. FIG. 7 draws a convolutional layer and its pooling layer together, as in conv2+pool2, conv3+pool3 and conv5+pool5, while convolutional layers without pooling stand alone, as in conv1, conv4 and conv6. That is, the complete structure of the convolutional neural network is: input layer → first convolutional layer → second convolutional layer + second pooling layer → third convolutional layer + third pooling layer → fourth convolutional layer → fifth convolutional layer + fifth pooling layer → sixth convolutional layer → first fully connected layer → second fully connected layer → output layer. The parameters of each layer are given in the following table:
(Per-layer parameter table available only as images in the original publication.)
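To make the layer sequence concrete, a minimal sketch of this backbone is given below in PyTorch. It is an assumption-laden illustration: the channel widths, fully connected sizes and kernel placement are invented for the sketch (the per-layer parameter table is not reproduced above), and the patent's actual model is a region detection model in the fast region convolutional neural network family rather than a plain classifier.

```python
import torch
import torch.nn as nn

class TextTypeBackbone(nn.Module):
    """Sketch of the FIG. 7 layer order: 6 conv layers (pooling after conv2/3/5),
    2 fully connected layers, and a 3-class output (print / handwriting / background).
    All channel and feature sizes are assumptions."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, padding=3), nn.ReLU(),     # conv1 (7x7)
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),    # conv2 (5x5)
            nn.MaxPool2d(2),                                           # pool2
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),   # conv3 (3x3)
            nn.MaxPool2d(2),                                           # pool3
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.ReLU(),  # conv4
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),  # conv5
            nn.MaxPool2d(2),                                           # pool5
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),  # conv6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 28 * 28, 1024), nn.ReLU(), nn.Dropout(0.5),  # fc1 + dropout
            nn.Linear(1024, 512), nn.ReLU(),                              # fc2
            nn.Linear(512, num_classes),                                  # output layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a batch of 224x224 RGB crops, shape (N, 3, 224, 224)
        return self.classifier(self.features(x))
```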
In addition, model selection can be performed by cross-validation during training: divide the whole picture set into a training set, a validation set and a test set, train on the pictures of the training set, periodically evaluate the model on the validation set as the loss function decreases over the training iterations, and select the model that performs best on the validation set as the candidate optimal training model.
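A minimal sketch of this selection loop follows, continuing the PyTorch sketch above; the loader names, epoch count and cross-entropy loss are assumptions, and the training pass itself is elided.

```python
import torch
import torch.nn.functional as F

def mean_loss(model: torch.nn.Module, loader) -> float:
    """Average cross-entropy over a data loader (evaluation sketch)."""
    model.eval()
    total, batches = 0.0, 0
    with torch.no_grad():
        for images, targets in loader:
            total += F.cross_entropy(model(images), targets).item()
            batches += 1
    return total / max(batches, 1)

# Keep the checkpoint that performs best on the validation set.
model = TextTypeBackbone()
best_val, best_state = float("inf"), None
for epoch in range(30):                      # number of iteration periods is assumed
    # ... one training pass over train_loader would go here ...
    val = mean_loss(model, val_loader)       # val_loader: assumed validation loader
    if val < best_val:
        best_val = val
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
model.load_state_dict(best_state)            # candidate optimal training model
```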
FIG. 3 illustrates a text type detection method 300 according to one embodiment of the present invention, which may be performed in a computing device such as computing device 100. A character type detection model is stored in the computing device, the model being constructed using the model construction method described above. As shown in FIG. 3, the method begins at step S320.
In step S320, an original picture whose character types are to be recognized is obtained, and the original picture is divided into a plurality of sub-pictures, wherein the sub-pictures do not overlap and adjoin one another.
As mentioned above, prior-art methods for detecting handwritten characters place high demands on the image, generally requiring a high-definition image obtained by scanning. The character type detection model provided by the invention can effectively relax the requirement on image definition. The original picture to be recognized may therefore be a high-definition text image acquired with a scanner, or a photograph taken with a mobile phone or camera. Moreover, no strict environmental requirements (such as illumination, angle or paper texture) apply when the picture is taken: ordinary colorless paper photographed normally under natural lighting suffices. This effectively improves the universality of text image recognition and reduces the workload and cost of image recognition.
The segmentation of the original picture may use an adaptive segmentation method: the original picture is divided into regions according to its length and width, the regions do not overlap and adjoin one another, and each region is regarded as a sub-picture (as in the picture segmentation of FIG. 8). Generally, the sub-picture size may be limited to at most 480 × 320, so that an original picture of size 1920 × 1280 may be sliced into 16-20 sub-pictures. Slicing into sub-pictures suits the region detection model better and, compared with recognizing directly on the original picture, improves the granularity and precision of recognition. Furthermore, the original picture can first be expanded into a square picture according to its length and width values, and the square picture then divided into a plurality of sub-pictures. The square expansion method is described above and not repeated here.
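A minimal sketch of this slicing step, under the 480 × 320 cap mentioned above, might look as follows; the helper name and return format are illustrative.

```python
from PIL import Image

def split_into_subpictures(square: Image.Image, max_w: int = 480, max_h: int = 320):
    """Cut a (square-expanded) picture into non-overlapping, adjoining tiles."""
    w, h = square.size
    cols = -(-w // max_w)   # ceiling division: number of tile columns
    rows = -(-h // max_h)   # number of tile rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * max_w, r * max_h
            box = (left, top, min(left + max_w, w), min(top + max_h, h))
            tiles.append((box, square.crop(box)))  # keep each tile's origin for remapping
    return tiles

# A 1920x1280 picture yields 4 x 4 = 16 tiles under the 480x320 cap.
```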
Subsequently, in step S340, the character type detection model is used to detect the print character areas and handwritten character areas in each sub-picture, obtaining the coordinate information of each character area and the character type to which it belongs. That is, print and handwritten character area detection is performed one by one on each sub-picture obtained by the segmentation of step S320, yielding the coordinate information of the print and handwritten character areas in each sub-picture and the type of each detected area (print or handwritten). As before, the coordinate information of a character area comprises its top-left and bottom-right vertex coordinates, but is not limited thereto, as long as the position of the area can be represented accurately.
Subsequently, in step S360, character regions of the same type that belong to different sub-images and are adjacent to each other are merged, and the print character region set and the handwritten character region set in all sub-images are used as the print character region and the handwritten character region in the original image.
The print areas and handwriting areas in all sub-pictures are merged separately, so that the print and handwritten character areas obtained are closer to reality and the region fragments formed during sub-picture detection are reduced, yielding regions that better match the character distribution in the original picture. The rules for merging sub-pictures include: 1) collecting the regions of the same type from different sub-pictures together as the regions of that type for the original picture; 2) since the detected (print or handwritten) region information in each sub-picture is first coordinate information relative to that sub-picture, mapping this first coordinate information to second coordinate information relative to the original picture (which involves a transformation of the base coordinate system); 3) after the conversion to second coordinate information, detecting whether two or more areas are adjacent to each other and, if so, merging them; 4) finally, collating all the non-overlapping print and handwriting areas of the original picture.
According to an embodiment of the present invention, if the coordinate value of the top-left vertex of the original picture within the square picture is (x, y), the coordinate value of the top-left vertex of a sub-picture within the square picture is (x1, y1), and the coordinate value of the top-left vertex of a character area within that sub-picture is (x2, y2), then the coordinate value of the character area in the original picture is (x1 + x2 - x, y1 + y2 - y).
FIG. 9 shows a schematic diagram of the transformation of the base coordinate system, whose purpose is to transform the coordinates of a text region detected in a sub-picture into coordinates of the original picture inside the square-expanded w × w or h × h image. As shown in FIG. 9, for a square-expanded picture (including the white background), the text picture area occupies only the central portion, and the top-left vertex of that area (the five-pointed star on the left border) has coordinates (x, y). Since the print/handwriting detection according to the present invention is performed on sub-pictures 1-4 (in the example the square-expanded picture is divided into 4 blocks; it could of course be divided into other numbers of sub-pictures, such as 8, 12 or 16), the coordinates of detected print or handwritten text are also relative to the sub-pictures, i.e. first coordinate information. For example, the handwritten character area marked by the rectangular box in sub-picture 2 has top-left vertex coordinates (x2, y2), relative to the vertex of sub-picture 2 (the five-pointed star on the upper border), and the aim of the invention is to convert the coordinates (x2, y2) into coordinate values (x2', y2') relative to the vertex (x, y) of the original picture within the square picture, i.e. second coordinate information relative to the original picture's vertex. By calculation, x2' = x1 + x2 - x and y2' = y1 + y2 - y.
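A one-line sketch of this remapping follows; the numeric example is illustrative (FIG. 9 gives no concrete values).

```python
def to_original_coords(x, y, x1, y1, x2, y2):
    """Map a vertex from sub-picture coordinates to original-picture coordinates.

    (x, y):   top-left vertex of the original picture inside the square picture
    (x1, y1): top-left vertex of the sub-picture inside the square picture
    (x2, y2): vertex of the detected character area inside the sub-picture
    """
    return (x1 + x2 - x, y1 + y2 - y)

# Illustrative values: original offset (50, 60), sub-picture origin (480, 0),
# detected vertex (30, 120) inside that sub-picture.
print(to_original_coords(50, 60, 480, 0, 30, 120))  # -> (460, 60)
```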
According to another embodiment of the present invention, whether two or more regions are adjacent can be detected from the second coordinate information of each character area relative to the original picture. Adjacency here generally means that print or handwriting areas abut the shared edge of different sub-pictures, which is mainly the case where a single text area has been cut apart by the sub-picture division. Text that has been cut apart in this way needs to be merged to recover a complete line. Generally, adjacency can be determined from the top-left and bottom-right vertex coordinates of the two character areas; when two areas result from such a cut, one abscissa or ordinate value is typically the same. The rectangular boxes of sub-picture 1 and sub-picture 3 in FIG. 9, for example, are adjacent and form a single region in the original picture, so they need to be merged.
Specifically, adjacent character areas of the same type may be merged as follows: acquire, for the print character areas and handwritten character areas in each sub-picture, their first coordinate information relative to that sub-picture, and convert it into second coordinate information relative to the original picture; then detect, from the second coordinate information of each character area, whether two or more character areas of the same type are adjacent, and if so, merge the adjacent areas to obtain all print character areas and handwritten character areas in the original picture. Merging here may mean taking the largest union of the two or more character areas.
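A sketch of one possible adjacency test and union merge follows; the pixel tolerance tol is an assumption, as the patent does not specify one.

```python
def merge_if_adjacent(a, b, tol: int = 2):
    """Union of two same-type boxes (x_min, y_min, x_max, y_max) if they abut."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    touch_x = abs(ax2 - bx1) <= tol or abs(bx2 - ax1) <= tol  # shared vertical edge
    touch_y = abs(ay2 - by1) <= tol or abs(by2 - ay1) <= tol  # shared horizontal edge
    overlap_x = ax1 <= bx2 and bx1 <= ax2
    overlap_y = ay1 <= by2 and by1 <= ay2
    if (touch_x and overlap_y) or (touch_y and overlap_x):
        # the merge takes the largest union of the regions
        return (min(ax1, bx1), min(ay1, by1), max(ax2, bx2), max(ay2, by2))
    return None

# Two halves of one text line, cut at a sub-picture boundary, merge into one box:
print(merge_if_adjacent((100, 300, 480, 330), (480, 300, 760, 330)))
# -> (100, 300, 760, 330)
```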
According to the technical scheme of the invention, applying square expansion to each picture reduces the problem of poor training results caused by labeled areas that are too small or irregularly sized. Labeling the training pictures line by line in the horizontal direction enables subsequent model training to recognize single-line character areas, avoiding the coarse results of whole-picture detection and improving the granularity and precision of detection. The network structure is modified according to the characteristics of the image data set, and model training with the improved fast region convolutional neural network yields better model performance. Slicing into sub-pictures suits the region detection model better and, compared with recognizing directly on the original picture, improves the granularity and precision of recognition. After the sub-picture results are merged, the print and handwritten character areas obtained are closer to reality, and the region fragments formed during sub-picture detection are reduced, yielding regions that better match the character distribution in the original picture.
B9, the method according to any one of B6-B8, wherein the coordinate information of a text region includes top left corner and bottom right corner vertex coordinates of the text region.
B10, the method according to B7, wherein if the coordinate value of the top-left vertex of the original picture within the square picture is (x, y), the coordinate value of the top-left vertex of a sub-picture within the square picture is (x1, y1), and the coordinate value of the top-left vertex of a character area within the sub-picture is (x2, y2), then the coordinate value of the character area in the original picture is (x1 + x2 - x, y1 + y2 - y).
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software or, alternatively, with a combination of both. Thus, the methods and apparatus of the present invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the invention.
In the case of program code execution on programmable computers, the computing device will generally include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Wherein the memory is configured to store program code; the processor is configured to execute the method for constructing a text type detection model and/or the text type detection method according to the present invention according to instructions in the program code stored in the memory.
Furthermore, some of the described embodiments are described herein as a method or combination of method elements that can be performed by a processor of a computer system or by other means of performing the described functions. A processor having the necessary instructions for carrying out the method or method elements thus forms a means for carrying out the method or method elements. Further, the elements of the apparatus embodiments described herein are examples of the following apparatus: the apparatus is used to implement the functions performed by the elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. A method for constructing a character type detection model, which is suitable for being executed in a computing device, comprises the following steps:
collecting training pictures, wherein each training picture comprises at least one of print characters and handwritten characters;
expanding each training picture into a square picture according to the length and width value of each training picture;
acquiring results of labeling the print font character area and the handwritten font character area of each square picture; and
training a convolutional neural network according to each training picture and a labeling result thereof to obtain the character type detection model, wherein an output layer of the convolutional neural network comprises 3 types of a print font character area, a handwritten font character area and a background area;
the operation of labeling the print font character area and the handwritten font character area of the square picture comprises the following steps:
determining each text line in the square picture and a character area in each text line;
marking the character area types of each text line by line, wherein the character area types comprise a print style character area and a handwritten character area; and
storing the coordinate information of each character area in each text line and the character type to which each character area belongs.
2. The method of claim 1, wherein the convolutional neural network comprises 6 convolutional layers and 2 fully-connected layers.
3. The method of claim 2, wherein the convolution kernels of the intermediate convolution layers in the convolutional neural network comprise 3x3 convolution kernels, 5x5 convolution kernels, and 7x7 convolution kernels.
4. The method of claim 1, wherein the step of expanding the training picture into a square picture according to the length and width values of the picture comprises:
and selecting the larger value of the length and the width as a white background image, and placing the training picture in the center of the white background image.
5. A text type detection method adapted to be executed in a computing device having stored therein a text type detection model adapted to be constructed using the method of any one of claims 1-4, the text type detection method comprising:
acquiring an original picture of a character type to be recognized, expanding the original picture into a square picture according to the length and width value of the original picture, and cutting the square picture into a plurality of subgraphs, wherein the subgraphs are not overlapped and are connected;
detecting the print character area and the handwritten character area in each subgraph respectively by adopting the character type detection model to obtain the coordinate information of each character area and the character type of the character area; and
respectively merging character areas which belong to different subgraphs and are adjacent and of the same type, and taking the print character area set and the handwritten character area set in all the subgraphs as the print character area and the handwritten character area in the original picture.
6. The method of claim 5, wherein the step of merging adjacent text regions of the same type belonging to different subgraphs respectively comprises:
respectively acquiring first coordinate information of a print character area and a handwritten character area in each sub-image in the corresponding sub-image, and converting the first coordinate information into second coordinate information based on an original image;
and detecting whether two or more character areas belonging to the same type are adjacent according to the second coordinate information of each character area, and if so, combining the adjacent areas to obtain all print character areas and handwritten character areas in the original picture.
7. The method of claim 5 or 6, wherein the coordinate information of the text region comprises top left corner vertex coordinates and bottom right corner vertex coordinates of the text region.
8. The method of claim 6, wherein if the coordinate value of the top-left vertex of the original picture within the square picture is (x, y), the coordinate value of the top-left vertex of a sub-picture within the square picture is (x1, y1), and the coordinate value of the top-left vertex of a character area within the sub-picture is (x2, y2), then the coordinate value of the character area in the original picture is (x1 + x2 - x, y1 + y2 - y).
9. A computing device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-8.
10. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-8.
CN201810128155.1A 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment Active CN108304814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810128155.1A CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810128155.1A CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Publications (2)

Publication Number Publication Date
CN108304814A CN108304814A (en) 2018-07-20
CN108304814B true CN108304814B (en) 2020-07-14

Family

ID=62864779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810128155.1A Active CN108304814B (en) 2018-02-08 2018-02-08 Method for constructing character type detection model and computing equipment

Country Status (1)

Country Link
CN (1) CN108304814B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109263271B (en) * 2018-08-15 2020-06-12 同济大学 Printing equipment detection and analysis method based on big data
CN111191668B (en) * 2018-11-15 2023-04-28 零氪科技(北京)有限公司 Method for identifying disease content in medical record text
CN109740473B (en) * 2018-12-25 2020-10-16 东莞市七宝树教育科技有限公司 Picture content automatic marking method and system based on paper marking system
CN109685055B (en) * 2018-12-26 2021-11-12 北京金山数字娱乐科技有限公司 Method and device for detecting text area in image
CN109766879B (en) * 2019-01-11 2023-06-30 北京字节跳动网络技术有限公司 Character detection model generation method, character detection device, character detection equipment and medium
CN109919037B (en) * 2019-02-01 2021-09-07 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN109977762B (en) * 2019-02-01 2022-02-22 汉王科技股份有限公司 Text positioning method and device and text recognition method and device
CN110321788A (en) * 2019-05-17 2019-10-11 平安科技(深圳)有限公司 Training data processing method, device, equipment and computer readable storage medium
CN110490232B (en) * 2019-07-18 2021-08-13 北京捷通华声科技股份有限公司 Method, device, equipment and medium for training character row direction prediction model
CN111144191B (en) * 2019-08-14 2024-03-22 广东小天才科技有限公司 Font identification method, font identification device, electronic equipment and storage medium
CN111275139B (en) * 2020-01-21 2024-02-23 杭州大拿科技股份有限公司 Handwritten content removal method, handwritten content removal device, and storage medium
CN111582267B (en) * 2020-04-08 2023-06-02 北京皮尔布莱尼软件有限公司 Text detection method, computing device and readable storage medium
CN114120305B (en) * 2021-11-26 2023-07-07 北京百度网讯科技有限公司 Training method of text classification model, and text content recognition method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN105956626A (en) * 2016-05-12 2016-09-21 成都新舟锐视科技有限公司 Deep learning based vehicle license plate position insensitive vehicle license plate recognition method
CN106874902A (en) * 2017-01-19 2017-06-20 博康智能信息技术有限公司北京海淀分公司 A kind of license board information recognition methods and device
CN107346629A (en) * 2017-08-22 2017-11-14 贵州大学 A kind of intelligent blind reading method and intelligent blind reader system
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE60320414T2 (en) * 2003-11-12 2009-05-20 Sony Deutschland Gmbh Apparatus and method for the automatic extraction of important events in audio signals

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104966097A (en) * 2015-06-12 2015-10-07 成都数联铭品科技有限公司 Complex character recognition method based on deep learning
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105809164A (en) * 2016-03-11 2016-07-27 北京旷视科技有限公司 Character identification method and device
CN105956626A (en) * 2016-05-12 2016-09-21 成都新舟锐视科技有限公司 Deep learning based vehicle license plate position insensitive vehicle license plate recognition method
CN106874902A (en) * 2017-01-19 2017-06-20 博康智能信息技术有限公司北京海淀分公司 A kind of license board information recognition methods and device
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device
CN107346629A (en) * 2017-08-22 2017-11-14 贵州大学 A kind of intelligent blind reading method and intelligent blind reader system

Also Published As

Publication number Publication date
CN108304814A (en) 2018-07-20

Similar Documents

Publication Publication Date Title
CN108304814B (en) Method for constructing character type detection model and computing equipment
CN109829453B (en) Method and device for recognizing characters in card and computing equipment
CN107798321B (en) Test paper analysis method and computing device
CN111507251B (en) Method and device for positioning answering area in test question image, electronic equipment and computer storage medium
WO2022148192A1 (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN110443250B (en) Method and device for identifying category of contract seal and computing equipment
CN108960229B (en) Multidirectional character detection method and device
US8634659B2 (en) Image processing apparatus, computer readable medium storing program, and image processing method
CN108416345B (en) Answer sheet area identification method and computing device
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
US20230222631A1 (en) Method and device for removing handwritten content from text image, and storage medium
CN111275730A (en) Method, device and equipment for determining map area and storage medium
CN110427946B (en) Document image binarization method and device and computing equipment
WO2021147631A1 (en) Handwritten content removing method and device and storage medium
KR102399508B1 (en) Layout analysis method, reading assisting device, circuit and medium
US20140328541A1 (en) Recognition of numerical characters in digital images
WO2022134771A1 (en) Table processing method and apparatus, and electronic device and storage medium
CN109697414B (en) Text positioning method and device
CN111582267A (en) Text detection method, computing device and readable storage medium
WO2022213784A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN112949649B (en) Text image identification method and device and computing equipment
JP2017500662A (en) Method and system for correcting projection distortion
CN112597940B (en) Certificate image recognition method and device and storage medium
CN112070708A (en) Image processing method, image processing apparatus, electronic device, and storage medium
US9031324B2 (en) Image-processing device specifying encircling line for identifying sub-region of image

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address
CP03 Change of name, title or address

Address after: 571924 Hainan Ecological Software Park, Laocheng High tech Industrial Demonstration Zone, Haikou City, Hainan Province

Patentee after: Hainan Avanti Technology Co.,Ltd.

Address before: 571924 Hainan old city high tech industrial demonstration area Hainan eco Software Park

Patentee before: HAINAN YUNJIANG TECHNOLOGY CO.,LTD.