CN111626287A

CN111626287A - Training method and device for recognition network for recognizing Chinese in scene

Info

Publication number: CN111626287A
Application number: CN201910146791.1A
Authority: CN
Inventors: 郜业飞; 董健; 颜水成
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2019-02-27
Filing date: 2019-02-27
Publication date: 2020-09-04

Abstract

The invention provides a training method and a training device for a recognition network for recognizing Chinese in a scene. The method comprises the following steps: randomly generating a first corpus sample by using common Chinese characters; synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area; and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample. Because the occurrence probability of the common Chinese characters tends to be uniform in the randomly generated corpus samples, when the recognition network is trained by using the scene image samples synthesized based on the randomly generated corpus samples, the frequency that the recognition network can see all the common Chinese characters also tends to be consistent, thereby solving the problem of long-tail distribution of the Chinese characters to a certain extent and improving the recognition effect of the Chinese characters in the scene.

Description

Training method and device for recognition network for recognizing Chinese in scene

Technical Field

The invention relates to the technical field of image recognition, in particular to a training method for a recognition network for recognizing Chinese in a scene, a training device for the recognition network for recognizing Chinese in the scene, a computer storage medium and computing equipment.

Background

At present, deep learning technology is widely applied in the field of graphics and images. OCR (Optical Character Recognition) is a key link in the interaction between electronic devices and the external environment, and is widely used in many application scenarios, such as license plate recognition, street view recognition, and network image/video monitoring. The introduction of deep learning has markedly improved OCR recognition accuracy and has promoted commercial products based on the related technology.

Nowadays, the application of deep-learning-based scene character recognition models to English character recognition has been widely researched by scholars at home and abroad, and a good recognition effect has been achieved. However, because Chinese has no special interval between characters, a large number of characters, similar character shapes, a long-tail distribution of the corpus, and the like, directly migrating an English recognition scheme to the Chinese environment for Chinese scene character recognition rarely achieves the expected result.

Therefore, a method for improving the long-tailed word problem of the character recognition in the Chinese scene so as to improve the recognition effect of the Chinese characters in the scene is needed.

Disclosure of Invention

In view of the above, the present invention is proposed to provide a training method for a recognition network for recognizing Chinese in a scene, a training apparatus for such a recognition network, a computer storage medium, and a computing device that overcome or at least partially solve the above problems.

According to an aspect of the embodiments of the present invention, there is provided a training method for a recognition network for recognizing Chinese in a scene, including:

randomly generating a first corpus sample by using common Chinese characters;

synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area;

and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.

Optionally, in the first corpus sample, the frequency of occurrence of each Chinese character is controllable.

Optionally, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.

Optionally, before randomly generating the first corpus sample by using the common Chinese characters, the method further comprises:

acquiring the common Chinese characters from a codebook for Chinese character input.

Optionally, the method further comprises:

obtaining a corpus with real semantic information;

synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;

training the recognition network using the second synthetic scene image sample.

Optionally, the first background image is the same as the second background image.

Optionally, obtaining the corpus with the real semantic information includes:

and intercepting characters with specific length from a text material containing natural semantics as the corpus with real semantic information.

Optionally, the method further comprises:

acquiring real scene image data;

and adjusting parameters of the recognition network by using the real scene image data.

Optionally, acquiring real scene image data includes:

annotating a real scene image, and cutting out the Chinese character area in the real scene image.

Optionally, the recognition network is used to recognize Chinese within a natural scene.

According to another aspect of the embodiments of the present invention, there is also provided a training apparatus for a recognition network for recognizing Chinese in a scene, including:

a random corpus generating module adapted to randomly generate a first corpus sample by using common Chinese characters;

an image sample synthesis module adapted to synthesize the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area; and

a recognition network training module adapted to train a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.

Optionally, the random corpus generating module is further adapted to:

obtain the common Chinese characters from a codebook for Chinese character input before randomly generating the first corpus sample by using the common Chinese characters.

Optionally, the apparatus further comprises:

a real corpus acquiring module adapted to acquire a corpus with real semantic information;

the image sample synthesis module is further adapted to:

synthesize the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;

the recognition network training module is further adapted to:

train the recognition network by using the second synthesized scene image sample.

Optionally, the real corpus acquiring module is further adapted to:

intercept characters of a specific length from a text material containing natural semantics as the corpus with real semantic information.

Optionally, the apparatus further comprises:

a real scene data acquisition module adapted to acquire real scene image data; and

a recognition network adjusting module adapted to adjust the parameters of the recognition network by using the real scene image data.

Optionally, the real scene data acquisition module is further adapted to:

annotate a real scene image and cut out the Chinese character area in the real scene image.

According to yet another aspect of the embodiments of the present invention, there is also provided a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the training method for a recognition network for recognizing Chinese in a scene according to any one of the above.

According to still another aspect of the embodiments of the present invention, there is also provided a computing device including:

a processor; and

a memory storing computer program code;

the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of the above.

According to the training method and apparatus for a recognition network for recognizing Chinese in a scene provided by the embodiments of the invention, corpus samples are randomly generated by using common Chinese characters, the corpus samples are synthesized with a background image to obtain synthesized scene image samples containing Chinese character areas, and the synthesized scene image samples are then used to train the recognition network. In natural corpus material, only a small portion of the common Chinese characters appear frequently, while the other characters appear rarely or not at all (the so-called long-tail distribution). If natural corpus material is used to train the recognition network, a good recognition effect cannot be obtained for the Chinese characters that occur infrequently in the corpus. In the randomly generated corpus samples, by contrast, the occurrence probability of the common Chinese characters tends to be uniform. When the recognition network is trained with scene image samples synthesized from these corpus samples, the frequency with which the recognition network sees each common Chinese character therefore also tends to be consistent, which alleviates the long-tail distribution problem of Chinese characters to a certain extent and improves the recognition effect for Chinese characters in the scene.

Furthermore, controlling the occurrence frequency of each Chinese character in the randomly generated corpus samples, and in particular making the occurrence frequencies of all Chinese characters equal, addresses the long-tail distribution problem of Chinese characters more effectively.

Furthermore, after the recognition network has been trained in a first stage with scene image samples synthesized from the randomly generated corpus samples, it can be trained in a second stage with scene image samples synthesized from a corpus with real semantic information, and finally fine-tuned with real scene image data. This multi-stage training strategy further improves the generalization capability of the recognition network and the recognition effect for Chinese characters in scenes. The foregoing description is only an overview of the technical solutions of the present invention; the embodiments of the present invention are described below so that the technical means of the present invention can be understood more clearly and so that the above and other objects, features, and advantages of the present invention become more apparent.

The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 illustrates a flow diagram of a training method of a recognition network to recognize Chinese within a scene, according to one embodiment of the invention;

FIG. 2 illustrates a flow diagram of a training method of a recognition network to recognize Chinese within a scene according to another embodiment of the invention;

FIG. 3 is a schematic structural diagram of a training apparatus for a recognition network for recognizing Chinese in a scene according to an embodiment of the present invention; and

FIG. 4 is a schematic structural diagram of a training apparatus for a recognition network for recognizing Chinese in a scene according to another embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

At present, the mainstream scene character recognition scheme extracts the features of an image character region by using a CRNN (Convolutional Recurrent Neural Network), which combines the ability of a CNN (Convolutional Neural Network) to extract the spatial feature information of characters in an image with the ability of an RNN (Recurrent Neural Network) to encode sequential information. A CTC (Connectionist Temporal Classification) network is then used to decode the encoding result of the character region to obtain the corresponding text information.
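The decoding step mentioned above can be illustrated with a minimal sketch of greedy (best-path) CTC decoding. This is one common way to decode CTC outputs and is not stated in the source to be the decoder actually used. The function name `ctc_greedy_decode`, the convention that label 0 is the blank symbol, and the toy score matrix are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed convention: label 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str],
                      blank: int = BLANK) -> str:
    """Best-path decoding: argmax per frame, merge repeats, drop blanks.

    frame_scores[t][k] is the score of label k at time step t.
    alphabet[k] is the character for label k (alphabet[blank] is unused).
    """
    # Pick the highest-scoring label for every time step.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded: List[str] = []
    previous = blank
    for label in best_path:
        # Emit a character only when the label changes and is not blank.
        # A blank between two equal labels lets a character repeat.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)
```

With a three-label toy alphabet (blank, 中, 文), a per-frame best path of 1, 1, blank, 1, 2 collapses to the three-character string 中中文: the repeated first frame is merged, and the blank allows the same character to appear twice.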

In this field, English character recognition has been widely studied by scholars at home and abroad; many recognition schemes have been proposed and have achieved good results. In an English scene there are only 26 letters, so even with digits added the total number of character classes is only a few dozen, and words are separated by spaces. In a Chinese scene, by contrast, Chinese characters are square (block) characters, the boundaries between words in a sentence are not obvious (especially when the characters have similar shapes), and there are no obvious spaces between words. In particular, although about 5000-. In summary, Chinese scene character recognition is characterized by no special interval between characters, a large number of character classes, similar character shapes, and a long-tail distribution of the corpus, so directly migrating an English recognition scheme to the Chinese environment rarely achieves the expected result.

In order to solve the above technical problem, an embodiment of the present invention provides a training method for a recognition network for recognizing Chinese in a scene. FIG. 1 illustrates a flow diagram of a training method for a recognition network for recognizing Chinese in a scene according to one embodiment of the invention. Referring to FIG. 1, the method may include at least the following steps S102 to S106.
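To make the notion of a long-tail corpus distribution concrete, the following sketch measures what share of all character occurrences in a text is covered by its few most frequent characters. The function name `head_coverage` and the toy string are illustrative assumptions; the numbers it produces for the toy input say nothing about real Chinese corpora.

```python
from collections import Counter


def head_coverage(text: str, head_size: int) -> float:
    """Share of all character occurrences covered by the `head_size`
    most frequent characters in `text` (whitespace is ignored)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(n for _, n in counts.most_common(head_size))
    return head / total


# Toy input: one character dominates, two others are rare.
toy_text = "的的的的的的一一一是"
print(head_coverage(toy_text, 1))
print(head_coverage(toy_text, 2))
```

A value close to 1 for a small `head_size` indicates that a recognition network trained on such text would see the remaining characters only rarely.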

Step S102, a first corpus sample is randomly generated by using common Chinese characters.

Step S104, synthesizing the first corpus sample and the first background image to obtain a first synthesized scene image sample containing the Chinese character area.

Step S106, training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.

In the embodiment of the invention, the recognition network is a deep learning network that mainly adopts an architecture combining CRNN with CTC to recognize Chinese characters in a natural scene.

According to the training method and device for the recognition network for recognizing the Chinese characters in the scene, disclosed by the embodiment of the invention, the corpus samples are randomly generated by utilizing the common Chinese characters, the obtained corpus samples are synthesized with the background image to obtain the synthesized scene image samples containing the Chinese character areas, and then the synthesized scene image samples are utilized to train the recognition network. Because the occurrence probability of the common Chinese characters tends to be uniform in the randomly generated corpus samples, when the recognition network is trained by using the scene image samples synthesized based on the randomly generated corpus samples, the frequency that the recognition network can see all the common Chinese characters also tends to be consistent, thereby solving the problem of long-tail distribution of the Chinese characters to a certain extent and improving the recognition effect of the Chinese characters in the scene.

In step S102 above, the first corpus sample is generated by randomly combining common Chinese characters. To make the distribution of Chinese characters in the generated corpus sample tend toward uniform, a sufficient number of common Chinese characters can be used, for example, 5000-.

Optionally, the common Chinese characters may be obtained from a codebook used for Chinese character input (e.g., a codebook of the Chinese input method for huntington). Preferably, the Chinese characters ranked highest by frequency of use in the codebook are selected.

In a preferred embodiment, to further address the long-tail distribution of the corpus, the occurrence frequency of each Chinese character in the randomly generated first corpus sample may be controlled, so that the distribution of characters in the corpus is as desired.
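As a rough illustration of selecting common characters from an input-method codebook, the sketch below ranks single-character entries by a frequency column and keeps the top ones. The source does not describe the codebook's file format, so the "entry, whitespace, integer frequency" layout, the function name `select_common_characters`, and the restriction to the basic CJK Unified Ideographs block are all assumptions.

```python
from typing import Dict, Iterable, List


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def select_common_characters(codebook_lines: Iterable[str],
                             top_n: int) -> List[str]:
    """Return the `top_n` single Chinese characters with the highest
    frequency, assuming each line looks like '<entry> <frequency>'."""
    freq: Dict[str, int] = {}
    for line in codebook_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that do not match the assumed layout
        entry, count = parts
        # Keep single-character entries only; multi-character words and
        # non-Chinese entries are ignored.
        if len(entry) == 1 and is_cjk(entry) and count.isdigit():
            freq[entry] = max(freq.get(entry, 0), int(count))
    # Sort by descending frequency; ties are broken by code point so the
    # result is deterministic.
    ranked = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return ranked[:top_n]
```

A real codebook would need its own parser; only the ranking-and-truncation idea carries over.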

Further, the frequency of occurrence of all the Chinese characters in the first corpus sample is controlled to be equal, thereby achieving uniform distribution of the Chinese characters in the corpus.
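One way to obtain exactly equal occurrence frequencies is to build a pool in which every character appears the same number of times, shuffle it, and cut it into fixed-length samples. The sketch below follows that idea; it is an illustrative assumption rather than the procedure prescribed by the source, and the names `generate_balanced_corpus`, `sample_length`, and `repeats_per_char` are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def generate_balanced_corpus(charset: Sequence[str],
                             sample_length: int,
                             repeats_per_char: int,
                             seed: Optional[int] = None) -> List[str]:
    """Generate random corpus samples in which every character of
    `charset` occurs exactly `repeats_per_char` times overall."""
    if sample_length <= 0 or repeats_per_char <= 0:
        raise ValueError("sample_length and repeats_per_char must be positive")
    rng = random.Random(seed)
    # Remove duplicates while preserving order, so that no character is
    # accidentally over-represented.
    unique_chars = list(dict.fromkeys(charset))
    # Each character enters the pool the same number of times.
    pool = [ch for ch in unique_chars for _ in range(repeats_per_char)]
    rng.shuffle(pool)
    # Cut the shuffled pool into samples; the last one may be shorter.
    return ["".join(pool[i:i + sample_length])
            for i in range(0, len(pool), sample_length)]
```

Because the strings are produced by shuffling, they carry no semantic information, which is why the later training stage with a real-semantics corpus complements this one.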

In step S104, the first background image may be an image of a real scene without text, and the first corpus sample is fused into the first background image to obtain a first synthesized scene image sample.
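The source does not specify how the corpus sample is fused into the background image. As a minimal sketch under stated assumptions, the code below alpha-blends a pre-rendered text mask onto a grayscale background stored as nested lists. Rendering the Chinese glyphs into the mask (which would need a font library) is assumed to happen elsewhere; `composite_text`, the mask format, and the `ink` parameter are hypothetical.

```python
from typing import List

GrayImage = List[List[int]]      # pixel values in 0..255
AlphaMask = List[List[float]]    # text coverage per pixel in 0.0..1.0


def composite_text(background: GrayImage,
                   mask: AlphaMask,
                   top: int,
                   left: int,
                   ink: int = 0) -> GrayImage:
    """Blend a rendered text mask onto a copy of `background`.

    `mask` holds the text coverage for each pixel, `ink` is the gray
    level of the text, and (top, left) is where the mask is placed.
    """
    out = [row[:] for row in background]  # leave the input untouched
    for dy, mask_row in enumerate(mask):
        for dx, alpha in enumerate(mask_row):
            y, x = top + dy, left + dx
            # Pixels of the mask that fall outside the image are ignored.
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = round(alpha * ink + (1.0 - alpha) * out[y][x])
    return out
```

The synthesized image would then be paired with the corpus string that produced the mask, giving a labeled training sample without manual annotation.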

Further, in step S106, the obtained first synthesized scene image sample is used to train the recognition network, so that the frequency with which the recognition network sees each common Chinese character is consistent during training. When the trained recognition network is then used to recognize Chinese characters in a scene, it can achieve a better and more accurate recognition effect, especially for Chinese characters with a lower frequency of use.

In an optional embodiment of the present invention, after training the recognition network by using the first synthesized scene image sample synthesized based on the randomly generated first corpus sample, the following steps may be further performed:

First, a corpus with real semantic information is obtained. Then, the corpus with the real semantic information is synthesized with a second background image to obtain a second synthesized scene image sample containing a Chinese character area. Finally, the recognition network is trained by using the second synthesized scene image sample.

After the recognition network has been trained with the scene image samples synthesized from the randomly generated corpus samples (which may be called first-stage training), it is trained with the scene image samples synthesized from the corpus with real semantic information (which may be called second-stage training), so that the Chinese recognition effect can be further improved.

Optionally, to simplify the synthesis of the scene image samples and the training operation of the recognition network, the second background image may employ the same image of the real scene as the first background image.

In practical applications, there are various ways to obtain a corpus with real semantic information. For example, characters of a specific length may be intercepted from text material containing natural semantics and used as the corpus with real semantic information. The text material may be, for example, news, books, etc.
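Intercepting fixed-length character runs from natural text can be sketched as follows. Picking the start positions at random and removing whitespace first are assumptions of this sketch, as are the names `cut_snippets`, `length`, and `count`; the source only states that characters of a specific length are taken from the text material.

```python
import random
from typing import List, Optional


def cut_snippets(text: str,
                 length: int,
                 count: int,
                 seed: Optional[int] = None) -> List[str]:
    """Cut `count` snippets of exactly `length` characters from `text`
    at random positions, after removing all whitespace."""
    rng = random.Random(seed)
    cleaned = "".join(text.split())  # drop spaces and line breaks
    if length <= 0 or len(cleaned) < length:
        return []
    snippets = []
    for _ in range(count):
        start = rng.randrange(len(cleaned) - length + 1)
        snippets.append(cleaned[start:start + length])
    return snippets
```

Unlike the randomly generated corpus, these snippets preserve the natural co-occurrence of characters, and with it the natural (long-tailed) character frequencies.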

In an optional embodiment of the present invention, after training the recognition network by using the first synthesized scene image sample synthesized based on the randomly generated first corpus sample, or after training the recognition network by using the second synthesized scene image sample synthesized based on the corpus with the real semantic information, the following steps may be further performed:

acquiring real scene image data, and then adjusting the parameters of the recognition network by using the real scene image data.

Further, the real scene image data may be obtained by:

annotating a real scene image and cutting out the Chinese character area in the real scene image.

Fine-tuning the parameters of the recognition network on a data set of real scene images containing Chinese improves the generalization capability of the recognition network and further improves the Chinese character recognition effect.
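Building the real scene data set involves annotating real scene images and cutting out their Chinese character areas. The sketch below shows the cropping part for axis-aligned boxes on an image stored as nested lists. The annotation format (`Annotation` with pixel coordinates and a transcription) and the function name `crop_text_regions` are assumptions; real annotations may use polygons or rotated boxes.

```python
from typing import List, NamedTuple, Tuple

GrayImage = List[List[int]]


class Annotation(NamedTuple):
    """Hypothetical label: an axis-aligned box plus its transcription."""
    left: int
    top: int
    right: int    # exclusive
    bottom: int   # exclusive
    text: str


def crop_text_regions(image: GrayImage,
                      annotations: List[Annotation]
                      ) -> List[Tuple[GrayImage, str]]:
    """Cut every annotated text region out of `image` and pair it with
    its transcription; boxes are clipped to the image bounds."""
    height = len(image)
    width = len(image[0]) if image else 0
    samples = []
    for ann in annotations:
        left, top = max(0, ann.left), max(0, ann.top)
        right, bottom = min(width, ann.right), min(height, ann.bottom)
        if left >= right or top >= bottom:
            continue  # box lies entirely outside the image
        crop = [row[left:right] for row in image[top:bottom]]
        samples.append((crop, ann.text))
    return samples
```

Each returned pair has the same shape as a synthesized sample (an image region and its text), so the same training loop can consume both.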

The various implementations of each step of the embodiment shown in FIG. 1 have been described above. The implementation of the training method for a recognition network for recognizing Chinese in a scene according to the present invention is described in detail below through a specific embodiment.

FIG. 2 is a flow chart illustrating a training method for a recognition network for recognizing Chinese in a scene according to another embodiment of the present invention. In this embodiment, the recognition network is a deep learning network that adopts a CRNN architecture in combination with CTC. Referring to FIG. 2, the method may include at least the following steps S202 to S216.

Step S202, common Chinese characters are obtained from a codebook for Chinese character input, and a first corpus sample is randomly generated by the common Chinese characters, wherein the occurrence frequency of all Chinese characters in the first corpus sample is controlled to be equal.

Step S204, synthesizing the first corpus sample and the first background image to obtain a first synthesized scene image sample containing the Chinese character area.

Step S206, a first stage of training is carried out on the recognition network for recognizing Chinese in the natural scene by using the first synthetic scene image sample.

Step S208, intercepting characters with specific length from the text material containing natural semantics as the corpus with real semantic information.

The text material is, for example, a news material, a book, or the like.

Step S210, synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing the chinese text region, wherein the second background image is the same as the first background image.

Step S212, a second stage of training is performed on the recognition network by using a second synthetic scene image sample.

Step S214, labeling the real scene image, and cutting out a Chinese character area in the real scene image to obtain a real scene image data set.

Step S216, the real scene image data set is used to fine-tune the parameters of the recognition network.

In the embodiment, the problem of long-tailed words in Chinese scene character recognition is effectively solved through a multi-stage training strategy, and the recognition effect of Chinese characters in natural scenes is improved.
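The ordering of the three stages in steps S202 to S216 can be sketched as a small driver that feeds each data set to the same network in sequence. The actual CRNN and CTC update is abstracted behind a caller-supplied `train_step` callable. The per-stage learning rate is an assumption added for illustration (the source does not mention learning rates), and `run_training_stages` is a hypothetical name.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Sample = Tuple[object, str]                   # (image, transcription)
Stage = Tuple[str, Iterable[Sample], float]   # (name, samples, learning rate)


def run_training_stages(train_step: Callable[[Sample, float], None],
                        stages: Sequence[Stage]) -> List[Tuple[str, int]]:
    """Run the stages in order on the same network.

    `train_step(sample, learning_rate)` is expected to perform one
    parameter update; the returned log lists how many samples each
    stage consumed.
    """
    log = []
    for name, samples, learning_rate in stages:
        seen = 0
        for sample in samples:
            train_step(sample, learning_rate)
            seen += 1
        log.append((name, seen))
    return log


# Example wiring with a stand-in update function (no real network here).
updates = []
stages = [
    ("stage 1: random corpus, synthesized images", [(None, "天地玄黄")], 1e-3),
    ("stage 2: real-semantics corpus, synthesized images", [(None, "今天天气")], 1e-3),
    ("stage 3: real scene images, fine-tuning", [(None, "出口")], 1e-4),
]
print(run_training_stages(lambda s, lr: updates.append((s[1], lr)), stages))
```

Keeping the stages as data makes it easy to skip the second stage, which the description above allows when fine-tuning follows the first stage directly.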

Based on the same inventive concept, an embodiment of the invention also provides a training apparatus for a recognition network for recognizing Chinese in a scene, which supports the training method provided by any one of the above embodiments or a combination thereof. FIG. 3 is a schematic diagram of a training apparatus 300 for a recognition network for recognizing Chinese in a scene according to an embodiment of the present invention. Referring to FIG. 3, the apparatus 300 may include at least: a random corpus generating module 310, an image sample synthesis module 320, and a recognition network training module 330.

The functions of the components of the training apparatus 300 and the connection relationships between them are described below.

The random corpus generating module 310 is adapted to randomly generate a first corpus sample by using common Chinese characters.

The image sample synthesis module 320 is connected to the random corpus generating module 310, and is adapted to synthesize the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area.

The recognition network training module 330 is connected to the image sample synthesis module 320, and is adapted to train a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.

In an alternative embodiment of the present invention, the frequency of occurrence of each Chinese character in the obtained first corpus sample is controllable.

Further, in the obtained first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.

In an optional embodiment of the present invention, the random corpus generating module 310 is further adapted to:

obtain the common Chinese characters from a codebook for Chinese character input before randomly generating the first corpus sample by using the common Chinese characters.

In an alternative embodiment of the present invention, as shown in FIG. 4, the training apparatus 300 illustrated in FIG. 3 may further include a real corpus acquiring module 340. The real corpus acquiring module 340 may be connected to the image sample synthesis module 320, and is adapted to acquire a corpus with real semantic information. Accordingly, the image sample synthesis module 320 is further adapted to synthesize the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area. The recognition network training module 330 is further adapted to train the recognition network by using the second synthesized scene image sample.

In an alternative embodiment of the invention, the first background image is the same as the second background image.

In an optional embodiment of the present invention, the real corpus acquiring module 340 is further adapted to:

intercept characters of a specific length from a text material containing natural semantics as the corpus with real semantic information.

In an alternative embodiment of the present invention, still referring to FIG. 4, the training apparatus 300 may further include a real scene data acquisition module 350 and a recognition network adjusting module 360. The real scene data acquisition module 350 is adapted to acquire real scene image data. The recognition network adjusting module 360 may be connected to the real scene data acquisition module 350 and the recognition network training module 330, respectively, and is adapted to adjust the parameters of the recognition network by using the real scene image data.

In an optional embodiment of the invention, the real scene data acquisition module 350 is further adapted to:

annotate a real scene image and cut out the Chinese character area in the real scene image.

In an alternative embodiment of the invention, the recognition network is used to recognize Chinese within natural scenes.

Based on the same inventive concept, an embodiment of the invention also provides a computer storage medium. The computer storage medium stores computer program code that, when run on a computing device, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one or a combination of the above embodiments.

Based on the same inventive concept, the embodiment of the invention also provides the computing equipment. The computing device may include:

a processor; and

a memory storing computer program code;

the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one or a combination of the above embodiments.

According to any one or a combination of multiple optional embodiments, the embodiment of the present invention can achieve the following advantages:

Furthermore, after the recognition network is trained in the first stage by using the scene image sample synthesized based on the corpus sample generated randomly, the recognition network can be trained in the second stage by using the scene image sample synthesized based on the corpus with real semantic information, and finally, the recognition network is finely tuned by using real scene image data. By the multi-stage training strategy, the generalization capability of the recognition network and the recognition effect of Chinese characters in scenes are further improved.

It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.

In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.

Those of ordinary skill in the art will understand that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions that, when executed, cause a computing device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (such as a computing device, e.g., a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the computing device, the computing device executes all or part of the steps of the method according to the embodiments of the present invention.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.

According to an aspect of the embodiments of the present invention, a method for training a recognition network for recognizing chinese in a scene is provided, including:

randomly generating a first corpus sample by using common Chinese characters;

A2. The method according to A1, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.

A3. The method according to A2, wherein the frequency of occurrence of all Chinese characters in the first corpus sample is controlled to be equal.

A4. The method according to any one of A1-A3, wherein, before randomly generating the first corpus sample by using common Chinese characters, the method further comprises:

A5. The method of any one of A1-A4, further comprising:

obtaining a corpus with real semantic information;

training the recognition network using the second synthetic scene image sample.

A6. The method of A5, wherein the first background image is the same as the second background image.

A7. The method according to A5 or A6, wherein obtaining the corpus with real semantic information comprises:

A8. The method of any one of A1-A7, further comprising:

acquiring real scene image data;

A9. The method of A8, wherein acquiring real scene image data comprises:

A10. The method of any one of A1-A9, wherein the recognition network is used to recognize Chinese within a natural scene.
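As an illustration of the equal-frequency control described in A2 and A3, the following is a minimal sketch, not the implementation of the invention. It assumes that equal frequency can be obtained by repeating each character the same number of times, shuffling the resulting pool, and cutting it into lines. The function name `generate_balanced_corpus`, its parameters, and the ten-character toy set are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def generate_balanced_corpus(charset, repeats, line_length, seed=None):
    """Return random text lines in which every character of `charset`
    occurs exactly `repeats` times across the whole corpus.

    Hypothetical helper for illustration; the names are not from the source.
    """
    if repeats < 1 or line_length < 1:
        raise ValueError("repeats and line_length must be positive")
    rng = random.Random(seed)
    # Build a pool holding each character the same number of times,
    # then shuffle it so that the character order carries no semantics.
    pool = [ch for ch in charset for _ in range(repeats)]
    rng.shuffle(pool)
    # Cut the shuffled pool into fixed-length lines (the last may be shorter).
    return ["".join(pool[i:i + line_length])
            for i in range(0, len(pool), line_length)]


# Assumed toy character set; a real set of common Chinese characters
# would contain several thousand entries.
common_chars = list("的一是在不了有和人这")
corpus = generate_balanced_corpus(common_chars, repeats=6, line_length=5, seed=0)
counts = Counter("".join(corpus))
```

In this sketch every character appears the same number of times by construction, so a network trained on images rendered from such lines would see each character about equally often. Unequal but controlled frequencies, as allowed by A2, could be obtained by giving each character its own repeat count.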

According to another aspect of the embodiments of the present invention, there is also provided B11. a training apparatus for a recognition network for recognizing Chinese in a scene, including:

B12. The apparatus of B11, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.

B13. The apparatus of B12, wherein the frequency of occurrence of all Chinese characters in the first corpus sample is controlled to be equal.

B14. The apparatus of any one of B11-B13, wherein the random corpus generation module is further adapted to:

B15. The apparatus of any one of B11-B14, further comprising:

the image sample synthesis module is further adapted to:

the recognition network training module is further adapted to:

training the recognition network using the second synthetic scene image sample.

B16. The apparatus of B15, wherein the first background image is the same as the second background image.

B17. The apparatus of B15 or B16, wherein the real corpus acquisition module is further adapted to:

B18. The apparatus of any one of B11-B17, further comprising:

B19. The apparatus of B18, wherein the real scene data acquisition module is further adapted to:

B20. The apparatus of any one of B11-B19, wherein the recognition network is used to recognize Chinese within a natural scene.

There is also provided, in accordance with yet another aspect of the embodiments of the present invention, a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of A1-A10.

There is also provided, in accordance with yet another aspect of the embodiments of the present invention, a computing device, including:

a processor; and

a memory storing computer program code;

the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of A1-A10.

Claims

1. A training method for a recognition network for recognizing Chinese in a scene comprises the following steps:

randomly generating a first corpus sample by using common Chinese characters;

2. The method of claim 1, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.

3. The method according to claim 2, wherein the frequency of occurrence of all Chinese characters in the first corpus sample is controlled to be equal.

4. The method according to any one of claims 1-3, wherein, prior to randomly generating the first corpus sample by using common Chinese characters, the method further comprises:

5. The method according to any one of claims 1-4, further comprising:

obtaining a corpus with real semantic information;

training the recognition network using the second synthetic scene image sample.

6. The method of claim 5, wherein the first background image is the same as the second background image.

7. The method according to claim 5 or 6, wherein obtaining the corpus with real semantic information comprises:

8. A training apparatus for a recognition network for recognizing Chinese in a scene, comprising:

9. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of claims 1-7.

10. A computing device, comprising:

a processor; and

a memory storing computer program code;

the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of claims 1-7.