CN111626287A - Training method and device for recognition network for recognizing Chinese in scene - Google Patents

Publication number
CN111626287A
Authority
CN
China
Prior art keywords
corpus
chinese
scene
recognition network
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910146791.1A
Other languages
Chinese (zh)
Inventor
郜业飞
董健
颜水成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201910146791.1A
Publication of CN111626287A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Abstract

The invention provides a training method and a training device for a recognition network for recognizing Chinese in a scene. The method comprises the following steps: randomly generating a first corpus sample by using common Chinese characters; synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area; and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample. Because the occurrence probability of the common Chinese characters tends to be uniform in the randomly generated corpus samples, when the recognition network is trained by using the scene image samples synthesized based on the randomly generated corpus samples, the frequency that the recognition network can see all the common Chinese characters also tends to be consistent, thereby solving the problem of long-tail distribution of the Chinese characters to a certain extent and improving the recognition effect of the Chinese characters in the scene.

Description

Training method and device for recognition network for recognizing Chinese in scene
Technical Field
The invention relates to the technical field of image recognition, in particular to a training method for a recognition network for recognizing Chinese in a scene, a training device for the recognition network for recognizing Chinese in the scene, a computer storage medium and computing equipment.
Background
At present, the deep learning technology is widely applied to the field of graphic images. OCR (Optical character recognition) is a key link for interaction between electronic devices and the external environment in life, and is widely used in a plurality of application scenarios, such as license plate recognition, street view recognition, network image/video monitoring, and the like. And due to the introduction of deep learning, the OCR recognition precision is obviously improved, and the commercial product output of the related technology is promoted.
Deep-learning-based scene text recognition models have been widely studied for English text by scholars at home and abroad, and they achieve good recognition results. However, Chinese has no spaces separating words, a very large character set, many visually similar characters, and a long-tail distribution of characters in corpora, so directly migrating an English recognition scheme to Chinese scene text recognition rarely meets expectations.
Therefore, a method is needed that mitigates the long-tail character problem in Chinese scene text recognition and thereby improves the recognition of Chinese characters in scenes.
Disclosure of Invention
In view of the above, the present invention provides a training method for a recognition network for recognizing Chinese in a scene, a training apparatus for such a recognition network, a computer storage medium, and a computing device that overcome or at least partially solve the above problems.
According to an aspect of the embodiments of the present invention, there is provided a training method for a recognition network for recognizing Chinese in a scene, including:
randomly generating a first corpus sample by using common Chinese characters;
synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area;
and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.
Optionally, in the first corpus sample, the frequency of occurrence of each Chinese character is controllable.
Optionally, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.
Optionally, before randomly generating the first corpus sample using the common Chinese characters, the method further comprises:
acquiring the common Chinese characters from a codebook for Chinese character input.
Optionally, the method further comprises:
obtaining a corpus with real semantic information;
synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;
training the recognition network using the second synthetic scene image sample.
Optionally, the first background image is the same as the second background image.
Optionally, obtaining the corpus with the real semantic information includes:
and intercepting characters with specific length from a text material containing natural semantics as the corpus with real semantic information.
Optionally, the method further comprises:
acquiring real scene image data;
and adjusting parameters of the recognition network by using the real scene image data.
Optionally, acquiring real scene image data includes:
and marking the real scene image, and cutting out a Chinese character area in the real scene image.
Optionally, the recognition network is used to recognize Chinese within a natural scene.
According to another aspect of the embodiments of the present invention, there is also provided a training apparatus for a recognition network for recognizing Chinese in a scene, including:
the random corpus generating module is suitable for randomly generating a first corpus sample by utilizing common Chinese characters;
the image sample synthesis module is suitable for synthesizing the first corpus sample and a first background image to obtain a first synthesis scene image sample containing a Chinese character area; and
and the recognition network training module is suitable for training a recognition network for recognizing Chinese in the scene by using the first synthetic scene image sample.
Optionally, in the first corpus sample, the frequency of occurrence of each Chinese character is controllable.
Optionally, in the first corpus sample, the occurrence frequencies of all Chinese characters are controlled to be equal.
Optionally, the random corpus generating module is further adapted to:
obtain the common Chinese characters from a codebook for Chinese character input before randomly generating the first corpus sample from the common Chinese characters.
Optionally, the apparatus further comprises:
the real corpus acquiring module is suitable for acquiring a corpus with real semantic information;
the image sample synthesis module is further adapted to:
synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;
the recognition network training module is further adapted to:
training the recognition network using the second synthetic scene image sample.
Optionally, the first background image is the same as the second background image.
Optionally, the real corpus acquiring module is further adapted to:
and intercepting characters with specific length from a text material containing natural semantics as the corpus with real semantic information.
Optionally, the apparatus further comprises:
the real scene data acquisition module is suitable for acquiring real scene image data; and
and the recognition network adjusting module is suitable for adjusting the parameters of the recognition network by utilizing the real scene image data.
Optionally, the real scene data obtaining module is further adapted to:
and marking the real scene image, and cutting out a Chinese character area in the real scene image.
Optionally, the recognition network is used to recognize Chinese within a natural scene.
According to yet another aspect of the embodiments of the present invention, there is also provided a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the training method for a recognition network for recognizing Chinese in a scene according to any one of the above.
According to still another aspect of the embodiments of the present invention, there is also provided a computing device including:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one of the above.
According to the training method and device for a recognition network for recognizing Chinese in a scene disclosed by the embodiments of the invention, corpus samples are randomly generated from common Chinese characters, the corpus samples are synthesized with a background image to obtain synthesized scene image samples containing Chinese text regions, and the synthesized scene image samples are then used to train the recognition network. In natural corpora, only a small portion of the common Chinese characters appear frequently, while the other characters appear rarely or not at all (the so-called long-tail distribution); if natural corpus material alone is used to train the recognition network, characters that occur infrequently in the corpus are not recognized well. In randomly generated corpus samples, by contrast, the occurrence probability of the common Chinese characters tends to be uniform, so when the recognition network is trained on scene image samples synthesized from them, the frequency with which the network sees each common Chinese character also tends to be consistent. This alleviates the long-tail distribution problem of Chinese characters to a certain extent and improves the recognition of Chinese characters in scenes.
Furthermore, the occurrence frequency of each Chinese character in the corpus samples which are randomly synthesized is controlled, particularly the occurrence frequency of all the Chinese characters is controlled to be equal, and the problem of long-tail distribution of Chinese characters is further effectively solved.
Furthermore, after the recognition network is trained in the first stage by using the scene image sample synthesized based on the corpus sample generated randomly, the recognition network can be trained in the second stage by using the scene image sample synthesized based on the corpus with real semantic information, and finally, the recognition network is finely tuned by using real scene image data. By the multi-stage training strategy, the generalization capability of the recognition network and the recognition effect of Chinese characters in scenes are further improved. The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a training method of a recognition network to recognize Chinese within a scene, according to one embodiment of the invention;
FIG. 2 illustrates a flow diagram of a training method of a recognition network to recognize Chinese within a scene according to another embodiment of the invention;
FIG. 3 is a schematic diagram of a training apparatus for a recognition network for recognizing Chinese in a scene according to an embodiment of the present invention; and
FIG. 4 is a schematic structural diagram of a training apparatus for a recognition network for recognizing Chinese in a scene according to another embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
At present, the mainstream scene text recognition scheme extracts features from the text region of an image by using a CRNN (Convolutional Recurrent Neural Network), which combines the ability of a CNN (Convolutional Neural Network) to extract the spatial feature information of the characters in the image with the ability of an RNN (Recurrent Neural Network) to encode sequential information. A CTC (Connectionist Temporal Classification) network is then used to decode the encoding result of the text region to obtain the corresponding text information.
In this field, English text recognition has been widely studied by scholars at home and abroad; many recognition schemes have been proposed and good results obtained. For an English scene, there are only 26 letters, and even with digits added the total number of classes is only a few dozen; moreover, English words are separated by spaces. Chinese, by contrast, is written in square characters, the boundaries between words in a sentence are not obvious (especially in the case of similar characters), and there are no obvious spaces between words. In particular, although about 5000-. In summary, Chinese scene text recognition is characterized by no special spacing between words, a large character set, visually similar characters, and a long-tail corpus distribution, so directly migrating an English recognition scheme to a Chinese environment rarely meets expectations.
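As an illustration of the CTC decoding step mentioned above, the following is a minimal sketch of greedy (best-path) decoding: take the highest-scoring label for each frame, merge consecutive repeats, and drop the blank label. The source does not say which decoding strategy is used, so greedy decoding, the function name `ctc_greedy_decode`, the blank index 0, and the toy character set are all assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      charset: Sequence[str],
                      blank: int = BLANK) -> str:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    decoded: List[str] = []
    previous = blank
    for label in best_path:
        # Emit a character only when the label changes and is not blank.
        if label != previous and label != blank:
            decoded.append(charset[label])
        previous = label
    return "".join(decoded)


# Toy example: index 0 is the blank, indices 1 and 2 are characters.
toy_charset = ["", "中", "文"]
toy_scores = [
    [0.1, 0.8, 0.1],    # "中"
    [0.1, 0.7, 0.2],    # "中" again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # "文"
]
decoded_text = ctc_greedy_decode(toy_scores, toy_charset)
```

A blank between two identical labels keeps them as two separate characters, which is how CTC represents doubled characters.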
In order to solve the above technical problem, an embodiment of the present invention provides a training method for a recognition network that recognizes chinese in a scene. FIG. 1 illustrates a flow diagram of a training method for recognition networks that recognize Chinese within a scene, according to one embodiment of the invention. Referring to fig. 1, the method may include at least the following steps S102 to S106.
Step S102, a first corpus sample is randomly generated by using common Chinese characters.
Step S104, synthesizing the first corpus sample and the first background image to obtain a first synthesized scene image sample containing the Chinese character area.
And step S106, training a recognition network for recognizing Chinese in the scene by using the first synthetic scene image sample.
In the embodiment of the invention, the recognition network is a deep learning network, and the Chinese characters in a natural scene are mainly recognized by adopting an architecture of CRNN combined with CTC.
According to the training method and device for the recognition network for recognizing the Chinese characters in the scene, disclosed by the embodiment of the invention, the corpus samples are randomly generated by utilizing the common Chinese characters, the obtained corpus samples are synthesized with the background image to obtain the synthesized scene image samples containing the Chinese character areas, and then the synthesized scene image samples are utilized to train the recognition network. Because the occurrence probability of the common Chinese characters tends to be uniform in the randomly generated corpus samples, when the recognition network is trained by using the scene image samples synthesized based on the randomly generated corpus samples, the frequency that the recognition network can see all the common Chinese characters also tends to be consistent, thereby solving the problem of long-tail distribution of the Chinese characters to a certain extent and improving the recognition effect of the Chinese characters in the scene.
In the above step S102, the first corpus sample is generated by randomly combining the common chinese characters. In order to make the distribution of the Chinese characters in the generated corpus sample tend to be uniform, enough common Chinese characters can be used, for example, 5000-.
Alternatively, the common Chinese characters may be obtained from a codebook used for Chinese character input (e.g., a codebook of the Chinese input method for huntington). Preferably, the characters that rank highest by usage frequency in the codebook are selected.
In a preferred embodiment, to further solve the problem of corpus long-tail distribution, the occurrence frequency of each chinese character in the randomly generated first corpus sample may be controlled, so that the distribution of characters in the corpus is as desired.
Further, the frequency of occurrence of all the Chinese characters in the first corpus sample is controlled to be equal, thereby achieving uniform distribution of the Chinese characters in the corpus.
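One way to obtain exactly equal character frequencies is to repeat every character the same number of times, shuffle the resulting pool, and cut it into fixed-length strings. The sketch below does this; the function name `generate_uniform_corpus`, its parameters, and the ten-character stand-in for the common-character list are assumptions for illustration, not details taken from the source.

```python
import random
from collections import Counter
from typing import List, Sequence


def generate_uniform_corpus(charset: Sequence[str],
                            repeats_per_char: int,
                            sample_length: int,
                            seed: int = 0) -> List[str]:
    """Build random strings in which every character occurs equally often."""
    rng = random.Random(seed)
    # Each character appears exactly `repeats_per_char` times in the pool.
    pool = [ch for ch in charset for _ in range(repeats_per_char)]
    rng.shuffle(pool)
    # Cut the shuffled pool into consecutive corpus samples.
    return ["".join(pool[i:i + sample_length])
            for i in range(0, len(pool), sample_length)]


# Tiny stand-in for the common-character list (hypothetical data).
common_chars = list("的一是在不了有和人这")
samples = generate_uniform_corpus(common_chars, repeats_per_char=6,
                                  sample_length=5)
counts = Counter("".join(samples))
```

Because the pool is built before shuffling, the per-character counts are equal by construction rather than only in expectation, which is the stricter of the two options described above.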
In step S104, the first background image may be an image of a real scene without text, and the first corpus sample is fused into the first background image to obtain a first synthesized scene image sample.
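The fusion step can be pictured as pasting a rendered text mask onto a text-free background at some position and recording the resulting text box. The sketch below uses plain nested lists as a stand-in for grayscale images and a hand-written binary mask as a stand-in for rendered glyphs; a real pipeline would render the corpus sample with a font library. The function name `composite_text_mask`, the random placement, and the solid ink colour are assumptions, not the source's method.

```python
import random
from typing import List, Tuple

Image = List[List[int]]  # grayscale rows, values 0-255


def composite_text_mask(background: Image, mask: Image, ink: int = 0,
                        seed: int = 0) -> Tuple[Image, Tuple[int, int, int, int]]:
    """Paste a binary text mask at a random position.

    Returns the new image and the text box as (x, y, width, height).
    """
    rng = random.Random(seed)
    bg_h, bg_w = len(background), len(background[0])
    m_h, m_w = len(mask), len(mask[0])
    if m_h > bg_h or m_w > bg_w:
        raise ValueError("text mask does not fit inside the background")
    x = rng.randint(0, bg_w - m_w)
    y = rng.randint(0, bg_h - m_h)
    result = [row[:] for row in background]  # leave the background untouched
    for dy in range(m_h):
        for dx in range(m_w):
            if mask[dy][dx]:
                result[y + dy][x + dx] = ink
    return result, (x, y, m_w, m_h)


plain_background = [[200] * 16 for _ in range(8)]
glyph_mask = [[1, 0, 1],
              [1, 1, 1]]  # hypothetical stand-in for a rendered character
scene_sample, text_box = composite_text_mask(plain_background, glyph_mask, seed=1)
```

The returned box gives the location of the Chinese text region in the synthesized sample, which a training pipeline could use to crop the region or to pair it with its transcription.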
Further, in step S106, the first synthesized scene image sample is used to train the recognition network, so that the network sees all common Chinese characters at a consistent frequency during training. As a result, when the trained network is used to recognize Chinese characters in a scene, it recognizes Chinese characters (especially those with a low frequency of use) better and more accurately.
In an optional embodiment of the present invention, after training the recognition network by using the first synthesized scene image sample synthesized based on the randomly generated first corpus sample, the following steps may be further performed:
First, a corpus with real semantic information is obtained. Then, the corpus with the real semantic information is synthesized with a second background image to obtain a second synthesized scene image sample containing the Chinese character area. Finally, the recognition network is trained using the second synthesized scene image sample.
After the recognition network is trained with scene image samples synthesized from randomly generated corpus samples (which may be called first-stage training), it is further trained with scene image samples synthesized from the corpus with real semantic information (which may be called second-stage training), so that the Chinese recognition effect can be further improved.
Optionally, to simplify the synthesis of the scene image samples and the training operation of the recognition network, the second background image may employ the same image of the real scene as the first background image.
In practical applications, there are various ways to obtain corpora with real semantic information. For example, words of a certain length may be cut out from text material containing natural semantics as corpus with true semantic information. The text material may be, for example, news, books, etc.
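Cutting fixed-length text out of running material can be sketched as a sliding window over the whitespace-stripped text. The function name `cut_corpus`, the optional `stride` parameter, and the sample sentence are assumptions for illustration; the source only states that characters of a specific length are intercepted.

```python
from typing import List


def cut_corpus(text: str, length: int, stride: int = 0) -> List[str]:
    """Cut fixed-length character snippets out of running text.

    With stride == 0 the snippets do not overlap; a smaller stride
    produces overlapping snippets.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    step = stride or length
    cleaned = "".join(text.split())  # drop whitespace and line breaks
    return [cleaned[i:i + length]
            for i in range(0, len(cleaned) - length + 1, step)]


news_text = "今天天气很好 我们去公园散步"  # hypothetical text material
snippets = cut_corpus(news_text, length=4)
```

Trailing characters that do not fill a whole snippet are dropped here so that every corpus entry has the same length.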
In an optional embodiment of the present invention, after training the recognition network by using the first synthesized scene image sample synthesized based on the randomly generated first corpus sample, or after training the recognition network by using the second synthesized scene image sample synthesized based on the corpus with the real semantic information, the following steps may be further performed:
and acquiring real scene image data, and further performing parameter adjustment on the recognition network by using the real scene image data.
Further, the real scene image data may be obtained by:
and marking the real scene image, and cutting out a Chinese character area in the real scene image.
The parameters of the recognition network are finely adjusted by adopting the data set containing the real scene image of Chinese, so that the generalization capability of the recognition network is improved, and the Chinese character recognition effect is further improved.
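Preparing the real scene data amounts to cropping each annotated text box out of the image and pairing it with its transcription. The sketch below assumes a hypothetical annotation format (a list of dictionaries with `box` as `(x, y, width, height)` and `text`) and represents images as nested lists; neither the format nor the function name `crop_text_regions` comes from the source.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale rows, values 0-255


def crop_text_regions(image: Image,
                      annotations: List[Dict]) -> List[Tuple[Image, str]]:
    """Return (crop, transcription) pairs for every annotated text box."""
    height, width = len(image), len(image[0])
    pairs = []
    for ann in annotations:
        x, y, w, h = ann["box"]
        if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > width or y + h > height:
            continue  # skip boxes that fall outside the image
        crop = [row[x:x + w] for row in image[y:y + h]]
        pairs.append((crop, ann["text"]))
    return pairs


# Toy 4x6 image whose pixel value encodes its (row, column) position.
toy_image = [[r * 10 + c for c in range(6)] for r in range(4)]
toy_annotations = [
    {"box": (1, 1, 3, 2), "text": "出口"},
    {"box": (5, 3, 4, 4), "text": "越界"},  # out of bounds, skipped
]
real_pairs = crop_text_regions(toy_image, toy_annotations)
```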
Various implementations of each step of the embodiment shown in FIG. 1 have been introduced above. The implementation process of the training method of the present invention for a recognition network for recognizing Chinese in a scene is described in detail below through a specific embodiment.
FIG. 2 is a flow chart illustrating a training method for recognition networks for recognizing Chinese in a scene according to an embodiment of the present invention. In this embodiment, the recognition network is a deep learning network, and a CRNN architecture is adopted in combination with the CTC architecture. Referring to fig. 2, the method may include at least the following steps S202 to S216.
Step S202, common Chinese characters are obtained from a codebook for Chinese character input, and a first corpus sample is randomly generated by the common Chinese characters, wherein the occurrence frequency of all Chinese characters in the first corpus sample is controlled to be equal.
Step S204, synthesizing the first corpus sample and the first background image to obtain a first synthesized scene image sample containing the Chinese character area.
Step S206, a first stage of training is carried out on the recognition network for recognizing Chinese in the natural scene by using the first synthetic scene image sample.
Step S208, intercepting characters with specific length from the text material containing natural semantics as the corpus with real semantic information.
The text material is, for example, a news material, a book, or the like.
Step S210, synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing the chinese text region, wherein the second background image is the same as the first background image.
Step S212, a second stage of training is performed on the recognition network by using a second synthetic scene image sample.
Step S214, labeling the real scene image, and cutting out a Chinese character area in the real scene image to obtain a real scene image data set.
Step S216, the real scene image data set is used to fine-tune the parameters of the recognition network.
In the embodiment, the problem of long-tailed words in Chinese scene character recognition is effectively solved through a multi-stage training strategy, and the recognition effect of Chinese characters in natural scenes is improved.
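The ordering of the three stages (random-corpus synthetic samples, real-semantics synthetic samples, then real scene data) can be sketched as a small driver that runs the stages in sequence on the same network. The function `run_training_stages`, the `train_step` callback, and the per-stage learning rates are hypothetical; in particular, the source does not specify learning rates, and the smaller value for the fine-tuning stage is only a common practice assumed here.

```python
from typing import Callable, Iterable, List, Tuple

Sample = Tuple[object, str]  # (image, transcription)


def run_training_stages(
        train_step: Callable[[Sample, float], None],
        stages: List[Tuple[str, Iterable[Sample], float]],
) -> List[Tuple[str, int]]:
    """Run the stages in order; each stage has its own data and learning rate."""
    log = []
    for name, samples, learning_rate in stages:
        seen = 0
        for sample in samples:
            train_step(sample, learning_rate)  # one update of the shared network
            seen += 1
        log.append((name, seen))
    return log
```

A caller would pass the same network's update function for all stages, for example `[("stage 1: random corpus", synth_random, 1e-3), ("stage 2: real semantics", synth_real, 1e-3), ("stage 3: real scenes", real_data, 1e-4)]`, so that each stage continues from the parameters left by the previous one.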
Based on the same inventive concept, an embodiment of the invention also provides a training apparatus for a recognition network for recognizing Chinese in a scene, which supports the training method provided by any one of the above embodiments or a combination thereof. FIG. 3 is a schematic diagram of a training apparatus 300 for a recognition network for recognizing Chinese in a scene according to an embodiment of the present invention. Referring to FIG. 3, the apparatus 300 may include at least: a random corpus generation module 310, an image sample synthesis module 320, and a recognition network training module 330.
The functions of the components of the training apparatus 300 and the connections between them are described below:
The random corpus generating module 310 is adapted to randomly generate a first corpus sample using the common Chinese characters.
The image sample synthesizing module 320 is connected to the random corpus generating module 310, and is adapted to synthesize the first corpus sample and the first background image to obtain a first synthesized scene image sample containing a chinese text region.
And the recognition network training module 330 is connected with the image sample synthesis module 320 and is adapted to train a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.
In an alternative embodiment of the present invention, the frequency of occurrence of each Chinese character in the obtained first corpus sample is controllable.
Further, in the obtained first corpus sample, the frequency of occurrence of all chinese characters is controlled to be equal.
In an optional embodiment of the present invention, the random corpus generating module 310 is further adapted to:
the common Chinese characters are obtained from a codebook for Chinese character input before randomly generating the first corpus samples with the common Chinese characters.
In an alternative embodiment of the present invention, as shown in FIG. 4, the training apparatus 300 illustrated in FIG. 3 may further include a real corpus acquiring module 340. The real corpus acquiring module 340 may be connected to the image sample synthesizing module 320 and is adapted to acquire a corpus with real semantic information. Accordingly, the image sample synthesis module 320 is further adapted to synthesize the corpus with the real semantic information and the second background image to obtain a second synthesized scene image sample containing the Chinese character area. The recognition network training module 330 is further adapted to train the recognition network by using the second synthesized scene image sample.
In an alternative embodiment of the invention, the first background image is the same as the second background image.
In an optional embodiment of the present invention, the real corpus acquiring module 340 is further adapted to:
and intercepting characters with specific length from a text material containing natural semantics as a corpus with real semantic information.
In an alternative embodiment of the present invention, still referring to FIG. 4, the training apparatus 300 for a recognition network for recognizing Chinese in a scene may further include a real scene data obtaining module 350 and a recognition network adjusting module 360. The real scene data acquisition module 350 is adapted to acquire real scene image data. The recognition network adjusting module 360 may be connected to the real scene data acquiring module 350 and the recognition network training module 330, respectively, and is adapted to perform parameter adjustment on the recognition network by using the real scene image data.
In an optional embodiment of the invention, the real scene data acquisition module 350 is further adapted to:
and marking the real scene image, and cutting out a Chinese character area in the real scene image.
In an alternative embodiment of the invention, the recognition network is used to recognize Chinese within natural scenes.
Based on the same inventive concept, an embodiment of the invention also provides a computer storage medium. The computer storage medium stores computer program code that, when run on a computing device, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one or a combination of the above embodiments.
Based on the same inventive concept, the embodiment of the invention also provides the computing equipment. The computing device may include:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform the training method for a recognition network for recognizing Chinese in a scene according to any one or a combination of the above embodiments.
According to any one or a combination of multiple optional embodiments, the embodiment of the present invention can achieve the following advantages:
According to the training method and device for a recognition network for recognizing Chinese in a scene disclosed by the embodiments of the invention, corpus samples are randomly generated from common Chinese characters, the corpus samples are synthesized with a background image to obtain synthesized scene image samples containing Chinese text regions, and the synthesized scene image samples are then used to train the recognition network. In natural corpora, only a small portion of the common Chinese characters appear frequently, while the other characters appear rarely or not at all (the so-called long-tail distribution); if natural corpus material alone is used to train the recognition network, characters that occur infrequently in the corpus are not recognized well. In randomly generated corpus samples, by contrast, the occurrence probability of the common Chinese characters tends to be uniform, so when the recognition network is trained on scene image samples synthesized from them, the frequency with which the network sees each common Chinese character also tends to be consistent. This alleviates the long-tail distribution problem of Chinese characters to a certain extent and improves the recognition of Chinese characters in scenes.
Furthermore, controlling the occurrence frequency of each Chinese character in the randomly generated corpus samples, and in particular making the occurrence frequencies of all Chinese characters equal, addresses the long-tail distribution problem more effectively.
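One possible way to make the occurrence frequencies exactly equal is to shuffle a pool that contains every character the same number of times and cut the pool into samples. The Python sketch below assumes this shuffle-and-cut scheme; the source does not specify how the frequency is controlled, and the function name and parameters are hypothetical.

```python
import random
from collections import Counter

def balanced_corpus_samples(chars, repeats, sample_len, rng):
    """Build samples in which every character occurs exactly `repeats` times."""
    pool = list(chars) * repeats
    if len(pool) % sample_len != 0:
        raise ValueError("len(chars) * repeats must be divisible by sample_len")
    rng.shuffle(pool)  # random order, but the per-character counts are fixed
    return ["".join(pool[i:i + sample_len])
            for i in range(0, len(pool), sample_len)]

rng = random.Random(0)
samples = balanced_corpus_samples("天地人和山水", repeats=10, sample_len=5, rng=rng)
counts = Counter("".join(samples))
print(counts)
```

Compared with independent uniform draws, this scheme trades some randomness for exact balance, which may matter when the character table is large and the number of samples is small.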
Furthermore, after a first training stage that uses scene image samples synthesized from randomly generated corpus samples, the recognition network can be trained in a second stage on scene image samples synthesized from a corpus with real semantic information, and finally fine-tuned with real scene image data. This multi-stage training strategy further improves the generalization ability of the recognition network and its recognition of Chinese in scenes.
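The ordering of the three stages can be sketched as a small Python driver. This is only an illustration of the schedule: the `Stage` record, the `train_step` callback, and the learning-rate values are assumptions, and the source gives no hyperparameters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Stage:
    name: str
    batches: Iterable       # data source for this stage
    learning_rate: float    # assumed value, not taken from the source

def run_stages(stages: List[Stage], train_step: Callable) -> List[str]:
    """Run the stages in order, applying `train_step` to the same network."""
    completed = []
    for stage in stages:
        for batch in stage.batches:
            train_step(batch, stage.learning_rate)
        completed.append(stage.name)
    return completed

# Toy usage: the "network" is replaced by a list that records each update.
updates = []
stages = [
    Stage("stage 1: random-corpus synthetic samples", ["r1", "r2"], 1e-3),
    Stage("stage 2: semantic-corpus synthetic samples", ["s1"], 1e-3),
    Stage("fine-tuning: real scene image data", ["t1"], 1e-4),
]
order = run_stages(stages, lambda batch, lr: updates.append((batch, lr)))
print(order)
```

In a real setting `train_step` would run a forward and backward pass of the recognition network; here it only records the order in which the data would be consumed.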
It is clear to those skilled in the art that the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, further description is omitted here.
In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all the functional units may be integrated in one processing unit. The integrated functional units may be implemented in the form of hardware, or in the form of software or firmware.
Those of ordinary skill in the art will understand that the integrated functional units, if implemented in software and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions which, when executed, cause a computing device (e.g., a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Alternatively, all or part of the steps of the foregoing method embodiments may be implemented by hardware (such as a computing device, e.g., a personal computer, a server, or a network device) controlled by program instructions. The program instructions may be stored in a computer-readable storage medium, and when they are executed by a processor of the computing device, the computing device performs all or part of the steps of the method according to the embodiments of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present invention; such modifications or substitutions do not depart from the scope of the present invention.
A1. According to an aspect of the embodiments of the present invention, a method for training a recognition network for recognizing Chinese in a scene is provided, including:
randomly generating a first corpus sample by using common Chinese characters;
synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area;
and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.
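The synthesis step can be illustrated with a minimal Python sketch that pastes a text mask onto a background and records where the character area ends up. Rendering the corpus sample into glyphs is out of scope here: `text_mask` is a hypothetical stand-in for the rendered text, images are plain nested lists of grayscale values, and the random placement is an assumption.

```python
import random

def composite(background, text_mask, rng, ink=0):
    """Paste a binary text mask onto a grayscale background at a random offset.

    Returns the new image and the (top, left, bottom, right) box of the
    character area, which can serve as the label region of the sample.
    """
    bg_h, bg_w = len(background), len(background[0])
    m_h, m_w = len(text_mask), len(text_mask[0])
    if m_h > bg_h or m_w > bg_w:
        raise ValueError("text mask is larger than the background")
    top = rng.randint(0, bg_h - m_h)
    left = rng.randint(0, bg_w - m_w)
    image = [row[:] for row in background]  # copy; the background is not modified
    for y in range(m_h):
        for x in range(m_w):
            if text_mask[y][x]:
                image[top + y][left + x] = ink
    return image, (top, left, top + m_h, left + m_w)

rng = random.Random(1)
background = [[200] * 10 for _ in range(6)]  # flat gray 6x10 background
text_mask = [[1, 0, 1],
             [1, 1, 1]]                      # stand-in for rendered characters
image, box = composite(background, text_mask, rng)
print(box)
```

A practical pipeline would render the corpus sample with a Chinese font and might vary color, blur, or perspective; none of those details are given in this section.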
A2. The method according to A1, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.
A3. The method according to A2, wherein the frequencies of occurrence of all Chinese characters in the first corpus sample are controlled to be equal.
A4. The method according to any one of A1-A3, further comprising, before randomly generating the first corpus sample using common Chinese characters:
acquiring the common Chinese characters from a codebook for Chinese character input.
A5. The method of any one of A1-A4, further comprising:
obtaining a corpus with real semantic information;
synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;
training the recognition network using the second synthetic scene image sample.
A6. The method of A5, wherein the first background image is the same as the second background image.
A7. The method according to A5 or A6, wherein obtaining a corpus with real semantic information comprises:
extracting a character string of a specific length from text material containing natural semantics as the corpus with real semantic information.
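Cutting fixed-length strings from natural text can be sketched in a few lines of Python. The non-overlapping windows and the optional `allowed` filter (which skips strings containing characters outside a given table) are assumptions for illustration; the source only states that characters of a specific length are taken from the text material.

```python
def cut_snippets(text, length, allowed=None):
    """Cut consecutive, non-overlapping snippets of exactly `length` characters.

    If `allowed` is given, snippets containing any other character are skipped.
    A trailing remainder shorter than `length` is dropped.
    """
    snippets = []
    for start in range(0, len(text) - length + 1, length):
        piece = text[start:start + length]
        if allowed is None or all(ch in allowed for ch in piece):
            snippets.append(piece)
    return snippets

text = "今天天气很好我们去公园散步"
all_snippets = cut_snippets(text, 4)
filtered = cut_snippets(text, 4, allowed=set("今天气很好我们"))
print(all_snippets, filtered)
```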
A8. The method of any one of A1-A7, further comprising:
acquiring real scene image data;
and adjusting parameters of the recognition network by using the real scene image data.
A9. The method of A8, wherein acquiring real scene image data comprises:
annotating the real scene image and cropping out the Chinese character area in the real scene image.
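Cropping annotated character areas out of a real scene image can be sketched as follows in Python. The box format `(top, left, bottom, right)` with exclusive bottom and right edges, and the nested-list image representation, are assumptions for illustration; the source does not describe the annotation format.

```python
def crop_text_regions(image, boxes):
    """Cut each annotated (top, left, bottom, right) box out of a 2-D image."""
    height, width = len(image), len(image[0])
    crops = []
    for top, left, bottom, right in boxes:
        if not (0 <= top < bottom <= height and 0 <= left < right <= width):
            raise ValueError(f"box {(top, left, bottom, right)} is outside the image")
        crops.append([row[left:right] for row in image[top:bottom]])
    return crops

# Toy 4x5 image whose pixel value encodes its position (row * 10 + column).
image = [[y * 10 + x for x in range(5)] for y in range(4)]
crops = crop_text_regions(image, [(1, 2, 3, 5)])
print(crops[0])
```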
A10. The method of any one of A1-A9, wherein the recognition network is used to recognize Chinese within a natural scene.
B11. According to another aspect of the embodiments of the present invention, there is also provided a training apparatus for a recognition network for recognizing Chinese in a scene, including:
a random corpus generation module adapted to randomly generate a first corpus sample using common Chinese characters;
an image sample synthesis module adapted to synthesize the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area; and
a recognition network training module adapted to train a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample.
B12. The apparatus of B11, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.
B13. The apparatus of B12, wherein the frequencies of occurrence of all Chinese characters in the first corpus sample are controlled to be equal.
B14. The apparatus of any one of B11-B13, wherein the random corpus generation module is further adapted to:
obtain the common Chinese characters from a codebook for Chinese character input before randomly generating the first corpus sample using the common Chinese characters.
B15. The apparatus of any one of B11-B14, further comprising:
a real corpus acquisition module adapted to acquire a corpus with real semantic information;
wherein the image sample synthesis module is further adapted to:
synthesize the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area; and
the recognition network training module is further adapted to:
train the recognition network using the second synthesized scene image sample.
B16. The apparatus of B15, wherein the first background image is the same as the second background image.
B17. The apparatus of B15 or B16, wherein the real corpus acquisition module is further adapted to:
extract a character string of a specific length from text material containing natural semantics as the corpus with real semantic information.
B18. The apparatus of any one of B11-B17, further comprising:
a real scene data acquisition module adapted to acquire real scene image data; and
a recognition network adjustment module adapted to adjust parameters of the recognition network using the real scene image data.
B19. The apparatus of B18, wherein the real scene data acquisition module is further adapted to:
annotate the real scene image and crop out the Chinese character area in the real scene image.
B20. The apparatus of any one of B11-B19, wherein the recognition network is used to recognize Chinese within a natural scene.
There is also provided, in accordance with yet another aspect of an embodiment of the present invention, a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the method of training a recognition network for recognizing Chinese in a scene as recited in any one of A1-A10.
There is also provided, in accordance with yet another aspect of an embodiment of the present invention, a computing device, including:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform the method of training a recognition network for recognizing Chinese in a scene according to any one of A1-A10.

Claims (10)

1. A training method for a recognition network for recognizing Chinese in a scene, comprising:
randomly generating a first corpus sample by using common Chinese characters;
synthesizing the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area;
and training a recognition network for recognizing Chinese in the scene by using the first synthesized scene image sample.
2. The method of claim 1, wherein the frequency of occurrence of each Chinese character in the first corpus sample is controllable.
3. The method according to claim 2, wherein the frequencies of occurrence of all Chinese characters in the first corpus sample are controlled to be equal.
4. The method according to any one of claims 1-3, further comprising, before randomly generating the first corpus sample using common Chinese characters:
acquiring the common Chinese characters from a codebook for Chinese character input.
5. The method according to any one of claims 1-4, further comprising:
obtaining a corpus with real semantic information;
synthesizing the corpus with the real semantic information and a second background image to obtain a second synthesized scene image sample containing a Chinese character area;
training the recognition network using the second synthetic scene image sample.
6. The method of claim 5, wherein the first background image is the same as the second background image.
7. The method according to claim 5 or 6, wherein obtaining a corpus with real semantic information comprises:
extracting a character string of a specific length from text material containing natural semantics as the corpus with real semantic information.
8. A training apparatus for a recognition network for recognizing Chinese in a scene, comprising:
a random corpus generation module adapted to randomly generate a first corpus sample using common Chinese characters;
an image sample synthesis module adapted to synthesize the first corpus sample and a first background image to obtain a first synthesized scene image sample containing a Chinese character area; and
a recognition network training module adapted to train a recognition network for recognizing Chinese in a scene using the first synthesized scene image sample.
9. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the method of training a recognition network for recognizing Chinese in a scene according to any one of claims 1-7.
10. A computing device, comprising:
a processor; and
a memory storing computer program code;
the computer program code, when executed by the processor, causes the computing device to perform the method of training a recognition network for recognizing Chinese in a scene according to any one of claims 1-7.
CN201910146791.1A 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene Pending CN111626287A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910146791.1A CN111626287A (en) 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910146791.1A CN111626287A (en) 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene

Publications (1)

Publication Number Publication Date
CN111626287A true CN111626287A (en) 2020-09-04

Family

ID=72271718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910146791.1A Pending CN111626287A (en) 2019-02-27 2019-02-27 Training method and device for recognition network for recognizing Chinese in scene

Country Status (1)

Country Link
CN (1) CN111626287A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112508108A (en) * 2020-12-10 2021-03-16 西北工业大学 Zero-sample Chinese character recognition method based on etymons
CN112508108B (en) * 2020-12-10 2024-01-26 西北工业大学 Zero-sample Chinese character recognition method based on character roots
CN114612912A (en) * 2022-03-09 2022-06-10 中译语通科技股份有限公司 Image character recognition method, system and equipment based on intelligent corpus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination