CN113673525A - Method, system and device for generating Chinese text based on image - Google Patents

Method, system and device for generating Chinese text based on image

Publication number
CN113673525A
CN113673525A CN202110823454.9A CN202110823454A CN113673525A CN 113673525 A CN113673525 A CN 113673525A CN 202110823454 A CN202110823454 A CN 202110823454A CN 113673525 A CN113673525 A CN 113673525A
Authority
CN
China
Prior art keywords
Chinese text
text
image
generating
adversarial network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110823454.9A
Other languages
Chinese (zh)
Inventor
陈志华
黄经赢
刘斌
魏文国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202110823454.9A
Publication of CN113673525A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention relates to the fields of computer vision and artificial intelligence, and discloses a method, a system and a device for generating a Chinese text based on an image. The method reduces the number of features that the generative adversarial network model must learn at one time, lowers the difficulty of training the model, and is applicable to more complex images.

Description

Method, system and device for generating Chinese text based on image
Technical Field
The invention relates to the field of computer vision and the technical field of artificial intelligence, in particular to a method, a system and a device for generating a Chinese text based on an image.
Background
Extracting text from natural images has very wide application. In the related art, Chinese text is generated from images using a generative adversarial network (GAN) model. A generative adversarial network is a generative model comprising a generator network and a discriminator network, which compete with each other until equilibrium is reached.
At present, when a generative adversarial network model is used to generate a Chinese text describing an image, the whole image is input directly into the network. For a more complex image, the model must learn more features at one time, which makes the network difficult to train.
Disclosure of Invention
An object of the embodiments of the invention is to provide a method, a system and a device for generating a Chinese text based on an image, in which the background region and the foreground region are separated and a corresponding Chinese text is generated for each. This reduces the number of features that the generative adversarial network model must learn at one time, lowers the difficulty of training the model, and makes the approach applicable to more complex images.
To achieve this object, the invention adopts the following technical solutions:
A first aspect of the present application provides a method for generating a Chinese text from an image, the method comprising:
segmenting an image for which a Chinese text is to be generated into a foreground region and a background region;
generating a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network;
and generating a Chinese text describing the image according to the first Chinese text and the second Chinese text.
According to one possible implementation of the first aspect of the present application, the first Chinese text is generated based on a trained first generative adversarial network model, and the second Chinese text is generated based on a trained second generative adversarial network model.
According to one possible implementation of the first aspect of the present application, an attention mechanism is added to the first generative adversarial network model when the first generative adversarial network model is trained, and/or,
an attention mechanism is added to the second generative adversarial network model when the second generative adversarial network model is trained.
According to one possible implementation of the first aspect of the present application, generating the Chinese text describing the image from the first Chinese text and the second Chinese text comprises:
splicing the first Chinese text and the second Chinese text into a third text;
and adjusting the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and taking the adjusted text as the Chinese text describing the image.
A second aspect of the present application provides a system for generating a Chinese text from an image, the system comprising:
an image segmentation module, configured to segment an image for which a Chinese text is to be generated into a foreground region and a background region;
a first generation module, configured to generate a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network;
and a second generation module, configured to generate a Chinese text describing the image according to the first Chinese text and the second Chinese text.
According to one possible implementation of the second aspect of the present application, the first generation module comprises:
a first Chinese text generation unit, configured to generate the first Chinese text based on a trained first generative adversarial network model;
and a second Chinese text generation unit, configured to generate the second Chinese text based on a trained second generative adversarial network model.
According to one possible implementation of the second aspect of the present application, the system further comprises a training module, the training module comprising:
a first training unit, configured to add an attention mechanism to the first generative adversarial network model when the first generative adversarial network model is trained, and/or,
a second training unit, configured to add an attention mechanism to the second generative adversarial network model when the second generative adversarial network model is trained.
According to one possible implementation of the second aspect of the present application, the second generation module comprises:
a splicing unit, configured to splice the first Chinese text and the second Chinese text into a third text;
and an adjusting unit, configured to adjust the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and to take the adjusted text as the Chinese text describing the image.
A third aspect of the present application provides a device for generating a Chinese text based on an image, the device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method for generating a Chinese text based on an image as described in any one of the above embodiments when executing the computer program.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the method for generating a Chinese text from an image as described in any one of the above embodiments.
The embodiments disclosed in the present application have at least the following advantages:
The method is simple and easy to implement. By separating the background region and the foreground region and generating a corresponding Chinese text for each, it reduces the number of features that the generative adversarial network model must learn at one time, lowers the difficulty of training the model, and is applicable to more complex images.
Drawings
FIG. 1 is a schematic flow chart of a preferred embodiment of the method for generating a Chinese text from an image according to the present invention;
FIG. 2 is a schematic structural diagram of a preferred embodiment of the system for generating a Chinese text from an image according to the present invention.
Reference numerals:
the image segmentation module 1, the first generation module 2 and the second generation module 3.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a preferred embodiment of the method for generating a Chinese text from an image according to the present invention.
As shown in Fig. 1, the method includes:
S1: segmenting an image for which a Chinese text is to be generated into a foreground region and a background region.
Specifically, the foreground region and the background region in the image can be distinguished according to the color information and the brightness information of the image pixels, and the image can be segmented accordingly.
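The patent does not specify a segmentation algorithm beyond the use of pixel color and brightness. The following is a minimal Python sketch under stated assumptions: the background color is estimated as the mean of the border pixels, and a pixel counts as foreground when its color or brightness differs from that estimate by more than a tolerance. The function names, the border heuristic and the tolerance values are all hypothetical and are not taken from the patent.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each component in 0..255
Image = List[List[Pixel]]     # rows of pixels


def luminance(p) -> float:
    """Approximate brightness of a pixel using the Rec. 601 luma weights."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def border_mean(image: Image) -> Tuple[float, float, float]:
    """Assumption: the image border is mostly background, so its mean color
    serves as the background reference."""
    h, w = len(image), len(image[0])
    border = [image[y][x] for y in range(h) for x in range(w)
              if y in (0, h - 1) or x in (0, w - 1)]
    n = len(border)
    return tuple(sum(p[c] for p in border) / n for c in range(3))


def foreground_mask(image: Image, color_tol: float = 60.0,
                    luma_tol: float = 40.0) -> List[List[bool]]:
    """True marks a foreground pixel: one whose color OR brightness is far
    from the background reference. The tolerances are illustrative values."""
    ref = border_mean(image)
    ref_luma = luminance(ref)
    return [[math.dist(p, ref) > color_tol
             or abs(luminance(p) - ref_luma) > luma_tol
             for p in row]
            for row in image]


def split_regions(image: Image, mask: List[List[bool]],
                  fill: Pixel = (0, 0, 0)) -> Tuple[Image, Image]:
    """Return (foreground image, background image); pixels that do not
    belong to a region are replaced by the fill color."""
    fg = [[p if m else fill for p, m in zip(row, mrow)]
          for row, mrow in zip(image, mask)]
    bg = [[fill if m else p for p, m in zip(row, mrow)]
          for row, mrow in zip(image, mask)]
    return fg, bg
```

The two images returned by `split_regions` correspond to the inputs that step S2 would describe separately. A real system would more likely use a learned or graph-based segmentation method; the thresholding here only illustrates the idea of separating regions by color and brightness.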
S2: generating a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network.
In one embodiment, step S2 includes:
generating both the first Chinese text and the second Chinese text based on a trained first generative adversarial network model.
In another embodiment, step S2 includes:
generating the first Chinese text based on a trained first generative adversarial network model, and generating the second Chinese text based on a trained second generative adversarial network model.
The generative adversarial network model is trained adversarially on training images with known reference Chinese texts to obtain the trained generative adversarial network model. The generative adversarial network model comprises a generative model and a discriminative model: the generative model generates a corresponding Chinese text from an image, and the discriminative model judges whether a generated Chinese text is real data.
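The patent does not state the loss function used for this adversarial training. For orientation only, the standard generative adversarial objective, written here in a conditional form as an assumption, is shown below. The symbols are illustrative and not taken from the patent: x is an image region, y is its reference Chinese text, G is the generative model and D is the discriminative model.

```latex
\min_{G}\;\max_{D}\;
\mathbb{E}_{(x,\,y)\sim p_{\text{data}}}\bigl[\log D(x, y)\bigr]
+ \mathbb{E}_{x\sim p_{\text{data}}}\bigl[\log\bigl(1 - D(x, G(x))\bigr)\bigr]
```

The discriminative model is trained to raise this value by scoring reference texts high and generated texts low, while the generative model is trained to lower it by producing texts that the discriminative model judges to be real.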
Specifically, the generative model may adopt an image encoder-text decoder architecture, in which the image encoder comprises a Faster R-CNN neural network and the text decoder comprises a two-layer LSTM network.
Based on this other embodiment, step S2 further includes:
adding an attention mechanism to the first generative adversarial network model when training the first generative adversarial network model, and/or,
adding an attention mechanism to the second generative adversarial network model when training the second generative adversarial network model.
In the present embodiment, the attention mechanism involves, for example, two aspects: deciding which part of the input needs attention, and allocating limited information-processing resources to the important part. Introducing an attention mechanism on the second image feature matrix can highlight the more critical image portions of that matrix.
According to the embodiments of the application, introducing the attention mechanism lets the network pay more attention to the information in important regions when generating the Chinese text, which can reduce redundancy in the network and increase the speed of generating the Chinese text.
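The patent does not say which form of attention is used. As a minimal sketch, assuming plain dot-product attention over a small matrix of region feature vectors, the two aspects described above (scoring which parts matter, then weighting resources toward them) can be illustrated in Python as follows. The names `softmax` and `attend` and the use of a query vector are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float],
           regions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Dot-product attention over region feature vectors.

    Step 1 (which parts to attend to): score each region against the query.
    Step 2 (allocating resources): build a weighted sum of the region
    features, so high-scoring regions dominate the resulting context vector.
    """
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    dim = len(regions[0])
    context = [sum(w * region[i] for w, region in zip(weights, regions))
               for i in range(dim)]
    return weights, context
```

In an encoder-decoder captioning model, the query would typically be the decoder state at the current step and the rows would be the encoder's region features; that wiring is a common design and is not confirmed by the patent text.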
S3: generating a Chinese text describing the image from the first Chinese text and the second Chinese text.
Step S3 includes:
splicing the first Chinese text and the second Chinese text into a third text;
and adjusting the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and taking the adjusted text as the Chinese text describing the image.
Generating the Chinese text describing the image from the first Chinese text and the second Chinese text in this way is simple and convenient to implement. Adjusting the text according to the preset expression sentence patterns makes the generated text conform better to Chinese expression habits.
The first Chinese text and the second Chinese text can be spliced according to a preset splicing mechanism. For example, the splicing mechanism places the first Chinese text before the second Chinese text.
The preset expression sentence patterns include a subject-predicate pattern, a subject-predicate-object pattern, a subject-adverbial-predicate-object pattern and an adverbial-subject-predicate-object pattern.
Specifically, the components of a sentence may include the subject, predicate, object, attributive and complement. In this step, a preset expression sentence pattern can be constructed by combining any number of these components in any order, as required for describing the image.
In a specific application scenario, for example, the first Chinese text is "ship sailing" and the second Chinese text is "sea". Splicing the first Chinese text and the second Chinese text gives "ship sailing sea". This spliced text then needs to be adjusted: an appropriate expression sentence pattern is selected, and the text is adjusted to a well-formed sentence such as "the ship is sailing on the sea".
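The patent gives no algorithm for the adjustment step. The following is a minimal Python sketch under stated assumptions: each text is already split into words tagged with a grammatical role, splicing places the first text before the second, and adjustment reorders the words to fit the first preset pattern that covers every role present. The role tagging, the pattern-selection rule and the Chinese example words are hypothetical illustrations and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (grammatical role, word)

# The four preset patterns named in the description, as role sequences.
PRESET_PATTERNS: List[Tuple[str, ...]] = [
    ("subject", "predicate"),
    ("subject", "predicate", "object"),
    ("subject", "adverbial", "predicate", "object"),
    ("adverbial", "subject", "predicate", "object"),
]


def splice(first: Sequence[Token], second: Sequence[Token]) -> List[Token]:
    """Splicing mechanism from the description: first text before second."""
    return list(first) + list(second)


def adjust(tokens: Sequence[Token],
           patterns: Sequence[Tuple[str, ...]] = PRESET_PATTERNS) -> str:
    """Reorder tokens to fit the first pattern that covers all their roles.

    Assumption: slots of the pattern that have no matching token are
    skipped. If no pattern fits, the spliced order is returned unchanged.
    """
    roles = {role for role, _ in tokens}
    for pattern in patterns:
        if roles <= set(pattern):
            ordered = [word for slot in pattern
                       for role, word in tokens if role == slot]
            return "".join(ordered)
    return "".join(word for _, word in tokens)


# Hypothetical example words: "船" (ship), "航行" (sails), "在大海上" (on the sea).
first_text = [("subject", "船"), ("predicate", "航行")]
second_text = [("adverbial", "在大海上")]
third_text = splice(first_text, second_text)
adjusted = adjust(third_text)
```

Here the spliced order is subject, predicate, adverbial, and the subject-adverbial-predicate-object pattern moves the adverbial in front of the predicate. A practical system would also need a way to assign roles to the generated words, which this sketch takes as given.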
The embodiment of the application provides a method for generating a Chinese text from an image. The method divides the image for which a Chinese text is to be generated into a background region and a foreground region, uses a generative adversarial network to generate a first Chinese text corresponding to the foreground region and a second Chinese text corresponding to the background region, and then generates the Chinese text describing the image based on the first Chinese text and the second Chinese text. The method is simple and easy to implement. By separating the background region and the foreground region and generating a corresponding Chinese text for each, it reduces the number of features that the generative adversarial network model must learn at one time, lowers the difficulty of training the model, and is applicable to more complex images.
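The overall flow of steps S1 to S3 can be sketched as a small Python pipeline. This is only an illustration of how the steps compose: the class name `CaptionPipeline` and its four callable fields are hypothetical, and the callables stand in for the segmentation step, the trained generative models and the splice-and-adjust step, none of which are implemented here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class CaptionPipeline:
    """Composes S1 (segment), S2 (describe each region) and S3 (combine)."""
    segment: Callable[[Any], Tuple[Any, Any]]  # image -> (foreground, background)
    describe_foreground: Callable[[Any], str]  # stands in for the first trained model
    describe_background: Callable[[Any], str]  # stands in for the second trained model
    combine: Callable[[str, str], str]         # splice, then adjust to a sentence pattern

    def run(self, image: Any) -> str:
        foreground, background = self.segment(image)         # S1
        first_text = self.describe_foreground(foreground)    # S2, foreground
        second_text = self.describe_background(background)   # S2, background
        return self.combine(first_text, second_text)         # S3
```

For the embodiment in which a single trained model generates both texts, the same callable could be passed for both `describe_foreground` and `describe_background`.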
The embodiment of the second aspect of the application provides a system for generating Chinese text based on images.
Fig. 2 is a schematic structural diagram of a preferred embodiment of the system for generating a Chinese text from an image according to the present invention. The system can implement the entire flow of the method for generating a Chinese text from an image according to any of the above embodiments.
As shown in fig. 2, the system includes:
an image segmentation module 1, configured to segment an image for which a Chinese text is to be generated into a foreground region and a background region;
a first generation module 2, configured to generate a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network;
and a second generation module 3, configured to generate a Chinese text describing the image according to the first Chinese text and the second Chinese text.
According to one possible implementation of the second aspect of the present application, the first generation module comprises:
a first Chinese text generation unit, configured to generate the first Chinese text based on a trained first generative adversarial network model;
and a second Chinese text generation unit, configured to generate the second Chinese text based on a trained second generative adversarial network model.
According to one possible implementation of the second aspect of the present application, the system further comprises a training module, the training module comprising:
a first training unit, configured to add an attention mechanism to the first generative adversarial network model when the first generative adversarial network model is trained, and/or,
a second training unit, configured to add an attention mechanism to the second generative adversarial network model when the second generative adversarial network model is trained.
According to one possible implementation of the second aspect of the present application, the second generation module comprises:
a splicing unit, configured to splice the first Chinese text and the second Chinese text into a third text;
and an adjusting unit, configured to adjust the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and to take the adjusted text as the Chinese text describing the image.
The functions and implementations of the modules in the system embodiment are the same as those in the method embodiment above. For a detailed analysis, refer to the method embodiment; the description is not repeated here.
The present application further provides a device for generating a Chinese text based on an image, the device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method for generating a Chinese text based on an image as described in any one of the above embodiments when executing the computer program.
The present application further provides a computer-readable storage medium in which a computer program is stored; when the computer program is executed, the method for generating a Chinese text from an image according to any one of the embodiments described above is implemented.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for generating a Chinese text from an image, and connects the various parts of the entire device through various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the device for generating a Chinese text from an image by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other volatile solid state storage device.
The modules/units integrated in the device for generating a Chinese text from an image, if implemented in the form of software functional units and sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
The foregoing is a preferred embodiment of the present application. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present application, and such modifications and refinements are also regarded as falling within the protection scope of the present application.

Claims (10)

1. A method for generating a Chinese text from an image, the method comprising:
segmenting an image for which a Chinese text is to be generated into a foreground region and a background region;
generating a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network;
and generating a Chinese text describing the image according to the first Chinese text and the second Chinese text.
2. The method of claim 1, wherein the method comprises:
generating the first Chinese text based on a trained first generative adversarial network model, and generating the second Chinese text based on a trained second generative adversarial network model.
3. The method of claim 2, wherein the method comprises:
adding an attention mechanism to the first generative adversarial network model when training the first generative adversarial network model, and/or,
adding an attention mechanism to the second generative adversarial network model when training the second generative adversarial network model.
4. The method of claim 1, wherein the method comprises:
splicing the first Chinese text and the second Chinese text into a third text;
and adjusting the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and taking the adjusted text as the Chinese text describing the image.
5. A system for generating a Chinese text from an image, the system comprising:
an image segmentation module, configured to segment an image for which a Chinese text is to be generated into a foreground region and a background region;
a first generation module, configured to generate a first Chinese text describing the foreground region and a second Chinese text describing the background region by using a generative adversarial network;
and a second generation module, configured to generate a Chinese text describing the image according to the first Chinese text and the second Chinese text.
6. The system of claim 5, wherein the first generation module comprises:
a first Chinese text generation unit, configured to generate the first Chinese text based on a trained first generative adversarial network model;
and a second Chinese text generation unit, configured to generate the second Chinese text based on a trained second generative adversarial network model.
7. The system of claim 6, further comprising a training module, the training module comprising:
a first training unit, configured to add an attention mechanism to the first generative adversarial network model when the first generative adversarial network model is trained, and/or,
a second training unit, configured to add an attention mechanism to the second generative adversarial network model when the second generative adversarial network model is trained.
8. The system of claim 5, wherein the second generation module comprises:
a splicing unit, configured to splice the first Chinese text and the second Chinese text into a third text;
and an adjusting unit, configured to adjust the third text according to preset expression sentence patterns to generate an adjusted text conforming to one of the preset expression sentence patterns, and to take the adjusted text as the Chinese text describing the image.
9. A device for generating a Chinese text from an image, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for generating a Chinese text from an image as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, in which a computer program is stored which, when executed, implements the method for generating a Chinese text from an image as claimed in any one of claims 1 to 4.
CN202110823454.9A 2021-07-20 2021-07-20 Method, system and device for generating Chinese text based on image Pending CN113673525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823454.9A CN113673525A (en) 2021-07-20 2021-07-20 Method, system and device for generating Chinese text based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110823454.9A CN113673525A (en) 2021-07-20 2021-07-20 Method, system and device for generating Chinese text based on image

Publications (1)

Publication Number Publication Date
CN113673525A (en) 2021-11-19

Family

ID=78539730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110823454.9A Pending CN113673525A (en) 2021-07-20 2021-07-20 Method, system and device for generating Chinese text based on image

Country Status (1)

Country Link
CN (1) CN113673525A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143617A (en) * 2019-12-12 2020-05-12 浙江大学 Automatic generation method and system for picture or video text description
CN111159454A (en) * 2019-12-30 2020-05-15 浙江大学 Picture description generation method and system based on Actor-Critic generative adversarial network
CN111507352A (en) * 2020-04-16 2020-08-07 腾讯科技(深圳)有限公司 Image processing method and device, computer equipment and storage medium
CN111859911A (en) * 2020-07-28 2020-10-30 中国平安人寿保险股份有限公司 Image description text generation method and device, computer equipment and storage medium
CN112818159A (en) * 2021-02-24 2021-05-18 上海交通大学 Image description text generation method based on generative adversarial network

Similar Documents

Publication Publication Date Title
CN110673748B (en) Method and device for providing candidate long sentences in input method
CN111444922A (en) Picture processing method and device, storage medium and electronic equipment
CN113655999B (en) Page control rendering method, device, equipment and storage medium
CN110798636A (en) Subtitle generating method and device and electronic equipment
CN111739027A (en) Image processing method, device and equipment and readable storage medium
CN114332873A (en) Training method and device for recognition model
CN114332150A (en) Handwriting erasing method, device, equipment and readable storage medium
CN109615671A (en) Automatic font library sample generation method, computer device and readable storage medium
CN113674374B (en) Chinese text image generation method and device based on generative adversarial network
CN116310712A (en) Image ink style transfer method and system based on cycle generative adversarial network
CN112990172A (en) Text recognition method, character recognition method and device
CN114359035A (en) Human body style transfer method, device and medium based on generative adversarial network
CN110070042A (en) Character recognition method, device and electronic equipment
CN110533020A (en) A kind of recognition methods of text information, device and storage medium
KR20210094823A (en) The creating method and apparatus of personal handwriting customized hangul font
CN113673525A (en) Method, system and device for generating Chinese text based on image
US20210182468A1 (en) Using classifications from text to determine instances of graphical element types to include in a template layout for digital media output
CN112419249B (en) Special clothing picture conversion method, terminal device and storage medium
CN110767201A (en) Score generation method, storage medium and terminal equipment
CN114565751A (en) OCR recognition model training method, OCR recognition method and related device
CN112766277A (en) Channel adjustment method, device and equipment of convolutional neural network model
CN111260663A (en) Nasopharyngeal carcinoma focus image segmentation device, equipment and computer readable storage medium
US20230336839A1 (en) Method, computer device, and storage medium for generating video cover
CN112949642B (en) Character generation method and device, storage medium and electronic equipment
CN111428569B (en) Visual recognition method and device for drawing book or teaching material based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211119