CN113673349B - Method, system and device for generating Chinese text by image based on feedback mechanism - Google Patents


Info

Publication number
CN113673349B
Authority
CN
China
Prior art keywords
chinese text
discriminator
generator
loss function
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110823453.4A
Other languages
Chinese (zh)
Other versions
CN113673349A (en)
Inventor
陈志华
刘斌
徐省华
魏文国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN202110823453.4A
Publication of CN113673349A
Application granted
Publication of CN113673349B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of text generation and discloses a method, a system and a device for generating Chinese text from an image based on a feedback mechanism. When the generative adversarial network model is trained, the method applies a feedback mechanism: a reference image is obtained from the Chinese text description output by the generator, and the distance between the reference image and the sample image is fed back to the adversarial network. The generative adversarial network model is thereby gradually optimized during training, which improves the accuracy of the Chinese text generated from the image.

Description

Method, system and device for generating Chinese text by image based on feedback mechanism
Technical Field
The invention relates to the technical field of text generation, and in particular to a method, a system and a device for generating Chinese text from an image based on a feedback mechanism.
Background
As an important research direction in natural language processing, text generation technology has broad application prospects. In the related art, a generative adversarial network model processes an image to generate a text description corresponding to the image. A Generative Adversarial Network (GAN) includes two submodels: a generator G and a discriminator D. The generator models the distribution of the real data, and the discriminator judges whether a sample is a real sample or a generated sample. The training goal of the network is for the generator to fit the distribution of the real data so closely that the discriminator cannot tell generated samples from real ones.
However, existing approaches train the generative adversarial network model using only the sample images, and the text descriptions generated by the trained model have poor accuracy.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a method, a system and a device for generating Chinese text from an image based on a feedback mechanism, which gradually optimize, during training, the generative adversarial network model that generates Chinese text descriptions from images, thereby improving the accuracy of the generated Chinese text.
In order to achieve this purpose, the invention adopts the following technical solutions:
In a first aspect, the present application provides a method for generating Chinese text from an image based on a feedback mechanism, the method including:
constructing a generative adversarial network model for generating a Chinese text description from an image, wherein the generative adversarial network model comprises a generator and a discriminator;
inputting a sample image with known Chinese text description information into the generator, obtaining the Chinese text description output by the generator, and obtaining a corresponding reference image based on the output Chinese text description, wherein the image features corresponding to the reference image are the same as the image features corresponding to the Chinese text description information;
feeding back the reference image to the discriminator so that the discriminator calculates the distance between the sample image and the reference image;
and if the calculated distance is not smaller than a preset distance threshold, adding the distance to an objective function of the generative adversarial network model, and adjusting the generator and the discriminator based on the objective function so as to guide the generator to generate a vector closer to the true value.
According to an implementable manner provided by the first aspect of the present application, the method further comprises:
constructing a first loss function of the generator according to the distance, and determining a first weighting value of the first loss function;
constructing a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determining a second weighting value of the second loss function;
constructing a loss function for the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
According to an implementable manner provided by the first aspect of the present application, the method further comprises:
and constructing a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and constructing the objective function according to the loss function of the generator and the loss function of the discriminator.
According to an implementable manner provided by the first aspect of the present application, the method further comprises:
the discriminator adopts a convolutional neural network to extract strongest semantic information, adds an attention mechanism to an input layer of the convolutional neural network to extract semantic information containing context, and further determines the probability of discriminating the output Chinese text to be true according to the strongest semantic information and the semantic information containing the context.
A second aspect of the present application provides a system for generating Chinese text from an image based on a feedback mechanism, the system comprising:
the model construction module is used for constructing a generative adversarial network model for generating a Chinese text description from an image, and the generative adversarial network model comprises a generator and a discriminator;
the generating module is used for inputting a sample image with known Chinese text description information into the generator, obtaining the Chinese text description output by the generator, and obtaining a corresponding reference image based on the output Chinese text description, wherein the image features corresponding to the reference image are the same as the image features corresponding to the Chinese text description information;
a feedback module, configured to feed back the reference image to the discriminator so that the discriminator calculates the distance between the sample image and the reference image;
and the adjusting module is used for adding the distance to an objective function of the generative adversarial network model when the calculated distance is not smaller than a preset distance threshold, and adjusting the generator and the discriminator based on the objective function so as to guide the generator to generate a vector closer to the true value.
According to an implementable manner of the second aspect of the present application, the adjustment module comprises:
a first function construction unit, configured to construct a first loss function of the generator according to the distance, and determine a first weighting value of the first loss function;
a second function construction unit, configured to construct a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determine a second weighting value of the second loss function;
a third function construction unit for constructing a loss function of the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
According to an implementable manner of the second aspect of the present application, the adjusting module further comprises:
and the objective function construction unit is used for constructing a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and constructing the objective function according to the loss function of the generator and the loss function of the discriminator.
According to an implementable manner of the second aspect of the present application, the discriminator extracts the strongest semantic information by using a convolutional neural network, adds an attention mechanism at the input layer of the convolutional neural network to extract context-aware semantic information, and then determines the probability that the output Chinese text is true according to the strongest semantic information and the context-aware semantic information.
A third aspect of the present application provides a device for generating Chinese text from an image based on a feedback mechanism, the device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for generating Chinese text from an image based on a feedback mechanism as described in any one of the embodiments above.
A fourth aspect of the present application provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed, implements the method for generating Chinese text from an image based on a feedback mechanism as described in any one of the above embodiments.
The embodiments disclosed in the present application have at least the following advantages:
the generative adversarial network model that generates Chinese text descriptions from images can be gradually optimized during training, which improves the accuracy of the Chinese text generated from the image.
Drawings
FIG. 1 is a schematic flow chart of a method for generating Chinese text based on images of a feedback mechanism according to a preferred embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a preferred embodiment of the image-generating Chinese text system based on the feedback mechanism provided in the present application.
Reference numerals:
model building module 1, generating module 2, feedback module 3 and adjusting module 4.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic flow chart of a method for generating Chinese text from an image based on a feedback mechanism according to a preferred embodiment of the present invention.
As shown in fig. 1, the method includes:
S1, constructing a generative adversarial network model for generating a Chinese text description from an image, the generative adversarial network model including a generator and a discriminator.
In the embodiments of the application, the generator and the discriminator are not limited to neural networks; any functions capable of fitting the corresponding generation and discrimination may be used, although neural network models are preferred.
S2, inputting a sample image of known Chinese text description information into the generator, obtaining the Chinese text description output by the generator, and obtaining a corresponding reference image based on the output Chinese text description, wherein the image characteristics corresponding to the reference image are the same as the image characteristics corresponding to the Chinese text description information.
The sample images with known Chinese text description information may be drawn from a preset training set. When the training set is constructed, images that have Chinese text description information are collected.
Before a sample image with known Chinese text description information is input into the generator, the sample image may be denoised as necessary so that noise in the sample image does not affect training of the generative adversarial network model.
Specifically, acquiring the corresponding reference image based on the output Chinese text description includes: inputting the output Chinese text description into a trained text-to-image model, which then generates the reference image. The text-to-image model may be a model based on a generative adversarial network, such as an existing StackGAN model, StackGAN++ model, AttnGAN model, or the like.
S3, feeding back the reference image to the discriminator so that the discriminator calculates the distance between the sample image and the reference image.
In this embodiment, the distance may be a cosine distance or a Euclidean distance.
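As an illustration of the two distance options named above, the following is a minimal sketch in Python. It assumes that each image has already been reduced to a numeric feature vector; the source does not say how the discriminator represents the images, and the function names are hypothetical. Cosine distance is taken here as 1 minus the cosine similarity, which is one common convention.

```python
import math


def euclidean_distance(u, v):
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """Cosine distance, taken here as 1 - cosine similarity (an assumption)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)
```

Euclidean distance is sensitive to the magnitude of the features, while cosine distance compares only their direction, so the choice affects what a given distance threshold means.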
S4, if the calculated distance is not less than a preset distance threshold, adding the distance to an objective function of the generative adversarial network model, and adjusting the generator and the discriminator based on the objective function, so as to guide the generator to generate a vector closer to the true value.
It should be noted that, when the calculated distance is smaller than the preset distance threshold, the preset objective function may be used as the objective function of the generative adversarial network model.
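The threshold rule in step S4 can be sketched as follows. This is only one possible reading: the source does not give the exact form in which the distance is added to the objective function, so a plain additive penalty is assumed, and the function and parameter names are hypothetical.

```python
def select_objective(base_objective, distance, threshold):
    """Return the objective value used for one training step.

    Assumption: "adding the distance" is read as a plain additive penalty
    on top of the preset objective; the source does not specify the form.
    """
    if distance >= threshold:
        # Distance is "not less than" the preset threshold: penalize it.
        return base_objective + distance
    # Distance is below the threshold: the preset objective is used as is.
    return base_objective
```

For example, with a preset objective value of 1.0 and a threshold of 0.5, a distance of 0.5 is included in the objective, while a distance of 0.25 is not.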
It should be noted that the above Chinese text description is Chinese text that describes an image. For example, if the sample image shows two dogs, the Chinese text describing the image is text about two dogs, such as "two French bulldogs on the grass".
The discriminator is equivalent to a binary classifier: it can distinguish whether an input Chinese text is a real text or a text generated by the generator, and can output the probability that the output Chinese text is a real Chinese text. The objective function may be determined based on the loss functions of the generator and the discriminator. The generator and the discriminator can be adjusted through conventional iterative training, which improves the precision with which the generative adversarial network model generates Chinese text descriptions from images.
It should be noted that, for adjusting the generator and the discriminator based on the objective function, various existing methods may be adopted, and the generator satisfying the expected value is obtained after adjustment, which is not limited in the embodiment of the present invention.
For image description, if the Chinese text description of an image is used to regenerate an image and the distance between the two images is minimal (that is, their similarity is highest), then the Chinese text description of the image is the most accurate. The present application builds its feedback mechanism on this principle: the feedback mechanism obtains a reference image from the Chinese text description generated for a sample image, calculates the distance between the reference image and the sample image, and adds the distance to the objective function of the generative adversarial network model when the distance is not yet small enough, that is, not less than the preset distance threshold. Through this feedback mechanism, the generative adversarial network model that generates Chinese text descriptions from images is gradually optimized during training, which improves the accuracy of the generated Chinese text.
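The feedback loop described above (image to text, text back to image, then distance) can be sketched end to end. Everything in this example is a toy stand-in chosen for illustration: images are short lists of numbers, the caption generator and the text-to-image model are hypothetical functions rather than trained networks, and Euclidean distance on raw values is assumed.

```python
import math


def feedback_distance(sample_image, generate_caption, text_to_image, extract_features):
    """Run one feedback pass and return (caption, distance)."""
    caption = generate_caption(sample_image)      # generator: image -> text
    reference_image = text_to_image(caption)      # trained model: text -> image
    u = extract_features(sample_image)
    v = extract_features(reference_image)
    return caption, math.dist(u, v)               # Euclidean distance


# Toy stand-ins (hypothetical, not part of the source).
def toy_generate_caption(image):
    return "bright" if sum(image) / len(image) > 0.5 else "dark"


def toy_text_to_image(caption):
    return [0.9, 0.9] if caption == "bright" else [0.1, 0.1]


def toy_extract_features(image):
    return list(image)


caption, distance = feedback_distance(
    [0.9, 0.9], toy_generate_caption, toy_text_to_image, toy_extract_features
)
```

In this toy case the regenerated image matches the sample exactly, so the distance is zero and the caption would count as accurate under the principle stated above. A sample such as `[0.2, 0.0]` yields the caption "dark" and a nonzero distance.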
After the generative adversarial network model has been trained by the above method, a target image for which a Chinese text description is needed can be input into the trained generator to obtain the Chinese text description of the target image.
In one embodiment, the method further comprises:
constructing a first loss function of the generator according to the distance, and determining a first weighting value of the first loss function;
constructing a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determining a second weighting value of the second loss function;
constructing a loss function for the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
The loss function of the generator is determined as a weighted sum of the first loss function and the second loss function.
The first weighting value and the second weighting value are both greater than 0 and less than 1. In some embodiments, both the first weighting value and the second weighting value are 0.5.
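The weighted sum can be sketched as follows. The source fixes only the structure (two terms, each weight strictly between 0 and 1, with 0.5 as an example) and not the form of each term. This sketch assumes that the first loss is the distance itself and that the second loss is the negative log of the probability that the discriminator does not judge the text to be false; both choices, and all names, are assumptions.

```python
import math


def generator_loss(distance, p_false, w_distance=0.5, w_adversarial=0.5, eps=1e-12):
    """Weighted sum of the two generator loss terms.

    distance: distance between the sample image and the reference image.
    p_false:  probability that the discriminator judges the output text false.
    """
    if not (0.0 < w_distance < 1.0 and 0.0 < w_adversarial < 1.0):
        raise ValueError("both weights must lie strictly between 0 and 1")
    if not 0.0 <= p_false <= 1.0:
        raise ValueError("p_false must be a probability")
    first_loss = distance                                # assumed form
    second_loss = -math.log(max(1.0 - p_false, eps))     # assumed form
    return w_distance * first_loss + w_adversarial * second_loss
```

With the default weights, a generator whose text fully fools the discriminator (`p_false = 0.0`) is penalized only by half the image distance.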
In one embodiment, the method further comprises:
and constructing a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and constructing the objective function according to the loss function of the generator and the loss function of the discriminator.
When the discriminator discriminates whether the output Chinese text is true or false, the method specifically executes the following steps:
the discriminator compares the Chinese text description output by the generator with the known Chinese text description of the corresponding sample image, if the Chinese text description output by the generator is determined to be the known Chinese text description, the output Chinese text is determined to be true, and if the Chinese text description output by the generator is determined not to be the known Chinese text description, the output Chinese text is determined to be false.
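A discriminator loss built from these true/false probabilities could look like the sketch below. It assumes the standard binary cross-entropy used in generative adversarial networks; the source does not state which form is used, and the names are hypothetical.

```python
import math


def discriminator_loss(p_true_on_known, p_true_on_generated, eps=1e-12):
    """Binary cross-entropy for one (known text, generated text) pair.

    p_true_on_known:     probability assigned to "true" for the known description.
    p_true_on_generated: probability assigned to "true" for the generator output.
    """
    loss_known = -math.log(max(p_true_on_known, eps))
    loss_generated = -math.log(max(1.0 - p_true_on_generated, eps))
    return loss_known + loss_generated
```

The loss is zero when the discriminator assigns probability 1 to the known description and probability 0 to the generated one, and it grows as the two become harder to tell apart.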
In one embodiment, the discriminator uses a convolutional neural network to extract the strongest semantic information, adds an attention mechanism at the input layer of the convolutional neural network to extract context-aware semantic information, and then determines the probability that the output Chinese text is true from the strongest semantic information and the context-aware semantic information. With this configuration, the discriminator network obtains richer semantic and context information, which improves its performance.
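One way to read this structure is sketched below in plain Python: dot-product self-attention over the token embeddings at the input, a one-dimensional convolution over the attended sequence, max-over-time pooling to keep the strongest response of each filter, and a sigmoid output. The layer choices, the absence of learned attention weights, and all names are assumptions made for illustration; the source does not specify the architecture in this detail.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections."""
    dim = len(embeddings[0])
    attended = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        attended.append([sum(w * tok[d] for w, tok in zip(weights, embeddings))
                         for d in range(dim)])
    return attended


def conv_max_pool(sequence, kernel):
    """1-D convolution along the token axis, then max-over-time pooling."""
    width = len(kernel)
    responses = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        responses.append(sum(w * x
                             for k_row, tok in zip(kernel, window)
                             for w, x in zip(k_row, tok)))
    return max(responses)  # strongest response of this filter


def probability_true(embeddings, kernels, out_weights, bias=0.0):
    """Probability that the text is judged true (sigmoid of a linear layer)."""
    context = self_attention(embeddings)                 # context-aware input
    features = [conv_max_pool(context, k) for k in kernels]
    logit = sum(w * f for w, f in zip(out_weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Each kernel is a list of `width` rows with one weight per embedding dimension, so `kernels = [[[1.0, 0.0], [0.0, 1.0]]]` is a single width-2 filter over 2-dimensional embeddings.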
An embodiment of the second aspect of the application provides a system for generating Chinese text from an image based on a feedback mechanism.
Fig. 2 is a schematic structural diagram of a preferred embodiment of the system for generating Chinese text from an image based on a feedback mechanism according to the present invention, which can implement the whole process of the method for generating Chinese text from an image based on a feedback mechanism according to any of the above embodiments.
As shown in fig. 2, the system includes:
the model building module 1 is used for constructing a generative adversarial network model for generating a Chinese text description from an image, and the generative adversarial network model comprises a generator and a discriminator;
the generating module 2 is configured to input a sample image with known Chinese text description information into the generator, obtain the Chinese text description output by the generator, and obtain a corresponding reference image based on the output Chinese text description, where the image features corresponding to the reference image are the same as the image features corresponding to the Chinese text description information;
a feedback module 3, configured to feed back the reference image to the discriminator, so that the discriminator calculates the distance between the sample image and the reference image;
and the adjusting module 4 is configured to add the distance to an objective function of the generative adversarial network model when the calculated distance is not less than a preset distance threshold, and adjust the generator and the discriminator based on the objective function, so as to guide the generator to generate a vector closer to the true value.
According to an implementable manner of the second aspect of the embodiment of the present application, the adjusting module includes:
a first function construction unit, configured to construct a first loss function of the generator according to the distance, and determine a first weighting value of the first loss function;
a second function construction unit, configured to construct a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determine a second weighting value of the second loss function;
a third function construction unit for constructing a loss function of the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
According to an implementable manner of the second aspect of the embodiment of the present application, the adjusting module further includes:
and the objective function construction unit is used for constructing a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and constructing the objective function according to the loss function of the generator and the loss function of the discriminator.
According to an implementable manner of the second aspect of the embodiment of the present application, the discriminator extracts the strongest semantic information by using a convolutional neural network, adds an attention mechanism at the input layer of the convolutional neural network to extract context-aware semantic information, and then determines the probability that the output Chinese text is true according to the strongest semantic information and the context-aware semantic information.
The functions and implementations of the modules in the system embodiment are the same as those in the method embodiment described above; refer to the method embodiment for the detailed analysis, which is not repeated here.
The application also provides a device for generating Chinese text from an image based on a feedback mechanism, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the method for generating Chinese text from an image based on a feedback mechanism according to any one of the above embodiments.
The present application further provides a computer-readable storage medium in which a computer program is stored, and when the computer program is executed, the method for generating Chinese text from an image based on a feedback mechanism as described in any of the above embodiments is implemented.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for generating Chinese text from an image based on a feedback mechanism, and connects the various parts of the whole device through various interfaces and lines.
The memory may be used to store the computer programs and/or modules, and the processor may implement the various functions of the apparatus for generating chinese text based on images of the feedback mechanism by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
Wherein, the integrated module/unit of the image generation Chinese text device based on the feedback mechanism can be stored in a computer readable storage medium if the module/unit is realized in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
The foregoing is a preferred embodiment of the present application, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations are also regarded as the protection scope of the present application.

Claims (10)

1. A method for generating Chinese text from an image based on a feedback mechanism, characterized by comprising the following steps:
constructing a generative adversarial network model for generating a Chinese text description from an image, wherein the generative adversarial network model comprises a generator and a discriminator;
inputting a sample image with known Chinese text description information into the generator, obtaining the Chinese text description output by the generator, and obtaining a corresponding reference image based on the output Chinese text description, wherein the image features corresponding to the reference image are the same as the image features corresponding to the Chinese text description information;
feeding back the reference image to the discriminator so that the discriminator calculates the distance between the sample image and the reference image;
and if the calculated distance is not smaller than a preset distance threshold, adding the distance to an objective function of the generative adversarial network model, and adjusting the generator and the discriminator based on the objective function so as to guide the generator to generate a vector closer to the true value.
2. The method for generating Chinese text from an image based on a feedback mechanism as claimed in claim 1, wherein the method further comprises:
constructing a first loss function of the generator according to the distance, and determining a first weighting value of the first loss function;
constructing a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determining a second weighting value of the second loss function;
constructing a loss function of the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
3. The method for generating Chinese text from an image based on a feedback mechanism as claimed in claim 2, wherein the method further comprises:
constructing a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and constructing the objective function according to the loss function of the generator and the loss function of the discriminator.
4. The method for generating Chinese text from an image based on a feedback mechanism as claimed in claim 3, wherein:
the discriminator uses a convolutional neural network to extract the strongest semantic information, adds an attention mechanism to an input layer of the convolutional neural network to extract semantic information containing context, and then determines the probability that the output Chinese text is true according to the strongest semantic information and the semantic information containing context.
5. A system for generating Chinese text from an image based on a feedback mechanism, the system comprising:
a model construction module, configured to construct a generative adversarial network model for generating a Chinese text description from an image, wherein the generative adversarial network model comprises a generator and a discriminator;
a generating module, configured to input a sample image with known Chinese text description information into the generator, obtain the Chinese text description output by the generator, and obtain a corresponding reference image based on the output Chinese text description, wherein the image features corresponding to the reference image are the same as the image features corresponding to the Chinese text description information;
a feedback module, configured to feed back the reference image to the discriminator so that the discriminator calculates a distance between the sample image and the reference image;
and an adjusting module, configured to add the distance to an objective function of the generative adversarial network model when the calculated distance is not smaller than a preset distance threshold, and to adjust the generator and the discriminator based on the objective function so as to guide the generator to generate a vector closer to the true value.
6. The system for generating Chinese text from an image based on a feedback mechanism of claim 5, wherein the adjusting module comprises:
a first function construction unit, configured to construct a first loss function of the generator according to the distance, and determine a first weighting value of the first loss function;
a second function construction unit, configured to construct a second loss function of the generator according to first probability information that the discriminator judges the output Chinese text to be false, and determine a second weighting value of the second loss function;
a third function construction unit, configured to construct a loss function of the generator based on the first loss function, the second loss function, the first weighting value, and the second weighting value.
7. The system for generating Chinese text from an image based on a feedback mechanism of claim 6, wherein the adjusting module further comprises:
an objective function construction unit, configured to construct a loss function of the discriminator according to probability information that the discriminator judges the output Chinese text to be true, and to construct the objective function according to the loss function of the generator and the loss function of the discriminator.
8. The system for generating Chinese text from an image based on a feedback mechanism of claim 7, wherein:
the discriminator uses a convolutional neural network to extract the strongest semantic information, adds an attention mechanism to an input layer of the convolutional neural network to extract semantic information containing context, and then determines the probability that the output Chinese text is true according to the strongest semantic information and the semantic information containing context.
9. A device for generating Chinese text from an image based on a feedback mechanism, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for generating Chinese text from an image based on a feedback mechanism as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed, implements the method for generating Chinese text from an image based on a feedback mechanism as claimed in any one of claims 1 to 4.
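
The loss construction recited in claims 1 to 3 can be illustrated with a minimal Python sketch. The claims do not fix a distance metric or the exact form of the loss terms, so everything concrete here is an assumption made for illustration: the Euclidean distance over feature vectors, the negative-log forms of the adversarial terms, and all function names (`image_distance`, `first_loss`, `second_loss`, `generator_loss`, `discriminator_loss`) are hypothetical and not taken from the patent.

```python
import math

EPS = 1e-12  # guards against log(0); an illustrative choice, not from the patent


def image_distance(sample_features, reference_features):
    """Distance between sample-image and reference-image feature vectors.
    Euclidean distance is an assumption; the claims only say "distance"."""
    return math.dist(sample_features, reference_features)


def first_loss(distance, threshold):
    """Distance-based generator term. Per claim 1 the distance enters the
    objective only when it is not smaller than the preset threshold."""
    return distance if distance >= threshold else 0.0


def second_loss(p_fake):
    """Adversarial generator term built from the probability that the
    discriminator judges the generated text to be false.
    The form -log(1 - p_fake) is an assumed choice."""
    return -math.log(max(1.0 - p_fake, EPS))


def generator_loss(distance, p_fake, threshold, w1, w2):
    """Weighted combination of the two generator terms (claim 2).
    w1 and w2 stand for the first and second weighting values."""
    return w1 * first_loss(distance, threshold) + w2 * second_loss(p_fake)


def discriminator_loss(p_true_generated):
    """Discriminator term built from the probability that it judges the
    generated text to be true (claim 3); the form -log(1 - p) is assumed."""
    return -math.log(max(1.0 - p_true_generated, EPS))


if __name__ == "__main__":
    d = image_distance([0.0, 0.0], [3.0, 4.0])
    g = generator_loss(d, p_fake=0.5, threshold=2.0, w1=0.1, w2=1.0)
    print(d, g, discriminator_loss(0.5))
```

With these assumed forms, a distance below the threshold contributes nothing to the generator loss, so only the adversarial term drives training once the reference image is close enough to the sample image.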
CN202110823453.4A 2021-07-20 2021-07-20 Method, system and device for generating Chinese text by image based on feedback mechanism Active CN113673349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823453.4A CN113673349B (en) 2021-07-20 2021-07-20 Method, system and device for generating Chinese text by image based on feedback mechanism

Publications (2)

Publication Number Publication Date
CN113673349A (en) 2021-11-19
CN113673349B (en) 2022-03-11

Family

ID=78539735

Country Status (1)

Country Link
CN (1) CN113673349B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant