WO2023019995A1 - Training method, translation display method, apparatus, electronic device, and storage medium

Training method, translation display method, apparatus, electronic device, and storage medium

Info

Publication number
WO2023019995A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
text block
text
target
translation
Prior art date
Application number
PCT/CN2022/088395
Other languages
English (en)
French (fr)
Inventor
吴亮
刘珊珊
章成全
姚锟
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Priority to JP2023509866A (patent JP2023541351A)
Publication of WO2023019995A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/40 - Processing or translation of natural language
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Definitions

  • The present disclosure relates to the technical field of artificial intelligence, specifically the technical fields of computer vision and deep learning, and can be applied to scenarios such as optical character recognition (OCR). In particular, it relates to a training method, a translation display method, an apparatus, an electronic device, and a storage medium.
  • Photo translation is a new form of translation product.
  • The input of the current photo translation function is an image containing text in the source language, and the output is an image containing text in the target language.
  • the disclosure provides a training method, a translation display method, a device, an electronic device, and a storage medium.
  • According to one aspect, a method for training a text erasure model is provided, including: processing an original text block image set using the generator of a generative adversarial network (GAN) model to obtain a simulated text block erasure image set, where the GAN model includes the generator and a discriminator; alternately training the generator and the discriminator using a real text block erasure image set and the simulated text block erasure image set to obtain a trained generator and discriminator; and determining the trained generator as the text erasure model.
  • The pixel values of the text erasure area in each real text block erasure image included in the real text block erasure image set are determined according to the pixel values of the areas of that image other than the text erasure area.
  • According to another aspect, a method for displaying a translation is provided, including: processing a target original text block image using the text erasure model to obtain a target text block erasure image, where the target original text block image includes a target original text block; determining translation display parameters; superimposing, according to the translation display parameters, the target translation text block corresponding to the target original text block onto the target text block erasure image to obtain a target translation text block image; and displaying the target translation text block image. The text erasure model is trained according to the method described above.
  • According to another aspect, a text erasure model training apparatus is provided, including: a first obtaining module, configured to process the original text block image set using the generator of the generative adversarial network model to obtain the simulated text block erasure image set, where the GAN model includes the generator and the discriminator; a second obtaining module, configured to alternately train the generator and the discriminator using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator; and a first determining module, configured to determine the trained generator as the text erasure model. The pixel values of the text erasure area in each real text block erasure image included in the real text block erasure image set are determined according to the pixel values of the areas of that image other than the text erasure area.
  • According to another aspect, a translation display apparatus is provided, including: a third obtaining module, configured to process a target original text block image using the text erasure model to obtain a target text block erasure image, where the target original text block image includes the target original text block; a second determining module, configured to determine the translation display parameters; a fourth obtaining module, configured to superimpose, according to the translation display parameters, the target translation text block corresponding to the target original text block onto the target text block erasure image to obtain a target translation text block image; and a display module, configured to display the target translation text block image. The text erasure model is trained according to the method described above.
  • According to another aspect, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described above.
  • According to another aspect, a non-transitory computer-readable storage medium storing computer instructions is provided, where the computer instructions are used to cause a computer to execute the method described above.
  • According to another aspect, a computer program product is provided, including a computer program that implements the method described above when executed by a processor.
  • Fig. 1 schematically shows an exemplary system architecture to which the text erasure model training method, the translation display method, and the corresponding apparatuses according to an embodiment of the present disclosure may be applied;
  • Fig. 2 schematically shows a flowchart of the text erasure model training method according to an embodiment of the present disclosure;
  • Fig. 3 schematically shows a flowchart of training the discriminator using the first real text block erasure image set and the first simulated text block erasure image set according to an embodiment of the present disclosure;
  • Fig. 4 schematically shows a schematic diagram of the training process of the text erasure model according to an embodiment of the present disclosure;
  • Fig. 5 schematically shows a flowchart of the translation display method according to an embodiment of the present disclosure;
  • Fig. 6 schematically shows a flowchart of determining the number of translation display lines and/or the translation display height according to an embodiment of the present disclosure;
  • Fig. 7 schematically shows a schematic diagram of the translation display process according to an embodiment of the present disclosure;
  • Fig. 8A schematically shows a schematic diagram of the text erasure process according to an embodiment of the present disclosure;
  • Fig. 8B schematically shows a schematic diagram of the translation fitting process according to an embodiment of the present disclosure;
  • Fig. 9 schematically shows a block diagram of the text erasure model training apparatus according to an embodiment of the present disclosure;
  • Fig. 10 schematically shows a block diagram of the translation display apparatus according to an embodiment of the present disclosure;
  • Fig. 11 schematically shows a block diagram of an electronic device suitable for implementing the text erasure model training method or the translation display method according to an embodiment of the present disclosure.
  • Photo translation technology may include: taking a picture of a scene containing text to obtain an image; recognizing the text content of the text lines in the obtained image; performing machine translation on the text content to obtain the translated text content; and displaying the translated text content to the user. To display the translation result directly on the original text line of the image, the text in the original text line must first be erased, and the translation then pasted back onto the original text line for display.
  • In some approaches, the text area in the original image is directly blurred and filtered, or the entire area is filled with the average color of the text block area, so that the original text appears visually erased.
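The mean-color filling baseline can be pictured with the following minimal sketch; all names are illustrative, as the patent does not specify an implementation.

```python
import numpy as np

def mean_fill_erase(block: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """block: HxWx3 text block image; text_mask: HxW boolean mask, True on text pixels."""
    erased = block.copy()
    background = block[~text_mask]                # pixels outside the text
    erased[text_mask] = background.mean(axis=0)   # fill the text with the mean color
    return erased
```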
  • the embodiments of the present disclosure provide a text erasure model training method, a translation display method, a device, an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product.
  • The training method of the text erasure model includes: processing the original text block image set using the generator of the generative adversarial network model to obtain a simulated text block erasure image set, where the GAN model includes a generator and a discriminator.
  • The generator and the discriminator are alternately trained using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator, and the trained generator is determined as the text erasure model.
  • The pixel values of the text erasure area in each real text block erasure image included in the real text block erasure image set are determined according to the pixel values of the areas of that image other than the text erasure area.
  • Fig. 1 schematically shows an exemplary system architecture to which the text erasure model training method, the translation display method, and the corresponding apparatuses according to an embodiment of the present disclosure may be applied.
  • It should be noted that Fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied; the terminal device may also implement the methods and apparatuses provided by the embodiments of the present disclosure without interacting with the server.
  • As shown in Fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired and/or wireless communication links.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and/or social platform software (examples only).
  • The terminal devices 101, 102, 103 may be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
  • The server 105 may be a server that provides various services, such as a background management server that supports content browsed by users with the terminal devices 101, 102, 103 (just an example).
  • The background management server can analyze and process received data such as user requests, and feed the processing results (such as webpages, information, or data obtained or generated according to the user requests) back to the terminal device.
  • It should be noted that the text erasure model training method and the translation display method provided by the embodiments of the present disclosure can generally be executed by the terminal device 101, 102, or 103.
  • Correspondingly, the text erasure model training apparatus and the translation display apparatus provided by the embodiments of the present disclosure may also be set in the terminal device 101, 102, or 103.
  • Alternatively, the text erasure model training method and the translation display method provided by the embodiments of the present disclosure may also generally be executed by the server 105.
  • Correspondingly, the text erasure model training apparatus and the translation display apparatus provided by the embodiments of the present disclosure can generally be set in the server 105.
  • The text erasure model training method and the translation display method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
  • Correspondingly, the apparatuses provided by the embodiments of the present disclosure may also be set in a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
  • For example, the server 105 processes the original text block image set using the generator of the generative adversarial network model to obtain a simulated text block erasure image set, where the GAN model includes a generator and a discriminator.
  • The generator and the discriminator are alternately trained using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator, and the trained generator is determined as the text erasure model.
  • Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 alternately trains the generator and the discriminator using the real text block erasure image set and the simulated text block erasure image set, and obtains the text erasure model, i.e., the trained generator.
  • It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers according to implementation needs.
  • Fig. 2 schematically shows a flowchart of a method for training a text erasure model according to an embodiment of the present disclosure.
  • the method 200 includes operations S210-S230.
  • In operation S210, the original text block image set is processed using the generator of the generative adversarial network model to obtain a simulated text block erasure image set, where the GAN model includes a generator and a discriminator.
  • In operation S220, the generator and the discriminator are alternately trained using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator.
  • In operation S230, the trained generator is determined as the text erasure model.
  • According to an embodiment of the present disclosure, the pixel values of the text erasure area in each real text block erasure image included in the real text block erasure image set are determined according to the pixel values of the areas of that image other than the text erasure area.
  • According to an embodiment of the present disclosure, a text block image may include a text erasure area and a background area other than the text erasure area.
  • Text block erasure may refer to erasing the text in the text erasure area of the input text block image while retaining the texture and color of the original background.
  • The generative adversarial network model may include a deep convolutional GAN model, a GAN model based on the earth mover's distance (a Wasserstein GAN), or a conditional GAN model.
  • A GAN model can include a generator and a discriminator, each of which can be a neural network model. The generator is used to generate the simulated text block erasure image set; through continued training, it learns the data distribution of the real text block erasure image set, so that it can generate from scratch samples consistent with that distribution and confuse the discriminator as much as possible. The discriminator is used to distinguish the real text block erasure image set from the simulated text block erasure image set.
  • The GAN model based on the earth mover's distance can alleviate problems such as unsynchronized generator and discriminator training, non-convergence, and mode collapse, thereby improving the quality of the generative model.
  • Its training process is as follows: preset the learning rate, the batch size (that is, the number of real text block erasure images included in the real text block erasure image set), the model parameter range, the maximum number of iterations, and the number of training passes per iteration for each neural network model.
  • The generator and the discriminator are iteratively and alternately trained using the real text block erasure image set and the simulated text block erasure image set, so that each can pursue its own optimization objective. Eventually the discriminator cannot accurately distinguish the real text block erasure image set from the simulated text block erasure image set, that is, a Nash equilibrium is reached. In this case, the generator can be considered to have learned the data distribution of the real text block erasure image set, and the trained generator is determined as the text erasure model.
  • Iteratively and alternately training the generator and the discriminator may include: in each iteration, with the model parameters of the generator kept unchanged, training the discriminator using the real text block erasure image set and the simulated text block erasure image set until the number of training passes set for the discriminator in this iteration is completed; then, with the model parameters of the discriminator kept unchanged, training the generator using the simulated text block erasure image set until the number of training passes set for the generator in this iteration is completed.
  • Before each training pass, the generator can be used to generate the simulated text block erasure image set corresponding to that pass.
  • The above training methods for the generator and the discriminator are only exemplary embodiments and are not limited thereto; training methods known in the art may also be used, as long as the training of the generator and the discriminator can be realized.
  • an appropriate training strategy may be selected according to actual requirements, which is not limited herein.
  • The training strategy can include one of the following in each iteration: training the generator once and the discriminator once; training the generator once and the discriminator multiple times; training the generator multiple times and the discriminator once; or training the generator multiple times and the discriminator multiple times.
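The following is a minimal, hedged sketch of the alternating training scheme described above, written in PyTorch. The patent does not disclose network architectures or hyperparameters; the per-iteration training counts k_d and k_g, the noise dimension, and the binary cross-entropy adversarial objective are illustrative assumptions (the Wasserstein variant mentioned above would use a different critic loss).

```python
import torch

def train_alternately(generator, discriminator, real_loader, orig_loader,
                      g_opt, d_opt, iterations, k_d=1, k_g=1, noise_dim=100):
    # Assumes discriminator(x) returns one logit per image.
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(iterations):
        # Train the discriminator k_d times with the generator's parameters frozen.
        for _ in range(k_d):
            real = next(iter(real_loader))              # real text block erasure images
            orig = next(iter(orig_loader))              # original text block images
            noise = torch.randn(orig.size(0), noise_dim)
            with torch.no_grad():
                fake = generator(orig, noise)           # simulated erasure images
            d_loss = (bce(discriminator(real), torch.ones(real.size(0), 1))
                      + bce(discriminator(fake), torch.zeros(fake.size(0), 1)))
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
        # Train the generator k_g times with the discriminator's parameters frozen.
        for _ in range(k_g):
            orig = next(iter(orig_loader))
            noise = torch.randn(orig.size(0), noise_dim)
            fake = generator(orig, noise)
            g_loss = bce(discriminator(fake), torch.ones(fake.size(0), 1))
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```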
  • Through the above embodiments of the present disclosure, the simulated text block erasure image set is obtained, the generator and the discriminator are alternately trained using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator, and the trained generator is determined as the text erasure model. Because the pixel values of the text erasure area in each real text block erasure image are determined according to the pixel values of the other areas (that is, the background area), the text erasure model can keep the color of the text erasure area as consistent as possible with the background, thereby improving the erasure effect and the user's visual experience.
  • According to an embodiment of the present disclosure, the original text block image set includes a first original text block image set and a second original text block image set, and the simulated text block erasure image set includes a first simulated text block erasure image set and a second simulated text block erasure image set.
  • Processing the original text block image set with the generator of the generative adversarial network model to obtain the simulated text block erasure image set may include the following operations.
  • The generator is used to process the first original text block image set to generate the first simulated text block erasure image set, and to process the second original text block image set to generate the second simulated text block erasure image set.
  • Using the generator to generate the simulated text block erasure image sets may include: inputting the first original text block image set and first random noise data into the generator to obtain the first simulated text block erasure image set; and inputting the second original text block image set and second random noise data into the generator to obtain the second simulated text block erasure image set.
  • The first random noise data and the second random noise data may take the form of Gaussian noise.
  • According to an embodiment of the present disclosure, the real text block erasure image set includes a first real text block erasure image set and a second real text block erasure image set.
  • Alternately training the generator and the discriminator using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator may include the following operations.
  • The discriminator is trained using the first real text block erasure image set and the first simulated text block erasure image set.
  • The generator is trained using the second simulated text block erasure image set. The operation of training the discriminator and the operation of training the generator are alternately performed until the convergence condition of the GAN model is met. The generator and discriminator obtained when the convergence condition of the GAN model is met are determined as the trained generator and discriminator.
  • The convergence condition of the generative adversarial network model may include: the generator converges, both the generator and the discriminator converge, or the iteration reaches a termination condition, such as the number of iterations reaching a preset number.
  • Alternately performing the operation of training the discriminator and the operation of training the generator can be understood as follows: in the t-th iteration, with the model parameters of the generator kept unchanged, the discriminator is trained using the first real text block erasure image set and the first simulated text block erasure image set, and this process is repeated until the number of training passes set for the discriminator in this iteration is completed, where t is an integer greater than or equal to 2.
  • Before each training pass, the generator may be used to generate the first simulated text block erasure image set corresponding to that pass.
  • After the number of training passes set for the discriminator in this iteration is completed, with the model parameters of the discriminator kept unchanged, the generator is trained using the second simulated text block erasure image set, and this process is repeated until the number of training passes set for the generator in this iteration is completed.
  • Before each training pass, the generator may be used to generate the second simulated text block erasure image set corresponding to that pass. Here 2 ≤ t ≤ T, where T represents the preset number of iterations, and t and T are integers.
  • The model parameters of the generator that are kept unchanged refer to the model parameters of the generator obtained after the last training pass for the generator in the (t-1)-th iteration is completed.
  • The model parameters of the discriminator that are kept unchanged refer to the model parameters of the discriminator obtained after the last training pass for the discriminator in the t-th iteration is completed.
  • Fig. 3 schematically shows a flowchart of training the discriminator using the first real text block erasure image set and the first simulated text block erasure image set according to an embodiment of the present disclosure.
  • The first real text block erasure image set includes a plurality of first real text block erasure images.
  • The first simulated text block erasure image set includes a plurality of first simulated text block erasure images.
  • the method 300 includes operations S310-S330.
  • In operation S310, each first real text block erasure image in the first real text block erasure image set is input into the discriminator to obtain a first discrimination result corresponding to that first real text block erasure image.
  • In operation S320, each first simulated text block erasure image in the first simulated text block erasure image set is input into the discriminator to obtain a second discrimination result corresponding to that first simulated text block erasure image.
  • In operation S330, the discriminator is trained based on the first discrimination result and the second discrimination result.
  • The discriminator is essentially a classifier. After the first real text block erasure images and the first simulated text block erasure images are respectively input into the discriminator, the discriminator is trained according to the first discrimination results corresponding to the first real text block erasure images and the second discrimination results corresponding to the first simulated text block erasure images, so that the discriminator cannot accurately determine whether its input is a first real text block erasure image or a first simulated text block erasure image, that is, so that the first discrimination results and the second discrimination results are as close to identical as possible.
  • According to an embodiment of the present disclosure, training the discriminator based on the first discrimination result and the second discrimination result may include the following operations.
  • With the model parameters of the generator kept unchanged, the first output value is obtained from the first discrimination result and the second discrimination result based on the first loss function.
  • The model parameters of the discriminator are adjusted according to the first output value to obtain the adjusted model parameters of the discriminator.
  • Specifically, the first discrimination result corresponding to the first real text block erasure image and the second discrimination result corresponding to the first simulated text block erasure image are input into the first loss function to obtain the first output value.
  • The model parameters of the discriminator are adjusted according to the first output value, and the above process is repeated until the number of training passes set for the discriminator in this iteration is completed.
  • Training the generator using the second simulated text block erasure image set may include the following operations.
  • Each second simulated text block erasure image included in the second simulated text block erasure image set is input into the second loss function to obtain a second output value.
  • The model parameters of the generator are adjusted according to the second output value, and the above process is repeated until the number of training passes set for the generator in this iteration is completed.
  • According to an embodiment of the present disclosure, the first loss function includes a discriminator loss function and a minimum mean square error loss function, and the second loss function includes a generator loss function and a minimum mean square error loss function.
  • The discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions that include a regularization term.
  • Because the discriminator loss function, the minimum mean square error loss function, and the generator loss function all include regularization terms, combining these loss functions facilitates denoising and makes the text erasure results more realistic and reliable.
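As a concrete illustration, the following sketch combines an adversarial term with a minimum mean square error term and an explicit L2 regularization term, per the description above. The weights lambda_mse and lambda_reg, and the use of a ground-truth erased image as the MSE target, are assumptions; the patent gives no numerical values.

```python
import torch

def generator_objective(d_fake_logits, fake_img, target_img, gen_params,
                        lambda_mse=1.0, lambda_reg=1e-4):
    bce = torch.nn.BCEWithLogitsLoss()
    # Generator loss: push the discriminator to label simulated images as real.
    adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))
    # Minimum mean square error loss between generated and reference erased images.
    mse = torch.nn.functional.mse_loss(fake_img, target_img)
    # Regularization term (L2 penalty on the generator's parameters).
    reg = sum(p.pow(2).sum() for p in gen_params)
    return adv + lambda_mse * mse + lambda_reg * reg
```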
  • Fig. 4 schematically shows a schematic diagram of a training process of a text erasure model according to an embodiment of the present disclosure.
  • As shown in Fig. 4, the training process 400 of the text erasure model may include: in each iteration, with the model parameters of the generator 402 kept unchanged, the first original text block image set 401 is input into the generator 402 to obtain a first simulated text block erasure image set 403.
  • Each first real text block erasure image in the first real text block erasure image set 404 is input into the discriminator 405 to obtain a first discrimination result 406 corresponding to that first real text block erasure image.
  • Each first simulated text block erasure image in the first simulated text block erasure image set 403 is input into the discriminator 405 to obtain a second discrimination result 407 corresponding to that first simulated text block erasure image. The first discrimination result 406 and the second discrimination result 407 are input into the first loss function to obtain the first output value, and the model parameters of the discriminator 405 are adjusted accordingly until the number of training passes set for the discriminator 405 in this iteration is completed.
  • Then, with the model parameters of the discriminator 405 kept unchanged, the second original text block image set 410 is input into the generator 402 to obtain a second simulated text block erasure image set 411.
  • Each second simulated text block erasure image in the second simulated text block erasure image set 411 is input into the second loss function 412 to obtain a second output value 413.
  • The model parameters of the generator 402 are adjusted according to the second output value 413, and the above process is repeated until the number of training passes set for the generator 402 in this iteration is completed.
  • The above training processes for the discriminator 405 and the generator 402 are performed alternately until the convergence condition of the GAN model is met, at which point the training is completed.
  • Fig. 5 schematically shows a flow chart of a translation presentation method according to an embodiment of the present disclosure.
  • the method 500 includes operations S510-S540.
  • In operation S510, the target original text block image is processed using the text erasure model to obtain a target text block erasure image, where the target original text block image includes the target original text block.
  • In operation S520, the translation display parameters are determined.
  • In operation S530, according to the translation display parameters, the target translation text block corresponding to the target original text block is superimposed on the target text block erasure image to obtain the target translation text block image.
  • In operation S540, the target translation text block image is displayed.
  • The text erasure model is trained using the method of operations S210-S230 described above.
  • According to an embodiment of the present disclosure, the target original text block image may include a text erasure area and a background area other than the text erasure area; the target text block erasure image may be the image obtained by erasing the text in the text erasure area of the target original text block image; and the target original text block may be the text block located in the text erasure area of the target original text block image.
  • The target text block erasure image is obtained by inputting the target original text block image into the text erasure model.
  • The text erasure model is obtained by generating a simulated text block erasure image set with the generator of the generative adversarial network model, alternately training the generator and the discriminator of the GAN model using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator, and determining the trained generator as the text erasure model.
  • The translation display parameters may include the text arrangement parameter values, the text color, and the text position of the translation obtained by translating the text in the text erasure area of the target original text block image.
  • The text arrangement parameter values of the translation may include the number of translation display lines and/or the translation display height, as well as the translation display direction; the text color of the translation may be determined by the text color in the text erasure area of the target original text block image; and the text position of the translation can be consistent with the text position in the text erasure area of the target original text block image.
  • According to the translation display parameters, the translation is superimposed on the target text block erasure image at the position corresponding to the text erasure area in the target original text block image to obtain the target translation text block image.
  • Through the above embodiments of the present disclosure, the target text block erasure image is obtained, the translation display parameters are determined, the target translation text block corresponding to the target original text block is superimposed on the target text block erasure image according to the translation display parameters to obtain the target translation text block image, and the target translation text block image is displayed. This effectively realizes the translation function for text block images and makes the displayed translation image complete and visually appealing, thereby improving the user's visual experience.
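The superimposing step can be pictured with the following hedged sketch, using OpenCV: the translation lines are drawn onto the erased image at the original text position, in the translation pixel value (color), over the computed number of lines. The Hershey font, the 0.8 height factor, and all function names here are illustrative assumptions, not the patent's implementation.

```python
import cv2

def overlay_translation(erased, lines, origin, line_h, color):
    """erased: BGR image; lines: list of strings; origin: (x, y) top-left of the text area."""
    out = erased.copy()
    x, y = origin
    # Scale the font so the glyph height is a fraction of the per-line height.
    scale = cv2.getFontScaleFromHeight(cv2.FONT_HERSHEY_SIMPLEX, int(line_h * 0.8))
    for k, line in enumerate(lines):
        baseline_y = int(y + (k + 1) * line_h) - 2   # bottom of the k-th display line
        cv2.putText(out, line, (int(x), baseline_y), cv2.FONT_HERSHEY_SIMPLEX,
                    scale, tuple(int(c) for c in color), 1, cv2.LINE_AA)
    return out
```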
  • According to an embodiment of the present disclosure, when the text box corresponding to the target original text block is not a square text box, the text box is transformed into a square text box using an affine transformation.
  • Before the text erasure model is used to process the target original text block image, it may be detected, based on a paragraph detection model, that the text boxes in the text erasure area of the target original text block image are quadrilateral text boxes of different shapes; an affine transformation is then used to transform these quadrilateral text boxes into square text boxes.
  • The quadrilateral text box may be the text box corresponding to the text erasure area of the target original text block image, and the square text box may be rectangular in shape.
  • After the translation is pasted back, the affine transformation is used again to inversely transform the square text box into a quadrilateral text box with the same shape and size as the text box corresponding to the text erasure area of the target original text block image.
  • The affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that preserves the "straightness" and "parallelism" of two-dimensional graphics.
  • Straightness means that straight lines remain straight after the transformation, without bending into arcs; parallelism means that the relative positional relationships between two-dimensional graphics remain unchanged: parallel lines remain parallel, and the angles between intersecting lines remain unchanged.
  • The affine transformation may be realized through translation, scaling, flipping, rotation, shearing, and so on.
  • For example, the text box corresponding to the text erasure area of the target original text block image is an irregular quadrilateral box, and this irregular quadrilateral box corresponds to a slanted text erasure area and its text content. The position of each corner of the irregular quadrilateral box is represented by different two-dimensional coordinates, and the affine transformation corrects the text box corresponding to the text erasure area of the target original text block image into the two-dimensional coordinates of a rectangular box.
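A minimal sketch of this rectification, assuming OpenCV: three corners of the slanted box define the affine mapping onto an upright rectangle, and the inverse matrix is kept for pasting the result back. The corner ordering and target size are illustrative assumptions.

```python
import cv2
import numpy as np

def rectify_text_box(image, quad, width, height):
    """quad: 4x2 array of box corners ordered (tl, tr, br, bl)."""
    src = quad[:3].astype(np.float32)                        # affine needs 3 point pairs
    dst = np.float32([[0, 0], [width, 0], [width, height]])  # tl, tr, br of the rectangle
    M = cv2.getAffineTransform(src, dst)
    rect = cv2.warpAffine(image, M, (width, height))         # upright text box image
    M_inv = cv2.invertAffineTransform(M)                     # for the inverse transform
    return rect, M_inv
```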
  • the target original text block image may include a plurality of target sub-original text block images.
  • The target original text block image may be obtained by splicing a plurality of target sub-original text block images, and the spliced target original text block image is input into the text erasure model for erasure.
  • For example, multiple target sub-original text block images can be normalized to a fixed height, and then combined and stitched into one or more regularly arranged large images that serve as the target original text block image.
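The following is a minimal sketch of this normalization and stitching, assuming OpenCV; the fixed height of 48 pixels is an illustrative assumption.

```python
import cv2
import numpy as np

def stitch_sub_blocks(blocks, target_h=48):
    """blocks: list of HxWx3 sub-block images; returns one horizontally stitched image."""
    resized = []
    for b in blocks:
        h, w = b.shape[:2]
        new_w = max(1, round(w * target_h / h))      # keep the aspect ratio
        resized.append(cv2.resize(b, (new_w, target_h)))
    return np.concatenate(resized, axis=1)           # one regularly arranged large image
```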
  • According to an embodiment of the present disclosure, the translation display parameters may include the translation pixel value.
  • Determining the translation display parameters may include the following operations.
  • The text area of the target original text block image is determined, and the pixel mean value of the text area of the target original text block image is determined as the translation pixel value.
  • Determining the text area of the target original text block image may include the following operations.
  • The target original text block image is processed by image binarization to obtain a first image area and a second image area. The first pixel mean value of the target original text block image corresponding to the first image area is determined. The second pixel mean value of the target original text block image corresponding to the second image area is determined. The third pixel mean value corresponding to the target text block erasure image is determined. The text area of the target original text block image is then determined according to the first pixel mean value, the second pixel mean value, and the third pixel mean value.
  • Image binarization can consist of setting a threshold T and using it to divide the image data into two parts: a pixel group with pixel values greater than T and a pixel group with pixel values smaller than T, so that the entire image presents an obvious black-and-white visual effect.
  • The first image area may be the text erasure area of the target original text block image, or the area other than the text erasure area; likewise, the second image area may be the text erasure area of the target original text block image, or the area other than the text erasure area.
  • The first pixel mean value of the target original text block image corresponding to the first image area can be denoted A1, the second pixel mean value corresponding to the second image area can be denoted A2, and the third pixel mean value corresponding to the target text block erasure image can be denoted A3.
  • The third pixel mean value corresponding to the target text block erasure image may be determined according to the pixel values of the areas of the target text block erasure image other than the text erasure area.
  • Determining the text area of the target original text block image according to the first pixel mean value, the second pixel mean value, and the third pixel mean value may include the following operations.
  • When the absolute value of the difference between the first pixel mean value and the third pixel mean value is smaller than the absolute value of the difference between the second pixel mean value and the third pixel mean value, the first image area corresponding to the first pixel mean value is determined as the text area of the target original text block image.
  • When the absolute value of the difference between the first pixel mean value and the third pixel mean value is greater than or equal to the absolute value of the difference between the second pixel mean value and the third pixel mean value, the second image area corresponding to the second pixel mean value is determined as the text area of the target original text block image.
  • That is, |A1 - A3| and |A2 - A3| are compared to determine the text area of the target original text block image.
  • When |A1 - A3| < |A2 - A3|, the first image area corresponding to A1 is determined as the text area of the target original text block image, and the second image area corresponding to A2 is determined as the area other than the text area.
  • When |A1 - A3| ≥ |A2 - A3|, the second image area corresponding to A2 is determined as the text area of the target original text block image, and the first image area corresponding to A1 is determined as the area other than the text area.
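A hedged sketch of this selection logic, assuming OpenCV and Otsu thresholding (the patent only says "a threshold T"): the block is binarized, the mean colors A1 and A2 of the two regions and the mean A3 of the erased image are computed, and the comparison stated above picks the text area whose mean becomes the translation pixel value.

```python
import cv2
import numpy as np

def translation_pixel_value(block, erased):
    """block: original BGR text block image; erased: its erased counterpart."""
    gray = cv2.cvtColor(block, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    area1, area2 = mask == 255, mask == 0
    a1 = block[area1].mean(axis=0)    # first pixel mean (A1)
    a2 = block[area2].mean(axis=0)    # second pixel mean (A2)
    a3 = erased.mean(axis=(0, 1))     # third pixel mean (A3), from the erased image
    # Per the comparison above: if |A1 - A3| < |A2 - A3|, area 1 is the text area.
    if np.abs(a1 - a3).sum() < np.abs(a2 - a3).sum():
        return a1
    return a2
```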
  • According to an embodiment of the present disclosure, the translation display parameters may include the translation arrangement parameter values, and the translation arrangement parameter values may include the number of translation display lines and/or the translation display height.
  • Determining the translation display parameters may include the following operation: according to the height and width of the text area corresponding to the target text block erasure image, and the height and width corresponding to the target translation text block, determine the number of translation display lines and/or the translation display height.
  • The translation display height may be determined by the height of the text area corresponding to the target text block erasure image.
  • The text width of the translation may be the total width when the translation is arranged in a single line; it can be obtained from the ratio of the translation's font width to its font height.
  • Fig. 6 schematically shows a flow chart of determining the number of translation display lines and/or the translation display height according to an embodiment of the present disclosure.
  • In operation S610, the width sum corresponding to the target translation text block is determined.
  • In operation S620, the number of translation display lines corresponding to the target translation text block is set to i lines, where the height of each of the i lines is 1/i of the height of the text area corresponding to the target text block erasure image, and i is an integer greater than or equal to 1.
  • In operation S630, when the width sum is determined to be greater than the preset width threshold corresponding to i lines, the number of translation display lines is increased.
  • In operation S640, the operation of determining whether the width sum is less than or equal to the preset width threshold corresponding to i lines is repeatedly performed until the width sum is determined to be less than or equal to the preset width threshold corresponding to i lines.
  • The translation width when the translation is arranged in a single line, that is, the width sum W1 corresponding to the target translation text block, can thus be obtained.
  • The number of translation display lines is set to i lines, and the preset width threshold W corresponding to i lines is determined as i times the width of the text area corresponding to the target text block erasure image.
  • The number of translation display lines and/or the translation display height is determined by comparing the width sum W1 corresponding to the target translation text block with the preset width threshold W corresponding to i lines.
  • For example, the text in the text area of the target original text block image is "It's cloudy and rainy", and after translation the target translation is "cloudy and rainy". The text width corresponding to the target translation text block is the sum of the character widths when the target translation text block "cloudy and rainy" is arranged in a single line, which can be denoted W1.
  • For example, the width of the text area corresponding to the target text block erasure image is W2. If W1 is greater than the preset width threshold corresponding to one line but less than or equal to the preset width threshold corresponding to two lines (that is, W2 < W1 ≤ 2 × W2), the translation is displayed in 2 lines.
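A minimal sketch of this line-fitting loop, following the stated rule that the preset width threshold for i lines is i times the text-area width; the helper name and the strictly positive text_area_w are assumptions.

```python
def fit_translation_lines(w1: float, text_area_w: float, text_area_h: float):
    """w1: translation width when laid out on a single line; returns (lines, line height)."""
    i = 1
    while w1 > i * text_area_w:   # preset width threshold corresponding to i lines
        i += 1
    return i, text_area_h / i     # number of display lines and per-line display height

# Example: a one-line width of 2.5 text-area widths fits on 3 lines.
print(fit_translation_lines(250.0, 100.0, 30.0))  # -> (3, 10.0)
```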
  • the translation arrangement parameter value may include a translation presentation direction.
  • the translation display direction may be determined according to the text direction of the target original text block.
  • For example, the text boxes in the text area of the target original text block are quadrilateral text boxes of different shapes. These quadrilateral text boxes are transformed into rectangular text boxes using an affine transformation, which facilitates text erasure and translation fitting; after the translation is pasted, the text box is transformed back, using the affine transformation again, into a quadrilateral text box with the same shape as the original quadrilateral text box of the target text block, which establishes the translation display direction.
  • Fig. 7 schematically shows a schematic diagram of a translation presentation process according to an embodiment of the present disclosure.
  • As shown in Fig. 7, the target original text block image 701 is input into the text erasure model 702 for text erasure processing to obtain the target text block erasure image 703, and the translation display parameters 704 are determined.
  • According to the translation display parameters 704, the target translation text block corresponding to the text area of the target original text block image 701 is superimposed on the target text block erasure image 703 to obtain a target translation text block image 706, and the target translation text block image 706 is displayed.
  • Fig. 8A schematically shows a schematic diagram of a text erasure process 800 according to an embodiment of the present disclosure.
  • Fig. 8B schematically shows a schematic diagram of a translation fitting process 800' according to an embodiment of the present disclosure.
  • In the text erasure process 800, the original text block images 803, 804, 805, 806 in the original text block image set 802 detected from the original image 801 are input into the text erasure model 807; the text areas of the original text block images 803, 804, 805, 806 are erased, and the text block erasure images 809, 810, 811, 812 in the text block erasure image set 808 are output.
  • In the translation fitting process 800', each original text block image in the original text block image set 802 is translated; for example, the text area of the original text block image 805 is translated to obtain the translation text block 813 corresponding to the text area of the original text block image 805.
  • The translation display parameters 814 of the translation text block 813 are determined; the translation display parameters 814 include the translation text position, the translation text arrangement parameter values, and the translation pixel value.
  • According to the translation display parameters 814, the translation text block 813 is superimposed on the text block erasure image 811 in the text block erasure image set 808 to obtain the translation text block image 815.
  • Each original text block image in the original text block image set 802 is erased and pasted in this way, and finally a translation image 816 with the translations displayed is obtained.
  • Fig. 9 schematically shows a block diagram of a training device for a text erasure model according to an embodiment of the present disclosure.
  • As shown in Fig. 9, the text erasure model training apparatus 900 may include a first obtaining module 910, a second obtaining module 920, and a first determining module 930.
  • The first obtaining module 910 is configured to process the original text block image set using the generator of the generative adversarial network model to obtain the simulated text block erasure image set, where the GAN model includes a generator and a discriminator.
  • The second obtaining module 920 is configured to alternately train the generator and the discriminator using the real text block erasure image set and the simulated text block erasure image set to obtain the trained generator and discriminator.
  • The first determining module 930 is configured to determine the trained generator as the text erasure model.
  • The pixel values of the text erasure area in each real text block erasure image included in the real text block erasure image set are determined according to the pixel values of the areas of that image other than the text erasure area.
  • According to an embodiment of the present disclosure, the original text block image set includes a first original text block image set and a second original text block image set, and the simulated text block erasure image set includes a first simulated text block erasure image set and a second simulated text block erasure image set.
  • The first obtaining module 910 may include a first generating submodule and a second generating submodule.
  • The first generating submodule is configured to use the generator to process the first original text block image set to generate the first simulated text block erasure image set.
  • The second generating submodule is configured to use the generator to process the second original text block image set to generate the second simulated text block erasure image set.
  • According to an embodiment of the present disclosure, the real text block erasure image set includes a first real text block erasure image set and a second real text block erasure image set.
  • the second obtaining module 920 may include: a first training submodule, a second training submodule, an execution submodule, and an obtaining submodule.
  • the first training sub-module is used to train the discriminator by using the first set of real text block erasing images and the first set of simulated text block erasing images.
  • the second training sub-module is used to train the generator by using the second simulated text block erasing image set.
  • The execution submodule is configured to alternately execute the operation of training the discriminator and the operation of training the generator until the convergence condition of the generative adversarial network model is met.
  • The obtaining submodule is configured to determine the generator and the discriminator obtained when the convergence condition of the GAN model is met as the trained generator and discriminator.
  • the first real block-erased image set includes a plurality of first real block-erased images
  • the first simulated block-erased image set includes a plurality of first simulated block-erased images
  • the first training sub-module may include: a first obtaining unit, a second obtaining unit, and a training unit.
  • the first obtaining unit is configured to input each first real character block erased image in the first real character block erased image set to the discriminator to obtain a first discrimination result corresponding to the first real character block erased image.
  • The second obtaining unit is configured to input each first simulated text block erasure image in the first simulated text block erasure image set into the discriminator to obtain a second discrimination result corresponding to that first simulated text block erasure image.
  • the training unit is used to train the discriminator based on the first discrimination result and the second discrimination result.
  • the first training submodule may further include: a third obtaining unit and a first adjusting unit.
  • the third obtaining unit is configured to obtain the first output value by using the first discrimination result and the second discrimination result based on the first loss function while keeping the model parameters of the generator unchanged.
  • the first adjustment unit is configured to adjust the model parameters of the discriminator according to the first output value to obtain adjusted model parameters of the discriminator.
  • the second training submodule may include: a fourth obtaining unit and a second adjusting unit.
  • The fourth obtaining unit is configured to, while keeping the adjusted model parameters of the discriminator unchanged, use the second simulated text block erasure image set based on the second loss function to obtain the second output value.
  • The second adjusting unit is configured to adjust the model parameters of the generator according to the second output value.
  • According to an embodiment of the present disclosure, the first loss function includes a discriminator loss function and a minimum mean square error loss function, and the second loss function includes a generator loss function and a minimum mean square error loss function.
  • The discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions that include a regularization term.
  • Fig. 10 schematically shows a block diagram of an apparatus for displaying translations according to an embodiment of the present disclosure.
  • the translation presentation device 1000 may include: a third obtaining module 1010 , a second determining module 1020 , a fourth obtaining module 1030 , and a displaying module 1040 .
  • the third obtaining module 1010 is used to process the image of the target original text block by using the text erasure model to obtain the erased image of the target text block.
  • the image of the target original text block includes the target original text block.
  • the second determination module 1020 is used to determine the display parameters of the translation.
  • The fourth obtaining module 1030 is configured to superimpose, according to the translation display parameters, the target translation text block corresponding to the target original text block onto the target text block erasure image to obtain the target translation text block image.
  • the display module 1040 is used to display target translation text block images.
  • the text erasing model is trained by using the above text erasing model training method.
  • the translation display apparatus 1000 may further include: a conversion module.
  • the transformation module is used to transform the text box into a square text box by affine transformation when it is determined that the text box corresponding to the target original text block is not a square text box.
  • the target original text block image includes a plurality of target sub-original text block images.
  • the translation display device 1000 may further include: a splicing module.
  • the splicing module is used for splicing multiple target sub-original text block images to obtain the target original text block image.
  • According to an embodiment of the present disclosure, the translation display parameters include the translation pixel value.
  • the second determination module 1020 may include: a first determination submodule, a second determination submodule, and a third determination submodule.
  • the first determination sub-module is used to determine the text area of the target original text block image.
  • the second determination sub-module is used to determine the pixel mean value of the text area of the target original text block image.
  • the third determination sub-module is used to determine the pixel mean value of the text area of the target original text block image as the translation pixel value.
  • the first determining submodule may include: a fifth obtaining unit, a first determining unit, a second determining unit, a third determining unit, and a fourth determining unit.
  • the fifth obtaining unit is configured to process the target original text block image by image binarization to obtain the first image area and the second image area.
  • the first determination unit is configured to determine a first pixel mean value of the target original text block image corresponding to the first image area.
  • the second determination unit is configured to determine a second pixel mean value of the target original text block image corresponding to the second image area.
  • the third determination unit is configured to determine a third pixel mean value corresponding to the erased image of the target character block.
  • the fourth determination unit is configured to determine the text area of the target original text block image according to the first pixel average value, the second pixel average value and the third pixel average value.
  • the fourth determination unit may include: a first determination subunit and a second determination subunit.
  • the first determining subunit is used to determine, when the absolute value of the difference between the first pixel mean value and the third pixel mean value is smaller than the absolute value of the difference between the second pixel mean value and the third pixel mean value, the first image area corresponding to the first pixel mean value as the text area of the target original text block image.
  • the second determining subunit is used to determine, when the absolute value of the difference between the first pixel mean value and the third pixel mean value is greater than or equal to the absolute value of the difference between the second pixel mean value and the third pixel mean value, the second image area corresponding to the second pixel mean value as the text area of the target original text block image.
  • the translation display parameters include translation arrangement parameter values.
  • the translation arrangement parameter values include the number of translation display lines and/or the translation display height.
  • the second determining module 1020 may also include: a fourth determining submodule.
  • the fourth determination sub-module is used to determine the number of translation display lines and/or translation display height according to the height and width of the text area corresponding to the erased image of the target text block and the height and width corresponding to the target translation text block.
  • the fourth determining submodule includes: a fifth determining unit, a sixth determining unit, a setting unit, a repeating unit, and a seventh determining unit.
  • the fifth determining unit is configured to determine the width sum corresponding to the target translated text block.
  • the sixth determining unit is configured to set the number of translation display lines corresponding to the target translated text block to i lines, where the height of each of the i lines is 1/i of the height of the text area corresponding to the target text block erased image, and i is an integer greater than or equal to 1.
  • the setting unit is configured to set the number of translation display lines to i = i + 1 lines when it is determined that the width sum is greater than the preset width threshold corresponding to the i lines, where the preset width threshold is determined as i times the width of the text area corresponding to the target text block erased image.
  • the repeating unit is configured to repeatedly execute the operation of determining whether the width sum is less than or equal to the preset width threshold corresponding to the i lines, until it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines.
  • the seventh determining unit is configured to, when it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines, determine the i lines as the number of translation display lines and/or determine 1/i of the height of the text area corresponding to the target text block erased image as the translation display height.
  • the translation arrangement parameter value includes the translation display direction, and the translation display direction is determined according to the text direction of the target original text block.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can perform the method as described above.
  • non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method as described above.
  • a computer program product includes a computer program, and the computer program implements the above method when executed by a processor.
  • FIG. 11 schematically shows a block diagram of an electronic device suitable for implementing a text erasure model training method or a translation presentation method according to an embodiment of the present disclosure.
  • the electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • an electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. Various programs and data necessary for the operation of the electronic device 1100 can also be stored in the RAM 1103.
  • the computing unit 1101, ROM 1102, and RAM 1103 are connected to each other through a bus 1104.
  • An input/output (I/O) interface 1105 is also connected to the bus 1104 .
  • multiple components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, such as a keyboard, a mouse, etc.; an output unit 1107, such as various types of displays, speakers, etc.; a storage unit 1108, such as a magnetic disk, an optical disk, etc.; and a communication unit 1109, such as a network card, a modem, a wireless communication transceiver, and the like.
  • the communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
  • the computing unit 1101 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, etc.
  • the computing unit 1101 executes the various methods and processes described above, such as the method for training a text erasure model or the method for displaying translations.
  • the method for training a text erasure model or the method for displaying translations can be implemented as a computer software program, which is tangibly contained in a machine-readable medium, such as the storage unit 1108 .
  • part or all of the computer program can be loaded and/or installed on the electronic device 1100 via the ROM 1102 and/or the communication unit 1109.
  • the computing unit 1101 may be configured in any other appropriate way (for example, by means of firmware) to execute a method for training a text erasure model or a method for displaying translations.
  • various implementations of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • the programmable processor can be a special-purpose or general-purpose programmable processor, can receive data and instructions from a storage system, at least one input device, and at least one output device, and can transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing devices, so that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, partly on the machine and partly on a remote machine as a stand-alone software package, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • more specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • the systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: Local Area Network (LAN), Wide Area Network (WAN), and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, a server of a distributed system, or a server combined with a blockchain.
  • steps may be reordered, added or deleted using the various forms of flow shown above.
  • each step described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved; no limitation is imposed herein.

Abstract

The present disclosure provides a training method for a text erasure model, a translation display method, apparatuses, an electronic device, and a storage medium, relating to the field of artificial intelligence technology, specifically to computer vision and deep learning, and applicable to scenarios such as OCR (optical character recognition). The specific implementation scheme is: processing an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and a discriminator; alternately training the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and determining the trained generator as the text erasure model; wherein the pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.

Description

Training method, translation display method, apparatus, electronic device, and storage medium
This application claims priority to Chinese Patent Application No. 202110945871.0, filed on August 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, specifically to the technical fields of computer vision and deep learning, and can be applied to scenarios such as OCR (optical character recognition). In particular, it relates to a training method, a translation display method, an apparatus, an electronic device, and a storage medium.
Background
With the advance of globalization, academic, commercial, and everyday exchanges between countries have become increasingly frequent, but languages differ from country to country. Users can translate text in one language into text in another language through translation applications to facilitate communication.
Photo translation is a new form of translation product. The input of the current photo translation function is an image containing text in a source language, and the output is a returned image containing text in the target translation language.
Summary
The present disclosure provides a training method, a translation display method, an apparatus, an electronic device, and a storage medium.
According to one aspect of the present disclosure, a training method for a text erasure model is provided, including: processing an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and a discriminator; alternately training the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and determining the trained generator as the text erasure model; wherein the pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.
According to another aspect of the present disclosure, a translation display method is provided, including: processing a target original text block image by using a text erasure model to obtain a target text block erased image, the target original text block image including a target original text block; determining translation display parameters; superimposing, according to the translation display parameters, a translated text block corresponding to the target original text block onto the target text erased image to obtain a target translated text block image; and displaying the target translated text block image; wherein the text erasure model is trained by using the method described above.
According to another aspect of the present disclosure, a training apparatus for a text erasure model is provided, including: a first obtaining module configured to process an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and a discriminator; a second obtaining module configured to alternately train the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and a first determining module configured to determine the trained generator as the text erasure model; wherein the pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.
According to another aspect of the present disclosure, a translation display apparatus is provided, including: a third obtaining module configured to process a target original text block image by using a text erasure model to obtain a target text block erased image, the target original text block image including a target original text block; a second determining module configured to determine translation display parameters; a fourth obtaining module configured to superimpose, according to the translation display parameters, a translated text block corresponding to the target original text block onto the target text erased image to obtain a target translated text block image; and a display module configured to display the target translated text block image; wherein the text erasure model is trained by using the method described above.
According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to perform the method described above.
According to another aspect of the present disclosure, a computer program product is provided, including a computer program which, when executed by a processor, implements the method described above.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present disclosure, in which:
Fig. 1 schematically shows an exemplary system architecture to which the training method for a text erasure model, the translation display method, and the corresponding apparatuses can be applied according to an embodiment of the present disclosure;
Fig. 2 schematically shows a flowchart of a training method for a text erasure model according to an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of training the discriminator by using a first real text block erased image set and a first simulated text block erased image set according to an embodiment of the present disclosure;
Fig. 4 schematically shows a schematic diagram of the training process of the text erasure model according to an embodiment of the present disclosure;
Fig. 5 schematically shows a flowchart of a translation display method according to an embodiment of the present disclosure;
Fig. 6 schematically shows a flowchart of determining the number of translation display lines and/or the translation display height according to an embodiment of the present disclosure;
Fig. 7 schematically shows a schematic diagram of the translation display process according to an embodiment of the present disclosure;
Fig. 8A schematically shows a schematic diagram of the text erasure process according to an embodiment of the present disclosure;
Fig. 8B schematically shows a schematic diagram of the translation pasting process according to an embodiment of the present disclosure;
Fig. 9 schematically shows a block diagram of a training apparatus for a text erasure model according to an embodiment of the present disclosure;
Fig. 10 schematically shows a block diagram of a translation display apparatus according to an embodiment of the present disclosure; and
Fig. 11 schematically shows a block diagram of an electronic device suitable for implementing the training method for a text erasure model or the translation display method according to an embodiment of the present disclosure.
Detailed Description of Embodiments
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, including various details of the embodiments of the present disclosure to facilitate understanding; they should be regarded as merely exemplary. Therefore, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Photo translation technology may include: photographing a scene containing text to acquire an image, recognizing the text content of the text lines in the acquired image, machine-translating the text content to obtain translated text content, and displaying the translated text content to the user. To display the translation result directly over the original text lines of the image, the text in the original text lines of the image must first be erased, and the translation is then pasted back at the position of the original text lines to display the translation result.
In the process of realizing the concept of the present disclosure, it was found that one technical solution is: when erasing text from the original image, the text region of the original image can be directly blur-filtered, or the entire region can be filled with the average color of the text block region, so as to give the user the visual effect of the original text being erased. However, this easily makes the text region clearly distinguishable from the other background parts of the image, resulting in a poor erasure effect and impairing the user's visual experience.
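For concreteness, this baseline can be sketched as follows (a minimal illustration assuming OpenCV-style NumPy images and a hypothetical axis-aligned text region (x, y, w, h); it is the approach criticized above, not the method of the present disclosure):

```python
import numpy as np

def naive_erase(image: np.ndarray, box: tuple) -> np.ndarray:
    """Visually erase text by filling its region with the region's mean color.

    This is the simple baseline: on textured backgrounds the filled patch
    tends to stand out clearly from the surrounding image.
    """
    x, y, w, h = box
    erased = image.copy()
    region = erased[y:y + h, x:x + w]
    mean_color = region.reshape(-1, 3).mean(axis=0)   # per-channel mean (color image assumed)
    erased[y:y + h, x:x + w] = mean_color.astype(np.uint8)
    return erased
```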
To this end, embodiments of the present disclosure provide a training method for a text erasure model, a translation display method, apparatuses, an electronic device, a non-transitory computer-readable storage medium storing computer instructions, and a computer program product. The training method for the text erasure model includes: processing a training set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes a generator and a discriminator; alternately training the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and determining the trained generator as the text erasure model. The pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.
Fig. 1 schematically shows an exemplary system architecture to which the training method for a text erasure model, the translation display method, and the corresponding apparatuses can be applied according to an embodiment of the present disclosure.
It should be noted that Fig. 1 is merely an example of a system architecture to which embodiments of the present disclosure can be applied, to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, the exemplary system architecture to which the content processing method and apparatus can be applied may include a terminal device, but the terminal device may implement the content processing method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in Fig. 1, the system architecture 100 according to this embodiment may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages, etc. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as knowledge reading applications, web browser applications, search applications, instant messaging tools, email clients, and/or social platform software (merely examples).
The terminal devices 101, 102, 103 may be various electronic devices with display screens that support web browsing, including but not limited to smartphones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a backend management server (merely an example) that provides support for content browsed by users with the terminal devices 101, 102, 103. The backend management server may analyze and otherwise process received data such as user requests, and feed the processing results (such as web pages, information, or data obtained or generated according to the user requests) back to the terminal devices.
It should be noted that the training method for the text erasure model and the translation display method provided by the embodiments of the present disclosure may generally be executed by the terminal device 101, 102, or 103. Correspondingly, the training apparatus for the text erasure model and the translation display apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Alternatively, the training method for the text erasure model and the translation display method provided by the embodiments of the present disclosure may generally also be executed by the server 105. Correspondingly, the training apparatus for the text erasure model and the translation display apparatus provided by the embodiments of the present disclosure may generally be provided in the server 105. The training method for the text erasure model and the translation display method provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Correspondingly, the apparatuses provided by the embodiments of the present disclosure may also be provided in a server or server cluster that is different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the server 105 processes a training set by using the generator of the generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and the discriminator; alternately trains the generator and the discriminator by using the real text block erased image set and the simulated text block erased image set to obtain the trained generator and discriminator; and determines the trained generator as the text erasure model. Alternatively, a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105 alternately trains the generator and the discriminator by using the real text block erased image set and the simulated text block erased image set, and obtains the text erasure model, i.e., the trained generator.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers as required by the implementation.
Fig. 2 schematically shows a flowchart of a training method for a text erasure model according to an embodiment of the present disclosure.
As shown in Fig. 2, the method 200 includes operations S210 to S230.
In operation S210, an original text block image set is processed by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and a discriminator.
In operation S220, the generator and the discriminator are alternately trained by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator.
In operation S230, the trained generator is determined as the text erasure model.
According to an embodiment of the present disclosure, the pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.
According to an embodiment of the present disclosure, a text block image may include a text erased region and other background regions besides the text erased region. Text block erasure may be erasing the text in the text erased region of the input text block image while preserving the texture and color of the original background.
According to an embodiment of the present disclosure, the generative adversarial network model may include a deep convolutional generative adversarial network model, an earth-mover's-distance-based (Wasserstein) generative adversarial network model, a conditional generative adversarial network model, or the like. The generative adversarial network model may include a generator and a discriminator. The generator and the discriminator may include neural network models. The generator may be used to generate the simulated text block erased image set; by continuously training the generator, it learns the real text block erased image set, so that it can generate from scratch samples that conform to the data distribution of the real text block erased image set and confuse the discriminator as much as possible. The discriminator may be used to discriminate between the real text block erased image set and the simulated text block erased image set.
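As a rough illustration of the generator and discriminator meant here (a minimal sketch assuming PyTorch; the disclosure does not specify the actual network architectures, so the layer choices below are assumptions):

```python
import torch
from torch import nn

class Generator(nn.Module):
    """Maps an original text block image to a simulated erased image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores an erased image: real (ground truth) versus simulated."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, x):
        return self.net(x)   # raw logit; higher means "more real"
```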
According to an embodiment of the present disclosure, the earth-mover's-distance-based generative adversarial network model can solve the problems of unsynchronized training of the generator and discriminator, non-convergent training, and mode collapse, improving the model quality of the data generation model.
According to an embodiment of the present disclosure, the training process of the earth-mover's-distance-based generative adversarial network model is as follows: the learning rate, the batch size (i.e., the number of real text block erased images included in the real text block erased image set), the model parameter range of the neural network models, the maximum number of iterations, and the number of training passes per iteration are set in advance.
According to an embodiment of the present disclosure, the generator and the discriminator are iteratively and alternately trained by using the real text block erased image set and the simulated text block erased image set, so that the generator and the discriminator each optimize through the game between them, until the discriminator cannot accurately distinguish the real text block erased image set from the simulated text block erased image set, i.e., a Nash equilibrium is reached. In this case, the generator can be considered to have learned the data distribution of the real text block erased image set, and the trained generator is determined as the text erasure model.
According to an embodiment of the present disclosure, iteratively and alternately training the generator and the discriminator by using the real text block erased image set and the simulated text block erased image set may include: in each iteration, with the model parameters of the generator kept unchanged, training the discriminator by using the real text block erased image set and the simulated text block erased image set, to complete the number of training passes set for the discriminator in this iteration; and after completing the number of training passes set for the discriminator in this iteration, with the model parameters of the discriminator kept unchanged, training the generator by using the simulated text block erased image set, to complete the number of training passes set for the generator in this iteration. It should be noted that during each training pass, the generator may be used to generate the simulated text block erased image set corresponding to that pass. The above training manner of the generator and the discriminator is merely an exemplary embodiment, but is not limited thereto; it may also include training manners known in the art, as long as the training of the generator and the discriminator can be achieved.
According to an embodiment of the present disclosure, an appropriate training strategy may be selected according to actual needs, which is not limited here. For example, the training strategy may include one of the following: in each iteration, the generator is trained once and the discriminator is trained once; the generator is trained once and the discriminator is trained multiple times; the generator is trained multiple times and the discriminator is trained once; or the generator is trained multiple times and the discriminator is trained multiple times.
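The alternating schedule described above can be sketched as follows (assuming the PyTorch modules from the sketch above, a loader of paired original and ground-truth erased images, and the hypothetical loss helpers d_loss/g_loss sketched later in the loss-function discussion; n_d and n_g stand for the per-iteration training passes, so the four strategies listed above correspond to the four (n_d, n_g) combinations):

```python
import torch

def train_gan(G, D, loader, epochs=10, n_d=1, n_g=1, device="cpu"):
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    for _ in range(epochs):
        for original, real_erased in loader:
            original, real_erased = original.to(device), real_erased.to(device)
            for _ in range(n_d):                 # discriminator passes, G frozen
                fake = G(original).detach()      # detach: no gradients flow into G
                loss_d = d_loss(D(real_erased), D(fake), D)
                opt_d.zero_grad(); loss_d.backward(); opt_d.step()
            for _ in range(n_g):                 # generator passes, D frozen
                fake = G(original)               # fresh simulated batch per pass
                loss_g = g_loss(D(fake), fake, real_erased, G)
                opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return G, D
```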
According to an embodiment of the present disclosure, the original text block image set is processed by the generator of the generative adversarial network model to obtain the simulated text block erased image set; the generator and the discriminator are alternately trained by using the real text block erased image set and the simulated text block erased image set to obtain the trained generator and discriminator; and the trained generator is determined as the text erasure model. Since the pixel values of the text erased region in the real text block erased images are determined according to the pixel values of the other regions, the text erasure model can keep the color of the text erased region as consistent as possible with the other regions (i.e., the background region), thereby improving the erasure effect and in turn the user's visual experience.
According to an embodiment of the present disclosure, the original text block image set includes a first original text block image set and a second original text block image set, and the simulated text block erased image set includes a first simulated text block erased image set and a second simulated text block erased image set. Processing the original text block image set by using the generator of the generative adversarial network model to obtain the simulated text block erased image set may include the following operations: processing the first original text block image set by using the generator to generate the first simulated text block erased image set; and processing the second original text block image set by using the generator to generate the second simulated text block erased image set.
According to an embodiment of the present disclosure, generating the simulated text block erased image set by using the generator may include: inputting the first original text block image set and first random noise data into the generator to obtain the first simulated text block erased image set; and inputting the second original text block image set and second random noise data into the generator to obtain the second simulated text block erased image set. The first random noise data and the second random noise data may take the form of Gaussian noise.
According to an embodiment of the present disclosure, the real text block erased image set includes a first real text block erased image set and a second real text block erased image set. Alternately training the generator and the discriminator by using the real text block erased image set and the simulated text block erased image set to obtain the trained generator and discriminator may include the following operations.
The discriminator is trained by using the first real text block erased image set and the first simulated text block erased image set. The generator is trained by using the second simulated text block erased image set. The operation of training the discriminator and the operation of training the generator are executed alternately until the convergence condition of the generative adversarial network model is satisfied. The generator and discriminator obtained when the convergence condition of the generative adversarial network model is satisfied are determined as the trained generator and discriminator.
According to an embodiment of the present disclosure, the convergence condition of the generative adversarial network model may include the generator converging, both the generator and the discriminator converging, or the iteration reaching a termination condition; the iteration reaching the termination condition may include the number of iterations being equal to a preset number of iterations.
According to an embodiment of the present disclosure, alternately executing the operation of training the discriminator and the operation of training the generator can be understood as: during the t-th iteration, with the model parameters of the generator kept unchanged, the discriminator is trained by using the real text block erased image set and the first simulated text block erased image set, and the above process is repeated to complete the number of training passes set for the discriminator in this iteration, where t is an integer greater than or equal to 2. During each training pass, the generator may be used to generate the first simulated text block image set corresponding to that pass.
According to an embodiment of the present disclosure, after the number of training passes set for the discriminator in this iteration is completed, with the model parameters of the discriminator kept unchanged, the generator is trained by using the second simulated text block erased image set, and the above process is repeated to complete the number of training passes set for the generator in this iteration. During each training pass, the generator may be used to generate the second simulated text block image set corresponding to that pass. Here 2 ≤ t ≤ T, where T represents the preset number of iterations, and t and T are integers.
According to an embodiment of the present disclosure, for the t-th iteration, the model parameters of the generator in "with the model parameters of the generator kept unchanged" refer to the model parameters of the generator obtained after the last training pass for the generator in the (t-1)-th iteration was completed. The model parameters of the discriminator in "with the model parameters of the discriminator kept unchanged" refer to the model parameters of the discriminator obtained after the last training pass for the discriminator in the t-th iteration was completed.
The training method for the text erasure model according to embodiments of the present disclosure is further described below with reference to Figs. 3 to 4 in conjunction with specific embodiments.
Fig. 3 schematically shows a flowchart of training the discriminator by using the first real text block erased image set and the first simulated text block erased image set according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the first real text block erased image set includes a plurality of first real text block erased images, and the first simulated text block erased image set includes a plurality of first simulated text block erased images.
As shown in Fig. 3, the method 300 includes operations S310 to S330.
In operation S310, each first real text block erased image in the first real text block erased image set is input into the discriminator to obtain a first discrimination result corresponding to the first real text block erased image.
In operation S320, each first simulated text block erased image in the first simulated text block erased image set is input into the discriminator to obtain a second discrimination result corresponding to the first simulated text block erased image.
In operation S330, the discriminator is trained based on the first discrimination result and the second discrimination result.
According to an embodiment of the present disclosure, the discriminator is in fact a classifier. After the first real text block erased image and the first simulated text block erased image are respectively input into the discriminator, the discriminator is trained according to the first discrimination result corresponding to the first real text block erased image and the second discrimination result corresponding to the first simulated text block erased image, so that the discriminator cannot accurately determine whether its input is a first real text block erased image or a first simulated text block erased image, i.e., so that the first discrimination result corresponding to the first real text block erased image and the second discrimination result corresponding to the first simulated text block erased image are as similar as possible.
According to an embodiment of the present disclosure, training the discriminator based on the first discrimination result and the second discrimination result may include the following operations:
With the model parameters of the generator kept unchanged, a first output value is obtained based on a first loss function by using the first discrimination result and the second discrimination result. The model parameters of the discriminator are adjusted according to the first output value to obtain adjusted model parameters of the discriminator.
According to an embodiment of the present disclosure, training the generator by using the second simulated text block erased image set may include the following operations:
With the adjusted model parameters of the discriminator kept unchanged, a second output value is obtained based on a second loss function by using the second simulated text block erased image set; the model parameters of the generator are adjusted according to the second output value.
According to an embodiment of the present disclosure, during the t-th iteration, with the model parameters of the generator kept unchanged, the first discrimination result corresponding to the first real text block erased image and the second discrimination result corresponding to the first simulated text block erased image are input into the first loss function to obtain the first output value. The model parameters of the discriminator are adjusted according to the first output value, and the above process is repeated to complete the number of training passes set for the discriminator in this iteration.
According to an embodiment of the present disclosure, after the number of training passes set for the discriminator in this iteration is completed, with the adjusted model parameters of the discriminator kept unchanged, each second simulated text block erased image included in the second simulated text block erased image set is input into the second loss function to obtain the second output value. The model parameters of the generator are adjusted according to the second output value. The above process is repeated to complete the number of training passes set for the generator in this iteration.
According to an embodiment of the present disclosure, the first loss function includes a discriminator loss function and a minimum mean square error loss function, and the second loss function includes a generator loss function and the minimum mean square error loss function; the discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions that include a regularization term.
According to an embodiment of the present disclosure, because the discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions that include a regularization term, the combination of the above loss functions facilitates denoising during training, making the text erasure results more realistic and reliable.
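One plausible reading of these combined losses, written as code (a sketch under stated assumptions: binary cross-entropy adversarial terms, a pixel-wise MSE against the paired ground-truth erased image, and an L2 weight penalty standing in for the unspecified regularization term; since the MSE term carries no gradient with respect to the discriminator when the generated image is detached, it is shown only in the generator's combined loss):

```python
import torch
import torch.nn.functional as F

def l2_reg(model, lam=1e-4):
    """Regularization term shared by the loss functions (an assumption)."""
    return lam * sum(p.pow(2).sum() for p in model.parameters())

def d_loss(d_real, d_fake, D):
    """First loss function: discriminator loss plus regularization."""
    real_term = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    fake_term = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    return real_term + fake_term + l2_reg(D)

def g_loss(d_fake, fake, real_erased, G):
    """Second loss function: generator loss + minimum MSE loss + regularization."""
    adv = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    mse = F.mse_loss(fake, real_erased)   # keeps erased pixels close to ground truth
    return adv + mse + l2_reg(G)
```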
Fig. 4 schematically shows a schematic diagram of the training process of the text erasure model according to an embodiment of the present disclosure.
As shown in Fig. 4, the training process 400 of the text erasure model may include: in each iteration, with the model parameters of the generator 402 kept unchanged, inputting the first original text block image set 401 into the generator 402 to obtain the first simulated text block erased image set 403.
Each first real text block erased image in the first real text block erased image set 404 is input into the discriminator 405 to obtain the first discrimination result 406 corresponding to the first real text block erased image. Each first simulated text block erased image in the first simulated text block erased image set 403 is input into the discriminator 405 to obtain the second discrimination result 407 corresponding to the first simulated text block erased image.
The first discrimination result 406 corresponding to the first real text block erased image and the second discrimination result 407 corresponding to the first simulated text block erased image are input into the first loss function 408 to obtain the first output value 409. The model parameters of the discriminator 405 are adjusted according to the first output value 409. The above process is repeated until the number of training passes for the discriminator 405 in this iteration is completed.
After the number of training passes for the discriminator 405 in this iteration is completed, with the model parameters of the discriminator 405 kept unchanged, the second original text block image set 410 is input into the generator 402 to obtain the second simulated text block erased image set 411. Each second simulated text block erased image in the second simulated text block erased image set 411 is input into the second loss function 412 to obtain the second output value 413. The model parameters of the generator 402 are adjusted according to the second output value 413. The above process is repeated until the number of training passes for the generator 402 in this iteration is completed.
The above training processes for the discriminator 405 and the generator 402 are executed alternately until the convergence condition of the generative adversarial network model is satisfied, and the training is completed.
Fig. 5 schematically shows a flowchart of a translation display method according to an embodiment of the present disclosure.
As shown in Fig. 5, the method 500 includes operations S510 to S540.
In operation S510, a target original text block image is processed by using a text erasure model to obtain a target text block erased image, the target original text block image including a target original text block.
In operation S520, translation display parameters are determined.
In operation S530, according to the translation display parameters, a translated text block corresponding to the target original text block is superimposed onto the target text erased image to obtain a target translated text block image.
In operation S540, the target translated text block image is displayed.
The text erasure model is trained by using the method of operations S210 to S230 described above.
According to an embodiment of the present disclosure, the target original text block image may include a text erased region and other background regions besides the text erased region; the target text block erased image may include an image in which the text in the text erased region of the target original text block image has been erased; and the target original text block may include the text erased region in the target original text block image.
According to an embodiment of the present disclosure, the target text block erased image is obtained by inputting the target original text block image into the text erasure model. The text erasure model is obtained by generating a simulated text block image set with the generator of the generative adversarial network model, alternately training the generator and the discriminator of the generative adversarial network model by using the real text block erased image set and the simulated text block image set to obtain the trained generator and discriminator, and determining the trained generator as the text erasure model.
According to an embodiment of the present disclosure, the translation display parameters may include: the text arrangement parameter values, the text color, the text position, etc. of the translation obtained by translating the text in the text erased region of the target original text block image.
According to an embodiment of the present disclosure, the text arrangement parameter values of the translation may include the number of translation display lines and/or the translation display height, and the translation display direction; the text color of the translation may be determined from the text color of the text erased region of the target original text block image; and the text position of the translation may coincide with the text position of the text erased region of the target original text block image.
According to an embodiment of the present disclosure, the translation is superimposed onto the target text erased image at the position corresponding to the text erased region in the target original text block image to obtain the target translated text block image.
According to an embodiment of the present disclosure, by processing the target original text block image with the text erasure model to obtain the target text block erased image, determining the translation display parameters, superimposing the translated text block corresponding to the target original text block onto the target text erased image according to the translation display parameters to obtain the target translated text block image, and displaying the target translated text block image, the translation function for text in text block images is effectively realized, and the displayed translation image is complete and visually pleasing, thereby improving the user's visual experience.
According to an embodiment of the present disclosure, when it is determined that the text box corresponding to the target original text block is not a square text box, the text box is transformed into a square text box by affine transformation.
According to an embodiment of the present disclosure, before the target original text block image is processed by the text erasure model, it is detected based on a paragraph detection model that the text boxes of the text erased regions of the target original text block image are quadrilateral text boxes of varying shapes, and these quadrilateral text boxes of varying shapes are transformed into square text boxes by affine transformation. The quadrilateral text box may be the text box corresponding to the text erased region of the target original text block image, and the square text box may be of rectangular shape.
According to an embodiment of the present disclosure, after the translation of the text in the transformed square text box is pasted into the target text block erased image corresponding to the text erased region of the target original text block image, the square text box is inversely transformed again by affine transformation back into a quadrilateral text box with the same shape and size as the text box corresponding to the text erased region of the target original text block image.
According to an embodiment of the present disclosure, an affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that preserves the "straightness" and "parallelism" of two-dimensional figures. Straightness means that after the transformation, straight lines remain straight lines without bending, and arcs remain arcs; parallelism means that the relative positional relationships between two-dimensional figures remain unchanged, parallel lines remain parallel, and the angles of intersecting lines are unchanged.
According to an embodiment of the present disclosure, an affine transformation can be realized by translation, scaling, flipping, rotation, shearing, etc.
According to an embodiment of the present disclosure, for example, the text box corresponding to the text erased region of the target original text block image is an irregular quadrilateral box corresponding to the text content of a slanted text erased region; the position of each corner of the irregular quadrilateral box represents a different two-dimensional coordinate, and the text box corresponding to the text erased region of the target original text block image is rectified by affine transformation into the two-dimensional coordinates of a rectangular box.
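A minimal sketch of this rectification step, assuming OpenCV and a hypothetical list of quadrilateral corner points in the order top-left, top-right, bottom-right, bottom-left. Note that an affine warp is fully determined by three point pairs, so this sketch maps three corners of the detected box onto the rectangle (a strict four-corner mapping would be a perspective transform, but the disclosure speaks of affine transforms):

```python
import cv2
import numpy as np

def rectify_text_box(image, quad, out_w, out_h):
    """Warp a slanted quadrilateral text box into an upright rectangle."""
    src = np.float32(quad[:3])                         # 3 corners define the affine map
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h]])
    M = cv2.getAffineTransform(src, dst)
    rect = cv2.warpAffine(image, M, (out_w, out_h))
    M_inv = cv2.invertAffineTransform(M)               # used later to paste the result back
    return rect, M_inv
```

The returned inverse map is what the paste-back step described above would use to warp the rectangular result back to the original slanted position.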
According to an embodiment of the present disclosure, the target original text block image may include a plurality of target sub-original text block images.
According to an embodiment of the present disclosure, the target original text block image may be obtained by stitching a plurality of target sub-original text block images, and the stitched target original text block image is input into the text erasure model for erasure.
According to an embodiment of the present disclosure, for example, the plurality of target sub-original text block images may be normalized to a fixed height and then combined and stitched into one or more large, regularly arranged images as the target original text block image.
According to an embodiment of the present disclosure, by stitching the plurality of target sub-original text block images to obtain the target original text block image and inputting the target original text block image into the text erasure model for erasure, the number of images that need to pass through the text erasure model is greatly reduced, improving the efficiency of text erasure.
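A sketch of the stitching idea (assuming OpenCV/NumPy; the fixed height of 64 pixels is a hypothetical choice, and the recorded offsets allow the erased result to be split back into per-block images):

```python
import cv2
import numpy as np

def stitch_text_blocks(blocks, height=64):
    """Normalize each sub text block image to a fixed height and stitch side by side."""
    resized, offsets, x = [], [], 0
    for img in blocks:
        h, w = img.shape[:2]
        new_w = max(1, round(w * height / h))        # keep the aspect ratio
        resized.append(cv2.resize(img, (new_w, height)))
        offsets.append((x, new_w))                   # where each block sits in the strip
        x += new_w
    return np.hstack(resized), offsets
```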
According to an embodiment of the present disclosure, the translation display parameters may include a translation pixel value.
According to an embodiment of the present disclosure, determining the translation display parameters may include the following operations:
The text region of the target original text block image is determined. The pixel mean of the text region of the target original text block image is determined. The pixel mean of the text region of the target original text block image is determined as the translation pixel value.
According to an embodiment of the present disclosure, determining the text region of the target original text block image may include the following operations:
The target original text block image is processed by image binarization to obtain a first image region and a second image region. A first pixel mean of the target original text block image corresponding to the first image region is determined. A second pixel mean of the target original text block image corresponding to the second image region is determined. A third pixel mean corresponding to the target text block erased image is determined. The text region of the target original text block image is determined according to the first pixel mean, the second pixel mean, and the third pixel mean.
According to an embodiment of the present disclosure, image binarization may be setting a threshold T and using the threshold T to divide the image data into two parts: a group of pixels with pixel values greater than T and a group of pixels with pixel values less than T, so that the entire image presents an obvious black-and-white visual effect.
According to an embodiment of the present disclosure, the first image region may be the text erased region of the target original text block image, or may be a region other than the text erased region of the target original text block image; likewise, the second image region may be the text erased region of the target original text block image, or may be a region other than the text erased region of the target original text block image.
According to an embodiment of the present disclosure, for example, the first pixel mean of the target original text block image corresponding to the first image region can be denoted A1, the second pixel mean of the target original text block image corresponding to the second image region can be denoted A2, and the third pixel mean corresponding to the target text block erased image can be denoted A3.
According to an embodiment of the present disclosure, the third pixel value corresponding to the target text block erased image may be determined according to the pixel values of the regions other than the text erased region in the target text block erased image.
According to an embodiment of the present disclosure, determining the text region of the target original text block image according to the first pixel mean, the second pixel mean, and the third pixel mean may include the following operations:
When it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is smaller than the absolute value of the difference between the second pixel mean and the third pixel mean, the first image region corresponding to the first pixel mean is determined as the text region of the target original text block image. When it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is greater than or equal to the absolute value of the difference between the second pixel mean and the third pixel mean, the second image region corresponding to the second pixel mean is determined as the text region of the target original text block image.
According to an embodiment of the present disclosure, based on the third pixel mean A3 corresponding to the target text block erased image, the first pixel mean A1 of the target original text block image corresponding to the first image region and the second pixel mean A2 of the target original text block image corresponding to the second image region are evaluated to determine the text region of the target original text block image.
According to an embodiment of the present disclosure, for example, if |A1-A3| < |A2-A3|, the first image region corresponding to A1 is determined as the text region of the target original text block image, and the second image region corresponding to A2 is determined as a region other than the text region of the target original text block image.
According to an embodiment of the present disclosure, if |A1-A3| ≥ |A2-A3|, the second image region corresponding to A2 is determined as the text region of the target original text block image, and the first image region corresponding to A1 is determined as a region other than the text region of the target original text block image.
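The region selection just described can be sketched as follows (assuming OpenCV/NumPy and grayscale inputs; Otsu's method stands in for the unspecified threshold T):

```python
import cv2
import numpy as np

def find_text_region(original_gray, erased_gray):
    """Pick which binarized region is the text, per the |Ai - A3| comparison."""
    _, mask = cv2.threshold(original_gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    region1, region2 = mask == 255, mask == 0         # first and second image regions
    a1 = original_gray[region1].mean()                # first pixel mean (A1)
    a2 = original_gray[region2].mean()                # second pixel mean (A2)
    a3 = erased_gray.mean()                           # third pixel mean (A3, erased image)
    # Per the rule above: the region whose mean is closer to A3 is the text region.
    return region1 if abs(a1 - a3) < abs(a2 - a3) else region2
```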
According to an embodiment of the present disclosure, the translation display parameters may include translation arrangement parameter values, and the translation arrangement parameter values may include the number of translation display lines, the translation display height, or both.
According to an embodiment of the present disclosure, determining the display parameters may include the following operation: determining the number of translation display lines and/or the translation display height according to the height and width of the text region corresponding to the target text block erased image and the height and width corresponding to the target translated text block.
According to an embodiment of the present disclosure, the translation display height may be determined from the height of the text region corresponding to the target text block erased image.
According to an embodiment of the present disclosure, the translation text width may be the text width when the translation is arranged in a single line. The translation text width when the translation is arranged in a single line can be derived from the width-to-height ratio of the translation's font.
Fig. 6 schematically shows a flowchart of determining the number of translation display lines and/or the translation display height according to an embodiment of the present disclosure.
As shown in Fig. 6, determining the number of translation display lines and/or the translation display height according to the height and width of the text region corresponding to the target text block erased image and the height and width corresponding to the target translated text block may include operations S610 to S650.
In operation S610, the width sum corresponding to the target translated text block is determined.
In operation S620, the number of translation display lines corresponding to the target translated text block is set to i lines, where the height of each of the i lines is 1/i of the height of the text region corresponding to the target text block erased image, and i is an integer greater than or equal to 1.
In operation S630, when it is determined that the width sum is greater than the preset width threshold corresponding to the i lines, the number of translation display lines corresponding to the target translated text block is set to i = i + 1 lines, where the preset width threshold is determined as i times the width of the text region corresponding to the target text block erased image.
In operation S640, the operation of determining whether the width sum is less than or equal to the preset width threshold corresponding to the i lines is repeatedly executed until it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines.
In operation S650, when it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines, the i lines are determined as the number of translation display lines and/or 1/i of the height of the text region corresponding to the target text block erased image is determined as the translation display height.
According to an embodiment of the present disclosure, the translation text width when the translation is arranged in a single line, i.e., the width sum W1 corresponding to the target translated text block, can be derived from the width-to-height ratio of the translation's font.
According to an embodiment of the present disclosure, the number of translation display lines is set to i lines, and the preset width threshold W corresponding to the i lines is determined as i times the width of the text region corresponding to the target text block erased image.
According to an embodiment of the present disclosure, the width sum W1 corresponding to the target translated text block is compared with the preset width threshold W corresponding to the i lines to determine the number of translation display lines and/or the display height.
According to an embodiment of the present disclosure, for example, the text in the text region of the target original text block image is "It's cloudy and rainy"; after translation, the target translation is "多云多雨". Thus, the text width corresponding to the target translated text block is the sum of the text widths when the target translated block "多云多雨" is arranged in a single line, which can be denoted W1.
According to an embodiment of the present disclosure, the width of the text region corresponding to the target text block erased image is W2, so the preset width threshold corresponding to i translation display lines is W = i × W2.
According to an embodiment of the present disclosure, if the number of translation display lines corresponding to the translation text "多云多雨" is 1 line (i = 1), and the width sum W1 of the translation text is greater than the preset width threshold W = 1 × W2 corresponding to 1 display line, it is not appropriate to arrange the translation corresponding to the target translated text block in 1 line, and the number of translation display lines needs to be set to 2. At this point, the translation is displayed in 2 lines.
According to an embodiment of the present disclosure, the above operation continues: if the width sum W1 of the translation text is greater than the preset width threshold W = 2 × W2 corresponding to 2 display lines, it is not appropriate to arrange the translation corresponding to the target translated text block in 2 lines, and the number of translation display lines needs to be set to 3. At this point, the translation is displayed in 3 lines.
According to an embodiment of the present disclosure, the above operations are repeated until it is determined that the width sum W1 of the translation text is less than or equal to the preset width threshold W = i × W2 corresponding to the i lines, at which point i is determined as the number of translation display lines, and 1/i of the height of the text region corresponding to the target text block erased image is determined as the translation display height.
According to an embodiment of the present disclosure, for example, if the width sum W1 of the translation text is less than or equal to the preset width threshold W = 3 × W2 corresponding to 3 display lines, it is appropriate to arrange the translation corresponding to the target translated text block in 3 lines; the number of translation display lines is then 3, and the translation display height is 1/3 of the height of the text region corresponding to the target text block erased image.
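Written as code, the iteration above is simply the following (a sketch; single_line_width is the width sum W1 of the translation laid out in one line, and region_w/region_h are the width and height of the text region of the erased image):

```python
def layout_lines(single_line_width: float, region_w: float, region_h: float):
    """Return (number of display lines, display height per line)."""
    i = 1
    while single_line_width > i * region_w:   # threshold W = i * W2
        i += 1                                # one more line, smaller text
    return i, region_h / i                    # each line is 1/i of the region height
```

With the worked example above (W1 greater than 2 × W2 but at most 3 × W2), this returns 3 lines at one third of the region height.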
According to an embodiment of the present disclosure, the translation arrangement parameter values may include the translation display direction. The translation display direction may be determined according to the text direction of the target original text block.
According to an embodiment of the present disclosure, the text boxes of the text regions of the target original text block are quadrilateral text boxes of varying shapes; they are transformed into rectangular text boxes by affine transformation to facilitate text erasure and translation pasting, and after the translation is pasted, the text box is transformed back by affine transformation into a text box with the same shape as the original quadrilateral text box of the text region of the target original text block, which forms the translation display direction.
Fig. 7 schematically shows a schematic diagram of the translation display process according to an embodiment of the present disclosure.
As shown in Fig. 7, the target original text block image 701 is input into the text erasure model 702 for text erasure processing to obtain the target text block erased image 703; the translation display parameters 704 are determined; according to the translation display parameters 704, the translated text block 705 corresponding to the text region of the target original text block in the target original text block image 701 is superimposed onto the target text block erased image 703 to obtain the target translated text block image 706, and the target translated text block image 706 is displayed.
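The pasting step can be sketched with Pillow (an assumption; the font file name and the 0.9 sizing factor are hypothetical, and color is the translation pixel value determined earlier):

```python
from PIL import Image, ImageDraw, ImageFont

def paste_translation(erased_img: Image.Image, lines, region_xy, line_h, color):
    """Draw translated text lines onto the erased image at the original position."""
    draw = ImageDraw.Draw(erased_img)
    font = ImageFont.truetype("NotoSansCJK-Regular.ttc", size=int(line_h * 0.9))
    x, y = region_xy
    for k, line in enumerate(lines):
        draw.text((x, y + k * line_h), line, fill=tuple(color), font=font)
    return erased_img
```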
Fig. 8A schematically shows a schematic diagram of the text erasure process 800 according to an embodiment of the present disclosure.
Fig. 8B schematically shows a schematic diagram of the translation pasting process 800' according to an embodiment of the present disclosure.
As shown in Fig. 8A, the original text block images 803, 804, 805, 806 in the original text block image set 802 detected from the original image 801 are input into the text erasure model 807; the text regions of the original text block images 803, 804, 805, 806 in the original text block image set 802 are erased, and the text block erased images 809, 810, 811, 812 in the erased text block erased image set 808 are output.
The translation pasting process 800' is performed after the text erasure process 800. As shown in Fig. 8B, each original text block image in the original text block image set is translated; for example, the text region of the original text block image 805 is translated to obtain the translated text block 813 corresponding to the text region of the original text block image 805.
The translation display parameters 814 of the translated text block 813 are determined; the translation display parameters 814 include: the translation text position, the translation text arrangement parameter values, and the translation pixel value.
According to the translation display parameters 814, the translated text block 813 is superimposed onto the text block erased image 811 in the text block erased image set 808 to obtain the translated text block image 815.
The above operations are repeated: after each original text block image in the original text block image set 802 has its text erased and its translation pasted, a translation image 816 with the displayed translation is finally obtained.
Fig. 9 schematically shows a block diagram of a training apparatus for a text erasure model according to an embodiment of the present disclosure.
As shown in Fig. 9, the training apparatus 900 for the text erasure model may include: a first obtaining module 910, a second obtaining module 920, and a first determining module 930.
The first obtaining module 910 is configured to process an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model includes the generator and a discriminator.
The second obtaining module 920 is configured to alternately train the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator.
The first determining module 930 is configured to determine the trained generator as the text erasure model.
According to an embodiment of the present disclosure, the pixel values of the text erased region in each real text block erased image included in the real text block erased image set are determined according to the pixel values of the regions other than the text erased region in the real text block erased image.
According to an embodiment of the present disclosure, the original text block image set includes a first original text block image set and a second original text block image set, and the simulated text block erased image set includes a first simulated text block erased image set and a second simulated text block erased image set.
The first obtaining module 910 may include: a first generating submodule and a second generating submodule.
The first generating submodule is configured to process the first original text block image set by using the generator to generate the first simulated text block erased image set.
The second generating submodule is configured to process the second original text block image set by using the generator to generate the second simulated text block erased image set.
According to an embodiment of the present disclosure, the real text block erased image set includes a first real text block erased image set and a second real text block erased image set. The second obtaining module 920 may include: a first training submodule, a second training submodule, an executing submodule, and an obtaining submodule.
The first training submodule is configured to train the discriminator by using the first real text block erased image set and the first simulated text block erased image set.
The second training submodule is configured to train the generator by using the second simulated text block erased image set.
The executing submodule is configured to alternately execute the operation of training the discriminator and the operation of training the generator until the convergence condition of the generative adversarial network model is satisfied.
The obtaining submodule is configured to determine the generator and discriminator obtained when the convergence condition of the generative adversarial network model is satisfied as the trained generator and discriminator.
According to an embodiment of the present disclosure, the first real text block erased image set includes a plurality of first real text block erased images, and the first simulated text block erased image set includes a plurality of first simulated text block erased images.
The first training submodule may include: a first obtaining unit, a second obtaining unit, and a training unit.
The first obtaining unit is configured to input each first real text block erased image in the first real text block erased image set into the discriminator to obtain the first discrimination result corresponding to the first real text block erased image.
The second obtaining unit is configured to input each first simulated text block erased image in the first simulated text block erased image set into the discriminator to obtain the second discrimination result corresponding to the first simulated text block erased image.
The training unit is configured to train the discriminator based on the first discrimination result and the second discrimination result.
According to an embodiment of the present disclosure, the first training submodule may further include: a third obtaining unit and a first adjusting unit.
The third obtaining unit is configured to obtain the first output value based on the first loss function by using the first discrimination result and the second discrimination result, with the model parameters of the generator kept unchanged.
The first adjusting unit is configured to adjust the model parameters of the discriminator according to the first output value to obtain adjusted model parameters of the discriminator.
The second training submodule may include: a fourth obtaining unit and a second adjusting unit.
The fourth obtaining unit is configured to obtain the second output value based on the second loss function by using the second simulated text block erased image set, with the adjusted model parameters of the discriminator kept unchanged.
The second adjusting unit is configured to adjust the model parameters of the generator according to the second output value.
According to an embodiment of the present disclosure, the first loss function includes a discriminator loss function and a minimum mean square error loss function, and the second loss function includes a generator loss function and the minimum mean square error loss function; the discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions that include a regularization term.
Fig. 10 schematically shows a block diagram of a translation display apparatus according to an embodiment of the present disclosure.
As shown in Fig. 10, the translation display apparatus 1000 may include: a third obtaining module 1010, a second determining module 1020, a fourth obtaining module 1030, and a display module 1040.
The third obtaining module 1010 is configured to process a target original text block image by using a text erasure model to obtain a target text block erased image, the target original text block image including a target original text block.
The second determining module 1020 is configured to determine translation display parameters.
The fourth obtaining module 1030 is configured to superimpose, according to the translation display parameters, a translated text block corresponding to the target original text block onto the target text erased image to obtain a target translated text block image.
The display module 1040 is configured to display the target translated text block image.
The text erasure model is trained by using the above training method for a text erasure model.
According to an embodiment of the present disclosure, the above translation display apparatus 1000 may further include: a transformation module.
The transformation module is configured to transform the text box into a square text box by affine transformation when it is determined that the text box corresponding to the target original text block is not a square text box.
According to an embodiment of the present disclosure, the target original text block image includes a plurality of target sub-original text block images.
The above translation display apparatus 1000 may further include: a stitching module.
The stitching module is configured to stitch the plurality of target sub-original text block images to obtain the target original text block image.
According to an embodiment of the present disclosure, the translation display parameters include a translation pixel value.
The second determining module 1020 may include: a first determining submodule, a second determining submodule, and a third determining submodule.
The first determining submodule is configured to determine the text region of the target original text block image.
The second determining submodule is configured to determine the pixel mean of the text region of the target original text block image.
The third determining submodule is configured to determine the pixel mean of the text region of the target original text block image as the translation pixel value.
According to an embodiment of the present disclosure, the first determining submodule may include: a fifth obtaining unit, a first determining unit, a second determining unit, a third determining unit, and a fourth determining unit.
The fifth obtaining unit is configured to process the target original text block image by image binarization to obtain a first image region and a second image region.
The first determining unit is configured to determine the first pixel mean of the target original text block image corresponding to the first image region.
The second determining unit is configured to determine the second pixel mean of the target original text block image corresponding to the second image region.
The third determining unit is configured to determine the third pixel mean corresponding to the target text block erased image.
The fourth determining unit is configured to determine the text region of the target original text block image according to the first pixel mean, the second pixel mean, and the third pixel mean.
According to an embodiment of the present disclosure, the fourth determining unit may include: a first determining subunit and a second determining subunit.
The first determining subunit is configured to determine the first image region corresponding to the first pixel mean as the text region of the target original text block image when it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is smaller than the absolute value of the difference between the second pixel mean and the third pixel mean.
The second determining subunit is configured to determine the second image region corresponding to the second pixel mean as the text region of the target original text block image when it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is greater than or equal to the absolute value of the difference between the second pixel mean and the third pixel mean.
According to an embodiment of the present disclosure, the translation display parameters include translation arrangement parameter values, and the translation arrangement parameter values include the number of translation display lines and/or the translation display height.
The second determining module 1020 may further include: a fourth determining submodule.
The fourth determining submodule is configured to determine the number of translation display lines and/or the translation display height according to the height and width of the text region corresponding to the target text block erased image and the height and width corresponding to the target translated text block.
According to an embodiment of the present disclosure, the fourth determining submodule includes: a fifth determining unit, a sixth determining unit, a setting unit, a repeating unit, and a seventh determining unit.
The fifth determining unit is configured to determine the width sum corresponding to the target translated text block.
The sixth determining unit is configured to set the number of translation display lines corresponding to the target translated text block to i lines, where the height of each of the i lines is 1/i of the height of the text region corresponding to the target text block erased image, and i is an integer greater than or equal to 1.
The setting unit is configured to set the number of translation display lines corresponding to the target translated text block to i = i + 1 lines when it is determined that the width sum is greater than the preset width threshold corresponding to the i lines, where the preset width threshold is determined as i times the width of the text region corresponding to the target text block erased image.
The repeating unit is configured to repeatedly execute the operation of determining whether the width sum is less than or equal to the preset width threshold corresponding to the i lines until it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines.
The seventh determining unit is configured to determine the i lines as the number of translation display lines and/or determine 1/i of the height of the text region corresponding to the target text block erased image as the translation display height when it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines.
According to an embodiment of the present disclosure, the translation arrangement parameter values include the translation display direction, which is determined according to the text direction of the target original text block.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions, wherein the computer instructions are used to cause a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method as described above.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information involved all comply with the provisions of relevant laws and regulations; necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is acquired or collected. Fig. 11 schematically shows a block diagram of an electronic device suitable for implementing the training method for a text erasure model or the translation display method according to an embodiment of the present disclosure. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in Fig. 11, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a random access memory (RAM) 1103. Various programs and data necessary for the operation of the electronic device 1100 may also be stored in the RAM 1103. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A plurality of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, such as a keyboard, a mouse, etc.; an output unit 1107, such as various types of displays, speakers, etc.; a storage unit 1108, such as a magnetic disk, an optical disk, etc.; and a communication unit 1109, such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1101 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any appropriate processors, controllers, microcontrollers, etc. The computing unit 1101 executes the various methods and processes described above, such as the training method for the text erasure model or the translation display method. For example, in some embodiments, the training method for the text erasure model or the translation display method may be implemented as a computer software program tangibly contained in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the training method for the text erasure model or the translation display method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured in any other appropriate manner (for example, by means of firmware) to execute the training method for the text erasure model or the translation display method.
The various implementations of the systems and techniques described above herein may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor, may receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and may transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
Program codes for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, so that when the program codes are executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are implemented. The program codes may execute entirely on a machine, partly on a machine, partly on a machine and partly on a remote machine as a stand-alone software package, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display apparatus (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing apparatus (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of apparatuses may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a backend component (e.g., as a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a frontend component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such backend components, middleware components, or frontend components. The components of the system may be interconnected by digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include: a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (20)

  1. A training method for a text erasure model, comprising:
    processing an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model comprises the generator and a discriminator;
    alternately training the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and
    determining the trained generator as the text erasure model;
    wherein pixel values of a text erased region in a real text block erased image included in the real text block erased image set are determined according to pixel values of regions other than the text erased region in the real text block erased image.
  2. The method according to claim 1, wherein the original text block image set comprises a first original text block image set and a second original text block image set, and the simulated text block erased image set comprises a first simulated text block erased image set and a second simulated text block erased image set;
    the processing an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set comprises:
    processing the first original text block image set by using the generator to generate the first simulated text block erased image set; and
    processing the second original text block image set by using the generator to generate the second simulated text block erased image set.
  3. The method according to claim 2, wherein the real text block erased image set comprises a first real text block erased image set and a second real text block erased image set;
    the alternately training the generator and the discriminator by using the real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator comprises:
    training the discriminator by using the first real text block erased image set and the first simulated text block erased image set;
    training the generator by using the second simulated text block erased image set;
    alternately executing the operation of training the discriminator and the operation of training the generator until a convergence condition of the generative adversarial network model is satisfied; and
    determining the generator and discriminator obtained when the convergence condition of the generative adversarial network model is satisfied as the trained generator and discriminator.
  4. The method according to claim 3, wherein the first real text block erased image set comprises a plurality of first real text block erased images, and the first simulated text block erased image set comprises a plurality of first simulated text block erased images;
    the training the discriminator by using the first real text block erased image set and the first simulated text block erased image set comprises:
    inputting each first real text block erased image in the first real text block erased image set into the discriminator to obtain a first discrimination result corresponding to the first real text block erased image;
    inputting each first simulated text block erased image in the first simulated text block erased image set into the discriminator to obtain a second discrimination result corresponding to the first simulated text block erased image; and
    training the discriminator based on the first discrimination result and the second discrimination result.
  5. The method according to claim 4, wherein the training the discriminator based on the first discrimination result and the second discrimination result comprises:
    with model parameters of the generator kept unchanged, obtaining a first output value based on a first loss function by using the first discrimination result and the second discrimination result; and
    adjusting model parameters of the discriminator according to the first output value to obtain adjusted model parameters of the discriminator;
    wherein the training the generator by using the second simulated text block erased image set comprises:
    with the adjusted model parameters of the discriminator kept unchanged, obtaining a second output value based on a second loss function by using the second simulated text block erased image set; and
    adjusting the model parameters of the generator according to the second output value.
  6. The method according to claim 5, wherein the first loss function comprises a discriminator loss function and a minimum mean square error loss function, the second loss function comprises a generator loss function and the minimum mean square error loss function, and the discriminator loss function, the minimum mean square error loss function, and the generator loss function are all loss functions comprising a regularization term.
  7. A translation display method, comprising:
    processing a target original text block image by using a text erasure model to obtain a target text block erased image, the target original text block image comprising a target original text block;
    determining translation display parameters;
    superimposing, according to the translation display parameters, a translated text block corresponding to the target original text block onto the target text erased image to obtain a target translated text block image; and
    displaying the target translated text block image;
    wherein the text erasure model is trained by using the method according to any one of claims 1 to 6.
  8. The method according to claim 7, further comprising:
    transforming, when it is determined that a text box corresponding to the target original text block is not a square text box, the text box into the square text box by affine transformation.
  9. The method according to claim 7 or 8, wherein the target original text block image comprises a plurality of target sub-original text block images;
    the method further comprises:
    stitching the plurality of target sub-original text block images to obtain the target original text block image.
  10. The method according to any one of claims 7 to 9, wherein the translation display parameters comprise a translation pixel value;
    the determining translation display parameters comprises:
    determining a text region of the target original text block image;
    determining a pixel mean of the text region of the target original text block image; and
    determining the pixel mean of the text region of the target original text block image as the translation pixel value.
  11. The method according to claim 10, wherein the determining a text region of the target original text block image comprises:
    processing the target original text block image by image binarization to obtain a first image region and a second image region;
    determining a first pixel mean of the target original text block image corresponding to the first image region;
    determining a second pixel mean of the target original text block image corresponding to the second image region;
    determining a third pixel mean corresponding to the target text block erased image; and
    determining the text region of the target original text block image according to the first pixel mean, the second pixel mean, and the third pixel mean.
  12. The method according to claim 11, wherein the determining the text region of the target original text block image according to the first pixel mean, the second pixel mean, and the third pixel mean comprises:
    when it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is smaller than the absolute value of the difference between the second pixel mean and the third pixel mean, determining the first image region corresponding to the first pixel mean as the text region of the target original text block image; and
    when it is determined that the absolute value of the difference between the first pixel mean and the third pixel mean is greater than or equal to the absolute value of the difference between the second pixel mean and the third pixel mean, determining the second image region corresponding to the second pixel mean as the text region of the target original text block image.
  13. The method according to any one of claims 7 to 12, wherein the translation display parameters comprise translation arrangement parameter values, and the translation arrangement parameter values comprise a number of translation display lines and/or a translation display height;
    the determining translation display parameters comprises:
    determining the number of translation display lines and/or the translation display height according to a height and width of a text region corresponding to the target text block erased image and a height and width corresponding to the target translated text block.
  14. The method according to claim 13, wherein the determining the number of translation display lines and/or the translation display height according to the height and width of the text region corresponding to the target text block erased image and the height and width corresponding to the target translated text block comprises:
    determining a width sum corresponding to the target translated text block;
    setting the number of translation display lines corresponding to the target translated text block to i lines, wherein the height of each of the i lines is 1/i of the height of the text region corresponding to the target text block erased image, and i is an integer greater than or equal to 1;
    when it is determined that the width sum is greater than a preset width threshold corresponding to the i lines, setting the number of translation display lines corresponding to the target translated text block to i = i + 1 lines, wherein the preset width threshold is determined as i times the width of the text region corresponding to the target text block erased image;
    repeatedly executing the operation of determining whether the width sum is less than or equal to the preset width threshold corresponding to the i lines until it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines; and
    when it is determined that the width sum is less than or equal to the preset width threshold corresponding to the i lines, determining the i lines as the number of translation display lines and/or determining 1/i of the height of the text region corresponding to the target text block erased image as the translation display height.
  15. The method according to any one of claims 7 to 14, wherein the translation arrangement parameter values comprise a translation display direction, and the translation display direction is determined according to a text direction of the target original text block.
  16. A training apparatus for a text erasure model, comprising:
    a first obtaining module configured to process an original text block image set by using the generator of a generative adversarial network model to obtain a simulated text block erased image set, wherein the generative adversarial network model comprises the generator and a discriminator;
    a second obtaining module configured to alternately train the generator and the discriminator by using a real text block erased image set and the simulated text block erased image set to obtain a trained generator and discriminator; and
    a first determining module configured to determine the trained generator as the text erasure model;
    wherein pixel values of a text erased region in a real text block erased image included in the real text block erased image set are determined according to pixel values of regions other than the text erased region in the real text block erased image.
  17. A translation display apparatus, comprising:
    a third obtaining module configured to process a target original text block image by using a text erasure model to obtain a target text block erased image, the target original text block image comprising a target original text block;
    a second determining module configured to determine translation display parameters;
    a fourth obtaining module configured to superimpose, according to the translation display parameters, a translated text block corresponding to the target original text block onto the target text erased image to obtain a target translated text block image; and
    a display module configured to display the target translated text block image;
    wherein the text erasure model is trained by using the method according to any one of claims 1 to 6.
  18. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 6 or any one of claims 7 to 15.
  19. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 6 or any one of claims 7 to 15.
  20. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6 or any one of claims 7 to 15.
PCT/CN2022/088395 2021-08-17 2022-04-22 Training method, translation display method, apparatus, electronic device, and storage medium WO2023019995A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023509866A JP2023541351A (ja) 2021-08-17 2022-04-22 Training method and apparatus for a text erasure model, translation display method and apparatus, electronic device, storage medium, and computer program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110945871.0 2021-08-17
CN202110945871.0A CN113657396B (zh) 2021-08-17 2021-08-17 Training method, translation display method, apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023019995A1 (zh)

Family

ID=78492142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088395 WO2023019995A1 (zh) 2021-08-17 2022-04-22 Training method, translation display method, apparatus, electronic device, and storage medium

Country Status (3)

Country Link
JP (1) JP2023541351A (zh)
CN (1) CN113657396B (zh)
WO (1) WO2023019995A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657396B (zh) * 2021-08-17 2024-02-09 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method, translation display method, apparatus, electronic device, and storage medium
CN117274438B (zh) * 2023-11-06 2024-02-20 Hangzhou Tonghuashun Data Development Co., Ltd. Picture translation method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217117A1 (en) * 2015-01-27 2016-07-28 Abbyy Development Llc Smart eraser
CN109492627A (zh) * 2019-01-22 2019-03-19 South China University of Technology Scene text erasure method based on a deep model with a fully convolutional network
CN111429374A (zh) * 2020-03-27 2020-07-17 Industrial and Commercial Bank of China Method and apparatus for eliminating moiré patterns in images
CN111723585A (zh) * 2020-06-08 2020-09-29 China University of Petroleum (East China) Style-controllable real-time image text translation and conversion method
CN112465931A (zh) * 2020-12-03 2021-03-09 iFLYTEK Co., Ltd. Image text erasing method, related device, and readable storage medium
CN113657396A (zh) * 2021-08-17 2021-11-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method, translation display method, apparatus, electronic device, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3829667B2 (ja) * 2001-08-21 2006-10-04 Konica Minolta Holdings, Inc. Image processing apparatus, image processing method, program for executing the image processing method, and storage medium storing the program
CN111127593B (zh) * 2018-10-30 2023-10-31 Zhuhai Kingsoft Office Software Co., Ltd. Document content erasing method and apparatus, electronic device, and readable storage medium
CN111612081B (zh) * 2020-05-25 2024-04-02 Shenzhen Qianhai WeBank Co., Ltd. Training method, apparatus, device, and storage medium for a recognition model
CN112580623B (zh) * 2020-12-25 2023-07-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Image generation method, model training method, related apparatus, and electronic device

Also Published As

Publication number Publication date
CN113657396B (zh) 2024-02-09
CN113657396A (zh) 2021-11-16
JP2023541351A (ja) 2023-10-02

Legal Events

ENP (Entry into the national phase): Ref document number: 2023509866; Country of ref document: JP; Kind code of ref document: A
NENP (Non-entry into the national phase): Ref country code: DE