CN111539184A - Text data manufacturing method and device based on deep learning, terminal and storage medium - Google Patents

Text data manufacturing method and device based on deep learning, terminal and storage medium

Info

Publication number
CN111539184A
CN111539184A
Authority
CN
China
Prior art keywords
image
network
character
sample
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010355494.0A
Other languages
Chinese (zh)
Inventor
周康明
胡威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010355494.0A priority Critical patent/CN111539184A/en
Publication of CN111539184A publication Critical patent/CN111539184A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/103 Formatting, i.e. changing of presentation of documents
    • G06F 40/109 Font handling; Temporal or kinetic typography
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application provides a deep learning-based text data manufacturing method, apparatus, terminal, and storage medium. The text data manufacturing method includes the following steps: preprocessing raw character data to generate corresponding character text; performing image processing on the character text to generate a corresponding text image; and constructing a generative adversarial network (GAN) model for manufacturing sample images and adding a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images. On the basis of manufacturing data samples in a traditional image processing manner, the method adds a spatial transformer network to the generative adversarial network, so that the samples better learn information such as their spatial position, improving how well the samples fit warping, rotation, jitter, and the like. Meanwhile, a penalty parameter q is introduced when the network loss value is calculated, so that the distribution of real-sample features in the generated samples can be adjusted manually according to the distribution of the generated samples.

Description

Text data manufacturing method and device based on deep learning, terminal and storage medium
Technical Field
The present application relates to the field of data manufacturing technologies, and in particular to a deep learning-based text data manufacturing method and apparatus, a terminal, and a storage medium.
Background
With the widespread application of computer vision technology, deep learning has become the foundation of this field. Compared with traditional image processing, deep learning is faster, handles data at larger scale, and generalizes better, which provides a sound theoretical basis for introducing it into the manufacture and training of data samples.
At present, deep learning image processing is widely used in target localization, classification and retrieval, face recognition, and similar fields. Applying deep learning ideas to data set production alleviates problems such as data shortage and narrow data distribution that restrict model generality, gives deep learning recognition better applicability in natural scenes, and is a hot research direction in this field.
However, existing methods for manufacturing text data are still mainly based on traditional image processing. Although deep learning-based methods are also used for sample manufacture, their application is not yet comprehensive and is basically confined to specific fields such as face recognition and certain fixed scenes. Traditional image processing can meet the requirements of uniformly distributed, large-volume data creation, but it often cannot create true-to-life samples in the style of real scenes, so much of the created data is redundant and repetitive.
Therefore, how to apply a generative adversarial network, combined with traditional image processing, to create data closer to the real style from few data samples, make the data distribution more uniform, and improve the usability of the created data is a technical problem to be solved in this art.
Summary of the application
In view of the above drawbacks of the prior art, it is an object of the present application to provide a deep learning-based text data manufacturing method, apparatus, terminal, and storage medium that solve the problems of the prior art.
To achieve the above and other related objects, a first aspect of the present application provides a deep learning-based text data manufacturing method, including: preprocessing raw character data to generate corresponding character text; performing image processing on the character text to generate a corresponding text image; and constructing a generative adversarial network model for manufacturing sample images and adding a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images.
In some embodiments of the first aspect of the present application, preprocessing the raw character data to generate corresponding character text includes: generating a character file recording a correspondence table between Chinese and English characters; expanding the character file to generate character files of different language versions; and collecting character files of different font styles for each language version.
In some embodiments of the first aspect of the present application, performing image processing on the character text to generate a corresponding text image includes: defining input parameters, the input parameters including any one or a combination of input and output directories, character and font file directories, image size, rotation angle settings, and generation quantity ratio; reading the character file with a Chinese character generation function and presetting font format parameters to generate sample data; and performing data enhancement processing on the sample data.
In some embodiments of the first aspect of the present application, the data enhancement processing on the sample data includes adding any one or more of random noise, dilation and erosion, and channel changes.
In some embodiments of the first aspect of the present application, constructing a generative adversarial network model for manufacturing sample images, and adding a spatial transformer network as a constraint condition to the constructed model so that the manufactured sample images learn the spatial position information of warped text images, includes: constructing a discriminator network for judging the real or fake attribute of a sample; constructing a generator network for outputting generated samples; constructing a generative adversarial network model comprising the generator network and the discriminator network; adding a spatial transformer network to the generative adversarial network model as a conditional constraint; constructing a loss function of the generative adversarial network model, the loss function comprising a loss function when a real image is used for discrimination and a loss function when a generated image is used for discrimination; and introducing a penalty strength parameter into the loss function to increase the weight of a first loss value when the first loss value is high while a real image is used for discrimination, and to increase the weight of a second loss value when the second loss value is high while a generated image is used for discrimination.
In some embodiments of the first aspect of the present application, the generative adversarial network model comprises: a generator network that takes a D-dimensional noise vector as input and generates and outputs a corresponding fake image; a discriminator network that takes as inputs a real image and the fake image output by the generator network, respectively; and a spatial transformer network; wherein the generative adversarial network model outputs a prediction value for predicting whether an image is real or fake based on the discriminator network and the spatial transformer network.
In some embodiments of the first aspect of the present application, the method includes: after the input image is processed by the spatial transformer network, an affine transformation matrix is obtained through calculation, and the transformed image is output based on the affine transformation matrix.
To achieve the above and other related objects, a second aspect of the present application provides a deep learning-based text data manufacturing apparatus, comprising: a preprocessing module for preprocessing raw character data to generate corresponding character text; an image processing module for performing image processing on the character text to generate a corresponding text image; and a model construction module for constructing a generative adversarial network model for manufacturing sample images and adding a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images.
To achieve the above and other related objects, a third aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the deep learning-based text data manufacturing method.
To achieve the above and other related objects, a fourth aspect of the present application provides an electronic terminal, comprising: a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the terminal performs the deep learning-based text data manufacturing method.
As described above, the deep learning-based text data manufacturing method, apparatus, terminal, and storage medium of the present application have the following advantages: on the basis of manufacturing data samples in a traditional image processing manner, the method adds a spatial transformer network to the generative adversarial network, so that the manufactured samples better learn information such as their spatial position, improving how well the samples fit warping, rotation, and jitter, and bringing the learned result closer to real samples. Meanwhile, a penalty parameter q is introduced when the network loss value is calculated, so that the distribution of real-sample features in the generated samples can be adjusted manually and dynamically according to the distribution of the generated samples.
Drawings
Fig. 1 is a flowchart illustrating a text data manufacturing method based on deep learning according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of constructing a generative adversarial network model according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a generative adversarial network model according to an embodiment of the present application.
Fig. 4A is a schematic structural diagram of a spatial transformer network according to an embodiment of the present application.
Fig. 4B is a schematic diagram of the transformation effect of a spatial transformer network according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a deep learning-based text data manufacturing apparatus according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic terminal according to an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the present application from the disclosure herein. The present application is capable of other and different embodiments, and its details are capable of modifications and changes in various respects without departing from the spirit of the present application. It should be noted that the features of the following embodiments and examples may be combined with each other without conflict.
It should be noted that the following description refers to the accompanying drawings, which illustrate several embodiments of the present application. It is to be understood that other embodiments may be utilized and that mechanical, structural, electrical, and operational changes may be made without departing from the spirit and scope of the present application. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present application is defined only by the claims of the issued patent. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the application. Spatially relative terms, such as "upper," "lower," "left," "right," "below," "above," and the like, may be used herein to describe one element or feature's relationship to another element or feature as illustrated in the figures.
In this application, unless expressly stated or limited otherwise, the terms "mounted," "connected," "secured," "retained," and the like are to be construed broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and elements may be connected directly or indirectly through intervening media, or two elements may interact with each other. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
Also, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and/or "including," when used in this specification, specify the presence of stated features, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A, B and C." An exception to this definition occurs only when a combination of elements, functions, or operations is inherently mutually exclusive in some way.
Existing methods for manufacturing text data are still mainly based on traditional image processing. Although deep learning-based methods are also used for sample manufacture, their application is not yet comprehensive and is basically confined to specific fields such as face recognition and certain fixed scenes. Traditional image processing can meet the requirements of uniformly distributed, large-volume data creation, but it often cannot create true-to-life samples in the style of real scenes, so much of the created data is redundant and repetitive.
In view of this, the present invention provides a deep learning-based text data manufacturing method, apparatus, terminal, and storage medium. It should be understood that the method mainly applies deep learning to the manufacture of text data samples, such as text data in natural scenes that are common in daily life, e.g., identity cards and insurance policies. Because data collected in real life is limited by factors such as narrow sample distribution, single data style, and scarce categories, the recognition model learned from it is not robust enough and its scene applicability is weak.
Therefore, on the basis of manufacturing data samples in a traditional image processing manner, the present invention adds a spatial transformer network to the generative adversarial network, so that the manufactured samples better learn information such as their spatial position, improving how well the samples fit warping, rotation, and jitter, and bringing the learned result closer to real samples. Meanwhile, a penalty parameter q is introduced when the network loss value is calculated, so that the distribution of real-sample features in the generated samples can be adjusted manually and dynamically according to the distribution of the generated samples.
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are further described in detail below through the following embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described herein merely illustrate the invention and are not intended to limit it.
Example one
Fig. 1 is a flowchart of a deep learning-based text data manufacturing method according to an embodiment of the present invention. The text data manufacturing method of this embodiment mainly includes steps S11 to S13.
Step S11: preprocess the raw character data to generate corresponding character text.
1) Generate a character file recording a correspondence table between Chinese and English characters, which includes: writing the characters to be generated into a character file and generating id data for the Chinese and English characters according to the writing order of the characters; and storing the character file so that the corresponding character can be queried by its id.
For example, a correspondence label table of Chinese and English characters can be generated with Python's pickle module: first write the characters to be generated into a txt file, then generate the corresponding ids of the Chinese and English characters according to the writing order, and store the one-to-one character file so that the character corresponding to a given id can be found quickly and conveniently.
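As a minimal sketch of this step (the file names and the helper function are illustrative assumptions, not the patent's own code):
```python
import pickle

# Build a character-id table from a plain-text file that lists the
# characters to generate, then persist it with pickle so a character
# can be looked up by its id later.
def build_char_table(txt_path="chars.txt", out_path="char_table.pkl"):
    with open(txt_path, encoding="utf-8") as f:
        chars = [c for line in f for c in line.strip()]
    id2char = dict(enumerate(chars))  # ids follow the writing order
    char2id = {c: i for i, c in id2char.items()}
    with open(out_path, "wb") as f:
        pickle.dump({"id2char": id2char, "char2id": char2id}, f)
    return id2char, char2id

id2char, char2id = build_char_table()
print(id2char[0])  # quickly look up the character for id 0
```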
2) Expand the character file to generate character files of several different language versions.
In other words, if Chinese-English character samples do not meet the requirements, corresponding character texts can be generated on the basis of other languages; for example, a correspondence table between Chinese characters and Arabic letters is generated from Arabic letters, or one between Chinese characters and Latin letters from Latin letters. These tables are generated in the same way as the Chinese-English correspondence table, so the details are not repeated here.
3) Collect character files of different font styles for each language version. Font styles include, but are not limited to, Song, regular script (Kai), italic, Hei (black), artistic fonts, and the like; this embodiment is not limited thereto.
Specifically, because the fonts of text in many real environments differ greatly, e.g., Song, regular script, italic, Hei, and artistic characters, font formats of different styles should be collected as far as possible during early data preparation to increase the richness of the fonts.
Step S12: perform image processing on the character text to generate a corresponding text image, which specifically includes the following steps:
First, define the input parameters, including but not limited to the input and output directories, character and font file directories, image size, rotation angle settings, generation quantity ratio, and the like.
Second, read the character file with a Chinese character generation function, and preset the font format parameters to generate sample data.
Specifically, at the image generation stage, the character text is read from memory with the Chinese character generation capability of the PIL tool provided in Python, and font format parameters such as font, background color, and font size are set to complete sample data generation.
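A small rendering sketch along these lines follows; the font path, image size, margin, and sample text are assumed placeholders:
```python
from PIL import Image, ImageDraw, ImageFont

# Render one character-text sample with PIL: create a background,
# load a TrueType font, and draw the text onto the image.
def render_text(text, font_path="fonts/simsun.ttf",
                size=(280, 32), font_size=24,
                bg=(255, 255, 255), fg=(0, 0, 0)):
    img = Image.new("RGB", size, bg)                 # background color
    font = ImageFont.truetype(font_path, font_size)  # font and font size
    ImageDraw.Draw(img).text((4, 4), text, font=font, fill=fg)
    return img

render_text("text sample 123").save("sample_0.png")
```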
Finally, perform data enhancement on the sample data. Because the previously generated data samples have a single format and contain many similar, repeated samples, it is necessary to expand the data with traditional OpenCV image processing. Data enhancement can therefore be performed by adding random noise, dilation and erosion, channel changes, and the like, to increase the variety and richness of the sample data.
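The enhancement step could look like the following sketch, where the noise level, kernel size, and branch probability are assumed values:
```python
import cv2
import numpy as np

# Augment one HxWx3 uint8 sample with the operations named above:
# random noise, dilation/erosion, and a channel change (a shuffle here).
def augment(img: np.ndarray) -> np.ndarray:
    # add Gaussian noise
    noise = np.random.normal(0, 10, img.shape)
    out = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # random dilation or erosion with a small kernel
    kernel = np.ones((2, 2), np.uint8)
    if np.random.rand() < 0.5:
        out = cv2.dilate(out, kernel, iterations=1)
    else:
        out = cv2.erode(out, kernel, iterations=1)
    # channel change: shuffle the three color channels
    return out[:, :, np.random.permutation(3)]
```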
Step S13: construct a generative adversarial network model for manufacturing sample images, and add a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images. Step S13 in turn includes the sub-steps shown in Fig. 2.
Step S131: construct a discriminator network for judging whether a sample is real or fake. The discriminator network judges the real or fake attribute of a sample and takes the output of the generator as its input. The hierarchical structure of the network can be designed according to the number of classes and the complexity; it generally comprises several convolution layers, each layer normalized to accelerate network convergence, and finally an activation function yields the classification result for each sample.
Step S132: construct a generator network for outputting generated samples. The generator network is the reverse of the discriminator network: taking a noise vector of digital data as input, it outputs various generated samples through deconvolution (transposed convolution) layers.
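A block-level sketch of the two sub-networks is given below. The patent describes them only at this block level, so PyTorch, the channel counts, and the 16×16 output resolution are all assumptions for illustration:
```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Noise vector -> fake image via transposed convolutions."""
    def __init__(self, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(noise_dim, 128, 4, 1, 0),
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),
            nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),  # 16x16 fake image
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Image -> real/fake probability via normalized convolutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 1, 4, 1, 0), nn.Sigmoid(),  # final activation -> score
        )

    def forward(self, x):
        return self.net(x).view(-1)
```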
Step S133: build a generative adversarial network model comprising the generator network and the discriminator network.
Specifically, a generative adversarial network (GAN) consists of a generator and a discriminator. The generator tries to make its generated samples as hard as possible to tell apart from real ones, while the discriminator fully learns the law of the data and tries as hard as possible to identify whether samples produced by the generator are real or fake; through continuous learning and interaction, the two are kept in a balanced state.
Before the generative adversarial network of this embodiment is used, its inputs are defined, including the real images and the generated fake images, the learning rate parameters of the network, the class data, the image normalization size, and the like. To facilitate understanding, the generative adversarial network of this embodiment is further explained with reference to Fig. 3.
Fig. 3 shows a schematic structural diagram of the generative adversarial network in this embodiment. The generative adversarial network includes a Generator Network 31, a Discriminator Network 32, and a Spatial Transformer Network 33. The Generator Network 31 takes a D-dimensional noise vector (D-dimension Noise Vector) as input and outputs fake images (Fake Images); the Discriminator Network 32 takes as inputs the real images (Real Images) and the fake images output by the Generator Network 31, respectively; and a prediction value (Predicted Labels) for judging whether an image is real or fake is output based on the Discriminator Network 32 and the Spatial Transformer Network 33.
Step S134: add a spatial transformer network to the generative adversarial network model as a conditional constraint. That is, to increase the applicability of the samples to rotated and warped samples, this embodiment introduces a spatial transformer network as a conditional constraint on the generator network.
Specifically, an affine transformation is a spatial coordinate transformation of an image in the form of cropping, translation, scaling, or rotation. For an input original image U, a localization network outputs the parameters θ of an affine transformation matrix; using the transformation matrix and the coordinates of each pixel of the output image, the position of that pixel in the original image is derived in reverse, and the output image corresponding to the input is generated by bilinear interpolation or the like. In this way, the spatial position information of warped text images is learned, and the class diversity of the samples output by the generator network is increased.
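The patent states this mapping only in words; written out, it is the standard affine sampling-grid relation of a spatial transformer (a reconstruction, with the usual STN notation assumed): for each output pixel, the source coordinates in U are
```latex
\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix}
= A_{\theta} \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix}
= \begin{bmatrix}
    \theta_{11} & \theta_{12} & \theta_{13} \\
    \theta_{21} & \theta_{22} & \theta_{23}
  \end{bmatrix}
  \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix},
```
where (x^t, y^t) are the coordinates of an output pixel in V, (x^s, y^s) are the sampled coordinates in U, and the value at (x^s, y^s) is read off by bilinear interpolation.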
As shown in Fig. 4A, U is the input image; the spatial transformer network (Spatial Transformer Network) processes the input image to obtain an affine transformation matrix and outputs the image V. The spatial transformer network consists of a localization network (localization net), a grid generator (grid generator), and a sampler (sampler). The localization network takes the image U as input and outputs the affine transformation parameters θ; the grid generator outputs a parameterized sampling grid, i.e., the sampling positions through which the input map produces the expected transformed output; and the sampler performs bilinear sampling over the input image U and the parameterized sampling grid and outputs the image V.
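The three components map directly onto code. The sketch below assumes PyTorch and single-channel inputs of at least 16×16; the localization architecture is an illustrative assumption, with `affine_grid` and `grid_sample` playing the roles of the grid generator and the bilinear sampler:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Localization net -> theta; grid generator + bilinear sampler -> V."""
    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(1, 8, 7, padding=3), nn.MaxPool2d(2), nn.ReLU(True),
            nn.Conv2d(8, 10, 5, padding=2), nn.MaxPool2d(2), nn.ReLU(True),
            nn.AdaptiveAvgPool2d(3),  # fixed-size features for any input size
        )
        self.fc_theta = nn.Linear(10 * 3 * 3, 6)  # the 2x3 affine matrix
        # start from the identity transform
        self.fc_theta.weight.data.zero_()
        self.fc_theta.bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, U):
        theta = self.fc_theta(self.localization(U).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, U.size(), align_corners=False)  # grid generator
        return F.grid_sample(U, grid, align_corners=False)          # bilinear sampler
```
Wired together as in Fig. 3, and reusing the Generator and Discriminator sketches above, both real and generated images pass through the spatial transformer before the discriminator outputs its predicted labels:
```python
G, D, stn = Generator(), Discriminator(), SpatialTransformer()

z = torch.randn(8, 100)                 # D-dimensional noise vectors
fake_images = G(z)                      # 8 x 1 x 16 x 16 fake images
real_images = torch.rand(8, 1, 16, 16)  # stand-in for a real batch

pred_fake = D(stn(fake_images))  # predicted labels for fake images
pred_real = D(stn(real_images))  # predicted labels for real images
```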
The resulting effect is shown in Fig. 4B: the left side is the input original image I, the right side is the corrected image I', and a pixel P' in the corrected image I' is mapped by the correction matrix T to the corresponding pixel P in the original image I.
Step S135: construct the loss function of the generative adversarial network model, which includes the loss function when a real image is used for discrimination and the loss function when a generated image is used for discrimination.
The loss function of a generative adversarial network typically includes a generator loss function and a discriminator loss function, where the discriminator loss is composed of the loss on real images and the loss on generated images. Because both real and generated images are fed into the discriminator for judgment, label smoothing is used to set the probability parameters, so that the output probability of each class is approximately label_smooth_probability = p × (1 - ε) + ε / K, where K is the number of classes and ε is the smoothing coefficient. Each class of generated samples thus keeps a certain confidence probability instead of an absolute label, which prevents overfitting and, to some extent, keeps the final generated samples from over-concentrating on a single feature distribution.
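Read concretely, the smoothing step amounts to the following sketch, with an assumed coefficient ε = 0.1:
```python
import torch

def smooth_labels(one_hot: torch.Tensor, epsilon: float = 0.1) -> torch.Tensor:
    """Soften one-hot targets so each of the K classes keeps some mass."""
    K = one_hot.size(-1)  # number of classes
    return one_hot * (1 - epsilon) + epsilon / K

# e.g. a 3-class one-hot label [1, 0, 0] becomes roughly [0.933, 0.033, 0.033]
print(smooth_labels(torch.tensor([[1.0, 0.0, 0.0]])))
```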
Step S136: introduce a penalty strength parameter into the loss function to increase the weight of a first loss value when the first loss value is high while a real image is used for discrimination, and to increase the weight of a second loss value when the second loss value is high while a generated image is used for discrimination.
To further reduce the network loss value, a penalty strength parameter q with value range (0, 1) is introduced at the stage where the network computes the loss. For example, if the loss value is high when real images are used for discrimination during training, the penalty strength q can be increased so that the discriminator pays more attention to fitting the feature dimensions of the real samples; conversely, when the loss value is high for generated samples, q can be decreased so that the penalty coefficient (1 - q) increases and the network pays more attention to the features of the generated samples. In this way the degree of fitting to real samples during training can be adjusted manually, controlling to some extent how closely the generated samples resemble the real ones. The loss function is approximately expressed as:
Loss_total = q × Loss_true_images + (1 - q) × Loss_false_image; (Formula 1)
where q denotes the penalty strength; (1 - q) denotes the penalty coefficient; Loss_true_images denotes the loss value when real images are used for discrimination; and Loss_false_image denotes the loss value when generated samples are used for discrimination.
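Formula 1 translates directly into code; the binary cross-entropy form of the two branch losses is an assumption (the patent only names them):
```python
import torch
import torch.nn.functional as F

def discriminator_loss(d_real: torch.Tensor, d_fake: torch.Tensor,
                       q: float = 0.5) -> torch.Tensor:
    """Loss_total = q * Loss_true_images + (1 - q) * Loss_false_image."""
    loss_true_images = F.binary_cross_entropy(
        d_real, torch.ones_like(d_real))    # judging real images
    loss_false_image = F.binary_cross_entropy(
        d_fake, torch.zeros_like(d_fake))   # judging generated images
    return q * loss_true_images + (1 - q) * loss_false_image
```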
The deep learning-based data manufacturing method of the present invention is widely applicable to creating text-type data; after targeted data preparation and improvement, it may also be applied to specific data manufacturing scenarios such as persons or landscapes, and the present invention is not limited thereto.
According to the method, on the basis of manufacturing data samples in a traditional image processing manner, a spatial transformer network is added to the generative adversarial network, so that the manufactured samples better learn information such as their spatial position, improving how well the samples fit warping, rotation, and jitter, and bringing the learned result closer to real samples. Meanwhile, a penalty parameter q is introduced when the network loss value is calculated, so that the distribution of real-sample features in the generated samples can be adjusted manually and dynamically according to the distribution of the generated samples.
It should be noted that the deep learning-based text data manufacturing method provided by the invention can be applied to various hardware devices, for example ARM (Advanced RISC Machines) controllers, FPGA (Field Programmable Gate Array) controllers, SoC (System on Chip) controllers, DSP (Digital Signal Processing) controllers, MCU (Micro Controller Unit) controllers, and the like. In some embodiments, the hardware device may also be a computer that includes components such as memory, a storage controller, one or more processing units (CPU), peripheral interfaces, RF circuitry, audio circuitry, a speaker, a microphone, an input/output (I/O) subsystem, a display screen, other output or control devices, and external ports; such computers include, but are not limited to, personal computers such as desktop computers, notebook computers, tablet computers, smartphones, smart televisions, Personal Digital Assistants (PDAs), and the like. In other embodiments, the hardware device may also be a server, where the server may be arranged on one or more physical servers according to factors such as function and load, or may be formed by a distributed or centralized server cluster; this embodiment is not limited in this respect.
Example two
Fig. 5 is a schematic structural diagram of a deep learning-based text data manufacturing apparatus according to an embodiment of the present invention. The text data manufacturing apparatus of this embodiment includes a preprocessing module 51, an image processing module 52, and a model construction module 53.
Specifically, the preprocessing module 51 is configured to preprocess raw character data to generate corresponding character text; the image processing module 52 is configured to perform image processing on the character text to generate a corresponding text image; and the model construction module 53 is configured to construct a generative adversarial network model for manufacturing sample images and to add a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images.
It should be noted that the implementation of the deep learning-based text data manufacturing apparatus of this embodiment is similar to that of the deep learning-based text data manufacturing method of Example one, so the details are not repeated here.
It should be understood that the division of the above apparatus into modules is only a logical division; in actual implementation, the modules may be wholly or partially integrated into one physical entity or may be physically separate. The modules can be realized as software invoked by a processing element, entirely as hardware, or partly as software invoked by a processing element and partly as hardware. For example, the preprocessing module may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus in the form of program code whose function is invoked and executed by a processing element of the apparatus; the other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability; in implementation, each step of the above method, or each module above, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
EXAMPLE III
Fig. 6 is a schematic structural diagram of an electronic terminal according to an embodiment of the present invention. This example provides an electronic terminal, including: a processor 61, a memory 62, and a communicator 63. The memory 62 is connected with the processor 61 and the communicator 63 through a system bus to complete mutual communication; the memory 62 is used for storing a computer program, the communicator 63 is used for communicating with other devices, and the processor 61 is used for running the computer program so that the electronic terminal executes the steps of the above deep learning-based text data manufacturing method.
The above-mentioned system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus. The communication interface is used to enable communication between the database access device and other equipment (such as a client, a read-write library, and a read-only library). The memory may include Random Access Memory (RAM) and may further include non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Example four
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the deep learning-based text data manufacturing method.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware associated with a computer program. The aforementioned computer program may be stored in a computer-readable storage medium; when executed, it performs the steps comprising the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
In summary, the present application provides a deep learning-based text data manufacturing method, apparatus, terminal, and storage medium. On the basis of manufacturing data samples in a traditional image processing manner, the method adds a spatial transformer network to the generative adversarial network, so that the manufactured samples better learn information such as their spatial position, improving how well the samples fit warping, rotation, and jitter and bringing the learned result closer to real samples. Meanwhile, a penalty parameter q is introduced when the network loss value is calculated, so that the distribution of real-sample features in the generated samples can be adjusted manually and dynamically according to the distribution of the generated samples. The application therefore effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A deep learning-based text data manufacturing method, characterized by comprising the following steps:
preprocessing raw character data to generate corresponding character text;
performing image processing on the character text to generate a corresponding text image;
and constructing a generative adversarial network model for manufacturing sample images, and adding a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images.
2. The method of claim 1, wherein preprocessing the raw character data to generate corresponding character text comprises:
generating a character file recording a correspondence table between Chinese and English characters;
expanding the character file to generate character files of different language versions;
and collecting character files of different font styles for each language version.
3. The method of claim 1, wherein performing image processing on the character text to generate a corresponding text image comprises:
defining input parameters, the input parameters including any one or a combination of input and output directories, character and font file directories, image size, rotation angle settings, and generation quantity ratio;
reading the character file with a Chinese character generation function and presetting font format parameters to generate sample data;
and performing data enhancement processing on the sample data.
4. The method of claim 3, wherein the data enhancement processing of the sample data comprises:
adding any one or more of random noise, dilation and erosion, and channel changes.
5. The method of claim 1, wherein constructing a generative adversarial network model for manufacturing sample images, and adding a spatial transformer network as a constraint condition to the constructed model so that the manufactured sample images learn the spatial position information of warped text images, comprises:
constructing a discriminator network for judging the real or fake attribute of a sample;
constructing a generator network for outputting generated samples;
constructing a generative adversarial network model comprising the generator network and the discriminator network;
adding a spatial transformer network to the generative adversarial network model as a conditional constraint;
constructing a loss function of the generative adversarial network model, the loss function comprising a loss function when a real image is used for discrimination and a loss function when a generated image is used for discrimination;
and introducing a penalty strength parameter into the loss function to increase the weight of a first loss value when the first loss value is high while a real image is used for discrimination, and to increase the weight of a second loss value when the second loss value is high while a generated image is used for discrimination.
6. The method of claim 5, wherein the generative adversarial network model comprises:
a generator network that takes a D-dimensional noise vector as input and generates and outputs a corresponding fake image;
a discriminator network that takes as inputs a real image and the fake image output by the generator network, respectively;
and a spatial transformer network;
wherein the generative adversarial network model outputs a prediction value for predicting whether an image is real or fake based on the discriminator network and the spatial transformer network.
7. The method of claim 5, comprising:
after the input image is processed by the spatial transformer network, obtaining an affine transformation matrix through calculation, and outputting the transformed image based on the affine transformation matrix.
8. A deep learning-based text data manufacturing apparatus, characterized by comprising:
a preprocessing module for preprocessing raw character data to generate corresponding character text;
an image processing module for performing image processing on the character text to generate a corresponding text image;
and a model construction module for constructing a generative adversarial network model for manufacturing sample images and adding a spatial transformer network to the constructed model as a constraint condition, so that the manufactured sample images learn the spatial position information of warped text images.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the deep learning-based text data manufacturing method according to any one of claims 1 to 7.
10. An electronic terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory to cause the terminal to execute the text data manufacturing method based on deep learning according to any one of claims 1 to 7.
CN202010355494.0A 2020-04-29 2020-04-29 Text data manufacturing method and device based on deep learning, terminal and storage medium Pending CN111539184A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010355494.0A CN111539184A (en) 2020-04-29 2020-04-29 Text data manufacturing method and device based on deep learning, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010355494.0A CN111539184A (en) 2020-04-29 2020-04-29 Text data manufacturing method and device based on deep learning, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111539184A true CN111539184A (en) 2020-08-14

Family

ID=71977521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010355494.0A Pending CN111539184A (en) 2020-04-29 2020-04-29 Text data manufacturing method and device based on deep learning, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111539184A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190251401A1 (en) * 2018-02-15 2019-08-15 Adobe Inc. Image composites using a generative adversarial neural network
CN108399408A (en) * 2018-03-06 2018-08-14 李子衿 A kind of deformed characters antidote based on deep space converting network
CN110334806A (en) * 2019-05-29 2019-10-15 广东技术师范大学 A kind of confrontation sample generating method based on production confrontation network
CN110348475A (en) * 2019-05-29 2019-10-18 广东技术师范大学 It is a kind of based on spatial alternation to resisting sample Enhancement Method and model
CN110427799A (en) * 2019-06-12 2019-11-08 中国地质大学(武汉) Based on the manpower depth image data Enhancement Method for generating confrontation network
CN110443293A (en) * 2019-07-25 2019-11-12 天津大学 Based on double zero sample image classification methods for differentiating and generating confrontation network text and reconstructing
CN110458918A (en) * 2019-08-16 2019-11-15 北京百度网讯科技有限公司 Method and apparatus for output information
CN110930469A (en) * 2019-10-25 2020-03-27 北京大学 Text image generation method and system based on transition space mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曹仰杰; 贾丽丽; 陈永霞; 林楠; 李学相: "A survey of generative adversarial networks and their applications in computer vision" *

Similar Documents

Publication Publication Date Title
US10424072B2 (en) Leveraging multi cues for fine-grained object classification
WO2021129181A1 (en) Portrait segmentation method, model training method and electronic device
US10282059B2 (en) Graphical object appearance-invariant signature
Yang et al. Efficient and robust multiview clustering with anchor graph regularization
JP2008537198A (en) Intelligent import of information from a foreign application user interface using artificial intelligence
US20230206670A1 (en) Semantic representation of text in document
WO2022037299A1 (en) Abnormal behavior detection method and apparatus, and electronic device and computer-readable storage medium
CN112241789A (en) Structured pruning method, device, medium and equipment for lightweight neural network
CN116385707A (en) Deep learning scene recognition method based on multi-scale features and feature enhancement
CN112597940A (en) Certificate image recognition method and device and storage medium
CN110717405A (en) Face feature point positioning method, device, medium and electronic equipment
US20240012966A1 (en) Method and system for providing a three-dimensional computer aided-design (cad) model in a cad environment
CN113516029A (en) Image crowd counting method, device, medium and terminal based on partial annotation
CN110765917A (en) Active learning method, device, terminal and medium suitable for face recognition model training
CN111539184A (en) Text data manufacturing method and device based on deep learning, terminal and storage medium
CN111767710B (en) Indonesia emotion classification method, device, equipment and medium
CN111710011B (en) Cartoon generation method and system, electronic device and medium
Huang et al. A Novel Face Super‐Resolution Method Based on Parallel Imaging and OpenVINO
CN112884046A (en) Image classification method and device based on incomplete supervised learning and related equipment
CN112036501A (en) Image similarity detection method based on convolutional neural network and related equipment thereof
CN117408259B (en) Information extraction method, device, computer equipment and storage medium
CN111597375B (en) Picture retrieval method based on similar picture group representative feature vector and related equipment
Tan et al. Chinese handwriting generation by neural network based style transformation
Xie et al. Study and application of semantic-based image retrieval
He et al. Color Image Mosaic Detection Algorithm Based on Cascaded Multiscale Residual Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200814)