CN111753859B - Sample generation method, device and equipment - Google Patents

Sample generation method, device and equipment

Info

Publication number
CN111753859B
CN111753859B (application CN201910233792.XA)
Authority
CN
China
Prior art keywords
vector
neural network
feature
fusion
layer
Prior art date
Legal status
Active
Application number
CN201910233792.XA
Other languages
Chinese (zh)
Other versions
CN111753859A (en)
Inventor
张鹏 (Zhang Peng)
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910233792.XA priority Critical patent/CN111753859B/en
Publication of CN111753859A publication Critical patent/CN111753859A/en
Application granted granted Critical
Publication of CN111753859B publication Critical patent/CN111753859B/en


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation > G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/045 Combinations of networks
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/08 Learning methods


Abstract

The invention provides a sample generation method, a device and equipment. The sample generation method comprises the following steps: acquiring a feature description vector of a specified standard word, wherein the feature description vector is used for indicating the content of the specified standard word; and converting the specified standard word into a target sample by using the feature description vector and a specified non-standard feature vector, wherein the style corresponding to the target sample is the same as the style represented by the non-standard feature vector. A sample of a required font style can thus be generated without collecting character images in that font style, which improves sample generation efficiency.

Description

Sample generation method, device and equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a device for generating a sample.
Background
With the development of science and technology, deep learning algorithms perform excellently in tasks such as classification, detection and recognition. This performance, however, depends on several factors, including improvements in computing power and a large number of training samples; the training samples are the "fuel" of algorithm development and an indispensable part of it. Text recognition likewise requires a large number of samples containing characters for training.
In a related sample generation mode, character images are pasted onto background images and synthesized into samples. In real scenes, the fonts of text characters are diverse, so in order for an algorithm to accurately recognize text characters in real scenes, samples in the various font styles required for training need to be generated.
Disclosure of Invention
In view of this, the invention provides a sample generation method, device and equipment, which can generate samples of a required font style without collecting character images in that font style, thereby improving sample generation efficiency.
The first aspect of the present invention provides a sample generation method, including:
acquiring a feature description vector of a specified standard word, wherein the feature description vector is used for indicating the content of the specified standard word;
and converting the specified standard word into a target sample by using the feature description vector and the specified non-standard feature vector, wherein the style corresponding to the target sample is the same as the style represented by the non-standard feature vector.
According to one embodiment of the present invention, the obtaining the feature description vector of the specified standard word includes:
Inputting the first image containing the specified standard words into a first neural network in a trained student network, and extracting features of the input first image by the first neural network to obtain feature description vectors.
According to one embodiment of the present invention, the feature extraction of the first image by the first neural network to obtain a feature description vector includes:
and the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
According to one embodiment of the present invention, converting a specified standard word into a target sample using the feature description vector and a specified non-standard feature vector includes:
inputting the feature description vector and the nonstandard feature vector into a second neural network in a trained student network, so that the feature description vector and the nonstandard feature vector are fused by the second neural network to obtain a fusion vector, and generating a second image by using the fusion vector;
the second image is determined as the target sample.
According to one embodiment of the invention, the second neural network comprises a fusion layer; the feature description vector is the same as the non-standard feature vector in dimension;
the second neural network fusing the feature description vector and the non-standard feature vector to obtain a fusion vector includes:
and the second neural network performs superposition processing on the feature description vector and the nonstandard feature vector by using a fusion layer to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fully connected layer and a fusion layer; the feature description vector is different from the non-standard feature vector in dimension;
the second neural network fusing the feature description vector and the non-standard feature vector to obtain a fusion vector includes:
the second neural network maps the nonstandard feature vector into a reference vector with the same dimension as the feature description vector by using a full connection layer;
and the second neural network performs superposition processing on the feature description vector and the reference vector by using a fusion layer to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fusion layer;
The second neural network fusing the feature description vector and the non-standard feature vector to obtain a fusion vector includes:
and the second neural network utilizes a fusion layer to combine the feature description vector and the nonstandard feature vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network generating a second image using the fusion vector includes:
the second neural network generates a second image corresponding to the fusion vector by using the deconvolution layer and a second nonlinear transformation layer.
According to one embodiment of the invention, the student network is trained under the supervision of a trained teacher network;
network parameters of at least one layer in the second neural network apply network parameters of a corresponding layer in the teacher network.
A second aspect of the present invention provides a sample generating device comprising:
the characteristic description vector acquisition module is used for acquiring a characteristic description vector of a specified standard word, wherein the characteristic description vector is used for indicating the content of the specified standard word;
And the target sample generation module is used for converting the specified standard word into a target sample by utilizing the feature description vector and the specified non-standard feature vector, and the style corresponding to the target sample is the same as the style represented by the non-standard feature vector.
According to one embodiment of the present invention, the feature description vector acquisition module is specifically configured to:
inputting the first image containing the specified standard words into a first neural network in a trained student network, and extracting features of the input first image by the first neural network to obtain feature description vectors.
According to one embodiment of the present invention, the feature extraction of the first image by the first neural network to obtain a feature description vector includes:
and the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
According to one embodiment of the invention, the target sample generation module comprises:
the image generation unit is used for inputting the feature description vector and the nonstandard feature vector into a second neural network in the trained student network so as to fuse the feature description vector and the nonstandard feature vector by the second neural network to obtain a fusion vector, and generating a second image by using the fusion vector;
And a target sample determining unit configured to determine the second image as the target sample.
According to one embodiment of the invention, the second neural network comprises a fusion layer; the feature description vector is the same as the non-standard feature vector in dimension;
the second neural network is specifically configured to, when fusing the feature description vector and the non-standard feature vector to obtain a fused vector:
and the second neural network performs superposition processing on the feature description vector and the nonstandard feature vector by using a fusion layer to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fully connected layer and a fusion layer; the feature description vector is different from the non-standard feature vector in dimension;
the second neural network is specifically configured to, when fusing the feature description vector and the non-standard feature vector to obtain a fused vector:
the second neural network maps the nonstandard feature vector into a reference vector with the same dimension as the feature description vector by using a full connection layer;
and the second neural network performs superposition processing on the feature description vector and the reference vector by using a fusion layer to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fusion layer;
the second neural network is specifically configured to, when fusing the feature description vector and the non-standard feature vector to obtain a fused vector:
and the second neural network utilizes a fusion layer to combine the feature description vector and the nonstandard feature vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network is specifically configured to, when generating a second image using the fusion vector:
the second neural network generates a second image corresponding to the fusion vector by using the deconvolution layer and a second nonlinear transformation layer.
According to one embodiment of the invention, the student network is trained under the supervision of a trained teacher network;
network parameters of at least one layer in the second neural network apply network parameters of a corresponding layer in the teacher network.
A third aspect of the invention provides an electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the sample generation method as described in the foregoing embodiments.
A fourth aspect of the present invention provides a machine-readable storage medium, having stored thereon a program which, when executed by a processor, implements a sample generation method as described in the previous embodiments.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, the specified standard word can be converted into a target sample of a given style by using the feature description vector indicating the content of the specified standard word and the non-standard feature vector representing the style. Character images in that style do not need to be collected to synthesize samples, which improves sample generation efficiency, and samples containing different text contents in various styles can be generated as needed, achieving sample diversity.
Drawings
FIG. 1 is a flow chart of a sample generation method according to an embodiment of the invention;
FIG. 2 is a block diagram showing the structure of a sample generating apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a connection structure of a first neural network and a second neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a training pattern of a first neural network and a second neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another training mode of the first neural network and the second neural network according to an embodiment of the present invention;
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various devices, these devices should not be limited by these terms. These terms are only used to distinguish one device from another of the same type. For example, a first device could also be termed a second device, and, similarly, a second device could also be termed a first device, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In order to make the description of the present invention clearer and more concise, some technical terms of the present invention are explained below:
Neural network: a technique that simulates, in the abstract, features of brain structure. A network system is formed by the complex connection of a great number of simple functions and can fit extremely complex functional relations; it generally includes convolution/deconvolution operations, activation operations, pooling operations, addition, subtraction, multiplication and division, channel merging and element rearrangement. The network is trained with specific input and output data, the connections within it are adjusted, and the neural network thereby learns to fit the mapping between inputs and outputs.
The sample generation method according to the embodiment of the present invention is described in more detail below, but is not limited thereto. In one embodiment, referring to fig. 1, a sample generation method may include the steps of:
s100: acquiring a characteristic description vector C of a specified standard word, wherein the characteristic description vector C is used for indicating the content of the specified standard word;
s200: and converting the specified standard word into a target sample by using the feature description vector C and the specified non-standard feature vector S, wherein the style corresponding to the target sample is the same as the style represented by the non-standard feature vector S.
The execution subject of the sample generation method of the embodiment of the invention may be an electronic device, and more specifically may be a processor of the electronic device. The electronic device may be, for example, a computer device or an embedded device, and the specific type is not limited as long as it has data processing capability.
In step S100, a feature description vector C of a specified standard word is obtained, where the feature description vector C is used to indicate the content of the specified standard word.
The font of the specified standard word may be, for example, a Song typeface, boldface, or the like; the specific font is not limited, as long as the content of the specified standard word is the text content required for the sample. Before the feature description vector C of the specified standard word is acquired, the specified standard word may be obtained from a font library of the corresponding font. After the specified standard word is obtained, feature extraction may be performed on it to obtain a feature description vector C describing its content.
The feature extraction may be performed on the specified standard word by a feature extraction algorithm; the algorithm is not particularly limited and may be, for example, an LBP feature extraction algorithm, an HOG feature extraction algorithm, or a SIFT feature extraction operator, and feature extraction may also be realized by deep learning.
The specified standard word is any standard word in a specified font library. Common font libraries are typically configured by default in a computer device, or may be downloaded from a network, and the specified font library may be any one of them. Taking the example that the specified font library is a Song-typeface library containing more than 20,000 Song-typeface characters, the specified standard word can be any one of those characters, according to the text content required for the sample. If the embodiments of the present invention are employed to generate a corresponding sample for each of the 20,000-plus Song-typeface characters, more than 20,000 samples containing different text contents in the desired style can be generated.
If N1 standard words exist in the specified font library, N1 samples containing different text contents in the required style can be obtained through conversion; the text content of each sample is identical to that of the corresponding standard word, only the style differs. Therefore, in the embodiment of the invention, many samples of a required style can be generated easily, which overcomes the present scarcity of samples in certain font styles, such as the font styles found in calligraphic works.
In step S200, the specified standard word is converted into a target sample by using the feature description vector C and the specified non-standard feature vector S, where the style corresponding to the target sample is the same as the style represented by the non-standard feature vector S.
The style represented by the non-standard feature vector S (referred to simply as the target style) may be a common or unusual handwriting style, for example a font style such as boldface, Liu-style script, or Mi-style script; the target style may even be the handwriting of a particular person. Generally speaking, each person's writing style differs to some degree, and each person's writing style can serve as a target style.
A plurality of non-standard feature vectors representing different styles may be preset, of which the non-standard feature vector S is one. If the total number of preset non-standard feature vectors is N2, and the specified standard word is converted once for each non-standard feature vector, N2 target samples containing the same text content in different styles are finally generated. The samples are thus more diverse and reflect more real scenes; training a neural network with such samples makes its text recognition results more accurate.
The coding form of each non-standard feature vector is not limited. For example, the coding can be performed according to the total number of styles N2, and the vector can be coded using one-hot coding.
Taking one-hot coding as an example, suppose N2 styles are to be generated, such as boldface, Liu-style, Mi-style, and so on. When a Liu-style sample needs to be generated, the non-standard feature vector S encodes the value 1 in the dimension corresponding to Liu-style (the 2nd dimension) and 0 in every other dimension, giving S = [0, 1, 0, …, 0]; when a Mi-style sample needs to be generated, S encodes the value 1 in the dimension corresponding to Mi-style (the 3rd dimension) and 0 elsewhere, giving S = [0, 0, 1, …, 0]; the other styles are handled similarly.
In combination with the foregoing, in the case where N1 standard words exist in the designated font library and N2 nonstandard feature vectors are encoded, the total number of target samples that can be generated is the product of N1 and N2, where the total number of styles of all target samples is N2, each style has N1 target samples and each target sample contains different text contents.
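As an illustration of the one-hot style coding and the N1 x N2 enumeration described above, the following is a minimal sketch in Python. The style names, the character list and the generate_sample call are hypothetical placeholders and are not taken from the patent.

```python
import numpy as np

# Hypothetical list of N2 target styles (e.g. boldface, Liu-style, Mi-style).
styles = ["bold", "liu", "mi"]
N2 = len(styles)

def one_hot_style(index, n_styles=N2):
    """Encode style `index` as a one-hot non-standard feature vector S."""
    s = np.zeros(n_styles, dtype=np.float32)
    s[index] = 1.0
    return s

# Hypothetical N1 standard words taken from a specified font library
# (in practice, e.g. the 20,000-plus characters of a Song-typeface library).
standard_words = ["eye", "jing"]

# Enumerating every (standard word, style) pair yields N1 * N2 target samples.
for word in standard_words:
    for style_index in range(N2):
        S = one_hot_style(style_index)
        # `generate_sample` stands for the student network described in this
        # document (first neural network + second neural network); it is a
        # placeholder, not an API defined by the patent.
        # target_sample = generate_sample(word, S)
```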
In the embodiment of the invention, the specified standard word can be converted into a target sample of a given style by using the feature description vector C indicating the content of the specified standard word and the non-standard feature vector S representing the style. Character images in that style do not need to be collected to synthesize samples, which improves sample generation efficiency, and samples containing different text contents in various styles can be generated as needed, achieving sample diversity.
In one embodiment, the above method flow may be performed by the sample generation apparatus 100, and as shown in fig. 2, the sample generation apparatus 100 may include 2 modules: a feature description vector acquisition module 101 and a target sample generation module 102. The feature description vector acquisition module 101 is configured to perform the above step S100, and the target sample generation module 102 is configured to perform the above step S200.
In one embodiment, in step S100, the obtaining the feature description vector C of the specified standard word includes:
inputting the first image containing the specified standard words into a first neural network in a trained student network, and extracting features of the input first image by the first neural network to obtain a feature description vector C.
The student network is pre-trained and can be pre-stored in the electronic device or stored in the external device, and the electronic device calls the first neural network in the student network when the method needs to be executed.
The first image may be obtained by collecting the specified standard words in the real scene, or may be obtained by format conversion of the specified standard words in the specified font library, and the specific mode is not limited. The first image may be preset in the electronic device, and the first image is acquired from the electronic device at the time of execution.
Some 20,000 Song-typeface characters (in ttf format) exist in the Song font library, and the first image can be obtained from these existing characters. For example, a Song-typeface character in the library can be converted directly from ttf format to an image format to obtain the first image; alternatively, the first image may be generated by fusing the Song-typeface character with background data, such as background data representing a white background.
And inputting the first image into a first neural network, and obtaining a feature description vector C of the specified standard word after the first neural network performs feature extraction on the first image. This function of the first neural network may be provided by training.
Specifically, referring to fig. 3, the first image is, for example, an image of size 64 x 64, the specified standard word in the first image is, for example, the Song-typeface character "eye", and the first neural network performs feature extraction on the first image to obtain a 512-dimensional feature description vector C indicating "eye".
In one embodiment, the feature extraction of the first image by the first neural network to obtain a feature description vector C includes:
and the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector C.
The first neural network may include a plurality of convolution layers; the convolution layers perform convolution operations, extract features from the first image to obtain a feature description vector, and output it to the first nonlinear transformation layer. The first nonlinear transformation layer enhances the fitting capacity of the neural network and outputs the fitted vector as the feature description vector. Of course, the layer structure of the first neural network is not limited to this and may include other layers, such as a pooling layer, which is a special downsampling layer that reduces the dimension of the feature description vector obtained by convolution.
The first neural network may be implemented, for example, using a convolutional neural network architecture such as VGG, Inception, or ResNet, but is not limited thereto. A convolutional neural network is a feedforward neural network whose neurons respond to surrounding units within a limited receptive field; it effectively extracts the structural information of an image through weight sharing and feature aggregation.
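A minimal sketch of such a first neural network is given below, assuming PyTorch as the framework (the patent does not name one); the layer counts, channel widths and the 64 x 64 input / 512-dimensional output follow the example in this description but are otherwise illustrative.

```python
import torch
import torch.nn as nn

class FirstNeuralNetwork(nn.Module):
    """Encoder: extracts a 512-d feature description vector C from a 64x64 grayscale image."""
    def __init__(self, out_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1),    # convolution layer, 64x64 -> 32x32
            nn.ReLU(inplace=True),                       # first nonlinear transformation layer
            nn.Conv2d(64, 128, 3, stride=2, padding=1),  # -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), # -> 8x8
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                     # pooling layer (dimension reduction)
        )
        self.proj = nn.Linear(256, out_dim)

    def forward(self, first_image):                      # first_image: (N, 1, 64, 64)
        x = self.features(first_image).flatten(1)        # (N, 256)
        return self.proj(x)                              # feature description vector C: (N, 512)
```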
In one embodiment, in step S200, converting the specified standard word into the target sample using the feature description vector C and the specified non-standard feature vector S includes:
s201: inputting the feature description vector C and the nonstandard feature vector S into a second neural network in a trained student network, so that the feature description vector C and the nonstandard feature vector S are fused by the second neural network to obtain a fusion vector T, and generating a second image by using the fusion vector T;
s202: the second image is determined as the target sample.
The student network is pre-trained and can be pre-stored in the electronic device or stored in the external device, and the electronic device recalls the second neural network in the student network when the method needs to be executed.
After the feature description vector C and the nonstandard feature vector S are input into the second neural network, the second neural network fuses the input feature description vector C and the nonstandard feature vector S to obtain a fusion vector T, and a second image is generated by using the fusion vector T. The second image is used as a target sample, the style of the second image is consistent with that of the nonstandard feature vector S, and the contained text content is consistent with that of the specified standard word.
Based on the training of the second neural network, the style of the second image can be specified by inputting different non-standard feature vectors, so the method can be applied to generating samples of different font styles. For example, if the style represented by the input non-standard feature vector S is Liu-style, the style of the generated second image is Liu-style; if the style represented by the input non-standard feature vector S is Mi-style, the style of the generated second image is Mi-style, and so on.
With continued reference to fig. 3, the first neural network performs feature extraction on the first image to obtain a 512-dimensional feature description vector C indicating "eye", and this feature description vector C is input into the second neural network together with the non-standard feature vector S. The style represented by the non-standard feature vector S is, for example, a specified style, and its dimension is, for example, 100. The second neural network fuses the input 512-dimensional feature description vector C and the 100-dimensional non-standard feature vector S into a fusion vector T, and generates from the fusion vector T a second image of size 64 x 64; the second image contains an "eye" in the specified style and is taken as the target sample.
In step S201, the second neural network may fuse the feature description vector C and the non-standard feature vector S into the fusion vector T in more than one way; for example, the following three implementations are possible:
In a first implementation, the second neural network includes a fusion layer; the dimension of the feature description vector C is the same as that of the nonstandard feature vector S;
the second neural network fusing the feature description vector C and the non-standard feature vector S to obtain the fusion vector T includes:
and the second neural network performs superposition processing on the feature description vector C and the nonstandard feature vector S by using a fusion layer to obtain the fusion vector T.
In this manner, the fusion layer is a calculation layer for performing a vector superposition process, and the feature description vector C and the nonstandard feature vector S may be subjected to the superposition process to obtain the fusion vector T.
The manner of the superposition processing may be weighted superposition, in which the feature description vector C and the non-standard feature vector S are weighted and summed dimension by dimension. For example, with C = (a1, a2, a3, …, a512) and S = (b1, b2, b3, …, b512), the fusion vector after weighted superposition is T = (a1·x1 + b1·y1, a2·x2 + b2·y2, a3·x3 + b3·y3, …, a512·x512 + b512·y512), where (x1, x2, x3, …, x512) are the weight coefficients applied to the values of the feature description vector C in each dimension, and (y1, y2, y3, …, y512) are the weight coefficients applied to the values of the non-standard feature vector S in each dimension.
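A sketch of this weighted superposition as a fusion layer, assuming both vectors are 512-dimensional and that the weight coefficients x and y are learnable parameters (how the weights are obtained is not specified in the patent):

```python
import torch
import torch.nn as nn

class WeightedFusionLayer(nn.Module):
    """Fusion layer: element-wise weighted sum of C and S (same dimension)."""
    def __init__(self, dim=512):
        super().__init__()
        self.x = nn.Parameter(torch.ones(dim))  # weights for feature description vector C
        self.y = nn.Parameter(torch.ones(dim))  # weights for non-standard feature vector S

    def forward(self, c, s):
        # T_i = c_i * x_i + s_i * y_i for every dimension i
        return c * self.x + s * self.y
```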
In a second implementation, the second neural network includes a full connection layer and a fusion layer; the dimension of the feature description vector C is different from that of the nonstandard feature vector S;
the second neural network fusing the feature description vector C and the non-standard feature vector S to obtain the fusion vector T includes:
the second neural network maps the nonstandard feature vector S into a reference vector K with the same dimension as the feature description vector C by using a full connection layer;
and the second neural network performs superposition processing on the feature description vector C and the reference vector K by using a fusion layer to obtain the fusion vector T.
The fully connected layer is a calculation layer for performing vector dimension mapping. For example, if the non-standard feature vector S is 100-dimensional and the feature description vector C is 512-dimensional, the fully connected layer can map the non-standard feature vector S into a 512-dimensional reference vector K, realizing the dimension expansion. The fusion layer is a calculation layer for performing vector superposition, and the feature description vector C and the reference vector K may be superposed to obtain the fusion vector T; the superposition is similar to that of the first implementation and is not repeated here.
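A sketch of this second implementation, reusing the WeightedFusionLayer from the previous sketch and assuming a 100-dimensional non-standard feature vector mapped to 512 dimensions (both numbers come from the example above):

```python
import torch.nn as nn

class MapAndFuse(nn.Module):
    """Fully connected layer maps S (100-d) to a reference vector K (512-d), then fuses K with C."""
    def __init__(self, style_dim=100, feat_dim=512):
        super().__init__()
        self.fc = nn.Linear(style_dim, feat_dim)    # fully connected layer: dimension mapping
        self.fuse = WeightedFusionLayer(feat_dim)   # fusion layer from the previous sketch

    def forward(self, c, s):
        k = self.fc(s)            # reference vector K, same dimension as C
        return self.fuse(c, k)    # fusion vector T
```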
In a third implementation, the second neural network includes a fusion layer;
the second neural network fusing the feature description vector C and the non-standard feature vector S to obtain the fusion vector T includes:
and the second neural network utilizes a fusion layer to combine the feature description vector C and the nonstandard feature vector S to obtain the fusion vector T.
This implementation is particularly suitable when the feature description vector C and the non-standard feature vector S have different dimensions, although it is also applicable when the dimensions are the same.
In this manner, the fusion layer is a calculation layer for performing vector merging, and the feature description vector C and the non-standard feature vector S may be merged to obtain the fusion vector T.
Vector merging splices the two vectors along the dimension axis, so the dimension of the merged vector is the sum of the dimensions of the two vectors. For example, with C = (a1, a2, a3, …, a512) and S = (b1, b2, b3, …, b100), the merged vector is T = (a1, a2, a3, …, a512, b1, b2, b3, …, b100).
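A sketch of this merging-based fusion layer; with the dimensions from the example, C is 512-dimensional, S is 100-dimensional, and T is 612-dimensional:

```python
import torch

def concat_fusion(c, s):
    """Fusion layer as vector merging: dim(T) = dim(C) + dim(S)."""
    # c: (N, 512), s: (N, 100)  ->  t: (N, 612)
    return torch.cat([c, s], dim=1)
```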
In one embodiment, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
The second neural network generating a second image using the fusion vector T includes:
the second neural network generates a second image corresponding to the fusion vector T by using the deconvolution layer and a second nonlinear transformation layer.
The second neural network may include multiple deconvolution layers; the deconvolution layers perform deconvolution operations, generate a second image from the fusion vector T, and output it to the second nonlinear transformation layer. The second nonlinear transformation layer likewise enhances the fitting capacity of the neural network and outputs the fitted second image. Of course, the layer structure of the second neural network is not limited to this and may include other layers, such as a fully connected layer that maps the dimension of the input vector to a higher dimension; the fully connected layer may also be replaced by a convolution layer.
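A minimal sketch of the image-generation part of the second neural network, assuming the fusion vector T is first mapped by a fully connected layer to a small spatial feature map and then upsampled by deconvolution layers to a 64 x 64 image; the layer sizes and the final Tanh output activation are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn as nn

class SecondNetworkDecoder(nn.Module):
    """Generates a 64x64 second image (the target sample) from the fusion vector T."""
    def __init__(self, t_dim=512):
        super().__init__()
        self.fc = nn.Linear(t_dim, 256 * 8 * 8)  # map T to an 8x8 feature map
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # deconvolution layer -> 16x16
            nn.ReLU(inplace=True),                                 # second nonlinear transformation layer
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # -> 32x32
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),     # -> 64x64
            nn.Tanh(),
        )

    def forward(self, t):
        x = self.fc(t).view(-1, 256, 8, 8)
        return self.deconv(x)  # second image
```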
In one embodiment, the student network is trained under the supervision of a trained teacher network;
network parameters of at least one layer in the second neural network apply network parameters of a corresponding layer in the teacher network.
In one training manner, a teacher network is trained first, and the training of the first neural network and the second neural network is then supervised by the trained teacher network. In this training manner, the connection structure of the first neural network and the second neural network serves as one student network.
Referring to fig. 4, a teacher network is trained before the student network is trained, where the teacher network includes a first neural network A1 and a second neural network A2. The layer structure of the first neural network A1 may be the same as that of the first neural network in the student network; the layer structure of the second neural network A2 may be similar to that of the second neural network in the student network, except that A2 does not need to perform vector fusion, so its fusion layer may be omitted. The training is divided into two steps:
first, a sample containing characters in a designated style (non-Song-typeface characters), which may be acquired from a real scene, is used as both the input and the output of the teacher network, and the teacher network is trained; after training, the network parameters of each layer in the first neural network A1 and the second neural network A2 are obtained;
then, the network parameters of one or more layers of the second neural network A2 in the trained teacher network are used as the network parameters of the corresponding layers of the second neural network in the student network; a sample containing the Song-typeface character "eye" is used as the input of the student network, the non-standard feature vector S representing the designated style is used as an input of the second neural network in the student network (the feature description vector obtained by the first neural network is also input into the second neural network), and a sample containing the designated-style character "eye" is used as the output of the student network; the student network is then trained, which completes its training.
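A sketch of applying network parameters of selected teacher-network layers to the corresponding layers of the student's second neural network, assuming both are PyTorch modules whose corresponding layers share parameter names (that naming convention is an assumption made for illustration):

```python
def copy_teacher_layers(teacher_second_net, student_second_net, layer_names):
    """Apply the network parameters of the listed teacher layers to the student's corresponding layers."""
    teacher_state = teacher_second_net.state_dict()
    student_state = student_second_net.state_dict()
    for name, param in teacher_state.items():
        # Copy only parameters that belong to the selected layers and exist in the student.
        if name in student_state and any(name.startswith(layer) for layer in layer_names):
            student_state[name] = param.clone()
    student_second_net.load_state_dict(student_state)

# Usage (hypothetical layer name): copy_teacher_layers(teacher_a2, student_second_net, ["deconv"])
```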
A further manner of training the first neural network and the second neural network is provided below.
In this training manner, a classifier capable of distinguishing generated samples from real samples is trained, and the training of the first neural network and the second neural network is supervised by the classifier. For brevity of description, the connection structure of the first neural network and the second neural network is referred to in this training manner as one neural network EG.
Referring to fig. 5, a real sample is a sample collected from a real scene for training the neural network EG, for example an image containing text in a specified style (not Song-typeface characters) collected from a real scene; the figure shows an image containing the specified-style text "jing". In actual training, more real samples containing different specified-style text can be selected, so that better network parameters are obtained. The training process is divided into two steps:
first, a sample containing the Song-typeface character "eye" is input into the neural network EG to obtain a generated sample containing the specified-style character "eye". A real sample containing the specified-style character "eye" together with the non-standard feature vector S representing the specified style forms one group of input data, and the generated sample of the neural network EG together with the non-standard feature vector S representing the specified style forms another group of input data. The two groups of input data are respectively input into the classifier, and the classifier is trained to distinguish the generated sample from the real sample and to calculate the deviation between them, which completes the training of the classifier;
then, the training of the network parameters of the neural network EG is supervised by the trained classifier. A sample containing the Song-typeface character "eye" is input into the neural network EG to obtain a generated sample containing the specified-style character "eye"; the generated sample and the non-standard feature vector S representing the specified style are input into the classifier; after the classifier calculates the deviation between the generated sample and the corresponding real sample, a loss value corresponding to that deviation is calculated with a loss function, and the network parameters of the neural network EG are adjusted according to the loss value. This process of inputting a sample containing the Song-typeface character "eye" into the neural network EG, obtaining a generated sample containing the specified-style character "eye", and adjusting the network parameters is repeated until the loss value of the loss function falls within a reasonable range, at which point the training of the neural network EG is complete.
In the above training process, inputting a sample containing the Song-typeface character "eye" into the neural network EG to obtain a generated sample containing the specified-style character "eye" includes: inputting the sample containing the Song-typeface character "eye" into the first neural network, which performs feature extraction on the input sample to obtain a feature description vector C representing the content of the character "eye" and inputs C into the second neural network; the non-standard feature vector S representing the specified style is also input into the second neural network; the second neural network fuses the feature description vector C and the non-standard feature vector S and generates an image from the fused vector, and the generated image is taken as the generated sample containing the specified-style character "eye".
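A condensed sketch of this classifier-supervised training (an adversarial, GAN-style alternation in spirit), assuming PyTorch; student_eg (the network EG), classifier, the optimizers and the binary cross-entropy losses are illustrative assumptions rather than details fixed by the patent:

```python
import torch
import torch.nn.functional as F

def train_step(student_eg, classifier, opt_eg, opt_cls, song_image, real_image, style_vec):
    # Step 1: train the classifier to distinguish real samples from generated samples.
    with torch.no_grad():
        generated = student_eg(song_image, style_vec)          # generated sample from network EG
    real_score = classifier(real_image, style_vec)
    fake_score = classifier(generated, style_vec)
    cls_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
                + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    opt_cls.zero_grad()
    cls_loss.backward()
    opt_cls.step()

    # Step 2: adjust the network EG so its generated samples reduce the deviation
    # measured by the (now fixed) classifier.
    generated = student_eg(song_image, style_vec)
    gen_score = classifier(generated, style_vec)
    eg_loss = F.binary_cross_entropy_with_logits(gen_score, torch.ones_like(gen_score))
    opt_eg.zero_grad()
    eg_loss.backward()
    opt_eg.step()
    return cls_loss.item(), eg_loss.item()
```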
In the two training modes, the first neural network and the second neural network are trained together. Of course, the first neural network and the second neural network may also be trained separately.
The present invention also provides a sample generation apparatus, referring to fig. 2, the sample generation apparatus 100 includes:
a feature description vector acquisition module 101, configured to acquire a feature description vector C of a specified standard word, where the feature description vector C is used to indicate the content of the specified standard word;
the target sample generating module 102 is configured to convert the specified standard word into a target sample by using the feature description vector C and the specified non-standard feature vector S, where a style corresponding to the target sample is the same as a style represented by the non-standard feature vector S.
In one embodiment, the feature description vector acquisition module is specifically configured to:
inputting the first image containing the specified standard words into a first neural network in a trained student network, and extracting features of the input first image by the first neural network to obtain a feature description vector C.
In one embodiment, the feature extraction of the first image by the first neural network to obtain a feature description vector C includes:
And the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector C.
In one embodiment, the target sample generation module comprises:
the image generation unit is used for inputting the feature description vector C and the nonstandard feature vector S into a second neural network in the trained student network so as to fuse the feature description vector C and the nonstandard feature vector S by the second neural network to obtain a fusion vector T, and generating a second image by using the fusion vector T;
and a target sample determining unit configured to determine the second image as the target sample.
In one embodiment, the second neural network includes a fusion layer; the dimension of the feature description vector C is the same as that of the nonstandard feature vector S;
the second neural network is specifically configured to, when fusing the feature description vector C and the non-standard feature vector S to obtain a fusion vector T:
and the second neural network performs superposition processing on the feature description vector C and the nonstandard feature vector S by using a fusion layer to obtain the fusion vector T.
In one embodiment, the second neural network includes a fully connected layer and a fusion layer; the dimension of the feature description vector C is different from that of the nonstandard feature vector S;
the second neural network is specifically configured to, when fusing the feature description vector C and the non-standard feature vector S to obtain a fusion vector T:
the second neural network maps the nonstandard feature vector S into a reference vector K with the same dimension as the feature description vector C by using a full connection layer;
and the second neural network performs superposition processing on the feature description vector C and the reference vector K by using a fusion layer to obtain the fusion vector T.
In one embodiment, the second neural network includes a fusion layer;
the second neural network is specifically configured to, when fusing the feature description vector C and the non-standard feature vector S to obtain a fusion vector T:
and the second neural network utilizes a fusion layer to combine the feature description vector C and the nonstandard feature vector S to obtain the fusion vector T.
In one embodiment, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
The second neural network is specifically configured to, when generating a second image using the fusion vector T:
the second neural network generates a second image corresponding to the fusion vector T by using the deconvolution layer and a second nonlinear transformation layer.
In one embodiment, the student network is trained under the supervision of a trained teacher network;
network parameters of at least one layer in the second neural network apply network parameters of a corresponding layer in the teacher network.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, since they essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative; the elements illustrated as separate components may or may not be physically separate, and the elements shown as units may or may not be physical units.
The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the sample generation method as described in the foregoing embodiments.
The embodiment of the sample generating device can be applied to an electronic device. Taking software implementation as an example, the device in a logical sense is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from a non-volatile memory into memory and running them. In terms of hardware, fig. 6 is a hardware structure diagram of the electronic device where the sample generating device 100 according to an exemplary embodiment of the present invention is located; in addition to the processor 510, the memory 530, the interface 520 and the non-volatile storage 540 shown in fig. 6, the electronic device where the device 100 is located may further include other hardware according to its actual functions, which will not be described here.
The present invention also provides a machine-readable storage medium having stored thereon a program which, when executed by a processor, implements a sample generation method as in any of the preceding embodiments.
The present invention may take the form of a computer program product embodied on one or more storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Machine-readable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, and may be used to store information that can be accessed by a computing device.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the present invention; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (8)

1. A method of generating a sample, comprising:
inputting a first image containing a specified standard word into a first neural network in a trained student network, and extracting features of the input first image by the first neural network to obtain a feature description vector;
inputting the feature description vector and the nonstandard feature vector into a second neural network in a trained student network, so that the feature description vector and the nonstandard feature vector are fused by the second neural network to obtain a fusion vector, and generating a second image by using the fusion vector; determining the second image as a target sample; if N styles are generated, the nonstandard feature vector has N dimensions, the dimensions are traversed according to the sequence from the 1st dimension to the Nth dimension, the numerical code of the currently traversed dimension is set to be 1, and the numerical codes of the other dimensions are set to be 0, so that N nonstandard feature vectors corresponding to the N styles are obtained;
Wherein the student network is obtained by training under the supervision of a trained teacher network;
network parameters of at least one layer in the second neural network apply network parameters of a corresponding layer in the teacher network; the connection structure of the first neural network and the second neural network is used as a student network; the first neural network is trained with the second neural network.
2. The sample generation method of claim 1, wherein the feature extraction of the first image by the first neural network to obtain a feature description vector comprises:
and the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
3. The sample generation method of claim 1, wherein the second neural network comprises a fusion layer; the feature description vector is the same as the non-standard feature vector in dimension;
the second neural network fusing the feature description vector and the nonstandard feature vector to obtain a fusion vector includes:
And the second neural network performs superposition processing on the feature description vector and the nonstandard feature vector by using a fusion layer to obtain the fusion vector.
4. The sample generation method of claim 1, wherein the second neural network comprises a fully connected layer and a fusion layer; the feature description vector is different from the non-standard feature vector in dimension;
the second neural network fusing the feature description vector and the nonstandard feature vector to obtain a fusion vector includes:
the second neural network maps the nonstandard feature vector into a reference vector with the same dimension as the feature description vector by using a full connection layer;
and the second neural network performs superposition processing on the feature description vector and the reference vector by using a fusion layer to obtain the fusion vector.
5. The sample generation method of claim 1, wherein the second neural network comprises a fusion layer;
the second neural network fusing the feature description vector and the nonstandard feature vector to obtain a fusion vector includes:
and the second neural network utilizes a fusion layer to combine the feature description vector and the nonstandard feature vector S to obtain the fusion vector.
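For claim 5 the fusion layer "combines" the two vectors; concatenation along the feature dimension is one plausible reading and is assumed in this sketch.

```python
import torch

def fuse_by_combination(feature_vec: torch.Tensor, nonstandard_vec: torch.Tensor) -> torch.Tensor:
    """Claim-5 fusion sketch: combine (here, concatenate) the feature description
    vector and the nonstandard feature vector S into a longer fusion vector."""
    return torch.cat([feature_vec, nonstandard_vec], dim=-1)
```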
6. The sample generation method of any of claims 3 to 5, wherein the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network generating a second image by using the fusion vector comprises:
the second neural network generates the second image corresponding to the fusion vector by using the deconvolution layer and the second nonlinear transformation layer.
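A sketch of the claim-6 decoder, assuming a transposed convolution as the deconvolution layer and Tanh as the second nonlinear transformation; the reshape step and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class SecondNetworkDecoder(nn.Module):
    """Deconvolution layer + second nonlinear transformation layer, as in claim 6,
    turning the fusion vector into the second image."""
    def __init__(self, fusion_dim: int = 128):
        super().__init__()
        self.to_map = nn.Linear(fusion_dim, 16 * 8 * 8)                               # assumed reshape to a feature map
        self.deconv = nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1)   # deconvolution processing
        self.nonlinear = nn.Tanh()                                                    # second nonlinear transformation

    def forward(self, fusion_vec: torch.Tensor) -> torch.Tensor:
        x = self.to_map(fusion_vec).view(-1, 16, 8, 8)
        return self.nonlinear(self.deconv(x))                                         # second image (here 16x16)

# usage
second_image = SecondNetworkDecoder()(torch.randn(1, 128))   # shape (1, 1, 16, 16)
```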
7. A sample generation apparatus, comprising:
a feature description vector acquisition module, configured to input a first image containing a specified standard word into a first neural network in a trained student network, so that the first neural network extracts features of the input first image to obtain a feature description vector;
a target sample generation module, configured to input the feature description vector and a nonstandard feature vector into a second neural network in the trained student network, so that the second neural network fuses the feature description vector and the nonstandard feature vector to obtain a fusion vector and generates a second image by using the fusion vector, and to determine the second image as a target sample; wherein, when N styles are to be generated, the nonstandard feature vector has N dimensions: the dimensions are traversed in order from the 1st dimension to the Nth dimension, the numerical code of the currently traversed dimension is set to 1, and the numerical codes of the remaining dimensions are set to 0, so that N nonstandard feature vectors corresponding to the N styles are obtained;
wherein the student network is obtained by training under the supervision of a trained teacher network;
the network parameters of at least one layer in the second neural network adopt the network parameters of a corresponding layer in the teacher network; the first neural network and the second neural network, connected together, constitute the student network; and the first neural network is trained together with the second neural network.
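The "adopt the network parameters of a corresponding layer in the teacher network" step in claims 1 and 7 can be illustrated as below; copying via load_state_dict is an assumption about how the parameters are applied, not a statement of the patented training procedure.

```python
import torch.nn as nn

def copy_teacher_layer_params(teacher_layer: nn.Module, student_layer: nn.Module) -> None:
    """Initialise a student-network layer from the corresponding trained teacher layer."""
    student_layer.load_state_dict(teacher_layer.state_dict())

# e.g. reuse a trained teacher deconvolution layer inside the second neural network
teacher_deconv = nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1)
student_deconv = nn.ConvTranspose2d(16, 1, kernel_size=4, stride=2, padding=1)
copy_teacher_layer_params(teacher_deconv, student_deconv)
```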
8. An electronic device, comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the sample generation method according to any one of claims 1-6.
CN201910233792.XA 2019-03-26 2019-03-26 Sample generation method, device and equipment Active CN111753859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233792.XA CN111753859B (en) 2019-03-26 2019-03-26 Sample generation method, device and equipment

Publications (2)

Publication Number Publication Date
CN111753859A CN111753859A (en) 2020-10-09
CN111753859B (en) 2024-03-26

Family

ID=72671425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910233792.XA Active CN111753859B (en) 2019-03-26 2019-03-26 Sample generation method, device and equipment

Country Status (1)

Country Link
CN (1) CN111753859B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417959A (en) * 2020-10-19 2021-02-26 上海臣星软件技术有限公司 Picture generation method and device, electronic equipment and computer storage medium
CN113695058B (en) * 2021-10-28 2022-03-15 南通金驰机电有限公司 Self-protection method of intelligent waste crushing device for heat exchanger production

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423874B2 (en) * 2015-10-02 2019-09-24 Baidu Usa Llc Intelligent image captioning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018132855A (en) * 2017-02-14 2018-08-23 国立大学法人電気通信大学 Image style conversion apparatus, image style conversion method and image style conversion program
CN108664996A (en) * 2018-04-19 2018-10-16 Xiamen University Ancient writing recognition method and system based on deep learning
CN109165376A (en) * 2018-06-28 2019-01-08 Xi'an Jiaotong-Liverpool University Style character generation method based on a small number of samples
CN109064522A (en) * 2018-08-03 2018-12-21 Xiamen University Chinese character style generation method based on a conditional generative adversarial network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Compressing GANs using Knowledge Distillation; Angeline Aguinaldo et al.; arXiv:1902.00159v1 [cs.CV]; 1-4 *
Application of genetic analogy learning based on hidden Markov models to Chinese calligraphy generation; Xu Yang; Journal of Wuhan University (Science Edition); 2008-02-29 (Issue 01); 90-94 *
Su Fenzhen et al. Marine Geographic Information System: Principles, Technologies and Applications. Ocean Press, 2015, 38-39. *

Also Published As

Publication number Publication date
CN111753859A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
Kulhánek et al. Viewformer: Nerf-free neural rendering from few images using transformers
CN108537742B Remote sensing image panchromatic sharpening method based on a generative adversarial network
CN109816032B Unbiased mapping zero-shot classification method and device based on a generative adversarial network
CN111860138B (en) Three-dimensional point cloud semantic segmentation method and system based on full fusion network
CN108763325A (en) A kind of network object processing method and processing device
CN110929736A (en) Multi-feature cascade RGB-D significance target detection method
KR102042168B1 (en) Methods and apparatuses for generating text to video based on time series adversarial neural network
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN112396645A (en) Monocular image depth estimation method and system based on convolution residual learning
CN111753859B (en) Sample generation method, device and equipment
CN114037056A (en) Method and device for generating neural network, computer equipment and storage medium
CN114037640A (en) Image generation method and device
JP2023503732A (en) Point cloud interpolation method, network training method, device, equipment and storage medium
CN110348025A (en) A kind of interpretation method based on font, device, storage medium and electronic equipment
CN113221752A (en) Multi-template matching-based multi-scale character accurate identification method
Oeljeklaus An integrated approach for traffic scene understanding from monocular cameras
CN110363830A (en) Element image generation method, apparatus and system
Sun et al. Two-stage deep regression enhanced depth estimation from a single RGB image
Abdulnabi et al. Episodic camn: Contextual attention-based memory networks with iterative feedback for scene labeling
Yao et al. [Retracted] Facial Expression Recognition Based on Convolutional Neural Network Fusion SIFT Features of Mobile Virtual Reality
Kar Mastering Computer Vision with TensorFlow 2. x: Build advanced computer vision applications using machine learning and deep learning techniques
CN112329735A (en) Training method of face recognition model and online education system
CN117009873A (en) Method for generating payment risk identification model, and method and device for payment risk identification
CN114580715B (en) Pedestrian track prediction method based on generation countermeasure network and long-term and short-term memory model
Fatkhulin et al. Analysis of the Basic Image Generation Methods by Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant