CN111753859A - Sample generation method, device and equipment - Google Patents

Sample generation method, device and equipment

Info

Publication number
CN111753859A
CN111753859A (application CN201910233792.XA)
Authority
CN
China
Prior art keywords
vector
neural network
feature
standard
layer
Prior art date
Legal status
Granted
Application number
CN201910233792.XA
Other languages
Chinese (zh)
Other versions
CN111753859B (en)
Inventor
张鹏
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910233792.XA
Publication of CN111753859A
Application granted
Publication of CN111753859B
Legal status: Active

Classifications

    • G06F18/214 (Pattern recognition): Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 (Pattern recognition): Classification techniques
    • G06N3/045 (Neural networks): Combinations of networks
    • G06N3/08 (Neural networks): Learning methods

Abstract

The invention provides a sample generation method, apparatus, and device. The sample generation method includes: acquiring a feature description vector of a specified standard word, the feature description vector being used to indicate the content of the specified standard word; and converting the specified standard word into a target sample by using the feature description vector and a specified non-standard feature vector, where the style of the target sample is the same as the style represented by the non-standard feature vector. A sample of a desired font style can thus be generated without collecting character images of that font style, which improves sample generation efficiency.

Description

Sample generation method, device and equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a device for generating a sample.
Background
With the development of science and technology, deep learning algorithms perform excellently in tasks such as classification, detection, and recognition. This performance, however, rests on factors such as increased computing power and large numbers of training samples; training samples are the "fuel" of algorithm development and an indispensable link in it. Text recognition likewise requires a large number of samples containing characters for training.
In a related sample generation approach, character images are pasted onto background images to synthesize samples. In real scenes the fonts of text characters are diverse, so for an algorithm to accurately recognize text in real scenes, training samples in the various required font styles must be generated. With this approach, each time samples of a new font style are needed, character images in that font style must first be collected to synthesize them, which makes sample generation too inefficient.
Disclosure of Invention
In view of this, the present invention provides a sample generation method, apparatus and device, which can generate a sample of a desired font style without acquiring a character image of the font style, thereby improving sample generation efficiency.
A first aspect of the present invention provides a sample generation method, including:
acquiring a feature description vector of a specified standard word, wherein the feature description vector is used for indicating the content of the specified standard word;
and converting the specified standard words into target samples by using the feature description vectors and the specified non-standard feature vectors, wherein the style corresponding to the target samples is the same as the style represented by the non-standard feature vectors.
According to an embodiment of the present invention, the obtaining of a feature description vector of a specified standard word includes:
inputting a first image containing the specified standard words into a first neural network in a trained student network, and performing feature extraction on the input first image by the first neural network to obtain a feature description vector.
According to an embodiment of the present invention, the feature extraction of the input first image by the first neural network to obtain a feature description vector includes:
the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
According to one embodiment of the invention, converting a specified standard word into a target sample using the feature description vector and a specified non-standard feature vector comprises:
inputting the feature description vector and the non-standard feature vector into a second neural network in a trained student network, fusing the feature description vector and the non-standard feature vector by the second neural network to obtain a fusion vector, and generating a second image by using the fusion vector;
determining the second image as the target sample.
According to one embodiment of the invention, the second neural network comprises a fusion layer; the feature description vector has the same dimension as the non-standard feature vector;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the non-standard feature vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fully-connected layer and a fused layer; the feature description vector is different in dimension from the non-standard feature vector;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
the second neural network utilizes a fully connected layer to map the non-standard feature vector to a reference vector having dimensions that are the same as dimensions of the feature description vector;
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the reference vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fusion layer;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
and the second neural network combines the feature description vector and the non-standard feature vector by utilizing a fusion layer to obtain the fusion vector.
According to an embodiment of the invention, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network generating a second image using the fused vector comprises:
the second neural network generates a second image corresponding to the fusion vector using the deconvolution layer and a second nonlinear conversion layer.
According to one embodiment of the invention, the student network is trained under the supervision of a trained teacher network;
the network parameters of at least one layer in the second neural network adopt the network parameters of the corresponding layer in the teacher network.
A second aspect of the present invention provides a sample generation apparatus comprising:
the characteristic description vector acquisition module is used for acquiring a characteristic description vector of the specified standard word, and the characteristic description vector is used for indicating the content of the specified standard word;
and the target sample generation module is used for converting the specified standard words into target samples by using the feature description vectors and the specified non-standard feature vectors, and the style corresponding to the target samples is the same as the style represented by the non-standard feature vectors.
According to an embodiment of the present invention, the feature description vector obtaining module is specifically configured to:
inputting a first image containing the specified standard words into a first neural network in a trained student network, and performing feature extraction on the input first image by the first neural network to obtain a feature description vector.
According to an embodiment of the present invention, the feature extraction of the input first image by the first neural network to obtain a feature description vector includes:
the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
According to one embodiment of the invention, the target sample generation module comprises:
the image generation unit is used for inputting the feature description vector and the non-standard feature vector into a second neural network in a trained student network, so that the feature description vector and the non-standard feature vector are fused by the second neural network to obtain a fusion vector, and a second image is generated by utilizing the fusion vector;
a target sample determination unit for determining the second image as the target sample.
According to one embodiment of the invention, the second neural network comprises a fusion layer; the feature description vector has the same dimension as the non-standard feature vector;
when the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, the second neural network is specifically configured to:
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the non-standard feature vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fully-connected layer and a fused layer; the feature description vector is different in dimension from the non-standard feature vector;
when the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, the second neural network is specifically configured to:
the second neural network utilizes a fully connected layer to map the non-standard feature vector to a reference vector having dimensions that are the same as dimensions of the feature description vector;
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the reference vector to obtain the fusion vector.
According to one embodiment of the invention, the second neural network comprises a fusion layer;
when the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, the second neural network is specifically configured to:
and the second neural network combines the feature description vector and the non-standard feature vector by utilizing a fusion layer to obtain the fusion vector.
According to an embodiment of the invention, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
when the second neural network generates the second image by using the fusion vector, the second neural network is specifically configured to:
the second neural network generates a second image corresponding to the fusion vector using the deconvolution layer and a second nonlinear conversion layer.
According to one embodiment of the invention, the student network is trained under the supervision of a trained teacher network;
the network parameters of at least one layer in the second neural network adopt the network parameters of the corresponding layer in the teacher network.
A third aspect of the invention provides an electronic device comprising a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the sample generation method as described in the foregoing embodiments.
A fourth aspect of the present invention provides a machine-readable storage medium, characterized in that a program is stored thereon, which when executed by a processor, implements the sample generation method as described in the foregoing embodiments.
The embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, a feature description vector indicating the content of a specified standard word and a non-standard feature vector representing a certain style can be used to convert the specified standard word into a target sample of that style. There is no need to collect character images in that style to synthesize samples, so sample generation efficiency is improved, and samples containing different text content in various styles can be generated as needed, achieving sample diversity.
Drawings
FIG. 1 is a schematic flow chart of a sample generation method according to an embodiment of the invention;
FIG. 2 is a block diagram of a sample generation apparatus according to an embodiment of the present invention;
FIG. 3 is a block diagram of a connection structure between a first neural network and a second neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a training mode of a first neural network and a second neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another training method for the first neural network and the second neural network according to an embodiment of the present invention;
fig. 6 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another of the same type. For example, a first device may also be referred to as a second device, and similarly, a second device may also be referred to as a first device, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In order to make the description of the present invention clearer and more concise, some technical terms in the present invention are explained below:
a neural network: a technique for simulating brain structure abstraction features that a great number of simple functions are connected to form a network system, which can fit very complex function relations, including convolution/deconvolution, activation, pooling, addition, subtraction, multiplication, division, channel merging and element rearrangement. Training the network with specific input data and output data, adjusting the connections therein, allows the neural network to learn the mapping between the fitting inputs and outputs.
The sample generation method according to the embodiment of the present invention is described in more detail below, but should not be limited thereto. In one embodiment, referring to fig. 1, a sample generation method may include the steps of:
S100: acquiring a feature description vector C of a specified standard word, wherein the feature description vector C is used for indicating the content of the specified standard word;
S200: and converting the specified standard words into target samples by using the feature description vector C and the specified non-standard feature vector S, wherein the style corresponding to the target samples is the same as the style represented by the non-standard feature vector S.
The execution subject of the sample generation method of the embodiment of the present invention may be an electronic device, and more specifically, may be a processor of the electronic device. The electronic device may be, for example, a computer device or an embedded device, and the specific type is not limited as long as the electronic device has a data processing capability.
In step S100, a feature description vector C of the specified standard word is obtained, where the feature description vector C is used to indicate the content of the specified standard word.
The font of the specified standard character can be, for example, the Song typeface or the Hei typeface; the specific font is not limited, as long as the content of the specified standard character is the text content required for the sample. Before the feature description vector C of the specified standard word is obtained, the specified standard word may be obtained from a word library of the corresponding font. After the specified standard word is obtained, feature extraction can be performed on it to obtain a feature description vector C describing its content.
Feature extraction of the specified standard word can be performed with a feature extraction algorithm; the algorithm is not limited and may be, for example, the LBP feature extraction algorithm, the HOG feature extraction algorithm, or a SIFT feature extraction operator, and feature extraction can also be realized through deep learning.
The specified standard word is any standard word in the specified font word library. Generally, commonly used font libraries are configured in the computer device by default, or can be downloaded from the network, and the specified font library can be any one of them. Taking the Song-typeface library as an example, it contains more than 20,000 Song-typeface characters, and the specified standard word can be any one of those 20,000-odd characters, determined by the text content required for the sample. If the embodiment of the present invention is used to generate a corresponding sample for each of the 20,000-odd Song-typeface characters, more than 20,000 samples of the desired style containing different text content can be generated.
If the specified font library contains N1 standard words, N1 samples of the desired style containing different text content can be obtained, where the text content of each sample is the same as that of the corresponding standard word but the style differs. The embodiment of the invention therefore makes it much easier to generate many samples in a desired style, and can overcome the scarcity of samples in some font styles, such as the styles found in calligraphy works.
In step S200, the feature description vector C and the specified non-standard feature vector S are used to convert the specified standard word into a target sample, where the style corresponding to the target sample is the same as the style represented by the non-standard feature vector S.
The style represented by the non-standard feature vector S (referred to as the target style) may be a common or uncommon calligraphic style, for example the Hei typeface, the Liu style, or the Mi Fu style; the target style may even be a particular person's handwriting. Generally speaking, each person's writing style differs somewhat, and each person's writing style can serve as a target style.
A plurality of non-standard feature vectors representing different styles may be preset, the non-standard feature vector S being one of them. If the total number of preset non-standard feature vectors is N2, and the specified standard word is converted once for each non-standard feature vector, N2 target samples of different styles containing the same text content can be generated. The samples are then more diverse and better reflect real scenes, and training a neural network with such samples makes its text recognition results more accurate.
The encoding form of each non-standard feature vector is not limited; for example, the vectors can be encoded according to the total number of styles N2, using one-hot encoding.
Taking one-hot encoding as an example, suppose samples in N2 styles, such as the Hei typeface, the Liu style, the Mi Fu style, and so on, are to be generated. When a Liu-style sample is needed, the value in the dimension of the non-standard feature vector S corresponding to the Liu style (the 2nd dimension) is encoded as 1 and the values in the remaining dimensions as 0, giving S = [0, 1, 0, ..., 0]; when a Mi Fu-style sample is needed, the value in the dimension corresponding to the Mi Fu style (the 3rd dimension) is encoded as 1 and the rest as 0, giving S = [0, 0, 1, ..., 0]; other styles follow by analogy.
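As an illustration only, this one-hot encoding can be written in a few lines; the framework (PyTorch) and the function name are assumptions and not part of the patent.

```python
import torch
import torch.nn.functional as F

def style_vector(style_index: int, num_styles: int) -> torch.Tensor:
    """One-hot non-standard feature vector S: 1 in the chosen style's dimension, 0 elsewhere."""
    return F.one_hot(torch.tensor(style_index), num_classes=num_styles).float()

# With an assumed N2 = 100 styles, index 1 (the 2nd dimension) gives S = [0, 1, 0, ..., 0]
S = style_vector(1, num_styles=100)
```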
Combining the above, when there are N1 standard words in the specified font library and N2 non-standard feature vectors are encoded, the total number of target samples that can be generated is the product of N1 and N2: there are N2 styles in total, and for each style there are N1 target samples, each containing different text content.
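A hypothetical driver loop makes the N1 × N2 count concrete; generate_target_sample below is a stand-in for the conversion described in the rest of this section, not a real API.

```python
def generate_all_samples(standard_words, style_vectors, generate_target_sample):
    """Produce one target sample per (standard word, style vector) pair."""
    return [
        generate_target_sample(word, style)
        for word in standard_words      # N1 standard words
        for style in style_vectors      # N2 non-standard feature vectors
    ]                                   # -> N1 * N2 target samples in total
```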
In the embodiment of the invention, the feature description vector C indicating the content of the specified standard word and the non-standard feature vector S representing a certain style can be used to convert the specified standard word into a target sample of that style. There is no need to collect character images in that style to synthesize samples, so sample generation efficiency is improved, and samples containing different text content in various styles can be generated as needed, achieving sample diversity.
In one embodiment, the above method flow can be performed by the sample generation apparatus 100, and as shown in fig. 2, the sample generation apparatus 100 can include 2 modules: a feature description vector acquisition module 101 and a target sample generation module 102. The feature description vector obtaining module 101 is configured to perform the step S100, and the target sample generating module 102 is configured to perform the step S200.
In one embodiment, in step S100, the obtaining a feature description vector C of a specific standard word includes:
inputting a first image containing the specified standard words into a first neural network in a trained student network, and performing feature extraction on the input first image by the first neural network to obtain a feature description vector C.
The student network is trained in advance, and can be pre-stored in the electronic device or stored in an external device, and the electronic device calls the first neural network in the student network when the method needs to be executed.
The first image may be obtained by acquiring a specified standard word in a real scene, or may be obtained by format conversion of a specified standard word in a specified font word library, and the specific manner is not limited. The first image may be preset in the electronic device, and the first image is acquired from the electronic device when being executed.
The Song-typeface library contains more than 20,000 Song-typeface characters (in ttf format), and the first image can be obtained from the characters already in the library. For example, a Song-typeface character in the library can be converted directly from the ttf format to an image format to obtain the first image; alternatively, the first image can be generated by fusing the Song-typeface character with background data (e.g., background data representing a white background).
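A minimal sketch of the second option, rendering a library character onto a white background with Pillow; the character, the font path, and the helper name are assumptions.

```python
from PIL import Image, ImageDraw, ImageFont

def render_standard_char(char: str, ttf_path: str, size: int = 64) -> Image.Image:
    """Render one standard-typeface character on a white background as the first image."""
    img = Image.new("L", (size, size), color=255)             # white background
    font = ImageFont.truetype(ttf_path, int(size * 0.8))      # load the ttf glyph
    ImageDraw.Draw(img).text((size // 10, size // 10), char, font=font, fill=0)
    return img

# e.g. render_standard_char("睛", "SimSun.ttf")  # character and font path are assumptions
```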
And inputting the first image into a first neural network, and after the first neural network extracts the features of the first image, obtaining a feature description vector C of the specified standard word. This function of the first neural network may be provided by training.
Specifically, referring to fig. 3, the first image is, for example, an image of size 64 × 64, the specified standard word in the first image is, for example, the Song-typeface character for "eye", and the first neural network performs feature extraction on the first image to obtain a 512-dimensional feature description vector C indicating "eye".
In one embodiment, the feature extraction of the input first image by the first neural network to obtain a feature description vector C includes:
the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector C.
The first neural network may include a plurality of convolutional layers that perform convolution operations; they extract features from the first image to obtain a feature description vector and output it to the first nonlinear transformation layer. The first nonlinear transformation layer enhances the fitting capability of the neural network and outputs the fitted vector as the feature description vector. Of course, the layer structure of the first neural network is not limited to this; it may further include other layers such as a pooling layer, a special down-sampling layer that reduces the dimensionality of the features obtained by convolution.
The first neural network may be implemented with convolutional neural network architectures such as VGG, Inception, or ResNet, and is not particularly limited. A convolutional neural network is a feedforward neural network whose neurons respond to surrounding units within a limited receptive field; through weight sharing and feature aggregation it can effectively extract the structural information of an image.
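The following is a minimal sketch, assuming PyTorch, of such an encoder: stacked convolution and nonlinear transformation layers that map a 64 × 64 single-channel first image to a 512-dimensional feature description vector C. The layer sizes are illustrative choices, not the patent's actual architecture.

```python
import torch
import torch.nn as nn

class FirstNeuralNetwork(nn.Module):
    """Illustrative encoder: convolution + nonlinearity, 64x64 image -> 512-d vector C."""
    def __init__(self, out_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 4, stride=2, padding=1), nn.ReLU(),     # 64 -> 32
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Conv2d(256, out_dim, 8), nn.ReLU(),                   # 8 -> 1
        )

    def forward(self, first_image: torch.Tensor) -> torch.Tensor:
        # first_image: (batch, 1, 64, 64) -> feature description vector C: (batch, 512)
        return self.features(first_image).flatten(1)
```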
In one embodiment, in step S200, converting a specified standard word into a target sample using the feature description vector C and a specified non-standard feature vector S includes:
S201: inputting the feature description vector C and the non-standard feature vector S into a second neural network in a trained student network, fusing the feature description vector C and the non-standard feature vector S by the second neural network to obtain a fusion vector T, and generating a second image by using the fusion vector T;
S202: determining the second image as the target sample.
The student network is trained in advance and can be pre-stored in the electronic device or stored in an external device; the electronic device calls the second neural network in the student network when the method needs to be executed.
After the feature description vector C and the non-standard feature vector S are input into the second neural network, the second neural network fuses them to obtain a fusion vector T and generates a second image from the fusion vector T. The style of the second image is consistent with the style represented by the non-standard feature vector S, and the text content it contains is consistent with the content of the specified standard word.
Based on the training of the second neural network, the style of the second image can be specified by inputting different non-standard feature vectors, so the method suits generating samples in different font styles. For example, if the style represented by the input non-standard feature vector S is the Liu style, the generated second image is in the Liu style; if it is the Mi Fu style, the generated second image is in the Mi Fu style; and so on.
With continued reference to FIG. 3, the first neural network performs feature extraction on the first image to obtain a 512-dimensional feature description vector C indicating "eye", and this feature description vector C is input into the second neural network together with the non-standard feature vector S. The style represented by the non-standard feature vector S is, for example, a specified style, and its dimension is, for example, 100. The second neural network fuses the input 512-dimensional feature description vector C and the 100-dimensional non-standard feature vector S into a fusion vector T and generates a second image from the fusion vector T obtained by the fusion; the second image is, for example, an image of size 64 × 64 containing "eye" in the specified style, and it serves as the target sample.

In step S201, the second neural network can fuse the feature description vector C and the non-standard feature vector S into a fusion vector T in more than one way, for example in the following three implementations:
in a first implementation, the second neural network includes a fusion layer; the feature description vector C has the same dimension as the non-standard feature vector S;
the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, and the fusion vector T comprises the following steps:
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector C and the non-standard feature vector S to obtain the fusion vector T.
In this manner, the fusion layer is a calculation layer for performing the vector superimposition processing, and the feature description vector C and the non-standard feature vector S can be superimposed to obtain the fusion vector T.
The superposition processing may be a weighted superposition, in which the feature description vector C and the non-standard feature vector S are weighted and summed in each dimension. For example, with C = (a1, a2, a3, ..., a512) and S = (b1, b2, b3, ..., b512), the fusion vector after weighted superposition is T = (a1×x1 + b1×y1, a2×x2 + b2×y2, a3×x3 + b3×y3, ..., a512×x512 + b512×y512), where (x1, x2, x3, ..., x512) are the weight coefficients of the feature description vector C in each dimension and (y1, y2, y3, ..., y512) are the weight coefficients of the non-standard feature vector S in each dimension.
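A minimal sketch of this weighted superposition, assuming PyTorch tensors of equal dimension; the function name is an assumption.

```python
import torch

def weighted_superposition(C: torch.Tensor, S: torch.Tensor,
                           x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Element-wise weighted sum over each dimension: T_i = x_i * C_i + y_i * S_i."""
    return x * C + y * S

# Plain (unweighted) superposition is the special case with all weights equal to 1:
# T = weighted_superposition(C, S, torch.ones_like(C), torch.ones_like(S))
```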
In a second implementation, the second neural network includes a fully-connected layer and a fused layer; the feature description vector C has a different dimension from the non-standard feature vector S;
the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, and the fusion vector T comprises the following steps:
the second neural network utilizes a fully connected layer to map the non-standard feature vector S into a reference vector K with dimensions identical to those of the feature description vector C;
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector C and the reference vector K to obtain the fusion vector T.
The fully connected layer is a computational layer for vector dimension mapping. For example, if the non-standard feature vector S has 100 dimensions and the feature description vector C has 512 dimensions, the fully connected layer can map S to a 512-dimensional reference vector K, realizing dimension expansion. The fusion layer is a computational layer for vector superposition; the feature description vector C and the reference vector K are superimposed to obtain the fusion vector T. The superposition is similar to that in the first implementation and is not described here again.
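A minimal sketch of this second implementation, assuming PyTorch; the class name and the 100/512 dimensions (taken from the example above) are illustrative.

```python
import torch
import torch.nn as nn

class ExpandThenFuse(nn.Module):
    """Map the 100-d style vector S to a 512-d reference vector K with a fully
    connected layer, then superimpose K with the feature description vector C."""
    def __init__(self, style_dim: int = 100, content_dim: int = 512):
        super().__init__()
        self.expand = nn.Linear(style_dim, content_dim)   # fully connected layer

    def forward(self, C: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
        K = self.expand(S)   # reference vector K, same dimension as C
        return C + K         # fusion layer: superposition
```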
In a third implementation, the second neural network includes a fusion layer;
the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, and the fusion vector T comprises the following steps:
and the second neural network combines the feature description vector C and the non-standard feature vector S by utilizing a fusion layer to obtain the fusion vector T.
This implementation is particularly suitable for the case where the feature description vector C has a different dimension than the non-standard feature vector S, but is also applicable for the case where the dimensions are the same.
In this way, the fusion layer is a computing layer for vector merging, and a new fusion vector T is obtained by merging the feature description vector C with the non-standard feature vector S.
Merging two vectors means splicing them along the dimension axis, so the dimension of the merged vector is the sum of the dimensions of the two vectors. For example, with C = (a1, a2, a3, ..., a512) and S = (b1, b2, b3, ..., b100), the merged vector is T = (a1, a2, a3, ..., a512, b1, b2, b3, ..., b100).
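A one-line sketch of this merging, assuming PyTorch; the function name is an assumption.

```python
import torch

def fuse_by_merging(C: torch.Tensor, S: torch.Tensor) -> torch.Tensor:
    """Concatenate along the feature dimension; dim(T) = dim(C) + dim(S)."""
    return torch.cat([C, S], dim=-1)

# C of shape (batch, 512) and S of shape (batch, 100) give T of shape (batch, 612).
```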
In one embodiment, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network generating a second image using the fusion vector T comprises:
the second neural network generates a second image corresponding to the fusion vector T by using the deconvolution layer and a second nonlinear conversion layer.
The second neural network may include a plurality of deconvolution layers that perform a deconvolution operation, may generate a second image using the fusion vector T, and output the second image to the second nonlinear transformation layer. The second nonlinear transformation layer can also enhance the fitting capability of the neural network, and the second nonlinear transformation layer outputs a fitted second image. Of course, the layer structure of the second neural network is not limited thereto, and may further include other layers such as a fully-connected layer, which may implement mapping of dimensions, such as mapping the dimensions of the input vector to vectors of higher dimensions, and may also be replaced with convolutional layers.
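A minimal sketch, assuming PyTorch, of such a generator: the fusion vector T is reshaped into a 1 × 1 feature map and upsampled by deconvolution (transposed convolution) and nonlinear transformation layers into a 64 × 64 second image. The layer sizes, the 612-dimensional input (from the merging example; 512 for the superposition fusions), and the Tanh output are illustrative choices, not the patent's architecture.

```python
import torch
import torch.nn as nn

class FusionDecoder(nn.Module):
    """Illustrative generator: fusion vector T -> 64x64 second image."""
    def __init__(self, fused_dim: int = 612):
        super().__init__()
        self.generate = nn.Sequential(
            nn.ConvTranspose2d(fused_dim, 256, 8), nn.ReLU(),                 # 1 -> 8
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.Tanh(),     # 32 -> 64
        )

    def forward(self, T: torch.Tensor) -> torch.Tensor:
        # Reshape the fused vector into a 1x1 feature map before upsampling.
        return self.generate(T.view(T.size(0), -1, 1, 1))    # (batch, 1, 64, 64)
```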
In one embodiment, the student network is trained under the supervision of a trained teacher network;
the network parameters of at least one layer in the second neural network adopt the network parameters of the corresponding layer in the teacher network.
In one training mode, the training of the first neural network and the second neural network is supervised by a teacher network that is trained first. In this training mode, the connection structure of the first neural network and the second neural network serves as the student network.
Referring to fig. 4, before the student network is trained, a teacher network is trained; the teacher network includes a first neural network A1 and a second neural network A2. The layer structure of the first neural network A1 can be the same as that of the first neural network in the student network; the layer structure of the second neural network A2 can be similar to that of the second neural network in the student network, except that it does not need to fuse vectors, so the fusion layer can be omitted. The training is divided into two steps:
firstly, a sample containing characters of the specified style (non-Song-typeface characters), which can be collected from a real scene, is used as both the input and the output of the teacher network, and the teacher network is trained; after the teacher network is trained, the network parameters of each layer in the first neural network A1 and the second neural network A2 are obtained;
then, the network parameters of one or more layers of the second neural network A2 in the trained teacher network are used as the network parameters of the corresponding layers of the second neural network in the student network; a sample containing the Song-typeface character for "eye" is used as the input of the student network, the non-standard feature vector S representing the specified style is used as a further input of the second neural network in the student network (the feature description vector obtained by the first neural network is also input into the second neural network), and a sample containing the specified-style character "eye" is used as the output of the student network; the student network is trained in this way until its training is complete.
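A hypothetical helper for the parameter-transfer step, assuming PyTorch modules; the function and argument names are assumptions, and which layers to copy is left to the caller.

```python
import torch.nn as nn

def init_student_from_teacher(teacher_a2: nn.Module, student_second_net: nn.Module,
                              layer_names: list) -> None:
    """Copy the parameters of selected teacher layers (by name prefix) into the
    corresponding layers of the student's second neural network."""
    teacher_state = teacher_a2.state_dict()
    student_state = student_second_net.state_dict()
    for name, tensor in teacher_state.items():
        if any(name.startswith(prefix) for prefix in layer_names):
            if name in student_state and student_state[name].shape == tensor.shape:
                student_state[name] = tensor.clone()
    student_second_net.load_state_dict(student_state)
```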
The following provides a way to train the first neural network and the second neural network.
The training of the first neural network and the second neural network is supervised by training a classifier capable of distinguishing the generated samples from the real samples. For the sake of brevity, in this training method, the connection structure between the first neural network and the second neural network is referred to as a neural network EG.
Referring to fig. 5, a real sample is a sample collected from a real scene and used for training the neural network EG, for example an image collected from a real scene that contains characters of the specified style (non-Song-typeface characters); the figure shows an image containing the specified-style character "eye". In actual training, more real samples containing different characters of the specified style can be selected, so as to obtain better network parameters. The training process is divided into two steps:
firstly, a sample containing the Song-typeface character "eye" is input into the neural network EG to obtain a generated sample containing the specified-style character "eye". A real sample containing the specified-style character "eye" and the non-standard feature vector S representing the specified style form one group of input data, and the generated sample from the neural network EG and the same non-standard feature vector S form another group of input data. The two groups of input data are input into the classifier, and the classifier is trained to distinguish the generated sample from the real sample and to calculate the deviation between them, completing the training of the classifier;
then, the trained classifier supervises the training of the network parameters of the neural network EG. A sample containing the Song-typeface character "eye" is input into the neural network EG to obtain a generated sample containing the specified-style character "eye"; the generated sample and the non-standard feature vector S representing the specified style are input into the classifier, which calculates the deviation between the generated sample and the corresponding real sample. A loss value corresponding to this deviation is calculated with a loss function, the network parameters of the neural network EG are adjusted according to the loss value, and the procedure returns to the step of inputting the sample containing the Song-typeface character "eye" into the neural network EG. The neural network EG is trained continuously in this way until the loss value of the loss function falls within a reasonable range, at which point the training of the neural network EG is complete.
In the training process, inputting the sample containing the Song-typeface character "eye" into the neural network EG to obtain a generated sample containing the specified-style character "eye" includes: inputting the sample containing the Song-typeface character "eye" into the first neural network, which performs feature extraction on the input sample to obtain a feature description vector C representing the content of the character "eye" and inputs C into the second neural network; and inputting the non-standard feature vector S representing the specified style into the second neural network, which fuses the feature description vector C and the non-standard feature vector S and generates an image from the fused vector; the generated image serves as the generated sample containing the specified-style character "eye".
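Read as an adversarial (GAN-style) loop, one training step might look like the following sketch, assuming PyTorch; network_eg, classifier, and the BCE loss are assumptions standing in for the neural network EG, the classifier, and the "deviation" described above, not the patent's exact formulation.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def train_step(network_eg, classifier, opt_eg, opt_cls, first_image, S, real_sample):
    """One alternating update: train the classifier to separate real from generated
    samples, then train the neural network EG so its samples are judged real."""
    batch = real_sample.size(0)
    real_label = torch.ones(batch, 1)
    fake_label = torch.zeros(batch, 1)

    # Step 1: the classifier sees (real sample, S) and (generated sample, S).
    generated = network_eg(first_image, S).detach()
    loss_cls = bce(classifier(real_sample, S), real_label) + \
               bce(classifier(generated, S), fake_label)
    opt_cls.zero_grad(); loss_cls.backward(); opt_cls.step()

    # Step 2: adjust EG so the classifier's measured deviation from real shrinks.
    generated = network_eg(first_image, S)
    loss_eg = bce(classifier(generated, S), real_label)
    opt_eg.zero_grad(); loss_eg.backward(); opt_eg.step()
    return loss_cls.item(), loss_eg.item()
```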
In the two training modes, the first neural network and the second neural network are trained together. Of course, the first neural network and the second neural network may be trained separately.
The present invention also provides a sample generation apparatus, and referring to fig. 2, the sample generation apparatus 100 includes:
a feature description vector obtaining module 101, configured to obtain a feature description vector C of a specified standard word, where the feature description vector C is used to indicate content of the specified standard word;
and the target sample generation module 102 is configured to convert the specified standard word into a target sample by using the feature description vector C and the specified non-standard feature vector S, where a style corresponding to the target sample is the same as a style represented by the non-standard feature vector S.
In one embodiment, the feature description vector obtaining module is specifically configured to:
inputting a first image containing the specified standard words into a first neural network in a trained student network, and performing feature extraction on the input first image by the first neural network to obtain a feature description vector C.
In one embodiment, the feature extraction of the input first image by the first neural network to obtain a feature description vector C includes:
the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector C.
In one embodiment, the target sample generation module comprises:
the image generation unit is used for inputting the feature description vector C and the non-standard feature vector S into a second neural network in a trained student network, so that the feature description vector C and the non-standard feature vector S are fused by the second neural network to obtain a fusion vector T, and a second image is generated by using the fusion vector T;
a target sample determination unit for determining the second image as the target sample.
In one embodiment, the second neural network includes a fusion layer; the feature description vector C has the same dimension as the non-standard feature vector S;
when the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, the second neural network is specifically configured to:
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector C and the non-standard feature vector S to obtain the fusion vector T.
In one embodiment, the second neural network comprises a fully-connected layer and a fused layer; the feature description vector C has a different dimension from the non-standard feature vector S;
when the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, the second neural network is specifically configured to:
the second neural network utilizes a fully connected layer to map the non-standard feature vector S into a reference vector K with dimensions identical to those of the feature description vector C;
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector C and the reference vector K to obtain the fusion vector T.
In one embodiment, the second neural network includes a fusion layer;
when the second neural network fuses the feature description vector C and the non-standard feature vector S to obtain a fusion vector T, the second neural network is specifically configured to:
and the second neural network combines the feature description vector C and the non-standard feature vector S by utilizing a fusion layer to obtain the fusion vector T.
In one embodiment, the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
when the second neural network generates the second image by using the fusion vector T, the second neural network is specifically configured to:
the second neural network generates a second image corresponding to the fusion vector T by using the deconvolution layer and a second nonlinear conversion layer.
In one embodiment, the student network is trained under the supervision of a trained teacher network;
the network parameters of at least one layer in the second neural network adopt the network parameters of the corresponding layer in the teacher network.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units.
The invention also provides an electronic device, which comprises a processor and a memory; the memory stores a program that can be called by the processor; wherein the processor, when executing the program, implements the sample generation method as described in the foregoing embodiments.
Embodiments of the sample generation apparatus of the present invention may be applied to electronic devices. Taking a software implementation as an example, the apparatus is a logical device formed by the processor of the electronic device on which it resides reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, fig. 6 is a hardware structure diagram of an electronic device on which the sample generation apparatus 100 is located according to an exemplary embodiment of the present invention; in addition to the processor 510, the memory 530, the interface 520, and the non-volatile memory 540 shown in fig. 6, the electronic device on which the apparatus 100 is located may further include other hardware according to its actual function, which is not described again.
The present invention also provides a machine readable storage medium having stored thereon a program which, when executed by a processor, implements a sample generation method as described in any one of the preceding embodiments.
The present invention may take the form of a computer program product embodied on one or more storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, having program code embodied therein. Machine-readable storage media include both permanent and non-permanent, removable and non-removable media, and the storage of information may be accomplished by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of machine-readable storage media include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (11)

1. A method of generating a sample, comprising:
acquiring a feature description vector of a specified standard word, wherein the feature description vector is used for indicating the content of the specified standard word;
and converting the specified standard words into target samples by using the feature description vectors and the specified non-standard feature vectors, wherein the style corresponding to the target samples is the same as the style represented by the non-standard feature vectors.
2. The sample generation method of claim 1, wherein said obtaining a feature description vector specifying a standard word comprises:
inputting a first image containing the specified standard words into a first neural network in a trained student network, and performing feature extraction on the input first image by the first neural network to obtain a feature description vector.
3. The sample generation method of claim 2, wherein the first neural network performs feature extraction on the input first image to obtain a feature description vector, comprising:
the first neural network performs feature extraction on the first image at least through a convolution layer for performing convolution processing and a first nonlinear transformation layer for performing nonlinear transformation processing to obtain a feature description vector.
4. The sample generation method of claim 1, wherein converting a specified standard word into a target sample using the feature description vector and a specified non-standard feature vector comprises:
inputting the feature description vector and the non-standard feature vector into a second neural network in a trained student network, fusing the feature description vector and the non-standard feature vector by the second neural network to obtain a fusion vector, and generating a second image by using the fusion vector;
determining the second image as the target sample.
5. The sample generation method of claim 4, wherein the second neural network comprises a fusion layer; the feature description vector has the same dimension as the non-standard feature vector;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the non-standard feature vector to obtain the fusion vector.
6. The sample generation method of claim 4, wherein the second neural network comprises a fully connected layer and a fused layer; the feature description vector is different in dimension from the non-standard feature vector;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
the second neural network utilizes a fully connected layer to map the non-standard feature vector to a reference vector having dimensions that are the same as dimensions of the feature description vector;
and the second neural network utilizes a fusion layer to perform superposition processing on the feature description vector and the reference vector to obtain the fusion vector.
7. The sample generation method of claim 4, wherein the second neural network comprises a fusion layer;
the second neural network fuses the feature description vector and the non-standard feature vector to obtain a fused vector, including:
and the second neural network combines the feature description vector and the non-standard feature vector by utilizing a fusion layer to obtain the fusion vector.
8. The sample generation method of any of claims 5 to 7, wherein the second neural network further comprises: a deconvolution layer for performing deconvolution processing, and a second nonlinear transformation layer for performing nonlinear transformation;
the second neural network generating a second image using the fused vector comprises:
the second neural network generates the second image corresponding to the fused vector by using the deconvolution layer and the second nonlinear transformation layer.
9. The sample generation method of claim 4, wherein the student network is trained under the supervision of a trained teacher network;
and the network parameters of at least one layer in the second neural network reuse the network parameters of the corresponding layer in the teacher network.
10. A sample generation device, comprising:
a feature description vector acquisition module, configured to acquire a feature description vector of a specified standard word, wherein the feature description vector is used for indicating the content of the specified standard word;
and a target sample generation module, configured to convert the specified standard word into a target sample by using the feature description vector and a specified non-standard feature vector, wherein the style of the target sample is the same as the style represented by the non-standard feature vector.
11. An electronic device, comprising a processor and a memory, wherein the memory stores a program invokable by the processor, and the processor, when executing the program, implements the sample generation method of any one of claims 1 to 7 and 9.
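
The claims above describe the student network only in functional terms. As a reading aid, the following is a minimal PyTorch sketch of a first neural network along the lines of claims 2 and 3: convolution layers and a first nonlinear transformation layer turn a first image of the specified standard word into a feature description vector. The class name, layer sizes, channel counts and the 128-dimensional output are illustrative assumptions, not taken from the patent.

# Hypothetical sketch of the "first neural network" of claims 2-3 (PyTorch).
# Layer sizes and the 128-d output dimension are illustrative assumptions.
import torch
import torch.nn as nn

class FirstNetwork(nn.Module):
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),   # convolution layer
            nn.ReLU(inplace=True),                                   # first nonlinear transformation layer
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                                 # pool to a 1x1 spatial map
        )
        self.to_vector = nn.Linear(64, feature_dim)

    def forward(self, first_image: torch.Tensor) -> torch.Tensor:
        x = self.features(first_image)         # (N, 64, 1, 1)
        return self.to_vector(x.flatten(1))    # feature description vector, shape (N, feature_dim)

# Example: a 64x64 grayscale image of the standard word -> 128-d feature description vector
description = FirstNetwork()(torch.randn(1, 1, 64, 64))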
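
A corresponding sketch of a second neural network along the lines of claims 4 to 8, again with assumed dimensions and layer sizes: a fusion layer combines the feature description vector with the non-standard (style) feature vector, and a deconvolution layer plus a second nonlinear transformation layer generate the second image. The three fusion modes mirror claims 5, 6 and 7; reading claim 7's "combine" as concatenation is an assumption.

# Hypothetical sketch of the "second neural network" of claims 4-8 (PyTorch).
import torch
import torch.nn as nn

class SecondNetwork(nn.Module):
    def __init__(self, feature_dim: int = 128, style_dim: int = 64, fusion: str = "project_add"):
        super().__init__()
        self.fusion = fusion
        # Claim 6: a fully connected layer maps the non-standard feature vector
        # to a reference vector with the same dimension as the feature description vector.
        self.project = nn.Linear(style_dim, feature_dim)
        fused_dim = feature_dim + style_dim if fusion == "concat" else feature_dim
        self.decoder = nn.Sequential(
            nn.Linear(fused_dim, 64 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # deconvolution layer
            nn.ReLU(inplace=True),                                            # second nonlinear transformation layer
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def fuse(self, description: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        if self.fusion == "add":          # claim 5: equal dimensions, element-wise superposition
            return description + style
        if self.fusion == "project_add":  # claim 6: project the style vector, then superpose
            return description + self.project(style)
        return torch.cat([description, style], dim=1)  # claim 7: combine (here: concatenate) the vectors

    def forward(self, description: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.fuse(description, style))  # second image, i.e. the target sample

The "add" mode requires the two vectors to share one dimension (claim 5), while "project_add" first maps the style vector through the fully connected layer (claim 6); which mode a real implementation would use depends on how the style vector is produced, which the claims leave open.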
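
Putting the two sketches together gives the end-to-end flow of claims 1 and 4: acquire the feature description vector of the specified standard word, then convert it into a target sample in the style represented by the non-standard feature vector. The networks reuse the classes from the sketches above, and the tensors are random placeholders.

# Hypothetical end-to-end use of the two sketches above; all inputs are placeholders.
import torch

first_net = FirstNetwork()     # "first neural network" of the student network
second_net = SecondNetwork()   # "second neural network" of the student network

first_image = torch.randn(1, 1, 64, 64)   # first image containing the specified standard word
style_vector = torch.randn(1, 64)         # specified non-standard (style) feature vector

description = first_net(first_image)                    # feature description vector
target_sample = second_net(description, style_vector)   # second image, used as the target sample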
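
Finally, a hedged sketch of the parameter reuse in claim 9: the student network is trained under the supervision of a trained teacher network, and at least one layer of the second neural network takes its parameters from the corresponding layer of the teacher. The helper below and the layer correspondence in the commented example are assumptions for illustration only.

# Hypothetical sketch for claim 9: copy a trained teacher layer's parameters into
# the corresponding layer of the student's second neural network.
import torch.nn as nn

def reuse_teacher_layer(student_layer: nn.Module, teacher_layer: nn.Module, freeze: bool = True) -> None:
    """Copy the trained teacher layer's parameters into the corresponding student layer."""
    student_layer.load_state_dict(teacher_layer.state_dict())
    if freeze:
        for p in student_layer.parameters():
            p.requires_grad = False  # keep the reused layer fixed while the student is trained

# Example (assumed layer correspondence): let the student's last deconvolution layer
# reuse the teacher's corresponding deconvolution layer.
# reuse_teacher_layer(student.decoder[4], teacher.decoder[4])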
CN201910233792.XA 2019-03-26 2019-03-26 Sample generation method, device and equipment Active CN111753859B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910233792.XA CN111753859B (en) 2019-03-26 2019-03-26 Sample generation method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910233792.XA CN111753859B (en) 2019-03-26 2019-03-26 Sample generation method, device and equipment

Publications (2)

Publication Number Publication Date
CN111753859A true CN111753859A (en) 2020-10-09
CN111753859B CN111753859B (en) 2024-03-26

Family

ID=72671425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910233792.XA Active CN111753859B (en) 2019-03-26 2019-03-26 Sample generation method, device and equipment

Country Status (1)

Country Link
CN (1) CN111753859B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170098153A1 (en) * 2015-10-02 2017-04-06 Baidu Usa Llc Intelligent image captioning
JP2018132855A (en) * 2017-02-14 2018-08-23 国立大学法人電気通信大学 Image style conversion apparatus, image style conversion method and image style conversion program
CN108664996A (en) * 2018-04-19 2018-10-16 厦门大学 A kind of ancient writing recognition methods and system based on deep learning
CN109165376A (en) * 2018-06-28 2019-01-08 西交利物浦大学 Style character generating method based on a small amount of sample
CN109064522A (en) * 2018-08-03 2018-12-21 厦门大学 The Chinese character style generation method of confrontation network is generated based on condition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANGELINE AGUINALDO et al.: "Compressing GANs using Knowledge Distillation", arXiv:1902.00159v1 [cs.CV], pages 38 - 39 *
XU Yang: "Application of Genetic Analogy Learning Based on Hidden Markov Model in Chinese Calligraphy Generation", Journal of Wuhan University (Natural Science Edition), no. 01, 29 February 2008 (2008-02-29), pages 90 - 94 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417959A (en) * 2020-10-19 2021-02-26 上海臣星软件技术有限公司 Picture generation method and device, electronic equipment and computer storage medium
CN113695058A (en) * 2021-10-28 2021-11-26 南通金驰机电有限公司 Self-protection method of intelligent waste crushing device for heat exchanger production
CN113695058B (en) * 2021-10-28 2022-03-15 南通金驰机电有限公司 Self-protection method of intelligent waste crushing device for heat exchanger production

Also Published As

Publication number Publication date
CN111753859B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
Cai et al. Learning delicate local representations for multi-person pose estimation
Xu et al. Pad-net: Multi-tasks guided prediction-and-distillation network for simultaneous depth estimation and scene parsing
Lin et al. SCN: Switchable context network for semantic segmentation of RGB-D images
CN108229478B (en) Image semantic segmentation and training method and device, electronic device, storage medium, and program
Kulhánek et al. Viewformer: Nerf-free neural rendering from few images using transformers
CN114049584A (en) Model training and scene recognition method, device, equipment and medium
CN111860138B (en) Three-dimensional point cloud semantic segmentation method and system based on full fusion network
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN113361251A (en) Text image generation method and system based on multi-stage generation countermeasure network
CN111753859B (en) Sample generation method, device and equipment
CN111738269A (en) Model training method, image processing device, model training apparatus, and storage medium
CN115223020B (en) Image processing method, apparatus, device, storage medium, and computer program product
CN113903022A (en) Text detection method and system based on feature pyramid and attention fusion
Oeljeklaus An integrated approach for traffic scene understanding from monocular cameras
CN115797731A (en) Target detection model training method, target detection model detection method, terminal device and storage medium
Leng et al. Pseudoaugment: Learning to use unlabeled data for data augmentation in point clouds
Huang et al. Label-guided auxiliary training improves 3d object detector
CN113723352A (en) Text detection method, system, storage medium and electronic equipment
Kar Mastering Computer Vision with TensorFlow 2.x: Build advanced computer vision applications using machine learning and deep learning techniques
CN116958324A (en) Training method, device, equipment and storage medium of image generation model
CN112419249B (en) Special clothing picture conversion method, terminal device and storage medium
CN112329735B (en) Training method of face recognition model and online education system
Singh et al. CanvasGAN: A simple baseline for text to image generation by incrementally patching a canvas
CN114037056A (en) Method and device for generating neural network, computer equipment and storage medium
CN116152334A (en) Image processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant