CN115661304A - Word stock generation method based on frame interpolation, electronic device, storage medium and system - Google Patents
- Publication number
- CN115661304A CN115661304A CN202211244030.8A CN202211244030A CN115661304A CN 115661304 A CN115661304 A CN 115661304A CN 202211244030 A CN202211244030 A CN 202211244030A CN 115661304 A CN115661304 A CN 115661304A
- Authority
- CN
- China
- Prior art keywords
- neural network
- convolutional neural
- frame
- character
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Image Processing (AREA)
Abstract
The present disclosure relates to a frame-interpolation-based word stock (font library) generation method, electronic device, storage medium, and system. Video data are used as pre-training data, the generation of a multi-weight font family is modeled as a continuous frame interpolation problem, and the network parameters of the constructed convolutional neural network are fine-tuned with an existing multi-weight font data set, thereby generating a range of intermediate weights between the boldest and the lightest weight. The method greatly shortens the time needed to produce the remaining weights and improves the generation quality: the generated glyphs are more attractive and better proportioned than those produced by the control-point interpolation method, retain clearer style characteristics, and are easier to read.
Description
Technical Field
The present disclosure relates to the field of word stocks, and in particular, to a word stock generation method, an electronic device, a storage medium, and a system based on frame interpolation.
Background
Characters are everywhere in daily life and are one of the main vehicles of information. To keep that information effective, type is presented in different ways and forms on different occasions, in different contexts, and on different devices. A multi-weight font library is a concept born of these varying display requirements: a series of fonts sharing one style but differing in weight. Font weight is the stroke thickness of a typeface; the ISO international standard defines nine weight grades, W1 to W9, ranging from ultra-light through regular to ultra-bold. A variety of weights increases a typeface's applicability: a single family may appear in headlines, body text, posters, and the small displays of embedded devices, and a single weight can hardly achieve the intended rendering effect in all of these settings. Producing fonts of different weights is therefore essential to a typeface's usefulness.
The most basic way to produce a multi-weight font library is for a designer to first design one set of fonts and then adjust the weight by editing the control points of every character, modifying the weight without degrading the style characteristics of the typeface or the beauty of the glyphs. The workload of this approach, however, scales with the size of the character set and the number of weights required; producing a nine-weight family typically takes at least several months.
To reduce designers' workload and speed up the production of a multi-weight font library, a common approach is the control-point interpolation ("point-moving") method: designers produce only the boldest and the lightest font, engineers put the control points of the same character in the two libraries into correspondence, and intermediate weights are obtained by moving each control point along the line between its two corresponding positions. Although this cuts the designers' workload from nine weights to only two, the method has serious limitations. Because Chinese characters differ in structural composition and in the spacing between components, mechanically moving control points in proportion makes the overall glyph structure look uncoordinated and alters the style of the typeface; the method does not generalize, and its parameters must be tuned separately for every typeface. It therefore suits only a few fonts of simple style, has a narrow range of application, and yields libraries that are not attractive enough.
Disclosure of Invention
The present disclosure provides a frame-interpolation-based word stock generation method, electronic device, storage medium, and system to solve at least one of the technical problems described in the background above.
In a preferred embodiment of the present disclosure, an embodiment of the present application provides a word stock generation method based on frame interpolation, where the method includes:
acquiring video data and storing the video data frame by frame into pictures to obtain a video frame data set consisting of the pictures;
pre-training a convolutional neural network by using the video frame data set to obtain a pre-trained convolutional neural network;
fine-tuning the network parameters of the pre-trained convolutional neural network using a multi-weight font data set to obtain a fine-tuned convolutional neural network;
and inputting the boldest-weight font and the lightest-weight font of one style into the fine-tuned convolutional neural network to obtain a multi-weight font library of that style.
Further, the pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network comprises the following steps:
splicing the pictures of the ith frame and the (i + 2) th frame of the stored video data in the channel dimension to be used as the input of a convolutional neural network;
taking the predicted video frame picture of the (i + 1) th frame as the output of the convolutional neural network;
updating network parameters of the convolutional neural network according to the loss function;
and stopping the pre-training when the pre-trained convolutional neural network is converged to obtain the pre-trained convolutional neural network.
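A minimal sketch of how one pre-training (input, target) pair could be assembled from the stored frames, with NumPy standing in for a deep-learning tensor library (array shapes and function names here are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def make_training_pair(frame_i, frame_i2, frame_i1):
    """Build one (input, target) pair for pre-training.

    frame_i, frame_i1, frame_i2: H x W x C images of frames i, i+1, i+2.
    The i-th and (i+2)-th frames are spliced along the channel
    dimension to form the network input; the (i+1)-th frame is the
    prediction target.
    """
    x = np.concatenate([frame_i, frame_i2], axis=-1)  # H x W x 2C
    y = frame_i1                                      # H x W x C
    return x, y

# Illustrative 64x64 RGB frames
f0, f1, f2 = (np.zeros((64, 64, 3)) for _ in range(3))
x, y = make_training_pair(f0, f2, f1)
```

In a real pipeline the same splicing would be done on batched GPU tensors; only the channel-dimension concatenation is prescribed by the method itself.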
Further, fine-tuning the network parameters of the pre-trained convolutional neural network using the multi-weight font data set to obtain the fine-tuned convolutional neural network comprises the following steps:
acquiring a multi-weight font data set;
taking two fonts in the multi-weight font data set that are separated by one weight grade as the input of the pre-trained convolutional neural network, to generate the font of the intermediate weight;
updating the network parameters of the convolutional neural network after pre-training according to the loss function;
and stopping fine tuning when the fine tuned convolutional neural network is converged to obtain the fine tuned convolutional neural network.
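Under the assumption that the multi-weight data set is keyed by numeric weight grades, the fine-tuning examples described above — two fonts one weight grade apart as input, the font between them as target — could be enumerated as follows (the function and variable names are hypothetical):

```python
def fine_tune_triples(weights):
    """From the available weight grades, yield (lighter, middle, bolder)
    triples: the fonts at the two outer weights form the network input,
    and the font at the middle weight is the training target."""
    weights = sorted(weights)
    for a, b, c in zip(weights, weights[1:], weights[2:]):
        yield a, b, c

triples = list(fine_tune_triples([45, 55, 65, 75]))
```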
Further, the loss function uses a mean absolute error loss function and a perceptual loss function together, calculated as follows:

L1_Loss = || G(F_i, F_{i+2}) − F_{i+1} ||_1

LPIPS_Loss = Σ_L (1 / (h·w)) Σ_{h,w} || weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1})) ||_2^2

where L1_Loss denotes the mean absolute error loss function, G denotes the constructed convolutional neural network, and F_i, F_{i+1}, and F_{i+2} denote the i-th, (i+1)-th, and (i+2)-th video frames, respectively. LPIPS_Loss denotes the perceptual loss function; weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication of those coefficients with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the network output and the target (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG.
Further, the condition for convergence of the convolutional neural network is: compute the values of the mean absolute error loss function and the perceptual loss function, sum the two values, and regard the network as converged when the sum no longer decreases.
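The combined objective can be sketched in NumPy as below. The single fixed "feature map" argument is a simplifying assumption — the patent computes the perceptual term over feature maps from several layers of a pretrained VGG network, which is not reproduced here:

```python
import numpy as np

def l1_loss(pred, target):
    # mean absolute error between generated and target frames
    return np.abs(pred - target).mean()

def lpips_like_loss(pred_feat, target_feat, layer_weight):
    # weighted squared difference of (stand-in) layer features,
    # averaged over the h x w spatial grid
    h, w = pred_feat.shape[:2]
    diff = layer_weight * (pred_feat - target_feat)
    return (diff ** 2).sum() / (h * w)

def total_loss(pred, target, pred_feat, target_feat, layer_weight):
    # the convergence test sums the two terms
    return l1_loss(pred, target) + lpips_like_loss(
        pred_feat, target_feat, layer_weight)

pred = np.zeros((4, 4)); target = np.ones((4, 4))
pf = np.ones((2, 2)); tf = np.zeros((2, 2)); wl = np.full((2, 2), 0.5)
loss = total_loss(pred, target, pred_feat=pf, target_feat=tf, layer_weight=wl)
```

Training would stop once this summed value stops decreasing between evaluations.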
Further, before pre-training with the video frame data set, a convolutional neural network is constructed, the convolutional neural network comprising three modules:
a feature compression encoding module, which encodes an input video frame into a 4-dimensional tensor and then compresses it to obtain a compressed 4-dimensional tensor; the module consists of a convolutional network layer, a feature normalization layer, a feature activation layer, and a downsampling network layer, where the convolutional layer applies a linear transformation to the video frame to obtain convolutional features; the feature normalization layer normalizes the values of those features so that they remain in the range [−1, 1]; the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the downsampling layer downsamples the nonlinear features in the 2nd and 3rd dimensions to obtain nonlinear features of smaller size;
a feature transfer module, which applies multiple nonlinear transformations to the features extracted by the feature compression encoding module so that the transformed features have greater representational capacity; it is implemented by adding a self-attention mechanism inside a residual layer, where the residual layer accelerates network convergence and stabilizes the network output, and the self-attention mechanism extracts local features, facilitating the update of network parameters;
a feature decoding module, which decodes the nonlinearly transformed features back into image space and converts them into an output image; the module consists of a two-dimensional transposed-convolution network layer, a feature instance normalization layer, a feature activation layer, and an upsampling network layer, where the transposed-convolution layer decodes the features; the instance normalization layer normalizes feature values to keep training stable; the activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the upsampling layer enlarges the features and fills in spatial information.
In a preferred embodiment of the present disclosure, an embodiment of the present application further provides a system for generating a word stock based on frame interpolation, including:
the video frame data set generation module is used for acquiring video data and storing the video data into pictures frame by frame to obtain a video frame data set consisting of the pictures;
the pre-training module is used for pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network;
a fine-tuning module, configured to fine-tune the network parameters of the pre-trained convolutional neural network using the multi-weight font data set to obtain the fine-tuned convolutional neural network;
and a multi-weight font library generation module, configured to input the boldest-weight font and the lightest-weight font of one style into the fine-tuned convolutional neural network to obtain a multi-weight font library of that style.
Further, the frame-interpolation-based word stock generation system also comprises a convolutional neural network construction module, the convolutional neural network being composed of three modules:
a feature compression encoding module, which encodes an input video frame into a 4-dimensional tensor and then compresses it to obtain a compressed 4-dimensional tensor; the module consists of a convolutional network layer, a feature normalization layer, a feature activation layer, and a downsampling network layer, where the convolutional layer applies a linear transformation to the video frame to obtain convolutional features; the feature normalization layer normalizes the values of those features so that they remain in the range [−1, 1]; the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the downsampling layer downsamples the nonlinear features in the 2nd and 3rd dimensions to obtain nonlinear features of smaller size;
a feature transfer module, which applies multiple nonlinear transformations to the features extracted by the feature compression encoding module so that the transformed features have greater representational capacity; it is implemented by adding a self-attention mechanism inside a residual layer, where the residual layer accelerates network convergence and stabilizes the network output, and the self-attention mechanism extracts local features, facilitating the update of network parameters;
a feature decoding module, which decodes the nonlinearly transformed features back into image space and converts them into an output image; the module consists of a two-dimensional transposed-convolution network layer, a feature instance normalization layer, a feature activation layer, and an upsampling network layer, where the transposed-convolution layer decodes the features; the instance normalization layer normalizes feature values to keep training stable; the activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the upsampling layer enlarges the features and fills in spatial information.
In a preferred embodiment of the present disclosure, an electronic device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the above-mentioned word stock generation method based on frame interpolation.
In a preferred embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored, and the program, when executed by a processor, implements the steps of the above-mentioned word stock generation method based on frame interpolation.
The beneficial effects of the present disclosure are as follows: video data are used as pre-training data, multi-weight font generation is modeled as a continuous frame interpolation problem, and existing multi-weight font data sets are used to fine-tune the network parameters of the constructed convolutional neural network, thereby generating a variety of intermediate weights between the boldest and the lightest weight. The method greatly shortens the time needed to produce the remaining weights; the generated glyphs are more attractive and better proportioned than those produced by the control-point interpolation method, retain clearer style characteristics, and are easier to read.
Drawings
FIG. 1 is a flow chart of a method for generating a word stock based on frame interpolation;
FIG. 2 is a block diagram of a convolutional neural network;
FIG. 3 is a block diagram of a word stock generation system based on frame interpolation;
FIG. 4 is a diagram of the effect of generating multiple font weights between weight grades 45 and 65.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Example 1
Referring to fig. 1, in the frame-interpolation-based word stock generation method provided in this exemplary embodiment of the present disclosure, and addressing the technical problems noted in the background, video data are used as pre-training data, multi-weight font generation is modeled as a continuous frame interpolation problem, and an existing multi-weight font data set is used to fine-tune the network parameters of the constructed convolutional neural network, so that the network adapts to font-domain data and the generation quality improves.
The implementation process of the exemplary frame interpolation-based word stock generation method comprises the following steps:
collecting video data and storing it frame by frame in picture format, named by frame number, where F_i, F_{i+1}, and F_{i+2} denote the i-th, (i+1)-th, and (i+2)-th video frames, respectively.
Constructing a convolutional neural network, wherein the convolutional neural network mainly comprises three modules: the device comprises a feature compression coding module, a feature transfer module and a feature decoding module. The module structure of the convolutional neural network is shown in fig. 2, in which:
the characteristic compression coding module is used for compressing an input video frame after the video frame is coded into a 4-dimensional tensor so as to obtain a compressed 4-dimensional tensor, the characteristic compression coding module uses a network structure of { Conv-BN-Relu } xN-Downample, wherein N represents the occurrence times of the { Conv-BN-Relu } module and the Downample, N = [4,8] in the scheme, 2 characteristic compression coding sub-modules Enc are adopted to form the characteristic compression coding module, wherein Conv refers to a convolutional network layer and is used for carrying out linear transformation on the video frame to obtain the characteristics of the convolutional layer; BN refers to a characteristic normalization layer, which is used for carrying out numerical value normalization operation on the characteristics obtained by the convolution layer, so that the range of the characteristics is kept between [ -1,1 ]; relu is a feature activation layer, which is used for carrying out nonlinear transformation on the normalized features to obtain nonlinear features; down sample refers to a down-sampling network layer, which is used to perform down-sampling operation on the obtained nonlinear features in the 2 nd and 3 rd dimensions to obtain smaller-sized nonlinear features, such as: the nonlinear characteristic of (1,1,4,4) dimension can be obtained through a down-sampling layer, and the nonlinear characteristic of (1,1,2,2) dimension can be obtained;
the characteristic transfer module is used for carrying out multiple nonlinear transformation on the characteristics extracted by the characteristic compression coding module, so that the characteristics after the nonlinear transformation have more characterization capability; the characteristic transfer module uses residual structure link, the main flow part uses a self-attention structure, the residual layer uses 1x1 convolution module link, in the scheme, 4 characteristic transfer sub-modules res are used for forming the characteristic transfer module, wherein the residual layer is used for accelerating the network convergence speed and stabilizing the network output; the self-attention mechanism is used for extracting local features, so that network parameters can be updated conveniently;
the characteristic decoding module is used for decoding the characteristics after the nonlinear transformation into an image space and transforming the characteristics into an image for output; the characteristic decoding module is composed of { Conv2 dTransspan-IN-Relu } xM-Upsample, wherein M represents the occurrence frequency of the { Conv2 dTransspan-IN-Relu } module and the Upsample, M = [8,4] IN the scheme, and 2 characteristic decoding submodules Dec are adopted to form the characteristic compression coding module, wherein Conv2 dTransspan refers to a two-dimensional reverse convolution network layer and is used for decoding the characteristics; IN refers to a characteristic instance normalization layer which is used for normalization of characteristic values and ensures the stability of training; relu is a feature activation layer, which is used for carrying out nonlinear transformation on the normalized features to obtain nonlinear features; upsample refers to an upsampling network layer, which is used to enlarge the size of a feature and complete spatial information, such as a feature with one (1,1,2,2) dimension, and after passing through the upsampling layer, the upsampling network layer becomes (1,1,4,4).
The i-th and (i+2)-th frames are spliced together in the channel dimension and used as the input of the neural network.
The task of the convolutional neural network is to predict the content of the (i+1)-th frame; the loss is computed from the network output f_int and the (i+1)-th frame and used to update the network parameters.
The values of the mean absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are computed, and pre-training stops when the sum of the two no longer decreases. L1_Loss and LPIPS_Loss are calculated as follows:
L1_Loss = || G(F_i, F_{i+2}) − F_{i+1} ||_1

LPIPS_Loss = Σ_L (1 / (h·w)) Σ_{h,w} || weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1})) ||_2^2

where L1_Loss denotes the mean absolute error loss function, G denotes the constructed convolutional neural network, and F_i, F_{i+1}, and F_{i+2} denote the i-th, (i+1)-th, and (i+2)-th video frames, respectively. LPIPS_Loss denotes the perceptual loss function; weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication of those coefficients with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the network output and the target (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG, a classic, already-trained convolutional neural network developed at the Department of Engineering Science, University of Oxford.
After the convolutional neural network is pre-trained on video data, its network parameters are fine-tuned on a multi-weight font data set so that the data distributions are consistent. The multi-weight font data set mainly consists of multi-weight fonts produced by designers, such as heiti ("black", sans-serif) style families, fifteen fonts in total.
Two fonts separated by one weight grade are used as input to generate the font of the intermediate weight. The two loss functions used are the same as in the pre-training phase.
The values of the mean absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are computed; when their sum no longer decreases, the pre-trained convolutional neural network has converged, and fine-tuning stops, yielding the fine-tuned convolutional neural network.
The boldest-weight font and the lightest-weight font of one style are input into the fine-tuned convolutional neural network to obtain a multi-weight font library of that style. The effect of the generated multi-weight fonts is shown in fig. 4; from left to right the weights are 45, 47, 49, …, 65.
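Starting from only the two master fonts, the intermediate weights can be obtained by repeatedly asking the network for the midpoint of a pair. The sketch below models that recursion with plain numbers standing in for font images; `interpolate` is a hypothetical stand-in for a forward pass of the fine-tuned network:

```python
def interpolate(light, bold):
    # stand-in for the fine-tuned CNN: returns the intermediate weight
    return (light + bold) / 2

def generate_weights(light, bold, depth):
    """Recursively insert midpoints between the two masters.

    depth = 1 yields 3 weights, depth = 2 yields 5, depth = 3 yields 9
    (e.g. nine weight grades from two masters)."""
    if depth == 0:
        return [light, bold]
    mid = interpolate(light, bold)
    left = generate_weights(light, mid, depth - 1)
    right = generate_weights(mid, bold, depth - 1)
    return left[:-1] + right  # drop the duplicated midpoint

weights = generate_weights(45.0, 65.0, depth=3)
```

The 45–65 range and the number of recursion levels are illustrative; each `interpolate` call in practice runs the network once per character of the library.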
Example 2
As shown in fig. 3, an exemplary word stock generation system based on frame interpolation includes:
a video frame data set generation module for collecting video data, storing the video data frame by frame as picture, and using frame number F i 、F i+1 And F i+2 Name wherein F i 、F i+1 And F i+2 Video frames representing the ith frame, the (i + 1) th frame, and the (i + 2) th frame of video data, respectively.
a pre-training module, configured to splice the i-th and (i+2)-th frames together in the channel dimension as the input of the neural network. The task of the convolutional neural network is to predict the content of the (i+1)-th frame; the loss is computed from the network output f_int and the (i+1)-th frame and used to update the network parameters. Specifically: the values of the mean absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are computed; when the sum of the two no longer decreases, the network has converged and pre-training stops, yielding the pre-trained convolutional neural network. L1_Loss and LPIPS_Loss are calculated as follows:

L1_Loss = || G(F_i, F_{i+2}) − F_{i+1} ||_1

LPIPS_Loss = Σ_L (1 / (h·w)) Σ_{h,w} || weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1})) ||_2^2

where L1_Loss denotes the mean absolute error loss function, G denotes the constructed convolutional neural network, and F_i, F_{i+1}, and F_{i+2} denote the i-th, (i+1)-th, and (i+2)-th video frames, respectively. LPIPS_Loss denotes the perceptual loss function; weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication of those coefficients with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the network output and the target (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG, a classic, already-trained convolutional neural network developed at the Department of Engineering Science, University of Oxford.
a fine-tuning module, configured to fine-tune the network parameters of the pre-trained convolutional neural network using the multi-weight font data set so that the data distributions are consistent. Specifically: two fonts in the multi-weight font data set separated by one weight grade are taken as input to generate the font of the intermediate weight; the values of the mean absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are computed, and when their sum no longer decreases the pre-trained network has converged and fine-tuning stops, yielding the fine-tuned convolutional neural network. The multi-weight font data set mainly consists of multi-weight fonts produced by designers, such as heiti ("black", sans-serif) style families, fifteen fonts in total. The two loss functions used by the fine-tuning module are the same as in the pre-training module described above.
The multi-weight font library generating module is used for inputting the heaviest-weight font and the lightest-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight font library of the same style.
Further, the frame-interpolation-based word stock generation system further comprises a convolutional neural network, which mainly consists of three modules: a feature compression encoding module, a feature transfer module and a feature decoding module. The modular structure of the convolutional neural network is shown in fig. 2, in which:
the feature compression encoding module is used for encoding an input video frame into a 4-dimensional tensor and then compressing it to obtain a compressed 4-dimensional tensor, wherein the feature compression encoding module uses a network structure of {Conv-BN-Relu}×N-Downsample, N representing the number of occurrences of the {Conv-BN-Relu} module before the Downsample layer; in this scheme N = [4, 8], and 2 feature compression encoding sub-modules Enc make up the feature compression encoding module. Conv refers to a convolutional network layer, used to apply a linear transformation to the video frame to obtain the convolutional-layer features; BN refers to a feature normalization layer, used to numerically normalize the features from the convolutional layer so that they remain in the range [-1, 1]; Relu refers to a feature activation layer, used to apply a nonlinear transformation to the normalized features to obtain nonlinear features; Downsample refers to a downsampling network layer, used to downsample the nonlinear features in the 2nd and 3rd dimensions to obtain smaller-sized nonlinear features, for example a nonlinear feature of dimension (1, 1, 4, 4) passed through the downsampling layer yields a nonlinear feature of dimension (1, 1, 2, 2);
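The downsampling step can be illustrated with a minimal NumPy sketch that halves the spatial dimensions of a (batch, channel, height, width) tensor. The patent does not specify the pooling operator, so 2×2 average pooling here is an assumption, and the function name is illustrative.

```python
import numpy as np

def downsample2x(x):
    # Halve the spatial size of a 4-D tensor by 2x2 average pooling,
    # e.g. (1, 1, 4, 4) -> (1, 1, 2, 2). Assumes even height and width.
    b, c, h, w = x.shape
    return x.reshape(b, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))
```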
the feature transfer module is used for applying multiple nonlinear transformations to the features extracted by the feature compression encoding module, so that the transformed features have stronger representational capability; the feature transfer module uses residual links: the main path uses a self-attention structure, and the residual path is linked by a 1×1 convolution module. In this scheme, 4 feature transfer sub-modules Res make up the feature transfer module, wherein the residual layer is used to accelerate network convergence and stabilize the network output, and the self-attention mechanism is used to extract local features, facilitating the update of the network parameters;
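A minimal NumPy sketch of the residual-linked self-attention idea: the main path is single-head self-attention over flattened spatial positions, and the residual path is a per-position linear map, which is what a 1×1 convolution reduces to on flattened features. All weight matrices are illustrative stand-ins, not the patented parameters.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def residual_self_attention(x, wq, wk, wv, w_res):
    # x: (n, d) features, one row per spatial position.
    # Main path: single-head self-attention; residual path: a 1x1
    # convolution, i.e. the per-position linear map w_res.
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)
    return attn @ v + x @ w_res  # main path plus residual link
```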
the feature decoding module is used for decoding the nonlinearly transformed features into image space and converting them into an image for output; the feature decoding module consists of {ConvTranspose2d-IN-Relu}×M-Upsample, M representing the number of occurrences of the {ConvTranspose2d-IN-Relu} module before the Upsample layer; in this scheme M = [8, 4], and 2 feature decoding sub-modules Dec make up the feature decoding module. ConvTranspose2d refers to a two-dimensional transposed (reverse) convolution network layer, used to decode the features; IN refers to a feature instance normalization layer, used to normalize the feature values and keep training stable; Relu refers to a feature activation layer, used to apply a nonlinear transformation to the normalized features to obtain nonlinear features; Upsample refers to an upsampling network layer, used to enlarge the feature size and fill in spatial information, for example a feature of dimension (1, 1, 2, 2) becomes (1, 1, 4, 4) after passing through the upsampling layer.
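The upsampling step can likewise be sketched in NumPy. Nearest-neighbour repetition is used here as an assumption, since the text only specifies that the spatial size is enlarged and spatial information completed.

```python
import numpy as np

def upsample2x(x):
    # Double the spatial size of a 4-D (batch, channel, height, width)
    # tensor by nearest-neighbour repetition,
    # e.g. (1, 1, 2, 2) -> (1, 1, 4, 4).
    return x.repeat(2, axis=2).repeat(2, axis=3)
```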
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the frame interpolation based word stock generation method of embodiment 1 when executing the computer program.
Embodiment 3 of the present disclosure is merely an example and should not limit the function or scope of use of the embodiments of the present disclosure.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include program means having a set of (at least one) program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, to name a few.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the frame interpolation-based word stock generation method of embodiment 1.
More specific examples of the readable storage medium include, but are not limited to: a portable disk, a hard disk, random access memory, read-only memory, erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps of implementing the frame interpolation based word stock generation method described in embodiment 1, when the program product is run on the terminal device.
Program code for carrying out the disclosure may be written in any combination of one or more programming languages, and may execute entirely on the user device, partly on the user device as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A method for generating a word stock based on frame interpolation is characterized by comprising the following steps:
acquiring video data and storing the video data frame by frame into pictures to obtain a video frame data set consisting of the pictures;
pre-training a convolutional neural network by using the video frame data set to obtain a pre-trained convolutional neural network;
fine-tuning the network parameters of the pre-trained convolutional neural network by using a multi-weight font data set to obtain a fine-tuned convolutional neural network;
and inputting the heaviest-weight font and the lightest-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight font library of the same style.
2. The method for generating a word stock based on frame interpolation according to claim 1, wherein the pre-training of the convolutional neural network using the video frame data set to obtain a pre-trained convolutional neural network comprises the following steps:
splicing the pictures of the ith frame and the (i + 2) th frame of the stored video data in the channel dimension to be used as the input of a convolutional neural network;
taking the predicted video frame picture of the (i + 1) th frame as the output of the convolutional neural network;
updating network parameters of the convolutional neural network according to the loss function;
and stopping the pre-training when the pre-trained convolutional neural network is converged to obtain the pre-trained convolutional neural network.
3. The method of generating a word stock based on frame interpolation of claim 1, wherein the fine-tuning of the network parameters of the pre-trained convolutional neural network using the multi-weight font data set to obtain the fine-tuned convolutional neural network comprises the steps of:
acquiring a multi-weight font data set;
taking two font-weight data separated by one weight step in the multi-weight font data set as the input of the pre-trained convolutional neural network to generate font data of the intermediate weight;
updating the network parameters of the pre-trained convolutional neural network according to the loss function;
and stopping fine tuning when the fine tuned convolutional neural network is converged to obtain the fine tuned convolutional neural network.
4. The frame interpolation based word stock generating method according to claim 2 or 3,
the loss function uses a mean absolute error loss function and a perceptual loss function simultaneously, and their calculation formulas are as follows:
L1_Loss = ||G(F_i, F_i+2) - F_i+1||_1

LPIPS_Loss = Σ_L (1/(h×w)) ||weight_L ⊙ (VGG_L(G(F_i, F_i+2)) - VGG_L(F_i+1))||_2^2
wherein L1_Loss denotes the mean absolute error loss function, G denotes the constructed convolutional neural network G, F_i, F_i+1 and F_i+2 denote the i-th, (i+1)-th and (i+2)-th video frames of the video data respectively, LPIPS_Loss denotes the perceptual loss function, weight_L denotes the coefficients of the network parameters of the L-th layer, ⊙ denotes elementwise (bitwise) multiplication of the coefficients with the features of the L-th layer, and VGG_L(G(F_i, F_i+2)) and VGG_L(F_i+1) denote the feature maps, each of size h × w, obtained by passing the output of the convolutional neural network G and the target video frame of the (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG.
5. The method of generating a frame interpolation based word stock according to claim 4, wherein the condition for convergence of the convolutional neural network is:
respectively calculating the values of the mean absolute error loss function and the perceptual loss function;
summing the two calculated values, wherein the convolutional neural network converges when the sum of the two values no longer decreases.
6. The method of generating a frame interpolation based word stock of claim 1, further comprising constructing a convolutional neural network prior to pre-training the convolutional neural network using the set of video frame data, the convolutional neural network consisting of three modules:
the characteristic compression coding module is used for coding an input video frame into a 4-dimensional tensor and then compressing the input video frame to obtain a compressed 4-dimensional tensor;
the characteristic transfer module is realized by adding a self-attention mechanism in the residual error layer and is used for carrying out multiple times of nonlinear transformation on the characteristics extracted by the characteristic compression coding module;
and the characteristic decoding module is used for decoding the characteristics output by the characteristic transferring module into an image space and converting the characteristics into an image to be output.
7. A system for generating a word stock based on frame interpolation, comprising:
the video frame data set generating module is used for acquiring video data and storing the video data into pictures frame by frame to obtain a video frame data set consisting of the pictures;
the pre-training module is used for pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network;
the fine-tuning module is used for fine-tuning the network parameters of the pre-trained convolutional neural network by using the multi-weight font data set to obtain the fine-tuned convolutional neural network;
and the multi-weight font library generating module is used for inputting the heaviest-weight font and the lightest-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight font library of the same style.
8. The frame interpolation based word stock generation system of claim 7, further comprising a convolutional neural network construction module, the convolutional neural network consisting of three modules:
the characteristic compression coding module is used for coding an input video frame into a 4-dimensional tensor and then compressing the 4-dimensional tensor so as to obtain a compressed 4-dimensional tensor;
the characteristic transfer module is realized by adding a self-attention mechanism in the residual error layer and is used for carrying out multiple times of nonlinear transformation on the characteristics extracted by the characteristic compression coding module;
and the characteristic decoding module is used for decoding the characteristics output by the characteristic transferring module into an image space and converting the characteristics into an image to be output.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the frame interpolation based word stock generation method of any of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the frame interpolation based word stock generation method of any one of claims 1 to 6.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211244030.8A (CN115661304B) | 2022-10-11 | 2022-10-11 | Word stock generation method based on frame interpolation, electronic equipment, storage medium and system
Publications (2)

Publication Number | Publication Date
---|---
CN115661304A | 2023-01-31
CN115661304B | 2024-05-03
Family

ID=84987165

Family Applications (1)

Application Number | Priority Date | Filing Date | Status
---|---|---|---
CN202211244030.8A (CN115661304B) | 2022-10-11 | 2022-10-11 | Active

Country Status (1)

Country | Link
---|---
CN | CN115661304B (en)
Citations (5)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US9659248B1 | 2016-01-19 | 2017-05-23 | International Business Machines Corporation | Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations
US20200104367A1 | 2018-09-30 | 2020-04-02 | International Business Machines Corporation | Vector Representation Based on Context
WO2021237743A1 | 2020-05-29 | 2021-12-02 | BOE Technology Group Co., Ltd. | Video frame interpolation method and apparatus, and computer-readable storage medium
KR20220032538A | 2021-09-09 | 2022-03-15 | Beijing Baidu Netcom Science Technology Co., Ltd. | Training method for character generation model, character generation method, apparatus and device, and medium
CN114913533A | 2022-05-11 | 2022-08-16 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and device for changing character weight
Non-Patent Citations (1)

Title
---
WEI CAO et al.: "Stacked residual recurrent neural network with word weight for text classification", Computer Science, 1 January 2017
Also Published As

Publication number | Publication date
---|---
CN115661304B | 2024-05-03
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant