CN115661304A - Word stock generation method based on frame interpolation, electronic device, storage medium and system - Google Patents

Word stock generation method based on frame interpolation, electronic device, storage medium and system

Info

Publication number
CN115661304A
CN115661304A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
frame
character
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211244030.8A
Other languages
Chinese (zh)
Other versions
CN115661304B (en)
Inventor
Yue Qiang (岳强)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Original Assignee
SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Beijing Hanyi Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD, Beijing Hanyi Innovation Technology Co ltd filed Critical SHANGHAI YICHUANG INFORMATION TECHNOLOGY CO LTD
Priority to CN202211244030.8A priority Critical patent/CN115661304B/en
Publication of CN115661304A publication Critical patent/CN115661304A/en
Application granted granted Critical
Publication of CN115661304B publication Critical patent/CN115661304B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The present disclosure relates to a frame-interpolation-based word stock generation method, electronic device, storage medium, and system. Video data is used as pre-training data, the problem of generating multi-weight fonts is modeled as a continuous frame interpolation problem, and the network parameters of the constructed convolutional neural network are fine-tuned with an existing multi-weight font data set, thereby generating a number of different weights between the boldest weight and the lightest weight. The method greatly shortens the time needed to produce the remaining weights; it not only improves the generation quality, but also yields fonts that are more attractive and better proportioned than those produced by the point-to-point method, with more distinct style characteristics and better readability.

Description

Word stock generation method based on frame interpolation, electronic device, storage medium and system
Technical Field
The present disclosure relates to the field of word stocks, and in particular, to a frame-interpolation-based word stock generation method, electronic device, storage medium, and system.
Background
Characters are everywhere in daily life and are one of the main tools for conveying information. To keep that information effective, characters are presented in different ways and forms on different occasions, in different contexts, and on different devices. A multi-weight word stock is a concept created to meet these different display requirements: a series of fonts that share the same style but differ in weight. The weight is the stroke thickness of a font, and the international ISO standard defines nine weight grades, W1 through W9, ranging from ultra-light to ultra-bold. A variety of weights increases a typeface's applicability: the same typeface may appear in headlines, body text, posters, and on the small displays of embedded devices, and if a single weight is used everywhere, the intended rendering effect is hard to achieve. Producing fonts in different weights is therefore essential to a typeface's applicability.
The most basic way to build a multi-weight word stock is for a designer to first design one set of fonts and then adjust the weight by editing the control points of every character, so that the weight changes without losing the style characteristics of the font and the aesthetics of the typeface. The workload of this approach, however, depends on the size of the word stock's character set and on how many weights have to be produced; making a nine-weight family usually takes at least several months.
To reduce the designers' workload and speed up the production of a multi-weight word stock, a common approach is the point-to-point method: the designers produce only the boldest and the lightest fonts, engineers then match the corresponding control points of the same character in the two font sets, and the control points are moved along the positions between each pair of corresponding points to obtain different weights. Although the point-to-point method greatly reduces the designers' workload, from nine different weights down to only two, it has serious limitations. Because each Chinese character has its own composition structure and its own spacing between components, mechanically moving control points by a fixed proportion makes the overall glyph structure look uncoordinated and changes the style of the font; the method is not universal, and parameters have to be tuned for each typeface. The point-to-point method is therefore only suitable for a few fonts with simple styles, its range of application is small, and the word stock it generates is not attractive enough.
Disclosure of Invention
The present disclosure provides a word stock generation method, an electronic device, a storage medium, and a system based on frame interpolation, so as to solve at least one of the technical problems in the background art described above.
In a preferred embodiment of the present disclosure, an embodiment of the present application provides a word stock generation method based on frame interpolation, where the method includes:
acquiring video data and storing the video data frame by frame into pictures to obtain a video frame data set consisting of the pictures;
pre-training a convolutional neural network by using the video frame data set to obtain a pre-trained convolutional neural network;
fine-tuning the network parameters of the pre-trained convolutional neural network by using a multi-weight font data set to obtain a fine-tuned convolutional neural network;
and inputting the heavy-weight font and the light-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style.
Further, the pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network comprises the following steps:
splicing the pictures of the ith frame and the (i + 2) th frame of the stored video data in the channel dimension to be used as the input of a convolutional neural network;
taking the predicted video frame picture of the (i + 1) th frame as the output of the convolutional neural network;
updating network parameters of the convolutional neural network according to the loss function;
and stopping the pre-training when the pre-trained convolutional neural network is converged to obtain the pre-trained convolutional neural network.
Further, the fine-tuning of the network parameters of the pre-trained convolutional neural network using the multi-weight font data set to obtain the fine-tuned convolutional neural network includes the following steps:
acquiring a multi-weight font data set;
taking two fonts from the multi-weight font data set that are separated by one weight grade as the input of the pre-trained convolutional neural network, to generate the font of the intermediate weight;
updating the network parameters of the pre-trained convolutional neural network according to the loss function;
and stopping the fine-tuning when the fine-tuned convolutional neural network has converged, to obtain the fine-tuned convolutional neural network.
Further, the loss function uses an average absolute error loss function and a perceptual loss function simultaneously, which are calculated as follows:

L1_Loss = ||G(F_i, F_{i+2}) − F_{i+1}||_1

LPIPS_Loss = Σ_L (1/(h·w)) Σ_{h,w} ||weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1}))||_2^2

where L1_Loss denotes the average absolute error loss function, G denotes the constructed convolutional neural network, F_i, F_{i+1} and F_{i+2} denote the i-th, (i+1)-th and (i+2)-th video frames of the video data respectively, LPIPS_Loss denotes the perceptual loss function, weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the output of the convolutional neural network G and the target video frame of the (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG.
Further, the condition for convergence of the convolutional neural network is: the values of the average absolute error loss function and the perceptual loss function are calculated separately and summed, and the convolutional neural network is considered converged when the sum of the two values no longer decreases.
Further, before the convolutional neural network is pre-trained using the video frame data set, a convolutional neural network is constructed, the convolutional neural network comprising three modules:
a feature compression coding module, configured to encode an input video frame into a 4-dimensional tensor and then compress it to obtain a compressed 4-dimensional tensor; the feature compression coding module consists of a convolutional network layer, a feature normalization layer, a feature activation layer and a down-sampling network layer, wherein the convolutional network layer applies a linear transformation to the video frame to obtain convolutional features; the feature normalization layer normalizes the values of the convolutional features so that they stay within the range [-1, 1]; the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the down-sampling network layer down-samples the nonlinear features along the 2nd and 3rd dimensions to obtain nonlinear features of smaller size;
a feature transfer module, configured to apply multiple nonlinear transformations to the features extracted by the feature compression coding module so that the transformed features have stronger representational power; the feature transfer module is implemented by adding a self-attention mechanism to a residual layer, wherein the residual layer accelerates network convergence and stabilizes the network output, and the self-attention mechanism extracts local features, which facilitates updating the network parameters;
a feature decoding module, configured to decode the nonlinearly transformed features back into image space and transform them into an output image; the feature decoding module consists of a two-dimensional transposed-convolution network layer, a feature instance normalization layer, a feature activation layer and an up-sampling network layer, wherein the transposed-convolution network layer decodes the features, the feature instance normalization layer normalizes the feature values to keep training stable, the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features, and the up-sampling network layer enlarges the feature size and restores the spatial information.
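As an illustration of how the three modules fit together, the following is a minimal PyTorch-style sketch of the generator's forward data flow; the layer counts, channel sizes and module internals are placeholder assumptions, not the exact network of this disclosure.

```python
import torch
import torch.nn as nn

class FrameInterpolationGenerator(nn.Module):
    """Skeleton of the three-module generator: feature compression coding ->
    feature transfer -> feature decoding. All sizes are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        # feature compression coding module: encode + compress (downsample)
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(), nn.AvgPool2d(2))
        # feature transfer module: nonlinear transformation of the compressed features
        self.transfer = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        # feature decoding module: decode back to image space (upsample)
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, 3, padding=1))

    def forward(self, frame_i, frame_i_plus_2):
        # splice the two frames on the channel dimension (single-channel frames assumed)
        x = torch.cat([frame_i, frame_i_plus_2], dim=1)   # (B, 2, H, W): a 4-dimensional tensor
        return self.decoder(self.transfer(self.encoder(x)))
```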
In a preferred embodiment of the present disclosure, an embodiment of the present application further provides a system for generating a word stock based on frame interpolation, including:
the video frame data set generation module is used for acquiring video data and storing the video data into pictures frame by frame to obtain a video frame data set consisting of the pictures;
the pre-training module is used for pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network;
the fine-tuning module is used for fine-tuning the network parameters of the pre-trained convolutional neural network by using the multi-weight font data set to obtain the fine-tuned convolutional neural network;
and the multi-weight word stock generation module is used for inputting the heavy-weight font and the light-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style.
Further, the frame-interpolation-based word stock generation system further comprises a convolutional neural network construction module, wherein the convolutional neural network is composed of three modules:
the feature compression coding module is used for encoding an input video frame into a 4-dimensional tensor and then compressing it to obtain a compressed 4-dimensional tensor; the feature compression coding module consists of a convolutional network layer, a feature normalization layer, a feature activation layer and a down-sampling network layer, wherein the convolutional network layer applies a linear transformation to the video frame to obtain convolutional features; the feature normalization layer normalizes the values of the convolutional features so that they stay within the range [-1, 1]; the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features; and the down-sampling network layer down-samples the nonlinear features along the 2nd and 3rd dimensions to obtain nonlinear features of smaller size;
the feature transfer module is used for applying multiple nonlinear transformations to the features extracted by the feature compression coding module so that the transformed features have stronger representational power; the feature transfer module is implemented by adding a self-attention mechanism to a residual layer, wherein the residual layer accelerates network convergence and stabilizes the network output, and the self-attention mechanism extracts local features, which facilitates updating the network parameters;
the feature decoding module is used for decoding the nonlinearly transformed features back into image space and transforming them into an output image; the feature decoding module consists of a two-dimensional transposed-convolution network layer, a feature instance normalization layer, a feature activation layer and an up-sampling network layer, wherein the transposed-convolution network layer decodes the features, the feature instance normalization layer normalizes the feature values to keep training stable, the feature activation layer applies a nonlinear transformation to the normalized features to obtain nonlinear features, and the up-sampling network layer enlarges the feature size and restores the spatial information.
In a preferred embodiment of the present disclosure, an electronic device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the above-mentioned word stock generation method based on frame interpolation.
In a preferred embodiment of the present disclosure, a computer-readable storage medium is further provided, on which a computer program is stored, and the program, when executed by a processor, implements the steps of the above-mentioned word stock generation method based on frame interpolation.
The beneficial effects of this disclosure are as follows: the present disclosure uses video data as pre-training data, models the problem of multi-weight font generation as a continuous frame interpolation problem, and uses existing multi-weight font data sets to fine-tune the network parameters of the constructed convolutional neural network, thereby generating a variety of different weights between the boldest and lightest weights. The method greatly shortens the time needed to produce the remaining weights; it not only improves the generation quality, but also yields fonts that are more attractive and better proportioned than those produced by the point-to-point method, with more distinct style characteristics and better readability.
Drawings
FIG. 1 is a flow chart of a method for generating a word stock based on frame interpolation;
FIG. 2 is a block diagram of a convolutional neural network;
FIG. 3 is a block diagram of a word stock generation system based on frame interpolation;
FIG. 4 is a diagram of the generated multi-weight fonts for weights between 45 and 65.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
Example 1
Referring to FIG. 1, the word stock generation method based on frame interpolation provided in this exemplary embodiment of the present disclosure addresses the technical problems mentioned in the background art: video data is used as pre-training data, the generation of multi-weight fonts is modeled as a continuous frame interpolation problem, and an existing multi-weight font data set is used to fine-tune the network parameters of the constructed convolutional neural network, so as to adapt the network to font-domain data and improve the generation quality.
The implementation process of the exemplary frame interpolation-based word stock generation method comprises the following steps:
Video data is collected and stored frame by frame in picture format, with each picture named by its frame number; F_i, F_{i+1} and F_{i+2} denote the i-th, (i+1)-th and (i+2)-th video frames of the video data, respectively.
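For illustration, the frame-by-frame extraction could be done as in the following minimal sketch; OpenCV and the zero-padded file naming are assumptions, since the disclosure does not prescribe a specific tool.

```python
import os
import cv2  # OpenCV is assumed here; the disclosure does not name a specific tool

def video_to_frames(video_path: str, out_dir: str) -> int:
    """Store a video frame by frame as pictures named with the frame number."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()          # read the next frame; ok is False at the end of the video
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f"{index:06d}.png"), frame)
        index += 1
    cap.release()
    return index                        # number of frames written
```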
A convolutional neural network is constructed, consisting mainly of three modules: a feature compression coding module, a feature transfer module, and a feature decoding module. The module structure of the convolutional neural network is shown in FIG. 2, in which:
the feature compression coding module is used to encode an input video frame into a 4-dimensional tensor and then compress it to obtain a compressed 4-dimensional tensor. The feature compression coding module uses the network structure {Conv-BN-ReLU}×N-Downsample, where N denotes the number of times the {Conv-BN-ReLU} block is repeated before the Downsample layer; in this scheme N = [4, 8], and two feature compression coding sub-modules Enc form the feature compression coding module. Conv is a convolutional network layer that applies a linear transformation to the video frame to obtain convolutional features; BN is a feature normalization layer that normalizes the values of the convolutional features so that they stay within [-1, 1]; ReLU is a feature activation layer that applies a nonlinear transformation to the normalized features to obtain nonlinear features; Downsample is a down-sampling network layer that down-samples the nonlinear features along the 2nd and 3rd dimensions to obtain nonlinear features of smaller size, for example turning a nonlinear feature of shape (1, 1, 4, 4) into one of shape (1, 1, 2, 2) after the down-sampling layer;
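A minimal PyTorch-style sketch of one Enc sub-module follows; the channel counts and kernel sizes are assumptions, and only the {Conv-BN-ReLU}×N-Downsample pattern is taken from the description above.

```python
import torch.nn as nn

def enc_block(c_in: int, c_out: int, n: int) -> nn.Sequential:
    """One feature compression coding sub-module Enc: {Conv-BN-ReLU} x N + Downsample."""
    layers = []
    for i in range(n):
        layers += [
            nn.Conv2d(c_in if i == 0 else c_out, c_out, kernel_size=3, padding=1),  # Conv: linear transform
            nn.BatchNorm2d(c_out),                                                  # BN: feature normalization
            nn.ReLU(inplace=True),                                                  # ReLU: nonlinear activation
        ]
    layers.append(nn.AvgPool2d(kernel_size=2))  # Downsample: e.g. (1, C, 4, 4) -> (1, C, 2, 2)
    return nn.Sequential(*layers)

# Two Enc sub-modules with N = [4, 8]; input channels = 2 for two spliced single-channel frames
encoder = nn.Sequential(enc_block(2, 64, 4), enc_block(64, 128, 8))
```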
the feature transfer module is used to apply multiple nonlinear transformations to the features extracted by the feature compression coding module, so that the transformed features have stronger representational power. The feature transfer module uses residual connections: the main branch uses a self-attention structure and the residual branch is linked by a 1×1 convolution module; in this scheme, four feature transfer sub-modules res form the feature transfer module. The residual layer accelerates network convergence and stabilizes the network output; the self-attention mechanism extracts local features, which facilitates updating the network parameters;
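The sketch below shows one possible res sub-module with a self-attention main branch and a 1×1-convolution residual link; the head count and the token layout over spatial positions are assumptions.

```python
import torch
import torch.nn as nn

class ResSelfAttention(nn.Module):
    """One feature transfer sub-module res: self-attention main branch + 1x1 conv residual link."""
    def __init__(self, channels: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.shortcut = nn.Conv2d(channels, channels, kernel_size=1)  # residual 1x1 convolution link

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (b, h*w, c): one token per spatial position
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention over spatial positions
        attended = attended.transpose(1, 2).reshape(b, c, h, w)
        return attended + self.shortcut(x)               # residual sum stabilizes the output

# Four res sub-modules form the feature transfer module
feature_transfer = nn.Sequential(*[ResSelfAttention(128) for _ in range(4)])
```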
the feature decoding module is used to decode the nonlinearly transformed features back into image space and transform them into an output image. The feature decoding module is composed of {Conv2dTranspose-IN-ReLU}×M-Upsample, where M denotes the number of times the {Conv2dTranspose-IN-ReLU} block is repeated before the Upsample layer; in this scheme M = [8, 4], and two feature decoding sub-modules Dec form the feature decoding module. Conv2dTranspose is a two-dimensional transposed-convolution network layer used to decode the features; IN is a feature instance normalization layer that normalizes the feature values and keeps training stable; ReLU is a feature activation layer that applies a nonlinear transformation to the normalized features to obtain nonlinear features; Upsample is an up-sampling network layer that enlarges the feature size and restores spatial information, for example turning a feature of shape (1, 1, 2, 2) into one of shape (1, 1, 4, 4) after the up-sampling layer.
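By analogy with the Enc sketch, one Dec sub-module could look like the following; the channel counts, kernel sizes, up-sampling mode, and the final single-channel projection are assumptions.

```python
import torch.nn as nn

def dec_block(c_in: int, c_out: int, m: int) -> nn.Sequential:
    """One feature decoding sub-module Dec: {Conv2dTranspose-IN-ReLU} x M + Upsample."""
    layers = []
    for i in range(m):
        layers += [
            nn.ConvTranspose2d(c_in if i == 0 else c_out, c_out, kernel_size=3, padding=1),  # decode features
            nn.InstanceNorm2d(c_out),                                                        # IN: instance normalization
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False))  # e.g. (1, C, 2, 2) -> (1, C, 4, 4)
    return nn.Sequential(*layers)

# Two Dec sub-modules with M = [8, 4], followed by a single-channel image prediction
decoder = nn.Sequential(dec_block(128, 64, 8), dec_block(64, 32, 4), nn.Conv2d(32, 1, 3, padding=1))
```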
The i-th frame and the (i+2)-th frame are spliced together along the channel dimension and used as the input of the neural network.
The function of the convolutional neural network is to predict the content of the (i+1)-th frame; the loss is calculated from the network output f_int and the (i+1)-th frame and is used to update the network parameters of the convolutional neural network.
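A single pre-training step could look like the following sketch; the stand-in network, the Adam optimiser and the learning rate are assumptions, and perceptual_loss stands for the LPIPS-style loss described next.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal stand-in for the generator G; the real encoder/transfer/decoder are sketched above.
g = nn.Sequential(nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 3, padding=1))
optimizer = torch.optim.Adam(g.parameters(), lr=1e-4)  # optimiser and learning rate are assumptions

def pretrain_step(frame_i, frame_i_plus_1, frame_i_plus_2, perceptual_loss):
    """Splice frames i and i+2 on the channel dimension, predict frame i+1, update G."""
    x = torch.cat([frame_i, frame_i_plus_2], dim=1)       # (B, 2, H, W) for single-channel frames
    f_int = g(x)                                          # predicted (i+1)-th frame
    loss = F.l1_loss(f_int, frame_i_plus_1) + perceptual_loss(f_int, frame_i_plus_1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```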
The values of the average absolute error loss function L1_Loss and the perceptual loss function LPIPS_Loss are calculated, and pre-training stops when the sum of the two values no longer decreases. L1_Loss and LPIPS_Loss are calculated as follows:

L1_Loss = ||G(F_i, F_{i+2}) − F_{i+1}||_1

LPIPS_Loss = Σ_L (1/(h·w)) Σ_{h,w} ||weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1}))||_2^2

where L1_Loss denotes the average absolute error loss function, G denotes the constructed convolutional neural network, F_i, F_{i+1} and F_{i+2} denote the i-th, (i+1)-th and (i+2)-th video frames of the video data respectively, LPIPS_Loss denotes the perceptual loss function, weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the output of the convolutional neural network G and the target video frame of the (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG. VGG is a classic convolutional neural network developed by the Department of Engineering Science at the University of Oxford and is used here as an already-trained network.
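The two losses could be implemented as in the sketch below. The L1 term follows the formula directly; the perceptual term follows the standard LPIPS formulation using a pretrained VGG16 from torchvision, and the chosen layers, per-layer coefficients, and grayscale-to-RGB handling are assumptions.

```python
import torch
from torchvision.models import vgg16

_vgg = vgg16(weights="DEFAULT").features.eval()   # pretrained VGG used as a fixed feature extractor
for p in _vgg.parameters():
    p.requires_grad_(False)
_LAYERS = (3, 8, 15, 22)                          # assumed ReLU layers used as phi_L

def _vgg_features(x: torch.Tensor):
    if x.shape[1] == 1:                           # grayscale frames: replicate to 3 channels for VGG
        x = x.repeat(1, 3, 1, 1)
    feats, h = [], x
    for i, layer in enumerate(_vgg):
        h = layer(h)
        if i in _LAYERS:
            feats.append(h)
    return feats

def l1_loss(f_int: torch.Tensor, f_target: torch.Tensor) -> torch.Tensor:
    # L1_Loss = || G(F_i, F_{i+2}) - F_{i+1} ||_1
    return (f_int - f_target).abs().mean()

def lpips_loss(f_int, f_target, layer_weights=None) -> torch.Tensor:
    # LPIPS_Loss = sum_L 1/(h*w) sum_{h,w} || weight_L (.) (phi_L(f_int) - phi_L(f_target)) ||_2^2
    loss = 0.0
    for k, (a, b) in enumerate(zip(_vgg_features(f_int), _vgg_features(f_target))):
        w = 1.0 if layer_weights is None else layer_weights[k]
        loss = loss + ((w * (a - b)) ** 2).sum(dim=1).mean()   # sum over channels, average over h*w
    return loss
```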
After the convolutional neural network has been pre-trained on video data, its network parameters are fine-tuned with a multi-weight font data set so that the data distributions become consistent. The multi-weight font data set mainly consists of multi-weight typefaces designed by type designers, such as the QiHei (旗黑) family, with 15 weights in total.
Two fonts separated by one weight grade are used as input to generate the font of the intermediate weight. The two loss functions used are the same as in the pre-training stage.
The values of the average absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are calculated; when their sum no longer decreases, the pre-trained convolutional neural network has converged, and the fine-tuning stops, yielding the fine-tuned convolutional neural network.
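For example, fine-tuning batches could be built from weight triplets as in the sketch below; render_glyph is a hypothetical helper (not part of this disclosure) that rasterizes one character at a given weight into a single-channel tensor, and the pre-training step sketched above is reused unchanged.

```python
def weight_triplets(weights, chars, render_glyph):
    """Yield (lighter, middle, heavier) glyph images whose weights are one grade apart."""
    for k in range(len(weights) - 2):
        for ch in chars:
            yield (render_glyph(weights[k], ch),      # plays the role of F_i
                   render_glyph(weights[k + 1], ch),  # plays the role of F_{i+1} (the target)
                   render_glyph(weights[k + 2], ch))  # plays the role of F_{i+2}

# Fine-tuning reuses the same step as pre-training, e.g.:
# for light, mid, heavy in weight_triplets(weights, chars, render_glyph):
#     pretrain_step(light, mid, heavy, perceptual_loss=lpips_loss)
```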
The heavy-weight font and the light-weight font of the same style are input into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style. The generated multi-weight fonts are shown in FIG. 4, with the weights from left to right being 45, 47, 49, and so on up to 65.
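At inference time, the intermediate weights could be produced as in the following sketch; generating additional weights by repeatedly interpolating between already-generated neighbours is an assumption, as the disclosure only states that the two extreme weights of one style are fed to the fine-tuned network.

```python
import torch

@torch.no_grad()
def generate_weights(g, light_glyph, heavy_glyph, rounds: int = 3):
    """Generate in-between weights by repeatedly interpolating adjacent weights with G."""
    sequence = [light_glyph, heavy_glyph]        # ordered from lightest to heaviest
    for _ in range(rounds):
        refined = [sequence[0]]
        for a, b in zip(sequence[:-1], sequence[1:]):
            mid = g(torch.cat([a, b], dim=1))    # predicted intermediate weight between a and b
            refined += [mid, b]
        sequence = refined
    return sequence                              # 2**rounds + 1 weights in total
```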
Example 2
As shown in fig. 3, an exemplary word stock generation system based on frame interpolation includes:
a video frame data set generation module for collecting video data, storing the video data frame by frame as picture, and using frame number F i 、F i+1 And F i+2 Name wherein F i 、F i+1 And F i+2 Video frames representing the ith frame, the (i + 1) th frame, and the (i + 2) th frame of video data, respectively.
And a pre-training module, used for splicing the i-th frame and the (i+2)-th frame together along the channel dimension as the input of the neural network. The function of the convolutional neural network is to predict the content of the (i+1)-th frame, and the loss is calculated from the network output f_int and the (i+1)-th frame to update the network parameters of the convolutional neural network, specifically: the values of the average absolute error loss function L1_Loss and the perceptual loss function LPIPS_Loss are calculated; when the sum of the two values no longer decreases, the convolutional neural network has converged and the pre-training stops, yielding the pre-trained convolutional neural network. L1_Loss and LPIPS_Loss are calculated as follows:

L1_Loss = ||G(F_i, F_{i+2}) − F_{i+1}||_1

LPIPS_Loss = Σ_L (1/(h·w)) Σ_{h,w} ||weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1}))||_2^2

where L1_Loss denotes the average absolute error loss function, G denotes the constructed convolutional neural network, F_i, F_{i+1} and F_{i+2} denote the i-th, (i+1)-th and (i+2)-th video frames of the video data respectively, LPIPS_Loss denotes the perceptual loss function, weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the output of the convolutional neural network G and the target video frame of the (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG. VGG is a classic convolutional neural network developed by the Department of Engineering Science at the University of Oxford and is used here as an already-trained network.
The fine-tuning module is used for fine-tuning the network parameters of the pre-trained convolutional neural network with the multi-weight font data set so that the data distributions become consistent, specifically: two fonts from the multi-weight font data set separated by one weight grade are taken as input to generate the font of the intermediate weight; the values of the average absolute error loss L1_Loss and the perceptual loss LPIPS_Loss are calculated, and when their sum no longer decreases the pre-trained convolutional neural network has converged and the fine-tuning stops, yielding the fine-tuned convolutional neural network. The multi-weight font data set mainly consists of multi-weight typefaces designed by type designers, such as the QiHei (旗黑) family, with 15 weights in total. The two loss functions used by the fine-tuning module are the same as in the pre-training module described above.
And the multi-weight word stock generation module is used for inputting the heavy-weight font and the light-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style.
Further, the frame-interpolation-based word stock generation system further comprises a convolutional neural network, which mainly consists of three modules: a feature compression coding module, a feature transfer module, and a feature decoding module. The module structure of the convolutional neural network is shown in FIG. 2, in which:
the feature compression coding module is used to encode an input video frame into a 4-dimensional tensor and then compress it to obtain a compressed 4-dimensional tensor. The feature compression coding module uses the network structure {Conv-BN-ReLU}×N-Downsample, where N denotes the number of times the {Conv-BN-ReLU} block is repeated before the Downsample layer; in this scheme N = [4, 8], and two feature compression coding sub-modules Enc form the feature compression coding module. Conv is a convolutional network layer that applies a linear transformation to the video frame to obtain convolutional features; BN is a feature normalization layer that normalizes the values of the convolutional features so that they stay within [-1, 1]; ReLU is a feature activation layer that applies a nonlinear transformation to the normalized features to obtain nonlinear features; Downsample is a down-sampling network layer that down-samples the nonlinear features along the 2nd and 3rd dimensions to obtain nonlinear features of smaller size, for example turning a nonlinear feature of shape (1, 1, 4, 4) into one of shape (1, 1, 2, 2) after the down-sampling layer;
the feature transfer module is used to apply multiple nonlinear transformations to the features extracted by the feature compression coding module, so that the transformed features have stronger representational power. The feature transfer module uses residual connections: the main branch uses a self-attention structure and the residual branch is linked by a 1×1 convolution module; in this scheme, four feature transfer sub-modules res form the feature transfer module. The residual layer accelerates network convergence and stabilizes the network output; the self-attention mechanism extracts local features, which facilitates updating the network parameters;
the feature decoding module is used to decode the nonlinearly transformed features back into image space and transform them into an output image. The feature decoding module is composed of {Conv2dTranspose-IN-ReLU}×M-Upsample, where M denotes the number of times the {Conv2dTranspose-IN-ReLU} block is repeated before the Upsample layer; in this scheme M = [8, 4], and two feature decoding sub-modules Dec form the feature decoding module. Conv2dTranspose is a two-dimensional transposed-convolution network layer used to decode the features; IN is a feature instance normalization layer that normalizes the feature values and keeps training stable; ReLU is a feature activation layer that applies a nonlinear transformation to the normalized features to obtain nonlinear features; Upsample is an up-sampling network layer that enlarges the feature size and restores spatial information, for example turning a feature of shape (1, 1, 2, 2) into one of shape (1, 1, 4, 4) after the up-sampling layer.
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the frame interpolation based word stock generation method of embodiment 1 when executing the computer program.
Embodiment 3 of the present disclosure is merely an example, and should not bring any limitation to the function and the scope of use of the embodiments of the present disclosure.
The electronic device may be embodied in the form of a general purpose computing device, which may be, for example, a server device. Components of the electronic device may include, but are not limited to: at least one processor, at least one memory, and a bus connecting different system components (including the memory and the processor).
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include read-only memory (ROM).
The memory may also include program means having a set of (at least one) program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor executes various functional applications and data processing by executing computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. Also, the electronic device may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through a network adapter. The network adapter communicates with other modules of the electronic device over the bus. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, to name a few.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functionality of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the frame interpolation-based word stock generation method of embodiment 1.
More specific examples, among others, that the readable storage medium may employ may include, but are not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps of implementing the frame interpolation based word stock generation method described in embodiment 1, when the program product is run on the terminal device.
Where program code for carrying out the disclosure is written in any combination of one or more programming languages, the program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations may be made in these embodiments without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A method for generating a word stock based on frame interpolation is characterized by comprising the following steps:
acquiring video data and storing the video data frame by frame into pictures to obtain a video frame data set consisting of the pictures;
pre-training a convolutional neural network by using the video frame data set to obtain a pre-trained convolutional neural network;
fine-tuning network parameters of the pre-trained convolutional neural network by using a multi-weight font data set to obtain a fine-tuned convolutional neural network;
and inputting the heavy-weight font and the light-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style.
2. The method for generating a word stock based on frame interpolation according to claim 1, wherein the pre-training of the convolutional neural network using the video frame data set to obtain a pre-trained convolutional neural network comprises the following steps:
splicing the pictures of the ith frame and the (i + 2) th frame of the stored video data in the channel dimension to be used as the input of a convolutional neural network;
taking the predicted video frame picture of the (i + 1) th frame as the output of the convolutional neural network;
updating network parameters of the convolutional neural network according to the loss function;
and stopping the pre-training when the pre-trained convolutional neural network is converged to obtain the pre-trained convolutional neural network.
3. The method of generating a word stock based on frame interpolation of claim 1, wherein the fine-tuning of the network parameters of the pre-trained convolutional neural network using the multi-weight font data set to obtain the fine-tuned convolutional neural network comprises the steps of:
acquiring a multi-weight font data set;
taking two fonts from the multi-weight font data set that are separated by one weight grade as the input of the pre-trained convolutional neural network, to generate the font of the intermediate weight;
updating the network parameters of the pre-trained convolutional neural network according to the loss function;
and stopping fine tuning when the fine tuned convolutional neural network is converged to obtain the fine tuned convolutional neural network.
4. The frame interpolation based word stock generating method according to claim 2 or 3, wherein
the loss function uses an average absolute error loss function and a perceptual loss function simultaneously, which are calculated as follows:

L1_Loss = ||G(F_i, F_{i+2}) − F_{i+1}||_1

LPIPS_Loss = Σ_L (1/(h·w)) Σ_{h,w} ||weight_L ⊙ (φ_L(G(F_i, F_{i+2})) − φ_L(F_{i+1}))||_2^2

where L1_Loss denotes the average absolute error loss function, G denotes the constructed convolutional neural network, F_i, F_{i+1} and F_{i+2} denote the i-th, (i+1)-th and (i+2)-th video frames of the video data respectively, LPIPS_Loss denotes the perceptual loss function, weight_L denotes the coefficients applied to the layer-L network features, ⊙ denotes element-wise multiplication with the layer-L features, and φ_L(G(F_i, F_{i+2})) and φ_L(F_{i+1}) denote the feature maps, of size h × w, obtained by passing the output of the convolutional neural network G and the target video frame of the (i+1)-th frame through the L-th layer of the deep convolutional neural network VGG.
5. The method of generating a frame interpolation based word stock according to claim 4, wherein the condition for convergence of the convolutional neural network is:
respectively calculating the values of the average absolute error loss function and the perception loss function;
the two calculated values are summed and when the sum of the two values no longer falls, the convolutional neural network converges.
6. The method of generating a frame interpolation based word stock of claim 1, further comprising constructing a convolutional neural network prior to pre-training the convolutional neural network using the set of video frame data, the convolutional neural network consisting of three modules:
the feature compression coding module is used for encoding an input video frame into a 4-dimensional tensor and then compressing it to obtain a compressed 4-dimensional tensor;
the feature transfer module is implemented by adding a self-attention mechanism to a residual layer and is used for applying multiple nonlinear transformations to the features extracted by the feature compression coding module;
and the feature decoding module is used for decoding the features output by the feature transfer module into image space and converting them into an image to be output.
7. A system for generating a word stock based on frame interpolation, comprising:
the video frame data set generating module is used for acquiring video data and storing the video data into pictures frame by frame to obtain a video frame data set consisting of the pictures;
the pre-training module is used for pre-training the convolutional neural network by using the video frame data set to obtain the pre-trained convolutional neural network;
the fine-tuning module is used for fine-tuning the network parameters of the pre-trained convolutional neural network by using the multi-weight font data set to obtain the fine-tuned convolutional neural network;
and the multi-weight word stock generation module is used for inputting the heavy-weight font and the light-weight font of the same style into the fine-tuned convolutional neural network to obtain a multi-weight word stock of the same style.
8. The frame interpolation based word stock generation system of claim 7, further comprising a convolutional neural network construction module, the convolutional neural network consisting of three modules:
the feature compression coding module is used for encoding an input video frame into a 4-dimensional tensor and then compressing it to obtain a compressed 4-dimensional tensor;
the feature transfer module is implemented by adding a self-attention mechanism to a residual layer and is used for applying multiple nonlinear transformations to the features extracted by the feature compression coding module;
and the feature decoding module is used for decoding the features output by the feature transfer module into image space and converting them into an image to be output.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the frame interpolation based word stock generation method of any of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the frame interpolation based word stock generation method of any one of claims 1 to 6.
CN202211244030.8A 2022-10-11 2022-10-11 Word stock generation method based on frame interpolation, electronic equipment, storage medium and system Active CN115661304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211244030.8A CN115661304B (en) 2022-10-11 2022-10-11 Word stock generation method based on frame interpolation, electronic equipment, storage medium and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211244030.8A CN115661304B (en) 2022-10-11 2022-10-11 Word stock generation method based on frame interpolation, electronic equipment, storage medium and system

Publications (2)

Publication Number Publication Date
CN115661304A true CN115661304A (en) 2023-01-31
CN115661304B CN115661304B (en) 2024-05-03

Family

ID=84987165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211244030.8A Active CN115661304B (en) 2022-10-11 2022-10-11 Word stock generation method based on frame interpolation, electronic equipment, storage medium and system

Country Status (1)

Country Link
CN (1) CN115661304B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9659248B1 (en) * 2016-01-19 2017-05-23 International Business Machines Corporation Machine learning and training a computer-implemented neural network to retrieve semantically equivalent questions using hybrid in-memory representations
US20200104367A1 (en) * 2018-09-30 2020-04-02 International Business Machines Corporation Vector Representation Based on Context
WO2021237743A1 (en) * 2020-05-29 2021-12-02 京东方科技集团股份有限公司 Video frame interpolation method and apparatus, and computer-readable storage medium
KR20220032538A (en) * 2021-09-09 2022-03-15 베이징 바이두 넷컴 사이언스 테크놀로지 컴퍼니 리미티드 Training method for character generation model, character generation method, apparatus and device, and medium
CN114913533A (en) * 2022-05-11 2022-08-16 北京百度网讯科技有限公司 Method and device for changing character weight

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI CAO et al.: "Stacked residual recurrent neural network with word weight for text classification", Computer Science, 1 January 2017 (2017-01-01) *

Also Published As

Publication number Publication date
CN115661304B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Luo et al. End-to-end optimization of scene layout
CN113014927B (en) Image compression method and image compression device
CN110166757B (en) Method, system and storage medium for compressing data by computer
CN113888744A (en) Image semantic segmentation method based on Transformer visual upsampling module
CN110245710B (en) Training method of semantic segmentation model, semantic segmentation method and device
CN114073071B (en) Video frame inserting method and device and computer readable storage medium
CN112950471A (en) Video super-resolution processing method and device, super-resolution reconstruction model and medium
CN113793286B (en) Media image watermark removing method based on multi-order attention neural network
US20210201448A1 (en) Image filling method and apparatus, device, and storage medium
CN116645668B (en) Image generation method, device, equipment and storage medium
US20210350230A1 (en) Data dividing method and processor for convolution operation
US20220292795A1 (en) Face image processing method, electronic device, and storage medium
CN116071300A (en) Cell nucleus segmentation method based on context feature fusion and related equipment
JP2023501640A (en) POINT CLOUD PROCESSING METHOD, COMPUTER SYSTEM, PROGRAM AND COMPUTER-READABLE STORAGE MEDIUM
CN113409307A (en) Image denoising method, device and medium based on heterogeneous noise characteristics
CN112669431A (en) Image processing method, apparatus, device, storage medium, and program product
CN115661304B (en) Word stock generation method based on frame interpolation, electronic equipment, storage medium and system
US20230135109A1 (en) Method for processing signal, electronic device, and storage medium
CN111488886A (en) Panorama image significance prediction method and system with attention feature arrangement and terminal
US20230196093A1 (en) Neural network processing
CN113436292B (en) Image processing method, training method, device and equipment of image processing model
CN114998668A (en) Feature extraction method and device, storage medium and electronic equipment
CN115797171A (en) Method and device for generating composite image, electronic device and storage medium
CN114399708A (en) Video motion migration deep learning system and method
CN111915701B (en) Button image generation method and device based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant