CN117474817B - Method for content unification of composite continuous images - Google Patents

Method for content unification of composite continuous images

Info

Publication number
CN117474817B
CN117474817B (application CN202311800961.6A)
Authority
CN
China
Prior art keywords
frame
cluster
image
video
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311800961.6A
Other languages
Chinese (zh)
Other versions
CN117474817A (en)
Inventor
翟晓东
汝乐
夏哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Austin Photoelectric Technology Co ltd
Original Assignee
Jiangsu Austin Photoelectric Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Austin Photoelectric Technology Co ltd filed Critical Jiangsu Austin Photoelectric Technology Co ltd
Priority to CN202311800961.6A priority Critical patent/CN117474817B/en
Publication of CN117474817A publication Critical patent/CN117474817A/en
Application granted granted Critical
Publication of CN117474817B publication Critical patent/CN117474817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for content unification of synthetic continuous images, which comprises the following steps: Step 1, extract variance, mean, and color histogram features from each image frame in a synthetic video; Step 2, use the variance, mean, and color histogram features as input data of a K-means algorithm, group images with similar features into the same category, namely the same cluster, obtain the optimal feature cluster, take one image in the optimal feature cluster as a sample image, and adjust the contrast, brightness, and color histograms of the other images in the synthesized video according to the sample image. This ensures content consistency of the video and achieves the best effect.

Description

Method for content unification of composite continuous images
Technical Field
The invention belongs to the field of video synthesis, and in particular relates to a method for unifying, along the timeline, pictures synthesized frame by frame whose appearance varies because of discontinuous texture and motion changes.
Background Art
The field of image synthesis is now mature and can achieve virtual effects so convincing that people cannot tell real from fake. In many cases, however, these composite images are each generated from semantics, so correlation between frames is lacking. Even with template-video schemes, the synthesized result still exhibits "jerkiness": between consecutive frames, the perceived change is abrupt because texture and motion change discontinuously.
Existing schemes fall mainly into two categories: video-to-video translation, and post-processing of per-frame transformed video.
The first, video-to-video translation, adds a temporal consistency loss to the design and training of the network to improve temporal correlation. However, this approach has two drawbacks. First, it requires knowledge of the correlation to redesign the algorithm and retrain the deep model, and it requires video datasets for training; video datasets are quite scarce, especially for supervised algorithms. Second, these methods are slow because optical flow must be computed at test time.
The second scheme performs post-processing on the per-frame transformed (image-enhanced) video so that it becomes temporally consistent. This post-processing technique does not require retraining the image enhancement algorithm; any image enhancement algorithm can be applied to the original video and temporal consistency is then achieved.
However, the second scheme is based on the premise that frame 1 is completely reliable and then propagates from frame 1 to frames 2 and 3, and so on until the last frame. Its disadvantage is obvious: a highly satisfactory frame 1 must be selected manually or by machine, and if frame 1 is defective, all subsequent frames are affected.
Disclosure of Invention
Facing the existing problems of the per-frame video post-processing schemes, and based on existing artificial intelligence technology, the invention provides a method for content unification of synthetic video, which comprises the following steps:
Step 1, extract variance, mean, and color histogram features from each image frame in the synthetic video;
Step 2, use the variance, mean, and color histogram features as input data of a K-means algorithm, group images with similar features into the same category, namely the same cluster, obtain the optimal feature cluster, take one image in the optimal feature cluster as a sample image, and adjust the contrast, brightness, and color histograms of the other images in the synthesized video according to the sample image.
Further, in step 2, taking one image in the optimal feature cluster as the sample image specifically comprises: calculating the cluster center of the optimal feature cluster; for each sample point (i.e., image) in the optimal feature cluster, calculating its Euclidean distance to the cluster center; and taking the sample point closest to the cluster center as the sample image.
Further, the method for content unification of synthetic video of the invention further comprises step 3: performing temporal unification processing on the synthesized video with an image conversion network to obtain the processed video image frame O_t at time t. Specifically, let the first frame O_1 = P_1; the currently processed frame P_t in the synthesized video, the original video frame I_t, the original video frame I_{t-1}, and the output frame O_{t-1} of the previous time are input into the image conversion network, and after temporal unification processing the video frame O_t at time t is output; the original video refers to the video before synthesis.
Further, in step 2, the within-cluster sum of squares WCSS and the silhouette coefficient index are used to select the optimal feature cluster, specifically:
The within-cluster sum of squares WCSS is calculated as:
WCSS = Σ_{i=1}^{n} || x_i - C_k ||²
where i indexes the sample points (i.e., the histogram feature maps), x_i is the i-th sample, C_k is the cluster center, and n represents the number of sample points in the cluster;
The silhouette coefficient S(i) of a single sample is expressed as:
S(i) = (b(i) - a(i)) / max(a(i), b(i))
where a(i) represents the cohesion of the sample point, b(i) represents the minimum of the distances between the sample point and the other classes, and a(i) is calculated as follows:
a(i) = (1 / (n - 1)) Σ_{j≠i} distance(i, j)
where j represents the other sample points in the same cluster as sample i, and distance(i, j) represents the distance between sample point i and sample point j.
Further, in step 3, the image conversion network is an encoder-decoder architecture, and a ConvLSTM module is inserted into the encoder-decoder.
The image conversion network comprises an encoder, a ConvLSTM module and a decoder which are sequentially linked, and skip connection is added between the encoder and the decoder; the encoder comprises a first downsampling convolution layer, a second downsampling convolution layer, a splicing layer and a residual block, wherein a normalization layer is arranged behind each downsampling convolution layer;
The currently processed frame P_t and the output frame O_{t-1} of the previous time are input to the first downsampling convolution layer, and the original video frames I_t and I_{t-1} are input to the second downsampling convolution layer; after being downsampled separately, they are spliced in the splicing layer and then, after passing through the residual block and the ConvLSTM module, decoded by the decoder.
Further, the overall loss function for training the image conversion network is:
L = λ_f·L_f + λ_st·L_st + λ_lt·L_lt
where L_f is the overall feature consistency loss, L_st is the short-term loss, L_lt is the long-term loss, and λ_f, λ_st and λ_lt are the weights of the overall feature consistency loss, the short-term loss, and the long-term loss, respectively.
The feature consistency loss is calculated by using the relu1-2 layer of the pretrained VGG-19 to extract the shallow feature information of the image:
L_f1 = || μ(φ(O_t)) - μ(φ(I_t)) || + || σ(φ(O_t)) - σ(φ(I_t)) ||
where μ(·) represents the average over the channel dimension, σ(·) represents the standard deviation, O_t^(i) ∈ R³ denotes the RGB pixel value of the output at time t, and φ_l(·) represents the feature activation of the VGG-19 network at layer l;
at the same time, a feature consistency constraint is also imposed between O_t and O_{t-1}:
L_f2 = || μ(φ(O_t)) - μ(φ(O_{t-1})) || + || σ(φ(O_t)) - σ(φ(O_{t-1})) ||
Thus, the overall feature consistency loss function L_f is:
L_f = L_f1 + L_f2
The short-term loss L_st is expressed as:
L_st = Σ_{t=2}^{T} Σ_i M_{t⇒t-1}^(i) || O_t^(i) - Ô_{t-1}^(i) ||_1
where Ô_{t-1} is the image obtained after warping the frame O_{t-1} with the optical flow F_{t⇒t-1}, and M_{t⇒t-1} is a visibility mask calculated from the warp error between the input frame I_t and the warped input frame Î_{t-1}; the optical flow F_{t⇒t-1} is the backward flow between I_t and I_{t-1}.
A long-term temporal loss L_lt is applied between the first output frame and all output frames:
L_lt = Σ_{t=2}^{T} Σ_i M_{t⇒1}^(i) || O_t^(i) - Ô_1^(i) ||_1
where M_{t⇒1} is a visibility mask calculated from the warp error between the input frame I_t and the warped input frame Î_1.
Beneficial effects: the method for content unification of synthetic video provided by the invention finds the optimal features among all images of the video and migrates all images toward those optimal features, thereby ensuring content consistency of the video and achieving the best effect; in addition, the invention redesigns the loss function of the network.
Drawings
FIG. 1 is a flow chart of dominant frequency signal feature selection in an embodiment of the invention;
FIG. 2 is a diagram of the internal structure of ConvLSTM in an embodiment of the invention;
FIG. 3 is a diagram of an encoder-decoder architecture in an embodiment of the invention;
fig. 4 is a diagram of an encoder architecture in an embodiment of the present invention.
Detailed Description
The invention is based on the premise that the input consecutive single-frame images are composite images synthesized with a template video as the reference object, i.e., they form a continuous video in which every frame is correlated. For example, the template video Vmp is a video of two professional dancers (MP), one male and one female, dancing; the synthesized videos Vhp1 and Vhp2 are, respectively, a video of a male dancer imitating the motion of the male dancer in the template video Vmp and a video of a female dancer imitating the motion of the female dancer in Vmp; the synthesized video is a new video Vin containing all the people to be synthesized (HP). The video Vin is correlated in its action content, unlike an unrelated video in which, for example, the first frame is a still image of a cat and the second frame is a still image of a dog.
Under this condition, the invention works as follows:
in the composite video, due to the fact that the original pictures in the earlier stage are subjected toPicture set synthesized one by one in individual picture formUnder different training iteration cycles or input conditions, the style or color contrast distribution of each frame of picture of the video may be inconsistent. Therefore, the invention needs to carry out post-processing steps on the enhanced image, searches the optimal dominant frequency style signal, so as to restrict the overall video frame style and achieve the video with better overall consistency. The invention adopts image clustering based on principal component analysis (Principal Component Analysis, PCA) and K-means algorithm to find the optimal characteristics.
Step 1, feature extraction
Variance, mean, and color histogram features are extracted from each image frame in the composite video; the variance and the mean reflect the brightness and contrast characteristics of the image, and the color histogram describes the color distribution within the image.
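As an illustration, a minimal Python sketch of this feature-extraction step is given below; it assumes OpenCV and NumPy are available, and the function names and the histogram bin count are illustrative choices rather than values fixed by the invention.

```python
import cv2
import numpy as np

def extract_frame_features(frame_bgr, hist_bins=16):
    """Variance, mean, and a normalized per-channel color histogram for one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    variance = float(gray.var())   # reflects contrast
    mean = float(gray.mean())      # reflects brightness
    hists = []
    for ch in range(3):            # B, G, R histograms, normalized by pixel count
        h = cv2.calcHist([frame_bgr], [ch], None, [hist_bins], [0, 256]).ravel()
        hists.append(h / (h.sum() + 1e-8))
    return np.concatenate([[variance, mean], *hists])

def video_feature_matrix(video_path):
    """Stack one feature vector per frame of the composite video."""
    cap = cv2.VideoCapture(video_path)
    feats = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        feats.append(extract_frame_features(frame))
    cap.release()
    return np.stack(feats)
```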
Step 2, searching optimal characteristics through clustering
The K-means algorithm divides a set of data points into K different clusters; each cluster consists of the data points assigned to it, and the data points within a cluster are similar to each other.
The dimensionality-reduced images are clustered with the K-means algorithm: the variance, mean, and color histogram features are used as the input data, images with similar features are grouped into the same category (the same cluster), and the central mean of each cluster is found. For example, if among 10 images the color histograms of 7 are reddish and those of 3 are yellowish, two clusters may result. The optimal feature cluster is then selected using the within-cluster sum of squares and the silhouette coefficient index. The within-cluster sum of squares (Within-Cluster Sum of Squares, WCSS) is the sum of the squared Euclidean distances between the samples in each cluster and the cluster center; a smaller WCSS value indicates a higher degree of tightness of the samples within the cluster. It is calculated as:
WCSS = Σ_{i=1}^{n} || x_i - C_k ||²
where i indexes the sample points (i.e., the histogram feature maps), x_i is the i-th sample, C_k is the cluster center, and n denotes the number of sample points in the cluster; the value of WCSS is the sum of the squared distances of each data point from its cluster center.
The silhouette coefficient is calculated for each sample and represents both the similarity between the sample and the other samples in its own cluster and the dissimilarity between the sample and the nearest neighboring cluster. The silhouette coefficient of a single sample is calculated as:
S(i) = (b(i) - a(i)) / max(a(i), b(i))
where a(i) represents the cohesion of the sample point and is calculated as follows:
a(i) = (1 / (n - 1)) Σ_{j≠i} distance(i, j)
where j represents the other sample points in the same cluster as sample i, distance(i, j) represents the distance between i and j, and n represents the number of sample points in the cluster; a smaller a(i) therefore indicates a tighter class.
b(i) is calculated in the same way as a(i), except that the other clusters (m in total) must be traversed, which yields several values b_m(i); the smallest of these is selected as the final result:
b(i) = min_m b_m(i)
So the silhouette coefficient of a single sample S(i) can be written as:
S(i) = (b(i) - a(i)) / max(a(i), b(i))
From the above it can be seen that when b(i) > a(i), i.e., when the within-class distance is smaller than the between-class distance, the clustering result is compact; the value of S(i) then approaches 1, and the closer it is to 1, the more distinct the silhouette. Conversely, when b(i) < a(i), i.e., when the within-class distance is larger than the between-class distance, the clustering result is loose; the value of S(i) then approaches -1, and the closer it is to -1, the worse the clustering effect.
The optimal feature cluster is obtained from the silhouette coefficient S(i) of the individual samples. Within the optimal feature cluster, the Euclidean distance between each sample point (i.e., image) in the cluster and the cluster center is calculated, and the sample point closest to the center is taken as the optimal sample image sought by the invention. With this sample image as the standard reference, the contrast, brightness, and color histogram of each target image are adjusted correspondingly so that the target images have visual characteristics consistent with the sample image.
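The selection of the sample image and the adjustment of the remaining frames could then be sketched as follows; skimage.exposure.match_histograms (scikit-image 0.19 or later, with the channel_axis argument) is used here as one plausible way to transfer the sample image's color histogram, brightness, and contrast, and is an assumption rather than an operation prescribed by the patent.

```python
import numpy as np
from skimage.exposure import match_histograms

def pick_sample_frame(features, km, best_cluster):
    """Index of the frame closest (Euclidean) to the center of the optimal feature cluster."""
    idx = np.where(km.labels_ == best_cluster)[0]
    center = km.cluster_centers_[best_cluster]
    dists = np.linalg.norm(features[idx] - center, axis=1)
    return int(idx[np.argmin(dists)])

def align_to_sample(frames, sample_idx):
    """Match every frame's color histogram to the sample frame (RGB images as arrays)."""
    ref = frames[sample_idx]
    return [match_histograms(f, ref, channel_axis=-1) for f in frames]
```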
Step 3, image unification
The invention uses a deep recursive network as the basic propagation module; the idea of the recursive network is to infer the current output by combining all previous frame information with the current frame information. The original video I_t and the currently synthesized video P_t are taken as input, and a temporally consistent output video O_t is generated, where t ranges from 1 to T and T is the total number of frames in the original video. The first output frame is given by P_1, i.e., the first output frame is set to O_1 = P_1. At each time step, the image conversion network learns to generate an output frame O_t that is temporally consistent with O_{t-1}. The current output frame is then used as input for the next time step.
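A minimal sketch of this recursive inference loop is shown below; it assumes the frames are PyTorch tensors of shape (1, 3, H, W) with values in [0, 1] and that net implements the image conversion network F with the four inputs described above, an interface assumed here for illustration.

```python
import torch

@torch.no_grad()
def unify_video(net, processed_frames, original_frames):
    """Recursive inference: O_1 = P_1, then O_t = P_t + F(P_t, O_{t-1}, I_t, I_{t-1})."""
    outputs = [processed_frames[0]]                    # O_1 = P_1
    for t in range(1, len(processed_frames)):
        p_t, o_prev = processed_frames[t], outputs[-1]
        i_t, i_prev = original_frames[t], original_frames[t - 1]
        residual = net(p_t, o_prev, i_t, i_prev)       # image conversion network F
        outputs.append((p_t + residual).clamp(0, 1))   # residual prediction
    return outputs
```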
Step 3.1 image converting network structure
The image conversion network consists of a classical encoder-decoder architecture. To capture the spatio-temporal correlation of video, a ConvLSTM module is inserted in the encoder-decoder network, as shown in fig. 4.
By integrating the ConvLSTM module into the image conversion network, the network can capture timing information in the video sequence and learn the spatio-temporal correlation between video frames using the memory unit of the ConvLSTM module. The ConvLSTM module compresses the information into a hidden state that can be used to estimate the current state; this hidden state captures spatial information of the entire input sequence and allows the ConvLSTM module to learn temporal consistency coherently. Combined with the temporal losses, the ConvLSTM module gives satisfactory results in promoting the temporal consistency of the video-style transfer network. The ConvLSTM module turns the 2D input of an LSTM into a 3D tensor whose last two dimensions are spatial (rows and columns). For the data at each time t, the ConvLSTM module replaces part of the connection operations in the LSTM with convolution operations, i.e., predictions are made from the current input and the past states of local neighbors: a convolution operation that extracts spatial features is added to the LSTM network, replacing part of its connection operations. The internal structure of the ConvLSTM module is shown in FIG. 2, where H_{t-1} denotes the hidden-layer state of the neuron at time t-1 and C_{t-1} denotes the output of the neuron at time t-1. X_t denotes the value of the time-series data X at time t; in the present embodiment the quantity fed in at time t is kept under the parameter symbol X to facilitate comparison with the original ConvLSTM.
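For reference, a generic ConvLSTM cell can be sketched in PyTorch as below; this is the standard formulation in which the LSTM gates are computed by a convolution over the concatenated input and hidden state, not the patent's exact module.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: LSTM gates computed with convolutions instead of matmuls."""
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gates (i, f, o, g) at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size,
                               padding=kernel_size // 2)

    def forward(self, x, state=None):
        b, _, h, w = x.shape
        if state is None:                               # zero initial state
            h_prev = x.new_zeros(b, self.hid_ch, h, w)
            c_prev = x.new_zeros(b, self.hid_ch, h, w)
        else:
            h_prev, c_prev = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h_prev], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c_prev + i * g          # cell state C_t
        h_new = o * torch.tanh(c)       # hidden state H_t
        return h_new, (h_new, c)
```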
The present invention uses the basic encoder-decoder architecture shown in fig. 3. The input to the image conversion network comprises the currently processed frame P_t, the previous output frame O_{t-1}, and the unprocessed frames I_t and I_{t-1} at the current and previous time steps. Since the output frame is usually similar to the currently processed frame, the image conversion network is trained to predict the residual rather than the actual pixel values, i.e. O_t = P_t + F(P_t, O_{t-1}, I_t, I_{t-1}), where F denotes the image conversion network. P_t and O_{t-1} are used to learn content and feature information, while the temporal information between O_t and O_{t-1} is constrained by learning the spatio-temporal relationship between the unprocessed frames I_t and I_{t-1}.
The encoder consists of two downsampling strided convolution layers, each followed by instance normalization. The encoder is followed by 5 residual blocks and a ConvLSTM module. The decoder, placed after the ConvLSTM module, consists of two transposed convolution layers, each followed by instance normalization.
As shown in fig. 4, a skip connection is added between the encoder and the decoder to provide high reconstruction quality and reduce information loss. The skip connection lets low-level features pass directly to the decoder so that it can also access higher-level feature representations from the encoder, which helps the network retain detail during reconstruction. However, a skip connection may also transfer low-level information (e.g., color) from the input frames to the output frame and create visual artifacts. Therefore, the input to the encoder is split into two streams: one stream for the processed frames P_t and O_{t-1}, and another stream for the input frames I_t and I_{t-1}. Only the skip connection from the processed-frame stream is added, to avoid transmitting low-level information from the input frames.
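Putting these pieces together, one possible PyTorch sketch of the two-stream encoder, residual blocks, ConvLSTM, and decoder with a skip connection taken only from the processed-frame stream is given below; the channel widths, kernel sizes, and the concatenation-style skip connection are assumptions, and it reuses the ConvLSTMCell sketched above.

```python
import torch
import torch.nn as nn

def conv_in_relu(in_ch, out_ch, stride):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1),
                         nn.InstanceNorm2d(out_ch), nn.ReLU(inplace=True))

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(conv_in_relu(ch, ch, 1),
                                  nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class TransformNet(nn.Module):
    """Two-stream encoder -> residual blocks -> ConvLSTM -> decoder; the skip
    connection is taken only from the processed-frame stream. Call reset() per video."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc_p = nn.Sequential(conv_in_relu(6, ch, 2), conv_in_relu(ch, ch, 2))  # P_t, O_{t-1}
        self.enc_i = nn.Sequential(conv_in_relu(6, ch, 2), conv_in_relu(ch, ch, 2))  # I_t, I_{t-1}
        self.res = nn.Sequential(*[ResidualBlock(2 * ch) for _ in range(5)])
        self.lstm = ConvLSTMCell(2 * ch, 2 * ch)
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(3 * ch, ch, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(ch, 3, 4, stride=2, padding=1))
        self.state = None

    def reset(self):
        self.state = None

    def forward(self, p_t, o_prev, i_t, i_prev):
        f_p = self.enc_p(torch.cat([p_t, o_prev], dim=1))   # processed-frame stream
        f_i = self.enc_i(torch.cat([i_t, i_prev], dim=1))   # input-frame stream
        feat = self.res(torch.cat([f_p, f_i], dim=1))       # splice layer + residual blocks
        feat, self.state = self.lstm(feat, self.state)
        # Skip connection only from the processed stream; output is the residual image.
        return self.dec(torch.cat([feat, f_p], dim=1))
```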
Step 3.2 loss function
The object of the present invention is to reduce temporal inconsistency in the output video while maintaining content and feature similarity to the processed frames.
[1] Content-aware loss
The perceptual loss, which measures the similarity between O_t and P_t, is computed with a pretrained VGG classification network and is defined as:
L_p = (1/N) Σ_i || φ_l(O_t)^(i) - φ_l(P_t)^(i) ||_1
where O_t^(i) ∈ R³ denotes the RGB pixel value of the output at time t at pixel i, N is the total number of pixels in the frame, and φ_l(·) denotes the feature activation of the VGG-19 network at layer l. The relu4-3 layer is selected to calculate the perceptual loss.
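A sketch of this perceptual (content-aware) loss follows, using torchvision's pretrained VGG-19; the layer indices used for relu1_2 and relu4_3, the L1 distance, and the weights argument (recent torchvision) are stated assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class VGGFeatures(nn.Module):
    """Frozen VGG-19 slices: relu1_2 for the feature consistency loss, relu4_3 for content."""
    def __init__(self):
        super().__init__()
        feats = vgg19(weights="IMAGENET1K_V1").features.eval()
        self.relu1_2 = nn.Sequential(*feats[:4])    # conv1_1 .. relu1_2
        self.relu4_3 = nn.Sequential(*feats[:25])   # conv1_1 .. relu4_3
        for p in self.parameters():
            p.requires_grad_(False)

def content_loss(vgg, o_t, p_t):
    """Mean L1 distance between relu4_3 activations of O_t and P_t (inputs assumed
    to be ImageNet-normalized tensors of shape (B, 3, H, W))."""
    return F.l1_loss(vgg.relu4_3(o_t), vgg.relu4_3(p_t))
```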
[2] Feature consistency loss
The content-aware loss considers a pixel-level comparison between O_t and P_t; such a loss is too strict and may cause the overall features to deviate from the original video. The invention therefore proposes a new feature consistency loss aimed at ensuring that the generated video frames do not change the feature distribution of the original video.
The relu1-2 layer of the pretrained VGG-19 described above is also used to extract shallow feature information of the image. The feature consistency loss is:
L_f1 = || μ(φ(O_t)) - μ(φ(I_t)) || + || σ(φ(O_t)) - σ(φ(I_t)) ||
where μ(·) represents the average over the channel dimension and σ(·) represents the standard deviation.
At the same time, the generated image O_t may lose feature association with its preceding and following frames. Therefore, a feature consistency constraint is also imposed between O_t and O_{t-1}:
L_f2 = || μ(φ(O_t)) - μ(φ(O_{t-1})) || + || σ(φ(O_t)) - σ(φ(O_{t-1})) ||
Thus, the overall feature consistency loss function is:
L_f = L_f1 + L_f2
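The feature consistency loss can then be sketched as below, reusing the VGGFeatures module above; comparing channel-wise mean and standard deviation of the relu1_2 features against the original frame I_t and the previous output O_{t-1} follows the description, while the specific L1 distance between the statistics is an assumption.

```python
import torch

def stat_loss(f_a, f_b):
    """L1 distance between channel-wise mean and std of two feature maps (B, C, H, W)."""
    mu = (f_a.mean(dim=(2, 3)) - f_b.mean(dim=(2, 3))).abs().mean()
    sd = (f_a.std(dim=(2, 3)) - f_b.std(dim=(2, 3))).abs().mean()
    return mu + sd

def feature_consistency_loss(vgg, o_t, o_prev, i_t):
    """L_f = L_f1 + L_f2: keep O_t's shallow feature statistics close to the original
    frame I_t and to the previous output O_{t-1} (relu1_2 features)."""
    f_o, f_op, f_i = vgg.relu1_2(o_t), vgg.relu1_2(o_prev), vgg.relu1_2(i_t)
    return stat_loss(f_o, f_i) + stat_loss(f_o, f_op)
```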
[3] Temporal loss
Short-term temporal loss. The temporal loss is formulated as a warp error between output frames:
L_st = Σ_{t=2}^{T} Σ_i M_{t⇒t-1}^(i) || O_t^(i) - Ô_{t-1}^(i) ||_1
where Ô_{t-1} is the image obtained after warping the frame O_{t-1} with the optical flow F_{t⇒t-1}, and M_{t⇒t-1} = exp(-α || I_t - Î_{t-1} ||²) is a visibility mask calculated from the warp error between the input frame I_t and the warped input frame Î_{t-1}; the optical flow F_{t⇒t-1} is the backward flow between I_t and I_{t-1}. The optical flow is computed with FlowNet2, frames are warped with a bilinear sampling layer, and α is set empirically (pixel values lie in the range [0, 1]).
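A sketch of the short-term temporal loss with backward warping and the exponential visibility mask is given below; the flow layout (B, 2, H, W), the bilinear grid_sample warping, and the mean reduction are implementation assumptions, and α is left as an explicit parameter because the patent only states that it is set empirically.

```python
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B, C, H, W) with a dense flow field (B, 2, H, W): x-offset, y-offset."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                            torch.arange(w, device=img.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0      # normalize to [-1, 1] for grid_sample
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1),
                         mode="bilinear", align_corners=True)

def short_term_loss(o_t, o_prev, i_t, i_prev, flow, alpha):
    """L_st: visibility-masked L1 warp error between O_t and the warped previous output."""
    o_prev_w = warp(o_prev, flow)
    i_prev_w = warp(i_prev, flow)
    # Visibility mask computed from the warp error of the *input* frames.
    mask = torch.exp(-alpha * (i_t - i_prev_w).pow(2).sum(dim=1, keepdim=True))
    return (mask * (o_t - o_prev_w).abs()).mean()
```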
Long-term temporal loss. Although the short-term temporal loss enforces temporal consistency between successive frames, long-term coherence (e.g., beyond 5 frames) is not guaranteed. A straightforward way to enforce long-term temporal consistency is to apply a temporal loss over all pairs of output frames; however, such a strategy incurs a significant computational cost (e.g., for optical flow estimation), and it is meaningless to calculate the temporal loss between two intermediate outputs before the network converges. Therefore, a long-term temporal loss is imposed between the first output frame and all output frames:
L_lt = Σ_{t=2}^{T} Σ_i M_{t⇒1}^(i) || O_t^(i) - Ô_1^(i) ||_1
During training, long-term temporal coherence is enforced over at most 10 frames (T = 10).
[4] Total loss
The overall loss function for training the image conversion network is:
L = λ_f·L_f + λ_st·L_st + λ_lt·L_lt
where λ_f, λ_st and λ_lt are the weights of the feature consistency loss, the short-term loss, and the long-term loss, respectively.
By implementing the invention, the existing problems of per-frame video post-processing schemes can be addressed: the optimal features among all the images are identified automatically and all images are migrated toward those optimal features, thereby ensuring content consistency of the whole video and achieving the best effect.

Claims (9)

1. A method for content unification of composite video, comprising the steps of:
step 1, extracting variance, mean value and color histogram features from each image frame in a synthetic video;
step 2, taking the variance, the mean value and the color histogram feature as input data of a K-means algorithm, classifying images with similar features into the same category, namely the same cluster, obtaining an optimal feature cluster, taking one image in the optimal feature cluster as a sample image, and adjusting contrast, brightness and color histogram of other images in the synthesized video according to the sample image;
the method for acquiring the optimal feature cluster specifically comprises the following steps:
the calculation formula of the intra-cluster sum of squares WCSS is as follows:
WCSS = Σ_{i=1}^{n} || x_i - C_k ||²
wherein i indexes the sample points (i.e., the histogram feature maps), x_i is the i-th sample point, C_k is the cluster center, and n represents the number of sample points in the cluster;
the silhouette coefficient S(i) of a single sample is expressed as:
S(i) = (b(i) - a(i)) / max(a(i), b(i))
where a(i) represents the cohesion of the sample point, b(i) represents the minimum of the distances between the sample point and the other classes, and a(i) is calculated as follows:
a(i) = (1 / (n - 1)) Σ_{j≠i} distance(i, j)
where j represents the other sample points in the same cluster as sample i, and distance represents the distance between sample point i and sample point j.
2. The method for content unification of composite video according to claim 1, further comprising step 3 of performing temporal unification processing on the composite video using an image conversion network to obtain a processed video image frame O_t at time t;
specifically, let the first frame O_1 = P_1; the currently processed frame P_t in the synthesized video, the original video frame I_t, the original video frame I_{t-1}, and the output frame O_{t-1} of the previous time are input into the image conversion network, and after temporal unification processing the video frame O_t at time t is output; the original video refers to the video before synthesis.
3. The method for content unification of composite video according to claim 1, wherein one image in the optimal feature cluster is taken as a sample image, specifically:
and calculating the cluster center of the optimal feature cluster in the optimal feature cluster, and for each sample point in the optimal feature cluster, calculating the distance between each sample point in the cluster and the cluster center by using Euclidean distance to find the sample point closest to the cluster center as a sample image.
4. The method of claim 2, wherein in step 3, the image conversion network is an encoder-decoder architecture, and wherein the ConvLSTM module is inserted into the encoder-decoder.
5. The method for content unification of synthetic video according to claim 2, wherein the image conversion network comprises an encoder, a ConvLSTM module, and a decoder linked in sequence, and a skip connection is added between the encoder and the decoder;
the encoder comprises a first downsampling convolution layer, a second downsampling convolution layer, a splicing layer and a residual block, wherein a normalization layer is arranged behind each downsampling convolution layer;
the currently processed frame P_t and the output frame O_{t-1} of the previous time are input to the first downsampling convolution layer, and the original video frames I_t and I_{t-1} are input to the second downsampling convolution layer; after being downsampled separately, they are spliced in the splicing layer and then, after passing through the residual block and the ConvLSTM module, decoded by the decoder.
6. A method for content unification of composite video according to claim 2, wherein the overall loss function for training the image conversion network is:
L = λ_f·L_f + λ_st·L_st + λ_lt·L_lt
wherein L_f is the overall feature consistency loss, L_st is the short-term loss, L_lt is the long-term loss, and λ_f, λ_st and λ_lt are the weights of the overall feature consistency loss, the short-term loss, and the long-term loss, respectively.
7. The method for content unification of composite video according to claim 6, wherein the relu1-2 layer of the pretrained VGG-19 is used to extract shallow feature information of the image, and the feature consistency loss is:
L_f1 = || μ(φ(O_t)) - μ(φ(I_t)) || + || σ(φ(O_t)) - σ(φ(I_t)) ||
where μ(·) represents the average over the channel dimension, σ(·) represents the standard deviation, O_t^(i) ∈ R³ denotes the RGB pixel value of the output O at time t, and φ_l(·) represents the feature activation of the VGG-19 network at layer l;
at the same time, a feature consistency constraint is also imposed between O_t and O_{t-1}:
L_f2 = || μ(φ(O_t)) - μ(φ(O_{t-1})) || + || σ(φ(O_t)) - σ(φ(O_{t-1})) ||
thus, the overall feature consistency loss function L_f is:
L_f = L_f1 + L_f2.
8. The method for content unification of synthetic video according to claim 6, wherein the short-term loss L_st is expressed as:
L_st = Σ_{t=2}^{T} Σ_i M_{t⇒t-1}^(i) || O_t^(i) - Ô_{t-1}^(i) ||_1
wherein Ô_{t-1} is the image obtained after warping the frame O_{t-1} with the optical flow F_{t⇒t-1}, and M_{t⇒t-1} is a visibility mask calculated from the warp error between the input frame I_t and the warped input frame Î_{t-1}; the optical flow F_{t⇒t-1} is the backward flow between I_t and I_{t-1}.
9. The method for content unification of synthetic video according to claim 6, wherein the long-term temporal loss L_lt applied between the first output frame and all output frames is expressed as:
L_lt = Σ_{t=2}^{T} Σ_i M_{t⇒1}^(i) || O_t^(i) - Ô_1^(i) ||_1
where M_{t⇒1} is a visibility mask calculated from the warp error between the input frame I_t and the warped input frame Î_1.
CN202311800961.6A 2023-12-26 2023-12-26 Method for content unification of composite continuous images Active CN117474817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311800961.6A CN117474817B (en) 2023-12-26 2023-12-26 Method for content unification of composite continuous images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311800961.6A CN117474817B (en) 2023-12-26 2023-12-26 Method for content unification of composite continuous images

Publications (2)

Publication Number Publication Date
CN117474817A CN117474817A (en) 2024-01-30
CN117474817B true CN117474817B (en) 2024-03-15

Family

ID=89625941

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311800961.6A Active CN117474817B (en) 2023-12-26 2023-12-26 Method for content unification of composite continuous images

Country Status (1)

Country Link
CN (1) CN117474817B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117812275B (en) * 2024-02-28 2024-05-28 哈尔滨学院 Image optimization communication method for volleyball auxiliary training

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109672874A (en) * 2018-10-24 2019-04-23 福州大学 A kind of consistent three-dimensional video-frequency color calibration method of space-time
CN111768469A (en) * 2019-11-13 2020-10-13 中国传媒大学 Data visualization color matching extraction method based on image clustering
CN114048350A (en) * 2021-11-08 2022-02-15 湖南大学 Text-video retrieval method based on fine-grained cross-modal alignment model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109672874A (en) * 2018-10-24 2019-04-23 福州大学 A kind of consistent three-dimensional video-frequency color calibration method of space-time
CN111768469A (en) * 2019-11-13 2020-10-13 中国传媒大学 Data visualization color matching extraction method based on image clustering
CN114048350A (en) * 2021-11-08 2022-02-15 湖南大学 Text-video retrieval method based on fine-grained cross-modal alignment model

Also Published As

Publication number Publication date
CN117474817A (en) 2024-01-30

Similar Documents

Publication Publication Date Title
WO2019120110A1 (en) Image reconstruction method and device
WO2018000752A1 (en) Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN117474817B (en) Method for content unification of composite continuous images
CN113673307A (en) Light-weight video motion recognition method
CN109948721B (en) Video scene classification method based on video description
Guo et al. Image dehazing via enhancement, restoration, and fusion: A survey
CN114494297B (en) Adaptive video target segmentation method for processing multiple priori knowledge
CN114170286A (en) Monocular depth estimation method based on unsupervised depth learning
Guo et al. A survey on image enhancement for Low-light images
CN113158905A (en) Pedestrian re-identification method based on attention mechanism
Yu et al. Fla-net: multi-stage modular network for low-light image enhancement
CN111861939A (en) Single image defogging method based on unsupervised learning
CN115035011A (en) Low-illumination image enhancement method for self-adaptive RetinexNet under fusion strategy
CN112990340B (en) Self-learning migration method based on feature sharing
Singh et al. Action recognition in dark videos using spatio-temporal features and bidirectional encoder representations from transformers
CN112270691B (en) Monocular video structure and motion prediction method based on dynamic filter network
CN116091955A (en) Segmentation method, segmentation device, segmentation equipment and computer readable storage medium
CN115690917B (en) Pedestrian action identification method based on intelligent attention of appearance and motion
US20220224934A1 (en) Machine-learned in-loop predictor for video compression
CN115689871A (en) Unsupervised portrait image color migration method based on generation countermeasure network
Xu et al. Attention‐based multi‐channel feature fusion enhancement network to process low‐light images
CN116029916A (en) Low-illumination image enhancement method based on dual-branch network combined with dense wavelet
CN115393491A (en) Ink video generation method and device based on instance segmentation and reference frame
CN114140334A (en) Complex coal mine image defogging method based on improved generation countermeasure network
Stival et al. Survey on Video Colorization: Concepts, Methods and Applications

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant