CN116132714B - Video data transmission method for network television system - Google Patents


Info

Publication number
CN116132714B
CN116132714B (application CN202310402019.8A)
Authority
CN
China
Prior art keywords
target
region
image
initial
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310402019.8A
Other languages
Chinese (zh)
Other versions
CN116132714A (en)
Inventor
黄世华
马秀文
罗均文
朱敬毅
吴静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Lutong Network Technology Co ltd
Original Assignee
Shenzhen Lutong Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Lutong Network Technology Co ltd filed Critical Shenzhen Lutong Network Technology Co ltd
Priority to CN202310402019.8A
Publication of CN116132714A
Application granted
Publication of CN116132714B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343: Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of image communication, in particular to a video data transmission method for a network television system, which comprises the following steps: acquiring target video data for a network television system; dividing each frame of target image in the target video data into regions; performing region correlation analysis processing on each initial region; performing self-adaptive merging processing on each initial region in the initial region set; performing scale division on the target regions in the target region group set, and determining a target dictionary under each target scale; performing inter-frame correlation analysis processing on each frame of target image in the target video data; dividing the frames of target images in the target video data into a high-frequency image set and a low-frequency image set, and screening a high-frequency dictionary from the multi-scale dictionary; compressing the low-frequency image set and the high-frequency image set; and transmitting the target compressed data. The invention realizes the transmission of the target video data and improves the effect of compressing the video data.

Description

Video data transmission method for network television system
Technical Field
The invention relates to the technical field of image communication, in particular to a video data transmission method for a network television system.
Background
Video data refers to a sequence of video images, that is, data consisting of successive frames of images. A video often consists of multiple story units, each story unit often contains multiple scenes, and successive frames of images in turn form each scene, so even short video data often contains a large number of images. In order to speed up the transmission rate of the video data in a network television system, the video data therefore usually needs to be compressed. Currently, video data is generally compressed by compressing the images in the video data, typically with the JPEG (Joint Photographic Experts Group) compression coding algorithm, a compression standard for continuous-tone still images.
However, when the JPEG compression encoding algorithm is used to compress an image, the following technical problem often arises:
because the JPEG compression coding algorithm compresses the image through discrete cosine transform followed by quantization, image information is lost, so the effect of compressing video data is poor.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In order to solve the technical problem of poor compression effect on video data, the invention provides a video data transmission method for a network television system.
The invention provides a video data transmission method for a network television system, which comprises the following steps:
acquiring target video data for a network television system;
dividing the area of each frame of target image in the target video data to obtain an initial area set corresponding to the target image;
performing area correlation analysis processing on each initial area in an initial area set corresponding to each frame of target image to obtain area correlation corresponding to the initial area;
according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, carrying out self-adaptive merging processing on each initial region in the initial region set corresponding to the target image, and determining a target region group corresponding to the target image to obtain a target region group set corresponding to the target video data;
performing scale division on target areas in the target area group set to obtain a target scale set, determining target dictionaries under each target scale in the target scale set to obtain a target dictionary set, and combining the target dictionaries in the target dictionary set into a multi-scale dictionary;
performing inter-frame correlation analysis processing on each frame of target image in the target video data to obtain an inter-frame correlation group corresponding to the target image;
dividing each frame of target images in the target video data according to the inter-frame correlation group corresponding to the target images to obtain a high-frequency image set and a low-frequency image set, and screening a high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set;
compressing a target area in the low-frequency image set according to the multi-scale dictionary, and compressing the target area in the high-frequency image set according to the high-frequency dictionary to obtain target compressed data corresponding to the target video data;
and transmitting the target compressed data.
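Read as a data flow, the first analysis steps above can be sketched in code. Every function below is a deliberately simplified stand-in, not the patent's definition: grid blocks replace superpixel segmentation, and a standard-deviation similarity replaces the full region correlation analysis; all names and mappings are assumptions.

```python
import numpy as np

def split_blocks(img, k=4):
    """Stand-in for the region-division step: split a grayscale frame into
    a k x k grid of blocks (the patent uses superpixel segmentation)."""
    h, w = img.shape
    return [img[i * h // k:(i + 1) * h // k, j * w // k:(j + 1) * w // k]
            for i in range(k) for j in range(k)]

def block_score(block, others):
    """Stand-in for the region-correlation step: score a block by how close
    its gray-value spread is to the other blocks' spreads (the full claim
    also uses local entropy and a cross-correlation coefficient)."""
    s = block.std()
    return float(np.mean([np.exp(-abs(s - o.std())) for o in others]))

def analyse_frame(img, k=4):
    """Region division plus region scoring for one frame."""
    blocks = split_blocks(img, k)
    scores = [block_score(b, blocks[:i] + blocks[i + 1:])
              for i, b in enumerate(blocks)]
    return blocks, scores
```

A score near 1 marks a block that resembles the rest of the frame and is therefore, per the patent's reasoning, more compressible.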
Further, the performing area correlation analysis processing on each initial area in the initial area set corresponding to each frame of target image to obtain an area correlation corresponding to the initial area includes:
determining local entropy corresponding to each initial region in the initial region set;
determining the sum of the squares of the difference values of the local entropy corresponding to the initial region and the local entropy corresponding to each reference region in a reference region set as a first difference corresponding to the initial region, wherein the reference region in the reference region set is an initial region except the initial region in the initial region set;
for the initial region and each reference region in the reference region set, performing positive correlation mapping on the absolute value of the difference between the standard deviation of the gray values of all pixel points in the initial region and the standard deviation of the gray values of all pixel points in the reference region to obtain a second difference between the initial region and the reference region;
for the initial region and each reference region in the reference region set, performing positive correlation mapping on the absolute value of the difference between the first difference corresponding to the initial region and the first difference corresponding to the reference region to obtain a third difference between the initial region and the reference region;
determining a fourth difference between the initial region and each reference region in the reference region set according to a second difference and a third difference between the initial region and each reference region in the reference region set, wherein the fourth difference is positively correlated with the second difference and the fourth difference is positively correlated with the third difference;
determining a cross-correlation coefficient between the initial region and each reference region in the reference region set, and performing negative correlation mapping on the fourth difference between the initial region and the reference region to obtain a first proximity index between the initial region and the reference region;
determining the product of the first proximity index and the cross-correlation coefficient between the initial region and the reference region as a second proximity index between the initial region and the reference region;
normalizing the accumulated value of the second proximity indexes between the initial region and each reference region in the reference region set to obtain the region correlation degree corresponding to the initial region.
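A literal, unoptimized reading of the recipe above can be coded directly. The claim leaves the exact mappings unspecified, so the choices below are assumptions: identity for the positive mappings, 1/(1+x) for the negative mapping, Pearson correlation of gray histograms as the cross-correlation coefficient (regions may differ in size), and sum-to-one normalization.

```python
import numpy as np

def local_entropy(region, bins=32):
    """Shannon entropy of the region's gray-level histogram."""
    hist, _ = np.histogram(region, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hist_cross_corr(a, b, bins=32):
    """Assumed cross-correlation coefficient between two regions of
    possibly different sizes: Pearson correlation of their gray
    histograms, clipped to [0, 1]."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(b, bins=bins, range=(0, 256))
    c = np.corrcoef(ha, hb)[0, 1]
    return float(max(c, 0.0)) if np.isfinite(c) else 0.0

def region_correlations(regions):
    """Region correlation degree per initial region, following the claim."""
    n = len(regions)
    ent = [local_entropy(r) for r in regions]
    std = [float(np.std(r)) for r in regions]
    # first difference: sum of squared local-entropy gaps to every reference
    first = [sum((ent[i] - ent[j]) ** 2 for j in range(n) if j != i)
             for i in range(n)]
    scores = []
    for i in range(n):
        acc = 0.0
        for j in range(n):
            if j == i:
                continue
            second = abs(std[i] - std[j])       # positive mapping: identity
            third = abs(first[i] - first[j])    # positive mapping: identity
            fourth = second + third             # positively correlated with both
            prox1 = 1.0 / (1.0 + fourth)        # negative mapping
            prox2 = prox1 * hist_cross_corr(regions[i], regions[j])
            acc += prox2
        scores.append(acc)
    total = sum(scores) or 1.0
    return [s / total for s in scores]          # normalization: sum to one
```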
Further, the self-adaptive merging processing is performed on each initial region in the initial region set corresponding to the target image according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, and determining a target region group corresponding to the target image includes:
screening out a first preset number of initial regions with the largest region correlation degrees from the initial region set to serve as first initial regions;
screening out a second preset number of initial regions with the smallest region correlation degrees from the initial region set to serve as second initial regions;
and determining the screened first initial region and second initial region as seed regions, and performing region growth on the initial regions in the initial region set based on the seed regions and growth conditions to obtain a target region group, wherein the growth conditions are that the absolute value of the difference value of the region correlation degree corresponding to the adjacent initial regions is smaller than or equal to a preset correlation threshold value.
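A sketch of the seed-based merging, assuming the initial regions are given as nodes of an adjacency graph; `n_high`, `n_low`, and `tau` are hypothetical names standing in for the first preset number, second preset number, and preset correlation threshold.

```python
import numpy as np

def grow_region_groups(corr, adjacency, n_high=2, n_low=2, tau=0.1):
    """Adaptive merging by region growing.

    corr      : region correlation degree per initial region
    adjacency : dict mapping a region index to its neighbouring indices
    Seeds are the n_high most- and n_low least-correlated regions; a
    neighbour joins a group when the absolute difference of the region
    correlation degrees is at most tau (the growth condition).
    """
    order = np.argsort(corr)
    seeds = list(order[-n_high:]) + list(order[:n_low])
    label = {}
    for g, s in enumerate(seeds):
        if s in label:
            continue
        label[s] = g
        stack = [s]
        while stack:  # flood fill from the seed under the growth condition
            cur = stack.pop()
            for nb in adjacency[cur]:
                if nb not in label and abs(corr[cur] - corr[nb]) <= tau:
                    label[nb] = g
                    stack.append(nb)
    groups = {}
    for r, g in label.items():
        groups.setdefault(g, []).append(r)
    return [sorted(v) for v in groups.values()]
```

Regions never reached by any seed stay unmerged; whether they form singleton target regions is not specified by the claim and is left to the implementer.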
Further, the performing scale division on the target area in the target area group set to obtain a target scale set includes:
determining an absolute value of an area difference between each two target areas in the set of target area groups as an initial area difference between the two target areas;
taking the sum of the areas of every two target areas in the target area group set as a reference area between the two target areas;
for each two target areas in the target area group set, determining the ratio of the initial area difference between the two target areas to the reference area as the relative area difference between the two target areas;
If the relative area difference between two target areas in the target area group set is smaller than or equal to a preset area difference threshold value, dividing the two target areas into the same target scale to obtain a target scale set.
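The scale-division rule above is a pairwise test, so grouping regions transitively with union-find gives one way to realize it (the transitive chaining is an assumption; the claim only states the pairwise condition).

```python
def scale_groups(areas, diff_threshold=0.2):
    """Group target regions into scales: two regions share a scale when
    |A1 - A2| / (A1 + A2) <= diff_threshold (the preset area difference
    threshold). Returns lists of region indices, one list per scale."""
    n = len(areas)
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            rel = abs(areas[i] - areas[j]) / (areas[i] + areas[j])
            if rel <= diff_threshold:
                parent[find(i)] = find(j)
    scales = {}
    for i in range(n):
        scales.setdefault(find(i), []).append(i)
    return list(scales.values())
```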
Further, the performing an inter-frame correlation analysis on each frame of the target image in the target video data to obtain an inter-frame correlation group corresponding to the target image includes:
screening target images adjacent to the target image from the target video data to serve as reference images, and obtaining a reference image group corresponding to the target image;
and determining the inter-frame correlation between the target image and each reference image in the reference image group, and obtaining the inter-frame correlation group corresponding to the target image.
Further, the determining the inter-frame correlation between the target image and each reference image in the set of reference images includes:
determining a target characteristic value corresponding to each pixel point in the target image and the reference image;
determining the average value of target characteristic values corresponding to all pixel points in each first target area in a first target area group as a first characteristic index corresponding to the first target area, and determining the average value of target characteristic values corresponding to all pixel points in each second target area in a second target area group as a reference characteristic index corresponding to the second target area, wherein the first target area group is a target area group corresponding to the target image, and the second target area group is a target area group corresponding to the reference image;
Determining a difference value of a target characteristic value corresponding to each pixel point in each first target area and a first characteristic index corresponding to the first target area as a first characteristic difference corresponding to each pixel point in the first target area, and determining a difference value of a target characteristic value corresponding to each pixel point in each second target area and a reference characteristic index corresponding to the second target area as a second characteristic difference corresponding to each pixel point in the second target area;
combining the first characteristic differences corresponding to all the pixel points in each first target area into a first characteristic difference sequence corresponding to the first target area, and combining the second characteristic differences corresponding to all the pixel points in each second target area into a second characteristic difference sequence corresponding to the second target area;
for each first target region in a first target region group, determining a first correlation index between the first target region and the reference image according to a first characteristic difference sequence corresponding to the first target region and a second characteristic difference sequence corresponding to each second target region in a second target region group;
and determining the accumulated sum of the first correlation indexes between each first target region in the first target region group and the reference image as the inter-frame correlation between the target image and the reference image.
Further, the determining, according to the first feature difference sequence corresponding to the first target area and the second feature difference sequence corresponding to each second target area in the second target area group, a first correlation index between the first target area and the reference image includes:
for the first target region and each second target region in the second target region group, performing negative correlation mapping on the absolute value of the difference between the region correlation degree corresponding to the first target region and the region correlation degree corresponding to the second target region to obtain a first similarity index between the first target region and the second target region;
for the first target region and each second target region in the second target region group, determining the cosine similarity between the first characteristic difference sequence corresponding to the first target region and the second characteristic difference sequence corresponding to the second target region as a second similarity index between the first target region and the second target region;
for the first target region and each second target region in the second target region group, determining the product of the first similarity index and the second similarity index between the first target region and the second target region as a third similarity index between the first target region and the second target region;
and determining the average value of the third similarity indexes between the first target region and all second target regions in the second target region group as the first correlation index between the first target region and the reference image.
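Putting the three sub-steps together for one first target region gives a compact function; exp(-x) is assumed for the unspecified negative correlation mapping, and the feature-difference sequences are assumed to have equal lengths so the cosine similarity is well defined.

```python
import numpy as np

def first_correlation_index(corr1, diffs1, corr2_list, diffs2_list):
    """First correlation index between one first target region and a
    reference image.

    corr1, diffs1           : region correlation degree and characteristic
                              difference sequence of the first target region
    corr2_list, diffs2_list : the same quantities for every second target
                              region of the reference image
    """
    third = []
    for corr2, diffs2 in zip(corr2_list, diffs2_list):
        sim1 = np.exp(-abs(corr1 - corr2))  # assumed negative mapping
        num = float(np.dot(diffs1, diffs2))
        den = np.linalg.norm(diffs1) * np.linalg.norm(diffs2)
        sim2 = num / den if den else 0.0    # cosine similarity
        third.append(sim1 * sim2)
    return float(np.mean(third))            # average of third indexes
```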
Further, the dividing each frame of the target image in the target video data according to the inter-frame correlation group corresponding to the target image to obtain a high frequency image set and a low frequency image set, including:
screening out the largest inter-frame correlation from the inter-frame correlation group corresponding to the target image, and taking the largest inter-frame correlation as a target inter-frame correlation index corresponding to the target image;
determining the product of a preset proportion and the largest target inter-frame correlation index among the target inter-frame correlation indexes corresponding to all target images in the target video data as a reference inter-frame correlation index;
when a target inter-frame correlation index corresponding to a target image in the target video data is greater than or equal to the reference inter-frame correlation index, determining the target image as a high-frequency image to obtain a high-frequency image set;
and when the target inter-frame correlation index corresponding to the target image in the target video data is smaller than the reference inter-frame correlation index, determining the target image as a low-frequency image to obtain a low-frequency image set.
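Given each frame's inter-frame correlation group, the split above is a thresholding rule; `ratio` is a hypothetical name for the preset proportion.

```python
def split_high_low(interframe_groups, ratio=0.8):
    """Partition frames into high- and low-frequency sets.

    interframe_groups : one list of inter-frame correlations per frame.
    The target index of a frame is its largest inter-frame correlation;
    the reference index is ratio * (largest target index over all frames);
    frames at or above the reference are high-frequency.
    Returns (high_frame_indices, low_frame_indices)."""
    targets = [max(g) for g in interframe_groups]
    ref = ratio * max(targets)
    high = [i for i, v in enumerate(targets) if v >= ref]
    low = [i for i, v in enumerate(targets) if v < ref]
    return high, low
```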
Further, the screening the high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set includes:
determining each target area in the target area group corresponding to each frame of high-frequency image in the high-frequency image set as a high-frequency area to obtain a high-frequency area set;
determining target use frequencies corresponding to each dictionary atom in the multi-scale dictionary, wherein the target use frequencies corresponding to the dictionary atoms are the use frequencies of the dictionary atoms when the high-frequency region set is subjected to sparse representation;
screening out the maximum target use frequency from target use frequencies corresponding to dictionary atoms in the multi-scale dictionary, and taking the maximum target use frequency as a reference use frequency;
determining the product of a preset screening coefficient and the reference use frequency as a high-frequency atomic threshold value;
when the target use frequency corresponding to dictionary atoms in the multi-scale dictionary is larger than the high-frequency atom threshold, determining the dictionary atoms as high-frequency dictionary atoms, and obtaining a high-frequency dictionary atom set;
and combining the high-frequency dictionary atoms in the high-frequency dictionary atom set into a high-frequency dictionary.
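The screening step can be sketched once atom usage counts are available. The claim does not fix the sparse coder, so the sketch below approximates the target use frequency by counting only each region's single best-matching atom (one matching-pursuit step); a real sparse coder such as OMP would count every atom in each region's support.

```python
import numpy as np

def build_high_freq_dictionary(dictionary, high_freq_regions, sift=0.5):
    """Screen a high-frequency dictionary from the multi-scale dictionary.

    dictionary        : (d, k) array, one atom per column
    high_freq_regions : list of length-d signal vectors from the
                        high-frequency region set
    sift              : the preset screening coefficient
    Returns (high_freq_dictionary, kept_atom_indices)."""
    k = dictionary.shape[1]
    counts = np.zeros(k)
    atoms = dictionary / np.linalg.norm(dictionary, axis=0)
    for x in high_freq_regions:
        # approximate usage: the single best-correlated atom per region
        counts[np.argmax(np.abs(atoms.T @ x))] += 1
    threshold = sift * counts.max()   # high-frequency atom threshold
    keep = np.flatnonzero(counts > threshold)
    return dictionary[:, keep], keep
```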
Further, the performing region division on each frame of target image in the target video data to obtain an initial region set corresponding to the target image includes:
and performing superpixel segmentation on the target image, and determining superpixels obtained by the superpixel segmentation as initial areas to obtain an initial area set corresponding to the target image.
The invention has the following beneficial effects:
the video data transmission method for the network television system realizes the communication of target video data, solves the technical problem of poor effect of compressing the video data, and improves the effect of compressing the video data. Firstly, since the video data often contains a large number of images, in order to accelerate the transmission rate of the video data in the network television system, the video data often needs to be compressed, so that the target video data for the network television system is acquired, and the target video data can be conveniently compressed later. Then, the target image is divided into areas, so that the subsequent analysis processing of the initial area can be facilitated. Then, since the information corresponding to some areas in the target image tends to be similar, there is a certain correlation, and the stronger the correlation between the areas is, the higher the compressible degree of the areas tends to be. Therefore, the initial areas are subjected to area relevance analysis processing, and the relevance among the initial areas can be conveniently quantified later. Furthermore, based on the region correlation, each initial region in the initial region set is subjected to self-adaptive merging processing, so that the target region can be compressed subsequently, and compression of a plurality of initial regions can be realized. And continuing to scale the target areas in the target area group set, and accurately determining the target dictionary under each target scale. And then, comprehensively considering the inter-frame correlation group corresponding to the target image, and improving the accuracy of determining the high-frequency image set and the low-frequency image set. 
And then, based on the multi-scale dictionary and the high-frequency dictionary, respectively compressing the target areas in the low-frequency image set and the high-frequency image set, so that the efficiency of compressing the target video data can be improved. Finally, the target compressed data is transmitted, so that the transmission of the target video data can be realized, and the invention does not cause the loss of image information.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a video data transmission method for a network television system according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description is given below of the specific implementation, structure, features and effects of the technical solution according to the present invention with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The invention provides a video data transmission method for a network television system, which comprises the following steps:
acquiring target video data for a network television system;
dividing the area of each frame of target image in the target video data to obtain an initial area set corresponding to the target image;
carrying out area correlation analysis processing on each initial area in an initial area set corresponding to each frame of target image to obtain area correlation corresponding to the initial area;
according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, carrying out self-adaptive merging processing on each initial region in the initial region set corresponding to the target image, determining a target region group corresponding to the target image, and obtaining a target region group set corresponding to the target video data;
performing scale division on target areas in the target area group set to obtain a target scale set, determining target dictionaries under each target scale in the target scale set to obtain a target dictionary set, and combining the target dictionaries in the target dictionary set into a multi-scale dictionary;
carrying out inter-frame correlation analysis processing on each frame of target image in the target video data to obtain an inter-frame correlation group corresponding to the target image;
Dividing each frame of target images in target video data according to the inter-frame correlation group corresponding to the target images to obtain a high-frequency image set and a low-frequency image set, and screening a high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set;
compressing a target area in the low-frequency image set according to the multi-scale dictionary, and compressing the target area in the high-frequency image set according to the high-frequency dictionary to obtain target compression data corresponding to target video data;
and transmitting the target compressed data.
The following detailed development of each step is performed:
referring to fig. 1, a flow of some embodiments of a video data transmission method for a network television system according to the present invention is shown. The video data transmission method for the network television system comprises the following steps:
step S1, target video data for a network television system is acquired.
In some embodiments, target video data for a network television system may be acquired.
The network television system may be a software system for implementing a network television function. A network television can use a television, a personal computer, or a handheld device as a display terminal and deliver services such as digital television, time-shifted television, and interactive television by connecting a set-top box or a computer to a broadband network. The appearance of network television has brought a brand-new way of watching television, changed the traditional passive viewing mode, and enabled on-demand viewing. The target video data may be video for a network television system, composed of target images. A target image is a preprocessed image. Preprocessing may include, but is not limited to: image denoising, graying, and image enhancement.
It should be noted that network television has traditionally transmitted video data over HTTP (HyperText Transfer Protocol); however, with the development of transmission technologies, video-on-demand services on mobile and web clients now mostly use a CDN (Content Delivery Network) to realize online video playing. Because video data often contains a large number of images, it usually needs to be compressed in order to accelerate its transmission rate in the network television system; the target video data for the network television system is therefore acquired so that it can be conveniently compressed later. The video data may be transmitted in the following manner: first, collect the video data; second, compress the collected video data; then transmit the compressed file as the transmission content.
As an example, acquiring target video data for a network television system may include the steps of:
first, initial video data is acquired.
For example, video data may be obtained from the CDN as initial video data.
As another example, video data may be collected by a camera and the collected video data may be used as initial video data.
And secondly, carrying out image denoising on the image in the initial video data, and taking the denoised image as a denoised image to obtain a denoised video.
The denoising video may be a video composed of denoising images. The image denoising may be bilateral filtering denoising.
It should be noted that, image denoising is performed on an image in the initial video data, so that the image quality in the initial video data can be improved.
Thirdly, graying each denoising image in the denoising video, and taking each image obtained by graying as a target image, so as to obtain the target video data.
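The two preprocessing steps above can be sketched in a few lines. This is a minimal, dependency-free illustration: the graying uses the standard Rec. 601 luma weights, and a 3x3 mean filter stands in for the bilateral filtering named earlier (in practice a call such as OpenCV's bilateralFilter would be used); the function names are illustrative only.

```python
import numpy as np

def denoise_box3(img):
    """3x3 mean filter as a dependency-free stand-in for the
    bilateral filtering named in the text (cv2.bilateralFilter
    would be used in practice)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
    return out / 9.0

def to_gray(rgb):
    """Graying with Rec. 601 luma weights (they sum to 1.0)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def preprocess_frame(rgb):
    """Denoise then gray one frame, yielding a target image."""
    return denoise_box3(to_gray(rgb))
```

Note that the patent denoises first and grays second; the sketch applies the mean filter to the gray plane only to keep it short, which for a linear filter gives the same result.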
And S2, carrying out region division on each frame of target image in the target video data to obtain an initial region set corresponding to the target image.
In some embodiments, the area division may be performed on each frame of the target image in the target video data, so as to obtain an initial area set corresponding to the target image.
As an example, the target image may be subjected to superpixel segmentation, and the superpixels obtained by the superpixel segmentation may be determined as initial regions, so as to obtain the initial region set corresponding to the target image. For example, the target image may be divided into a preset number of superpixels using the SLIC (Simple Linear Iterative Clustering) algorithm, and the divided superpixels may be used as the initial regions, so that the preset number of initial regions may be obtained. For example, the preset number may be 100.
It should be noted that a superpixel is often a small area formed by a series of adjacent pixel points with similar characteristics such as color, brightness, and texture, and the more similar the pixel points are, the more compressible they often are. Therefore, performing superpixel segmentation on the target image and determining the superpixels obtained by the superpixel segmentation as the initial regions can facilitate subsequent compression.
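A minimal sketch of the superpixel step, assuming grid-seeded k-means on (row, column, intensity) features as a stand-in for full SLIC; a production pipeline would call skimage.segmentation.slic instead, and all names and parameter choices here are illustrative.

```python
import numpy as np

def slic_like(gray, n_segments=16, compactness=0.5, iters=5):
    """Minimal SLIC-style superpixels for a grayscale image:
    k-means on (y, x, intensity) features with grid-seeded centers."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.stack([compactness * ys.ravel(),
                      compactness * xs.ravel(),
                      gray.ravel().astype(np.float64)], axis=1)
    side = int(np.ceil(np.sqrt(n_segments)))
    cy = np.linspace(0, h - 1, side)
    cx = np.linspace(0, w - 1, side)
    centers = np.array([[compactness * y, compactness * x,
                         float(gray[int(y), int(x)])]
                        for y in cy for x in cx])[:n_segments]
    for _ in range(iters):
        # assign each pixel to its nearest center in feature space
        d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned pixels
        for k in range(len(centers)):
            pts = feats[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)
    return labels.reshape(h, w)
```

Each label value then plays the role of one initial region in the initial region set.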
And S3, carrying out area correlation analysis processing on each initial area in the initial area set corresponding to each frame of target image to obtain the area correlation degree corresponding to the initial area.
In some embodiments, the area correlation analysis may be performed on each initial area in the initial area set corresponding to each frame of the target image, so as to obtain the area correlation corresponding to the initial area.
It should be noted that, because the information corresponding to some areas in the target image is often similar, there is a certain correlation between them, and the stronger the correlation between the areas, the higher their compressible degree often is. Therefore, performing region correlation analysis processing on the initial regions makes it convenient to quantify the correlation among the initial regions later.
As an example, this step may include the steps of:
first, determining local entropy corresponding to each initial region in the initial region set.
The local entropy corresponding to the initial region may represent the gray level confusion degree of the initial region.
For example, the formula for determining the local entropy corresponding to each initial region in the initial region set may be:

$$E_{i,a} = -\sum_{j=1}^{n_{i,a}} p_{i,a,j}\,\ln p_{i,a,j}$$

wherein, $E_{i,a}$ is the local entropy corresponding to the a-th initial region in the initial region set corresponding to the i-th frame target image in the target video data; $n_{i,a}$ is the number of pixel point classes in the a-th initial region in the initial region set corresponding to the i-th frame target image (pixel points with the same gray value belong to the same class); $p_{i,a,j}$ is the frequency of occurrence of the j-th class of pixel points in the a-th initial region; $\ln$ is the logarithm based on the natural constant $e$; i is the frame number of the target image in the target video data; a is the sequence number of the initial region in the initial region set corresponding to the i-th frame target image; j is the class number of the pixel point in the a-th initial region.
The larger the local entropy corresponding to the initial region, the more chaotic the pixel points in the initial region often are.
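The local entropy above can be computed directly: pixels sharing a gray value form one class, and the class frequencies enter the standard entropy sum with the natural logarithm, matching the formula.

```python
import numpy as np

def local_entropy(region_gray):
    """Local entropy of one initial region: pixel points with the
    same gray value form one class, p_j is the class frequency,
    and the natural logarithm is used."""
    values = np.asarray(region_gray).ravel()
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```

A perfectly uniform region has entropy 0; a region split evenly between two gray values has entropy ln 2.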
And secondly, determining the sum of the squares of the difference values of the local entropy corresponding to the initial region and the local entropy corresponding to each reference region in the reference region set as a first difference corresponding to the initial region.
The reference region set corresponding to an initial region may consist of the initial regions in the initial region set other than that initial region. For example, the reference region set corresponding to the first initial region in the initial region set may include: the initial regions in the initial region set other than the first initial region.
And thirdly, for each reference region in the initial region and the reference region set, performing positive correlation mapping on absolute values of differences between standard deviations of gray values corresponding to all pixel points in the initial region and standard deviations of gray values corresponding to all pixel points in the reference region to obtain a second difference between the initial region and the reference region.
Fourth, for each reference region in the initial region and the reference region set, performing positive correlation mapping on the absolute value of the difference value between the first difference corresponding to the initial region and the first difference corresponding to the reference region, and obtaining a third difference between the initial region and the reference region.
And fifthly, determining a fourth difference between the initial region and the reference region according to the second difference and the third difference between the initial region and each reference region in the reference region set.
Wherein the fourth difference and the second difference may be positively correlated. The fourth difference may be positively correlated with the third difference.
And a sixth step of determining a cross correlation coefficient between the initial region and each reference region in the reference region set, and performing negative correlation mapping on the fourth difference between the initial region and the reference region to obtain a first proximity index between the initial region and the reference region.
Wherein the cross-correlation coefficient between the initial region and the reference region may characterize the gray scale similarity between the initial region and the reference region.
For example, determining the cross-correlation coefficient between the initial region and the reference region may comprise the sub-steps of:
the first sub-step may be to sort the pixels in the initial area according to a left-to-right order and a top-to-bottom order, so as to obtain a first pixel sequence.
And a second sub-step, namely sorting the pixels in the reference area according to the sequence from left to right and from top to bottom to obtain a second pixel sequence.
And a third sub-step of determining cosine similarity between the first pixel point sequence and the second pixel point sequence as a cross-correlation coefficient between the initial region and the reference region.
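The three sub-steps reduce to a cosine similarity over raster-ordered pixel values. One point the text leaves implicit is that the two sequences must have equal length for the cosine to be defined; this sketch simply truncates the longer one, which is an assumption.

```python
import numpy as np

def cross_correlation(region_a, region_b):
    """Cosine similarity between the raster-ordered (left-to-right,
    top-to-bottom) pixel sequences of two regions. The text leaves
    region sizes implicit; here the longer sequence is truncated to
    the shorter one, an assumption of this sketch."""
    a = np.asarray(region_a, dtype=np.float64).ravel()
    b = np.asarray(region_b, dtype=np.float64).ravel()
    m = min(len(a), len(b))
    a, b = a[:m], b[:m]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```

Identical regions give a coefficient of 1, matching the intent that the coefficient characterizes gray scale similarity.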
Seventh, determining the product of the first proximity index and the cross correlation coefficient between the initial region and the reference region as a second proximity index between the initial region and the reference region.
And eighth, normalizing the accumulated value of the second proximity indexes between the initial region and each reference region in the reference region set to obtain the region correlation degree corresponding to the initial region.
For example, the formula for determining the region correlation corresponding to the initial region may be written (taking each positive correlation mapping as an exponential of the natural constant $e$, the negative correlation mapping as a reciprocal, and $\mathrm{norm}(\cdot)$ as normalization) as:

$$R_{i,a} = \mathrm{norm}\!\left( \sum_{b=1}^{m_{i,a}} r_{i,a,b} \cdot \frac{1}{e^{|\sigma_{i,a} - \sigma_{i,b}|} \cdot e^{|D_{i,a} - D_{i,b}|} + \epsilon} \right)$$

wherein, $R_{i,a}$ is the region correlation corresponding to the a-th initial region in the initial region set corresponding to the i-th frame target image in the target video data. $D_{i,a}$ is the first difference corresponding to the a-th initial region, and $D_{i,b}$ is the first difference corresponding to the b-th reference region in the reference region set corresponding to the a-th initial region; the first difference of a region is the sum of the squares of the differences between its local entropy and the local entropies of its reference regions. $m_{i,a}$ is the number of reference regions in the reference region set corresponding to the a-th initial region. $\sigma_{i,a}$ and $\sigma_{i,b}$ are the standard deviations of the gray values corresponding to all pixel points in the a-th initial region and in the b-th reference region, respectively. $r_{i,a,b}$ is the cross correlation coefficient between the a-th initial region and the b-th reference region. $\epsilon$ is a preset factor greater than 0, mainly used for preventing the denominator from being 0; for example, $\epsilon$ may take 0.01. $e^{|\sigma_{i,a}-\sigma_{i,b}|}$ is the second difference between the a-th initial region and the b-th reference region, realizing the positive correlation mapping of $|\sigma_{i,a}-\sigma_{i,b}|$. $e^{|D_{i,a}-D_{i,b}|}$ is the third difference, realizing the positive correlation mapping of $|D_{i,a}-D_{i,b}|$. The product of the second difference and the third difference is the fourth difference. The reciprocal $1/(e^{|\sigma_{i,a}-\sigma_{i,b}|} \cdot e^{|D_{i,a}-D_{i,b}|} + \epsilon)$ is the first proximity index, realizing the negative correlation mapping of the fourth difference, and multiplying it by $r_{i,a,b}$ gives the second proximity index. $\mathrm{norm}(\cdot)$ realizes the normalization of the accumulated second proximity indexes. i is the frame number of the target image in the target video data. a is the sequence number of the initial region in the initial region set corresponding to the i-th frame target image. b is the sequence number of the reference region in the reference region set corresponding to the a-th initial region.
It should be noted that video data is composed of a certain number of consecutive frame images in time order, and thus the compressed object is essentially an image. The image compression effect is often related to the correlation between adjacent frame images and the correlation between local areas within each frame image: the stronger the correlation, the more redundant information the video data contains and the greater the compressible degree; the weaker the correlation between adjacent frame images or between local areas within each frame image, the less redundant information the video data contains and the smaller the compressible degree. The redundant information of images mainly comprises temporal redundancy and spatial redundancy: the size of the temporal redundancy depends on the correlation strength between adjacent frame images, and the size of the spatial redundancy depends on the correlation strength between local areas within a single frame image. Therefore, determining the region correlation corresponding to the initial region can indirectly quantify the compressible degree. The larger the first difference corresponding to the a-th initial region, the larger the difference tends to be between the a-th initial region and the rest of the initial regions in the initial region set. The smaller the absolute difference of the standard deviations of the gray values of the a-th initial region and the b-th reference region, the more similar the two regions tend to be; likewise, the smaller the absolute difference of their first differences, the more similar the two regions tend to be. The larger the fourth difference, the larger the difference between the a-th initial region and the b-th reference region tends to be. The larger the cross correlation coefficient, the more similar the a-th initial region and the b-th reference region tend to be. Therefore, the larger the region correlation corresponding to the a-th initial region, the greater the correlation between the a-th initial region and the rest of the initial regions in the initial region set, and the more suitable the a-th initial region is for dictionary atoms of shorter length, because the a-th initial region can be represented by the dictionary atoms of the rest of the regions. The smaller the region correlation corresponding to the a-th initial region, the smaller the correlation between the a-th initial region and the rest of the initial regions in the initial region set, and the more suitable the a-th initial region is for dictionary atoms of longer length, because the a-th initial region cannot be represented by the dictionary atoms of the rest of the regions.
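Putting the eight steps together, the region correlations can be sketched for a list of regions, each given as its 1-D array of gray values. The exact forms of the mappings are not fixed by the text, so this sketch takes each positive correlation mapping as exp(.), the negative correlation mapping as a reciprocal with the preset factor eps, and normalization as division by the largest accumulated value; the cross correlation coefficient is the cosine similarity from step six. All of these readings are assumptions.

```python
import numpy as np

def region_correlations(regions, eps=0.01):
    """Sketch of the eight steps: local entropy, first difference,
    second/third/fourth differences, proximity indexes, and a final
    normalization; one consistent reading of the text, not the only one."""
    def entropy(v):
        _, c = np.unique(v, return_counts=True)
        p = c / c.sum()
        return -(p * np.log(p)).sum()

    H = np.array([entropy(r) for r in regions])
    S = np.array([np.std(r) for r in regions])
    n = len(regions)
    # first difference: sum of squared entropy gaps to every other region
    D1 = np.array([sum((H[a] - H[b]) ** 2 for b in range(n) if b != a)
                   for a in range(n)])
    raw = np.zeros(n)
    for a in range(n):
        for b in range(n):
            if b == a:
                continue
            d2 = np.exp(abs(S[a] - S[b]))      # second difference
            d3 = np.exp(abs(D1[a] - D1[b]))    # third difference
            d4 = d2 * d3                       # fourth difference
            ra = np.asarray(regions[a], dtype=float)
            rb = np.asarray(regions[b], dtype=float)
            m = min(len(ra), len(rb))
            cc = float(ra[:m] @ rb[:m] /
                       (np.linalg.norm(ra[:m]) * np.linalg.norm(rb[:m]) + 1e-12))
            raw[a] += cc / (d4 + eps)          # accumulate second proximity index
    return raw / raw.max() if raw.max() > 0 else raw
```

The returned values lie in [0, 1], with larger values marking regions that are better represented by the dictionary atoms of the other regions.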
And S4, carrying out self-adaptive merging processing on each initial region in the initial region set corresponding to the target image according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, and determining a target region group corresponding to the target image to obtain a target region group set corresponding to the target video data.
In some embodiments, the adaptive merging process may be performed on each initial region in the initial region set corresponding to the target image according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, so as to determine a target region group corresponding to the target image, and obtain a target region group set corresponding to the target video data.
It should be noted that performing adaptive merging processing on each initial region in the initial region set based on the region correlation allows subsequent compression to operate on target regions, so that a plurality of initial regions can be compressed together.
As an example, this step may include the steps of:
The method comprises a first step of screening out, from the initial region set, a first preset number of initial regions with the largest region correlation to serve as first initial regions.

For example, the first preset number may be 10.
And a second step of screening out, from the initial region set, a second preset number of initial regions with the smallest region correlation to serve as second initial regions.

Wherein the first preset number and the second preset number may be equal. For example, the first preset number and the second preset number may each be 10.
Thirdly, determining the screened first initial region and second initial region as seed regions, and performing region growth on the initial regions in the initial region set based on the seed regions and growth conditions to obtain a target region group.
The growth condition may be that the absolute value of the difference between the region correlations corresponding to adjacent initial regions is less than or equal to a preset correlation threshold. The correlation threshold may be the preset maximum absolute value of the region correlation difference at which two adjacent initial regions are still considered similar. For example, the correlation threshold may be 0.2.
For example, according to the region correlation corresponding to the initial region, the initial region in the initial region set may be subjected to region growth by a region growth algorithm, and the region after the region growth is used as the target region, thereby obtaining the target region group. The determined seed region may be used as an input of a region growing algorithm, and the growing criterion (growing condition) may be that an absolute value of a difference value of the region correlation degrees corresponding to the adjacent initial regions is less than or equal to a correlation threshold, and the stopping condition is that no adjacent growing region meeting the correlation threshold exists.
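The seed-based growth can be sketched over an adjacency graph of the initial regions (which in practice would come from the superpixel layout); the graph representation and function name here are illustrative.

```python
def grow_regions(correlations, adjacency, seeds, threshold=0.2):
    """Seed-based region growing: a neighbor joins a growing region
    when the absolute difference of region correlations is within the
    threshold; growth stops when no qualifying neighbor remains.
    `adjacency` maps region index -> iterable of neighbor indices."""
    label = {}
    groups = []
    for seed in seeds:
        if seed in label:
            continue
        group, stack = [], [seed]
        label[seed] = len(groups)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in adjacency.get(cur, ()):
                if nb not in label and \
                        abs(correlations[cur] - correlations[nb]) <= threshold:
                    label[nb] = len(groups)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups
```

Each returned group is one target region of the target region group; seeds already absorbed by an earlier growth are skipped.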
It should be noted that, the larger the area of the growing region where the initial region is located, the stronger the relevance between the initial region and the rest of the initial regions in the target image is often described, and the greater the possibility that the initial region can be reconstructed by the dictionary atoms corresponding to the region with stronger relevance in the target image is often described; if the area of the growing area where the initial area is located is smaller, the association of the initial area with the rest area in the target image is weaker, and the probability that the initial area can be reconstructed by dictionary atoms corresponding to the rest area in the target image is smaller.
Secondly, the closer the region correlation degree corresponding to the adjacent initial region is, the more similar the image information corresponding to the adjacent initial region is, and the higher the similarity of the pixel points in the adjacent initial region is. The stronger the correlation between the pixel points, the higher the compressibility degree, if the difference of the image information corresponding to the adjacent initial areas is larger, the similarity between the corresponding pixel points is lower, the weaker the correlation between the pixel points, and the compressibility degree of the image is lower. For example, a lake exists in an image, image information corresponding to pixel points in the area where the lake exists is lake water, the pixel points have strong relevance, the greater the space redundancy brought is, the greater the compressibility degree is, and the whole area where the lake exists can be represented by a small number of pixel points. Therefore, based on the region correlation, the self-adaptive merging processing is carried out on each initial region in the initial region set, so that the initial regions with similar correlation can be merged together, and the subsequent compression can be facilitated.
Second, the areas corresponding to different image contents in the target image tend not to be the same. For example, the image content may be a lake, some of the image content may be a person, and since the sizes of the corresponding lake and the corresponding person in the image tend to be different, if the target image is divided into areas with equal areas to form a fixed-scale dictionary, the corresponding fixed-length dictionary atoms of the person in the dictionary tend to generate additional redundancy. Therefore, the self-adaptive merging processing is carried out on each initial region in the initial region set, the target region containing the similar image content can be extracted, the target dictionary under each target scale can be conveniently and accurately determined later, and certain redundancy can be reduced.
And S5, performing scale division on target areas in the target area group set to obtain a target scale set, determining target dictionaries under each target scale in the target scale set to obtain a target dictionary set, and combining the target dictionaries in the target dictionary set into a multi-scale dictionary.
In some embodiments, the target area in the target area group set may be scaled to obtain a target scale set, and a target dictionary under each target scale in the target scale set is determined to obtain a target dictionary set, and target dictionaries in the target dictionary set are combined into a multi-scale dictionary.
It should be noted that, the scale division is performed on the target area in the target area group set, so that the target dictionary under each target scale can be accurately determined.
As an example, this step may include the steps of:
in a first step, the absolute value of the area difference between each two target areas in the set of target area groups is determined as the initial area difference between the two target areas.
The area of the target area can be represented by the number of pixel points in the target area.
For example, for any two target regions in the set of target region groups, the absolute value of the difference in the number of pixels in the two target regions may be determined as the initial area difference between the two target regions.
And secondly, taking the sum of the areas of every two target areas in the target area group set as a reference area between the two target areas.
Thirdly, for each two target areas in the target area group set, determining the ratio of the initial area difference between the two target areas to the reference area as the relative area difference between the two target areas.
fourth, if the relative area difference between two target areas in the target area group set is smaller than or equal to a preset area difference threshold, dividing the two target areas into the same target scale to obtain a target scale set.
The area difference threshold may be the preset maximum relative area difference at which the areas of two target regions are still considered similar. For example, the area difference threshold may be 0.05.
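Steps one to four amount to grouping target regions whose relative area difference stays within the threshold. A greedy sketch follows (each region joins the first compatible scale; the text does not specify how transitive chains are resolved, so that choice is an assumption):

```python
def group_scales(areas, diff_threshold=0.05):
    """Greedy scale division: two target regions share a scale when
    |area_a - area_b| / (area_a + area_b) <= diff_threshold, areas
    being pixel counts. Each region joins the first existing scale
    whose representative it is close to."""
    scales = []          # list of lists of region indices
    for idx, area in enumerate(areas):
        for scale in scales:
            rep = areas[scale[0]]
            if abs(area - rep) / (area + rep) <= diff_threshold:
                scale.append(idx)
                break
        else:
            scales.append([idx])
    return scales
```

Each inner list is one target scale of the target scale set.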
And fifthly, determining target dictionaries under each target scale in the target scale set to obtain a target dictionary set.
The target dictionary under the target scale may be a dictionary corresponding to the target area in the target scale.
For example, MOD (Method of Optimal Directions) may be employed to determine the dictionary corresponding to the target regions at each target scale as the target dictionary at that target scale.
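The core of MOD is a closed-form least-squares dictionary update for fixed sparse codes, D = X A^T (A A^T)^(-1). The sketch below pairs that update with a 1-sparse coding step standing in for a full pursuit algorithm such as OMP; this pairing is illustrative, not the patent's exact procedure.

```python
import numpy as np

def mod_update(X, A):
    """One MOD dictionary update: D = argmin ||X - D A||_F, i.e.
    D = X A^T (A A^T)^{-1} (pinv for rank-deficient codes), followed
    by column normalization. X holds training patches as columns,
    A the current sparse codes."""
    D = X @ A.T @ np.linalg.pinv(A @ A.T)
    norms = np.linalg.norm(D, axis=0)
    norms[norms == 0] = 1.0              # leave unused atoms at zero
    return D / norms

def one_sparse_codes(X, D):
    """1-sparse coding stand-in for OMP: each patch is represented
    by its single best-matching atom."""
    corr = D.T @ X                       # (K, N) atom-patch correlations
    idx = np.abs(corr).argmax(0)
    A = np.zeros((D.shape[1], X.shape[1]))
    A[idx, np.arange(X.shape[1])] = corr[idx, np.arange(X.shape[1])]
    return A
```

Alternating the two functions gives a minimal MOD iteration for one target scale.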
And sixthly, combining the target dictionaries in the target dictionary set into a multi-scale dictionary.
The multi-scale dictionary may include all dictionary atoms contained in the target dictionary set. A dictionary atom may be an element that makes up a dictionary.
In the process of learning a compressed image with a dictionary, the initial dictionary and the sparse matrix need to be updated. Typically, one of the two is held fixed while the other is iteratively updated. The K-SVD dictionary learning algorithm needs to update every dictionary atom when updating the initial dictionary. However, for video data there is a certain correlation between different frame images, so it may be considered to update the dictionary atoms with a higher frequency of use and delete the dictionary atoms with a lower frequency of use. A dictionary atom is a column vector of the initial dictionary, which is typically an overcomplete dictionary, and for video data with such correlation, not every dictionary atom is frequently used.
In the initial dictionary updating process, the frequencies of use of the dictionary atoms differ because the image information in different frame images of the video data differs. For a classification result of larger scale, the image information it carries is relatively obvious in the corresponding image, and the area reflected by a larger-scale classification result basically contains the areas carrying the same information in the image. The video data is composed of successive frame images, so there are different degrees of correlation between the images. For example, if a certain frame is an image of a lake, the next frame is likely to contain content related to the lake; the dictionary atoms in the scale dictionary learned from the lake classification result in that frame can then be used to reconstruct the lake in the next frame, and the next frame can be sparsely represented using the scale dictionary corresponding to the current frame.
And S6, carrying out inter-frame correlation analysis processing on each frame of target image in the target video data to obtain an inter-frame correlation group corresponding to the target image.
In some embodiments, an inter-frame correlation analysis process may be performed on each frame of the target image in the target video data, so as to obtain an inter-frame correlation group corresponding to the target image.
As an example, this step may include the steps of:
and a first step of screening out target images adjacent to the target images from the target video data as reference images to obtain a reference image group corresponding to the target images.
For example, the reference image group corresponding to the i-th frame target image in the target video data may include: the (i-1)-th frame target image and the (i+1)-th frame target image. The reference image group corresponding to the 1st frame target image may include one reference image, namely the 2nd frame target image. The reference image group corresponding to the last frame target image may include one reference image, namely the second-to-last frame target image. The reference image group corresponding to any target image other than the 1st and last frame target images in the target video data may include two reference images.
And a second step of determining the inter-frame correlation between the target image and each reference image in the reference image group to obtain an inter-frame correlation group corresponding to the target image.
For example, determining the inter-frame correlation between the target image and each reference image in the set of reference images may comprise the sub-steps of:
and a first sub-step of determining a target characteristic value corresponding to each pixel point in the target image and the reference image.
The target feature value may be an LBP (Local Binary Pattern) value.
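A basic 8-neighbor LBP can be computed with vectorized shifts; this sketch sets one bit per neighbor with neighbor >= center and leaves border pixels at 0, whereas a library routine such as skimage.feature.local_binary_pattern also offers rotation-invariant variants.

```python
import numpy as np

def lbp_values(gray):
    """Basic 8-neighbor LBP: each interior pixel gets an 8-bit code,
    one bit per neighbor with neighbor >= center. Border pixels are
    left at 0 in this sketch."""
    g = np.asarray(gray, dtype=np.float64)
    h, w = g.shape
    out = np.zeros_like(g, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        out[1:-1, 1:-1] |= (nb >= g[1:-1, 1:-1]).astype(np.uint8) << bit
    return out
```

On a constant image every neighbor comparison succeeds, so every interior code is 255.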
And a second sub-step of determining the average value of the target feature values corresponding to all the pixel points in each first target area in the first target area group as the first feature index corresponding to the first target area, and determining the average value of the target feature values corresponding to all the pixel points in each second target area in the second target area group as the reference feature index corresponding to the second target area.
The first target region group may be a target region group corresponding to the target image. The second target region group may be a target region group corresponding to the above-mentioned reference image.
And a third sub-step of determining a difference between the target feature value corresponding to each pixel point in each first target area and the first feature index corresponding to the first target area as a first feature difference corresponding to each pixel point in the first target area, and determining a difference between the target feature value corresponding to each pixel point in each second target area and the reference feature index corresponding to the second target area as a second feature difference corresponding to each pixel point in the second target area.
And a fourth sub-step of combining the first feature differences corresponding to all the pixel points in each first target area into a first feature difference sequence corresponding to the first target area, and combining the second feature differences corresponding to all the pixel points in each second target area into a second feature difference sequence corresponding to the second target area.
And a fifth sub-step of determining, for each first target region in the first target region group, a first correlation index between the first target region and the reference image based on the first feature difference sequence corresponding to the first target region and the second feature difference sequences corresponding to the respective second target regions in the second target region group.
For example, determining the first correlation index between the first target region and the reference image according to the first feature difference sequence corresponding to the first target region and the second feature difference sequence corresponding to each second target region in the second target region group may include the following steps:
first, for each second target region in the first target region and second target region group, performing negative correlation mapping on an absolute value of a difference value between a region correlation degree corresponding to the first target region and a region correlation degree corresponding to the second target region, so as to obtain a first similarity index between the first target region and the second target region.
For example, the manner of determining the region correlation corresponding to each target region in the target region group may be: and (3) taking the target area and the target area as an initial area and an initial area set respectively, and executing the step (S3) to obtain the area correlation corresponding to each initial area, namely the area correlation corresponding to the corresponding target area.
And then, for each second target region in the first target region and the second target region group, determining cosine similarity between a first characteristic difference sequence corresponding to the first target region and a second characteristic difference sequence corresponding to the second target region as a second similarity index between the first target region and the second target region.
Then, for each second target region in the first target region and second target region group, a product of a first similarity index and a second similarity index between the first target region and the second target region is determined as a third similarity index between the first target region and the second target region.
And finally, determining the average value of third similar indexes between the first target area and all the second target areas in the second target area group as a first related index between the first target area and the reference image.
A sixth sub-step of determining an accumulated sum of first correlation indexes between the reference images and each first target region in the first target region group as an inter-frame correlation between the target image and the reference image.
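The six sub-steps can be sketched end-to-end for two frames whose target regions are given as sequences of per-pixel LBP values. The negative correlation mapping is again taken as a reciprocal with a preset factor eps, an assumed reading; the region correlations are supplied as inputs, and sequences are truncated to a common length for the cosine.

```python
import numpy as np

def interframe_correlation(first_regions, second_regions,
                           corr_first, corr_second, eps=0.01):
    """Inter-frame correlation between a target image and one
    reference image, following the sub-steps above. Each region is
    its sequence of per-pixel LBP values; corr_* are the region
    correlations of the two frames' target regions."""
    def feature_diffs(region):
        v = np.asarray(region, dtype=np.float64)
        return v - v.mean()          # feature value minus mean feature index

    total = 0.0
    for A, ra in enumerate(first_regions):
        fa = feature_diffs(ra)
        third = []
        for d, rb in enumerate(second_regions):
            fb = feature_diffs(rb)
            m = min(len(fa), len(fb))
            denom = np.linalg.norm(fa[:m]) * np.linalg.norm(fb[:m])
            cos_sim = float(fa[:m] @ fb[:m] / denom) if denom else 0.0
            sim1 = 1.0 / (abs(corr_first[A] - corr_second[d]) + eps)
            third.append(sim1 * cos_sim)   # third similarity index
        total += float(np.mean(third))     # first correlation index
    return total
```

Comparing a frame with itself gives a large positive value, since every region's self-similarity terms dominate.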
For example, the inter-frame correlation between the i-th frame target image and the D-th reference image in its corresponding reference image group may be determined by the formula:

$$R_{i,D}=\sum_{A=1}^{N_i}Q_{A,D},\qquad Q_{A,D}=\frac{1}{N_D}\sum_{d=1}^{N_D}\frac{\cos\left(F_A,G_d\right)}{\left|r_A-r_d\right|+\varepsilon}$$

wherein:

$R_{i,D}$ is the inter-frame correlation between the i-th frame target image in the target video data and the D-th reference image in the reference image group corresponding to the i-th frame target image.

$x_{A,B}$ is the target feature value corresponding to the B-th pixel point in the A-th target region in the target region group (first target region group) corresponding to the i-th frame target image, and $n_A$ is the number of pixel points in that region.

$\mu_A=\frac{1}{n_A}\sum_{B=1}^{n_A}x_{A,B}$ is the first characteristic index corresponding to the A-th target region, and $f_{A,B}=x_{A,B}-\mu_A$ is the first characteristic difference corresponding to the B-th pixel point in the A-th target region.

$F_A=\left(f_{A,1},f_{A,2},\dots,f_{A,n_A}\right)$ is the first characteristic difference sequence corresponding to the A-th target region.

$y_{d,t}$ is the target feature value corresponding to the t-th pixel point in the d-th target region in the target region group (second target region group) corresponding to the D-th reference image, and $m_d$ is the number of pixel points in that region.

$\nu_d=\frac{1}{m_d}\sum_{t=1}^{m_d}y_{d,t}$ is the reference characteristic index corresponding to the d-th target region, and $g_{d,t}=y_{d,t}-\nu_d$ is the second characteristic difference corresponding to the t-th pixel point in the d-th target region.

$G_d=\left(g_{d,1},g_{d,2},\dots,g_{d,m_d}\right)$ is the second characteristic difference sequence corresponding to the d-th target region.

$r_A$ is the region correlation degree corresponding to the A-th target region in the target region group corresponding to the i-th frame target image, and $r_d$ is the region correlation degree corresponding to the d-th target region in the target region group corresponding to the D-th reference image.

$\varepsilon$ is a preset factor greater than 0, mainly used to prevent the denominator from being 0; for example, $\varepsilon$ may take 0.01.

$\frac{1}{\left|r_A-r_d\right|+\varepsilon}$ is the first similarity index between the A-th target region and the d-th target region; it realizes the negative correlation mapping of $\left|r_A-r_d\right|$.

$\cos\left(F_A,G_d\right)$ is the cosine similarity between $F_A$ and $G_d$, i.e., the second similarity index between the A-th target region and the d-th target region; the product of the first and second similarity indexes is the third similarity index between the two regions.

$Q_{A,D}$ is the first correlation index between the A-th target region and the D-th reference image, and $N_D$ is the number of target regions in the target region group corresponding to the D-th reference image.

$N_i$ is the number of target regions in the target region group corresponding to the i-th frame target image. i is the frame number of the target image in the target video data. A is the sequence number of the target region in the target region group corresponding to the i-th frame target image. B is the sequence number of the pixel point in the A-th target region. D is the sequence number of the reference image in the reference image group corresponding to the i-th frame target image. d is the sequence number of the target region in the target region group corresponding to the D-th reference image. t is the sequence number of the pixel point in the d-th target region.

It should be noted that the larger $\cos\left(F_A,G_d\right)$ is, the more similar the A-th target region and the d-th target region often are; likewise, the smaller $\left|r_A-r_d\right|$ is, the more similar the two regions often are. Therefore, the larger $Q_{A,D}$ is, the more relevant the A-th target region and the D-th reference image are, and the more similar their image information is. Thus, the larger $R_{i,D}$ is, the stronger the correlation between the i-th frame target image and the D-th reference image, the more similar the image information contained in the two images, and the higher the use frequency of the corresponding dictionary atoms in sparse representation, which facilitates the subsequent screening of dictionary atoms with a high use frequency.
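The inter-frame correlation computation described above (first and second similarity indexes, their product averaged into first correlation indexes, then summed over the target regions) can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function and variable names are my own, and because the patent does not state how difference sequences of unequal length are compared, the sketch truncates both sequences to the shorter length before taking the cosine similarity.

```python
import numpy as np

def cosine_sim(u, v):
    # Cosine similarity; sequences may differ in length, so both are
    # truncated to the shorter one (an assumption, not from the patent).
    n = min(len(u), len(v))
    u, v = np.asarray(u[:n], float), np.asarray(v[:n], float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def inter_frame_correlation(target_regions, ref_regions, r_target, r_ref, eps=0.01):
    """target_regions[A] / ref_regions[d]: 1-D arrays of pixel feature values.
    r_target[A] / r_ref[d]: region correlation degrees. eps: preset factor."""
    total = 0.0
    for A, pix_a in enumerate(target_regions):
        f_a = np.asarray(pix_a, float) - np.mean(pix_a)  # first characteristic differences
        third = []
        for d, pix_d in enumerate(ref_regions):
            g_d = np.asarray(pix_d, float) - np.mean(pix_d)  # second characteristic differences
            s1 = 1.0 / (abs(r_target[A] - r_ref[d]) + eps)   # negative correlation mapping
            s2 = cosine_sim(f_a, g_d)                        # second similarity index
            third.append(s1 * s2)                            # third similarity index
        total += float(np.mean(third))  # first correlation index for region A
    return total
```

An identical region pair with equal region correlation degrees yields the maximum first similarity index 1/eps and cosine similarity 1, so the returned value grows with both kinds of similarity, as intended.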
And S7, dividing each frame of target image in the target video data according to the inter-frame correlation group corresponding to the target image to obtain a high-frequency image set and a low-frequency image set, and screening a high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set.
In some embodiments, each frame of target image in the target video data may be divided according to the inter-frame correlation group corresponding to the target image to obtain a high-frequency image set and a low-frequency image set, and the high-frequency dictionary may then be screened from the multi-scale dictionary based on the high-frequency image set.
It should be noted that, considering the inter-frame correlation group corresponding to the target image comprehensively, accuracy of determining the high-frequency image set and the low-frequency image set can be improved.
As an example, this step may include the steps of:
The first step is to screen out the largest inter-frame correlation from the inter-frame correlation group corresponding to the target image as the target inter-frame correlation index corresponding to the target image.
Secondly, the product of a preset duty ratio and the largest target inter-frame correlation index among the target inter-frame correlation indexes corresponding to all target images in the target video data is determined as a reference inter-frame correlation index.
The preset duty ratio may be a ratio set in advance according to actual conditions. For example, the preset duty ratio may be 0.7.
And thirdly, determining the target image as a high-frequency image when the target inter-frame correlation index corresponding to the target image in the target video data is greater than or equal to the reference inter-frame correlation index, so as to obtain a high-frequency image set.
And fourthly, when the target inter-frame correlation index corresponding to the target image in the target video data is smaller than the reference inter-frame correlation index, determining the target image as a low-frequency image, and obtaining a low-frequency image set.
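The first four steps above can be sketched as follows. This is a minimal sketch under assumed inputs: `corr_groups[i]` is taken to hold the inter-frame correlation group of frame i, and the function and parameter names are hypothetical.

```python
def split_high_low(corr_groups, preset_ratio=0.7):
    # First step: the target inter-frame correlation index of each frame is
    # the largest inter-frame correlation in its correlation group.
    indexes = [max(group) for group in corr_groups]
    # Second step: reference index = preset duty ratio * largest index.
    reference = preset_ratio * max(indexes)
    # Third and fourth steps: threshold frames into high- and low-frequency sets.
    high = [i for i, v in enumerate(indexes) if v >= reference]
    low = [i for i, v in enumerate(indexes) if v < reference]
    return high, low
```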
And fifthly, determining each target area in the target area group corresponding to each frame of high-frequency image in the high-frequency image set as a high-frequency area to obtain a high-frequency area set.
And sixthly, determining the target use frequency corresponding to each dictionary atom in the multi-scale dictionary.
The target frequency of use corresponding to the dictionary atom may be a frequency of use of the dictionary atom when the high-frequency region set is sparsely represented.
And seventh, screening out the maximum target use frequency from target use frequencies corresponding to dictionary atoms in the multi-scale dictionary, and taking the maximum target use frequency as a reference use frequency.
Eighth, determining the product of the preset screening coefficient and the reference use frequency as a high-frequency atomic threshold value.
The screening coefficient may be a coefficient, preset according to actual conditions, for screening out the high-frequency dictionary atoms. For example, the screening coefficient may be 0.3.
And ninth, determining the dictionary atoms as high-frequency dictionary atoms when the target use frequency corresponding to the dictionary atoms in the multi-scale dictionary is larger than the high-frequency atom threshold value, and obtaining a high-frequency dictionary atom set.
And tenth, combining the high-frequency dictionary atoms in the high-frequency dictionary atom set into a high-frequency dictionary.
The high-frequency dictionary may be a dictionary composed of high-frequency dictionary atoms in a high-frequency dictionary atom set.
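Steps six through ten can be sketched as follows. This is a minimal sketch, not the patent's implementation: the argument names are mine, and the sparse codes of the high-frequency region set are assumed to be stored atom-by-row (one row per dictionary atom, one column per coded region), so an atom's target use frequency is its count of nonzero coefficients.

```python
import numpy as np

def screen_high_freq_dictionary(dictionary, sparse_codes, coef=0.3):
    """dictionary: (n_features, n_atoms); sparse_codes: (n_atoms, n_samples)
    sparse coefficients of the high-frequency region set; coef: screening coefficient."""
    # Sixth step: target use frequency of each dictionary atom.
    use_freq = np.count_nonzero(sparse_codes, axis=1)
    # Seventh and eighth steps: high-frequency atom threshold from the
    # reference (maximum) use frequency.
    threshold = coef * use_freq.max()
    # Ninth and tenth steps: keep atoms above the threshold as the high-frequency dictionary.
    keep = use_freq > threshold
    return dictionary[:, keep], keep
```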
And S8, compressing the target area in the low-frequency image set according to the multi-scale dictionary, and compressing the target area in the high-frequency image set according to the high-frequency dictionary to obtain target compression data corresponding to the target video data.
In some embodiments, the target area in the low-frequency image set may be compressed according to the multi-scale dictionary, and the target area in the high-frequency image set may be compressed according to the high-frequency dictionary, so as to obtain target compressed data corresponding to the target video data.
Based on the multi-scale dictionary and the high-frequency dictionary, the target areas in the low-frequency image set and the high-frequency image set are respectively compressed, so that the efficiency of compressing the target video data can be improved.
As an example, a multi-scale dictionary may be used to obtain a sparse representation of a target region in a low-frequency image set, and a high-frequency dictionary may be used to obtain a sparse representation of a target region in a high-frequency image set, where all the obtained sparse representations are used as a sparse representation result of the target video data. After the sparse representation result of the target video data is obtained, carrying out quantization coding on the sparse representation and the dictionary atomic index corresponding to the sparse representation, and taking the coding result as the transmission content of the target video data in the network television system, wherein the coding result can be target compressed data corresponding to the target video data. The encoding can be accomplished by using huffman coding, which is a well-known technique and will not be described in detail.
Since the high-frequency image is a target image with a large target inter-frame correlation index, the correlation between the high-frequency image and other images tends to be strong, and thus the sparse representation of the target region in the high-frequency image set can be acquired by using the high-frequency dictionary. Because the low-frequency image is a target image with a small target inter-frame correlation index, the correlation between the low-frequency image and other images is weak, and therefore a multi-scale dictionary can be utilized to obtain sparse representation of a target area in the low-frequency image set.
Alternatively, since the multi-scale dictionary includes a high-frequency dictionary, a multi-scale dictionary may also be employed to obtain sparse representations of target regions in the high-frequency image set. However, the efficiency is relatively low when acquiring a sparse representation of a target region in a set of high frequency images using a multi-scale dictionary, as compared to using a high frequency dictionary.
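The sparse-representation step itself (coding each target region over the chosen dictionary) is not spelled out in this excerpt; a common choice for it is orthogonal matching pursuit, sketched below. This is an illustrative stand-in, not the patent's stated algorithm, and the dictionary columns are assumed to be unit-norm.

```python
import numpy as np

def omp(D, x, k):
    # Greedy sparse coding: select at most k atoms of dictionary D
    # (columns assumed unit-norm) that best explain signal x.
    residual = x.astype(float)
    support, coef = [], np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-matching atom
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the current support and update the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef
```

After coding, the nonzero coefficients and their dictionary atom indexes would be quantized and Huffman-coded as described above to form the target compressed data.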
Step S9, transmitting the target compressed data.
In some embodiments, the target compressed data may be transmitted.
As an example, the target compressed data may be determined as the transmission content of the target video data in the network television system, and the target compressed data may be transmitted.
In summary, the present invention divides each frame of target image into a plurality of target regions based on the degree of similarity between different regions within each frame of target image in the target video data, and finally obtains the multi-scale dictionary. The multi-scale dictionary takes the correlation among different video contents into account and can yield sparse representations of higher sparsity. Secondly, the inter-frame correlation is determined by exploiting the fact that video contents in different frame images of the target video data are correlated to some degree; the inter-frame correlation considers the correlation among the target regions in adjacent frame images. The target regions are divided into different target scales based on their areas, and the dictionary under each target scale is determined through the MOD algorithm, which facilitates the subsequent screening of dictionary atoms with a higher use frequency. Since the obtained high-frequency dictionary sparsely represents the images with stronger correlation, the problems of large computation and low compression efficiency, caused by iteratively updating every dictionary atom during dictionary updating in the conventional K-SVD dictionary learning algorithm, can be solved without affecting the subsequent sparse representation, thereby speeding up dictionary learning and improving the video data transmission rate.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the invention.

Claims (5)

1. A video data transmission method for a network television system, comprising the steps of:
acquiring target video data for a network television system;
dividing the area of each frame of target image in the target video data to obtain an initial area set corresponding to the target image;
performing area correlation analysis processing on each initial area in an initial area set corresponding to each frame of target image to obtain area correlation corresponding to the initial area;
according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of target image in the target video data, carrying out self-adaptive merging processing on each initial region in the initial region set corresponding to the target image, and determining a target region group corresponding to the target image to obtain a target region group set corresponding to the target video data;
Performing scale division on target areas in the target area group set to obtain a target scale set, determining target dictionaries under each target scale in the target scale set to obtain a target dictionary set, and combining the target dictionaries in the target dictionary set into a multi-scale dictionary;
performing inter-frame correlation analysis processing on each frame of target image in the target video data to obtain an inter-frame correlation group corresponding to the target image;
dividing each frame of target images in the target video data according to the inter-frame correlation group corresponding to the target images to obtain a high-frequency image set and a low-frequency image set, and screening a high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set;
compressing a target area in the low-frequency image set according to the multi-scale dictionary, and compressing the target area in the high-frequency image set according to the high-frequency dictionary to obtain target compressed data corresponding to the target video data;
transmitting the target compressed data;
performing inter-frame correlation analysis processing on each frame of target image in the target video data to obtain an inter-frame correlation group corresponding to the target image, including:
Screening target images adjacent to the target images from the target video data to serve as reference images, and obtaining a reference image group corresponding to the target images;
determining the inter-frame correlation between the target image and each reference image in the reference image group to obtain an inter-frame correlation group corresponding to the target image;
the determining the inter-frame correlation between the target image and each reference image in the set of reference images includes:
determining a target characteristic value corresponding to each pixel point in the target image and the reference image;
determining the average value of target characteristic values corresponding to all pixel points in each first target area in a first target area group as a first characteristic index corresponding to the first target area, and determining the average value of target characteristic values corresponding to all pixel points in each second target area in a second target area group as a reference characteristic index corresponding to the second target area, wherein the first target area group is a target area group corresponding to the target image, and the second target area group is a target area group corresponding to the reference image;
determining a difference value of a target characteristic value corresponding to each pixel point in each first target area and a first characteristic index corresponding to the first target area as a first characteristic difference corresponding to each pixel point in the first target area, and determining a difference value of a target characteristic value corresponding to each pixel point in each second target area and a reference characteristic index corresponding to the second target area as a second characteristic difference corresponding to each pixel point in the second target area;
Combining the first characteristic differences corresponding to all the pixel points in each first target area into a first characteristic difference sequence corresponding to the first target area, and combining the second characteristic differences corresponding to all the pixel points in each second target area into a second characteristic difference sequence corresponding to the second target area;
for each first target region in a first target region group, determining a first correlation index between the first target region and the reference image according to a first characteristic difference sequence corresponding to the first target region and a second characteristic difference sequence corresponding to each second target region in a second target region group;
determining an accumulated sum of first correlation indexes between each first target region in a first target region group and the reference image as an inter-frame correlation between the target image and the reference image;
the determining a first correlation index between the first target region and the reference image according to the first feature difference sequence corresponding to the first target region and the second feature difference sequence corresponding to each second target region in the second target region group includes:
For each second target region in the first target region and the second target region group, determining a first similarity index between the first target region and the second target region according to the absolute value of the difference value of the region correlation degree corresponding to the first target region and the region correlation degree corresponding to the second target region, wherein the absolute value of the difference value of the region correlation degree and the first similarity index are in negative correlation;
for each second target region in the first target region and the second target region group, determining cosine similarity between a first characteristic difference sequence corresponding to the first target region and a second characteristic difference sequence corresponding to the second target region as a second similarity index between the first target region and the second target region;
for each second target region in the first target region and second target region group, determining a product of a first similarity index and a second similarity index between the first target region and the second target region as a third similarity index between the first target region and the second target region;
determining a mean value of third similar indexes between the first target area and all second target areas in a second target area group as a first related index between the first target area and the reference image;
Dividing each frame of target image in the target video data according to the inter-frame correlation group corresponding to the target image to obtain a high-frequency image set and a low-frequency image set, wherein the method comprises the following steps:
screening out the largest inter-frame correlation from the inter-frame correlation group corresponding to the target image, and taking the largest inter-frame correlation as a target inter-frame correlation index corresponding to the target image;
determining the product of a preset duty ratio and the largest target inter-frame correlation index among the target inter-frame correlation indexes corresponding to all target images in the target video data as a reference inter-frame correlation index;
when a target inter-frame correlation index corresponding to a target image in the target video data is greater than or equal to the reference inter-frame correlation index, determining the target image as a high-frequency image to obtain a high-frequency image set;
when a target inter-frame correlation index corresponding to a target image in the target video data is smaller than the reference inter-frame correlation index, determining the target image as a low-frequency image to obtain a low-frequency image set;
the screening the high-frequency dictionary from the multi-scale dictionary based on the high-frequency image set comprises the following steps:
each target area in the target area group corresponding to each frame of high-frequency image in the high-frequency image set is determined to be a high-frequency area, and a high-frequency area set is obtained;
Determining target use frequencies corresponding to each dictionary atom in the multi-scale dictionary, wherein the target use frequencies corresponding to the dictionary atoms are the use frequencies of the dictionary atoms when the high-frequency region set is subjected to sparse representation;
screening out the maximum target use frequency from target use frequencies corresponding to dictionary atoms in the multi-scale dictionary, and taking the maximum target use frequency as a reference use frequency;
determining the product of a preset screening coefficient and the reference use frequency as a high-frequency atomic threshold value;
when the target use frequency corresponding to dictionary atoms in the multi-scale dictionary is larger than the high-frequency atom threshold, determining the dictionary atoms as high-frequency dictionary atoms, and obtaining a high-frequency dictionary atom set;
and combining the high-frequency dictionary atoms in the high-frequency dictionary atom set into a high-frequency dictionary.
2. The method for transmitting video data of a network television system according to claim 1, wherein the performing a region correlation analysis process on each initial region in the initial region set corresponding to each frame of the target image to obtain a region correlation corresponding to the initial region includes:
determining local entropy corresponding to each initial region in the initial region set;
Determining the sum of the squares of the differences between the local entropy corresponding to the initial region and the local entropy corresponding to each reference region in the reference region set as a first difference corresponding to the initial region, wherein the reference region set corresponding to the a-th initial region in the initial region set corresponding to the i-th frame target image comprises: the initial regions, other than the a-th initial region, in the initial region set corresponding to the i-th frame target image, i is the frame number of the target image in the target video data, and a is the sequence number of the initial region in the initial region set corresponding to the i-th frame target image;
for each reference region in the initial region and reference region set, determining a second difference between the initial region and the reference region according to the absolute value of the difference between the standard deviation of the gray values corresponding to all the pixel points in the initial region and the standard deviation of the gray values corresponding to all the pixel points in the reference region, wherein the absolute value of the difference between the standard deviation of the gray values and the second difference are positively correlated;
for each reference region in the initial region and reference region set, determining a third difference between the initial region and the reference region according to the absolute value of the difference value of the first difference corresponding to the initial region and the first difference corresponding to the reference region, wherein the absolute value of the difference value of the first difference is positively correlated with the third difference;
Determining a fourth difference between the initial region and each reference region in the reference region set according to a second difference and a third difference between the initial region and each reference region in the reference region set, wherein the fourth difference is positively correlated with the second difference and the fourth difference is positively correlated with the third difference;
determining a cross-correlation coefficient between the initial region and each reference region in the reference region set, and determining a first similar index between the initial region and the reference region according to a fourth difference between the initial region and the reference region, wherein the fourth difference is in negative correlation with the first similar index;
determining the product of the first similar index and the cross-correlation coefficient between the initial region and the reference region as a second similar index between the initial region and the reference region;
normalizing the accumulated value of the second similar indexes between the initial region and each reference region in the reference region set to obtain the region correlation degree corresponding to the initial region.
3. The method for transmitting video data of a network television system according to claim 1, wherein the determining the target region group corresponding to the target image by adaptively combining each initial region in the initial region set corresponding to the target image according to the region correlation degree corresponding to each initial region in the initial region set corresponding to each frame of the target image in the target video data comprises:
Screening out a first preset number of initial areas with the largest area correlation degree from the initial area set to serve as a first initial area;
screening out a second preset number of initial areas with minimum area correlation degree from the initial area set to serve as second initial areas;
and determining the screened first initial region and second initial region as seed regions, and performing region growth on the initial regions in the initial region set based on the seed regions and growth conditions to obtain a target region group, wherein the growth conditions are that the absolute value of the difference value of the region correlation degree corresponding to the adjacent initial regions is smaller than or equal to a preset correlation threshold value.
4. The method for video data transmission of a network television system according to claim 1, wherein the performing scale division on the target area in the target area group set to obtain a target scale set includes:
determining the absolute value of the area difference between every two target regions in the target region group set as the initial area difference between the two target regions;
taking the sum of the areas of every two target regions in the target region group set as the reference area between the two target regions;
for every two target regions in the target region group set, determining the ratio of the initial area difference between the two target regions to the reference area as the relative area difference between the two target regions;
and if the relative area difference between two target regions in the target region group set is smaller than or equal to a preset area difference threshold, dividing the two target regions into the same target scale to obtain the target scale set.
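For illustration only, claim 4 computes a symmetric relative area difference |a1 − a2| / (a1 + a2) for every pair and groups pairs under the threshold into one scale. The claim does not say how chained pairs are merged; the sketch below assumes transitive merging via union-find, which is one plausible reading:

```python
def group_by_scale(areas, diff_threshold):
    """areas: list of region areas. Two regions share a target scale when
    their relative area difference is within diff_threshold; groups are
    merged transitively (union-find, an assumption of this sketch)."""
    n = len(areas)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            # Relative area difference: initial difference over reference area.
            rel = abs(areas[i] - areas[j]) / (areas[i] + areas[j])
            if rel <= diff_threshold:
                parent[find(i)] = find(j)

    scales = {}
    for i in range(n):
        scales.setdefault(find(i), []).append(i)
    return sorted(scales.values())
```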
5. The method for transmitting video data of a network television system according to claim 1, wherein the performing region division on each frame of target image in the target video data to obtain the initial region set corresponding to the target image comprises:
performing superpixel segmentation on the target image, and determining the superpixels obtained by the superpixel segmentation as initial regions to obtain the initial region set corresponding to the target image.
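The claims do not name a particular superpixel algorithm; real systems commonly use SLIC (e.g. `skimage.segmentation.slic`). As a dependency-free stand-in only, the toy sketch below tiles the frame into fixed-size blocks and labels each block as one initial region, which mimics the output shape (a per-pixel label map) without the actual clustering:

```python
import numpy as np

def grid_superpixels(image, block=4):
    """Toy substitute for superpixel segmentation: assign every pixel a
    label identifying its fixed-size block. Returns an int label map the
    same height and width as the input frame."""
    h, w = image.shape[:2]
    per_row = -(-w // block)  # blocks per row, rounded up
    labels = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            labels[y, x] = (y // block) * per_row + (x // block)
    return labels
```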
CN202310402019.8A 2023-04-17 2023-04-17 Video data transmission method for network television system Active CN116132714B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310402019.8A CN116132714B (en) 2023-04-17 2023-04-17 Video data transmission method for network television system

Publications (2)

Publication Number Publication Date
CN116132714A (en) 2023-05-16
CN116132714B (en) 2023-06-30

Family

ID=86306610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310402019.8A Active CN116132714B (en) 2023-04-17 2023-04-17 Video data transmission method for network television system

Country Status (1)

Country Link
CN (1) CN116132714B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117376565B (en) * 2023-12-04 2024-03-22 深圳市路通网络技术有限公司 HDR video optimized coding method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7218796B2 (en) * 2003-04-30 2007-05-15 Microsoft Corporation Patch-based video super-resolution
CN102427527B (en) * 2011-09-27 2013-11-06 西安电子科技大学 Method for reconstructing non-key frames based on a distributed video compressed sensing system
CN104574336B (en) * 2015-01-19 2017-08-01 上海交通大学 Super-resolution image reconstruction system based on adaptive sub-mould dictionary selection
CN109120931A (en) * 2018-09-05 2019-01-01 浙江树人学院 Streaming media video compression method based on frame-to-frame correlation
CN111556227B (en) * 2020-05-19 2022-04-15 广州市百果园信息技术有限公司 Video denoising method and device, mobile terminal and storage medium
CN115294409B (en) * 2022-10-08 2023-08-04 福州胜蓝智能科技有限公司 Video processing method, system and medium for security monitoring



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant