CN114979651B - Terminal video data transmission method, device, equipment and medium

Terminal video data transmission method, device, equipment and medium

Info

Publication number
CN114979651B
CN114979651B (application CN202210476602.9A)
Authority
CN
China
Prior art keywords
video data
area
data
portrait
original video
Prior art date
Legal status
Active
Application number
CN202210476602.9A
Other languages
Chinese (zh)
Other versions
CN114979651A (en)
Inventor
Jiang Dongdong (蒋东东)
Dong Gang (董刚)
Zhao Yaqian (赵雅倩)
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210476602.9A
Publication of CN114979651A
Application granted
Publication of CN114979651B
Legal status: Active
Anticipated expiration

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/167 Position within a video image, e.g. region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Human Computer Interaction (AREA)

Abstract

The invention discloses a terminal video data transmission method, device, equipment and medium, wherein the method performs the following steps at a terminal acting as a transmitting end: receiving a preliminary training result and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data; determining the position information of the portrait area and all areas; calculating the change of the background area based on a temporal noise reduction algorithm and determining a selection area based on the change; determining the position information of the selection area based on the change and the position information of the portrait area or all areas; and compressing the video data of the selection area, splicing the compressed data with the position information of the selection area, and sending the spliced data to a receiving end. The scheme of the invention effectively reduces the amount of video data transmitted between terminals and relieves video data transmission pressure.

Description

Terminal video data transmission method, device, equipment and medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a medium for transmitting terminal video data.
Background
With the development of 5G communication and under the influence of uncertain environmental factors, more and more people use high-speed networks for real-time video communication, especially in multi-terminal scenarios such as remote education, monitoring, and teleconferencing.
In a multi-terminal, multi-point transmission scenario, each terminal transmits video to all other terminal nodes. To reduce transmission bandwidth, the video stream (i.e., the video data) is generally compressed before transmission, reducing the bandwidth pressure caused by video stream transmission, as shown in fig. 1. However, as shown in fig. 2, every additional terminal, such as terminal 3, must transmit its own video stream to all other terminals and decode the video streams those terminals send to it, so every terminal needs additional resources for compression and decompression. In a multi-terminal real-time communication scenario, each added edge terminal therefore puts pressure on the data transmission bandwidth of every terminal in the whole system.
Currently, many video compression algorithms exist on terminal devices to help reduce the data transmission bandwidth between terminals, such as the H.264 video data compression algorithm, which reduces the bandwidth occupied by video transmission, as shown in fig. 3. However, with the exponential growth in the number of network terminals and the increasing demand for cloud-based work, the bandwidth of multi-node video streaming remains a serious problem.
Disclosure of Invention
In view of this, the invention provides a terminal video data transmission method, device, equipment and medium, which use an artificial-intelligence portrait identification algorithm assisted by temporal noise reduction to isolate the essentially unchanged background area and select only the effective portrait area for separate compression and transmission, thereby effectively reducing the amount of data transmitted and relieving data transmission pressure.
Based on the above object, an aspect of the embodiments of the present invention provides a method for transmitting terminal video data, which specifically includes: the following steps are performed at a terminal as a transmitting end:
receiving a preliminary training result and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
determining the position information of the portrait area and all areas;
calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change;
determining the position information of the selection area based on the change and the position information of the portrait area or all areas;
and compressing the video data of the selection area, splicing the compressed data with the position information of the selection area, and sending the spliced data to a receiving end.
In some embodiments, performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data includes:
on an auxiliary acceleration unit carried by the terminal, performing inference on the original video data based on the preliminary training result and the portrait identification algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
In some embodiments, the method further comprises:
performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm;
and performing inference on the original video data based on the secondary training result and the portrait identification algorithm to optimize the portrait area and the background area.
In some embodiments, performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm includes:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
ending the secondary training if the forward transmission calculation result meets the threshold;
if the forward transmission calculation result does not meet the threshold, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link;
ending the secondary training if the reverse transmission calculation result meets the threshold;
and if the reverse transmission calculation result does not meet the threshold, returning to the step of performing forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit carried by the terminal.
In some embodiments, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link includes:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
taking a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and performing multiply-add operations on the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weights based on the weight gradient data to complete the reverse transmission calculation.
In some embodiments, the method further comprises:
sending the secondary training result to the cloud, so that the cloud performs preliminary training on the video data based on the portrait identification algorithm and the secondary training result.
In some embodiments, the following steps are performed at a terminal acting as the receiving end:
receiving the spliced compressed data, storing the position information of the selection area in the spliced compressed data into a memory, decompressing the compressed data in the spliced compressed data, and storing the decompressed data into the memory;
reading the position information of the selection area from the memory to determine the coverage area of the video data;
and reading the decompressed data from the memory to cover the corresponding positions in the coverage area of the video data.
In another aspect of the embodiment of the present invention, there is also provided a device for transmitting terminal video data, including:
an inference module configured to receive the preliminary training result sent by the cloud, acquire original video data, and perform inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
a first determination module configured to determine the position information of the portrait area and all areas;
a calculation module configured to calculate the change of the background area based on a temporal noise reduction algorithm and to determine a selection area based on the change;
a second determination module configured to determine the position information of the selection area based on the change and the position information of the portrait area or all areas;
and a compression module configured to compress the video data of the selection area, splice the compressed data with the position information of the selection area, and send the spliced data to a receiving end.
In yet another aspect of the embodiment of the present invention, there is also provided a computer apparatus, including: at least one processor; and a memory storing a computer program executable on the processor, which when executed by the processor, performs the steps of the method as above.
In yet another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps as described above.
The invention has at least the following beneficial technical effects: the video data is preliminarily trained on the cloud based on the portrait identification algorithm and the preliminary training result is sent to the terminal, so that not all video data needs to be uploaded to the cloud, which protects the security of the terminal's video data while reducing communication delay and network bandwidth pressure; inference is performed on the original video data at the transmitting end to separate the portrait area and the background area in the original video data; and the change of the background area is calculated based on a temporal noise reduction algorithm, and either the compressed portrait area or all areas are sent to the receiving end according to that change, effectively reducing the amount of data transmitted and relieving data transmission pressure.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the invention, and other embodiments may be obtained from these drawings without inventive effort by a person skilled in the art.
Fig. 1 is a schematic diagram of video data transmission performed by two conventional terminal devices;
fig. 2 is a schematic diagram of video data transmission performed by three conventional terminal devices;
Fig. 3 is a schematic diagram of video data transmission performed by two conventional terminal devices after using a compression algorithm;
Fig. 4 is a block diagram of an embodiment of video data transmission at a transmitting end according to the present invention;
FIG. 5 is a block diagram of an embodiment of performing preliminary training on video data at a cloud end according to the present invention;
FIG. 6 is a schematic diagram of reasoning about original video data based on a portrait identification algorithm provided by the invention;
FIG. 7 is a diagram of an embodiment of determining location information for a portrait area and an entire area according to the present invention;
FIG. 8 is a schematic diagram of an embodiment of calculating background area variation based on a time-domain noise reduction algorithm according to the present invention;
FIG. 9 is a schematic diagram of another embodiment of calculating background area variation based on a temporal noise reduction algorithm according to the present invention;
FIG. 10 is a schematic diagram of an embodiment of a data format of spliced video data according to the present invention;
FIG. 11 is a schematic diagram of an embodiment of an FPGA according to the present invention;
FIG. 12 is a schematic diagram of the structure of each PE array in the FPGA structure shown in FIG. 9;
FIG. 13 is a schematic diagram illustrating an embodiment of a weight gradient calculation based on PE units according to the present invention;
fig. 14 is a schematic diagram of an embodiment of video data transmission at a receiving end according to the present invention;
FIG. 15 is a diagram illustrating an embodiment of storing decompressed data to DDR according to the present invention;
FIG. 16 is a schematic diagram of an embodiment of reading video data from DDR and overlaying video data of a previous frame;
FIG. 17 is a schematic diagram of an embodiment of transmitting data through a transmitting end and receiving data through a receiving end according to the present invention;
fig. 18 is a schematic diagram of an embodiment of a terminal video data transmission device according to the present invention;
FIG. 19 is a schematic diagram illustrating a computer device according to an embodiment of the present invention;
Fig. 20 is a schematic structural diagram of an embodiment of a computer readable storage medium according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two entities or parameters with the same name. The terms "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments will not repeat this note.
Based on the above object, a first aspect of the embodiments of the present invention proposes an embodiment of a method for transmitting terminal video data.
As shown in fig. 4, for data transmission at the transmitting end, the method specifically includes the following steps:
S20, receiving a preliminary training result sent by a cloud and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
S30, determining the position information of the portrait area and all areas;
S40, calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change;
S50, determining the position information of the selection area based on the change and the position information of the portrait area or all areas;
S60, compressing the video data of the selection area, splicing the compressed data with the position information of the selection area, and sending the spliced data to the receiving end.
Specifically, in step S20, a preliminary training result sent by the cloud is received.
As shown in fig. 5, the process of performing preliminary training on video data at the cloud end specifically includes the following steps:
S201, performing preliminary training on the video data based on a portrait identification algorithm, and sending a preliminary training result to the terminal.
Specifically, video transmission tasks for conventional terminal devices are generally completed by cloud equipment. However, if all training tasks for the video data are completed in the cloud, all video data must be uploaded to the cloud for training, which risks data leakage; moreover, transmitting large amounts of video data to the cloud for processing and returning the results to each terminal greatly increases the communication delay of video transmission and puts great pressure on network bandwidth. In the present application, only the preliminary training of the video data is completed in the cloud based on the portrait identification algorithm, so that not all video data needs to be uploaded; this protects the security of the terminal's video data while reducing communication delay and network bandwidth pressure. The specific process is as follows:
On the cloud device, some video data similar to the video data to be transmitted by the terminal is selected from a public database, according to the application scenario of the video data, for preliminary training to obtain weights that meet expectations, and the weights are sent to the terminal. Here, the terminal includes a transmitting end and its corresponding receiving end, and the portrait identification algorithm includes Fast R-CNN (a region-based convolutional neural network algorithm), YOLO (You Only Look Once, an object detection network), and the like. Compared with training on all of the terminal's video data, the training accuracy is slightly lower, but far less computation is required and training is faster.
The original video data is collected by the camera of the terminal device (video data can be understood as a sequence of frame images). A schematic diagram of inference on the original video data by the portrait identification algorithm, using the weights obtained from the preliminary training, is shown in fig. 6.
The terminal device in this embodiment carries a GPU, an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array). The portrait area and the background area of the original video data are obtained by performing inference on the original video data on the GPU, ASIC, or FPGA based on the portrait identification algorithm, where the combination of the portrait area and the background area constitutes all areas.
In step S30, all areas consist of the portrait area and the background area. The relative positional relationship between the portrait area and the background area is determined, and all areas and the portrait area are labeled with parameters. For example, as shown in fig. 7, with the upper-left corner of all areas as the origin, the starting point A, width B, and height C of all areas are determined, from which the starting point O, width W, and height H of the portrait area can be further determined.
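As a concrete illustration of this parameterization, the following sketch represents all areas and the portrait area as axis-aligned rectangles measured from the upper-left origin; the field names and the example frame and box dimensions are illustrative assumptions, not values from the patent:

```python
from dataclasses import dataclass

@dataclass
class Region:
    """Axis-aligned rectangle, origin at the upper-left corner of the frame."""
    x: int       # horizontal starting point, in pixels
    y: int       # vertical starting point, in pixels
    width: int
    height: int

# Illustrative values only: a 1920x1080 frame (all areas: start A, width B,
# height C) and a portrait box (start O, width W, height H) inside it.
all_areas = Region(x=0, y=0, width=1920, height=1080)
portrait_area = Region(x=600, y=120, width=720, height=900)
```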
In step S40, the change of the background area is calculated based on the temporal noise reduction algorithm. When the change is smaller than a threshold, the background area is considered unchanged and the portrait area is determined as the selection area; when the change is larger than the threshold, all areas, including the portrait area and the background area, are determined as the selection area. Fig. 8 and fig. 9 show schematic diagrams of calculating the background area change based on the temporal noise reduction algorithm.
The change between the current-frame video data and the previous-frame video data of the background area is calculated by a temporal noise filter; in response to the change being less than the threshold, the portrait area is determined as the selection area; in response to the change being greater than the threshold, all areas, including the portrait area and the background area, are determined as the selection area.
With this scheme, when the background area changes little, only the video data of the portrait area is compressed and sent to the receiving end, which increases the data compression speed, reduces the amount of data transmitted, and relieves data transmission pressure.
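A minimal sketch of the change calculation follows, assuming the temporal noise filter output reduces to a mean absolute difference between consecutive background frames; the patent does not fix a specific formula, so this metric is an assumption for illustration:

```python
import numpy as np

def background_change(curr_bg: np.ndarray, prev_bg: np.ndarray) -> float:
    """Mean absolute pixel difference between the background area of the
    current frame and that of the previous frame; stands in for the
    temporal noise filter output."""
    diff = curr_bg.astype(np.int16) - prev_bg.astype(np.int16)  # avoid uint8 wraparound
    return float(np.mean(np.abs(diff)))
```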
In step S50, determining the position information of the selection area based on the change and the position information of the portrait area or all areas may include: in response to the change being less than the threshold, taking the position information of the portrait area as the position information of the selection area; and in response to the change being greater than the threshold, taking the position information of all areas as the position information of the selection area.
The position information of the selection area is determined as follows:

start position = (temporal noise > threshold) ? 0 : upper-left corner position of the small (portrait) area
width = (temporal noise > threshold) ? full frame width : small area width
height = (temporal noise > threshold) ? full frame height : small area height
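Transcribed into code, the three ternary expressions above become the following sketch (reusing the Region type from the earlier sketch; the function name is hypothetical):

```python
def selection_position(temporal_noise: float, threshold: float,
                       portrait: Region, full: Region) -> Region:
    """Direct transcription of the three ternary expressions above."""
    background_changed = temporal_noise > threshold
    return Region(
        x=0 if background_changed else portrait.x,
        y=0 if background_changed else portrait.y,
        width=full.width if background_changed else portrait.width,
        height=full.height if background_changed else portrait.height,
    )
```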
In step S60, the video data of the selection area is compressed by an image compression algorithm, the compressed video data is spliced with the position information of the selection area (the spliced data format is shown in fig. 10), and the spliced video data is sent to the receiving end. This transmission mode effectively reduces the amount of data transmitted and relieves data transmission pressure. It should be noted that each terminal device may simultaneously receive video data from other terminal devices and transmit video data to them, so each terminal device is both a transmitting end and a receiving end.
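A sketch of the splicing step, under stated assumptions: the patent leaves the image compression algorithm and the exact header field widths unspecified, so zlib and a four-field little-endian uint32 header are stand-ins chosen for illustration:

```python
import struct
import zlib

HEADER_FMT = "<4I"  # x, y, width, height as little-endian uint32

def splice(selected: Region, raw_pixels: bytes) -> bytes:
    """Compress the selection-area pixels and prepend its position
    information, mirroring the spliced data format of fig. 10."""
    header = struct.pack(HEADER_FMT, selected.x, selected.y,
                         selected.width, selected.height)
    return header + zlib.compress(raw_pixels)
```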
According to the embodiment of the invention, the video data is preliminarily trained in the cloud based on the portrait identification algorithm and the preliminary training result is sent to the terminal, ensuring the security of the terminal's video data; inference is performed on the original video data at the transmitting end to separate the portrait area and the background area in the original video data; and the change of the background area is calculated based on a temporal noise reduction algorithm, and either the compressed portrait area or all areas are sent to the receiving end according to that change, effectively reducing the amount of data transmitted and relieving data transmission pressure.
In some embodiments, performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data includes:
on an auxiliary acceleration unit carried by the terminal, performing inference on the original video data based on the preliminary training result and the portrait identification algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
Because GPU power consumption is high, performing inference and training on the original video data on a GPU-equipped terminal device would increase the device's power consumption. Therefore, to save power, the inference and training of the original video data are performed on a low-power ASIC or FPGA.
Because the terminal device generally carries an auxiliary acceleration unit, such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), the portrait area and the background area of the original video data are obtained by performing inference on the original video data on the ASIC or FPGA, and the combination of the portrait area and the background area constitutes all areas. Taking an FPGA as an example, the inference process for the original video data is described below.
Fig. 11 shows a schematic diagram of the FPGA structure.
The inference process for the original video data is described with reference to fig. 6 and fig. 11. FPGA-based inference on the original video data is mainly implemented by the PE arrays in the AI acceleration unit. When the original video data passes through the convolution layers and fully connected layers, the PE arrays obtain the corresponding data through the corresponding channels, namely the image input channel (input feature buffer) and the weight input channel (weight buffer), perform multiply-add operations on them, and output the computed results through the image output channel (output feature buffer).
Fig. 12 is a schematic structural diagram of each PE array in the FPGA structure shown in fig. 11. Based on the PE structure, the multiply-add operations between the weights and the video data in the convolution layer or fully connected layer computations are implemented.
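As a minimal software analogue of the PE array's multiply-add role (the actual hardware structure is in fig. 12; the function name is hypothetical), one output activation is the accumulated product of an input-feature window and the kernel weights:

```python
import numpy as np

def pe_multiply_add(features: np.ndarray, weights: np.ndarray) -> float:
    """One output activation: element-wise multiply of an input-feature
    window with the kernel weights, then accumulate (multiply-add)."""
    return float(np.sum(features * weights))
```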
Through this embodiment of the invention, inference on the original video data is realized, the original video data is separated into the portrait area and the background area, and the power consumed is low.
In some embodiments, the method further comprises:
performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm;
and performing inference on the original video data based on the secondary training result and the portrait identification algorithm to optimize the portrait area and the background area.
In some embodiments, performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm includes:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
ending the secondary training if the forward transmission calculation result meets the threshold;
if the forward transmission calculation result does not meet the threshold, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link;
ending the secondary training if the reverse transmission calculation result meets the threshold;
and if the reverse transmission calculation result does not meet the threshold, returning to the step of performing forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit carried by the terminal.
In this embodiment, secondary training is performed on the original video data based on the portrait identification algorithm, where the secondary training includes forward transmission training and reverse transmission training. The process of performing inference on the original video data based on the portrait identification algorithm is itself forward transmission training. Therefore, while inference is performed on the original video data by the PE array, secondary training can be performed at the same time: the separated video data is obtained through forward transmission calculation, updated weights are obtained through reverse transmission training, and the updated weights replace the initial weights in the forward transmission calculation for further forward transmission training on the original video data. The forward and reverse transmission calculations are repeated to obtain the optimal weights, and inference is performed on the original video data based on the optimal weights to improve the accuracy of video data separation.
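The control flow of this secondary-training loop can be sketched as follows; forward, backward, and loss are placeholders for the PE-array computations and the threshold check, since the patent does not name concrete functions:

```python
from typing import Callable, Sequence

def secondary_training(frames,
                       weights: Sequence,
                       forward: Callable, backward: Callable, loss: Callable,
                       threshold: float, max_iters: int = 100):
    """Repeat forward/reverse transmission calculations until the
    threshold is met, as described above."""
    for _ in range(max_iters):
        result = forward(frames, weights)                # forward transmission calculation
        if loss(result) <= threshold:                    # forward result meets threshold
            break
        grads = backward(frames, result)                 # reverse transmission calculation
        weights = [w - g for w, g in zip(weights, grads)]
        if loss(forward(frames, weights)) <= threshold:  # reverse result meets threshold
            break
    return weights
```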
In some embodiments, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link includes:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
taking a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and performing multiply-add operations on the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weights based on the weight gradient data to complete the reverse transmission calculation.
When the video data undergoes secondary training on the FPGA or ASIC, the original FPGA or ASIC inference structure is used for the forward transmission calculation, and the reverse transmission calculation comprises weight gradient calculation and weight updating, where the weight gradient calculation formula is:

grad W(i, r, c) = Σ_x Σ_y lose(i, x, y) × activate(i-1, x × stride + r, y × stride + c)

where W is the weight, i is the index of the neural network layer, r is the row, c is the column, lose is the loss data, activate is the activation data, and stride is the step. Because the original FPGA or ASIC inference structure has no transmission channels for activation data and loss data, a data multiplexing link must be constructed to realize the reverse transmission calculation of the video data. The specific process is as follows:
The activation data is transmitted to the PE units through the original feature data channel, and the loss data is transmitted to the PE units through the original weight input channel. After the multiply-add calculation is completed, the weight gradient data is obtained; the weight data stored in DRAM (Dynamic Random Access Memory) is then read out, the computed weight gradient is subtracted from it, and the result is stored back into DRAM, completing the whole weight update calculation. A schematic diagram of the PE-unit-based weight gradient calculation is shown in fig. 13.
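A single-channel sketch of the weight-gradient multiply-add and the DRAM-style update, following the formula above; channel and batch dimensions are omitted for brevity, and the learning-rate-free subtraction matches the description:

```python
import numpy as np

def weight_gradient(activate: np.ndarray, lose: np.ndarray, stride: int) -> np.ndarray:
    """For each kernel position (r, c), accumulate
    lose[x, y] * activate[x*stride + r, y*stride + c]."""
    out_h, out_w = lose.shape
    k_h = activate.shape[0] - (out_h - 1) * stride  # kernel height
    k_w = activate.shape[1] - (out_w - 1) * stride  # kernel width
    grad = np.zeros((k_h, k_w), dtype=np.float64)
    for r in range(k_h):
        for c in range(k_w):
            window = activate[r : r + out_h * stride : stride,
                              c : c + out_w * stride : stride]
            grad[r, c] = np.sum(lose * window)
    return grad

def update_weights(stored_weights: np.ndarray, grad: np.ndarray) -> np.ndarray:
    # Read the stored weights, subtract the computed gradient, store back.
    return stored_weights - grad
```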
In some embodiments, the method further comprises:
sending the secondary training result to the cloud, so that the cloud performs preliminary training on the video data based on the portrait identification algorithm and the secondary training result.
The latest weights obtained from the secondary training are sent to the cloud device; the cloud device then uses these latest weights as the initial weights of the portrait identification algorithm when performing preliminary training on newly acquired video data, improving the accuracy of the preliminary training result and thereby further improving the accuracy of inference on the original video data at the terminal device.
In some embodiments, as shown in fig. 14, the following steps are performed at a terminal as a receiving end:
S100, receiving the spliced compressed data, storing the position information of the selection area in the spliced compressed data into a memory, decompressing the compressed data in the spliced compressed data, and storing the decompressed data into the memory;
S200, reading the position information of the selection area from the memory to determine the coverage area of the video data;
S300, reading the decompressed data from the memory to cover the corresponding positions in the coverage area of the video data.
Specifically, in step S100, the spliced compressed data is received, the position information of the selection area in the spliced compressed data is stored into a memory, and after the compressed data in the spliced compressed data is decompressed, the decompressed data is stored into the memory. The position information of the selection area is the starting point position, width, and height of the video data, and the memory is one carried by the terminal device, generally DDR (Double Data Rate SDRAM).
As shown in fig. 15, the decompressed data is stored in DDR.
In steps S200 and S300, as shown in fig. 16, the video data is read from the DDR and overlaid on the video data of the previous frame: the starting point position, width, and height of the video data are read from the memory to determine the coverage area of the video data, and the video data is then read from the memory and overlaid on the previous frame's video data.
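A receiving-end sketch matching the assumed splice() format above; the header layout and zlib codec are the same illustrative assumptions, not the patent's concrete format:

```python
import struct
import zlib
import numpy as np

def receive_and_overlay(spliced: bytes, prev_frame: np.ndarray) -> np.ndarray:
    """Unpack the selection-area position information, decompress the pixel
    data, and overwrite the matching rectangle of the previous frame."""
    x, y, w, h = struct.unpack_from("<4I", spliced, 0)
    pixels = np.frombuffer(zlib.decompress(spliced[16:]), dtype=prev_frame.dtype)
    frame = prev_frame.copy()
    frame[y : y + h, x : x + w] = pixels.reshape(h, w, *prev_frame.shape[2:])
    return frame
```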
The shapes of the portrait area and all areas can be set to basic shapes such as a rectangle, circle, or trapezoid, or can be spliced from multiple shapes or a human silhouette; a rectangle is preferred. Setting the portrait area to a rectangle facilitates linear address ordering in the DDR, which improves the speed of reading data from the DDR, reduces the logic control complexity of video data coverage, and reduces communication delay.
Fig. 17 is a schematic diagram of transmitting data through a transmitting end and receiving data through a receiving end according to the present invention.
By constructing the data multiplexing link in the auxiliary acceleration unit, whole images do not need to be transmitted to the cloud for training: the cloud performs the preliminary training, and the preliminary training result is used for the secondary training and inference tasks at the terminal. In the video communication scenario of multi-terminal equipment, the AI computing unit of the terminal device performs portrait area identification, a temporal noise reduction algorithm computes the minimum changed area range of the background area in the video data, and the video data is transmitted according to that range, which increases the data compression speed, reduces the amount of data transmitted, and relieves data transmission pressure. Making the selection area rectangular facilitates linear address ordering in the memory, improves the speed of reading data from the memory, reduces the logic control complexity of video data coverage, and reduces communication delay.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 18, an embodiment of the present invention further provides a transmission apparatus of terminal video data, including:
the inference module 110, configured to receive the preliminary training result sent by the cloud, acquire original video data, and perform inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
the first determination module 120, configured to determine the position information of the portrait area and all areas;
the calculation module 130, configured to calculate the change of the background area based on a temporal noise reduction algorithm and determine a selection area based on the change;
the second determination module 140, configured to determine the position information of the selection area based on the change and the position information of the portrait area or all areas;
and the compression module 150, configured to compress the video data of the selection area, splice the compressed data with the position information of the selection area, and send the spliced data to a receiving end.
According to another aspect of the present invention, based on the same inventive concept, as shown in fig. 19, an embodiment of the present invention further provides a computer device 30, which includes a processor 310 and a memory 320, the memory 320 storing a computer program 321 executable on the processor; the processor 310 performs the steps of the method above when executing the program.
The memory is used as a non-volatile computer readable storage medium, and can be used for storing non-volatile software programs, non-volatile computer executable programs and modules, such as program instructions/modules corresponding to the transmission method of the terminal video data in the embodiment of the application. The processor executes various functional applications of the device and data processing by running nonvolatile software programs, instructions and modules stored in the memory, that is, implements the method for transmitting terminal video data according to the above method embodiment.
The memory may include a memory program area and a memory data area, wherein the memory program area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the device, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the local module through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
According to another aspect of the present invention, as shown in fig. 20, there is also provided a computer-readable storage medium 40, the computer-readable storage medium 40 storing a computer program 410 which, when executed by a processor, performs the above method.
Finally, it should be noted that, as those skilled in the art will appreciate, all or part of the procedures of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The computer program embodiments described above can achieve the same or similar effects as any of the corresponding method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications may be made without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments need not be performed in any particular order. The serial numbers of the embodiments of the invention are for description only and do not represent the relative merits of the embodiments. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, features of the above embodiments or of different embodiments may also be combined, and many other variations of different aspects of the embodiments exist as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, or improvement made to the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (10)

1. A transmission method of terminal video data, characterized in that the following steps are performed at a terminal acting as a transmitting end:
receiving a preliminary training result sent by a cloud and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
determining the position information of the portrait area and all areas;
calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change;
determining the position information of the selection area based on the change and the position information of the portrait area or all areas;
and compressing the video data of the selection area, splicing the compressed data with the position information of the selection area, and sending the spliced data to a receiving end.
2. The method of claim 1, wherein performing inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data comprises:
on an auxiliary acceleration unit carried by the terminal, performing inference on the original video data based on the preliminary training result and the portrait identification algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
3. The method as recited in claim 2, further comprising:
performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm;
and performing inference on the original video data based on the secondary training result and the portrait identification algorithm to optimize the portrait area and the background area.
4. The method of claim 3, wherein performing secondary training on the original video data based on the preliminary training result and the portrait identification algorithm comprises:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
ending the secondary training if the forward transmission calculation result meets the threshold;
if the forward transmission calculation result does not meet the threshold, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link;
ending the secondary training if the reverse transmission calculation result meets the threshold;
and if the reverse transmission calculation result does not meet the threshold, returning to the step of performing forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit carried by the terminal.
5. The method of claim 4, wherein constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link comprises:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
taking a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and performing multiply-add operations on the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weights based on the weight gradient data to complete the reverse transmission calculation.
6. The method according to claim 3, further comprising:
sending the secondary training result to the cloud, so that the cloud performs preliminary training on video data based on the portrait identification algorithm and the secondary training result.
7. The method according to claim 1, wherein the following steps are performed at a terminal acting as the receiving end:
receiving the spliced compressed data, storing the position information of the selection area in the spliced compressed data into a memory, decompressing the compressed data in the spliced compressed data, and storing the decompressed data into the memory;
reading the position information of the selection area from the memory to determine the coverage area of the video data;
and reading the decompressed data from the memory to cover the corresponding positions in the coverage area of the video data.
8. A terminal video data transmission apparatus, comprising:
an inference module configured to receive the preliminary training result sent by the cloud, acquire original video data, and perform inference on the original video data based on the preliminary training result and a portrait identification algorithm to determine a portrait area and a background area in the original video data;
a first determination module configured to determine the position information of the portrait area and all areas;
a calculation module configured to calculate the change of the background area based on a temporal noise reduction algorithm and to determine a selection area based on the change;
a second determination module configured to determine the position information of the selection area based on the change and the position information of the portrait area or all areas;
and a compression module configured to compress the video data of the selection area, splice the compressed data with the position information of the selection area, and send the spliced data to a receiving end.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor performs the steps of the method of any one of claims 1 to 7 when executing the program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor performs the steps of the method according to any one of claims 1 to 7.
CN202210476602.9A 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium Active CN114979651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210476602.9A CN114979651B (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210476602.9A CN114979651B (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN114979651A CN114979651A (en) 2022-08-30
CN114979651B (en) 2024-06-07

Family

ID=82978757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210476602.9A Active CN114979651B (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114979651B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074465B (en) * 2023-03-10 2023-10-24 共道网络科技有限公司 Cross-network court trial system, method, equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103037206A (en) * 2012-12-14 2013-04-10 广东威创视讯科技股份有限公司 Method and system of video transmission
CN113554676A (en) * 2021-07-08 2021-10-26 Oppo广东移动通信有限公司 Image processing method, device, handheld terminal and computer readable storage medium
CN113660495A (en) * 2021-08-11 2021-11-16 易谷网络科技股份有限公司 Real-time video stream compression method and device, electronic equipment and storage medium
CN113808066A (en) * 2020-05-29 2021-12-17 Oppo广东移动通信有限公司 Image selection method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881927B (en) * 2019-05-02 2021-12-21 三星电子株式会社 Electronic device and image processing method thereof

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103037206A (en) * 2012-12-14 2013-04-10 广东威创视讯科技股份有限公司 Method and system of video transmission
CN113808066A (en) * 2020-05-29 2021-12-17 Oppo广东移动通信有限公司 Image selection method and device, storage medium and electronic equipment
CN113554676A (en) * 2021-07-08 2021-10-26 Oppo广东移动通信有限公司 Image processing method, device, handheld terminal and computer readable storage medium
CN113660495A (en) * 2021-08-11 2021-11-16 易谷网络科技股份有限公司 Real-time video stream compression method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Video conference system enhancing target region quality at low bit rate; Ling Bo; Fang Jian; Gu Weikang; Ye Xiuqing; Du Xin; Journal of Zhejiang University (Engineering Science); 2009-04-15 (04); full text *

Also Published As

Publication number Publication date
CN114979651A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
US20210278836A1 (en) Method and device for dual-light image integration, and unmanned aerial vehicle
CN104574306A (en) Face beautifying method for real-time video and electronic equipment
CN114979651B (en) Terminal video data transmission method, device, equipment and medium
CN110620924B (en) Method and device for processing coded data, computer equipment and storage medium
CN111445392A (en) Image processing method and device, computer readable storage medium and electronic device
CN109587491A (en) A kind of intra-frame prediction method, device and storage medium
CN101594533B (en) Method suitable for compressing sequence images of unmanned aerial vehicle
CN115115540A (en) Unsupervised low-light image enhancement method and unsupervised low-light image enhancement device based on illumination information guidance
CN113660486A (en) Image coding, decoding, reconstructing and analyzing method, system and electronic equipment
CN115358929A (en) Compressed image super-resolution method, image compression method and system
CN109934307B (en) Disparity map prediction model training method, prediction method and device and electronic equipment
CN112232292A (en) Face detection method and device applied to mobile terminal
CN112183227B (en) Intelligent face region coding method and device
CN115471413A (en) Image processing method and device, computer readable storage medium and electronic device
CN112633198A (en) Picture processing method and device, storage medium and electronic device
CN112883947A (en) Image processing method, image processing device, computer equipment and storage medium
CN115115526A (en) Image processing method and apparatus, storage medium, and graphic calculation processor
CN111314697B (en) Code rate setting method, equipment and storage medium for optical character recognition
CN114140363B (en) Video deblurring method and device and video deblurring model training method and device
WO2024067176A1 (en) Parking space detection processing method and device, storage medium, and electronic device
CN111881912B (en) Data processing method and device and electronic equipment
CN111260038B (en) Implementation method and device of convolutional neural network, electronic equipment and storage medium
WO2022147745A1 (en) Encoding method, decoding method, encoding apparatus, decoding apparatus
CN115937990B (en) Multi-person interaction detection system and method
CN112422965B (en) Video code rate control method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant