CN114979651A - Terminal video data transmission method, device, equipment and medium

Info

Publication number
CN114979651A
CN114979651A (application CN202210476602.9A)
Authority
CN
China
Prior art keywords
video data
area
data
portrait
original video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210476602.9A
Other languages
Chinese (zh)
Inventor
蒋东东
董刚
赵雅倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210476602.9A
Publication of CN114979651A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/167 Position within a video image, e.g. region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a terminal video data transmission method, apparatus, device, and medium. The method executes the following steps at a terminal acting as a transmitting end: receiving a preliminary training result and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data; determining position information of the portrait area and of the entire area; calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change; determining selection-area position information based on the change and the position information of the portrait area or the entire area; and compressing the video data in the selection area, splicing the compressed data with the selection-area position information, and sending the spliced data to a receiving end. The scheme of the invention effectively reduces the amount of video data transmitted between terminals and relieves the transmission pressure.

Description

Terminal video data transmission method, device, equipment and medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a medium for transmitting terminal video data.
Background
With the development of 5G communication and under the influence of uncertain environmental factors, more and more people use high-speed networks for real-time video communication, especially in multi-terminal scenarios such as remote education, monitoring, and teleconferencing.
In multi-terminal, multi-point transmission scenarios, each terminal must send its video to all other terminal nodes. To reduce the transmission bandwidth, the video stream (i.e., the video data) is generally compressed before sending, reducing the data volume and thus the bandwidth pressure caused by video stream transmission, as shown in fig. 1. However, as shown in fig. 2, each time a terminal is added (e.g., terminal 3), it must transmit its video stream to all other terminals and decode the video streams those terminals transmit to it, and every terminal must devote additional resources to compression and decompression. In multi-terminal real-time communication, each newly added edge terminal therefore increases the data bandwidth pressure on every terminal in the system.
At present, many video compression algorithms already help terminal devices reduce the data transmission bandwidth between terminals; for example, compression standards such as H.264 reduce the bandwidth occupied by video transmission, as shown in fig. 3. However, with the exponential growth in the number of network terminals and the increasing demand for cloud-based work, multi-node video stream transmission remains a serious problem.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device, and a medium for transmitting terminal video data, which use an artificial-intelligence portrait recognition algorithm with auxiliary temporal noise reduction: by isolating the essentially unchanged background area, an effective portrait area is selected for independent compression and transmission, effectively reducing the amount of data transmitted and relieving the transmission pressure.
Based on the foregoing object, an aspect of the embodiments of the present invention provides a method for transmitting video data of a terminal, which specifically includes executing the following steps at a terminal acting as the transmitting end:
receiving a preliminary training result and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
determining position information of the portrait area and of the entire area;
calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change;
determining selection-area position information based on the change and the position information of the portrait area or the entire area;
and compressing the video data of the selection area, splicing the compressed data with the selection-area position information, and sending the spliced data to a receiving end.
In some embodiments, performing inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data comprises:
performing, on an auxiliary acceleration unit carried by the terminal, inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
In some embodiments, the method further comprises:
performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm;
and performing inference on the original video data based on the secondary training result and the portrait recognition algorithm to optimize the portrait area and the background area.
In some embodiments, performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm comprises:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
in response to the forward transmission calculation result meeting the threshold, ending the secondary training;
in response to the result of the forward transmission calculation not meeting the threshold, constructing a data multiplexing link in the auxiliary acceleration unit, and performing reverse transmission calculation on the original video data based on the data multiplexing link;
in response to the reverse transmission calculation result meeting the threshold, ending the secondary training;
and in response to the reverse transmission calculation result not meeting the threshold, returning to the step of performing, on the auxiliary acceleration unit carried by the terminal, forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit.
In some embodiments, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link comprises:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
using a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and completing the multiply-add operation of the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weight based on the weight gradient data to complete the reverse transmission calculation.
In some embodiments, the method further comprises:
sending the secondary training result to the cloud so that the cloud performs preliminary training on video data based on the portrait recognition algorithm and the secondary training result.
In some embodiments, the following steps are performed at the terminal acting as the receiving end:
receiving spliced compressed data, storing the selection-area position information in the spliced compressed data into a memory, and decompressing the compressed data in the spliced compressed data and storing the decompressed data in the memory;
reading the selection area position information from the memory to determine the coverage of the video data;
reading the decompressed data from the memory to cover to the corresponding position in the coverage range of the video data.
In another aspect of the embodiments of the present invention, a device for transmitting terminal video data is further provided, including:
an inference module configured to receive a preliminary training result sent by a cloud and acquire original video data, and to perform inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
a first determination module configured to determine location information of the portrait area and the entire area;
a calculation module configured to calculate a variation of the background region based on a temporal noise reduction algorithm and determine a selection region based on the variation;
a second determination module configured to determine selection region position information based on the change and position information of the portrait region or the full region;
and the compression module is configured to compress the video data of the selected area, and send the compressed data to a receiving end after being spliced with the position information of the selected area.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing a computer program executable on the processor, the computer program when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, which stores a computer program that, when executed by a processor, implements the steps of the above method.
The invention has at least the following beneficial technical effects: the video data is preliminarily trained in the cloud based on the portrait recognition algorithm and the preliminary training result is sent to the terminal, so that not all video data needs to be uploaded to the cloud, which ensures the security of the terminal video data while reducing communication delay and network bandwidth pressure; the original video data is inferred at the sending end to separate the portrait area from the background area; and the change of the background area is calculated based on a temporal noise reduction algorithm, with either the compressed portrait area or the compressed entire area sent to the receiving end according to that change, effectively reducing the amount of data transmitted and relieving the transmission pressure.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a schematic diagram of a conventional two-terminal device for video data transmission;
fig. 2 is a schematic diagram of video data transmission performed by three conventional terminal devices;
fig. 3 is a schematic diagram of video data transmission performed after two terminal devices use a compression algorithm in a conventional manner;
fig. 4 is a block diagram of an embodiment of video data transmission performed at a sending end according to the present invention;
fig. 5 is a block diagram of an embodiment of preliminary training of video data at the cloud end according to the present invention;
FIG. 6 is a schematic diagram of reasoning on original video data based on a portrait recognition algorithm according to the present invention;
FIG. 7 is a diagram illustrating an embodiment of determining location information of a portrait area and a full area according to the present invention;
FIG. 8 is a diagram illustrating an embodiment of calculating a background region variation based on a temporal noise reduction algorithm according to the present invention;
FIG. 9 is a diagram illustrating a background region variation calculation based on a temporal noise reduction algorithm according to another embodiment of the present invention;
fig. 10 is a schematic diagram illustrating an embodiment of a data format of spliced video data according to the present invention;
FIG. 11 is a schematic structural diagram of an embodiment of an FPGA provided in the present invention;
FIG. 12 is a schematic diagram of the structure of each PE array in the FPGA structure shown in FIG. 11;
FIG. 13 is a diagram illustrating an embodiment of performing weight gradient calculations based on PE units according to the present invention;
fig. 14 is a schematic diagram of an embodiment of video data transmission at a receiving end according to the present invention;
FIG. 15 is a diagram illustrating an embodiment of storing decompressed data to a DDR provided by the invention;
FIG. 16 is a diagram illustrating an embodiment of reading video data from a DDR and overwriting a previous frame of video data according to the invention;
fig. 17 is a schematic diagram of an embodiment of sending data through a sending end and receiving data through a receiving end according to the present invention;
fig. 18 is a schematic diagram of an embodiment of a terminal video data transmission apparatus provided in the present invention;
FIG. 19 is a schematic diagram illustrating an embodiment of a computer apparatus;
fig. 20 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name; "first" and "second" are merely for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.
In view of the above object, a first aspect of the embodiments of the present invention provides an embodiment of a method for transmitting video data of a terminal.
As shown in fig. 4, data transmission at the transmitting end specifically includes the following steps:
s20, receiving a preliminary training result sent by the cloud and acquiring original video data, and reasoning the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
s30, determining the position information of the portrait area and all the areas;
s40, calculating the change of the background area based on a time domain noise reduction algorithm, and determining a selection area based on the change;
s50, determining selection area position information based on the change and the position information of the portrait area or the whole area;
and S60, compressing the video data of the selected area, splicing the compressed data with the position information of the selected area, and sending the spliced data to a receiving end.
Specifically, in step S20, a preliminary training result sent by the cloud is received.
As shown in fig. 5, the process of performing preliminary training on video data at the cloud specifically includes the following steps:
s201, performing preliminary training on the video data based on a portrait recognition algorithm, and sending a preliminary training result to the terminal.
Specifically, video transmission for conventional terminal devices is generally handled by cloud devices. If all video-data training tasks were completed by the cloud devices, however, all video data would have to be uploaded to the cloud for training, which risks data leakage; moreover, large amounts of video data would have to be transmitted to the cloud for processing and returned to each terminal afterwards, greatly increasing the communication delay of video transmission and putting great pressure on network bandwidth. In the present invention, only the preliminary training based on the portrait recognition algorithm is completed in the cloud, and not all video data needs to be uploaded, so the security of the terminal video data is ensured while communication delay and network bandwidth pressure are reduced. The specific process is as follows:
the method comprises the steps of selecting some video data similar to video data to be transmitted on a terminal from a public database in an application scene of cloud equipment based on the video data to carry out preliminary training, obtaining weight meeting expectations, and sending the weight to the terminal, wherein the terminal comprises a sending end and a receiving end corresponding to the sending end, and a portrait recognition algorithm comprises fast R-CNN (a region-based convolutional neural network algorithm), YOLO (You Only Look on, a network for target detection) and the like. Compared with the training of all video data of the terminal, the training accuracy is slightly low, but the calculation amount required by the training is small, and the training speed is accelerated.
Fig. 6 is a schematic diagram of the portrait recognition algorithm performing inference on the original video data using the weights obtained from the preliminary training result.
The terminal device of this embodiment carries a GPU, an ASIC (Application Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array); inference is performed on the GPU, ASIC, or FPGA based on the portrait recognition algorithm to obtain the portrait area and the background area of the original video data, which together constitute the entire area.
In step S30, the entire area consists of the portrait area and the background area. The relative position of the portrait area within the entire area is determined, and both areas are labeled with parameters. For example, as shown in fig. 7, taking the upper-left corner of the entire area as the origin, the starting point A, width B, and height C of the entire area are determined, from which the starting point O, width W, and height H of the portrait area can further be determined.
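A minimal sketch of this labeling, assuming rectangles measured from the frame's upper-left origin:

```python
from dataclasses import dataclass

@dataclass
class Region:
    x: int        # starting point, measured from the frame's upper-left origin
    y: int
    width: int
    height: int

def label_regions(frame_width, frame_height, portrait_box):
    """Return (full_area, portrait_area) as in fig. 7: the full area starts at
    the origin with the frame's width B and height C; the portrait area is the
    rectangle (start O, width W, height H) found by the recognition step."""
    full_area = Region(0, 0, frame_width, frame_height)
    ox, oy, w, h = portrait_box
    return full_area, Region(ox, oy, w, h)
```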
In step S40, the change of the background area is calculated based on a temporal noise reduction algorithm: when the change is smaller than a threshold, the background area is considered unchanged and the portrait area is determined as the selection area; when the change is larger than the threshold, the entire area, comprising the portrait area and the background area, is determined as the selection area. Fig. 8 and fig. 9 illustrate this calculation.
Concretely, a temporal noise filter calculates the change between the current frame and the previous frame of the background area; in response to the change being less than the threshold, the portrait area is determined as the selection area, and in response to the change being greater than the threshold, the entire area is determined as the selection area.
By the scheme, when the background area changes little, only the video data of the portrait area is compressed and sent to the receiving end, so that the data compression speed is increased, the data transmission quantity is reduced, and the data transmission pressure is reduced.
In step S50, determining the selection-area position information based on the change and the position information of the portrait area or the entire area may include: in response to the change being less than the threshold, taking the position information of the portrait area as the selection-area position information; and in response to the change being greater than the threshold, taking the position information of the entire area as the selection-area position information.
The specific way of determining the selection-area position information is as follows:
starting point = (temporal noise > threshold) ? upper-left corner of the frame (0) : starting point of the portrait area;
width = (temporal noise > threshold) ? whole-frame width : portrait-area width;
height = (temporal noise > threshold) ? whole-frame height : portrait-area height.
In step S60, the video data in the selection area is compressed by an image compression algorithm and spliced with the selection-area position information; the spliced data format is shown in fig. 10, and the spliced video data is sent to the receiving end. This transmission mode effectively reduces the amount of data transmitted and relieves the transmission pressure. It should be noted that each terminal device simultaneously receives video data from, and transmits video data to, the other terminal devices, so every terminal device is both a transmitting end and a receiving end.
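The splicing of fig. 10 can be sketched as follows, assuming a 16-byte little-endian position header and using zlib as a stand-in for the unspecified image compression algorithm; the receiving end unpacks the same layout, as sketched after step S300 below:

```python
import struct
import zlib
import numpy as np

def splice(region, pixels):
    """Pack the position header (x, y, width, height) followed by the
    compressed pixel payload of the selection area."""
    x, y, w, h = region
    header = struct.pack("<IIII", x, y, w, h)
    return header + zlib.compress(np.ascontiguousarray(pixels).tobytes())
```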
According to the embodiment of the invention, the video data is preliminarily trained in the cloud based on the portrait recognition algorithm and the preliminary training result is sent to the terminal, ensuring the security of the terminal video data; the original video data is inferred at the sending end to separate the portrait area from the background area; and the change of the background area is calculated based on a temporal noise reduction algorithm, with either the compressed portrait area or the compressed entire area sent to the receiving end according to that change, effectively reducing the amount of data transmitted and relieving the transmission pressure.
In some embodiments, performing inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data comprises:
performing, on an auxiliary acceleration unit carried by the terminal, inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
Because a GPU consumes considerable power, performing inference and training of the original video data on a GPU-equipped terminal device would increase the device's power consumption; therefore, to save power, the original video data is inferred and trained on a low-power ASIC or FPGA.
Since terminal devices generally carry an auxiliary acceleration unit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array), the ASIC or FPGA performs inference on the original video data to obtain the portrait area and the background area, which together constitute the entire area. The inference process is explained below taking the FPGA as an example.
Fig. 11 is a schematic diagram of the FPGA.
The inference process of the original video data is described with reference to fig. 6 and fig. 11. FPGA-based inference on the original video data is mainly realized by the PE array in the AI acceleration unit. When performing convolutional-layer and fully-connected-layer calculations on the original video data, each PE array obtains the corresponding data through its channels, i.e., the image input channel (input feature buffer) and the weight input channel (weight buffer), performs multiply-add calculations, and outputs the results through the image output channel (output feature buffer).
Fig. 12 is a schematic structural diagram of each PE array in the FPGA structure shown in fig. 11. Based on the PE structure, the multiplication and addition operation of the weight and the video data in the convolution layer operation process or the full connection layer operation process is realized.
This embodiment of the invention realizes inference on the original video data, separating it into the portrait area and the background area at low power consumption.
In some embodiments, the method further comprises:
performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm;
and performing inference on the original video data based on the secondary training result and the portrait recognition algorithm to optimize the portrait area and the background area.
In some embodiments, performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm comprises:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
in response to the forward transmission calculation result meeting the threshold, ending the secondary training;
in response to the result of the forward transmission calculation not meeting the threshold, constructing a data multiplexing link in the auxiliary acceleration unit, and performing reverse transmission calculation on the original video data based on the data multiplexing link;
in response to the reverse transmission calculation result meeting the threshold, finishing the secondary training;
and in response to the reverse transmission calculation result not meeting the threshold, returning to the step of performing, on the auxiliary acceleration unit carried by the terminal, forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit.
In this embodiment, the original video data is trained a second time based on the portrait recognition algorithm, where the secondary training comprises forward transmission training and reverse transmission training; the process of performing inference on the original video data based on the portrait recognition algorithm is itself forward transmission training. Therefore, while inferring the original video data on the PE array, the secondary training can proceed: the separated video data is obtained through forward transmission calculation, updated weights are obtained through reverse transmission training, the updated weights replace the initial weights in the next forward pass, and the forward and reverse transmission calculations are repeated until optimal weights are obtained; inference based on the optimal weights improves the accuracy of video-data separation.
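The control flow of the secondary training can be sketched as follows, with a toy quadratic objective standing in for the PE-array forward and reverse computations:

```python
def secondary_training(weights, targets, loss_threshold, max_rounds=50):
    """Forward pass; stop once the result meets the threshold; otherwise run
    the reverse pass over the multiplexed link, update the weights, repeat."""
    for _ in range(max_rounds):
        loss = sum((w - t) ** 2 for w, t in zip(weights, targets))  # forward
        if loss <= loss_threshold:
            break                                                    # converged
        grads = [2 * (w - t) for w, t in zip(weights, targets)]     # reverse
        weights = [w - 0.1 * g for w, g in zip(weights, grads)]     # update
    return weights
```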
In some embodiments, constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link comprises:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
using a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and completing the multiply-add operation of the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weight based on the weight gradient data to complete the reverse transmission calculation.
When the video data is trained a second time on the FPGA or ASIC, the forward transmission calculation only needs the original FPGA or ASIC inference structure, while the reverse transmission calculation comprises calculating the weight gradient and updating the weights; the weight gradient is calculated as follows:
$$\nabla W_{i}(r, c) = \sum_{x} \sum_{y} \mathrm{Lose}_{i}(x, y) \cdot \mathrm{Activate}_{i}\left(r + x \cdot \mathrm{stride},\ c + y \cdot \mathrm{stride}\right)$$
where W is the weight, i is the index of the neural network layer, r is the row, c is the column, Lose is the loss data, Activate is the activation data, and stride is the step. Because the original FPGA or ASIC inference structure has no transmission channels for activation data and loss data, a data multiplexing link must be constructed to realize the reverse transmission calculation, as follows:
the activation data is transmitted to the PE unit through the original data channel for inputting feature, the loss data is transmitted to the PE unit through the original weight input channel, and after the multiplication and addition calculation is completed and the weight gradient data is obtained, the weight data stored in the DRAM (Dynamic Random Access Memory) is read out, the calculated weight gradient is subtracted, and then stored in the DRAM, and the calculation process of updating the whole weight is completed, wherein a schematic diagram of performing weight gradient calculation based on the PE unit is shown in fig. 13.
In some embodiments, the method further comprises:
sending the secondary training result to the cloud so that the cloud performs preliminary training on video data based on the portrait recognition algorithm and the secondary training result.
The latest weight values obtained from the secondary training are sent to the cloud device; the cloud device then uses these values as the initial weights of the portrait recognition algorithm when preliminarily training newly acquired video data, improving the accuracy of the preliminary training result and, in turn, the accuracy of inference on the original video data at the terminal devices.
In some embodiments, as shown in fig. 14, the following steps are performed at the terminal acting as the receiving end:
S100, receiving spliced compressed data, storing the selection-area position information in the spliced compressed data into a memory, and decompressing the compressed data in the spliced compressed data and storing the decompressed data in the memory;
s200, reading the position information of the selected area from the memory to determine the coverage range of the video data;
and S300, reading the decompressed data from the memory to cover the corresponding position in the coverage range of the video data.
Specifically, in step S100, the spliced compressed data is received, the selection-area position information in the spliced compressed data is stored in the memory, and the compressed data in the spliced compressed data is decompressed and the decompressed data stored in the memory. The selection-area position information is the starting position, width, and height of the video data, and the memory is memory carried by the terminal device, typically DDR SDRAM (double-data-rate synchronous dynamic random-access memory).
As shown in fig. 15, a schematic diagram of storing decompressed data to DDR is shown.
In steps S200 and S300, as shown in fig. 16 (reading video data from the DDR and overwriting the previous frame of video data), the starting position, width, and height of the video data are read from the memory to determine its coverage range, and the video data is then read from the memory and overlaid onto the previous frame.
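The receiving-end counterpart of the earlier splice() sketch (same assumed 16-byte header and zlib stand-in) can be written as:

```python
import struct
import zlib
import numpy as np

def receive_and_cover(display, blob, channels=3):
    """Steps S100-S300: read the selection-area position to get the coverage
    range, decompress the payload, and overwrite that rectangle of the
    previous frame held in `display`."""
    x, y, w, h = struct.unpack_from("<IIII", blob, 0)   # selection position
    patch = np.frombuffer(zlib.decompress(blob[16:]), dtype=np.uint8)
    display[y : y + h, x : x + w] = patch.reshape(h, w, channels)
    return display
```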
The shape of the portrait area and of the entire area can be a basic shape such as a rectangle, a circle, or a trapezoid, a shape spliced from several such shapes, or a human silhouette; a rectangle is preferred, because a rectangular portrait area allows linear address ordering in the DDR, which improves the speed of reading data from the DDR, simplifies the logic control required when overwriting video data, and reduces communication delay.
Fig. 17 is a schematic diagram of the present invention, in which data is transmitted by a transmitting end and received by a receiving end.
By constructing the data multiplexing link in the auxiliary acceleration unit, the whole image need not be transmitted to the cloud for training: the cloud performs only the preliminary training, and the terminal uses the preliminary result for the secondary training and inference tasks. In multi-terminal video communication, the terminal's AI computing unit identifies the portrait area, the temporal noise reduction algorithm determines the minimum changed region of the background in the video data, and the video data worth transmitting is determined accordingly, which speeds up compression, reduces the amount of data transmitted, and relieves the transmission pressure. The rectangular selection area facilitates linear address ordering in memory, improves the speed of reading data from memory, simplifies logic control during video-data overwriting, and reduces communication delay.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 18, an embodiment of the present invention further provides a terminal video data transmission apparatus, including:
the inference module 110, the inference module 110 is configured to receive a preliminary training result sent by a cloud and obtain original video data, and perform inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
a first determining module 120, the first determining module 120 configured to determine position information of the portrait area and the entire area;
a calculation module 130, the calculation module 130 configured to calculate a variation of the background region based on a temporal noise reduction algorithm and determine a selection region based on the variation;
a second determination module 140, the second determination module 140 configured to determine selection area location information based on the change and location information of the portrait area or the full area;
a compression module 150, where the compression module 150 is configured to compress the video data in the selected area, and splice the compressed data with the position information of the selected area and send the result to a receiving end.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 19, the embodiment of the present invention further provides a computer device 30 comprising a processor 310 and a memory 320, where the memory 320 stores a computer program 321 that can run on the processor, and the processor 310 executes the program to perform the steps of the above method.
The memory, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the transmission method of terminal video data in the embodiments of the present application. The processor executes various functional applications and data processing of the device by running the nonvolatile software programs, instructions and modules stored in the memory, namely, the transmission method of the terminal video data of the above method embodiment is realized.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 20, an embodiment of the present invention further provides a computer-readable storage medium 40, where the computer-readable storage medium 40 stores a computer program 410, which when executed by a processor, performs the above method.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing are exemplary embodiments of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A method for transmitting video data of a terminal, wherein the following steps are executed at the terminal acting as a transmitting end:
receiving a preliminary training result sent by a cloud and acquiring original video data, and performing inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
determining position information of the portrait area and of the entire area;
calculating the change of the background area based on a temporal noise reduction algorithm, and determining a selection area based on the change;
determining selection-area position information based on the change and the position information of the portrait area or the entire area;
and compressing the video data of the selection area, splicing the compressed data with the selection-area position information, and sending the spliced data to a receiving end.
2. The method of claim 1, wherein performing inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data comprises:
performing, on an auxiliary acceleration unit carried by the terminal, inference on the original video data based on the preliminary training result and the portrait recognition algorithm to determine the portrait area and the background area in the original video data, wherein the shape of the portrait area comprises a rectangle, and the auxiliary acceleration unit comprises an ASIC or an FPGA.
3. The method of claim 2, further comprising:
performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm;
and performing inference on the original video data based on the secondary training result and the portrait recognition algorithm to optimize the portrait area and the background area.
4. The method of claim 3, wherein performing secondary training on the original video data based on the preliminary training result and the portrait recognition algorithm comprises:
on an auxiliary acceleration unit carried by the terminal, performing forward transmission calculation on the original video data based on a PE array in the auxiliary acceleration unit;
in response to the forward transmission calculation result meeting the threshold, ending the secondary training;
in response to the result of the forward transmission calculation not meeting the threshold, constructing a data multiplexing link in the auxiliary acceleration unit, and performing reverse transmission calculation on the original video data based on the data multiplexing link;
in response to the reverse transmission calculation result meeting the threshold, ending the secondary training;
and in response to the reverse transmission calculation result not meeting the threshold, returning to the step of performing, on the auxiliary acceleration unit carried by the terminal, forward transmission calculation on the original video data based on the PE array in the auxiliary acceleration unit.
5. The method of claim 4, wherein constructing a data multiplexing link in the auxiliary acceleration unit and performing reverse transmission calculation on the original video data based on the data multiplexing link comprises:
taking an image input channel in the auxiliary acceleration unit as an activation data input channel to input activation data to the PE array;
using a weight input channel in the auxiliary acceleration unit as a loss data input channel to input loss data to the PE array;
and completing the multiply-add operation of the activation data and the loss data based on the PE array to obtain weight gradient data, and updating the weight based on the weight gradient data to complete the reverse transmission calculation.
6. The method of claim 3, further comprising:
sending the secondary training result to the cloud so that the cloud performs preliminary training on video data based on the portrait recognition algorithm and the secondary training result.
7. The method according to claim 1, wherein the following steps are performed at the terminal acting as the receiving end:
receiving spliced compressed data, storing the selection-area position information in the spliced compressed data into a memory, and decompressing the compressed data in the spliced compressed data and storing the decompressed data in the memory;
reading the selection area position information from the memory to determine the coverage of the video data;
reading the decompressed data from the memory to cover to the corresponding position in the coverage range of the video data.
8. A device for transmitting video data of a terminal, comprising:
an inference module configured to receive a preliminary training result sent by a cloud and acquire original video data, and to perform inference on the original video data based on the preliminary training result and a portrait recognition algorithm to determine a portrait area and a background area in the original video data;
a first determination module configured to determine location information of the portrait area and the entire area;
a calculation module configured to calculate a variation of the background region based on a temporal noise reduction algorithm and determine a selection region based on the variation;
a second determination module configured to determine selection region position information based on the change and position information of the portrait region or the full region;
and the compression module is configured to compress the video data of the selected area, splice the compressed data with the position information of the selected area and send the spliced data to a receiving end.
9. A computer device, comprising:
at least one processor; and
memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202210476602.9A 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium Pending CN114979651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210476602.9A CN114979651A (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210476602.9A CN114979651A (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114979651A true CN114979651A (en) 2022-08-30

Family

ID=82978757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210476602.9A Pending CN114979651A (en) 2022-04-30 2022-04-30 Terminal video data transmission method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114979651A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074465A (en) * 2023-03-10 2023-05-05 共道网络科技有限公司 Cross-network court trial system, method, equipment and computer readable storage medium
CN116074465B (en) * 2023-03-10 2023-10-24 共道网络科技有限公司 Cross-network court trial system, method, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination