CN114363631B - Deep learning-based audio and video processing method and device


Info

Publication number
CN114363631B
CN114363631B (application CN202111495106.XA)
Authority
CN
China
Prior art keywords
data
predicted
audio
accuracy
deep learning
Prior art date
Legal status
Active
Application number
CN202111495106.XA
Other languages
Chinese (zh)
Other versions
CN114363631A (en)
Inventor
余丹
兰雨晴
黄永琢
王丹星
唐霆岳
Current Assignee
China Standard Intelligent Security Technology Co Ltd
Original Assignee
China Standard Intelligent Security Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Standard Intelligent Security Technology Co Ltd
Priority: CN202111495106.XA, filed 2021-12-09
Publication of CN114363631A: 2022-04-15
Application granted; publication of CN114363631B: 2022-08-05
Legal status: Active

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a deep learning-based audio and video processing method and device, relating to the technical field of data processing. The method predicts the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame; compares each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data; judges the current prediction level of the deep-learning neural network from those two accuracies; and transmits the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells. By replacing the traditional function-based prediction scheme with deep-learning neural-network prediction of the compressed audio/video frames, prediction efficiency can be improved.

Description

Deep learning-based audio and video processing method and device
Technical Field
The application relates to the technical field of data processing, in particular to an audio and video processing method and device based on deep learning.
Background
Audio/video compression aims to reduce the audio/video data rate while preserving auditory and visual quality as far as possible; the compression ratio generally refers to the ratio of the data volume after compression to the data volume before compression. In the related art, compression mainly retains only the I-frames and the motion vectors of the other frames, with P-frames and B-frames predicted from the I-frames. This prediction method is relatively fixed, requires a large amount of information to be stored, and consumes considerable computing resources. Although such encoding can compress the code stream to a very small size, it is difficult to predict and restore the uncompressed complete code stream from the compressed one, so the complete code stream must be retransmitted whenever it is needed. There is therefore a need to solve this technical problem.
Disclosure of Invention
In view of the above problems, the present application provides a deep learning-based audio and video processing method and device that overcome, or at least partially solve, the above problems: by replacing the traditional function-based prediction scheme with deep-learning neural-network prediction, prediction efficiency can be improved. The technical scheme is as follows:
in a first aspect, a deep learning-based audio and video processing method is provided, comprising the following steps:
predicting the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame;
comparing each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data;
judging the current prediction level of the deep-learning neural network according to the relevant-data accuracy and the non-relevant-data accuracy;
and transmitting the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells.
In a possible implementation, the bar display is a vertical bar on the terminal consisting of multiple rows in a single column; each row is an independent cell, and each cell's on/off state can be controlled individually.
In a possible implementation, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data of each frame are obtained by comparing each frame's predicted data with the original data of the audio/video stream according to the following formulas:
[Formula for L(i): present only as image BDA0003400471790000021 in the original]
[Formula for F(i): present only as image BDA0003400471790000022 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image BDA0003400471790000023 holds, then L(i) = 1, and if the condition shown in image BDA0003400471790000024 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
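Since the L(i) and F(i) formulas survive here only as images, the following minimal Python sketch illustrates one plausible reading consistent with the symbol definitions above: L(i) as the fraction of feature bits (those with G_i(a) = 1) predicted correctly, F(i) as the fraction of non-feature bits predicted correctly, and each accuracy defaulting to 1 when its bit class is empty, matching the stated boundary conditions. The function name, the is_feature callback, and the normalization are illustrative assumptions, not the patent's verbatim formulas:

    from typing import Callable, Sequence

    def frame_accuracies(pred_bits: Sequence[int],
                         orig_bits: Sequence[int],
                         is_feature: Callable[[int], bool]) -> tuple[float, float]:
        """Compare one frame's predicted and original bitstreams.

        Returns (L, F): the fraction of feature bits and of non-feature
        bits predicted correctly; a class with no bits scores 1, matching
        the boundary conditions stated in the text (an assumed reading).
        """
        feat_total = feat_wrong = other_total = other_wrong = 0
        for a, (d, d0) in enumerate(zip(pred_bits, orig_bits), start=1):
            if is_feature(a):                # G_i(a) = 1: a feature bit
                feat_total += 1
                feat_wrong += abs(d - d0)    # |D_i(a) - D_{i,0}(a)|
            else:                            # G_i(a) = 0: a non-feature bit
                other_total += 1
                other_wrong += abs(d - d0)
        L = 1.0 if feat_total == 0 else 1.0 - feat_wrong / feat_total
        F = 1.0 if other_total == 0 else 1.0 - other_wrong / other_total
        return L, F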
In a possible implementation, the binary-form data transmitted to the staff terminal is obtained from the relevant-data accuracy and the non-relevant-data accuracy using the following formula:
[Formula for (C)_2: present only as image BDA0003400471790000031 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
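The combining formula for (C)_2 is likewise only an image; as a hedged illustration, the sketch below has each frame contribute one bit that is set when both of its accuracies clear a threshold (reading the ∧ as a per-frame logical AND of the two accuracy judgments) and concatenates the per-frame bits into the binary payload. The threshold parameter and the concatenation scheme are assumptions:

    def level_payload(accuracies: list[tuple[float, float]],
                      threshold: float = 0.9) -> str:
        """Assumed encoding: one bit per frame, set when both the
        relevant-data accuracy L and the non-relevant-data accuracy F
        clear `threshold`; bits are concatenated frame by frame."""
        return "".join("1" if (L >= threshold and F >= threshold) else "0"
                       for (L, F) in accuracies)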
In a possible implementation, the number of independent cells lit on the vertical bar is controlled according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image BDA0003400471790000032 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.
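With the cell-count formula also present only as an image, one natural reading, sketched below, scales the share of 1-bits in the received payload to the bar's K cells; this proportional mapping is an assumption:

    def lit_cells(payload: str, K: int) -> int:
        """Assumed mapping: light k of the K cells in proportion to the
        share of 1-bits in the received binary payload."""
        ones = payload.count("1")
        return round(K * ones / len(payload)) if payload else 0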
In a second aspect, a deep learning-based audio and video processing device is provided, comprising:
a prediction module, configured to predict the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame;
a comparison module, configured to compare each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data;
a judging module, configured to judge the current prediction level of the deep-learning neural network according to the relevant-data accuracy and the non-relevant-data accuracy;
and a transmission module, configured to transmit the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells.
In a possible implementation, the bar display is a vertical bar on the terminal consisting of multiple rows in a single column; each row is an independent cell, and each cell's on/off state can be controlled individually.
In a possible implementation, the comparison module is further configured to:
obtain the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data of each frame by comparing each frame's predicted data with the original data of the audio/video stream using the following formulas:
[Formula for L(i): present only as image BDA0003400471790000041 in the original]
[Formula for F(i): present only as image BDA0003400471790000042 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image BDA0003400471790000043 holds, then L(i) = 1, and if the condition shown in image BDA0003400471790000044 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
In a possible implementation, the transmission module is further configured to:
obtain the binary-form data transmitted to the staff terminal from the relevant-data accuracy and the non-relevant-data accuracy using the following formula:
[Formula for (C)_2: present only as image BDA0003400471790000045 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
In a possible implementation, the device further includes:
a control module, configured to control the lighting of the independent cells on the vertical bar according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image BDA0003400471790000051 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.
By means of the above technical scheme, the deep learning-based audio and video processing method and device provided by the embodiments of the application first predict the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame; compare each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data; judge the current prediction level of the deep-learning neural network from those two accuracies; and transmit the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells. By replacing the traditional function-based prediction scheme with deep-learning neural-network prediction of the compressed audio/video frames, prediction efficiency can be improved.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below.
Fig. 1 shows a flowchart of a deep learning-based audio and video processing method according to an embodiment of the present application;
fig. 2 shows a block diagram of a deep learning-based audio and video processing device according to an embodiment of the present application;
fig. 3 shows a block diagram of a deep learning-based audio and video processing device according to another embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that such uses are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to".
An embodiment of the application provides a deep learning-based audio and video processing method, which can be applied to electronic devices such as mobile terminals, personal computers, and tablet computers. As shown in fig. 1, the method may include the following steps S101 to S104:
step S101, predicting the compressed audio and video stream through a deep learning and neural network to obtain predicted data of each frame;
step S102, according to the comparison between the predicted data of each frame and the original data of the audio/video stream, respectively obtaining the accuracy of the predicted related data and the accuracy of the predicted non-related data of each frame;
step S103, judging the level of the current deep learning and neural network prediction according to the accuracy of the relevant data and the accuracy of the non-relevant data;
and step S104, transmitting the predicted level to the terminal of the staff in a binary mode, and displaying the predicted level in a lighting bar form at the terminal.
In other words, the method first predicts the compressed audio/video stream with a deep-learning neural network to obtain each frame's predicted data; obtains the relevant-data accuracy and the non-relevant-data accuracy by comparison with the original data of the audio/video stream; judges the current prediction level of the deep-learning neural network from those two accuracies; and transmits that level in binary form to the staff terminal, where it is displayed as lit bar cells. By replacing the traditional function-based prediction scheme with deep-learning neural-network prediction of the compressed frames, prediction efficiency can be improved.
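Tying steps S101 to S104 together, a minimal end-to-end sketch might look as follows; it reuses the hypothetical helpers frame_accuracies, level_payload, and lit_cells from the sketches above, and predictor stands in for the deep-learning neural network, which the application does not further specify:

    def process_stream(compressed_frames, original_frames,
                       predictor, is_feature, terminal_cells=10):
        # S101: predict each frame's data from the compressed stream
        predicted = [predictor(frame) for frame in compressed_frames]
        # S102: per-frame relevant / non-relevant accuracies vs. the original
        accs = [frame_accuracies(p, o, is_feature)
                for p, o in zip(predicted, original_frames)]
        # S103: judge the prediction level and encode it as a binary payload
        payload = level_payload(accs)
        # S104: number of cells to light on the terminal's vertical bar
        return lit_cells(payload, K=terminal_cells)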
An embodiment of the present application provides a possible implementation in which the bar display is a vertical bar on the terminal consisting of multiple rows in a single column; each row is an independent cell, and each cell's on/off state can be controlled individually.
An embodiment of the present application provides a possible implementation of step S102: the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data of each frame are obtained by comparing each frame's predicted data with the original data of the audio/video stream, specifically using the following formulas:
[Formula for L(i): present only as image BDA0003400471790000071 in the original]
[Formula for F(i): present only as image BDA0003400471790000072 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image BDA0003400471790000073 holds, then L(i) = 1, and if the condition shown in image BDA0003400471790000074 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
With this embodiment, each frame's predicted data is compared with the original data to obtain the relevant-data accuracy and the non-relevant-data accuracy separately; splitting the accuracy into these two parts allows the deep-learning neural-network algorithm to be analyzed from both sides.
An embodiment of the present application provides a possible implementation of steps S103 and S104: the current prediction level of the deep-learning neural network is judged from the relevant-data accuracy and the non-relevant-data accuracy, and the predicted level is transmitted in binary form to the staff member's terminal; specifically, the binary-form data transmitted to the terminal can be obtained from the two accuracies using the following formula:
[Formula for (C)_2: present only as image BDA0003400471790000081 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
With this embodiment, the data transmitted to the staff member's terminal is derived from the relevant-data accuracy and the non-relevant-data accuracy and sent in binary form; binary transmission is fast and convenient, so encoding the two accuracy levels in binary lets them be transmitted efficiently.
An embodiment of the present application provides a possible implementation of step S104: the predicted level is transmitted in binary form to the staff member's terminal and displayed there as lit bar cells; specifically, the number of independent cells lit on the vertical bar can be controlled according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image BDA0003400471790000082 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.
In this embodiment, the number of cells computed above is lit on the vertical bar from bottom to top, i.e. each lit cell changes from unfilled to filled white. A staff member can then read the current prediction level of the deep-learning neural network from the number of lit cells on the terminal, and optimize or improve the deep-learning neural-network algorithm accordingly, so that the level rises and the algorithm becomes more complete.
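As a toy illustration of this bottom-up fill, with ASCII cells standing in for the terminal's white-filled bars:

    def render_bar(k: int, K: int) -> str:
        """Render a K-cell vertical bar, bottom k cells lit; the first
        line of the returned string is the top of the bar."""
        return "\n".join("[#]" if row < k else "[ ]"
                         for row in range(K - 1, -1, -1))

    print(render_bar(3, 10))  # lights the bottom 3 of 10 cells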
It should be noted that, in practical applications, all of the possible implementations described above may be freely combined to form embodiments of the present application; details are not repeated here.
Based on the same inventive concept, the embodiment of the application further provides an audio and video processing device based on deep learning.
Fig. 2 shows a block diagram of an audio-video processing device based on deep learning according to an embodiment of the present application. As shown in fig. 2, the deep learning based audio-video processing device may include a prediction module 210, a comparison module 220, a judgment module 230, and a transmission module 240.
The prediction module 210 is configured to predict the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame;
the comparison module 220 is configured to compare each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data;
the judging module 230 is configured to judge the current prediction level of the deep-learning neural network according to the relevant-data accuracy and the non-relevant-data accuracy;
and the transmission module 240 is configured to transmit the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells.
An embodiment of the present application provides a possible implementation in which the bar display is a vertical bar on the terminal consisting of multiple rows in a single column; each row is an independent cell, and each cell's on/off state can be controlled individually.
An embodiment of the present application provides a possible implementation in which the comparison module 220 shown in fig. 2 is further configured to:
obtain the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data of each frame by comparing each frame's predicted data with the original data of the audio/video stream using the following formulas:
[Formula for L(i): present only as image BDA0003400471790000091 in the original]
[Formula for F(i): present only as image BDA0003400471790000092 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image BDA0003400471790000093 holds, then L(i) = 1, and if the condition shown in image BDA0003400471790000094 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
An embodiment of the present application provides a possible implementation in which the transmission module 240 shown in fig. 2 is further configured to:
obtain the binary-form data transmitted to the staff terminal from the relevant-data accuracy and the non-relevant-data accuracy using the following formula:
[Formula for (C)_2: present only as image BDA0003400471790000101 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
An embodiment of the present application provides a possible implementation in which, as shown in fig. 3, the device of fig. 2 may further include:
a control module 310, configured to control the lighting of the independent cells on the vertical bar according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image BDA0003400471790000102 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.
With this embodiment, the independent cells on the vertical bar are lit under control of the binary data received by the terminal, so that a staff member can read the current prediction level of the deep-learning neural network from how many cells are lit, and then optimize or improve the deep-learning neural-network algorithm so that the level rises and the algorithm becomes more complete.
With the deep learning-based audio and video processing device of the embodiments of the application, the compressed audio/video stream is first predicted with a deep-learning neural network to obtain each frame's predicted data; each frame's predicted data is compared with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data; the current prediction level of the deep-learning neural network is then judged from those two accuracies; and the predicted level is transmitted in binary form to a staff member's terminal, where it is displayed as lit bar cells. By replacing the traditional function-based prediction scheme with deep-learning neural-network prediction of the compressed audio/video frames, prediction efficiency can be improved.
It can be clearly understood by those skilled in the art that the specific working processes of the system, the apparatus, and the module described above may refer to the corresponding processes in the foregoing method embodiments, and for the sake of brevity, the detailed description is omitted here.
Those of ordinary skill in the art will understand that: the technical solution of the present application may be essentially or wholly or partially embodied in the form of a software product, where the computer software product is stored in a storage medium and includes program instructions for enabling an electronic device (e.g., a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application when the program instructions are executed. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Alternatively, all or part of the steps of implementing the foregoing method embodiments may be implemented by hardware (an electronic device such as a personal computer, a server, or a network device) associated with program instructions, which may be stored in a computer-readable storage medium, and when the program instructions are executed by a processor of the electronic device, the electronic device executes all or part of the steps of the method described in the embodiments of the present application.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments can be modified or some or all of the technical features can be equivalently replaced within the spirit and principle of the present application; such modifications or substitutions do not depart from the scope of the present application.

Claims (8)

1. A deep learning-based audio and video processing method, comprising the following steps:
predicting the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame;
comparing each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data;
judging the current prediction level of the deep-learning neural network according to the relevant-data accuracy and the non-relevant-data accuracy;
transmitting the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells;
the method comprises the following steps of comparing predicted data of each frame with original data of audio and video streams by using the following formula to respectively obtain the accuracy of predicted related data and the accuracy of predicted non-related data of each frame:
[Formula for L(i): present only as image FDA0003669336130000011 in the original]
[Formula for F(i): present only as image FDA0003669336130000012 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image FDA0003669336130000013 holds, then L(i) = 1, and if the condition shown in image FDA0003669336130000014 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
2. The deep learning-based audio and video processing method according to claim 1, wherein the bar display is a vertical bar on the terminal consisting of multiple rows in a single column, each row being an independent cell whose on/off state can be controlled individually.
3. The deep learning-based audio and video processing method according to claim 2, wherein the binary-form data transmitted to the staff terminal is obtained from the relevant-data accuracy and the non-relevant-data accuracy using the following formula:
[Formula for (C)_2: present only as image FDA0003669336130000021 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
4. The deep learning-based audio and video processing method according to claim 3, wherein the independent cells lit on the vertical bar are controlled according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image FDA0003669336130000022 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.
5. A deep learning-based audio and video processing device, comprising:
a prediction module, configured to predict the compressed audio/video stream with a deep-learning neural network to obtain predicted data for each frame;
a comparison module, configured to compare each frame's predicted data with the original data of the audio/video stream to obtain, respectively, the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data;
a judging module, configured to judge the current prediction level of the deep-learning neural network according to the relevant-data accuracy and the non-relevant-data accuracy;
and a transmission module, configured to transmit the predicted level in binary form to a staff member's terminal, where it is displayed as lit bar cells;
wherein the comparison module is further configured to:
obtain the accuracy of the predicted relevant data and the accuracy of the predicted non-relevant data of each frame by comparing each frame's predicted data with the original data of the audio/video stream using the following formulas:
[Formula for L(i): present only as image FDA0003669336130000031 in the original]
[Formula for F(i): present only as image FDA0003669336130000032 in the original]
where L(i) denotes the relevant-data accuracy of the i-th frame predicted by the deep-learning neural network, and F(i) denotes the non-relevant-data accuracy of the i-th frame so predicted; if the condition shown in image FDA0003669336130000033 holds, then L(i) = 1, and if the condition shown in image FDA0003669336130000034 holds, then F(i) = 1; D_i(a) denotes the a-th bit of the i-th frame of predicted data in binary form; D_{i,0}(a) denotes the a-th bit of the i-th frame of the original audio/video stream data in binary form; G_i(a) denotes a feature-detection function, with G_i(a) = 1 if the a-th bit of the i-th frame of the original data in binary form is a feature bit reflecting a characteristic value of the audio/video stream, and G_i(a) = 0 otherwise; m_i denotes the number of bits in the i-th frame of the original data in binary form; |·| denotes the absolute value; and [·]_10 denotes conversion of the bracketed value to decimal.
6. The deep learning-based audio and video processing device according to claim 5, wherein the bar display is a vertical bar on the terminal consisting of multiple rows in a single column, each row being an independent cell whose on/off state can be controlled individually.
7. The deep learning-based audio and video processing device according to claim 6, wherein the transmission module is further configured to:
obtain the binary-form data transmitted to the staff terminal from the relevant-data accuracy and the non-relevant-data accuracy using the following formula:
[Formula for (C)_2: present only as image FDA0003669336130000041 in the original]
where (C)_2 denotes the binary-form data transmitted to the staff terminal; N denotes the total number of frames in the audio/video stream; ∧ denotes logical AND; and (·)_2 denotes that the quantity in parentheses is data in binary form.
8. The deep learning-based audio and video processing device according to claim 7, further comprising:
a control module, configured to control the lighting of the independent cells on the vertical bar according to the binary data received by the terminal using the following formula:
[Formula for k: present only as image FDA0003669336130000042 in the original]
where k denotes the number of independent cells to be lit on the vertical bar, and K denotes the total number of independent cells on the vertical bar.

Priority Applications (1)

CN202111495106.XA (priority date 2021-12-09, filing date 2021-12-09): Deep learning-based audio and video processing method and device

Publications (2)

CN114363631A, published 2022-04-15
CN114363631B, published 2022-08-05

Family

ID: 81098112

Citations (2)

* Cited by examiner, † Cited by third party

CN108805257A * (北京大学 (Peking University); priority 2018-04-26, published 2018-11-13): A neural network quantization method based on parameter norm
WO2019009452A1 * (삼성전자 (Samsung Electronics); priority 2017-07-06, published 2019-01-10): Method and device for encoding or decoding image

Family Cites Families (7)

* Cited by examiner, † Cited by third party

KR102262554B1 * (한국전자통신연구원 (Electronics and Telecommunications Research Institute); priority 2017-12-14, published 2021-06-09): Method and apparatus for encoding and decoding image using prediction network
JP7277699B2 * (日本電信電話株式会社 (Nippon Telegraph and Telephone); priority 2018-12-05, published 2023-05-19): Image processing device, learning device, image processing method, learning method, and program
KR20200084516A * (삼성전자주식회사 (Samsung Electronics); priority 2019-01-03, published 2020-07-13): Display apparatus, apparatus for providing image and method of controlling the same
CN112188202A * (西安电子科技大学 (Xidian University); priority 2019-07-01, published 2021-01-05): Self-learning video coding and decoding technology based on neural network
CN110557633B * (深圳大学 (Shenzhen University); priority 2019-08-28, published 2021-06-29): Compression transmission method, system and computer readable storage medium for image data
EP3846477B1 * (Isize Limited; priority 2020-01-05, published 2023-05-03): Preprocessing image data
EP4107947A4 * (Nokia Technologies Oy; priority 2020-02-21, published 2024-03-06): A method, an apparatus and a computer program product for video encoding and video decoding


Also Published As

CN114363631A, published 2022-04-15


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
GR01 Patent grant