CN108289248B - Deep learning video decoding method and device based on content prediction - Google Patents


Info

Publication number
CN108289248B
CN108289248B (application CN201810048036.5A)
Authority
CN
China
Prior art keywords
unit
information
frame
judging
frame information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810048036.5A
Other languages
Chinese (zh)
Other versions
CN108289248A (en)
Inventor
廖裕民
罗玉明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd
Priority to CN201810048036.5A
Publication of CN108289248A
Application granted
Publication of CN108289248B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4882 Data services for displaying messages, e.g. warnings, reminders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/488 Data services, e.g. news ticker
    • H04N 21/4884 Data services for displaying subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a deep learning video decoding method and device based on content prediction. After video stream information is received, the parsed current frame information is sent to a display screen for display; in parallel, the next key frame corresponding to the current frame is parsed as pre-judgment frame information and sent to a convolutional neural network circuit for recognition and judgment. When the pre-judgment frame information is found to match preset content, prompt information is added to the current frame, and the next pre-judgment frame is extracted and matched against the preset content in turn. This loop continues until the pre-judgment frame information sent to the convolutional neural network circuit no longer matches the preset content; the segment that requires frame skipping is then determined from the timestamp of the current pre-judgment frame and the difference between that timestamp and the timestamp of the current frame. The scheme can predict and judge video playing content in advance, effectively meeting user requirements.

Description

Deep learning video decoding method and device based on content prediction
Technical Field
The present invention relates to the field of computers, and in particular, to a method and an apparatus for decoding a deep learning video based on content prediction.
Background
At present, video stream data is decoded by a video decoding chip and sent directly to a display for playback. There is no effective mechanism for detecting the video content, so when the video contains unsuitable pictures (such as pornography or violence), existing methods cannot give a timely warning. In particular, when minors are in the audience, playing such pictures can harm their physical and psychological health and leave lasting childhood trauma.
Disclosure of Invention
Therefore, a technical scheme for deep learning video decoding based on content prediction is needed, to solve the problem that existing methods, which cannot detect and predict video content in advance, cannot prevent minors from being exposed to content such as pornography or violence in a video.
To achieve the above object, the inventors provide a deep learning video decoding apparatus based on content prediction. The apparatus includes a video control unit, a video decoding unit, a frame buffer unit, a convolutional neural network circuit, a preset content determination unit, a prompt information adding unit, a frame skipping storage unit, a display control unit, and a display unit; the frame buffer unit includes a current frame buffer unit and a pre-judgment frame buffer unit. The video control unit is connected to the video decoding unit; the video decoding unit is connected to the current frame buffer unit and to the pre-judgment frame buffer unit; the current frame buffer unit is connected to the display control unit, and the display control unit is connected to the display unit. The pre-judgment frame buffer unit is connected to the convolutional neural network circuit, the convolutional neural network circuit is connected to the preset content determination unit, the preset content determination unit is connected to the video control unit, the prompt information adding unit, and the frame skipping storage unit, and the prompt information adding unit is connected to the current frame buffer unit.
The video control unit is configured to receive video stream information. The video stream information includes multiple frames of image information, among them current frame information and first pre-judgment frame information, where the first pre-judgment frame information is the next key frame corresponding to the current frame.
The video decoding unit is configured to parse the current frame information and store the parsed result in the current frame buffer unit, and to parse the first pre-judgment frame information and store the parsed result in the pre-judgment frame buffer unit.
The convolutional neural network circuit is configured to fetch the first pre-judgment frame information from the pre-judgment frame buffer unit and judge whether it matches preset content. If it matches, the preset content determination unit sends a first control signal to the prompt information adding unit and a second control signal to the video control unit.
On receiving the second control signal, the video control unit sends second pre-judgment frame information (the next key frame corresponding to the first pre-judgment frame) to the video decoding unit for parsing, and the video decoding unit stores the parsed result in the pre-judgment frame buffer unit.
The convolutional neural network circuit then fetches the second pre-judgment frame information from the pre-judgment frame buffer unit and judges whether it matches the preset content; if it matches, the preset content determination unit again sends the first control signal to the prompt information adding unit and the second control signal to the video control unit. These steps repeat until the convolutional neural network circuit judges that the pre-judgment frame information in the buffer no longer matches the preset content, at which point the timestamp of that pre-judgment frame is recorded. The preset content determination unit stores in the frame skipping storage unit this timestamp together with all frame information between it and the timestamp of the first pre-judgment frame that was judged to match the preset content.
On receiving the first control signal, the prompt information adding unit fetches the current frame information from the current frame buffer unit, adds prompt information to it to obtain current frame adjustment information, and writes the adjustment information back to the current frame buffer unit.
The display control unit is configured to fetch the current frame adjustment information and transmit it to the display unit, which displays it.
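The look-ahead loop described above can be sketched in Python. This is a hypothetical simulation, not the hardware circuit: frames are plain dicts with a timestamp and a key-frame flag, and the caller-supplied `matches_preset` callable stands in for the convolutional neural network circuit's content check.

```python
def predict_ahead(frames, cur_idx, matches_preset):
    """Scan the key frames after the current frame and return (warn, skip_segment).

    warn is True if an upcoming key frame matches the preset content;
    skip_segment is (start_ts, end_ts): the timestamp of the first matching
    pre-judgment frame and the timestamp of the first non-matching one.
    """
    first_hit = end_ts = None
    for frame in frames[cur_idx + 1:]:
        if not frame["key"]:                  # only key frames are pre-judged
            continue
        if matches_preset(frame):             # CNN circuit: content matches
            if first_hit is None:
                first_hit = frame["ts"]       # first matching pre-judgment frame
        elif first_hit is not None:
            end_ts = frame["ts"]              # first non-matching key frame: stop
            break
        else:
            break                             # first key frame is clean: no warning
    if first_hit is None:
        return False, None
    return True, (first_hit, end_ts)
```

The returned segment corresponds to what the preset content determination unit would hand to the frame skipping storage unit.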
Furthermore, the convolutional neural network circuit includes a parameter initialization unit, an initial value storage unit, a fetch unit, a parameter storage unit, a reconfigurable network matrix unit, an error calculation unit, and a back propagation write-back unit. The parameter storage unit stores parameter elements, which include convolution kernels, weights, and convolution bias values; the fetch unit includes a weight fetch unit, a convolution kernel fetch unit, and a convolution bias fetch unit.
Before fetching the first pre-judgment frame information from the pre-judgment frame buffer unit, the convolutional neural network circuit also performs neural network training on the preset content, specifically as follows:
The parameter initialization unit fetches the initial value of each parameter of the network structure from the initial value storage unit, controls the fetch unit to obtain the corresponding number of each parameter element from the parameter storage unit according to those initial values, and configures the reconfigurable network matrix unit.
The reconfigurable network matrix unit performs the neural network calculation with the configured initial parameter values. The error calculation unit judges whether the matching degree between the calculation result and the ground-truth information exceeds a preset matching degree; if so, training is judged complete, and the back propagation write-back unit writes the current parameter values into the parameter storage unit as the final values. Otherwise, the reconfigurable network matrix unit adjusts the configured parameter values according to the difference between the matching degree of this training round and that of the previous round, writes the adjusted values into the parameter storage unit through the back propagation write-back unit, controls the fetch unit to obtain the corresponding number of parameter elements for the adjusted values, and performs the neural network calculation again, repeating until training is complete.
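The train-until-threshold loop above can be illustrated with a deliberately minimal sketch. This is an assumption-laden toy: a single scalar parameter stands in for the kernels, weights, and biases, and the caller-supplied `matching_degree` function stands in for the error calculation unit's comparison against the ground truth.

```python
def train_until_match(initial_param, matching_degree, target=0.95, max_rounds=1000):
    """Repeat the forward calculation and parameter adjustment until the
    matching degree with the ground truth exceeds the preset threshold."""
    param = initial_param
    prev_score = matching_degree(param)
    step = 0.1
    for _ in range(max_rounds):
        if prev_score > target:          # error calculation unit: training done
            return param, prev_score     # write-back unit would persist param
        candidate = param + step
        score = matching_degree(candidate)
        if score < prev_score:           # this round got worse than the last:
            step = -step / 2             # reverse direction and shrink the step
        param, prev_score = candidate, score
    return param, prev_score
```

The "adjust by the difference between this round's and the previous round's matching degree" rule from the text is approximated here by reversing and halving the step whenever the score drops.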
Furthermore, the video decoding unit is also configured to parse the encoded frame following the first pre-judgment frame to obtain motion vector information corresponding to the first pre-judgment frame, and to transmit that motion vector information to the convolutional neural network circuit.
Judging whether the first pre-judgment frame information matches the preset content then further includes: the convolutional neural network circuit performs neural network recognition on the motion vector information corresponding to the first pre-judgment frame, computes its matching degree with preset motion vector information, and judges whether the first pre-judgment frame matches the preset content from the combined matching results of the frame content and its motion vector information.
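One plausible way to fuse the two matching degrees is a weighted sum against a threshold. The patent does not specify the combination rule, so the weights and threshold below are illustrative assumptions only:

```python
def combined_match(frame_score, mv_score, w_frame=0.7, w_mv=0.3, threshold=0.8):
    """Fuse the frame-content matching degree with the motion-vector
    matching degree into a single match decision (both scores in [0, 1])."""
    combined = w_frame * frame_score + w_mv * mv_score
    return combined >= threshold, combined
```

A stream whose still images are ambiguous but whose motion vectors strongly resemble preset motion patterns would thus still be flagged, which appears to be the intent of adding the motion-vector branch.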
Further, when the preset content determination unit determines that the pre-judgment frame information in the current pre-judgment frame buffer unit matches the preset content, the video control unit is also configured to receive a frame skipping instruction and to stop sending the frame information held in the frame skipping storage unit to the video decoding unit for parsing.
Furthermore, the prompt information is early warning subtitle information, and the apparatus further includes a subtitle storage unit. In this case, adding prompt information to the current frame includes: after receiving the first control signal, the prompt information adding unit fetches the subtitle information from the subtitle storage unit, superimposes the early warning subtitle information onto it to obtain subtitle adjustment information, and adds the subtitle adjustment information at the corresponding position of the current frame information to obtain the current frame adjustment information.
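The subtitle-overlay step might look like the following sketch. It is hypothetical: frames and subtitles are modeled as plain dicts and strings rather than decoded pixel data, and the warning text is an invented placeholder.

```python
def add_warning_subtitle(frame, subtitle_store,
                         warning="[Warning: upcoming content may be unsuitable]"):
    """Superimpose the early-warning text on the frame's subtitle and
    attach the adjusted subtitle back onto a copy of the frame."""
    subtitle = subtitle_store.get(frame["ts"], "")
    adjusted = (subtitle + " " + warning).strip()   # subtitle adjustment information
    out = dict(frame)                               # leave the original frame intact
    out["subtitle"] = adjusted                      # current frame adjustment information
    return out
```

In the apparatus, the result would be written back to the current frame buffer unit before the display control unit picks it up.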
The inventors also provide a deep learning video decoding method based on content prediction, applied to a deep learning video decoding apparatus based on content prediction. The apparatus includes a video control unit, a video decoding unit, a frame buffer unit, a convolutional neural network circuit, a preset content determination unit, a prompt information adding unit, a frame skipping storage unit, a display control unit, and a display unit; the frame buffer unit includes a current frame buffer unit and a pre-judgment frame buffer unit. The video control unit is connected to the video decoding unit; the video decoding unit is connected to the current frame buffer unit and to the pre-judgment frame buffer unit; the current frame buffer unit is connected to the display control unit, and the display control unit is connected to the display unit. The pre-judgment frame buffer unit is connected to the convolutional neural network circuit, the convolutional neural network circuit is connected to the preset content determination unit, the preset content determination unit is connected to the video control unit, the prompt information adding unit, and the frame skipping storage unit, and the prompt information adding unit is connected to the current frame buffer unit. The method includes the following steps:
The video control unit receives video stream information. The video stream information includes multiple frames of image information, among them current frame information and first pre-judgment frame information, where the first pre-judgment frame information is the next key frame corresponding to the current frame.
The video decoding unit parses the current frame information and stores the parsed result in the current frame buffer unit, and parses the first pre-judgment frame information and stores the parsed result in the pre-judgment frame buffer unit.
The convolutional neural network circuit fetches the first pre-judgment frame information from the pre-judgment frame buffer unit and judges whether it matches preset content. If it matches, the preset content determination unit sends a first control signal to the prompt information adding unit and a second control signal to the video control unit.
The video control unit receives the second control signal and sends second pre-judgment frame information (the next key frame corresponding to the first pre-judgment frame) to the video decoding unit for parsing; the video decoding unit stores the parsed result in the pre-judgment frame buffer unit.
The convolutional neural network circuit fetches the second pre-judgment frame information from the pre-judgment frame buffer unit and judges whether it matches the preset content; if it matches, the preset content determination unit again sends the first control signal to the prompt information adding unit and the second control signal to the video control unit. These steps repeat until the convolutional neural network circuit judges that the pre-judgment frame information in the buffer no longer matches the preset content, at which point the timestamp of that pre-judgment frame is recorded. The preset content determination unit stores in the frame skipping storage unit this timestamp together with all frame information between it and the timestamp of the first pre-judgment frame that was judged to match the preset content.
After receiving the first control signal, the prompt information adding unit fetches the current frame information from the current frame buffer unit, adds prompt information to it to obtain current frame adjustment information, and writes the adjustment information back to the current frame buffer unit.
The display control unit fetches the current frame adjustment information and transmits it to the display unit, which displays it.
Furthermore, the convolutional neural network circuit includes a parameter initialization unit, an initial value storage unit, a fetch unit, a parameter storage unit, a reconfigurable network matrix unit, an error calculation unit, and a back propagation write-back unit. The parameter storage unit stores parameter elements, which include convolution kernels, weights, and convolution bias values; the fetch unit includes a weight fetch unit, a convolution kernel fetch unit, and a convolution bias fetch unit. Before fetching the first pre-judgment frame information from the pre-judgment frame buffer unit, the convolutional neural network circuit performs neural network training on the preset content, specifically as follows:
The parameter initialization unit fetches the initial value of each parameter of the network structure from the initial value storage unit, controls the fetch unit to obtain the corresponding number of each parameter element from the parameter storage unit according to those initial values, and configures the reconfigurable network matrix unit.
The reconfigurable network matrix unit performs the neural network calculation with the configured initial parameter values; the error calculation unit judges whether the matching degree between the calculation result and the ground-truth information exceeds a preset matching degree. If so, training is judged complete, and the back propagation write-back unit writes the current parameter values into the parameter storage unit as the final values. Otherwise, the reconfigurable network matrix unit adjusts the configured parameter values according to the difference between the matching degree of this training round and that of the previous round, writes the adjusted values into the parameter storage unit through the back propagation write-back unit, controls the fetch unit to obtain the corresponding number of parameter elements for the adjusted values, and performs the neural network calculation again until training is complete.
Further, the method also includes:
The video decoding unit parses the encoded frame following the first pre-judgment frame to obtain motion vector information corresponding to the first pre-judgment frame, and transmits that motion vector information to the convolutional neural network circuit.
Judging whether the first pre-judgment frame information matches the preset content then further includes: the convolutional neural network circuit performs neural network recognition on the motion vector information corresponding to the first pre-judgment frame, computes its matching degree with preset motion vector information, and judges whether the first pre-judgment frame matches the preset content from the combined matching results of the frame content and its motion vector information.
Further, when the preset content determination unit determines that the pre-judgment frame information in the current pre-judgment frame buffer unit matches the preset content, the method also includes: the video control unit receives a frame skipping instruction and stops sending the frame information held in the frame skipping storage unit to the video decoding unit for parsing.
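In effect, the video control unit filters its output against the frame skipping store before anything reaches the decoder. A hypothetical sketch, with frames again modeled as dicts keyed by timestamp:

```python
def frames_to_decode(stream, skip_store):
    """Return only the frames the video control unit should still send to
    the video decoding unit, dropping everything held in the skip store."""
    skipped = {f["ts"] for f in skip_store}   # timestamps of the flagged segment
    return [f for f in stream if f["ts"] not in skipped]
```

Because the skipped frames are never parsed at all, the flagged segment never reaches the frame buffers or the display path.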
Furthermore, the prompt information is early warning subtitle information, and the apparatus further includes a subtitle storage unit. In this case, adding prompt information to the current frame includes: after receiving the first control signal, the prompt information adding unit fetches the subtitle information from the subtitle storage unit, superimposes the early warning subtitle information onto it to obtain subtitle adjustment information, and adds the subtitle adjustment information at the corresponding position of the current frame information to obtain the current frame adjustment information.
In the above scheme, after video stream information is received, the parsed current frame is sent to the display screen while, in parallel, the next key frame corresponding to the current frame is parsed as first pre-judgment frame information and sent to the convolutional neural network circuit for recognition and judgment. When the first pre-judgment frame matches the preset content, prompt information is added to the current frame so that the user is fully warned about the content to be played next; at the same time, the next pre-judgment frame is extracted and matched against the preset content, and this loop continues until the pre-judgment frame information sent to the convolutional neural network circuit no longer matches the preset content. At that point, the segment requiring frame skipping is determined from the timestamp of the current pre-judgment frame and the difference between that timestamp and the timestamp of the current frame, and the segment is stored in the frame skipping storage unit. With this scheme, video content can be detected and pre-judged in advance, preventing minors from being exposed to content such as pornography or violence, and the scheme has broad market prospects.
Drawings
FIG. 1 is a diagram of a deep learning video decoding apparatus based on content prediction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a convolutional neural network circuit according to an embodiment of the present invention;
FIG. 3 is a circuit diagram of an error calculation unit according to an embodiment of the present invention;
FIG. 4 is a circuit diagram of an upgrade unit according to an embodiment of the present invention;
FIG. 5 is a circuit diagram of a multiplier-adder unit according to an embodiment of the present invention;
FIG. 6 is a circuit structure diagram of the reconfigurable network matrix unit 208 according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating a deep learning video decoding method based on content prediction according to an embodiment of the present invention;
description of reference numerals:
101. a video control unit; 102. a video decoding unit; 103. a current frame buffer unit; 104. a pre-judgment frame buffer unit; 105. a convolutional neural network circuit; 106. a preset content determination unit; 107. a prompt information adding unit; 108. a frame skip storage unit; 109. a display control unit; 110. a display unit; 111. a subtitle storage unit; 112. a motion vector calculation unit;
201. a parameter initialization unit;
202. a parameter storage unit; 2021. a weight value and bias value storage unit; 2022. a first convolution core storage unit; 2023. a second convolution kernel storage unit;
203. a weight fetch unit; 204. a convolution kernel fetch unit; 205. a convolution bias fetch unit; 206. an error calculation unit; 207. a back propagation write-back unit;
208. a reconfigurable network matrix unit; 2081. an upgrade unit; 2082. an activation function unit; 2083. a multiplier-adder unit; 2084. a calculation buffer unit; 2085. an interconnection matrix unit.
Detailed Description
To explain in detail the technical contents, structural features, objects, and effects of the technical solutions, a detailed description is given below with reference to the accompanying drawings in conjunction with the embodiments.
Fig. 1 is a schematic diagram of a deep learning video decoding apparatus based on content prediction according to an embodiment of the present invention. The apparatus includes a video control unit 101, a video decoding unit 102, a frame buffer unit, a convolutional neural network circuit 105, a preset content determination unit 106, a prompt information adding unit 107, a frame skipping storage unit 108, a display control unit 109, and a display unit 110; the frame buffer unit includes a current frame buffer unit 103 and a pre-judgment frame buffer unit 104. The video control unit 101 is connected to the video decoding unit 102; the video decoding unit 102 is connected to the current frame buffer unit 103 and to the pre-judgment frame buffer unit 104; the current frame buffer unit 103 is connected to the display control unit 109, and the display control unit 109 is connected to the display unit 110. The pre-judgment frame buffer unit 104 is connected to the convolutional neural network circuit 105, the convolutional neural network circuit 105 is connected to the preset content determination unit 106, the preset content determination unit 106 is connected to the video control unit 101, the prompt information adding unit 107, and the frame skipping storage unit 108, and the prompt information adding unit 107 is connected to the current frame buffer unit 103.
the video control unit 101 is configured to receive video stream information, where the video stream information includes multi-frame image information, the multi-frame image information includes current frame information and first pre-determined frame information, and the first pre-determined frame information is next key frame information corresponding to the current frame information. In the video decoding process, the video is decoded in units of frames, and in the video compression, each frame represents a static image. In actual compression, various algorithms are used to reduce the data size, with IPB being the most common. In short, the I frame is a key frame and belongs to intraframe compression, and the compression mode is the same as that of AVI; p means forward search, B means bidirectional search, and both P and B frames compress data based on I frames, i.e., storing delta (motion vector) compared to I frames. The IPB frame references in the video compression process are linked as follows:http://blog.csdn.net/ tanyhuan/article/details/48346571. In a specific application process, hundreds or even thousands of P frames or B frames are generally inserted between one I frame and the next I frame, and since the I frame is key frame information and contains complete information of an image of one frame, the I frame can be used as a pre-judgment frame, and then the content of the pre-judgment frame is predicted, and the content contained in the I frame is identified and judged.
The video decoding unit 102 is configured to parse the current frame information and store it in the current frame buffer unit 103, and to parse the first prejudged frame information and store it in the prejudgment frame buffer unit 104. During video decoding, the decoding speed of the decoding chip is generally higher than the transmission and display speed of the video, which ensures that playback does not stall. Therefore, after parsing the current frame, the video decoding unit can use the spare decoding intervals to obtain and parse the first prejudged frame information and store it in the prejudgment frame buffer unit. In this embodiment, the current frame information may be an I frame, a B frame or a P frame, and the prejudged frame information is the next I frame information corresponding to the current frame information. Assuming that the 1000th frame of a certain video stream is an I frame, the 2000th frame is the next I frame, and 1000 B or P frames are inserted between them, then when the video decoding unit decodes any frame between the 1000th and 2000th frame (i.e., the current frame is one of these frames), it obtains the 2000th frame as the first prejudged frame and stores it in the prejudgment frame buffer unit for subsequent processing.
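The selection of the next key frame as the prejudged frame can be sketched as follows. This is an illustrative Python sketch, not part of the claimed device: `frames` is a hypothetical list of frame-type strings, whereas the real apparatus would read the frame types from the decoded stream headers.

```python
def next_key_frame(frames, current_idx):
    """Return the index of the next I frame after current_idx, or None.

    `frames` is a hypothetical list of frame-type strings ('I', 'P', 'B');
    the actual device derives this information from the bitstream itself.
    """
    for idx in range(current_idx + 1, len(frames)):
        if frames[idx] == 'I':
            return idx
    return None

# Mirrors the example in the text: an I frame followed by 999 P frames,
# then the next I frame at index 1000.
frames = ['I'] + ['P'] * 999 + ['I']
assert next_key_frame(frames, 0) == 1000      # current frame is the first I frame
assert next_key_frame(frames, 500) == 1000    # current frame is a P frame in between
```

Any frame between the two I frames thus resolves to the same prejudged frame, matching the 1000th/2000th-frame example above.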
The convolutional neural network circuit 105 is configured to obtain the first prejudged frame information in the prejudgment frame buffer unit 104 and judge whether it matches the preset content; if it does, the preset content judgment unit 106 sends a first control signal to the prompt information adding unit and a second control signal to the video control unit 101. In this embodiment, the preset content is image information that has been designated as requiring frame skipping, such as violent or pornographic pictures.
As shown in fig. 2, the convolutional neural network circuit includes a parameter initialization unit 201, an initial value storage unit, an access unit, a parameter storage unit 202, a reconfigurable network matrix unit 208, an error calculation unit 206 and a back propagation write-back unit 207; the parameter storage unit 202 stores parameter elements. In the present embodiment, the parameter elements include the number of layers of the neural network, the number of neurons in each layer, and the convolution kernel values, convolution offset values and weight values of each layer. Correspondingly, the access unit comprises a weight access unit 203, a convolution kernel access unit 204 and a convolution offset access unit 205. The convolutional neural network circuit is also used to train the neural network on the preset content before acquiring the first prejudged frame information from the prejudgment frame buffer unit, specifically as follows:
the parameter initialization unit 201 is configured to obtain the initial value of each network-structure parameter from the initial value storage unit, control the access unit to fetch the corresponding number of each parameter element from the parameter storage unit 202 according to these initial values, and configure the reconfigurable network matrix unit accordingly.
In some embodiments, the device includes a WIFI communication unit and an initial value configuration query unit, the latter connected to the internet through the WIFI communication unit. When the initial values of the neural network configuration parameters corresponding to a task request cannot be obtained from the initial value storage unit, the initial value configuration query unit searches the internet for them through the WIFI communication unit and, once found, stores them in the initial value storage unit. In short, the initial value storage unit holds configuration parameters of the network structure so that the reconfigurable network matrix unit can retrieve them promptly during neural network training; meanwhile, configuration parameters not available locally can be downloaded over the internet and stored in the initial value storage unit, which broadens the application range of the device.
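The lookup-with-internet-fallback behavior just described can be sketched as below. This is a hedged illustration only: `get_initial_values`, `local_store` and `fetch_from_internet` are hypothetical stand-ins for the initial value storage unit, the query unit, and the WIFI communication unit.

```python
def get_initial_values(task, local_store, fetch_from_internet):
    """Return the configuration initial values for `task`, consulting the
    local store first and falling back to an internet search, then caching
    the result locally (mirroring the behavior described in the text)."""
    if task in local_store:
        return local_store[task]              # found locally: use it directly
    values = fetch_from_internet(task)        # otherwise search the internet
    local_store[task] = values                # and store it for future reuse
    return values

store = {}
cfg = get_initial_values('porn-detect', store, lambda t: {'layers': 5})
assert store['porn-detect'] == {'layers': 5}  # now cached locally
# A second request no longer needs the network path.
assert get_initial_values('porn-detect', store, lambda t: None) == {'layers': 5}
```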
The reconfigurable network matrix unit 208 is configured to perform the neural network calculation according to the configured initial values of the parameter elements. The error calculation unit 206 judges whether the matching degree between the current calculation result and the real information is greater than a preset matching degree; if so, training is judged complete, and the back propagation write-back unit 207 updates the parameter values of the parameter elements to the current values and writes them into the parameter storage unit 202. Otherwise, the reconfigurable network matrix unit 208 adjusts the configuration parameter values of each parameter element according to the difference between the matching degree of the current training result and that of the previous training result, writes the adjusted values into the parameter storage unit through the back propagation write-back unit, controls the access unit to fetch the corresponding number of each parameter element according to the adjusted values, and performs the neural network calculation again, until training is complete. The circuit configuration of the error calculation unit is shown in fig. 3.
The real information refers to the characteristic information input to the reconfigurable network matrix unit, that is, the preset content information, which may be images containing pornographic or violent elements. For example, if the reconfigurable network matrix unit 208 is trained for pornographic image recognition, the input characteristic information is pornographic image information, and the error calculation unit compares the recognition result of the current calculation against the input pornographic image information: the higher the matching degree between the two, the smaller the error, and if the error is smaller than a preset error, training is judged complete. Various algorithms for the neural network training calculation have been disclosed in the prior art and are not described here again.
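The compute/compare/adjust loop spread across the three units above can be summarized in one sketch. This is an illustrative abstraction, not the claimed circuit: `compute`, `match_degree` and `adjust` are hypothetical stand-ins for the reconfigurable network matrix unit, the error calculation unit, and the back propagation write-back unit.

```python
def train_until_matched(compute, match_degree, adjust, params,
                        threshold, max_iters=100):
    """Iterate: run the network, score the result against the real
    information, stop when the matching degree exceeds the preset
    threshold, otherwise adjust the parameters and retry."""
    prev = None
    degree = 0.0
    for _ in range(max_iters):
        result = compute(params)
        degree = match_degree(result)
        if degree > threshold:                 # training complete
            return params, degree              # (write-back unit keeps params)
        params = adjust(params, degree, prev)  # back-propagation-style update
        prev = degree
    return params, degree

# Toy usage: drive a scalar parameter toward a target value of 5.0.
target = 5.0
p, d = train_until_matched(
    compute=lambda x: x,
    match_degree=lambda r: 1.0 - abs(target - r) / 10.0,
    adjust=lambda x, deg, prev: x + 0.5 * (target - x),
    params=0.0, threshold=0.95)
assert d > 0.95
```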
As shown in fig. 6, the reconfigurable network matrix unit 208 includes an interconnection matrix unit 2085, a multiplier-adder unit 2083, an upgrade unit 2081, an activation function unit 2082 and a multiplier-adder configuration unit. The multiplier-adder unit, the upgrade unit and the activation function unit are each connected to the interconnection matrix unit, and the multiplier-adder unit is connected to the multiplier-adder configuration unit; the multiplier-adder unit comprises a plurality of multiplier-adders of different precisions. Besides configuring each parameter trained by the reconfigurable network matrix unit 208, the parameter initialization unit also configures the parameter connection information between the parameter elements, so that the reconfigurable network matrix unit 208 can reconstruct a neural network structure with the corresponding function from the configuration parameters and the parameter connection information. The parameter connection information and the corresponding initial configuration parameter values are stored in advance in the initial value configuration storage unit; when the initial values of the network-structure configuration parameters corresponding to a task request cannot be found there, the WIFI communication unit 111, while searching the internet for the required initial values, can also download the parameter connection relations corresponding to those values and store them in the initial value configuration storage unit.
The multiplier-adder configuration unit is used to configure the precision of the multiplier-adder; during neural network training, the reconfigurable network matrix unit 208 calculates with a multiplier-adder of the configured precision. The interconnection matrix unit interconnects the multiplier-adder unit, the upgrade unit and the activation function unit according to the parameter connection information so as to form the corresponding neural network structure. The circuit configuration of the upgrade unit is shown in fig. 4, and that of the multiplier-adder unit is shown in fig. 5.
The multiplier-adder unit comprises multiplier-adders of different precisions, such as 8-bit integer, 16-bit floating point and 32-bit floating point. By sending different control signals, the multiplier-adder configuration unit allows the neural network structure to be built with multipliers of different precisions, providing multiple choices. Similarly, the activation function unit may include a plurality of activation functions (such as sigmoid, ReLU, and the like) that can be selected by different control signals; the selection is recorded in the parameter configuration information, after which the selected multiplier-adder unit, upgrade unit and activation function unit are interconnected by the interconnection matrix unit according to the parameter connection information.
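The selection of a multiplier-adder precision and an activation function by control signals can be illustrated as a simple configurable lookup. This sketch is an assumption-laden software analogy of the hardware described above (plain dictionary keys stand in for control signals, and the precision differences are elided):

```python
import math

# Hypothetical catalogue of selectable building blocks named in the text.
MULTIPLIER_ADDERS = {
    'int8':    lambda a, b, acc: int(a) * int(b) + acc,    # integer MAC
    'float32': lambda a, b, acc: float(a) * float(b) + acc,
}
ACTIVATIONS = {
    'sigmoid': lambda x: 1.0 / (1.0 + math.exp(-x)),
    'relu':    lambda x: max(0.0, x),
}

def configure(precision, activation):
    """Mimics the multiplier-adder configuration unit: select one
    precision and one activation function via 'control signals'."""
    return MULTIPLIER_ADDERS[precision], ACTIVATIONS[activation]

mac, act = configure('float32', 'relu')
assert act(mac(2.0, 3.0, 1.0)) == 7.0   # relu(2*3 + 1) = 7
assert act(mac(2.0, -3.0, 1.0)) == 0.0  # relu(-5) clipped to 0
```

In the hardware, the chosen blocks would then be wired together by the interconnection matrix unit rather than composed as Python calls.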
The video control unit 101 is configured to receive a second control signal and send second pre-judging frame information to the video decoding unit for analysis, and the video decoding unit is configured to store the analyzed second pre-judging frame information in the pre-judging frame cache unit; the second pre-judging frame information is the next key frame information corresponding to the first pre-judging frame information.
The convolutional neural network circuit 105 is configured to obtain the second prejudged frame information in the prejudgment frame buffer unit and judge whether it matches the preset content; if so, the preset content judgment unit sends a first control signal to the prompt information adding unit and again sends a second control signal to the video control unit 101. These steps repeat until the convolutional neural network circuit 105 judges that the prejudged frame information in the prejudgment frame buffer unit does not match the preset content, at which point the timestamp corresponding to that prejudged frame information is recorded; the preset content judgment unit 106 then stores into the frame skipping storage unit 108 the timestamp information corresponding to the prejudged frame information currently in the prejudgment frame buffer unit 104, together with all frame information between that timestamp and the timestamp of the prejudged frame information first judged to match the preset content.
When the preset content judgment unit judges that the first prejudged frame information matches the preset content, the frames following it are likely to also contain information matching the preset content, i.e., they may all be pornographic or violent image frames, so judgment must continue downward. For example, suppose the 1000th frame, the 2000th frame, and so on are all I frames. When the preset content judgment unit determines that the 1000th frame matches the preset content, the video control unit extracts the 2000th frame and transmits it to the decoder for parsing, and the convolutional neural network circuit then judges whether the 2000th frame matches the preset content; if the 2000th frame still matches, the 3000th frame is extracted, parsed and judged in turn, and so on, until the I frame extracted at some point no longer matches the preset content.
If, when the 8000th frame is judged, it is found not to match the preset content, this means the 1000th to 8000th frames are likely information matching the preset content, and this part of the video stream needs to be handled separately. The timestamp information corresponding to the 8000th frame is therefore recorded, and the difference between the timestamps of the 1000th and 8000th frames is calculated, so that when playback reaches the 1000th frame, the segment between the 1000th and 8000th frames can be skipped according to the user's selection.
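The skip segment derived from these timestamps can be computed as below. A minimal sketch only: the function name, the 25 fps assumption, and the timestamp representation are all illustrative, not the device's actual interface.

```python
def skip_segment(matched_stamps, first_clean_stamp):
    """Given the timestamps of the prejudged frames that matched the
    preset content and the timestamp of the first I frame that no longer
    matched, return the (start, end) interval to skip."""
    start = min(matched_stamps)           # timestamp of the 1000th frame
    return start, first_clean_stamp       # skip up to the 8000th frame

# I frames 1000..7000 matched; frame 8000 did not (assumed 25 fps).
stamps = [t / 25.0 for t in range(1000, 8000, 1000)]
start, end = skip_segment(stamps, 8000 / 25.0)
assert (start, end) == (40.0, 320.0)      # seconds into the stream
```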
The prompt information adding unit 107 is configured to, upon receiving the first control signal, obtain current frame information in the current frame cache unit, add prompt information to the current frame information to obtain current frame adjustment information, and write back the current frame adjustment information to the current frame cache unit. The display control unit 109 is configured to obtain the current frame adjustment information and transmit the current frame adjustment information to the display unit 110 for displaying, and the display unit 110 is configured to display the current frame adjustment information.
In some embodiments, the prompt information is early-warning subtitle information, and the apparatus further includes a subtitle storage unit. In this case, the prompt information adding unit, upon receiving the first control signal, obtains the current frame information in the current frame buffer unit and adds prompt information to it as follows: it obtains the subtitle information in the subtitle storage unit, superimposes the early-warning subtitle information on it to obtain subtitle adjustment information, and adds the subtitle adjustment information at the corresponding position of the current frame information to obtain the current frame adjustment information. The corresponding position refers to the position of the subtitle in the current frame information, generally at the bottom of the image. Through the early-warning subtitle, the user learns in time that the video stream about to be played contains pictures matching the preset content (i.e., the prejudged frame information after the current frame information matches that content); if minors are among the viewers, the user can perform a frame skipping operation in time, preventing harm to the viewers' physical and mental health from watching pornographic or violent pictures.
Preferably, to improve the autonomy of user selection, when the preset content judgment unit judges that the prejudged frame information in the current prejudgment frame buffer unit matches the preset content, the video control unit is further configured to receive a frame skipping instruction and to refrain from sending the frame information in the frame skipping storage unit to the video decoding unit for parsing. The user can thus decide according to actual needs whether to apply frame skipping to the frame information stored in the frame skipping storage unit; frame skipping means that none of that frame information is sent to the video decoding unit for parsing, so the display unit naturally does not display it.
To improve the accuracy of the preset content matching judgment, in some embodiments the video decoding unit is further configured to parse the encoded frame following the first prejudged frame information to obtain its corresponding motion vector information and transmit it to the convolutional neural network circuit. Judging whether the first prejudged frame information matches the preset content then further comprises: the convolutional neural network circuit performs neural network recognition calculation on the motion vector information corresponding to the first prejudged frame information, calculates its matching degree with preset motion vector information, and judges whether the first prejudged frame information matches the preset content according to the combined matching results for the first prejudged frame information itself and its corresponding motion vector information.
The parameter storage unit 202 includes a weight and offset storage unit 2021, a first convolution kernel storage unit 2022 and a second convolution kernel storage unit 2023. The convolution kernels stored in the first convolution kernel storage unit are used to train on and recognize the prejudged frame information (i.e., the I frame), while those in the second convolution kernel storage unit are used to train on and recognize the frames following the prejudged frame information (i.e., the P frames after the I frame, which contain motion vector information). In short, judging from a single still image whether it belongs to pornographic or violent content is often not accurate enough; combining the judgment with the motion trend of the image greatly improves accuracy. Therefore, when the I frame is compared against the preset content, the 1 to 2 P frames after it are compared against the preset motion vector information; the motion vector information contained in a P frame is a parameter characterizing its motion trend, and using different convolution kernels for the prejudged frame information and for the motion vector information improves the judgment accuracy.
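The combined decision over the two recognition paths can be sketched as a weighted score. This is purely illustrative: the text does not specify how the two matching results are fused, so the weights, threshold and function name below are assumptions.

```python
def matches_preset(iframe_score, mv_score, w_img=0.6, w_mv=0.4,
                   threshold=0.5):
    """Fuse the I-frame recognition score with the motion-vector score
    (each produced with its own convolution kernels, per the text) into
    a single match decision. Weights and threshold are assumptions."""
    return w_img * iframe_score + w_mv * mv_score >= threshold

assert matches_preset(0.9, 0.8) is True    # image and motion both suspicious
assert matches_preset(0.6, 0.1) is False   # still image alone is not enough
```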
As shown in fig. 7, the inventor further provides a deep learning video decoding method based on content prediction, which is applied to a deep learning video decoding apparatus based on content prediction, and the apparatus includes a video control unit, a video decoding unit, a frame buffer unit, a convolutional neural network circuit, a preset content determination unit, a prompt information addition unit, a frame skipping storage unit, a display control unit, and a display unit; the frame buffer unit comprises a current frame buffer unit and a prejudgment frame buffer unit; the video control unit is connected with the video decoding unit, the video decoding unit is respectively connected with the current frame cache unit and the prejudgment frame cache unit, the current frame cache unit is connected with the display control unit, and the display control unit is connected with the display unit; the pre-judging frame buffer unit is connected with a convolutional neural network circuit, the convolutional neural network circuit is connected with a preset content judging unit, the preset content judging unit is respectively connected with a video control unit, a prompt information adding unit and a frame skipping storage unit, and the prompt information adding unit is connected with a current frame buffer unit; the method comprises the following steps:
In step S701, the video control unit first receives video stream information. The video stream information comprises multi-frame image information, the multi-frame image information comprises current frame information and first prejudged frame information, and the first prejudged frame information is the next key frame information corresponding to the current frame information.
Then, in step S702, the video decoding unit parses the current frame information and stores it in the current frame buffer unit, and parses the first prejudged frame information and stores it in the prejudgment frame buffer unit.
Then, in step S703, the convolutional neural network circuit obtains the first prejudged frame information in the prejudgment frame buffer unit and judges whether it matches the preset content. If so, in step S704 the preset content judgment unit sends a first control signal to the prompt information adding unit and a second control signal to the video control unit; otherwise, the process proceeds to step S710.
Then, in step S705, the video control unit receives the second control signal and sends second prejudged frame information to the video decoding unit for parsing, and the video decoding unit stores the parsed second prejudged frame information in the prejudgment frame buffer unit; the second prejudged frame information is the next key frame information corresponding to the first prejudged frame information.
Then, in step S706, the convolutional neural network circuit obtains the next prejudged frame information in the prejudgment frame buffer unit and judges whether it matches the preset content. If so, in step S707 the preset content judgment unit sends a first control signal to the prompt information adding unit and again sends a second control signal to the video control unit, and the process returns to step S705. If not, in step S710 the timestamp corresponding to the prejudged frame information currently in the prejudgment frame buffer unit is recorded, and the preset content judgment unit stores into the frame skipping storage unit that timestamp information together with all frame information between it and the timestamp of the prejudged frame information first judged to match the preset content.
When a certain piece of prejudged frame information is judged to match the preset content, the device further judges the next prejudged frame information. That is, when the judgment result of step S706 is yes, the device proceeds through step S707 and back into step S705; in parallel, after step S707 it also enters step S708, in which the prompt information adding unit, upon receiving the first control signal, obtains the current frame information in the current frame buffer unit, adds prompt information to it to obtain current frame adjustment information, and writes the adjustment information back to the current frame buffer unit. Then, in step S709, the display control unit obtains the current frame adjustment information and transmits it to the display unit, which displays it. Through this early warning, the user is informed in time that violent or pornographic pictures exist in the video stream about to be played and is reminded to take corresponding measures.
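The control flow of steps S701 to S710 can be condensed into one loop sketch. This is an illustrative software analogy under stated assumptions: `key_frames` is a hypothetical map from frame number to decoded content, `matches` stands in for the convolutional neural network circuit, and 25 fps is assumed for timestamps.

```python
def prejudge_loop(key_frames, matches, fps=25.0):
    """Walk successive key frames while they match the preset content
    (S705-S707); on the first non-matching key frame (S710), return
    whether a prompt was triggered and the (start, end) skip interval."""
    matched = []
    for frame_no in sorted(key_frames):
        if matches(key_frames[frame_no]):       # S703/S706: match found
            matched.append(frame_no)            # S704/S707: signal prompt unit
        elif matched:                           # S710: first clean key frame
            return True, (matched[0] / fps, frame_no / fps)
        else:                                   # first key frame already clean
            return False, None
    return bool(matched), None

frames = {1000: 'bad', 2000: 'bad', 3000: 'ok'}
prompt, skip = prejudge_loop(frames, lambda content: content == 'bad')
assert prompt is True and skip == (40.0, 120.0)  # skip seconds 40..120
```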
In some embodiments, the convolutional neural network circuit comprises a parameter initialization unit, an initial value storage unit, an access unit, a parameter storage unit, a reconfigurable network matrix unit, an error calculation unit and a back propagation write-back unit; the parameter storage unit stores parameter elements, the parameter elements comprise convolution kernels, weights and convolution offset values, and the access unit comprises a weight access unit, a convolution kernel access unit and a convolution offset access unit; before acquiring first pre-judgment frame information in the pre-judgment frame cache unit, the convolutional neural network circuit performs neural network training on preset content; the method specifically comprises the following steps:
the parameter initialization unit acquires each parameter initial value of the network structure from the initial value storage unit, controls the access unit to acquire each parameter element with corresponding quantity from the parameter storage unit according to the initial value of each parameter, and configures the reconfigurable network matrix unit;
the reconfigurable network matrix unit performs the neural network calculation according to the configured initial values of the parameter elements, and the error calculation unit judges whether the matching degree between the calculation result and the real information is greater than a preset matching degree. If so, training is judged complete, and the back propagation write-back unit updates the parameter values of the parameter elements to the current values and writes them into the parameter storage unit. Otherwise, the reconfigurable network matrix unit adjusts the configuration parameter values of each parameter element according to the difference between the matching degree of the current training result and that of the previous training result, writes the adjusted values into the parameter storage unit through the back propagation write-back unit, controls the access unit to fetch the corresponding number of each parameter element according to the adjusted values, and performs the neural network calculation again, until training is complete.
In certain embodiments, the method further comprises: the video decoding unit analyzes the next coding frame corresponding to the first pre-judging frame information to obtain motion vector information corresponding to the first pre-judging frame information, and transmits the motion vector information to the convolutional neural network circuit; the step of judging whether the first pre-judging frame information is matched with the preset content by the convolutional neural network circuit further comprises the following steps: and the convolutional neural network circuit performs neural network identification calculation on the motion vector information corresponding to the first pre-judging frame information, calculates the matching degree of the motion vector information with the preset motion vector information, and judges whether the first pre-judging frame information is matched with the preset content or not according to the first pre-judging frame information and the calculation result of the matching degree of the motion vector information corresponding to the first pre-judging frame information.
In some embodiments, when the predetermined content determining unit determines that the predetermined frame information in the current predetermined frame buffer unit matches the predetermined content, the method further includes: and the video control unit receives the frame skipping instruction and does not send all the frame information in the frame skipping storage unit to the video decoding unit for analysis.
In some embodiments, the prompt information is early warning subtitle information, and the apparatus further includes a subtitle storage unit; the prompt information adding unit acquires the current frame information in the current frame caching unit after receiving the first control signal, and adds prompt information to the current frame information to obtain current frame adjustment information, wherein the current frame adjustment information comprises: the prompt information adding unit acquires the subtitle information in the subtitle storage unit, superimposes early warning subtitle information on the subtitle information to obtain subtitle adjustment information, and adds the subtitle adjustment information to the corresponding position of the current frame information to obtain the current frame adjustment information.
In the above scheme, after the video stream information is received, the parsed current frame information is sent to the display screen for display; meanwhile, the next key frame information corresponding to the current frame information is parsed as the first prejudged frame information and sent to the convolutional neural network circuit for recognition and judgment. When the first prejudged frame information matches the preset content, on one hand prompt information is added to the current frame information so that the user is fully warned about the content to be played next; on the other hand the next prejudged frame information is extracted and matched against the preset content, and this cycle continues until the prejudged frame information sent to the convolutional neural network circuit no longer matches the preset content. At that point, the segment requiring frame skipping is determined from the timestamp of the current prejudged frame and the difference between that timestamp and the timestamp of the current frame, and is stored in the frame skipping storage unit. With this scheme, video content can be detected and prejudged in advance, preventing juveniles from being exposed to pornographic or violent content in videos, so the scheme has broad market prospects.
It should be noted that, although the embodiments above have been described herein, the invention is not limited to them. Changes and modifications made to the embodiments described herein on the basis of the innovative concept of the present invention, whether applied directly or indirectly to other related technical fields or realized through equivalent structures or equivalent processes based on the content of this specification and the accompanying drawings, all fall within the scope of protection of the present invention.
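Claims 2 and 7 below describe the training loop of the convolutional neural network circuit: configure the parameter elements, run a forward pass, compare the matching degree with a preset threshold, and, if the threshold is not reached, adjust the parameters from the difference between successive results and try again. The sketch below captures that stop-when-matched control flow with a simple accept-if-better perturbation loop; it is a hedged illustration of the control flow only, not the back-propagation write-back hardware the claims describe, and every name in it is an assumption.

```python
import random

def train_until_match(forward, params, target=0.95, step=0.2, max_iters=2000):
    """Repeat: evaluate the network (`forward` returns a matching degree in
    [0, 1]); once it exceeds the preset threshold, training is complete and
    the final parameters are 'written back' (returned). Otherwise perturb
    the parameters and keep the change only if the matching degree does not
    decrease -- a crude stand-in for the back-propagation write-back unit."""
    best = forward(params)
    for _ in range(max_iters):
        if best > target:
            return params, best               # training complete
        trial = [p + step * random.uniform(-1, 1) for p in params]
        score = forward(trial)
        if score >= best:                     # keep helpful adjustments
            params, best = trial, score
    return params, best                       # iteration budget exhausted
```

For example, with a toy matching function that peaks when the single parameter equals 1.0, the loop climbs until the matching degree exceeds the threshold and then stops, mirroring the "judge whether the matching degree is greater than the preset matching degree" step of the claims.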

Claims (10)

1. A deep learning video decoding device based on content prediction is characterized by comprising a video control unit, a video decoding unit, a frame buffer unit, a convolutional neural network circuit, a preset content judgment unit, a prompt information adding unit, a frame skipping storage unit, a display control unit and a display unit; the frame buffer unit comprises a current frame buffer unit and a prejudgment frame buffer unit; the video control unit is connected with the video decoding unit, the video decoding unit is respectively connected with the current frame cache unit and the prejudgment frame cache unit, the current frame cache unit is connected with the display control unit, and the display control unit is connected with the display unit; the pre-judging frame buffer unit is connected with a convolutional neural network circuit, the convolutional neural network circuit is connected with a preset content judging unit, the preset content judging unit is respectively connected with a video control unit, a prompt information adding unit and a frame skipping storage unit, and the prompt information adding unit is connected with a current frame buffer unit;
the video control unit is used for receiving video stream information, wherein the video stream information comprises multi-frame image information, the multi-frame image information comprises current frame information and first prejudgment frame information, and the first prejudgment frame information is next key frame information corresponding to the current frame information;
the video decoding unit is used for analyzing the current frame information, storing the analyzed current frame information into the current frame cache unit, analyzing the first pre-judging frame information and storing the analyzed first pre-judging frame information into the pre-judging frame cache unit;
the convolutional neural network circuit is used for acquiring the first pre-judging frame information in the pre-judging frame cache unit and judging whether the first pre-judging frame information is matched with preset content; if so, the preset content judging unit is used for sending a first control signal to the prompt information adding unit and sending a second control signal to the video control unit; the preset content is set image information needing frame skipping;
the video control unit is used for receiving a second control signal and sending second pre-judging frame information to the video decoding unit for analysis, and the video decoding unit is used for storing the analyzed second pre-judging frame information into the pre-judging frame cache unit; the second pre-judging frame information is the next key frame information corresponding to the first pre-judging frame information;
the convolutional neural network circuit is used for acquiring second pre-judging frame information in the pre-judging frame cache unit, judging whether the second pre-judging frame information is matched with preset content or not, and if the second pre-judging frame information is matched with the preset content, the preset content judgment unit is used for sending a first control signal to the prompt information adding unit and sending a second control signal to the video control unit again; repeating the steps until the convolutional neural network circuit judges that the pre-judging frame information in the pre-judging frame cache unit is not matched with the preset content, and recording a time stamp corresponding to the pre-judging frame information in the pre-judging frame cache unit at the moment; the preset content judging unit is used for storing the timestamp information corresponding to the prejudgment frame information in the current prejudgment frame cache unit and all the frame information between the timestamp and the timestamp of the prejudgment frame information which is judged to be matched with the preset content for the first time into the frame skipping storage unit;
the prompt information adding unit is used for acquiring current frame information in the current frame cache unit after receiving the first control signal, adding prompt information to the current frame information to obtain current frame adjustment information, and writing the current frame adjustment information back to the current frame cache unit;
the display control unit is used for acquiring the current frame adjustment information and transmitting the current frame adjustment information to the display unit for displaying, and the display unit is used for displaying the current frame adjustment information.
2. The content prediction-based deep learning video decoding device according to claim 1, wherein the convolutional neural network circuit comprises a parameter initialization unit, an initial value storage unit, an access unit, a parameter storage unit, a reconfigurable network matrix unit, an error calculation unit, and a back propagation write-back unit; the parameter storage unit stores parameter elements, the parameter elements comprise convolution kernels, weights and convolution offset values, and the access unit comprises a weight access unit, a convolution kernel access unit and a convolution offset access unit;
the convolutional neural network circuit is also used for carrying out neural network training on preset content before acquiring first pre-judging frame information in the pre-judging frame cache unit; the method specifically comprises the following steps:
the parameter initialization unit is used for acquiring each parameter initial value of the network structure from the initial value storage unit, controlling the access unit to acquire each parameter element with corresponding quantity from the parameter storage unit according to the initial value of each parameter, and configuring the reconfigurable network matrix unit;
the reconfigurable network matrix unit is used for carrying out neural network calculation according to the initial values of the configured parameter elements, and the error calculation unit is used for judging whether the matching degree of the calculation result and the real information is greater than the preset matching degree; if so, judging that the training is finished, and updating the parameter values of the parameter elements to the current parameter values by the back propagation write-back unit and writing the updated parameter values into the parameter storage unit; otherwise, the reconfigurable network matrix unit is used for adjusting the configuration parameter values of all the parameter elements according to the difference between the matching degree of the current training result and the matching degree of the last training result, writing the adjusted parameter values into the parameter storage unit through the back propagation write-back unit, and controlling the access unit to obtain the corresponding number of all the parameter elements according to the adjusted parameter values and perform neural network calculation again until the training is completed;
the real information refers to characteristic information input to the reconfigurable network matrix unit, namely preset content information.
3. The deep learning video decoding device based on content prediction according to claim 1 or 2, wherein the video decoding unit is further configured to parse a next encoded frame corresponding to the first pre-determined frame information to obtain motion vector information corresponding to the first pre-determined frame information, and transmit the motion vector information to the convolutional neural network circuit;
the convolutional neural network circuit is used for judging whether the first pre-judging frame information is matched with the preset content or not, and further comprises: the convolution neural network circuit is used for carrying out neural network identification calculation on the motion vector information corresponding to the first pre-judging frame information, calculating the matching degree of the motion vector information with the preset motion vector information, and judging whether the first pre-judging frame information is matched with the preset content or not according to the first pre-judging frame information and the calculation result of the matching degree of the motion vector information corresponding to the first pre-judging frame information.
4. The deep learning video decoding device based on content prediction according to claim 1 or 2, wherein, when the preset content judging unit judges that the pre-judging frame information in the current pre-judging frame cache unit is matched with the preset content, the video control unit is further configured to receive a frame skipping instruction and, upon receiving it, not send the frame information stored in the frame skipping storage unit to the video decoding unit for parsing.
5. The deep learning video decoding apparatus based on content prediction according to claim 1 or 2, wherein the cue information is pre-warning subtitle information, the apparatus further comprising a subtitle storage unit; the prompt information adding unit is used for acquiring the current frame information in the current frame caching unit after receiving the first control signal, and adding prompt information to the current frame information to obtain current frame adjustment information, wherein the current frame adjustment information comprises:
the prompt information adding unit is used for acquiring the subtitle information in the subtitle storage unit, superposing the early warning subtitle information on the subtitle information to obtain subtitle adjustment information, and adding the subtitle adjustment information to the corresponding position of the current frame information to obtain the current frame adjustment information.
6. A deep learning video decoding method based on content prediction is characterized in that the method is applied to a deep learning video decoding device based on content prediction, and the device comprises a video control unit, a video decoding unit, a frame buffer unit, a convolutional neural network circuit, a preset content judgment unit, a prompt information adding unit, a frame skipping storage unit, a display control unit and a display unit; the frame buffer unit comprises a current frame buffer unit and a prejudgment frame buffer unit; the video control unit is connected with the video decoding unit, the video decoding unit is respectively connected with the current frame cache unit and the prejudgment frame cache unit, the current frame cache unit is connected with the display control unit, and the display control unit is connected with the display unit; the pre-judging frame buffer unit is connected with a convolutional neural network circuit, the convolutional neural network circuit is connected with a preset content judging unit, the preset content judging unit is respectively connected with a video control unit, a prompt information adding unit and a frame skipping storage unit, and the prompt information adding unit is connected with a current frame buffer unit; the method comprises the following steps:
the method comprises the steps that a video control unit receives video stream information, wherein the video stream information comprises multi-frame image information, the multi-frame image information comprises current frame information and first prejudging frame information, and the first prejudging frame information is next key frame information corresponding to the current frame information;
the video decoding unit analyzes the current frame information, stores the analyzed current frame information into a current frame cache unit, analyzes the first pre-judging frame information, and stores the analyzed first pre-judging frame information into a pre-judging frame cache unit;
the convolutional neural network circuit acquires the first pre-judging frame information in the pre-judging frame cache unit and judges whether the first pre-judging frame information is matched with preset content; if so, the preset content judgment unit sends a first control signal to the prompt information adding unit and sends a second control signal to the video control unit; the preset content is set image information needing frame skipping;
the video control unit receives the second control signal and sends the second pre-judging frame information to the video decoding unit for analysis, and the video decoding unit stores the analyzed second pre-judging frame information into the pre-judging frame cache unit; the second pre-judging frame information is the next key frame information corresponding to the first pre-judging frame information;
the convolutional neural network circuit acquires second pre-judgment frame information in the pre-judgment frame cache unit, judges whether the second pre-judgment frame information is matched with preset content or not, and if the second pre-judgment frame information is matched with the preset content, the preset content judgment unit sends a first control signal to the prompt information adding unit and sends a second control signal to the video control unit again; repeating the steps until the convolutional neural network circuit judges that the pre-judging frame information in the pre-judging frame cache unit is not matched with the preset content, and recording a time stamp corresponding to the pre-judging frame information in the pre-judging frame cache unit at the moment; the preset content judging unit stores the timestamp information corresponding to the pre-judging frame information in the current pre-judging frame cache unit and all frame information between the timestamp and the timestamp of the pre-judging frame information which is judged to be matched with the preset content for the first time into the frame skipping storage unit;
the prompt information adding unit acquires current frame information in the current frame cache unit after receiving the first control signal, adds prompt information to the current frame information to obtain current frame adjustment information, and writes the current frame adjustment information back to the current frame cache unit;
the display control unit acquires the current frame adjustment information and transmits the current frame adjustment information to the display unit for displaying, and the display unit displays the current frame adjustment information.
7. The content prediction-based deep learning video decoding method according to claim 6, wherein the convolutional neural network circuit comprises a parameter initialization unit, an initial value storage unit, an access unit, a parameter storage unit, a reconfigurable network matrix unit, an error calculation unit and a back propagation write-back unit; the parameter storage unit stores parameter elements, the parameter elements comprise convolution kernels, weights and convolution offset values, and the access unit comprises a weight access unit, a convolution kernel access unit and a convolution offset access unit; before acquiring first pre-judgment frame information in the pre-judgment frame cache unit, the convolutional neural network circuit performs neural network training on preset content; the method specifically comprises the following steps:
the parameter initialization unit acquires each parameter initial value of the network structure from the initial value storage unit, controls the access unit to acquire each parameter element with corresponding quantity from the parameter storage unit according to the initial value of each parameter, and configures the reconfigurable network matrix unit;
the reconfigurable network matrix unit carries out neural network calculation according to the initial values of the configured parameter elements; the error calculation unit judges whether the matching degree of the calculation result and the real information is greater than the preset matching degree, if so, the training is judged to be completed, the back propagation write-back unit updates the parameter values of the parameter elements to the current parameter values, and writes the updated parameter values into the parameter storage unit; otherwise, the reconfigurable network matrix unit adjusts the configuration parameter values of all parameter elements according to the difference between the matching degree of the current training result and the matching degree of the last training result, writes the adjusted parameter values into the parameter storage unit through the back propagation write-back unit, and controls the access unit to obtain the corresponding number of all parameter elements according to the adjusted parameter values and perform neural network calculation again until the training is completed;
the real information refers to characteristic information input to the reconfigurable network matrix unit, namely preset content information.
8. The method for content prediction based deep learning video decoding according to claim 6 or 7, wherein the method further comprises:
the video decoding unit analyzes the next coding frame corresponding to the first pre-judging frame information to obtain motion vector information corresponding to the first pre-judging frame information, and transmits the motion vector information to the convolutional neural network circuit;
the step of judging whether the first pre-judging frame information is matched with the preset content by the convolutional neural network circuit further comprises the following steps: and the convolutional neural network circuit performs neural network identification calculation on the motion vector information corresponding to the first pre-judging frame information, calculates the matching degree of the motion vector information with the preset motion vector information, and judges whether the first pre-judging frame information is matched with the preset content or not according to the first pre-judging frame information and the calculation result of the matching degree of the motion vector information corresponding to the first pre-judging frame information.
9. The method for decoding deep learning video based on content prediction according to claim 6 or 7, wherein when the predetermined content decision unit decides that the predicted frame information in the current predicted frame buffer unit matches the predetermined content, the method further comprises: and the video control unit receives the frame skipping instruction and does not send all the frame information in the frame skipping storage unit to the video decoding unit for analysis.
10. The method for decoding deep learning video based on content prediction according to claim 6 or 7, wherein the prompt information is early warning caption information, the apparatus further comprises a caption storage unit; the prompt information adding unit acquires the current frame information in the current frame caching unit after receiving the first control signal, and adds prompt information to the current frame information to obtain current frame adjustment information, wherein the current frame adjustment information comprises:
the prompt information adding unit acquires the subtitle information in the subtitle storage unit, superimposes early warning subtitle information on the subtitle information to obtain subtitle adjustment information, and adds the subtitle adjustment information to the corresponding position of the current frame information to obtain the current frame adjustment information.
CN201810048036.5A 2018-01-18 2018-01-18 Deep learning video decoding method and device based on content prediction Active CN108289248B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810048036.5A CN108289248B (en) 2018-01-18 2018-01-18 Deep learning video decoding method and device based on content prediction


Publications (2)

Publication Number Publication Date
CN108289248A CN108289248A (en) 2018-07-17
CN108289248B (en) 2020-05-15

Family

ID=62835309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810048036.5A Active CN108289248B (en) 2018-01-18 2018-01-18 Deep learning video decoding method and device based on content prediction

Country Status (1)

Country Link
CN (1) CN108289248B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165574B (en) * 2018-08-03 2022-09-16 百度在线网络技术(北京)有限公司 Video detection method and device
CN109409299A (en) * 2018-10-30 2019-03-01 盯盯拍(深圳)云技术有限公司 Image-recognizing method and pattern recognition device
CN112199982B (en) * 2020-07-03 2022-06-17 桂林理工大学 Intelligent home system based on deep learning
CN112084949B (en) * 2020-09-10 2022-07-19 上海交通大学 Video real-time identification segmentation and detection method and device
CN112738525B (en) * 2020-12-11 2023-06-27 深圳万兴软件有限公司 Video processing method, apparatus and computer readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105191329A (en) * 2013-03-06 2015-12-23 交互数字专利控股公司 Power aware adaptation for video streaming
CN105844239A (en) * 2016-03-23 2016-08-10 北京邮电大学 Method for detecting riot and terror videos based on CNN and LSTM
CN105893930A (en) * 2015-12-29 2016-08-24 乐视云计算有限公司 Video feature identification method and device
CN106778686A (en) * 2017-01-12 2017-05-31 深圳职业技术学院 A kind of copy video detecting method and system based on deep learning and graph theory
CN107506756A (en) * 2017-09-26 2017-12-22 北京航空航天大学 A kind of human motion recognition method based on Gabor filter Three dimensional convolution neural network model

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10572735B2 (en) * 2015-03-31 2020-02-25 Beijing Shunyuan Kaihua Technology Limited Detect sports video highlights for mobile computing devices
US10181195B2 (en) * 2015-12-28 2019-01-15 Facebook, Inc. Systems and methods for determining optical flow


Non-Patent Citations (2)

Title
Brilian Tafjira Nugraha et al. Towards self-driving car using convolutional neural network and road lane detector. 2017 2nd International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology (ICACOMIT), 2018. *
杨朝欢. Duplicate video detection based on deep learning. China Master's Theses Full-text Database, Information Science and Technology, July 15, 2016. *

Also Published As

Publication number Publication date
CN108289248A (en) 2018-07-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 350003 building, No. 89, software Avenue, Gulou District, Fujian, Fuzhou 18, China

Patentee after: Ruixin Microelectronics Co., Ltd

Address before: 350003 building, No. 89, software Avenue, Gulou District, Fujian, Fuzhou 18, China

Patentee before: Fuzhou Rockchips Electronics Co.,Ltd.