WO2023093339A1 - Video processing method and apparatus based on intelligent digital retina - Google Patents

Video processing method and apparatus based on intelligent digital retina Download PDF

Info

Publication number
WO2023093339A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/124876
Other languages
French (fr)
Chinese (zh)
Inventor
滕波
王琪
向国庆
周东东
洪一帆
张羿
焦立欣
Original Assignee
浙江智慧视频安防创新中心有限公司
Application filed by 浙江智慧视频安防创新中心有限公司
Publication of WO2023093339A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 — characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/70 — characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the invention relates to the technical field of video processing, in particular to a video processing method and device based on an intelligent digital retina.
  • digital retina technology was the first to propose an intelligent image sensor integrating video compression and video analysis.
  • the digital retina is characterized by the ability to obtain video compression data and video feature data at the same time, and transmit them to the cloud through data streams for later playback and retrieval.
  • the digital retina technology introduces the concept of a model stream, which means that the image acquisition front end can apply different feature extraction models according to its needs, and these models can be stored in the cloud and sent back to the image acquisition front end.
  • because the digital retina framework fuses the video-related aspects of feature recognition and data compression, it creates a new paradigm in which a technique can no longer be evaluated by a single-target comprehensive evaluation method. This is also a valuable insight drawn from the biological structure of the retina.
  • the retina is not simply transmitting or compressing image data, but an intelligent front-end device that serves various complex tasks of the brain.
  • on the one hand, the cloud server needs to store video stream data; on the other hand, it also needs to store feature stream data; and at the same time, it needs to store model data.
  • the embodiment of the present application provides a video processing method based on intelligent digital retina, the method comprising:
  • the video stream and the corresponding feature stream are divided into time slices according to a preset division method to obtain corresponding time slice division results; the time slice division results include the timestamp, the corresponding video data slice, and the corresponding feature data slice for each time slice;
  • the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, so as to obtain the processed video stream and the corresponding feature stream.
  • the data slices to be deleted include the first video data slice to be deleted in the target time window and the first feature data slice to be deleted in the target time window;
  • determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice includes:
  • determining and deleting the first video data slice to be deleted and the first feature data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice.
  • when the data slices to be deleted include the second video data slice to be deleted in the target time window, determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice includes:
  • determining and deleting the second video data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice.
  • the method also includes:
  • the video data pieces to be deleted in the target time window are deleted according to a preset deletion manner.
  • deleting the video data slices to be deleted in the target time window according to the preset deletion method includes:
  • the feature data slices of the target time window and the reconstruction feature data are stored.
  • the method also includes:
  • based on the video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, reconstruction processing is performed on the deleted data slices in the target time window to generate corresponding reconstructed video data.
  • the method also includes:
  • matching the corresponding video reconstruction model based on the type of the depth model includes:
  • the matched corresponding video reconstruction model is a reconstructed depth model with a second preset resolution range
  • the matched corresponding video reconstruction model is a decoder of an autoencoder
  • the depth model is a model for extracting reconstruction features based on a generative adversarial model
  • the corresponding video reconstruction model that is matched is a generative adversarial network.
  • the embodiment of the present application provides a video processing device based on intelligent digital retina, the device includes:
  • An acquisition module configured to acquire video streams and corresponding feature streams
  • a division module configured to divide the video stream and the corresponding feature stream acquired by the acquisition module into time slices according to a preset division method to obtain corresponding time slice division results, the time slice division results including each time slice Corresponding timestamp, corresponding video data slice and corresponding feature data slice;
  • An association analysis module for associating and analyzing each time slice in the time slice division result obtained by the division module with the number of searches and/or the number of playbacks, to obtain the number of times of attention for each time slice;
  • a determination and deletion module, configured to determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice obtained by the association analysis module;
  • the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, and a processed video stream and corresponding feature stream are obtained.
  • the embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and operable on the processor; the processor runs the computer program to realize the method steps described above.
  • the embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the program is executed by a processor to implement the above-mentioned method steps.
  • the video stream and the corresponding feature stream are obtained; the video stream and the corresponding feature stream are divided into time slices according to the preset division method, and the corresponding time slice division results are obtained.
  • the time slice division results include the timestamp, the video data slice, and the feature data slice corresponding to each time slice. Each time slice in the time slice division results is associated and analyzed with the number of searches and/or playbacks to obtain an attention count for each time slice. Then, according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice, the data slices to be deleted in the target time window are determined and deleted; the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, so as to obtain the processed video stream and the corresponding feature stream.
  • the video processing method based on the intelligent digital retina provided by the embodiment of the present application can accurately determine and delete the data to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice. In this way, the storage overhead in intelligent digital retina-based video processing can be effectively reduced.
  • FIG. 1 is a schematic flow diagram of a video processing method based on an intelligent digital retina provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a storage judgment working mechanism in a specific application scenario provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of video data slices and feature data slices corresponding to time windows in a specific application scenario provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of a deleted video data piece and a deleted feature data piece corresponding to a time window in a specific application scenario provided by an embodiment of the present application;
  • Fig. 5 is a schematic diagram of storing and reconstructing feature data and storing feature data slices in a specific application scenario provided by an embodiment of the present application;
  • Fig. 6 is a schematic structural diagram of a video processing device based on an intelligent digital retina provided by an embodiment of the present application
  • Fig. 7 shows a schematic diagram of a connection structure of an electronic device according to an embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a video processing method based on an intelligent digital retina provided by an embodiment of the present application; as shown in FIG. 1, the method specifically includes the following steps:
  • the video processing method provided in the embodiment of the present application is based on the intelligent digital retina technology, and the principle of the intelligent digital retina technology is as follows:
  • front-end devices carry both video compression and deep models for video feature extraction. Since the back end can deploy different models to the front end through transmission, the front-end device can be understood as being able to adaptively acquire any depth model. Therefore, as long as a model with special feature extraction capabilities is trained offline, it can be deployed to the front-end device through the intelligent digital retina model stream.
  • the main purpose of the feature stream is to perform image retrieval. After the user obtains the retrieval results, a common linkage requirement is to play back the corresponding images or videos.
  • a large city has millions of front-end acquisition devices.
  • although cloud servers and high-speed communication networks can realize data transmission and cloud processing, storage space is still a bottleneck.
  • the amygdala stores data based on the importance of events; the memory data mode of the cerebellum is based on implicit information rather than direct information; the prefrontal cortex is responsible for processing and memorizing processed semantic information. In addition, the brain also performs long-term and short-term memory conversion, and this work is done through the hippocampus.
  • the intelligent digital retina used in the video processing method provided by the embodiment of the present application can not only provide simultaneous acquisition and transmission of video streams and feature streams, but also store data and utilize newly added feature stream data.
  • S104: Divide the video stream and the corresponding feature stream into time slices according to the preset division method to obtain the corresponding time slice division results.
  • the time slice division results include the timestamp, the corresponding video data slice, and the corresponding feature data slice for each time slice.
  • the preset division method may be: the time slices are divided into slices of equal length.
  • alternatively, the preset division method may be: the time slices are divided according to encoded GOP (group of pictures) segments.
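The equal-length division described above can be sketched as follows. This is an illustrative Python sketch, not code from the application; the frame representation and all names are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class TimeSlice:
    timestamp: float    # start time of the slice
    video_data: list    # video frames falling in this interval
    feature_data: list  # feature vectors falling in this interval

def divide_equal_length(video_frames, feature_frames, duration, slice_len):
    """Divide a video stream and its feature stream into equal-length time
    slices. Frames are (time, payload) pairs -- a stand-in for encoded data."""
    slices = []
    t = 0.0
    while t < duration:
        end = t + slice_len
        slices.append(TimeSlice(
            timestamp=t,
            video_data=[p for ts, p in video_frames if t <= ts < end],
            feature_data=[p for ts, p in feature_frames if t <= ts < end],
        ))
        t = end
    return slices
```

A GOP-based division would instead cut at the boundaries of encoded groups of pictures; only the slicing criterion changes, the per-slice result (timestamp, video data slice, feature data slice) stays the same.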
  • S106: Associate each time slice in the time slice division results with the number of searches and/or playbacks to obtain an attention count for each time slice.
  • determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice; the data slices to be deleted include the video data slices to be deleted and/or the feature data slices to be deleted, to obtain the processed video stream and the corresponding feature stream.
  • when the data slices to be deleted include the first video data slice to be deleted in the target time window and the first feature data slice to be deleted in the target time window, determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice includes the following steps:
  • when the data slices to be deleted include the second video data slice to be deleted in the target time window, determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice includes the following steps:
  • FIG. 2 is a schematic diagram of a storage judgment working mechanism in a specific application scenario provided by the embodiment of the present application.
  • a search engine is used to provide users with video retrieval.
  • the search engine matches search results from the feature storage and feeds them back to the user. If the user further needs video playback, the playback engine decodes the video stream information according to the timestamp information and plays the video.
  • a storage decision module is provided, which generates decisions to delete or retain feature data and video stream data according to the search results of the search engine and/or the playback results of the playback engine. This is because the search engine receives the user's feature input, which means that the searched feature is one the user pays attention to.
  • the user operations of the playback engine are likewise highly correlated with the content the user cares about. Therefore, the storage decision module judges which features and video data to preserve based on the user's feedback.
  • the time slices are divided into time slices of equal length.
  • the time slice is divided according to encoded GOP segments.
  • each time slice is associated with its number of searches and playbacks to obtain an attention count corresponding to each time slice.
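The association step can be sketched as follows. This is illustrative Python; summing the two counts is one possible aggregation — the embodiment does not fix a formula, so the names and the sum are assumptions.

```python
def attention_counts(slices, search_log, playback_log):
    """Compute an attention count per time slice: the number of search hits
    plus the number of playbacks whose timestamps fall inside the slice.
    `slices` is a list of (start, end) pairs; the logs are timestamp lists."""
    counts = {}
    for start, end in slices:
        searches = sum(1 for t in search_log if start <= t < end)
        playbacks = sum(1 for t in playback_log if start <= t < end)
        counts[start] = searches + playbacks
    return counts
```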
  • the amount of data to be deleted is calculated, and data slices are deleted in order of attention count from low to high.
  • the time window is updated periodically, therefore, data deletion is performed periodically.
  • each video data slice corresponds to the feature data slice of the same time period and is associated with an attention count. Note that the attention counts corresponding to the video data slice and the feature data slice of the same time period are not necessarily equal.
  • the time window covers the video data slices and feature data slices from timestamp 3 to timestamp 8.
  • in an example, the storage decision module will delete the video data slice corresponding to timestamp 4 and the feature data slice corresponding to timestamp 5. Specifically, suppose the total amount of data in the time window is D_t and the maximum allocatable storage space is D_max; when D_t exceeds D_max, the amount of data to be deleted is D = D_t − D_max.
  • the deletion judgment calculates and deletes data slices according to D. The result can be directly calculated according to the attention counts in Figure 2, and the video processing method provided in the embodiment of the present application allows any algorithm to be used to calculate the data slices to be deleted.
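The deletion judgment above — free D = D_t − D_max by removing slices in ascending order of attention count — can be sketched with a simple greedy pass. The embodiment allows any algorithm here; this Python sketch and its names are illustrative only.

```python
def select_slices_to_delete(slices, d_max):
    """Greedy deletion decision. `slices` maps timestamp -> (size, attention).
    If the total size D_t exceeds the allocatable storage D_max, free at
    least D = D_t - D_max by deleting slices in ascending attention order."""
    d_t = sum(size for size, _ in slices.values())
    to_free = d_t - d_max
    if to_free <= 0:
        return []  # the window already fits; nothing to delete
    deleted, freed = [], 0
    # sort timestamps by attention count, lowest attention first
    for ts, (size, _att) in sorted(slices.items(), key=lambda kv: kv[1][1]):
        if freed >= to_free:
            break
        deleted.append(ts)
        freed += size
    return deleted
```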
  • the deleted data is shown in Figure 4. Since the amount of feature data depends on the feature extraction model used, in some cases the amount of data in the feature stream is very small, and deleting the corresponding feature data slices does not save much storage space. Therefore, in one embodiment, the deletion process may be performed on only the video data slices or only the feature data slices.
  • the video processing method provided in the embodiment of the present application further includes the following steps:
  • the video data pieces to be deleted in the target time window are deleted according to a preset deletion method.
  • the deletion of video or feature data is a direct deletion, that is, all related data will be discarded.
  • although this method is relatively easy to implement, it has some disadvantages.
  • the distribution of attention counts cannot represent all potential user needs: once a data slice is deleted, the user can no longer get any feedback for the video within that timestamp. Therefore, if other methods can satisfy and respond to such low-probability needs, it will bring a qualitative breakthrough compared with traditional store-and-delete methods. While storage resources in the cloud are limited, computing resources can be coordinated at any time. Therefore, in one embodiment, data deletion adopts an "incomplete" deletion method.
  • the preset deletion method is the "incomplete" deletion method, and the specific steps are as follows:
  • S1: Calculate the video data slices to be deleted according to the time window and the allocated storage resources.
  • S2: Delete the video data to be deleted, and generate feature data that can be used to reconstruct the video data.
  • S3: Retain the feature data slices and the reconstruction feature data generated in S2.
  • the video data slice in Figure 4 may be a closed GOP, which means that deleting a video data slice deletes not only the B frames and P frames with small amounts of data but also the I frame with a large amount of data. Since a closed GOP is encoded independently, all video data in the data slice will completely disappear. Therefore, the feature data described in S2 does not depend on any block-based coded video data. In a more feasible method, the feature data for reconstructing the video data in S2 is obtained through a deep learning model.
  • Figure 5 shows a schematic diagram of the above process. Based on the storage decision, a video data slice is input to a deep model that extracts reconstruction feature data, while the original encoded data is discarded. Ultimately, only the reconstruction feature data and the feature data slices are kept on the storage side.
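The S1–S3 "incomplete" deletion flow of Figure 5 can be sketched as follows. This is illustrative Python; a real system would replace the callable with a trained deep model for reconstruction-feature extraction.

```python
def incomplete_delete(video_slice, feature_slice, extract_features):
    """'Incomplete' deletion (steps S1-S3): run the video data slice through
    a model that extracts compact reconstruction features, discard the
    encoded video, and retain the feature data slice plus the reconstruction
    features. `extract_features` stands in for a trained deep model."""
    reconstruction_features = extract_features(video_slice)  # S2: extract
    video_slice = None  # S2: the encoded video data is discarded
    return feature_slice, reconstruction_features  # S3: retain both
```

The reconstruction features are typically far smaller than the closed GOP they replace, which is what lets the time window stay within its storage budget while still allowing approximate playback later.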
  • deleting the video data piece to be deleted in the target time window according to a preset deletion method includes the following steps:
  • the video processing method provided in the embodiment of the present application further includes the following steps:
  • the video processing method provided in the embodiment of the present application further includes the following steps:
  • matching the corresponding video reconstruction model based on the type of the depth model includes the following steps:
  • the depth model is a model that generates an image with a first preset resolution range
  • the corresponding video reconstruction model that is matched is a reconstructed depth model with a second preset resolution range.
  • the first preset resolution range is often an ultra-low resolution range
  • the corresponding second preset resolution range of the reconstructed depth model is an ultra-high resolution range.
  • the depth model is a model capable of generating ultra-low-resolution images;
  • the generated ultra-low-resolution images are obtained by coding consecutive images in a residual-based coding manner.
  • matching the corresponding video reconstruction model based on the type of the depth model includes the following steps:
  • the matched corresponding video reconstruction model is a decoder of an autoencoder.
  • the depth model is a feature extraction model, for example, the encoder of an autoencoder
  • the corresponding video reconstruction model is a decoder of an auto-encoder
  • matching the corresponding video reconstruction model based on the type of the depth model includes the following steps:
  • the depth model is a model that extracts reconstruction features based on a generative adversarial model
  • the corresponding video reconstruction model that is matched is a generative adversarial network.
  • the depth model is a model for reconstruction feature extraction based on a generative adversarial model;
  • the feature extraction model is mainly used to extract memorable features such as human skeleton features and appearance attribute information;
  • the corresponding video reconstruction model is a generative adversarial network (GAN).
  • the GAN takes feature values as input and reconstructs video data according to a trained generative model.
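The one-to-one matching between depth-model types and reconstruction models described above can be sketched as a simple lookup. This is illustrative Python; the type names are placeholders, not identifiers from the application.

```python
# Mapping from the depth-model type used at deletion time to the video
# reconstruction model used at playback time, mirroring the three matched
# pairs in the description.
RECONSTRUCTION_MATCH = {
    "low_res_generator": "super_resolution_model",  # ultra-low-res -> ultra-high-res
    "autoencoder_encoder": "autoencoder_decoder",   # encoder -> matching decoder
    "gan_feature_extractor": "gan_generator",       # GAN features -> generative model
}

def match_reconstruction_model(depth_model_type):
    """Return the reconstruction model type matched one-to-one to the given
    depth-model type, as in the correspondence described for Fig. 5."""
    try:
        return RECONSTRUCTION_MATCH[depth_model_type]
    except KeyError:
        raise ValueError(f"no reconstruction model for {depth_model_type!r}")
```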
  • the storage decision unit combined with the depth model can still guarantee the storage space consumption within the time window.
  • since the feature data slices are completely preserved, the user may still have a playback requirement for the deleted video data slices.
  • the playback engine will utilize the depth model to reconstruct the deleted data from the reconstructed feature data.
  • the depth model used for video reconstruction corresponds one-to-one with the depth model used for reconstruction feature extraction in Fig. 5.
  • the video stream and the corresponding feature stream are obtained; the video stream and the corresponding feature stream are divided into time slices according to the preset division method, and the corresponding time slice division results are obtained.
  • the time slice division results include the timestamp, the video data slice, and the feature data slice corresponding to each time slice. Each time slice in the time slice division results is associated and analyzed with the number of searches and/or playbacks to obtain an attention count for each time slice. Then, according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice, the data slices to be deleted in the target time window are determined and deleted; the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, so as to obtain the processed video stream and the corresponding feature stream.
  • the intelligent digital retina-based video processing method provided in the embodiment of the present application can accurately determine and delete the data to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice. In this way, the storage overhead in intelligent digital retina-based video processing can be effectively reduced.
  • the following is an embodiment of video processing based on intelligent digital retina in the embodiment of the present application, which can be used to implement the embodiment of the video processing method based on intelligent digital retina in the embodiment of the present application.
  • the details not disclosed in the embodiment of the intelligent digital retina-based video processing device in the embodiment of the present application please refer to the embodiment of the intelligent digital retina-based video processing method in the embodiment of the present application.
  • FIG. 6 shows a schematic structural diagram of an intelligent digital retina-based video processing device provided by an exemplary embodiment of the present invention.
  • the intelligent digital retina-based video processing device can be implemented as all or a part of the terminal through software, hardware or a combination of the two.
  • the intelligent digital retina-based video processing device includes an acquisition module 602 , a division module 604 , an association analysis module 606 and a determination and deletion module 608 .
  • the acquisition module 602 is configured to acquire video streams and corresponding feature streams
  • the division module 604 is configured to divide the video stream and the corresponding feature stream acquired by the acquisition module 602 into time slices according to a preset division method, and obtain corresponding time slice division results; the time slice division results include the timestamp, the corresponding video data slice, and the corresponding feature data slice for each time slice;
  • the association analysis module 606 is used to associate and analyze each time slice in the time slice division result obtained by the division module 604 with the search quantity and/or playback quantity, and obtain the number of times of attention of each time slice;
  • the determination and deletion module 608 is configured to determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice obtained by the association analysis module 606; the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, and a processed video stream and corresponding feature stream are obtained.
  • the data slices to be deleted include the first video data slice to be deleted of the target time window and the first feature data slice to be deleted of the target time window, and the determination and deletion module 608 is used for:
  • the data slice to be deleted includes the second video data slice to be deleted in the target time window, and the determination and deletion module 608 is used for:
  • the device also includes:
  • a deletion module (not shown in FIG. 6 ), configured to delete the video data pieces to be deleted in the target time window according to a preset deletion method.
  • the obtaining module 602 is also used to: obtain the video reconstruction model, the characteristic data slice of the target time window and the reconstructed characteristic data;
  • the device also includes:
  • the video data reconstruction module (not shown in Fig. 6) is configured to perform reconstruction processing on the deleted data slices in the target time window based on the video reconstruction model obtained by the acquisition module 602, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, to generate corresponding reconstructed video data.
  • the device also includes:
  • the reconstruction model matching module (not shown in FIG. 6 ) is configured to match the corresponding video reconstruction model based on the type of the depth model.
  • the reconstruction model matching module is specifically used for:
  • the depth model is a model that generates an image with a first preset resolution range
  • the corresponding video reconstruction model that is matched is a reconstructed depth model with a second preset resolution range
  • the matched corresponding video reconstruction model is a decoder of an autoencoder
  • the depth model is a model that extracts reconstruction features based on a generative adversarial model
  • the corresponding video reconstruction model that is matched is a generative adversarial network.
  • when the intelligent digital retina-based video processing device provided in the above embodiments executes the intelligent digital retina-based video processing method, the division of the above functional units is used only as an example for illustration. In practical applications, the above functions may be allocated to different functional units as required; that is, the internal structure of the device may be divided into different functional units to complete all or part of the functions described above.
  • the intelligent digital retina-based video processing device and the intelligent digital retina-based video processing method embodiments provided in the above embodiments belong to the same concept; for the implementation process, see the embodiment of the intelligent digital retina-based video processing method, which is not repeated here.
  • the acquisition module is configured to acquire the video stream and the corresponding feature stream;
  • the division module is configured to divide the video stream and the corresponding feature stream acquired by the acquisition module into time slices according to the preset division method to obtain the corresponding time slice division results; the time slice division results include the timestamp, the corresponding video data slice, and the corresponding feature data slice for each time slice;
  • the association analysis module is configured to associate and analyze each time slice in the time slice division results obtained by the division module with the number of searches and/or playbacks to obtain an attention count for each time slice;
  • the determination and deletion module is configured to determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice obtained by the association analysis module; the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, and the processed video stream and corresponding feature stream are obtained.
  • the intelligent digital retina-based video processing device provided in the embodiment of the present application can accurately determine and delete the data to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice. In this way, the storage overhead in intelligent digital retina-based video processing can be effectively reduced.
  • this embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; the processor runs the computer program to implement the above-mentioned method steps.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; the program is executed by a processor to implement the above-mentioned method steps.
  • FIG. 7 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present application.
  • the terminal devices in the embodiments of the present application may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • an electronic device may include a processing device 701 (such as a central processing unit, a graphics processing unit, etc.), which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded into a random access memory (RAM) 703.
  • in the RAM 703, various programs and data necessary for the operation of the electronic device are also stored.
  • the processing device 701, ROM 702, and RAM 703 are connected to each other through a bus 704.
  • An input/output (I/O) interface 705 is also connected to the bus 704 .
  • the following devices can be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 708 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 709.
  • the communication means 709 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While FIG. 7 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via the communication device 709, or from the storage device 708, or from the ROM 702.
  • when the computer program is executed by the processing device 701, the above-mentioned functions defined in the methods of the embodiments of the present application are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • Computer program code for carrying out the operations of the present disclosure can be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented by software or by hardware. In some cases, the name of a unit does not constitute a limitation of the unit itself.


Abstract

Disclosed in the present invention are a video processing method and apparatus based on an intelligent digital retina. The method comprises: associating each time slice in a time slice division result with the number of searches and/or the number of playbacks and performing analysis to obtain the number of interests in the time slice; according to the data volume of data to be deleted corresponding to a target time window and the number of interests in each time slice, determining and deleting a data slice to be deleted of the target time window, wherein the data slice to be deleted comprises a video data slice to be deleted and/or a feature data slice to be deleted, to obtain a processed video stream and a corresponding feature stream. According to the video processing method provided in the embodiments of the present application, a data slice to be deleted of a target time window can be accurately determined and deleted according to the data volume of data to be deleted corresponding to the target time window and the number of interests in each time slice; in this way, the storage overhead in a video processing process based on an intelligent digital retina can be effectively reduced.

Description

A video processing method and device based on an intelligent digital retina

Technical Field

The present invention relates to the technical field of video processing, and in particular to a video processing method and device based on an intelligent digital retina.
Background Art
Since the concept of the digital retina was proposed, it has attracted considerable attention in fields such as video coding/decoding and video surveillance. In traditional image processing, video compression and video analysis belong to two different fields. Inspired by the biological functions of the human retina, digital retina technology was the first to propose an intelligent image sensor that integrates video compression and video analysis. Specifically, the digital retina is characterized by the ability to obtain video compression data and video feature data at the same time and transmit them to the cloud as data streams, which facilitates later playback and retrieval. To obtain the feature stream of an image, digital retina technology introduces the concept of a model stream: the image acquisition front end can apply different feature extraction models as required, and these models can be sent to the image acquisition front end through cloud storage and reverse transmission.

In video compression, the basic idea is to remove the spatio-temporal redundancy of the video through computation. The basic paradigm of video compression has not changed significantly over the past decades: block-based video compression and codec technology has matured, offering moderate computational complexity, a high compression rate, and high reconstruction quality, and has therefore been very widely used for decades. The current mainstream codecs, including H.264/H.265/H.266 and MPEG2/MPEG4, are mainly based on block-based video coding. Since the early days of video coding, the paradigm of coding theory has not changed: each new generation of coding standards improves the compression ratio by "trading computation for space". For example, the evolution from H.264 to H.265 increased the compression rate by 50%, but it also brought greater computing requirements, because more flexible coding units and more flexible reference frames allow motion-compensation-based compression to tap more compression potential.

Because the digital retina framework integrates the two video-related aspects of feature recognition and data compression, it creates a new paradigm: rather than evaluating a technique by a single parameter, it adopts a comprehensive evaluation method oriented toward complex objectives. This is precisely the valuable insight obtained from the biological structure of the retina: the retina does not simply transmit or compress image data, but is an intelligent front-end device that serves the various complex tasks of the brain.

However, although digital retina technology brings integration and intelligence to video acquisition and analysis, it also places higher demands on storage space. The cloud server must store video stream data on the one hand and feature stream data on the other, and it must store model data as well.

How to reduce the storage overhead of a video processing method based on an intelligent digital retina is a technical problem to be solved.
Summary of the Invention

Based on this, it is necessary to provide a video processing method, device, electronic device, and storage medium based on an intelligent digital retina, to address the problem that existing intelligent digital retina-based video processing methods consume a large amount of storage overhead.
In a first aspect, an embodiment of the present application provides a video processing method based on an intelligent digital retina, the method comprising:

acquiring a video stream and a corresponding feature stream;

dividing the video stream and the corresponding feature stream into time slices according to a preset division method to obtain a corresponding time slice division result, where the time slice division result includes the timestamp, the video data slice, and the feature data slice corresponding to each time slice;

associating and analyzing each time slice in the time slice division result with the number of searches and/or the number of playbacks to obtain the number of attentions of each time slice;

determining and deleting the data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice, where the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, to obtain a processed video stream and a corresponding feature stream.
In one embodiment, the data slices to be deleted include a first video data slice to be deleted and a first feature data slice to be deleted in the target time window; determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice includes:

determining and deleting the first video data slice to be deleted and the first feature data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice.

In one embodiment, the data slices to be deleted include a second video data slice to be deleted in the target time window; determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice includes:

determining and deleting the second video data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice.
In one embodiment, the method further comprises:

deleting the video data slices to be deleted in the target time window according to a preset deletion method.
In one embodiment, deleting the video data slices to be deleted in the target time window according to the preset deletion method includes:

obtaining the total amount of data within the target time window, and obtaining the maximum amount of storage data to be allocated;

calculating the difference between the total amount of data within the target time window and the maximum amount of storage data to be allocated;

determining the video data slices to be deleted in the target time window based on the difference;

deleting the video data slices to be deleted, and generating reconstruction feature data for reconstructing the video data;

storing the feature data slices of the target time window and the reconstruction feature data.
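The arithmetic behind the first three of these steps is straightforward. As a hedged sketch (the function name is hypothetical, not from the application), the deletion budget for the target time window is the excess of its total data over the maximum storage that can be allocated to it:

```python
def deletion_budget(total_bytes_in_window, max_allocatable_bytes):
    """Amount of data (in bytes) that must be deleted from the target time
    window; zero if the window already fits in the allocatable storage."""
    return max(0, total_bytes_in_window - max_allocatable_bytes)
```

The budget then drives the selection of which video data slices to delete, after which reconstruction feature data is generated and stored alongside the retained feature data slices.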
In one embodiment, the method further comprises:

obtaining a video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data;

performing reconstruction processing on the deleted data slices of the target time window based on the video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, to generate corresponding reconstructed video data.
In one embodiment, the method further comprises:

matching a corresponding video reconstruction model based on the type of the depth model.

In one embodiment, matching the corresponding video reconstruction model based on the type of the depth model includes:

if the depth model is a model that generates images within a first preset resolution range, the matched video reconstruction model is a reconstruction depth model with a second preset resolution range; or,

if the depth model is a feature extraction model, the matched video reconstruction model is the decoder of an autoencoder; or,

if the depth model is a model that extracts reconstruction features based on a generative adversarial model, the matched video reconstruction model is a generative adversarial network.
In a second aspect, an embodiment of the present application provides a video processing device based on an intelligent digital retina, the device comprising:

an acquisition module, configured to acquire a video stream and a corresponding feature stream;

a division module, configured to divide the video stream and the corresponding feature stream acquired by the acquisition module into time slices according to a preset division method to obtain a corresponding time slice division result, where the time slice division result includes the timestamp, the video data slice, and the feature data slice corresponding to each time slice;

an association analysis module, configured to associate and analyze each time slice in the time slice division result obtained by the division module with the number of searches and/or the number of playbacks to obtain the number of attentions of each time slice;

a determination and deletion module, configured to determine and delete the data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice obtained by the association analysis module, where the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, to obtain a processed video stream and a corresponding feature stream.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor runs the computer program to implement the method steps described above.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program is executed by a processor to implement the method steps described above.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

In the embodiments of the present application, a video stream and a corresponding feature stream are acquired; the video stream and the corresponding feature stream are divided into time slices according to a preset division method to obtain a corresponding time slice division result, where the time slice division result includes the timestamp, the video data slice, and the feature data slice corresponding to each time slice; each time slice in the time slice division result is associated and analyzed with the number of searches and/or the number of playbacks to obtain the number of attentions of each time slice; and the data slices to be deleted in a target time window are determined and deleted according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice, where the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, to obtain a processed video stream and a corresponding feature stream. The video processing method based on an intelligent digital retina provided by the embodiments of the present application can accurately determine and delete the data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice; in this way, the storage overhead of intelligent digital retina-based video processing can be effectively reduced.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention.
Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flowchart of a video processing method based on an intelligent digital retina provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the storage decision working mechanism in a specific application scenario provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of the video data slices and feature data slices corresponding to a time window in a specific application scenario provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of the deleted video data slices and deleted feature data slices corresponding to a time window in a specific application scenario provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of storing the reconstruction feature data and the feature data slices in a specific application scenario provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a video processing device based on an intelligent digital retina provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the connection structure of an electronic device according to an embodiment of the present application.
Detailed Description

The following description and the accompanying drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.

It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Optional embodiments of the present disclosure will be described in detail below in conjunction with the accompanying drawings.
FIG. 1 is a schematic flowchart of a video processing method based on an intelligent digital retina provided by an embodiment of the present application. As shown in FIG. 1, the method specifically includes the following steps:

S102: Acquire a video stream and a corresponding feature stream.

The video processing method provided by the embodiments of the present application is based on intelligent digital retina technology, the principle of which is as follows:
The front-end device has both video compression and a depth model for video feature extraction. Since the back end can deploy different models to the front end by transmission, the front-end device can be understood to have the ability to adaptively acquire any depth model. Therefore, as long as a model with special feature extraction capabilities is trained offline, it can be deployed to the front-end device through the intelligent digital retina model stream. In the cloud, the main purpose of the feature stream is image retrieval; after a user obtains retrieval results, a common linked requirement is to play back the image or video. However, in application fields such as smart cities, a large city may have millions of front-end acquisition devices. Although cloud servers and high-speed communication networks can realize data transmission and cloud processing, storage space is still a bottleneck, because the amount of video data generated in real time is enormous; in the past, cloud storage could only hold video data streams for a limited time. Since traditional video coding is pixel-based, the storage side can only choose which video data to retain according to the timestamp, for example keeping only the data of the past 7 days; in other words, the system automatically "forgets" data older than 7 days.

However, this is completely different from the way the human brain memorizes data. The human brain's memory of video data is almost never pixel-based. For example, the amygdala is responsible for emotion-based memory: under the action of stress hormones it forms important memories, that is, it stores data based on the importance of events. The cerebellum memorizes data based on implicit rather than direct information; the prefrontal cortex is responsible for processing and memorizing processed semantic information; in addition, the brain converts between long-term and short-term memory, a task performed by the hippocampus.
The intelligent digital retina used in the video processing method provided by the embodiments of the present application can not only acquire and transmit the video stream and the feature stream simultaneously, but can also store the data and make use of the newly added feature stream data.

S104: Divide the video stream and the corresponding feature stream into time slices according to a preset division method to obtain a corresponding time slice division result, where the time slice division result includes the timestamp, the video data slice, and the feature data slice corresponding to each time slice.

In one embodiment, the preset division method may be: dividing the streams into time slices of equal length. In another embodiment, the preset division method may be: dividing the time slices according to the coded GOP segments.
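As an illustrative sketch only (the names and data shapes below are hypothetical, not part of the claimed method), equal-length time slice division of a video stream and its parallel feature stream might look like this:

```python
from dataclasses import dataclass

@dataclass
class TimeSlice:
    timestamp: float   # start time of the slice (seconds)
    video_data: bytes  # video data slice for this interval
    feature_data: bytes  # feature data slice for this interval

def divide_into_time_slices(video, features, slice_len, total_len):
    """Divide parallel video/feature streams into equal-length time slices.

    `video` and `features` are hypothetical callables returning the raw
    bytes for a [start, end) interval; a real system would cut at GOP
    boundaries rather than arbitrary offsets.
    """
    slices = []
    start = 0.0
    while start < total_len:
        end = min(start + slice_len, total_len)
        slices.append(TimeSlice(
            timestamp=start,
            video_data=video(start, end),
            feature_data=features(start, end),
        ))
        start = end
    return slices
```

Each resulting record carries exactly the three items named by the time slice division result: the timestamp, the video data slice, and the feature data slice.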
S106: Associate and analyze each time slice in the time slice division result with the number of searches and/or the number of playbacks to obtain the number of attentions of each time slice.
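For illustration, one plausible (hypothetical) realization of this association step: each search or playback event carries a timestamp, and the attention count of a time slice is the number of events that fall within its interval.

```python
def count_attentions(slice_starts, slice_len, event_times):
    """Count search/playback events per time slice.

    slice_starts: start timestamps of the time slices (seconds);
    slice_len:    length of each time slice (seconds);
    event_times:  timestamps of user search and/or playback events.
    Returns a dict mapping each slice start to its attention count.
    """
    counts = {start: 0 for start in slice_starts}
    for t in event_times:
        for start in slice_starts:
            if start <= t < start + slice_len:
                counts[start] += 1
                break
    return counts
```

A production system would index events by slice instead of scanning linearly, but the mapping from user activity to per-slice attention counts is the same.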
S108: Determine and delete the data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice; the data slices to be deleted include video data slices to be deleted and/or feature data slices to be deleted, yielding a processed video stream and a corresponding feature stream.
In a possible implementation, the data slices to be deleted include a first video data slice to be deleted and a first feature data slice to be deleted in the target time window, and determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice includes the following step:

determining and deleting the first video data slice to be deleted and the first feature data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice.

In a possible implementation, the data slices to be deleted include a second video data slice to be deleted in the target time window, and determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice includes the following step:

determining and deleting the second video data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the number of attentions of each time slice.
FIG. 2 is a schematic diagram of the working mechanism of the storage decision in a specific application scenario provided by an embodiment of this application.

As shown in FIG. 2, the search engine provides video retrieval for users: it matches search results from the feature store and returns them to the user. If the user further requests video playback, the playback engine decodes the video stream according to the timestamp information and plays the video. Further, FIG. 3 shows the storage decision module, which generates delete/retain decisions for the feature data and the video stream data according to the search results of the search engine and/or the playback results of the playback engine. The rationale is that the search engine receives the user's feature input, which means the searched features are features the user cares about, and the playback engine's user operations are likewise highly correlated with the content the user cares about. The storage decision module therefore uses this user feedback to decide which features and video data to retain.

The working mechanism of the storage decision is described in detail below.
First, the video stream and the corresponding feature stream are divided into time slices. In one implementation, the time slices are of equal length; in another, the division follows the coded GOP segments. Each time slice is then associated with the number of searches and playbacks, yielding an attention count for each time slice. Within a time window, an amount of data to delete is computed, and that amount is deleted in ascending order of attention count. The time window is updated periodically, so data deletion is likewise performed periodically. As shown in FIG. 3, each video data slice corresponds to the feature data slice covering the same time, and each is associated with an attention count. Note that the attention counts of the video data slice and the feature data slice for the same period are not necessarily equal. The time window covers the video data slices and feature data slices from timestamp 3 to timestamp 8.
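The association between user activity and time slices can be sketched as follows. The event representation (one plain timestamp per search or playback hit) is an assumption made for illustration; any record that localizes the hit in time would serve.

```python
from collections import Counter

def attention_counts(slice_bounds, events):
    """slice_bounds: list of (start, end) timestamps, one pair per time slice.
    events: timestamps of search and/or playback hits on stored content.
    Returns a Counter mapping slice index -> attention count."""
    counts = Counter()
    for t in events:
        for i, (start, end) in enumerate(slice_bounds):
            if start <= t < end:
                counts[i] += 1   # this slice was searched or played back once more
                break
    return counts
```

Slices that receive no events keep an implicit count of zero, which places them first in line for deletion.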
Further, according to the attention counts within the time window and the planned amount of data deletion, in one example the storage decision module deletes the video data slice corresponding to timestamp 4 and the feature data slice corresponding to timestamp 5. Specifically, suppose the total amount of data within the time window is D_t and the maximum allocatable storage space is D_max, with

D_t − D_max = D_D > 0.

In this case, the deletion decision computes the data slices to delete according to D_D. In the scenario of FIG. 2 the result can be obtained directly from the attention counts, but the video processing method provided in this embodiment admits any algorithm for computing the required data slices to delete. The data remaining after deletion is shown in FIG. 4. Since the amount of feature data depends on the feature extraction model used, in some cases the feature stream is very small and deleting the corresponding feature data slices saves little storage space. Therefore, in one implementation, deletion may be performed only on the video data slices or only on the feature data slices.
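Under these definitions, a minimal selection policy that frees at least D_D bytes by deleting the least-watched slices first might look like the following. The dictionary-based bookkeeping is illustrative, not this application's implementation; as the text notes, any algorithm may be substituted here.

```python
def select_slices_to_delete(slice_sizes, attention, d_max):
    """slice_sizes: {slice_id: size in bytes}; attention: {slice_id: count};
    d_max: maximum allocatable storage for the window.
    Returns slice ids to delete, in ascending order of attention count."""
    d_t = sum(slice_sizes.values())
    d_d = d_t - d_max                 # D_D = D_t - D_max
    if d_d <= 0:                      # window already fits: nothing to delete
        return []
    to_delete, freed = [], 0
    for sid in sorted(slice_sizes, key=lambda s: attention.get(s, 0)):
        if freed >= d_d:
            break
        to_delete.append(sid)
        freed += slice_sizes[sid]
    return to_delete
```

Because the time window is updated periodically, this selection would be re-run once per window.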
In a possible implementation, the video processing method provided in this embodiment further includes the following step:

Delete the video data slices to be deleted in the target time window according to a preset deletion method.

In the video processing method described so far, deletion of video or feature data is direct deletion: all related data is discarded. Although this is easy to implement, it has drawbacks. Within a limited time, the distribution of attention counts cannot represent all potential user needs, and once a piece of data is deleted, users can no longer obtain any feedback on the video within that timestamp range. A method that can still respond to such low-probability requests would therefore be a qualitative improvement over traditional store-and-delete approaches. Cloud storage resources are limited, but computing resources can be coordinated at any time. Therefore, in one implementation, data deletion adopts an "incomplete" deletion method.

In this embodiment, the preset deletion method is the "incomplete" deletion method, with the following specific steps:
S1: Compute the video data slices to be deleted according to the time window and the allocated storage resources.

S2: Delete the video data to be deleted, and generate feature data that can be used to reconstruct the video data.

S3: Retain the feature data slices and the reconstruction feature data generated in S2.

As mentioned above, a video data slice in FIG. 4 may be a closed GOP, which means that deleting a video data slice removes not only the B-frames and P-frames carrying little data but also the I-frames carrying large amounts of data. Since a closed GOP is coded independently, all video data within that slice disappears completely. Therefore, the feature data described in S2 must not depend on any block-coded video data. In one feasible approach, the feature data for reconstructing the video data in S2 is obtained through a deep learning model. FIG. 5 shows a schematic diagram of this process: according to the storage decision result, a video data slice is input to a deep model to extract reconstruction feature data, while the original coded data is discarded. Finally, only the reconstruction feature data and the feature data slices are retained on the storage side.
In a possible implementation, deleting the video data slices to be deleted in the target time window according to the preset deletion method includes the following steps:

Obtain the total amount of data within the target time window, and obtain the maximum amount of storage to be allocated;

Compute the difference between the total amount of data within the target time window and the maximum amount of storage to be allocated;

Determine the video data slices to be deleted in the target time window based on the difference;

Delete the video data slices to be deleted, and generate reconstruction feature data for reconstructing the video data;

Store the feature data slices and the reconstruction feature data of the target time window.
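The five steps above can be sketched as one routine. This is a simplified illustration: the slice-selection order is left as plain id order here (in practice it would follow the attention counts), and `extract_features` is a hypothetical stand-in for the deep model of FIG. 5.

```python
def incomplete_delete(window_slices, d_max, extract_features):
    """window_slices: {slice_id: encoded video bytes} for the target time window.
    d_max: maximum storage to be allocated; extract_features: deep model
    producing reconstruction feature data from a video slice (assumed callable).
    Returns (kept video slices, reconstruction feature data)."""
    d_t = sum(len(v) for v in window_slices.values())   # total data in window
    d_d = d_t - d_max                                   # difference: amount to free
    kept, recon_features = dict(window_slices), {}
    freed = 0
    for sid in sorted(window_slices):                   # selection policy is pluggable
        if freed >= d_d:
            break
        # delete the video data, but first extract reconstruction features from it
        recon_features[sid] = extract_features(kept.pop(sid))
        freed += len(window_slices[sid])
    return kept, recon_features                         # feature data is retained
```

The caller would then persist `kept` together with `recon_features` and the untouched feature data slices of the window.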
In a possible implementation, the video processing method provided in this embodiment further includes the following steps:

Obtain a video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data;

Based on the video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, reconstruct the deleted data slices of the target time window to generate corresponding reconstructed video data.

In a possible implementation, the video processing method provided in this embodiment further includes the following step:

Match the corresponding video reconstruction model based on the type of the deep model.
In a possible implementation, matching the corresponding video reconstruction model based on the type of the deep model includes the following step:

If the deep model is a model that generates images within a first preset resolution range, the matched video reconstruction model is a reconstruction deep model with a second preset resolution range.

In this step, the first preset resolution range is typically an ultra-low-resolution range, and the second preset resolution range of the corresponding reconstruction deep model is an ultra-high-resolution range. If the deep model can generate ultra-low-resolution images, the generated ultra-low-resolution images are produced by coding consecutive images with a residual-based coding method.
In a possible implementation, matching the corresponding video reconstruction model based on the type of the deep model includes the following step:

If the deep model is a feature extraction model, the matched video reconstruction model is the decoder of an autoencoder.

In this step, if the deep model is a feature extraction model, for example the encoder of an autoencoder, the corresponding video reconstruction model is the decoder of that autoencoder.
In a possible implementation, matching the corresponding video reconstruction model based on the type of the deep model includes the following step:

If the deep model extracts reconstruction features based on a generative adversarial model, the matched video reconstruction model is a generative adversarial network.

In this step, if the deep model extracts reconstruction features based on a generative adversarial model (such a model is mainly used to extract skeleton features of the human body and memory features of appearance attribute information), the corresponding video reconstruction model is a generative adversarial network. The generative adversarial network takes feature values as input and reconstructs the video data using a trained generative model.
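The three matching rules above can be summarized as a simple dispatch table. The type names and returned model identifiers below are purely illustrative placeholders, not names defined by this application:

```python
def match_reconstruction_model(depth_model_type):
    """Maps the type of the feature-extraction deep model to its paired
    video reconstruction model, per the three cases above."""
    table = {
        # ultra-low-resolution image generator -> ultra-high-resolution reconstructor
        "low_res_generator": "super_resolution_model",
        # autoencoder encoder -> the matching autoencoder decoder
        "autoencoder_encoder": "autoencoder_decoder",
        # GAN-based reconstruction feature extractor -> generative adversarial network
        "gan_feature_extractor": "generative_adversarial_network",
    }
    if depth_model_type not in table:
        raise ValueError(f"no reconstruction model registered for {depth_model_type!r}")
    return table[depth_model_type]
```

Keeping the mapping in one table mirrors the one-to-one correspondence between extraction and reconstruction models described below.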
After the processing shown in FIG. 5, the storage decision unit combined with the deep model still guarantees the storage-space budget within the time window. Meanwhile, because the feature data slices are fully preserved, users may still request playback of deleted video data slices. In that case, the playback engine uses a deep model to reconstruct the deleted data from the reconstruction feature data. The deep model used for video reconstruction corresponds one-to-one with the deep model used for reconstruction feature extraction in FIG. 5.
In the embodiments of this application, the method obtains a video stream and a corresponding feature stream; divides them into time slices according to a preset division method to obtain a time slice division result, which includes, for each time slice, the corresponding timestamp, video data slice, and feature data slice; associates each time slice in the division result with the number of searches and/or playbacks and analyzes the association to obtain an attention count for each time slice; and, according to the amount of data to be deleted in the target time window and the attention count of each time slice, determines and deletes the data slices to be deleted in the target time window, where the data slices to be deleted include video data slices and/or feature data slices, yielding a processed video stream and a corresponding processed feature stream. The intelligent digital retina-based video processing method provided by the embodiments of this application can accurately determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted and the attention count of each time slice, thereby effectively reducing the storage overhead of intelligent digital retina-based video processing.
The following is an apparatus embodiment of intelligent digital retina-based video processing, which can be used to carry out the method embodiments of this application. For details not disclosed in the apparatus embodiment, refer to the method embodiments of this application.

Refer to FIG. 6, which shows a schematic structural diagram of an intelligent digital retina-based video processing apparatus provided by an exemplary embodiment of the present invention. The apparatus can be implemented as all or part of a terminal through software, hardware, or a combination of the two, and includes an acquisition module 602, a division module 604, an association analysis module 606, and a determination and deletion module 608.
Specifically, the acquisition module 602 is configured to obtain a video stream and a corresponding feature stream;

the division module 604 is configured to divide the video stream and the corresponding feature stream obtained by the acquisition module 602 into time slices according to a preset division method to obtain a time slice division result, which includes, for each time slice, the corresponding timestamp, video data slice, and feature data slice;

the association analysis module 606 is configured to associate each time slice in the time slice division result obtained by the division module 604 with the number of searches and/or playbacks and analyze the association to obtain an attention count for each time slice;

the determination and deletion module 608 is configured to determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted in the target time window and the attention counts obtained by the association analysis module 606; the data slices to be deleted include video data slices and/or feature data slices, yielding a processed video stream and a corresponding processed feature stream.

Optionally, the data slices to be deleted include a first video data slice to be deleted and a first feature data slice to be deleted in the target time window, and the determination and deletion module 608 is configured to:

determine and delete the first video data slice to be deleted and the first feature data slice to be deleted according to the amount of data to be deleted in the target time window and the attention count of each time slice.

Optionally, the data slices to be deleted include a second video data slice to be deleted in the target time window, and the determination and deletion module 608 is configured to:

determine and delete the second video data slice to be deleted according to the amount of data to be deleted in the target time window and the attention count of each time slice.
Optionally, the apparatus further includes:

a deletion module (not shown in FIG. 6), configured to delete the video data slices to be deleted in the target time window according to a preset deletion method.

Optionally, the deletion module is specifically configured to:

obtain the total amount of data within the target time window, and obtain the maximum amount of storage to be allocated;

compute the difference between the total amount of data within the target time window and the maximum amount of storage to be allocated;

determine the video data slices to be deleted in the target time window based on the difference;

delete the video data slices to be deleted, and generate reconstruction feature data for reconstructing the video data;

store the feature data slices and the reconstruction feature data of the target time window.
Optionally, the acquisition module 602 is further configured to obtain a video reconstruction model, the feature data slices of the target time window, and the reconstruction feature data.

Optionally, the apparatus further includes:

a video data reconstruction module (not shown in FIG. 6), configured to reconstruct the deleted data slices of the target time window based on the video reconstruction model obtained by the acquisition module 602, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, to generate corresponding reconstructed video data.

Optionally, the apparatus further includes:

a reconstruction model matching module (not shown in FIG. 6), configured to match the corresponding video reconstruction model based on the type of the deep model.

Optionally, the reconstruction model matching module is specifically configured to:

if the deep model is a model that generates images within a first preset resolution range, match a reconstruction deep model with a second preset resolution range; or,

if the deep model is a feature extraction model, match the decoder of an autoencoder; or,

if the deep model extracts reconstruction features based on a generative adversarial model, match a generative adversarial network.
It should be noted that when the intelligent digital retina-based video processing apparatus provided by the above embodiments executes the video processing method, the division into the above functional units is only illustrative. In practical applications, the above functions may be assigned to different functional units as needed; that is, the internal structure of the device may be divided into different functional units to complete all or part of the functions described above. In addition, the apparatus embodiment and the method embodiment provided above belong to the same concept; for details of the implementation process, refer to the method embodiment, which is not repeated here.

In the embodiments of this application, the acquisition module obtains a video stream and a corresponding feature stream; the division module divides them into time slices according to a preset division method to obtain a time slice division result, which includes, for each time slice, the corresponding timestamp, video data slice, and feature data slice; the association analysis module associates each time slice in the division result with the number of searches and/or playbacks and analyzes the association to obtain an attention count for each time slice; and the determination and deletion module determines and deletes the data slices to be deleted in the target time window according to the amount of data to be deleted and the attention counts, where the data slices to be deleted include video data slices and/or feature data slices, yielding a processed video stream and a corresponding processed feature stream. The intelligent digital retina-based video processing apparatus provided by the embodiments of this application can accurately determine and delete the data slices to be deleted in the target time window according to the amount of data to be deleted and the attention count of each time slice, thereby effectively reducing the storage overhead of intelligent digital retina-based video processing.
As shown in FIG. 7, this embodiment provides an electronic device that includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor runs the computer program to implement the method steps described above.

An embodiment of this application provides a storage medium storing computer-readable instructions, on which a computer program is stored; when the program is executed by a processor, the method steps described above are implemented.
Referring now to FIG. 7, it shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of this application. Terminal devices in the embodiments of this application may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.

As shown in FIG. 7, the electronic device may include a processing device 701 (for example, a central processing unit or a graphics processing unit), which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device. The processing device 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output devices 707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 708 including, for example, a magnetic tape and a hard disk; and a communication device 709. The communication device 709 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 7 shows an electronic device with various devices, it should be understood that implementing or providing all of the devices shown is not required; more or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, where the computer program contains program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, the above functions defined in the methods of the embodiments of this application are performed.

It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to an electric wire, an optical cable, RF (radio frequency), or any suitable combination of the above.

The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)—连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供 商来通过因特网连接)。Computer program code for carrying out the operations of the present disclosure can be written in one or more programming languages, or combinations thereof, including object-oriented programming languages—such as Java, Smalltalk, C++, and conventional Procedural Programming Language - such as "C" or a similar programming language. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (such as through an Internet service provider). Internet connection).
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more logical functions for implementing specified executable instructions. It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
描述于本申请实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。其中,单元的名称在某种情况下并不构成对该单元本身的限定。The units involved in the embodiments described in the present application may be implemented by means of software or by means of hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
以上所揭露的仅为本申请较佳实施例而已,当然不能以此来限定本申请之权利范围,因此依本申请权利要求所作的等同变化,仍属本申请所涵盖的范围。The above disclosures are only preferred embodiments of the present application, which certainly cannot limit the scope of the present application. Therefore, equivalent changes made according to the claims of the present application still fall within the scope of the present application.

Claims (11)

  1. A video processing method based on an intelligent digital retina, characterized in that the method comprises:
    acquiring a video stream and a corresponding feature stream;
    dividing the video stream and the corresponding feature stream into time slices according to a preset division manner to obtain a corresponding time-slice division result, the time-slice division result comprising, for each time slice, a corresponding timestamp, a corresponding video data slice, and a corresponding feature data slice;
    associating each time slice in the time-slice division result with a search count and/or a playback count and analyzing the association, to obtain an attention count for each time slice;
    determining and deleting data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice, the data slices to be deleted comprising video data slices to be deleted and/or feature data slices to be deleted, to obtain a processed video stream and a corresponding feature stream.
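The selection step recited in claim 1 — free the required amount of data by removing the least-attended slices in the target time window — can be sketched as follows. This is an illustrative reading of the claim, not the patented implementation; all identifiers (`TimeSlice`, `select_slices_to_delete`, the byte counts) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TimeSlice:
    timestamp: float
    video_bytes: int      # size of the video data slice
    feature_bytes: int    # size of the feature data slice
    attention: int        # search count + playback count for this slice

def select_slices_to_delete(slices, bytes_to_delete):
    """Pick the least-attended slices until the deletion budget is met."""
    deleted, freed = [], 0
    # One plausible policy: slices with the lowest attention count go first.
    for s in sorted(slices, key=lambda s: s.attention):
        if freed >= bytes_to_delete:
            break
        deleted.append(s)
        freed += s.video_bytes + s.feature_bytes
    return deleted

window = [
    TimeSlice(0.0, 100, 10, attention=5),
    TimeSlice(1.0, 100, 10, attention=0),
    TimeSlice(2.0, 100, 10, attention=2),
]
victims = select_slices_to_delete(window, bytes_to_delete=200)
print([s.timestamp for s in victims])  # [1.0, 2.0]
```

Ranking strictly by attention count is one policy consistent with the claim; the claim itself fixes only the inputs (data amount to delete, per-slice attention counts), not the ordering.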
  2. The method according to claim 1, characterized in that the data slices to be deleted comprise a first video data slice to be deleted in the target time window and a first feature data slice to be deleted in the target time window; and the determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice comprises:
    determining and deleting the first video data slice to be deleted and the first feature data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice.
  3. The method according to claim 1, characterized in that the data slices to be deleted comprise a second video data slice to be deleted in the target time window, and the determining and deleting the data slices to be deleted in the target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice comprises:
    determining and deleting the second video data slice to be deleted according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice.
  4. The method according to claim 1, characterized in that the method further comprises:
    deleting the video data slices to be deleted in the target time window according to a preset deletion manner.
  5. The method according to claim 4, characterized in that the deleting the video data slices to be deleted in the target time window according to the preset deletion manner comprises:
    acquiring the total amount of data within the target time window, and acquiring a maximum amount of storage data to be allocated;
    calculating the difference between the total amount of data within the target time window and the maximum amount of storage data to be allocated;
    determining the video data slices to be deleted in the target time window based on the difference;
    deleting the video data slices to be deleted, and generating reconstruction feature data for reconstructing the video data;
    storing the feature data slices of the target time window and the reconstruction feature data.
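The budget computation in the first three steps of claim 5 amounts to a single subtraction. A minimal sketch, with hypothetical names:

```python
def deletion_budget(total_bytes_in_window, max_allocatable_bytes):
    """Claim 5 reading: the amount of video data to delete is the excess of
    the window's total data over the maximum storage that can be allocated."""
    return max(0, total_bytes_in_window - max_allocatable_bytes)

print(deletion_budget(330, 200))  # 130 bytes must be freed
print(deletion_budget(150, 200))  # 0: the window already fits
```

Clamping at zero covers the case where the window already fits within the allocatable storage, so nothing needs to be deleted.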
  6. The method according to claim 5, characterized in that the method further comprises:
    acquiring a video reconstruction model, undeleted video data, the feature data slices of the target time window, and the reconstruction feature data;
    performing reconstruction processing on the data slices deleted from the target time window based on the video reconstruction model, the undeleted video data, the feature data slices of the target time window, and the reconstruction feature data, to generate corresponding reconstructed video data.
  7. The method according to claim 6, characterized in that the method further comprises:
    matching the corresponding video reconstruction model based on the type of a deep model.
  8. The method according to claim 7, characterized in that the matching the corresponding video reconstruction model based on the type of the deep model comprises:
    if the deep model is a model that generates images within a first preset resolution range, the matched video reconstruction model is a reconstruction deep model with a second preset resolution range; or,
    if the deep model is a feature extraction model, the matched video reconstruction model is the decoder of an autoencoder; or,
    if the deep model is a model that extracts reconstruction features based on a generative adversarial model, the matched video reconstruction model is a generative adversarial network.
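Claim 8 enumerates a three-way dispatch from the deep-model type to a reconstruction model. A minimal lookup-table sketch, where the string keys are invented labels for the three recited cases rather than terms from the patent:

```python
def match_reconstruction_model(deep_model_type):
    """Paraphrase of claim 8's three branches as a dispatch table."""
    table = {
        # generates images in a first preset resolution range
        "resolution_generator": "reconstruction deep model (second preset resolution range)",
        # plain feature extractor
        "feature_extractor": "autoencoder decoder",
        # extracts reconstruction features via a generative adversarial model
        "gan_feature_extractor": "generative adversarial network",
    }
    return table[deep_model_type]

print(match_reconstruction_model("feature_extractor"))  # autoencoder decoder
```

A table keeps the pairing between the feature-extraction side and the reconstruction side explicit, which matters because the decoder must invert whatever encoding produced the stored features.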
  9. A video processing apparatus based on an intelligent digital retina, characterized in that the apparatus comprises:
    an acquisition module, configured to acquire a video stream and a corresponding feature stream;
    a division module, configured to divide the video stream and the corresponding feature stream acquired by the acquisition module into time slices according to a preset division manner to obtain a corresponding time-slice division result, the time-slice division result comprising, for each time slice, a corresponding timestamp, a corresponding video data slice, and a corresponding feature data slice;
    an association and analysis module, configured to associate each time slice in the time-slice division result obtained by the division module with a search count and/or a playback count and analyze the association, to obtain an attention count for each time slice;
    a determination and deletion module, configured to determine and delete data slices to be deleted in a target time window according to the amount of data to be deleted corresponding to the target time window and the attention count of each time slice obtained by the association and analysis module, the data slices to be deleted comprising video data slices to be deleted and/or feature data slices to be deleted, to obtain a processed video stream and a corresponding feature stream.
  10. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor runs the computer program to implement the video processing method according to any one of claims 1 to 8.
  11. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the video processing method according to any one of claims 1 to 8.
PCT/CN2022/124876 2021-11-26 2022-10-12 Video processing method and apparatus based on intelligent digital retina WO2023093339A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111418776.1 2021-11-26
CN202111418776.1A CN113840147B (en) 2021-11-26 2021-11-26 Video processing method and device based on intelligent digital retina

Publications (1)

Publication Number Publication Date
WO2023093339A1 true WO2023093339A1 (en) 2023-06-01

Family

ID=78971696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124876 WO2023093339A1 (en) 2021-11-26 2022-10-12 Video processing method and apparatus based on intelligent digital retina

Country Status (2)

Country Link
CN (1) CN113840147B (en)
WO (1) WO2023093339A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113840147B (en) * 2021-11-26 2022-04-05 浙江智慧视频安防创新中心有限公司 Video processing method and device based on intelligent digital retina
CN114157863B (en) * 2022-02-07 2022-07-22 浙江智慧视频安防创新中心有限公司 Video coding method, system and storage medium based on digital retina

Citations (8)

Publication number Priority date Publication date Assignee Title
CN102232220A (en) * 2010-10-29 2011-11-02 华为技术有限公司 Method and system for extracting and correlating video interested objects
CN107846576A (en) * 2017-09-30 2018-03-27 北京大学 Method and system for visual signature data encoding and decoding
CN110035330A (en) * 2019-04-16 2019-07-19 威比网络科技(上海)有限公司 Video generation method, system, equipment and storage medium based on online education
CN111092926A (en) * 2019-08-28 2020-05-01 北京大学 Digital retina multivariate data rapid association method
CN111787218A (en) * 2020-06-18 2020-10-16 安徽超清科技股份有限公司 Monitoring camera based on digital retina technology
CN113110421A (en) * 2021-03-23 2021-07-13 特斯联科技集团有限公司 Tracking linkage method and system for scenic spot river visual identification mobile ship
CN113269722A (en) * 2021-04-22 2021-08-17 北京邮电大学 Training method for generating countermeasure network and high-resolution image reconstruction method
CN113840147A (en) * 2021-11-26 2021-12-24 浙江智慧视频安防创新中心有限公司 Video processing method and device based on intelligent digital retina

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10891019B2 (en) * 2016-02-29 2021-01-12 Huawei Technologies Co., Ltd. Dynamic thumbnail selection for search results

Also Published As

Publication number Publication date
CN113840147A (en) 2021-12-24
CN113840147B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
WO2023093339A1 (en) Video processing method and apparatus based on intelligent digital retina
CN110933429B (en) Video compression sensing and reconstruction method and device based on deep neural network
WO2022105597A1 (en) Method and apparatus for playing back video at speed multiples , electronic device, and storage medium
US10965948B1 (en) Hierarchical auto-regressive image compression system
WO2022111110A1 (en) Virtual video livestreaming processing method and apparatus, storage medium, and electronic device
US20150181217A1 (en) Object archival systems and methods
WO2023116233A1 (en) Video stutter prediction method and apparatus, device and medium
WO2022206200A1 (en) Point cloud encoding method and apparatus, point cloud decoding method and apparatus, and computer-readable medium, and electronic device
CN111385576B (en) Video coding method and device, mobile terminal and storage medium
KR102612528B1 (en) Interruptible video transcoding
CN112714273A (en) Screen sharing display method, device, equipment and storage medium
EP4343614A1 (en) Information processing method and apparatus, device, readable storage medium and product
CN114679607B (en) Video frame rate control method and device, electronic equipment and storage medium
CN111883107A (en) Speech synthesis and feature extraction model training method, device, medium and equipment
WO2023071578A1 (en) Text-voice alignment method and apparatus, device and medium
CN116233445B (en) Video encoding and decoding processing method and device, computer equipment and storage medium
CN110852801B (en) Information processing method, device and equipment
US11095901B2 (en) Object manipulation video conference compression
WO2019047663A1 (en) Video format-based end-to-end automatic driving data storage method and device
Zhu Big data-based multimedia transcoding method and its application in multimedia data mining-based smart transportation and telemedicine
CN112866715A (en) Universal video compression coding system supporting man-machine hybrid intelligence
CN111800649A (en) Method and device for storing video and method and device for generating video
US11831698B2 (en) Data streaming protocols in edge computing
Zhou et al. Subjective and Objective Quality-of-Experience Assessment for 3D Talking Heads
US20240129047A1 (en) Method for creating sparse isobmff haptics tracks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897423

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE