CN111966859A - Video data processing method and device and readable storage medium - Google Patents

Video data processing method and device and readable storage medium

Info

Publication number
CN111966859A
Authority
CN
China
Prior art keywords
video data
video
processing
data stream
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010876747.9A
Other languages
Chinese (zh)
Inventor
马兆远
董利健
韩德伟
李康
杨勖
朱善玮
殷小雷
徐建
毕东柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bmi Beijing Intelligent System Co ltd
Original Assignee
Bmi Beijing Intelligent System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bmi Beijing Intelligent System Co ltd
Priority to CN202010876747.9A
Publication of CN111966859A
Legal status: Pending (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a video data processing method and device and a readable storage medium. The video data processing method comprises the following steps: acquiring a processing request for video data, the processing request comprising compressed domain video data to be processed and a processing mode for the compressed domain video data, where the compressed domain video data is undecoded video data; extracting a target data stream based on the compressed domain video data; inputting the target data stream into a pre-trained feature extraction model to obtain video features corresponding to the target data stream; and processing the video features according to the processing mode. The processing method improves the efficiency and stability of video feature extraction.

Description

Video data processing method and device and readable storage medium
Technical Field
The present application relates to the field of video data processing technologies, and in particular, to a method and an apparatus for processing video data, and a readable storage medium.
Background
In the prior art, when video data is processed, a compressed domain video must first be decoded to the image level before further processing (for example, video feature extraction). The decoding process is time-consuming, which lowers the efficiency of video feature extraction, and the stability of the extracted features is poor.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for processing video data, and a readable storage medium, so as to improve efficiency and stability of video feature extraction.
In a first aspect, an embodiment of the present application provides a method for processing video data, including: acquiring a processing request for video data, the processing request comprising compressed domain video data to be processed and a processing mode for the compressed domain video data, where the compressed domain video data is undecoded video data; extracting a target data stream based on the compressed domain video data; inputting the target data stream into a pre-trained feature extraction model to obtain video features corresponding to the target data stream; and processing the video features according to the processing mode.
In the embodiment of the application, compared with the prior art, after the compressed domain video data to be processed is obtained, the target data stream is extracted from the undecoded video data, features are then extracted from the target data stream by the trained feature extraction model, and finally the video features are processed according to the processing mode. On the one hand, feature extraction operates on the undecoded video data, so the time-consuming video decoding process is avoided and the efficiency of video feature extraction is improved; on the other hand, the extracted features are derived from a target data stream specifically selected from the undecoded video data, and this targeted feature extraction improves the stability of the extracted features. The method therefore improves both the efficiency and the stability of video feature extraction.
As a possible implementation, before the acquiring of the processing request for video data, the method further includes: acquiring a training data set, where the training data set comprises original video data and transformed video data, the transformed video data being video data obtained by transforming the original video data; extracting a first specified video data stream from the original video data and a second specified video data stream from the transformed video data; and training an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model.
In the embodiment of the application, when the feature extraction model is trained, the model is trained using both the original video data and the transformed video data, which improves the robustness of the trained feature extraction model: for example, when the trained model extracts features from a video that has been transformed from an original video, the corresponding features can still be extracted, further improving the stability of feature extraction.
As a possible implementation, before the acquiring of the training data set, the method further includes: acquiring the original video data; simulating a preset attack means to attack the original video data to obtain the transformed video data; and combining the original video data and the transformed video data to obtain the training data set.
In the embodiment of the application, the original video data is attacked by simulated attack means, so the transformed video data can be obtained quickly and effectively.
As a possible implementation, the training of an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model includes: converting the first specified video data stream and the second specified video data stream into a two-dimensional form; and inputting the first and second specified video data streams, converted into the two-dimensional form, into the initial feature extraction model for training to obtain the trained feature extraction model.
In the embodiment of the application, the video data streams are converted into a two-dimensional form and then input into the model for training, which improves the training effect and efficiency, so the training of the model can be completed quickly and effectively.
As a possible implementation, the processing of the video features according to the processing mode includes: when the processing mode is determined to be warehousing, storing the video features in a preset video feature library.
In the embodiment of the application, if the video data needs to be put into storage, the extracted video features are stored in a preset video feature library, which improves the reusability of the extracted video features.
As a possible implementation, the processing of the video features according to the processing mode includes: when the processing mode is determined to be query, matching the video features with the existing video features in a pre-stored video feature library; and if the video features are successfully matched with the existing video features, outputting a processing result indicating that the compressed domain video data is an infringing video, wherein the processing result comprises the infringing segment corresponding to the compressed domain video data.
In the embodiment of the application, video feature extraction can also be applied in an infringement comparison scenario; in that case the processing mode can be query, and the extracted video features are compared with the existing features to obtain the corresponding comparison result, which improves the accuracy of the infringement comparison.
As a possible implementation, the method further includes: acquiring source video data; extracting a source video data stream from the source video data; performing feature extraction on the source video data stream to obtain source video features corresponding to the source video data stream; and storing the source video characteristics to obtain the video characteristic library.
In the embodiment of the application, the video feature library is obtained by extracting the features of the source video data, so that the comprehensiveness of the video features in the video feature library is ensured, and the infringement comparison is facilitated.
In a second aspect, an embodiment of the present application provides a processing apparatus for video data, where the processing apparatus includes functional modules for implementing the method described in the first aspect and any one of the possible implementation manners of the first aspect.
In a third aspect, an embodiment of the present application provides a readable storage medium having a computer program stored thereon which, when executed by a computer, performs the method according to the first aspect and any one of its possible implementations.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting the scope; those skilled in the art can obtain other related drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for processing video data according to an embodiment of the present application;
fig. 2 is a functional block diagram of a video data processing apparatus according to an embodiment of the present application.
Reference numerals: 200 - video data processing apparatus; 201 - acquisition module; 202 - first extraction module; 203 - second extraction module; 204 - processing module.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The video data processing method provided by the embodiment of the application can be applied to various systems that need to process video data, such as video retrieval systems, video entry systems, and video database systems. These systems generally include a front end and a server (back end). The front end serves as the user side and implements the user's interaction with the system: for example, a user can upload data and initiate data processing requests through the front end. Correspondingly, the front end sends the data uploaded by the user and the data processing request to the server; after the server processes the request, it feeds the corresponding result back to the front end, which displays it for the user to view. The server serves as the data processing side and can both process and store data, enabling the system to realize its various functions.
Based on the application scenario, please refer to fig. 1, which is a flowchart of a processing method of video data according to an embodiment of the present application, where the processing method may be applied to the server side, and the processing method includes:
step 101: acquiring a processing request of video data; the processing request comprises compressed domain video data to be processed and a processing mode of the compressed domain video data, and the compressed domain video data is undecoded video data.
Step 102: a target data stream is extracted based on the compressed domain video data.
Step 103: and inputting the target data stream into a pre-trained feature extraction model to obtain the video features corresponding to the target data stream.
Step 104: and processing the video characteristics according to the processing mode.
In the embodiment of the application, compared with the prior art, after the compressed domain video data to be processed is obtained, the target data stream is extracted from the undecoded video data, features are then extracted from the target data stream by the trained feature extraction model, and finally the video features are processed according to the processing mode. On the one hand, feature extraction operates on the undecoded video data, so the time-consuming video decoding process is avoided and the efficiency of video feature extraction is improved; on the other hand, the extracted features are derived from a target data stream specifically selected from the undecoded video data, and this targeted feature extraction improves the stability of the extracted features. The method therefore improves both the efficiency and the stability of video feature extraction.
Next, detailed embodiments of steps 101 to 104 and the processing method will be described.
In step 101, the acquired processing request may be sent by the front end, and it may be a real-time or a non-real-time processing request. A real-time example: when a user needs certain video data processed immediately, the user initiates a processing request through the front end in real time; the front end forwards the request to the server, which receives it. A non-real-time example: a user has a batch of video data to be processed, but does not require it to be processed immediately and instead specifies a processing time. The user initiates a processing request for the batch through the front end, attaching the expected processing time; after the front end sends the batch and its expected processing time to the server, the server takes up the request before or when the expected processing time arrives.
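For illustration only, such a processing request might be modeled on the server side as a small structure like the following sketch; every name here (ProcessingMode, ProcessingRequest, the field names) is a hypothetical choice for this example, not something defined by this application:

```python
# Illustrative sketch only: one hypothetical shape for the processing request
# described above. All names are assumptions, not defined by the patent.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class ProcessingMode(Enum):
    WAREHOUSE = "warehouse"  # store the extracted features in the feature library
    QUERY = "query"          # match the extracted features against the library


@dataclass
class ProcessingRequest:
    compressed_video: bytes                    # undecoded compressed-domain data
    mode: ProcessingMode                       # processing-mode identifier
    expected_time: Optional[datetime] = None   # set only for non-real-time requests
```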
Further, the processing request includes the compressed domain video data to be processed and the processing mode for that data. For the compressed domain video data, "compressed domain" refers to the storage form of the video: a compressed domain video (a concept similar to an ordinary "video") yields a sequence of images only after decoding. After the processing request is acquired in step 101, step 102 is executed to extract the target data stream based on the compressed domain video data. For step 102, in order to select a specific video data stream from the undecoded video data, the compressed domain video data may first be demultiplexed by a video data processing tool (e.g., the ffmpeg open-source library) and separated into a video stream and an audio stream; the target data stream is then extracted from the video stream according to an extraction rule. The extraction rule may be: extract the video data stream at fixed intervals, or extract only the I-frame video data stream. For example, if the interval between the timestamps of successive extracted data streams is 3 s, then 20 video data streams can be extracted from one minute of compressed domain video data. Extracting only the I-frame video data stream can be understood as extracting only key frames: an I-frame is an intra-coded picture that reduces the amount of transmitted data by removing spatial redundancy within the image, so feature extraction on I-frame data streams yields more representative video features. Either way, the finally extracted target data stream remains undecoded video data, and the time-consuming video decoding is never needed.
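As a concrete illustration of step 102, the following sketch demultiplexes a compressed-domain file with the open-source PyAV bindings to FFmpeg and collects undecoded packets, keeping either only I-frame (keyframe) packets or one packet per fixed interval. The function name and defaults are assumptions for this example; no frame is decoded at any point:

```python
# Sketch: demultiplex compressed-domain video and collect undecoded packets,
# either only I-frame (keyframe) packets or one packet per fixed interval.
# Uses the PyAV bindings to FFmpeg; nothing here decodes a frame.
import av


def extract_target_stream(path: str, keyframes_only: bool = True,
                          interval_s: float = 3.0) -> list[bytes]:
    packets = []
    last_ts = None
    with av.open(path) as container:
        video = container.streams.video[0]
        for packet in container.demux(video):
            if packet.dts is None:          # flush packet at end of stream
                continue
            if keyframes_only:
                if packet.is_keyframe:
                    packets.append(bytes(packet))
            else:
                ts = float(packet.dts * video.time_base)
                if last_ts is None or ts - last_ts >= interval_s:
                    packets.append(bytes(packet))
                    last_ts = ts
    return packets
```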
Further, after the target data stream is extracted in step 102, step 103 is executed to input the target data stream into a pre-trained feature extraction model to obtain the video features corresponding to the target data stream. To facilitate the understanding of step 103, a training method of the feature extraction model is described next.
As an alternative implementation, the training process of the feature extraction model includes: acquiring a training data set, where the training data set comprises original video data and transformed video data, the transformed video data being video data obtained by transforming the original video data; extracting a first specified video data stream from the original video data and a second specified video data stream from the transformed video data; and training the initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model.
In the embodiment of the application, when the feature extraction model is trained, the model is trained using both the original video data and the transformed video data, which improves the robustness of the trained feature extraction model: for example, when the trained model extracts features from a video that has been transformed from an original video, the corresponding features can still be extracted, further improving the stability of feature extraction.
The training data set includes original video data and transformed video data, where the transformed video data is obtained by transforming the original video data. As an optional implementation, acquiring the training data set includes: acquiring original video data; simulating a preset attack means to attack the original video data to obtain the transformed video data; and combining the original video data and the transformed video data to obtain the training data set.
When acquiring raw video data, a large number of original videos can be collected through various channels. For example: locally stored raw video data is obtained, which may have been uploaded by users or previously collected from other sources. As another example: a large amount of raw video data is obtained from the network, such as videos published on various video websites. After the original video data is obtained, a preset attack means is simulated to attack it, yielding the transformed video data (i.e., attacked video data). Example attacks: modifying part of the data in the original video data, which may leave the data incomplete and cause missing frames; or destroying the data structure in the original video data, which may likewise leave the data incomplete.
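A minimal sketch of how such attacks might be simulated on raw compressed-video bytes, assuming byte-level edits are an acceptable stand-in for the frame-dropping and structure-destroying attacks described above (a real pipeline might instead re-encode or remux; the helper names are hypothetical):

```python
# Sketch: simulate simple "attacks" on raw compressed-video bytes to produce
# transformed training data.
import random


def drop_random_chunk(data: bytes, max_frac: float = 0.05) -> bytes:
    """Remove a contiguous chunk, emulating missing or incomplete frames."""
    n = len(data)
    size = random.randint(1, max(1, int(n * max_frac)))
    start = random.randrange(0, max(1, n - size))
    return data[:start] + data[start + size:]


def corrupt_structure(data: bytes, n_bytes: int = 16) -> bytes:
    """Overwrite a few bytes, emulating a damaged data structure."""
    buf = bytearray(data)
    for _ in range(n_bytes):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)
```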
Further, after the original video data and the transformed video data are obtained, they are combined correspondingly to obtain the training data set. As for the combination: assume there are 100 original videos and the 100 transformed videos corresponding to them; the original and transformed videos can be paired according to their transformation relationships, and the resulting training data set contains 100 video data pairs, each consisting of an original video and its transformed video.
It will be appreciated that both the original video data and the transformed video data are also compressed domain video data.
In the embodiment of the application, the original video data is attacked by simulated attack means, so the transformed video data can be obtained quickly and effectively.
Further, each data pair in the training data set needs to be processed: a first specified video data stream is extracted from the original video data, and a second specified video data stream is extracted from the transformed video data. The first and second specified video data streams are extracted in the same manner as the target data stream described in the foregoing embodiment: demultiplex first, then extract according to a set rule. Note that the first and second specified video data streams are extracted according to the same rule; since data in the transformed video may be damaged, if the corresponding video data stream cannot be extracted from the transformed video under that rule, an adjacent video data stream may be used as the second specified video data stream. This extraction of the specified video data streams is performed for the original and transformed video data of every data pair.
Further, after the first specified video data stream and the second specified video data stream are obtained, as an alternative implementation, the training process includes: converting the first specified video data stream and the second specified video data stream into a two-dimensional form; and inputting the two-dimensional streams into an initial feature extraction model for training to obtain the trained feature extraction model.
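A minimal sketch of this two-dimensional conversion, assuming a fixed 256x256 target shape (the shape, the normalization, and the helper name are illustrative assumptions; the padding/cropping idea is elaborated in the next paragraph):

```python
# Sketch: turn a variable-length packet byte stream into a fixed-size
# two-dimensional array via padding and cropping.
import numpy as np


def stream_to_2d(stream: bytes, side: int = 256) -> np.ndarray:
    target = side * side
    buf = np.frombuffer(stream, dtype=np.uint8)
    if buf.size < target:                      # pad short streams with zeros
        buf = np.pad(buf, (0, target - buf.size))
    else:                                      # crop long streams
        buf = buf[:target]
    return buf.reshape(side, side).astype(np.float32) / 255.0
```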
In this embodiment, converting the data streams into a two-dimensional form amounts to preprocessing the training data; the conversion can be implemented with techniques such as padding and cropping. The training data in two-dimensional form is then input into an initial feature extraction model, which may be a deep model based on a classification network. When the classification network is trained on the training data pairs, the vector output by the penultimate layer of the network can be taken as the feature describing the input data (i.e., the input video frame). When the input is a data pair, the network uses the L2 distance as the loss function, minimizing the distance within matched pairs and maximizing the distance between different data, so that the trained model can extract features from both original and transformed videos. The classification network model can adopt, for example, an SVM (Support Vector Machine) or LR (logistic regression).
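The following sketch shows one plausible reading of this training setup in PyTorch: a small convolutional backbone produces a feature vector, and a contrastive objective built on the L2 distance pulls (original, transformed) pairs together while pushing non-matching pairs apart by a margin. The backbone architecture, feature dimension, and margin are all assumptions for illustration, not details fixed by this application:

```python
# Sketch: pairwise training with an L2-distance objective.
import torch
import torch.nn as nn


class FeatureNet(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, dim)  # output used as the feature vector

    def forward(self, x):
        return self.head(self.body(x))


def contrastive_loss(f1, f2, same: torch.Tensor, margin: float = 1.0):
    d = torch.norm(f1 - f2, p=2, dim=1)                       # L2 distance
    pos = same * d.pow(2)                                     # pull pairs together
    neg = (1 - same) * torch.clamp(margin - d, min=0).pow(2)  # push others apart
    return (pos + neg).mean()
```

In use, `same` would be 1 for a stream pair taken from the same (original, transformed) source video and 0 for streams from different videos.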
In the embodiment of the application, the video data stream is converted into the two-dimensional form and then input into the model for training, so that the training effect and efficiency can be improved, and the training of the model can be quickly and effectively completed.
In the embodiment of the present application, model training uses a training data set composed of original video data and transformed video data, which ensures both the robustness and the applicability of the trained model (that is, features can be extracted from most video data). In practice, the model can also be trained using only original video data as the training set; this requires less data and simpler data processing, and suits application scenarios with lower requirements on the model's applicability (for example, scenarios where the safety and stability of the video data can be guaranteed).
Further, whichever training data set is used, once the feature extraction model has been trained it can also be tested. The test process can include: acquiring a test data set comprising original video data and transformed video data; extracting the specified video data streams from the original and transformed video data respectively, and inputting the extracted streams into the trained feature extraction model to obtain the extracted video features; and evaluating the accuracy of the trained feature extraction model based on the extracted video features.
The test data set can be derived from the training data set, for example by changing the attack applied to the original video data so as to obtain different transformed video data, or by replacing part of the original video data and generating transformed video data from the replacement. The specified video data streams in the test data set are extracted in the same way as for the training data set. For the accuracy evaluation, the video features extracted by the trained model may be compared with preset video features (which may be features extracted by other feature extraction methods), and the accuracy determined from the comparison result: the higher the similarity, the higher the model's accuracy.
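As a sketch of such an evaluation, assuming features are compared by cosine similarity against a fixed threshold (both the metric and the threshold are assumptions for illustration):

```python
# Sketch: a simple accuracy proxy comparing model features against reference
# features row by row with cosine similarity.
import numpy as np


def evaluate(model_feats: np.ndarray, ref_feats: np.ndarray,
             threshold: float = 0.9) -> float:
    a = model_feats / np.linalg.norm(model_feats, axis=1, keepdims=True)
    b = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    sims = np.sum(a * b, axis=1)       # cosine similarity per feature pair
    return float(np.mean(sims >= threshold))
```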
Further, after evaluating the accuracy of the feature extraction model, the trained model may be adjusted based on the result. For example: if the accuracy is below ninety percent, the model's parameters can be tuned, or more data can be added to the training set for retraining, until the accuracy of the final model exceeds ninety percent, at which point the trained feature extraction model is deployed.
Further, based on the training process of the trained feature extraction model, it can be understood that, in step 103, after the target data stream is input into the trained feature extraction model, the trained feature extraction model can directly output the video features corresponding to the target data stream.
Further, after the corresponding video features are obtained in step 103, step 104 is executed to process the video features according to the processing mode.
For step 104, the embodiment of the present application takes the video retrieval system as the application scenario. In a video retrieval system, a section of video data can be retrieved, or video data can be put into storage. When video data is retrieved, whether the video data already exists can be judged against the existing video database (which can be understood as a copyright library); if it exists, the video data is infringing.
Based on this application scenario, the processing modes of the video data may include: storing the video data, and querying (or retrieving) the video data. Each processing mode has a corresponding processing code (which can be understood as a processing mode identifier), so in step 104 the processing mode can be determined directly from the processing mode identifier in the processing request.
In the embodiment of the present application, the video database may be understood as a video feature library, and the video feature library may be preset and stored. As a possible implementation manner, the process of setting the video feature library includes: acquiring source video data; extracting a source video data stream from source video data; performing feature extraction on the source video data stream to obtain source video features corresponding to the source video data stream; and storing the source video characteristics to obtain a video characteristic library.
The source video data may be loaded from a local video library, i.e., video data obtained from the different sources described in the foregoing embodiments. After the source video data is acquired, a source video data stream is extracted for each source video; the extraction manner is consistent with that of the target data stream and the first and second specified video data streams in the foregoing embodiments.
Further, after the source video data stream is extracted, feature extraction is performed on it. The features may be extracted by the trained feature extraction model of the foregoing embodiments, or by other available feature extraction methods. After the source video features are extracted, they are stored correspondingly to obtain the video feature library.
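A minimal sketch of building this library, reusing the hypothetical helpers sketched earlier (extract_target_stream, stream_to_2d, FeatureNet); an in-memory dict stands in for whatever feature store is actually used:

```python
# Sketch: build the video feature library from source videos.
import torch


def build_feature_library(source_paths, model) -> dict:
    library = {}
    model.eval()
    with torch.no_grad():
        for path in source_paths:
            packets = extract_target_stream(path)   # undecoded packet bytes
            if not packets:
                continue
            batch = torch.stack([
                torch.from_numpy(stream_to_2d(p)).unsqueeze(0)  # shape (1, H, W)
                for p in packets
            ])
            library[path] = model(batch)            # one feature per packet
    return library
```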
Further, given the stored video feature library, in step 104, if the processing mode is determined to be warehousing, the video features are stored in the preset (pre-stored) video feature library.
When storing, the stored video features may be numbered or indexed. For example: the number or index is set according to the upload time of the video data corresponding to the video features. In this way, when the video features of designated video data need to be looked up later, they can be found directly according to the numbering or indexing rule, and the orderliness of the video features in the feature library is ensured.
In addition to storing the video features in the video feature library, a corresponding source video database may also be provided, storing the source videos that correspond to the video features in the feature library. With this storage scheme, the source data corresponding to the video features is preserved alongside the features themselves, so the video data can be traced conveniently.
In the embodiment of the application, if the video data needs to be put into storage, the extracted video features are stored in a preset video feature library, which improves the reusability of the extracted video features.
As another embodiment, in step 104, if the processing mode is determined to be query, the video features are matched against the existing video features in the pre-stored video feature library; if the video features are successfully matched with existing video features, a processing result indicating that the compressed domain video data is an infringing video is output, and the processing result comprises the infringing segment corresponding to the compressed domain video data.
When matching the video features against existing video features, the comparison may take into account the encoding format and basic information (such as timestamp information) of the features: if the encoding formats are the same, the features are compared directly, and matching is determined to be successful if they are the same. When the encoding formats differ, the two sets of features may first be unified according to their respective encoding information and encoding formats (for example, converted into one encoding format) and then compared on the unified encoding and basic information; if they are the same, matching is determined to be successful. If the comparison results differ, the matching fails.
Further, if the matching succeeds, the compressed domain video data is judged to be an infringing video, and the corresponding infringing segment in the compressed domain video data can be located from the matched video features and output as the processing result. For example: for those video features of the compressed domain video data that have matching existing video features, the video frames corresponding to them constitute the infringing segment; alternatively, the video frames corresponding to the matched existing video features identify the infringing segment.
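A sketch of this query-and-localize step, assuming L2 nearest-neighbor matching with a fixed distance threshold over the per-packet features (the threshold and the return format are illustrative assumptions):

```python
# Sketch: query-mode matching. Each query feature is compared against every
# library feature by L2 distance; the indices of the matched query packets
# localize the candidate infringing segment.
import torch


def find_infringement(query_feats: torch.Tensor, library: dict,
                      threshold: float = 0.5) -> dict:
    for source, feats in library.items():
        dists = torch.cdist(query_feats, feats)        # pairwise L2 distances
        hits = (dists.min(dim=1).values <= threshold).nonzero().flatten()
        if hits.numel() > 0:
            # Hit indices identify which query packets (and hence which time
            # positions in the query video) overlap the matched source video.
            return {"infringing": True, "source": source,
                    "segment_packets": hits.tolist()}
    return {"infringing": False}
```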
Further, if the matching fails, a processing result indicating that the compressed domain video data is a non-infringing video is output.
In the embodiment of the application, video feature extraction can also be applied in an infringement comparison scenario; in that case the processing mode can be query, and the extracted video features are compared with the existing features to obtain the corresponding comparison result, which improves the accuracy of the infringement comparison.
In the embodiment of the present application, two application scenarios, namely a video entry system and a video database system, are also mentioned. Next, a brief example of an alternative processing manner and a specific processing flow in the two application scenarios will be described.
In the application scenario of the video entry system, after the corresponding video features are obtained, the processing mode can be storage or duplicate checking. For storage: the obtained video features are stored in correspondence with the video data to be entered (i.e., the compressed domain video data to be processed), ensuring that stored video data and video features correspond, so the video data can be analyzed and retrieved at any time. For duplicate checking: the obtained video features are matched against the existing video features in the system; if matching features exist, the video data to be entered may already be in the system and need not be entered again.
In the application scenario of the video database system, after the corresponding video features are obtained, the processing mode can be storage or update. For storage: the obtained video features are stored in correspondence with the video data to be stored (i.e., the compressed domain video data to be processed), ensuring that the video data in the database corresponds to its video features and can be processed at any time. For update: the obtained video features are matched against the existing video features in the system; if matching features exist, the video data corresponding to the matched features is updated to the current compressed domain video data to be processed, thereby updating video data that shares the same video features.
Based on the same inventive concept, referring to fig. 2, an embodiment of the present application further provides a video data processing apparatus 200, comprising: an acquisition module 201, a first extraction module 202, a second extraction module 203, and a processing module 204.
The acquisition module 201 is configured to acquire a processing request for video data, the processing request comprising compressed domain video data to be processed and a processing mode for the compressed domain video data, where the compressed domain video data is undecoded video data. The first extraction module 202 is configured to extract a target data stream based on the compressed domain video data. The second extraction module 203 is configured to input the target data stream into a pre-trained feature extraction model to obtain the video features corresponding to the target data stream. The processing module 204 is configured to process the video features according to the processing mode.
Optionally, the apparatus 200 for processing video data further includes a training module configured to: acquire a training data set, where the training data set comprises original video data and transformed video data, the transformed video data being video data obtained by transforming the original video data; extract a first specified video data stream from the original video data and a second specified video data stream from the transformed video data; and train an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model.
Optionally, the training module is specifically configured to: acquire the original video data; simulate a preset attack means to attack the original video data to obtain the transformed video data; and combine the original video data and the transformed video data to obtain the training data set.
Optionally, the training module is further specifically configured to: convert the first specified video data stream and the second specified video data stream into a two-dimensional form; and input the first and second specified video data streams, converted into the two-dimensional form, into the initial feature extraction model for training to obtain the trained feature extraction model.
Optionally, the processing module 204 is specifically configured to: when the processing mode is determined to be warehousing, store the video features in a preset video feature library.
Optionally, the processing module 204 is further specifically configured to: when the processing mode is determined to be query, match the video features against the existing video features in a pre-stored video feature library; and if the video features are successfully matched with the existing video features, output a processing result indicating that the compressed domain video data is an infringing video, the processing result comprising the infringing segment corresponding to the compressed domain video data.
Optionally, the processing module 204 is further configured to: acquire source video data; extract a source video data stream from the source video data; perform feature extraction on the source video data stream to obtain the source video features corresponding to the source video data stream; and store the source video features to obtain the video feature library.
The embodiments and specific examples of the video data processing method in the foregoing embodiments also apply to the apparatus of fig. 2. Given the foregoing detailed description of the processing method, the implementation of the video data processing apparatus 200 in fig. 2 will be clear to those skilled in the art, so for brevity it is not detailed here.
Based on the same inventive concept, embodiments of the present application further provide a readable storage medium having a computer program stored thereon which, when executed by a computer, performs the video data processing method of any of the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for processing video data, comprising:
acquiring a processing request for video data; the processing request comprises compressed domain video data to be processed and a processing mode for the compressed domain video data, and the compressed domain video data is undecoded video data;
extracting a target data stream based on the compressed domain video data;
inputting the target data stream into a pre-trained feature extraction model to obtain video features corresponding to the target data stream;
and processing the video features according to the processing mode.
2. The processing method according to claim 1, wherein before the acquiring of the processing request for video data, the method further comprises:
acquiring a training data set; the training data set comprises original video data and transformed video data, and the transformed video data is video data obtained by transforming the original video data;
extracting a first specified video data stream from the original video data and a second specified video data stream from the transformed video data;
and training an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model.
3. The processing method of claim 2, wherein before the acquiring of the training data set, the method further comprises:
acquiring the original video data;
simulating a preset attack means to attack the original video data to obtain the transformed video data;
and combining the original video data and the transformed video data to obtain the training data set.
4. The processing method of claim 2, wherein the training of an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model comprises:
converting the first specified video data stream and the second specified video data stream into a two-dimensional form;
and inputting the first specified video data stream and the second specified video data stream which are converted into the two-dimensional form into the initial feature extraction model for training so as to obtain the trained feature extraction model.
5. The processing method according to claim 1, wherein the processing of the video features according to the processing mode comprises:
when the processing mode is determined to be warehousing, storing the video features in a preset video feature library.
6. The processing method according to claim 1, wherein the processing of the video features according to the processing mode comprises:
when the processing mode is determined to be query, matching the video features with the existing video features in a pre-stored video feature library;
and if the video features are successfully matched with the existing video features, outputting a processing result indicating that the compressed domain video data is an infringing video, wherein the processing result comprises the infringing segment corresponding to the compressed domain video data.
7. The processing method of claim 6, further comprising:
acquiring source video data;
extracting a source video data stream from the source video data;
performing feature extraction on the source video data stream to obtain source video features corresponding to the source video data stream;
and storing the source video characteristics to obtain the video characteristic library.
8. An apparatus for processing video data, comprising:
the acquisition module is used for acquiring a processing request for video data; the processing request comprises compressed domain video data to be processed and a processing mode for the compressed domain video data, and the compressed domain video data is undecoded video data;
a first extraction module for extracting a target data stream based on the compressed domain video data;
the second extraction module is used for inputting the target data stream into a pre-trained feature extraction model to obtain video features corresponding to the target data stream;
and the processing module is used for processing the video features according to the processing mode.
9. The processing apparatus as in claim 8, further comprising a training module to:
acquiring a training data set; the training data set comprises original video data and transformed video data, and the transformed video data is video data obtained by transforming the original video data;
extracting a first specified video data stream from the original video data and a second specified video data stream from the transformed video data;
and training an initial feature extraction model based on the first specified video data stream and the second specified video data stream to obtain a trained feature extraction model.
10. A readable storage medium, having stored thereon a computer program which, when executed by a computer, performs the method of any one of claims 1-7.
CN202010876747.9A 2020-08-27 2020-08-27 Video data processing method and device and readable storage medium Pending CN111966859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010876747.9A CN111966859A (en) 2020-08-27 2020-08-27 Video data processing method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010876747.9A CN111966859A (en) 2020-08-27 2020-08-27 Video data processing method and device and readable storage medium

Publications (1)

Publication Number Publication Date
CN111966859A true CN111966859A (en) 2020-11-20

Family

ID=73399362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010876747.9A Pending CN111966859A (en) 2020-08-27 2020-08-27 Video data processing method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN111966859A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991476A (en) * 2021-02-18 2021-06-18 中国科学院自动化研究所 Scene classification method, system and equipment based on depth compression domain features

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203086632U (en) * 2013-01-16 2013-07-24 浙江理工大学 Video retrieval system based on compressed domain
CN103905824A (en) * 2014-03-26 2014-07-02 深圳先进技术研究院 Video semantic retrieval and compression synchronization camera system and method
CN104239420A (en) * 2014-10-20 2014-12-24 北京畅景立达软件技术有限公司 Video fingerprinting-based video similarity matching method
CN104869403A (en) * 2015-05-18 2015-08-26 中国传媒大学 Shot segmentation method based on X264 compressed video
CN105959686A (en) * 2016-06-22 2016-09-21 腾讯科技(深圳)有限公司 Video feature extracting method and device as well as video matching method and device
CN109086709A (en) * 2018-07-27 2018-12-25 腾讯科技(深圳)有限公司 Feature Selection Model training method, device and storage medium
CN109543735A (en) * 2018-11-14 2019-03-29 北京工商大学 Video copying detection method and its system
CN109948624A (en) * 2019-02-18 2019-06-28 北京旷视科技有限公司 Method, apparatus, electronic equipment and the computer storage medium of feature extraction
CN111291224A (en) * 2020-02-17 2020-06-16 北京奇艺世纪科技有限公司 Video stream data processing method, device, server and storage medium
CN111368143A (en) * 2020-03-13 2020-07-03 北京奇艺世纪科技有限公司 Video similarity retrieval method and device, electronic equipment and storage medium
CN111444826A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Video detection method and device, storage medium and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴韶波: "Digital Audio and Video Technology and Applications, 2nd Edition (数字音视频技术及应用 第2版)", 31 March 2016, Harbin Institute of Technology Press, pages 204-210 *
杨磊: "Network Video Surveillance Technology (网络视频监控技术)", 30 September 2017, Communication University of China Press, pages 194-197 *

Similar Documents

Publication Publication Date Title
KR102082816B1 (en) Method for improving the resolution of streaming files
CN113301430B (en) Video clipping method, video clipping device, electronic equipment and storage medium
US20140317062A1 (en) Amethod and apparatus for recovering sqlite file deleted from mobile terminal
CN103581705A (en) Method and system for recognizing video program
CN110532369A (en) A kind of generation method of question and answer pair, device and server
CN103294667A (en) Method and system for tracing homologous image through watermark
CN202998337U (en) Video program identification system
CN104317939A (en) Log statistics method and system on basis of digital film playing server
CN113051236A (en) Method and device for auditing video and computer-readable storage medium
CN111770360B (en) Method and system for marking whole flow of video manuscript collection, editing and auditing
KR20120090101A (en) Digital video fast matching system using key-frame index method
CN113515997A (en) Video data processing method and device and readable storage medium
CN117376632A (en) Data recovery method and system based on intelligent depth synthesis
CN116233445A (en) Video encoding and decoding processing method and device, computer equipment and storage medium
CN111966859A (en) Video data processing method and device and readable storage medium
CN117115718B (en) Government affair video data processing method, system and computer readable storage medium
Hadi Reviewing and evaluating existing file carving techniques for JPEG files
KR100896335B1 (en) System and Method for managing and detecting duplicate movie files based on audio contents
CN113569719B (en) Video infringement judging method and device, storage medium and electronic equipment
US11599570B2 (en) Device and method to render multimedia data stream tamper-proof based on block chain recording
CN113762040B (en) Video identification method, device, storage medium and computer equipment
CN114372169A (en) Method, device and storage medium for searching homologous videos
CN110163043B (en) Face detection method, device, storage medium and electronic device
CN112752165A (en) Subtitle processing method, subtitle processing device, server and computer-readable storage medium
CN113784131B (en) Video coding automation auxiliary system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination