CN111726475A - Video processing method, system, electronic device and storage medium

Info

Publication number: CN111726475A
Authority: CN (China)
Prior art keywords: data, feature, video processing, video, layer
Legal status: Pending
Application number: CN202010599677.7A
Other languages: Chinese (zh)
Inventors: 汤泽胜, 许盛辉, 潘照明
Assignee (current and original): Netease Media Technology Beijing Co Ltd
Application filed by Netease Media Technology Beijing Co Ltd

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/14: Picture signal circuitry for video frequency region


Abstract

The application discloses a video processing method, a video processing system, an electronic device and a storage medium, which can effectively solve the problems of repeated development and wasted system resources. The video processing system includes a data layer, a feature layer and an application layer. The data layer is used for respectively extracting data to be processed from each piece of acquired video data; the feature layer is used for respectively extracting at least one type of feature data from each piece of data to be processed; and the application layer is used for acquiring corresponding feature data according to the type of feature data required by each video processing task and executing the video processing operation corresponding to each task based on the acquired feature data.

Description

Video processing method, system, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video processing method, a video processing system, an electronic device, and a storage medium.
Background
This section is intended to provide a background or context to the embodiments of the application that are recited in the claims. The description herein is not admitted to be prior art by inclusion in this section.
With the rapid spread of the internet, users publish large amounts of video content on the major network platforms every day. The platforms need to run many different video processing algorithms to analyze and identify user-published video content, both to filter out low-quality or non-compliant videos and to classify the analysis results so that video content can be distributed and recommended. However, different video processing tasks often have different processing flows and involve different video processing algorithms, so a separate processing flow and system has to be built for each task, which undoubtedly increases development cost and wastes system resources.
Disclosure of Invention
In view of the above technical problems, an improved approach that can effectively solve the problems of repeated development and wasted system resources is urgently needed.
In one aspect, an embodiment of the present application provides a video processing system, including:
the data layer is used for respectively extracting data to be processed from each piece of acquired video data;
the feature layer is used for respectively extracting at least one type of feature data from each piece of data to be processed;
the application layer is used for acquiring corresponding feature data according to the type of feature data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired feature data.
Optionally, the data layer is further configured to:
storing the extracted data to be processed into a data storage unit;
generating corresponding first metadata aiming at each data to be processed stored in the data storage unit, wherein the first metadata comprises a storage position of the data to be processed;
adding the first metadata of each data to be processed into a message queue.
Optionally, the feature layer is specifically configured to:
acquiring first metadata from the message queue;
acquiring corresponding data to be processed from the data storage unit based on the storage position in the acquired first metadata;
and extracting at least one type of feature data from the acquired data to be processed based on at least one feature extraction model corresponding to the data type of the acquired data to be processed.
Optionally, the data layer is specifically configured to:
decoding each acquired video data to obtain data to be processed of multiple data types corresponding to each video data, wherein the data types include: video frame data and audio data.
Optionally, the feature layer is further configured to:
storing each extracted feature data into a feature storage unit;
generating corresponding second metadata aiming at each feature data stored in the feature storage unit, wherein the second metadata comprises the storage position and the type of the feature data;
adding the second metadata of each piece of feature data into the message queue.
Optionally, the application layer is specifically configured to:
acquiring corresponding second metadata from the message queue according to the type of the feature data required by each video processing task;
and acquiring corresponding feature data from the feature storage unit based on the storage position in the acquired second metadata.
Optionally, the second metadata further includes a text vector corresponding to the video data to which the feature data belongs, and the text vector is obtained based on the text data corresponding to the video data;
the application layer is specifically configured to execute a video processing operation corresponding to each video processing task based on the obtained feature data and the text vector in the second metadata.
In one aspect, an embodiment of the present application provides a video processing method, including:
respectively extracting data to be processed from each acquired video data;
respectively extracting at least one type of feature data from each data to be processed;
and acquiring corresponding feature data according to the type of the feature data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired feature data.
Optionally, the method further comprises:
storing the extracted data to be processed into a data storage unit;
generating corresponding first metadata aiming at each data to be processed stored in the data storage unit, wherein the first metadata comprises a storage position of the data to be processed;
adding the first metadata of each data to be processed into a message queue.
Optionally, the respectively extracting at least one type of feature data from each data to be processed specifically includes:
acquiring first metadata from the message queue;
acquiring corresponding data to be processed from the data storage unit based on the storage position in the acquired first metadata;
and extracting at least one type of feature data from the acquired data to be processed based on at least one feature extraction model corresponding to the data type of the acquired data to be processed.
Optionally, the extracting the data to be processed from each obtained video data respectively specifically includes:
decoding each acquired video data to obtain data to be processed of multiple data types corresponding to each video data, wherein the data types include: video frame data and audio data.
Optionally, the method further comprises:
storing each extracted feature data into a feature storage unit;
generating corresponding second metadata aiming at each feature data stored in the feature storage unit, wherein the second metadata comprises the storage position and the type of the feature data;
adding the second metadata of each piece of feature data into the message queue.
Optionally, the obtaining, according to the type of the feature data required by each video processing task, corresponding feature data specifically includes:
acquiring corresponding second metadata from the message queue or the key value storage system according to the type of the feature data required by each video processing task;
and acquiring corresponding feature data from the feature storage unit based on the storage position in the acquired second metadata.
Optionally, the second metadata further includes a text vector corresponding to the video data to which the feature data belongs, and the text vector is obtained based on the text data corresponding to the video data;
the executing the video processing operation corresponding to each video processing task based on the acquired feature data specifically includes:
and executing the video processing operation corresponding to each video processing task based on the acquired feature data and the text vector in the second metadata.
In one aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any one of the methods when executing the computer program.
In one aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement the steps of any of the above-described methods.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program stored on a computer-readable storage medium, the computer program comprising program instructions that, when executed by a processor, implement the steps of any of the methods described above.
The video processing method, system, electronic device and storage medium provided by the embodiments of the application abstract the various video stream analysis services into a data layer, a feature layer and an application layer. The data layer preprocesses the video data, the feature layer extracts various general feature data from different video data, and the application layer only needs to acquire the corresponding feature data from the feature layer according to the feature data required by the video processing task being executed. Different video processing tasks can therefore reuse the data already processed by the data layer and the feature layer, which realizes resource multiplexing across levels, reduces repeated computation and improves video processing efficiency. Only one set of the video processing system needs to be deployed to implement different video processing tasks, and performance optimization of an underlying service brings overall optimization and improvement to the upper-layer services, so each video processing task no longer needs to be optimized and evaluated separately, reducing the development cost of video processing.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a video processing system according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a video processing method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The principles and spirit of the present application will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the present application, and are not intended to limit the scope of the present application in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, embodiments of the present application may be embodied as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the form of: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
In this document, it is to be understood that any number of elements in the figures are provided by way of illustration and not limitation, and any nomenclature is used for differentiation only and not in any limiting sense.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below:
Kafka: a distributed message queue system; it is a core module for building distributed computing services that span multiple machines.
Redis: an in-memory distributed key-value storage system, equivalent to a distributed dictionary, which can be used for fast data queries.
Metadata: data used to describe data, such as the type of data, storage location, storage format, and so forth.
FFmpeg (Fast Forward MPEG): a set of open-source computer programs that can record, convert and stream digital audio and video, providing a complete solution for recording, converting and streaming audio and video.
Wget: a free tool for automatically downloading files from the network. It supports downloading over the three most common TCP/IP protocols (HTTP, HTTPS and FTP) and can use an HTTP proxy. The name "wget" combines "World Wide Web" and "get". By automatic download, we mean that wget can continue running in the background after the user logs out of the system, until the download task is completed.
VGGish: a VGG-like model obtained by training on a large YouTube dataset, which generates 128-dimensional embeddings. The original release of AudioSet provides a 128-dimensional embedding for each sample, generated by this VGG model. This TensorFlow-based VGG model is referred to as VGGish; it supports extracting 128-dimensional embedding feature vectors with semantics from audio waveforms.
NeXtVLAD: a model that aggregates frame-level features into video-level features, which can then be classified.
The principles and spirit of the present application are explained in detail below with reference to several representative embodiments of the present application.
Summary of The Invention
In the prior art, a set of video processing flow and system needs to be separately arranged for different video processing tasks, which undoubtedly increases development cost and wastes system resources.
In order to solve the above problems, the inventors of the present application analyzed the processing flows of common video processing tasks and found that different tasks follow similar flows and steps when processing video: downloading and preprocessing the video data, extracting the required feature data from the preprocessed video data, and performing the corresponding learning or analysis on the feature data to obtain a processing result. To this end, an embodiment of the present application provides a video processing system comprising a data layer, a feature layer and an application layer. The data layer extracts to-be-processed data from each piece of acquired video data; the feature layer extracts at least one type of feature data from each piece of to-be-processed data; and the application layer acquires the corresponding feature data according to the type of feature data required by each video processing task and executes the video processing operation corresponding to each task based on the acquired feature data. Based on the data layer and the feature layer, various feature data can be extracted from different video data, and the application layer only needs to acquire the corresponding feature data from the feature layer according to the feature data required by the currently executed task. Different video processing tasks can therefore reuse the data processed by the data layer and the feature layer, realizing resource reuse across levels, reducing repeated computation and improving video processing efficiency. Different video processing tasks can be implemented by deploying only one set of the video processing system, and performance optimization of the underlying services brings overall optimization and improvement to the upper-layer services, so that each task need not be optimized and evaluated separately, reducing the development cost of video processing.
Having described the basic principles of the present application, various non-limiting embodiments of the present application are described in detail below.
Application scene overview
Fig. 1 is a schematic view of an application scenario of a video processing method according to an embodiment of the present application. The application scenario includes a terminal device 101, a server 102, a data storage system 103, and a video processing system 104. The terminal device 101, the server 102, the data storage system 103, and the video processing system 104 may be connected through a wired or wireless communication network. The terminal device 101 includes, but is not limited to, a desktop computer, a mobile phone, a mobile computer, a tablet computer, a media player, a smart wearable device, a smart television, a vehicle-mounted device, a Personal Digital Assistant (PDA), or other electronic devices capable of implementing the above functions. The server 102, the data storage system 103, and the video processing system 104 may be independent physical servers, may also be a server cluster or distributed system formed by a plurality of physical servers, and may also be cloud servers providing basic cloud computing services such as cloud service, cloud databases, cloud computing, cloud functions, cloud storage, network service, cloud communication, middleware service, domain name service, security service, CDN, and big data and artificial intelligence platforms.
The server 102 is used for providing video services, which may be, for example, live video, online video playing, video downloading, video publishing, and the like. The terminal device 101 is installed with a video client, and a user may obtain a video service provided by the server 102 through the video client, or the user may access a video website through a browser in the terminal device 101 to obtain the video service provided by the server 102. For example, any user may upload a video to the server 102 corresponding to the video playing platform through the video client, so as to distribute the recorded video through the video playing platform, so that other users may view the video distributed by the user.
The data storage system 103 is used for storing the videos provided by the server 102 to users and the various videos uploaded by users. The video processing system 104 acquires user-uploaded videos from the data storage system 103 and analyzes and identifies them to filter out low-quality or non-compliant videos, that is, to audit the videos; it may also classify videos based on the video analysis and identification results, and the server 102 distributes and recommends videos based on the audit and classification results of the video processing system 104. In addition, the video processing system 104 can perform face recognition, motion recognition and the like on videos, which can be widely applied in security scenarios, intelligent transportation scenarios, and so on.
The following describes a video processing system and a video processing method according to an exemplary embodiment of the present application with reference to an application scenario of fig. 1. It should be noted that the above application scenarios are only presented to facilitate understanding of the spirit and principles of the present application, and the embodiments of the present application are not limited in this respect. Rather, embodiments of the present application may be applied to any scenario where applicable.
Exemplary method
Referring to fig. 2, an embodiment of the present application provides a video processing system including a data layer, a feature layer, and an application layer.
The data layer is used for extracting data to be processed from the acquired video data respectively.
In specific implementation, the data layer acquires video data from the data storage system, decodes each acquired video data to acquire data to be processed of multiple data types corresponding to each video data, and stores the data acquired by decoding for the feature layer. The data types obtained by decoding include, but are not limited to, video frame data and audio data.
In practical applications, the data layer may use a download tool such as wget to download video data from the data storage system to the data layer's local disk, and then use FFmpeg to decode the video, extract frames and separate the audio stream, obtaining video frame data and audio data. The decoded video frame data can be packaged with the tar tool in a Linux environment and stored on the data layer's local disk, so that the feature layer can obtain all the video frames contained in one piece of video data at once; a sketch of this flow follows.
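The following is a minimal Python sketch of this preprocessing flow. It assumes wget, FFmpeg and tar are available on the PATH; the working directory, frame rate and audio parameters are illustrative choices not fixed by the embodiment.

```python
import subprocess
from pathlib import Path

def preprocess(video_url: str, video_id: str, workdir: str = "/data") -> dict:
    """Download one video, split it into frames and an audio track, pack the frames."""
    base = Path(workdir) / video_id
    frames = base / "frames"
    frames.mkdir(parents=True, exist_ok=True)
    video = base / "video.mp4"

    # Download the source video to the data layer's local disk.
    subprocess.run(["wget", "-q", "-O", str(video), video_url], check=True)

    # Decode the video and extract frames (1 frame per second, an assumed rate).
    subprocess.run(["ffmpeg", "-y", "-i", str(video),
                    "-vf", "fps=1", str(frames / "%06d.jpg")], check=True)

    # Separate the audio stream into a standalone mono 16 kHz WAV file.
    audio = base / "audio.wav"
    subprocess.run(["ffmpeg", "-y", "-i", str(video), "-vn",
                    "-ar", "16000", "-ac", "1", str(audio)], check=True)

    # Pack all frames into one tar so the feature layer fetches them in one read.
    tarball = base / "frames.tar"
    subprocess.run(["tar", "-cf", str(tarball), "-C", str(base), "frames"],
                   check=True)
    return {"video_id": video_id, "frames": str(tarball), "audio": str(audio)}
```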
In addition, when the video data are analyzed, the video data can be analyzed by combining with the text data corresponding to the video data. The text data includes, but is not limited to, a title, a brief introduction, a video type, and the like of the video data, and for this purpose, the data layer may further store the text data corresponding to the video data. Specifically, each piece of video data corresponds to a unique video identifier, and video frame data, audio data and text data separated from the video data can be marked through the video identifiers to determine the video data to which each piece of data belongs, so that the data can be traced. The video data, the audio data, and the text data may be stored separately according to data types, and of course, the video frame data, the audio data, and the text data of the same video data may also be stored in the same file directory.
The data layer is mainly used for performing general preprocessing on the video data acquired by the video processing system and separating out the various types of data to be processed for use by the feature layer. The feature layer therefore needs no preprocessing of its own: it simply fetches the required data directly from the data layer. Any feature extraction model in the feature layer can obtain to-be-processed data directly through the data layer, and the same to-be-processed data can be supplied to multiple feature extraction models, so sharing the data preprocessed by the data layer saves the feature layer a great deal of computation and storage cost.
The feature layer is used for extracting at least one type of feature data from each piece of data to be processed.
In specific implementation, a plurality of feature extraction models are integrated in the feature layer to obtain a plurality of feature data, so as to meet the requirements of the application layer in executing various video processing tasks. The feature data refers to data which is extracted from data to be processed and describes a certain feature, such as an image feature, a face feature, a motion feature, a voiceprint feature, a semantic feature and the like. For example, for video frame data, a human face feature model, an image feature model, an inter-frame motion feature extraction model, and the like may be deployed in the feature layer to extract one or more feature data from the video frame data; aiming at the audio data, a voiceprint feature model, a voice recognition model, a semantic recognition model and the like can be deployed in a feature layer so as to extract one or more feature data from the audio data; for text data, a word vector model, a semantic recognition model, and the like may also be deployed in the feature layer to extract one or more feature data from the text data. Wherein one type of feature data is available for multiple video processing tasks.
The embodiments of the application do not limit the types or number of feature extraction models deployed in the feature layer. Corresponding feature extraction models can be deployed according to the feature data required by the video processing tasks, new models can be added to the feature layer as the tasks change, and when a better-performing or more general feature extraction model appears, the deployed models can be updated, replaced or deleted at any time, improving the accuracy and reusability of the feature data extracted through the feature layer.
In order to improve the universality and reusability of the feature data extracted through the feature layer, the feature extraction models deployed in the feature layer can be general-purpose models. Such models are trained on large-scale data and tasks, so the resulting feature data generalize better and can serve as feature data for many different video processing tasks. For example, the feature layer may use an Inception V3 model pre-trained on the ImageNet dataset to extract image features from each video frame, a VGGish model to extract audio features from audio data, the MTCNN (Multi-task Cascaded Convolutional Network) face detection algorithm to detect faces in video frames, and an ArcFace model trained on the MS1M face recognition dataset to compute face features. The deployed feature extraction models can be extended according to actual requirements, so that more general features are obtained through the feature layer.
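As an illustration of one such general-purpose extractor, the following is a minimal sketch of frame-level image feature extraction with an ImageNet-pretrained Inception V3. PyTorch/torchvision is an assumed framework choice (the embodiment does not name a library), and the preprocessing sizes follow the standard torchvision recipe.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Inception V3 pre-trained on ImageNet, with the classifier head replaced by an
# identity so the network emits a 2048-d image feature instead of class scores.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(342),
    transforms.CenterCrop(299),          # Inception V3 expects 299x299 input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def frame_feature(frame_path: str) -> torch.Tensor:
    """Return a 2048-d image feature for one decoded video frame."""
    img = Image.open(frame_path).convert("RGB")
    return model(preprocess(img).unsqueeze(0)).squeeze(0)
```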
During specific implementation, the feature layer can pack each extracted feature data, mark the corresponding feature type of the feature data, and store the packed feature data in a local disk of the feature layer, so that the application layer can conveniently acquire the required feature data according to the video processing task. Specifically, the type corresponding to the feature data, that is, one type of feature data corresponding to one feature extraction model, may be labeled according to the feature extraction model used.
The feature layer is mainly used for extracting various general feature data from the various kinds of data to be processed for use by the application layer. The application layer therefore no longer needs to perform feature extraction itself: it only needs to fetch the required type of feature data directly from the feature layer. Any video processing task in the application layer can directly use any feature data produced by the feature layer, and the same feature data can be supplied to multiple video processing tasks, so sharing the feature data of the feature layer saves the application layer a great deal of computation and storage cost.
The application layer is used for acquiring corresponding feature data according to the type of feature data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired feature data.
In specific implementation, the application layer acquires feature data of the corresponding types from the feature layer according to the types of feature data required by the different video processing tasks, and performs the corresponding processing on the acquired feature data to obtain a video processing result. A video processing task deployed in the application layer can be a processing task executed with a fixed model or algorithm, or a model training task performed on the feature data acquired from the feature layer. For example, for video classification or video labeling, the audio features, image features and title word vectors corresponding to each piece of video data can be obtained from the feature layer, and a NeXtVLAD model can be used to classify or label each video. For a video recall task, the image features and title word vectors corresponding to all video data can be obtained from the feature layer to train an MoEE model; by learning the correlation between video content and titles, the MoEE model obtains video features with better semantic expression, which improves its recall rate in video processing. For face recognition, the face features extracted from the various video data can be obtained from the feature layer to recognize the people appearing in a video, and unknown faces that cannot be recognized can be cluster-analyzed with a KNN (K-nearest neighbor) algorithm or another clustering algorithm, as in the sketch below.
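For the unknown-face case, the following is a minimal sketch of such cluster analysis. scikit-learn and DBSCAN are assumed concrete choices (the embodiment only names KNN or "a clustering algorithm"), and the eps and min_samples values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_unknown_faces(face_features: np.ndarray) -> np.ndarray:
    """Group unrecognized faces; returns one cluster label per face (-1 = noise).

    face_features: an (n_faces, d) array, e.g. face embeddings produced by the
    feature layer's face feature model.
    """
    # Cosine distance suits unit-normalized face embeddings; eps is an assumed value.
    return DBSCAN(eps=0.4, min_samples=3, metric="cosine").fit_predict(face_features)

# Usage: labels = cluster_unknown_faces(embeddings); faces that share a label are
# likely the same (not yet identified) person.
```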
The embodiments of the application do not limit the type or number of video processing tasks deployed in the application layer; tasks can be added, deleted or modified according to actual application requirements. All a user needs to do is configure, in the application layer, the types of feature data a task requires and the algorithm or model used to process them, without caring how the feature data are extracted from the video data. Sharing the feature data of the feature layer saves the application layer a great deal of computation and storage cost, and with a large pool of shared feature data, the various video processing tasks can be implemented more efficiently and conveniently, reducing the development difficulty and shortening the development cycle of upper-layer services.
In addition, the feature data output by the feature layer can be used as the input features of each video processing task in the application layer, namely, the task processing result is obtained based on the input features. The application layer can also directly use the feature data output by the feature layer as a final task processing result, for example, directly use the feature data in a recall and ranking model in a recommendation system.
The video processing system provided by the embodiments of the application abstracts the various video stream analysis services and divides the system into a data layer, a feature layer and an application layer. The data layer preprocesses the video data, the feature layer extracts various general feature data from different video data, and the application layer only needs to acquire the corresponding feature data from the feature layer according to the feature data required by the video processing task being executed. Different video processing tasks can therefore reuse the data processed by the data layer and the feature layer, realizing resource reuse across levels, reducing repeated computation and improving video processing efficiency. Only one set of the video processing system needs to be deployed to implement different video processing tasks, and performance optimization of the underlying services brings overall optimization and improvement to the upper-layer services, so that each video processing task need not be optimized and evaluated separately, reducing the development cost of video processing.
In addition, when the volume of data the system needs to process decreases, the computing resources of some servers can be released, realizing dynamic configuration of computing resources. Each layer of the video processing system in the embodiments of the application can therefore be conveniently scaled out or in, so that traffic changes can be handled more flexibly and system resources can be allocated reasonably.
In order to facilitate data transmission among the data layer, the feature layer and the application layer, data transmission among the layers can be realized through the message queue.
Taking the data layer as an example, the data layer is further configured to: storing the extracted data to be processed into a data storage unit; generating corresponding first metadata aiming at each to-be-processed data stored in a data storage unit, wherein the first metadata comprises a storage position of the to-be-processed data; adding the first metadata of each data to be processed into a message queue. In addition, other data related to the data to be processed, such as a data type of the data to be processed, a video identifier of video data to which the data to be processed belongs, a generation time of the data to be processed, and the like, may also be stored in the data storage unit, wherein the data type includes, but is not limited to, audio data, video frame data, text data, and the like.
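The following is a minimal sketch of the data layer publishing first metadata. The kafka-python client and JSON-encoded values are assumed; the topic name and field names are illustrative, not fixed by the embodiment.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_first_metadata(video_id: str, data_type: str, url: str) -> None:
    """Announce one piece of to-be-processed data to the feature layer."""
    first_metadata = {
        "video_id": video_id,    # which video the data was extracted from
        "data_type": data_type,  # e.g. "video_frames", "audio" or "text"
        "location": url,         # storage location, e.g. an http:// URL
    }
    producer.send("to-be-processed-data", first_metadata)

# e.g. publish_first_metadata("v123", "audio", "http://data-layer:8000/v123/audio.wav")
```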
Based on this, the feature layer is specifically used for: acquiring first metadata from a message queue; acquiring corresponding data to be processed from a data storage unit based on the storage position in the acquired first metadata; and extracting at least one type of feature data from the acquired data to be processed based on at least one feature extraction model corresponding to the data type of the acquired data to be processed.
In specific implementation, after the video data is processed by the data layer, the data to be processed corresponding to the video data is immediately stored in the data storage unit, first metadata corresponding to the data to be processed is generated, and the first metadata is added to the message queue to inform the feature layer of processing the data to be processed. The feature layer continuously acquires first metadata sent by the data layer from the message queue, acquires corresponding data to be processed from the data storage unit based on a storage position in the first metadata, determines a feature extraction model required for processing the data type based on the data type of the data to be processed, and extracts feature data from the data to be processed based on the determined feature extraction model. It should be noted that each data type may correspond to multiple feature extraction models, so as to obtain multiple feature data.
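Correspondingly, the following is a minimal sketch of the feature-layer side of this flow, again assuming kafka-python plus requests; the extractor registry, the stub extractors and the topic name are illustrative.

```python
import json
import requests
from kafka import KafkaConsumer

def extract_image_features(raw: bytes):  # placeholder for e.g. Inception V3
    ...

def extract_face_features(raw: bytes):   # placeholder for e.g. MTCNN + ArcFace
    ...

def extract_audio_features(raw: bytes):  # placeholder for e.g. VGGish
    ...

# One data type may correspond to several feature extraction models.
MODELS = {
    "video_frames": [extract_image_features, extract_face_features],
    "audio": [extract_audio_features],
}

consumer = KafkaConsumer(
    "to-be-processed-data",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    meta = message.value                          # first metadata from the data layer
    raw = requests.get(meta["location"]).content  # fetch the data via its storage URL
    for extract in MODELS.get(meta["data_type"], []):
        features = extract(raw)                   # one type of feature data per model
        # ...store the feature data and publish second metadata (see below)...
```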
In specific implementation, the first metadata may further store a data type of the data to be processed, so as to facilitate the feature layer to quickly determine a feature extraction model used when the data to be processed is processed.
In specific implementation, the data storage unit corresponding to the data layer may be a local disk of the data layer, or may be a storage system shared by each layer in the video processing system. For the purpose of fast querying data, the data storage unit may employ a key-value storage system, such as Redis.
In specific implementation, data transmission between the data layer and the feature layer can be realized with a Kafka message queue: the data layer sends the first metadata to the feature layer through the Kafka message queue, and at the same time the data layer can run a Python HTTP server that provides a download service for the data to be processed. The feature layer and the application layer can then query and download the required data to be processed from the data storage unit through the URL (Uniform Resource Locator) of its storage location, which may take the form "http://<hostname>:<port>/<data file>".
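A minimal sketch of such a download service, assuming Python's built-in http.server module; the port and the served directory are illustrative.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Serve the data layer's local disk so the other layers can fetch files by URL.
handler = partial(SimpleHTTPRequestHandler, directory="/data")
HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```

A file stored as /data/v123/audio.wav is then reachable as http://<hostname>:8000/v123/audio.wav, matching the URL form described above.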
Taking the feature layer as an example, the feature layer may also be used to: storing each extracted feature data into a feature storage unit; generating corresponding second metadata aiming at each feature data stored in the feature storage unit, wherein the second metadata comprises the storage position and the type of the feature data; second metadata of the respective characteristic data is added to the message queue.
Based on this, the application layer can be used to: acquiring corresponding second metadata from the message queue according to the type of the feature data required by each video processing task; and acquiring corresponding feature data from the feature storage unit based on the storage position in the acquired second metadata. The type of feature data required for each video processing task may be one type or multiple types.
In addition, other data related to the feature data, such as the type of the feature data, the video identifier of the video data to which the feature data belongs, the generation time of the feature data, the text vector corresponding to the video data to which the feature data belongs, and the like, may also be stored in the feature storage unit. Wherein the text vector is obtained based on the text data corresponding to the video data.
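The following is a minimal sketch of the feature layer publishing second metadata, in the same style as the first-metadata sketch above; the topic name and field names, including the optional text vector discussed below, are illustrative.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_second_metadata(video_id, feature_type, url, text_vector=None):
    """Announce one piece of stored feature data to the application layer."""
    second_metadata = {
        "video_id": video_id,          # which video the feature data belongs to
        "feature_type": feature_type,  # lets application tasks filter by type
        "location": url,               # storage location in the feature storage unit
        "text_vector": text_vector,    # optional, derived from the video's text data
    }
    producer.send("feature-data", second_metadata)
```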
In order to facilitate the application layer to acquire the required feature data, the feature storage unit may store the feature data in a classified manner according to the type of the feature data. To this end, the application layer may also be used to: and acquiring corresponding feature data from the feature storage unit according to the type of the feature data required by each video processing task. For example, if the video processing task a needs face feature data, the face feature data is acquired from the feature storage unit, and the video processing task a is performed based on the face feature data.
In specific implementation, the feature storage unit may be a local disk of a feature layer, or may be a storage system shared by layers in the video processing system. The feature storage unit may be a key-value storage system for the purpose of fast querying of data.
In specific implementation, data transmission between the feature layer and the application layer can likewise be realized with the Kafka message queue: the feature layer sends the second metadata to the application layer through the Kafka message queue, and at the same time the feature layer can run a Python HTTP server that provides a download service for the feature data. The application layer can then query and download the required feature data from the feature storage unit through the URL (Uniform Resource Locator) of its storage location, which may take the form "http://<hostname>:<port>/<data file>".
The application layer acquires the required metadata from the message queue or the key-value storage system according to the video processing task, and then acquires the corresponding feature data to complete the video processing task.
Further, the second metadata may also include a text vector corresponding to the video data to which the feature data belongs, the text vector being obtained based on the text data corresponding to the video data. To this end, the application layer is specifically used for executing the video processing operation corresponding to each video processing task based on the acquired feature data and the text vector in the second metadata. For example, when the text vector of the video data is determined based on the video title or video type, the text vector can be used as the label of the image feature data corresponding to that video data, and the labeled image feature data can serve as training data for a classification model. Alternatively, the text vector corresponding to the video data can be combined with the feature data to improve the accuracy of the processing result. Because the corresponding text vector is carried directly in the second metadata of the feature data, a separate lookup of the text vector can be omitted, improving processing efficiency.
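Putting the application-layer side together, the following is a minimal sketch that filters second metadata by feature type and combines the fetched feature data with the carried text vector. kafka-python, requests and numpy are assumed, as is storing feature data as .npy files; all names follow the publisher sketch above and are illustrative.

```python
import io
import json
import numpy as np
import requests
from kafka import KafkaConsumer

NEEDED_TYPES = {"image", "audio"}   # feature types this task requires (assumed)

consumer = KafkaConsumer(
    "feature-data",
    bootstrap_servers="kafka:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    meta = message.value            # second metadata from the feature layer
    if meta["feature_type"] not in NEEDED_TYPES:
        continue                    # skip feature types this task does not use
    feats = np.load(io.BytesIO(requests.get(meta["location"]).content))
    text_vec = np.asarray(meta["text_vector"])
    # ...feed (feats, text_vec) to the task's model, e.g. a classifier, with no
    # separate lookup of the text vector needed...
```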
Based on the above video processing system, an embodiment of the present application provides a video processing method, and referring to fig. 3, the video processing method is applicable to the video processing system shown in fig. 2, and specifically includes the following steps:
s301, extracting data to be processed from the acquired video data respectively.
Step S301 specifically includes: decoding each acquired video data to obtain data to be processed of multiple data types corresponding to each video data, wherein the data types comprise: video frame data and audio data.
Step S301 may be performed by a data layer, and the specific embodiment may refer to an embodiment of the data layer.
S302, at least one type of feature data is extracted from each data to be processed.
Step S302 may be performed by a feature layer, and the specific embodiment may refer to an embodiment of the feature layer.
And S303, acquiring corresponding characteristic data according to the type of the characteristic data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired characteristic data.
Step S303 may be performed by the application layer, and the specific embodiment may refer to an embodiment of the application layer.
On the basis of any one of the above embodiments, the video processing method provided in the embodiment of the present application further includes the following steps: storing the extracted data to be processed into a data storage unit; generating corresponding first metadata aiming at each to-be-processed data stored in a data storage unit, wherein the first metadata comprises a storage position of the to-be-processed data; adding the first metadata of each data to be processed into a message queue. The above steps may be performed by the data layer.
Correspondingly, step S302 specifically includes: acquiring first metadata from a message queue; acquiring corresponding data to be processed from a data storage unit based on the storage position in the acquired first metadata; and extracting at least one type of feature data from the acquired data to be processed based on at least one feature extraction model corresponding to the data type of the acquired data to be processed.
On the basis of any one of the above embodiments, the video processing method provided in the embodiment of the present application further includes the following steps: storing each extracted feature data into a feature storage unit; generating corresponding second metadata aiming at each feature data stored in the feature storage unit, wherein the second metadata comprises the storage position and the type of the feature data; second metadata of the respective characteristic data is added to the message queue.
Correspondingly, step S303 specifically includes: acquiring corresponding second metadata from a message queue or a key value storage system according to the type of the characteristic data required by each video processing task; and acquiring corresponding feature data from the feature storage unit based on the storage position in the acquired second metadata.
Further, the second metadata further includes a text vector corresponding to the video data to which the feature data belongs, and the text vector is obtained based on the text data corresponding to the video data. Based on this, the executing of the video processing operation corresponding to each video processing task based on the acquired feature data in step S303 specifically includes: and executing the video processing operation corresponding to each video processing task based on the acquired feature data and the text vector in the second metadata.
The video processing method can be implemented on the basis of a video processing system, and further implementation manners thereof can refer to implementation manners of the video processing system, and are not described again.
According to the video processing method, the processing steps are separated into multiple levels by abstracting the various video stream analysis services. The first level performs unified preprocessing on the video data to obtain data to be processed; the second level extracts various general feature data from the different data to be processed; and the third level acquires and processes the corresponding feature data according to the feature data required by the video processing task being executed, obtaining a video processing result. This realizes resource multiplexing across levels, reduces repeated computation and improves video processing efficiency, and performance optimization of the underlying services brings overall optimization and improvement to the upper-layer services, so that each video processing task need not be optimized and evaluated separately, reducing the development cost of video processing.
Exemplary device
Based on the same inventive concept as the video processing method, an embodiment of the present application further provides an electronic device, where the electronic device may specifically be a single physical server, or a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. As shown in fig. 4, the electronic device 40 may include at least one processor 401 and at least one memory 402. Wherein the memory 402 stores program code which, when executed by the processor 401, causes the processor 401 to perform the various steps in the video processing method according to the various exemplary embodiments of the present application described in the "exemplary methods" section above in this description. For example, the processor 401 may perform step S301 shown in fig. 3, respectively extracting data to be processed from the acquired video data; step S302, at least one characteristic data is respectively extracted from each data to be processed; step S303, acquiring corresponding feature data according to the type of the feature data required by each video processing task, and executing a video processing operation corresponding to each video processing task based on the acquired feature data.
The Processor 401 may be a general-purpose Processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Memory 402, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, Random Access Memory (RAM), Static Random Access Memory (SRAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and so on. The memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 402 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
Exemplary program product
An embodiment of the present application provides a computer-readable storage medium for storing computer program instructions for the electronic device, which includes a program for executing the video processing method in any exemplary embodiment of the present application.
The computer storage media may be any available media or data storage device that can be accessed by a computer, including but not limited to magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs)), etc.
In some possible embodiments, various aspects of the present application may also be implemented as a computer program product including program code for causing a server device to execute the steps in the video processing method according to various exemplary embodiments of the present application described in the above-mentioned "exemplary method" section of this specification when the computer program product runs on the server device, for example, the server device may execute the step S301 shown in fig. 3 to extract the data to be processed from the acquired video data respectively; step S302, at least one characteristic data is respectively extracted from each data to be processed; and step S303, acquiring corresponding characteristic data according to the type of the characteristic data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired characteristic data.
The computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer program product for video processing according to an embodiment of the present application may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a server device. However, the program product of the present application is not limited thereto, and in this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device over any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, the features and functions of two or more units described above may be embodied in one unit, according to embodiments of the application. Conversely, the features and functions of one unit described above may be further divided into embodiments by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the spirit and principles of the application have been described with reference to several particular embodiments, it is to be understood that the application is not limited to the disclosed embodiments, nor is the division of aspects, which is for convenience only as the features in such aspects may not be combined to benefit from the description. The application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. A video processing system, comprising: a data layer, a feature layer and an application layer;
the data layer is configured to extract to-be-processed data from each piece of acquired video data;
the feature layer is configured to extract at least one type of feature data from each piece of to-be-processed data;
the application layer is configured to acquire corresponding feature data according to the type of feature data required by each video processing task, and to execute the video processing operation corresponding to each video processing task based on the acquired feature data.
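As a concrete illustration of claim 1, the following is a minimal Python sketch of the three-layer split, with the data layer, feature layer and application layer as plain functions over in-memory data. All names, and the toy "length" feature, are illustrative assumptions rather than part of the claimed system.

```python
from typing import Dict, List


def data_layer(videos: Dict[str, bytes]) -> Dict[str, bytes]:
    """Extract to-be-processed data from each piece of acquired video data."""
    # Pass-through here; a real data layer would decode (see claim 4).
    return dict(videos)


def feature_layer(to_process: Dict[str, bytes]) -> Dict[str, Dict[str, List[float]]]:
    """Extract at least one type of feature data from each data item."""
    features: Dict[str, Dict[str, List[float]]] = {}
    for video_id, data in to_process.items():
        # A toy "length" feature stands in for real visual/audio embeddings.
        features[video_id] = {"length": [float(len(data))]}
    return features


def application_layer(features, required_type: str):
    """Acquire the feature type each task requires and operate on it."""
    for video_id, feats in features.items():
        if required_type in feats:
            yield video_id, feats[required_type]


if __name__ == "__main__":
    feats = feature_layer(data_layer({"v1": b"\x00\x01", "v2": b"\x02"}))
    for vid, vec in application_layer(feats, "length"):
        print(vid, vec)  # each task sees only the feature type it asked for
```

The point of the split is that features are extracted once and shared, so adding a new video processing task touches only the application layer.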
2. The system of claim 1, wherein the data layer is further configured to:
store the extracted to-be-processed data in a data storage unit;
generate corresponding first metadata for each piece of to-be-processed data stored in the data storage unit, wherein the first metadata comprises the storage location of the to-be-processed data; and
add the first metadata of each piece of to-be-processed data to a message queue.
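A minimal sketch of the flow in claim 2, assuming a local file system as the data storage unit and Python's queue.Queue as the message queue; a production system would presumably use an object store and a message broker, neither of which the claim names.

```python
import json
import queue
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class FirstMetadata:
    video_id: str
    data_type: str     # e.g. "video_frames" or "audio"
    storage_path: str  # storage location of the to-be-processed data


message_queue: "queue.Queue[str]" = queue.Queue()
storage_dir = Path(tempfile.mkdtemp())  # stands in for the data storage unit


def store_and_publish(video_id: str, data_type: str, payload: bytes) -> None:
    path = storage_dir / f"{video_id}.{data_type}.bin"
    path.write_bytes(payload)                    # persist to the storage unit
    meta = FirstMetadata(video_id, data_type, str(path))
    message_queue.put(json.dumps(asdict(meta)))  # enqueue the first metadata


if __name__ == "__main__":
    store_and_publish("v1", "audio", b"\x00\x01\x02")
    print(message_queue.get())
```

Passing only metadata through the queue keeps the messages small; the bulky decoded data stays in the storage unit and is fetched on demand.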
3. The system of claim 2, wherein the feature layer is specifically configured to:
acquire first metadata from the message queue;
acquire the corresponding to-be-processed data from the data storage unit based on the storage location in the acquired first metadata; and
extract at least one type of feature data from the acquired to-be-processed data based on at least one feature extraction model corresponding to the data type of that data.
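Continuing the same file-plus-queue assumptions, a sketch of the feature-layer consumer in claim 3: it reads first metadata from the queue, loads the to-be-processed data from the recorded storage location, and runs every extraction model registered for that data type. The stub models are placeholders for whatever extraction networks an implementation would actually use.

```python
import json
import queue
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# At least one feature extraction model per data type (stubs here).
MODELS: Dict[str, List[Callable[[bytes], list]]] = {
    "video_frames": [lambda data: [float(len(data))]],
    "audio":        [lambda data: [sum(data) / max(len(data), 1)]],
}


def feature_worker(message_queue: "queue.Queue[str]") -> None:
    while not message_queue.empty():
        meta = json.loads(message_queue.get())             # first metadata
        payload = Path(meta["storage_path"]).read_bytes()  # fetch by location
        for model in MODELS.get(meta["data_type"], []):    # per data type
            print(meta["video_id"], meta["data_type"], model(payload))


if __name__ == "__main__":
    q: "queue.Queue[str]" = queue.Queue()
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x01\x02\x03")
    q.put(json.dumps({"video_id": "v1", "data_type": "audio",
                      "storage_path": f.name}))
    feature_worker(q)
```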
4. The system according to any one of claims 1 to 3, wherein the data layer is specifically configured to:
decode each piece of acquired video data to obtain to-be-processed data of multiple data types corresponding to that video data, wherein the data types include video frame data and audio data.
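A sketch of the decoding step in claim 4. It assumes OpenCV (cv2) for video-frame decoding and the ffmpeg command-line tool for pulling out the audio track; the claim itself does not prescribe any particular decoder.

```python
import subprocess

import cv2  # assumes opencv-python is installed


def decode(video_path: str, wav_path: str, frame_stride: int = 30):
    """Decode one video into two data types: sampled frames and an audio file."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:  # keep every Nth frame as video frame data
            frames.append(frame)
        idx += 1
    cap.release()
    # -vn drops the video stream; ffmpeg decodes the audio track to WAV.
    subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn", wav_path],
                   check=True)
    return frames, wav_path
```

Frame sampling (the stride) is an assumption on our part; decoding every frame is rarely needed for classification-style tasks.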
5. The system of any of claims 1 to 3, wherein the feature layer is further configured to:
store each piece of extracted feature data in a feature storage unit;
generate corresponding second metadata for each piece of feature data stored in the feature storage unit, wherein the second metadata comprises the storage location and the type of the feature data; and
add the second metadata of each piece of feature data to the message queue.
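The feature-side mirror of claim 2: a sketch of claim 5 under the same file-plus-queue assumptions, where second metadata records both where a feature lives and what type it is.

```python
import json
import queue
import tempfile
from pathlib import Path

message_queue: "queue.Queue[str]" = queue.Queue()
feature_store = Path(tempfile.mkdtemp())  # stands in for the feature storage unit


def store_feature(video_id: str, feature_type: str, vector: list) -> None:
    path = feature_store / f"{video_id}.{feature_type}.json"
    path.write_text(json.dumps(vector))               # persist the feature data
    second_metadata = {"video_id": video_id,
                       "feature_type": feature_type,  # the type, per claim 5
                       "storage_path": str(path)}     # the storage location
    message_queue.put(json.dumps(second_metadata))


if __name__ == "__main__":
    store_feature("v1", "visual_embedding", [0.12, 0.91, 0.33])
    print(message_queue.get())
```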
6. The system of claim 5, wherein the application layer is specifically configured to:
acquire corresponding second metadata from the message queue according to the type of feature data required by each video processing task; and
acquire the corresponding feature data from the feature storage unit based on the storage location in the acquired second metadata.
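A sketch of claim 6: each application-layer task filters the queue for second metadata matching the feature type it needs, then loads the features from the recorded locations. In-process filtering stands in for the topic- or tag-based routing a real broker would provide (an assumption).

```python
import json
import queue
from pathlib import Path


def features_for_task(message_queue: "queue.Queue[str]", required_type: str):
    """Return (video_id, feature_vector) pairs of the feature type a task needs."""
    matched = []
    while not message_queue.empty():
        meta = json.loads(message_queue.get())       # second metadata
        if meta["feature_type"] == required_type:    # select by required type
            vector = json.loads(Path(meta["storage_path"]).read_text())
            matched.append((meta["video_id"], vector))
    return matched
```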
7. The system according to claim 6, wherein the second metadata further includes a text vector corresponding to the video data to which the feature data belongs, the text vector being obtained from the text data corresponding to that video data;
and the application layer is specifically configured to execute the video processing operation corresponding to each video processing task based on the acquired feature data and the text vector in the second metadata.
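For claim 7, one plausible reading is that a task fuses the stored feature data with the text vector carried in the second metadata, for instance by simple concatenation before a downstream classifier; the claim leaves the fusion method open, so this is only an illustration.

```python
def run_task(feature_vector: list, text_vector: list) -> list:
    """Fuse visual/audio features with the text vector from second metadata."""
    fused = feature_vector + text_vector  # early fusion by concatenation
    # A downstream classifier or recommender would consume `fused` here.
    return fused


print(run_task([0.12, 0.91], [0.05, 0.44, 0.68]))
```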
8. A video processing method, comprising:
extracting to-be-processed data from each piece of acquired video data;
extracting at least one type of feature data from each piece of to-be-processed data; and
acquiring corresponding feature data according to the type of feature data required by each video processing task, and executing the video processing operation corresponding to each video processing task based on the acquired feature data.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the method of claim 8.
10. A computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method of claim 8.
CN202010599677.7A 2020-06-28 2020-06-28 Video processing method, system, electronic device and storage medium Pending CN111726475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010599677.7A CN111726475A (en) 2020-06-28 2020-06-28 Video processing method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN111726475A (en) 2020-09-29

Family

ID=72569141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010599677.7A Pending CN111726475A (en) 2020-06-28 2020-06-28 Video processing method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN111726475A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006105470A1 (en) * 2005-03-31 2006-10-05 Euclid Discoveries, Llc Apparatus and method for processing video data
EP2742446B1 (en) * 2012-06-15 2019-01-16 Qatar Foundation A system and method to store video fingerprints on distributed nodes in cloud systems
CN105654047A (en) * 2015-12-21 2016-06-08 中国石油大学(华东) Online video intelligent processing system based on deep learning in cloud environment
CN106354883A (en) * 2016-09-30 2017-01-25 北京中星微电子有限公司 Method and system for video information structure organization
CN110288001A (en) * 2019-05-28 2019-09-27 西南电子技术研究所(中国电子科技集团公司第十研究所) Target identification method based on the training study of target data feature

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408208A (en) * 2021-06-25 2021-09-17 成都欧珀通信科技有限公司 Model training method, information extraction method, related device and storage medium
CN113473179A (en) * 2021-06-30 2021-10-01 北京百度网讯科技有限公司 Video processing method, video processing device, electronic equipment and medium
CN113473179B (en) * 2021-06-30 2022-12-02 北京百度网讯科技有限公司 Video processing method, device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN107844586B (en) News recommendation method and device
US10643610B2 (en) Voice interaction based method and apparatus for generating multimedia playlist
US9270964B1 (en) Extracting audio components of a portion of video to facilitate editing audio of the video
WO2017096877A1 (en) Recommendation method and device
US8649613B1 (en) Multiple-instance-learning-based video classification
CN113486833B (en) Multi-modal feature extraction model training method and device and electronic equipment
CN113010703B (en) Information recommendation method and device, electronic equipment and storage medium
CN102763105A (en) Method and apparatus for segmenting and summarizing media content
US20200012675A1 (en) Method and apparatus for processing voice request
CN113590850A (en) Multimedia data searching method, device, equipment and storage medium
US20150050010A1 (en) Video to data
CN108334895B (en) Target data classification method and device, storage medium and electronic device
CN111726475A (en) Video processing method, system, electronic device and storage medium
CN103440243A (en) Teaching resource recommendation method and device thereof
CN114328996A (en) Method and device for publishing information
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
KR20130137332A (en) Speech recognition server for determining service type based on speech informaion of device, content server for providing content to the device based on the service type, the device, and methods thereof
CN110895503B (en) Application performance monitoring method and client
CN115297183A (en) Data processing method and device, electronic equipment and storage medium
US20230315990A1 (en) Text detection method and apparatus, electronic device, and storage medium
CN104572964A (en) Zip file unzipping method and device
CN110019874B (en) Method, device and system for generating index file
US11841885B2 (en) Multi-format content repository search
Körner et al. Mastering Azure Machine Learning: Perform large-scale end-to-end advanced machine learning in the cloud with Microsoft Azure Machine Learning
WO2021258972A1 (en) Video retrieval method and apparatus, and electronic device and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200929