CN114079804B

CN114079804B - Method, device, terminal and storage medium for detecting multimedia resources

Info

Publication number: CN114079804B
Application number: CN202010814517.XA
Authority: CN
Inventors: 宋怡松; 蔡佳何
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-08-13
Filing date: 2020-08-13
Publication date: 2024-03-26
Anticipated expiration: 2040-08-13
Also published as: CN114079804A

Abstract

The application provides a method, a device, a terminal and a storage medium for detecting multimedia resources. The method comprises the following steps: acquiring a target multimedia resource; determining a target resource type of the target multimedia resource, and determining a target frame extraction strategy corresponding to the target resource type; and extracting frames from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame. Because the target frame extraction strategy is determined according to the target resource type of the target multimedia resource, frames can be extracted from a target multimedia resource with complex content using a strategy suited to it. After the video frame set is sent to a terminal and displayed, an inspector can review the video frame set on the terminal and detect and monitor the target multimedia resource according to the video frame set, so that detection time is shortened and detection efficiency is improved.

Description

Method, device, terminal and storage medium for detecting multimedia resources

Technical Field

The present invention relates to the field of multimedia technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for detecting a multimedia resource.

Background

With the development of multimedia technology, multimedia resources are widely used in various fields such as image, voice, education, communication, entertainment, etc.

Because a large number of multimedia resources are distributed through multimedia platforms and the quality of their content is uneven, the inspectors of a multimedia platform need to detect and monitor the content quality of the multimedia resources in order to ensure that quality. At present, the monitoring process is as follows: an inspector acquires and plays the multimedia resource to be inspected, watches it in full, and determines whether the quality of its content meets the requirements according to the relevant evaluation criteria.

However, in the current scheme, an inspector can learn the content of a multimedia resource, and thereby judge whether the quality of that content meets the requirements, only after watching the resource from beginning to end, so the detection process is time-consuming and inefficient.

Disclosure of Invention

In order to overcome the problems in the related art, the application provides a method, a device, a terminal and a storage medium for detecting multimedia resources.

According to a first aspect of embodiments of the present application, there is provided a method for detecting a multimedia resource, including:

acquiring a target multimedia resource;

in response to receiving a frame extraction request for the target multimedia resource, determining a target resource type of the target multimedia resource, and determining a target frame extraction strategy corresponding to the target resource type;

and extracting frames from the target multimedia resources according to the target frame extraction strategy to obtain a video frame set containing at least one video frame, wherein the video frame set is applied to a scene for detecting the target multimedia resources and comprises at least one video frame extracted from the target multimedia resources.

In one possible implementation, the target resource type includes a picture video type, a short video type and a long video type, wherein the display content of a target multimedia resource of the picture video type comprises at least one first picture, the duration of a target multimedia resource of the short video type is less than or equal to a first preset time, and the duration of a target multimedia resource of the long video type is greater than or equal to a second preset time;

The step of determining the target frame extraction strategy corresponding to the target resource type specifically comprises the following steps:

when the target resource type is the picture video type, determining that the target frame extraction strategy is a first target frame extraction strategy;

when the target resource type is the short video type, determining that the target frame extraction strategy is a second target frame extraction strategy;

and when the target resource type is the long video type, determining that the target frame extraction strategy is a third target frame extraction strategy.

In one possible implementation, in the case where the target frame extraction strategy is determined to be the first target frame extraction strategy,

the step of extracting frames from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame comprises the following steps:

and according to the first target frame extraction strategy, at least one first picture contained in the display content of the target multimedia resource of the picture video type is obtained, and a video frame set containing the at least one first picture is generated.

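In one possible implementation, in the case where the target frame extraction strategy is determined to be the second target frame extraction strategy,

the step of extracting frames from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame comprises the following steps:

determining the target frame extraction number of the target multimedia resource, extracting a number of second pictures equal to the target frame extraction number from the target multimedia resource, and generating a video frame set containing the second pictures.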

In one possible implementation, the step of extracting a number of second pictures equal to the target frame extraction number from the target multimedia resource includes:

determining a first target frame extraction time interval according to the duration of the target multimedia resource and the target frame extraction number;

and extracting a frame from the target multimedia resource once every first target frame extraction time interval to obtain a number of second pictures equal to the target frame extraction number.

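In one possible implementation, in the case where the target frame extraction strategy is determined to be the third target frame extraction strategy,

the step of extracting frames from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame comprises the following steps:

determining a second target frame extraction time interval of the target multimedia resource, extracting a third picture from the target multimedia resource once every second target frame extraction time interval, and generating a video frame set containing the third pictures.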

In one possible implementation manner, after the step of extracting frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set including at least one video frame, the method further includes:

and sending the video frame set to the terminal for display.

In one possible implementation, the video frame set further includes relative time information indicating where each video frame is located in the target multimedia resource;

the step of sending the video frame set to the terminal for display specifically includes:

and sending the video frame set to the terminal so that the terminal can display each video frame contained in the video frame set and the relative time information corresponding to each video frame.
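As an illustration of a video frame set that carries relative time information, the following minimal Python sketch pairs each extracted frame with its offset in the target multimedia resource and formats a label a terminal could display. The names `ExtractedFrame`, `format_relative_time` and `display_labels`, the use of file paths to represent frames, and the `MM:SS` label format are all assumptions made for illustration; the application does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedFrame:
    """One video frame plus its position in the source resource (hypothetical layout)."""
    image_path: str          # where the extracted picture is stored (assumed representation)
    relative_time_s: float   # offset of the frame from the start of the resource, in seconds


def format_relative_time(seconds: float) -> str:
    """Render an offset in seconds as MM:SS (assumed display format)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def display_labels(frames: List[ExtractedFrame]) -> List[str]:
    """Build one label per frame, combining the frame and its relative time."""
    return [f"{f.image_path} @ {format_relative_time(f.relative_time_s)}" for f in frames]
```

Showing the offset next to each frame would let an inspector jump to the corresponding position in the original resource if a frame looks suspicious.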

In one possible implementation, before the step of sending the set of video frames to the terminal for presentation, the method further includes:

Performing dimension reduction processing on each video frame in the video frame set to generate low-rank matrix data;

and generating and storing the compressed data packet of the video frame set according to the low-rank matrix data.

According to a second aspect of embodiments of the present application, there is provided a device for detecting a multimedia resource, including:

the acquisition module is used for acquiring the target multimedia resource;

the determining module is used for determining a target resource type of the target multimedia resource in response to receiving a frame extraction request aiming at the target multimedia resource, and determining a target frame extraction strategy corresponding to the target resource type;

and the frame extraction module is used for extracting frames from the target multimedia resources according to the target frame extraction strategy to obtain a video frame set containing at least one video frame, wherein the video frame set is applied to a scene for detecting the target multimedia resources and comprises at least one video frame extracted from the target multimedia resources.

The determining module specifically comprises:

the first determining submodule is used for determining that the target frame extraction strategy is a first target frame extraction strategy when the target resource type is the picture video type;

the second determining submodule is used for determining that the target frame extraction strategy is a second target frame extraction strategy when the target resource type is the short video type;

and the third determining submodule is used for determining that the target frame extraction strategy is a third target frame extraction strategy when the target resource type is the long video type.

the frame extraction module comprises:

the first frame extraction submodule is used for acquiring at least one first picture contained in the display content of the target multimedia resource of the picture video type according to the first target frame extraction strategy, and generating a video frame set containing the at least one first picture.

the frame extraction module comprises:

and the second frame extraction sub-module is used for determining the target frame extraction number of the target multimedia resource, extracting second pictures with the number equal to the target frame extraction number from the target multimedia resource, and generating a video frame set containing the second pictures.

In one possible implementation manner, the second frame extraction sub-module includes:

the determining unit is used for determining a first target frame extraction time interval according to the duration of the target multimedia resource and the target frame extraction number;

and the frame extraction unit is used for extracting a frame from the target multimedia resource once every first target frame extraction time interval to obtain a number of second pictures equal to the target frame extraction number.

The frame extraction module comprises:

and the third frame extraction sub-module is used for determining a second target frame extraction time interval of the target multimedia resource, extracting a third picture from the target multimedia resource once every second target frame extraction time interval, and generating a video frame set containing the third pictures.

In one possible embodiment, the apparatus further comprises:

and the display module is used for sending the video frame set to the terminal for display.

The display module specifically comprises:

and the display sub-module is used for sending the video frame set to the terminal so that the terminal can display each video frame contained in the video frame set and the relative time information corresponding to each video frame.

In one possible embodiment, the apparatus further comprises:

the first generation module is used for carrying out dimension reduction processing on each video frame in the video frame set to generate low-rank matrix data;

and the second generation module is used for generating and storing the compressed data packet of the video frame set according to the low-rank matrix data.

According to a third aspect of embodiments of the present application, there is provided a terminal comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method for detecting a multimedia resource as provided herein.

According to a fourth aspect of embodiments of the present application, there is provided a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of a terminal, cause the terminal to perform operations to implement the method for detecting a multimedia resource as provided herein.

According to a fifth aspect of embodiments of the present application, there is provided an application program comprising one or more instructions which, when executed by a processor of a terminal, enable the terminal to perform an operation to implement a method for detecting a multimedia resource as provided herein.

The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:

A target multimedia resource is acquired; in response to receiving a frame extraction request for the target multimedia resource, a target resource type of the target multimedia resource is determined, and a target frame extraction strategy corresponding to the target resource type is determined; and frames are extracted from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame, wherein the video frame set is applied to a scene for detecting the target multimedia resource. In the embodiments of the application, because the target frame extraction strategy is determined according to the target resource type of the target multimedia resource, frames can be extracted from a target multimedia resource with complex content using a strategy suited to it. After the video frame set is sent to a terminal and displayed, an inspector can review the video frame set on the terminal. Because the video frame set contains a subset of the video frames of the target multimedia resource, the inspector can detect and monitor the target multimedia resource according to the video frame set instead of watching the resource in full, so that the time for detecting the target multimedia resource is shortened and the detection efficiency is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application and do not constitute an undue limitation on the application.

FIG. 1 is an illustration of an implementation environment in which a method for detecting multimedia assets is involved, in accordance with an exemplary embodiment;

FIG. 2 is a flow chart illustrating a method of detecting a multimedia asset according to an exemplary embodiment;

FIG. 3 is a flow chart illustrating another method of detecting a multimedia asset according to an exemplary embodiment;

FIG. 4 is a block diagram illustrating a multimedia asset detection device according to an exemplary embodiment;

FIG. 5 is a block diagram of a terminal shown in accordance with an exemplary embodiment;

FIG. 6 is a block diagram of another terminal shown according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.

Fig. 1 is an implementation environment involved in a method for detecting a multimedia resource according to an exemplary embodiment, where the environment includes: a terminal 10 and a server 20.

The terminal 10 may be a smart phone, a tablet computer, a notebook computer, etc., and the embodiments of the present disclosure do not specifically limit the product type of the terminal 10.

The server 20 has a storage capability for storing multimedia resources and a video frame set obtained by frame extraction of a target multimedia resource, and the server 20 also has a calculation capability for frame extraction of the multimedia resources.

The terminal 10 and the server 20 may communicate with each other via a wired network or a wireless network.

Fig. 2 is a flowchart of a method for detecting a multimedia resource according to an exemplary embodiment. Taking execution by a server as an example, as shown in Fig. 2, the method for detecting a multimedia resource provided by the embodiment of the disclosure includes the following steps:

Step 101, acquiring a target multimedia resource.

In this step, the server may acquire the target multimedia resource that needs to be detected.

For example, users may acquire or publish multimedia resources through an existing video playing platform, so the platform contains a plurality of multimedia resources. To ensure the quality of the multimedia resources on the platform, the platform's inspectors need to detect them. In this case, an inspector may use the server to acquire the target multimedia resource to be detected from among the multimedia resources on the video playing platform.

Step 102, in response to receiving a frame extraction request for the target multimedia resource, determining a target resource type of the target multimedia resource, and determining a target frame extraction policy corresponding to the target resource type.

In this step, after the target multimedia resource is acquired and a frame extraction request for it is received, the server may, in response to the frame extraction request, first determine the target resource type of the target multimedia resource and then determine the target frame extraction policy corresponding to that type. In this way, target multimedia resources of different resource types each receive a frame extraction policy suited to them.

It should be noted that the frame extraction request may be sent by a user through a terminal, or may be generated by the server when a certain trigger condition is met. For example, when the upload of the target multimedia resource is completed, a frame extraction request for the target multimedia resource may be generated, so that the target frame extraction policy corresponding to the target multimedia resource is determined in response to the frame extraction request and the frame extraction process for the target multimedia resource is completed.

In the embodiment of the invention, the target resource types of the target multimedia resource can comprise a picture video type, a short video type and a long video type.

The display content of a target multimedia resource of the picture video type comprises at least one first picture. For example, such a multimedia resource is produced from a plurality of original pictures according to specific rules using video production software, and the video frames in the multimedia resource are generated from those original pictures through operations such as animation effects and post-editing.

In addition, the duration of the target multimedia resource of the short video type is smaller than or equal to a first preset time, the duration of the target multimedia resource of the long video type is larger than or equal to a second preset time, namely, the target multimedia resource with the shorter duration is determined to be the short video type, and the target multimedia resource with the longer duration is determined to be the long video type.

The first preset time may be the same as the second preset time or different from the second preset time, for example, the first preset time is 10 seconds, the second preset time is 30 minutes, the multimedia resource with the duration less than or equal to 10 seconds is a short video type, and the multimedia resource with the duration greater than or equal to 30 minutes is a long video type.
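The type determination described above can be sketched in Python as follows. This is only a sketch under stated assumptions: the `made_from_pictures` flag stands in for however a picture video is actually recognized (the text does not say), checking the picture video type first is an assumed ordering, the function and constant names are hypothetical, and the thresholds are the example values of 10 seconds and 30 minutes. The text does not specify how a resource whose duration falls between the two preset times is handled, so the sketch returns `None` in that case.

```python
from enum import Enum
from typing import Optional


class ResourceType(Enum):
    PICTURE_VIDEO = "picture_video"
    SHORT_VIDEO = "short_video"
    LONG_VIDEO = "long_video"


FIRST_PRESET_S = 10.0          # example first preset time from the text: 10 seconds
SECOND_PRESET_S = 30 * 60.0    # example second preset time from the text: 30 minutes


def classify_resource_type(
    duration_s: float,
    made_from_pictures: bool,  # assumed input: whether the resource was produced from original pictures
    first_preset_s: float = FIRST_PRESET_S,
    second_preset_s: float = SECOND_PRESET_S,
) -> Optional[ResourceType]:
    """Return the resource type, or None when the text gives no rule for the duration."""
    if made_from_pictures:
        return ResourceType.PICTURE_VIDEO
    if duration_s <= first_preset_s:
        return ResourceType.SHORT_VIDEO
    if duration_s >= second_preset_s:
        return ResourceType.LONG_VIDEO
    return None  # between the two preset times: not covered by the text
```

If the two preset times are set to the same value, every resource that is not a picture video falls into either the short or the long video type.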

Further, a target frame extraction policy corresponding to a target resource type may be determined according to the target resource type of the target multimedia resource.

Specifically, a target frame extraction strategy applicable to the target multimedia resource of the type can be determined according to the characteristics of the target resource type of the target multimedia resource.

For example, if the target multimedia resource is of the picture video type, the video frames it contains are each composed from the original pictures used to generate it, so the target frame extraction policy corresponding to the picture video type can be used to extract the original pictures from which the target multimedia resource was generated.

If the target multimedia resource is of the short video type, its duration is short, so a number of video frames equal to a fixed target frame extraction number can be extracted directly from the target multimedia resource.

If the target multimedia resource is of the long video type, its duration is long. If a number of video frames equal to a fixed target frame extraction number were extracted from it, the target frame extraction number would be too small relative to the duration, so the time interval between extracted frames would be long and the frames few. The extracted video frames would then not cover the content of the target multimedia resource well or uniformly, and an inspector could not readily detect and evaluate the content of the multimedia resource from the frame extraction result. Therefore, frames may instead be extracted from the target multimedia resource at a fixed frame extraction time interval, which yields a moderate number of video frames that cover the content of the target multimedia resource well and uniformly.
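The correspondence between resource types and frame extraction policies described above amounts to a lookup table. A minimal Python sketch follows; the string keys, the description strings and the name `select_strategy` are hypothetical, and raising an error for an unknown type is an assumed design choice rather than something the text prescribes.

```python
from typing import Dict

# Mapping taken from the text: one policy per resource type.
STRATEGY_BY_TYPE: Dict[str, str] = {
    "picture_video": "first: extract the original pictures used to generate the resource",
    "short_video": "second: extract a fixed number of frames",
    "long_video": "third: extract one frame per fixed time interval",
}


def select_strategy(resource_type: str) -> str:
    """Return the policy description for a resource type (hypothetical helper)."""
    try:
        return STRATEGY_BY_TYPE[resource_type]
    except KeyError:
        # Assumption: an unknown type is treated as an error rather than given a default.
        raise ValueError(f"unsupported resource type: {resource_type!r}") from None
```

Keeping the mapping in a table rather than in branching code would make it straightforward to add a further resource type with its own policy.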

Step 103, according to the target frame extraction strategy, extracting frames from the target multimedia resource to obtain a video frame set containing at least one video frame, wherein the video frame set is applied to a scene for detecting the target multimedia resource and comprises at least one video frame extracted from the target multimedia resource.

In this step, the server may extract frames from the target multimedia resource according to the determined target frame extraction strategy and assemble the extracted pictures into a video frame set containing at least one video frame. The video frame set is applied to a scene for detecting the target multimedia resource, so that an inspector can detect and evaluate the content of the target multimedia resource according to the video frame set.

In the embodiment of the invention, frames may be extracted from the target multimedia resource according to the target frame extraction strategy using the open-source media processing framework FFmpeg, wherein the extracted frames are video frames. FFmpeg is a set of open-source computer programs that can record and convert digital audio and video and convert them into streams, and it provides a complete solution for recording, converting and streaming audio and video.
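The text does not give the FFmpeg invocation that is used. As one plausible way to grab a single frame at a given offset, the following Python sketch builds an `ffmpeg` command line from the standard options `-ss` (seek), `-i` (input), `-frames:v` (number of video frames to output) and `-y` (overwrite output). The function name and the file paths are hypothetical, and the sketch only constructs the argument list without running it.

```python
from typing import List


def build_ffmpeg_frame_command(input_path: str, timestamp_s: float, output_path: str) -> List[str]:
    """Build an ffmpeg command that writes one frame at timestamp_s to output_path."""
    return [
        "ffmpeg",
        "-y",                          # overwrite the output file if it already exists
        "-ss", f"{timestamp_s:.3f}",   # seek position in seconds; placed before -i so it applies to the input
        "-i", input_path,              # the target multimedia resource
        "-frames:v", "1",              # output exactly one video frame
        output_path,                   # image file to write, e.g. a .jpg
    ]
```

On a machine where FFmpeg is installed, the returned list could be passed to `subprocess.run(cmd, check=True)` once per extraction time point.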

In summary, the method for detecting a multimedia resource provided in the embodiment of the present application includes: acquiring a target multimedia resource; in response to receiving a frame extraction request for the target multimedia resource, determining a target resource type of the target multimedia resource, and determining a target frame extraction strategy corresponding to the target resource type; and extracting frames from the target multimedia resource according to the target frame extraction strategy to obtain a video frame set containing at least one video frame, wherein the video frame set is applied to a scene for detecting the target multimedia resource. In the embodiment of the application, because the target frame extraction strategy is determined according to the target resource type of the target multimedia resource, frames can be extracted from a target multimedia resource with complex content using a strategy suited to it. After the video frame set is sent to a terminal and displayed, an inspector can review the video frame set on the terminal. Because the video frame set contains a subset of the video frames of the target multimedia resource, the inspector can detect and monitor the target multimedia resource according to the video frame set instead of watching the resource in full, so that the time for detecting the target multimedia resource is shortened and the detection efficiency is improved.

Fig. 3 is a flowchart illustrating the steps of another method for detecting a multimedia resource according to an exemplary embodiment. As shown in Fig. 3, the method includes the following steps:

Step 201, obtaining a target multimedia resource.

The step may refer to step 101, and will not be described herein.

Step 202, determining a target resource type of the target multimedia resource in response to receiving the frame extraction request for the target multimedia resource.

The step may refer to step 102, and will not be described herein.

Step 203, when the target resource type is the picture video type, determining that the target frame extraction policy is a first target frame extraction policy.

In this step, if the target resource type is the picture video type, the target frame extraction policy may be determined to be a first target frame extraction policy.

Step 204, according to the first target frame extraction strategy, at least one first picture contained in the display content of the target multimedia resource of the picture video type is obtained, and a video frame set containing the at least one first picture is generated.

In the step, when the target resource type is the picture video type, determining that a target frame extraction strategy is a first target frame extraction strategy, and carrying out frame extraction on the target multimedia resource according to the first target frame extraction strategy.

Specifically, the first target frame extraction policy may determine at least one first picture included in the display content of the target multimedia resource of the picture video type, obtain the first picture, and generate a video frame set including the first picture according to the first picture, where the video frame set is applied in a scene for detecting the target multimedia resource.

Step 205, when the target resource type is the short video type, determining that the target frame extraction policy is a second target frame extraction policy.

In this step, if the target resource type is the short video type, the target frame extraction policy may be determined to be a second target frame extraction policy.

Step 206, determining a target frame extraction number of the target multimedia resource, extracting second pictures with the number of the target frame extraction number from the target multimedia resource, and generating a video frame set containing the second pictures.

In the step, when the target resource type is the short video type, determining that a target frame extraction strategy is a second target frame extraction strategy, and extracting frames of the target multimedia resource according to the second target frame extraction strategy.

Specifically, under the second target frame extraction policy, a target frame extraction number may be determined for the target multimedia resource of the short video type, a number of second pictures equal to the target frame extraction number may then be extracted from the target multimedia resource, and a video frame set including the second pictures may be generated, where the video frame set is applied in a scene where the target multimedia resource is detected.

The target frame extraction number can be a value input by a detector, a default fixed value or a value automatically generated according to the duration of the target multimedia resource.

Optionally, the step of extracting frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set including at least one video frame may specifically include:

Sub-step 2061, determining a first target frame extraction time interval according to the duration of the target multimedia resource and the target frame extraction number.

In this step, after the target frame extraction number of the target multimedia resource is determined, a first target frame extraction time interval may be determined according to the duration of the target multimedia resource and the target frame extraction number.

Specifically, the ratio of the duration of the target multimedia resource to the target frame extraction number may be determined, and this ratio is used as the time interval at which frames are extracted from the target multimedia resource, i.e. the first target frame extraction time interval.

For example, if the duration of the target multimedia resource is 18 seconds and the target frame extraction number is a fixed 9 frames determined by the inspector, the ratio of the duration to the target frame extraction number is 2, so the first target frame extraction time interval is 2 seconds. That is, 9 second pictures can be extracted at 2-second intervals from the target multimedia resource with a duration of 18 seconds.

Sub-step 2062, extracting a frame from the target multimedia resource once every first target frame extraction time interval, to obtain a number of second pictures equal to the target frame extraction number.

In this step, after the first target frame extraction time interval is determined, a frame may be extracted from the target multimedia resource once every first target frame extraction time interval, so as to obtain a number of second pictures equal to the target frame extraction number, and a video frame set containing the second pictures is generated from them.
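Sub-steps 2061 and 2062 can be illustrated with a minimal sketch. The function name `short_video_timestamps` is hypothetical, and the choice to take the first frame at time 0 is an assumption, since the description does not state the sampling offset.

```python
def short_video_timestamps(duration_s: float, target_frames: int) -> tuple[float, list[float]]:
    """Return the first target frame extraction time interval and the sample times."""
    if duration_s <= 0 or target_frames <= 0:
        raise ValueError("duration and target frame extraction number must be positive")
    # Sub-step 2061: interval = duration / target frame extraction number.
    interval = duration_s / target_frames
    # Sub-step 2062: one frame per interval.
    # Assumption: sampling starts at t = 0 (not specified in the description).
    timestamps = [i * interval for i in range(target_frames)]
    return interval, timestamps


# The example from the description: an 18-second resource and 9 frames.
interval, stamps = short_video_timestamps(18, 9)
```

With these inputs the interval is 2 seconds and nine sample times are produced, matching the worked example above.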

Step 207, when the target resource type is the long video type, determining that the target frame extraction policy is a third target frame extraction policy.

In this step, if the target resource type is the long video type, the target frame extraction policy may be determined to be a third target frame extraction policy.

Step 208, determining a second target frame extraction time interval of the target multimedia resource, extracting a third picture from the target multimedia resource at each interval of the second target frame extraction time interval, and generating a video frame set containing the third picture.

In this step, a second target frame extraction time interval of the target multimedia resource may be determined first, and then a third picture may be extracted from the target multimedia resource at each interval of the second target frame extraction time interval, and a video frame set including the third picture may be generated according to the third picture, where the video frame set is applied in a scene where the target multimedia resource is detected.

The second target frame extraction time interval may be a time value input by an inspector, a default fixed time value, or a time value automatically generated according to the duration of the target multimedia resource.

For example, if the duration of the target multimedia resource is 30 minutes and the second target frame extraction time interval is a fixed time value of 30 seconds determined by the inspector, the inspector wishes to extract one frame from the target multimedia resource as a third picture every 30 seconds, so a total of 60 third pictures can be extracted from the 30-minute target multimedia resource.
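The fixed-interval extraction of step 208 can be sketched in the same way. The function name `long_video_timestamps` is hypothetical; starting at time 0 and dropping a trailing partial interval are assumptions that the description does not settle.

```python
def long_video_timestamps(duration_s: float, interval_s: float) -> list[float]:
    """Return the sample times for extraction at a fixed second target interval."""
    if duration_s <= 0 or interval_s <= 0:
        raise ValueError("duration and interval must be positive")
    # Number of whole intervals that fit into the resource.
    count = int(duration_s // interval_s)
    # Assumption: the first frame is taken at t = 0 and a trailing partial
    # interval produces no extra frame.
    return [i * interval_s for i in range(count)]


# The example from the description: 30 minutes sampled every 30 seconds.
stamps = long_video_timestamps(30 * 60, 30)
```

For the 30-minute example this yields 60 sample times, in line with the 60 third pictures stated above.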

Step 209, performing dimension reduction processing on each video frame in the video frame set, so as to generate low-rank matrix data.

In this step, after performing a frame extraction operation on the target multimedia resource, each video frame in the obtained video frame set may be subjected to a dimension reduction process, so as to generate low rank matrix data.

Specifically, in practical signal or image acquisition and processing, the higher the dimension of the data, the greater the constraints on acquiring and processing it. For example, three-dimensional or four-dimensional signals (three spatial dimensions plus one spectral dimension or one time dimension) are often difficult to acquire and process. However, as the dimension increases, high-dimensional data also tends to contain more correlation and redundancy. For a single frame of a picture, the correlation between its pixels appears as a sparse distribution of coefficients in some transform domain. Dimension reduction processing can therefore make full use of the sparsity and redundancy in high-dimensional data, so that the data can be efficiently acquired, represented and reconstructed, occupies less storage space, and can be acquired and processed more efficiently.

In the embodiment of the invention, dimension reduction processing may be performed on each video frame in the obtained video frame set, so that each video frame is converted from vector data with more correlation and redundancy into low-rank matrix data with less correlation and redundancy that occupies less storage space, thereby reducing the pressure that the video frame set obtained by extracting frames from the target multimedia resource places on the storage space.
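The description does not name a particular dimension reduction algorithm. As one possible illustration only, the sketch below computes a rank-1 approximation of a grayscale frame by power iteration, so that a frame of rows × cols values is stored as one scalar and two vectors. The function names and the choice of rank 1 are assumptions, and the simple all-ones starting vector is only suitable for non-negative pixel data.

```python
import math


def rank1_approximation(frame: list[list[float]], iterations: int = 50):
    """Return (sigma, u, v) such that sigma * u * v^T approximates `frame`."""
    rows, cols = len(frame), len(frame[0])
    v = [1.0 / math.sqrt(cols)] * cols  # simple starting vector (assumption)
    u = [0.0] * rows
    sigma = 0.0
    for _ in range(iterations):
        # u <- A v, normalized to unit length.
        u = [sum(frame[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            return 0.0, [0.0] * rows, v
        u = [x / norm_u for x in u]
        # v <- A^T u; its length is the current singular value estimate.
        v = [sum(frame[r][c] * u[r] for r in range(rows)) for c in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v


def reconstruct(sigma: float, u: list[float], v: list[float]) -> list[list[float]]:
    """Rebuild the approximate frame from its rank-1 factors."""
    return [[sigma * ur * vc for vc in v] for ur in u]


# A small frame whose rows are multiples of each other (exactly rank 1).
frame = [[10.0, 20.0, 30.0], [20.0, 40.0, 60.0], [30.0, 60.0, 90.0]]
sigma, u, v = rank1_approximation(frame)
approx = reconstruct(sigma, u, v)
```

Here the 9 pixel values are represented by 7 numbers. For real frames a higher rank would normally be needed, trading storage space against reconstruction quality.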

Step 210, generating and storing a compressed data packet of the video frame set according to the low-rank matrix data.

In this step, a compressed data packet of the video frame set is generated from the low-rank matrix data obtained by performing dimension reduction processing on each video frame, and the compressed data packet is stored. In this way, the inspector can still detect the target multimedia resource according to the video frame set obtained by frame extraction, while the pressure that the video frame set places on the storage space is reduced.
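The packet format is not specified in the description. A minimal sketch, assuming the low-rank data of each frame is held as a dictionary of factors and serialized as JSON before being compressed with zlib, might look as follows; the field names `time_s`, `sigma`, `u` and `v` are hypothetical.

```python
import json
import zlib


def pack_frames(low_rank_frames: list[dict]) -> bytes:
    """Serialize the low-rank data of all frames and compress it into one packet."""
    payload = json.dumps(low_rank_frames, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload, 9)  # 9 = strongest zlib compression level


def unpack_frames(packet: bytes) -> list[dict]:
    """Inverse of pack_frames, used when the frames need to be shown again."""
    return json.loads(zlib.decompress(packet).decode("utf-8"))


# Hypothetical low-rank data for two extracted frames.
frames = [
    {"time_s": 0.0, "sigma": 140.0, "u": [0.25, 0.5, 0.75], "v": [0.25, 0.5, 0.75]},
    {"time_s": 2.0, "sigma": 70.0, "u": [0.5, 0.5, 0.5], "v": [0.25, 0.25, 0.5]},
]
packet = pack_frames(frames)
restored = unpack_frames(packet)
```

The resulting `packet` is a bytes object that could be written to a file or a storage service; the storage backend is outside the scope of this sketch.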

Step 211, the video frame set is sent to the terminal for display.

After the video frame set is obtained, the server can send the video frame set to a terminal through a wired network or a wireless network so as to enable the terminal to display the video frame set.

In this step, the video frame set is displayed so that the inspector can obtain, through the terminal, the video frame set extracted from the target multimedia resource and detect and evaluate the content of the target multimedia resource according to it. After receiving the video frame set, the terminal may display it on the display screen of the terminal. Further, the terminal may display at least one video frame contained in the video frame set in the playing interface of the target multimedia resource. In this way, after the inspector determines from the displayed video frames that one of them may contain violating content, the inspector can play the part of the target multimedia resource related to that video frame directly in the playing interface, so as to further confirm whether violating content exists in the target multimedia resource.

Optionally, the video frame set further includes relative time information indicating where each video frame is located in the target multimedia resource, and step 211 specifically includes:

Sub-step 2111, sending the video frame set to the terminal, so that the terminal displays each video frame contained in the video frame set together with the relative time information corresponding to each video frame.

In this step, the video frame set is displayed so that the inspector can obtain, through the terminal, the video frame set extracted from the target multimedia resource and detect and evaluate the content of the target multimedia resource according to it. After receiving the video frame set, the terminal may display it on the display screen of the terminal. Further, the terminal may display, in the playing interface of the target multimedia resource, at least one video frame contained in the video frame set together with the relative time information indicating where that video frame is located in the target multimedia resource. In this way, after the inspector determines from the displayed video frames that one of them may contain violating content, the inspector can locate the position of that video frame in the target multimedia resource directly from the corresponding relative time information and play the related part of the target multimedia resource in the playing interface, so as to further confirm whether violating content exists in the target multimedia resource.
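How a terminal might use the relative time information can be sketched as follows. The `ExtractedFrame` type, the `hh:mm:ss` label format and the 5-second playback margin are all assumptions made for illustration; the description only states that the relative time is displayed and used to locate the frame.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedFrame:
    index: int
    relative_time_s: float  # position of the frame within the resource


def format_relative_time(seconds: float) -> str:
    """Render a relative time as hh:mm:ss for display next to the frame."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def playback_window(frame: ExtractedFrame, duration_s: float,
                    margin_s: float = 5.0) -> tuple[float, float]:
    """Return the (start, end) of the segment to replay around a suspicious frame."""
    start = max(0.0, frame.relative_time_s - margin_s)
    end = min(duration_s, frame.relative_time_s + margin_s)
    return start, end


# Last frame of a 30-minute resource sampled every 30 seconds.
suspicious = ExtractedFrame(index=59, relative_time_s=1770.0)
label = format_relative_time(suspicious.relative_time_s)
window = playback_window(suspicious, duration_s=1800.0)
```

The label is what the playing interface would show beside the frame, and the window is the segment the inspector would replay to confirm the content.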

In addition, when the target multimedia resource is of the short video type, the first target frame extraction time interval can be determined according to the duration of the target multimedia resource and the target frame extraction number, and a frame is extracted from the target multimedia resource once every first target frame extraction time interval to obtain a number of second pictures equal to the target frame extraction number, which ensures that the extracted second pictures uniformly cover the target multimedia resource. When the target multimedia resource is of the long video type, a third picture can be extracted from the target multimedia resource once every second target frame extraction time interval, which yields a moderate number of video frames that uniformly cover the content of the target multimedia resource. This improves the accuracy with which the inspector detects and evaluates the content of the multimedia resource according to the frame extraction result.

Fig. 4 is a block diagram illustrating a multimedia asset detection apparatus according to an exemplary embodiment, and as shown in fig. 4, the apparatus 30 may include:

an obtaining module 301, configured to obtain a target multimedia resource;

a determining module 302, configured to determine a target resource type of the target multimedia resource in response to receiving a frame extraction request for the target multimedia resource, and determine a target frame extraction policy corresponding to the target resource type;

a frame extraction module 303, configured to extract frames from the target multimedia resource according to the target frame extraction policy, so as to obtain a video frame set containing at least one video frame extracted from the target multimedia resource, where the video frame set is used in a scenario in which the target multimedia resource is detected.

The determining module 302 is connected with the acquiring module 301, the frame extracting module 303 is connected with the determining module 302, and the displaying module 304 is connected with the frame extracting module 303.
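The cooperation between the determining module and the frame extraction module can be sketched as a simple dispatch. All class and function names below are hypothetical, and the extractors are stand-in callables rather than real frame decoders.

```python
from enum import Enum
from typing import Callable


class ResourceType(Enum):
    PICTURE_VIDEO = "picture video"
    SHORT_VIDEO = "short video"
    LONG_VIDEO = "long video"


class Policy(Enum):
    FIRST = 1   # extract the original pictures
    SECOND = 2  # extract a target number of frames
    THIRD = 3   # extract frames at a fixed time interval


POLICY_BY_TYPE = {
    ResourceType.PICTURE_VIDEO: Policy.FIRST,
    ResourceType.SHORT_VIDEO: Policy.SECOND,
    ResourceType.LONG_VIDEO: Policy.THIRD,
}


def determine_policy(resource_type: ResourceType) -> Policy:
    """Role of the determining module: map the resource type to a policy."""
    return POLICY_BY_TYPE[resource_type]


def handle_frame_extraction_request(
    resource_type: ResourceType,
    extractors: dict[Policy, Callable[[], list]],
) -> list:
    """Role of the frame extraction module: run the extractor for the chosen policy."""
    frames = extractors[determine_policy(resource_type)]()
    if not frames:
        raise ValueError("the video frame set must contain at least one video frame")
    return frames


# Stand-in extractors that return frame labels instead of decoded pictures.
extractors = {
    Policy.FIRST: lambda: ["p0"],
    Policy.SECOND: lambda: ["f0", "f1"],
    Policy.THIRD: lambda: ["f0", "f1", "f2"],
}
frame_set = handle_frame_extraction_request(ResourceType.SHORT_VIDEO, extractors)
```

Keeping the type-to-policy mapping in a table means a new resource type only needs a new entry and a matching extractor.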

The device provided by the embodiment of the application acquires a target multimedia resource; in response to receiving a frame extraction request for the target multimedia resource, determines a target resource type of the target multimedia resource and a target frame extraction policy corresponding to the target resource type; and extracts frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set containing at least one video frame extracted from the target multimedia resource, where the video frame set is used in a scenario in which the target multimedia resource is detected. In the embodiment of the application, the target frame extraction policy for the target multimedia resource can be determined according to its target resource type, so that frames are extracted from a target multimedia resource with complicated content by using that policy. After the video frame set is sent to a terminal and displayed, because the video frame set contains some of the video frames of the target multimedia resource, the inspector can view the video frame set through the terminal and detect and monitor the target multimedia resource according to it, which shortens the time needed to detect the target multimedia resource and improves detection efficiency.

the determining module specifically comprises:

the frame extraction module comprises:

The frame extraction module comprises:

In one possible embodiment, the apparatus further comprises:

the display module specifically comprises:

In one possible embodiment, the apparatus further comprises:

The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail herein.

Fig. 5 is a block diagram of a terminal according to an exemplary embodiment. For example, the terminal 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 5, the terminal 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.

The processing component 402 generally controls overall operation of the terminal 400, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 402 may include one or more processors 420 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 402 can include one or more modules that facilitate interaction between the processing component 402 and other components. For example, the processing component 402 may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.

The memory 404 is configured to store various types of data to support operations at the terminal 400. Examples of such data include instructions for any application or method operating on the terminal 400, contact data, phonebook data, messages, pictures, videos, and the like. The memory 404 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.

The power component 406 provides power to the various components of the terminal 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 400.

The multimedia component 408 includes a screen that provides an output interface between the terminal 400 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the terminal 400 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.

The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a Microphone (MIC) configured to receive external audio signals when the terminal 400 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 404 or transmitted via the communication component 416. In some embodiments, audio component 410 further includes a speaker for outputting audio signals.

The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor assembly 414 includes one or more sensors for providing status assessment of various aspects of the terminal 400. For example, the sensor assembly 414 may detect the on/off state of the terminal 400 and the relative positioning of components, such as the display and keypad of the terminal 400. The sensor assembly 414 may also detect a change in position of the terminal 400 or a component of the terminal 400, the presence or absence of user contact with the terminal 400, the orientation or acceleration/deceleration of the terminal 400, and a change in temperature of the terminal 400. The sensor assembly 414 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 414 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 414 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 416 is configured to facilitate wired or wireless communication between the terminal 400 and other devices. The terminal 400 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 416 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the terminal 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements for performing the steps in the above-described method of detecting a multimedia resource when the terminal 400 is provided as the aforementioned first terminal.

In an exemplary embodiment, a non-transitory storage medium is also provided, such as a memory 404, comprising instructions executable by the processor 420 of the terminal 400 to perform the method of detecting a multimedia resource described above. For example, the non-transitory storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

In an exemplary embodiment, the embodiment of the present application further provides an application program including one or more instructions which, when executed by a processor of a terminal, enable the terminal to perform the operations of the method for detecting a multimedia resource provided in the present application.

Fig. 6 is a block diagram of another terminal shown according to an exemplary embodiment. Referring to fig. 6, terminal 500 includes a processing component 522 that further includes one or more processors and memory resources represented by memory 532 for storing instructions, such as applications, executable by processing component 522. The application programs stored in the memory 532 may include one or more modules each corresponding to a set of instructions. Further, the processing component 522 is configured to execute instructions to perform steps in the method of detecting a multimedia resource described above.

The terminal 500 may also include a power component 526 configured to perform power management of the terminal 500, a wired or wireless network interface 550 configured to connect the terminal 500 to a network, and an input/output (I/O) interface 558. The terminal 500 may operate based on an operating system stored in the memory 532, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method for detecting a multimedia resource, the method comprising:

acquiring a target multimedia resource;

extracting frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set containing at least one video frame extracted from the target multimedia resource, wherein the video frame set is used in a scenario in which the target multimedia resource is detected; the target resource types of the target multimedia resource comprise a picture video type, a short video type and a long video type, wherein a multimedia resource of the picture video type is a multimedia resource produced from a plurality of original pictures according to a specific rule by using video production software; the duration of the target multimedia resource of the short video type is less than or equal to a first preset time, and the duration of the target multimedia resource of the long video type is greater than or equal to a second preset time;

when the target resource type is the picture video type, determining that the target frame extraction policy is a first target frame extraction policy, wherein the first target frame extraction policy is to extract the original pictures from which the target multimedia resource was generated;

when the target resource type is the short video type, determining that the target frame extraction policy is a second target frame extraction policy, wherein the second target frame extraction policy is to extract a number of video frames equal to a target frame extraction number, and comprises: determining the ratio of the duration of the target multimedia resource to the target frame extraction number, and using the ratio as the time interval at which frames are extracted from the target multimedia resource;

and when the target resource type is the long video type, determining that the target frame extraction policy is a third target frame extraction policy, wherein the third target frame extraction policy is to extract frames from the target multimedia resource at a fixed frame extraction time interval.

2. The method of claim 1, wherein, in the event that the target frame extraction policy is determined to be the first target frame extraction policy,

3. The method of claim 1, wherein, in the event that the target frame extraction policy is determined to be the second target frame extraction policy,

4. The method of claim 3, wherein the step of extracting, from the target multimedia resource, a number of second pictures equal to the target frame extraction number comprises:

5. The method of claim 1, wherein, in the event that the target frame extraction policy is determined to be the third target frame extraction policy,

6. The method of claim 1, wherein after the step of extracting frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set containing at least one video frame, the method further comprises:

sending the video frame set to the terminal for display.

7. The method of claim 6, wherein the video frame set further includes relative time information indicating where each of the video frames is located in the target multimedia resource;

8. The method of claim 6, wherein prior to the step of transmitting the set of video frames to the terminal for presentation, the method further comprises:

9. A device for detecting a multimedia resource, the device comprising:

the acquisition module is used for acquiring the target multimedia resource;

the determining module is used for determining a target resource type of the target multimedia resource and determining a target frame extraction strategy corresponding to the target resource type under the condition that a frame extraction request sent by a terminal aiming at the target multimedia resource is received;

the frame extraction module is used for extracting frames from the target multimedia resource according to the target frame extraction policy to obtain a video frame set containing at least one video frame extracted from the target multimedia resource, wherein the video frame set is used in a scenario in which the target multimedia resource is detected; the target resource type includes a picture video type, a short video type and a long video type, wherein the display content of the target multimedia resource of the picture video type comprises at least one first picture, the duration of the target multimedia resource of the short video type is less than or equal to a first preset time, and the duration of the target multimedia resource of the long video type is greater than or equal to a second preset time;

the determining module specifically comprises:

a first determining submodule, configured to determine, when the target resource type is the picture video type, that the target frame extraction policy is a first target frame extraction policy, where the first target frame extraction policy is to extract the original pictures from which the target multimedia resource was generated;

a second determining submodule, configured to determine, when the target resource type is the short video type, that the target frame extraction policy is a second target frame extraction policy, where the second target frame extraction policy is to extract a number of video frames equal to a target frame extraction number, and comprises: determining the ratio of the duration of the target multimedia resource to the target frame extraction number, and using the ratio as the time interval at which frames are extracted from the target multimedia resource;

and a third determining submodule, configured to determine, when the target resource type is the long video type, that the target frame extraction policy is a third target frame extraction policy, where the third target frame extraction policy is to extract frames from the target multimedia resource at a fixed frame extraction time interval.

10. The apparatus of claim 9, wherein, in the event that the target frame extraction policy is determined to be the first target frame extraction policy,

the frame extraction module comprises:

11. The apparatus of claim 9, wherein, in the event that the target frame extraction policy is determined to be the second target frame extraction policy,

the frame extraction module comprises:

12. The apparatus of claim 11, wherein the second frame-extraction sub-module comprises:

13. The apparatus of claim 9, wherein, in the event that the target frame extraction policy is determined to be the third target frame extraction policy,

the frame extraction module comprises:

14. The apparatus of claim 9, wherein the apparatus further comprises:

15. The apparatus of claim 14, wherein the video frame set further comprises relative time information indicating where each of the video frames is located in the target multimedia resource;

the display module specifically comprises:

16. The apparatus of claim 14, wherein the apparatus further comprises:

17. A terminal, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to perform the operations of the method of detecting a multimedia resource as claimed in any one of claims 1 to 8.

18. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a terminal, cause the terminal to perform the operations of the method of detecting a multimedia resource as claimed in any one of claims 1 to 8.