CN116600168A - Multimedia data processing method and device, electronic equipment and storage medium - Google Patents

Multimedia data processing method and device, electronic equipment and storage medium

Info

Publication number
CN116600168A
Authority
CN
China
Prior art keywords
video
data
caption
multimedia
background music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310401667.1A
Other languages
Chinese (zh)
Inventor
陶继伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Sailing Weiye Technology Co ltd
Original Assignee
Shenzhen Sailing Weiye Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Sailing Weiye Technology Co ltd filed Critical Shenzhen Sailing Weiye Technology Co ltd
Priority to CN202310401667.1A priority Critical patent/CN116600168A/en
Publication of CN116600168A publication Critical patent/CN116600168A/en
Withdrawn legal-status Critical Current

Classifications

    • H04N 21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/432 Content retrieval operation from a local storage medium, e.g. hard-disk
    • H04N 21/4334 Recording operations
    • H04N 21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N 21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N 21/44016 Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N 21/4884 Data services, e.g. news ticker, for displaying subtitles
    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06V 30/19173 Classification techniques (design or setup of recognition systems or techniques)

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application is applicable to the technical field of data processing and provides a multimedia data processing method and device, an electronic device, and a storage medium. The method comprises the following steps: receiving a multimedia video and a multimedia tag input by a user; identifying the caption area in the multimedia video to obtain caption-free video data and caption text data; processing and identifying the audio analog signal in the multimedia video to obtain background music data; classifying and storing the caption-free video data, the caption text data, and the background music data according to the multimedia tag; receiving a video production instruction input by the user, wherein the video production instruction comprises a video to be produced and a video tag; and receiving a caption import instruction, invoking the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced. The application automatically generates the caption-free video data, the caption text data, and the background music data, and is efficient and convenient.

Description

Multimedia data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method and apparatus for processing multimedia data, an electronic device, and a storage medium.
Background
Video enthusiasts frequently download notable video clips so that the video content, the lines, and even the background music can be reused and processed a second time. At present, users must process the video themselves to obtain a caption-free video and the caption text, and must search on their own for the background music, which is time-consuming and labor-intensive. Accordingly, there is a need for a multimedia data processing method, device, electronic device, and storage medium that solve the above problems.
Disclosure of Invention
In view of the defects in the prior art, the application aims to provide a multimedia data processing method and device, an electronic device, and a storage medium, so as to solve the problems described in the background section.
The application is realized as a multimedia data processing method comprising the following steps:
receiving a multimedia video and a multimedia tag input by a user;
identifying a caption area in the multimedia video to obtain caption-free video data and caption text data;
processing and identifying an audio analog signal in the multimedia video to obtain background music data;
classifying and storing the caption-free video data, caption text data and background music data according to the multimedia tag;
receiving a video making instruction input by a user, wherein the video making instruction comprises a video to be made and a video tag;
receiving a caption import instruction, invoking the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced;
and receiving a music import instruction, invoking the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced.
As a further scheme of the application: the step of identifying the caption area in the multimedia video to obtain the caption-free video data and the caption text data specifically comprises the following steps:
determining the caption area in the multimedia video, identifying the caption texts in the caption area, and sorting and integrating all the caption texts according to their play times to obtain the caption text data;
and cropping the lower edge of the multimedia video so that the caption area is cut off, obtaining the caption-free video data.
As a further scheme of the application: the step of processing and identifying the audio analog signal in the multimedia video to obtain the background music data specifically comprises the following steps:
converting an audio analog signal in the multimedia video into a digital signal;
extracting audio features, and constructing an audio fingerprint according to the audio features;
and inputting the audio fingerprints into a song database, and performing similarity retrieval to obtain background music data.
As a further scheme of the application: the step of calling the corresponding caption text data according to the video tag to enable the user to select and importing the caption text data selected by the user into the video to be produced specifically comprises the following steps:
matching the video tag with the multimedia tags of all the caption text data to determine successfully matched caption text data;
receiving a caption text selection instruction, and selecting one piece of caption text data;
and receiving a caption text editing instruction, editing the caption text data, adding a time period to each caption text, and importing the caption text data into the video to be produced according to the time periods.
As a further scheme of the application: the step of calling the corresponding background music data according to the video tag to enable the user to select and importing the background music data selected by the user into the video to be produced specifically comprises the following steps:
matching the video tag with the multimedia tags of all background music data to determine the successfully matched background music data;
receiving a background music selection instruction, and selecting one piece of background music data;
and receiving a background music editing instruction, clipping the background music data, adding an import time node, and importing the background music data into the video to be produced according to the import time node.
Another object of the present application is to provide a multimedia data processing apparatus, the apparatus comprising:
the multimedia data receiving module is used for receiving the multimedia video and the multimedia tag input by the user;
the subtitle video data module is used for identifying a subtitle region in the multimedia video to obtain subtitle-free video data and subtitle text data;
the background music data module is used for processing and identifying the audio analog signals in the multimedia video to obtain background music data;
the data classification storage module is used for classifying and storing the caption-free video data, the caption text data and the background music data according to the multimedia tag;
the video production module is used for receiving a video production instruction input by a user, wherein the video production instruction comprises a video to be produced and a video tag;
the subtitle text importing module is used for receiving a caption import instruction, invoking the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced;
the background music importing module is used for receiving a music import instruction, invoking the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced.
As a further scheme of the application: the subtitle video data module includes:
the caption text data unit is used for determining the caption area in the multimedia video, identifying the caption texts in the caption area, and sorting and integrating all the caption texts according to their play times to obtain the caption text data;
and the caption-free video data unit is used for cropping the lower edge of the multimedia video so that the caption area is cut off, obtaining the caption-free video data.
As a further scheme of the application: the background music data module includes:
the analog signal conversion unit is used for converting an audio analog signal in the multimedia video into a digital signal;
the audio fingerprint generation unit is used for extracting audio characteristics and constructing audio fingerprints according to the audio characteristics;
and the song data retrieval unit is used for inputting the audio fingerprints into the song database, and performing similarity retrieval to obtain background music data.
The application also provides an electronic device comprising a processor, a storage medium, and a computer program stored on the storage medium and executable on the processor; when the computer program is executed by the processor, the steps of the multimedia data processing method are implemented.
The application also provides a storage medium storing a program or instructions which, when executed by a processor, implement the steps of the multimedia data processing method.
Compared with the prior art, the application has the beneficial effects that:
After the user inputs a multimedia video and a multimedia tag, the application automatically generates caption-free video data, caption text data, and background music data and stores them by classification, so that subsequent use of the caption text data and the background music data is more convenient and faster.
Drawings
Fig. 1 is a flowchart of a multimedia data processing method.
Fig. 2 is a flowchart of a method for processing multimedia data to obtain subtitle-free video data and subtitle text data.
Fig. 3 is a flowchart of a method for processing multimedia data to obtain background music data.
Fig. 4 is a flowchart of a multimedia data processing method for retrieving corresponding subtitle text data according to a video tag.
Fig. 5 is a flowchart of a multimedia data processing method for retrieving corresponding background music data according to a video tag.
Fig. 6 is a schematic structural diagram of a multimedia data processing apparatus.
Fig. 7 is a schematic diagram of a structure of a subtitle video data module in a multimedia data processing apparatus.
Fig. 8 is a schematic diagram of a background music data module in a multimedia data processing apparatus.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Specific implementations of the application are described in detail below in connection with specific embodiments.
As shown in fig. 1, an embodiment of the present application provides a multimedia data processing method, which includes the following steps:
S100, receiving a multimedia video and a multimedia tag input by a user;
S200, identifying the caption area in the multimedia video to obtain caption-free video data and caption text data;
S300, processing and identifying the audio analog signal in the multimedia video to obtain background music data;
S400, classifying and storing the caption-free video data, the caption text data, and the background music data according to the multimedia tag;
S500, receiving a video production instruction input by the user, wherein the video production instruction comprises a video to be produced and a video tag;
S600, receiving a caption import instruction, invoking the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced;
S700, receiving a music import instruction, invoking the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced.
It should be noted that video enthusiasts often download notable video clips so that the video content, the lines, and even the background music can be reused and processed a second time. At present, users must process the video themselves to obtain a caption-free video and the caption text, and must search on their own for the background music, which is time-consuming and labor-intensive.
In the embodiment of the application, when a user wants to reuse and process a section of multimedia video, the user directly inputs the multimedia video and sets a multimedia tag. The multimedia tag reflects the type of the multimedia video (for example, "text" or "wisdom") and makes subsequent reuse convenient. When the user wants to produce a short video using the stored caption text data and background music data, the user directly inputs a video production instruction, which comprises the video to be produced and a video tag, and uploads the video to be produced. When caption text data is needed, the user inputs a caption import instruction; the embodiment of the application invokes the corresponding caption text data according to the video tag and, after the user makes a selection, imports the selected caption text data into the video to be produced. When background music data is needed, the user inputs a music import instruction; the embodiment of the application invokes the corresponding background music data according to the video tag and, after the user makes a selection, imports the selected background music data into the video to be produced. The caption text data and the background music data are thus used more conveniently and quickly.
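The classify-store-retrieve workflow described above (S400 through S700) can be sketched as an in-memory store keyed by tag. All names here are hypothetical and the data layout is only an illustrative assumption, since the application does not prescribe a storage structure:

```python
from collections import defaultdict

class MultimediaStore:
    """Hypothetical in-memory store: classify extracted assets under
    their multimedia tags (S400) and retrieve them by a video tag
    (S600/S700)."""

    def __init__(self):
        # tag -> {"video": [...], "subtitles": [...], "music": [...]}
        self._store = defaultdict(
            lambda: {"video": [], "subtitles": [], "music": []})

    def save(self, tag, video=None, subtitles=None, music=None):
        # Store whichever assets were extracted from one multimedia video.
        entry = self._store[tag]
        for kind, value in (("video", video),
                            ("subtitles", subtitles),
                            ("music", music)):
            if value is not None:
                entry[kind].append(value)

    def lookup(self, tag, kind):
        # Call up every stored asset of the given kind under the tag,
        # for the user to select from.
        return list(self._store[tag][kind])

store = MultimediaStore()
store.save("wisdom", subtitles="line1\nline2", music="song.mp3")
print(store.lookup("wisdom", "subtitles"))  # ['line1\nline2']
```

A production system would back this with a database, but the tag-keyed lookup shape is the same.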
As shown in fig. 2, as a preferred embodiment of the present application, the step of identifying the caption area in the multimedia video to obtain caption-free video data and caption text data specifically includes:
S201, determining the caption area in the multimedia video, identifying the caption texts in the caption area, and sorting and integrating all the caption texts according to their play times to obtain the caption text data;
S202, cropping the lower edge of the multimedia video so that the caption area is cut off, obtaining the caption-free video data.
In the embodiment of the application, the caption area in the multimedia video is determined automatically, the picture at the caption area is captured, and character recognition is performed on the picture to obtain the caption text. All caption texts are sorted and integrated according to their play times to obtain the caption text data, which the user can modify and delete as needed. The embodiment of the application also crops the lower edge of the multimedia video so that the caption area is cut off, thereby obtaining the caption-free video data.
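The sorting-and-integration of recognized caption texts and the lower-edge crop can be illustrated with a minimal sketch; the play-time representation, the fixed strip ratio, and the function names are assumptions, not part of the application:

```python
def integrate_subtitles(recognized):
    """Sort OCR-recognized caption lines by play time (seconds) and
    merge them into a single piece of caption text data (S201)."""
    ordered = sorted(recognized, key=lambda item: item[0])
    return "\n".join(text for _, text in ordered)

def crop_box_without_subtitles(width, height, strip_ratio=0.12):
    """Return the (x, y, w, h) frame region kept after cutting off the
    bottom caption strip (S202); strip_ratio is an assumed fraction."""
    kept_height = int(height * (1 - strip_ratio))
    return (0, 0, width, kept_height)

lines = [(3.2, "second line"), (1.0, "first line")]
print(integrate_subtitles(lines))              # first line\nsecond line
print(crop_box_without_subtitles(1920, 1080))  # (0, 0, 1920, 950)
```

In practice the caption strip would be located by text detection rather than a fixed ratio, and the crop applied per frame with a video library.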
As shown in fig. 3, as a preferred embodiment of the present application, the step of processing and identifying the audio analog signal in the multimedia video to obtain background music data specifically includes:
S301, converting the audio analog signal in the multimedia video into a digital signal;
S302, extracting audio features, and constructing an audio fingerprint from the audio features;
S303, inputting the audio fingerprint into a song database, and performing a similarity search to obtain the background music data.
In the embodiment of the application, the multimedia video is first divided into several sections, and the audio analog signal in each section is converted into a digital signal (ADC). Audio features are then extracted and an audio fingerprint is constructed from them. The audio fingerprint is input into a song database, which contains the digital signals of a large number of songs, and a similarity search is performed; the song with the highest similarity is output, and the digital file of that song is the background music data.
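A toy illustration of fingerprint construction and similarity retrieval follows. Real fingerprinting systems hash spectrogram peaks, so the band-energy feature and cosine similarity used here are stand-in assumptions, as are all names:

```python
import math

def audio_fingerprint(samples, bands=4):
    """Crude audio fingerprint: mean absolute energy per equal-width
    band of the digitised samples (S302). Real systems hash
    spectrogram peaks; this stands in for them."""
    n = len(samples) // bands
    return [sum(abs(s) for s in samples[i * n:(i + 1) * n]) / n
            for i in range(bands)]

def cosine(a, b):
    # Cosine similarity between two fingerprints.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(fingerprint, song_db):
    """Similarity retrieval over the song database (S303): output the
    song whose stored fingerprint is most similar to the query."""
    return max(song_db, key=lambda title: cosine(fingerprint, song_db[title]))

db = {"song_a": [0.9, 0.1, 0.1, 0.9], "song_b": [0.1, 0.9, 0.9, 0.1]}
query = audio_fingerprint([0.8, 1.0, 0.1, 0.2, 0.1, 0.2, 0.9, 0.7])
print(best_match(query, db))  # song_a
```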
As shown in fig. 4, as a preferred embodiment of the present application, the step of invoking the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced, specifically includes:
S601, matching the video tag with the multimedia tags of all caption text data, and determining the successfully matched caption text data;
S602, receiving a caption text selection instruction, and selecting one piece of caption text data;
S603, receiving a caption text editing instruction, editing the caption text data, adding a time period to each caption text, and importing the caption text data into the video to be produced according to the time periods.
In the embodiment of the application, in order to quickly determine the caption text data matching the video, the video tag is matched against the multimedia tags of all caption text data and a matching degree is calculated. When the matching degree is greater than a set value, the match is considered successful. All successfully matched caption text data are called up, the user selects one of them, and the user can edit the caption text data as needed and add a time period to each caption text.
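The matching-degree computation can be sketched as follows. The application only requires a matching degree compared against a set value, so the token-set overlap (Jaccard) and the threshold used here are assumed realizations:

```python
def match_degree(video_tags, media_tags):
    """Matching degree between the video tag(s) and a stored multimedia
    tag set. The application only specifies a degree compared with a
    set value; token-set overlap (Jaccard) is one assumed choice."""
    a, b = set(video_tags), set(media_tags)
    return len(a & b) / len(a | b) if a | b else 0.0

def matched_subtitles(video_tags, subtitle_db, threshold=0.3):
    """Call up every caption text whose match is successful (S601),
    i.e. whose matching degree exceeds the set value."""
    return [text for tags, text in subtitle_db
            if match_degree(video_tags, tags) > threshold]

db = [(["wisdom", "text"], "subs_a"), (["sports"], "subs_b")]
print(matched_subtitles(["wisdom"], db))  # ['subs_a']
```

The same comparison applies unchanged to background music tags in S701.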
As shown in fig. 5, as a preferred embodiment of the present application, the step of invoking the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced, specifically includes:
S701, matching the video tag with the multimedia tags of all background music data, and determining the successfully matched background music data;
S702, receiving a background music selection instruction, and selecting one piece of background music data;
S703, receiving a background music editing instruction, clipping the background music data, adding an import time node, and importing the background music data into the video to be produced according to the import time node.
In the embodiment of the application, in order to quickly determine the background music data matching the video, the video tag is matched against the multimedia tags of all background music data and a matching degree is calculated. When the matching degree is greater than a set value, the match is considered successful. All successfully matched background music data are called up and the user selects one of them; the user can clip the background music data as needed and add an import time node, and the background music data is imported into the corresponding position of the video to be produced according to the import time node.
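The clipping and import-time-node steps can be sketched over raw sample arrays; the sample-based representation, the overwrite behaviour at the node, and all names are simplifying assumptions:

```python
def clip_music(samples, sample_rate, start_s, end_s):
    """Cut the user-selected span out of the background music data
    (S703); samples is a raw PCM-like list, an assumed representation."""
    return samples[int(start_s * sample_rate):int(end_s * sample_rate)]

def import_at_node(track, clip, node_sample):
    """Place the clip into the video's audio track at the import time
    node, overwriting what was there (a simplifying assumption)."""
    out = list(track)
    out[node_sample:node_sample + len(clip)] = clip
    return out

rate = 4                  # toy sample rate: 4 samples per second
music = list(range(16))   # four seconds of "music"
clip = clip_music(music, rate, 1, 2)
print(clip)                               # [4, 5, 6, 7]
print(import_at_node([0] * 12, clip, 4))  # [0, 0, 0, 0, 4, 5, 6, 7, 0, 0, 0, 0]
```

A real editor would mix rather than overwrite and work through an audio/video library, but the clip-then-place structure is the same.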
As shown in fig. 6, an embodiment of the present application further provides a multimedia data processing apparatus, where the apparatus includes:
a multimedia data receiving module 100 for receiving a multimedia video and a multimedia tag inputted by a user;
the caption video data module 200 is configured to identify a caption area in the multimedia video, so as to obtain caption-free video data and caption text data;
the background music data module 300 is configured to process and identify an audio analog signal in a multimedia video to obtain background music data;
the data classification storage module 400 is configured to store the non-subtitle video data, the subtitle text data, and the background music data in a classified manner according to the multimedia tag;
the video production module 500 is configured to receive a video production instruction input by a user, where the video production instruction includes a video to be produced and a video tag;
the subtitle text importing module 600 is configured to receive a caption import instruction, invoke the corresponding caption text data according to the video tag for the user to select from, and import the caption text data selected by the user into the video to be produced;
the background music importing module 700 is configured to receive a music import instruction, invoke the corresponding background music data according to the video tag for the user to select from, and import the background music data selected by the user into the video to be produced.
In the embodiment of the application, when a user wants to reuse and process a section of multimedia video, the user directly inputs the multimedia video and sets a multimedia tag. The multimedia tag reflects the type of the multimedia video (for example, "text" or "wisdom") and makes subsequent reuse convenient. When the user wants to produce a short video using the stored caption text data and background music data, the user directly inputs a video production instruction, which comprises the video to be produced and a video tag, and uploads the video to be produced. When caption text data is needed, the user inputs a caption import instruction; the embodiment of the application invokes the corresponding caption text data according to the video tag and, after the user makes a selection, imports the selected caption text data into the video to be produced. When background music data is needed, the user inputs a music import instruction; the embodiment of the application invokes the corresponding background music data according to the video tag and, after the user makes a selection, imports the selected background music data into the video to be produced. The caption text data and the background music data are thus used more conveniently and quickly.
As shown in fig. 7, as a preferred embodiment of the present application, the subtitle video data module 200 includes:
a caption text data unit 201, configured to determine the caption area in the multimedia video, identify the caption texts in the caption area, and sort and integrate all the caption texts according to their play times to obtain the caption text data;
a no-subtitle video data unit 202, configured to crop the lower edge of the multimedia video so that the caption area is cut off, obtaining the caption-free video data.
As shown in fig. 8, as a preferred embodiment of the present application, the background music data module 300 includes:
an analog signal conversion unit 301, configured to convert an audio analog signal in a multimedia video into a digital signal;
an audio fingerprint generation unit 302, configured to extract audio features, and construct an audio fingerprint according to the audio features;
the song data retrieving unit 303 is configured to input the audio fingerprint into a song database and perform a similarity search to obtain the background music data.
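The fingerprint-and-search pipeline of units 301 to 303 can be illustrated with a toy spectral fingerprint. Production systems hash constellations of spectral peaks and query an indexed song database; the sketch below (all names are illustrative assumptions) only conveys the core idea of matching by frequency content rather than by raw samples:

```python
import numpy as np

def fingerprint(signal, win=1024):
    """Toy audio fingerprint: the dominant frequency bin of each window."""
    fp = []
    for i in range(len(signal) // win):
        spectrum = np.abs(np.fft.rfft(signal[i * win:(i + 1) * win]))
        fp.append(int(spectrum[1:].argmax()) + 1)  # skip the DC component
    return fp

def similarity(fp_a, fp_b):
    """Fraction of windows whose dominant frequency bins agree."""
    n = min(len(fp_a), len(fp_b))
    return sum(a == b for a, b in zip(fp_a, fp_b)) / n

rate = 8000
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440 * t)  # one second of a pure 440 Hz tone
noisy = tone + 0.05 * np.random.default_rng(0).normal(size=rate)

print(similarity(fingerprint(tone), fingerprint(noisy)))  # close to 1.0
```

Because mild noise rarely moves the dominant spectral peak, the two fingerprints stay highly similar, which is what makes a similarity search against a song database workable.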
The embodiment of the application also provides an electronic device, which comprises a processor, a storage medium, and a computer program stored on the storage medium and executable on the processor, wherein the steps of the multimedia data processing method are implemented when the processor executes the computer program.
The embodiment of the application also provides a storage medium storing a program or instructions which, when executed by a processor, implement the steps of the multimedia data processing method.
The foregoing description of the preferred embodiments of the present application is not intended to limit the application; rather, any modifications, equivalents, and alternatives falling within the spirit and principles of the application are intended to be covered by its scope of protection.
It should be understood that, although the steps in the flowcharts of the embodiments of the present application are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed in sequence, but may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), and direct Rambus dynamic RAM (DRDRAM), among others.
Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A method of multimedia data processing, the method comprising the steps of:
receiving a multimedia video and a multimedia tag input by a user;
identifying a caption area in the multimedia video to obtain caption-free video data and caption text data;
processing and identifying an audio analog signal in the multimedia video to obtain background music data;
classifying and storing the caption-free video data, caption text data and background music data according to the multimedia tag;
receiving a video making instruction input by a user, wherein the video making instruction comprises a video to be made and a video tag;
receiving a caption import instruction, retrieving the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced;
and receiving a music import instruction, retrieving the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced.
2. The method for processing multimedia data according to claim 1, wherein the step of identifying a caption area in the multimedia video to obtain the caption-free video data and the caption text data comprises:
determining the caption area in the multimedia video, identifying the caption texts in the caption area, and sorting and integrating all the caption texts according to their playing time to obtain the caption text data;
and cutting the lower edge of the multimedia video so that the caption area is cut off, and obtaining the caption-free video data.
3. The method for processing multimedia data according to claim 1, wherein the step of processing and recognizing the audio analog signal in the multimedia video to obtain the background music data specifically comprises:
converting an audio analog signal in the multimedia video into a digital signal;
extracting audio features, and constructing an audio fingerprint according to the audio features;
and inputting the audio fingerprints into a song database, and performing similarity retrieval to obtain background music data.
4. The method for processing multimedia data according to claim 1, wherein the step of retrieving the corresponding caption text data according to the video tag, enabling the user to select, and importing the caption text data selected by the user into the video to be produced specifically comprises:
matching the video tag with the multimedia tags of all the caption text data to determine successfully matched caption text data;
receiving a caption text selection instruction, and selecting one caption text data;
and receiving a caption text editing instruction, editing the caption text data, adding a time period to each caption text, and importing the caption text data into the video to be produced according to the time periods.
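The time-period import in this claim can be illustrated by rendering caption entries, each carrying a start and end time, into the common SRT subtitle format. The format choice, function name, and timing values are assumptions made for illustration, not part of the claim:

```python
def to_srt(entries):
    """Render (start_seconds, end_seconds, text) caption entries as SRT,
    so each caption text carries the time period it is imported under."""
    def ts(sec):
        h, rem = divmod(int(sec), 3600)
        m, s = divmod(rem, 60)
        ms = int(round((sec - int(sec)) * 1000))
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, (start, end, text) in enumerate(entries, 1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text}")
    return "\n\n".join(blocks)

srt = to_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "World")])
print(srt)
```

A video editor can then overlay each caption during its own time period when compositing the video to be produced.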
5. The method for processing multimedia data according to claim 1, wherein the step of retrieving the corresponding background music data according to the video tag, enabling the user to select, and importing the background music data selected by the user into the video to be produced specifically comprises:
matching the video tag with the multimedia tags of all background music data to determine the successfully matched background music data;
receiving a background music selection instruction, and selecting one of the background music data;
and receiving a background music editing instruction, intercepting the background music data, adding an import time node, and importing the background music data into the video to be produced according to the import time node.
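The interception step in this claim corresponds to slicing the decoded audio samples between two time nodes. A minimal sketch with illustrative names and toy data (real code would operate on decoded PCM at the track's actual sample rate):

```python
def clip_music(samples, rate, start_s, end_s):
    """Intercept the [start_s, end_s) span of a mono PCM sample buffer,
    mirroring the background-music editing step described above."""
    if not 0 <= start_s < end_s:
        raise ValueError("invalid clip range")
    return samples[int(start_s * rate):int(end_s * rate)]

rate = 4                    # toy sample rate so the slice is easy to inspect
samples = list(range(20))   # five seconds of dummy samples
print(clip_music(samples, rate, 1.0, 2.5))  # [4, 5, 6, 7, 8, 9]
```

The clipped span would then be mixed into the video starting at the chosen import time node.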
6. A multimedia data processing apparatus, the apparatus comprising:
the multimedia data receiving module is used for receiving the multimedia video and the multimedia tag input by the user;
the subtitle video data module is used for identifying a subtitle region in the multimedia video to obtain subtitle-free video data and subtitle text data;
the background music data module is used for processing and identifying the audio analog signals in the multimedia video to obtain background music data;
the data classification storage module is used for classifying and storing the caption-free video data, the caption text data and the background music data according to the multimedia tag;
the video production module is used for receiving a video production instruction input by a user, wherein the video production instruction comprises a video to be produced and a video tag;
the caption text importing module is used for receiving a caption import instruction, retrieving the corresponding caption text data according to the video tag for the user to select from, and importing the caption text data selected by the user into the video to be produced;
the background music importing module is used for receiving a music import instruction, retrieving the corresponding background music data according to the video tag for the user to select from, and importing the background music data selected by the user into the video to be produced.
7. The multimedia data processing apparatus of claim 6, wherein the subtitle video data module comprises:
the caption text data unit is used for determining the caption area in the multimedia video, identifying the caption texts in the caption area, and sorting and integrating all the caption texts according to their playing time to obtain the caption text data;
and the caption-free video data unit is used for cropping the lower edge of the multimedia video so that the caption area is cut off, obtaining the caption-free video data.
8. The multimedia data processing apparatus of claim 6, wherein the background music data module comprises:
the analog signal conversion unit is used for converting an audio analog signal in the multimedia video into a digital signal;
the audio fingerprint generation unit is used for extracting audio characteristics and constructing audio fingerprints according to the audio characteristics;
and the song data retrieval unit is used for inputting the audio fingerprints into the song database, and performing similarity retrieval to obtain background music data.
9. An electronic device comprising a processor, a storage medium, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the multimedia data processing method according to any one of claims 1 to 5.
10. A storage medium having stored thereon a program or instructions which, when executed by a processor, implement the steps of the multimedia data processing method according to any one of claims 1 to 5.
CN202310401667.1A 2023-04-10 2023-04-10 Multimedia data processing method and device, electronic equipment and storage medium Withdrawn CN116600168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310401667.1A CN116600168A (en) 2023-04-10 2023-04-10 Multimedia data processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN116600168A true CN116600168A (en) 2023-08-15

Family

ID=87598126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310401667.1A Withdrawn CN116600168A (en) 2023-04-10 2023-04-10 Multimedia data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116600168A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080084303A (en) * 2007-03-16 2008-09-19 어뉴텍코리아 주식회사 Technology which is storing easily, quickly and accurately only wanted part from the movie and audio files
CN103179093A (en) * 2011-12-22 2013-06-26 腾讯科技(深圳)有限公司 Matching system and method for video subtitles
CN111491205A (en) * 2020-04-17 2020-08-04 维沃移动通信有限公司 Video processing method and device and electronic equipment
CN111918094A (en) * 2020-06-29 2020-11-10 北京百度网讯科技有限公司 Video processing method and device, electronic equipment and storage medium
CN113190709A (en) * 2021-03-31 2021-07-30 浙江大学 Background music recommendation method and device based on short video key frame
CN113792178A (en) * 2021-08-31 2021-12-14 北京达佳互联信息技术有限公司 Song generation method and device, electronic equipment and storage medium
CN114020960A (en) * 2021-11-15 2022-02-08 北京达佳互联信息技术有限公司 Music recommendation method, device, server and storage medium


Similar Documents

Publication Publication Date Title
CN109800407B (en) Intention recognition method and device, computer equipment and storage medium
CN110321470B (en) Document processing method, device, computer equipment and storage medium
CN101202864B (en) Player for movie contents
US20090234854A1 (en) Search system and search method for speech database
US11907659B2 (en) Item recall method and system, electronic device and readable storage medium
CN113934869A (en) Database construction method, multimedia file retrieval method and device
CN111353055B (en) Cataloging method and system based on intelligent tag extension metadata
CN113065018A (en) Audio and video index library creating and retrieving method and device and electronic equipment
CN115329048A (en) Statement retrieval method and device, electronic equipment and storage medium
JP2005151127A5 (en)
CN116600168A (en) Multimedia data processing method and device, electronic equipment and storage medium
CN111522992A (en) Method, device and equipment for putting questions into storage and storage medium
CN115687579B (en) Document tag generation and matching method, device and computer equipment
CN114218437A (en) Adaptive picture clipping and fusing method, system, computer device and medium
Raimond et al. Using the past to explain the present: interlinking current affairs with archives via the semantic web
CN115203474A (en) Automatic database classification and extraction technology
CN115618054A (en) Video recommendation method and device
JP4394083B2 (en) Signal detection apparatus, signal detection method, signal detection program, and recording medium
CN111339359B (en) Sudoku-based video thumbnail automatic generation method
CN114490510A (en) Text stream filing method and device, computer equipment and storage medium
JP2004341948A (en) Concept extraction system, concept extraction method, program therefor, and storing medium thereof
CN115858797A (en) Method and system for generating Chinese near-meaning words based on OCR technology
CN113468377A (en) Video and literature association and integration method
JPWO2011042946A1 (en) Similar content search apparatus and program
CN111444386A (en) Video information retrieval method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230815