CN114143598A - Video sound effect synthesis system and method - Google Patents

Video sound effect synthesis system and method

Info

Publication number
CN114143598A
Authority
CN
China
Prior art keywords
sound effect
initial
audio
point
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010916128.8A
Other languages
Chinese (zh)
Inventor
王德江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunkan Uav Technology Co ltd
Original Assignee
Shanghai Yunkan Uav Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunkan Uav Technology Co ltd filed Critical Shanghai Yunkan Uav Technology Co ltd
Priority to CN202010916128.8A
Publication of CN114143598A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages

Abstract

The invention discloses a video sound effect synthesis system and method in the technical field of video processing. The system comprises: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect. The invention obtains the sound effect file by decoding and separating the original video and then synthesizes the obtained file, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.

Description

Video sound effect synthesis system and method
Technical Field
The invention relates to the technical field of video processing, in particular to a video sound effect synthesis system and method.
Background
With the development of computer technology and networked information, people transmit and publish information over networks; networks have become an important part of entertainment and work, digital audio has become a mainstream form of network data, and with the arrival of the big-data era the applications of audio data have grown ever broader. After a provider of digital audio publishes an audio file on the network, many users can download this shared resource and set it as their own ring tone, website background music, and so on.
Conventionally, after downloading initial audio from the network, editing it is generally limited to trimming the audio to length, simple splicing, and the like. When a user wants to insert other audio into the initial audio, the user must manually locate each insertion position and add the audio piece by piece. If a sound effect is to be added at every rhythm point of the initial audio, the identification and insertion operations must be repeated many times, making the process cumbersome.
Disclosure of Invention
In view of this, the present invention provides a video sound effect synthesis system and method that obtain a sound effect file by decoding and separating an original video and then synthesize the obtained sound effect file, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.
To achieve this purpose, the invention adopts the following technical solution:
a video sound effect synthesis system, the system comprising: a decoding unit decoding the media file; a separating unit for separating video data and effect data from the media file data packet generated by decoding; the sound effect coding unit is used for coding the sound effect data according to a specific format to generate a sound effect file; and the sound effect synthesis unit is configured to synthesize the generated sound effect file and generate a synthesized sound effect.
Further, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Further, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Further, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
Further, marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points. Synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
Further, the sound effect encoding unit can encode the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
A video sound effect synthesis method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
Further, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Further, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Further, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
Compared with the prior art, the invention has the following beneficial effects: the sound effect file is obtained by decoding and separating the original video, and the obtained sound effect file is then synthesized, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic diagram of the system architecture of a video sound effect synthesis system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a video sound effect synthesis method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
It should be understood that the structures, proportions, and sizes shown in the drawings and described in the specification are provided only so that the disclosure can be understood and read by those skilled in the art; they are not intended to limit the conditions under which the invention can be implemented and have no substantive technical significance in themselves. Any structural modification, change of proportion, or adjustment of size that does not affect the efficacy or purpose of the invention still falls within the scope of the invention. In addition, terms such as "upper", "lower", "left", "right", "middle", and "one" used in this specification are for clarity of description only and are not intended to limit the scope of the invention; changes or adjustments of their relative relationships, without substantive technical changes, are also considered within the scope of the invention.
Example 1
As shown in FIG. 1, a video sound effect synthesis system comprises: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect.
In the above technical solution, a sound effect is an effect produced by sound: noise or sound added to a soundtrack to enhance the realism, atmosphere, or dramatic content of a scene. Such added sound includes musical tones and effect sounds, and covers digital sound effects, ambient sound effects, and MP3 sound effects (ordinary and professional).
Sound effects (also called audio effects) are artificially created or enhanced sounds used to emphasize the artistic or other content of movies, video games, music, or other media.
In film and television production, a sound effect is a sound, other than dialogue or music, recorded and presented to convey a specific scenario or creative intent. The term often refers to the process applied to a recording rather than to the recording itself. In professional film production the separation of dialogue, music, and effect recordings is critical: in this context recorded dialogue and music are never themselves treated as sound effects, although the processing applied to them often is.
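The patent does not name any concrete decoder or encoder, so the following is only a minimal sketch of the decoding, separating, and encoding units of this example in Python, assuming the ffmpeg command-line tool is available; the file names and parameter defaults are hypothetical, and the sound effect synthesis unit itself is sketched after Examples 4 and 5 below.

import subprocess

def decode_and_separate(media_path, video_out, audio_out):
    """Decoding unit + separation unit: split the media file into a
    video-only stream and the sound effect (audio) stream."""
    # -an drops audio, -vn drops video; -c copy keeps the decoded streams as-is.
    subprocess.run(["ffmpeg", "-y", "-i", media_path, "-an", "-c", "copy", video_out], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", media_path, "-vn", "-c", "copy", audio_out], check=True)

def encode_sound_effect(audio_in, effect_out, sample_rate=44100, channels=2, bitrate="192k"):
    """Sound effect encoding unit: re-encode the separated sound effect data
    into a sound effect file with user-defined parameters (see Example 6)."""
    subprocess.run(["ffmpeg", "-y", "-i", audio_in,
                    "-ar", str(sample_rate), "-ac", str(channels), "-b:a", bitrate,
                    effect_out], check=True)

if __name__ == "__main__":
    decode_and_separate("input.mp4", "video_only.mp4", "initial_effect.aac")
    encode_sound_effect("initial_effect.aac", "initial_effect.mp3")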
Example 2
On the basis of the above embodiment, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
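As one possible reading of this example, the sketch below distinguishes decoded packets by a stream identifier and holds the sound effect payloads in a bounded buffer before they are handed to the encoding unit; the Packet type and the identifier values are assumptions, not taken from the patent.

from collections import deque
from dataclasses import dataclass

VIDEO_ID = 0   # assumed identifier value for video packets
AUDIO_ID = 1   # assumed identifier value for sound effect packets

@dataclass
class Packet:
    stream_id: int
    payload: bytes

class SeparationUnit:
    def __init__(self, buffer_size=256):
        self.sound_effect_buffer = deque(maxlen=buffer_size)  # buffer for sound effect data
        self.video_packets = []

    def feed(self, packet: Packet):
        # Distinguish video data from sound effect data by the identifier.
        if packet.stream_id == AUDIO_ID:
            self.sound_effect_buffer.append(packet.payload)
        elif packet.stream_id == VIDEO_ID:
            self.video_packets.append(packet.payload)

    def drain_sound_effect(self) -> bytes:
        # Hand the buffered sound effect data to the encoding unit.
        data = b"".join(self.sound_effect_buffer)
        self.sound_effect_buffer.clear()
        return data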
Example 3
On the basis of the previous embodiment, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Specifically, this mainly refers to processing the sound with a digital sound effect processor so that it takes on different spatial characteristics, such as those of a hall, an opera house, a cinema, a cave, or a stadium. Ambient sound effects are mainly realized by processing the sound through ambient filtering, ambient displacement, ambient reflection, ambient transition, and the like, so that the listener feels immersed in different environments. Such sound effect processing is very common on computer sound cards and is increasingly used in combined audio systems. Ambient sound effects also have drawbacks: because some sound information is inevitably lost during processing and the simulated effect differs somewhat from a real environment, some listeners find the result artificial.
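The three steps of this example can be summarised in the following sketch; every function and parameter name is hypothetical, and the rhythm-point detection and track mixing stubs are expanded in the sketches that follow Examples 4 and 5.

from typing import List, Tuple

Region = Tuple[float, float]   # a sound effect area as (start, end) in seconds

def identify_rhythm_points(initial_effect_path: str) -> List[float]:
    # Placeholder: replaced by the beat/onset matching sketch after Example 4.
    return [0.5, 1.0, 1.5]

def mark_effect_regions(rhythm_points: List[float], width: float = 0.2) -> List[Region]:
    # Mark a short sound effect area around each rhythm point.
    return [(t, t + width) for t in rhythm_points]

def plan_synthesis(initial_effect_path: str) -> List[Region]:
    # Step 3, mixing the added sound effect into these areas, is shown after Example 5.
    points = identify_rhythm_points(initial_effect_path)
    return mark_effect_regions(points)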
Example 4
On the basis of the above embodiment, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
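One way to read this example is sketched below with the librosa library (an assumption; the patent names no library): beat tracking supplies the beat points, onset detection on the spectrum supplies the feature points, and a beat point is kept as a rhythm point only when a feature point lies within a small tolerance of it.

import librosa
import numpy as np

def identify_rhythm_points(path, tolerance=0.05):
    y, sr = librosa.load(path)

    # Beat attribute of the initial sound effect -> beat points (in seconds).
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Spectrum analysis -> feature (onset) points in seconds.
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

    # Match beat points against the spectral feature points.
    return [t for t in beat_times
            if onset_times.size and np.min(np.abs(onset_times - t)) <= tolerance]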
Example 5
On the basis of the previous embodiment, marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points. Synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
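A minimal sketch of the two-track arrangement, assuming the pydub library: the initial sound effect sits on the first track, a silent second track of the same length receives the sound effect to be added at each marked area, and the two tracks are then mixed.

from pydub import AudioSegment

def synthesize_two_tracks(initial_path, effect_path, rhythm_points_s, out_path):
    first_track = AudioSegment.from_file(initial_path)    # first audio track
    effect = AudioSegment.from_file(effect_path)          # sound effect to be added

    # Second audio track corresponding to the first one, initially silent.
    second_track = AudioSegment.silent(duration=len(first_track))

    # Place the sound effect into the sound effect area at each rhythm point.
    for t in rhythm_points_s:
        second_track = second_track.overlay(effect, position=int(t * 1000))

    # Synthesize the first and second audio tracks to obtain the synthesized sound effect.
    first_track.overlay(second_track).export(out_path, format="wav")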
Example 6
On the basis of the above embodiment, the sound effect encoding unit can encode the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
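A hedged sketch of the user-configurable encoding described here, again assuming pydub; sampling rate, channel count, encoding format, and bit rate are all caller-supplied, and the default values are illustrative only.

from pydub import AudioSegment

def encode_sound_effect(in_path, out_path, sample_rate=48000, channels=2,
                        fmt="mp3", bitrate="192k"):
    segment = AudioSegment.from_file(in_path)
    segment = segment.set_frame_rate(sample_rate).set_channels(channels)
    segment.export(out_path, format=fmt, bitrate=bitrate)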
Example 7
As shown in FIG. 2, a video sound effect synthesis method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
Specifically, conventional analysis/synthesis speech methods are mainly used for compression coding of speech. In such applications, imperfect separation of the source and the vocal tract model is not a serious problem: if a sound is re-synthesized without modifying the parameters, a sound close to the original can be obtained. In typical linear predictive coding (LPC), the sound source is assumed to be white noise or a pulse train, both with a flat spectral envelope, and the vocal tract is assumed to have an all-pole transfer function whose numerator is only a constant term. In practice the spectrum of the sound source is not flat, and the transfer function of the vocal tract is not all-pole, owing to the complicated, uneven shape of the vocal tract and the branch into the nasal cavity. Consequently, an LPC analysis/synthesis system suffers some degradation of sound quality from this model mismatch; typical artifacts are synthetic sounds with a nasal or buzzing quality.
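To make the LPC remark concrete, the toy sketch below (an illustration only, not part of the claimed system) estimates LPC coefficients for one audio frame and re-synthesizes it by driving the all-pole filter with white noise, which is exactly the source simplification responsible for the quality loss described above.

import numpy as np
import librosa
from scipy.signal import lfilter

def lpc_resynthesize_frame(frame, order=16):
    a = librosa.lpc(frame, order=order)           # all-pole (LPC) model coefficients
    excitation = np.random.randn(len(frame))       # white-noise source assumption
    excitation *= np.std(frame) / (np.std(excitation) + 1e-12)
    return lfilter([1.0], a, excitation)           # synthesis filter 1 / A(z)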
Example 8
On the basis of the above embodiment, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Example 9
On the basis of the previous embodiment, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Example 10
On the basis of the above embodiment, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
Those skilled in the art will appreciate that the various illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. Programs corresponding to the units and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of hardware and software, the illustrative components and steps above have been described generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the overall solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical idea of the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A video sound effect synthesis system, the system comprising: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect.
2. The system of claim 1, wherein the separation unit distinguishes video data from sound effect data according to identifiers in the media file data packets, and the separation unit further comprises a buffer for buffering the sound effect data.
3. The system of claim 2, wherein the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
4. The system of claim 3, wherein identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
5. The system of claim 4, wherein marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points; and wherein synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
6. The system of claim 5, wherein the sound effect encoding unit encodes the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
7. A video sound effect synthesis method based on the system of any one of claims 1 to 6, wherein the method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
8. The method of claim 7, wherein the separation unit distinguishes video data from sound effect data according to identifiers in the media file data packets, and the separation unit further comprises a buffer for buffering the sound effect data.
9. The method of claim 8, wherein the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
10. The method of claim 9, wherein identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
CN202010916128.8A 2020-09-03 2020-09-03 Video sound effect synthesis system and method Pending CN114143598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916128.8A CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010916128.8A CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Publications (1)

Publication Number Publication Date
CN114143598A true CN114143598A (en) 2022-03-04

Family

ID=80438233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916128.8A Pending CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Country Status (1)

Country Link
CN (1) CN114143598A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20220304