CN114143598A - Video sound effect synthesis system and method - Google Patents

Video sound effect synthesis system and method

Info

Publication number
CN114143598A
Authority
CN
China
Prior art keywords
sound effect
initial
audio
point
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010916128.8A
Other languages
Chinese (zh)
Inventor
王德江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunkan Uav Technology Co ltd
Original Assignee
Shanghai Yunkan Uav Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Yunkan Uav Technology Co ltd filed Critical Shanghai Yunkan Uav Technology Co ltd
Priority to CN202010916128.8A
Publication of CN114143598A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/439 Processing of audio elementary streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/8106 Monomedia components thereof involving special audio data, e.g. different tracks for different languages

Abstract

The invention discloses a video sound effect synthesis system and method in the technical field of video processing. The system comprises: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect. The invention obtains the sound effect file by decoding and separating the original video and then synthesizes the obtained file, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.

Description

Video sound effect synthesis system and method
Technical Field
The invention relates to the technical field of video processing, in particular to a video sound effect synthesis system and method.
Background
With the development of computer technology and networked information, people transmit and publish information over networks; networks have become an important part of entertainment and work, digital audio has become a mainstream form of network data, and with the arrival of the big-data era the applications of audio data have grown ever broader. After a provider of digital audio publishes an audio file on the network, many users can download this shared resource and set it as their own ring tone, website background music, and so on.
Conventionally, after downloading initial audio from the network, editing it is generally limited to trimming the audio to length, simple splicing, and the like. When a user wants to insert other audio into the initial audio, the user must manually locate each insertion position and add the audio piece by piece. If a sound effect is to be added at every rhythm point of the initial audio, the identification and insertion operations must be repeated many times, making the process cumbersome.
Disclosure of Invention
In view of this, the present invention provides a video sound effect synthesis system and method that obtain a sound effect file by decoding and separating an original video and then synthesize the obtained sound effect file, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.
To achieve this purpose, the invention adopts the following technical solution:
a video sound effect synthesis system, the system comprising: a decoding unit decoding the media file; a separating unit for separating video data and effect data from the media file data packet generated by decoding; the sound effect coding unit is used for coding the sound effect data according to a specific format to generate a sound effect file; and the sound effect synthesis unit is configured to synthesize the generated sound effect file and generate a synthesized sound effect.
Further, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Further, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Further, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
Further, marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points. Synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
Further, the sound effect encoding unit can encode the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
A video sound effect synthesis method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
Further, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Further, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Further, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
Compared with the prior art, the invention has the following beneficial effects: the sound effect file is obtained by decoding and separating the original video, and the obtained sound effect file is then synthesized, realizing automated, intelligent sound effect synthesis with high efficiency and a good synthesis result.
Drawings
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic diagram of the system architecture of a video sound effect synthesis system according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a video sound effect synthesis method according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure.
It should be understood that the structures, proportions, and sizes shown in the drawings and described in the specification are provided only so that the disclosure can be understood and read by those skilled in the art; they are not intended to limit the conditions under which the invention can be implemented and have no substantive technical significance in themselves. Any structural modification, change of proportion, or adjustment of size that does not affect the efficacy or purpose of the invention still falls within the scope of the invention. In addition, terms such as "upper", "lower", "left", "right", "middle", and "one" used in this specification are for clarity of description only and are not intended to limit the scope of the invention; changes or adjustments of their relative relationships, without substantive technical changes, are also considered within the scope of the invention.
Example 1
As shown in FIG. 1, a video sound effect synthesis system comprises: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect.
In the above technical solution, a sound effect is an effect produced by sound: noise or sound added to a soundtrack to enhance the realism, atmosphere, or dramatic content of a scene. Such added sound includes musical tones and effect sounds, and covers digital sound effects, ambient sound effects, and MP3 sound effects (ordinary and professional).
Sound effects (also called audio effects) are artificially created or enhanced sounds used to emphasize the artistic or other content of movies, video games, music, or other media.
In film and television production, a sound effect is a sound, other than dialogue or music, recorded and presented to convey a specific scenario or creative intent. The term often refers to the process applied to a recording rather than to the recording itself. In professional film production the separation of dialogue, music, and effect recordings is critical: in this context recorded dialogue and music are never themselves treated as sound effects, although the processing applied to them often is.
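The patent does not name any concrete decoder or encoder, so the following is only a minimal sketch of the decoding, separating, and encoding units of this example in Python, assuming the ffmpeg command-line tool is available; the file names and parameter defaults are hypothetical, and the sound effect synthesis unit itself is sketched after Examples 4 and 5 below.

import subprocess

def decode_and_separate(media_path, video_out, audio_out):
    """Decoding unit + separation unit: split the media file into a
    video-only stream and the sound effect (audio) stream."""
    # -an drops audio, -vn drops video; -c copy keeps the decoded streams as-is.
    subprocess.run(["ffmpeg", "-y", "-i", media_path, "-an", "-c", "copy", video_out], check=True)
    subprocess.run(["ffmpeg", "-y", "-i", media_path, "-vn", "-c", "copy", audio_out], check=True)

def encode_sound_effect(audio_in, effect_out, sample_rate=44100, channels=2, bitrate="192k"):
    """Sound effect encoding unit: re-encode the separated sound effect data
    into a sound effect file with user-defined parameters (see Example 6)."""
    subprocess.run(["ffmpeg", "-y", "-i", audio_in,
                    "-ar", str(sample_rate), "-ac", str(channels), "-b:a", bitrate,
                    effect_out], check=True)

if __name__ == "__main__":
    decode_and_separate("input.mp4", "video_only.mp4", "initial_effect.aac")
    encode_sound_effect("initial_effect.aac", "initial_effect.mp3")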
Example 2
On the basis of the above embodiment, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
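As one possible reading of this example, the sketch below distinguishes decoded packets by a stream identifier and holds the sound effect payloads in a bounded buffer before they are handed to the encoding unit; the Packet type and the identifier values are assumptions, not taken from the patent.

from collections import deque
from dataclasses import dataclass

VIDEO_ID = 0   # assumed identifier value for video packets
AUDIO_ID = 1   # assumed identifier value for sound effect packets

@dataclass
class Packet:
    stream_id: int
    payload: bytes

class SeparationUnit:
    def __init__(self, buffer_size=256):
        self.sound_effect_buffer = deque(maxlen=buffer_size)  # buffer for sound effect data
        self.video_packets = []

    def feed(self, packet: Packet):
        # Distinguish video data from sound effect data by the identifier.
        if packet.stream_id == AUDIO_ID:
            self.sound_effect_buffer.append(packet.payload)
        elif packet.stream_id == VIDEO_ID:
            self.video_packets.append(packet.payload)

    def drain_sound_effect(self) -> bytes:
        # Hand the buffered sound effect data to the encoding unit.
        data = b"".join(self.sound_effect_buffer)
        self.sound_effect_buffer.clear()
        return data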
Example 3
On the basis of the previous embodiment, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Specifically, this mainly refers to processing the sound with a digital sound effect processor so that it takes on different spatial characteristics, such as those of a hall, an opera house, a cinema, a cave, or a stadium. Ambient sound effects are mainly realized by processing the sound through ambient filtering, ambient displacement, ambient reflection, ambient transition, and the like, so that the listener feels immersed in different environments. Such sound effect processing is very common on computer sound cards and is increasingly used in combined audio systems. Ambient sound effects also have drawbacks: because some sound information is inevitably lost during processing and the simulated effect differs somewhat from a real environment, some listeners find the result artificial.
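The three steps of this example can be summarised in the following sketch; every function and parameter name is hypothetical, and the rhythm-point detection and track mixing stubs are expanded in the sketches that follow Examples 4 and 5.

from typing import List, Tuple

Region = Tuple[float, float]   # a sound effect area as (start, end) in seconds

def identify_rhythm_points(initial_effect_path: str) -> List[float]:
    # Placeholder: replaced by the beat/onset matching sketch after Example 4.
    return [0.5, 1.0, 1.5]

def mark_effect_regions(rhythm_points: List[float], width: float = 0.2) -> List[Region]:
    # Mark a short sound effect area around each rhythm point.
    return [(t, t + width) for t in rhythm_points]

def plan_synthesis(initial_effect_path: str) -> List[Region]:
    # Step 3, mixing the added sound effect into these areas, is shown after Example 5.
    points = identify_rhythm_points(initial_effect_path)
    return mark_effect_regions(points)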
Example 4
On the basis of the above embodiment, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
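One way to read this example is sketched below with the librosa library (an assumption; the patent names no library): beat tracking supplies the beat points, onset detection on the spectrum supplies the feature points, and a beat point is kept as a rhythm point only when a feature point lies within a small tolerance of it.

import librosa
import numpy as np

def identify_rhythm_points(path, tolerance=0.05):
    y, sr = librosa.load(path)

    # Beat attribute of the initial sound effect -> beat points (in seconds).
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Spectrum analysis -> feature (onset) points in seconds.
    onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

    # Match beat points against the spectral feature points.
    return [t for t in beat_times
            if onset_times.size and np.min(np.abs(onset_times - t)) <= tolerance]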
Example 5
On the basis of the previous embodiment, marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points. Synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
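A minimal sketch of the two-track arrangement, assuming the pydub library: the initial sound effect sits on the first track, a silent second track of the same length receives the sound effect to be added at each marked area, and the two tracks are then mixed.

from pydub import AudioSegment

def synthesize_two_tracks(initial_path, effect_path, rhythm_points_s, out_path):
    first_track = AudioSegment.from_file(initial_path)    # first audio track
    effect = AudioSegment.from_file(effect_path)          # sound effect to be added

    # Second audio track corresponding to the first one, initially silent.
    second_track = AudioSegment.silent(duration=len(first_track))

    # Place the sound effect into the sound effect area at each rhythm point.
    for t in rhythm_points_s:
        second_track = second_track.overlay(effect, position=int(t * 1000))

    # Synthesize the first and second audio tracks to obtain the synthesized sound effect.
    first_track.overlay(second_track).export(out_path, format="wav")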
Example 6
On the basis of the above embodiment, the sound effect encoding unit can encode the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
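A hedged sketch of the user-configurable encoding described here, again assuming pydub; sampling rate, channel count, encoding format, and bit rate are all caller-supplied, and the default values are illustrative only.

from pydub import AudioSegment

def encode_sound_effect(in_path, out_path, sample_rate=48000, channels=2,
                        fmt="mp3", bitrate="192k"):
    segment = AudioSegment.from_file(in_path)
    segment = segment.set_frame_rate(sample_rate).set_channels(channels)
    segment.export(out_path, format=fmt, bitrate=bitrate)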
Example 7
As shown in FIG. 2, a video sound effect synthesis method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
Specifically, conventional analysis/synthesis speech methods are mainly used for compression coding of speech. In such applications, imperfect separation of the source and the vocal tract model is not a serious problem: if a sound is re-synthesized without modifying the parameters, a sound close to the original can be obtained. In typical linear predictive coding (LPC), the sound source is assumed to be white noise or a pulse train, both with a flat spectral envelope, and the vocal tract is assumed to have an all-pole transfer function whose numerator is only a constant term. In practice the spectrum of the sound source is not flat, and the transfer function of the vocal tract is not all-pole, owing to the complicated, uneven shape of the vocal tract and the branch into the nasal cavity. Consequently, an LPC analysis/synthesis system suffers some degradation of sound quality from this model mismatch; typical artifacts are synthetic sounds with a nasal or buzzing quality.
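To make the LPC remark concrete, the toy sketch below (an illustration only, not part of the claimed system) estimates LPC coefficients for one audio frame and re-synthesizes it by driving the all-pole filter with white noise, which is exactly the source simplification responsible for the quality loss described above.

import numpy as np
import librosa
from scipy.signal import lfilter

def lpc_resynthesize_frame(frame, order=16):
    a = librosa.lpc(frame, order=order)           # all-pole (LPC) model coefficients
    excitation = np.random.randn(len(frame))       # white-noise source assumption
    excitation *= np.std(frame) / (np.std(excitation) + 1e-12)
    return lfilter([1.0], a, excitation)           # synthesis filter 1 / A(z)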
Example 8
On the basis of the above embodiment, the separation unit distinguishes video data from sound effect data according to an identifier in the media file data packets; the separation unit further comprises a buffer for buffering the sound effect data.
Example 9
On the basis of the previous embodiment, the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
Example 10
On the basis of the above embodiment, identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
Those skilled in the art will appreciate that the various illustrative units and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. Programs corresponding to the units and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of hardware and software, the illustrative components and steps above have been described generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the overall solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or unit that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or unit.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.
The foregoing embodiments merely illustrate the principles and utility of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical idea of the present invention shall be covered by the claims of the present invention.

Claims (10)

1. A video sound effect synthesis system, the system comprising: a decoding unit for decoding a media file; a separating unit for separating video data and sound effect data from the media file data packets produced by decoding; a sound effect encoding unit for encoding the sound effect data in a specified format to generate a sound effect file; and a sound effect synthesis unit for synthesizing the generated sound effect file to produce a synthesized sound effect.
2. The system of claim 1, wherein the separation unit distinguishes video data from sound effect data according to identifiers in the media file data packets, and the separation unit further comprises a buffer for buffering the sound effect data.
3. The system of claim 2, wherein the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
4. The system of claim 3, wherein identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
5. The system of claim 4, wherein marking the sound effect areas in the initial sound effect according to the rhythm points comprises: placing the initial sound effect on a first audio track; and identifying the rhythm points of the initial sound effect on the first audio track, generating a second audio track corresponding to the first audio track, and marking, on the second audio track, the sound effect areas corresponding to the rhythm points; and wherein synthesizing the sound effect into the sound effect areas of the initial sound effect to obtain the synthesized sound effect comprises: extracting the sound effect to be added and placing it into the sound effect areas; and synthesizing the first audio track and the second audio track to obtain the synthesized sound effect.
6. The system of claim 5, wherein the sound effect encoding unit encodes the sound effect data according to a user-defined sampling rate, channel setting, encoding format, and bit rate.
7. A video sound effect synthesis method based on the system of any one of claims 1 to 6, wherein the method comprises the following steps: step 1: the decoding unit decodes the media file; step 2: the separation unit separates video data and sound effect data from the media file data packets produced by decoding; step 3: the sound effect encoding unit encodes the sound effect data in a specified format to generate a sound effect file; and step 4: the sound effect synthesis unit synthesizes the generated sound effect file to produce a synthesized sound effect.
8. The method of claim 7, wherein the separation unit distinguishes video data from sound effect data according to identifiers in the media file data packets, and the separation unit further comprises a buffer for buffering the sound effect data.
9. The method of claim 8, wherein the sound effect synthesis unit synthesizes the generated sound effect file by performing the following steps: acquiring an initial sound effect; identifying rhythm points in the initial sound effect and marking sound effect areas in the initial sound effect according to the rhythm points; and acquiring the sound effect corresponding to each sound effect area and synthesizing that sound effect into the sound effect area of the initial sound effect to obtain the synthesized sound effect.
10. The method of claim 9, wherein identifying the rhythm points in the initial sound effect comprises: identifying the beat attribute of the initial sound effect to obtain beat points of the initial sound effect; analyzing the frequency spectrum of the initial sound effect to obtain feature points in the spectrum of the initial sound effect; and matching the beat points of the initial sound effect against the feature points in the spectrum to obtain the rhythm points of the initial sound effect.
CN202010916128.8A 2020-09-03 2020-09-03 Video sound effect synthesis system and method Pending CN114143598A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916128.8A CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010916128.8A CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Publications (1)

Publication Number Publication Date
CN114143598A true CN114143598A (en) 2022-03-04

Family

ID=80438233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916128.8A Pending CN114143598A (en) 2020-09-03 2020-09-03 Video sound effect synthesis system and method

Country Status (1)

Country Link
CN (1) CN114143598A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20220304