CN116233535A - Video background audio adding method based on data stream - Google Patents

Video background audio adding method based on data stream

Info

Publication number
CN116233535A
CN116233535A
Authority
CN
China
Prior art keywords
video
video image
parameter
audio
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310013043.2A
Other languages
Chinese (zh)
Other versions
CN116233535B (en)
Inventor
李鲲
李永海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taide Wangju Beijing Technology Co ltd
Original Assignee
Taide Wangju Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taide Wangju Beijing Technology Co ltd filed Critical Taide Wangju Beijing Technology Co ltd
Priority to CN202310013043.2A
Publication of CN116233535A
Application granted
Publication of CN116233535B
Legal status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Picture Signal Circuits (AREA)
  • Studio Circuits (AREA)

Abstract

The invention relates to the field of multimedia data processing, and in particular to a method for adding background audio to video based on a data stream.

Description

Video background audio adding method based on data stream
Technical Field
The invention relates to the field of multimedia data processing, and in particular to a method for adding background audio to video based on a data stream.
Background
With the development of multimedia technology, the operations involved in video production have become increasingly simple, and self-made videos or films can be produced with software. The traditional video production workflow is usually to shoot the video first and then manually add dubbing or background sound effects to the resulting video file.
chinese patent publication No.: CN112822563a discloses a method, apparatus, electronic device and computer readable medium for generating video, which includes, in the process of displaying an original video, obtaining audio material through background music of the original video, and obtaining image material; determining a musical point of the audio material, wherein the musical point is used for dividing the audio material into a plurality of audio fragments; generating a video clip for each music clip in the audio material by using the image material to obtain a plurality of video clips, wherein the corresponding music clips and video clips have the same duration; and splicing the video clips together according to the occurrence time of the music clips corresponding to the video clips in the audio material, and adding the audio material as a video audio track to obtain a synthesized video. The embodiment saves the time for users to process the materials and provides convenience for users to synthesize videos.
However, the prior art has the following problems:
in practice, when dubbing a video, background music often needs to be switched, or audio needs to be added when the sound source object changes, for example in an explosion or a collision. This dubbing process is completed manually, with key frames located by hand. The prior art does not analyze each frame of the video to extract the relevant time nodes from changes between video frames and provide them to the user as references for adding audio, nor does it automatically correct the volume of the added audio according to the position of the sound source object in the image when the video is dubbed.
Disclosure of Invention
To solve the problems that the prior art neither analyzes each frame of the video to extract relevant time nodes from changes between video frames and provide them to the user as references for adding audio, nor automatically corrects the volume of the added audio according to the position of the sound source object in the image when the video is dubbed, the invention provides a method for adding background audio to video based on a data stream, comprising the following steps:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image;
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, the adjustment mode used when adjusting the volume of the audio segment within each time period, wherein the time periods are obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
Further, in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
Further, in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result:
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0;
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
Further, in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
Further, step S5 further includes presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
Further, in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
Further, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node;
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
Further, step S5 further comprises correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
Further, in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein
the first correction mode corrects the volume adjustment rate to a first correction value according to a first volume adjustment parameter v1;
the second correction mode corrects the volume adjustment rate to a second correction value according to a second volume adjustment parameter v2;
the third correction mode corrects the volume adjustment rate to a third correction value according to the first volume adjustment parameter v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value according to the second volume adjustment parameter v2;
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
Further, in step S5, an upper limit of the volume adjustment rate is also set; when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
Compared with the prior art, the invention has the advantage that an object contour in the video to be processed is selected as the sound source object, the video segment in which that object contour appears is determined, the brightness of each frame of video image in the segment is compared with that of the adjacent frame so as to calibrate the corresponding time nodes, the position at which an audio segment is added is selected with each time node as a reference, the sound source depth characterization parameter corresponding to the object contour in each frame of the segment is calculated, the volume of the added audio segment is determined from that parameter, and the volume of the audio segment at each time node is adjusted accordingly.
In particular, the invention identifies object contours in the video to be processed through frame extraction, selects an object contour as the sound source object in step S2, establishes an association between the object contour and the audio segment, and retrieves the video segment in which the sound source object appears, thereby preliminarily identifying the video segment where the sound source object is located. Each frame of that video segment can then be analyzed and the time nodes calibrated, providing references for the audio-adding time nodes, simplifying user operations, and improving the efficiency and accuracy of audio addition.
In particular, in step S3 of the invention, a time node is selected from the time node set as the starting time node, and the audio data stream corresponding to the audio segment is added to the video data stream. In practice, when adding sound effects to a video in post-production, the insertion time nodes must otherwise be determined by manual frame-by-frame analysis, and the starting moments at which sound effects are needed usually coincide with changes in the picture, such as explosions, collisions, or scene switches. With the method of the invention, the time nodes of the video images corresponding to such situations in the video data stream can be determined by a computer's data processing system, providing the user with references for adding audio segments, simplifying user operations, and improving the efficiency and accuracy of audio addition.
In particular, in steps S4 and S5 of the invention, both the image depth of the object contour in the image and the distance between the object contour and the midpoint of the video image are considered when calculating the sound source depth characterization parameter, so that the parameter characterizes the spatial distance of the object contour in the image, that is, the propagation distance of the sound. In practice this adjustment is usually made manually, with low accuracy and cumbersome operation; the invention automates it.
Drawings
FIG. 1 is a flow diagram of the method for adding background audio to video based on a data stream according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a system for adding background audio to video according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the data processing module of the system for adding background audio to video according to an embodiment of the invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, denote a fixed connection, a detachable connection, or an integral connection; a mechanical or electrical connection; a direct connection or an indirect connection through an intermediate medium; or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1, which is a flow diagram of the data-stream-based method for adding background audio to video according to an embodiment of the present invention, the method includes:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image;
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, the adjustment mode used when adjusting the volume of the audio segment within each time period, wherein the time periods are obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
Specifically, in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
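As an illustration only, the shape-and-color comparison described above could be implemented along the following lines; the OpenCV calls, the tolerance values SHAPE_TOL and COLOR_TOL, and the helper names are assumptions of this sketch, since the text only requires the shape and color to be the same.

```python
# Minimal sketch of the frame-screening step (S2): a frame is kept when one
# of its object contours matches the selected contour in shape and color.
import cv2
import numpy as np

SHAPE_TOL = 0.1   # assumed tolerance for cv2.matchShapes (0 = identical shape)
COLOR_TOL = 0.9   # assumed minimum histogram correlation (1 = identical color)

def color_hist(frame, contour):
    """Color histogram of the region enclosed by a contour."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.drawContours(mask, [contour], -1, 255, thickness=cv2.FILLED)
    hist = cv2.calcHist([frame], [0, 1, 2], mask, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def frame_contains(frame, contours, sel_contour, sel_hist):
    """True if any contour in the frame matches the selected contour.

    contours would come from cv2.findContours on the frame;
    sel_hist = color_hist(reference_frame, sel_contour).
    """
    for c in contours:
        shape_d = cv2.matchShapes(sel_contour, c, cv2.CONTOURS_MATCH_I1, 0.0)
        color_s = cv2.compareHist(sel_hist, color_hist(frame, c),
                                  cv2.HISTCMP_CORREL)
        if shape_d <= SHAPE_TOL and color_s >= COLOR_TOL:
            return True
    return False
```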
Specifically, the object contours in the video to be processed are identified through frame extraction, an object contour in the video is selected as the sound source object in step S2, and the video segment in which the sound source object appears is retrieved, thereby preliminarily identifying the video segment where the sound source object is located. Each frame of that video segment can then be analyzed and the time nodes calibrated, providing references for the audio-adding time nodes, simplifying user operations, and improving the efficiency and accuracy of audio addition.
Specifically, in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result:
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0;
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
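A minimal sketch of this calibration step follows; treating the grayscale mean as the "average brightness," the value of ΔL0, and the fps-based time-node computation are assumptions of the sketch, not requirements of the text.

```python
# Mark a frame when its mean brightness exceeds that of the next frame
# by at least the preset comparison parameter ΔL0.
import cv2

DELTA_L0 = 20.0  # preset brightness-difference comparison parameter (assumed value)

def calibrated_time_nodes(frames, fps):
    """Return the time nodes (seconds) of frames where L1 - L2 >= ΔL0."""
    means = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).mean() for f in frames]
    nodes = []
    for i in range(len(means) - 1):
        delta_l = means[i] - means[i + 1]   # ΔL = L1 - L2
        if delta_l >= DELTA_L0:
            nodes.append(i / fps)           # time node in the data stream
    return nodes
```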
Specifically, in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
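Because formula (1) is published only as an image, the exact expression for E is not recoverable from the text. The sketch below assumes one plausible form, a sum of the normalized distance D/D0 and normalized depth h/h0, purely to illustrate how the named inputs could be combined; it is not the patent's formula.

```python
# Assumed form of the sound source depth characterization parameter E,
# combining the inputs named in the text: depth h, distance D, and the
# preset comparison parameters h0 and D0.
def sound_source_depth_param(h, d, h0, d0):
    """Sound source depth characterization parameter E (assumed form)."""
    return d / d0 + h / h0   # grows as the source sits deeper/farther in frame
```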
Specifically, in step S3 of the present invention, a time node is selected from the time node set as the starting time node, and the audio data stream corresponding to the audio segment is added to the video data stream. In practice, when adding sound effects to a video in post-production, the insertion time nodes must otherwise be determined by manual frame-by-frame analysis, and the starting moments at which sound effects are needed usually coincide with changes in the picture, such as explosions, collisions, or scene switches. With the method of the invention, the time nodes of the video images corresponding to such situations in the video data stream can be determined by a computer's data processing system, providing the user with references for adding audio segments, simplifying user operations, and improving the efficiency and accuracy of audio addition.
Specifically, step S5 further includes presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
Specifically, in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
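A minimal sketch of the interval lookup follows; the interval boundaries and volume values are invented examples, chosen only so that the associated volume grows with the interval midpoint as the text requires.

```python
# Consecutive data intervals, each associated with a distinct volume
# parameter that increases with the interval's midpoint.
INTERVALS = [            # (lower, upper, volume parameter) -- assumed values
    (0.0, 1.0, 0.25),
    (1.0, 2.0, 0.50),
    (2.0, 3.0, 0.75),
    (3.0, 4.0, 1.00),
]

def initial_volume(e):
    """Volume parameter of the interval containing E, used as initial volume."""
    for lo, hi, vol in INTERVALS:
        if lo <= e < hi:
            return vol
    return INTERVALS[-1][2]  # assumed fallback when E exceeds the last interval
```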
Specifically, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node;
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
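The two adjustment modes reduce to a sign choice on the preset rate V0, as in the following sketch; the numeric value of V0 is an assumed example.

```python
# Mode 1 (raise volume at rate V0) when E at the start node is smaller than
# E at the end node; mode 2 (lower it at V0) otherwise.
V0 = 0.05  # preset adjustment rate (volume units per second, assumed value)

def volume_rate(e_start, e_end):
    """Signed volume adjustment rate for the audio segment in one time period."""
    return +V0 if e_start < e_end else -V0
```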
Specifically, step S5 further includes correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
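Formula (2) is likewise published only as an image. Given the per-frame distances D(i) and the integer N > 1, the sketch below assumes the moving speed is the mean absolute frame-to-frame change in distance, which is one natural reading of the description rather than the verbatim formula.

```python
# Assumed form of formula (2): average the frame-to-frame change in the
# distance between the selected contour and the reference object.
def moving_speed(distances):
    """Moving speed V of the selected contour relative to the reference object."""
    n = len(distances) - 1          # N: number of frame-to-frame steps
    if n < 1:
        raise ValueError("need at least two frames in the time period")
    return sum(abs(distances[i + 1] - distances[i]) for i in range(n)) / n
```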
Specifically, in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein
the first correction mode corrects the volume adjustment rate to a first correction value V1′ according to the first volume adjustment parameter v1, with V1′ = V0 − v1;
the second correction mode corrects the volume adjustment rate to a second correction value V2′ according to the second volume adjustment parameter v2, with V2′ = V0 − v2;
the third correction mode corrects the volume adjustment rate to a third correction value V3′ according to the first volume adjustment parameter v1, with V3′ = V0 + v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value V4′ according to the second volume adjustment parameter v2, with V4′ = V0 + v2;
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
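Taken together, the four correction modes reduce to two choices: which volume adjustment parameter to use (v1 or v2, depending on whether |ΔV| exceeds V2) and whether to subtract it from or add it to V0 (depending on the sign of ΔV). A sketch follows, with assumed parameter values and with the |ΔV|-versus-V2 reading of the translated conditions.

```python
# Correct the adjustment rate V0 according to the speed difference ΔV = V - V1.
V1_STD = 5.0   # preset standard moving-speed comparison parameter V1 (assumed)
V2_CMP = 2.0   # preset moving-speed comparison parameter V2 (assumed)
v1_adj = 0.01  # first volume adjustment parameter v1 (assumed)
v2_adj = 0.02  # second volume adjustment parameter v2 (assumed)

def corrected_rate(v, v0):
    """Corrected volume adjustment rate for one time period."""
    delta_v = v - V1_STD
    step = v1_adj if abs(delta_v) <= V2_CMP else v2_adj
    # ΔV < 0: modes 1-2 reduce the rate; ΔV >= 0: modes 3-4 increase it.
    return v0 - step if delta_v < 0 else v0 + step
```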
Specifically, in steps S4 and S5 of the present invention, both the image depth of the object contour in the image and the distance between the object contour and the midpoint of the video image are considered when calculating the sound source depth characterization parameter, so that the parameter characterizes the spatial distance of the object contour in the image, that is, the propagation distance of the sound. In practice, the volume of an added audio segment often has to be adjusted manually according to the spatial distance of the sound source object in the image, which is inaccurate and cumbersome; the invention automates this adjustment.
Specifically, in step S5, an upper limit of the volume adjustment rate is also set; when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
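The upper limit is then a simple clamp on the corrected rate; the limit value and the symmetric treatment of decreasing (negative) rates are assumptions of this sketch.

```python
# Clamp the corrected volume adjustment rate to the preset upper limit.
RATE_CAP = 0.2  # upper limit of the volume adjustment rate (assumed value)

def capped_rate(rate):
    """Keep the corrected rate within the preset upper limit."""
    return max(-RATE_CAP, min(rate, RATE_CAP))
```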
Specifically, in step S5, the overall time span is divided into a plurality of consecutive time periods.
Referring specifically to FIGS. 2 and 3, the present invention also provides a system for adding background audio to video, which comprises:
a data receiving module for receiving the video data stream and the audio data stream;
a data processing module connected to the data receiving module and comprising a video image recognition unit and a data comparison unit, wherein
the video image recognition unit can extract frames from the video to be processed, determine the time node of each frame of video image within the video, recognize the object contours in each frame of video image, and determine the image depth of an object contour in each frame as well as its distance from a reference object;
the data comparison unit is preset with a logic algorithm so as to operate on the data sent by the video image recognition unit according to the preset logic and obtain an operation result;
all units are interconnected and can exchange data; and
a data synthesis module connected to the data processing module for adding the audio segment at the corresponding time node of the video data stream according to the operation result sent by the data processing module.
Specifically, the concrete structure of each module is not limited; any functional module or program in a computer capable of completing the above data exchange and data processing may be used.
Specifically, a depth estimation algorithm may be provided in the video image recognition unit to recognize the image depth of an object in a video image; as such algorithms belong to the prior art, they are not described in detail here.
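For orientation only, the module layout described above could be skeletonized as follows; the class and method names are assumptions, as the patent does not prescribe any particular structure.

```python
# Skeleton of the three described modules and their connections.
from dataclasses import dataclass, field

@dataclass
class DataReceivingModule:
    """Receives the video data stream and the audio data stream."""
    video_stream: bytes = b""
    audio_stream: bytes = b""

@dataclass
class DataProcessingModule:
    """Wraps the video image recognition unit and the data comparison unit."""
    receiver: DataReceivingModule
    results: list = field(default_factory=list)

    def process(self) -> None:
        # Recognition unit: extract frames, time nodes, contours, depth h,
        # distances D. Comparison unit: run the preset logic (the ΔL, E, V
        # and ΔV comparisons above) and store the operation results.
        ...

@dataclass
class DataSynthesisModule:
    """Adds the audio segment at the time nodes produced by the processor."""
    processor: DataProcessingModule

    def synthesize(self) -> None:
        ...
```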
Thus far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings; however, it will be readily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of the present invention.

Claims (10)

1. A method for adding background audio to a video based on a data stream, comprising:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image; and
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, an adjustment mode for adjusting the volume of the audio segment in each time period, wherein each time period is obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
2. The method for adding background audio to video based on a data stream according to claim 1, wherein in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
3. The method for adding background audio to video based on a data stream according to claim 2, wherein in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result, wherein
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0; and
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
4. The method for adding background audio to video based on a data stream according to claim 1, wherein in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
5. The method for adding background audio to video based on a data stream according to claim 1, wherein step S5 further comprises presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
6. The method for adding background audio to video based on a data stream according to claim 5, wherein in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
7. The method for adding background audio to video based on a data stream according to claim 6, wherein in step S5, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node; and
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
8. The method for adding background audio to video based on a data stream according to claim 7, wherein step S5 further comprises correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
9. The method for adding background audio to video based on a data stream according to claim 8, wherein in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein,
the first correction mode corrects the volume adjustment rate to a first correction value according to a first volume adjustment parameter v1;
the second correction mode corrects the volume adjustment rate to a second correction value according to a second volume adjustment parameter v2;
the third correction mode corrects the volume adjustment rate to a third correction value according to the first volume adjustment parameter v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value according to the second volume adjustment parameter v2; and
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
10. The method for adding background audio to video based on a data stream according to claim 9, wherein in step S5, an upper limit of the volume adjustment rate is also set, and when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
CN202310013043.2A 2023-01-05 2023-01-05 Video background audio adding method based on data stream Active CN116233535B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310013043.2A CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310013043.2A CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Publications (2)

Publication Number Publication Date
CN116233535A 2023-06-06
CN116233535B 2023-09-29

Family

ID=86568871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310013043.2A Active CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Country Status (1)

Country Link
CN (1) CN116233535B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080292273A1 (en) * 2007-05-24 2008-11-27 Bei Wang Uniform Program Indexing Method with Simple and Robust Audio Feature and Related Enhancing Methods
WO2010090102A1 (en) * 2009-02-03 2010-08-12 株式会社コナミデジタルエンタテインメント Game device, game control method, data recording medium, and program
CN109729297A (en) * 2019-01-11 2019-05-07 广州酷狗计算机科技有限公司 Method and apparatus for adding special effects to video
CN112672218A (en) * 2020-12-16 2021-04-16 福州凌云数据科技有限公司 Editing method for batch generation of videos
US20220078372A1 (en) * 2020-09-10 2022-03-10 Hyc (Usa), Inc. Adaptive Method and System For Data Flow Control Based On Variable Frame Structure in Video Image Processing System
US20220208203A1 (en) * 2020-12-29 2022-06-30 Compal Electronics, Inc. Audiovisual communication system and control method thereof
CN115038011A (en) * 2022-05-31 2022-09-09 中国第一汽车股份有限公司 Vehicle, control method, control device, control equipment and storage medium


Also Published As

Publication number Publication date
CN116233535B 2023-09-29

Similar Documents

Publication Publication Date Title
US6313822B1 (en) Method and apparatus for modifying screen resolution based on available memory
KR20070034462A (en) Video-Audio Synchronization
US7428335B2 (en) Method of extracting contour of image, method of extracting object from image, and video transmission system using the same method
US8175121B2 (en) Image processor and image display apparatus comprising the same
CN1745526B (en) Apparatus and method for synchronization of audio and video streams.
CN101291392B (en) Apparatus and method of processing image as well as apparatus and method of generating reproduction information
US20080231756A1 (en) Apparatus and method of processing image as well as apparatus and method of generating reproduction information
US20050196061A1 (en) Signal-transmitting system, data-transmitting apparatus and data-receiving apparatus
US7133066B2 (en) Image processing
KR101741747B1 (en) Apparatus and method for processing real time advertisement insertion on broadcast
CN116233535B (en) Video background audio adding method based on data stream
US20030058224A1 (en) Moving image playback apparatus, moving image playback method, and audio playback apparatus
JPH08340553A (en) Video signal encoding device
US7024038B2 (en) Image processing apparatus and method, and storage medium therefor
CN110728971B (en) Audio and video synthesis method
US6947599B2 (en) Apparatus and method for image compression using a specified restart interval
US20090086090A1 (en) Picture signal processing apparatus and picture signal processing method
JPH10155139A (en) Image processor and image processing method
US7136414B2 (en) System and method for efficiently performing an inverse telecine procedure
US6810154B2 (en) Method and apparatus for automatic spatial resolution setting for moving images
KR100920137B1 (en) Apparatus for calibrating a uniformity of a television receiver and method thereof
US20080317130A1 (en) Image converting apparatus
KR100224859B1 (en) Vertical interpolation method of video signal based on edge and apparatus therefor
JPH07221993A (en) Method and device for thinning out color image data, and compressing method for color image data
WO2024078064A1 (en) Image processing method and apparatus, and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant