CN116233535A - Video background audio adding method based on data stream - Google Patents

Video background audio adding method based on data stream

Info

Publication number
CN116233535A
CN116233535A
Authority
CN
China
Prior art keywords
video
video image
parameter
audio
volume
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310013043.2A
Other languages
Chinese (zh)
Other versions
CN116233535B (en)
Inventor
李鲲
李永海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taide Wangju Beijing Technology Co ltd
Original Assignee
Taide Wangju Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taide Wangju Beijing Technology Co ltd filed Critical Taide Wangju Beijing Technology Co ltd
Priority to CN202310013043.2A
Publication of CN116233535A
Application granted
Publication of CN116233535B
Legal status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Picture Signal Circuits (AREA)
  • Studio Circuits (AREA)

Abstract

The invention relates to the field of multimedia data processing, and in particular to a method for adding background audio to video based on a data stream.

Description

Video background audio adding method based on data stream
Technical Field
The invention relates to the field of multimedia data processing, and in particular to a method for adding background audio to video based on a data stream.
Background
With the development of multimedia technology, the operations involved in video production have become increasingly simple, and self-made videos or films can be produced with software. The traditional video production workflow is usually to shoot the video first and then manually add dubbing or background sound effects to the resulting video file.
chinese patent publication No.: CN112822563a discloses a method, apparatus, electronic device and computer readable medium for generating video, which includes, in the process of displaying an original video, obtaining audio material through background music of the original video, and obtaining image material; determining a musical point of the audio material, wherein the musical point is used for dividing the audio material into a plurality of audio fragments; generating a video clip for each music clip in the audio material by using the image material to obtain a plurality of video clips, wherein the corresponding music clips and video clips have the same duration; and splicing the video clips together according to the occurrence time of the music clips corresponding to the video clips in the audio material, and adding the audio material as a video audio track to obtain a synthesized video. The embodiment saves the time for users to process the materials and provides convenience for users to synthesize videos.
However, the prior art has the following problems:
in practice, when dubbing a video, background music often needs to be switched, or audio needs to be added when the sound source object changes, for example in an explosion or a collision. This dubbing process is completed manually, with key frames located by hand. The prior art does not analyze each frame of the video to extract the relevant time nodes from changes between video frames and provide them to the user as references for adding audio, nor does it automatically correct the volume of the added audio according to the position of the sound source object in the image when the video is dubbed.
Disclosure of Invention
To solve the problems that the prior art neither analyzes each frame of the video to extract relevant time nodes from changes between video frames and provide them to the user as references for adding audio, nor automatically corrects the volume of the added audio according to the position of the sound source object in the image when the video is dubbed, the invention provides a method for adding background audio to video based on a data stream, comprising the following steps:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image;
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, the adjustment mode used when adjusting the volume of the audio segment within each time period, wherein the time periods are obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
Further, in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
Further, in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result:
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0;
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
Further, in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
Further, step S5 further includes presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
Further, in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
Further, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node;
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
Further, step S5 further comprises correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
Further, in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein
the first correction mode corrects the volume adjustment rate to a first correction value according to a first volume adjustment parameter v1;
the second correction mode corrects the volume adjustment rate to a second correction value according to a second volume adjustment parameter v2;
the third correction mode corrects the volume adjustment rate to a third correction value according to the first volume adjustment parameter v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value according to the second volume adjustment parameter v2;
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
Further, in step S5, an upper limit of the volume adjustment rate is also set; when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
Compared with the prior art, the invention has the advantage that an object contour in the video to be processed is selected as the sound source object, the video segment in which that object contour appears is determined, the brightness of each frame of video image in the segment is compared with that of the adjacent frame so as to calibrate the corresponding time nodes, the position at which an audio segment is added is selected with each time node as a reference, the sound source depth characterization parameter corresponding to the object contour in each frame of the segment is calculated, the volume of the added audio segment is determined from that parameter, and the volume of the audio segment at each time node is adjusted accordingly.
In particular, the invention identifies object contours in the video to be processed through frame extraction, selects an object contour as the sound source object in step S2, establishes an association between the object contour and the audio segment, and retrieves the video segment in which the sound source object appears, thereby preliminarily identifying the video segment where the sound source object is located. Each frame of that video segment can then be analyzed and the time nodes calibrated, providing references for the audio-adding time nodes, simplifying user operations, and improving the efficiency and accuracy of audio addition.
In particular, in step S3 of the invention, a time node is selected from the time node set as the starting time node, and the audio data stream corresponding to the audio segment is added to the video data stream. In practice, when adding sound effects to a video in post-production, the insertion time nodes must otherwise be determined by manual frame-by-frame analysis, and the starting moments at which sound effects are needed usually coincide with changes in the picture, such as explosions, collisions, or scene switches. With the method of the invention, the time nodes of the video images corresponding to such situations in the video data stream can be determined by a computer's data processing system, providing the user with references for adding audio segments, simplifying user operations, and improving the efficiency and accuracy of audio addition.
In particular, in steps S4 and S5 of the invention, both the image depth of the object contour in the image and the distance between the object contour and the midpoint of the video image are considered when calculating the sound source depth characterization parameter, so that the parameter characterizes the spatial distance of the object contour in the image, that is, the propagation distance of the sound. In practice this adjustment is usually made manually, with low accuracy and cumbersome operation; the invention automates it.
Drawings
FIG. 1 is a flow diagram of the method for adding background audio to video based on a data stream according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a system for adding background audio to video according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the data processing module of the system for adding background audio to video according to an embodiment of the invention.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "coupled" are to be construed broadly and may, for example, denote a fixed connection, a detachable connection, or an integral connection; a mechanical or electrical connection; a direct connection or an indirect connection through an intermediate medium; or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to FIG. 1, which is a flow diagram of the data-stream-based method for adding background audio to video according to an embodiment of the present invention, the method includes:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image;
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, the adjustment mode used when adjusting the volume of the audio segment within each time period, wherein the time periods are obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
Specifically, in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
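As an illustration only, the shape-and-color comparison described above could be implemented along the following lines; the OpenCV calls, the tolerance values SHAPE_TOL and COLOR_TOL, and the helper names are assumptions of this sketch, since the text only requires the shape and color to be the same.

```python
# Minimal sketch of the frame-screening step (S2): a frame is kept when one
# of its object contours matches the selected contour in shape and color.
import cv2
import numpy as np

SHAPE_TOL = 0.1   # assumed tolerance for cv2.matchShapes (0 = identical shape)
COLOR_TOL = 0.9   # assumed minimum histogram correlation (1 = identical color)

def color_hist(frame, contour):
    """Color histogram of the region enclosed by a contour."""
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    cv2.drawContours(mask, [contour], -1, 255, thickness=cv2.FILLED)
    hist = cv2.calcHist([frame], [0, 1, 2], mask, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def frame_contains(frame, contours, sel_contour, sel_hist):
    """True if any contour in the frame matches the selected contour.

    contours would come from cv2.findContours on the frame;
    sel_hist = color_hist(reference_frame, sel_contour).
    """
    for c in contours:
        shape_d = cv2.matchShapes(sel_contour, c, cv2.CONTOURS_MATCH_I1, 0.0)
        color_s = cv2.compareHist(sel_hist, color_hist(frame, c),
                                  cv2.HISTCMP_CORREL)
        if shape_d <= SHAPE_TOL and color_s >= COLOR_TOL:
            return True
    return False
```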
Specifically, the object contours in the video to be processed are identified through frame extraction, an object contour in the video is selected as the sound source object in step S2, and the video segment in which the sound source object appears is retrieved, thereby preliminarily identifying the video segment where the sound source object is located. Each frame of that video segment can then be analyzed and the time nodes calibrated, providing references for the audio-adding time nodes, simplifying user operations, and improving the efficiency and accuracy of audio addition.
Specifically, in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result:
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0;
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
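A minimal sketch of this calibration step follows; treating the grayscale mean as the "average brightness," the value of ΔL0, and the fps-based time-node computation are assumptions of the sketch, not requirements of the text.

```python
# Mark a frame when its mean brightness exceeds that of the next frame
# by at least the preset comparison parameter ΔL0.
import cv2

DELTA_L0 = 20.0  # preset brightness-difference comparison parameter (assumed value)

def calibrated_time_nodes(frames, fps):
    """Return the time nodes (seconds) of frames where L1 - L2 >= ΔL0."""
    means = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).mean() for f in frames]
    nodes = []
    for i in range(len(means) - 1):
        delta_l = means[i] - means[i + 1]   # ΔL = L1 - L2
        if delta_l >= DELTA_L0:
            nodes.append(i / fps)           # time node in the data stream
    return nodes
```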
Specifically, in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
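Because formula (1) is published only as an image, the exact expression for E is not recoverable from the text. The sketch below assumes one plausible form, a sum of the normalized distance D/D0 and normalized depth h/h0, purely to illustrate how the named inputs could be combined; it is not the patent's formula.

```python
# Assumed form of the sound source depth characterization parameter E,
# combining the inputs named in the text: depth h, distance D, and the
# preset comparison parameters h0 and D0.
def sound_source_depth_param(h, d, h0, d0):
    """Sound source depth characterization parameter E (assumed form)."""
    return d / d0 + h / h0   # grows as the source sits deeper/farther in frame
```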
Specifically, in step S3 of the present invention, a time node is selected from the time node set as the starting time node, and the audio data stream corresponding to the audio segment is added to the video data stream. In practice, when adding sound effects to a video in post-production, the insertion time nodes must otherwise be determined by manual frame-by-frame analysis, and the starting moments at which sound effects are needed usually coincide with changes in the picture, such as explosions, collisions, or scene switches. With the method of the invention, the time nodes of the video images corresponding to such situations in the video data stream can be determined by a computer's data processing system, providing the user with references for adding audio segments, simplifying user operations, and improving the efficiency and accuracy of audio addition.
Specifically, step S5 further includes presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
Specifically, in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
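A minimal sketch of the interval lookup follows; the interval boundaries and volume values are invented examples, chosen only so that the associated volume grows with the interval midpoint as the text requires.

```python
# Consecutive data intervals, each associated with a distinct volume
# parameter that increases with the interval's midpoint.
INTERVALS = [            # (lower, upper, volume parameter) -- assumed values
    (0.0, 1.0, 0.25),
    (1.0, 2.0, 0.50),
    (2.0, 3.0, 0.75),
    (3.0, 4.0, 1.00),
]

def initial_volume(e):
    """Volume parameter of the interval containing E, used as initial volume."""
    for lo, hi, vol in INTERVALS:
        if lo <= e < hi:
            return vol
    return INTERVALS[-1][2]  # assumed fallback when E exceeds the last interval
```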
Specifically, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node;
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
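The two adjustment modes reduce to a sign choice on the preset rate V0, as in the following sketch; the numeric value of V0 is an assumed example.

```python
# Mode 1 (raise volume at rate V0) when E at the start node is smaller than
# E at the end node; mode 2 (lower it at V0) otherwise.
V0 = 0.05  # preset adjustment rate (volume units per second, assumed value)

def volume_rate(e_start, e_end):
    """Signed volume adjustment rate for the audio segment in one time period."""
    return +V0 if e_start < e_end else -V0
```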
Specifically, step S5 further includes correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
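Formula (2) is likewise published only as an image. Given the per-frame distances D(i) and the integer N > 1, the sketch below assumes the moving speed is the mean absolute frame-to-frame change in distance, which is one natural reading of the description rather than the verbatim formula.

```python
# Assumed form of formula (2): average the frame-to-frame change in the
# distance between the selected contour and the reference object.
def moving_speed(distances):
    """Moving speed V of the selected contour relative to the reference object."""
    n = len(distances) - 1          # N: number of frame-to-frame steps
    if n < 1:
        raise ValueError("need at least two frames in the time period")
    return sum(abs(distances[i + 1] - distances[i]) for i in range(n)) / n
```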
Specifically, in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein
the first correction mode corrects the volume adjustment rate to a first correction value V1′ according to the first volume adjustment parameter v1, with V1′ = V0 − v1;
the second correction mode corrects the volume adjustment rate to a second correction value V2′ according to the second volume adjustment parameter v2, with V2′ = V0 − v2;
the third correction mode corrects the volume adjustment rate to a third correction value V3′ according to the first volume adjustment parameter v1, with V3′ = V0 + v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value V4′ according to the second volume adjustment parameter v2, with V4′ = V0 + v2;
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
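Taken together, the four correction modes reduce to two choices: which volume adjustment parameter to use (v1 or v2, depending on whether |ΔV| exceeds V2) and whether to subtract it from or add it to V0 (depending on the sign of ΔV). A sketch follows, with assumed parameter values and with the |ΔV|-versus-V2 reading of the translated conditions.

```python
# Correct the adjustment rate V0 according to the speed difference ΔV = V - V1.
V1_STD = 5.0   # preset standard moving-speed comparison parameter V1 (assumed)
V2_CMP = 2.0   # preset moving-speed comparison parameter V2 (assumed)
v1_adj = 0.01  # first volume adjustment parameter v1 (assumed)
v2_adj = 0.02  # second volume adjustment parameter v2 (assumed)

def corrected_rate(v, v0):
    """Corrected volume adjustment rate for one time period."""
    delta_v = v - V1_STD
    step = v1_adj if abs(delta_v) <= V2_CMP else v2_adj
    # ΔV < 0: modes 1-2 reduce the rate; ΔV >= 0: modes 3-4 increase it.
    return v0 - step if delta_v < 0 else v0 + step
```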
Specifically, in steps S4 and S5 of the present invention, both the image depth of the object contour in the image and the distance between the object contour and the midpoint of the video image are considered when calculating the sound source depth characterization parameter, so that the parameter characterizes the spatial distance of the object contour in the image, that is, the propagation distance of the sound. In practice, the volume of an added audio segment often has to be adjusted manually according to the spatial distance of the sound source object in the image, which is inaccurate and cumbersome; the invention automates this adjustment.
Specifically, in step S5, an upper limit of the volume adjustment rate is also set; when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
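The upper limit is then a simple clamp on the corrected rate; the limit value and the symmetric treatment of decreasing (negative) rates are assumptions of this sketch.

```python
# Clamp the corrected volume adjustment rate to the preset upper limit.
RATE_CAP = 0.2  # upper limit of the volume adjustment rate (assumed value)

def capped_rate(rate):
    """Keep the corrected rate within the preset upper limit."""
    return max(-RATE_CAP, min(rate, RATE_CAP))
```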
Specifically, in step S5, the overall time span is divided into a plurality of consecutive time periods.
Referring specifically to FIGS. 2 and 3, the present invention also provides a system for adding background audio to video, which comprises:
a data receiving module for receiving the video data stream and the audio data stream;
a data processing module connected to the data receiving module and comprising a video image recognition unit and a data comparison unit, wherein
the video image recognition unit can extract frames from the video to be processed, determine the time node of each frame of video image within the video, recognize the object contours in each frame of video image, and determine the image depth of an object contour in each frame as well as its distance from a reference object;
the data comparison unit is preset with a logic algorithm so as to operate on the data sent by the video image recognition unit according to the preset logic and obtain an operation result;
all units are interconnected and can exchange data; and
a data synthesis module connected to the data processing module for adding the audio segment at the corresponding time node of the video data stream according to the operation result sent by the data processing module.
Specifically, the concrete structure of each module is not limited; any functional module or program in a computer capable of completing the above data exchange and data processing may be used.
Specifically, a depth estimation algorithm may be provided in the video image recognition unit to recognize the image depth of an object in a video image; as such algorithms belong to the prior art, they are not described in detail here.
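For orientation only, the module layout described above could be skeletonized as follows; the class and method names are assumptions, as the patent does not prescribe any particular structure.

```python
# Skeleton of the three described modules and their connections.
from dataclasses import dataclass, field

@dataclass
class DataReceivingModule:
    """Receives the video data stream and the audio data stream."""
    video_stream: bytes = b""
    audio_stream: bytes = b""

@dataclass
class DataProcessingModule:
    """Wraps the video image recognition unit and the data comparison unit."""
    receiver: DataReceivingModule
    results: list = field(default_factory=list)

    def process(self) -> None:
        # Recognition unit: extract frames, time nodes, contours, depth h,
        # distances D. Comparison unit: run the preset logic (the ΔL, E, V
        # and ΔV comparisons above) and store the operation results.
        ...

@dataclass
class DataSynthesisModule:
    """Adds the audio segment at the time nodes produced by the processor."""
    processor: DataProcessingModule

    def synthesize(self) -> None:
        ...
```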
Thus far, the technical solution of the present invention has been described with reference to the preferred embodiments shown in the drawings; however, it will be readily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions of the related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions fall within the scope of the present invention.

Claims (10)

1. A method for adding background audio to a video based on a data stream, comprising:
step S1, acquiring a video image set of a video to be processed and identifying the object contours in all video images of the set to obtain an object contour set, wherein the video image set consists of a plurality of video images obtained by frame extraction from the video to be processed;
step S2, screening out, from the video image set, the video images that contain a selected object contour, and splicing the screened-out video images to obtain a first video segment; calibrating a number of video images based on the brightness difference between each frame of video image in the first video segment and its adjacent frame, the selected object contour being chosen from the object contour set;
step S3, obtaining the time nodes, within the video data stream of the video to be processed, of the video images calibrated in step S2 to obtain a time node set; selecting a time node from the set as a starting time node and adding the audio data stream corresponding to the required audio to the video data stream;
step S4, calculating a sound source depth characterization parameter for each frame of video image based on the image depth of the selected object contour in each frame of video image of the first video segment and the distance between the selected object contour and the midpoint of the video image; and
step S5, comparing the sound source depth characterization parameter of the video image at the starting time node of each time period with that of the video image at the ending time node, and determining, according to the comparison result, an adjustment mode for adjusting the volume of the audio segment in each time period, wherein each time period is obtained by dividing the span formed by the starting time node and the ending time node of the audio data stream.
2. The method for adding background audio to video based on a data stream according to claim 1, wherein in step S2, the video images in the video image set are selected frame by frame, the selected object contour is compared with the object contours in the selected video image, and whether the selected video image is screened out is determined according to the comparison result, wherein,
if the shape and color of the selected object contour are the same as those of an object contour in the selected video image, it is determined that the video image needs to be screened out.
3. The method for adding background audio to video based on a data stream according to claim 2, wherein in step S3, the video images in the first video segment are selected frame by frame, the brightness of the selected video image is compared with that of the adjacent frames, and whether to calibrate the selected video image is determined according to the comparison result, wherein
the average brightness value L1 of the selected video image and the average brightness value L2 of the next adjacent frame of video image are determined, the brightness difference ΔL = L1 − L2 is calculated, and ΔL is compared with a preset brightness difference comparison parameter ΔL0; and
if ΔL ≥ ΔL0, it is determined that the video image needs to be calibrated, and the time node of the calibrated video image in the data stream of the video to be processed is determined.
4. The method for adding background audio to video based on a data stream according to claim 1, wherein in step S4, the image depth h of the selected object contour in each frame of video image and the distance D between the selected object contour and the midpoint of the video image are determined, and the sound source depth characterization parameter E is calculated according to formula (1),
[Formula (1) appears only as an image in the original publication and is not reproduced here.]
where, in formula (1), D0 denotes a preset distance comparison parameter and h0 denotes a preset depth comparison parameter.
5. The method for adding background audio to video based on a data stream according to claim 1, wherein step S5 further comprises presetting a plurality of consecutive data intervals and associating each data interval with a volume parameter, wherein the volume parameter associated with each data interval is different and increases as the midpoint value of the data interval increases.
6. The method for adding background audio to video based on a data stream according to claim 5, wherein in step S5, when adjusting the volume of the audio segment in the first time period, the sound source depth characterization parameter of the video image corresponding to the starting time node of that time period is determined and compared with the preset data intervals, and the initial volume is determined according to the comparison result, wherein,
if the sound source depth characterization parameter falls within any data interval, the volume parameter associated with that interval is retrieved and used as the initial volume, and the volume of the audio segment is adjusted with the initial volume as the reference.
7. The method for adding background audio to video based on a data stream according to claim 6, wherein in step S5, the sound source depth characterization parameter of the video image at the starting time node of each time period is compared with that of the video image at the ending time node, and the adjustment mode used when adjusting the volume of the audio segment in the corresponding time period is determined according to the comparison result, wherein
the first adjustment mode increases the volume of the audio segment in the corresponding time period at a preset adjustment rate V0;
the second adjustment mode decreases the volume of the audio segment in the corresponding time period at the preset adjustment rate V0;
the first adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be smaller than that of the video image at the ending time node; and
the second adjustment mode requires that the sound source depth characterization parameter of the video image at the starting time node of the time period be greater than or equal to that of the video image at the ending time node.
8. The method for adding background audio to video based on a data stream according to claim 7, wherein step S5 further comprises correcting the adjustment rate used when adjusting the volume of the audio segment in each time period, wherein an object contour is selected as a reference object for the selected object contour, and the moving speed V of the selected object contour relative to the reference object within the time period is calculated according to formula (2),
[Formula (2) appears only as an image in the original publication and is not reproduced here.]
where, in formula (2), D(i) denotes the distance between the selected object contour and the reference object in the i-th frame of video image within the time period, D(i+1) denotes the corresponding distance in the (i+1)-th frame, and N is an integer greater than 1.
9. The method for adding background audio to video based on a data stream according to claim 8, wherein in step S5, the speed difference ΔV between the moving speed V of the selected object contour relative to the reference object within the time period and a preset standard moving speed comparison parameter V1 is calculated as ΔV = V − V1; ΔV is compared with a preset moving speed comparison parameter V2, and the correction mode used to correct the adjustment rate is determined according to the comparison result, wherein,
the first correction mode corrects the volume adjustment rate to a first correction value according to a first volume adjustment parameter v1;
the second correction mode corrects the volume adjustment rate to a second correction value according to a second volume adjustment parameter v2;
the third correction mode corrects the volume adjustment rate to a third correction value according to the first volume adjustment parameter v1;
the fourth correction mode corrects the volume adjustment rate to a fourth correction value according to the second volume adjustment parameter v2; and
the first correction mode requires ΔV < 0 and |ΔV| ≤ V2, the second correction mode requires ΔV < 0 and |ΔV| > V2, the third correction mode requires ΔV ≥ 0 and |ΔV| ≤ V2, and the fourth correction mode requires ΔV ≥ 0 and |ΔV| > V2.
10. The method for adding background audio to video based on a data stream according to claim 9, wherein in step S5, an upper limit of the volume adjustment rate is also set, and when the volume adjustment rate is corrected, the corrected volume adjustment rate does not exceed this upper limit.
CN202310013043.2A 2023-01-05 2023-01-05 Video background audio adding method based on data stream Active CN116233535B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310013043.2A CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310013043.2A CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Publications (2)

Publication Number Publication Date
CN116233535A 2023-06-06
CN116233535B 2023-09-29

Family

ID=86568871

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310013043.2A Active CN116233535B (en) 2023-01-05 2023-01-05 Video background audio adding method based on data stream

Country Status (1)

Country Link
CN (1) CN116233535B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080292273A1 (en) * 2007-05-24 2008-11-27 Bei Wang Uniform Program Indexing Method with Simple and Robust Audio Feature and Related Enhancing Methods
WO2010090102A1 (en) * 2009-02-03 2010-08-12 株式会社コナミデジタルエンタテインメント Game device, game control method, data recording medium, and program
CN109729297A (en) * 2019-01-11 2019-05-07 广州酷狗计算机科技有限公司 Method and apparatus for adding special effects to video
CN112672218A (en) * 2020-12-16 2021-04-16 福州凌云数据科技有限公司 Editing method for batch generation of videos
US20220078372A1 (en) * 2020-09-10 2022-03-10 Hyc (Usa), Inc. Adaptive Method and System For Data Flow Control Based On Variable Frame Structure in Video Image Processing System
US20220208203A1 (en) * 2020-12-29 2022-06-30 Compal Electronics, Inc. Audiovisual communication system and control method thereof
CN115038011A (en) * 2022-05-31 2022-09-09 中国第一汽车股份有限公司 Vehicle, control method, control device, control equipment and storage medium


Also Published As

Publication number Publication date
CN116233535B 2023-09-29

Similar Documents

Publication Publication Date Title
US6313822B1 (en) Method and apparatus for modifying screen resolution based on available memory
KR20070034462A (en) Video-Audio Synchronization
US7428335B2 (en) Method of extracting contour of image, method of extracting object from image, and video transmission system using the same method
US8175121B2 (en) Image processor and image display apparatus comprising the same
CN1745526B (en) Apparatus and method for synchronization of audio and video streams.
CN101291392B (en) Apparatus and method of processing image as well as apparatus and method of generating reproduction information
US20080231756A1 (en) Apparatus and method of processing image as well as apparatus and method of generating reproduction information
US20050196061A1 (en) Signal-transmitting system, data-transmitting apparatus and data-receiving apparatus
US7133066B2 (en) Image processing
KR101741747B1 (en) Apparatus and method for processing real time advertisement insertion on broadcast
CN116233535B (en) Video background audio adding method based on data stream
US20030058224A1 (en) Moving image playback apparatus, moving image playback method, and audio playback apparatus
JPH08340553A (en) Video signal encoding device
US7024038B2 (en) Image processing apparatus and method, and storage medium therefor
CN110728971B (en) Audio and video synthesis method
US6947599B2 (en) Apparatus and method for image compression using a specified restart interval
US20090086090A1 (en) Picture signal processing apparatus and picture signal processing method
JPH10155139A (en) Image processor and image processing method
US7136414B2 (en) System and method for efficiently performing an inverse telecine procedure
US6810154B2 (en) Method and apparatus for automatic spatial resolution setting for moving images
KR100920137B1 (en) Apparatus for calibrating a uniformity of a television receiver and method thereof
US20080317130A1 (en) Image converting apparatus
KR100224859B1 (en) Vertical interpolation method of video signal based on edge and apparatus therefor
JPH07221993A (en) Method and device for thinning out color image data, and compressing method for color image data
WO2024078064A1 (en) Image processing method and apparatus, and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant