CN110392297B - Video processing method and device, storage medium and terminal - Google Patents


Info

Publication number
CN110392297B
Authority
CN
China
Prior art keywords
audio intensity
value
audio
target
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810348124.7A
Other languages
Chinese (zh)
Other versions
CN110392297A
Inventor
肖仙敏
叶晨晖
王文涛
肖鹏
张元昊
林锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810348124.7A
Publication of CN110392297A
Application granted
Publication of CN110392297B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a video processing method and device, a storage medium, and a terminal. The video processing method includes: acquiring a target video frame and a background audio in a target video; acquiring, from the background audio, a target audio intensity value corresponding to the timestamp of the target video frame; determining a picture enlargement amplitude value corresponding to the target video frame according to the target audio intensity value; and performing video picture enlargement processing on the target video frame according to the picture enlargement amplitude value. With this technical solution, the dynamic feel of video playback can be enhanced.

Description

Video processing method and device, storage medium and terminal
Technical Field
The present invention relates to the field of media technologies, and in particular, to a video processing method and device, a storage medium, and a terminal.
Background
With the development of the mobile Internet, more and more users shoot eye-catching videos on their terminals and then publish them on various social platforms. Currently, while shooting a video, or while subsequently editing the shot video, a user may apply various special effects to the video picture, such as adding music, beautification, or adding a pendant (sticker) through face recognition.
Disclosure of Invention
Embodiments of the present invention provide a video processing method and apparatus, a storage medium, and a terminal, which can enlarge the video picture of a target video to different degrees according to the intensity of the background audio, thereby enhancing the dynamic effect of video playback.
In a first aspect, an embodiment of the present invention provides a video processing method, including:
acquiring a target video frame and a background audio in a target video;
acquiring, from the background audio, a target audio intensity value corresponding to the timestamp of the target video frame;
determining a picture enlargement amplitude value corresponding to the target video frame according to the target audio intensity value;
and performing video picture enlargement processing on the target video frame according to the picture enlargement amplitude value.
In one possible design, the obtaining a target audio intensity value corresponding to a timestamp of the target video frame includes:
acquiring M audio intensity values associated with a timestamp where the target video frame is located by calling a system interface, wherein the M audio intensity values comprise audio intensity values of a plurality of sound channels within a target time range, and the target time range comprises the timestamp where the target video frame is located;
and calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
In one possible design, the obtaining a target audio intensity value corresponding to a timestamp of the target video frame includes:
acquiring an audio intensity sequence, wherein the audio intensity sequence comprises N audio intensity values, the number of the N audio intensity values is the same as the number of video frames contained in the target video, the target video comprises a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by combining P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
and acquiring an audio intensity value matched with the sequence number from the audio intensity sequence according to the sequence number of the target video frame in the target video, wherein the audio intensity value is used as a target audio intensity value corresponding to the time stamp of the target video frame.
In one possible design, the method further includes:
decoding the background audio to obtain P original audio intensity values contained in the background audio, wherein the P original audio intensity values are in one-to-one correspondence with a plurality of audio frames contained in the background audio;
dividing the P original audio intensity values into N sets, wherein one set corresponds to one video frame and each set comprises a plurality of continuous original audio intensity values on a time axis;
for each set, calculating an average value of a plurality of original audio intensity values contained in the set, and taking the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and arranging the audio intensity values corresponding to each set according to the time sequence of all the video frames in the target video to obtain an audio intensity sequence.
In one possible design, the determining, according to the target audio intensity value, a picture enlargement amplitude value corresponding to the target video frame includes:
acquiring a difference value between the target audio intensity value and a reference intensity value as a first difference value;
and determining a picture enlargement amplitude value corresponding to the target video frame according to the first difference value.
In one possible design, the determining, according to the first difference value, a picture enlargement amplitude value corresponding to the target video frame includes:
acquiring a maximum audio intensity value of the background audio;
calculating a difference between the maximum audio intensity value and the reference intensity value as a second difference;
and determining the picture enlargement amplitude value corresponding to the target video frame according to the ratio of the first difference value to the second difference value.
In one possible design, the method further includes:
if the difference values between a plurality of adjacent historical audio intensity values and the reference intensity value are all greater than a first threshold, or are all smaller than a second threshold, updating the reference intensity value according to the plurality of adjacent historical audio intensity values, where the video frames corresponding to the plurality of adjacent historical audio intensity values are a plurality of video frames adjacent to the target video frame and consecutive on the time axis, and the timestamps of those video frames are all smaller than the timestamp of the target video frame.
In one possible design, the updating the baseline intensity value based on the plurality of neighboring historical audio intensity values includes:
calculating an average of the plurality of neighboring historical audio intensity values;
updating the reference intensity value according to an average of the plurality of neighboring historical audio intensity values.
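As a hedged, platform-independent sketch of this design (the function name, the example thresholds, and the direct replacement of the reference value by the neighbour average are illustrative assumptions, not prescribed by this disclosure):

```python
def update_reference(reference, neighbor_history, first_threshold, second_threshold):
    """Update the reference intensity value from adjacent historical values.

    If every neighbouring historical audio intensity value deviates from the
    reference by more than the first threshold, or every difference falls
    below the second threshold, the reference is replaced by the average of
    those historical values; otherwise it is left unchanged.
    """
    diffs = [v - reference for v in neighbor_history]
    if all(d > first_threshold for d in diffs) or all(d < second_threshold for d in diffs):
        return sum(neighbor_history) / len(neighbor_history)
    return reference
```

For example, with a reference of -60 dB and a run of much louder history values, the reference drifts up toward their average, so later amplitude ratios stay meaningful.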
In one possible design, the updating the baseline intensity value based on the plurality of neighboring historical audio intensity values includes:
acquiring the music type of the background audio;
acquiring a machine learning model corresponding to the music type as a target model;
processing the plurality of adjacent historical audio intensity values with the target model to obtain an output result;
and updating the reference intensity value according to the output result.
In one possible design, the performing, according to the picture enlargement amplitude value, video picture enlargement processing on the target video frame includes:
if the picture enlargement amplitude value is greater than an amplitude threshold, enlarging the video picture content of the target video frame according to a magnification factor corresponding to the picture enlargement amplitude value;
and if the picture enlargement amplitude value is smaller than or equal to the amplitude threshold, keeping the video picture content of the target video frame unchanged.
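Taken together, the amplitude determination (ratio of the first difference to the second difference) and the thresholded enlargement described above can be sketched as follows; the clamping to [0, 1], the linear magnification mapping, and the constants `threshold=0.1` / `max_extra_zoom=0.3` are illustrative assumptions, not values fixed by this disclosure.

```python
def picture_enlargement_amplitude(target, reference, maximum):
    """Map a target audio intensity value to an enlargement amplitude.

    first difference  = target  - reference
    second difference = maximum - reference
    The amplitude is the ratio of the two differences, clamped to [0, 1]
    (the clamp is an added robustness assumption).
    """
    second = maximum - reference
    if second <= 0:  # degenerate audio: no enlargement possible
        return 0.0
    ratio = (target - reference) / second
    return max(0.0, min(1.0, ratio))

def scale_factor(amplitude, threshold=0.1, max_extra_zoom=0.3):
    """Thresholded zoom: at or below the amplitude threshold the picture is
    unchanged (factor 1.0); above it, a magnification factor > 1 is derived
    from the amplitude via an illustrative linear mapping."""
    if amplitude <= threshold:
        return 1.0
    return 1.0 + max_extra_zoom * amplitude
```

With decibel-style values in [-160, 0], a target of -80 dB against a reference of -160 dB and a maximum of 0 dB yields an amplitude of 0.5 and hence a modest zoom pulse on that frame.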
In a second aspect, an embodiment of the present invention provides a video processing apparatus, including:
the first acquisition unit is used for acquiring a target video frame and background audio in a target video;
a second obtaining unit, configured to obtain, from the background audio, a target audio intensity value corresponding to a timestamp where the target video frame is located;
the determining unit is used for determining a picture enlargement amplitude value corresponding to the target video frame according to the target audio intensity value;
and the enlargement processing unit is used for performing video picture enlargement processing on the target video frame according to the picture enlargement amplitude value.
In one possible design, the second obtaining unit includes:
a first obtaining subunit, configured to obtain, by calling a system interface, M audio intensity values associated with a timestamp where the target video frame is located, where the M audio intensity values include audio intensity values of multiple channels within a target time range, and the target time range includes the timestamp where the target video frame is located;
and the calculating subunit is used for calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
In one possible design, the second obtaining unit includes:
a second obtaining subunit, configured to obtain an audio intensity sequence, where the audio intensity sequence includes N audio intensity values, where a value of N is the same as a number of video frames included in the target video, the target video includes a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by merging P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
and the third acquiring subunit is configured to acquire, according to the sequence number of the target video frame in the target video, an audio intensity value matched with the sequence number from the audio intensity sequence, as a target audio intensity value corresponding to the timestamp of the target video frame.
In one possible design, the apparatus further includes:
a decoding processing unit, configured to perform decoding processing on the background audio to obtain P original audio intensity values included in the background audio, where the P original audio intensity values are in one-to-one correspondence with multiple audio frames included in the background audio;
a dividing unit, configured to divide the P original audio intensity values into N sets, where one set corresponds to one video frame, and each set includes multiple original audio intensity values that are continuous on a time axis;
a calculating unit, configured to calculate, for each of the sets, an average value of a plurality of original audio intensity values included in the set, and use the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and the arrangement unit is used for arranging the audio intensity values corresponding to each set according to the time sequence of all the video frames in the target video to obtain an audio intensity sequence.
In one possible design, the determining unit includes:
a fourth obtaining subunit, configured to obtain a difference between the target audio intensity value and a reference intensity value as a first difference;
and the determining subunit is configured to determine, according to the first difference value, a picture enlargement amplitude value corresponding to the target video frame.
In one possible design, the determining subunit is specifically configured to:
acquiring a maximum audio intensity value of the background audio;
calculating a difference between the maximum audio intensity value and the reference intensity value as a second difference;
and determining the picture enlargement amplitude value corresponding to the target video frame according to the ratio of the first difference value to the second difference value.
In one possible design, the apparatus further includes:
the updating unit is configured to update the reference intensity value according to a plurality of adjacent historical audio intensity values if the difference values between the plurality of adjacent historical audio intensity values and the reference intensity value are all greater than a first threshold or all smaller than a second threshold, where the video frames corresponding to the plurality of adjacent historical audio intensity values are a plurality of video frames adjacent to the target video frame and consecutive on the time axis, and the timestamps of those video frames are all smaller than the timestamp of the target video frame.
In one possible design, the update unit is specifically configured to:
calculating an average of the plurality of neighboring historical audio intensity values;
updating the reference intensity value according to an average of the plurality of neighboring historical audio intensity values.
In one possible design, the update unit is specifically configured to:
acquiring the music type of the background audio;
acquiring a machine learning model corresponding to the music type as a target model;
processing the plurality of adjacent historical audio intensity values with the target model to obtain an output result;
and updating the reference intensity value according to the output result.
In one possible design, the enlargement processing unit is specifically configured to:
if the picture enlargement amplitude value is greater than an amplitude threshold, enlarge the video picture content of the target video frame according to a magnification factor corresponding to the picture enlargement amplitude value;
and if the picture enlargement amplitude value is smaller than or equal to the amplitude threshold, keep the video picture content of the target video frame unchanged.
In a third aspect, embodiments of the present invention provide a computer storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method according to the first aspect of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a terminal, including a processor and a memory. The processor is connected to the memory, the memory is configured to store program code, and the processor is configured to call the program code to perform the method according to the first aspect of the embodiments of the present invention.
In the embodiment of the invention, the picture enlargement amplitude value corresponding to the target video frame can be determined according to the target audio intensity value, in the background audio, corresponding to the timestamp of the target video frame, and video picture enlargement processing is then performed on the target video frame according to that picture enlargement amplitude value, thereby enhancing the dynamic effect of video playback.
Drawings
In order to illustrate embodiments of the present invention or technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic flowchart of a video processing method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating an audio intensity obtaining method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another audio intensity obtaining method according to an embodiment of the present invention;
fig. 4 is a schematic view of a shooting interface according to an embodiment of the present invention;
fig. 5 is a schematic diagram illustrating obtaining an audio intensity of a background audio according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating audio intensity acquisition of background audio according to another embodiment of the present invention;
FIG. 7 is a diagram illustrating neighboring history frames according to an embodiment of the present invention;
FIG. 8 is an enlarged and comparative illustration of a video frame according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a second obtaining unit according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of another second obtaining unit according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
The video processing method of the embodiment of the invention can be applied to a scene in which a user selects a shake special effect and adds background audio (such as background music) to a target video while shooting the target video. Alternatively, the video processing method of the embodiment of the present invention may also be applied to a scene in which an already-shot target video is edited. The terminal can enlarge the video pictures of different video frames by different picture enlargement amplitude values according to the different audio intensity values corresponding to the timestamps of those video frames, so that a dynamic effect that changes with the music intensity is produced while the target video plays.
The following describes in detail a video processing method according to an embodiment of the present invention with reference to fig. 1 to 8.
Referring to fig. 1, a flow chart of a video processing method according to an embodiment of the present invention is shown. As shown in fig. 1, the video processing method of the embodiment of the present invention may include the following steps S101 to S104.
S101, acquiring a target video frame and a background audio in a target video;
in an embodiment, the target video frame may be any one video frame in the target video, and all video frames included in the target video may be subjected to the amplification processing by using the video processing method provided by the embodiment of the present invention. The background audio may be background music added for the target video.
Fig. 4 is a schematic view of a shooting interface for a target video according to an embodiment of the present invention. When the user enters the shooting interface, several tabs of the shooting interface are displayed; the user can tap an area outside a tab to hide it, and then tap the shooting button to shoot the target video. Among these tabs is one for selecting background music: as shown in fig. 4, the interface may display a wide variety of background music for the user to choose from. When the user selects background music together with the shake special effect, the video pictures of different video frames in the target video are enlarged by different picture enlargement amplitude values according to the different audio intensity values of the background music, so that playback of the target video produces a shaking effect.
S102, acquiring a target audio intensity value corresponding to a timestamp of the target video frame from the background audio;
in one embodiment, one video frame in the target video corresponds to one audio intensity value, and when the target audio intensity value corresponding to the target video frame is obtained, the target audio intensity value corresponding to the target video frame can be determined through the timestamp of the target video frame. The audio intensity value may be a decibel value, which is a unit that measures the relative magnitude of the sound intensity or electrical power by a value equal to 10 times the common logarithm of the sound intensity or power ratio.
Optionally, the target audio intensity value corresponding to the timestamp of the target video frame may be obtained in either of the following two alternative implementations:
in a first alternative embodiment, as shown in FIG. 2, the obtaining step includes S10-S11;
S10, obtaining M audio intensity values associated with the timestamp of the target video frame by calling a system interface, wherein the M audio intensity values comprise audio intensity values of a plurality of sound channels within a target time range, and the target time range comprises the timestamp of the target video frame;
in one embodiment, the target time range may be within a certain time range from the timestamp of the target video frame, for example, the timestamp of the target video frame is 2ms, and the target time range may be within 1ms from the timestamp of 2ms, that is, within a time range from 1ms to 3 ms. The M audio intensity values are obtained directly by calling the system interface, such as directly using the audio player providing interface of the platform, and typically range between-160-0. The M audio intensity values may include audio intensity values of multiple channels in a target time range, for example, the M audio intensity values may include an audio intensity value of a left channel and an audio intensity value of a right channel in the target time range.
And S11, calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
There is usually no native audio player interface on the Android platform for obtaining the audio intensity value, whereas on the iOS platform the audio player interface can be called to obtain it. Fig. 5 is a schematic flowchart, according to an embodiment of the present invention, of the iOS platform directly obtaining audio intensity values by calling AVAudioPlayer and computing the target audio intensity value. As shown in the figure, the audio intensity value may be a decibel value, initialised to -160. Before each decibel value is obtained, the updateMeters method is called to refresh the audio metering values, and the channel count powerNum is obtained. The decibel average of each channel within the target time range is then obtained according to powerNum, the averages of all channels are summed, and the overall average is assigned to P; the value of P is the decibel value for the timestamp of the target video frame, and the returned decibel values range from -160 to 0.
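The per-channel averaging of steps S10-S11 can be sketched platform-independently as follows; the function name is illustrative, and `channel_decibels` stands in for the M per-channel values that would come from the platform audio player (for example, AVAudioPlayer metering on iOS):

```python
def target_audio_intensity(channel_decibels):
    """Average per-channel decibel values (range -160..0) reported for the
    target time range into a single target audio intensity value.

    An empty input returns -160.0, the initialised "silence" value
    described in the text.
    """
    if not channel_decibels:
        return -160.0
    return sum(channel_decibels) / len(channel_decibels)
```

For a stereo source, the left- and right-channel averages are simply combined into one value, e.g. -30 dB and -50 dB give a target intensity of -40 dB.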
In a second alternative embodiment, as shown in FIG. 3, the obtaining step includes S20-S25;
S20, decoding the background audio to obtain P original audio intensity values included in the background audio, where the P original audio intensity values are in one-to-one correspondence with a plurality of audio frames included in the background audio;
in one embodiment, both the ios platform and the android platform can obtain the original audio intensity value by decoding the original file of the background audio. It should be noted that the decibel value obtained by calling the player interface is in the range of-160 to 0, and the original audio intensity value obtained by decoding the audio file is in the range of 0 to several tens of thousands.
S21, dividing the P original audio intensity values into N sets, one set corresponding to one of the video frames, each set containing a plurality of original audio intensity values that are continuous on a time axis;
in one embodiment, the P original audio intensity values may be sorted in time order, and the sorted P original audio intensity values are further divided into N sets, where the number N of the sets is the same as the number of video frames included in the target video, and the value of P is far greater than the value of N.
Optionally, when the P original audio intensity values are divided into N sets, they may be divided evenly, that is, each set contains the same number of original audio intensity values. For example, if 1000 original audio intensity values are obtained and there are 100 video frames, each set contains 10 original audio intensity values, and those 10 values are consecutive on the time axis. For instance, the 1000 original audio intensity values are numbered in chronological order as 1, 2, 3, 4 … 1000, and the 100 video frames are likewise numbered 1, 2, 3 … 100; then the original audio intensity values numbered 1 to 10 form set 1 and correspond to video frame 1, the values numbered 11 to 20 form set 2 and correspond to video frame 2, and so on. If P is not evenly divisible by N — for example, 980 original audio intensity values divided into 100 sets — the calculation can be performed by padding zeros at the end.
It should be noted that, the above-mentioned average division of the P original audio intensity values is only an example, and may also be a non-average division, which is not limited in the embodiment of the present invention.
S22, for each of the sets, calculating an average value of a plurality of original audio intensity values included in the set, and taking the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and S23, arranging the audio intensity values corresponding to each set according to the time sequence of all video frames in the target video to obtain an audio intensity sequence.
In one embodiment, the average of the original audio intensity values contained in each of the N sets is calculated, each set corresponding to one average, and that average is taken as the audio intensity value of the set. Because each set corresponds to one video frame, the audio intensity values of the N sets can be sorted according to the time order of the video frames to obtain the audio intensity sequence. For example, set 1 corresponds to video frame 1 and the average of its original audio intensity values is A1, set 2 corresponds to video frame 2 and its average is A2, and so on; the resulting audio intensity sequence is A1, A2, A3 … AN.
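Steps S21-S23 (divide, average, arrange) amount to merging P decoded values into an N-element per-frame sequence. A minimal sketch, assuming even division with the zero-padding mentioned above (the function name and the ceiling-based set size are illustrative choices):

```python
def audio_intensity_sequence(original_values, num_frames):
    """Merge P original audio intensity values into N per-frame values.

    The values are padded with zeros at the end so that they split into N
    consecutive sets on the time axis, and each set is averaged into one
    audio intensity value; the result, in frame order, is the sequence
    A1, A2, ..., AN.
    """
    p, n = len(original_values), num_frames
    size = -(-p // n)  # ceil(P / N): original values per set
    padded = list(original_values) + [0] * (size * n - p)  # zero-padding
    return [sum(padded[i * size:(i + 1) * size]) / size for i in range(n)]
```

Looking up the target audio intensity value of step S25 is then just indexing this list by the target video frame's sequence number.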
S24, obtaining an audio intensity sequence, wherein the audio intensity sequence comprises N audio intensity values;
and S25, acquiring an audio intensity value matched with the sequence number from the audio intensity sequence according to the sequence number of the target video frame in the target video, and taking the audio intensity value as a target audio intensity value corresponding to the time stamp of the target video frame.
In one embodiment, when the target audio intensity value corresponding to the timestamp of a target video frame in the target video needs to be obtained, the sequence number of the target video frame in the target video is obtained first. For example, if the target video frame is video frame 2, the audio intensity value matching that sequence number, namely A2, is obtained from the audio intensity sequence, and A2 is the target audio intensity value corresponding to video frame 2.
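Steps S21 through S25 can be sketched as follows. This is a minimal illustration in Python, assuming equal division with zero padding; the function name and the plain-list representation are hypothetical and not part of the patent.

```python
def build_intensity_sequence(raw_values, num_frames):
    """Merge P raw audio intensity values into one value per video frame:
    split them into num_frames equal, time-ordered sets (padding zeros at
    the end when P is not evenly divisible) and average each set."""
    p = len(raw_values)
    per_set = -(-p // num_frames)  # ceiling division
    padded = raw_values + [0] * (per_set * num_frames - p)
    return [
        sum(padded[i * per_set:(i + 1) * per_set]) / per_set
        for i in range(num_frames)
    ]

# 1000 raw intensity values, 100 video frames -> sets of 10
seq = build_intensity_sequence(list(range(1, 1001)), 100)
print(len(seq))   # 100
print(seq[0])     # 5.5  (average of values 1..10, i.e. set 1 / video frame 1)
target = seq[1]   # lookup by sequence number: video frame 2 -> A2 = 15.5
```

The final line corresponds to step S25: once the sequence is built, the target audio intensity value for any frame is a direct index by its sequence number.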
The following describes the process of obtaining an audio intensity sequence by decoding a background audio file, taking the iOS and Android platforms as examples:
(1) initialize a file-reading pointer reader according to the path of the background audio file, and define an array sampleData for storing the original audio intensity values sampleBuffer;
(2) judge whether the reader has data; if so, call copyNextSampleBuffer to obtain a sampleBuffer and store it in the sampleData array;
(3) repeat step (2) in a loop until the reader reaches the end of the background audio file, obtaining a sampleData array that stores all of the sampleBuffer data;
(4) the sampleData array is relatively large and its data is relatively noisy, so the sampleBuffer data is converted into unsigned long values and smoothed; otherwise the values jump abruptly between low and high and cannot truly reflect the decibel level at each moment.
Assuming that the frame rate of the rendered video is frameNum frames per second and the duration of the audio file is audioDuration, the sampleData array needs to be converted into an array of size frameNum × audioDuration, that is, the new sampleDataSize value is frameNum × audioDuration;
(5) the conversion of the sampleData array to an array of size frameNum × audioDuration may be performed by summing the values of every few consecutive elements and taking their average as one element of the new sampleData array.
The elements of the resulting sampleData array correspond one-to-one to the video frames of the target video: the first video frame corresponds to the first element of the new sampleData array, and so on. All elements in the new sampleData array constitute the audio intensity sequence.
And subsequently, the audio intensity value corresponding to the timestamp of any video frame can be directly obtained from the new sampleData array.
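Step (5) above amounts to a chunked average. A hypothetical Python sketch (the platform decode loop of steps (1)–(3) is omitted); note that when the sizes do not divide evenly, a few trailing samples are dropped in this simplification:

```python
def downsample(sample_data, frame_num, audio_duration):
    """Shrink the decoded sample array to frame_num * audio_duration
    elements by averaging consecutive chunks, one element per video frame."""
    target_size = frame_num * audio_duration
    chunk = max(1, len(sample_data) // target_size)
    return [
        sum(sample_data[i * chunk:(i + 1) * chunk]) / chunk
        for i in range(target_size)
    ]

# 30 fps video over a 2-second audio clip -> 60 smoothed elements
smoothed = downsample([1.0] * 4800, 30, 2)
print(len(smoothed))  # 60
```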
S103, determining a picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value;
in one embodiment, after the target audio intensity value is obtained, it may be converted into a picture amplification amplitude value. Generally, the larger the target audio intensity value, the larger the picture amplification amplitude value corresponding to the target video frame. The audio intensity value of the background audio usually changes continuously with time, so the picture amplification amplitude values corresponding to different video frames also change continuously, causing the video picture to shake rhythmically along with the background audio during playback.
Alternatively, a difference between the target audio intensity value and the reference intensity value may be calculated as a first difference, and the picture enlargement amplitude value corresponding to the target video frame may be determined according to the first difference. Alternatively, in order to control the maximum value of the picture amplification amplitude value, the maximum audio intensity value of the background audio may be obtained, the difference between the maximum audio intensity value and the reference intensity value is calculated as the second difference, and the picture amplification amplitude value corresponding to the target video frame is determined according to the ratio between the first difference and the second difference. Here, a calculation manner of the picture enlargement amplitude value is taken as an example:
scale=1.0+[(power-basePower)/(maxPower-basePower)]*0.5
where scale is the picture amplification amplitude value, power is the target audio intensity value, basePower is the reference intensity value, and maxPower is the maximum audio intensity value. If maxPower is obtained through the system interface of the player, its value is 0; if the background audio file is decoded instead, the maximum value can be recorded during decoding. The final factor of 0.5 controls scale so that its maximum value is 1.5.
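As a worked example of this formula (the numeric intensity values are illustrative):

```python
def picture_scale(power, base_power, max_power):
    """scale = 1.0 + ((power - basePower) / (maxPower - basePower)) * 0.5,
    so scale reaches its maximum of 1.5 when power equals maxPower."""
    return 1.0 + (power - base_power) / (max_power - base_power) * 0.5

print(picture_scale(60.0, 40.0, 80.0))  # 1.25
print(picture_scale(80.0, 40.0, 80.0))  # 1.5 (capped maximum)
```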
The reference intensity value is not fixed; it needs to be dynamically adjusted to avoid the situation in which, because the audio intensity value of the background audio stays too large or too small, the picture remains enlarged or remains unchanged and the jitter effect is not obvious.
Jitter is not obvious if the differences between a number of consecutive adjacent historical audio intensity values and the reference intensity value are all greater than a first threshold, or all smaller than a second threshold. Here, the consecutive adjacent historical audio intensity values may be more than a threshold number of historical audio intensity values; for example, the threshold number may be 15. If the differences between more than 15 consecutive historical audio intensity values and the reference intensity value are all greater than the first threshold, or all smaller than the second threshold, the reference intensity value needs to be dynamically adjusted.
It should be noted that the video frames corresponding to the adjacent historical audio intensity values are a plurality of video frames adjacent to the target video frame and consecutive on the time axis, as shown in fig. 7. For example, if the target video frame is video frame 4, the video frames corresponding to the adjacent historical audio intensity values may be video frame 5, video frame 6, video frame 7 … video frame 10; that is, the audio intensity values corresponding to video frames 5, 6, 7 … 10 are the plurality of adjacent historical audio intensity values of the target audio intensity value, and the video frame corresponding to the target audio intensity value is video frame 4, where video frames 4, 5, 6, 7 … 10 are all consecutive video frames of the target video.
Optionally, the reference intensity value may be dynamically adjusted as follows: calculate the average of the plurality of adjacent historical audio intensity values, and update the reference intensity value according to that average. Fig. 6 shows a flowchart for updating the reference intensity value according to an embodiment of the present invention. Assume the count of consecutively too-large historical audio intensity values is denoted N, the count of consecutively too-small values is denoted M, the reference intensity value is denoted B, and the audio intensity value at a certain moment is denoted P. When the consecutive count exceeds a certain threshold (set to 15 here), the reference intensity value B is changed to the average value Q of the plurality of historical audio intensity values that were consecutively too large or too small.
As shown in the figure, first the decibel value P of the background music at a certain moment is obtained, and it is determined whether M is greater than 15 or N is greater than 15. If so, the reference intensity value B needs to be updated, and the average value Q is assigned to B. If not, it is further judged whether the decibel value P is greater than the reference intensity value B. If so, it is further judged whether the consecutive too-large count N is 0; if it is, the value of P is directly assigned as the average value Q; otherwise N is incremented, and a new average value and a new picture amplification amplitude value are calculated. If P is smaller than the reference intensity value, it is further judged whether the consecutive too-small count M is 0; if it is, the value of P is directly assigned as the average value Q; if not, M is incremented, and a new average value and a new picture amplification amplitude value are calculated.
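The flow above can be sketched as a small state machine. This is a hedged reconstruction of the fig. 6 logic: the state encoding (a dict holding B, N, M, Q) and the running-average update of Q are illustrative assumptions, since the figure itself is not reproduced here.

```python
def update_reference(p, state, threshold=15):
    """One step of the fig. 6 flow: track runs of values consecutively
    above (n) or below (m) the reference b; once a run exceeds the
    threshold, reset b to the running average q of that run."""
    if state['n'] > threshold or state['m'] > threshold:
        state['b'] = state['q']          # update B to the run's average Q
        state['n'] = state['m'] = 0
    if p > state['b']:
        # first too-large value starts a run; later values extend
        # the running average of that run
        state['q'] = p if state['n'] == 0 else (
            (state['q'] * state['n'] + p) / (state['n'] + 1))
        state['n'] += 1
        state['m'] = 0
    elif p < state['b']:
        state['q'] = p if state['m'] == 0 else (
            (state['q'] * state['m'] + p) / (state['m'] + 1))
        state['m'] += 1
        state['n'] = 0
    return state['b']

state = {'b': 50.0, 'n': 0, 'm': 0, 'q': 0.0}
for _ in range(17):   # 16 consecutive too-large values, then the reset fires
    b = update_reference(100.0, state)
print(b)  # 100.0
```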
Optionally, the reference intensity value may also be dynamically adjusted as follows: obtain the music type of the background audio, where the music types may include rock music, folk music, soft music, and so on. Different music types may correspond to different machine learning models; the machine learning model corresponding to the music type of the background audio is obtained as the target model, and the plurality of adjacent historical audio intensity values are processed based on the target model. For example, if the music type of the background audio is rock music, when the machine learning model corresponding to rock music processes the plurality of adjacent historical audio intensity values, it may additionally obtain a plurality of future audio intensity values corresponding to a plurality of video frames after the current video frame, calculate the average of the adjacent historical audio intensity values and the future audio intensity values, and multiply it by a weighting coefficient greater than 1 to obtain a weighted average; the weighted average is updated as the new reference intensity value, to adapt to the large variation amplitude of the audio intensity values of rock music.
If the music type of the background audio is soft music, when the machine learning model corresponding to soft music processes the plurality of adjacent historical audio intensity values, it may additionally obtain a plurality of future audio intensity values corresponding to a plurality of video frames after the current video frame, calculate the average of the historical and future audio intensity values, and multiply it by a weighting coefficient smaller than 1 to obtain a weighted average; the weighted average is updated as the new reference intensity value, to adapt to the small variation amplitude of the audio intensity values of soft music.
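A minimal sketch of this music-type adjustment, assuming illustrative weighting coefficients and type labels (the patent does not fix concrete values; the 'rock'/'soft' keys and the 1.5/0.5 coefficients here are hypothetical):

```python
def weighted_reference(history, future, music_type):
    """Average the adjacent historical and future audio intensity values,
    then weight by music type: coefficient > 1 for rock, < 1 for soft."""
    coeff = {'rock': 1.5, 'soft': 0.5}.get(music_type, 1.0)
    values = history + future
    return sum(values) / len(values) * coeff

# plain average is 50.0; rock scales it up, soft scales it down
print(weighted_reference([40.0, 60.0], [50.0, 50.0], 'rock'))  # 75.0
print(weighted_reference([40.0, 60.0], [50.0, 50.0], 'soft'))  # 25.0
```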
In addition, the timing of the above dynamic adjustment of the reference intensity value may be as follows: after each video frame of the target video undergoes the video picture amplification processing, detect whether the reference intensity value needs to be adjusted; or detect once after a preset number of consecutive video frames of the target video have undergone the video picture amplification processing.
And S104, performing video picture amplification processing on the target video frame according to the picture amplification amplitude value.
In an embodiment, when the video picture amplification processing is performed on the target video frame according to the determined picture amplification amplitude value, the picture amplification amplitude value may be compared with an amplitude threshold (for example, 1). When the picture amplification amplitude value is greater than the amplitude threshold, the video picture content of the target video frame is amplified according to the amplification factor corresponding to the picture amplification amplitude value; when it is less than or equal to the amplitude threshold, the video picture content of the target video frame is kept unchanged, that is, the video picture of the target video frame keeps its default size. For example, if the picture amplification amplitude value of video frame 1 is 1.2, the video picture content of video frame 1 is amplified by a factor of 1.2; if the picture amplification amplitude value of video frame 2 is 0.8, which is smaller than the amplitude threshold 1, the video picture content of video frame 2 is kept unchanged; and if the picture amplification amplitude value of video frame 3 is 1.4, the video picture content of video frame 3 is amplified by a factor of 1.4. In this way, a jitter effect is produced when the target video is played, enhancing the dynamic feel of the playback.
Optionally, when the video picture of a video frame is amplified, an image bilinear interpolation amplification algorithm may be used. It can be understood that other algorithms may also be used to amplify the video picture, which is not limited in the embodiments of the present invention.
As shown in fig. 8, which is a schematic diagram of video picture amplification processing according to an embodiment of the present invention, the first picture may be the original video picture of a video frame, and the second picture is the video picture after the amplification processing.
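The threshold check of step S104 and a bilinear enlargement can be sketched together. This is a minimal gray-scale illustration in Python, assuming the frame is a 2-D list of pixel values; it demonstrates bilinear interpolation in general, not the patent's actual implementation.

```python
def amplify_frame(img, scale, amplitude_threshold=1.0):
    """Enlarge the picture by bilinear interpolation when scale exceeds
    the amplitude threshold; otherwise return the frame unchanged."""
    if scale <= amplitude_threshold:
        return img
    h, w = len(img), len(img[0])
    nh, nw = int(h * scale), int(w * scale)
    out = []
    for y in range(nh):
        sy = min(y / scale, h - 1)       # source row coordinate
        y0 = int(sy)
        fy = sy - y0
        y1 = min(y0 + 1, h - 1)
        row = []
        for x in range(nw):
            sx = min(x / scale, w - 1)   # source column coordinate
            x0 = int(sx)
            fx = sx - x0
            x1 = min(x0 + 1, w - 1)
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

big = amplify_frame([[0, 10], [10, 20]], 1.5)
print(len(big), len(big[0]))  # 3 3
```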
In the embodiment of the invention, the image amplification amplitude value corresponding to the target video frame can be determined according to the target audio intensity value in the background audio corresponding to the timestamp of the target video frame, so that the video image amplification processing is performed on the target video frame according to the image amplification amplitude value.
A video processing apparatus according to an embodiment of the present invention will be described in detail with reference to fig. 9 to 11. It should be noted that the apparatuses shown in fig. 9-11 are used for executing the method according to the embodiments of the present invention shown in fig. 1-8, and for convenience of description, only the parts related to the embodiments of the present invention are shown, and details of the technology are not disclosed, please refer to the embodiments of the present invention shown in fig. 1-8.
Fig. 9 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present invention. As shown in fig. 9, the video processing apparatus 1 according to an embodiment of the present invention may include: a first acquisition unit 11, a second acquisition unit 12, a determination unit 13, and an enlargement processing unit 14;
a first obtaining unit 11, configured to obtain a target video frame and a background audio in a target video;
a second obtaining unit 12, configured to obtain, from the background audio, a target audio intensity value corresponding to a timestamp of the target video frame;
alternatively, as shown in fig. 10, the second acquiring unit 12 may include a first acquiring subunit 121 and a calculating subunit 122;
a first obtaining subunit 121, configured to obtain, by calling a system interface, M audio intensity values associated with a timestamp where the target video frame is located, where the M audio intensity values include audio intensity values of multiple channels within a target time range, and the target time range includes the timestamp where the target video frame is located;
a calculating subunit 122, configured to calculate an average value of the M audio intensity values, and use the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
Alternatively, as shown in fig. 11, the second acquiring unit 12 may include a second acquiring subunit 123 and a third acquiring subunit 124;
a second obtaining subunit 123, configured to obtain an audio intensity sequence, where the audio intensity sequence includes N audio intensity values, a value of the N is the same as the number of video frames included in the target video, the target video includes a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by merging P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
a third obtaining subunit 124, configured to obtain, according to the sequence number of the target video frame in the target video, an audio intensity value that matches the sequence number from the audio intensity sequence, as a target audio intensity value corresponding to the timestamp of the target video frame.
Optionally, the video processing apparatus may further include a decoding processing unit, a dividing unit, a calculating unit, and an arranging unit;
a decoding processing unit, configured to perform decoding processing on the background audio to obtain P original audio intensity values included in the background audio, where the P original audio intensity values are in one-to-one correspondence with multiple audio frames included in the background audio;
a dividing unit, configured to divide the P original audio intensity values into N sets, where one set corresponds to one video frame, and each set includes multiple original audio intensity values that are continuous on a time axis;
a calculating unit, configured to calculate, for each of the sets, an average value of a plurality of original audio intensity values included in the set, and use the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and the arrangement unit is used for arranging the audio intensity values corresponding to each set according to the time sequence of all the video frames in the target video to obtain an audio intensity sequence.
The determining unit 13 is configured to determine, according to the target audio intensity value, a picture amplification amplitude value corresponding to the target video frame;
optionally, the determining unit 13 may include a fourth acquiring subunit and a determining subunit;
a fourth obtaining subunit, configured to obtain a difference between the target audio intensity value and a reference intensity value as a first difference;
and the determining subunit is configured to determine, according to the first difference, a picture amplification amplitude value corresponding to the target video frame.
In one possible design, the determining subunit is specifically configured to:
acquiring a maximum audio intensity value of the background audio;
calculating a difference between the maximum audio intensity value and the reference intensity value as a second difference;
and determining the picture amplification amplitude value corresponding to the target video frame according to the ratio of the first difference value to the second difference value.
Optionally, the apparatus further includes an updating unit;
the updating unit is used for updating the reference strength value according to the plurality of adjacent historical audio strength values if the difference values between the plurality of adjacent historical audio strength values and the reference strength value are all larger than a first threshold value or are all smaller than a second threshold value; the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are a plurality of video frames which are adjacent to the target video frame and are continuous on a time axis, and the timestamps of the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are all smaller than the timestamp of the target video frame.
In one possible design, the update unit is specifically configured to:
calculating an average of the plurality of neighboring historical audio intensity values;
updating the reference intensity value according to an average of the plurality of neighboring historical audio intensity values.
In one possible design, the update unit is specifically configured to:
acquiring the music type of the background audio;
acquiring a machine learning model corresponding to the music type as a target model;
training the multiple adjacent historical audio intensity values based on the target model to obtain an output result;
and updating the reference intensity value according to the output result.
And the amplification processing unit 14 is configured to perform video picture amplification processing on the target video frame according to the picture amplification amplitude value.
The amplification processing unit 14 is specifically configured to:
if the image amplification amplitude value is larger than the amplitude threshold value, amplifying the video image content of the target video frame according to the amplification factor corresponding to the image amplification amplitude value;
and if the image amplification amplitude value is smaller than or equal to the amplitude threshold value, controlling the video image content of the target video frame to be unchanged.
For the concepts, explanations, and detailed descriptions and other steps related to the technical solutions provided in the embodiments of the present application related to the video processing apparatus, reference is made to the descriptions of the foregoing methods or other embodiments, and details are not repeated here.
An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments shown in fig. 1 to 8, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 1 to 8, which are not described herein again.
Referring to fig. 12, a schematic structural diagram of a terminal is provided for an embodiment of the present invention. The video processing device in fig. 11 may be applied to the terminal 1000, and the terminal 1000 may include: a processor 1001, a network interface 1004, and a memory 1005. The terminal 1000 may further include: a user interface 1003 and at least one communication bus 1002, where the communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and optionally may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 12, the memory 1005, as a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the terminal 1000 shown in fig. 12, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing input for a user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
acquiring a target video frame and a background audio in a target video;
acquiring a target audio intensity value corresponding to a timestamp of the target video frame from the background audio;
determining a picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value;
and performing video picture amplification processing on the target video frame according to the picture amplification amplitude value.
Optionally, the processor 1001, when executing the step of obtaining the target audio intensity value corresponding to the timestamp of the target video frame, specifically includes the following steps:
acquiring M audio intensity values associated with a timestamp where the target video frame is located by calling a system interface, wherein the M audio intensity values comprise audio intensity values of a plurality of sound channels within a target time range, and the target time range comprises the timestamp where the target video frame is located;
and calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
Optionally, the processor 1001, when executing the step of obtaining the target audio intensity value corresponding to the timestamp of the target video frame, specifically includes the following steps:
acquiring an audio intensity sequence, wherein the audio intensity sequence comprises N audio intensity values, the number of the N audio intensity values is the same as the number of video frames contained in the target video, the target video comprises a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by combining P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
and acquiring an audio intensity value matched with the sequence number from the audio intensity sequence according to the sequence number of the target video frame in the target video, wherein the audio intensity value is used as a target audio intensity value corresponding to the time stamp of the target video frame.
In one embodiment, the processor 1001 is further configured to perform the following steps:
decoding the background audio to obtain P original audio intensity values contained in the background audio, wherein the P original audio intensity values are in one-to-one correspondence with a plurality of audio frames contained in the background audio;
dividing the P original audio intensity values into N sets, wherein one set corresponds to one video frame and each set comprises a plurality of continuous original audio intensity values on a time axis;
for each set, calculating an average value of a plurality of original audio intensity values contained in the set, and taking the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and arranging the audio intensity values corresponding to each set according to the time sequence of all the video frames in the target video to obtain an audio intensity sequence.
Optionally, the determining, by the processor 1001, the picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value specifically includes the following steps:
acquiring a difference value between the target audio intensity value and a reference intensity value as a first difference value;
and determining a picture amplification amplitude value corresponding to the target video frame according to the first difference value.
Optionally, the determining, by the processor 1001, the picture enlargement amplitude value corresponding to the target video frame according to the first difference value specifically includes the following steps:
acquiring a maximum audio intensity value of the background audio;
calculating a difference between the maximum audio intensity value and the reference intensity value as a second difference;
and determining the picture amplification amplitude value corresponding to the target video frame according to the ratio of the first difference value to the second difference value.
In one embodiment, the processor 1001 is further configured to perform the following steps:
if the difference values between the plurality of adjacent historical audio intensity values and the reference intensity value are all larger than a first threshold value or are all smaller than a second threshold value, updating the reference intensity value according to the plurality of adjacent historical audio intensity values; the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are a plurality of video frames which are adjacent to the target video frame and are continuous on a time axis, and the timestamps of the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are all smaller than the timestamp of the target video frame.
Optionally, the processor 1001 updates the reference intensity value according to the plurality of adjacent historical audio intensity values, specifically including the following steps:
calculating an average of the plurality of neighboring historical audio intensity values;
updating the reference intensity value according to an average of the plurality of neighboring historical audio intensity values.
Optionally, the processor 1001 updates the reference intensity value according to the plurality of adjacent historical audio intensity values, specifically including the following steps:
acquiring the music type of the background audio;
acquiring a machine learning model corresponding to the music type as a target model;
training the multiple adjacent historical audio intensity values based on the target model to obtain an output result;
and updating the reference intensity value according to the output result.
Optionally, the processor 1001 performs video picture amplification processing on the target video frame according to the picture amplification amplitude value, specifically including the following steps:
if the image amplification amplitude value is larger than the amplitude threshold value, amplifying the video image content of the target video frame according to the amplification factor corresponding to the image amplification amplitude value;
and if the image amplification amplitude value is smaller than or equal to the amplitude threshold value, controlling the video image content of the target video frame to be unchanged.
It should be noted that, for a specific implementation process, reference may be made to specific descriptions of the method embodiments shown in fig. 1 to fig. 8, which are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and includes processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (15)

1. A video processing method, comprising:
acquiring a target video frame and a background audio in a target video;
acquiring a target audio intensity value corresponding to a timestamp of the target video frame from the background audio;
determining a picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value;
and performing video picture amplification processing on the target video frame according to the picture amplification amplitude value.
2. The method of claim 1, wherein said obtaining a target audio intensity value corresponding to a timestamp at which the target video frame is located comprises:
acquiring M audio intensity values associated with a timestamp where the target video frame is located by calling a system interface, wherein the M audio intensity values comprise audio intensity values of a plurality of sound channels within a target time range, and the target time range comprises the timestamp where the target video frame is located;
and calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
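A minimal sketch of the multi-channel averaging in claim 2, assuming the M per-channel audio intensity values for the target time range have already been fetched through the system interface:

```python
def target_audio_intensity(channel_intensities):
    """Average M audio intensity values (claim 2 sketch).

    channel_intensities: intensity values of several sound channels sampled
    within the target time range that contains the target frame's timestamp.
    Returns their average as the target audio intensity value.
    """
    if not channel_intensities:
        raise ValueError("no audio intensity values for this timestamp")
    return sum(channel_intensities) / len(channel_intensities)
```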
3. The method of claim 1, wherein said obtaining a target audio intensity value corresponding to a timestamp at which the target video frame is located comprises:
acquiring an audio intensity sequence, wherein the audio intensity sequence comprises N audio intensity values, the value of N is the same as the number of video frames contained in the target video, the target video comprises a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by merging P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
and acquiring an audio intensity value matched with the sequence number from the audio intensity sequence according to the sequence number of the target video frame in the target video, wherein the audio intensity value is used as a target audio intensity value corresponding to the time stamp of the target video frame.
4. The method of claim 3, wherein the method further comprises:
decoding the background audio to obtain P original audio intensity values contained in the background audio, wherein the P original audio intensity values are in one-to-one correspondence with a plurality of audio frames contained in the background audio;
dividing the P original audio intensity values into N sets, wherein one set corresponds to one video frame and each set comprises a plurality of continuous original audio intensity values on a time axis;
for each set, calculating an average value of a plurality of original audio intensity values contained in the set, and taking the average value of the plurality of original audio intensity values as an audio intensity value corresponding to the set;
and arranging the audio intensity values corresponding to each set according to the time sequence of all the video frames in the target video to obtain an audio intensity sequence.
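The merging of P original audio intensity values into an N-value per-frame sequence (claims 3 and 4) might be sketched as below. An even partition of the time axis into consecutive chunks is an assumption; the claims only require each set to hold consecutive original values:

```python
def build_intensity_sequence(original_values, n_frames):
    """Merge P decoded original audio intensity values into N per-frame
    audio intensity values (sketch of claims 3-4).

    Each video frame gets the average of one consecutive chunk of the
    original values; an even split of the time axis is assumed.
    """
    p = len(original_values)
    if n_frames <= 0 or p < n_frames:
        raise ValueError("need at least one original value per video frame")
    sequence = []
    for i in range(n_frames):
        # consecutive slice of the time axis belonging to frame i
        start = i * p // n_frames
        end = (i + 1) * p // n_frames
        chunk = original_values[start:end]
        sequence.append(sum(chunk) / len(chunk))
    return sequence
```

Looking up the value for a given frame is then just indexing the sequence by the frame's sequence number, as in claim 3.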
5. The method according to any one of claims 1-4, wherein the determining the picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value comprises:
acquiring a difference value between the target audio intensity value and a reference intensity value as a first difference value;
and determining a picture amplification amplitude value corresponding to the target video frame according to the first difference value.
6. The method as claimed in claim 5, wherein the determining the picture amplification amplitude value corresponding to the target video frame according to the first difference value comprises:
acquiring a maximum audio intensity value of the background audio;
calculating a difference between the maximum audio intensity value and the reference intensity value as a second difference;
and determining the picture amplification amplitude value corresponding to the target video frame according to the ratio of the first difference value to the second difference value.
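The ratio in claims 5 and 6 can be sketched as follows. The clamp of the ratio to [0, 1] is an assumption for robustness and is not stated in the claims:

```python
def picture_amplitude(target_intensity, reference_intensity, max_intensity):
    """Sketch of claims 5-6: derive the picture amplification amplitude
    value from the ratio of the first difference (target - reference) to
    the second difference (max - reference).

    Clamping to [0, 1] is an illustrative assumption.
    """
    if max_intensity <= reference_intensity:
        return 0.0  # degenerate case: no headroom above the reference
    ratio = (target_intensity - reference_intensity) / (max_intensity - reference_intensity)
    return min(max(ratio, 0.0), 1.0)
```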
7. The method of claim 5, wherein the method further comprises:
if the difference values between the plurality of adjacent historical audio intensity values and the reference intensity value are all larger than a first threshold value or are all smaller than a second threshold value, updating the reference intensity value according to the plurality of adjacent historical audio intensity values; the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are a plurality of video frames which are adjacent to the target video frame and are continuous on a time axis, and the timestamps of the video frames respectively corresponding to the plurality of adjacent historical audio intensity values are all smaller than the timestamp of the target video frame.
8. The method of claim 7, wherein said updating the reference intensity value according to the plurality of adjacent historical audio intensity values comprises:
calculating an average value of the plurality of adjacent historical audio intensity values;
and updating the reference intensity value according to the average value of the plurality of adjacent historical audio intensity values.
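The adaptive reference update of claims 7 and 8 might look like the sketch below. Replacing the reference outright with the mean of the recent history is an assumption; the claims only say the reference is updated "according to" that average:

```python
def maybe_update_reference(reference, history, first_threshold, second_threshold):
    """Sketch of claims 7-8: if every difference between the adjacent
    historical audio intensity values and the reference exceeds the first
    threshold, or every difference falls below the second threshold, move
    the reference to the mean of that history; otherwise keep it.

    Replacing the reference with the plain mean is an illustrative choice.
    """
    diffs = [h - reference for h in history]
    if all(d > first_threshold for d in diffs) or all(d < second_threshold for d in diffs):
        return sum(history) / len(history)
    return reference
```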
9. The method of claim 7, wherein said updating the reference intensity value according to the plurality of adjacent historical audio intensity values comprises:
acquiring the music type of the background audio;
acquiring a machine learning model corresponding to the music type as a target model;
processing the plurality of adjacent historical audio intensity values based on the target model to obtain an output result;
and updating the reference intensity value according to the output result.
10. The method of claim 1, wherein the performing video picture amplification processing on the target video frame according to the picture amplification amplitude value comprises:
if the picture amplification amplitude value is greater than the amplitude threshold value, amplifying the video picture content of the target video frame according to the amplification factor corresponding to the picture amplification amplitude value;
and if the picture amplification amplitude value is less than or equal to the amplitude threshold value, keeping the video picture content of the target video frame unchanged.
11. A video processing apparatus, comprising:
the first acquisition unit is used for acquiring a target video frame and background audio in a target video;
a second obtaining unit, configured to obtain, from the background audio, a target audio intensity value corresponding to a timestamp where the target video frame is located;
the determining unit is used for determining a picture amplification amplitude value corresponding to the target video frame according to the target audio intensity value;
and the amplification processing unit is used for performing video picture amplification processing on the target video frame according to the picture amplification amplitude value.
12. The apparatus of claim 11, wherein the second obtaining unit comprises:
a first obtaining subunit, configured to obtain, by calling a system interface, M audio intensity values associated with a timestamp where the target video frame is located, where the M audio intensity values include audio intensity values of multiple channels within a target time range, and the target time range includes the timestamp where the target video frame is located;
and the calculating subunit is used for calculating the average value of the M audio intensity values, and taking the average value of the M audio intensity values as a target audio intensity value corresponding to the timestamp of the target video frame.
13. The apparatus of claim 11, wherein the second obtaining unit comprises:
a second obtaining subunit, configured to obtain an audio intensity sequence, where the audio intensity sequence includes N audio intensity values, where a value of N is the same as a number of video frames included in the target video, the target video includes a plurality of video frames, one audio intensity value corresponds to one video frame, the N audio intensity values are obtained by merging P original audio intensity values, the P original audio intensity values are obtained by decoding the background audio, and P is greater than N;
and the third acquiring subunit is configured to acquire, according to the sequence number of the target video frame in the target video, an audio intensity value matched with the sequence number from the audio intensity sequence, as a target audio intensity value corresponding to the timestamp of the target video frame.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method according to any one of claims 1-10.
15. A terminal, comprising: a processor and a memory;
the processor is coupled to a memory, wherein the memory is configured to store program code and the processor is configured to invoke the program code to perform the method of any of claims 1-10.
CN201810348124.7A 2018-04-18 2018-04-18 Video processing method and device, storage medium and terminal Active CN110392297B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810348124.7A CN110392297B (en) 2018-04-18 2018-04-18 Video processing method and device, storage medium and terminal


Publications (2)

Publication Number Publication Date
CN110392297A CN110392297A (en) 2019-10-29
CN110392297B true CN110392297B (en) 2021-12-14

Family

ID=68284001



Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111083137B (en) * 2019-12-12 2022-07-05 青岛海尔科技有限公司 Method and device for adjusting state of terminal equipment of Internet of things and operating system
CN113055738B (en) * 2019-12-26 2022-07-29 北京字节跳动网络技术有限公司 Video special effect processing method and device
CN112291612B (en) * 2020-10-12 2023-05-02 北京沃东天骏信息技术有限公司 Video and audio matching method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103297743A (en) * 2012-03-05 2013-09-11 联想(北京)有限公司 Video conference display window adjusting method and video conference service equipment
TWI557727B (en) * 2013-04-05 2016-11-11 杜比國際公司 An audio processing system, a multimedia processing system, a method of processing an audio bitstream and a computer program product
CN104574453A (en) * 2013-10-17 2015-04-29 付晓宇 Software for expressing music with images



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant