CN116132708A

CN116132708A - Method and device for acquiring stuck point information, electronic equipment and storage medium

Info

Publication number: CN116132708A
Application number: CN202310102909.7A
Authority: CN
Inventors: 陈联武; 郑羲光; 张晨
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2023-01-28
Filing date: 2023-01-28
Publication date: 2023-05-16

Abstract

The disclosure relates to the technical field of audio processing, and in particular relates to a method and device for acquiring stuck point information, electronic equipment and a storage medium. The stuck point information acquisition method comprises the following steps: detecting the rhythm of an audio signal to obtain a beat time point set corresponding to the audio signal, wherein the beat time point set comprises at least one of a beat time point subset and a re-beat time point subset; performing signal mutation detection on the audio signal to obtain a mutation position time point set corresponding to the audio signal; and acquiring the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set. By adopting the method and the device, the accuracy of acquiring the stuck point information can be improved, and the accuracy of generating the stuck point video can be improved.

Description

Method and device for acquiring stuck point information, electronic equipment and storage medium

Technical Field

The disclosure relates to the technical field of audio processing, and in particular relates to a method and device for acquiring stuck point information, electronic equipment and a storage medium.

Background

With the development of science and technology, short videos have become an indispensable part of people's daily life. Stuck point videos are an important category of short videos. In a stuck point video, the pictures are matched to the beats of the background music so that picture transitions play in sync with the audio. In the related art, however, the stuck point transition time points in the audio may, for example, be determined randomly, so the stuck point transition time points are determined inaccurately.

Disclosure of Invention

The disclosure provides a stuck point information acquisition method, a stuck point information acquisition device, electronic equipment and a storage medium, so as to at least solve the problem that the determination of a stuck point transition time point in the related art is inaccurate. The technical scheme of the present disclosure is as follows:

according to a first aspect of an embodiment of the present disclosure, there is provided a stuck point information acquisition method, including:

detecting the rhythm of an audio signal to obtain a beat time point set corresponding to the audio signal, wherein the beat time point set comprises at least one of a beat time point subset and a re-beat time point subset;

performing signal mutation detection on the audio signal to obtain a mutation position time point set corresponding to the audio signal;

and acquiring the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set.

Optionally, the obtaining, according to the beat time point set and the mutation position time point set, stuck point information corresponding to the audio signal includes:

and acquiring a time point intersection of the beat time point set and the mutation position time point set, and taking at least one time point in the time point intersection as the stuck point information corresponding to the audio signal.

Optionally, the method further comprises:

receiving an adjustment instruction for the stuck point information corresponding to the audio signal under the condition that the interval duration of any two adjacent time points in the stuck point information corresponding to the audio signal is smaller than a duration threshold;

and according to the adjustment instruction, adjusting the stuck point information corresponding to the audio signal to obtain the stuck point information corresponding to the adjusted audio signal.

Optionally, the acquiring a time point intersection of the beat time point set and the mutation position time point set, and taking at least one time point in the time point intersection as the stuck point information corresponding to the audio signal includes:

acquiring the number of video clips corresponding to a video set, wherein the video set is used for fusing with the audio signal to generate a stuck point video;

acquiring a time point intersection of the beat time point set and the mutation position time point set;

and acquiring at least one time point corresponding to the number of the video clips in the time point intersection, and taking the at least one time point corresponding to the number of the video clips as the stuck point information corresponding to the audio signal.

Optionally, the acquiring at least one time point in the time point intersection corresponding to the number of video clips includes:

determining beat mutation intensity corresponding to at least one time point in the time point intersection;

and acquiring at least one time point corresponding to the number of the video clips according to the sequence of the beat mutation intensity from high to low.

Optionally, the performing signal mutation detection on the audio signal to obtain a mutation position time point set corresponding to the audio signal includes:

performing signal mutation detection on the audio signal by adopting a mutation detection network to obtain mutation intensity corresponding to at least one audio time point in the audio signal;

and adding at least one audio time point with the mutation intensity larger than an intensity threshold value in the at least one audio time point to a mutation position time point set corresponding to the audio signal.

Optionally, the performing signal mutation detection on the audio signal to obtain a mutation position time point set corresponding to the audio signal includes:

performing signal separation processing on the audio signal by adopting an audio separation network to obtain audio signals after signal separation, wherein different audio signals after signal separation correspond to different audio tracks;

and carrying out signal mutation detection on the audio signals after the signal separation to obtain a mutation position time point set corresponding to the audio signals.

Optionally, the performing signal mutation detection on the audio signals after the signal separation to obtain a mutation position time point set corresponding to the audio signal includes:

performing signal mutation detection on the audio signals after signal separation to obtain at least one mutation position time point;

in a case where any mutation position time point of the at least one mutation position time point corresponds to at least two mutation intensities, taking the maximum of the at least two mutation intensities as the mutation intensity of that mutation position time point;

and adding that mutation position time point to the mutation position time point set corresponding to the audio signal in a case where its mutation intensity is larger than an intensity threshold value.
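The per-track merging described in the preceding steps can be sketched as follows. This is only an illustrative sketch and not the implementation of the disclosure: the function name `merge_track_onsets`, the track names, and the assumption that time points from different tracks are already quantized so that the same instant compares equal are all hypothetical.

```python
from collections import defaultdict


def merge_track_onsets(track_onsets, intensity_threshold):
    """Merge per-track (time, intensity) detections into one mutation position set.

    track_onsets: dict mapping a track name to a list of (time_seconds, intensity).
    Assumption: time points are already quantized (for example to frame centers),
    so the same instant detected in different tracks compares equal.
    """
    strongest = defaultdict(float)
    for detections in track_onsets.values():
        for time_point, intensity in detections:
            # Keep the maximum intensity seen for this time point across tracks.
            strongest[time_point] = max(strongest[time_point], intensity)
    # Keep only time points whose merged intensity exceeds the threshold.
    return sorted(t for t, s in strongest.items() if s > intensity_threshold)
```

A time point detected in both a drum track and a bass track is therefore judged by the stronger of its two intensities.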

According to a second aspect of the embodiments of the present disclosure, there is provided a stuck point information acquisition apparatus including:

a set acquisition unit configured to perform rhythm detection on an audio signal, and acquire a beat time point set corresponding to the audio signal, wherein the beat time point set includes at least one of a beat time point subset and a re-beat time point subset;

the set acquisition unit is further configured to perform signal mutation detection on the audio signal and acquire a mutation position time point set corresponding to the audio signal;

and an information acquisition unit configured to acquire the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the stuck point information acquisition method of any one of the preceding aspects.

According to a fourth aspect of the present application, there is provided a storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the stuck point information acquisition method of any one of the preceding aspects.

According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of any one of the preceding aspects.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

In one or more embodiments, rhythm detection is performed on the audio signal to obtain a beat time point set corresponding to the audio signal, where the beat time point set includes at least one of a beat time point subset and a re-beat time point subset; signal mutation detection is performed on the audio signal to obtain a mutation position time point set corresponding to the audio signal; and the stuck point information corresponding to the audio signal is acquired according to the beat time point set and the mutation position time point set. Because both rhythm detection and signal mutation detection are performed on the audio signal, cases in which the stuck point information is determined inaccurately because it is based only on beat time points can be reduced, so the accuracy of acquiring the stuck point information, and in turn the accuracy of generating the stuck point video, can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

FIG. 1 is a flowchart illustrating a stuck point information acquisition method, according to an example embodiment;

FIG. 2 is a flowchart illustrating a stuck point information acquisition method, according to an example embodiment;

FIG. 3 is an exemplary schematic diagram illustrating a stuck point information acquisition method, according to an exemplary embodiment;

FIG. 4 is an exemplary schematic diagram illustrating a stuck point information acquisition method, according to an exemplary embodiment;

FIG. 5 is an exemplary schematic diagram illustrating a stuck point information acquisition method, according to an exemplary embodiment;

FIG. 6 is an exemplary schematic diagram illustrating a stuck point information acquisition method, according to an exemplary embodiment;

FIG. 7 is a flowchart illustrating a stuck point information acquisition method, according to an example embodiment;

FIG. 8 is a flowchart illustrating a stuck point information acquisition method, according to an example embodiment;

FIG. 9 is an exemplary schematic diagram illustrating a stuck point information acquisition method, according to an exemplary embodiment;

FIG. 10 is a block diagram illustrating a stuck point information acquisition apparatus, according to an example embodiment;

FIG. 11 is a block diagram illustrating a stuck point information acquisition apparatus, according to an example embodiment;

FIG. 12 is a block diagram illustrating a stuck point information acquisition apparatus, according to an example embodiment;

FIG. 13 is a block diagram illustrating a stuck point information acquisition apparatus, according to an example embodiment;

FIG. 14 is a block diagram illustrating an electronic device, according to an example embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

Fig. 1 is a flowchart illustrating a stuck point information acquisition method according to an exemplary embodiment. The method may be used in a stuck point video generation scenario and, as shown in Fig. 1, includes the following steps:

In step S11, rhythm detection is performed on the audio signal, and a beat time point set corresponding to the audio signal is obtained;

According to some embodiments, the audio signal of the embodiments of the disclosure may be, for example, an audio signal that is fused with video clips to generate a stuck point video, or may refer simply to a carrier of audio information. The audio signal does not refer to any particular fixed audio signal. For example, when the intensity information corresponding to the audio signal changes, the audio signal may also change accordingly. For example, when

It is easy to understand that a beat is a unit for measuring rhythm: in an audio signal, a series of beats of certain intensities recur at regular intervals. In the audio signal of the present disclosure, beats may represent, for example, an organized pattern of fixed unit time values and intensity regularity. A beat time point may represent, for example, a time point at which a strong beat or a weak beat appears in the audio signal.

In some embodiments, a beat time point set refers to a set of at least one beat time point. The beat time points include at least one of beat time points and re-beat time points. The beat time point set does not refer to any particular fixed set. For example, when the number of beat time points included in the beat time point set changes, the beat time point set may change accordingly. When a specific beat time point included in the beat time point set changes, the beat time point set may also change accordingly. For example, when the audio signal changes, the beat time point set may also change accordingly.

The beat time point set comprises at least one of a beat time point subset and a re-beat time point subset. That is, the beat time point set may comprise only a beat time point subset, only a re-beat time point subset, or both a beat time point subset and a re-beat time point subset. A re-beat may be, for example, a beat whose intensity value is greater than an intensity threshold.

According to some embodiments, the electronic device may perform tempo detection on the audio signal, and obtain a set of beat time points corresponding to the audio signal.

In the embodiments of the disclosure, the audio signal may be downloaded by the electronic device in response to a download instruction for the audio signal, may be pre-stored in the electronic device, or may be received by the electronic device from another electronic device. This is not limited in the embodiments of the present disclosure.

In step S12, signal mutation detection is performed on the audio signal, and a mutation position time point set corresponding to the audio signal is obtained;

According to some embodiments, the execution order of step S11 and step S12 is not limited. That is, the electronic device may execute step S11 first and then step S12, execute step S12 first and then step S11, or execute step S11 and step S12 simultaneously.

In some embodiments, a mutation position is a position at which the intensity of the beat changes abruptly. The mutation position does not refer to any particular fixed position; the same audio signal may include a plurality of mutation positions, and different audio signals may include different mutation positions.

It is readily understood that a mutation position time point set refers to a set of at least one mutation position time point. The mutation position time point set does not refer to any particular fixed set of time points. For example, when the audio signal changes, the mutation position time point set may also change accordingly. For example, when the signal strength of the audio signal changes, the mutation position time point set may also change accordingly.

According to some embodiments, the electronic device may perform signal mutation detection on the audio signal, and obtain a set of mutation location time points corresponding to the audio signal.

In step S13, according to the beat time point set and the mutation position time point set, the stuck point information corresponding to the audio signal is obtained.

According to some embodiments, the stuck point information refers to information used to determine the switching rhythm in the audio signal. The stuck point information does not refer to any particular fixed information. For example, when the beat time point set or the mutation position time point set changes, the stuck point information may change accordingly.

In some embodiments, when the electronic device has acquired the beat time point set and the mutation position time point set, it may acquire the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set.

Fig. 2 is a flowchart illustrating a stuck point information acquisition method according to an exemplary embodiment. The method may be used in a stuck point video generation scenario and, as shown in Fig. 2, includes the following steps:

In step S21, detecting the rhythm of the audio signal, and obtaining a beat time point set corresponding to the audio signal;

The specific process is as described above and is not repeated here.

According to some embodiments, Fig. 3 is an exemplary schematic diagram illustrating a stuck point information acquisition method according to an exemplary embodiment. As shown in Fig. 3, when the electronic device performs rhythm detection on the audio signal, it obtains a beat time point set corresponding to the audio signal. The beat time point set may include, for example, at least one beat time point and at least one re-beat time point.

In some embodiments, Fig. 4 is an exemplary schematic diagram illustrating a stuck point information acquisition method according to an exemplary embodiment. As shown in Fig. 4, the electronic device may perform rhythm detection on the audio signal through, for example, beat and re-beat detection sub-networks, to obtain the beat time point set corresponding to the audio signal. For example, the electronic device may obtain the probabilities that each frame of data in the audio signal is a non-beat, a beat, and a re-beat. The electronic device may obtain a loss function Loss1 corresponding to the beat detection network and a loss function Loss2 corresponding to the re-beat detection network, and may add the loss functions of the two detection networks to obtain the target loss function Loss. The electronic device may calculate the deviation between the output data and the annotation data according to the target loss function Loss, and may optimize the target loss function Loss by adjusting the parameters of the two models, so that the beat probability and the re-beat probability of each frame are finally obtained simultaneously. The output probabilities are then decoded by post-processing such as DBN, HMM, or Viterbi decoding, which takes global information into account, to obtain the final global beat time sequence and re-beat (downbeat) time sequence.
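The summation of the two loss functions can be illustrated with a minimal sketch. The disclosure does not state which loss each detection network uses, so the frame-wise binary cross-entropy below is an assumption, as are the function names `binary_cross_entropy` and `joint_loss`.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean frame-wise binary cross-entropy (assumed loss type, not from the source)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


def joint_loss(beat_prob, rebeat_prob, beat_label, rebeat_label):
    """Target loss Loss = Loss1 (beat network) + Loss2 (re-beat network)."""
    loss1 = binary_cross_entropy(beat_prob, beat_label)
    loss2 = binary_cross_entropy(rebeat_prob, rebeat_label)
    return loss1 + loss2
```

Minimizing the sum drives both sub-networks at once, which is what allows the beat and re-beat probabilities of each frame to be produced together.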

In step S22, signal mutation detection is performed on the audio signal, and a mutation position time point set corresponding to the audio signal is obtained;

According to some embodiments, Fig. 5 is an exemplary schematic diagram illustrating a stuck point information acquisition method according to an exemplary embodiment. As shown in Fig. 5, when the electronic device performs signal mutation detection on the audio signal, it obtains a mutation position time point set corresponding to the audio signal. The mutation position time point set may comprise, for example, at least one mutation position time point.

According to some embodiments, when the electronic device performs signal mutation detection on the audio signal to obtain the mutation position time point set corresponding to the audio signal, it may use a mutation detection network to perform signal mutation detection on the audio signal and obtain a mutation intensity corresponding to at least one audio time point in the audio signal, and then add each audio time point whose mutation intensity is larger than the intensity threshold to the mutation position time point set corresponding to the audio signal. Acquiring the mutation position time point set on the basis of mutation intensity can improve the accuracy of determining the mutation position time points and of acquiring the stuck point information, and can thereby improve the accuracy of the stuck point video.
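The thresholding step can be sketched as follows, assuming the mutation detection network outputs one intensity value per frame. The function name `mutation_time_points` and the `frame_rate` parameter (frames per second) are hypothetical and are introduced only for illustration.

```python
import numpy as np


def mutation_time_points(intensity, frame_rate, intensity_threshold):
    """Return the times (in seconds) of frames whose mutation intensity exceeds the threshold.

    intensity: per-frame mutation intensities, assumed to come from a detection network.
    frame_rate: number of frames per second (hypothetical parameter).
    """
    intensity = np.asarray(intensity, dtype=float)
    # Indices of the frames that pass the threshold.
    frames = np.flatnonzero(intensity > intensity_threshold)
    # Convert frame indices to time points.
    return (frames / frame_rate).tolist()
```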

In step S23, acquiring a time point intersection of the beat time point set and the mutation position time point set, and taking at least one time point in the time point intersection as stuck point information corresponding to the audio signal;

According to some embodiments, Fig. 6 is an exemplary schematic diagram illustrating a stuck point information acquisition method according to an exemplary embodiment. As shown in Fig. 6, when the electronic device has obtained the beat time point set and the mutation position time point set, it may obtain the time point intersection of the beat time point set and the mutation position time point set, and use at least one time point in the time point intersection as the stuck point information corresponding to the audio signal.
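A minimal sketch of the intersection step is given below. The disclosure describes a plain intersection of the two sets; because time points produced by two different detectors rarely coincide exactly as floating-point values, the sketch adds a matching `tolerance` in seconds. That tolerance, and the function name `time_point_intersection`, are assumptions and not part of the disclosure. Setting `tolerance=0` reduces the sketch to an exact intersection.

```python
def time_point_intersection(beat_times, mutation_times, tolerance=0.05):
    """Return the beat time points that coincide with some mutation position time point.

    tolerance: maximum allowed distance in seconds between a beat time point and a
    mutation position time point (assumed parameter, not from the source).
    """
    matched = []
    for beat in sorted(beat_times):
        # A beat is kept when at least one mutation time point lies within the tolerance.
        if any(abs(beat - m) <= tolerance for m in mutation_times):
            matched.append(beat)
    return matched
```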

In step S24, receiving an adjustment instruction for the stuck point information corresponding to the audio signal when the interval duration of any two adjacent time points in the stuck point information corresponding to the audio signal is less than the duration threshold;

According to some embodiments, when the electronic device determines the stuck point information corresponding to the audio signal, the electronic device may, for example, present the stuck point information on a display screen. The electronic device may then receive, for example, an adjustment instruction for the stuck point information. The adjustment instruction includes, but is not limited to, a click adjustment instruction, a voice adjustment instruction, and the like.

In some embodiments, the duration threshold refers to a threshold used to determine whether an adjustment instruction is to be received. The duration threshold does not refer to any particular fixed threshold. For example, when the electronic device receives a threshold modification instruction for the duration threshold, the electronic device may modify the duration threshold according to the threshold modification instruction.

Optionally, the electronic device may receive an adjustment instruction for the stuck point information corresponding to the audio signal when the interval duration of any two adjacent time points in the stuck point information corresponding to the audio signal is less than the duration threshold.

In step S25, according to the adjustment instruction, the stuck point information corresponding to the audio signal is adjusted, so as to obtain the stuck point information corresponding to the adjusted audio signal.

In some embodiments, when the electronic device receives the adjustment instruction, the electronic device may adjust the stuck point information corresponding to the audio signal according to the adjustment instruction, so as to obtain the adjusted stuck point information corresponding to the audio signal.
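The condition that triggers steps S24 and S25 can be sketched as follows. The adjustment itself is driven by a user instruction in the disclosure, so the sketch only shows how adjacent time points closer than the duration threshold might be found, plus one hypothetical form of adjustment (removing a time point). The names `too_close_pairs` and `remove_time_point` are illustrative assumptions.

```python
def too_close_pairs(stuck_points, duration_threshold):
    """Return the adjacent pairs of time points whose interval is below the threshold."""
    ordered = sorted(stuck_points)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < duration_threshold]


def remove_time_point(stuck_points, time_point):
    """One hypothetical adjustment: drop a time point chosen by the adjustment instruction."""
    return [t for t in sorted(stuck_points) if t != time_point]
```

In this sketch the device would present the pairs returned by `too_close_pairs` and wait for an adjustment instruction before changing anything.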

According to some embodiments, the electronic device may send the input audio signal $S(t)$ to the rhythm detection module BeatDetection to obtain the beat time sequence $B(t)$ and/or the re-beat time sequence $D(t)$:

$$[B(t), D(t)] = \mathrm{BeatDetection}(S(t))$$

For any time $t$, if the time is a beat, $B(t) = 1$; if the time is a re-beat, $D(t) = 1$; in other cases, $B(t) = 0$ and $D(t) = 0$.

According to some embodiments, the electronic device may send the input audio signal $S(t)$ to the mutation detection module OnsetDetection to obtain the mutation time sequence $O(t)$:

$$O(t) = \mathrm{OnsetDetection}(S(t))$$

For any time $t$, if the time is a mutation position, $O(t) = o$ (where $0 < o < 1$, and a greater value indicates a stronger mutation); in other cases, $O(t) = 0$.

According to some embodiments, the electronic device may combine $B(t)$ with $O(t)$ to obtain the beat time sequence with mutation intensity $B_O(t)$:

$$B_O(t) = B(t) \cdot O(t)$$

Similarly, $D(t)$ and $O(t)$ are combined to obtain the re-beat time sequence with mutation intensity $D_O(t)$:

$$D_O(t) = D(t) \cdot O(t)$$

Optionally, the electronic device may acquire the number of time points for which $B(t) = 1$ and the number of time points for which $D(t) = 1$. According to the ordering of $B_O(t)$ and $D_O(t)$, the stuck point (transition) times may be selected according to a stuck point information selection condition. The stuck point information selection condition may be, for example, the number of video clips.
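The element-wise products and the ordering by mutation intensity can be sketched with NumPy. The sequences are assumed to be sampled per frame, and the function names `mutation_weighted` and `rank_frames` are hypothetical.

```python
import numpy as np


def mutation_weighted(B, D, O):
    """Return (B_O, D_O), the element-wise products B(t)*O(t) and D(t)*O(t)."""
    B, D, O = (np.asarray(x, dtype=float) for x in (B, D, O))
    return B * O, D * O


def rank_frames(weighted):
    """Frame indices with non-zero intensity, ordered from strongest to weakest."""
    weighted = np.asarray(weighted, dtype=float)
    # A stable sort keeps the earlier frame first when two intensities are equal.
    order = np.argsort(-weighted, kind="stable")
    return [int(i) for i in order if weighted[i] > 0]
```

A frame that is a beat but has no mutation gets intensity 0 and drops out of the ranking, which mirrors taking the intersection of the two time point sets.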

In one or more embodiments, rhythm detection is performed on the audio signal to obtain a beat time point set corresponding to the audio signal, signal mutation detection is performed on the audio signal to obtain a mutation position time point set corresponding to the audio signal, a time point intersection of the beat time point set and the mutation position time point set is obtained, and at least one time point in the time point intersection is used as the stuck point information corresponding to the audio signal. Because both rhythm detection and signal mutation detection are performed on the audio signal, cases in which the stuck point information is determined inaccurately because it is based only on beat time points can be reduced, so the accuracy of acquiring the stuck point information, and in turn the accuracy of generating the stuck point video, can be improved. In addition, when the interval duration of any two adjacent time points in the stuck point information corresponding to the audio signal is smaller than the duration threshold, an adjustment instruction for the stuck point information is received, and the stuck point information is adjusted according to the adjustment instruction to obtain the adjusted stuck point information corresponding to the audio signal. This reduces cases in which a video clip cannot be played completely at a stuck point transition because the interval between two adjacent time points in the stuck point information is too small, and thus improves the accuracy of the stuck point video and the viewing experience of the stuck point video.

Fig. 7 is a flowchart illustrating a stuck point information acquisition method according to an exemplary embodiment. The method may be used in a stuck point video generation scenario and, as shown in Fig. 7, includes the following steps:

In step S31, rhythm detection is performed on the audio signal, and a beat time point set corresponding to the audio signal is obtained;

In step S32, signal mutation detection is performed on the audio signal, and a mutation position time point set corresponding to the audio signal is obtained;

In step S33, the number of video clips corresponding to the video set is obtained;

According to some embodiments, the video set is used for fusing with the audio signal to generate a stuck point video. The video set refers to a set of at least one video clip. The video set does not refer to any particular fixed set. For example, when the video content corresponding to a video clip included in the video set changes, the video set may also change accordingly. For example, when the number of video clips included in the video set changes, the video set may also change accordingly. A video clip may consist of multiple consecutive frames of images or may include only one frame of image.

In some embodiments, the electronic device may identify the video set to obtain the number of video clips corresponding to the video set. Or, the electronic device may receive a number input instruction input for the number of video clips of the video set, and obtain the number of video clips corresponding to the video set according to the number input instruction.

The video set may be, for example, a set including only pictures. The number of video clips may be, for example, the number of pictures.

In step S34, a time point intersection of the beat time point set and the mutation position time point set is acquired;

In step S35, at least one time point corresponding to the number of video clips is acquired in the time point intersection, and the at least one time point corresponding to the number of video clips is taken as the stuck point information corresponding to the audio signal.

In some embodiments, when the electronic device has acquired the number of video clips corresponding to the video set, it may acquire at least one time point corresponding to the number of video clips in the time point intersection, and use the at least one time point corresponding to the number of video clips as the stuck point information corresponding to the audio signal.

For example, the number of video clips corresponding to the video set acquired by the electronic device may be 5, and the time point intersection of the beat time point set and the mutation position time point set acquired by the electronic device may include, for example, 25 time points. The electronic device may then acquire 4 of the 25 time points, for example one for each transition between adjacent video clips, and use the 4 time points as the stuck point information corresponding to the audio signal.

In some embodiments, when the electronic device acquires at least one time point corresponding to the number of video clips in the time point intersection and the number of time points included in the time point intersection is greater than the number of video clips, the electronic device may acquire the at least one time point according to the mutation intensity corresponding to each time point in the time point intersection, or according to the beat intensity of each time point in the time point intersection.
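Selecting time points by intensity for a given number of video clips can be sketched as follows. Following the example of 5 video clips and 4 time points, the sketch assumes that one fewer time point than the number of clips is needed; that rule, and the function name `select_stuck_points`, are assumptions made for illustration.

```python
def select_stuck_points(candidates, num_video_clips):
    """Pick the strongest candidate time points for the given number of video clips.

    candidates: list of (time_seconds, intensity) taken from the time point intersection.
    Assumption: num_video_clips - 1 time points are needed, as in the 5-clip example.
    """
    needed = max(num_video_clips - 1, 0)
    # Order by intensity from high to low and keep only as many as needed.
    strongest = sorted(candidates, key=lambda c: c[1], reverse=True)[:needed]
    # Return the chosen time points in chronological order for playback.
    return sorted(t for t, _ in strongest)
```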

In some or related embodiments, by acquiring the number of video clips corresponding to the video set, acquiring a time point intersection of the beat time point set and the mutation position time point set, acquiring at least one time point corresponding to the number of video clips in the time point intersection, and taking the at least one time point corresponding to the number of video clips as the clip point information corresponding to the audio signal. Therefore, through detecting the rhythm of the audio signal and detecting the signal mutation, the situation that the stuck point information is determined only based on the beat time point can be reduced, so that the stuck point information is inaccurately determined, meanwhile, the situation that the number of video clips is not corresponding to the stuck point information can be reduced, the accuracy of acquiring the stuck point information can be improved, and the accuracy of generating the stuck point video can be improved.

Fig. 8 is a flowchart illustrating a stuck point information acquisition method according to an exemplary embodiment, which may be used in a stuck point video generation scene, as shown in fig. 8, including the steps of:

In step S41, detecting the rhythm of the audio signal, and obtaining a beat time point set corresponding to the audio signal;

In step S42, performing signal separation processing on the audio signal by using an audio separation network, and obtaining an audio signal after signal separation;

According to some embodiments, the audio separation network refers to a network for signal separation of audio signals. The term does not refer to one particular fixed network. The audio separation network may be, for example, a deep learning audio separation network. The electronic device may train the audio separation network before using it to perform signal separation processing on audio signals. For example, the electronic device may obtain the original tracks in a piece of music (stem, drum, bass, and so on), together with the final music produced by mixing those tracks, as training samples, and train the original audio separation network on them to obtain the target audio separation network. The target audio separation network refers to a network that can perform signal separation processing on an audio signal.

In some embodiments, the electronic device may perform signal separation processing on the audio signal by using an audio separation network, to obtain an audio signal after signal separation.

Alternatively, the audio signals after different signal separation correspond to different audio tracks.

According to some embodiments, the separation network model may perform signal separation processing on audio signals corresponding to different instruments, for example. The separation network model may also perform separation processing on, for example, a human voice audio signal and a musical instrument audio signal.

According to some embodiments, when performing signal separation processing on the audio signal by using the audio separation network, the electronic device may obtain an audio track corresponding to at least one musical instrument and perform the signal separation processing according to that audio track, so as to obtain the audio signal after signal separation. This can improve the accuracy of audio signal separation, and in turn the accuracy of the mutation position time point set and of the stuck point information corresponding to the audio signal.

Alternatively, the electronic device may obtain, for example, tracks corresponding to a human voice, a drum, a bass, and a guitar, as well as tracks corresponding to other instruments. The electronic device may perform signal separation processing on the audio signal by using the audio separation network according to these audio tracks, so as to obtain the audio signal after signal separation.

In step S43, signal mutation detection is performed on the audio signal after signal separation, and a mutation position time point set corresponding to the audio signal is obtained;

According to some embodiments, after the electronic device has performed signal separation processing on the audio signal by using the audio separation network and obtained the audio signal after signal separation, the electronic device may perform signal mutation detection on the audio signal after signal separation to obtain the mutation position time point set corresponding to the audio signal.

According to some embodiments, when the electronic device performs signal mutation detection on the audio signal after signal separation, it may obtain either one mutation position time point set or a plurality of mutation position time point sets.

According to some embodiments, when performing signal mutation detection on the audio signal after signal separation to obtain the mutation position time point set corresponding to the audio signal, the electronic device may proceed as follows. Signal mutation detection is performed on the audio signal after signal separation to obtain at least one mutation position time point. If any mutation position time point among the at least one mutation position time point corresponds to at least two mutation intensities, the maximum of those mutation intensities is taken as the mutation intensity of that time point. If that mutation intensity is greater than the intensity threshold, the time point is added to the mutation position time point set corresponding to the audio signal. This reduces cases in which the mutation intensity of a time point cannot be determined because the time point corresponds to several mutation intensities, and can improve the accuracy of the mutation position time point set and of the determined stuck point information.

It is easy to understand that, because the different audio signals after signal separation correspond to different audio tracks, the electronic device may perform signal mutation detection on at least one audio signal after signal separation, and may obtain, for each such audio signal, a mutation position time point subset and the mutation intensity of each mutation position time point in that subset. When the same mutation position time point corresponds to a plurality of mutation intensities, the maximum of those mutation intensities may be used as the mutation intensity of that time point. When the mutation intensity of a mutation position time point is greater than the intensity threshold, that time point may be added to the mutation position time point set.
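The merging rule described above (maximum intensity per time point, then thresholding) can be sketched as follows. The function name `merge_mutation_subsets` and the data layout (one dictionary of time point to intensity per separated track) are assumptions chosen for the sketch.

```python
from typing import Dict


def merge_mutation_subsets(subsets: Dict[str, Dict[float, float]],
                           threshold: float) -> Dict[float, float]:
    """Merge per-track mutation subsets into one thresholded set."""
    merged: Dict[float, float] = {}
    for track_points in subsets.values():
        for t, strength in track_points.items():
            # A time point seen on several tracks keeps its maximum intensity.
            merged[t] = max(strength, merged.get(t, strength))
    # Keep only time points whose intensity is strictly greater than the threshold.
    return {t: s for t, s in merged.items() if s > threshold}


subsets = {
    "drum": {1.0: 0.4, 2.0: 0.9},
    "vocal": {1.0: 0.7, 3.0: 0.2},
}
mutation_set = merge_mutation_subsets(subsets, threshold=0.5)
```

Here the time point 1.0 appears on both tracks and is kept with intensity 0.7, while 3.0 is discarded because its only intensity is below the threshold.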

In step S44, according to the beat time point set and the mutation position time point set, the stuck point information corresponding to the audio signal is obtained.

According to some embodiments, once the electronic device has obtained the beat time point set and the mutation position time point set, it may obtain the stuck point information corresponding to the audio signal according to the two sets. For example, the electronic device may acquire the time point intersection of the beat time point set and the mutation position time point set, and use at least one time point in the time point intersection as the stuck point information corresponding to the audio signal.

It is easy to understand that the electronic device may take all the time points in the time point intersection as the stuck point information corresponding to the audio signal.
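A minimal sketch of the time point intersection follows. The `tolerance` parameter is an assumption added for the sketch: detected beat times and mutation times are real-valued and may differ slightly, so a small matching window is allowed, and `tolerance=0.0` gives the exact set intersection described in the text. The function name `time_point_intersection` is likewise hypothetical.

```python
from bisect import bisect_left
from typing import Iterable, List


def time_point_intersection(beats: Iterable[float],
                            mutations: Iterable[float],
                            tolerance: float = 0.0) -> List[float]:
    """Return the beat time points that coincide with a mutation time point."""
    mutations_sorted = sorted(mutations)
    result = []
    for b in sorted(beats):
        # First mutation time point that is not earlier than b - tolerance.
        i = bisect_left(mutations_sorted, b - tolerance)
        if i < len(mutations_sorted) and mutations_sorted[i] <= b + tolerance:
            result.append(b)
    return result


beats = [0.5, 1.0, 1.5, 2.0]
mutations = [1.02, 2.0, 3.1]
loose = time_point_intersection(beats, mutations, tolerance=0.05)
exact = time_point_intersection(beats, mutations)
```

All time points in the returned list may then be used as the stuck point information, or a subset may be selected from it.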

Optionally, when at least one time point in the time point intersection is used as the stuck point information corresponding to the audio signal, the electronic device may also acquire the number of video clips corresponding to the audio signal, acquire at least one time point corresponding to the number of video clips from the time point intersection, and use that at least one time point as the stuck point information corresponding to the audio signal. Because the acquired time points match the number of video clips in the video set, the fusion of the video clip set and the audio signal can be improved, and the accuracy of stuck point video generation can be improved.

For example, when acquiring the at least one time point corresponding to the number of video clips from the time point intersection, the electronic device may determine the mutation intensity corresponding to at least one mutation position time point in the mutation position time point set. The electronic device may then acquire the at least one time point corresponding to the number of video clips in descending order of mutation intensity, so as to improve the accuracy of acquiring the stuck point information and, in turn, the accuracy of generating the stuck point video.

According to some embodiments, the electronic device may obtain a mutation position time point set for each of the different audio signals after signal separation. That is, the electronic device can obtain at least one mutation position time point subset. When the electronic device has obtained at least one mutation position time point subset and a beat time point set, it can obtain at least one piece of stuck point information corresponding to the audio signal according to the at least one mutation position time point subset and the beat time point set.

Optionally, the electronic device may acquire a time point intersection of at least one mutation position time point subset and the beat time point set, and determine the stuck point information corresponding to the audio signal from the time point intersection. The electronic device may, for example, use all the time points in the time point intersection as the stuck point information of the audio signal, or may determine at least one time point corresponding to the number of video clips in the time point intersection and use that at least one time point as the stuck point information of the audio signal. When determining the at least one time point corresponding to the number of video clips in the time point intersection, the electronic device may, for example, determine the mutation intensity corresponding to each time point in the at least one mutation position time point subset.

Fig. 9 is an exemplary illustration of a method for acquiring stuck point information according to an exemplary embodiment. As shown in fig. 9, the electronic device may acquire a beat time point set corresponding to an audio signal using a beat separation network, and may employ an audio separation network to obtain at least one mutation position time point subset. The at least one mutation position time point subset includes, for example, a mutation position time point subset corresponding to the drum, a mutation position time point subset corresponding to the human voice, and a mutation position time point subset corresponding to the guitar. The electronic device may obtain a first time point intersection of the drum mutation position time point subset and the beat time point set, a second time point intersection of the human voice mutation position time point subset and the beat time point set, and a third time point intersection of the guitar mutation position time point subset and the beat time point set.

Optionally, the electronic device may determine first stuck point information corresponding to the audio signal according to the first time point intersection, where the first stuck point information is drum-based stuck point information. The electronic device may also determine second stuck point information corresponding to the audio signal according to the second time point intersection, where the second stuck point information is voice-based stuck point information. The electronic device may further determine third stuck point information corresponding to the audio signal according to the third time point intersection, where the third stuck point information is guitar-based stuck point information.
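The per-instrument intersections of the fig. 9 example can be sketched as follows, assuming for simplicity that beat and mutation time points are quantized to the same grid so that an exact set intersection applies. The function name `per_instrument_stuck_points` and the sample values are illustrative only.

```python
from typing import Dict, List, Set


def per_instrument_stuck_points(beats: Set[float],
                                subsets: Dict[str, Set[float]]) -> Dict[str, List[float]]:
    """Intersect the beat time point set with each instrument's mutation subset."""
    return {name: sorted(beats & points) for name, points in subsets.items()}


beats = {1.0, 2.0, 3.0, 4.0}
subsets = {
    "drum": {1.0, 3.0, 3.5},
    "vocal": {2.0, 2.2},
    "guitar": {4.0},
}
stuck_points = per_instrument_stuck_points(beats, subsets)
```

Each entry of the result corresponds to one kind of stuck point information, for example drum-based or voice-based, from which a transition sequence can be chosen.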

According to some embodiments, the electronic device may input an audio signal S(t) to an audio separation detection module to obtain AudioSeparation(S(t)), and perform mutation detection on AudioSeparation(S(t)) to obtain a mutation time sequence O_instrument(t):

[Vocal(t), Base(t), Drum(t), …] = AudioSeparation(S(t))

where Vocal(t) refers to the human voice time sequence, Base(t) refers to the bass time sequence, Drum(t) refers to the drum time sequence, and AudioSeparation(S(t)) refers to the separated audio signal.

From B(t) and O_instrument(t), a beat time sequence with mutation intensity is obtained.

From D(t) and O_instrument(t), a re-beat time sequence with mutation intensity is obtained.

Here, instrument ∈ [vocal, base, guitar, …]; that is, the instrument includes at least one of a human voice, a bass, a guitar, and a drum.

Alternatively, the electronic device may acquire the number of time points in the beat time sequence with mutation intensity and in the re-beat time sequence with mutation intensity. According to these two sequences and their corresponding numbers, the stuck point (transition) time may be selected according to a stuck point information selection condition.
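The text does not give the exact formula that combines B(t) or D(t) with O_instrument(t). One plausible reading, shown here purely as an assumption, treats the beat sequence as a 0/1 indicator per frame and uses it to mask the mutation intensity, so that only frames falling on a beat keep their intensity. The function name `beats_with_intensity` is hypothetical.

```python
from typing import List, Sequence


def beats_with_intensity(beat_flags: Sequence[int],
                         mutation_intensity: Sequence[float]) -> List[float]:
    """Mask a per-frame mutation intensity with a per-frame 0/1 beat indicator."""
    if len(beat_flags) != len(mutation_intensity):
        raise ValueError("both sequences must cover the same frames")
    # Assumed combination rule: elementwise product of indicator and intensity.
    return [b * o for b, o in zip(beat_flags, mutation_intensity)]


beat_flags = [0, 1, 0, 1]              # B(t) or D(t), one value per frame
drum_intensity = [0.3, 0.8, 0.5, 0.0]  # O_instrument(t) for one instrument
beat_sequence = beats_with_intensity(beat_flags, drum_intensity)
nonzero_count = sum(1 for v in beat_sequence if v > 0)
```

Under this reading, the count of nonzero entries is the number of candidate time points that a selection condition can draw from.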

In some or related embodiments, rhythm detection is performed on the audio signal to obtain the beat time point set corresponding to the audio signal; signal separation processing is performed on the audio signal by using an audio separation network to obtain the audio signal after signal separation; signal mutation detection is performed on the audio signal after signal separation to obtain the mutation position time point set corresponding to the audio signal; and the stuck point information corresponding to the audio signal is obtained according to the beat time point set and the mutation position time point set. Performing both rhythm detection and signal mutation detection on the audio signal therefore reduces the inaccuracy that arises when the stuck point information is determined from beat time points alone, which can improve the accuracy of acquiring the stuck point information and the accuracy of generating the stuck point video. In addition, separating the audio signal before signal mutation detection enriches the dimensions along which the rhythm time sequence with mutation intensity is determined, and can increase the diversity of transition time point sequences in stuck point video production.

Fig. 10 is a block diagram illustrating a stuck point information acquisition apparatus according to an example embodiment. Referring to fig. 10, the apparatus 900 includes a set acquisition unit 901 and an information acquisition unit 902.

A set acquisition unit 901 configured to perform tempo detection on an audio signal, and acquire a beat time point set corresponding to the audio signal, wherein the beat time point set includes at least one of a beat time point subset and a re-beat time point subset;

the set acquisition unit 901 is further configured to perform signal mutation detection on the audio signal, and acquire a mutation position time point set corresponding to the audio signal;

an information acquiring unit 902 configured to perform acquisition of stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set.

When acquiring the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set, the information acquisition unit 902 is specifically configured to perform:

and acquiring a time point intersection of the beat time point set and the mutation position time point set, and taking at least one time point in the time point intersection as stuck point information corresponding to the audio signal.

Fig. 11 is a block diagram of a stuck point information acquisition apparatus according to an exemplary embodiment.

Referring to fig. 11, the information acquisition unit 902 includes an instruction acquisition subunit 912 and an information acquisition subunit 922:

an instruction acquisition subunit 912 configured to receive an adjustment instruction for the stuck point information corresponding to the audio signal when the interval duration of any two adjacent time points in the stuck point information corresponding to the audio signal is less than the duration threshold;

the information acquisition subunit 922 is configured to adjust the stuck point information corresponding to the audio signal according to the adjustment instruction, so as to obtain the adjusted stuck point information corresponding to the audio signal.
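The interval check that triggers an adjustment instruction can be sketched as follows. The content of the adjustment instruction is not specified in the text, so it is modeled here, as an assumption, as a set of time points to drop. Both function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def find_close_pairs(stuck_points: Iterable[float],
                     min_interval: float) -> List[Tuple[float, float]]:
    """Return adjacent time point pairs whose spacing is below the duration threshold."""
    ordered = sorted(stuck_points)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_interval]


def apply_adjustment(stuck_points: Iterable[float],
                     points_to_drop: Set[float]) -> List[float]:
    """Apply a (hypothetical) adjustment instruction that removes selected points."""
    return sorted(t for t in stuck_points if t not in points_to_drop)


points = [1.0, 1.2, 3.0, 3.1, 6.0]
close = find_close_pairs(points, min_interval=0.5)
adjusted = apply_adjustment(points, points_to_drop={1.2, 3.1})
```

After the adjustment, `find_close_pairs(adjusted, 0.5)` returns an empty list, meaning no two adjacent transitions are closer than the threshold.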

Fig. 12 is a block diagram of a stuck point information acquisition apparatus according to an exemplary embodiment. Referring to fig. 12, the information acquisition unit 902 includes a number acquisition subunit 932, an intersection acquisition subunit 942, and an information acquisition subunit 922. When acquiring the time point intersection of the beat time point set and the mutation position time point set and taking at least one time point in the time point intersection as the stuck point information corresponding to the audio signal, the information acquisition unit 902 is specifically configured as follows:

a number acquisition subunit 932 configured to perform acquiring a number of video segments corresponding to a video set, where the video set is configured to perform fusion with an audio signal to generate a stuck point video;

An intersection acquisition subunit 942 configured to perform acquisition of a time point intersection of the beat time point set and the mutation position time point set;

the information acquisition subunit 922 is configured to perform acquisition of at least one time point corresponding to the number of video clips in the time point intersection, and take the at least one time point corresponding to the number of video clips as the stuck point information corresponding to the audio signal.

According to some embodiments, the information acquisition subunit 922 is configured, when executing at least one time point in the acquisition time point intersection corresponding to the number of video clips, to execute:

Fig. 13 is a block diagram of a stuck point information acquisition apparatus according to an exemplary embodiment. Referring to fig. 13, the set acquisition unit 901 includes an intensity acquisition subunit 911 and a set acquisition subunit 921. When performing signal mutation detection on the audio signal to acquire the mutation position time point set corresponding to the audio signal, the set acquisition unit 901 is specifically configured as follows:

An intensity obtaining subunit 911 configured to perform signal mutation detection on the audio signal using a mutation detection network, to obtain a mutation intensity corresponding to at least one audio time point in the audio signal;

the set obtaining subunit 921 is configured to perform adding at least one audio time point, of which the mutation intensity is greater than the intensity threshold, to the mutation position time point set corresponding to the audio signal.
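The thresholding performed by these two subunits can be sketched as follows. The mutation detection network itself is not reproduced; its output is assumed, for illustration, to be one intensity value per analysis frame, with `hop_seconds` as the assumed spacing between frames. The function name `mutation_time_points` is hypothetical.

```python
from typing import List, Sequence


def mutation_time_points(intensities: Sequence[float],
                         hop_seconds: float,
                         threshold: float) -> List[float]:
    """Convert per-frame mutation intensities into a mutation position time point set."""
    # Frame i is assumed to start at i * hop_seconds; keep frames above the threshold.
    return [i * hop_seconds for i, s in enumerate(intensities) if s > threshold]


frame_intensities = [0.1, 0.9, 0.4, 0.7]  # stand-in for the network output
time_points = mutation_time_points(frame_intensities, hop_seconds=0.5, threshold=0.5)
```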

According to some embodiments, the set obtaining unit 901 is configured to perform signal mutation detection on an audio signal, and when obtaining a mutation position time point set corresponding to the audio signal, is specifically configured to perform:

According to some embodiments, the set obtaining unit 901 is configured to perform signal mutation detection on the audio signal after signal separation, and when obtaining a mutation position time point set corresponding to the audio signal, is specifically configured to perform:

Performing signal mutation detection on the audio signal subjected to signal separation to obtain at least one mutation position time point;

under the condition that any mutation position time point in the at least one mutation position time point corresponds to at least two mutation intensities, taking the maximum mutation intensity in the at least two mutation intensities as the mutation intensity of any mutation position time point;

and adding any mutation position time point to the mutation position time point set corresponding to the audio signal under the condition that the mutation intensity is larger than the intensity threshold value.

The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail herein.

In summary, in the apparatus provided in the embodiments of the present disclosure, the set acquisition unit performs rhythm detection on an audio signal to obtain a beat time point set corresponding to the audio signal, where the beat time point set includes at least one of a beat time point subset and a re-beat time point subset; the set acquisition unit further performs signal mutation detection on the audio signal to acquire a mutation position time point set corresponding to the audio signal; and the information acquisition unit acquires the stuck point information corresponding to the audio signal according to the beat time point set and the mutation position time point set. Performing both rhythm detection and signal mutation detection on the audio signal therefore reduces the inaccuracy that arises when the stuck point information is determined from beat time points alone, which can improve the accuracy of acquiring the stuck point information and the accuracy of generating the stuck point video.

Fig. 14 shows a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 14, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1302 or a computer program loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data required for the operation of the electronic device 1300 can also be stored. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to bus 1304.

Various components in electronic device 1300 are connected to I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, etc.; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the electronic device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1301 performs the respective methods and processes described above, such as the stuck point information acquisition method. For example, in some embodiments, the stuck point information acquisition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program is loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the stuck point information acquisition method described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured to perform the stuck point information acquisition method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A stuck point information acquisition method, comprising:

2. The method according to claim 1, wherein the obtaining, according to the beat time point set and the mutation position time point set, stuck point information corresponding to the audio signal includes:

3. The method according to claim 2, wherein the method further comprises:

4. The method according to claim 2, wherein the acquiring the time point intersection of the beat time point set and the mutation position time point set, taking at least one time point in the time point intersection as the stuck point information corresponding to the audio signal, includes:

5. The method of claim 4, wherein the obtaining at least one point in time in the point in time intersection corresponding to the number of video clips comprises:

6. The method of claim 1, wherein the performing signal mutation detection on the audio signal to obtain the set of mutation location time points corresponding to the audio signal includes:

7. The method of claim 1, wherein the performing signal mutation detection on the audio signal to obtain the set of mutation location time points corresponding to the audio signal includes:

8. The method of claim 7, wherein the performing signal mutation detection on the audio signal after signal separation to obtain the set of mutation location time points corresponding to the audio signal includes:

9. A stuck point information acquisition apparatus, comprising:

10. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the stuck point information acquisition method of any one of claims 1 to 8.

11. A storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the stuck point information acquisition method of any one of claims 1 to 8.

12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-8.