CN111639226A - Lyric display method, device and equipment - Google Patents

Lyric display method, device and equipment Download PDF

Info

Publication number
CN111639226A
CN111639226A (application CN202010403687.9A)
Authority
CN
China
Prior art keywords
lyrics
target
information
song segment
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010403687.9A
Other languages
Chinese (zh)
Inventor
闫震海
曹硕
杜承才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology (Shenzhen) Co., Ltd.
Priority to CN202010403687.9A
Publication of CN111639226A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval of audio data
    • G06F 16/63 Querying
    • G06F 16/638 Presentation of query results
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 Retrieval using metadata automatically derived from the content
    • G06F 16/685 Retrieval using automatically derived transcripts of audio data, e.g. lyrics
    • G06F 16/686 Retrieval using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Abstract

The application relates to the technical field of audio, and discloses a lyric display method comprising the following steps: acquiring a target song segment and song information of the target song segment, wherein the song information comprises the lyrics contained in the target song segment, target time information of the lyrics, the chords contained in the target song segment, and target time information of the chords; determining the correspondence between the lyrics and the chords according to the target time information of the lyrics and the target time information of the chords; determining display parameters corresponding to the lyrics according to the correspondence between the lyrics and the chords and display parameters preset for the chords; and, when the target song segment is played, displaying the lyrics contained in the target song segment according to the corresponding display parameters. By implementing the method and the device, different display effects are achieved for lyrics corresponding to different chords in a song segment, chord visualization is realized, and lyric display modes are enriched.

Description

Lyric display method, device and equipment
Technical Field
The application relates to the technical field of audio, in particular to a lyric display method, device and equipment.
Background
With people's growing demand for entertainment and leisure and the continuous development of computer technology, multimedia data products such as audio and video have become increasingly abundant. Music software providing services such as music playing and music recomposing is one such product, and in order to provide better service, lyric display modes are continuously being changed and improved.
In the prior art, lyric display generally fills in color on the lyrics corresponding to the playing progress of an audio file: all lyrics of the song are simply displayed, and color filling merely distinguishes the played lyrics from the unplayed lyrics. The existing lyric display scheme is therefore monotonous.
Disclosure of Invention
Based on the above problems, embodiments of the present application provide a lyric display method, which may display lyrics contained in a target song segment according to corresponding display parameters by establishing a mapping relationship between the display parameters and a chord and determining the display parameters corresponding to the lyrics based on a corresponding relationship between the lyrics and the chord, thereby implementing a chord visualization effect and enriching a lyric display manner.
In one aspect, an embodiment of the present application provides a lyric display method, where the method includes:
acquiring a target song segment and song information of the target song segment, wherein the song information comprises lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment and target time information of the chords;
determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord;
determining a display parameter corresponding to the lyrics according to the corresponding relation between the lyrics and the chord and a display parameter preset for the chord;
and when the target song segment is played, displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics.
In one possible embodiment, the obtaining the target song segment and the song information of the target song segment includes:
acquiring an original song segment, a speed factor corresponding to the original song segment, and song information of the original song segment, wherein the song information comprises lyrics contained in the original song segment, original time information of the lyrics, chords contained in the original song segment and original time information of the chords;
performing variable-speed constant-pitch processing on the original song segment according to the speed factor to obtain the target song segment;
taking the lyrics contained in the original song segment as the lyrics contained in the target song segment, and scaling the original time information of the lyrics according to the speed factor to obtain the target time information of the lyrics; taking the chords contained in the original song segment as the chords contained in the target song segment, and scaling the original time information of the chords according to the speed factor to obtain the target time information of the chords; and combining the lyrics contained in the target song segment, the target time information of the lyrics, the chords contained in the target song segment and the target time information of the chords to obtain the song information of the target song segment.
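For illustration only, the scaling of lyric and chord timestamps described above can be sketched as follows (a hypothetical helper, not part of the patent; playing at twice the speed halves every time value, so timestamps are divided by the speed factor):

```python
def scale_time_info(entries, speed_factor):
    """Scale (text, start_s, end_s) timestamps for playback at speed_factor x.

    Playing twice as fast halves every timestamp, so each time value is
    divided by the factor. `entries` is a list of (text, start, end) tuples.
    """
    return [(text, start / speed_factor, end / speed_factor)
            for text, start, end in entries]

lyrics = [("one flash and one flash", 0.0, 4.0), ("bright crystal", 4.0, 6.0)]
print(scale_time_info(lyrics, 2.0))
# [('one flash and one flash', 0.0, 2.0), ('bright crystal', 2.0, 3.0)]
```

The same scaling applies unchanged to the chord time information, since both are plain time spans.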
In a possible implementation manner, performing variable-speed constant-pitch processing on the original song segment according to the speed factor to obtain the target song segment includes:
framing the original song segment to obtain a plurality of first audio frames, and obtaining time information of each first audio frame;
determining, according to the time information of each first audio frame and the speed factor, a second audio frame corresponding to each first audio frame in the original song segment;
searching, within a preset neighborhood of each second audio frame in the original song segment, for the audio frame whose waveform is most similar to that of the corresponding first audio frame, and taking that audio frame as the output audio frame of each first audio frame;
combining the output audio frames of the first audio frames to obtain the target song segment.
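The anchor-then-search step above resembles waveform-similarity overlap-add (WSOLA) style time-scale modification. The following toy sketch (hypothetical names; real implementations also cross-fade overlapping frames) illustrates the idea of jumping through the input by the speed factor and searching a neighborhood for the most similar frame:

```python
def similarity(a, b):
    # cross-correlation at zero lag as a crude waveform-similarity measure
    return sum(x * y for x, y in zip(a, b))

def change_speed_keep_pitch(signal, frame_len, speed_factor, search_radius):
    """Toy sketch: advance through `signal` by frame_len * speed_factor
    samples per output frame, then search +/- search_radius samples around
    that anchor for the frame most similar to the previous output frame."""
    out = []
    pos = 0.0
    prev = None
    while int(pos) + frame_len <= len(signal):
        anchor = int(pos)
        if prev is None:
            best = anchor
        else:
            lo = max(0, anchor - search_radius)
            hi = min(len(signal) - frame_len, anchor + search_radius)
            best = max(range(lo, hi + 1),
                       key=lambda s: similarity(prev, signal[s:s + frame_len]))
        frame = signal[best:best + frame_len]
        out.extend(frame)
        prev = frame
        pos += frame_len * speed_factor
    return out
```

With a speed factor of 2.0, the output is roughly half the input length while every output frame is a verbatim (hence pitch-preserving) slice of the original waveform.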
in another possible embodiment, the obtaining the target song segment includes:
acquiring an original song segment, wherein the original song segment comprises a first left channel signal and a first right channel signal;
extracting human voice information and background voice information according to the first left channel signal and the first right channel signal;
performing energy enhancement on the voice information to obtain enhanced voice information, and performing energy suppression on the background voice information to obtain suppressed background voice information;
synthesizing a second left channel signal and a second right channel signal using the enhanced vocal information and the suppressed background vocal information;
and obtaining the target song segment according to the second left channel signal and the second right channel signal.
In a possible implementation manner, performing energy enhancement on the human voice information to obtain the enhanced human voice information and performing energy suppression on the background sound information to obtain the suppressed background sound information includes:
enhancing the amplitude of the human voice information by a first preset value to obtain the enhanced human voice information;
and suppressing the amplitude of the background sound information by a second preset value to obtain the suppressed background sound information.
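The patent does not specify how the vocal and background information are extracted from the two channel signals. A common assumption, used only for illustration here, is that the lead vocal sits in the mid (center) channel while the accompaniment dominates the side channel; under that assumption the enhancement and suppression steps can be sketched as:

```python
def enhance_vocals(left, right, vocal_gain=1.5, background_gain=0.5):
    """Hedged sketch: treat mid = (L+R)/2 as a rough vocal estimate and
    side = (L-R)/2 as a rough background estimate, boost the former,
    attenuate the latter, then re-synthesize the left/right channels."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0          # rough vocal (center) estimate
        side = (l - r) / 2.0         # rough background (side) estimate
        mid *= vocal_gain            # amplitude enhancement (first preset value)
        side *= background_gain      # amplitude suppression (second preset value)
        out_l.append(mid + side)     # second left channel signal
        out_r.append(mid - side)     # second right channel signal
    return out_l, out_r
```

The gains stand in for the "first preset value" and "second preset value" of the claim and are purely illustrative.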
In one possible implementation manner, obtaining the chord included in the target song segment and the target time information of the chord includes:
framing the target song segment to obtain a plurality of third audio frames, and performing Fourier transform on the plurality of third audio frames to obtain all frequency domain information corresponding to each third audio frame, wherein the frequency domain information comprises frequency components and corresponding energy values;
mapping all frequency components corresponding to each third audio frame to the 12 frequency bands of the pitch class profile (sound level contour feature) in a logarithmic manner, so as to determine the frequency band to which each frequency component in each third audio frame belongs;
adding energy values corresponding to all frequency components in the same frequency band according to the frequency band to which each frequency component in each third audio frame belongs respectively to obtain a 12-dimensional vector of each third audio frame respectively;
and transmitting the 12-dimensional vectors of the third audio frames to a chord identification model to obtain the chords contained in the target song segments and the target time information of the chords.
Further, the frequency domain information further comprises at least two sampling points;
adding the energy values corresponding to all the frequency components in the same frequency band according to the frequency band to which each frequency component in each third audio frame belongs, and obtaining the 12-dimensional vector of each third audio frame respectively includes:
determining an onset sequence of the song audio segment according to the energy difference between every two adjacent sampling points in the frequency domain information corresponding to each third audio frame;
performing an autocorrelation operation on the onset sequence to obtain an autocorrelation sequence;
weighting the autocorrelation sequence with a logarithmic Gaussian distribution function, and taking the time value at which the weighted autocorrelation sequence reaches its maximum as the beat length, wherein each beat-length interval includes at least one third audio frame;
and dividing the frequency components of all third audio frames within each beat-length interval into frequency bands, adding the energy values of all frequency components in the same band, and averaging to obtain the 12-dimensional vector of all third audio frames included in that beat-length interval.
In one possible implementation manner, acquiring lyrics included in a target song and target time information of the lyrics includes:
acquiring an original song segment and song information of the original song segment, wherein the song information comprises original lyrics contained in the original song segment and original time information of the original lyrics;
determining the lyrics to be repeated in the original lyrics and the number of times the lyrics are to be repeated;
generating target lyrics according to the lyrics to be repeated and the number of repetitions, and replacing the lyrics to be repeated in the original lyrics with the target lyrics to obtain the lyrics contained in the target song segment;
and obtaining the target time information contained in the target song segment according to the original time information of the original lyrics and the number of repetitions of the lyrics to be repeated.
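A minimal sketch of the lyric-repetition step (hypothetical function, not the patent's implementation; timestamps are assumed contiguous, and everything after the repeated lyric is shifted so the timeline stays consistent):

```python
def repeat_lyric(lyrics, index, times):
    """Replace lyrics[index] with `times` consecutive copies and shift the
    timestamps of all subsequent lyrics accordingly.

    `lyrics` is a list of (text, start_s, end_s) tuples."""
    text, start, end = lyrics[index]
    dur = end - start
    repeated = [(text, start + i * dur, start + (i + 1) * dur)
                for i in range(times)]
    shift = dur * (times - 1)          # extra time introduced by the copies
    tail = [(t, s + shift, e + shift) for t, s, e in lyrics[index + 1:]]
    return lyrics[:index] + repeated + tail
```

For example, repeating the first of two lyrics twice doubles its span and pushes the second lyric later by one duration.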
In one aspect, an embodiment of the present application further provides a lyric display apparatus, where the lyric display apparatus includes:
the acquisition module is used for acquiring a target song segment and song information of the target song segment, wherein the song information comprises lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment and target time information of the chords;
the determining module is used for determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord;
the determining module is further configured to determine a display parameter corresponding to the lyric according to the correspondence between the lyric and the chord and a display parameter preset for the chord;
and the display module is used for displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics when the target song segment is played.
In one aspect, an embodiment of the present application further provides a lyric display apparatus, where the apparatus includes a processor and a memory, where the processor is configured to execute a computer program stored in the memory, so as to implement any one of the above possible embodiments.
In one aspect, an embodiment of the present application further provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
On one hand, the embodiment of the application also provides a user terminal, and the user terminal comprises the lyric display device, so that the user terminal can realize any one of the possible embodiments.
In the application, a lyric display device obtains a target song segment and song information of the target song segment, wherein the song information comprises lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment and target time information of the chords; determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord; determining a display parameter corresponding to the lyrics according to the corresponding relation between the lyrics and the chord and a display parameter preset for the chord; and when the target song segment is played, displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics. By establishing the mapping relation between the display parameters and the chords and determining the display parameters corresponding to the lyrics based on the corresponding relation between the lyrics and the chords, the lyrics contained in the target song segment can be displayed according to the corresponding display parameters, the chord visualization effect is realized, and the lyric display mode is enriched.
Drawings
Fig. 1 is a schematic flowchart of a lyric display method according to an embodiment of the present application;
FIG. 2 is a schematic illustration of guitar notation provided in an embodiment of the present application;
FIG. 3 is a graphical user interface of a lyrics display provided by an embodiment of the present application;
fig. 4 is a schematic flowchart of a method for generating song information of a target song segment according to an embodiment of the present application;
FIG. 5 is a graphical user interface for generating song information of a target song segment according to an embodiment of the present application;
FIG. 6 is a graphical user interface for song information generation for another target song segment provided by an embodiment of the present application;
fig. 7 is a block diagram of a lyric display apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of a structure of a lyric display apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes embodiments of the present application in further detail with reference to the accompanying drawings.
Referring first to fig. 1, fig. 1 is a schematic flow chart of a lyric display method according to an embodiment of the present application. As shown in fig. 1, the specific implementation steps of the embodiment are as follows:
s100, the song lyric display device obtains a target song segment and song information of the target song segment, wherein the song information comprises the song lyrics contained in the target song segment, target time information of the song lyrics, chords contained in the target song segment and target time information of the chords.
Specifically, the target song segment may be a complete song or a partial segment cut from a song. A song has lyrics and a melody; a melody can be simply understood as a time-ordered sequence of notes of different frequencies, each note having its own start time, duration and end time, which together form the accompaniment we hear at different moments. Some notes sound simultaneously, which gives rise to the concept of a chord: a chord is three or more notes played at the same time, and each chord likewise has its own start time, duration and end time. It will be appreciated that the lyrics also carry timing: when each lyric appears, how long it lasts and when it ends are preset by the author when the song is composed and the lyrics are written.
The following describes chords and lyrics by way of example with reference to the drawings. Referring to fig. 2, fig. 2 is a schematic diagram of guitar notation provided by an embodiment of the present application. As shown in fig. 2, a segment of the children's song "Little Star" (i.e., "Twinkle, Twinkle, Little Star") is presented. In the song segment "one flash and one flash, bright crystal crystal", the song information includes at least one lyric; each lyric includes at least one word, and a lyric may be a single character, a word of two characters, or even a whole sentence; the application does not limit the concrete form of a lyric. Taking "one flash and one flash", "bright crystal" and "crystal" each as one lyric: the start time of "one flash and one flash" is 0s, the end time is 4s, and the duration is 4s; the first start time of "bright crystal" is 4s, the first end time is 6s, and the first duration is 2s; the start time of "crystal" is 6s, the end time is 7s, and the duration is 1s. The at least one chord comprises the A chord and the D chord: the first start time of the A chord is 0s, the first end time is 4s, and the first duration is 4s; the second start time of the A chord is 6s, the second end time is 7s, and the second duration is 1s; the first start time of the D chord is 4s, the first end time is 6s, and the first duration is 2s.
S101, the lyric display device determines the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord.
Specifically, step S100 shows that the lyrics have a time characteristic and the chords also have a time characteristic, so the lyrics and the chords can be associated with each other through time. In a possible implementation manner, the time information of each lyric is preset and stored in a song library, and the corresponding lyric is found in the song library according to the time information. Taking the target song segment in fig. 2 as an example: according to the time information 0s to 4s, the lyric found is "one flash and one flash"; according to the time information 4s to 6s, the lyric found is "bright crystal"; according to the time information 6s to 7s, the lyric found is "crystal". The time information corresponding to the chords may be represented as text information having a link relation with the lyrics contained in the target song segment. For example, the time information corresponding to the chords may be represented as shown in table 1:
table 1
Time information    Chord identifier    Chord
0s to 4s            1                   A chord
4s to 6s            2                   D chord
6s to 7s            1                   A chord
The lyric display device matches the time information corresponding to the lyrics with the time information corresponding to the chords; if the matching succeeds, the correspondence between the lyrics and the chords is determined. Optionally, the chords may be marked with chord identifiers. For example, the lyric display device uses the time information 0s to 4s as an index to find the lyric "one flash and one flash" of the target song segment in the song library; optionally, the lyric display device may confirm the target song by detecting the lyrics in the playing state. The lyric display device also uses the target song "Little Star" as an index to find, among the chord text information of a plurality of songs, the chord text information of "Little Star" as shown in table 1; it then uses the time information 0s to 4s as an index to find the chord identifier "1" in table 1, that is, it determines that the chord identifier corresponding to the lyric "one flash and one flash" is "1", establishing the correspondence between "one flash and one flash" and the A chord. Similarly, the lyric display device determines that the chord identifier corresponding to "bright crystal" is "2", corresponding to the D chord, and that the chord identifier corresponding to "crystal" is "1", corresponding to the A chord. That is, the corresponding lyric may be found through the time information of a chord, and the corresponding chord may also be found through the time information of a lyric.
It can be understood that the time information of the lyrics is a characteristic that is formed when the song is created, so that when the song is stored in a digital format, the time information of the lyrics can be stored accordingly.
As for the time information corresponding to the chord, in one possible embodiment, the time information corresponding to the chord of each song may be made into text information as shown in table 1, so that the lyric display apparatus may implement step S101.
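The time-based matching described above can be sketched as follows (hypothetical helper, for illustration only; a lyric is paired with the chord whose time span covers the lyric's start time):

```python
def match_lyrics_to_chords(lyrics, chords):
    """Pair each lyric with the chord whose span covers the lyric's start.

    `lyrics`: list of (text, start_s, end_s); `chords`: list of
    (chord_name, start_s, end_s). Returns {lyric_text: chord_name}."""
    pairs = {}
    for text, l_start, _ in lyrics:
        for name, c_start, c_end in chords:
            if c_start <= l_start < c_end:
                pairs[text] = name
                break
    return pairs

lyrics = [("one flash and one flash", 0, 4), ("bright crystal", 4, 6), ("crystal", 6, 7)]
chords = [("A", 0, 4), ("D", 4, 6), ("A", 6, 7)]
print(match_lyrics_to_chords(lyrics, chords))
# {'one flash and one flash': 'A', 'bright crystal': 'D', 'crystal': 'A'}
```

This reproduces the fig. 2 example: the 0s-4s and 6s-7s lyrics map to the A chord, and the 4s-6s lyric maps to the D chord.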
In another possible embodiment, a computer may be used to perform chord recognition on the song information. For example, before the lyric display apparatus performs step S101, it frames the target song segment to obtain a plurality of third audio frames and performs a Fourier transform on them to obtain all frequency domain information corresponding to each third audio frame, where the frequency domain information includes frequency components and corresponding energy values. It then maps all frequency components corresponding to each third audio frame to the 12 frequency bands of the pitch class profile in a logarithmic manner, so as to determine the band to which each frequency component of each third audio frame belongs.
Specifically, twelve-tone equal temperament divides the octave into twelve equal parts, each part representing a semitone, and the twelve semitones are denoted {C, C#, D, D#, E, F, F#, G, G#, A, A#, B}. The relationship between the frequency components corresponding to each third audio frame and the pitch class profile p(k) is expressed as follows:

p(k) = [12 · log2((f_s / N) · k / f_c)] mod 12    (equation 1)

where f_s is the sampling rate; N is the number of sampling points of the Fourier transform; k is the index of a sampling point after the Fourier transform; f_c is the reference pitch frequency of the twelve-tone equal-tempered scale (e.g., middle C, approximately 261.6 Hz); and mod 12 denotes taking the remainder after division by 12.
The lyric display device then adds the energy values of all frequency components falling in the same frequency band, according to the band to which each frequency component of each third audio frame belongs, to obtain the 12-dimensional vector of each third audio frame. Specifically, the 12-dimensional vector is expressed as:

X_PCP(i) = Σ_{k: p(k) = i} |X_FFT(k)|²    (equation 2)

where i is the index of the 12-dimensional vector, p(k) is the pitch class profile obtained by equation 1, and X_FFT(k) is the Fourier transform of the frame data.
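A minimal sketch of the pitch-class mapping and energy summation of equations 1 and 2 (pure Python; the function name is hypothetical, the bracketed rounding in equation 1 is taken as round-to-nearest, and f_c is assumed to be middle C):

```python
import math

def pitch_class_profile(spectrum, fs, n_fft, f_c=261.63):
    """Map positive FFT bins to the 12 pitch classes (equation 1) and sum
    the energy per class (equation 2).

    `spectrum` holds the FFT values for bins 0 .. n_fft//2 - 1; f_c is the
    reference pitch, assumed here to be middle C (~261.63 Hz)."""
    pcp = [0.0] * 12
    for k in range(1, n_fft // 2):            # skip the DC bin
        freq = fs * k / n_fft                 # frequency of bin k
        p = round(12 * math.log2(freq / f_c)) % 12   # equation 1
        pcp[p] += abs(spectrum[k]) ** 2       # equation 2
    return pcp
```

Feeding in a spectrum with energy only near 261.63 Hz puts all of that energy into pitch class 0 (the C bin), as expected.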
The lyric display device transmits the 12-dimensional vector of each third audio frame to a chord recognition model to obtain the chords contained in the target song segment and the target time information of the chords. Specifically, the chord recognition model may be a trained Hidden Markov Model (HMM): the 12-dimensional vectors are fed into the trained HMM to obtain the chord information of each third audio frame, where the chord information includes a chord and its identifier, such as the A chord and chord identifier "1". For example, to train the HMM, the lyric display apparatus may define the chords as 49 states according to key, such as the keys of C, D, E, F, G, A and bB, as shown in table 2:
table 2
[The 49 chord-state definitions, organized by key (C, D, E, F, G, A, bB); rendered as an image in the original document.]
The lyric display device transmits a label file, the 49 chord-state definitions and a training song to the HMM model, where the label file contains the chord types of the training song. The HMM model annotates the chords of the training song using the 49 defined states and compares the annotated result with the label file, and the lyric display device adjusts the training parameters of the HMM model according to the comparison, thereby obtaining the trained HMM model.
Further, chords often change on the beat. To avoid a divided audio segment containing two or more chords, which would affect accuracy, this embodiment may also track the beat of the song. Optionally, the frequency domain information further includes at least two sampling points; the lyric display device determines an onset sequence of the song audio segment according to the energy difference between every two adjacent sampling points in the frequency domain information corresponding to each third audio frame, and performs an autocorrelation operation on the onset sequence to obtain an autocorrelation sequence. Specifically, the autocorrelation is the product of the onset sequence x(t) and its time-shifted copy x(t - τ), expressed as:

f(τ) = Σ_t x(t) · x(t - τ)    (equation 3)
The lyric display device weights the autocorrelation sequence with a logarithmic Gaussian distribution function:

W(τ) = exp( -(log2(τ / τ0))² / (2 · στ²) )    (equation 4)

where τ0 is the central value of the tempo-period offset, στ controls the width of the weighting curve, and both τ0 and στ are preset values; τ is the offset.
The lyric display device takes the time value at which the weighted autocorrelation sequence f(τ) reaches its maximum as the beat length; for example, if f(τ) is maximal at 5s, the beat length is 5s. Each beat-length interval includes at least one third audio frame. The lyric display device divides the frequency components of all third audio frames within each beat-length interval into frequency bands, adds the energy values of all frequency components in the same band, and averages them to obtain the 12-dimensional vector shared by all third audio frames within that beat-length interval. It should be noted that this embodiment is only an exemplary illustration of how the time information corresponding to the chord is determined, and is not limited thereto.
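As a rough, hypothetical sketch of this beat-tracking step (function and parameter names are our own; the log-Gaussian weighting is the standard tempo-weighting window, assumed here since the original renders the formula as an image): the onset sequence is autocorrelated per equation 3, each lag is weighted by a log-Gaussian centred on τ0, and the best-scoring lag is returned as the beat length.

```python
import math

def beat_length(onset, tau0=0.5, sigma=0.4, dt=0.01):
    """Return the beat length (seconds) of an onset-strength sequence.

    Autocorrelates `onset` (equation 3), weights each lag with a
    log-Gaussian centred on tau0 seconds, and returns the lag where the
    weighted autocorrelation is largest. `dt` is the hop between onset
    samples, in seconds."""
    n = len(onset)
    best_lag, best_val = 1, float("-inf")
    for lag in range(1, n):
        acf = sum(onset[t] * onset[t - lag] for t in range(lag, n))   # equation 3
        tau = lag * dt
        w = math.exp(-0.5 * (math.log2(tau / tau0) / sigma) ** 2)     # log-Gaussian weight
        val = w * acf
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag * dt
```

On a synthetic onset train with impulses every 0.5 s, the weighted maximum lands on the 0.5 s lag rather than on its harmonics, which the weighting is there to suppress.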
And S102, the lyric display device determines the display parameters corresponding to the lyrics according to the corresponding relation between the lyrics and the chord and the display parameters preset for the chord.
Specifically, the correspondence between chords and their preset display parameters may be represented as a preset display parameter mapping table. A display parameter may be a font color, a font brightness and/or a font weight; for example, the display parameter mapping table may be as shown in table 3:
table 3
Chord | Font color | Font brightness
A | red | 120%
The lyric display apparatus determines the correspondence between lyrics and chords through step S101; for example, after determining that "one flash one flash" corresponds to the A chord, the lyric display apparatus looks up the display parameters of the A chord in table 3: font color "red" and font brightness "120%", where a font brightness of 120% can be understood as 1.2 times the brightness at which the lyric display apparatus normally displays.
In one possible implementation, the font brightness is related to the playing speed of the target song segment, e.g., the faster the playing speed of the target song segment, the greater the font brightness.
S103, when the target song segment is played, the lyric display device displays the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics.
In this embodiment, a lyric display device obtains a target song segment and song information of the target song segment, where the song information includes lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment, and target time information of the chords; determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord; determining a display parameter corresponding to the lyrics according to the corresponding relation between the lyrics and the chord and a display parameter preset for the chord; and when the target song segment is played, displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics. By establishing the mapping relation between the display parameters and the chords and determining the display parameters corresponding to the lyrics based on the corresponding relation between the lyrics and the chords, the lyrics contained in the target song segment can be displayed according to the corresponding display parameters, the chord visualization effect is realized, and the lyric display mode is enriched.
In one possible embodiment, the lyric display device acquires an original song segment and song information of the original song segment, wherein the song information comprises original lyrics contained in the original song segment and original time information of the original lyrics; determining lyrics to be repeated in the original lyrics and the repetition times of the lyrics to be repeated; generating target lyrics according to the lyrics to be repeated and the repetition times, and replacing the lyrics to be repeated in the original lyrics with the target lyrics to obtain lyrics contained in the target song segment; and obtaining target time information of the lyrics contained in the target song segment according to the original time information of the original lyrics and the repetition times of the lyrics to be repeated.
Specifically, the lyric display device determines the lyrics to be repeated in the original lyrics and the number of repetitions; for example, if the lyric to be repeated is "bright" and the number of repetitions is 3, the generated target lyric is "bright bright bright". Optionally, the original time information of the original lyrics is the same as the target time information of the lyrics contained in the target song segment: the time information of the song segment "one flash one flash bright crystal" is unchanged, and "bright bright bright" is displayed within the time originally allotted to "bright", so that the repeated lyric is displayed faster. Optionally, the target time information of the lyrics contained in the target song segment is longer than the original time information of the original lyrics; for example, the time information of "one flash one flash" before "bright" is unchanged, the time information of the repeated lyric "bright" is 3 times the original time, and the time information of "crystal" after "bright" is shifted backwards accordingly. The present application does not limit how the time information of the lyrics contained in the song segment is adjusted after a lyric is repeatedly displayed. By implementing this embodiment, a certain lyric contained in the song information of a song segment can be processed, adding a lyric-changing function.
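Both repetition strategies above can be sketched with a single helper. The `(word, start, end)` data layout and the function name are assumptions; the two branches correspond to keeping the original slot (faster display) versus stretching the slot and shifting later lyrics back.

```python
def repeat_lyric(lyrics, target, repeats, stretch=False):
    """Replace `target` in a list of (word, start, end) lyric entries with
    `repeats` copies of the word.  With stretch=False the copies share the
    original time slot (they display faster); with stretch=True the slot is
    lengthened `repeats` times and later lyrics are shifted back."""
    out, shift = [], 0.0
    for word, start, end in lyrics:
        start, end = start + shift, end + shift
        if word == target:
            dur = (end - start) * (repeats if stretch else 1)
            if stretch:
                shift += dur - (end - start)
            step = dur / repeats
            for k in range(repeats):
                out.append((word, start + k * step, start + (k + 1) * step))
        else:
            out.append((word, start, end))
    return out

song = [("one flash one flash", 0.0, 4.0), ("bright", 4.0, 6.0), ("crystal", 6.0, 7.0)]
fast = repeat_lyric(song, "bright", 3)                # "bright" shown 3x within 4 s-6 s
slow = repeat_lyric(song, "bright", 3, stretch=True)  # later lyrics shifted back
```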
The application scenarios of the lyric display method and some graphical user interfaces produced after lyric display are described in detail below with reference to the accompanying drawings. Referring to fig. 3, fig. 3 is a graphical user interface of a lyric display provided by an embodiment of the present application. As shown in fig. 3, fig. 3 includes 3A, 3B, 3C, and 3D. 3A may be a graphical user interface for obtaining a target song: by inputting the song name of the target song, a list of all songs matching the input name is obtained; illustratively, "stars" is input, a song list is obtained, and by receiving a user's click, the clicked song is played and the graphical user interface shown in 3B is entered. The lyric display device executes step S300 by detecting the user's click selection to obtain the target song segment "one flash one flash bright crystal crystal" and its song information, where the song information includes the time information corresponding to each lyric; for example, referring to the embodiment described above with reference to fig. 2, the time information corresponding to "one flash one flash" is 0 s to 4 s, that corresponding to "bright crystal" is 4 s to 6 s, and that corresponding to "crystal" is 6 s to 7 s. The lyric display device executes step S301 and determines the chord corresponding to each lyric according to the target time information of each lyric and the target time information of the chords; for example, it determines that "one flash one flash" corresponds to the A chord, identified as "1"; that the lyric "bright crystal" corresponds to the D chord, identified as "2"; and that the lyric "crystal" corresponds to the A chord, identified as "1".
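The per-lyric chord determination of step S301 can be sketched as a maximum-overlap match between the lyric time intervals and the chord time intervals. The function name and data layout are assumptions, not the patent's implementation.

```python
def match_lyrics_to_chords(lyric_times, chord_times):
    """Map each lyric to the chord whose time interval overlaps it most.

    lyric_times: list of (lyric, start, end)
    chord_times: list of (chord, start, end)
    """
    mapping = {}
    for lyric, ls, le in lyric_times:
        best, best_overlap = None, 0.0
        for chord, cs, ce in chord_times:
            overlap = min(le, ce) - max(ls, cs)  # interval intersection length
            if overlap > best_overlap:
                best, best_overlap = chord, overlap
        mapping[lyric] = best
    return mapping

lyrics = [("one flash one flash", 0, 4), ("bright crystal", 4, 6), ("crystal", 6, 7)]
chords = [("A", 0, 4), ("D", 4, 6), ("A", 6, 7)]
mapping = match_lyrics_to_chords(lyrics, chords)
```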
The lyric display apparatus executes step S302 to determine the display parameter corresponding to each lyric. Illustratively, the lyric display apparatus looks up, in a preset display parameter mapping table, the display parameter of chord identifier "1" as "font tilt" and the display parameter of chord identifier "2" as "font enlarged to twice the original size". Optionally, each chord identifier may correspond to a plurality of display parameters; for example, the display parameters of chord identifier "1" may include "font tilt" and/or "font underline". That is, a chord identifier may correspond to a plurality of display effects, and since different chords produce different auditory effects when played, different display effects may be formulated accordingly, achieving a consistent audiovisual experience. The switching between the various effects may cycle through each display parameter in turn; for example, the display parameters corresponding to the A chord include "font tilt" and "font underline", and in the lyric display effect diagram 3C the lyrics "one flash one flash" and "crystal" both correspond to the A chord, the display parameter of "one flash one flash" being "font tilt" and that of "crystal" being "font underline". Optionally, the effects may also follow a preset display rule, for example superimposing the display parameters: the lyrics "one flash one flash" and "crystal" correspond to the A chord, and the display effect of both is "font tilt" plus "font underline". The present application does not limit the relation between each chord and the number of display parameters, nor the specific display effects.
Optionally, if the lyric display device performs repeated display on a certain lyric contained in the song information, for example, the repeated display is performed by "bright", an effect graph of the lyric display may be as shown in 3D.
On the basis of the embodiments described above with reference to fig. 1 to 3, the present application further provides a method for obtaining song information of a target song segment, and referring to fig. 4, fig. 4 is a flowchart illustrating a method for generating song information of a target song segment according to an embodiment of the present application. As shown in fig. 4, the present embodiment specifically executes the following steps:
S400, the lyric display device obtains an original song segment, a multiple speed factor corresponding to the original song segment and song information of the original song segment, wherein the song information comprises lyrics contained in the original song segment, original time information of the lyrics, chords contained in the original song segment and original time information of the chords.
Specifically, the lyric display apparatus may obtain the original song fragment and song information of the original song fragment from a song library.
S401, the lyric display device carries out variable speed and non-tone-changing processing on the original song segment according to the multiple speed factor to obtain the target song segment.
Specifically, the multiple speed factor is a parameter for adjusting the speed of the original song segment and its song information. For example, a multiple speed factor of 2 represents that the original song segment is played at twice the original speed and that the lyrics contained in it are displayed at twice the original speed; a multiple speed factor of 0.8 represents that the original song segment is played at 0.8 times the original speed and that the lyrics are displayed at 0.8 times the original speed. For example, if a point of the original song segment is played at the 1st second and the multiple speed factor is 0.8, the corresponding target time of the target song segment is the 1.25th second (the original time divided by the multiple speed factor). In a possible implementation manner, the lyric display device frames the original song segment to obtain a plurality of first audio frames and obtains the time information of each first audio frame; it then determines, according to the time information corresponding to each first audio frame and the multiple speed factor, a second audio frame corresponding to each first audio frame in the original song segment, and searches, within a preset neighborhood range of each second audio frame, for the audio frame whose waveform is most similar to that of the corresponding first audio frame, to serve as the output audio frame of that first audio frame.
Specifically, the preset neighborhood range is a preset value: it is the allowed offset Δmax of the second audio frame and may be represented as the interval [−Δmax, Δmax]. The audio frame most similar to the waveform of each first audio frame is found through the correlation C(m, Δ):

C(m, Δ) = ∑_{i=0}^{L−1} x(m·L + i) · x(τ(Sa, m) + Δ + i)   Equation 5

wherein |Δ| ≤ Δmax, L is the frame length of the original audio frame, i is the sample index within the m-th audio frame, and Sa is the multiple speed factor.

The output audio frame of each first audio frame is:

y_m(i) = w(i − τ(Sa, m) − Δm) · x(i)   Equation 6

wherein τ(Sa, m) = Sa·L·m represents the position of the second audio frame corresponding to each first audio frame after mapping by the multiple speed factor, Δm = argmax_Δ C(m, Δ), and w(·) is a window function, for example a Hamming window.
And the lyric display device combines the output audio frames of the first audio frames to obtain the target song segment. Specifically, the lyric display device may determine, according to equation 6, output audio frames of all original audio frames after the speed factor is changed, and the output audio frames constitute an output song audio segment.
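The frame mapping and neighborhood search of step S401 can be sketched as a WSOLA-style procedure. The simple frame concatenation (without overlap-add windowing), the default frame and neighborhood sizes, and all names are illustrative simplifications, not the patent's exact method.

```python
import numpy as np

def time_stretch(x, sa, frame_len=512, d_max=64):
    """WSOLA-style sketch: change speed by factor `sa` without changing pitch.

    Output frame m is taken near input position tau(sa, m) = sa*frame_len*m,
    choosing the offset in [-d_max, d_max] whose waveform is most similar
    (maximal correlation, as in Equation 5) to the m-th original frame.
    """
    n_frames = int(len(x) / (sa * frame_len))
    out = []
    for m in range(n_frames):
        # m-th first audio frame (clamped to stay inside the signal)
        ref = x[min(m * frame_len, len(x) - frame_len):][:frame_len]
        tau = int(sa * frame_len * m)  # mapped second-audio-frame position
        best, best_c = tau, -np.inf
        for d in range(-d_max, d_max + 1):
            p = tau + d
            if p < 0 or p + frame_len > len(x):
                continue
            c = float(np.dot(ref, x[p:p + frame_len]))  # waveform similarity
            if c > best_c:
                best_c, best = c, p
        out.append(x[best:best + frame_len])
    return np.concatenate(out) if out else np.zeros(0)
```

A production implementation would additionally window and overlap-add the selected frames (Equation 6) to avoid discontinuities at frame boundaries.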
S402, the lyric display device takes the lyrics contained in the original song segment as the lyrics contained in the target song segment, and processes the original time information of the lyrics according to the multiple speed factor to obtain the target time information of the lyrics.
Illustratively, with the original time information of the lyrics described with reference to fig. 2 and a multiple speed factor of 0.8, the time of the lyric "one flash one flash" is changed from 0 s to 4 s into 0 s to 5 s, the time of the lyric "bright crystal" from 4 s to 6 s into 5 s to 7.5 s, and the time of the lyric "crystal" from 6 s to 7 s into 7.5 s to 8.75 s.
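The timestamp adjustment of step S402 amounts to dividing every start/end time by the multiple speed factor; a minimal sketch with an assumed `(word, start, end)` layout:

```python
def rescale_times(lyrics, speed_factor):
    """Divide each (word, start, end) timestamp by the multiple speed factor:
    a factor of 0.8 slows playback, stretching timestamps by 1/0.8."""
    return [(w, s / speed_factor, e / speed_factor) for w, s, e in lyrics]

scaled = rescale_times([("one flash one flash", 0, 4),
                        ("bright crystal", 4, 6),
                        ("crystal", 6, 7)], 0.8)
```

The same helper applies unchanged to the chord time information of step S403.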
S403, the lyric display device takes the chords contained in the original song segment as the chords contained in the target song segment, and processes the original time information of the chords according to the multiple speed factor to obtain the target time information of the chords. Specifically, the original time information of the chords may be represented as text, as shown in table 1 described above; with the preset multiple speed factor of 0.8, the time information of the at least one chord is changed from table 2 into table 4:
table 4
Chord identifier | Time information
1 (A chord) | 0 s to 5 s; 7.5 s to 8.75 s
2 (D chord) | 5 s to 7.5 s
For the A chord, the time information of the corresponding chord identifier "1" is changed from 0 s to 4 s into 0 s to 5 s, and from 6 s to 7 s into 7.5 s to 8.75 s; for the D chord, the time information of the corresponding chord identifier "2" is changed from the original 4 s to 6 s into 5 s to 7.5 s.
S404, the lyrics contained in the target song segment, the target time information of the lyrics, the chords contained in the target song segment and the target time information of the chords are combined to obtain the song information of the target song segment.
Specifically, the lyric display device obtains the target song segment from step S401, determines the target time information of the lyrics contained in the target song segment from step S402, determines the target time information of the chords contained in the target song segment from step S403, and combines them to obtain the song information of the target song segment. Illustratively, with a preset multiple speed factor of 0.8 and original lyrics "one flash one flash bright crystal" in the original song segment, the time information corresponding to each lyric is stretched, i.e. playback is slowed to 0.8 times the original speed.
In another possible implementation manner, the original song segments may include a plurality of original song segments, adjusted song information corresponding to each original song segment is generated after the plurality of original song segments pass through steps S400 to S403, and the plurality of adjusted song information are combined to obtain the song information of the target song segment.
Illustratively, the original lyric in the first original song segment is "one flash one flash" with a preset multiple speed factor of 0.8; the lyric display device executes steps S400 to S403, the time of the lyric "one flash one flash" and the corresponding audio is changed from the original 0 s to 4 s into 0 s to 5 s, and the time information of the A chord corresponding to 0 s to 4 s in the original time information of the chords is changed into 0 s to 5 s. The second original song segment is "bright crystal" with a preset multiple speed factor of 2; the lyric display device executes steps S400 to S403, the time of the lyric "bright crystal" and the corresponding audio is shortened from the original duration of 2 s to 1 s and follows immediately after "one flash one flash", so the time of the lyric "bright crystal" is changed from the original 4 s to 6 s into 5 s to 6 s, and the time information of the D chord corresponding to 4 s to 6 s in the original time information of the chords is changed into 5 s to 6 s. The third original song segment is "crystal" with a preset multiple speed factor of 1.5; the lyric display device executes steps S400 to S403, the time of the lyric "crystal" and the corresponding audio is shortened from the original duration of 1 s to 0.67 s and follows immediately after "bright crystal", so the time of the lyric "crystal" is changed from the original 6 s to 7 s into 6 s to 6.67 s. The song information obtained from the first, second and third original song segments under their different multiple speed factors is combined in chronological order as the song information of the target song segment.
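The per-segment combination above can be sketched by rescaling each segment's duration by its own factor and accumulating the timeline; the `(lyric, duration, factor)` layout and the function name are illustrative assumptions.

```python
def combine_segments(segments):
    """Combine per-segment song information, each given as
    (lyric, original_duration, speed_factor), into one timeline:
    each duration is divided by its factor and the segments follow
    one another in chronological order."""
    t, out = 0.0, []
    for lyric, duration, factor in segments:
        new_dur = duration / factor
        out.append((lyric, t, t + new_dur))
        t += new_dur
    return out

segs = [("one flash one flash", 4.0, 0.8),
        ("bright crystal", 2.0, 2.0),
        ("crystal", 1.0, 1.5)]
timeline = combine_segments(segs)
```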
And when the target song segment is played, displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics. For a specific display process, reference may be made to the embodiment described above with reference to fig. 1, which is not described herein again. Optionally, in this embodiment, the lyric display apparatus may further select different display parameters according to different multiple speed factors, and further, the display parameters corresponding to each lyric may further be associated with the audio parameters, such as the size or speed of the audio sound, corresponding to each lyric, so as to continuously enrich the category of the lyric display parameters.
By implementing the embodiment, the speed of the original song segment and the song information of the original song segment can be changed, the playing speed of the song segment and the lyric display speed are adjusted, and the interestingness of song playing and lyric display is further increased on the basis of enriching the lyric display modes.
Some graphical user interfaces generated by the song information of the target song segment will be described in detail below with reference to the drawings. Referring to fig. 5 and 6, fig. 5 is a graphical user interface generated by song information of a target song segment according to an embodiment of the present application, and fig. 6 is a graphical user interface generated by song information of another target song segment according to an embodiment of the present application.
As shown in fig. 5, fig. 5 includes 5A, 5B and 5C. 5A may be a graphical user interface for obtaining an original song segment: the original song segment is selected and extracted by receiving a user's click or long press, after which the graphical user interface shown in 5B is entered. Illustratively, the lyric display device offers different song recomposition forms and receives the multiple speed factor preset by the user's click; for example, selecting "0.8× speed" presets the multiple speed factor to 0.8. When the preset multiple speed factor is determined, the lyric display apparatus may implement the embodiment described above with reference to fig. 4, determine the target song segment, the target time information of the lyrics and the target time information of the chords, and generate the song information of the target song segment.
Illustratively, the song information of the target song segment includes: the time of the lyric "one flash one flash" and the corresponding audio is 0 s to 5 s, the time of the lyric "bright crystal" is 5 s to 7.5 s, and the time of the lyric "crystal" is 7.5 s to 8.75 s; the A chord is exemplarily identified as "1", and its time information is 0 s to 5 s and 7.5 s to 8.75 s; the time information of chord identifier "2" is 5 s to 7.5 s. The lyric display device determines the chord corresponding to each lyric according to the time information of each lyric and of each chord; for example, it determines that "one flash one flash" corresponds to the A chord, identified as "1", that the lyric "bright crystal" corresponds to the D chord, identified as "2", and that the lyric "crystal" corresponds to the A chord, identified as "1".
The lyric display device determines the display parameter corresponding to each lyric according to the correspondence between the lyrics and the chords and the display parameters preset for the chords. Illustratively, the lyric display device looks up, in the preset display parameter mapping table, the display parameter of chord identifier "1" as "font tilt" and that of chord identifier "2" as "font enlarged to twice the original size". Optionally, each chord identifier may correspond to a plurality of display parameters; for example, the display parameters of chord identifier "1" may include "font tilt" and/or "font underline". That is, a chord identifier may correspond to a plurality of display effects, and since different chords produce different auditory effects when played, different display effects may be formulated accordingly, achieving a consistent audiovisual experience.
The lyric display effect of 3C may be the same as that of 5C, but the lyric display durations differ. For example, with a preset multiple speed factor of 0.8, the display duration of each lyric in 3C is 0.8 times that in 5C, i.e. the lyrics in 5C are displayed longer; with a preset multiple speed factor of 2, the display duration of each lyric in 3C is 2 times that in 5C, i.e. the lyrics in 5C are displayed shorter.
In another possible implementation manner, the original song segment may include a plurality of original song segments, and the song information of the target song segment is generated after the plurality of original song segments are processed through the steps S400 to S403 in the embodiment described above with reference to fig. 4. For example, referring to fig. 6, fig. 6 includes 6D, 6E and 6F, the lyric display apparatus determines, as shown in fig. 6D, that the original lyrics in the first original song fragment are "one flash and one flash" by receiving a click of the user, and the preset speed multiplier is 0.8; as shown in fig. 6E, the lyric display device determines that the second original song segment is "bright crystal" by receiving the click of the user, and the preset speed multiplication factor is 2; as shown in fig. 6F, the lyric display device determines that the third original song segment is "crystal" by receiving a click of the user, the preset multiple speed factor is 1.5, and combines the song information obtained by the first original song segment, the second original song segment and the third original song segment according to different multiple speed factors according to the time sequence order, so as to obtain the song information of the target song segment, thereby realizing the display effect shown in fig. 6G.
Based on the embodiments described above with reference to fig. 1 to fig. 3, further, the original song segments may be adapted, so as to further improve the interest of playing the song.
In one possible implementation, the lyric display apparatus acquires an original song segment, wherein the original song segment includes a first left channel signal and a first right channel signal, and extracts human voice information and background sound information from the first left channel signal and the first right channel signal. Illustratively, the lyric display device adds the first left channel signal and the first right channel signal and divides the sum by two to obtain the human voice information in the audio information. Furthermore, the lyric display device enhances the amplitude of the voice information by a first preset value to obtain enhanced voice information. Specifically, the song audio is assumed to conform to a stereo mid-side model: the human voice is concentrated in the center and the background music is distributed on the two sides. With the first left channel signal L1 and the first right channel signal R1, the human voice information voice is:
voice = (L1 + R1) / 2   Equation 7
Taking the first preset value as 2 as an example, the enhanced voice information voice_new is:
voice_new = 2 × voice   Equation 8
The lyric display device performs energy enhancement on the human voice information to obtain enhanced human voice information, and performs energy suppression on the background sound information to obtain suppressed background sound information. Illustratively, it subtracts the first right channel signal from the first left channel signal and divides the difference by two to obtain the background sound information in the audio information, and then suppresses the amplitude of the background sound information by a second preset value to obtain the suppressed background sound information. The background sound information back is:
back = (L1 − R1) / 2   Equation 9
Taking the second preset value as 2 as an example, the suppressed background sound information back_new is:
back_new = back / 2   Equation 10
the lyric display apparatus synthesizes a second left channel signal and a second right channel signal using the enhanced human voice information and the suppressed background sound information, and illustratively, adds the enhanced human voice information and the suppressed background sound information to obtain a result as a second left channel signal L2:
L2 = voice_new + back_new   Equation 11
The lyric display device subtracts the enhanced human voice information and the suppressed background voice information to obtain a result, which is used as a second right channel signal R2:
R2 = voice_new − back_new   Equation 12
The lyric display device obtains the target song segment from the second left channel signal L2 and the second right channel signal R2 according to formula 11 and formula 12;
when playing the target song segment, the lyric display device displays at least one lyric included in the song segment according to the corresponding display parameter, that is, any one of the possible embodiments described above with reference to fig. 1 to 5 is implemented.
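Equations 7 to 12 can be sketched directly. The function name is an assumption; the computation follows the mid-side scheme above with the first and second preset values both equal to the same gain.

```python
import numpy as np

def enhance_vocals(l1, r1, gain=2.0):
    """Mid/side vocal enhancement (Equations 7-12): the voice is the mid
    signal (L+R)/2, the background the side signal (L-R)/2; the voice is
    amplified by `gain`, the background attenuated by `gain`, and the two
    channels are then resynthesized."""
    voice = (l1 + r1) / 2.0       # Equation 7
    back = (l1 - r1) / 2.0        # Equation 9
    voice_new = gain * voice      # Equation 8
    back_new = back / gain        # Equation 10
    l2 = voice_new + back_new     # Equation 11
    r2 = voice_new - back_new     # Equation 12
    return l2, r2
```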
Optionally, the amplitude of the voice information is multiplied by a scale factor to obtain the enhanced voice information, and the amplitude of the background sound information is multiplied by the difference between a third preset value and the scale factor to obtain the suppressed background sound information. Specifically, the scale factor is preset. The sum of the enhanced voice information and the suppressed background sound information is taken as the second left channel signal L2:
L2 = beta × voice + (2 − beta) × back   Equation 13
Where beta is the scale factor, the scale factor may range in value from 1 to 2, with the larger the value, the more prominent the voice, and vice versa.
The lyric display means takes the difference between the enhanced human voice information and the suppressed background voice information as the second right channel signal R2:
R2 = beta × voice − (2 − beta) × back   Equation 14
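The scale-factor variant of Equations 13 and 14 can be sketched as follows; the function name is an assumption. A useful property of this form is that beta = 1 exactly reproduces the original channels (voice + back = L1, voice − back = R1), while larger beta makes the voice more prominent.

```python
def remix_with_scale_factor(voice, back, beta):
    """Equations 13-14: blend the mid (voice) and side (back) signals with
    a preset scale factor beta in [1, 2].  beta = 1 reproduces the original
    channels; larger beta makes the voice more prominent."""
    l2 = beta * voice + (2.0 - beta) * back
    r2 = beta * voice - (2.0 - beta) * back
    return l2, r2
```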
By implementing this embodiment, the effect of making the human voice of a song segment prominent can be realized, further improving the interest of song playing and display. For a specific application scenario, reference may be made to 5A to 5B of fig. 5 described above; the lyric display apparatus performs this embodiment according to a received user adaptation request.
An embodiment of the present application further provides a lyric display apparatus, referring to fig. 7, and fig. 7 is a block diagram of a structure of the lyric display apparatus provided in the embodiment of the present application. As shown in fig. 7, the lyric display apparatus 70 includes:
an obtaining module 700, configured to obtain a target song segment and song information of the target song segment, where the song information includes lyrics included in the target song segment, target time information of the lyrics, chords included in the target song segment, and target time information of the chords;
a determining module 701, configured to determine, according to the target time information of the lyrics and the target time information of the chord, a correspondence between the lyrics and the chord;
the determining module 701 is further configured to determine a display parameter corresponding to the lyric according to the correspondence between the lyric and the chord and a display parameter preset for the chord;
a display module 702, configured to display the lyrics included in the target song segment according to the display parameters corresponding to the lyrics when the target song segment is played.
In a possible embodiment, the obtaining module 700 is further configured to obtain an original song segment, a multiple speed factor corresponding to the original song segment, and song information of the original song segment, where the song information includes lyrics contained in the original song segment, original time information of the lyrics, a chord contained in the original song segment, and original time information of the chord;
the obtaining module 700 is further configured to perform variable-speed and non-tonal modification processing on the original song segment according to the multiple-speed factor to obtain the target song segment;
the obtaining module 700 is further configured to use lyrics included in the original song segment as lyrics included in the target song segment, and process original time information of the lyrics according to the multiple speed factor to obtain target time information of the lyrics;
the obtaining module 700 is further configured to use the chord included in the original song segment as the chord included in the target song segment, and process the original time information of the chord according to the multiple speed factor to obtain the target time information of the chord; and the lyrics contained in the target song segment, the target time information of the lyrics, the chords contained in the target song segment and the target time information of the chords are used for combining to obtain the song information of the target song segment.
Optionally, the lyric display apparatus 70 further includes a splitting module 703, a searching module 704, and a generating module 705;
the splitting module 703 is configured to frame the original song segment to obtain a plurality of first audio frames, and the obtaining module 700 obtains the time information corresponding to each first audio frame;
the determining module 701 is further configured to determine, according to the time information corresponding to each first audio frame and the multiple speed factor, a second audio frame corresponding to each first audio frame in the original song segment;
the searching module 704 is configured to search, within a preset neighborhood of each second audio frame in the original song segment, for the audio frame whose waveform is most similar to that of the corresponding first audio frame, as the output audio frame of that first audio frame;
the generating module 705 is configured to combine the output audio frames of the first audio frames to obtain the target song segment.
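A rough Python sketch of this frame-mapping-plus-neighborhood-search idea follows. It is a simplified, WSOLA-like stand-in rather than the patent's exact procedure, and the frame and search sizes are arbitrary choices:

```python
import numpy as np

def stretch(signal, frame_len, speed, search):
    """Variable-speed, pitch-preserving resynthesis sketch: output frame k
    nominally maps to input position round(k * frame_len * speed); within
    +/- `search` samples of that position we keep the candidate segment
    whose waveform best matches (maximum correlation with) the natural
    continuation of the previously chosen frame, which preserves waveform
    continuity and therefore pitch."""
    n_out = int(len(signal) / (frame_len * speed)) - 1
    if n_out < 1:
        return np.array([])
    prev, out = 0, [signal[:frame_len]]
    for k in range(1, n_out):
        ref = signal[prev + frame_len:prev + 2 * frame_len]  # continuation
        nominal = int(round(k * frame_len * speed))
        hi = min(len(signal) - 2 * frame_len, nominal + search)
        lo = max(0, min(nominal - search, hi))
        scores = [float(np.dot(signal[p:p + frame_len], ref))
                  for p in range(lo, hi + 1)]
        prev = lo + int(np.argmax(scores))
        out.append(signal[prev:prev + frame_len])
    return np.concatenate(out)

# Halving the duration of a 1 s, 440 Hz tone sampled at 8 kHz:
sig = np.sin(2 * np.pi * 440.0 * np.arange(8000) / 8000)
print(len(stretch(sig, 256, 2.0, 64)))  # 3584 samples, i.e. ~0.45 s
```

The output is shorter (or longer) by the speed factor, while the per-frame waveform, and hence the perceived pitch, is taken unchanged from the original.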
Further, the obtaining module 700 is further configured to obtain an original song segment, where the original song segment includes a first left channel signal and a first right channel signal;
the obtaining module 700 is further configured to extract human voice information and background voice information according to the first left channel signal and the first right channel signal;
the lyric display apparatus 70 further comprises a calculation module 706;
the calculating module 706 is configured to perform energy enhancement on the voice information to obtain enhanced voice information, and perform energy suppression on the background voice information to obtain suppressed background voice information;
the generating module 705 is further configured to synthesize a second left channel signal and a second right channel signal using the enhanced vocal information and the suppressed background vocal information;
the generating module 705 is further configured to obtain the target song segment according to the second left channel signal and the second right channel signal.
In a possible implementation manner, the calculating module 706 is specifically configured to increase the amplitude of the voice information by a first preset value to obtain the enhanced voice information, and to reduce the amplitude of the background sound information by a second preset value to obtain the suppressed background sound information.
In another possible implementation manner, the calculating module 706 is specifically configured to multiply the amplitude of the voice information by a scale factor to obtain the enhanced voice information, and to multiply the amplitude of the background sound information by the difference obtained by subtracting the scale factor from a third preset quantity to obtain the suppressed background sound information.
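One way to realize the two scalings jointly is a mid/side sketch. The mid/side split is an assumption here, since the patent does not state how the vocal and background estimates are extracted; `factor` plays the role of the scale factor and `total` that of the "third preset quantity" (assumed 2.0):

```python
def rebalance_stereo(left, right, factor, total=2.0):
    """Rough vocal/background rebalancing: treat the mid (L+R)/2 signal as
    the vocal estimate and the side (L-R)/2 signal as the background
    estimate, scale the vocal by `factor` and the background by
    (total - factor), then resynthesize the left/right channels."""
    mid = [factor * (l + r) / 2 for l, r in zip(left, right)]
    side = [(total - factor) * (l - r) / 2 for l, r in zip(left, right)]
    new_left = [m + s for m, s in zip(mid, side)]
    new_right = [m - s for m, s in zip(mid, side)]
    return new_left, new_right

print(rebalance_stereo([1.0], [0.0], 1.5))  # ([1.0], [0.5])
```

With `factor == total / 2` the channels pass through unchanged; raising `factor` pushes the centered vocal forward while damping the stereo background.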
In a possible implementation manner, the splitting module 703 is further configured to perform framing on the target song segment to obtain a plurality of third audio frames, perform fourier transform on the plurality of third audio frames to obtain all frequency domain information corresponding to each third audio frame, where the frequency domain information includes frequency components and corresponding energy values;
the calculating module 706 is further configured to map all frequency components corresponding to each third audio frame to 12 frequency bands in the scale profile feature in a logarithmic manner, so as to determine the frequency band to which each frequency component in each third audio frame belongs;
the calculating module 706 is further configured to add energy values corresponding to all frequency components in the same frequency band according to the frequency band to which each frequency component in each third audio frame belongs, so as to obtain a 12-dimensional vector of each third audio frame;
the calculating module 706 is further configured to transmit the 12-dimensional vector of each third audio frame to a chord identification model, so as to obtain a chord included in the target song segment and target time information of the chord.
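The 12-band logarithmic mapping described above is essentially a pitch-class (chroma) profile. A minimal sketch under common assumptions (440 Hz reference pitch, bin energy taken as the squared FFT magnitude), which the patent does not fix:

```python
import numpy as np

def chroma_vector(frame, sample_rate):
    """Fourier-transform one audio frame, map each bin's frequency f to a
    pitch class round(12 * log2(f / 440)) mod 12, and sum the energies
    falling into each of the 12 classes -> a 12-dimensional vector."""
    energy = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sample_rate)
    vec = np.zeros(12)
    for f, e in zip(freqs[1:], energy[1:]):  # skip the DC bin
        vec[int(round(12 * np.log2(f / 440.0))) % 12] += e
    return vec

# A pure 440 Hz tone concentrates its energy in pitch class 0 (A):
sr = 8000
tone = np.sin(2 * np.pi * 440.0 * np.arange(1024) / sr)
print(int(np.argmax(chroma_vector(tone, sr))))  # 0
```

The resulting 12-dimensional vectors are what a chord-recognition model would consume, one per frame (or per beat, as in the refinement below).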
Further, the frequency domain information further comprises at least two sampling points;
the determining module 701 is further configured to determine an onset sequence of the target song segment according to the energy difference between every two adjacent sampling points in the frequency domain information corresponding to each third audio frame;
the calculating module 706 is further configured to perform an autocorrelation operation on the onset sequence to obtain an autocorrelation sequence;
the calculating module 706 is further configured to weight the autocorrelation sequence by a logarithmic Gaussian distribution function;
the determining module 701 is further configured to take the time value at which the weighted autocorrelation sequence reaches its maximum as the beat length; each beat length contains at least one third audio frame;
the determining module 701 is further configured to divide the frequency components of all third audio frames within each beat length into frequency bands, add the energy values of all frequency components in the same frequency band, and average the sums, so that the resulting average value serves as the 12-dimensional vector of all third audio frames within that beat length.
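The beat-length estimation above can be sketched as follows: autocorrelate an onset-strength sequence and weight each lag by a log-Gaussian before picking the peak. The centre and width values are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

def beat_lag(onset, min_lag=1, center=40.0, sigma=0.5):
    """Autocorrelate the onset sequence, weight each lag by a log-Gaussian
    centred on a typical beat period (`center` frames, width `sigma`
    octaves), and return the lag (in frames) at which the weighted
    autocorrelation peaks -- the estimated beat length."""
    n = len(onset)
    ac = np.array([float(np.dot(onset[:n - k], onset[k:])) for k in range(n)])
    w = np.zeros(n)
    lags = np.arange(n, dtype=float)
    valid = lags >= min_lag          # exclude lag 0 (trivial maximum)
    w[valid] = np.exp(-0.5 * (np.log2(lags[valid] / center) / sigma) ** 2)
    return int(np.argmax(ac * w))

# A pulse train with one onset every 40 frames -> beat length 40 frames.
onset = np.zeros(400)
onset[::40] = 1.0
print(beat_lag(onset))  # 40
```

The log-Gaussian weighting favours musically plausible tempi, so the multiple-of-the-beat peaks in the autocorrelation are demoted relative to the true beat period.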
Optionally, the obtaining module 700 is further configured to obtain an original song segment and song information of the original song segment, where the song information includes original lyrics included in the original song segment and original time information of the original lyrics;
the determining module 701 is further configured to determine a lyric to be repeated in the original lyric, and a repetition number of the lyric to be repeated;
the obtaining module 700 is further configured to generate target lyrics according to the lyrics to be repeated and the repetition times, and replace the lyrics to be repeated in the original lyrics with the target lyrics to obtain lyrics included in the target song segment;
the obtaining module 700 is further configured to obtain target time information of the lyrics included in the target song fragment according to the original time information of the original lyrics and the repetition times of the lyrics to be repeated.
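A sketch of this repetition step, under a hypothetical data layout in which each lyric line carries a start and end time:

```python
def repeat_line(lyrics, index, repeats):
    """lyrics: time-ordered [(start_sec, end_sec, text)]. Repeat the line
    at `index` back-to-back `repeats` times, and shift every later line
    by the added duration so the target time information stays aligned."""
    start, end, text = lyrics[index]
    dur = end - start
    out = list(lyrics[:index])
    out += [(start + i * dur, start + (i + 1) * dur, text)
            for i in range(repeats)]
    shift = (repeats - 1) * dur      # extra time introduced by the repeats
    out += [(s + shift, e + shift, t) for s, e, t in lyrics[index + 1:]]
    return out

lines = [(0.0, 2.0, "a"), (2.0, 4.0, "b"), (4.0, 6.0, "c")]
print(repeat_line(lines, 1, 2))
# [(0.0, 2.0, 'a'), (2.0, 4.0, 'b'), (4.0, 6.0, 'b'), (6.0, 8.0, 'c')]
```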
Referring to fig. 8, fig. 8 is a block diagram of a structure of a lyric display apparatus according to an embodiment of the present application. As shown in fig. 8, the lyric display apparatus 80 includes a processor 801 and a memory 802, wherein:
the processor 801 may be a Central Processing Unit (CPU), and may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 802 stores instructions; it can be understood that the memory 802 stores the display parameter mapping table, song segments, and the song information of the song segments. Illustratively, the memory 802 may include both read-only memory and random-access memory, and provides instructions and data to the processor 801. A portion of the memory 802 may also include non-volatile random-access memory. For example, the memory 802 may also store device type information.
The processor 801 is configured to execute the computer program stored in the memory to implement any one of the possible embodiments described above.
Optionally, the lyric display apparatus may further include a transceiver 800 for transmitting the target song fragment and song information of the target song fragment to other apparatuses.
In a specific implementation, the electronic device may execute, through its built-in functional modules, the implementations provided in the steps of fig. 1 to fig. 5; for details, refer to the implementations provided in those steps, which are not repeated here.
The present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform any one of the possible embodiments described above.
It should be noted that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other ways. The above-described embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A method for displaying lyrics, the method comprising:
acquiring a target song segment and song information of the target song segment, wherein the song information comprises lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment and target time information of the chords;
determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord;
determining a display parameter corresponding to the lyrics according to the corresponding relation between the lyrics and the chord and a display parameter preset for the chord;
and when the target song segment is played, displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics.
2. The method of claim 1, wherein the obtaining of the target song segment and the song information of the target song segment comprises:
acquiring an original song segment, a multiple speed factor corresponding to the original song segment and song information of the original song segment, wherein the song information comprises lyrics contained in the original song segment, original time information of the lyrics, chords contained in the original song segment and original time information of the chords;
according to the multiple speed factor, performing variable-speed, pitch-preserving processing on the original song segment to obtain the target song segment;
taking the lyrics contained in the original song segment as the lyrics contained in the target song segment, and processing the original time information of the lyrics according to the multiple speed factor to obtain the target time information of the lyrics; taking the chord contained in the original song segment as the chord contained in the target song segment, and processing the original time information of the chord according to the multiple speed factor to obtain the target time information of the chord; and the lyrics contained in the target song segment, the target time information of the lyrics, the chords contained in the target song segment and the target time information of the chords are used for combining to obtain the song information of the target song segment.
3. The method according to claim 2, wherein the performing variable-speed, pitch-preserving processing on the original song segment according to the multiple speed factor to obtain the target song segment comprises:
framing the original song segment to obtain a plurality of first audio frames, and obtaining time information of each first audio frame;
determining, according to the time information of each first audio frame and the multiple speed factor, a second audio frame corresponding to each first audio frame in the original song segment;
searching an audio frame which is most similar to the waveform of the corresponding first audio frame in a preset neighborhood range of each second audio frame in the original song segment, and taking the audio frame as an output audio frame of each first audio frame;
and combining the output audio frames of the first audio frames to obtain the target song segment.
4. The method of claim 1, wherein the obtaining the target song segment comprises:
acquiring an original song segment, wherein the original song segment comprises a first left channel signal and a first right channel signal;
extracting human voice information and background voice information according to the first left channel signal and the first right channel signal;
performing energy enhancement on the voice information to obtain enhanced voice information, and performing energy suppression on the background voice information to obtain suppressed background voice information;
synthesizing a second left channel signal and a second right channel signal using the enhanced vocal information and the suppressed background vocal information;
and obtaining the target song segment according to the second left channel signal and the second right channel signal.
5. The method of claim 4, wherein the energy enhancing the human voice information to obtain enhanced human voice information and the energy suppressing the background voice information to obtain suppressed background voice information comprises:
enhancing the amplitude of the voice information by a first preset quantity value to obtain enhanced voice information;
and inhibiting the amplitude of the background sound information by a second preset quantity value to obtain the inhibited background sound information.
6. The method of claim 4, wherein the energy enhancing the human voice information to obtain enhanced human voice information and the energy suppressing the background voice information to obtain suppressed background voice information comprises:
multiplying the amplitude of the voice information by a scale factor to obtain enhanced voice information;
and subtracting the scale factor from a third preset quantity to obtain a first difference value, and multiplying the first difference value by the amplitude of the background sound information to obtain the suppressed background sound information.
7. The method of claim 1, wherein obtaining the chord contained in the target song segment and the target time information of the chord comprises:
framing the target song segment to obtain a plurality of third audio frames, and performing Fourier transform on the plurality of third audio frames to obtain all frequency domain information corresponding to each third audio frame, wherein the frequency domain information comprises frequency components and corresponding energy values;
mapping all frequency components corresponding to each third audio frame to the 12 frequency bands of the scale profile (pitch class profile) feature in a logarithmic manner, so as to determine the frequency band to which each frequency component in each third audio frame belongs;
adding energy values corresponding to all frequency components in the same frequency band according to the frequency band to which each frequency component in each third audio frame belongs respectively to obtain a 12-dimensional vector of each third audio frame respectively;
and transmitting the 12-dimensional vectors of the third audio frames to a chord identification model to obtain the chords contained in the target song segments and the target time information of the chords.
8. The method of claim 7, wherein the frequency domain information further comprises at least two sample points;
adding the energy values corresponding to all the frequency components in the same frequency band according to the frequency band to which each frequency component in each third audio frame belongs, and obtaining the 12-dimensional vector of each third audio frame respectively includes:
determining an onset sequence of the target song segment according to the energy difference between every two adjacent sampling points in the frequency domain information corresponding to each third audio frame;
performing an autocorrelation operation on the onset sequence to obtain an autocorrelation sequence;
weighting the autocorrelation sequence by a logarithmic Gaussian distribution function, and taking the time value at which the weighted autocorrelation sequence reaches its maximum as the beat length, wherein each beat length contains at least one third audio frame;
and dividing the frequency components of all third audio frames within each beat length into frequency bands, adding the energy values of all frequency components in the same frequency band, and averaging the sums to obtain the 12-dimensional vector of all third audio frames within that beat length.
9. The method of claim 1, wherein obtaining lyrics contained in a target song segment and target time information of the lyrics comprises:
acquiring an original song segment and song information of the original song segment, wherein the song information comprises original lyrics contained in the original song segment and original time information of the original lyrics;
determining lyrics to be repeated in the original lyrics and the repetition times of the lyrics to be repeated;
generating target lyrics according to the lyrics to be repeated and the repetition times, and replacing the lyrics to be repeated in the original lyrics with the target lyrics to obtain lyrics contained in the target song segment;
and obtaining target time information of the lyrics contained in the target song segment according to the original time information of the original lyrics and the repetition times of the lyrics to be repeated.
10. A lyric display apparatus, characterized in that the lyric display apparatus comprises:
the acquisition module is used for acquiring a target song segment and song information of the target song segment, wherein the song information comprises lyrics contained in the target song segment, target time information of the lyrics, chords contained in the target song segment and target time information of the chords;
the determining module is used for determining the corresponding relation between the lyrics and the chord according to the target time information of the lyrics and the target time information of the chord;
the determining module is further configured to determine a display parameter corresponding to the lyric according to the correspondence between the lyric and the chord and a display parameter preset for the chord;
and the display module is used for displaying the lyrics contained in the target song segment according to the display parameters corresponding to the lyrics when the target song segment is played.
11. A lyrics display device characterized in that the device comprises a processor and a memory, wherein the processor is adapted to execute a computer program stored in the memory to implement the steps of the method according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the steps of the method according to any one of claims 1 to 9.
CN202010403687.9A 2020-05-13 2020-05-13 Lyric display method, device and equipment Pending CN111639226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010403687.9A CN111639226A (en) 2020-05-13 2020-05-13 Lyric display method, device and equipment

Publications (1)

Publication Number Publication Date
CN111639226A true CN111639226A (en) 2020-09-08

Family

ID=72329448


Country Status (1)

Country Link
CN (1) CN111639226A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699269A (en) * 2020-12-30 2021-04-23 北京达佳互联信息技术有限公司 Lyric display method, device, electronic equipment and computer readable storage medium
CN115134643A (en) * 2021-03-24 2022-09-30 腾讯科技(深圳)有限公司 Bullet screen display method and device for vehicle-mounted terminal, terminal and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997046991A1 (en) * 1996-06-07 1997-12-11 Seedy Software, Inc. Method and system for providing visual representation of music
US6582235B1 (en) * 1999-11-26 2003-06-24 Yamaha Corporation Method and apparatus for displaying music piece data such as lyrics and chord data
CN101740034A (en) * 2008-11-04 2010-06-16 刘盛举 Method for realizing sound speed-variation without tone variation and system for realizing speed variation and tone variation
CN103714806A (en) * 2014-01-07 2014-04-09 天津大学 Chord recognition method combining SVM with enhanced PCP


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
China Rongmei (Integrated Media) Industry Network: "Play 'Changba Play-and-Sing' over the Spring Festival, with pianos, guitars and more up for grabs!" *


Similar Documents

Publication Publication Date Title
Mion et al. Score-independent audio features for description of music expression
EP3223274B1 (en) Information providing method and information providing device
EP3843083A1 (en) Method, system, and computer-readable medium for creating song mashups
US11521585B2 (en) Method of combining audio signals
KR20050107425A (en) Audio reproduction apparatus, method, computer program
CN108766407B (en) Audio connection method and device
CN111292717B (en) Speech synthesis method, speech synthesis device, storage medium and electronic equipment
Streich Music complexity: a multi-faceted description of audio content
CN112382257B (en) Audio processing method, device, equipment and medium
EP3489946A1 (en) Real-time jamming assistance for groups of musicians
CN111639226A (en) Lyric display method, device and equipment
CN110010159B (en) Sound similarity determination method and device
CN112289300A (en) Audio processing method and device, electronic equipment and computer readable storage medium
JP2000047673A (en) Karaoke device
Verfaille et al. An interdisciplinary approach to audio effect classification
US20230335090A1 (en) Information processing device, information processing method, and program
Kitahara et al. Instrogram: A new musical instrument recognition technique without using onset detection nor f0 estimation
JPH11237890A (en) Singing scoring method of karaoke device with singing scoring function
Lee et al. Singing Voice Synthesis: Singer-Dependent Vibrato Modeling and Coherent Processing of Spectral Envelope.
Zhang et al. Influence of musical elements on the perception of ‘Chinese style’in music
Desblancs Self-supervised beat tracking in musical signals with polyphonic contrastive learning
CN113557565A (en) Music analysis method and music analysis device
CN113196381A (en) Sound analysis method and sound analysis device
KR102572529B1 (en) Non-face-to-face karaoke service system
CN109493839B (en) Air quality display method and device based on voice synthesis and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination